Friday, July 11, 2014

Index Query optimization Fragmentation and Rebuild





A proper index can improve the performance and a bad index can choke the performance.
 
One of the most important routes to high performance in a SQL Server database is the index. Indexes speed up the querying process by providing swift access to rows in the data tables.

Indexes are created on columns in tables or views. The index provides a fast way to look up data based on the values within those columns.

Create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max).

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.

Clustered Indexes

A clustered index stores the actual data rows at the leaf level of the index.

Nonclustered Indexes

Unlike a clustered indexed, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row locators that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data.


A proper index can improve the performance and a bad index can choke the performance.




Observation of Clustered Index over Nonclustered Index

We have one query which is not using any index. On the same table there is already nonclustered index created, which is also not being used. Now when we created clustered index on the same table, our query suddenly started to use nonclustered index which was so far it is not using. The interesting part of this is that query is using nonclustered index when clustered index is created on the same.



/* */
CREATE TABLE [dbo].[Tab_Cls_NonCls](
[ID] [int] NOT NULL,
[First] [nchar](10) NULL,
[Second] [nchar](10) NULL
) ON [PRIMARY]
GO

/* Create Sample Table */
INSERT INTO [dbo].[Tab_Cls_NonCls]
([ID],[First],[Second])
SELECT 1,'First1','Second1'
UNION ALL
SELECT 2,'First2','Second2'
UNION ALL
SELECT 3,'First3','Second3'
UNION ALL
SELECT 4,'First4','Second4'
UNION ALL
SELECT 5,'First5','Second5'
GO



/* Create Nonclustered Index over Table */
CREATE NONCLUSTERED INDEX [IX_Tab_Cls_NonCls_NonClustered]
ON [dbo].[Tab_Cls_NonCls]
(
[First] ASC,
[Second] ASC
) ON [PRIMARY]
GO


/* Run following two queries together and observe the
result in by Enabling Actual Execution Plan (CTRL + M)
1st Query will use Table Scan
2nd Query will use Index Seek
*/
SELECT ID
FROM [Tab_Cls_NonCls]
WHERE First = 'First1' AND Second = 'Second1'
SELECT Second
FROM [Tab_Cls_NonCls]
WHERE First = 'First1' AND Second = 'Second1'
GO

It is clear from query that index applies to columns on which it is created. In our case as in WHERE condition we have same columns which are used in Index.

Query 1 – Does not use any index
Query 2 – Does nonclustered index seek



/* Create Clustered Index over Table */
CREATE CLUSTERED INDEX [IX_Tab_Cls_NonCls_Clustered]
ON [dbo].[Tab_Cls_NonCls]
(
[ID] ASC
) ON [PRIMARY]
GO

--Once again run above two same query and see the execution plan.

/* Run following two queries together and observe the
result in 1st Query will use Index Seek
2nd Query will use Index Seek
*/
SELECT ID
FROM [Tab_Cls_NonCls]
WHERE First = 'First1' AND Second = 'Second1'
SELECT Second
FROM [Tab_Cls_NonCls]
WHERE First = 'First1' AND Second = 'Second1'
GO

Query 1 – Does nonclustered index seek
Query 2 – Does nonclustered index seek

Let us go over our steps of whole exercise.
Step 1: We have one table and one nonclustered index.
Step 2: We ran Query 1 which does not use nonclustered index.
Step 3: We created clustered index over table.
Step 4: We ran Query 1 which now use nonclustered index.



The question is why this has happened? If Query can use nonclustered index why did it has to wait for clustered index to be created?

The reason for this is that every nonclustered index refers to clustered index internally. When clustered index is created on table it reorganizes the table in the physical order of the clustered index. When there is no clustered index created on table at that time all nonclustered index points to data in the table to retrieve the data, however once clustered index is created all the nonclustered indexes are reorganized and they point to clustered index. This effect is creating index seek operation on nonclustered index. In our example column on which clustered index is created is in SELECT clause and WHERE clause contains columns which are used in nonclustered index, which is creating the effect which we have observed.

 
Create Index on frequently used columns in T-SQL Code. Columns used in WHERE, ORDER BY and GROUP BY are good candidate for Indexes. Create Index on columns which are used in JOIN Condition.
CREATE NONCLUSTERED INDEX idx_EID ON EMP(ID)
CREATE CLUSTERED INDEX idx_DID ON EMP(DID)

Remove any un-necessary Indexes. As Index occupies hard drive space as well as it decreases performance of all the insert, updates, deletes to the table.
-- Unused Index Script
-- Original Author: Pinal Dave (C) 2011
SELECT TOP 25
o.name AS ObjectName
, i.name AS IndexName
, i.index_id AS IndexID
, dm_ius.user_seeks AS UserSeek
, dm_ius.user_scans AS UserScans
, dm_ius.user_lookups AS UserLookups
, dm_ius.user_updates AS UserUpdates
, p.TableRows
, 'DROP INDEX ' + QUOTENAME(i.name)
+ ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.OBJECT_ID)) AS 'drop statement'
FROM sys.dm_db_index_usage_stats dm_ius
INNER JOIN sys.indexes i ON i.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = i.OBJECT_ID
INNER JOIN sys.objects o ON dm_ius.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.OBJECT_ID
FROM sys.partitions p GROUP BY p.index_id, p.OBJECT_ID) p
ON p.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = p.OBJECT_ID
WHERE OBJECTPROPERTY(dm_ius.OBJECT_ID,'IsUserTable') = 1
AND dm_ius.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups) ASC

SELECT object_name(i.object_id) as tableName, i.name as indexName
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id AND s.database_id = db_id()
WHERE objectproperty(i.object_id,'IsUserTable') = 1 and i.index_id> 0
AND s.object_id IS NULL
AND i.is_Primary_Key = 0
AND i.is_unique_constraint = 0
AND i.is_unique = 0

DECLARE @dbid INT
SELECT @dbid = DB_ID(DB_NAME())
SELECT OBJECTNAME = OBJECT_NAME(I.OBJECT_ID),
INDEXNAME = I.NAME,
I.INDEX_ID
FROM SYS.INDEXES I
JOIN SYS.OBJECTS O
ON I.OBJECT_ID = O.OBJECT_ID
WHERE OBJECTPROPERTY(O.OBJECT_ID,'IsUserTable') = 1
AND I.INDEX_ID NOT IN (
SELECT S.INDEX_ID
FROM SYS.DM_DB_INDEX_USAGE_STATS S
WHERE S.OBJECT_ID = I.OBJECT_ID
AND I.INDEX_ID = S.INDEX_ID
AND DATABASE_ID = @dbid)
AND I.INDEX_ID > 0
ORDER BY OBJECTNAME,
I.INDEX_ID,INDEXNAME ASC

Smaller Index Key gives better performance than Index key which covers large data or many columns

Index on Integer Columns performs better than varchar columns.

Clustered Index must exist before creating Non-Clustered Index.

Clustered Index must be created on Single Column which is not changing and narrow in size. Best candidate is primary key.

Non-clustered Indexes increases performance of the query that returns fewer rows and rows has wide selectivity spectrum.

Each table must have one Clustered Index.

If column have low selectivity avoid creating Index on that column as it slow down the rows modification and system will not get benefit from Index.

Multiple Columns Index or Covered Index should be ordered as Most Selective column on left and gradually decreasing selectivity as they go right.

Use SORT_IN_TEMPDB option when table is created if tempdb is on different disk. This will increase the performance to create Index.

SORT_IN_TEMPDB Option For Indexes:

The SORT_IN_TEMPDB option directs the index creation processing to the tempdb database, instead of the database in use.

Create or rebuild an index, by setting the SORT_IN_TEMPDB option to ON you can direct the SQL Server Database Engine to use tempdb to store the intermediate sort results that are used to build the index.

Note: Make sure a dedicated disk is allotted to the tempdb database and it has sufficient disk space. When using the SORT_IN_TEMPDB option, DBA’s usually calculate the disk space requirement while doing capacity planning.

CREATE NONCLUSTERED INDEX AddIndexNmHere ON TableNm (ColumnName) WITH (SORT_IN_TEMPDB = ON)

 

 

The Database Engine first scans the data pages of the base table to retrieve key values and builds an index leaf row for each data row.

When SORT_IN_TEMPDB is set to OFF, the default, the sort runs are stored in the destination filegroup.
If a sort operation is not required or if the sort can be performed in memory, the SORT_IN_TEMPDB option is ignored.
If SORT_IN_TEMPDB is set to ON, there must be sufficient free space in tempdb to store the sort runs, and sufficient free space in the destination filegroup to store the final index structure. The sort runs contain the leaf rows of the index.
If SORT_IN_TEMPDB is set to OFF, the free space in the destination filegroup must be large enough to store the final index structure. The continuity of the index extends may be improved if more free space is available.
Rebuild Index frequently using ALTER INDEX and De-fragment Index to keep performance optimal for Indexes.


Filtered Index

Filtered Index is a new feature in SQL SERVER 2008. Filtered Index is used to index a portion of rows in a table that means it applies filter on INDEX which improves query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.
When we see an Index created with some WHERE clause then that is actually a FILTERED INDEX.

Points to remember when creating Filtered Index:

·        They can be created only as Nonclustered Index.
·        They can be used on Views only if they are persisted views.
·        They cannot be created on full-text Indexes.
·        Update Query execute more faster if where clause have filtered index

CREATE NONCLUSTERED INDEX idx_NAME ON COUNTRY (NAME) WHERE COUNTRY_ID='INDIA'


A filtered Index is an optimized non clustered Index which is one of the great performance improvements in SQL SERVER 2008 reducing the Index storage cost and reduces maintenance cost.

-- Unused Index Script
-- Original Author: Pinal Dave (C) 2011
SELECT TOP 25
o.name AS ObjectName , i.name AS IndexName, i.index_id AS IndexID,
dm_ius.user_seeks AS UserSeek,dm_ius.user_scans AS UserScans, dm_ius.user_lookups AS UserLookups,
dm_ius.user_updates AS UserUpdates, p.TableRows, 'DROP INDEX ' + QUOTENAME(i.name)
+ ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.OBJECT_ID)) AS 'drop statement'
FROM sys.dm_db_index_usage_stats dm_ius
INNER JOIN sys.indexes i ON i.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = i.OBJECT_ID
INNER JOIN sys.objects o ON dm_ius.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.OBJECT_ID
FROM sys.partitions p GROUP BY p.index_id, p.OBJECT_ID) p
ON p.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = p.OBJECT_ID
WHERE OBJECTPROPERTY(dm_ius.OBJECT_ID,'IsUserTable') = 1
AND dm_ius.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups) ASC
GO


Index Types

In addition to an index being clustered or nonclustered, it can be configured in other ways:
  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
  • Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.
A unique index is automatically created when you define a primary key or unique constraint:
    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
  • Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.


Create Indexes with Included Columns
Describes: Add included (or nonkey) columns to extend the functionality of nonclustered indexes. By including nonkey columns, you can create nonclustered indexes that cover more queries.
This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
 They are not considered by the Database Engine when calculating the number of index key columns or index key size.

Limitations and Restrictions
    Nonkey columns can only be defined on nonclustered indexes.
    All data types except text, ntext, and image can be used as nonkey columns.
Nonkey columns cannot be dropped from a table unless that table’s index is dropped first.
Nonkey columns cannot be changed, except to do the following:
Change the nullability of the column from NOT NULL to NULL.
            Increase the length of varchar, nvarchar, or varbinary columns.

Permissions
Requires ALTER permission on the table or view. User must be a member of the sysadmin fixed server role or the db_ddladmin and db_owner fixed database roles.


-- Creates a nonclustered index on the Person.Address table with four included (nonkey) columns.
-- index key column is PostalCode and the nonkey columns are
-- AddressLine1, AddressLine2, City, and StateProvinceID.
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);


Columns appearing in the WHERE, FROM, GROUP BY and ORDER BY should appear in the "main part" of the index.

Columns not otherwise included that appear in the SELECT or HAVING go in the UNCLUDE clause.

Technically - since the "main part" is what the B-tree is based on - by keeping the main part as skinny as you can, you get better perf, while avoiding the bookmark lookup with the INCLUDE stuff.




SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName,
ind.name AS IndexName, indexstats.index_type_desc AS IndexType,
indexstats.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats
INNER JOIN sys.indexes ind 
ON ind.object_id = indexstats.object_id
AND ind.index_id = indexstats.index_id
WHERE indexstats.avg_fragmentation_in_percent > 30
ORDER BY indexstats.avg_fragmentation_in_percent DESC



Fragmentation

Storing data non-contiguously on disk is known as fragmentation.

Internal Fragmentation:
When records are stored non-contiguously inside the page, then it is called internal fragmentation.
In other words, internal fragmentation is said to occur if there is unused space between records in a page.
This fragmentation occurs through the process of data modifications (INSERT, UPDATE, and DELETE statements) that are made against the table and therefore, to the indexes defined on the table.
    This unused space causes poor cache utilization and more I/O, which ultimately leads to poor query performance.

External Fragmentation:
When on disk, the physical storage of pages and extents is not contiguous.
When the extents of a table are not physically stored contiguously on disk, switching from one extent to another causes higher disk rotations, and this is called Extent Fragmentation.


How to detect Fragmentation: We can get both types of fragmentation using the DMV: sys.dm_db_index_physical_stats.

SELECT OBJECT_NAME(OBJECT_ID), index_id,index_type_desc,index_level,
avg_fragmentation_in_percent,avg_page_space_used_in_percent,page_count FROM sys.dm_db_index_physical_stats
(DB_ID(N'Adhoc_48hrsRpt'), NULL, NULL, NULL , 'SAMPLED')
ORDER BY avg_fragmentation_in_percent DESC


Along with other information, there are two important columns that for detecting fragmentation, which are as follows:
  • avg_fragmentation_in_percent: This is a percentage value that represents external fragmentation. For a clustered table and leaf level of index pages, this is Logical fragmentation, while for heap, this is Extent fragmentation. The lower this value, the better it is. If this value is higher than 10%, some corrective action should be taken.

  • avg_page_space_used_in_percent: This is an average percentage use of pages that represents to internal fragmentation. Higher the value, the better it is. If this value is lower than 75%, some corrective action should be taken.
Reducing fragmentation:
  • Reducing Fragmentation in a Heap: To reduce the fragmentation of a heap, create a clustered index on the table. Creating the clustered index, rearrange the records in an order, and then place the pages contiguously on disk.

  • Reducing Fragmentation in an Index: There are three choices for reducing fragmentation, and we can choose one according to the percentage of fragmentation:
    • If avg_fragmentation_in_percent > 5% and < 30%, then use ALTER INDEX REORGANIZE: This statement is replacement for DBCC INDEXDEFRAG to reorder the leaf level pages of the index in a logical order. As this is an online operation, the index is available while the statement is running.
    • If avg_fragmentation_in_percent > 30%, then use ALTER INDEX REBUILD: This is replacement for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can also use the drop and re-create index method.
    • (Update: Please note this option is strongly NOT recommended)Drop and re-create the clustered index: Re-creating a clustered index redistributes the data and results in full data pages. The level of fullness can be configured by using the FILLFACTOR option in CREATE INDEX.


-- Index fragmentation in a database - QUICK SYNTAX
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID('user_DB'),
                                                    NULL,NULL,NULL,NULL);

USE user_DB;
-- SQL Server 2005/2008 Index Fragmentation Report
-- SQL Server DMV - Dynamic Management View

SELECT schema_name(t.schema_id)                           AS [Schema],
       object_name(ps.object_id)                          AS [Table],
       i.name                                             AS [Index],
       ps.Index_type_desc                                 AS IndexType,
       convert(TINYINT,ps.avg_fragmentation_in_percent)   AS [AvgFrag%],
       convert(TINYINT,ps.avg_page_space_used_in_percent) AS [AvgSpaceUsed%],
       ps.record_count                                    AS RecordCnt,
       ps.fragment_count                                  AS FragmentCnt
FROM   sys.dm_db_index_physical_stats(db_id(db_name()),
       NULL,NULL,NULL,'DETAILED') ps -- Faster option: SAMPLED
       INNER JOIN sys.indexes i
       ON ps.object_id = i.object_id
       AND ps.index_id = i.index_id
       INNER JOIN sys.tables t
       ON ps.object_id = t.object_id
WHERE    t.is_ms_shipped = 0
ORDER BY [Schema],  [Table],  [Index]



-- SQL Server 2005 and on fragmentation queries based on sys.dm_db_index_physical_stats
USE user_DB;
GO

-- DETAILED fragmentation info on table
SELECT DB=DB_NAME(database_id), [Object]=OBJECT_NAME (OBJECT_ID),
       * FROM sys.dm_db_index_physical_stats
       (DB_ID(N'AdventureWorks2008'),
        OBJECT_ID(N'Sales.SalesOrderHeaderSalesReason'), NULL, NULL , 'DETAILED');
GO


-- Fragmentation info on table
SELECT DB=DB_NAME(database_id), [Object]=OBJECT_NAME (OBJECT_ID),
       * FROM sys.dm_db_index_physical_stats
       (DB_ID(N'AdventureWorks2008'),
        OBJECT_ID(N'Sales.SalesOrderHeaderSalesReason'), NULL, NULL , NULL);
GO

-- All fragmentation info
SELECT DB=DB_NAME(database_id), [Object]=OBJECT_NAME (OBJECT_ID),
      * FROM sys.dm_db_index_physical_stats
       (DB_ID(N'AdventureWorks2008'),
        NULL, NULL, NULL , NULL);
GO

-- SQL Server 2000/8 index fragmentation for all indexes of a table
use user_DB
dbcc showcontig('dbo.drop_ADH_TRACE_OCT_16') with all_indexes

------------
-- SQL Server 2000/8 version - DBCC SHOWCONTIG for index fragmentation
------------

USE user_DB;

SELECT   'DBCC SHOWCONTIG (' + convert(VARCHAR,o.id) + ',' +
convert(VARCHAR,i.indid) + ') -- ' + object_name(o.id) + -- table name
          '.' + i.name -- index name
FROM     sysobjects o
         JOIN sysindexes i
           ON (o.id = i.id)
WHERE    o.xtype = 'U'
         AND i.rows > 0
ORDER BY object_name(o.id),
         i.indid
GO



Reference