Friday, April 15, 2011

SQL Index - Clustered Index and Non-Clustered Index


An index is a database object that, when created on a table, it can provide faster access paths to data and can facilitate faster query execution. Indexes are used to provide SQL Server with amore efficient method of accessing the data. Instead of always searching every data page in a table, an index facilitates retrieving specific rows without having to read a table’s entire content.

By default, rows in a regular un-indexed table aren’t stored in any particular order. A table in an order less state is called a heap. In order to retrieve rows from a heap based on a matching set of search conditions, SQL Server would have to read through all the rows in the table.

When an index is created, its index key data is stored in a B-tree structure. A B-tree structure starts with a root node, which is the beginning of the index. This root node has index data that contains a range of index key values that point to the next level of index nodes, called the intermediate leaf level. The bottom level of the node is called the leaf level. The leaf level differs based on whether the actual index type is clustered or non-clustered. If it is a clustered index, the leaf level is the actual data pages itself. If a non-clustered index, the leaf level contains pointers to the heap or clustered index data pages. A clustered index determines how the actual table data is physically stored. You can only designate one clustered index.

There are two types of indexes:

1. Clustered Indexes

2. Non clustered indexes

Clustered Index

A clustered index determines the physical order of the data in a table. Database Engine allows the creation of a single clustered index per table, because the rows of the table cannot be physically ordered more than one way. A clustered index is built by default for each table, for which you define the primary key using the primary key constraint.

When an index is created, its index key data is stored in a B-tree structure. The top level of the clustered index B-tree is known as the root node, the bottom level nodes are known as leaf nodes, and all nodes in between the root node and leaf nodes are collectively referred to as intermediate nodes. In a clustered index, the leaf nodes contain the actual data rows for a table, and all leaf nodes point to the next and previous leaf nodes, forming a doubly linked list. The clustered index holds a special position in SQL Server indexing: because its leaf nodes contain the actual table data, there can only be one clustered index defined per table.

                                            Figure: Clustered index B-tree structure


Let’s take a look at a visualization of what a B-Tree looks like for a clustered index.

Non Clustered Index

A non-clustered index has the same index structure as a clustered index, with two important differences:

• A non-clustered index does not change the physical order of the rows in the table.

• Unlike clustered indexes, however, each leaf node in a non-clustered index contains the non-clustered key value and a row locator.

The physical order of rows in a table will not be changed if one or more non-clustered indices are defined for that table. For each non-clustered index, Database Engine creates an additional index structure that is stored in index pages. Non-clustered indexes are based on order of the data, but do not physically sort the data.

If a table has a clustered index, all non-clustered indexes defined on the table automatically include the clustered index columns as the row locator. If the table is a heap, SQL Server creates row locators to the rows from the combination of the file identifier, page number, and slot on the page.

                                             Figure: Non-clustered index B-tree structure

Let’s take a look at a visualization of what a B-Tree looks like for a non-clustered index