Clustered index is the type of index in which the records in the table are ordered by index field. When the table is clustered, the index itself becomes an index tree, with the leaf nodes containing the keys being the indexed fields and also containing all the remaining fields of the table.
Because the records can only be arranged in the index tree in a certain order, each table can only have a maximum of one clustered index. You create a clustered index as follows:
CREATE CLUSTERED INDEX index_name ON dbo.Tblname (Colname1, Colname2 …)
When a table has a clustered index , other indexes (nonclustered) will use the clustered index key as a pointer to point to the corresponding record (if the table does not have a clustered index, an internal RID value is used).
Clustered index does not require a unique (unique). But when it’s not unique, the index key is appended with a random 4-byte value to make sure the index nodes are still unique.
The purpose of this is to make the pointer in the other index always point to only one record, then the pointer will include the index key + 4-byte string appended.
Such additions increase the size of the clustered index as well as other indexes, so in most practical situations you should create a clustered index that is unique. In fact, by default a unique clustered index will be created when declaring the primary key.
The search by field has a clustered index that is more optimal than a non-clustered index because it bypasses the bookmark lookup step (since all data fields are already available at the node index).
Tips for selecting a field as a clustered index
Due to the above characteristics of clustered index, there are a few points to keep in mind when selecting a field as a clustered index in order to achieve optimal performance.
A candidate for clustered index needs to meet the following criteria:
Small size: Generally with any type of index, you should choose a small field to reduce the size of the index. For clustered indexes, this criterion is even more important, since its key is used in all non-indexed indexes of the table to point to the record. For example, a VARCHAR (100) field or an approximate data type like FLOAT may need to be reconsidered. It’s best to have an integer type field (INT or BIGINT) because searching by integer is always faster than searching by string. And although clustered indexes allow for multiple fields (complex indexes), you should only use one field, also for the reason that the index size is small.
Always increasing field: As the new value of the clustered index field always increases, new records will always be added to the bottom of the table. If this value changes any, new records can be inserted into the middle of the table. This results in fragmentation of data, ie logically successive records but not stored side by side (stored on different pages). Fragmentation makes the system have more access to read data, especially when it is necessary to retrieve a range of records.
Static field: The clustered index field should not be updated regularly, once it is present in the table, its value should be kept. When it is updated, the clustered index itself needs to be updated to arrange the records in the new order in the correct order, and at the same time the other nonclustered indexes must update as well so that the cursor must now contain the value. new. Updating the clustered index field is therefore very expensive and, if done regularly, also causes the clustered index to become fragmented.
IDENTITY column in many cases is very suitable for clustered index because it satisfies all the above requirements: small size (type INT or BIGINT), always increase, and static (once inserted, you pay less attention to its value and rarely need to update). You can first use the IDENTITY column as a clustered index, and then if you find it inappropriate, switch to another field.
Hopefully with this article, you will better understand clustered index and can choose for yourself the optimal method when indexing.