In the previous part of the blog post series you’ve seen that the query optimizer examines query predicates in order to analyze statistics for useful indexes in the execution plan decision process. Since indexes are quite important to the query optimization process, the first design tip is to consider WHERE, JOIN, MERGE, ORDER BY and GROUP BY clauses in your application queries to determine the most suitable columns for indexing in your tables.

Use narrow and arithmetic data type indexes

Taking into consideration the way that SQL server stores and retrieves data, it is quite reasonable to maximize your performance if you can fit as many rows as possible to a single index page. This would improve IO performance, DB caching and storage allocation space since fewer pages would have to be read, stored and cached by the system.

To achieve fewer pages per index you have to decrease index row size, meaning that you would use as fewer columns as possible and as smaller column types as possible. An index page in MS SQL is always 8KB long. If you use a single column index of  INT type (4 bytes), an index page could store  more than 2000 rows as opposed to a CHAR(200) type (200 bytes) that could less than 40 rows. In a situation were a table has 3500 rows the INT column index would allocate approximately 2 index pages while the CHAR(200) would allocate 88!

System CPU performs arithmetic and not string operations, thus it needs to convert strings to something understandable in order to process it. Have this in mind when designing your index keys strategy since an arithmetic type index key like INT is usually faster than a string type key like CHAR, although you can configure CHAR(4) to have the same size as INT.

Column order matters

In a composite index key,  column order matters. When you create an index on two columns (A,B), the index is sorted on column A and then sub-sorted on column B. If you execute a query with a predicate on column B the query optimizer won’t benefit from the index much, because the index has been constructed with column A, as the first key column, although column B is part of the key.

Because useful indexes are the ones with high selectivity, it is best to set the most distinctive column first, followed by the next distinctive column, etc.

To better understand the above arguments, consider the following example; Table application consists of two columns: app_id and computer_id. The app_id column has a higher selectivity than computer_id and a non-clustered index IX_APP_ID_COMPUTER_IX has been created on app_id (first) and computer_id.

GO
CREATE INDEX IX_APP_ID_COMPUTER_ID ON 
[application] (app_id ASC, computer_id ASC); GO GO SELECT [app_id],[computer_id] FROM [application] WHERE [computer_id] = 5000025; GO

image_thumb_44

The WHERE clause caused the scan of the entire index because the first column defined as index key was app_id and not computer_id. If you filter on a subset of app_id column you get an index seek instead of an index scan

SELECT [app_id],[computer_id] FROM [application]
WHERE [app_id] > 241