{"id":1343,"date":"2014-03-17T09:00:00","date_gmt":"2014-03-17T16:00:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/dataplatforminsider\/2014\/03\/17\/the-new-and-improved-cardinality-estimator-in-sql-server-2014\/"},"modified":"2024-01-22T22:49:00","modified_gmt":"2024-01-23T06:49:00","slug":"the-new-and-improved-cardinality-estimator-in-sql-server-2014","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2014\/03\/17\/the-new-and-improved-cardinality-estimator-in-sql-server-2014\/","title":{"rendered":"The New and Improved Cardinality Estimator in SQL Server 2014"},"content":{"rendered":"
One area where a lot of development work went into SQL Server 2014 is query optimization, where a crucial component was rewritten in order to improve query performance and predictability.<\/p>\n
To get a better idea of the significance of the change and how one should deal with it, let\u2019s first look at what the cardinality estimation component is responsible for.<\/p>\n
When a query is executed for the first time, it gets \u2018compiled\u2019. Unlike the compilation of executables, where we talk about binary compilation, the compilation of a SQL statement is limited to figuring out the most efficient access to the requested data. This means, for example, the query engine needs to decide which index to pick in order to access the data rows most efficiently or, in the case of table joins, which join order and join methods to use. At the end of the compilation there is a descriptive query plan, which is cached. During the execution phase the query plan is read from the cache and interpreted. Since compiling a query plan consumes significant resources, one usually is keen to cache these query plans.<\/p>\n
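As a sketch of how the plan cache described above can be observed, the documented dynamic management views sys.dm_exec_cached_plans, sys.dm_exec_sql_text, and sys.dm_exec_query_plan expose the cached plans together with their statement text (the column selection here is illustrative):

```sql
-- Sketch: list the most frequently reused cached query plans.
-- Uses documented DMVs; requires VIEW SERVER STATE permission.
SELECT TOP (10)
       cp.usecounts,            -- how often the cached plan was reused
       cp.objtype,              -- e.g. 'Adhoc', 'Prepared', 'Proc'
       st.text   AS statement_text,
       qp.query_plan            -- the cached showplan XML
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```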
At a very high level, the Query Optimizer uses the following inputs (in order) during the compilation phase:<\/p>\n
There might be a few more inputs, but these are the main ones. The first step, which often is decisive for the performance of the query, is to estimate how many rows the query might return based on the inputs above. It is not only the total number of rows returned by the query that is estimated. In order to decide on the correct join method in, e.g., a table join, one also needs to estimate how many rows each execution step and query branch accessing the tables involved in the join would deliver. This is done in the so-called Cardinality Estimation (referred to as CE from now on), which this article is about. The estimated number of rows is the basis for the next steps of query optimization, which assume that the estimate is correct. This means the choice of a certain index, a join order, or a join type is majorly impacted by this first step of estimating the cardinality of a query.<\/p>\n
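The per-operator row estimates described above can be compared against reality by capturing an actual execution plan, whose operators carry both estimated and actual row counts; a minimal sketch (table and column names are hypothetical):

```sql
-- Sketch: capture estimated vs. actual row counts per operator.
-- SET STATISTICS XML returns the actual plan as XML; each operator
-- in it carries EstimateRows and ActualRows attributes to compare.
SET STATISTICS XML ON;

SELECT o.OrderID, c.CustomerName        -- hypothetical tables/columns
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '20140101';

SET STATISTICS XML OFF;
```

A large gap between the two values on an operator is the typical symptom of a cardinality estimation problem.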
If that estimate is plain wrong, one cannot expect the subsequent steps of query optimization to derive an efficient plan, unless the query itself leans toward a particular index, like querying exactly one row by specifying the primary key, or hardly any indexes are available to be used.<\/p>\n
In order to estimate the number of rows returned, the CE requires statistics over the indexes and columns of tables as a reliable source of information to figure out how often a certain parameter value submitted with the query occurs in the data. Hence, with no statistics, or with plain wrong or stale statistics, even the best CE logic will hardly be able to predict the right number of rows returned by a query.<\/p>\n
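The statistics the CE relies on can be inspected and refreshed with documented commands; a minimal sketch (the table and statistics names are hypothetical):

```sql
-- Sketch: inspect the header, density vector, and histogram of a
-- statistics object that the CE would consult for an estimate.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate');  -- hypothetical names

-- Refresh the statistics with a full scan if they turned out stale.
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH FULLSCAN;
```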
The role the CE plays in the process of generating a good query plan is important. One can hardly expect a great query execution plan as a result when the CE estimates are completely off. In fact, over the years a significant share of the workload issues that required SQL Server code fixes circled around cardinality estimation.<\/p>\n
What changed in SQL Server 2014?<\/b><\/p>\n
At a very high level, the CE process in SQL Server 2012 was the same as in all prior releases back to SQL Server 7.0 (the release before SQL Server 2000). In recent years we put a lot of fixes or QFEs regarding query optimization (including cardinality estimation) under trace flags in order not to cause a general regression across all those releases due to the changed behavior introduced by a fix.<\/p>\n
With the release of SQL Server 2014, there were several good reasons to overhaul what was basically the CE as introduced 15 years ago. Our goals in SQL Server development certainly were to avoid issues we had experienced over the past 15 years and which were not fixable without a major redesign. However, to state it pretty clearly as well: it was NOT a goal to avoid any regressions compared to the existing CE. The new SQL Server 2014 CE is NOT integrated following the principles of QFEs. This means our expectation is that the new SQL Server 2014 CE will create better plans for many queries, especially complex queries, but will also result in worse plans for some queries than the old CE did. To define what we mean by better or worse plans: better plans have lower query latency and\/or fewer pages read, and worse plans have higher query latency and\/or more pages read.<\/p>\n
Knowing that the CE has changed and purposely was not re-architected overcautiously to avoid regressions, you need to prepare a little more carefully for a SQL Server 2014 upgrade if you will utilize the new Cardinality Estimator. Based on our experience so far, you first need to know:<\/p>\n
Activation\/De-Activation of the new CE<\/b><\/p>\n
As delivered, SQL Server 2014 decides whether the new Cardinality Estimator is utilized for a specific database based simply on the compatibility level of that database within the SQL Server 2014 instance. If the compatibility level of a specific database is set to \u2018SQL Server 2014 (120)\u2019, the new CE is going to be used.<\/p>\n
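In T-SQL, checking and changing the compatibility level looks like this (the database name is hypothetical):

```sql
-- Check the current compatibility level of a database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'MyDatabase';          -- hypothetical database name

-- Switch the database to compatibility level 120 (SQL Server 2014),
-- which activates the new CE for queries against this database.
ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 120;
```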
If the compatibility level of a specific database is set to a lower value, like \u2018SQL Server 2012 (110)\u2019, the old CE is going to be used for that database.<\/p>\n As with other releases of SQL Server, upgrading your SQL Server instances in place or attaching existing databases from earlier releases of SQL Server will not change the compatibility level of a database. Hence, purely upgrading a SQL Server instance or attaching a database from SQL Server 2012 or an earlier release will not activate the new CE. One would need to change the compatibility level of such a database manually to \u2018SQL Server 2014 (120)\u2019 to activate the new CE. However, toggling between compatibility levels 110 and 120 in SQL Server 2014 has other effects beyond the change of CE algorithms; e.g., the parallel insert functionality of SELECT INTO is also disabled by staying on the old compatibility level.<\/p>\n Another alternative to enable or disable the different CE algorithms is the usage of a trace flag. Even with the compatibility level of a database set to \u2018SQL Server 2014 (120)\u2019, the trace flag would enforce the usage of the old CE. In contrast to the compatibility level, which applies to a specific database, the trace flag, if used as a startup trace flag, affects the choice of CE instance-wide. This behavior might not be desired when many different databases run in one instance. However, for situations where one runs one user database per SQL Server instance, or consolidates several databases of one type of application under one SQL Server instance, it indeed might make sense to use the trace flag approach. 
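A sketch of the trace-flag approach: trace flag 9481 reverts to the legacy CE, and the documented QUERYTRACEON query hint can scope it to a single statement (table and column names are hypothetical):

```sql
-- Instance-wide: force the legacy (pre-2014) CE for all databases
-- until restart (add -T9481 to the startup parameters to persist it).
DBCC TRACEON (9481, -1);

-- Per-query: force the legacy CE for a single statement only,
-- even when the database runs at compatibility level 120.
SELECT OrderID                      -- hypothetical table/column
FROM dbo.Orders
WHERE OrderDate >= '20140101'
OPTION (QUERYTRACEON 9481);
```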
In order to force the old CE algorithms, trace flag 9481 can be used as a startup trace flag.<\/p>\n Why can query plans change with the new CE?<\/b><\/span><\/p>\n Calculate combined density of filters differently<\/b><\/p>\n In this section we refer quite a bit to index and column statistics and the specific data captured in those. For folks less familiar with how index or column statistics are structured and what is contained in them, it is advisable to read the Microsoft TechNet article \u201cStatistics Used by the Query Optimizer in Microsoft SQL Server 2008<\/a>\u201d before continuing.<\/p>\n One of the most significant changes in the new CE is the way the CE calculates the selectivity of a combination of filters submitted with a query. In the old CE, the assumption was that the filters were independent: any combination of values stored in the different columns the filters are applied to would have the same probability of occurring. In reality, however, the values of different columns are often not independent. There are many cases where values in different columns within one table correlate. Think about a car brand like Toyota, Volkswagen, Ford, or Mercedes being represented in one column and the model like Corolla, Jetta, Focus, or E-Class being represented in another column of the table. It immediately becomes clear that certain combinations cannot occur, since Volkswagen does not produce a car named \u2018Corolla\u2019.<\/p>\n Let\u2019s look at what the differences really are in the calculation of selectivity.<\/p>\n Let\u2019s assume the following indexes against a table like this:<\/p>\n
<\/a><\/p>\n
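As a simplified numeric sketch of the difference in combining filter selectivities (the numbers are hypothetical): the old CE multiplies the individual selectivities under full independence, while the new CE dampens every predicate after the most selective one, a behavior documented as \u2018exponential backoff\u2019 using square roots of the additional selectivities:

```sql
-- Two predicates, each with selectivity 0.1, on a 1,000,000-row table.
-- Old CE (independence assumption): rows * p1 * p2
-- New CE (exponential backoff):     rows * p1 * SQRT(p2)
--   (predicates ordered from most to least selective)
SELECT 1000000 * 0.1 * 0.1       AS OldEstimate,  -- 10,000 rows
       1000000 * 0.1 * SQRT(0.1) AS NewEstimate;  -- ~31,623 rows
```

The higher combined estimate of the new CE reflects the assumption that the filter columns are somewhat correlated, as in the car brand and model example above.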