Wednesday, November 22, 2006

Recompile

There has been some talk over the last couple of weeks about recompiling on SQL Server, and the tone has been a very negative one. So I thought I would touch on the issue of recompiling and look at the positive as well as the negative side.

Let's start with the meaning of recompile. Many developers hear this term and assume that it is bad. Given their background, that is understandable. Recompiling a C/C++ program can be a pain, take a long time, and, if the changes are significant, introduce new problems. Recompiling a stored procedure in SQL Server, however, is much different and can often be a good thing.

Recompiling a stored procedure is simply the server engine re-examining the query plan based on the information available at that time. Let me explain. When a stored procedure is first called, the server looks at all of the relevant information contained in the stored procedure: which tables are involved, how big those tables are, whether joins need to take place, what type of join is best, and things of that nature. Once the plan is decided upon, it is cached in memory and stays there until it expires. This process is explained very well by Alexander Chigrik in his article at Database Journal. Alexander correctly states:

The stored procedure cache is an area of memory where SQL Server keeps the compiled execution plans. Once a stored procedure has been executed, the execution plan remains in memory, so the subsequent users, each with a different execution context (including parameters and variables) can share one copy of the procedure in memory. SQL Server 2000 has one unified cache, where it stores data pages with the stored procedure and query plans. Because SQL Server 2000 can dynamically allocate more memory when it is needed, the execution plan can stay in the cache as long as it is useful. However, if there is not enough memory for the current SQL Server work, the older plans could be deleted from the cache to free up memory.

Each execution plan has an associated cost factor that indicates how expensive the structure is to compile. The more expensive the stored procedure is to compile, the larger the associated cost factor it will have, and vice versa. Each time the stored procedure is referenced by a connection, its cost factor is incremented. Therefore, a cached plan can have a big cost factor when the object is referenced by a connection very frequently or when recreation of the execution plan is very expensive. The lazywriter process is responsible for determining whether to free up the memory the cache is using, or keep the plan in cache. This process periodically scans the list of objects in the procedure cache. If the cost of a buffer is greater than zero when the lazywriter inspects it, the lazywriter decrements the cost factor by one.

Every time a cached plan is reused, its cost reverts to its original value. The lazywriter process deletes the execution plan from the cache when the following three conditions are met:

  • The memory is required for other objects and all available memory is currently in use.
  • The cost factor for the cached plan is equal to zero.
  • The stored procedure is not currently referenced by a connection.

The frequently referenced stored procedures do not have their cost factor decremented to zero and are not aged from the cache. Even though the plan's cost factor will be equal to zero, it will stay in cache unless memory is required for other objects.
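If you want to see this for yourself, here is a rough sketch of a query that lists what is sitting in the plan cache and how often each plan has been reused. It uses the dynamic management views that shipped with SQL Server 2005; on SQL Server 2000 the master.dbo.syscacheobjects table exposes similar information.

    -- List cached plans and how often each has been reused.
    SELECT  cp.usecounts,       -- number of times the cached plan has been reused
            cp.cacheobjtype,    -- e.g. 'Compiled Plan'
            cp.objtype,         -- 'Proc' for stored procedures
            st.text             -- the procedure or batch text the plan belongs to
    FROM    sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE   cp.objtype = 'Proc'
    ORDER BY cp.usecounts DESC;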

The bottom line is that if a plan is being used and is still needed, it will stay in cache and be reused. Now for the downside of that. The cost of recompiling a query plan can often be much less than the cost of executing the plan itself. When you have large tables with large or complex joins, the time it takes to recompile is tiny compared with the cost of using an out-of-date query plan, because a cached query plan cannot take advantage of new information or statistics. One of the best examples is one we ran across where a query plan wasn't recompiled but was run over and over with larger and larger sets of data, and the lack of a recompile cost us dearly in performance.

Basically, we were joining a couple of tables, and the dataset involved was small because the select covered a small date range. Because of this, the database engine chose the query plan and join type that were best for that situation. The plan was then cached and reused over and over, as it should be. The problem was that the date range changed over time and the dataset grew dramatically. The server was still using a query plan whose join type had been chosen for a small dataset, and query performance came to a standstill. The fix, of course, was to recompile the procedure each time it was called so that the engine could evaluate the size of the dataset being joined and then choose the correct join type for the query plan.
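To make that fix concrete, here is a minimal sketch (the procedure and table names are made up for illustration). Creating the procedure WITH RECOMPILE tells the engine not to reuse a cached plan and to build a fresh one on every execution, so the join type is chosen for the date range actually being queried:

    -- Hypothetical reporting procedure; WITH RECOMPILE forces a new plan on
    -- every call, so the join strategy reflects the current date range.
    CREATE PROCEDURE dbo.GetOrdersByDateRange
        @StartDate DATETIME,
        @EndDate   DATETIME
    WITH RECOMPILE
    AS
    BEGIN
        SELECT o.OrderID, o.OrderDate, d.ProductID, d.Quantity
        FROM dbo.Orders AS o
        JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
        WHERE o.OrderDate BETWEEN @StartDate AND @EndDate;
    END;

If recompiling on every call is heavier than you need, the caller can ask for a one-time recompile by appending WITH RECOMPILE to the EXEC statement, and on SQL Server 2005 a single statement inside the procedure can carry an OPTION (RECOMPILE) hint instead.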

So I guess my point is simply this: don't assume that recompiling stored procedures is a bad thing. If the DBA starts yelling about recompiles, make sure you do not just blindly try to fix "the problem"; it may actually be the solution. Look into what they are saying and determine why the recompile is happening and the purpose it serves.

On a side note, there is a situation we have come across that causes a recompile every time, and in most cases it is not needed. The cause is the SET option CONCAT_NULL_YIELDS_NULL being changed inside the procedure. This can be avoided by using the ISNULL function when concatenating strings.
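Here is a minimal sketch of the idea (the table and column names are invented). Instead of flipping the SET option inside the procedure, which forces a recompile on every execution, ISNULL handles the NULL explicitly so the default setting can stay in place:

    -- Hypothetical dbo.Person table with a nullable MiddleName column.
    -- Toggling the option inside the procedure forces a recompile each call:
    --     SET CONCAT_NULL_YIELDS_NULL OFF;
    --     SELECT FirstName + ' ' + MiddleName + ' ' + LastName FROM dbo.Person;
    -- Leaving the option at its default and using ISNULL avoids the recompile,
    -- and a NULL middle name no longer turns the whole result into NULL:
    SELECT FirstName + ' ' + ISNULL(MiddleName + ' ', '') + LastName AS FullName
    FROM dbo.Person;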

Quote of the day:
Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.

Larry Wall
