Friday, December 26, 2008

SQL Server - Max Degree of Parallelism (MAXDOP)

When SQL Server runs on a computer with more than one microprocessor/CPU or on SMP (a computer-architecture where two or more identical processors can connect to a single shared main memory and I/O and can perform the same functions. In case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors), it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution. You can use the max degree of parallelism option to limit the number of processors to use in parallel plan execution. In other words, the max degree of parallelism setting governs the maximum number of processors that a particular query statement or index operations (index data definition language) can utilize at run time.
The default value of 0 uses all available processors. Set max degree of parallelism to 1 to suppress parallel plan generation (serial plan only). Set the value to a number greater than 1 (up to a maximum of 64) to restrict the maximum number of processors used by a single query execution. If a value greater than the number of available processors is specified, the actual number of available processors is used. If the computer has only one processor, the max degree of parallelism value is ignored.

What all benefit from parallel execution plan:
· Complex/Long running queries
– During query optimization, SQL Server looks for queries that might benefit from parallel execution. It distinguishes between queries that benefit from parallelism and those that do not benefit, by comparing the cost of an execution plan using a single processor versus the cost of an execution plan using more than one processor and uses the cost threshold for parallelism (By default 5, this value of cost threshold for parallelism determines which queries are considered short, and they should therefore be run using serial plans) value as a boundary point to determine short or long query. In a parallel query execution plan, the INSERT, UPDATE, and DELETE operators are executed serially. However, the WHERE clause of an UPDATE or a DELETE statement, or the SELECT part of an INSERT statement may be executed in parallel. The actual data changes are then serially applied to the database.
· Index data definition language (DDL) – Index operations that create or rebuild an index (REBUILD only, not applicable to REORGANIZE), or drop a clustered index and queries that use CPU cycles heavily are the best candidates for a parallel plan. For example, joins of large tables, large aggregations, and sorting of large result sets are good candidates.
· Other Operations – Apart from the above, this option also controls the parallelism of DBCC CHECKTABLE, DBCC CHECKDB, and DBCC CHECKFILEGROUP.
The degree of parallelism value is set at the SQL server instance level and can be modified by using the sp_configure system stored procedure (command shown below). You can override this value for individual query or index statements by specifying the MAXDOP query hint or MAXDOP index option. Note that this can be set differently for each instance of SQL Server. So if you have multiple SQL Server instances in the same server, it is possible to specify a different Maximum DOP value for each one.

--The max degree of parallelism option is an advanced option
--and let you set only when show advanced options is set to 1
sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
--configuring to use 8 processors in parallel
--setting takes effect immediately (without restarting the MSSQLSERVER service)
sp_configure 'max degree of parallelism', 8;
GO
RECONFIGURE WITH OVERRIDE;
GO

By default, when SQL is installed on a server, the parallelism setting is set to 0 meaning that the optimizer can utilize all the available processors to execute an individual query. This is not necessarily the most optimal setting for the application and the types of queries it is designed to support. It is therefore necessary for project teams to examine the impact of parallelism on query performance and server stability and make a considered choice as discussed below.
· Scenario 1 – For OLTP application, a typical setting is 1 would help. The reason for this is that in an OLTP environment, most of the queries are expected to be point queries which address one or a relatively small number of records. Such queries do not need parallelized processing for efficient execution. If there are specific queries which have a need for a setting greater than 1, then the source code needs to be examined to see if a MAXDOP hint can be availed.
· Scenario 2 – For OLAP application, the setting should typically be default 0 (up to 8 processors) or be greater than 1, because each queries, such application will use, will typical target thousands of, millions of records and also there might a scenario when you drop the index before ETL operation and re-create it once refreshed data is uploaded in typical data warehousing application. There will definitely be performance advantages in using multiple processors to do these works in parallel fashion.
Note: Using a setting of 0 in these applications is not recommended, especially when there are more than 8 processors in order to keep the coordination costs, context switching down to manageable levels. It is typical to start with a value of 4 and experiment with the reporting queries to see if this needs to be adjusted upwards.

Please note, using higher value for his setting means using more processors, in turn it means managing more threads, requires more cache synchronization and more context switching। So it’s recommended to test and evaluate your particular scenario in your particular environment before changing the default value in production server.

Further information can be found in:
· Max Degree Of Parallelism
http://technet.microsoft.com/en-us/library/ms181007.aspx
· Parallel Query Processing
http://msdn.microsoft.com/en-us/library/ms178065(SQL.90).aspx
· Parallel Index Operations
http://msdn.microsoft.com/en-us/library/ms191292(SQL.90).aspx

Thursday, December 18, 2008

Microsoft plans for SQL Server 2010 (codename "Kilimanjaro")

After SQL Server 2008, Microsoft plans for SQL Server 2010 codename “Kilimanjaro” which has been slated for release in first half of 2010. The new version of SQL Server will be focused on business intelligence and will further enrich SQL Server’s BI (business intelligence) capabilities while providing a robust and scalable data platform, capable of supporting the largest BI deployments. Kilimanjaro will provide the foundation for Microsoft's first data warehouse appliance, code-named Madison, and a BI tool called Gemini that's being designed to bring a broader range of employees into the BI fold(These are new capabilities and not a rewrite, rework, or upgrade). First CTP version is scheduled to be released within next 12 months.

Gemini
Gemini will focus on delivering new capabilities in the area of managed self-service analysis capabilities (Self service analysis and Self-service reporting) through "deep integration" with Microsoft's SharePoint Server and Excel. It will allow information workers to better “slice and dice data and create their own BI (business intelligence) applications and assets to share and collaborate on from within the familiar, everyday Microsoft Office productivity tools they already use.” Another key feature of Gemini is in-memory BI, which analyzes large amounts of data in memory in order to speed performance.

Madison
In design of Madison, Microsoft will use data warehouse technology from its recent
acquisition of DATAllegro. Madison is designed to deliver massively increased scalability, capable of supporting the very largest data warehousing deployments. The solution will be able to handle the most demanding data warehousing workloads spanning hundreds of terabytes of data and thousands of concurrent users at the low total cost of ownership (TCO). Customers will be able to grow their Madison data warehouses by using a "scale out" approach of adding on standard server boxes as they need them.

Further augmenting the enterprise-class capabilities of Microsoft’s data platform, the
acquisition of data quality vendor Zoomix will help provide richer data quality capabilities in future versions of SQL Server Integration Services, enabling customers to maximize the accuracy of their BI.

Microsoft is working with industry-leading server and storage hardware providers including Bull, Dell Inc., EMC Corp., HP and Unisys Corp. to build a strong ecosystem providing an “appliance-like” buying experience for customers based on the “Madison” solution. Soon, customers can expect new data warehouse reference configurations based on SQL Server 2008 from these hardware partners as well.

PS: This time also Microsoft has borrowed the codename “Kilimanjaro” for SQL Server 2010 from the name of a National Park as it was the case with codename “Katmai” for SQL Server 2008. :)
For more details refer to:
http://en.wikipedia.org/wiki/Kilimanjaro_National_Park
http://en.wikipedia.org/wiki/Katmai