In this workshop you'll learn how to optimize the performance of SQL queries by applying indices
and using best practises.
In this workshop we'll use the a fragment of the famous Stack Overflow database (years 2008-2010). Because of the size we don’t deliver a backup file but we restore the database directly from the datafile StackOverflow2010.mdf
.
- Download
.mdf
file from here- Caution the zipped size is about 1 Gigabyte, unzipped the Database is about 10 Gigabyte.
- Copy the file in your SQL Server data directory (e.g.
C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA)
. - In SQL Server Management Studio (SSMS), right click on Databases, Attach, Add and browse to the
.mdf
file.
Since the folks at Stack Overflow were so friendly to open-source their data, we can take a look around in the database.
- Execute the following command to change ownership of the database:
use [StackOverflow2010] EXEC sp_changedbowner 'sa'
- Draw the database diagram using SSMS;
- Which table will contain the most amount of records, in other words which table is the biggest?
- How many records does the table hold?
Please note that this database is just a small subset of their actual data.
- Execute the following query
SELECT * FROM Posts
- How many seconds does it take on your computer?
- Why does it take so long?
- How many seconds does the following query take?
SELECT Id FROM Posts
- Is there a big difference? Explain why by using the correct terms found in the PowerPoint presentation.
A possible solution of exercise 1 can be found here.
So it's clear that there is quite a lot of data (well atleast more then you're used to work with in the past). It's time to optimize some queries by using best practises and indices.
- Find the best way to select the following columns for all
posts
created in2008
:id
;body
;lasteditordisplayname
.
- Order
posts
byscore
and (in case of equal score)commentcount
both indescending
order. Show the following columns of all theposts
in the most efficient way:id
;score
;commentcount
title
How can you check if your result is executed in the most efficient way? Is the actual table used or are you using an index?
- Create an index on title. Then explain the difference (in the execution plan) between following queries:
SELECT Id ,Title FROM Posts WHERE Title LIKE '%php%';
SELECT Id ,Title FROM Posts WHERE Title LIKE 'php%';
A possible solution of exercise 2 can be found here.