Skip to content

Latest commit

 

History

History
73 lines (63 loc) · 2.98 KB

performance.md

File metadata and controls

73 lines (63 loc) · 2.98 KB

Workshop - Performance

In this workshop you'll learn how to optimize the performance of SQL queries by applying indices and using best practises.

Prerequisites

In this workshop we'll use the a fragment of the famous Stack Overflow database (years 2008-2010). Because of the size we don’t deliver a backup file but we restore the database directly from the datafile StackOverflow2010.mdf.

  1. Download .mdf file from here
    • Caution the zipped size is about 1 Gigabyte, unzipped the Database is about 10 Gigabyte.
  2. Copy the file in your SQL Server data directory (e.g. C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA).
  3. In SQL Server Management Studio (SSMS), right click on Databases, Attach, Add and browse to the .mdf file.

Exercise 1 - Exploration

Since the folks at Stack Overflow were so friendly to open-source their data, we can take a look around in the database.

Call to action

  1. Execute the following command to change ownership of the database:
    use [StackOverflow2010] EXEC sp_changedbowner 'sa'
  2. Draw the database diagram using SSMS;
  3. Which table will contain the most amount of records, in other words which table is the biggest?
  4. How many records does the table hold?

    Please note that this database is just a small subset of their actual data.

  5. Execute the following query
    SELECT * FROM Posts
  6. How many seconds does it take on your computer?
  7. Why does it take so long?
  8. How many seconds does the following query take?
    SELECT Id FROM Posts
  9. Is there a big difference? Explain why by using the correct terms found in the PowerPoint presentation.

Solution

A possible solution of exercise 1 can be found here.


Exercise 2 - Optimization

So it's clear that there is quite a lot of data (well atleast more then you're used to work with in the past). It's time to optimize some queries by using best practises and indices.

Call to action

  1. Find the best way to select the following columns for all posts created in 2008:
    • id;
    • body;
    • lasteditordisplayname.
  2. Order posts by score and (in case of equal score) commentcount both in descending order. Show the following columns of all the posts in the most efficient way:
    • id;
    • score;
    • commentcount
    • title

    How can you check if your result is executed in the most efficient way? Is the actual table used or are you using an index?

  3. Create an index on title. Then explain the difference (in the execution plan) between following queries:
    SELECT Id
        ,Title 
    FROM Posts  
    WHERE Title LIKE '%php%'; 
    SELECT Id
        ,Title 
    FROM Posts  
    WHERE Title LIKE 'php%';

Solution

A possible solution of exercise 2 can be found here.