Welcome to my blog series on Azure SQL DB performance. In this blog I’ll write about my findings with the Azure SQL DB Premium tier.
Click here for information about the set-up
Click here for the scripts I’ve used
Intro
In my previous blog I wrote about the standard tier, which has 9 SKUs and can be compared with the general purpose tier, but with the DTU model. This time I’ll write about the premium tier, which has 6 SKUs ranging between 300 and 12,000 euros per month when used full time. The premium tier can be compared with the business critical tier.
Limits
The premium tier starts at 125 DTUs and goes up to 4,000. DTUs are made up of a magic mix of CPU, memory, read IOPS and write IOPS. An IOP (Input Output oPeration) should be a 4 KB (disk cluster size) read or write. 125 DTUs translates to 500 KB/sec, going up to 32,000 KB/sec. As we’re used to data pages, which are 8 KB in size, you could say these databases are able to pull 62 to 4,000 pages per second from disk. When there are simultaneous writes, you’ll share that performance. At least, that’s my interpretation of the IOP. As for the DTU itself, I’m still struggling to get a good grip on what it exactly is, beyond the magic mix.
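One way to get a feel for that mix is to watch the resource usage against the limits of your current SKU. A minimal sketch using sys.dm_db_resource_stats (which keeps roughly an hour of history, averaged per 15 seconds); the column selection here is mine, not necessarily what’s in the scripts post:

```sql
-- Recent resource usage, one row per ~15 seconds.
-- The percentages are relative to the limits of the current SKU,
-- so you can see whether CPU, data IO or log writes hit the ceiling first.
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       avg_memory_usage_percent
FROM   sys.dm_db_resource_stats
ORDER BY end_time DESC;
```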
The maximum storage starts at 1 TB and, from P11 onwards, goes up to 4 TB. Premium has 12 TempDB files across the board. The storage type isn’t disclosed, but as this is the DTU counterpart of business critical, I suspect local SSD storage.
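If you want to check the TempDB file count on your own database, something like this should do; a quick sketch, assuming the tempdb catalog views are reachable (they were on the databases I tested):

```sql
-- Count the TempDB data files behind the current database.
SELECT COUNT(*) AS tempdb_data_files
FROM   tempdb.sys.database_files
WHERE  type_desc = 'ROWS';
```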
Resources provided
I’ve deployed all the premium options and got the following results:
| SKU | GB Memory | CPU Cores | Max workers | Schedulers | Scheduler total count |
|-----|-----------|-----------|-------------|------------|-----------------------|
| Premium P1 | 5 | 1 | 12 | 1 | 10 |
| Premium P2 | 11 | 2 | 12 | 2 | 11 |
| Premium P4 | 24 | 6 | 26 | 6 | 25 |
| Premium P6 | 56 | 12 | 20 | 12 | 31 |
| Premium P11 | 80 | 18 | 24 | 18 | 36 |
| Premium P15 | 171 | 128 | 32 | 42 | 149 |
Looking at these numbers, there are quite a few weird things happening.
Memory and CPU increase slowly at the bottom but go wild at the top end. The number of schedulers, which in my opinion determines the amount of work the database can do, grows much more evenly. Every scheduler is ‘bound’ to a CPU core, so the number of schedulers should represent the number of cores. Yes, there are hidden schedulers, but those should be hidden from the user view as well.
The number of workers, on the other hand, is a bit wonky: from 12 to 26, back to 20, up to 24 and then to 32. When we look at query performance, there might be a relation between performance and the number of workers. The workers are the ones doing the work for the schedulers, yet with fewer workers the schedulers still seem to get the work done. If my measurements are correct, you can get a lot of cores with fewer schedulers and even fewer workers.
One way of checking whether the workers and schedulers can change in your favor is to scale the SKU as far up as it will go, to see if you can get the database to move to another node. Then scale back down to your preferred SKU and check the results. It isn’t pretty, but it might be quicker than submitting a call to Azure support.
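For checking those counters yourself, a sketch along these lines returns the same kind of numbers as in the table above (sys.dm_os_sys_info and sys.dm_os_schedulers are both available in Azure SQL DB; whether the memory figure matches your SKU’s cap is something to verify yourself):

```sql
-- Cores, schedulers and the worker cap as the underlying instance reports them.
SELECT cpu_count,
       scheduler_count,
       scheduler_total_count,
       max_workers_count,
       physical_memory_kb / 1024 / 1024 AS memory_gb
FROM   sys.dm_os_sys_info;

-- Per-scheduler detail: status and how busy each scheduler is.
SELECT scheduler_id,
       status,
       current_tasks_count,
       active_workers_count
FROM   sys.dm_os_schedulers;
```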
Running queries
We’re not using a database to sit around idle; we need to insert, read, delete and update data (in whatever order you fancy). In the code post you’ll find the scripts I’ve used to do just that: insert a million rows, read a lot of stuff, delete 300,000 records and update 300,000. The reads, updates and deletes are randomized. Because of the large numbers, I think the results are usable and we can compare them between tiers and SKUs.
Insert 1 million records
I’ve inserted the records on all SKUs within this tier. At P1 this took 13 minutes, at P15 it was done in a little over 2 minutes. The more DTUs you add, the more performance you get, but you pay more as well. If you compare with standard, you can see a huge performance increase right from the P1.
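The real insert script is in the scripts post; the idea is roughly this, with dbo.PerfTest as a hypothetical stand-in for the actual table:

```sql
-- Hypothetical stand-in for the test table; the real definition is in the scripts post.
CREATE TABLE dbo.PerfTest
(
    Id        int          IDENTITY(1,1) PRIMARY KEY,
    Payload   varchar(100) NOT NULL,
    CreatedAt datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Insert one million rows generated from a cross join of catalog views.
INSERT INTO dbo.PerfTest (Payload)
SELECT TOP (1000000)
       CONVERT(varchar(100), NEWID())
FROM   sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
```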

Wait types you’ll encounter most:
- HADR_SYNC_COMMIT
- PAGELATCH_EX
- HADR_GROUP_COMMIT
- POOL_LOG_RATE_GOVERNOR
- LOG_RATE_GOVERNOR
- WRITELOG
To summarize, the waits are mostly on availability groups, access to data pages in memory, the DTU limits and, of course, the log file.
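If you want to see the top waits for your own database, a minimal sketch using the database-scoped wait stats:

```sql
-- Top waits for this database since the counters were last reset.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM   sys.dm_db_wait_stats
WHERE  wait_time_ms > 0
ORDER BY wait_time_ms DESC;
```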
The read and write latency varied between 1 and 7 ms. Latency that low is pretty much only possible with local disks.
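Checking the latency on your own files can be done with the virtual file stats; a sketch (the NULLIF just avoids dividing by zero on an idle file):

```sql
-- Average read and write latency per database file, in milliseconds.
SELECT file_id,
       num_of_reads,
       io_stall_read_ms  / NULLIF(num_of_reads, 0)  AS avg_read_latency_ms,
       num_of_writes,
       io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency_ms
FROM   sys.dm_io_virtual_file_stats(DB_ID(), NULL);
```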
Select data
After inserting data, it was time to select some. The script I’ve used is intentionally horrible, to get the worst possible performance. I’m sure your scripts are much better. The P1 SKU took 28 hours to complete. Just like with the inserts, you see enormous gains at the low end, but the further you scale up, the less extra performance you get. The P15 SKU finished in just over 51 minutes. Funny that the lowest premium SKU is much quicker than the standard tier, but the top premium is a bit slower than the top standard one.
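The real script is in the scripts post; to give an idea of ‘intentionally horrible’, think along these lines (again using the hypothetical dbo.PerfTest table from the insert sketch):

```sql
-- Deliberately awful: a non-sargable, double-wildcard predicate on a random value,
-- repeated in a loop, so every iteration scans the table instead of seeking.
DECLARE @i int = 0, @dummy varchar(100);

WHILE @i < 1000
BEGIN
    SELECT @dummy = MAX(Payload)
    FROM   dbo.PerfTest
    WHERE  UPPER(Payload) LIKE '%' + CONVERT(varchar(36), NEWID()) + '%';

    SET @i += 1;
END;
```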

The wait types:
- SOS_SCHEDULER_YIELD
- PWAIT_EXTENSIBILITY_TASK
- HADR_FABRIC_CALLBACK
- RESOURCE_GOVERNOR_IDLE
- CXPACKET
- LATCH_EX
This time round the waits are about reaching the maximum time a query can spend on the processor (4 ms), machine learning services, availability groups, resource limits, parallelism and exclusive access to non-page data structures.
The latency was, again, between 1 and 7 ms.
Delete data
After selecting data I decided I’d had enough of the large amount and had to trim some of it, so I deleted 300,000 records. This process is quick across the board, ranging from a little under three minutes to 17 seconds at best. The gains from P4 and up were very small, as you can see in the graph.
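A sketch of what such a randomized delete can look like; the exact script is in the scripts post:

```sql
-- Delete 300,000 randomly picked rows from the hypothetical test table.
DELETE FROM dbo.PerfTest
WHERE Id IN (SELECT TOP (300000) Id
             FROM   dbo.PerfTest
             ORDER BY NEWID());
```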

The wait types:
- HADR_FABRIC_CALLBACK
- RESOURCE_GOVERNOR_IDLE
- PVS_PREALLOCATE
- HADR_SYNC_COMMIT
- LCK_M_IX
- WRITELOG
The wait I’ve seen the most has to do with the availability groups; beyond that, you might run into the DTU limits or the persistent version store. There were even some locking waits at the high end and, as usual, some log waits.
The disk latency was between 2 and 7 ms.
Update data
The last job I let rip on the database was an update task. I ran a script to update 300,000 records. The P1 took almost 3 minutes, P4 and up finished under a minute, and P15 was the quickest at 14 seconds.
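The update follows the same randomized pattern as the delete; a sketch:

```sql
-- Update 300,000 randomly picked rows with a fresh payload value.
UPDATE t
SET    Payload = CONVERT(varchar(100), NEWID())
FROM   dbo.PerfTest AS t
WHERE  t.Id IN (SELECT TOP (300000) Id
                FROM   dbo.PerfTest
                ORDER BY NEWID());
```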

Let’s take a final look at the waits:
- RESOURCE_GOVERNOR_IDLE
- PAGELATCH_EX
Just two waits: either the DTU limits or waiting for access to modify a page in memory. The rest of the waits were insignificant, partly because the queries finished so quickly.
Pricing
The weird thing is that pricing is shown in an inconsistent way. The documentation states that you pay for the DTUs allocated to your database, but the Azure Calculator gives a price for the DTUs that you can set for x amount of hours. To create some kind of consistent overview, I’ve set the pricing to non-stop usage for a whole month.

You can see the curve being the exact opposite of the performance curves: more performance requires a larger credit card. This is the point where you need to determine whether all the performance of this premium tier is worth the cost. To keep this under control, I’d advise you to use the cost center and its alerts. Besides alerts on the actual costs, you can create alerts on predicted costs as well to prevent surprises. We’ve all been there.
Finally
Would I use this one for my use cases? The storage speed is incredible, and if you haven’t got a 24/7 workload with a lot of extra requirements around replicas, this might be a very good fit. But increase the usage and your costs can spiral out of control quite quickly. One trick you can use is to switch between standard and premium depending on your workload. But remember that the disks are different and there are differences in maximum data size that can either block scaling or increase the time it takes.
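Switching tiers can be done from T-SQL as well; a sketch with a hypothetical database name (the scale operation is asynchronous, so plan for the transition time):

```sql
-- Scale down to Standard S3 for the quiet hours...
ALTER DATABASE [YourDatabase] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

-- ...and back up to Premium P2 when the heavy workload starts.
-- A scale can be blocked if the current data size exceeds the target SKU's maximum.
ALTER DATABASE [YourDatabase] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2');
```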
Thanks for reading! Next week, general purpose serverless.