Azure SQL Database Performance comparison part 7 of 9: Hyperscale

Welcome to my blog series on Azure SQL DB performance. In this blog I’ll write about my findings with the Azure SQL DB Hyperscale tier.

Click here for information about the set-up
Click here for the scripts I’ve used

Intro

So far, my blogs have covered the different Azure SQL DB offerings, where the main differences are between the DTU-based and vCore-based models. But in general, the design is recognizable. With the Hyperscale tier, many things change. There are still cores and memory of course, but the rest of the design is totally different. I won’t go into all the details, you’re better off reading them here [https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale?view=azuresql] and here [https://learn.microsoft.com/en-us/azure/azure-sql/database/hyperscale-architecture?view=azuresql], but the main differences are support for up to 100 TB of data in one database (all the other tiers max out at 40 TB), fast database restores based on file snapshots, rapid scale-out and rapid scale-up.

The Hyperscale architecture is different from all the other architectures because of the way the underlying machinery is designed. The durable storage that holds all the data is backed by a number of snapshots. The data files are served by a number of page servers, each with an SSD cache. This cache is a covering cache, meaning all the data those page servers serve is loaded into that cache, ready to be handed out. The log service sits between the page servers and the compute node. It’s more than a regular log file or log disk; as a service it has its own local memory and SSD cache. On top of all this sits the compute node that does all the work and is accessible to regular users for their regular SQL work. This node has a local cache that isn’t covering, which means this cache holds only a selection of all the data.

One thing I’ve found when using this tier is that when you start scaling up and down, caches get cleared. Not only the ones we’re using for things like monitoring but, from what I could test, the local caches on the compute node as well. These caches need to be repopulated and that will take time.
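
If you want to keep an eye on this yourself, one quick way is to watch how much of your database sits in the local buffer pool right after a scale operation and again a little later. This is just a minimal sketch using sys.dm_os_buffer_descriptors, not part of my original test scripts, and be aware that counting its rows can be slow on machines with a lot of memory:

-- Rough indication of how much of this database is currently cached in the buffer pool.
-- Run it right after a scale operation and again later to see the cache repopulate.
SELECT
    COUNT(*) * 8 / 1024 AS buffer_pool_mb   -- pages are 8 KB, so count * 8 / 1024 = MB
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID();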

Testing this tier is not completely fair. The query to insert data generates a dataset a little over one gigabyte in size, and this small amount of data is not where Hyperscale excels. As I wanted to be able to compare the results between all the tiers and SKU’s, I had to stick to this dataset. Because of the design, the disk latency is quite hard to measure and at times not really reliable. I’ll reiterate that point at the respective graphs as well.

Before finally diving into the measurements, I’d really love to hear your experiences with this tier.

Limits

The Hyperscale tier starts at a minimum of 2 cores and goes up to 80. Storage is simple: 100 TB across the board. This tier is meant for a lot of data. As I’m telling people in the session that accompanies this series of blog posts, if you need more data than that for one single database, you’re either in serious data warehousing or you might need to rethink your data strategy. The number of TempDB files is undisclosed, but the size starts at 64 GB and goes up to 2 TB. The disk limitations are disclosed as well. The log rate is 100 MB per second across the board. The same goes for the data: starting at 8,000 IOPS and maxing out at 204,800. One IOP (Input/Output oPeration) is tied to the disk cluster size. As these clusters are 4 KB, reading or writing one data page (8 KB) equals two IOPS. 8,000 IOPS equals something like 32 MB per second, and the top end goes to about 800 MB per second. This speed, comparable with local SSD, should make sure you get to a lot of data quite quickly. Of course, these are the maximum values; it’s possible your results turn out a bit lower.
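
To make the arithmetic behind those throughput numbers explicit, here it is as a quick T-SQL snippet. It’s only a sketch that converts the documented IOPS limits to MB per second, assuming 4 KB per I/O operation:

-- Convert the documented IOPS limits to MB/s, assuming 4 KB per I/O operation.
SELECT
    8000   * 4 / 1024.0 AS min_mb_per_second,  -- 8,000 IOPS   * 4 KB ≈ 31.25 MB/s
    204800 * 4 / 1024.0 AS max_mb_per_second;  -- 204,800 IOPS * 4 KB = 800 MB/s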

Resources provided

I’ve deployed all the standard options and got the following results:

If you’re looking at these numbers, there are quite a few weird things happening.


SKU          Memory (GB)   CPU cores   Max workers   Schedulers   Scheduler total count
HS_Gen5_2    7             2           26            2            12
HS_Gen5_4    16            4           32            4            19
HS_Gen5_6    25            6           32            6            20
HS_Gen5_8    34            8           32            8            24
HS_Gen5_10   43            10          32            10           26
HS_Gen5_12   52            12          32            12           26
HS_Gen5_14   61            14          32            14           30
HS_Gen5_16   70            16          32            16           30
HS_Gen5_18   79            18          20            18           34
HS_Gen5_20   88            20          32            20           36
HS_Gen5_24   108           24          64            24           39
HS_Gen5_32   147           128         32            32           147
HS_Gen5_40   187           80          20            40           98
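
If you want to pull comparable numbers yourself, a query along these lines against sys.dm_os_sys_info will get you close. This is a sketch, not the exact script I used, and some of the columns in my table may come from a slightly different source:

-- Core, worker and scheduler counts as the engine reports them for the current SKU.
SELECT
    physical_memory_kb / 1024 / 1024 AS memory_gb,
    cpu_count,
    max_workers_count,
    scheduler_count,
    scheduler_total_count
FROM sys.dm_os_sys_info;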

Wait, didn’t you say there was an 80 core SKU as well? Yes, there is, but I wasn’t able to test it for budget reasons. When I get the chance, I’ll give it a go, but for now that data is missing.

Memory increases at quite a stable rate, which feels a bit odd. I’ve got a potential 100 TB of data with 7 GB of memory at the low end. Granted, 2 cores isn’t much either, but from the data volume perspective I would have expected more.
At 32 and 40 cores there are more cores available to the ‘instance’ than are being used. Hello resource governor. The number of max workers shows some weird behavior, but not as wild as I’ve seen in other tiers. The number of schedulers, which in my opinion determines the amount of work the database can do, grows in a much more balanced way and is in line with the SKU description. Every scheduler is ‘bound’ to a CPU core, so the number of schedulers should represent the number of cores. Yes, there are hidden schedulers, but those should be hidden from this view as well.
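
A quick sketch of how you can see the visible versus hidden schedulers for yourself; sys.dm_os_schedulers is available in Azure SQL DB:

-- Visible schedulers run user work and should match the number of cores of the SKU;
-- hidden schedulers are used for internal tasks.
SELECT
    status,                     -- e.g. VISIBLE ONLINE, HIDDEN ONLINE
    COUNT(*) AS scheduler_count
FROM sys.dm_os_schedulers
GROUP BY status;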

If you really are going to work with large amounts of data, you might want to consider the higher end SKU’s to make sure a substantial part of the data can fit in memory. If not, you might be working with TempDB more than you’d like. And if you’re scaling up or down, remember that the local caches get flushed and repopulating them will take some time.

Running queries

We’re not using a database to sit around idle; we need to insert, read, delete and update data (in whatever order you fancy). When you look at the code post, you’ll find the scripts I’ve used to do just that: insert a million rows, try and read a lot of stuff, delete 300,000 records and update 300,000. The reads, updates and deletes are randomized. Because of the large numbers, I think the results are usable and we can compare them between tiers and SKU’s.
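
To give an idea of the pattern, here is a minimal sketch with made-up table and column names. It is not the actual script from the code post, just the shape of the randomized delete and update:

-- Hypothetical example of a randomized delete: pick 300,000 random rows and remove them.
DELETE t
FROM dbo.TestTable AS t
WHERE t.Id IN (SELECT TOP (300000) Id FROM dbo.TestTable ORDER BY NEWID());

-- The randomized update follows the same idea, touching 300,000 random rows.
UPDATE t
SET    t.SomeColumn = t.SomeColumn + 1
FROM   dbo.TestTable AS t
WHERE  t.Id IN (SELECT TOP (300000) Id FROM dbo.TestTable ORDER BY NEWID());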

Insert 1 million records

I’ve inserted the records on all SKU’s within this tier. With 2 cores, this took a little over 4 minutes; with more cores, duration decreased to a little under two minutes with the 10 core SKU. After that, times went up again, with locking as the main culprit.

Wait types I’ve seen the most:

The wait that took the most time, at 28 percent, is the one for exclusive access to data pages. Second, and not really expected at 26 percent of the time, is the logging operation. As the log is a separate service, I expected it to be much lower. Shared access to pages takes 24 percent of the time, so waiting on access to pages takes roughly 50 percent of the total. Of course some time, 6 percent, is spent waiting on the extended events populating the Azure portal. Exclusive access to pages on disk is about 6 percent as well. Then we’re going into uncharted territory: the RBIO wait. I’m in the middle of a call with Microsoft to find out what it’s doing. I’ve seen these waits rack up significant time, combined with processes waiting on something. I’m not sure yet if there’s causality or not. When I find out, I’ll update this blog.
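
A query along these lines will give you a similar overview. It’s a sketch: sys.dm_db_wait_stats is the database-scoped wait stats DMV in Azure SQL DB, and the exclusion list here is just an example, not exhaustive:

-- Top waits by total wait time for this database, ignoring a few benign ones.
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    waiting_tasks_count,
    wait_time_ms * 100.0 / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_db_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION')
ORDER BY wait_time_ms DESC;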

Let’s see what happens on the disks latency-wise.

The read latency varied between 1 and 67 ms, averaging 10 ms. Write latency varied between 1 and 48 ms, averaging 5.4 ms. The high numbers are weird, because the local storage is SSD and should be fast. The other graphs will show similar lines, where the reads start high and plummet and the writes suddenly increase at the end. One conclusion can be that my test cases aren’t always the best fit for the database. Maybe these numbers are some kind of outlier where the deployment may have suffered from some unseen issue. On the whole, the averages are quite good at 5 to 10 ms latency.
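
One common way to derive this kind of latency figure is from the virtual file stats. A minimal sketch, not necessarily the exact calculation behind my graphs, and keep in mind that on Hyperscale the different cache layers are part of why these numbers are hard to interpret:

-- Average read and write latency per file, derived from cumulative stall times.
SELECT
    file_id,
    io_stall_read_ms  * 1.0 / NULLIF(num_of_reads, 0)  AS avg_read_latency_ms,
    io_stall_write_ms * 1.0 / NULLIF(num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL);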

Select data

After inserting data, it was time to select some. The script that I’ve used is intentionally horrible, to bring out the worst in the performance. The 8 core DB took 13 hours and 47 minutes to complete. The 40 core SKU finished in just over 57 minutes. Compared to the compute optimized tier from the last blog, this is a step in the wrong direction. Then again, this isn’t a compute optimized environment.

The wait types I’ve seen the most:

This time round, the biggest wait is on queries reaching the maximum time they can spend on the processor in one go (the 4 ms quantum). This was almost 39% of the time over all SKU’s. Next up is parallelism. These waits became more prevalent on the higher SKU’s with more cores. Note that you really need to keep your parallelism in check, or allow for these waits; combined they added up to 43 percent of the time. Getting exclusive access to data pages added eight percent of the time. The xe_live_target_tvf wait amounts to about 2 percent of the wait time, just like the pwait wait. And again the RBIO wait shows up. Yes, there were other waits, but at under one percent of the total each I’m discarding them for now.
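
If you want to keep parallelism in check at the database level, capping MAXDOP is the usual knob. A sketch of how that looks; 8 is just an example value, and whether capping is the right call depends entirely on your workload:

-- Example only: limit the degree of parallelism for this database to 8.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;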

Let’s take a look at the latency

The read latency was anywhere between 1 and 67 ms, averaging 10.6 ms. The write latency was between 1 and 40 ms, averaging 4 ms. As there’s not much writing to do, low write latencies are as expected. The reads are mostly low, with some exceptions. As they’re the same as with the insert, it’s quite possible these are related. Across the board, the figures aren’t all that bad.

Delete data

After selecting data I decided I’d had enough of the large amount and had to trim some of it off, so I deleted 300,000 records. This process is quick across the board, except on the 2 core machine, which needed a minute. The 4 core machine halved that time, and after that the duration hovered around the 14 second mark.

The wait types:

This was a bit surprising: the extended events populating the Azure portal lead the way with 21 percent of the time. Locking came in close with 20 percent and logging with 15 percent. Exclusive access to pages on disk followed with 12 percent, and again the mysterious RBIO wait. Then the shared access to pages came along with 8 percent, followed by our PVS friend at 5 percent. For the first time, I hit the resource governor wait that limited the queries.

All in all not too bad, I think. Keep in mind that the queries finished really fast, which makes getting reliable wait stats a bit harder.

The read latency varied between 1 and 67 ms, averaging 10.3 ms. The write latency varied between 2 and 35 ms, averaging 4.4 ms. As seen before, there are outliers at the edges, but the other values are quite stable. These are figures I’d expect from a locally attached SSD as well.

Update data

The last job I let rip on the database was an update task. I ran a script to update 300,000 records. The 2 core SKU took 57 seconds, the 40 core took 26. The behavior is much the same as with the deletes.

Let’s take a final look at the waits

Just two relevant ones. The one you’ll hit the most, with 76 percent of the time, is the page modification wait. The second one is related to availability groups, but at 12 percent it’s not all that big. Some time is spent on the extended events for the Azure portal, and again the RBIO wait pops up. Not much this time, about 2 percent of the time, but it keeps showing up.

Let’s see if the disk latency is as weird as it was with the deletes.

The latency is comparable with the delete graph. Reads between 1 and 67 ms, averaging 10.3 ms. Write latency varied between 1 and 33 ms, averaging 4.3 ms. And again at the 72 core a huge rise in latency.

Pricing


The pricing shows an even line over the different SKU’s. The breaks you’re seeing in the lines are caused by the uneven jumps in core count. In the end, you’re paying a price per core (112.45 in my case), so it’s quite predictable.
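
As a quick illustration of how predictable that is, here is the calculation for a few SKU’s. This is only a sketch using my example rate of 112.45 per core per month; your rate will differ by region and currency:

-- Estimated monthly cost per SKU at an assumed rate of 112.45 per core per month.
SELECT cores, cores * 112.45 AS estimated_monthly_cost
FROM (VALUES (2), (8), (40)) AS s(cores);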

More performance will cost more, and you’ll have to decide if this tier and its SKU’s are worth the cost. To keep this under control, I’d advise you to use the cost center and its alerts. Besides the actual costs, you can create alerts on predicted costs as well, to prevent surprises. We’ve all been there.

An extra warning as of October: as energy prices rise, so do the prices of Azure resources. I’ve seen price increases for various resources, including databases and VM’s, both in pay as you go and in reserved instances.

Finally

The Hyperscale tier is really suited for big data loads; we’re using it in one of our large data warehouse projects. Not because we’re passing the 40 TB mark, but because we needed to get rid of the log waits. To keep cost under control, the ETL process has scripts added to scale the database up and down when needed. This has the drawback of various caches getting emptied, but that’s less of an issue than having a large number of cores just idling and costing money. If you want to test this tier for yourself, make sure you choose a relevant load. With the benefit of hindsight, this one wasn’t, but for the purpose of comparison between the different tiers, I had to stick with what I got.
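
For completeness, scaling up or down from a script boils down to a single statement like the one below. It’s a sketch: ‘MyDwh’ and the target SKU are placeholders, and remember that the scale operation clears the local caches:

-- Example only: scale the database MyDwh to the 8 core Hyperscale SKU.
ALTER DATABASE [MyDwh] MODIFY (SERVICE_OBJECTIVE = 'HS_Gen5_8');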

I will hopefully come back to the RBIO wait and the 80 core monster.

Thanks for reading and next week, Business Critical.
