Not every SQL is created equal

Yes I know, clickbaity cheesy title. This blog is a bit like that, but maybe you’ll learn something as well.

Issue

In short, what I was facing today was a trial migration from System A to System B. Which means that data has to flow from one system to the other.
System A is a System that exports data in XML format. These XML’s are processed by Azure Data Factory into a data lake, transformed into parquet files and made available with a Synapse Analytics serverless SQL Pool.
System B is a System that’s more modern and uses Dataverse to ingest data. Moving the data from the datalake into System B is handled by ADF in a very simple pipeline. The source is a SQL Query with the correct transformations (datatype, columnnames and filters) and the target a regular Dataverse connector.

One of the transformations is to change one primary key column from integer to GUID. This is something you can do with some trickery you’ll see in the code. But what I found was that, even though the primary key is unique, the GUID’s weren’t. And then the fun starts digging into the why…

Reproduce the issue

To reproduce the issue, let’s first create a table and add some rows into it.

CREATE TABLE dbo.demonumbers (
	id int identity(1,1),
	numberint int,
	numbervarchar varchar(50)
)

INSERT INTO dbo.demonumbers
VALUES
(219638,'219638'),
(219637,'219637'),
(219635,'219635'),
(219634,'219634'),
(219633,'219633'),
(219632,'219632'),
(219631,'219631'),
(219630,'219630'),
(219629,'219629'),
(219628,'219628'),
(219627,'219627'),
(219626,'219626'),
(219625,'219625'),
(219624,'219624'),
(219623,'219623'),
(219622,'219622'),
(219620,'219620'),
(219619,'219619'),
(219618,'219618'),
(219615,'219615')

Nothing special here. Now, let’s transform the columns from integer to GUID.

SELECT numberint,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numberint as varchar(30))))) as uniqueidentifier) as varcharconversion,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numberint as int)))) as uniqueidentifier) as integerconversion
FROM dbo.demonumbers
ORDER BY cast(numberintas int) desc

This code doesn’t do much more than mess with the number (casting it as either an integer or a varchar) and outputting a GUID as a result.

Something isn’t right

Again, this code was run against Azure Synapse Serverless SQL. And if you read closely, the left column (converting an int to a varchar and then building the GUID) works as expected. The right column (working with the original integer) returns duplicates. Even though there aren’t any.

Reproducing on SQL Server 2022 Express

This just didn’t fit between my ears, so I tried to see what happens when I run the same code against SQL Server 2022 on my laptop. Express Edition and nothing fancy. Created the table and inserted the rows. And ran the following code.

SELECT numberint,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numberint as varchar(30))))) as uniqueidentifier) as varcharconversion,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numberint as int)))) as uniqueidentifier) as integerconversion
FROM dbo.demonumbers
ORDER BY cast(numberint as int) desc

SELECT numbervarchar,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as varchar(30))))) as uniqueidentifier) as varcharconversion,
        cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as int)))) as uniqueidentifier) as integerconversion
FROM dbo.demonumbers
ORDER BY cast(numbervarchar as int) desc

To eliminate the source data type, I’ll use both options. The results:

Integer conversion
Varchar conversion

As you can see, both integer and varchar conversions return the same result, but also, both outputs remain identical. To highlight the difference:

Synapse Serverless
SQL Server

Instead of five leading zero’s in Synapse, SQL Server only has three and the first part of the GUID had two letters added, making the row unique.

Why?

I tried to dig in but all the execution plan tells me is Compute Scalar. Without any hint of what’s going on behind the scenes. So let’s dig into the different steps to see where things run off the rails.

SELECT numberint,
	CONVERT(BINARY(16), CAST(numberint as int)) [Convert int to binary],
	REVERSE(CONVERT(BINARY(16), CAST(numberint as int))) [Reverse conversion],
	CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as int)))) [Convert Reversal],
    cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as int)))) as uniqueidentifier) as integerconversion
FROM dbo.demonumbers

On SQL Server, this happens:

On Synapse, this:

Now, we can see a difference. In the reverse part, one character is missing in Synapse. Maybe something with the collation?

Synapse:

SQL:

Collation shouldn’t play a part in it. Now, I can see why it happens; in the reverse when converting an integer. But I don’t understand why it happens. And, why it doesn’t happen with the varchar conversion.

SELECT numberint,
	CAST(numberint as varchar(30)) AS [int to varchar],
	CONVERT(BINARY(16), CAST(numberint as varchar(30))) AS [Convert varchar to binary],
	REVERSE(CONVERT(BINARY(16), CAST(numberint as varchar(30)))) AS [Reverse conversion],
	CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as varchar(30))))) AS [Convert Reversal],
	cast(CONVERT(BINARY(16), REVERSE(CONVERT(BINARY(16), CAST(numbervarchar as varchar(30))))) as uniqueidentifier) as varcharconversion
FROM dbo.demonumbers

On my trusted SQL Server:

Conversion column empty?

On Synapse Serverless:

Just a different number

It’s a bit sad that I can’t help you with the explanation why this happens. Other than that the reverse part of the query has something to do with it. One solution is to work with the varchar, another is to drop the reverse and second conversion altogether. Both result in unique values and solve the problem.


But the main point of this blog is to show you that code that might work perfectly on your SQL Server can have a different result in Synapse Serverless. Even though it feels like you’re still connecting to a SQL endpoint, it’s not equal. And that’s something you might need to be aware of.

Thanks for reading!

One thought on “Not every SQL is created equal

Leave a comment