Splitting strings

I got a question yesterday on how to split a string. Quite easy you might think, as SQL Server offers a number of ways to split strings. There are some differences between these ways.

Starting point

Suppose we’ve got the following string we need to split.

DECLARE @string VARCHAR(50) = 'string||you||want||splitted'

string_split function

Since SQL Server 2016 (130) you can use the string_split function. Let’s see how this one works, because it’s not really a function as you’d expect it to be.

SELECT STRING_SPLIT('||', @string)

This will give you an error:

Msg 195, Level 15, State 10, Line 5
‘string_split’ is not a recognized built-in function name.

But, in my database, SQL 2019 (150) it is a function. And that’s true, but you need to use the correct syntax:

SELECT value FROM STRING_SPLIT(@string, '||')

Cool, let’s run it! But wait, a new error?

Msg 214, Level 16, State 11, Line 7
Procedure expects parameter ‘separator’ of type ‘nchar(1)/nvarchar(1)’.

Ah, so when using string_split, you can’t use seperators that are anything else than 1 character long. So, to get my values, I have to replace my double pipe separator by a single one.

SELECT value FROM STRING_SPLIT(REPLACE(@string, '||', '|'), '|')

Now this code ran nicely and there’s a valid result.

Four rows

This might be the end of this blogpost, but there’s more. Because the requested result wasn’t to present four rows, the result needed consisted of four columns. Pivot I can hear you mumble. To get this to work you’ll need a few tricks. First one is to start with the string_split in a CTE (Common Table Expression). You have to this because if you want to pivot, you’ll have to add a column to pivot on. To do this, I’ve added a row number. But a row number requires you to order by something. It turns out you can order by SELECT NULL. This doesn’t alter the order of the output but adds the needed numbers to pivot. Finally, you have to pivot the values over the rownumbers to get the desired result.

WITH colresult as
(
	SELECT value FROM string_split(REPLACE(@string, '||', '|'), '|')
)

select *
from (
	SELECT value as val,
	ROW_NUMBER() over (order by (select null)) as rn
	FROM colresult
) v
PIVOT
(
	min(val) for rn in ([1],[2],[3],[4])
) piv

But this gives the desired result

got it

Substring and charindex

If you think this is weird code, you can also use some tricks with substring and charindex. Substring allows you to take a small part of a string based on positions. With the added help of charindex you can search for a character or a set of characters. Because charindex can contain multiple characters in the ‘search string’. But, one of the drawbacks is that charindex only gets the first or the last character in the string. If you have more than two, you have do some nasty tricks.

Let’s start with the string and get the first occurrence.

select CHARINDEX('||', @string)

This query results in:

7th in line

The word string has six characters, on position 7 starts my separator. Let’s add the substring function.

select SUBSTRING(@string, 0, charindex('||', @string)) as first

In this query, my substring function picks up the string, starts at character 0 en picks up characters until it reaches the position where the separator starts. The output:

first result

Now the challenge arises. Because there are more words in the string.

SELECT
substring(@string, 0, CHARINDEX('||',LTRIM(RTRIM(@string)))) [1]
,substring( @string, CHARINDEX('||', @string)+2,(CHARINDEX('||',LTRIM(RTRIM(@string)),(CHARINDEX('||', LTRIM(RTRIM(@string)) )+1))) - CHARINDEX('||', @string)-2) [2]
,substring(@string, (CHARINDEX('||',@string, (CHARINDEX('||',@string)+1))+1)+1, (CHARINDEX('||',@string,CHARINDEX('||',@string, (CHARINDEX('||',@string)+1))+1)-(CHARINDEX('||',@string, (CHARINDEX('||',@string)+1))+1)-1)) [3]
,reverse(substring(reverse(@string),0, CHARINDEX('||',REVERSE(LTRIM(RTRIM(@string))))))[4];
GO

If this makes your eyes water, I can only apologize. Yes it works but it’s very far from pretty. But if you are stuck with a multiple character separator; this can work.

done

Performance

Now, the last part, what about performance differences between these two options. Let’s start with the execution plans.

Pivot

The second one doesn’t return an execution plan. Let’s check out the CPU and IO loads

Identical results

Now this is something unexpected. Both query’s claim no CPU and no IO. Because the query’s are running from a local variable and nothing gets to disk, I can see the IO part. But no CPU? I’d expect some CPU usage because there is some work to be done. Then again, this query is just a demo to show you two ways to accomplish this challenge. In real life, many more factors come into play. One of things to look out for is readability and your ability to share this code with others. Even though pivot is a bit weird, it’s easier to explain than all the nested charindex stuff.

Thanks for reading!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s