My first ADF pipeline!

For those who are less young like me, this header will ring a distant bell. Anyway, I had a good day yesterday when i created my first functional and working ADF pipeline.

Here’s the situation. We’re working on a project where the database is being developed and there’s a lot of schema drift. On this same project other people are working on the reporting side of things. These two jobs don’t match because a change in table names destroyes the reports. And, when the tables are changed, locks are applied, time-outs happen and frustration arises.

So, we’ve created a secondary database with just the reporting tables. We’ll update those tables when requested and update the data after that.

At first I thought I could manage this with the data import and export wizard. That’s true but when you need to keep refreshing the data, it gets tedious and frustrating because you can’t save the package this wizard creates in an Azure environment. Because SSIS doesn’t exist there.

So, I moved to ADF. First try: let’s see if there’s an object to select everything from a schema and move it over. Well, no. Ok, hello google. And it turned out Microsoft had a tutorial just for this workflow!

I built this one, tested it and got a load of errors. Errors I hadn’t seen with the wizard.

It turns out that, even when you turn these columns are computed columns and don’t like it when the get data from an insert statement. So I needed to exclude these columns. But how. Well, the tutorial supplies you with a select * statement from the source table. But, you can modify that one with dynamic SQL.

declare @columnlist varchar(4000),    
         @TName varchar(200),
         @query nvarchar(4000)
 set @TName = 'table_name'
 set @columnlist = STUFF((SELECT
                 ',' + '[' + C.name + ']'
               FROM 
                 sys.columns C
               WHERE 
                 OBJECT_NAME(C.object_id) = @TName
                 AND C.is_identity = 0
                 AND C.is_computed = 0
                 AND C.system_type_id <> 189
               FOR XML PATH('')),1,1,'')
 set @query = 'select ' + @columnlist + ' from [schema].[table]'
 print @query
 --exec sp_executesql @query

Now remember that this script is used for testing purposes, you’ll need to change it slightly to be able to use it in ADF. I’m using a huge datatype because the fact tables contain a lot of columns and my first run with varchar(2000) failed because it just wasn’t long enough. Think about the identity column. Do you need it or not. Because if your identity column in the dimension tables is used in your fact tables, you want the value.

Does it perform? Well… I’m still working on the settings of the data factory. The source is a hyperscale 12 core, the target a hyperscale 4 core. Transferring around 40 GB of data took about an hour. All the indexes were active so it might be ok, not sure about that yet.

But, my first attempt was a technical succes. And that’s something I am happy with.

If you want to learn more, you can find a lot of documentation on the Microsoft site, but don’t forget the huge knowledge source of Cathrine Wilhelmsen and the site of Andy Leonard. There are off course a lot more people who share knowledge so mention your favorite site in the comments!

Thanks for reading!

My first ADF pipeline!

Published by reitsees

Leave a comment Cancel reply

Share this:

Related

Published by reitsees

Leave a comment Cancel reply