Last week the big announcement came at Microsoft Ignite: Fabric is GA.
Very cool, and a lot of noise again around this shiny toolbox. But do we need to abandon everything and focus solely on the new toys?
Before I answer that question, let’s look at a few moving parts of Fabric.
Integration
The most important part of Fabric is that all the tools are integrated. A few weeks ago I was doing a one-day proof of concept and had to fight the connections between Synapse, Data Factory and storage accounts. It took way too much time to get these resources to play nice with each other. Fabric solves this problem by default, and during that PoC day I more than once muttered about Fabric being easier.
The OneLake concept (just one per tenant) makes short work of all the different storage accounts for different roles or even layers in the data warehousing environment. A few years ago I attended a session in Gothenburg on data lake storage, and one of the takeaways was that an Azure storage account has an upper IO limit (think of it as a top speed for reading/writing data). One tip was to create multiple storage accounts to maximise IO.
OneLake is explained as an abstraction over multiple data lake storage accounts, which should address this limitation.
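The multi-account tip can be sketched as a simple routing function. Everything below (the account names, the hashing scheme) is made up for illustration; the point is that spreading files over several accounts spreads the IO load, which is roughly the kind of thing an abstraction like OneLake would handle for you behind the scenes:

```python
import hashlib

# Hypothetical pre-OneLake setup: several storage accounts, each with its
# own IO ceiling. Names are invented for this sketch.
ACCOUNTS = ["dlsraw01", "dlsraw02", "dlsraw03"]

def pick_account(file_path: str) -> str:
    """Route a file to one of several storage accounts by hashing its path,
    so read/write load spreads roughly evenly and no single account
    hits its throughput limit."""
    digest = hashlib.sha256(file_path.encode("utf-8")).hexdigest()
    return ACCOUNTS[int(digest, 16) % len(ACCOUNTS)]

# The same path always maps to the same account, so reads find the file back.
assert pick_account("sales/2023/orders.parquet") == pick_account("sales/2023/orders.parquet")
```

Deterministic hashing matters here: a random choice would spread writes too, but you would never find your file again without a lookup table.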
Data warehousing
When I think about data warehousing clients, we can use either Fabric Warehouse or Fabric Lakehouse; in my opinion the difference is in which language you’d like to use to build your warehouse. The net result is a set of tables that are, in the end, files in OneLake. Both have a SQL endpoint to query the data; the main difference is that in the Lakehouse you can’t change the data through SQL, whereas in the Warehouse you can. I’m still working on a blog post comparing the two in terms of performance and capacity usage.
I am aware of the simplifications in the above comparison; this blog isn’t intended to compare the Lakehouse and the Warehouse in depth.
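The read-only versus read-write distinction can be summarised in a tiny capability table. This is a simplified sketch of the situation as I understand it at GA (check the current documentation before relying on it):

```python
# Simplified capability sketch: the Lakehouse SQL endpoint is read-only,
# the Warehouse SQL endpoint supports full DML. Not an exhaustive list.
ENDPOINT_CAPABILITIES = {
    "lakehouse": {"SELECT"},
    "warehouse": {"SELECT", "INSERT", "UPDATE", "DELETE"},
}

def allowed(experience: str, statement: str) -> bool:
    """Return True if the leading SQL verb of `statement` is supported
    on the SQL endpoint of the given Fabric experience."""
    verb = statement.strip().split()[0].upper()
    return verb in ENDPOINT_CAPABILITIES[experience]

assert allowed("warehouse", "UPDATE dim_customer SET name = 'x'")
assert not allowed("lakehouse", "DELETE FROM fact_sales")
```

In the Lakehouse you would make those changes through Spark instead of through the SQL endpoint; that is exactly the "which language do you want to build in" trade-off mentioned above.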
Security
When working with data, my main objective is to keep it all as secure as possible. My coworkers are getting really fed up with me preaching Zero Trust, but in my opinion it’s an essential part of our job as data engineers. I don’t really care about the shape or form of the data (although XML…), but it should only be available to those with the correct privileges, and it should only flow through encrypted connections or private networks.
When you take a look at the roadmap (and please do, it will help you decide when to jump in and what to expect), crucial elements of OneSecurity are expected in Q2 2024. When you try things out in Fabric, you’ll notice that some traffic can’t use private endpoints and will probably use the public internet to connect.
Is this bad? I’m not sure, because I have no insight into the technical backend. But whenever I’m not sure, I tend to be careful and stay on the safe side. Anonymised, aggregated data is fine, but fine-grained PII data? Not yet.
We’re working with clients who have regulations stating that data may not leave the country’s borders. In other words, LRS and ZRS are fine, GRS is off limits. Even though it’s the same geopolitical region, the paired data centre is outside our country and therefore not to be used. When you look at the documentation of OneLake, at the time of writing(!) there’s no clear wording on the OneLake location.
> All data in OneLake is accessed through data items. These data items can reside in different regions depending on their workspace, as a workspace is created under a capacity tied to a specific region.
>
> OneLake utilizes zone-redundant storage (ZRS) where available (see Azure regions with availability zones) and locally redundant storage (LRS) elsewhere.

— Microsoft Learn
One interpretation is that OneLake is built on a number of storage accounts, each connected to a workspace that has a region. But can you choose a region for a workspace? No, you cannot. You can, however, choose a region for your Fabric capacity when deploying it from the Azure portal. The effect may be the same, but the wording is different.
Is Fabric not secure? I’m not saying that at all. In my opinion, the security features are somewhat rudimentary and leave room for improvement. When working with clients, I always work towards zero trust, least privilege and encryption. Fabric is getting there, but some features are still lacking before I feel comfortable loading sensitive data into OneLake. Maybe I’m a bit paranoid or rigid when it comes to zero trust. On the other hand, I try to avoid ending up on the front page of the newspapers ;).
Code enhancements
I wasn’t sure what to call this next part, but it’s quite clear that most code improvements to Synapse are being made in Fabric. There are some new commands that only work (or have any effect) in the Fabric Synapse environment. I haven’t seen major changes in Azure Data Factory, other than the removal of mapping data flows and not yet being able to edit the underlying JSON code from the Fabric portal. An option like Data Activator won’t make its way to Synapse or Data Factory, but it can play a major role in the Fabric offering as it has a lot of potential.
Current code investments
One question I’ve heard a few times is: ‘Am I throwing away money if I develop heavily in Azure Data Factory or Synapse now?’ That’s a very good question, and though I’m not willing to bet a lot of money on it, I can’t help but think Microsoft will release some sort of migration tool to get your current code to Fabric. Either that, or they will hopefully create some seamless Git integration to achieve the same result. There will always be edge cases where migrations fail, but those of you who have been working with SQL Server, haven’t you experienced the same when upgrading to newer versions?
Potential
Yes, I’m convinced Fabric has enormous potential. This is an amazing start, but there’s so much to add. Think about the possibilities of Copilot, for instance. I can’t wait for Fabric to assist me in creating smart connections to APIs, or for my smart colleagues to do amazing stuff in the ML part of Fabric. And there is so much more: when the Git integration is fully active and the deployment pipelines inside Fabric can work with all the objects, it will be CI/CD heaven. Until I break stuff. In production. I could go on, but just take a good look at all the options that are included in the product.
Pricing and licensing
Alright, let’s address the elephant in the room. What will this cost? Well, you pay for your compute: the Fabric capacity units (CUs). Besides that, you pay for OneLake storage usage, which should be priced the same as a regular storage account. And you have to get your Power BI licences. Once you get to F64, things get complicated with P1 Power BI licensing, and that’s where I get lost.
I’d LOVE for Microsoft to make this easier: disconnect the Power BI and Fabric CUs. You get your CUs at whatever level you need. You get your Power BI licences, Pro or Premium, in the quantity you need. And keep it like that. This would make the calculations a bit easier and more transparent.
Of course, when you add other resources like Azure SQL databases, they add to your bill.
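A back-of-the-envelope estimate of the compute-plus-storage part looks something like this. The rates below are placeholders, not real prices; always check the Azure pricing page for your region, and note that the complicated Power BI licensing part is deliberately left out:

```python
# PLACEHOLDER rates for illustration only -- not actual Azure prices.
CU_RATE_PER_HOUR = 0.20            # hypothetical price per capacity unit per hour
STORAGE_RATE_PER_GB_MONTH = 0.023  # hypothetical OneLake storage price per GB/month

def estimate_monthly_cost(capacity_units: int, hours: float, storage_gb: float) -> float:
    """Compute = CUs x hours x hourly rate; storage billed per GB per month.
    Power BI licensing is excluded, since that is exactly the part
    that gets complicated."""
    compute = capacity_units * hours * CU_RATE_PER_HOUR
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH
    return round(compute + storage, 2)

# A hypothetical F2 running around the clock for a 30-day month, plus 100 GB:
print(estimate_monthly_cost(2, 30 * 24, 100))  # → 290.3 (at the made-up rates above)
```

Even a toy calculator like this makes the author’s point: the CU and storage parts are easy to reason about on their own; it’s the coupling with Power BI licensing that muddies the estimate.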
My advice
If you’ve read this far, you’re probably wondering what my advice is right now. For me, it boils down to the following:
- Get started with your proofs of concept now, but use aggregated and anonymised data only.
- Keep watching for migration tools and test them thoroughly.
- Do not move any production load, sensitive data or any other data you really care about to Fabric until OneSecurity is released, tried and tested.
- Start having sessions with coworkers, engineers and other people in your organisation and brainstorm on what Fabric can help you with.
- Don’t be afraid to say no. Fabric might not be for you right now, but make sure you keep up to date with the new stuff; maybe that no will change into a yes.
- If you’re working with a consulting company, ask them for their opinion, and maybe let them build a very small proof of concept resembling your data.
In the end, all advice is just opinion. I can only hope my peers in the data community are formulating their opinions and advice too, so you can gather them all and draw your own conclusions. A good place to start is the content hub by Erwin de Kreuk.
Thanks for reading, happy fabricating and I’d love to read your opinions!