This month, Ben Weissman invites us to write about our experiences in the hybrid data world and to tell where we are in our journey.
To start with the latter: I’m only just beginning the journey. Most of our clients are running on-premises or are thinking about moving to the cloud. But the first ones are making the move (either by themselves or assisted by me and my co-workers). We think we know what we are doing, but just as we’re getting to grips with everything Azure, things change. Sometimes it’s just buttons, sometimes new functionality arises. In any case, we regularly need to adjust our plans.
In some ways this feels like a setback, but it’s just the steep development curve Azure is on at the moment. I do wonder sometimes if that’s a good thing or a sign of a product that’s not quite finished yet. On the other hand, when is a product really finished?
Before I digress too much, on to the experiences. Working in a hybrid world means connecting cloud to on-premises (in our cases), and there are a number of things to take into account.
- Security
- Data movement
- Ease of access
When you move data to the cloud, your first assumption has to be that everyone has access. Even if they don’t, assume they do. Because you’re up in the cloud, assume you’re dealing with a 360-degree attack surface. So you have to secure your data and data environment. For instance, when you allow Azure services to connect to your database, it’s not just the services within your resource group, subscription or tenant. No, it’s all the Azure services. My Azure service can reach your database. No wonder that somewhere in the last months, the default for Azure services has changed from yes (they can connect) to no.
But check your users as well, along with RBAC assignments and managed identities. If you’re using the key vault, what are the security settings? Within your database, the same question arises: who can access what? Should you use Always Encrypted? Transparent Data Encryption is enabled by default, but this “only” protects your data files on disk, not the data itself once someone can query it. As an Azure DBA, I think you need to be a security admin as well, or have someone very close who takes care of that.
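As a rough illustration of the firewall point above, here is a minimal Python sketch that flags risky rules in a list of firewall entries. It assumes the rules have been exported by hand into plain dictionaries with `name`, `start_ip` and `end_ip` keys (a made-up shape for this example, not an Azure SDK structure), and the `max_span` threshold is an arbitrary choice. On Azure SQL Database, the "allow Azure services" setting shows up as a rule whose start and end address are both 0.0.0.0, which is what the first check looks for.

```python
from ipaddress import ip_address


def risky_rules(rules, max_span=256):
    """Return (rule name, reason) pairs for rules worth a second look.

    Assumption: each rule is a dict with "name", "start_ip" and "end_ip".
    """
    findings = []
    for rule in rules:
        start = int(ip_address(rule["start_ip"]))
        end = int(ip_address(rule["end_ip"]))
        if start == 0 and end == 0:
            # 0.0.0.0 - 0.0.0.0 is how "allow Azure services" is stored.
            findings.append((rule["name"], "allows all Azure services"))
        elif end - start + 1 > max_span:
            # More addresses than max_span: probably broader than intended.
            findings.append((rule["name"], "wide IP range"))
    return findings


if __name__ == "__main__":
    example = [
        {"name": "AllowAllWindowsAzureIps", "start_ip": "0.0.0.0", "end_ip": "0.0.0.0"},
        {"name": "office", "start_ip": "203.0.113.10", "end_ip": "203.0.113.10"},
    ]
    for name, reason in risky_rules(example):
        print(f"{name}: {reason}")
```

It’s no replacement for a proper security review, but a check like this is easy to run regularly against whatever rule list you can export.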
Especially in a hybrid environment, data will live on-premises and in the cloud. And at times, data has to move. My experience is that this can be a major pain, because not only do you need to set up some sort of VPN or a (very expensive) ExpressRoute, but there’s that concept of bandwidth. Before you know it, your data comes trickling in at a rate of 100 kb/s. Good luck finishing your ETL in x hours. On-premises you’re used to a 1 or 10 Gbit backbone helping you out when you’re taking in all the data. In the cloud, you might get that backbone when you pay enough money. If you can’t, think about what you really need. Maybe daily delta loads and weekly or even monthly full loads for the special reports.
Remember that the location of your database in the cloud has to be as close to your on-premises environment as possible to reduce latency in data transfer.
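To put some numbers on the bandwidth point above, here is a small back-of-the-envelope sketch in Python. The function names, the 50 GB load size and the 6-hour window are made up for illustration, and the rate is read as kilobytes per second in decimal units, which is an assumption on top of the "100 kb/s" figure.

```python
def transfer_hours(size_gb, rate_kb_per_s):
    """Hours needed to move size_gb gigabytes at rate_kb_per_s kilobytes/s.

    Assumption: decimal units, so 1 GB = 1,000,000 kB.
    """
    size_kb = size_gb * 1_000_000
    return size_kb / rate_kb_per_s / 3600


def fits_window(size_gb, rate_kb_per_s, window_hours):
    """True if the load finishes within the available ETL window."""
    return transfer_hours(size_gb, rate_kb_per_s) <= window_hours


if __name__ == "__main__":
    full_load_gb = 50      # hypothetical full load
    trickle = 100          # kB/s, the slow link
    backbone = 1_250_000   # kB/s, roughly a 10 Gbit/s backbone
    print(f"slow link: {transfer_hours(full_load_gb, trickle):.1f} hours")
    print(f"backbone:  {transfer_hours(full_load_gb, backbone) * 3600:.0f} seconds")
    print("fits a 6 hour window:", fits_window(full_load_gb, trickle, 6))
```

Under these assumptions the 50 GB load needs roughly 139 hours over the slow link against about 40 seconds on the backbone, which is why delta loads become attractive very quickly.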
Ease of access
“My data is in the cloud, so I can easily access it from anywhere.” (anonymous)
Well, no. Security makes sure you’re not getting access all that easily. You’ll run into 2FA/MFA requirements, VPNs you need to set up or other security measures. You can set up your environment so that only specific IP addresses can connect to your cloud environment. If you want to use the remote desktop protocol over a public IP address, don’t. Seriously, don’t. We tried it once on an isolated test machine and within 30 minutes the machine was hacked into. Azure offers the Bastion service that takes care of RDP in a more secure way; AWS will have something similar.
If you’re in a hybrid environment where on-premises meets cloud, make sure you manage the expectations of those consuming the data. As a DBA, or data professional, make sure people understand the security, the possible issues and the way they have to access the data before you unleash all the cool marketing stuff that shows what the cloud can do.
The cloud can do a lot. Very cool stuff, very quickly. Faster than you could deploy hardware on-premises. That’s the really nice part. Updates are installed without you noticing and new features keep getting deployed. It’s a never-ending stream of updates. On someone else’s computer. Keep that in mind, and let all the good stuff roam free to create your ideal data estate.
Thanks for reading!
One thought on “T-SQL Tuesday #139: The data world is hybrid”
I agree that having limited bandwidth between your cloud resources and your on-premises resources can be pretty painful.
Thanks for the thoughtful post contributing to this month’s #tsql2sday!