We continue on our path to make a web application using C# and .NET. After choosing the front-end technology in part 1, picking the backend server tech in part 2, and deploying to Azure in part 3, we’re going to concentrate on choosing the database. With emphasis on .NET and Azure technologies of course. A database, or rather databases, is one of the most important aspects of any big application. The database choice can either cause or prevent latency bottlenecks, data consistency issues, availability problems, and development complexity. Choose wisely and you’ll get a lot of leeway to screw up in other areas.
We’re constrained to Azure and Microsoft tech in this article series, but Azure has database services for pretty much all popular solutions, so it’s not really a limitation. You can choose from relational DBs like SQL Server or Postgres, NoSQL databases like Cosmos DB and Table Storage, or in-memory cache instances of Redis. Each of those solutions has a matching managed Azure service that can auto-scale, auto-update, geo-replicate, do point-in-time-backup, and even make coffee. As long as you can afford to pay for it, that is. We’ll take a look at the most popular database offerings in Azure and go over the considerations in choosing the right one. Here is our contender lineup.
The Database Lineup
|Azure SQL Database||Relational||The flagship SQL database offering on Azure. A fully managed relational DB with all the cloud benefits like auto-scale, auto-update, rich security features, etc. Mostly compatible with the classic SQL Server engine, though some features are still missing as far as I know. In most cases, you’ll be able to migrate from SQL Server.|
|Azure SQL Managed Instance||Relational||This offering is halfway between a fully managed Azure SQL database and a classic SQL Server. It’s compatible with SQL Server engine 2008+ and you get more fine-grained control than with Azure SQL Database. Great for migrating an existing SQL Server database that you don’t want to or are unable to migrate to Azure SQL.|
|SQL Server in a VM||Relational||The classic SQL Server in an Azure virtual machine. As with any VM offering, Azure takes care of connectivity, but you have to maintain and upgrade the OS and the SQL Server database. Great as a first step for existing on-premise apps that you want to migrate to the cloud.|
|Azure Database for PostgreSQL||Relational||A managed solution in Azure for PostgreSQL. A very popular database that’s relational but also supports JSON columns that you can query on. Took the first place as the most popular database among professional developers in Stack Overflow Survey 2022.|
|Azure Database for MySQL and MariaDB||Relational||Managed solutions on Azure for MySQL and MariaDB respectively.|
|Azure Cosmos DB||NoSQL||The flagship NoSQL offering by Microsoft. A fully managed schema-less database that offers auto-scale, geo-replications, and different APIs (MongoDB, Gremlin, Cassandra, Azure Table Storage). Most suited as a geographically distributed highly available service with great performance. Not cheap.|
|Table Storage||NoSQL||A simpler and cheaper offering than Cosmos for NoSQL key-value tables. Cosmos DB has all the capabilities of Table Storage (they share an API) with additional capabilities.|
|Azure Cache for Redis||In-memory key-value||A managed instance of Redis on Azure.|
|Elasticsearch on Azure||Text search engine||A database to store, index, and search big amounts of text. You can run Elasticsearch on Azure as a self-managed service or as a managed service using Elastic Cloud.|
|Azure Data Explorer (aka Kusto)||Time series*||This service is defined as a data analytics service for a high throughput of data streaming. I found it to be the perfect database for application logs, IoT events, and telemetry.|
You can work with all of these databases from C# code with regular .NET objects. There are C# clients and drivers for all of them, though in my experience there’s somewhat better support for first-party databases like SQL Server and Azure SQL (with Entity Framework) than for third-party databases like Mongo DB (or Cosmos DB with Mongo API).
Considerations when choosing a database
There are a lot of things to consider when choosing your database. Choose wisely because you’ll be married to that database for a long time. One case study is Amazon Marketplace which started with Oracle and tried to migrate after developing their own database solutions. The migration took somewhere between 5 to 10 years, and they seem to be still partially using Oracle. Though if your company reaches Amazon scale, then you’re probably doing very well.
1. Choose a database you already know
The most important consideration in a new project is whether you have the option to work with a database technology you already know. Or one that someone in your team knows. Databases have their pros and cons, but the popular solutions are mostly all good. If you need a relational database, both Azure SQL and MySQL will be fine. On the other hand, working with a tool that you already know is a huge advantage.
2. Multiple Databases
You can have multiple databases. Each database is good for different purposes and there’s no reason to use the same database instance or database technology for everything. You might have an SQL database for your user information, a time-series database for logs, and a document database for invoices. It’s common in microservice architecture for each service to own its data, so you can have different database instances and technologies for your different services.
3. Do you need ACID transactions?
If you need ACID transactions, your choice of databases is limited. This is good because fewer choices mean an easier decision. Usually, databases that support ACID will be relational SQL databases. Even though most modern document-based databases also support ACID transactions, they have constraints, usually compromising performance.
4. Will you be doing a lot of Joins?
NoSQL databases are notoriously bad at joining. Though a lot of them do allow it, it comes with functional and performance limitations. It’s even common to create data duplication to avoid joins in document databases. If your main use case is join-heavy, a relational database is the way to go.
5. Got a schema planned?
A relational database requires planning ahead. You’ll have to define the schema for your tables and stick to it for all table items. You can change the schema, but each change adds complications like merge conflicts between multiple changes, services that need to work with different schema versions, possible database downtime during the migration, etc. Not that NoSQL doesn’t have such problems, it does, they are just different.
The approach of NoSQL databases is schemaless. You don’t have to plan anything. You can store any JSON document you’d like, without a rigid structure. This comes with a price, of course, because querying over documents that have a different structure is more difficult. You might need different implementations for different schema versions. As always, there is no free lunch in software development.
Consistency in a database means that any read request gets the latest record value. If, for example, a client changed their name in China, and a different client requested that name after a second in Europe, a consistent database should return the modified name.
Consistency has a cost, usually in performance, that you’ll have to pay when your database is distributed on multiple nodes (machines in different regions usually). You might be willing to compromise on consistency for eventual consistency (common in NoSQL databases) where your data will be temporarily inconsistent to achieve better availability. For example, if you replicate once an hour and you don’t mind your clients getting hour-old information, then you can query the closest database nodes and get better performance.
7. Sharding and replication
Sharding is the process of splitting your database’s data across multiple nodes, whereas replication is duplicating the data across nodes for latency and backup. Strategies for both sharding and replication can be delayed to a later stage in a new project, but it wouldn’t hurt to give it some thought in advance. Some databases, like key-value stores, are better for sharding, whereas other databases, like the traditional relational DBs, are worse at it.
Try choosing the sharding strategy in a way that will have queries go to as few nodes as possible, and for those nodes to be as geographically close as possible to the client.
If your application serves businesses and not consumers (a B2B app), you probably need to have multi-tenancy isolation. It’s useful both to validate correctness and for security purposes. If, for example, you develop a SaaS app for health companies, you might have to query to get all doctors. In most cases, the response should return only the doctors relevant to the specific health company, not to all doctors in the database.
There are different patterns for multi-tenancy. Here are a few examples:
- An entirely separate instance of the app per each tenant
- A separate database instance in the same app per tenant
- A single database for all tenants with separate tables per tenant
- A single database for all tenants with shared tables
The 1st solution will usually be the most expensive and most secure, while the 4th is going to be the least expensive and easiest to implement, but with the least isolation and security.
Choosing a database is a huge topic that can fill many books. I tried to give you a bird’s eye view of popular databases in Azure just to get you started in the right direction. If you need to start somewhere, any of the databases mentioned are mature enough and feature-rich enough to provide a decent solution to most projects, You probably won’t make a terrible mistake going with one of them. In any case, if you got some sage advice on working with databases in .NET, leave a comment below. Cheers.