Multi-Tenant: Database Per Tenant or Shared?

Поделиться
HTML-код
  • Опубликовано: 15 янв 2025

Комментарии • 114

  • @logantcooper6
    @logantcooper6 4 месяца назад +19

    We have a shared database for all tenants and use query filters. Works well enough. We plan to squeeze as much life out of this setup as we grow before we decide to take on any more complexity.

    • @alex_chugaev
      @alex_chugaev 4 месяца назад

      That’s agile way, I like it 👍🏻

  • @steve-wright-uk
    @steve-wright-uk 4 месяца назад +57

    Our software supports database per tenant, but in production we've yet to use that feature. Instead of adding filter by tenant to all the queries, we created row level security on a MS-SQL database. Each Tenant has its own SQL USER. Each database tabale uses security policy to apply a predicate at the database level. That predicate will check the SQL USER running the query and apply a predicate to filter out any rows that are not for that tenant. This has the advantages that the developers are no longer responsible for applying the tenant filter in the code, and that any SQL injection attacks will automatically be restricted to just that tenant. Obviously this is only a practical solution when you have a low number of tenants.

    • @awright18
      @awright18 4 месяца назад

      I just came here to say that exact thing :)

    •  4 месяца назад

      Wow, I had no idea that you could do that! Interesting!

    • @lukassinkus6162
      @lukassinkus6162 4 месяца назад +4

      didn't know this existed, very interesting approach.
      what's your flow for onboarding new customers?
      do you have automated scripts that create these users with these special permissions?
      how do you store these user credentials?

    • @steve-wright-uk
      @steve-wright-uk 4 месяца назад

      @@lukassinkus6162 We have a SQL USER per tenant, not per user. Given that we've got a few customers with a large user base, creating the SQL USER is a trivial manual task. In our SAAS system all users authenicate via their tenant's AzureAD system and we then receive a claim principle with their details. Those include a tenant id. We use that to lookup the SQL USER which is then stored in a server side session for that user. All subsequent database queries use this SQL USER in the connection string.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      Thanks for the comment! Yes, DB user with db level filtering is a great option if you can do it, such with SQL Server.

  • @maple.everything
    @maple.everything 4 месяца назад +11

    For our SaaS product, we're using a hybrid model. Consumers are, writ-large, on one of several regionally-pooled APIs & DBs. For our important enterprise customers, we have dedicated environments for each. This gives them complete isolation and better performance guarantees. We can also move customers from one pool or environment to another in cases where a shared customer is promoted to an enterprise customer - this is where globally unique ids can save you. The enterprise fees are commensurate with the development of both the hybrid model and the resources they're consuming in their environment.
    I can tell you one thing for certain: Infrastructure as Code helps a ton if you're doing either pooled DB & API, DB & API per tenant, or both simultaneously. It makes spinning up those extra environments and deploying changes much, much faster and more reliable.

  • @steve-wright-uk
    @steve-wright-uk 4 месяца назад +14

    One of the advantages for database per tenant is if you need to do a database restore. This allows a single tenant to be affected.

    • @JacobGlanz1234
      @JacobGlanz1234 4 месяца назад

      I would argue. It’s much easier for me to maintain a replica than a DB per tenant.
      All the cloud managed DBs also offer managed replicas

    • @steve-wright-uk
      @steve-wright-uk 4 месяца назад +2

      @@JacobGlanz1234 The problem is that if we ever manage to corrupt a tenant's data and need to roll back then that database restore would affect all tenants. Separate DB per tenant also helps if we have off board a customer and have to delete their data due to GDPR requirements.
      As stated on a previous comment, our software supports DB per tenant, but we're not using that feature yet with any of our tenants.

  • @malehernandez1975
    @malehernandez1975 4 месяца назад +6

    It is worth noting that a multi-tenant system can, more or less cheaply, support a single tenant on a new instance, e.g. if you have some tenants with gigantic amounts of data, or expect some with legal restrictions needing an on premise instance. The reverse path can be more complicated, if you finally realize that you can't maintain a lot of instances.

    • @qj0n
      @qj0n 4 месяца назад +2

      Yes! That's why i always suggest starting with single DB. It's much easier to both switch to db per tenant or to hybrid approach. This is the most flexible way

  • @JoeEnos
    @JoeEnos 4 месяца назад +22

    If you're going with database-per-tenant, I would just make sure you seriously take the time upfront to build (or buy) proper sync and validation tools between each database. Easy to skip if you're just starting with a handful of databases, but that number creeps up, and executing a schema change in a thousand databases, or running 999 schema comparisons manually isn't a lot of fun.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +2

      Absolutely. The amount of operational overhead can be a nightmare.

    • @ankeshkapil3129
      @ankeshkapil3129 4 месяца назад +7

      but maintaining so much data in a single table is also a nightmare

    • @elpe21
      @elpe21 4 месяца назад

      We've got over 300 databases on-premises. Migrations are part of the system , dbs are not in sync ( don't have to be ). We've also have postgres versions ranging from 9.2 to 16 ;)

    • @JoeEnos
      @JoeEnos 4 месяца назад

      @@elpe21 Ouch, that's just absurd.

    • @elpe21
      @elpe21 4 месяца назад +1

      @@JoeEnos Well, we update customers one by one / when they ask for it. If you think 'MS Windows', they also have multiple version running on computers , also people update the system when they want ;) Often our customers do not have good connection so we can't move them to cloud ( and our system is not cloud-friendly yet )

  • @ulrichborchers5632
    @ulrichborchers5632 4 месяца назад +3

    Very nice topic. Great overview of important considerations and options.
    There are so many variations. Database versioning. API versioning. Backwards compatibility. Containerized versus shared resources. Single instances or Cluster. Split per schema on the same host versus different tables in the same schema versus one distict daemon or machine per tenant ... production, staging and testing environments ... split by tenant in the user-facing data center but mirror into a shared data structure ...

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +2

      Thanks for all the video ideas! 😀

  • @ranulpallemulle5447
    @ranulpallemulle5447 4 месяца назад +3

    At my previous job, we tried to do database per tenant for a SaaS product and one issue we ran into was the actual provisioning of database. We tried to create the database “on-demand” - i.e when a user signed up as a new organization. The problem was we did this on azure with mssql databases so naturally it was slow and added complexity to the sign up process (showing the user a “please wait” screen, azure API requests with preconfigured ARM templates, retries on db creation failure, attempting to use a pool of databases and expanding the pool when needed, managing elastic pools, etc). Not to mention the cost aspect and migration rollouts. Should have gone with something like RLS or query filter

  • @ITManPH
    @ITManPH Месяц назад

    Thanks Man! really need this video for a project.

  • @GigaFro
    @GigaFro 12 дней назад

    Great intro! Super clear. Thank you!

  • @jackwright517
    @jackwright517 4 месяца назад

    Thank you so much this is ultra informative! Im working on a multi tenanted system right now, and the timing is impeccable!

  • @shawn_bullock
    @shawn_bullock 4 месяца назад +9

    At all the banks I've worked at, and where I'm at now (a data analytics company) we always use database per tenant. It's the right thing for us, we cannot risk a query defect leaking data, or a problem with one tenant affecting others, or a query-heavy tenant affecting the CPU/RAM of other tenants. Also if we have to restore (which we've done once or twice over the years) it only affects a single tenant.
    The downside is migrations and cost. But it greatly simplifies our application architecture and compliance audits.

    • @steve-wright-uk
      @steve-wright-uk 4 месяца назад

      There's pros and cons for each method. You have to choose what's important for YOUR business model.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      Thanks for the comment. Data leaking is probably the comment/subject that gets brought up the most. The other which I totally forgot to bring up is if you are in a situation where you have cross tenant communication. Maybe a separate video one day.

    • @repotranstech
      @repotranstech 4 месяца назад

      For banks and fintech it makes sense.

    • @tibba69
      @tibba69 4 месяца назад

      Yes please do a video about cross tenant communication/data sharing

  • @bitaligners
    @bitaligners 4 месяца назад +1

    I think the hybrid approach is valid: 1 only database to maintain and a schema dedicated for each tenant. Sometimes it's possible to spread schemas along 2 or more db instance to tackle noisy neighbors or performance issues. Schemas ensure isolation without applying any query filters. it's like to select the right database but without the burden to manage many databases.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      From an infrastructure operational angle, yes less to manage. From a code operations the migrations/updates to those schemas is what needs to usually be more involved in automation during deployment. Especially if you have a single code/app release that are hitting those different schemas. (eg, add a new column to a large table that is going to lock the table, rolling that out to each schema on the same DB has a bunch of perf/availability implications).

  • @hirre83
    @hirre83 16 дней назад

    Really nice with one DB per tenant, especially when working with security related applications with lots of access control lists. Also for backup and restoration, only affects that specific tenant. Shared database is nice when having a lot of "move" functions from one tenant to another, you only change the "pointer", with one DB / tentant, you need to copy the data.

  • @N8ontv
    @N8ontv 4 месяца назад +1

    I inherited a db per tenant architecture. With CI/CD and automation in place many of the downsides mentioned are easily addressed other than cost. Our real complexity came from also having tenant-agnostic data that all tenants nevertheless need access to.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +2

      I never brought that topic up, as well as the topic of cross tenant communication, which I wanted to. Likely another video down the road.

  • @ziftytodd
    @ziftytodd 4 месяца назад +1

    Also consider data privacy and retention laws vary by country, so you might want to have a shared multi-tenant database for each unique region/country that your clients are in. Reduces complexity from one db per client, but makes dealing with each unique set of privacy laws easier.

  • @ThingEngineer
    @ThingEngineer 4 месяца назад

    Great video, you covered a lot of variations.
    Multi-Tenant with Supabase is great when using their auth and Postgres functions. There is not need to send user or tenant info because it is part of the session and accessible from the function call, they also scale amazingly.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      Nice, thanks for the info!

  • @orianatoubia681
    @orianatoubia681 2 месяца назад

    I love your channel! Thank you

  • @devbert1880
    @devbert1880 4 месяца назад +1

    i love this channel. a topic that would really intrest me: how to deal with identity and access mangement stuff in a multi-tenant scenario - without cloud-lockin :)

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      I'll add it to my list of topic ideas! Thanks

  • @cdarrigo
    @cdarrigo 4 месяца назад +1

    Resources for tenant, including database and potentially API, also have an impact for provisioning new tenants. If you have a single database and API for all your tenants, provisioning a new tenant is pretty trivial. If every tenant gets their own same resources, then you have to provision and potentially seed all those resources every time you deploy new tenant. Just something else to consider

  • @prashanthb6521
    @prashanthb6521 23 дня назад

    For HPC workloads, its always better to have multiple databases, hopefully on different machines, even if it takes more maintenance.

  • @jprince1993
    @jprince1993 4 месяца назад

    A tenant per collection for Document store, makes me a feel uneasy. I like the clear separation that it gives, however for many tenants with many collections...it gets messy quick. Other methods are either having the tenant ID on each document in a collection OR a tenant shard key.

  • @anhcoder
    @anhcoder 4 месяца назад

    Split to multiple for better querying, re-indexing.

  • @marna_li
    @marna_li 4 месяца назад

    I have multiple tenants per DB based on the query filter method. But I realized there are cases you want it separate. So I could built some infrastructure to opt-out for separate databases based om customer requirements. Provided that they don't want separate instances of the app all together. I have seen that some companies do that because of data requirements and so they don't upgrade all customers to newer versions without them paying. This is kind of the pre-Cloud and pre-SAAS method. I just feel like it must be a hell to manage with branching and patching bugs across specific codebases for each customer.

  • @qj0n
    @qj0n 4 месяца назад +1

    Single db is our approach and we limit the issue with noisy neighbours by using tenantId and PartitionKey. This way is azure storage when data is split between nodes, each tenant should be on one node, so they only affect this node

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      Ya partitioning by tenant at storage, if you db supports it, is nice.

    • @IndigoDalliance
      @IndigoDalliance 4 месяца назад

      This is interesting. Do you guys use PartionKey on Cosmos Db? I don't know anything about it and want to know more.

  • @Philinmiami
    @Philinmiami 4 месяца назад

    We have a database per tenant model. In our use case, the system could be split into micro services, because 1/2 of the business logic and tables are exactly the same for all tenants. However, due to the needs of each tenant, we would have too many variations in the tables, and we need to do quite extensive join operations.
    We looked into a hybrid (multi-tenant for the 1/2 that's similar) and tenant per database, but since most of our business deals with joining from the "core" tables to the tenant tables, the test queries we ran were not promising.
    In our case, since we know exactly how the complexity runs, we have a git repo that is "core" repo, that every tenant is forked from and upstreams from, so in the few cases that we have to change things in the core tables, we just run the git upstream fetch commands.
    One thing to note, is that in our industry, data leakage is 100% not acceptable to any of our clients, and we'd probably lose them as an account (and this would be our best case scenario).

  • @trashmail8
    @trashmail8 4 месяца назад +7

    Besides using a UUID for the customerID, wouldn't it also be a good idea to also use a UUID for the tenantID?
    If you ever want to migrate a tenant from database A to database B, you might run the risk that a tenantID "1" already exists in database B..
    In other words, making both the tenantID and customerID a UUID, you can move data around freely without running the risk of using duplicate keys or data leaking between tenant instances.

    • @lukassinkus6162
      @lukassinkus6162 4 месяца назад +1

      My guess would be that tenants are globally unique and are probably coming from an external source, or that it just didn't make sense to mention it for brevity

    • @iaminterestedineverything
      @iaminterestedineverything 4 месяца назад

      Some database storage engines such as InnoDB don't do well with uuid, as they can cause excessive page IO if they happen to be the clustered index due to their randomness

    • @steve-wright-uk
      @steve-wright-uk 4 месяца назад +1

      @@lukassinkus6162 It's probably a good idea to use a UUID for all records in case you decide to migrate from a DB per tenant to a shared DB.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      Yes I likely would make them globally unique, however you likely store the tenants in some type of primary DB which also knows which DB they are associated with. So even then it could be a UUID/GUID but could also be something simple. Also note, as someone else mentioned, DBs like MySQL/InnoDB don't do well with UUIDs as primary keys (unless that's changed?)

    • @steve-wright-uk
      @steve-wright-uk 4 месяца назад

      @@iaminterestedineverything Using UUIDs with clustered indexes is a known performance problem. However UUIDv7 gets around this problem by having a time component that means that they are naturally sorted in the order they are generated. The upcoming .Net 9 will have ths natively, but they are Nuget packages that can generate them now.

  • @JacobGlanz1234
    @JacobGlanz1234 4 месяца назад +1

    On SQL server you can do a RLS, so a query without a Tenant ID doesn’t return anything.
    You can even configure to bypass RLS with db admin users

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      Ya, row level with a specific user is a great option if your DB supports it. Or choose a DB that does it before you begin if that's the route your going to take. 😀

  • @georgehelyar
    @georgehelyar 4 месяца назад

    For relational, shared database, horizontally sharded on tenant id with something like citus. Currently doing this in prod for 11b req/mo on 100k tenants. Would use nosql but the data in that database is quite relational with a lot of m:m. Considered graph db but most graph db are just an inefficient layer on top of nosql/rdbms.
    In general if you have a lot of databases, they are still on the same server, so they still share resources - separate databases really just saves you from having to put the tenant id in every table and use composite keys, but the cost is in the schema management.

  • @hoareg2
    @hoareg2 4 месяца назад

    In some cases, tenants might have different versions, e.g 8.7 vs 8.6 of the application and DB so sharing the DB between them is not feasible. Also some tenants might have some features which are not available for others. In short, it looks to me that having a DB for each tenant is more complicated but more flexible.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      features by tenant can often be done by feature flags. Yes, flexibility comes with complexity.

  • @dinov5347
    @dinov5347 4 месяца назад

    We have a notion of tenant (network) and just in the presentation we use automatic filtering on the tenant. The biggest issue is how to create indexes. Do you always index by the tenant id first? Or secondary? Or use db partitioning. Our system has a mixture of the above and quite frankly they are all have pros and cons. What do you think is the best approach of optimizing queries?

  • @nafg613
    @nafg613 4 месяца назад

    Shared postgres cluster, database per tenant. A single program instantiating an application object instance per tenant for one project, for another separate Cloud Run per tenant managed by Pulumi

  • @zonegamma8197
    @zonegamma8197 4 месяца назад +1

    very interesting thanks

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      It's a giant topic I could spend 1hr on.

  • @appstratum9747
    @appstratum9747 4 месяца назад

    Depending on the RDBMS, schema per tenant may be of interest. To all intents and purposes, when you take advantage of the RDBMS security model, you effectively have a "logical" database that you can isolate with its own associated users. You can migrate individual schemas on a per tenant basis.
    This is just another option to consider in your toolkit of solutions. As ever, whether this makes sense depends on what you're doing, who you're doing it for, what data you're doing it with and what traffic you're dealing with from your various tenants. Like the other approaches mentioned by Derek, this isn't a "one case fits all" solution.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      Ya it's not a once case fits all for sure, as you clearly know! Ya I've also noticed people doing "table" per tenant which similar idea as schema per. Schema per tenant can be helpful early on for costs and operational maintenance for sure. As always, context is king.. and that context can change.

  • @JacobGlanz1234
    @JacobGlanz1234 4 месяца назад +2

    Have you ever heard of a SaaS product doing DB per tenant besides for legal reasons?

    • @awright18
      @awright18 4 месяца назад +2

      I've seen various countries requiring data to be on their soil.

    • @trashmail8
      @trashmail8 4 месяца назад

      @@awright18 In that case you can still have a database that's shared by multiple tenants, but with multiple data center locations hosting this shared database and assigning tenants based on geographical/legal zones to a particular data center.

    • @awright18
      @awright18 4 месяца назад

      @@trashmail8 true

    • @ChaseFreedomMusician
      @ChaseFreedomMusician 4 месяца назад

      I have. SQLite is fantastic for this.

    • @awright18
      @awright18 4 месяца назад +1

      I once worked at a company that would have various versions (even custom versions) of schema per tenant and they used separate databases, and servers for this.

  • @mohammadtoficmohammad3594
    @mohammadtoficmohammad3594 4 месяца назад +1

    Thank you

  • @nightshade427
    @nightshade427 4 месяца назад

    another benefit of db per tenant is that they can geolocate near the tenant

    • @lukassinkus6162
      @lukassinkus6162 4 месяца назад

      then you probably also need to deploy your app close to them as well, since what matters is distance to the server not just the db

    • @nightshade427
      @nightshade427 4 месяца назад +1

      yep I'm usually using edge for compute and keep db close to tenant, this way compute and db are close to users

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      Ya, great point.

  • @GabeHandle
    @GabeHandle 4 месяца назад

    Regardless of your choice, Lumbergh is gonna need you to go ahead and come in on Saturday... thanks!

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      Glad someone noticed it!

  • @GrowsDrummer
    @GrowsDrummer 2 месяца назад

    Hello, thank you for the video.
    We are debating our approach for a multi tenant application. We will have some big tenants with sensitive data and some smaller tenants. The way would be a mixed environment with multiple schemas in the same db server.
    Now here comes a big problem, a system admin should be able to create form templates to be available for all tenants. What do you think would be the best approach to make these templates available? Is it possible for a tenant with his own schema to also access a “global schema”? Or is it possible to save the template as a “system admin” on all schemas?
    Could it be done with writing a view on the db accessing all schemas?
    Rel Db is Postgres.
    How would you consider keeping the tenant_id nullable so you can set default values for everyone? Possible use case get the default templates and the tenant specific templates.
    Thanks in advance.

    • @CodeOpinion
      @CodeOpinion  Месяц назад +1

      Ya for sure can have global schema, but the issue, which is similar if tenants need to cross communicate is if they live on different instances, you aren't doing any single DB connection interaction. So you must treat things separately still.

    • @GrowsDrummer
      @GrowsDrummer Месяц назад

      @@CodeOpinion how would you solve this issue? Using some sort of script generator and importing templates via script on all schemas?

  • @guhfeitoza
    @guhfeitoza 2 месяца назад

    Hey, i'm developing a web application (ERP) with a database that will be exactly the same for multiple companies to get and store data. Should I continue with a single database for them all or should i create a database for each client i get? For now, the web application will be mostly to show up data, nothing heavy. Thx

  • @kubrakivan1579
    @kubrakivan1579 4 месяца назад

    I have read all the comments but I don’t see about one database and separate schemas for each tenant. Isn’t it a solution? Can it work such a way? As I know PostgreSQL database can do it.
    Please give me your professional thoughts.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      Yup absolutely can do that as well. I mention it in other videos about having a single database instance but different schemas on that instance. Also helpful for ownership over data per service. In this case tenant, so different purpose.

  • @mhsn27mhsn10
    @mhsn27mhsn10 4 месяца назад

    What is about Ai ?

    • @CodeOpinion
      @CodeOpinion  4 месяца назад

      what about it?

    • @mhsn27mhsn10
      @mhsn27mhsn10 4 месяца назад

      @@CodeOpinion make video if you have time and give us your opinion about ai and how you can use it

    • @todd-alex
      @todd-alex 5 дней назад

      This question may be way more relevant now that AI agent tooling is maturing and it’s changing the way software architectures traditionally relied on client CRUD interactions with dbs. In distributed or federated AI training, db per tenant makes sense. Extended memory even when thinking about something like memGPT benefits from per tenant db. But now with AI agents + IaaS, we may soon see per user db become a discussion.

  • @ChaseFreedomMusician
    @ChaseFreedomMusician 4 месяца назад +2

    Hot Take: if you have a multi-tenant shared relational DB do not enforce FK. Have the keys but don't enforce them it will make your life so much better for splitting and changing data.

  • @alex_chugaev
    @alex_chugaev 4 месяца назад

    Is “noisy neighbor” that big issue? I mean you can auto-scale instances (computing and database) in many cases.

    • @CodeOpinion
      @CodeOpinion  4 месяца назад +1

      It depends if you want to auto-scale or not. Often times the noisy neighbor in my experience is because of just bad code/query that isn't running as intended. Or over time data has grown that's causing the issue. Often seems like its in the dark corner of the app nobody goes to then all of a sudden calls something that murders the DB.