For those of you screaming about adding an index, first an index isn't free and second, even if you add an index, this request takes 232ms. I'm not a maths person but 232 feels more than 2ms.
@@nickchapsas I would like to challenge you on that one. Is it possible to get the source and the database? Or just the data as csv
Месяц назад+11
I like your content but I will strongly disagree with you on this. Index is not free but also not expensive if the table is not growing fast. Especially for the things like username which is in most cases not even allowed to update. Plus it is more expensive to do text search on every row instead of using an index. Your comparison is not correct. 232ms is much faster than 900ms. For the non-changing rows it will be 2ms. For the changing rows it will still be around 2-3ms, if the package would use id from the cached data, instead of trying to access same row with more expensive query. I am assuming username will not change. But even if will change, there can be a simple query for checking by id and username (or whatever column we are searching) first and if there is no match then it can do an index search.
Delta isn't free either. Sure it's useful in certain scenario's, that doesn't mean that you should not optimize the database. Users typically string search the to be searched thing only once. So the performance gain in most systems would be negligible at best.
An index doesn't even work in this case as you're using a full wildcard search (%...%). The start of the search parameter has to be a known value, otherwise Sql Server won't be able to use the index. A full-text index would be helpful though, but then you have to rewrite your query too.
@@nickchapsas "fixing your database performance issues" implies that you do something to your database to fix the performance issues that the database has. An example of a similar situation would be if I have a car that has an underpowered engine and therefore cannot go up steep hills. And I would then say: I fixed the car's underpowered engine by avoiding steep hills.I didn't fix the engine, I just circumvented the issue. This doesn't take away that it is a valid solution. And potentially a great solution at that!
@@dondernerdnot really, compatible by saying “fix your long commute“ by buying a flat next to the office. No mention of travelling faster in the original statement so would not assume travelling at a faster speed.
Aside from just the client speed gain, this can also reduce backend load significantly. I can think of a dozen projects at work where this will come in handy already. Thanks!!!
@@jensborrisholt5555 Caching IS a fix. Not every query will perform well, regardless of how much you optimize it. In those situations, caching is arguably the ONLY fix. Indexes aren't magical and won't solve all issues either, though it would help with this. Besides, the index was very clearly just an example, used to show a slow query. I'm sure Nick knows to add indexes to his DBs.
ETags are a great tool - but that timestamp field does more than enable deltas. It allows for optimistic concurrency for writes to the database, which can be super useful as well!
I am also not sure if my query has multiple joins and not all the joined tables has Row version column... real life scenarios not always that straight forward :D
This seems neat but also, as other people have said, perhaps limited usefulness because the first client load is still slow and it doesn't improve much of the backend if you have a lot of clients. It would be nice to see some kind of memcached that was aware of rowversion so the cache would be distributed to all users querying the same data.
So, I'm guessing that Delta is checking and comparing the maximum versionrow value in a table, but what if a row is deleted? Will the change be detected by Delta and force a requery? Also, could it just grab and update those rows whose rowverson has been updated and is now > the last max cashed value?
Deleting an row does not update the max row version (in most cases unless you delete the tow with the current max version) What you would normally do in the case of deleting is check the count of the items in your “cache” if the max rowversion is the same by but the count is different then you know the data has changed.
You can check for this change faster than checking delta, also rows are not usually actually removed even when they are deleted, they are just marked as deleted. Permanently removing rows is a much bigger thing and almost always leads into big problems if done outside of controlled runs
He's searching if a username contains "nick", which means that he doesn't look for an exact "nick" string, but "nick" can be any part of the username string. An index wouldn't save you here unless you want to implement full-text search functionality on a username column
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc
Does this work if I'm loading an entire object graph via orm? Wouldn't it essentially have to do this for every relation in the graph that's being accessed? How would it determine that?
Great for the (contrived) read-heavy scenario identified, but developers should be careful about using caching like this to mask real database performance issues. If you have a very expensive query and lean on this type of browser caching to prevent heavy database load, you're at the mercy of well-behaved clients. A bad actor could simply ignore the etag and hammer the endpoint to potentially DDoS your server. Not saying we shouldn't leverage all the tools in our belt, but fixing fundamental database/query performance issues should be top of mind, and then solutions like Delta are a "yes and" addition.
I have created my own package for my company that tracks a specific class. Any changes to the class, such as update, delete or add will invoke its respective callback I pretty much rely on this to make sure all the objects in the memory are same as the data in db for the most part of the time. It also supports pagination and where clause so that large data set can still be tracked without much performance impact The concept behind this is similar to sqldependency but it is much simpler to use as everything is written in sql to ensure that it can be expanded in any other way that I want in the future as well as adding any kind of support to the sql system that I want
This is interesting. But I wonder, what the limitations are. As i understood it, the cache happens in the users browser based on the ETAG. So if you have an API with permission based on user/role that returns different data and a user uses different accounts in the same browser, the user would see invalid cached data. edit: nvm after the reading the project page. the ETAG consist of three values. AssemblyWriteTime/SQL Timestamp/ optional Suffix..... so you have to set the suffix to the userId/tenantid or whatever is suitable
I mean, I guess it fixes an issue by thinning out the herd so to speak. Is there an index on the rowversion column in your database though so that it can efficiently retrieve max(rowversion)? Otherwise you'd be doing a table scan to determine that max aggregate. Anyway, the idea itself is kinda cool. Transparently adding ETags and so on. I just think that a less-than-aware developer would take this fix and run with it, and eventually they'd have two problems to fix instead of just one problem.
Great video, I now know what rowverson is used for. But I think it's a very very specific case. What I mean? I have 2 cases that I haven't seen. 1. I search for "Nick" then I search for "Nic" and search for "Nick" again. 2. I search for "Nick" then open a new tab or new Postman or and search again for "Nick". How long does it take in these cases? Are they still as fast or not?
Interesting, but as near as I can tell from looking at the code, it uses a database-wide “change_tracking_current_version()” which will be the last version# for *any* tables tracked. If you have two tables tracked, you call API on Table A and then somebody changes Table B, your next request for Table A will not get a 304. Am I misunderstanding?
As someone coming from a DB background, I don't see how this is possible. If you use proper indexes and well written Stored Procedures (you can't get faster than a SP as its at a DB level), this will just be some kind of caching at the UI end, but many tables will be updated by other processes, so each time it will need to check if the table has been updated, which means it will be slow or the data might be invalid. I would just suggest you do things properly at a DB level, then this wouldn't be needed. You can always load common data or all data, if size permits, into memory, if you need very quick response times. I guess if you don't want to optimize things a DB level, then use this, but I would suggest that's a bad idea.
Месяц назад+1
It's always Simon Cropp. One of three 'Permanent Patrons' on the Fody project here. It's always Simon Cropp.
Месяц назад+1
I mean, more than a “permanent patron”, Simon is the one who created Fody, right?
Месяц назад
Yes, and Fody is an awesome project, just like a lot of things from Simon (Verify, Delta, NServiceBus, etc.). And his name is everywhere where there is some "hard parts" to fix on OpenSource Projects. So nice to give a shout out. ... For Fody he made that a "paid product" at some point (no one is going to check though), but you are actually supposed to be a patreon to use it. Therefor the "permanent patreon" with a pay once instead of donate when using. Very few in that tier though, so I hope he gets some compensation for all his work.
I've implemented caching several times in different forms in my current project. They're deceptively simple. I think this library would help in a few scenarios in my application, but not all. We're streaming data in from hardware, so some tables are in constant flux handling up to about 5-20k events per second depending on the hardware and data isolation.
How fine grained is this caching? From looking at the readme, it seems that whenever any record changes in a tracked table, the etag will no longer match, so the request won't be cached, even if you are requesting records that were not altered.
I think this selects the MAX(rowversion) and if data wasn't changed after the last fetch (which is specified by a request header) the browser just gets a 304 Not Modified response instead of a whole body. Otherwise the request is handled the same as if Delta wasn't used (well other than an extra ETag header added to the response for the next fetch request). I guess delete could just be handled by also selecting COUNT(*) together with the MAX.
Месяц назад+1
That is a good question. If delete does not increase the rowversion then cached version is invalid.
@@DavidSmith-ef4eh I did go through the documentation on github and there's no mention of a count lol. It also only works with SqlServer with it's rowversion timestamp mechanism.
@@gileee that optimistic update system from mssql. never used it tbh. but it seems a good system, microsoft surely has reasons for using it instead of locking rows and tables.
is there a security concern having the data stored in the browser? I assume there is probably a time limit as well of how long we can store the data in cache.
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc
I am seeing a lot of criticism in the comments and although 99% valid, i think that this is a great caching approach and is is super easy to use, especially for smaller applications with FEs like blazor. Thx Nick
Very nice, however, I looked into the Delta package and I'm not a fan. Seems to be tightly coupled to the database. I hydrate an in memory cache from the DB with the entity version. I want to pull the version from my cache.
I dont understand, how does it know if the record has changed or not? shouldn't it query the "rowversion" anyways?? why not index the "names" col and get 1ms too???
No this is for read heavy scenarios. That being said, I'm sure every DB has at least a few tables that are more read heavy than others
Месяц назад
@@nickchapsas Isn't rowversion database global? Meaning that any write to any rowversion table will increase the global rowversion which would 'break' any caching. Or does Delta do MAX(rowversion) on the individual table?
> Isn't rowversion database global yes it is. u can query per table but that is not how delta work, given it is very difficult to work out what tables are bing touched in the context of a web request.
no it is not designed for that scenario. basically what is important is the ratio of reads to writes. if u have more writes than reads, obviously Delta will not help u
Even though the first call is still slow, this seems something you can just add by default. It's easy to add and basically non-invasive to your code base. It also takes load off the DB, which means it has more resources for others doing first calls on the data. Appears to be a no-brainer.
Wonder how will it work on BFF or ANY multi-query API? How easy it will break, if somebody changes the code, because they need to query something else first?
So am I missing something? For example I have 3 tables that have no relation to each other. All of them have RowVersion. Changing 1 row in a table causes the etag for all requests to any tables become invalid.
I wonder how much additional storage would a new column like this require overall if you were to add it to most of your tables and they have millions of records.. and would there be a performance hit when updating/recalculating the column value when something has changed in the row (ok that part is probably fine if it's just a timestamp essentially)
You'll probably have a version column anyway for concurrency issues (optimistic/pessimistic locking). Only thing I don't like is it seems tied to the rowversion timestamp mechanic of SqlServer and I prefer Postgres and Sqlite.
the mem consumption is a few small additional string allocations per request and one sql command. and when u have a cache hit it will result in significantly less mem used
without this approach, every web request would query re-fetch all 1 million rows. not that the 1 million rows in the vid is just a way to demonstrate a db query that takes some time. delta does 1 db call that executes 2 very fast querys. io in the majority if scenarios it will be significantly faster than the other db interactions it replaces
@@simoncropp right, so you're saying that yes, it will re-fetch all 1 million rows if you change 1 record in the DB right? thanks for the library btw!!
From what I can see, the library is using SQL Change Tracking, so if the tables in the joins haven't changed, then no need to run again. You can even set what tables to track, so that one table that updates every second, can be ignored.
[Timestamp] public byte[] RowVersion { get; set; } protected override void OnModelCreating(ModelBuilder modelBuilder) { // Loop through all entities in the model foreach (var entityType in modelBuilder.Model.GetEntityTypes()) { // Check if the entity inherits from BaseDomainModel if (typeof(BaseDomainModel).IsAssignableFrom(entityType.ClrType)) { // Get the property info for MyVersionProperty var propertyInfo = entityType.ClrType.GetProperty(nameof(BaseDomainModel.RowVersion)); if (propertyInfo != null) { // Configure the property as a row version modelBuilder.Entity(entityType.ClrType) .Property(propertyInfo.PropertyType, propertyInfo.Name) .IsRowVersion(); } } } }
What about creating a nonclustered index on rowversion column, it would increase the performance? I know that more indexes implies in slower insert/update. But besides that it would speed the process of checking the field? Anyone Knows?
no. it is currently hard coded for sql server. but it would not be too difficult to adapt for mongo. assuming mongo has equivalent timestamp functionality
So this only comes in if the same user does the same search with the data being exactly the same? Then I don't see this having a big impact generally at all. It would be interesting if the data was patched with the updated values, instead of retrieving everything when a single value has changed. This would probably only be useful for non-index searches though, such as name like '%nick%'.
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc > instead of retrieving everything when a single value has changed but without delta this is what happens on every web request
And what about client-server desktop apps? Is there anything like this for desktop approach? I've made my own solution with caching that behaves very similar to Delta but maybe there's someone smarted that did it better and I do not have to play with cache every time
Would like to see a video on how to upload multiple photos to Azure blob storage as fast as possible. Would like to see how you would do that, Nick. Thanks for the helpful videos.
Does this work with rowversion columns which are date type ? I have been working with a lot of ERP stuff and we always have column ROWVERSION which is date type. If this works with such columns, this is really great.
Yes it should work with these columns too, but maybe check yourself. It can work with two different ways of such mechanisms actually as descibed in the docs
Caches that support versioning on whatever they're covering are fantastic to allow invalidate on access, which is basically what this is... Thus one of the 3 (4) hard problems of computer science gets resolved. Sometimes anyway
This is not practical if i cannot change the table structure in the database. I just want to speed up my queries, not change any tables as that is out if my access.
I guess there might be a very niche use case for this, but I think it would be just that: niche. The reason is that, as you say, caching is at the browser level. Which is to say, if you have 1000 users coming to a page, it's NOT the case that for the first user it will be slow and then it will be super fast for the other 999. Nope. All 1000 users will experience the slow query. There is only a speedup if the *same* user asks for the *same* data again. And, oh, nothing in the database has changed.
For those of you screaming about adding an index, first an index isn't free and second, even if you add an index, this request takes 232ms. I'm not a maths person but 232 feels more than 2ms.
@@nickchapsas I would like to challenge you on that one. Is it possible to get the source and the database? Or just the data as csv
I like your content but I will strongly disagree with you on this.
Index is not free but also not expensive if the table is not growing fast. Especially for the things like username which is in most cases not even allowed to update. Plus it is more expensive to do text search on every row instead of using an index.
Your comparison is not correct. 232ms is much faster than 900ms.
For the non-changing rows it will be 2ms.
For the changing rows it will still be around 2-3ms, if the package would use id from the cached data, instead of trying to access same row with more expensive query.
I am assuming username will not change. But even if will change, there can be a simple query for checking by id and username (or whatever column we are searching) first and if there is no match then it can do an index search.
Delta isn't free either. Sure it's useful in certain scenario's, that doesn't mean that you should not optimize the database. Users typically string search the to be searched thing only once. So the performance gain in most systems would be negligible at best.
@@davyx1 delta works for any url that talks to the db. not just search urls
An index doesn't even work in this case as you're using a full wildcard search (%...%). The start of the search parameter has to be a known value, otherwise Sql Server won't be able to use the index. A full-text index would be helpful though, but then you have to rewrite your query too.
This is a very handy package, but I would argue that it is not really improving database performance; it is just avoiding hitting it every time.
It's reducing DB usage, fixing your database performance issues, which is accurate to the title
Not to mention that it's also reducing network io. Which might reduce cost as well.
@@nickchapsas "fixing your database performance issues" implies that you do something to your database to fix the performance issues that the database has.
An example of a similar situation would be if I have a car that has an underpowered engine and therefore cannot go up steep hills. And I would then say: I fixed the car's underpowered engine by avoiding steep hills.I didn't fix the engine, I just circumvented the issue.
This doesn't take away that it is a valid solution. And potentially a great solution at that!
@@dondernerdnot really, compatible by saying “fix your long commute“ by buying a flat next to the office. No mention of travelling faster in the original statement so would not assume travelling at a faster speed.
The term should be a "workaround" rather "fixing"
Aside from just the client speed gain, this can also reduce backend load significantly. I can think of a dozen projects at work where this will come in handy already.
Thanks!!!
Title is kinda wrong, but the package seems awesome
Misleading clickbait titles is Nick's fetish 😂
Very misleading title. ”How to hide bad db performance for a user that makes multiple requests” would be more true.
So caching should never be referred to as a performance improvement but rather just hiding bad performance. Got it
@ Performance improvement yes. Fixing a database no.
How would you feel about ”Fix your bad database indexing” and have a video about Redis?
@@JohanNordberg exactly! His not fixing anything. He's just cashing some stuff. Create a index an do the Database stuff probably!
@@jensborrisholt5555 Caching IS a fix. Not every query will perform well, regardless of how much you optimize it.
In those situations, caching is arguably the ONLY fix.
Indexes aren't magical and won't solve all issues either, though it would help with this. Besides, the index was very clearly just an example, used to show a slow query.
I'm sure Nick knows to add indexes to his DBs.
What happen when you have a query that joins two tables . And table 1 didnt change but table 2 did
ETags are a great tool - but that timestamp field does more than enable deltas. It allows for optimistic concurrency for writes to the database, which can be super useful as well!
I think it is a very good package, but I hope that the maintainer will do a implementation also for PostgreSQL. Nice video, Nick!
Simon will read the comments so I'm sure this will come :D
Concurred
100%
Same actually, that would be huge.
Meanwhile I hope for something for MongoDB. I wanted to implement some ETag mechanism myself, but found that it might not be too easy.
So sad there is no Postgres implementation 😞
Someone should submit this to Code Cop for review...
😂😂😂😂
I'll make that video and double dip
Does this require you to add RowVersion in every table? And does it work with any db type?
I am also not sure if my query has multiple joins and not all the joined tables has Row version column... real life scenarios not always that straight forward :D
This seems neat but also, as other people have said, perhaps limited usefulness because the first client load is still slow and it doesn't improve much of the backend if you have a lot of clients. It would be nice to see some kind of memcached that was aware of rowversion so the cache would be distributed to all users querying the same data.
So, I'm guessing that Delta is checking and comparing the maximum versionrow value in a table, but what if a row is deleted? Will the change be detected by Delta and force a requery?
Also, could it just grab and update those rows whose rowverson has been updated and is now > the last max cashed value?
@@ChrisWalshZX it's looking at the max version of the db. I'm guessing a delete increases the db version.
Deleting an row does not update the max row version (in most cases unless you delete the tow with the current max version) What you would normally do in the case of deleting is check the count of the items in your “cache” if the max rowversion is the same by but the count is different then you know the data has changed.
you can just mark your data as deleted by modifying it. you'll have the timestamp changed this way
You can check for this change faster than checking delta, also rows are not usually actually removed even when they are deleted, they are just marked as deleted. Permanently removing rows is a much bigger thing and almost always leads into big problems if done outside of controlled runs
@@ChrisWalshZX If you're doing a MAX(rowversion) it's easy to add an extra COUNT(*) to the select.
You do not have index for username field. Your query seems doing full table scan and then you hiding real issue using caching.
Aside from that the title is misleading
You cannot (or I guess you technically could but good luck with writes) have indices for all possible columns the user can search on.
He's searching if a username contains "nick", which means that he doesn't look for an exact "nick" string, but "nick" can be any part of the username string. An index wouldn't save you here unless you want to implement full-text search functionality on a username column
Also is local.
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc
Agree with most of the comments, it's a handy package. But I'd rather say it's a fix to the API performance.
I'm sitting on this title for the next video
What about relationships (many-to-one, etc)? For example "user" is linked to "group" and someone changed the group name
I have the same question. 😀
changing group name would result in the cache being busted for the site. and any subsequet request will go all the way to the db
Same question, what about joins?
Does this work if I'm loading an entire object graph via orm? Wouldn't it essentially have to do this for every relation in the graph that's being accessed? How would it determine that?
I've built stuff like this manually. This will save a ton of time.
good man
OMG been waiting for this forever.
Thank you Nick as always great content and all the best
Great for the (contrived) read-heavy scenario identified, but developers should be careful about using caching like this to mask real database performance issues. If you have a very expensive query and lean on this type of browser caching to prevent heavy database load, you're at the mercy of well-behaved clients. A bad actor could simply ignore the etag and hammer the endpoint to potentially DDoS your server. Not saying we shouldn't leverage all the tools in our belt, but fixing fundamental database/query performance issues should be top of mind, and then solutions like Delta are a "yes and" addition.
How does this work if you are using a graph of different tables in a single query?
Don't do this anymore please :)
Don’t do what? Join tables?
@@weluvmusicz what don't include data from multiple tables?
I have created my own package for my company that tracks a specific class. Any changes to the class, such as update, delete or add will invoke its respective callback
I pretty much rely on this to make sure all the objects in the memory are same as the data in db for the most part of the time. It also supports pagination and where clause so that large data set can still be tracked without much performance impact
The concept behind this is similar to sqldependency but it is much simpler to use as everything is written in sql to ensure that it can be expanded in any other way that I want in the future as well as adding any kind of support to the sql system that I want
What about PostgreSQL, MySQL or Sqlite?
you use PostgreSQL, MySQL and Sqlite at the same time?
@@simoncropp He said 'or', not 'and'.
So basically a single modification in any table resets the cache for all the db?
Effectively yeah but it doesn't really "reset" the cache rather than "data has changed in some way so read the new version"
@nickchapsas OK. And yeah, it doesn't go in every browsers to reset the cache 😁
This is interesting. But I wonder, what the limitations are.
As i understood it, the cache happens in the users browser based on the ETAG. So if you have an API with permission based on user/role that returns different data and a user uses different accounts in the same browser, the user would see invalid cached data.
edit: nvm after the reading the project page. the ETAG consist of three values. AssemblyWriteTime/SQL Timestamp/ optional Suffix..... so you have to set the suffix to the userId/tenantid or whatever is suitable
Yes you can heavily customize the logic to your needs
I mean, I guess it fixes an issue by thinning out the herd so to speak. Is there an index on the rowversion column in your database though so that it can efficiently retrieve max(rowversion)? Otherwise you'd be doing a table scan to determine that max aggregate.
Anyway, the idea itself is kinda cool. Transparently adding ETags and so on. I just think that a less-than-aware developer would take this fix and run with it, and eventually they'd have two problems to fix instead of just one problem.
Great video, I now know what rowverson is used for.
But I think it's a very very specific case. What I mean?
I have 2 cases that I haven't seen.
1. I search for "Nick" then I search for "Nic" and search for "Nick" again.
2. I search for "Nick" then open a new tab or new Postman or and search again for "Nick".
How long does it take in these cases? Are they still as fast or not?
Interesting, but as near as I can tell from looking at the code, it uses a database-wide “change_tracking_current_version()” which will be the last version# for *any* tables tracked. If you have two tables tracked, you call API on Table A and then somebody changes Table B, your next request for Table A will not get a 304. Am I misunderstanding?
Wow a fantastic and easy to use library, thanks for sharing Nick
Nick's channel one of the rare channels I haven't block yet because of funny face thumbnails.
Same tbh
@@nickchapsas lol I'm happy that you haven't blocked your own channel from your feed. That's a good, healthy sign
As someone coming from a DB background, I don't see how this is possible. If you use proper indexes and well written Stored Procedures (you can't get faster than a SP as its at a DB level), this will just be some kind of caching at the UI end, but many tables will be updated by other processes, so each time it will need to check if the table has been updated, which means it will be slow or the data might be invalid.
I would just suggest you do things properly at a DB level, then this wouldn't be needed.
You can always load common data or all data, if size permits, into memory, if you need very quick response times.
I guess if you don't want to optimize things a DB level, then use this, but I would suggest that's a bad idea.
It's always Simon Cropp. One of three 'Permanent Patrons' on the Fody project here. It's always Simon Cropp.
I mean, more than a “permanent patron”, Simon is the one who created Fody, right?
Yes, and Fody is an awesome project, just like a lot of things from Simon (Verify, Delta, NServiceBus, etc.). And his name is everywhere where there is some "hard parts" to fix on OpenSource Projects. So nice to give a shout out.
...
For Fody he made that a "paid product" at some point (no one is going to check though), but you are actually supposed to be a patreon to use it. Therefor the "permanent patreon" with a pay once instead of donate when using. Very few in that tier though, so I hope he gets some compensation for all his work.
What will happen if we use backend side pagination, does this still work?
Doesn’t http get automatically return cached result sets unless response header has no cache control set?
I've implemented caching several times in different forms in my current project. They're deceptively simple. I think this library would help in a few scenarios in my application, but not all. We're streaming data in from hardware, so some tables are in constant flux handling up to about 5-20k events per second depending on the hardware and data isolation.
How fine grained is this caching? From looking at the readme, it seems that whenever any record changes in a tracked table, the etag will no longer match, so the request won't be cached, even if you are requesting records that were not altered.
Why not use the optimistic concurrency flag for this?
This one is pretty cool, thanks Nick and Simon
does it work on delete? I assume he sorts the row version desc and fetches all the row versions that are greater than the largest one cached..
I think this selects the MAX(rowversion) and if data wasn't changed after the last fetch (which is specified by a request header) the browser just gets a 304 Not Modified response instead of a whole body. Otherwise the request is handled the same as if Delta wasn't used (well other than an extra ETag header added to the response for the next fetch request).
I guess delete could just be handled by also selecting COUNT(*) together with the MAX.
That is a good question. If delete does not increase the rowversion then cached version is invalid.
they probably get a count I guess.. Tbh, I've been using a similar system for years. My row_version number is just called updated_at :D
@@DavidSmith-ef4eh I did go through the documentation on github and there's no mention of a count lol. It also only works with SqlServer with it's rowversion timestamp mechanism.
@@gileee that optimistic update system from mssql. never used it tbh. but it seems a good system, microsoft surely has reasons for using it instead of locking rows and tables.
is there a security concern having the data stored in the browser? I assume there is probably a time limit as well of how long we can store the data in cache.
You already serve the data to the browser. If it was ok to serve it the first time, it is ok to cache it.
Now this is actually very cool, even if limited to specific scenarios. Still, ingenuously simple.
Isn't it better to use at regular caching with cache invalidation instead of E-tags?
How this will work with multiple distributed app?
In postgres, you can use gin/gist indexes if you don't want to deal with cache. This will speed up queries a ton in cases like this one
Thanks! I’ll look into it!
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc
Now that was a great discovery, thanks for the share!!
I am seeing a lot of criticism in the comments and although 99% valid, i think that this is a great caching approach and is is super easy to use, especially for smaller applications with FEs like blazor. Thx Nick
Maybe silly qustion, but is he using dapper for querying in database?
What DB you use??? (SQL, PostgreSql, Collections, etc) and it's Local or Cloud??
Hi Nick, thanks for your content ! Why don't you use output caching middleware like you showed us in another video ?
Very nice, however, I looked into the Delta package and I'm not a fan. Seems to be tightly coupled to the database. I hydrate an in memory cache from the DB with the entity version. I want to pull the version from my cache.
I dont understand, how does it know if the record has changed or not? shouldn't it query the "rowversion" anyways?? why not index the "names" col and get 1ms too???
As I understand RowVersion is calculated from entire database? It is possible to map particular endpoint only to RowVersion from specyfic table?
Will this add much value if the DB table has fast changing data?
No this is for read heavy scenarios. That being said, I'm sure every DB has at least a few tables that are more read heavy than others
@@nickchapsas Isn't rowversion database global? Meaning that any write to any rowversion table will increase the global rowversion which would 'break' any caching. Or does Delta do MAX(rowversion) on the individual table?
> Isn't rowversion database global
yes it is. u can query per table but that is not how delta work, given it is very difficult to work out what tables are bing touched in the context of a web request.
no it is not designed for that scenario. basically what is important is the ratio of reads to writes. if u have more writes than reads, obviously Delta will not help u
This works with postgres? I used row version in sql server, but didn't know that postgres has a row version implementation?
Perfect as always. Thnx!
So what about if it is not a flat table but the query goes over 6 tables? Does it „check“ all rowVersions?
Yes
Even though the first call is still slow, this seems something you can just add by default. It's easy to add and basically non-invasive to your code base. It also takes load off the DB, which means it has more resources for others doing first calls on the data. Appears to be a no-brainer.
Wonder how will it work on BFF or ANY multi-query API? How easy it will break, if somebody changes the code, because they need to query something else first?
how would this work with EF with migrations scenarios?
And what if my client is not the browser but Windows form or WPF, can I use this technique?
What if I don't have a UI and I just have a background service calling a stored procedure, which has to Insert or Update?
is this for .net 9 only ?
So am I missing something? For example I have 3 tables that have no relation to each other. All of them have RowVersion. Changing 1 row in a table causes the etag for all requests to any tables become invalid.
> Changing 1 row in a table causes the etag for all requests to any tables become invalid.
yes.
I wonder how much additional storage would a new column like this require overall if you were to add it to most of your tables and they have millions of records.. and would there be a performance hit when updating/recalculating the column value when something has changed in the row (ok that part is probably fine if it's just a timestamp essentially)
It's a timestamp column so 8-bytes per row
You'll probably have a version column anyway for concurrency issues (optimistic/pessimistic locking). Only thing I don't like is it seems tied to the rowversion timestamp mechanic of SqlServer and I prefer Postgres and Sqlite.
@@gileee yeah true, I guess if it can be configured to rely on an existing column/setup without the need of DB changes it would be pretty good
How about memory consumption? Is there a noticeable difference?
the mem consumption is a few small additional string allocations per request and one sql command.
and when u have a cache hit it will result in significantly less mem used
What if you layer your db rows over DTOs?
That's totally fine. This will still work
tool looks awesome... but if you change 1 record in the DB, will the query re-fetch all 1 million rows?
without this approach, every web request would query re-fetch all 1 million rows. not that the 1 million rows in the vid is just a way to demonstrate a db query that takes some time. delta does 1 db call that executes 2 very fast querys. io in the majority if scenarios it will be significantly faster than the other db interactions it replaces
@@simoncropp right, so you're saying that yes, it will re-fetch all 1 million rows if you change 1 record in the DB right? thanks for the library btw!!
Can you implement this with Angular?
Thank you, very useful library
Great stuff but what about more complex scenarios with joins?
Same thing. It doesn’t matter. As long as you have the column in the tables of interest, this will work with joins stored procedures or anything else
From what I can see, the library is using SQL Change Tracking, so if the tables in the joins haven't changed, then no need to run again. You can even set what tables to track, so that one table that updates every second, can be ignored.
Wonder if this works with dapper and stored procedure for us who avoid EF…
What do we set the type to if using EF?
[Timestamp]
public byte[] RowVersion { get; set; }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
// Loop through all entities in the model
foreach (var entityType in modelBuilder.Model.GetEntityTypes())
{
// Check if the entity inherits from BaseDomainModel
if (typeof(BaseDomainModel).IsAssignableFrom(entityType.ClrType))
{
// Get the property info for MyVersionProperty
var propertyInfo = entityType.ClrType.GetProperty(nameof(BaseDomainModel.RowVersion));
if (propertyInfo != null)
{
// Configure the property as a row version
modelBuilder.Entity(entityType.ClrType)
.Property(propertyInfo.PropertyType, propertyInfo.Name)
.IsRowVersion();
}
}
}
}
Note that if using EF you want the delta-ef package not the regular Delta package.
this doesnt seem to do anything for me on a wasm app? all requests are 200s
does it work on dapper?
Yes. That’s what I’m using
What tool have you used to seed 1M users?
Bogus
is only supported for .net9?
Nick established Dometrain just to get one million users and take the data for this demo :) Respect!
🤫🤫🤫
What about creating a nonclustered index on rowversion column, it would increase the performance?
I know that more indexes implies in slower insert/update. But besides that it would speed the process of checking the field?
Anyone Knows?
Great tip! Will check it out.
This works well for stale data, but if the rows are persisted often, I don't see a huge advantage. Either way, seems like a nice package.
Will this work with mongodb
no. it is currently hard coded for sql server. but it would not be too difficult to adapt for mongo. assuming mongo has equivalent timestamp functionality
Nick, are you planning to do a video on the new ms testing platform and x/n unit integrations and if it's actually ready for use?
I am
how to handle your SP's fetching data combining multiple tables?
How will this work in real scenarios. With alot of joins. And many to many etc.
So this only comes in if the same user does the same search with the data being exactly the same? Then I don't see this having a big impact generally at all. It would be interesting if the data was patched with the updated values, instead of retrieving everything when a single value has changed. This would probably only be useful for non-index searches though, such as name like '%nick%'.
delta not only avoids the db query. it also avoid any code execution that would also be required as part of the request. business rules, input validation, auth, etc
> instead of retrieving everything when a single value has changed
but without delta this is what happens on every web request
Can I associate this with Entity Framework?
Sure you can
It seems not working with SQLite in combination with EF at this time
Does the name of the "timestamp" column necessarily have the name "RowVersion" or can it have any other name?
AFAIK you don't have to call it rowversion
And what about client-server desktop apps? Is there anything like this for desktop approach? I've made my own solution with caching that behaves very similar to Delta but maybe there's someone smarted that did it better and I do not have to play with cache every time
You can use Replicant there to have the same logic on the HttpClient
Would like to see a video on how to upload multiple photos to Azure blob storage as fast as possible. Would like to see how you would do that, Nick. Thanks for the helpful videos.
This delay fix ? Would like fix many aka bugs
Does this work with rowversion columns which are date type ? I have been working with a lot of ERP stuff and we always have column ROWVERSION which is date type. If this works with such columns, this is really great.
Yes it should work with these columns too, but maybe check yourself. It can work with two different ways of such mechanisms actually as descibed in the docs
Caches that support versioning on whatever they're covering are fantastic to allow invalidate on access, which is basically what this is... Thus one of the 3 (4) hard problems of computer science gets resolved. Sometimes anyway
this will only work in web applications ?
what about mobile application ?
Check the video until the end
i want to do same for postgres that will the datatype for postgres. And if possible can u make More detailed video on this its looking interesting
Do you get paid or any financial benefit for promoting this package?
If I did I would have to legally disclose it. I just like the package and Simon.
Ok, that explains why I've seen this library on my Github wall...
This is not practical if i cannot change the table structure in the database. I just want to speed up my queries, not change any tables as that is out if my access.
Client side caching nice!
I guess there might be a very niche use case for this, but I think it would be just that: niche. The reason is that, as you say, caching is at the browser level. Which is to say, if you have 1000 users coming to a page, it's NOT the case that for the first user it will be slow and then it will be super fast for the other 999. Nope. All 1000 users will experience the slow query. There is only a speedup if the *same* user asks for the *same* data again. And, oh, nothing in the database has changed.