✅ Get the FREE Software Architecture Checklist, a guide for building robust, scalable software systems: arjan.codes/checklist.
Hey @ArjanCodes, can you create a video series on Python instrumentation for observability, i.e. metrics, logs and traces at the application level, the container & pod level, and between microservices?
I love watching your videos!
I'd love to see a deeper dive on DuckDB!
Got it 😊
Me too!
agreed
same
Same here.
THX, Arjan. I know several of these databases. My first code written as part of a job was way back in 1985; it's been a while.
I remember Oracle 5 struggling on a PC server to join 4 tables containing almost no rows.
I remember proposing that relational DBs would be inherently slow and die off quickly before ever making a market impact.
Technology advances, the world changes, and we all learn.
I enjoyed your quick tour de chambre of databases. It's a good way to expand everybody's view of what a database really is: a technology to efficiently store and access data.
Keep up your good work!
Boring is good.
Came to say this 🫡
This. Just about everyone can use Postgres and MySQL.
If it ain't broke, don't fix it. But there are uses for these specialized db's.
@TheEvertw But the sales pitch is wrong: you don't use them because they are cool.
Yep, this. Boring is generally stable and reliable, so keep your sanity.
I’m convinced… mission critical ChatGPT data storage, here I come!
I too ✨
We have to make sure all those shiny Nvidia cards are put into good use!
I know Python because of you, Corey. Thanks!
You're going to need better than that to convince me not to use Postgres.
Your content is awesome! Please, Arjan, make a large-scale, real-life program with Python. This could include database processes, file operations, performance improvement, computing and web. I will write this comment on every video of yours I watch :). Greetings
Excellent. You are always to the point, which is what I like most...
I think a good set of videos would start with the topic of "geospatial databases" and then talk about the geospatial features in each database (e.g. Redis, Tile38, PostgreSQL, and even DynamoDB with an extension), and then compare the databases against each other, to help us decide at what point we should use a generalist database (Redis / PostgreSQL) with a geospatial feature versus a specialized database like Tile38.
I mention geospatial since that is my biggest need, but a network database is right behind that.
and even in MS SQL Server!
As a note, "J" is pronounced "jay", so I would think Neo4j is pronounced "neo-four-jay". The letter "G" is pronounced "gee".
Differs in other languages ;)
I learned a few months ago that they are exactly the opposite way around in French. 🤷♂️
C is redundant in American and British English: soft C is an "ess", aka S, and hard C is a "kay", aka K.
Great video, thank you! I am using PostgreSQL (with the GIS extension) and Redis for caching. I'd love to see a comparison of DuckDB vs. SQL-based databases.
There seems to be some issue with signing up for your newsletter and guides. I tried a few times and it is not working. Can anyone else confirm?
Thanks, Arjan! InfluxDB was exactly what I needed for my testing analytics… great episode 👏👏👏
@ArjanCodes, man, thank you so much for the content you upload for us. It's very helpful and well described, and when you explain things, you make them look very easy. Please keep up the amazing work!
Thanks for this video. I always like content that makes you reflect on architecture decisions. Another database that seems interesting to me is ArangoDB.
Yes to DuckDB, but for me it's about how it differs from what can be done with Polars.
It spills to disk very well when you have bigger-than-memory data, which is not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a DB with DB features like constraints and indexes.
SQL and joins across data sources
I'm very interested in more DuckDB content.
Nice video, Arjan. I think session management with OpenAI is already implemented through the newish OpenAI Assistants API. Just use the same assistant with the same thread ID, and enjoy your key-value store!
It’s on!
InfluxDB looks VERY INTERESTING! We use RRD for this function and it has the most awful, clunky API you can possibly imagine. I think learning the Flux query language would be easy-peasy-lemon-squeezy compared to navigating the tortuous documentation of RRD. :)
I can't wait to go deeper into DuckDB!
What extension do you use for Python in VS Code?
Thank you for this very useful video!
Glad you enjoyed it!
PostGIS vs Tile38
Can I find a benchmark for that?
I have a project that could benefit from DuckDB, I think. The data isn't important enough for long-term storage, but it's good to see at a glance as a technician or team of technicians. Perfect
Fantastic video, both educational and wise
Glad you liked it!
Is there something like gdbm, or newer, available? The focus is on newer.
I would love to see a video about non-typical SQLite use cases. It's so flexible and lightweight and I feel like people are sleeping on it just because it's not for a client/server role. I started using it as a local K:V store because I didn't wanna bother with something like redis, and I'm quite impressed.
Using SQLite for caching is a really great use case!
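For anyone wondering what "SQLite as a local K:V store" might look like in practice, here is a minimal sketch using only Python's built-in `sqlite3` module. The class name `SQLiteKV` and the single `kv` table are hypothetical choices for illustration, not the commenter's actual setup; a real cache would probably also want expiry timestamps.

```python
import sqlite3
from typing import Optional


class SQLiteKV:
    """Minimal key-value store backed by one SQLite table (hypothetical sketch)."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps everything in RAM; pass a file path to persist to disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set(self, key: str, value: str) -> None:
        # INSERT OR REPLACE gives upsert semantics: an existing key is overwritten.
        with self.conn:  # the connection context manager commits (or rolls back)
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row is not None else default

    def delete(self, key: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))

    def close(self) -> None:
        self.conn.close()


cache = SQLiteKV()
cache.set("session:42", "active")
cache.set("session:42", "expired")  # overwrites the previous value
print(cache.get("session:42"))  # prints "expired"
print(cache.get("session:99", "missing"))  # prints "missing"
```

Because the key column is the primary key, lookups go through SQLite's index, and the whole store is a single file you can copy around, which is much of the appeal over running a separate Redis server.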
I would be interested in a deeper dive on DuckDB.
@ArjanCodes - Would you mind exploring Mojo more, for those who are looking to harness the power and speed it can provide for Python users? There are many related topics, like ownership, lifecycles, traits, and pointers, which are foreign concepts to many of us.
Thanks
Thank you so much!
More about DuckDB - maybe a DuckDB vs Polars video? It feels their features heavily overlap, but I'm not sure.
I don't like the implicit nature of DuckDB. It constantly grabs objects that exist in the local scope. Polars, on the other hand, is much more stable because it is very explicit. I have had to fix data scientists' code many times because they didn't realise the side effects of many DuckDB operations. Also, DuckDB absolutely messes up linters and static type checking tools.
18:29 No, only MongoDB Atlas supports vector similarity search.
I hope you will soon prepare a tutorial on the uv package manager.
DuckDB is one of my new favorites. It takes the best of dataframes and SQL and mashes them together. It's awesome.
Loving duckdb for the simplicity of SQL based analytics on heterogeneous data sources
I have never used it; I guess it's time to give it a try. Can we get your views on using Typesense in Python projects with FastAPI, or on Postgres full-text search?
Postgres x TimescaleDB vs. Influx?
What about Clickhouse?
I don't understand why people say DuckDB is cool... it feels just like SQLite but with the flexibility to work directly over dataframes or files... but why would I use that instead of just loading the files with a specialized dataframe package like pandas, Polars or Vaex?
It would be cool to see a video on it!
It can be quicker than Polars and is definitely quicker than pandas.
It is really useful when you work with teams that are SQL-heavy/mixed and where there is a lot of legacy SQL code to integrate.
It's also lighter to set up (I sometimes just use the CLI or the exe).
You can also take a creative approach to your pipeline: apply the transformations that are clearer in SQL using DuckDB, then continue with your dataframe package. I'm not saying it's a good idea, but I did it for a few transformations and it worked really well.
I feel like for some bigger-than-RAM datasets it can be better than Polars, and it is also more mature for the moment, if that makes sense.
I also find that the "ergonomics" of DuckDB is really where it shines: DuckDB is the easiest way to use SQL from Python, IMO. I'm not saying that other tools are difficult, but DuckDB is dead simple.
It spills to disk very well when you have bigger-than-memory data, which is not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a DB with DB features like constraints and indexes rather than another dataframe lib.
Cool! Thanks for the answers! Will take a closer look at it.
I am using it in prod right now as the key piece in a data lakehouse architecture for analytics. It’s soooo nice to have a one-stop shop for writing SQL queries that pull from Parquet, CSV, and a few live databases with zero friction. And it’s super fast for analytical queries on medium-sized data.
You could do this with an ORM on top of a bunch of Python connectors and leverage Polars or whatnot too, but it just feels simple, clean and fast to have it in DuckDB.
Looking at the speed increase of SSDs, esp. over the last four years... consider whether you need a DB at all.
Neo4j is just fun to use.
Why Redis?! I have already replaced it with KeyDB.
BTW, PostgreSQL is getting a vector engine too...
Curious: why not Valkey?
What about PocketDB?
DuckDB video +1 please
CockroachDB is also interesting thing to check :)
What about ArangoDB? It's a hybrid DB (RDBMS + graph + document), better than Neo4j IMHO. There is also another interesting one, UnrealDB.
I guess Postgres can do most of these tasks using extensions 😅
Please create a series on DuckDB.
DuckDB is really very useful.
Neo4j has vector support.
OrientDB is also very interesting
I'll watch your DuckDB video when it's done.
What about DNS as a database!
DuckDB is sooooo good
FYI, Flux is being deprecated in InfluxDB 3.
True, but I wanted to stick to open source here, and that is still on version 2.
me hearing Arjan pronouncing Milvus as Milfus:
The person who invented DuckDB is a quack. 🦆
These days, Postgres is very very good. You need a good reason not to use it. It is free, mature, scales, has good IDE support, good python support, extensions for everything, and great Docker packages. And if you want third-party support, it is easy to find at every level.
RocksDB?
DuckDB, PocketDB
What about xml databases?
RE: Rediculous DBs - did you know Python has a built-in DB? No, not SQLite! It's called dbm. It's not even relational - it can just store dicts for you! 😂
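To make the `dbm` point concrete: the module does ship with Python's standard library, though strictly it maps bytes keys to bytes values; storing whole dicts is what its companion `shelve` module adds by pickling the values. A minimal sketch of both, assuming a throwaway temp directory and arbitrary file names chosen only for the example:

```python
import dbm
import os
import shelve
import tempfile

# Throwaway directory for the example files (arbitrary choice for this sketch).
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "example_db")
shelf_path = os.path.join(tmpdir, "example_shelf")

# dbm: a persistent, dict-like mapping whose keys and values are stored as bytes.
with dbm.open(db_path, "c") as db:  # "c" creates the database if it doesn't exist
    db["language"] = "Python"  # str keys/values are encoded to bytes on the way in
    print(db["language"])  # prints b'Python'

# shelve: built on dbm, pickles values, so a whole dict can live under one key.
with shelve.open(shelf_path) as shelf:
    shelf["config"] = {"retries": 3, "debug": True}
    print(shelf["config"]["retries"])  # prints 3
```

Which backend `dbm.open` picks (for example `dbm.gnu`, `dbm.ndbm` or the pure-Python `dbm.dumb`) depends on the platform and Python build, so the files on disk can differ between machines even though the mapping interface stays the same.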
sqlite is all you need
I'll stick with "boring" postgres
"new query language like flux"
- naaaaah... NO!
Did he really call it “Readis?”
Yes. I was referring to only half of the database interface. The other half is called Writis.
Keep it simple... CSV? 🤣
Was that an official endorsement of hitting interns with mechanical keyboards??! Watch out, you'll get cancelled with talk like that!
All joking apart, this was very timely and useful information for me. Thanks!
love your boozy demos
Don't use any of these. Just use Postgres
Database education and Making Interns cry... LMAO!
WTF? Who cares what's boring for you guys if they do their work well...
Boring is not a valid argument.