This is how all technical stuff should be taught. Great job!
Louis CK got really good at databases with all his down time...
Kind of sounds like him too!
this is an underappreciated comment
Best. Comment.
Lmao 🤣
Looool
13:50 - He makes a disingenuous argument. If one had created their relational database to actually contain all of those pieces of information mentioned in the salesperson's question, then it would be possible to get the data out. But when he switches to talking about the graph database, he just presupposes that you have nodes that contain all of this information, and that you have complete information on all of your customers. You may have information about some of the customers that have bought toasters. You may have information about some of the customers that are ex-cons. But, in a graph database, you have no guarantee that you have complete information about each customer. More importantly, you do not know if your lack of information is because the customer never bought a toaster, or because you simply don't know about the toaster the customer bought. With a relational database, you can at least indicate which customers you have complete information about and then only consider those in your statistics.
Now, I am a huge fan of graph databases. That's why I'm watching this video. However, it seems almost any time someone tries to compare and contrast two technologies, they make disingenuous comparisons. They will consider a critical factor for one technology and then assume that critical factor is not a big issue with the preferred technology.
You've got to make even comparisons if you are going to have any chance of choosing a technology based on its merit.
His style and his natural understanding of the topic are such an inspiration for me!
I feel like he only barely touched on the actual advantages of graph databases. The queries he showed off can be done in a relational database without any real performance issues. I think what really separates Graph databases from relational databases is their extensibility and how they treat relationships themselves as entirely separate entities. I'm an RDBMS amateur and have no experience with Graph databases so I could be way off, but it sounds like graph databases can be extremely easy to extend beyond their initially-defined schema without really needing much, if any, refactoring. You can just define a new relationship and start using it to link nodes together. With a relational database you'd have to do a whole lot more refactoring, adding or modifying columns, etc. And then there's the direct focus on relationships between nodes. If you're working with highly-interconnected data and the connections themselves have their own attributes beyond just the two nodes they link together, I can see how a graph database could be useful.
Basically it sounds like it's more useful for modeling complex, constantly evolving networks. Like a Social Network for example, but one where you can freely define relationships between you and other people rather than picking from a dropdown or creating explicitly-defined lists of people.
What I want to know is how the data is actually stored and indexed beyond just having nodes and relationships, and how that affects query performance. Take the very first query he showed, for example. A simple "SELECT * FROM questions" in MySQL will just find the "questions" table and return every row it contains. But the equivalent "MATCH (q:Question)" in Graph will... do what, exactly? Does it walk across the entire graph to build a list of question nodes? Are the nodes stored in some kind of other internal data structure that makes it possible to grab an entire category of nodes without needing to peek at every node and connection in the entire database?
One of the better presenters in the world because he has slides that people in the back can read and he actually talks about the content of the slides.
The delivery of this presentation was excellent. Thanks for the insight into graph dbs.
"It uses math that I don't understand, but it works. It's pretty cool." LOL. I agree. I've been using graph DB for a couple years and they are incredible for studying relationships between data. I highly recommend taking a look at the APOC procedures for Neo4j, because you will get a bumload of algorithms. I also can't recommend enough to read about the anti-patterns. Neo4j wrote great material about it. They will be fairly intuitive for you, like don't store giant blobs as node properties. I also totally recommend looking at some of the machine learning stuff for this. It's zany what you can do once you start doing like decision trees and shortest path analysis using APOC procedures.
Excellent presentation: Fast, lively, practical. Will be exploring this technology further.
This talk was interesting and taught me some things about graph DBs...but what it didn't do was point out *any* benefit to them. Every single thing mentioned in the presentation is fairly easy to do with relational databases and SQL. If the benefit is that unknown questions become faster, then if you have a genuinely massive dataset, this could be good (or in fact amazing). Otherwise, it's premature optimization, which is the mother of all anti-patterns, because apparently you could replace graph DBs with just smart decisions when you discover new questions.
Either this wasn't explained well enough or the speaker doesn't know enough about relational databases (hardly uncommon, for some reason a lot of people just can't understand relational DBs).
As a student learning Relational Databases in my information studies degree, I'm so grateful that you introduced me to this new pattern for databases! Gotta stay on top of emerging standards and technologies. Thanks for sharing!
Fair bit of advice: "emerging" technologies like this often emerge from nowhere promising the moon, usually fail to deliver, and disappear just as quick. As a student you should probably focus on what is proven and tested and in-use in the industry.
this was a very good talk that made me consider graph dbs. thanks
What it made me do is to consider looking into alternative _languages_ to query the relational DBs we already have, because clearly SQL is not optimized enough for the common use case.
Additionally it made me wonder if graph DBs have some hidden power that goes beyond what was shown in the presentation, because that was all pretty basic-level SQL.
Loved the talk, loved the humor. Great introduction... I'm hopefully working with a startup soon who's deeply using Neo4j
As other commenters have pointed out, the vast majority of this video spends time on material that is easily handled in relational databases.
OP says "indirect" a lot but as a long time user of relational databases, I hoped to hear more about queries where valid results mean traversing a variable number of "joins."
As a long time user of relational databases, perhaps you've lost touch with how complicated they can be to work with.
Maybe :) But coming from the relational side, I wanted to see problems that *I* would consider extremely difficult in relational.
Agreed, though I've created and utilized data structures in RDBMSs for many years as well.
Maybe then the real point is graph is easier to learn, not technically more powerful in any particular way?
matthew rummler hmm, hadn't thought of it like that... oh, maybe that's what amici was saying
Kind of goes back to the old saying... "if all you have is a hammer, the whole world looks like a nail". As a long time user of relational DBs myself, the question isn't "can I accomplish this using an existing tool", but rather "can this be MORE EASILY accomplished using a different tool (plus how long will it take to learn, will it likely be around tomorrow, etc.)?". We should ALWAYS be looking for more performant ways to achieve tasks, as this wastes less of our own development time, resources, etc. Additionally, when steering clients/companies, we have an obligation to avoid leading them into more and more technical debt, as they bind themselves to legacy architectures that are ill-suited to the growing complexities of modern questions/tasks.
A constraint in a data model is not something that "won't work", as in the example given of the foreign key, but an element of a data model that is there to guarantee that the semantics associated with the data (its socially assigned meaning) is preserved when it's formalized in a logical system to be computerized. That is, a constraint is not a problem but a necessary feature of any data model that wants to preserve semantics, the information carried by data as semantic content.
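As a minimal sketch of that point (hypothetical table and column names, plain standard SQL), a foreign key is exactly such a semantics-preserving rule: the insert below fails not because the database is being obstinate, but because the row would assert a relationship to a customer that does not exist.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY
);
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    -- encodes the meaning "every order belongs to a real customer"
    customer_id INT NOT NULL REFERENCES customers (customer_id)
);
-- rejected: customer 42 does not exist, so the stored data would no longer mean what we claim it means
INSERT INTO orders (order_id, customer_id) VALUES (1, 42);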
Listening to this lecture is so fun😍
The guy is a Great Teacher.
Jesus, at @12:11 it becomes apparent that this guy has never managed a large relational database... I don't disagree that graph databases have a lot of uses, but this isn't one of the cases.
Yep! I agree.
The point is that this is a complex and expensive query, you don't want that.
The request was actually simple; it would be a few joins, but he makes it seem like it's impossible to get that answer. If your DB schema is solid and good you can answer questions you never imagined.
@@ALLCAPS I said expensive, not hard. Bubble sort for example is easy and expensive.
@@ShaoVideoProduction I guess? But not really. It's not expensive, because it's been done 1,000,000 times in databases that are larger than 400 million users and we haven't had any issues. So what gives?
The concept of Graph DBs somehow reminds me of Prolog ^^
This guy is really funny. I got a tech talk and a stand up show in one sitting.
Oh man, advertisers are going to love this.
Who in La La Land is this guy? He is HILARIOUS!!!! What a GREAT presenter!!!!
Not even a minute has passed, and I already like this guy. Looks like it's going to be a pretty good talk!
Hands up if you ever fought deleting rows in MS Access! Great video and an interesting way to challenge the way we are storing data. I am looking at GDPR currently and it seems like sticking private stuff in a single related table will allow a lot of freedom from the GDPR restrictions. I am guessing the Graph system fits the real world better and may not be easily manipulated in the same way.
Last time I did that was around the time my pet dinosaur died
Thanks Ed! Very easy to follow and comprehensive talk about graph db
My key takeaway from the excellent talk is that figuring out indirect relationships is the strength of graph databases...
This would be much better if he compared the queries between different databases, e.g. SQL vs Graph, and then pointed out the advantage of using a Graph database. This mostly seemed like examples of queries but not so much detail on why it's better than anything else already out there for many many years.
13:00 Actually this query can be expressed in SQL because you can join the tables with the condition data to the Person table. It doesn't need recursive indirection links.
The example given about the coupon in Kansas with the criminal record is not as the effin guy says. Just as you need graph data in some fashion to represent the relationships, the same is true of relational DBs. Equally, as you have a graph between a person, a criminal record, an address, purchases and coupons... those things come into existence in similar ways. To suggest that a graph DB allows for a dynamic schema means you do not understand relational DB tools.
As usual, this is a great and god awful presentation at the same time on an interesting tool for a limited purpose. I'd be happy to have this tool sitting on top of a MySQL schema in case I need to get certain type of answers. I've seen people give some good use cases in separate presentations dealing with other subjects.
I do appreciate the criticism that some queries do end up being slow due to too many joins and DBs not being optimized to answer the particular question. Even more noticeable when the DB has lower amounts of RAM allocated to it.
But this issue is easily rectified by one or a collection of Functions/Stored Procedures that break the task up into smaller modules (queries).
A mistake made by most modern developers who don't actually understand the technologies they work with properly, but are all too happy to jump onto the next new thing as long as it sounds cool and confuses potential customers into paying a bit extra.
At 13:25 he said the crucial thing. IF the data somehow is in the system. How could you answer an unanticipated question with any data system if the data is not available. It is not a matter of key constraints or storage structures.
Great talk. I would love to see some simpler examples so I could get more familiar with the query language, but the speaker did an excellent job of getting me hyped on graph dbs (I've been in relationship hell on-and-off the job for my entire life).
Great overview of graph dbs and fun to watch.
12:30 looks easy in SQL '__')
SQL sucks at temporal queries.
Very cool, think this guy should just do every tech talk from now on
But you can represent a graph with a relational database just fine
Node(id,...fields)
Edge(from_node_id,to_node_id,reltype_id)
Relation_type(id,name)
It's just that your nodes can't be dynamic without introducing a bunch of joins like
Node_String_Fields(node_id,value)
Node_Integer_Fields(node_id,value)
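For what it's worth, a rough SQL sketch of that shape (illustrative names only, and with a name column added so a node can carry more than one property of each type) might look like this:
CREATE TABLE Node (
    id BIGINT PRIMARY KEY
);
CREATE TABLE Relation_type (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE Edge (
    from_node_id BIGINT NOT NULL REFERENCES Node (id),
    to_node_id   BIGINT NOT NULL REFERENCES Node (id),
    reltype_id   INT    NOT NULL REFERENCES Relation_type (id)
);
-- "dynamic" node properties via entity-attribute-value tables, one per value type
CREATE TABLE Node_String_Fields (
    node_id BIGINT NOT NULL REFERENCES Node (id),
    name    VARCHAR(100) NOT NULL,
    value   VARCHAR(4000)
);
CREATE TABLE Node_Integer_Fields (
    node_id BIGINT NOT NULL REFERENCES Node (id),
    name    VARCHAR(100) NOT NULL,
    value   BIGINT
);
Every hop then costs a join through Edge, which is exactly the trade-off the replies below are arguing about.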
I guess that makes relational databases also graph databases :)
But I have a hunch they're much slower at joining that many tables and/or at the recursive queries that might be needed.
Put simply, graph databases are optimized for many-to-many relationships, and that's what makes them useful and fast at what they do. For example, give me all the friends of my friends who are not my friends (one of the reasons Facebook started using them): this takes forever in a standard relational DB simply because of design choices. In a graph DB it's optimized to do these kinds of queries fast. Otherwise, yes, all graph DBs are easily implementable in a standard relational database. Also remember most if not all graph DBs are RAM databases; in other words, you have to have your graph loaded in RAM for things to go fast. Right off the bat you see that isn't what a standard relational DB is based around: yes, a relational DB can use a lot of RAM, but it isn't a complete failure without it either. A graph DB will just not perform at all if it can't load the graphs in RAM.
It's a different tool for a different task and a useful one at that.
That's the biggest problem I have with these kinds of lectures. They are always based on the idea that "this new thing will change your life completely and forever so you can just throw away all that old stuff which is bad and disgusting".
In IT especially it is crucial to understand that most tools are good for something, no tool is best for everything, and the challenge comes from finding the right tools for the right job at the right cost. Any Turing-complete language can do any job given enough time (to configure and to run) and resources. But if language A does the development three times faster than language B, and language B does the execution 1.1 times faster than language A when you have X number of requests, but language C does it 10 times faster than either A or B yet takes 5 times as long to develop in as language A, which is the best language for the job? The answer is that it completely depends on the situation and resources you have.
No single technology will ever completely crush an older technology in all possible areas especially when it's little used, little known and relatively new. Still they are usually marketed as such through lectures like this with faulty logic and dubious or even completely misunderstood arguments and examples.
"For example give me all the friends of my friends who are not my friends"
SELECT DISTINCT ff.FriendPersonID
FROM Person me
INNER JOIN PersonFriends mf ON mf.PersonID = me.PersonID
INNER JOIN PersonFriends ff ON ff.PersonID = mf.FriendPersonID
LEFT JOIN PersonFriends nmf ON nmf.PersonID = me.PersonID AND nmf.FriendPersonID = ff.FriendPersonID
WHERE me.PersonName = 'Myself' AND nmf.PersonID IS NULL AND ff.FriendPersonID <> me.PersonID
If the PersonID and FriendPersonID and PersonName fields are indexed, this query results in a single lookup, filter, and fetch.
Not really that hard...
+HMan I'm pretty sure each one of those joins is a separate set of lookups, and that each lookup is logarithmic, not constant (single). So with N Persons, and a "me" with F average friends, each lookup is log(N), and your algo breaks down to:
find me (+ log(N))
find my friends (+ F * log(N))
for each friend, find their friends (+ F^2 * log(N))
filter out my friends (no new lookups, + negligible constant K)
assuming our number of friends' friends is F^2, and assuming the cost of filtering is a negligible constant K, we get a worst-case runtime of
log(N) + F * log(N) + F^2 * log(N) + K = log(N) * (1 + F + F^2)
which isn't terrible, but it could be better. With a graph, only the first lookup is necessary, and everything else can be found through the nodes' edges (in _truly_ constant runtime, because nodes are not relative offsets in an index but absolute offsets in memory). There are notably fewer lookups this way:
find me (+ log(N))
find my friends (+ F)
for each friend, find their friends (+ F^2)
filter out my friends (+ negligible constant K)
which gives a worst-case runtime of
log(N) + F + F^2 + K = log(N) + F + F^2
Which is quite a bit better when working with extremely large datasets.
At least that's my understanding of it, I hope I got that right.
Incredible talk and dude!
What was that? 4:09
🤣
Relational databases are named after the mathematical concept of relations. It's not about table relationships.
Interesting - realistically it seems like a graph database is just a relational database that maintains its own join tables though.
Wish all presenters were this good!
Long time ago I used to program in Prolog. It would do this kind of stuff but so much more elegantly -- if you can get your head around recursion.
If I was in charge, I'd make anyone planning on using a graph database learn Prolog first, and then see if they still needed a graph database.
It might be a good idea to extend graphs to hypergraphs - that is, edges that are sets of nodes, not just binary relations.
If you can't do this in SQL, then you're not very good at it. But graph databases look like they're easier to learn; will check it out
My issue with these kinds of databases is the overhead of the JSON format. It is an improvement over XML for sure, but it is not good for large result sets: you need to repeat your "schema" in every data record, so it will perform poorly if you have a huge dataset as a result, and parsing it out of JSON has a cost. Today that is very optimized, but it is still a cost. I need a binary version of this, I suppose, but the lack of a schema makes it hard to then parse it out. It's complex. 😕
really interesting, providing me with lots of ideas for data manipulation in work :)
Nearly all the examples he showed were easily doable in any relational database. I'll give a little leniency on the last one... But it's not like what he showed was any less complex than what a `join` statement would look like. He essentially replaced `JOIN` with `MATCH`.
Then speed? There's a pretty damning whitepaper out now that shows how relational DBs perform vastly better in nearly 95% of all the cases you'd need a DB for.
Bottom line is don't get caught up in superficial hype trains, and start evaluating actual, realistic use cases for a graph. These examples are not it.
Liked as soon as I heard the first sentence
This is a brilliant explanation.
[15:42] The Open Source Mental Illness Neo4j database is at:
github.com/OSMIHelp/osmi-survey-graph
great talk on an interesting topic, something for further research for sure, thank you!
Ok so if I’m understanding correctly:
Graph db for when you want to access indirect relationships and not be limited by a schema
Document db for when you don't want to be limited by a schema, but the data being accessed is usually all in the same document (?)
Relational db when you want extreme structure and you want that schema as a safety net, and you're not looking to access indirect relationships
Is that correct?
I know that this is fairly certainly naivete (from someone who's largely new to practical programming, and also definitely on the opposite side of the theoretical/practical scale), but everything I've learned about relational databases leaves me thinking of them as, well, just objects, with a few (admittedly very nice) tie-ins to keep track of all the props, and types of props. It seems a little silly that almost nobody was doing this before a couple of years ago. What am I missing? Do you need processors of a sufficient strength or flexibility to make the payoff worth it? Is the math knottier than I thought? Am I just being unspeakably naive about how entrenched SQL was, and how hard it would have been to switch over?
WHAT exactly is the payoff is the question? To me all of this was just flavor of the month snowflake idea from someone who doesn't understand why relational databases are so widely used, even though Graph databases as a concept already existed in the 60's. I don't see the payoff at all.
What compels people to dig all the way down into the YouTube comments, find a genuine question, and respond with contentless curmudgeonry? The world may never know!
It's 6 years later - I just checked to see if comments reflected all these changed lives...
Amazing speaker and topic!
1 minute in and I’m already cracking up.
A relational DB schema can also be visualised as a graph. This is interesting technology, but I think the presenter just doesn't understand relational DBs and is dismissing the technology. A sane relational DB can also answer arbitrary questions with relational algebra, which can be translated into SQL.
See graph DBs as optimized many-to-many relationships. They are very fast at "joining tables", which is their main purpose. They also need a lot of RAM to load the entire graph, and they use preloaded graphs to make sure it's always fast. If you always run a graph DB from disk it's going to be a nightmare.
As a graph with fat arrows which go between table and table maybe, but not as a graph with lots of granular arrows for each dataset/document.
Thank you for this very useful video!
I always liked node-based databases. That's what I call them. Nice talk!
Wtf happened at 4:10
I must be missing something, anyone who is competent in SQL will be able to design a database in the given scenario and create those queries using an, imo, much simpler syntax as well?
You missed the point. His talk was a bridge.
Callum Vass can relate
12:10 Something like this should work, but criminal records is probably an API call with select user info:
select count(cr.uid) from criminal_records cr
where cr.uid in (
    select p.uid
    from (select distinct uid from purchases where itemtype='toaster') p
    join (select distinct uid from users where city='Kansas') u on u.uid = p.uid
    join (select distinct uid from redeemed_coupons where publication_date > current_date - 1) c on c.uid = p.uid
)
Great introduction. Many thanks
A row, an entity, and (not mentioned) a tuple are not always the same thing in relational databases. Writing queries in databases is easy... learn the difference between your joins.
Hello, may I have your permission to move this video to bilibili and cite the source of the Coding Tech YouTube channel, to share it with more people in China~? It's a great video; sad that most of us have no VPN access to visit YouTube...
I'm not convinced. The queries shown in this talk are all beginner- or at most intermediate-level SQL.
In fact, I could easily write a transpiler from the subset of ASCII-art language used in the presentation into SQL, assuming you are only joining on foreign key constraints, which of course SQL doesn't restrict you to doing.
So either the power of graph databases lies elsewhere, in some advanced features that were not shown, or this thing is just another poor man's schema-less store with a fancy query language.
For example, off the top of my head: can you traverse a variable number of joins/edges, depending on the data you find? Can you traverse a variable _type_ of join/edge, depending on the data you find? Can you express a recursive definition of joins/edges, again depending on the data? What about a set of mutually recursive definitions of joins? Are such things performant? And so on.
Mind you, good relational DBs allow you to do most of that, but they don't necessarily make it easy or performant, so I could see the use of something more powerful.
Which this presentation did not show.
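For the record, the relational answer to the variable-depth part of that question is a recursive CTE. A minimal sketch, assuming a hypothetical edges(src_id, dst_id) table and capping the walk at 5 hops out from node 42:
WITH RECURSIVE reachable (node_id, depth) AS (
    SELECT dst_id, 1 FROM edges WHERE src_id = 42   -- start from node 42
    UNION                                           -- UNION (not UNION ALL) drops duplicate rows, which also helps tame cycles
    SELECT e.dst_id, r.depth + 1
    FROM reachable r
    JOIN edges e ON e.src_id = r.node_id
    WHERE r.depth < 5                               -- depth cap keeps the recursion finite
)
SELECT DISTINCT node_id FROM reachable;
This runs on engines that support WITH RECURSIVE (Postgres, SQLite, MySQL 8+), but it is exactly the kind of thing the comment above calls possible without being easy or performant.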
Excellent talk about graph databases
It's funny seeing all the relational DB people saying "I don't see the point, yeah it's easier and faster, but who cares, I've been tracking banana shipments to market data to police records for port authorities across 10 different countries for 25 years, your database sucks if you can't!"
I think what we are actually saying is "I don't see HOW this is any faster OR easier, or what tangible advantage it has over the mature, tested and true technology I am already using. Why should I gamble customer data on this solution?"
I just think it's absolutely _ridiculous_ that we can't make anything new in software without swarming behavior trying to bite it to death like an intruder into a hive. If this isn't appealing to you, if you find SQL easier, if you're worried this isn't as secure or fast, good. Go write SQL. Otherwise I'm pretty sure I see the same old job security dance belittling anyone that finds their technology obtuse. Competition isn't bad.
That's not what is happening here. I welcome new concepts, IF they provide a worthwhile improvement in some way. I just don't see what this particular concept brings to the table. To be competitive there needs to be a competitive advantage, somehow. If you told me, for example, that this approach is better because it lends itself better to neural net processing, or that it improves efficiency, or that it scales better, then I'm interested; you've got my attention. All I see in this presentation is a different way of doing something we already do, only not as well, and for the sake of being different.
Fantastic introduction to Graph Databases, very engaging speaker.
Graph databases messed up my life. I vouched for the technology at my company. The problem is that in theory it's all good but the technology is so new that all the graph databases out there have so many issues that make them unusable in production. Stay away from Neo4J and Orientdb!
I hope those issues would be resolved by now and please write back if you think they are good to be used in production now
@@vishaljotshi6869 it's been 4 years since I put my reputation on the line for this technology. It's probably much better now. I still would not recommend graph unless you want to try something new. They are fun to play with but not so much fun when things fall apart in prod.
@@LordBadenRulez What kind of problems did you have in production? There are so many possibilities and use cases; maybe the problem was the focus, or the model wasn't designed for a concrete problem.
Great video, but the comments about relational databases being rigid and needing to know all the query requirements up front is completely the opposite of reality. One of the main advantages of a normalized relational database is the query flexibility. The joins might be ugly, but you can create any query you'd like against it. NoSQL key value stores on the other hand, do require that you know all your access patterns up front.
Can't wait to try a graph database out one day!
It's very important to keep in mind that the slowness of currently available SQL databases (none of them are actual implementations of the relational data model; please refer to at least Fabian Pascal, C.J. Date and David McGoveran) has nothing to do with features of the relational data model itself. How is a graph stored in a computer? A computer does not store nodes and edges; these are abstractions, the same way a relation or other things like tables are abstractions. All these abstractions can be represented in computer memory in various physical ways. As I mentioned before, a table abstraction or also a matrix abstraction can represent a graph.
SQL is not a synonym for the relational data model, which is an application of first-order predicate logic. Ted Codd thought about using second-order logic, but as the person in the video mentioned, simplicity is to be preferred. And again, a data model without constraints is not a data model but simply data. A database might be backed by a data model or not. Most databases unfortunately are not supported by proper data models as formal systems. A key-value database has no data model, only a syntactical abstraction for representing data. It makes no guarantees about the semantic consistency of your data, therefore you cannot safely rely 100% of the time on the inferences you derive from such databases.
A graph is a mathematical construct. A graph data model is not the same as a graph. And a graph database management system should not store graphs but a graph data model.
8:36 just dots and lines? No, it's just turtles on turtles, all the way down!
How is logic programming (e.g., Prolog) related to graph data bases?
learn much more about graph database. thanks a lot.
Relational databases are not about relationships between tables. He is confusing terminology.
True. The "Relational" in Relational Database refers to mathematical relations, i.e., tuples.
He told you: "I didn't do well in computer science"
In practical application in Production environments, the business world, you couldn’t be more wrong.
@@aledmb Then why is he making a case against traditional DBs? Lol
This is a moot point, considering the expression of that relation is a table, connected to other tables.
That might have been the rationale for the mathematical theory, but in application it's a table. Just seems silly to make much ado about nothing...
damn I love this guy, great talk about graph databases
You just helped me a lot Man, Thanks for your enlightening talk. Cheers! 🥂
To be honest, you should've brought an example that is more complex to query with SQL, because relational databases are MADE to answer a question like the one at 12:30 with ease.
Had interview where I was asked if I knew what graph DBS are. Now I know that the answer is: good for many data.
Are graph databases relevant in 2018? Is there a way I can find out how in-demand knowledge of graph databases is in industry?
Since this is over a year old, judging by the name of the talk, I was curious as to whether the permissions for reupload take time, or is it just that you get to watch it late?
Hi Ethan9750. I usually republish new content but sometimes I add a bit older videos just because they are freakin' good :)
I watched this to learn about graph database. 13 1/2 minutes in and I have heard a lot of disdain for relational databases. I don't have any of those issues he talked about and I have had people ask for crazy things.
The example is a bit confusing; we don't have the whole picture!
Awesome Lecture! Well done....
Thanks for the video.
Ah, he lost me when he talked about not being able to answer specific questions in SQL DBs. If the information is not there, it's simply impossible regardless of your DB type. If it is there and you can't get it out of a SQL DB structure you built yourself within a day at MAXIMUM, then you probably have not used it that much (and nothing wrong with that, you could be an expert in other things).
A relational data model has relations (not tables, rows and columns), constraints, and operations applied to relations. The way data is physically stored is irrelevant for the model, since any data model is a formal system that's based on logic and can be physically implemented in many ways, assuming the implementation respects the logic of the defined formal system.
2:45 quite amusing, entertaining, and all, but that is not all that a constraint in a relational DB is about. For starters, RDBs follow normalization concepts while noSQL DBs do not. Actually, noSQL does not mean no sequel. It is actually common to see sanity checks fail on ghost rows that were once part of a "thing" in the schema, but the "thing" got deleted and the related row, having no constraint, was not. It is particularly common with nested objects. Maybe that is why FB migrated almost the entire persistence layer from Cassandra to Postgres. Quite sequeled insane, one would say. Now, that is a thing.
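A tiny sketch of that ghost-row point (hypothetical names, standard SQL): declare the foreign key with ON DELETE CASCADE and deleting the "thing" takes its dependent rows with it instead of leaving them orphaned.
CREATE TABLE things (
    thing_id INT PRIMARY KEY
);
CREATE TABLE thing_details (
    detail_id INT PRIMARY KEY,
    thing_id  INT NOT NULL REFERENCES things (thing_id) ON DELETE CASCADE
);
-- removes the parent and its detail rows together; without the constraint the details would linger as "ghost" rows
DELETE FROM things WHERE thing_id = 7;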
The speaker is awesome! and fun!
This would work well with the Trivium method (which is a way of thinking critically)
That guy is a friggin genius.
I wasnt expecting Gilfoyle to give a presentation. (Silicon Valley reference)
Somebody telll this guy that Comedy Central is hiring!!!
So the graph query language is just optimized in performance and comfort for a different use case? Okay. But SQL isn't the same as relational.
The Mark Rippetoe of IT
I've made too many comments, just to say I do have a genuine interest in graph data models. I just feel it's unnecessary to rely on misunderstandings of what the relational data model is, like equating it with SQL databases, to make the case for graph data modelling. If the video said "SQL database" every time it mentions the relational data model, it would be much more precise.
How can I weight the relationships here? I would need an algorithm to weight them (and also some kind of activation equation for the nodes) to use it more like a NN.
I've used 'personal brain' since the late 90s. A file manager based on a graph. But with ADHD I messed it up frequently.
You can add properties to edges in Neo4J, and that could store your weights.
This is really nice.
Hire PHP Developers no I won't
what IDE are you using to write the queries?
It's amazing to me how the relational model still dominates nearly 20 years after Zope/ZODB made its splash.
Maybe GDBs don't have as many use cases as they say.