13:50 - He makes a disingenuous argument. If one had created their relational database to actually contain all of those pieces of information mentioned in the salesperson's question, then it would be possible to get the data out. But when he switches to talking about the graph database, he just presupposes that you have nodes that contain all of this information, and that you have complete information on all of your customers. You may have information about some of the customers that have bought toasters. You may have information about some of the customers that are ex-cons. But, in a graph database, you have no guarantee that you have complete information about each customer. More importantly, you do not know if your lack of information is because the customer never bought a toaster, or because you simply don't know about the toaster the customer bought. With a relational database, you can at least indicate which customers you have complete information about and then only consider those in your statistics. Now, I am a huge fan of graph databases. That's why I'm watching this video. However, almost any time someone tries to compare and contrast two technologies, they seem to make disingenuous comparisons. They will consider a critical factor for one technology and then assume that the same critical factor is not a big issue with the preferred technology. You have to make even comparisons if you are going to have any chance of choosing a technology based on its merit.
I feel like he only barely touched on the actual advantages of graph databases. The queries he showed off can be done in a relational database without any real performance issues. I think what really separates Graph databases from relational databases is their extensibility and how they treat relationships themselves as entirely separate entities. I'm an RDBMS amateur and have no experience with Graph databases so I could be way off, but it sounds like graph databases can be extremely easy to extend beyond their initially-defined schema without really needing much, if any, refactoring. You can just define a new relationship and start using it to link nodes together. With a relational database you'd have to do a whole lot more refactoring, adding or modifying columns, etc. And then there's the direct focus on relationships between nodes. If you're working with highly-interconnected data and the connections themselves have their own attributes beyond just the two nodes they link together, I can see how a graph database could be useful. Basically it sounds like it's more useful for modeling complex, constantly evolving networks. Like a Social Network for example, but one where you can freely define relationships between you and other people rather than picking from a dropdown or creating explicitly-defined lists of people. What I want to know is how the data is actually stored and indexed beyond just having nodes and relationships, and how that affects query performance. Take the very first query he showed, for example. A simple "SELECT * FROM questions" in MySQL will just find the "questions" table and return every row it contains. But the equivalent "MATCH (q:Question)" in Graph will... do what, exactly? Does it walk across the entire graph to build a list of question nodes? 
Are the nodes stored in some kind of other internal data structure that makes it possible to grab an entire category of nodes without needing to peek at every node and connection in the entire database?
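One way to picture a possible answer to the question above is a label index: a side structure that maps each label to the ids of the nodes carrying it, so a label-only match never has to visit unrelated nodes or relationships. The sketch below is a toy in-memory model under that assumption, not a description of how Neo4j or any other engine actually stores data; `TinyGraph`, `add_node`, and `match_label` are hypothetical names.

```python
from collections import defaultdict


class TinyGraph:
    """Toy property graph with a label index (illustrative only)."""

    def __init__(self):
        self.nodes = {}                      # node id -> (label, properties)
        self.label_index = defaultdict(set)  # label -> ids of nodes with that label

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)
        self.label_index[label].add(node_id)

    def match_label(self, label):
        # Reads only the index entry for this label; other nodes are never visited.
        return sorted(self.label_index.get(label, set()))


g = TinyGraph()
g.add_node(1, "Question", title="What is a graph database?")
g.add_node(2, "User", name="alice")
g.add_node(3, "Question", title="How are labels indexed?")
question_ids = g.match_label("Question")
print(question_ids)
```

With a structure like this, the cost of a label-only match depends on how many nodes have the label rather than on the size of the whole graph, which is roughly what a table scan gives you in the relational case.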
One of the better presenters in the world because he has slides that people in the back can read and he actually talks about the content of the slides.
"It uses math that I don't understand, but it works. It's pretty cool." LOL. I agree. I've been using graph DB for a couple years and they are incredible for studying relationships between data. I highly recommend taking a look at the APOC procedures for Neo4j, because you will get a bumload of algorithms. I also can't recommend enough to read about the anti-patterns. Neo4j wrote great material about it. They will be fairly intuitive for you, like don't store giant blobs as node properties. I also totally recommend looking at some of the machine learning stuff for this. It's zany what you can do once you start doing like decision trees and shortest path analysis using APOC procedures.
As a student learning Relational Databases in my information studies degree, I’m so grateful that you introduced me to this new pattern for databases! Gotta stay on top of emerging standards and technologies. Thanks for sharing!
Fair bit of advice: "emerging" technologies like this often emerge from nowhere promising the moon, usually fail to deliver, and disappear just as quick. As a student you should probably focus on what is proven and tested and in-use in the industry.
As other commenters have pointed out, the vast majority of this video spends time on material that is easily handled in relational databases. OP says "indirect" a lot but as a long time user of relational databases, I hoped to hear more about queries where valid results mean traversing a variable number of "joins."
Agreed, though I've created and utilized data structures in RDBMSs for many years as well. Maybe then the real point is graph is easier to learn, not technically more powerful in any particular way?
Kind of goes back to the old saying... "if all you have is a hammer, the whole world looks like a nail". As a long time user of relational dbs myself, the question isn't "can I accomplish this using an existing tool", but rather "can this be MORE EASILY accomplished using a different tool (plus how long will it take to learn, will it likely be around tomorrow, etc)?". We should ALWAYS be looking for more performant ways to achieve tasks, as this wastes less of our own development time, resources, etc. Additionally, when steering clients/companies, we have an obligation to avoid leading them into more and more technical debt, as they bind themselves to legacy architectures that are ill-suited to the growing complexities of modern questions/tasks.
What it made me do is consider looking into alternative _languages_ to query the relational DBs we already have, because clearly SQL is not optimized enough for the common use case. Additionally, it made me wonder if graph DBs have some hidden power that goes beyond what was shown in the presentation, because that was all pretty basic level SQL.
This talk was interesting and taught me some things about graph DBs...but what it didn't do was point out *any* benefit to them. Every single thing mentioned in presentation is fairly easy to do with relational databases and SQL. If the benefit is that unknown questions become faster, then if you have a genuinely massive dataset, this could be good (or in fact amazing). Otherwise, it's pre-optimization, which is the mother of all anti-patterns. Because you could replace graph DBs apparently with just smart decision when you discover new questions. Either this wasn't explained well enough or the speaker doesn't know enough about relational databases (hardly uncommon, for some reason a lot of people just can't understand relational DBs).
A constraint in a data model is not something that "won't work", as in the example given of the foreign key, but an element of a data model that is there to guarantee that the semantics associated with the data (its socially assigned meaning) is preserved when it's formalized in a logical system to be computerized. That is, a constraint is not a problem but a necessary feature of any data model that wants to preserve semantics, the information carried by data as semantic content.
Jesus, at @12:11 it becomes apparent that this guy has never managed a large relational database... I don't disagree that graph databases have a lot of uses, but this isn't one of the cases.
The request was actually simple. It would be a few joins, but he makes it seem like it's impossible to get that answer. If your DB schema is solid and good, you can answer questions you never imagined.
@@ShaoVideoProduction I guess? but not really. It's not expensive because it's been done 1,000,000 times in databases that are larger than 400million users and we haven't had any issues. So what gives?
Hands up if you ever fought deleting rows in MS Access! Great video and an interesting way to challenge the way we are storing data. I am looking at GDPR currently, and it seems like sticking private stuff in a single related table will allow a lot of freedom from the GDPR restrictions. I am guessing the Graph system fits the real world better and may not be easily manipulated in the same way.
This would be much better if he compared the queries between different databases, e.g. SQL vs Graph, and then pointed out the point of using a Graph database. This mostly seemed like examples of queries but not so much details on why it's better than anything else already out there for many many years.
A relational DB schema can also be visualised as a graph. This is interesting technology but I think the presenter just doesn't understand relational DB dismissing the technology. A sane relational DB can also answer arbitrary questions with relational algebra which can be translated into SQL.
See graph DBs as optimized many-to-many relationships. They are very fast at "joining tables", which is their main purpose. They also need a lot of RAM to load the entire graph, and they use preloaded graphs to make sure it's always fast. If you always run a graph DB from disk, it's going to be a nightmare.
Great talk. I would love to see some simpler examples so I could get more familiar with the query language, but the speaker did an excellent job of getting me hyped on graph dbs (I've been in relationship hell on-and-off the job for my entire life).
At 13:25 he said the crucial thing. IF the data somehow is in the system. How could you answer an unanticipated question with any data system if the data is not available. It is not a matter of key constraints or storage structures.
13:00 Actually this query can be expressed in SQL because you can join the tables with the condition data to the Person table. It doesn't need recursive indirection links.
The example given about the coupon in kansas with the criminal record is not as the effin guy says. Just as you need graph data in some fashion to represent the relationships the same is true of relational dbs. equally as you have a graph between a person, a criminal record, address, purchases and coupons... those things come into existence in similar ways. To suggest that a graph db allows for a dynamic schema means you do not understand relational DB tools
But you can represent a graph with a relational database just fine: Node(id,...fields) Edge(from_node_id,to_node_id,reltype_id) Relation_type(id,name). It's just that your nodes can't be dynamic without introducing a bunch of joins like Node_String_Fields(node_id,value) Node_Integer_Fields(node_id,value)
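The table layout described above can be tried out directly. This is a minimal sketch using Python's built-in `sqlite3`; the table names follow the comment, while the `name` column on `node`, the extra `field` column on `node_string_fields` (so more than one dynamic attribute can be told apart), and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relation_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE edge (
    from_node_id INTEGER NOT NULL REFERENCES node(id),
    to_node_id   INTEGER NOT NULL REFERENCES node(id),
    reltype_id   INTEGER NOT NULL REFERENCES relation_type(id)
);
-- One side table per value type holds the "dynamic" fields.
CREATE TABLE node_string_fields (
    node_id INTEGER NOT NULL REFERENCES node(id),
    field   TEXT NOT NULL,
    value   TEXT
);
""")
conn.executemany("INSERT INTO node VALUES (?, ?)", [(1, "alice"), (2, "toaster")])
conn.execute("INSERT INTO relation_type VALUES (1, 'BOUGHT')")
conn.execute("INSERT INTO edge VALUES (1, 2, 1)")
conn.execute("INSERT INTO node_string_fields VALUES (2, 'color', 'red')")

# Reading one edge with one dynamic attribute already takes four joins.
rows = conn.execute("""
    SELECT src.name, rt.name, dst.name, f.value
    FROM edge e
    JOIN node src ON src.id = e.from_node_id
    JOIN node dst ON dst.id = e.to_node_id
    JOIN relation_type rt ON rt.id = e.reltype_id
    LEFT JOIN node_string_fields f ON f.node_id = dst.id AND f.field = 'color'
""").fetchall()
print(rows)
```

The join count in the final query is the trade-off the comment points at: the model works, but every dynamic attribute you want back adds another join.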
I guess that makes relational databases also graph databases :) But I have a hunch they're much slower at joining that many tables and/or possibly needed recursive queries.
Put simply, graph databases are optimized many-to-many relationships, and that's what makes them useful and fast at what they do. For example, "give me all the friends of my friends who are not my friends" (one of the reasons Facebook started using them) takes forever in a standard relational DB simply because of design choices. A graph DB is optimized to do these kinds of queries fast. Otherwise, yes, all graph DBs are easily implementable in a standard relational database. Also remember that most if not all graph DBs are RAM databases; in other words, you have to have your graph loaded in RAM for things to go fast. Right off the bat you can see that isn't what a standard relational DB is based around. Yes, a relational DB can use a lot of RAM, but it isn't a complete failure without it either. A graph DB will just not perform at all if it can't load the graphs in RAM. It's a different tool for a different task, and a useful one at that.
That's the biggest problem I have with these kinds of lectures. They are always based on the idea that "this new thing will change your life completely and forever so you can just throw away all that old stuff which is bad and disgusting". In IT especially it is crucial to understand that most tools are good for something and no tool is best for everything and the challenge comes from finding the right tools, for the right job with the right cost. Any Turing complete language can do any job given enough time (to configure and to run) and resources, but if language A does the development three times faster than language B and language B does the execution 1.1 times faster than language A when you have X number of requests but language C does it 10 faster than either language A or B but takes 5 times as long to develop than language A, which is the best language for the job? The answer is that it completely depends on the situation and resources you have. No single technology will ever completely crush an older technology in all possible areas especially when it's little used, little known and relatively new. Still they are usually marketed as such through lectures like this with faulty logic and dubious or even completely misunderstood arguments and examples.
"For example give me all the friends of my friends who are not my friends" SELECT DISTINCT ff.FriendPersonID FROM Person me INNER JOIN PersonFriends mf ON mf.PersonID = me.PersonID INNER JOIN PersonFriends ff ON ff.PersonID = mf.FriendPersonID LEFT JOIN PersonFriends nmf ON nmf.PersonID = me.PersonID AND nmf.FriendPersonID = ff.FriendPersonID WHERE me.PersonName = 'Myself' AND nmf.PersonID IS NULL AND ff.FriendPersonID <> me.PersonID If the PersonID and FriendPersonID and PersonName fields are indexed, this query results in a single lookup, filter, and fetch. Not really that hard...
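A friends-of-friends query along these lines can be checked against a tiny dataset with Python's built-in `sqlite3`. This is only a sketch: the table and column names come from the comment, the sample people and the index name `idx_pf` are made up, and it assumes friendships are stored in both directions. The version here returns `ff.FriendPersonID` (the friend's friend) and anti-joins against my own friendships, which is what the stated question needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE PersonFriends (PersonID INTEGER, FriendPersonID INTEGER);
CREATE INDEX idx_pf ON PersonFriends (PersonID, FriendPersonID);
INSERT INTO Person VALUES (1, 'Myself'), (2, 'Ann'), (3, 'Bob'), (4, 'Cy');
-- Friendships are stored in both directions (an assumption of this sketch).
INSERT INTO PersonFriends VALUES
    (1, 2), (2, 1), (2, 3), (3, 2), (2, 4), (4, 2), (1, 4), (4, 1);
""")

rows = conn.execute("""
    SELECT DISTINCT ff.FriendPersonID
    FROM Person me
    INNER JOIN PersonFriends mf ON mf.PersonID = me.PersonID
    INNER JOIN PersonFriends ff ON ff.PersonID = mf.FriendPersonID
    LEFT JOIN PersonFriends nmf ON nmf.PersonID = me.PersonID
                               AND nmf.FriendPersonID = ff.FriendPersonID
    WHERE me.PersonName = 'Myself'
      AND nmf.PersonID IS NULL              -- not already my friend
      AND ff.FriendPersonID <> me.PersonID  -- and not me
    ORDER BY ff.FriendPersonID
""").fetchall()
friends_of_friends = [pid for (pid,) in rows]
print(friends_of_friends)
```

Here "Myself" is friends with Ann and Cy, and only Bob is reachable through a friend without already being a friend, so Bob's id is the single result.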
+HMan I'm pretty sure each one of those joins is a separate set of lookups, and that each lookup is logarithmic, not constant (single). So with N Persons, and a "me" with F average friends, each lookup is log(N), and your algo breaks down to:
find me (+ log(N))
find my friends (+ F * log(N))
for each friend, find their friends (+ F^2 * log(N))
filter out my friends (no new lookups, + negligible constant K)
assuming our number of friends' friends is F^2, and assuming the cost of filtering is negligible constant K, we get a worstcase runtime of
log(N) + F * log(N) + F^2 * log(N) + K = log(N) * (1 + F + F^2)
which isn't terrible, but it could be better. with a graph, only the first lookup is necessary, and everything else can be found through the nodes' edges (in _truly_ constant runtime, because nodes are not relative offsets in an index but absolute offsets in memory). there are notably less lookups this way:
find me (+ log(N))
find my friends (+ F)
for each friend, find their friends (+ F^2)
filter out my friends (+ negligible constant K)
which gives a worstcase runtime of
log(N) + F + F^2 + K = log(N) + F + F^2
Which is quite a bit better when working with extremely large datasets. At least that's my understanding of it, I hope i got that right.
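To put numbers on the two formulas above, here is a small sketch that evaluates them as written. It only restates the commenter's cost model; the function names, the choice of base-2 logarithm, and dropping the negligible constant K are assumptions, and real query planners and storage engines will not behave this simply.

```python
import math


def relational_cost(n, f):
    # Every step is an index lookup costing log(N): 1 + F + F^2 lookups in total.
    return math.log2(n) * (1 + f + f * f)


def graph_cost(n, f):
    # Only the initial "find me" uses the index; the rest follows edges directly.
    return math.log2(n) + f + f * f


n, f = 2 ** 20, 10  # about a million people, ten friends each
print(relational_cost(n, f), graph_cost(n, f))
```

For N = 2^20 and F = 10 the model gives 2220 units for the indexed-join plan against 130 for the edge-following plan, so under these assumptions the gap is roughly the log(N) factor.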
I'm not convinced. The queries shown in this talk are all beginner- or at most intermediate-level SQL. In fact, I could easily write a transpiler from the subset of ASCII-art language used in the presentation into SQL, assuming you are only joining on foreign key constraints-which of course SQL doesn't restrict you into doing. So either the power of graph databases lies elsewhere, in some advanced features that were not shown, or this thing is just another poor man's schema-less store with a fancy query language. For example, off the top of my head: can you traverse a variable number of joins/edges, depending on the data you find? Can you traverse a variable _type_ of join/edge, depending on the data you find? Can you express a recursive definition of joins/edges, again depending on the data? What about a set of mutually recursive definitions of joins? Are such things performant? And so on. Mind you, good relational DBs allow you to do most of that, but they don't necessarily make it easy or performant, so I could see the use of something more powerful. Which this presentation did not show.
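For the "variable number of joins" case raised above, the usual relational answer is a recursive common table expression. The sketch below runs one in SQLite through Python's `sqlite3`; the `follows` table and its rows are made up for illustration. Cypher expresses the same kind of traversal with a variable-length pattern such as `(a)-[:FOLLOWS*]->(b)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (src TEXT NOT NULL, dst TEXT NOT NULL);
INSERT INTO follows VALUES ('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a');
""")

# Walk outward from 'a'. UNION (not UNION ALL) drops rows already seen,
# which is what lets the recursion stop on the a -> b -> c -> d -> a cycle.
rows = conn.execute("""
    WITH RECURSIVE reachable(name) AS (
        SELECT 'a'
        UNION
        SELECT f.dst FROM follows f JOIN reachable r ON f.src = r.name
    )
    SELECT name FROM reachable ORDER BY name
""").fetchall()
reachable = [name for (name,) in rows]
print(reachable)
```

This shows that the traversal is expressible in SQL; whether it is as convenient or as fast as a native graph traversal on large data is the open question the comment is asking.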
Ok so if I’m understanding correctly: Graph db for when you want to access indirect relationships and not be limited by a schema. Document db for when you don’t want to be limited by a schema, but most data being accessed is usually all in the same document (?). Relational db when you want extreme structure, you want that schema as a safety net, and you’re not looking to access indirect relationships. Is that correct?
This is a moot point, considering the expression of that relation is a table, connected to other tables. That might have been the rationale for mathematical theory, but in application it's a table. Just seems silly to make much ado about nothing...
My issue with these kinds of databases is the overhead of the JSON format. It is an improvement over XML for sure, but it is not good for large result sets: you need to repeat your "schema" in every data record, so it will perform poorly if you have a huge dataset as a result. Parsing it out of JSON also has a cost; although that is very optimized today, it is still a cost. I suppose I need a binary version of this, but the lack of a schema makes it hard to then parse it out. It's complex. 😕
12:10 Something like this should work, but criminal records is probably an API call with select user info: select count(distinct uid) from criminal_records where uid in (select p.uid from (select distinct uid from purchases where itemtype='toaster') p join (select distinct uid from users where city='Kansas') u on u.uid = p.uid join (select distinct uid from redeemed_coupons where publication_date > current_date - 1) c on c.uid = p.uid)
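The toaster question can be run end to end on a toy dataset. This is a sketch, assuming all four facts live in local tables with the names used in the comment; the sample rows are invented, plain joins stand in for the nested subqueries, and a fixed reference date (`today`) replaces `current_date` so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE purchases (uid INTEGER, itemtype TEXT);
CREATE TABLE redeemed_coupons (uid INTEGER, publication_date TEXT);
CREATE TABLE criminal_records (uid INTEGER);
INSERT INTO users VALUES (1, 'Kansas'), (2, 'Kansas'), (3, 'Boston');
INSERT INTO purchases VALUES (1, 'toaster'), (2, 'kettle'), (3, 'toaster');
INSERT INTO redeemed_coupons VALUES (1, '2024-01-10'), (3, '2024-01-10');
INSERT INTO criminal_records VALUES (1), (3);
""")

today = "2024-01-10"  # fixed stand-in for current_date (assumption)
(count,) = conn.execute("""
    SELECT COUNT(DISTINCT cr.uid)
    FROM criminal_records cr
    JOIN users u ON u.uid = cr.uid AND u.city = 'Kansas'
    JOIN purchases p ON p.uid = cr.uid AND p.itemtype = 'toaster'
    JOIN redeemed_coupons c ON c.uid = cr.uid
                           AND c.publication_date > date(?, '-1 day')
""", (today,)).fetchone()
print(count)
```

Only user 1 satisfies all four conditions here: user 2 bought a kettle and user 3 lives elsewhere.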
I must be missing something, anyone who is competent in SQL will be able to design a database in the given scenario and create those queries using an, imo, much simpler syntax as well?
Nearly all the examples he showed were easily doable in any relational database. I'll give a little leniency on the last one... But it's not like what he showed was any less complex than what a `join` statement would look like. He essentially replaced `JOIN` with `MATCH`..... Then speed? There's a pretty damning whitepaper out now that shows how relational DBs perform vastly better in nearly 95% of all the cases you'd need a DB for. Bottom line is don't get caught up in superficial hype trains, and start evaluating actual, realistic use cases for a Graph. These examples are not it.
A relational data model has relations (not tables, rows, and columns), constraints, and operations applied to relations. The way data is physically stored is irrelevant to the model, since any data model is a formal system that is based on logic and can be physically implemented in many ways, assuming the implementation respects the logic of the defined formal system.
It's very important to have in mind that the slowness of currently available SQL databases (none of them are actual implementations of the relational data model; please refer to at least Fabian Pascal, C. J. Date, and David McGoveran) has nothing to do with features of the relational data model itself. How is a graph stored in a computer? A computer does not store nodes and edges. These are abstractions, the same way a relation or other things like tables are abstractions. All these abstractions can be represented in computer memory in various physical ways. As I mentioned before, a table abstraction or also a matrix abstraction can represent a graph. SQL is not a synonym for the relational data model, which is an application of first-order predicate logic. E. F. Codd thought about using second-order logic, but as the person in the video mentioned, simplicity is to be preferred. And again, a data model without constraints is not a data model but simply data. A database might be backed by a data model or not. Most databases unfortunately are not supported by proper data models as formal systems. A key-value database has no data model, only a syntactical abstraction for representing data. It makes no guarantees on the semantic consistency of your data, therefore you cannot safely rely 100% of the time on the inferences you derive from such databases. A graph is a mathematical construct. A graph data model is not the same as a graph. And a graph database management system should not store graphs but a graph data model.
I know that this is fairly certainly naivete (from someone who's largely new to practical programming, and also definitely on the opposite side of the theoretical/practical scale), but everything I've learned about relational databases leaves me thinking of them as, well, just objects, with a few (admittedly very nice) tie-ins to keep track of all the props, and types of props. It seems a little silly that almost nobody was doing this before a couple years ago. What am I missing? Do you need processors of a sufficient strength or flexibility to make the payoff worth it? Is the math knottier than I thought? Am I just being unspeakably naive about how entrenched SQL was, and how hard it would have been to switch over?
WHAT exactly is the payoff is the question? To me all of this was just flavor of the month snowflake idea from someone who doesn't understand why relational databases are so widely used, even though Graph databases as a concept already existed in the 60's. I don't see the payoff at all.
What compels people to dig all the way down into the youtube comments for a genuine question and respond with contentless curmudgeonry? The world may never know!
Graph databases messed up my life. I vouched for the technology at my company. The problem is that in theory it's all good but the technology is so new that all the graph databases out there have so many issues that make them unusable in production. Stay away from Neo4J and Orientdb!
@@vishaljotshi6869 it's been 4 years since I put my reputation on the line for this technology. It's probably much better now. I still would not recommend graph unless you want to try something new. They are fun to play with but not so much fun when things fall apart in prod.
@@LordBadenRulez what kind of problems did you have in production? There are so many possibilities and use cases. Maybe your problem was the focus, or the model wasn't designed for a concrete problem.
As usual, this is a great and god-awful presentation at the same time on an interesting tool for a limited purpose. I'd be happy to have this tool sitting on top of a MySQL schema in case I need to get certain types of answers. I've seen people give some good use cases in separate presentations dealing with other subjects. I do appreciate the criticism that some queries do end up being slow due to too many joins and DBs not being optimized to answer the particular question. This is even more noticeable when the DB has lower amounts of RAM allocated to it. But this issue is easily rectified by one or a collection of functions/stored procedures that break up the task into smaller modules (queries). It is a mistake made by most modern developers, who don't properly understand the technologies they work with but are all too happy to jump onto the next new thing as long as it sounds cool and confuses potential customers into paying a bit extra.
Ah he lost me when he talked about not being able to answer specific questions in SQL DBs. If the information is not there its simply impossible regardless of your DB type, if it is there and you can't get it out of a SQL db structure you built yourself in at MAXIMUM a day, then probably you have not used it that much (and nothing wrong with that, you could be an expert in other things)
I've done too many comments just to say I do have a genuine interest in graph data models, I just feel unnecessary to refer to misunderstandings of what the relational data model is like SQL databases to make the case of graph data modelling. If the video was using SQL database every time it mentions relational data model instead it would be much more precise.
I watched this to learn about graph database. 13 1/2 minutes in and I have heard a lot of disdain for relational databases. I don't have any of those issues he talked about and I have had people ask for crazy things.
to be honest you should've brought an example that is more complex to query with SQL, because relational databases are MADE to answer such a question with ease 12:30 .
A row, entity and not mentioned tuples are not always the same in relational databases. Writing queries in databases are easy.. learn the difference between your joins
"How many people, bought a toaster, live in kansas, have criminal record, used a coupon" - you can answer that question in a graph db (assuming you had all the data in the db, and there are paths between them). But it's also easy to answer that question in an RDBMS (assuming you had all the data in the db, and there are relationships between them). So what am I missing????
Since this is over a year old, judging by the name of the talk, I was curious as to whether the permissions for reupload take time, or is it just that you get to watch it late?
Hello, may i have your permission to move this video to bilibili and cite the source of Coding Tech youtube channel to share it with more people in China~? It's a great video, sad that most of us have no VPN access to visit youtube...
Great video, but the comments about relational databases being rigid and needing to know all the query requirements up front is completely the opposite of reality. One of the main advantages of a normalized relational database is the query flexibility. The joins might be ugly, but you can create any query you'd like against it. NoSQL key value stores on the other hand, do require that you know all your access patterns up front. Can't wait to try a graph database out one day!
Yeah. It changes my life...become more miserable (lol) with all those type of brackets (parentheses, square, curly) and semicolons in a query. Why don't you guys make SQL as standard of reading data ? It's a good presentation though.
The value in graph databases is as much the relationships of data as the data. It's difficult to do some of this in sql, representations would look like tables and it just wouldn't work easily.
Do we have to go through this again? The great debate settled the issue almost 60 years ago. (The debate happened between the two future ACM Turing award recipients). CODASYL is dead, and relational databases won.
CODASYL was an honest attempt to make a graph database. At the time everybody was convinced that graph database is the way to go. There was a century long development of algebra of binary relations which is the foundation of any graph based approach. And there came relatively unknown database researcher with new perspective (Codd)... Please don't tell me that some yahoo programmer can develop an awesome database system. It takes more than that.
Not to mention, even if he could, who would use it? Who wants to test grand theories of supposedly better data management on their customer's data? It's a novelty item at best...
2:45 Quite amusing, entertaining, and all, but that is not all that a constraint in a relational DB is about. For starters, RDBs follow normalization concepts while NoSQL DBs do not. Actually, NoSQL does not mean no sequel. It is actually common to see sanity checks fail on ghost rows that were once part of the "thing" in the schema, but the "thing" got deleted and the related row, having no constraint, was not. It is particularly common in nested objects. Maybe that is why FB migrated from Cassandra to PostGres almost the entire persistent layer. Quite sequeled insane, one would say. Now, that is a thing.
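The "ghost row" problem described above is exactly what a foreign key constraint is meant to prevent. Here is a minimal sketch with Python's `sqlite3`; the `thing` and `detail` tables are hypothetical, and note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY);
CREATE TABLE detail (
    id INTEGER PRIMARY KEY,
    thing_id INTEGER NOT NULL REFERENCES thing(id)
);
INSERT INTO thing VALUES (1);
INSERT INTO detail VALUES (10, 1);
""")

try:
    conn.execute("DELETE FROM thing WHERE id = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True  # the constraint refused to orphan detail row 10

remaining = conn.execute("SELECT COUNT(*) FROM thing").fetchone()[0]
print(blocked, remaining)
```

Without the constraint the delete would succeed and `detail` row 10 would be left pointing at nothing, which is the kind of row that later fails a sanity check.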
I think many of the existing replies somewhat unfairly bind the topic here. The appeal of such a concept concerning your own, existing "graphs" in life are simple to input, file by file. "I have 6 bananas, 3 tbsp soy sauce + some addy, what recipes can I make without spending over $20 at the store ?"
I watched the first 25 mins of the video. The guy was funny and said a lot of words, but like most people did not really say anything. Just a bunch of words. Why don't you tell me about the actual functions and logic behind graphs.
He isn't rude. If you have ever read a textbook about a specific topic, it does not come with unnecessary rubbish talk. Give me the information; I don't want any context for it. I will figure out myself HOW I use it.
Jeremy Anderson. John is not being rude. He is being logical and straightforward, which is a lot more than I can say about the speaker. The speaker is purposely leaving out information and being vague so that his statements appear as factual evidence for his arguments/conclusions. But in reality, he doesn't provide any valid arguments, just a bunch of misleading and fallacy-driven ones. Everything he mentioned is easily achieved with relational databases and in many cases can be easier/faster than using a NoSQL database. I'm a Full Stack Engineer and Solutions Architect for a corporation with many large relational databases and NoSQL/JSON databases. So I have quite a bit of experience, since I do it on the daily. Especially as a Solutions Architect, I have to stay current and well versed in new technologies in order to make the best architectural recommendations/requirements for each new project or application.
This is how all technical stuff should be taught. Great job!
Louis CK got really good at databases with all his down time...
Kind of sounds like him too!
this is an underappreciated comment
Best. Comment.
Lmao 🤣
Looool
13:50 - He makes a disingenuous argument. If one had created their relational database to actually contain all of those pieces of information mentioned in the salesperson's question, then it would be possible to get the data out. But when he switches to talking about the graph database, he just presupposes that you have nodes that contain all of this information, and that you have complete information on all of your customers. You may have information about some of the customers that have bought toasters. You may have information about some of the customers that are ex-cons. But, in a graph database, you have no guarantee that you have complete information about each customer. More importantly, you do not know if your lack of information is because the customer never bought a toaster, or because you simply don't know about the toaster the customer bought. With a relational database, you can at least indicate which customers you have complete information about and then only consider those in your statistics.
Now, I am a huge fan of graph databases. That's why I'm watching this video. However, it seems almost any time someone tries to compare and contrast two technologies, they always seem to make disingenuous comparisons. They will consider a critical Factor for one technology and then assume that critical Factor is not a big issue with the preferred technology.
You got to make even comparisons if you are going to have any chance of choosing a technology based on its merit.
I feel like he only barely touched on the actual advantages of graph databases. The queries he showed off can be done in a relational database without any real performance issues. I think what really separates Graph databases from relational databases is their extensibility and how they treat relationships themselves as entirely separate entities. I'm an RDBMS amateur and have no experience with Graph databases so I could be way off, but it sounds like graph databases can be extremely easy to extend beyond their initially-defined schema without really needing much, if any, refactoring. You can just define a new relationship and start using it to link nodes together. With a relational database you'd have to do a whole lot more refactoring, adding or modifying columns, etc. And then there's the direct focus on relationships between nodes. If you're working with highly-interconnected data and the connections themselves have their own attributes beyond just the two nodes they link together, I can see how a graph database could be useful.
Basically it sounds like it's more useful for modeling complex, constantly evolving networks. Like a Social Network for example, but one where you can freely define relationships between you and other people rather than picking from a dropdown or creating explicitly-defined lists of people.
What I want to know is how the data is actually stored and indexed beyond just having nodes and relationships, and how that affects query performance. Take the very first query he showed, for example. A simple "SELECT * FROM questions" in MySQL will just find the "questions" table and return every row it contains. But the equivalent "MATCH (q:Question)" in Graph will... do what, exactly? Does it walk across the entire graph to build a list of question nodes? Are the nodes stored in some kind of other internal data structure that makes it possible to grab an entire category of nodes without needing to peek at every node and connection in the entire database?
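One plausible answer to the question above, as a toy sketch only: a store can keep a separate index from each label to the ids of the nodes that carry it, so matching on a label never has to walk edges. The `TinyGraph` class and its method names are hypothetical and made up for illustration; this is not a description of how Neo4j or any real engine lays out its data.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory property graph with a label index (illustrative only)."""

    def __init__(self):
        self.nodes = {}                      # node id -> properties
        self.label_index = defaultdict(set)  # label -> ids of nodes carrying it

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = props
        for label in labels:
            self.label_index[label].add(node_id)

    def match_label(self, label):
        # Only touches nodes that carry the label; no edges are traversed.
        return [self.nodes[i] for i in sorted(self.label_index.get(label, ()))]


g = TinyGraph()
g.add_node(1, ["Question"], text="Do you work remotely?")
g.add_node(2, ["Answer"], text="Yes")
g.add_node(3, ["Question"], text="How large is your team?")

questions = g.match_label("Question")
print([q["text"] for q in questions])
```

With an index like this, a label match costs time proportional to the number of matching nodes, much like scanning a single table.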
The delivery of this presentation was excellent. Thanks for the insight into graph dbs.
His style and his natural understanding of the topic are such an inspiration for me!
One of the better presenters in the world because he has slides that people in the back can read and he actually talks about the content of the slides.
"It uses math that I don't understand, but it works. It's pretty cool." LOL. I agree. I've been using graph DB for a couple years and they are incredible for studying relationships between data. I highly recommend taking a look at the APOC procedures for Neo4j, because you will get a bumload of algorithms. I also can't recommend enough to read about the anti-patterns. Neo4j wrote great material about it. They will be fairly intuitive for you, like don't store giant blobs as node properties. I also totally recommend looking at some of the machine learning stuff for this. It's zany what you can do once you start doing like decision trees and shortest path analysis using APOC procedures.
Excellent presentation: Fast, lively, practical. Will be exploring this technology further.
As a student learning relational databases in my information studies degree, I'm so grateful that you introduced me to this new pattern for databases! Gotta stay on top of emerging standards and technologies. Thanks for sharing!
Fair bit of advice: "emerging" technologies like this often emerge from nowhere promising the moon, usually fail to deliver, and disappear just as quick. As a student you should probably focus on what is proven and tested and in-use in the industry.
Loved the talk, loved the humor. Great introduction... I'm hopefully working with a startup soon who's deeply using Neo4j
As other commenters have pointed out, the vast majority of this video spends time on material that is easily handled in relational databases.
OP says "indirect" a lot but as a long time user of relational databases, I hoped to hear more about queries where valid results mean traversing a variable number of "joins."
As a long time user of relational databases, perhaps you've lost touch with how complicated they can be to work with.
Maybe :) But coming from the relational side, I wanted to see problems that *I* would consider extremely difficult in relational.
Agreed, though I've created and utilized data structures in RDBMSs for many years as well.
Maybe then the real point is graph is easier to learn, not technically more powerful in any particular way?
matthew rummler hmm, hadn't thought of it like that... oh, maybe that's what amici was saying
Kind of goes back to the old saying... "if all you have is a hammer, the whole world looks like a nail". As a long time user of relational dbs myself, the question isn't "can I accomplish this using an existing tool", but rather "can this be MORE EASILY accomplished using a different tool (plus how long will it take to learn, will it likely be around tomorrow, etc)?". We should ALWAYS be looking for more performant ways to achieve tasks, as this wastes less of our own development time, resources, etc. Additionally, when steering clients/companies, we have an obligation to avoid leading them into more and more technical debt, as they bind themselves to legacy architectures that are ill-suited to the growing complexities of modern questions/tasks.
this was a very good talk that made me consider graph dbs. thanks
What it made me do is to consider looking into alternative _languages_ to query the relational DBs we already have, because clearly SQL is not optimized enough for the common use case.
Additionally, it made me wonder whether graph DBs have some hidden power that goes beyond what was shown in the presentation, because that was all pretty basic-level SQL.
This talk was interesting and taught me some things about graph DBs... but what it didn't do was point out *any* benefit to them. Every single thing mentioned in the presentation is fairly easy to do with relational databases and SQL. If the benefit is that unknown questions become faster, then if you have a genuinely massive dataset, this could be good (or in fact amazing). Otherwise, it's premature optimization, which is the mother of all anti-patterns, because apparently you could replace graph DBs with just smart decisions when you discover new questions.
Either this wasn't explained well enough or the speaker doesn't know enough about relational databases (hardly uncommon, for some reason a lot of people just can't understand relational DBs).
A constraint in a data model is not something that "won't work", as in the foreign key example given, but an element of the data model that is there to guarantee that the semantics associated with the data (its socially assigned meaning) are preserved when the data is formalized in a logical system to be computerized. That is, a constraint is not a problem but a necessary feature of any data model that wants to preserve semantics, the information carried by data as semantic content.
Jesus, at @12:11 it becomes apparent that this guy has never managed a large relational database... I don't disagree that graph databases have a lot of uses, but this isn't one of the cases.
Yep! I agree.
The point is that this is a complex and expensive query, you don't want that.
The request was actually simple; it would be a few joins, but he makes it seem like it's impossible to get that answer. If your DB schema is solid and good, you can answer questions you never imagined.
@@ALLCAPS I said expensive, not hard. Bubble sort, for example, is easy and expensive.
@@ShaoVideoProduction I guess? but not really. It's not expensive because it's been done 1,000,000 times in databases that are larger than 400million users and we haven't had any issues. So what gives?
Hands up if you ever fought deleting rows in MS Access! Great video and an interesting way to challenge the way we are storing data. I am looking at GDPR currently, and it seems like sticking private stuff in a single related table will allow a lot of freedom from the GDPR restrictions. I am guessing the graph system fits the real world better and may not be easily manipulated in the same way.
Last time I did that was around the time my pet dinosaur died
It's 6 years later - I just checked to see if comments reflected all these changed lives...
This would be much better if he compared the queries between different databases, e.g. SQL vs Graph, and then pointed out the point of using a Graph database. This mostly seemed like examples of queries but not so much details on why it's better than anything else already out there for many many years.
The concept of Graph DBs somehow reminds me of Prolog ^^
This guy is really funny. I got a tech talk and a stand up show in one sitting.
A relational DB schema can also be visualised as a graph. This is interesting technology, but I think the presenter dismisses relational DBs without really understanding them. A sane relational DB can also answer arbitrary questions with relational algebra, which can be translated into SQL.
See graph DBs as optimized many-to-many relationships. They are very fast at "joining tables", which is their main purpose. They also need a lot of RAM to load the entire graph, and they use preloaded graphs to make sure it's always fast. If you always run a graph DB from disk, it's going to be a nightmare.
As a graph with fat arrows that go from table to table, maybe, but not as a graph with lots of granular arrows for each dataset/document.
Listening to this lecture is so fun😍
The guy is a Great Teacher.
Great talk. I would love to see some simpler examples so I could get more familiar with the query language, but the speaker did an excellent job of getting me hyped on graph dbs (I've been in relationship hell on-and-off the job for my entire life).
What was that? 4:09
🤣
At 13:25 he said the crucial thing. IF the data somehow is in the system. How could you answer an unanticipated question with any data system if the data is not available. It is not a matter of key constraints or storage structures.
13:00 Actually this query can be expressed in SQL because you can join the tables with the condition data to the Person table. It doesn't need recursive indirection links.
Relational databases are named after the mathematical concept of relations. It's not about table relationships.
Thanks Ed! Very easy to follow and comprehensive talk about graph db
Interesting - realistically it seems like a graph database is just a relational database that maintains its own join tables though.
The example given about the coupon in kansas with the criminal record is not as the effin guy says. Just as you need graph data in some fashion to represent the relationships the same is true of relational dbs. equally as you have a graph between a person, a criminal record, address, purchases and coupons... those things come into existence in similar ways. To suggest that a graph db allows for a dynamic schema means you do not understand relational DB tools
But you can represent a graph with a relational database just fine
Node(id,...fields)
Edge(from_node_id,to_node_id,reltype_id)
Relation_type(id,name)
It's just that your nodes can't be dynamic without introducing a bunch of joins like
Node_String_Fields(node_id,value)
Node_Integer_Fields(node_id,value)
I guess that makes relational databases also graph databases :)
But I have a hunch they're much slower at joining that many tables and/or possibly needing recursive queries.
Put simply, graph databases are optimized many-to-many relationships, and that's what makes them useful and fast at what they do. For example: give me all the friends of my friends who are not my friends (one of the reasons Facebook started using them). This takes forever in a standard relational DB simply because of design choices. In a graph DB it's optimized to do these kinds of queries fast. Otherwise, yes, all graph DBs are easily implementable in a standard relational database. Also remember that most if not all graph DBs are RAM databases; in other words, you have to have your graph loaded in RAM for things to go fast. Right off the bat you can see that isn't what a standard relational DB is based around. Yes, a relational DB can use a lot of RAM, but it isn't a complete failure without it either. A graph DB will just not perform at all if it can't load the graphs in RAM.
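The schema above can be tried out directly. A minimal sketch using Python's built-in `sqlite3`, under a few assumptions of mine: the `name` column on `Node`, the `field` column on `Node_String_Fields`, and the sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Relation_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Node (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Edge (
    from_node_id INTEGER REFERENCES Node(id),
    to_node_id   INTEGER REFERENCES Node(id),
    reltype_id   INTEGER REFERENCES Relation_type(id)
);
-- Optional per-node attributes live in a side table, one row per field.
CREATE TABLE Node_String_Fields (
    node_id INTEGER REFERENCES Node(id),
    field   TEXT,
    value   TEXT
);
""")
conn.execute("INSERT INTO Relation_type VALUES (1, 'BOUGHT')")
conn.executemany("INSERT INTO Node VALUES (?, ?)", [(1, "alice"), (2, "toaster")])
conn.execute("INSERT INTO Edge VALUES (1, 2, 1)")
conn.execute("INSERT INTO Node_String_Fields VALUES (2, 'color', 'red')")

# Reading one edge plus one dynamic attribute already takes four joins.
rows = conn.execute("""
    SELECT src.name, rt.name, dst.name, f.value
    FROM Edge e
    JOIN Node src ON src.id = e.from_node_id
    JOIN Node dst ON dst.id = e.to_node_id
    JOIN Relation_type rt ON rt.id = e.reltype_id
    LEFT JOIN Node_String_Fields f ON f.node_id = dst.id AND f.field = 'color'
""").fetchall()
print(rows)
```

The join count is the point of the sketch: every extra dynamic field or hop adds another join, which is where the hunch about speed comes from.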
It's a different tool for a different task and a useful one at that.
That's the biggest problem I have with these kinds of lectures. They are always based on the idea that "this new thing will change your life completely and forever so you can just throw away all that old stuff which is bad and disgusting".
In IT especially, it is crucial to understand that most tools are good for something and no tool is best for everything, and the challenge comes from finding the right tools for the right job at the right cost. Any Turing-complete language can do any job given enough time (to configure and to run) and resources. But if language A does the development three times faster than language B, and language B does the execution 1.1 times faster than language A when you have X number of requests, but language C does it 10 times faster than either language A or B yet takes 5 times as long to develop as language A, which is the best language for the job? The answer is that it completely depends on the situation and the resources you have.
No single technology will ever completely crush an older technology in all possible areas especially when it's little used, little known and relatively new. Still they are usually marketed as such through lectures like this with faulty logic and dubious or even completely misunderstood arguments and examples.
"For example give me all the friends of my friends who are not my friends"
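-- ff.FriendPersonID is the friend's friend; the LEFT JOIN drops people who are already my friends
SELECT DISTINCT ff.FriendPersonID
FROM Person me
INNER JOIN PersonFriends mf ON mf.PersonID = me.PersonID
INNER JOIN PersonFriends ff ON ff.PersonID = mf.FriendPersonID
LEFT JOIN PersonFriends nmf ON nmf.PersonID = me.PersonID AND nmf.FriendPersonID = ff.FriendPersonID
WHERE me.PersonName = 'Myself' AND nmf.PersonID IS NULL AND ff.FriendPersonID <> me.PersonID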
If the PersonID and FriendPersonID and PersonName fields are indexed, this query results in a single lookup, filter, and fetch.
Not really that hard...
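The friends-of-friends query can be checked on a tiny SQLite database. This is a sketch with made-up sample data, assuming friendships are stored in both directions. The version here selects `ff.FriendPersonID` (the friend's friend), anti-joins against my own friend list, and excludes myself, since I am always a friend of my friends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE PersonFriends (PersonID INTEGER, FriendPersonID INTEGER);
CREATE INDEX idx_pf ON PersonFriends (PersonID, FriendPersonID);
""")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "Myself"), (2, "Ann"), (3, "Bob"), (4, "Cy")])
# Friendships stored in both directions: 1-2, 2-3, 2-4, 1-4.
pairs = [(1, 2), (2, 3), (2, 4), (1, 4)]
conn.executemany("INSERT INTO PersonFriends VALUES (?, ?)",
                 pairs + [(b, a) for a, b in pairs])

fof = conn.execute("""
    SELECT DISTINCT ff.FriendPersonID
    FROM Person me
    JOIN PersonFriends mf ON mf.PersonID = me.PersonID
    JOIN PersonFriends ff ON ff.PersonID = mf.FriendPersonID
    LEFT JOIN PersonFriends nmf
           ON nmf.PersonID = me.PersonID
          AND nmf.FriendPersonID = ff.FriendPersonID
    WHERE me.PersonName = 'Myself'
      AND nmf.PersonID IS NULL
      AND ff.FriendPersonID <> me.PersonID
    ORDER BY ff.FriendPersonID
""").fetchall()
print(fof)
```

In this data set Bob (id 3) is the only person who is a friend of a friend without already being a friend.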
+HMan I'm pretty sure each one of those joins is a separate set of lookups, and that each lookup is logarithmic, not constant (single). So with N Persons, and a "me" with F average friends, each lookup is log(N), and your algo breaks down to:
find me (+ log(N))
find my friends (+ F * log(N))
for each friend, find their friends (+ F^2 * log(N))
filter out my friends (no new lookups, + negligible constant K)
assuming our number of friends' friends is F^2, and assuming the cost of filtering is a negligible constant K, we get a worst-case runtime of
log(N) + F * log(N) + F^2 * log(N) + K ≈ log(N) * (1 + F + F^2)
which isn't terrible, but it could be better. With a graph, only the first lookup is necessary, and everything else can be found through the nodes' edges (in _truly_ constant time per hop, because nodes are referenced by absolute offsets in memory rather than by relative offsets in an index). There are notably fewer lookups this way:
find me (+ log(N))
find my friends (+ F)
for each friend, find their friends (+ F^2)
filter out my friends (+ negligible constant K)
which gives a worst-case runtime of
log(N) + F + F^2 + K ≈ log(N) + F + F^2
Which is quite a bit better when working with extremely large datasets.
At least that's my understanding of it; I hope I got that right.
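To put numbers on the two formulas, here is a small sketch that just evaluates them. The base-2 logarithm, N = 2^20 people and F = 100 friends are my own assumptions, and K is dropped as negligible; this evaluates the commenter's cost model, it is not a benchmark of any real database.

```python
import math


def relational_lookups(n_people, avg_friends):
    """Cost in the model above when every hop is an index lookup of cost log(N)."""
    return math.log2(n_people) * (1 + avg_friends + avg_friends ** 2)


def graph_lookups(n_people, avg_friends):
    """One indexed lookup to find 'me', then constant-cost hops along edges."""
    return math.log2(n_people) + avg_friends + avg_friends ** 2


N = 2 ** 20  # about a million people (assumed)
F = 100      # average number of friends (assumed)

relational_cost = relational_lookups(N, F)
graph_cost = graph_lookups(N, F)
print(relational_cost, graph_cost, relational_cost / graph_cost)
```

Under these assumptions the gap is roughly a factor of log(N), about 20 here, which is large for hot queries but says nothing about disk access, caching, or query planning.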
I'm not convinced. The queries shown in this talk are all beginner- or at most intermediate-level SQL.
In fact, I could easily write a transpiler from the subset of the ASCII-art language used in the presentation into SQL, assuming you are only joining on foreign key constraints (which of course SQL doesn't restrict you to doing).
So either the power of graph databases lies elsewhere, in some advanced features that were not shown, or this thing is just another poor man's schema-less store with a fancy query language.
For example, off the top of my head: can you traverse a variable number of joins/edges, depending on the data you find? Can you traverse a variable _type_ of join/edge, depending on the data you find? Can you express a recursive definition of joins/edges, again depending on the data? What about a set of mutually recursive definitions of joins? Are such things performant? And so on.
Mind you, good relational DBs allow you to do most of that, but they don't necessarily make it easy or performant, so I could see the use of something more powerful.
Which this presentation did not show.
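For reference, this is roughly what "a variable number of joins" looks like on the relational side: a recursive common table expression. A minimal sketch with SQLite; the `Edge` table, its sample rows, and the depth limit of 10 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edge (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO Edge VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

# Follow edges from 'a' for as many hops as the data requires.
reachable = conn.execute("""
    WITH RECURSIVE reach(node, depth) AS (
        SELECT 'a', 0
        UNION
        SELECT e.dst, r.depth + 1
        FROM reach r JOIN Edge e ON e.src = r.node
        WHERE r.depth < 10          -- guard so cyclic data cannot loop forever
    )
    SELECT node, MIN(depth) FROM reach GROUP BY node ORDER BY MIN(depth)
""").fetchall()
print(reachable)
```

It works, but the traversal logic is spelled out by hand, which is the "possible but not necessarily easy or performant" point made above.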
If you can't do this in SQL, then you're not very good at it. But graph databases look like they're easier to learn; will check it out
My key takeaway from this excellent talk is that figuring out indirect relationships is the strength of graph databases...
Very cool, think this guy should just do every tech talk from now on
Not even a minute has passed, and I already like this guy. Looks like it's going to be a pretty good talk!
Long time ago I used to program in Prolog. It would do this kind of stuff but so much more elegant. -- if you can get your head round recursion.
If I was in charge, I'd make anyone planning on using a graph database learn Prolog first, and then see if they still needed a graph database.
It might be a good idea to extend graphs to hypergraphs - that is sets of nodes, not just binary relations.
Ok so if I’m understanding correctly:
Graph db for when you want to access indirect relationships and not be limited by a schema
Document db for when you don't want to be limited by a schema, but most of the data being accessed is usually all in the same document (?)
Relational db when you want extreme structure, you want that schema as a safety net. And you’re not looking to access indirect relationships
Is that correct?
Relational database is not about relationships between tables. He is confusing terminology
True. The "Relational" in Relational Database refers to mathematical relations, i.e., sets of tuples.
he told you "i didn't do well in computer science"
In practical application in Production environments, the business world, you couldn’t be more wrong.
@@aledmb then why is he making a case against traditional db ? Lol
This is a moot point, considering the expression of that relation is a table, connected to other tables.
That might have been the rationale in mathematical theory, but in application it's a table. Just seems silly to make much ado about nothing...
My issue with these kinds of databases is the overhead of the JSON format. It is an improvement over XML for sure, but it is not good for large result sets: you need to repeat your "schema" in every data record, so it will perform poorly if you have a huge dataset as a result, and parsing it out of JSON has a cost. Although that is very optimized today, it is still a cost. I suppose I need a binary version of this, but the lack of a schema makes it hard to then parse it out. It's complex. 😕
Great overview of graph dbs and fun to watch.
Oh man, advertisers are going to love this.
12:10 Something like this should work, but criminal records is probably an API call with select user info:
select count(distinct uid) from criminal_records
where uid in (
select p.uid from
(select distinct uid from purchases where itemtype='toaster') p
join (select distinct uid from users where city='Kansas') u on u.uid = p.uid
join (select distinct uid from redeemed_coupons where publication_date > current_date - 1) c on c.uid = p.uid
)
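A runnable sketch of the same idea against SQLite, with invented sample rows. The column types, the `offense` column, and the fixed `cutoff` date standing in for `current_date - 1` are assumptions of mine, and criminal records are treated as a local table rather than an API call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE purchases (uid INTEGER, itemtype TEXT);
CREATE TABLE redeemed_coupons (uid INTEGER, publication_date TEXT);
CREATE TABLE criminal_records (uid INTEGER, offense TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Kansas"), (2, "Kansas"), (3, "Boston")])
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [(1, "toaster"), (2, "kettle"), (3, "toaster")])
conn.executemany("INSERT INTO redeemed_coupons VALUES (?, ?)",
                 [(1, "2018-06-02"), (2, "2018-06-02"), (3, "2018-06-02")])
conn.executemany("INSERT INTO criminal_records VALUES (?, ?)",
                 [(1, "fraud"), (1, "theft"), (3, "fraud")])

cutoff = "2018-06-01"  # stands in for "current_date - 1"
(count,) = conn.execute("""
    SELECT COUNT(DISTINCT cr.uid)
    FROM criminal_records cr
    WHERE cr.uid IN (
        SELECT p.uid
        FROM purchases p
        JOIN users u ON u.uid = p.uid
        JOIN redeemed_coupons c ON c.uid = p.uid
        WHERE p.itemtype = 'toaster'
          AND u.city = 'Kansas'
          AND c.publication_date > ?
    )
""", (cutoff,)).fetchone()
print(count)
```

Counting distinct `uid` values keeps a person with several records from being counted more than once.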
I must be missing something, anyone who is competent in SQL will be able to design a database in the given scenario and create those queries using an, imo, much simpler syntax as well?
You missed the point. His talk was a bridge.
Callum Vass can relate
really interesting, providing me with lots of ideas for data manipulation in work :)
12:30 looks easy in SQL '__')
SQL sucks at temporal queries.
Nearly all the examples he showed were easily doable in any relational database. I'll give a little leniency on the last one... But it's not like what he showed was any less complex than what a `join` statement would look like. He essentially replaced `JOIN` with `MATCH` .....
Then speed? There's a pretty damning whitepaper out now that shows how relational DBs perform vastly better in nearly 95% of all the cases you'd need a DB for.
Bottom line is don't get caught up in superficial hype trains and start evaluating actual-realistic use cases for a Graph. These examples are not it..
A relational data model has relations (not tables, rows and columns), constraints, and operations applied to relations. The way data is physically stored is irrelevant to the model, since any data model is a formal system that is based on logic and can be physically implemented in many ways, as long as the implementation respects the logic of the defined formal system.
It's very important to keep in mind that the slowness of currently available SQL databases (none of them are actual implementations of the relational data model; please refer to at least Fabian Pascal, C. J. Date and David McGoveran) has nothing to do with features of the relational data model itself. How is a graph stored in a computer? A computer does not store nodes and edges; these are abstractions, the same way a relation or other things like tables are abstractions. All these abstractions can be represented in computer memory in various physical ways. As I mentioned before, a table abstraction or a matrix abstraction can represent a graph. SQL is not a synonym for the relational data model, which is an application of first-order predicate logic. E. F. Codd thought about using second-order logic but, as the person in the video mentioned, simplicity is to be preferred. And again, a data model without constraints is not a data model but simply data. A database might be backed by a data model or not. Most databases unfortunately are not supported by proper data models as formal systems. A key-value database has no data model, only a syntactical abstraction for representing data. It makes no guarantees about the semantic consistency of your data, so you cannot safely rely 100% of the time on the inferences you derive from such databases. A graph is a mathematical construct. A graph data model is not the same as a graph. And a graph database management system should not store graphs but a graph data model.
I know that this is fairly certainly naivete (from someone who's largely new to practical programming, and also definitely on the opposite side of the theoretical/practical scale), but everything I've learned about relational databases leaves me thinking of them as, well, just objects, with a few (admittedly very nice) tie-ins to keep track of all the props, and types of props. It seems a little silly that almost nobody was doing this before a couple of years ago. What am I missing? Do you need processors of a sufficient strength or flexibility to make the payoff worth it? Is the math knottier than I thought? Am I just being unspeakably naive about how entrenched SQL was, and how hard it would have been to switch over?
WHAT exactly is the payoff is the question? To me all of this was just flavor of the month snowflake idea from someone who doesn't understand why relational databases are so widely used, even though Graph databases as a concept already existed in the 60's. I don't see the payoff at all.
What compels people to dig all the way down into the YouTube comments for a genuine question and respond with contentless curmudgeonry? The world may never know!
Graph databases messed up my life. I vouched for the technology at my company. The problem is that in theory it's all good but the technology is so new that all the graph databases out there have so many issues that make them unusable in production. Stay away from Neo4J and Orientdb!
I hope those issues would be resolved by now and please write back if you think they are good to be used in production now
@@vishaljotshi6869 it's been 4 years since I put my reputation on the line for this technology. It's probably much better now. I still would not recommend graph unless you want to try something new. They are fun to play with but not so much fun when things fall apart in prod.
@@LordBadenRulez what kind of problems did you have in production? There are so many possibilities and use cases; maybe the problem was the focus, or the model wasn't designed for a concrete problem.
Incredible talk and dude!
Jakob Lindskog
great talk on an interesting topic, something for further research for sure, thank you!
As usual, this is a great and god awful presentation at the same time on an interesting tool for a limited purpose. I'd be happy to have this tool sitting on top of a MySQL schema in case I need to get certain type of answers. I've seen people give some good use cases in separate presentations dealing with other subjects.
I do appreciate the criticism that some queries do end up being slow due to too many joins and DBs not being optimized to answer the particular question. This is even more noticeable when the DB has lower amounts of RAM allocated to it.
But this issue is easily rectified by one or a collection of Functions/Stored Procedures that allow to break up the task into smaller modules (queries)
A mistake made by most of the modern developers who don't actually understand properly the technologies they work with but are all too happy to jump onto the next new thing as long as it sounds cool and confuses potential customers into paying a bit of extra.
Ah he lost me when he talked about not being able to answer specific questions in SQL DBs. If the information is not there its simply impossible regardless of your DB type, if it is there and you can't get it out of a SQL db structure you built yourself in at MAXIMUM a day, then probably you have not used it that much (and nothing wrong with that, you could be an expert in other things)
I've written too many comments already, just to say that I do have a genuine interest in graph data models. I just feel it's unnecessary to rely on misunderstandings of what the relational data model is (such as equating it with SQL databases) to make the case for graph data modelling. If the video said "SQL database" every time it mentions the relational data model, it would be much more precise.
Fantastic introduction to Graph Databases, very engaging speaker.
I watched this to learn about graph database. 13 1/2 minutes in and I have heard a lot of disdain for relational databases. I don't have any of those issues he talked about and I have had people ask for crazy things.
Who in La La Land is this guy? He is HILARIOUS!!!! What a GREAT presenter!!!!
to be honest you should've brought an example that is more complex to query with SQL, because relational databases are MADE to answer such a question with ease 12:30 .
A row, an entity, and the tuples he didn't mention are not always the same thing in relational databases. Writing queries in databases is easy... learn the difference between your joins.
The example is a bit confusing; we don't have the whole picture!
"How many people, bought a toaster, live in kansas, have criminal record, used a coupon" - you can answer that questions in a graph db(- assuming you had all the date in the db, and there are paths between them). But its also easy to answer that question in an RDBMS (assuming you had all the date in the db, and there are relationships between them). So what am I missing????
The needed query is somewhat complex and quite expensive
This is a brilliant explanation.
I wasnt expecting Gilfoyle to give a presentation. (Silicon Valley reference)
Since this is over a year old, judging by the name of the talk, I was curious as to whether the permissions for reupload take time, or is it just that you get to watch it late?
Hi Ethan9750. I usually republish new content but sometimes I add a bit older videos just because they are freakin' good :)
Hello, may i have your permission to move this video to bilibili and cite the source of Coding Tech youtube channel to share it with more people in China~? It's a great video, sad that most of us have no VPN access to visit youtube...
Great video, but the comments about relational databases being rigid and needing to know all the query requirements up front is completely the opposite of reality. One of the main advantages of a normalized relational database is the query flexibility. The joins might be ugly, but you can create any query you'd like against it. NoSQL key value stores on the other hand, do require that you know all your access patterns up front.
Can't wait to try a graph database out one day!
Yeah. It changes my life...become more miserable (lol) with all those type of brackets (parentheses, square, curly) and semicolons in a query. Why don't you guys make SQL as standard of reading data ? It's a good presentation though.
The value in graph databases is as much the relationships of data as the data. It's difficult to do some of this in sql, representations would look like tables and it just wouldn't work easily.
How is logic programming (e.g., Prolog) related to graph data bases?
Thank you for this very useful video!
Amazing speaker and topic!
1 minute in and I’m already cracking up.
Had interview where I was asked if I knew what graph DBS are. Now I know that the answer is: good for many data.
Excellent talk about graph databases
Do we have to go through this again? The great debate settled the issue almost 60 years ago. (The debate happened between the two future ACM Turing award recipients). CODASYL is dead, and relational databases won.
CODASYL was a long time ago and has vanishingly little to do with the non-relational databases that are used today.
CODASYL was an honest attempt to make a graph database. At the time everybody was convinced that graph database is the way to go. There was a century long development of algebra of binary relations which is the foundation of any graph based approach. And there came relatively unknown database researcher with new perspective (Codd)... Please don't tell me that some yahoo programmer can develop an awesome database system. It takes more than that.
Not to mention, even if he could, who would use it? Who wants to test grand theories of supposedly better data management on their customer's data? It's a novelty item at best...
CODASYL was an attempt to write a standard for a technology that was not mature enough for standardization. Obvious with the benefit of hindsight.
Who wants better data management? People looking for a competitive advantage. You're welcome.
2:45 Quite amusing and entertaining, but that is not all a constraint in a relational DB is about. For starters, RDBs follow normalization concepts while NoSQL DBs do not. Actually, NoSQL does not mean no sequel. It is actually common to see sanity checks fail on ghost rows that were once part of the "thing" in the schema, but the "thing" got deleted and the related row, having no constraint, was not. It is particularly common in nested objects. Maybe that is why FB migrated almost the entire persistence layer from Cassandra to Postgres. Quite sequeled insane, one would say. Now, that is a thing.
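The "ghost row" problem described above is what a foreign key constraint is meant to prevent. A minimal sketch with SQLite; the `thing` and `part` tables are hypothetical, and note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY);
CREATE TABLE part (
    id INTEGER PRIMARY KEY,
    thing_id INTEGER NOT NULL REFERENCES thing(id)
);
INSERT INTO thing VALUES (1);
INSERT INTO part VALUES (10, 1);
""")

# Deleting the parent while a child still points at it is rejected.
try:
    conn.execute("DELETE FROM thing WHERE id = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Count child rows whose parent no longer exists (the "ghost rows").
orphans = conn.execute("""
    SELECT COUNT(*) FROM part p
    LEFT JOIN thing t ON t.id = p.thing_id
    WHERE t.id IS NULL
""").fetchone()[0]
print(blocked, orphans)
```

Without the constraint the delete would succeed and the second query would start reporting orphans, which is the failed sanity check the comment describes.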
I think many of the existing replies somewhat unfairly bind the topic here.
The appeal of such a concept concerning your own, existing "graphs" in life are simple to input, file by file.
"I have 6 bananas, 3 tbsp soy sauce + some addy, what recipes can I make without spending over $20 at the store ?"
Great introduction. Many thanks
[15:42] The Open Source Mental Illness Neo4j database is at:
github.com/OSMIHelp/osmi-survey-graph
Wtf happened at 4:10
You just helped me a lot Man, Thanks for your enlightening talk. Cheers! 🥂
I watched the first 25 mins of the video. The guy was funny and said a lot of words, but like most people did not really say anything. Just a bunch of words. Why don't you tell me about the actual functions and logic behind graphs.
Why don’t you learn it yourself and make a video showing us how much better at this you are than everyone else? Rude, man, really rude.
Jeremy Anderson I'll work on that. You might be surprised.
He isn't rude. If you have ever read a textbook about a specific topic, it does not come with unnecessary rubbish talk. Give me the information; I don't want any context with it, I will figure out for myself HOW to use it.
John Sutton
I am under the impression that this talk is very informal, as an orientation, and not intended to be an academic lecture.
Jeremy Anderson. John is not being rude. He is being logical and straightforward, which is a lot more than I can say about the speaker. The speaker is purposely leaving out information and being vague so that his statements appear to be factual evidence for his arguments and conclusions. But in reality, he doesn't provide any valid arguments, just a bunch of misleading and fallacy-driven ones. Everything he mentioned is easily achieved with relational databases, and in many cases it can be easier/faster than using a NoSQL database. I'm a Full Stack Engineer and Solutions Architect for a corporation with many large relational databases and NoSQL/JSON databases, so I have quite a bit of experience since I do it on the daily. Especially as a Solutions Architect, I have to stay current and well versed in new technologies in order to make the best architectural recommendations/requirements for each new project or application.
It's amazing to me how the relational model still dominates nearly 20 years after Zope/ZODB made its splash.
Maybe GDBs don't have as many use cases as they say.
I always liked node-based databases. That's what I call them. Nice talk!
Wish all presenters were this good!
Thanks for the video.
So the graph query language is just optimized in performance and comfort for a different use case? Okay. But SQL isn't the same as relational.
Liked as soon as I heard the first sentence
This would work well with the Trivium method (which is a way of thinking critically)
What’s that little noise at 4:10?
I had no doubt about the speed and benefits of using graph databases, but this talk made me more confident about them. BTW, this guy is awesome.
damn I love this guy, great talk about graph databases
8:36 just dots and lines? No, it's just turtles on turtles, all the way down!
Awesome Lecture! Well done....
are graph databases relevant in 2018? is there a way I can find out how in demand knowledge of graph databases are in industry?
That guy is a friggin genius.
Learned much more about graph databases. Thanks a lot.