Basic terminology and practices related to graph databases and graph modeling

vehementi · on March 6, 2023

> Graphs, for example, can have cycles while trees can't. A cycle means that there is only one way to go to a node by following relationships from another node.

Typo here, that's the opposite of what cycle means, isn't it?

layer8 · on March 6, 2023

It’s not even the opposite. As the sibling says, this can happen in DAGs, but not in trees (both of which are cycle-free). This indicates such a confused understanding of graphs that it gives me very low confidence in the article (and the product).

The article then continues with:

> To fully utilize the power of graphs, you first need to get a basic understanding of the underlying concepts in graph theory.

Indeed. ;)

> There are four components that every graph consists of nodes, relationships, labels, and properties.

This is incorrect. A graph in the graph-theoretic sense consists solely of vertices and edges [0] (nodes and relationships), no labels or properties required. Also, there's a colon missing after "of".

The writing is quite sloppy for a field that requires rigorous precision.

[0] https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)#G...

pfisherman · on March 6, 2023

That’s actually a tree.

A cycle is a path from a node back to itself that does not traverse any edge more than once.

You can have a directed acyclic graph (DAG) where there are multiple paths from one node to another, but there is no way to revisit a node once you have moved on.
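To make the distinction concrete, here is a minimal Python sketch. The four-node "diamond" graph and the helper names `has_cycle` and `count_paths` are illustrative assumptions, not anything from the article.

```python
# Directed graph as an adjacency dict: node -> list of successors.
# The "diamond" below is a DAG: two distinct paths lead from A to D,
# yet no path leads from any node back to itself.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Adding the edge D -> A closes a loop, so this graph has a cycle.
cyclic = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (DFS with colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge to a node on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def count_paths(graph, src, dst):
    """Count distinct directed paths from src to dst (acyclic graphs only)."""
    if src == dst:
        return 1
    return sum(count_paths(graph, nxt, dst) for nxt in graph[src])


print(has_cycle(dag))              # False
print(count_paths(dag, "A", "D"))  # 2
print(has_cycle(cyclic))           # True
```

The diamond is acyclic but not a tree, because D can be reached in two ways. In a tree, `count_paths` would return at most 1 for any pair of nodes.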

vpavicic · on March 6, 2023

Thank you for noticing! I'll investigate this error and fix it accordingly ;)

victor106 · on March 6, 2023

This is an awesome introduction.

I wish there was a book/resource that explains when you should NOT use a graph DB (or any technology for that matter). And the pitfalls of using the wrong technology.

You have so many options for technology these days, with so many overlapping capabilities, that it's hard to decide which tech to pick for which problem space.

yamtaddle · on March 6, 2023

> I wish there was a book/resource that explains when you should NOT use a graph DB (or any technology for that matter). And the pitfalls of using the wrong technology.

This is complicated by database companies, in particular, often marketing their products as suitable—or even best—for every situation, even when it's not true.

Graph databases are doing this now, but we saw the same thing with document-oriented databases like Mongo.

With graph databases I'd say the key things to look at are: data integrity / correctness guarantees (this one goes for any DB, really), and which graph operations and combos of operations they're best at. Nb that, depending on what exactly you're doing with a graphdb, your general data size & shape, and which one you're looking at, sometimes e.g. PostgreSQL actually outperforms them at graph-oriented operations.

[EDIT] General advice? If you've got a densely-connected, large graph and need to answer questions that mostly involve traversing the graph, but not fetching or inspecting much of that data as part of the queries, a graph DB might be a good idea, bearing in mind that using it as a supplement to an RDBMS is an option. Otherwise, it's less likely to be the right call (though it might be: various graph databases may perform very differently under the same workload, and a query that runs like dogshit on one might do OK on another, usually because they make different optimization trade-offs at the data structure level).

hot_gril · on March 7, 2023

The relational vs document-oriented dilemma is also solved by not overthinking it. If you don't know which one you should use, relational is the safest option. If you eventually reach a scale and use case where you need to think about a document-oriented DB, it will be much clearer what you need.

Graph DBs are different because they're less about scaling and more about specialization for certain use cases. Again, relational is the safe default when you're unsure.

eurasiantiger · on March 7, 2023

Graph databases are morphologically a superset of relational databases.

Combine a graph db with document support and it can be a big win to not have to model every nested document while getting O(1) query time join performance.

hot_gril · on March 7, 2023

Is it a strict superset? Many graph DBs are implemented using relational ones. This doesn't make a graph DB a good tool for the job of a relational DB or vice versa. Their features and optimizations surround pretty different use cases. To scratch the surface, I can't SQL-query a graph, and I can't DFS-query a relational DB. Relational DBs nowadays have document columns like jsonb, btw, but usually you use them sparingly.

DBs are also the biggest area where ideal design and abstraction will quickly give way to practical concerns like performance (measured, not big-O). Generally, nothing is going to look like it did on paper.

eurasiantiger · on March 7, 2023

Generally, the difference between being able to do SQL or depth-first searching comes down to the storage layer. Traditional row-oriented RDBMSs can't do DFS efficiently, but an RDBMS backed by RDF-like columnar storage sure could.

momirlan · on March 6, 2023

This is where experience comes in handy. Some things cannot fit in blogs; it just takes years of grinding to figure them out and see through the marketing BS.

hot_gril · on March 6, 2023

My rule of thumb is, start designing with a relational DB and only think about graph if it becomes painful. If you don't have cyclical FKs, you probably don't have a real use for a graph DB.

taubek · on March 6, 2023

There is a section on when not to use graphs in a blog post at https://memgraph.com/blog/graph-database-vs-relational-datab....

I guess it is like with any tool. You need to know the limitations. It is often better to use several tools, each for the area where it performs best. But then multiple tools can be a pain to maintain.

EDIT: fixed typos

mbuda · on March 6, 2023

I'm glad that you liked it. If you want to see all of these things in action, check out Memgraph. You can find our repo at https://github.com/memgraph/memgraph DISCLAIMER: I'm the co-founder and CTO.

Also, any feedback or suggestion will help us push more content like this in the future!

grounder · on March 6, 2023

Thanks for this intro to Graphs. Does Memgraph persist to disk or is it in-memory only? If in-memory only, do you have plans to support graphs which become larger than available memory? Thanks!

mbuda · on March 6, 2023

Yes, Memgraph persists data on disk, but there is no support for larger-than-memory datasets yet (though a lot can fit on a single machine). In general, our primary focus at the moment is scale-out / proper graph sharding support. The progress on that side can be tracked under the following project -> https://github.com/orgs/memgraph/projects/5

How big is your graph and do you have specific queries in mind?

grounder · on March 6, 2023

I have around 100GB of data in a relational database. Not sure how that would translate into nodes and edges. Is that amount feasible in Memgraph?

mbuda · on March 6, 2023

It depends on a few things. If you want to migrate all the data, that's heavier than migrating just, e.g., the graph part and connections. It heavily depends on the final model, required data types and overall distribution of data. This guide -> https://memgraph.com/docs/memgraph/under-the-hood/storage can help you with calculating the amount of required RAM. Also, we plan to include a simple calculator in the next release of Memgraph Lab (https://memgraph.com/docs/memgraph-lab/), probably coming this week :)

grounder · on March 6, 2023

Thank you so much! One more question if you'll permit me - and then I'll leave you alone I promise. Social networks are often recommended as a use-case for a graph database. How well would Memgraph perform on generating an activity feed of all of the posts of the people I follow ordered by post date descending, for example (hopefully with pagination)?

mbuda · on March 6, 2023

Hard to say without a particular query, and again it depends on the model, but from the explanation, that could be modeled as a 2-hop relationship. If you put indexes in place and maybe add some filtering, it should be fast.
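As a rough illustration of the 2-hop shape (user -> followed accounts -> posts), here is a plain-Python sketch. It is not Memgraph's API or query language; the `follows` and `posts` data and the `activity_feed` helper are hypothetical stand-ins.

```python
import heapq
from itertools import islice

# Hypothetical toy data: who follows whom, and each user's posts
# as (unix_timestamp, text) tuples.
follows = {"me": ["ana", "bo"], "ana": ["bo"], "bo": []}
posts = {
    "ana": [(300, "ana: third"), (100, "ana: first")],
    "bo": [(200, "bo: second"), (50, "bo: zeroth")],
    "me": [(400, "me: own post")],
}


def activity_feed(user, page=0, page_size=2):
    """Posts by everyone `user` follows, newest first, one page at a time."""
    # Hop 1: user -> followed accounts. Hop 2: followed account -> posts.
    per_author = (sorted(posts.get(author, []), reverse=True)
                  for author in follows.get(user, []))
    # Each per-author list is newest-first, so a k-way merge yields a
    # globally newest-first stream without sorting everything at once.
    merged = heapq.merge(*per_author, reverse=True)
    start = page * page_size
    return [text for _, text in islice(merged, start, start + page_size)]


print(activity_feed("me", page=0))  # ['ana: third', 'bo: second']
print(activity_feed("me", page=1))  # ['ana: first', 'bo: zeroth']
```

The `sorted` call is where an index on post date would presumably help in a real database: with posts already ordered per author, only the first few entries of each list need to be read to fill a page.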

Please join our Discord at https://discord.gg/memgraph, more people will be able to help you there :)

katelatte · on March 6, 2023

Memgraph does persist data. Snapshots are taken periodically during the entire runtime of Memgraph. When a snapshot is triggered, the whole data storage is written to disk. There are also write-ahead logs that save every database modification to a file.
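The general snapshot-plus-write-ahead-log pattern can be sketched as a toy key-value store. This is only an illustration of the idea under simplifying assumptions; the `ToyStore` class, its file names, and its JSON format are invented here and say nothing about Memgraph's actual implementation or on-disk layout.

```python
import json
import os
import tempfile


class ToyStore:
    """In-memory dict made durable by a snapshot file plus a write-ahead log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.wal_path = os.path.join(directory, "wal.jsonl")
        self.data = {}
        self._recover()

    def _recover(self):
        # 1. Load the latest snapshot, if any.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        # 2. Replay modifications logged after that snapshot.
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Log the change durably before applying it in memory.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the whole state to a temp file, then swap it in atomically.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)
        # Logged changes are now covered by the snapshot, so empty the log.
        open(self.wal_path, "w").close()


with tempfile.TemporaryDirectory() as d:
    store = ToyStore(d)
    store.set("a", 1)
    store.snapshot()
    store.set("b", 2)          # only in the WAL, not in the snapshot
    recovered = ToyStore(d)    # simulates a restart
    print(recovered.data)      # {'a': 1, 'b': 2}
```

Recovery loads the last full snapshot and then replays the log, so changes made after the snapshot survive a restart.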

screamingninja · on March 6, 2023

Great tutorial on graph modeling! The author did an excellent job of explaining the basic terminology and practices related to graph databases and graph modeling. The tutorial is well-structured and easy to follow, making it an excellent resource for anyone looking to learn more about graph modeling. The author covers a wide range of use cases for graph databases, including social networks, fraud detection, network analysis, and supply chain management.

mmwako · on March 6, 2023

ChatGPT?

screamingninja · on March 6, 2023

What a world we live in, where a person can't write a coherent blurb of text without being suspected of using ChatGPT. Anyone remember Idiocracy? That movie is starting to sound more and more like a prophecy.

brokencode · on March 6, 2023

Your original post is a summary without any new ideas or thoughts about the article. That is why it reads like you asked an AI to generate it.

I don’t think the question was meant as an accusation, but it is amusing to think about whether some commenters are using an AI to generate their comments. Would we even notice?

screamingninja · on March 6, 2023

> Your original post is a summary without any new ideas or thoughts about the article.

You're right. I often look for a quick summary of the articles here in these comments to avoid falling for clickbait, so I was contributing a summary of something I found useful.

> I don’t think the question was meant as an accusation, but it is amusing to think about whether some commenters are using an AI to generate their comments. Would we even notice?

Fair enough. The one-word question was hard to read into, and it's amusing indeed. We shall never know for sure!

vpavicic · on March 6, 2023

I think we would notice because we would all start to sound alike :D kind of bland

yamtaddle · on March 6, 2023

Repeating "graph modeling" three sentences in a row, with all those sentences being damn near content-free, reads like it came from something even less capable than ChatGPT, to me.

screamingninja · on March 6, 2023

> Repeating "graph modeling" three sentences in a row, with all those sentences being damn near content-free, reads like it came from something even less capable than ChatGPT, to me.

Sorry to disappoint?

kjs3 · on March 6, 2023

That was the first thing I thought.

taubek · on March 6, 2023

But the account is 8 years old.

kjs3 · on March 6, 2023

There's a real person behind the account, but my guess is they are playing around with the shiny new tech.

vpavicic · on March 6, 2023

ChatGPT or not, if this is an honest comment, thnx! :)

screamingninja · on March 6, 2023

It was, thank you!

pharmakom · on March 6, 2023

Can anyone explain how this improves on a relational database?

The concepts (nodes, edges, etc...) can all be represented in a traditional relational database using tables and foreign keys.

What is the advantage of a graph database?
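To make the premise concrete, here is a minimal sketch of a graph stored in relational tables, using Python's built-in `sqlite3` module. The `nodes` and `edges` tables and the sample data are hypothetical; multi-hop traversal is expressed with a recursive common table expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE edges (
        src INTEGER NOT NULL REFERENCES nodes(id),
        dst INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (src, dst)
    );
    INSERT INTO nodes VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1), (3, 4);
""")

# Everything reachable from alice (id 1). Each extra hop is another join
# against the edges table; UNION (not UNION ALL) drops duplicates, which
# also stops the recursion on the 1 -> 2 -> 3 -> 1 cycle.
rows = conn.execute("""
    WITH RECURSIVE reachable(id) AS (
        SELECT dst FROM edges WHERE src = 1
        UNION
        SELECT e.dst FROM edges AS e JOIN reachable AS r ON e.src = r.id
    )
    SELECT n.name FROM reachable AS r JOIN nodes AS n ON n.id = r.id
    ORDER BY n.name
""").fetchall()

reachable = [name for (name,) in rows]
print(reachable)  # ['alice', 'bob', 'carol', 'dave']
```

So the representation works; the open question is how the repeated joins behave as the graph and the traversal depth grow.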

rajman187 · on March 6, 2023

The first thing to consider is that a graph cannot be mapped in a maximally consistent way with the underlying hardware (the von Neumann architecture represents data in a sequential manner and it is much faster to access it this way rather than randomly).

With that out of the way, there are generally two families in the graph database world: those which use underlying traditional tables of nodes and many-to-many edges, and those which use index-free adjacency, which just means each node in the graph knows the memory address of its connections (the other side of its edges).

Distributed graphs necessarily end up using the former because it’s difficult if not impossible for a node to know the memory address of its connection when that crosses a physical boundary. So typically index-free adjacency graphs have a master-slave setup with multiple read replicas but a single one to write to.

So with a “native graph” you don’t rely on potentially expensive join operations to find neighbors of neighbors and can traverse complex paths easily.
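The two families can be contrasted structurally in a short Python sketch. All names here (`edge_table`, `edge_index`, `Node`) are illustrative, and Python object references merely stand in for memory addresses. The sketch shows the shape of each approach, not their real performance difference.

```python
from collections import defaultdict

# Family 1: edges live in a table; finding neighbors means an index lookup.
edge_table = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
edge_index = defaultdict(list)  # stands in for a B-tree/hash index on src
for src, dst in edge_table:
    edge_index[src].append(dst)


def neighbors_of_neighbors_table(node_id):
    """Two hops = two rounds of index lookups (the 'join')."""
    return {n2 for n1 in edge_index[node_id] for n2 in edge_index[n1]}


# Family 2: index-free adjacency; each node holds direct references
# to its neighbors instead of ids that must be looked up.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighboring Node objects


nodes = {name: Node(name) for name in "abcd"}
for src, dst in edge_table:
    nodes[src].out.append(nodes[dst])


def neighbors_of_neighbors_native(node):
    """Two hops = following references; no lookup by id along the way."""
    return {n2.name for n1 in node.out for n2 in n1.out}


print(neighbors_of_neighbors_table("a"))          # {'d'}
print(neighbors_of_neighbors_native(nodes["a"]))  # {'d'}
```

Both functions return the same answer. The difference is that the first pays for a lookup per hop, while the second only follows references, which is also why it is hard to spread across machines.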

Here's how Facebook approached the task of scaling a graph representation to mind-boggling heights (spoiler: lots of MySQL servers and a plethora of caches) https://engineering.fb.com/2013/06/25/core-data/tao-the-powe...

markjspivey · on March 6, 2023

With property graphs, as compared to RDF, wouldn't the internals of nodes (properties) not be considered actually a part of the graph itself (or at least not first class)?

dairyleia · on March 6, 2023

Love this!

vpavicic · on March 6, 2023

I am glad you liked it! You can play around with graphs on Memgraph's Playground -> https://playground.memgraph.com/

guilhas · on March 6, 2023

This page crashed.

window.analytics is undefined

Try again

Firefox mobile

wnoise · on March 6, 2023

Firefox desktop too.

taubek · on March 6, 2023

Try to check the archived version at https://web.archive.org/web/20230306165643/https://memgraph....

wnoise · on March 6, 2023

That crashes too, because it still loads the at-fault javascript. But the actual site is now fixed.

katelatte · on March 6, 2023

Thanks for reporting! We will fix it asap