Customizing Redis - Graphs - Part 1

This is my first technical post on the web, so execuse me for my bad writing skills. Well English is not my native language

I have recently come to the conclusion that I won't improve my software skills unless I try to learn and contribute to some open source project. So I didn't hesitate to clone Redis and look at it source code.

Redis is one of my favorite software projects. And that's due to its simplicity, and how easy it can be learned and installed, I have used it in a few projects that I have done before. Either as part of my full time job when I was in Jordan, or for some other fun side projects.

But I Was totally convinced that customizing any open source project will be way better than just understanding it, so I decided to pull up my sleeves and add a new data structure to Redis. Say hi to Redis Graphs!

My decision was based on my understanding of how important the role Graphs can play in many projects, and apparently many big projects use Redis as a memory store for graphs as well.

And BTW, just a few days ago Google released an open-source database for Graphs, check the Reddit link that I posted on reddit

It's a long way to go, if I want Graphs to be part of Redis core. There are still many things to do, and I don't think it will be easy. But again, it was so much fun for me to understand how Redis really works

I will start my topic with talking about Redis Graphs API

gvertex

It's used to add a new vertex to an existing, or a new, graph; it takes two paramters, the graph name, and the vertex name. Both denoted by strings.
gvertex graph_name vertex_name
Example:
> gvertex graph1 v1
1
> gvertex graph1 v2
1
> gvertex graph1 vv3
1
If the graph doesn't already exist, it will be created and added to Redis DB. The current version of Redis/Graphs assumes that this key doesn't already exist with data type other than graphs. I will add the type checking later, it should be easy.

gedge

As you might expect, this command adds an edge between two existing vertices in the graph, with a value of the edge between them. Btw, all the graphs right now are undirected. Example:
> gedge graph1 v1 v2 1.5
1
> gedge graph1 v2 v3 2.5
1

gneighbours

This command returns the neighbours of some vertex in the graph. Example:
> gneighbours graph1 v2
"v1"
"v3"
> gneighbours graph1 v1
"v2"

And a bunch of other commands that I will write a documentation for later are available as well:

gvertices
gedges
gedgeincrby
gedgeexists

What's so special about this, and why I'm interested

I just like the idea of some software that puts a specific functionality in a box, and provides it as a service for other programs running on the same network. Because usually these programs just do the thing right, and fast and the developer just doesn't have to care about how to reinvent the wheel. And Redis is for sure is one of these. Redis protocol is flexible, clear and simple. Some commands return integers, some return strings and some return lists. And all the client just understand this protocol. For example I just tested a Redis Ruby client with the graphs, and it's working as expected without a single character changed in the Redis Client :) .. Isn't that awesome !? .. well yes it's ! I can now use Graphs in PHP, Ruby or any other language, as long as it has a Redis client implemented

I wasn't talking about graphs in particular, as much as how Redis protocol is really good and extensible

By implementing my own idea on Redis, I'm standing on the shoulders of giants. I didn't have to implement my own simple data structures, like hashs, lists or whatever needed by Graphs or any graph algorithm. For example, in the internal Graph datastructure, in Redis/Graph each graph vertex has a list, which includes all the edges that belong to that vertex. So basically to access the neighbours of some graph vertex ( Adjaceny list ). That's how the graphs are basically implemented. But if you read the code you might still find another list inside the graph that lists all the edges as well, but so far it's implemented using some simple list datastructure that I wrote, but I will replace soon to Redis lists

You might say: "Meeh, ok !!". But let me prove my point here. To make the thing more interesting and challenging, I decided to implement a command shortestpath that finds the shortest path between two vertices in the graph using Dijkstra Algorithm.

In order for Dijkstra to work fast, a priority queue is needed at some point to retrieve the graph vertex which has a minimum value (distance). Implementing a priority queue data structure might be really an exhausting job in C, but hey, Redis already has sorted sets. Redis sorted sets are implemented using Skip Lists. I still have to read more about whether they achieve the same performance as Fibonacci Heap for Priority queues, but they are very suitable for this task, because the elements are already sorted, and it takes O(1) to extract the minimum element, which is super for Dijkstra.

The current shortestpath graph v1 v2 command assumes that all the edges have positive float value, and that there is a path between the two vertices. Otherwise you might unexpected results, but still, this will be fixed soon

One other thing in Dijkstra's algorithm: In order to build the shortest path that has been taken between the two vertices, a hash has to be used to map each vertex to its parent (see the previous wikipedia page). And for this task, I used Redis Hashes. Using this hash, I can traverse the path from v2 up to v1 to build up the result that has to be sent back to the client with the shortest path

It's still a long way to go, but I wanted to share my ideas, to get constructive feedback from the community. I learned a lot in the last 3 weeks, and the fun hasn't begun yet ;)

Any documentation?

Redis / Graphs documentation (under progress)

Where is the code?

Check this link Redis / Graphs Source code on Github