From Traversing With Gremlin To A Python/JSON Tree

Dropping you guys a small note about a problem I've been working on. I figure this will help someone in the future.

Let's say you have this graph, and would like to traverse it for displaying the data in a tree structure. This is kind of generic, and a lot of use-cases applies. What caught me up was the conversion from the graph query to recursing the returned list of nested dictionaries in Python. Since none of these cases are ever similar, this one is for the returned key-value format that Rexster will present you with.

Use the following query as a start:

def paths = g.v(id).out().loop(1){it.loops<5}.path{it.id}.tree.cap
return [g.v(id),paths]

The above will leave you with a nested structure like this (for instance):

[
  {1910860825782910978:
    {
    '_key': {'_id': '1910860825782910978'},
    '_value': {7115872404076756996:
            {
            '_key': {'_id': '7115872404076756996'}
            '_value': {7115872404076756994:
                '_key': {'_id': '7115872404076756994'}
                '_value': {'[1910860825782910978, 7115872404076756996, 7115872404076756994]': {}
            },

        }, '[1910860825782910978, 7115872404076756996]': {}
                },

        }
        },
    }
  }
]

An standard query (drop .tree.cap in the query) on the other hand will leave you with a flat structure, like:

==>[8335807037862576130, 7115872404076756994]
==>[8335807037862576130, 1910860825782910978]
==>[8335807037862576130, 6416374668537626626]
==>[8335807037862576130, 6416374668537626626, 7115872404076756996]

You see the difference I guess, if you don't you will notice the complexity when you get started merging the set of lists.

The whole problem is a lot like what you will get into if attempting to use asynchronous calls in Javascript (have a look at my previous Google Maps post for a js example). In this use case I am using the data for a tree structure, and the following function will present us with the nested results from the first query on the format [{A:{children:recursed},B:{children:recursed}}]. The clue being the recursing in the function given below.

def gremlinDict2Json(nodes,parent):
    ret={}
    children=[]
    if '_value' in nodes:
        new_parent=nodes['_key']['_id']
        for val in nodes['_value']:
            r=gremlinDict2json(nodes['_value'][val],new_parent)
            if not len(r)==0:
                children.append(r)

    if '_key' in nodes:
        ret=format_node(nodes['_key'],parent)
        ret['children']=children

    return ret

For initiating gremlinDict2Json I iterated over the paths returned from the query:

main=[]
root_nodes=['1234','5678']
for root_node in root_nodes:
    for path in paths:
        for root_node in path:
            main.append(gremlinDict2json(path[root_node],0)) 

The main-list may again be dumped to json to be returned to a client side library:

json.dumps({'selected': 'true','children': main}) 

Hopefully this have been as inspiring to you as I struggled to get it right.

Tommy

Tommy is an analyst and incident handler with more than seven years of experience from the government and private industry. He holds an M.Sc. in Digital Forensics and a B.Tech. in information security