Recently I published a post on using cognitive automation in information security, which, by the way, models an extension of what Olivier Thonnard presented in his TRIAGE framework. The point of my extension was to involve a higher degree of cognitive automation. In this article we will focus on the presentation part. For the occasion I have chosen a use case familiar to most of you: passive DNS. Normally the data would be part of a feature vector in the model, but for simplicity we evaluate the DNS records standalone. Edward has a very handy PDNS tool on GitHub. Let's teach the old dog to howl in a new way!
You will need the following software and libraries to follow along with this article:
- Neo4j, standalone or cluster
- Facebook Tornado Webserver
For me, the question is how best to store, and not least how to query, the data efficiently. When I work with threat infrastructure I often graph it in Maltego, and from that you can probably guess the use case here: why do the same work twice? (Note that the webserver in this article outputs GEXF, the same format used by Gephi.) Sometimes you will be forced to plot some of it manually due to missing coverage in your automated sensor network, but if the data is there, the connection should be made automatically. Neo4j to the rescue, in this small experiment at least.
In order to ingest the PDNS records into Neo4j I wrote a small script that reads the passivedns output from stdout; it can easily be integrated with the one found in my previous Neo4j article. The following code snippet wraps the command line call and inserts the records into the database.
```python
import subprocess
from GraphConn.Connect import Graph

g = Graph()  # connect to the graph database

proc = subprocess.Popen('./passivedns.sh', shell=True,
                        stdout=subprocess.PIPE, bufsize=1)

while proc.poll() is None:
    line = proc.stdout.readline()
    entry = line.strip().split("||")
    if len(entry) == 9:  # a valid record has nine fields
        ts, src, dst, rrClass, q, qType, answer, ttl, count = entry
        q = q[:-1]  # remove the trailing dot from the query
        g.add(q, answer)  # add the node and relationship
```
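For reference, a passivedns log line is "||"-delimited with nine fields, and the snippet below shows how the split in the loop above unpacks one. The sample line and its values are made up for illustration:

```python
# A made-up passivedns log line: timestamp, source IP, destination IP,
# record class, query, query type, answer, TTL and count, "||"-delimited.
sample = "1427217664.123||10.0.0.1||8.8.8.8||IN||example.com.||A||93.184.216.34||3600||1"

fields = sample.split("||")
assert len(fields) == 9  # only lines with all nine fields are ingested

ts, src, dst, rrClass, q, qType, answer, ttl, count = fields
print(q.rstrip("."), "->", answer)  # prints: example.com -> 93.184.216.34
```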
You'll have to unbuffer passivedns before piping it (or else you won't get any output before quitting the application). The easiest way is to create a shell script:
```bash
#!/bin/bash
stdbuf -i0 -o0 -e0 passivedns -L - -l -
```
In this case I customized the Connect library to fit this need.
```python
from neo4jrestclient.client import GraphDatabase
from neo4jrestclient.query import Q

class Graph:
    def __init__(self):
        self.gdb = GraphDatabase("http://localhost:7474/db/data/")

    def find(self, query):
        # case-insensitive lookup on the node's name property
        q = Q("name", iexact=query)
        try:
            return self.gdb.nodes.filter(q)
        except:
            return None

    def add(self, domain, ip):
        # only create nodes that do not already exist
        if not self.find(domain):
            self.gdb.nodes.create(name=domain, type='domain')
        if not self.find(ip):
            self.gdb.nodes.create(name=ip, type='ip')
        n1 = self.gdb.nodes.filter(Q("name", iexact=domain))[0]
        n2 = self.gdb.nodes.filter(Q("name", iexact=ip))[0]
        n1.relationships.create("points to", n2)
```
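The essential logic in add() is deduplication by name: check for an existing node before creating one, then connect the pair. That logic is easy to verify without a running Neo4j. The sketch below mimics it with plain dictionaries; MemGraph is a hypothetical stand-in for illustration, not part of the neo4jrestclient API:

```python
# In-memory stand-in for the Neo4j logic above: nodes are deduplicated
# by name, and each PDNS answer becomes one "points to" edge.
class MemGraph:
    def __init__(self):
        self.nodes = {}     # name -> node type
        self.edges = set()  # (domain, ip) pairs

    def add(self, domain, ip):
        self.nodes.setdefault(domain, 'domain')  # create only if missing
        self.nodes.setdefault(ip, 'ip')
        self.edges.add((domain, ip))

g = MemGraph()
g.add("example.com", "93.184.216.34")
g.add("example.com", "93.184.216.34")  # duplicate record, ignored
print(len(g.nodes), len(g.edges))      # prints: 2 1
```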
I created a small webserver to present the data using sigma.js. You can find the complete example on GitHub. To look up a node, e.g. a domain, follow a URL like
http://localhost:8888/google. Remember to populate the database first.
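Since the webserver emits GEXF, here is a minimal sketch of how such output could be generated from (domain, ip) pairs using only the standard library. The function name to_gexf and the sample edge are my own for illustration, not taken from the linked example:

```python
import xml.etree.ElementTree as ET

def to_gexf(edges):
    """Serialize (source, target) pairs as a minimal GEXF 1.2 document."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="directed")
    nodes = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    seen = {}  # node name -> assigned id, so each name appears once
    for i, (src, dst) in enumerate(edges):
        for name in (src, dst):
            if name not in seen:
                seen[name] = str(len(seen))
                ET.SubElement(nodes, "node", id=seen[name], label=name)
        ET.SubElement(edges_el, "edge", id=str(i),
                      source=seen[src], target=seen[dst])
    return ET.tostring(gexf, encoding="unicode")

# hypothetical edge for demonstration
print(to_gexf([("google.com", "172.217.21.142")]))
```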