The Ethernet Packet Headers

I've been wrestling with ingesting pcaps for a month now and ended up testing jpcap, at least. Jpcap is a libpcap-based library for Java, so it fits Hadoop and Flume pretty well. That doesn't save us from parsing the protocols ourselves, though, so it's time to concentrate.

I've got some good, and old, RFCs for this post:

  • The Ethernet Frame - RFC1042 (parts of it anyways)
  • IPv4 specification - RFC791
  • TCP specification - RFC793
  • UDP specification - RFC768

Say you want to ingest data through a Flume source plugin. Then you could do something like this:

if (packet instanceof IPPacket) {
    IPPacket ipPacket = (IPPacket) packet;

    // Concatenate the captured header bytes and the payload into one event body.
    byte[] packetBytes = ByteBuffer
            .allocate(ipPacket.header.length + ipPacket.data.length)
            .put(ipPacket.header)
            .put(ipPacket.data)
            .array();
    Event flumeEvent = EventBuilder.withBody(packetBytes);

    this.source.processEvent(flumeEvent);
}

Now, if you use HDFS as a sink, every stored packet starts with the headers of several protocol layers, one after the other.

00 40 66 c0 00 08 00 0c  29 2f 8e 27 08 00 45 00
00 34 c2 3c 40 00 40 06  1c b5 c0 a8 6d 80 c0 a8
6d 01 00 50 c6 bc ed dd  f1 75 b1 aa 1a 2f 80 10
00 7a 29 d9 00 00 01 01  08 0a 07 ca 33 49 30 3d  
13 0d

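That's an Ethernet header at the beginning there. I stared at it for a long while before I recognised it, but you can take the shortcut by looking at the IP-TCP-FTP example from the JPCAP documentation, reproduced at the end of this post. Notice the 06 at the end of the first half of the second row (the 24th byte of the packet): that's the IP protocol field, and it signals that the next header is a TCP one.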

6d 01 00 50 c6 bc ed dd  f1 75 b1 aa 1a 2f 80 10
00 7a 29 d9 00 00 01 01  08 0a 07 ca 33 49 30 3d  
13 0d
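
To make the offsets concrete, here is a minimal sketch that picks the layers apart by hand. It assumes an Ethernet II frame (14-byte link header, no VLAN tag) carrying IPv4; the class name FrameOffsets and its helpers are made up for this post and are not part of jpcap.

```java
import java.nio.ByteBuffer;

/** Sketch: locate the protocol layers inside a raw Ethernet II frame. */
final class FrameOffsets {
    static final int ETH_HEADER_LEN = 14; // 6 dst MAC + 6 src MAC + 2 EtherType

    /** Converts a whitespace-separated hex dump into bytes. */
    static byte[] fromHex(String dump) {
        String[] tokens = dump.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    /** EtherType sits right after the two MAC addresses (0x0800 means IPv4). */
    static int etherType(byte[] frame) {
        return ByteBuffer.wrap(frame).getShort(12) & 0xFFFF;
    }

    /** IPv4 header length in bytes: low nibble of the first IP byte, counted in 32-bit words. */
    static int ipHeaderLength(byte[] frame) {
        return (frame[ETH_HEADER_LEN] & 0x0F) * 4;
    }

    /** The protocol field is the 10th byte of the IPv4 header (6 means TCP). */
    static int ipProtocol(byte[] frame) {
        return frame[ETH_HEADER_LEN + 9] & 0xFF;
    }

    /** The TCP header starts where the IP header ends. */
    static int tcpOffset(byte[] frame) {
        return ETH_HEADER_LEN + ipHeaderLength(frame);
    }

    static int tcpSourcePort(byte[] frame) {
        return ByteBuffer.wrap(frame).getShort(tcpOffset(frame)) & 0xFFFF;
    }

    static int tcpDestinationPort(byte[] frame) {
        return ByteBuffer.wrap(frame).getShort(tcpOffset(frame) + 2) & 0xFFFF;
    }

    /** TCP header length in bytes: high nibble of byte 12, counted in 32-bit words. */
    static int tcpHeaderLength(byte[] frame) {
        return ((frame[tcpOffset(frame) + 12] & 0xF0) >> 4) * 4;
    }

    public static void main(String[] args) {
        byte[] frame = fromHex(
              "00 40 66 c0 00 08 00 0c 29 2f 8e 27 08 00 45 00 "
            + "00 34 c2 3c 40 00 40 06 1c b5 c0 a8 6d 80 c0 a8 "
            + "6d 01 00 50 c6 bc ed dd f1 75 b1 aa 1a 2f 80 10 "
            + "00 7a 29 d9 00 00 01 01 08 0a 07 ca 33 49 30 3d "
            + "13 0d");
        System.out.printf("ethertype=0x%04x proto=%d tcp@%d %d->%d tcpHeader=%d%n",
                etherType(frame), ipProtocol(frame), tcpOffset(frame),
                tcpSourcePort(frame), tcpDestinationPort(frame), tcpHeaderLength(frame));
    }
}
```

For the packet above this gives EtherType 0x0800, protocol 6, a TCP header starting at byte 34, ports 80 to 50876 and a 32-byte TCP header, so the whole 66-byte packet is headers with no payload.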

From what I've seen, the header may vary in size. I solved that by inserting the length of the header as an int at the beginning of the packet.
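
A rough sketch of that length prefix could look like the following. The exact layout is an assumption on my part here: a 4-byte big-endian int (ByteBuffer's default byte order) followed by the header bytes and then the data. The class LengthPrefixedPacket is hypothetical and only serves the illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Sketch of an event body laid out as [4-byte header length][header][data]. */
final class LengthPrefixedPacket {

    /** Builds the event body; ByteBuffer writes the int big-endian by default. */
    static byte[] encode(byte[] header, byte[] data) {
        return ByteBuffer.allocate(Integer.BYTES + header.length + data.length)
                .putInt(header.length)
                .put(header)
                .put(data)
                .array();
    }

    /** Reads the prefix and returns only the header bytes. */
    static byte[] header(byte[] body) {
        int headerLength = ByteBuffer.wrap(body).getInt();
        return Arrays.copyOfRange(body, Integer.BYTES, Integer.BYTES + headerLength);
    }

    /** Reads the prefix and returns everything after the header. */
    static byte[] data(byte[] body) {
        int headerLength = ByteBuffer.wrap(body).getInt();
        return Arrays.copyOfRange(body, Integer.BYTES + headerLength, body.length);
    }
}
```

On the reading side, for instance in a MapReduce job over the HDFS files, header(body) and data(body) split the record again without any knowledge of the protocols inside.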

The following is copied from the documentation of JPCAP.

Link Level Ethernet Header

00 00 00 00 00 00 MAC destination address
00 00 00 00 00 00 MAC source address
08 00             Ethernet type field (0x800 - ip datagram)
IP HEADER
45                Version and header length (0x4 - IPv4, 0x5 - 5 words, 20 bytes)
00                Type of service
00 3c             Total length minus eth header (0x3c - 60 bytes)
00 58             Identification, unique id of this datagram
40 00             3-bit flags (0x2), 13-bit fragment offset (0x000)
40                Time to live (TTL)
06                Protocol (0x06 - TCP)
3c 62             Header checksum
7f 00 00 01       Source IP address (- 127.0.0.1)
7f 00 00 01       Destination IP address (- 127.0.0.1)
TCP HEADER
04 02             Source port number (0x0402 - 1026)
00 15             Destination port number (0x15 - 21, FTP service)
f0 94 69 f9       Sequence number
00 00 00 00       Acknowledgment number
a                 4-bit header length (0xa - 10 words, 40 bytes)
0 02              6-bits reserved, urg, ack, psh, rst, syn, fin (0x02 - SYN)
79 60             Window size
69 5a             TCP checksum
00 00             Urgent pointer
TCP options, data
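02 04 0f 2c       Maximum segment size (kind 2, len 4, 0x0f2c - 3884)
04 02 08 0a       SACK permitted (kind 4, len 2), timestamp option starts (kind 8, len 10)
00 06 ff 26       Timestamp value
00 00 00 00       Timestamp echo reply
01 03 03 00       No-operation (kind 1), window scale (kind 3, len 3, shift 0)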
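
The option bytes at the end of that listing follow a simple kind/length/value layout: kind 0 ends the list, kind 1 is a one-byte no-operation, and every other option carries a length byte that counts the whole option. Here is a minimal sketch of walking them, assuming you have already cut out the bytes between the fixed 20-byte TCP header and the payload. The class TcpOptions is hypothetical, and it only names the four kinds that show up in this post.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch: walk the kind/length/value options that follow the fixed TCP header. */
final class TcpOptions {

    static List<String> describe(byte[] options) {
        List<String> found = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(options); // big-endian, i.e. network byte order
        int i = 0;
        while (i < options.length) {
            int kind = options[i] & 0xFF;
            if (kind == 0) { found.add("end-of-options"); break; }
            if (kind == 1) { found.add("nop"); i++; continue; }   // single-byte padding
            if (i + 1 >= options.length) break;                   // truncated option
            int length = options[i + 1] & 0xFF;                   // includes kind and length bytes
            if (length < 2 || i + length > options.length) break; // malformed option
            if (kind == 2 && length == 4) {
                found.add("mss=" + (buf.getShort(i + 2) & 0xFFFF));
            } else if (kind == 3 && length == 3) {
                found.add("window-scale=" + (options[i + 2] & 0xFF));
            } else if (kind == 4 && length == 2) {
                found.add("sack-permitted");
            } else if (kind == 8 && length == 10) {
                long value = buf.getInt(i + 2) & 0xFFFFFFFFL;
                long echo = buf.getInt(i + 6) & 0xFFFFFFFFL;
                found.add("timestamp val=" + value + " ecr=" + echo);
            } else {
                found.add("kind=" + kind + " len=" + length);     // anything not handled here
            }
            i += length;
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] options = {
            0x02, 0x04, 0x0f, 0x2c, 0x04, 0x02, 0x08, 0x0a, 0x00, 0x06,
            (byte) 0xff, 0x26, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x00
        };
        System.out.println(describe(options));
    }
}
```

Fed the 20 option bytes from the listing, this reports mss=3884, sack-permitted, timestamp val=458534 ecr=0, nop and window-scale=0.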