Asynchronous packet socket reading with PACKET_RX_RING
2016 update
This post is quite old by now. For a more recent example, take a look at github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c
Since Linux 2.6.2x, processes can read network packets asynchronously using a packet socket ring buffer. When the socket option PACKET_RX_RING (level SOL_PACKET) is set on a packet socket, the kernel allocates a ring buffer to hold packets. It then copies every packet that the caller would otherwise have had to fetch with read() into this ring buffer. The caller maps the ring into its virtual memory by calling mmap() on the packet socket and from then on can read packets without issuing any system calls. It signals the kernel that it has finished processing a packet by setting a status value in a header structure that precedes the packet. Once the caller has processed all outstanding packets, it can block by issuing a select() or poll() on the packet socket.
This snippet shows how to set up a packet socket with a ring buffer:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <arpa/inet.h>
#include <netinet/if_ether.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <linux/if_packet.h>

/// The number of frames in the ring
//  This number is not set in stone. Nor are block_size, block_nr or frame_size
#define CONF_RING_FRAMES 128

/// Offset of data from start of frame
#define PKT_OFFSET (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + \
                    TPACKET_ALIGN(sizeof(struct sockaddr_ll)))

/// (unimportant) macro for loud failure
//  The message is passed as an argument to "%s" so that it is never
//  interpreted as a format string.
#define RETURN_ERROR(lvl, msg) \
  do { \
    fprintf(stderr, "%s", msg); \
    return lvl; \
  } while(0)

/// Initialize a packet socket ring buffer
//  @param ringtype is one of PACKET_RX_RING or PACKET_TX_RING
static char *
init_packetsock_ring(int fd, int ringtype)
{
  struct tpacket_req tp;
  char *ring;

  // tell kernel to export data through mmap()ped ring
  tp.tp_block_size = CONF_RING_FRAMES * getpagesize();
  tp.tp_block_nr = 1;
  tp.tp_frame_size = getpagesize();
  tp.tp_frame_nr = CONF_RING_FRAMES;
  if (setsockopt(fd, SOL_PACKET, ringtype, (void*) &tp, sizeof(tp)))
    RETURN_ERROR(NULL, "setsockopt() ring\n");

  // open ring; mmap() reports failure as MAP_FAILED, not NULL
  ring = mmap(0, tp.tp_block_size * tp.tp_block_nr,
              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (ring == MAP_FAILED)
    RETURN_ERROR(NULL, "mmap()\n");

  return ring;
}

/// Create a packet socket. If param ring is not NULL, the buffer is mapped
//  @param ring will, if set, point to the mapped ring on return
//  @return the socket fd
static int
init_packetsock(char **ring, int ringtype)
{
  int fd;

  // open packet socket
  fd = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
  if (fd < 0)
    RETURN_ERROR(-1, "Root privileges are required\nsocket() rx.\n");

  if (ring) {
    *ring = init_packetsock_ring(fd, ringtype);

    if (!*ring) {
      close(fd);
      return -1;
    }
  }

  return fd;
}

static int
exit_packetsock(int fd, char *ring)
{
  // unmap the same number of bytes that init_packetsock_ring() mapped
  if (munmap(ring, CONF_RING_FRAMES * getpagesize())) {
    perror("munmap");
    return 1;
  }

  if (close(fd)) {
    perror("close");
    return 1;
  }

  return 0;
}

/// Example application that opens a packet socket with rx_ring
int
init_main(int argc, char **argv)
{
  char *ring;
  int fd;

  fd = init_packetsock(&ring, PACKET_RX_RING);
  if (fd < 0)
    return 1;

  // TODO: add processing. See next snippet.

  if (exit_packetsock(fd, ring))
    return 1;

  return 0;
}

This snippet shows how to process packets at runtime using the packet ring. The first function reads a single packet from the ring; the second updates the header in the ring to release the frame back to the kernel.
static int rxring_offset;

/// Blocking read, returns a single packet (from packet ring)
static void *
process_rx(const int fd, char *rx_ring)
{
  struct tpacket_hdr *header;
  struct pollfd pollset;
  int ret;

  // fetch a frame
  header = (void *) (rx_ring + (rxring_offset * getpagesize()));
  assert((((unsigned long) header) & (getpagesize() - 1)) == 0);

  // TP_STATUS_USER means that the process owns the packet.
  // When a slot does not have this flag set, the frame is not
  // ready for consumption.
  while (!(header->tp_status & TP_STATUS_USER)) {

    // if none available: wait on more data
    pollset.fd = fd;
    pollset.events = POLLIN;
    pollset.revents = 0;
    ret = poll(&pollset, 1, -1 /* negative means infinite */);
    if (ret < 0) {
      if (errno != EINTR)
        RETURN_ERROR(NULL, "poll()\n");
      return NULL;
    }
  }

  // check data
  if (header->tp_status & TP_STATUS_COPY)
    RETURN_ERROR(NULL, "skipped: incomplete packet\n");
  if (header->tp_status & TP_STATUS_LOSING)
    fprintf(stderr, "dropped packets detected\n");

  // return encapsulated packet
  return ((char *) header) + PKT_OFFSET;
}

// Release the slot back to the kernel
static void
process_rx_release(char *rx_ring)
{
  struct tpacket_hdr *header;

  // clear status to grant to kernel
  header = (void *) (rx_ring + (rxring_offset * getpagesize()));
  header->tp_status = 0;

  // update consumer pointer (the mask requires CONF_RING_FRAMES
  // to be a power of two)
  rxring_offset = (rxring_offset + 1) & (CONF_RING_FRAMES - 1);
}

This code was copied from a project that required two separate functions. In most cases, you want to read, process and release a frame in a single loop. I'm not particularly proud of using a global variable for the current ring offset.

Download the complete source code as packet-rx-ring.c
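The single-loop variant mentioned above could look roughly like the sketch below. It is not part of the original project: the names rx_ring_state, rx_handler, rx_ring_drain, frame_payload and count_bytes are hypothetical, and the code assumes the TPACKET_V1 frame layout (struct tpacket_hdr) used in the snippets. The ring position lives in a struct rather than in a global variable, and the payload is located through the tp_net offset that the kernel records in each frame header instead of a fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical helper type: keeps the consumer position next to the ring
 * instead of in a file-scope variable. */
struct rx_ring_state {
    char *ring;            /* start of the mmap()ed ring */
    unsigned int frame_nr; /* number of frames in the ring */
    size_t frame_size;     /* bytes per frame (tp_frame_size) */
    unsigned int next;     /* index of the next frame to inspect */
};

/* Hypothetical callback type: receives one frame header plus user data. */
typedef void (*rx_handler)(const struct tpacket_hdr *hdr, void *arg);

/* tp_net is the offset of the network-layer header from the frame start. */
static const unsigned char *frame_payload(const struct tpacket_hdr *hdr)
{
    return (const unsigned char *) hdr + hdr->tp_net;
}

/* Read, process and release every frame that user space currently owns.
 * Visits at most one full lap of the ring and returns the number of
 * frames handled. When it returns 0, poll() on the socket before retrying. */
static unsigned int
rx_ring_drain(struct rx_ring_state *st, rx_handler fn, void *arg)
{
    unsigned int handled = 0;

    assert(st->frame_nr > 0);

    while (handled < st->frame_nr) {
        struct tpacket_hdr *hdr = (struct tpacket_hdr *)
            (st->ring + (size_t) st->next * st->frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;                          /* kernel still owns this frame */

        fn(hdr, arg);                       /* read and process */
        hdr->tp_status = TP_STATUS_KERNEL;  /* release back to the kernel */
        st->next = (st->next + 1) % st->frame_nr;
        handled++;
    }
    return handled;
}

/* Example handler: adds the on-wire length of each packet to a counter. */
static void count_bytes(const struct tpacket_hdr *hdr, void *arg)
{
    *(unsigned long *) arg += hdr->tp_len;
}
```

A caller would fill an rx_ring_state with the pointer returned by init_packetsock(), CONF_RING_FRAMES and getpagesize(), then alternate between rx_ring_drain() and poll(). Using a modulo instead of a bit mask means the frame count does not have to be a power of two.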
Hi Willem,
Thanks for the great example! It got me jump-started on fast raw socket programming.
I ran into the following weird result: I have two NICs that are looped back, eth2 and eth3.
When I used pkt-gen to send packets on eth2 and packet-rx-ring to receive them, it seems that every packet sent by pkt-gen is received twice by packet-rx-ring: once on eth2 and once on eth3.
Thanks
Jim
That's right, Jim. Packet sockets intercept both incoming and outgoing packets.
How can I avoid that, i.e., outgoing packets being caught in the rx_ring buffer?
I am using one tx_ring and one rx_ring. I send a packet, {0, 0, 0, 0, 0, 0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0}, through TX. Then a "PACKET_OUTGOING" packet is caught in the rx_ring.
Is there a socket option to avoid this?
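One way to handle this in user space, sketched here under the assumption of the TPACKET_V1 frame layout used in the post, is to inspect the struct sockaddr_ll that the kernel stores after the aligned frame header and to skip frames whose sll_pkttype is PACKET_OUTGOING. The helper names frame_sockaddr and frame_is_outgoing are hypothetical. Newer kernels also document a PACKET_IGNORE_OUTGOING socket option in packet(7); whether it is available depends on the kernel and headers in use, so it is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Locate the link-layer address that follows the frame header.
 * Assumes TPACKET_V1: struct tpacket_hdr, padded to TPACKET_ALIGNMENT,
 * then struct sockaddr_ll. */
static const struct sockaddr_ll *
frame_sockaddr(const struct tpacket_hdr *hdr)
{
    assert(hdr != NULL);
    return (const struct sockaddr_ll *)
        ((const char *) hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr)));
}

/* Return 1 when the frame holds a packet that this host sent itself. */
static int frame_is_outgoing(const struct tpacket_hdr *hdr)
{
    return frame_sockaddr(hdr)->sll_pkttype == PACKET_OUTGOING;
}
```

In the receive path, a frame for which frame_is_outgoing() returns 1 would simply be released back to the kernel without further processing.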
Hi guys, I have set TPACKET_V2, but when I receive a packet that has a VLAN ID set, the VLAN ID is always 0! Could you help me?
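For reference, a minimal sketch of how the VLAN tag is usually read with TPACKET_V2. It cannot tell why the value above comes out as 0; it only shows the pieces that have to line up. The version is selected with PACKET_VERSION before the ring is created, and frames then start with struct tpacket2_hdr rather than struct tpacket_hdr. The helper names use_tpacket_v2 and frame_vlan_id are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Switch the socket to TPACKET_V2. This has to happen before the
 * PACKET_RX_RING setsockopt() call. Returns 0 on success, -1 on error. */
static int use_tpacket_v2(int fd)
{
    int version = TPACKET_V2;

    return setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                      &version, sizeof(version));
}

/* Return the 12-bit VLAN ID of a V2 frame, or -1 if no tag is reported. */
static int frame_vlan_id(const struct tpacket2_hdr *hdr)
{
    assert(hdr != NULL);
#ifdef TP_STATUS_VLAN_VALID
    /* The TCI is meaningful if the flag is set or the TCI is nonzero. */
    if (!(hdr->tp_status & TP_STATUS_VLAN_VALID) && hdr->tp_vlan_tci == 0)
        return -1;
#else
    if (hdr->tp_vlan_tci == 0)
        return -1;
#endif
    return hdr->tp_vlan_tci & 0x0fff;  /* low 12 bits hold the VLAN ID */
}
```

Reading a V2 ring through struct tpacket_hdr, or selecting the version after the ring has been set up, are two mistakes worth ruling out first.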
Hi Willem,
I guess you're not reading these replies anymore, but the link to the complete source code now redirects to some travel page. Any chance of an updated link?
Thanks for the heads-up. Apparently I was not being notified, indeed. I believe I solved it.
Found the source code here:
https://github.com/vieites4/rawsockets/blob/master/docs/snippets/packet-rx-ring.c