How to increase Buffer Size

Some example articles

  • UDP Tuning (es.net)
  • Network Performance Tuning
  • Tuning the network performance

Receive socket memory and write socket memory

View values

The default and maximum amount for the receive socket memory:

Bash
# View receive socket memory values
root $> cat /proc/sys/net/core/rmem_default
root $> cat /proc/sys/net/core/rmem_max

# View write socket memory values
root $> cat /proc/sys/net/core/wmem_default
root $> cat /proc/sys/net/core/wmem_max

# View optional memory buffers
root $> cat /proc/sys/net/core/optmem_max

# Or
root $> sysctl -a | grep -E 'rmem|wmem'

Tune buffer values

Bash
# Increase receive buffers to 40MB (can be any value)
root $> sysctl -w net.core.rmem_max=40000000
root $> sysctl -w net.core.rmem_default=40000000

# Increase write buffers to 40MB (can be any value)
root $> sysctl -w net.core.wmem_max=40000000
root $> sysctl -w net.core.wmem_default=40000000

# Increase the length of the processor input queue
root $> sysctl -w net.core.netdev_max_backlog=65536

In order to make these values persist across reboots, add the above lines to /etc/sysctl.conf:
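As a minimal sketch of that step, the function below writes the values used above to a file in sysctl.conf format. The function name `write_udp_sysctl_conf` and the drop-in file name in the usage comment are hypothetical; pointing it at /etc/sysctl.conf would overwrite that file, so a separate file is assumed here.

```sh
# Sketch: write the buffer settings from this section in sysctl.conf format.
# The function name is hypothetical. $1 is the file to (over)write.
write_udp_sysctl_conf() {
    cat > "$1" <<'EOF'
# Receive buffers (bytes)
net.core.rmem_max = 40000000
net.core.rmem_default = 40000000
# Write buffers (bytes)
net.core.wmem_max = 40000000
net.core.wmem_default = 40000000
# Input queue length (packets)
net.core.netdev_max_backlog = 65536
EOF
}

# Usage (as root; the file name is an assumption):
#   write_udp_sysctl_conf /etc/sysctl.d/90-udp-buffers.conf
#   sysctl -p /etc/sysctl.d/90-udp-buffers.conf
```

Running `sysctl -p` with the file name loads the settings immediately, so a reboot is not needed to check them.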

Tune the NIC

The transmit queue length (txqueuelen) is a network interface setting that caps the number of packets allowed in the kernel transmit queue of a network interface device. txqueuelen usually defaults to 1000.

Bash
# Increase txqueuelen for 10G NIC
root $> ifconfig eth{N} txqueuelen 10000
# Or
root $> ip link set dev eth{N} txqueuelen 10000

UDP Tuning

Bash
# Increase UDP minimum read/write memory (128 KiB as an example)
root $> sysctl -w net.ipv4.udp_rmem_min=131072
root $> sysctl -w net.ipv4.udp_wmem_min=131072

MTU Tuning

Bash
# Find min and max mtu for device
$> ip -d link list

# Increase MTU to 9k for upstream
$> ifconfig eth0 mtu 9000

# Or
$> ip link set dev eth0 mtu 9000
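
One way to check that a larger MTU works end to end is to send a ping that is not allowed to fragment. A minimal sketch, assuming IPv4 without IP options (20-byte IP header plus 8-byte ICMP header) and the iputils `ping`; the helper name and `$REMOTE_HOST` are hypothetical placeholders:

```sh
# Largest ICMP echo payload that fits in a single IPv4 packet of the given MTU:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage: -M do sets the Don't Fragment bit, so the ping fails if any hop
# on the path cannot carry the full-size packet.
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" "$REMOTE_HOST"
```

For an MTU of 9000 the payload size works out to 8972 bytes.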

Monitoring: UDP Protocol Layer Statistics

/proc/net/snmp

Bash
# Monitor detailed UDP protocol statistics by reading `/proc/net/snmp`
$> cat /proc/net/snmp | grep Udp\:
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 16314 0 0 17161 0 0

# Watch UDP statistics in real-time
$> watch -n1 "cat /proc/net/snmp | grep -w Udp|column -t"

In order to understand precisely where these statistics are incremented, you will need to carefully read the kernel source. There are a few cases where some errors are counted in more than one statistic.

  • InDatagrams: Incremented when recvmsg was used by a userland program to read a datagram. Also incremented when a UDP packet is encapsulated and sent back for processing.
  • NoPorts: Incremented when UDP packets arrive destined for a port where no program is listening.
  • InErrors: Incremented in several cases: no memory in the receive queue, when a bad checksum is seen, and if sk_add_backlog fails to add the datagram.
  • OutDatagrams: Incremented when a UDP packet is handed down without error to the IP protocol layer to be sent.
  • RcvbufErrors: Incremented when sock_queue_rcv_skb reports that no memory is available; this happens if sk->sk_rmem_alloc is greater than or equal to sk->sk_rcvbuf.
  • SndbufErrors: Incremented if the IP protocol layer reported an error when trying to send the packet and no error queue has been set up. Also incremented if no send queue space or kernel memory is available.
  • InCsumErrors: Incremented when a UDP checksum failure is detected. Note that in all cases I could find, InCsumErrors is incremented at the same time as InErrors. Thus, InErrors - InCsumErrors should yield the count of memory-related errors on the receive side.

Note that some errors discovered by the UDP protocol layer are reported in the statistics files for other protocol layers. One example of this: routing errors. A routing error discovered by udp_sendmsg will cause an increment to the IP protocol layer's OutNoRoutes statistic.
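
The InErrors - InCsumErrors subtraction described in the list above can be sketched with awk. This is only an illustration: the function name `udp_mem_errors` is hypothetical, columns are looked up by the names in the header row, and a column that the running kernel does not report is treated as 0.

```sh
# Print InErrors - InCsumErrors from the Udp rows of an snmp-format file.
# $1: file to read (defaults to /proc/net/snmp). Function name is hypothetical.
udp_mem_errors() {
    awk '
        $1 != "Udp:" { next }              # ignore Ip:, Tcp:, UdpLite:, ...
        !seen_header {
            # The first Udp: row holds column names; remember their positions.
            for (i = 2; i <= NF; i++) col[$i] = i
            seen_header = 1
            next
        }
        {
            # The second Udp: row holds the counters.
            in_err = ("InErrors" in col) ? $(col["InErrors"]) : 0
            csum = ("InCsumErrors" in col) ? $(col["InCsumErrors"]) : 0
            print in_err - csum
        }
    ' "${1:-/proc/net/snmp}"
}

# Usage:
#   udp_mem_errors
```

Looking columns up by name rather than by position keeps the sketch working when a kernel adds or omits counters.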

/proc/net/udp

Bash
# Monitor UDP socket statistics by reading `/proc/net/udp`
$> cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
515: 00000000:B346 00000000:0000 07 00000000:00000000 00:00000000 00000000 104 0 7518 2 0000000000000000 0
558: 00000000:0371 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 7408 2 0000000000000000 0
588: 0100007F:038F 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 7511 2 0000000000000000 0
769: 00000000:0044 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 7673 2 0000000000000000 0
812: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 7407 2 0000000000000000 0

The first line describes each of the fields in the lines following:

  • sl: Kernel hash slot for the socket
  • local_address: Hexadecimal local address of the socket and port number, separated by :.
  • rem_address: Hexadecimal remote address of the socket and port number, separated by :.
  • st: The state of the socket. Oddly enough, the UDP protocol layer seems to use some TCP socket states. In the example above, 7 is TCP_CLOSE.
  • tx_queue: The amount of memory allocated in the kernel for outgoing UDP datagrams.
  • rx_queue: The amount of memory allocated in the kernel for incoming UDP datagrams.
  • tr, tm->when, retrnsmt: These fields are unused by the UDP protocol layer.
  • uid: The effective user id of the user who created this socket.
  • timeout: Unused by the UDP protocol layer.
  • inode: The inode number corresponding to this socket. You can use this to help you determine which user process has this socket open. Check /proc/[pid]/fd, which will contain symlinks of the form socket:[inode].
  • ref: The current reference count for the socket.
  • pointer: The memory address in the kernel of the struct sock.
  • drops: The number of datagram drops associated with this socket. Note that this does not include any drops related to sending datagrams (on corked UDP sockets or otherwise); this is only incremented in receive paths as of the kernel version examined by this blog post.
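
Because the port is stored in hexadecimal, the raw file is awkward to read. A minimal sketch that decodes the local port and prints it next to the inode and drop counter is given here; the function name `udp_socket_drops` is hypothetical, and the field positions follow the sample output above, where tx_queue:rx_queue and tr:tm->when each appear as a single token in the data rows.

```sh
# Decode local port, inode, and drops from a /proc/net/udp-format file.
# $1: file to read (defaults to /proc/net/udp). Function name is hypothetical.
udp_socket_drops() {
    # Skip the header row, then split each data row on whitespace.
    tail -n +2 "${1:-/proc/net/udp}" |
    while read -r sl local rem st queues timer retr uid timeout inode ref ptr drops; do
        # local looks like 00000000:B346; the part after the last ":" is the hex port.
        port=$(( 0x${local##*:} ))
        echo "port=$port inode=$inode drops=$drops"
    done
}

# Usage:
#   udp_socket_drops
# To look for the process holding inode 7518 (assumes GNU find for -lname):
#   find /proc/[0-9]*/fd -lname 'socket:\[7518\]' 2>/dev/null
```

A non-zero drops value for a port identifies a socket whose receive path is losing datagrams, which is a natural candidate for a larger receive buffer.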

Myricom 10Gig NIC Tuning Tips for Linux

The Myricom NIC provides a number of tuning knobs. In particular, setting interrupt coalescing can help throughput a great deal:

/usr/sbin/ethtool -C ethN rx-usecs 75

For more information on the tradeoffs between rx/tx descriptors, interrupt coalescence, and L1 CPU cache size, see this article: http://patchwork.ozlabs.org/patch/348793/

For more information

Recommended:

README.myri10ge-linux

Myri10GE FAQ 

Myricom Performance Tuning Guide

Myri10GE ethtool output documentation

In particular, we've seen very dramatic improvements in firewalled environments using the Myricom "Throttle" option. We have also seen up to 25% improvement using the latest driver downloaded from Myricom and compiled from source instead of the default driver from the Red Hat/CentOS release.

CentOS 7 Issues

It has been reported that CentOS 7.1 does not enable Myricom's MSI-X by default, which decreases performance. To fix this, do the following:

Create a file /etc/modprobe.d/myri10ge.conf containing:

options myri10ge myri10ge_fw_name=myri10ge_rss_eth_z8e.dat myri10ge_gro=0 myri10ge_max_slices=-1

Answer to question on Stack Overflow

Overview

What is causing the inability to send/receive data locally?

Mostly buffer space. Imagine sending a constant 10MB/second while only able to consume 5MB/second. The operating system and network stack can't keep up, so packets are dropped. (This differs from TCP, which provides flow control and re-transmission to handle such a situation.)

Even when data is consumed without overflowing buffers, there might be small time slices where data cannot be consumed, so the system will drop packets. (Such as during garbage collection, or when the OS task switches to a higher-priority process momentarily, and so forth.)

This applies to all devices in the network stack. A non-local network, an Ethernet switch, router, hub, and other hardware will also drop packets when queues are full. Sending a 10MB/s stream through a 100MB/s Ethernet switch while someone else tries to cram 100MB/s through the same physical line will cause dropped packets.

To mitigate this, increase both the application's socket buffer size and the operating system's socket buffer size.

Linux

The default socket buffer size is typically 128k or less, which leaves very little room for pausing the data processing.
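
To put a number on "very little room", the time a full buffer covers at a given data rate can be estimated with simple arithmetic. This is a rough sketch: the function name is hypothetical, the 10 MB/s rate is taken from the example above, and per-packet kernel overhead is ignored, so the real headroom is smaller.

```sh
# Milliseconds of incoming data a buffer can hold at a given data rate.
# $1: buffer size in bytes, $2: data rate in bytes per second.
buffer_headroom_ms() {
    echo $(( $1 * 1000 / $2 ))
}

buffer_headroom_ms 131072 10000000     # 128 KiB buffer at 10 MB/s: prints 13
buffer_headroom_ms 8388608 10000000    # 8 MiB buffer at 10 MB/s: prints 838
```

With a 128 KiB buffer, a pause of roughly 13 ms in the consumer is enough to start dropping datagrams; an 8 MiB buffer stretches that to most of a second.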

sysctl

Use sysctl to increase the transmit (write memory [wmem]) and receive (read memory [rmem]) buffers:

  • net.core.wmem_max
  • net.core.wmem_default
  • net.core.rmem_max
  • net.core.rmem_default

For example, to bump the value to 8 megabytes:

Bash
sysctl -w net.core.rmem_max=8388608

To make the setting persist, update /etc/sysctl.conf as well, such as:

Bash
net.core.rmem_max=8388608

An in-depth article on tuning the network stack dives into far more details, touching on multiple levels of how packets are received and processed in Linux from the kernel's network driver through ring buffers all the way to C's recv call. The article describes additional settings and files to monitor when diagnosing network issues. (See below.)

Before making any of the following tweaks, be sure to understand how they affect the network stack. There is a real possibility of rendering your network unusable. Choose numbers appropriate for your system, network configuration, and expected traffic load:

  • net.core.rmem_max=8388608
  • net.core.rmem_default=8388608
  • net.core.wmem_max=8388608
  • net.core.wmem_default=8388608
  • net.ipv4.udp_mem='262144 327680 434274'
  • net.ipv4.udp_rmem_min=16384
  • net.ipv4.udp_wmem_min=16384
  • net.core.netdev_budget=600
  • net.ipv4.ip_early_demux=0
  • net.core.netdev_max_backlog=3000
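
Given the warning above, it can help to record the current values before changing anything. A minimal sketch, assuming the settings are readable under /proc/sys; the function name `snapshot_sysctls` and the output file name in the usage comment are hypothetical:

```sh
# Print "key = value" for each dotted sysctl key, read from a /proc/sys-style tree.
# $1: root directory (normally /proc/sys); remaining arguments: key names.
snapshot_sysctls() {
    root=$1
    shift
    for key in "$@"; do
        # net.core.rmem_max -> <root>/net/core/rmem_max
        path="$root/$(echo "$key" | tr . /)"
        if [ -r "$path" ]; then
            # Multi-value entries such as udp_mem are tab-separated in /proc.
            echo "$key = $(tr -s '\t' ' ' < "$path")"
        fi
    done
}

# Usage (the output file name is an assumption):
#   snapshot_sysctls /proc/sys net.core.rmem_max net.core.wmem_max \
#       net.ipv4.udp_mem net.core.netdev_max_backlog > sysctl-before.conf
# To roll back later (as root):
#   sysctl -p sysctl-before.conf
```

The output uses the same key = value format as /etc/sysctl.conf, which is why it can be fed back to `sysctl -p`.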

ethtool

Additionally, ethtool is useful to query or change network settings. For example, if ${DEVICE} is eth0 (use ip address or ifconfig to determine your network device name), then it may be possible to increase the RX and TX buffers using:

  • ethtool -G ${DEVICE} rx 4096
  • ethtool -G ${DEVICE} tx 4096

iptables

When connection tracking (conntrack) is active, netfilter records state for every packet flow, which consumes CPU time, albeit minimal. For example, you can exempt UDP packets on port 6004 from connection tracking using:

Bash
iptables -t raw -I PREROUTING 1 -p udp --dport 6004 -j NOTRACK
iptables -I INPUT 1 -p udp --dport 6004 -j ACCEPT

Your particular port and protocol will vary.

Monitoring

Several files contain information about what is happening to network packets at various stages of sending and receiving. In the following list ${IRQ} is the interrupt request number and ${DEVICE} is the network device:

  • /proc/cpuinfo - shows number of CPUs available (helpful for IRQ-balancing)
  • /proc/irq/${IRQ}/smp_affinity - shows IRQ affinity
  • /proc/net/dev - contains general packet statistics
  • /sys/class/net/${DEVICE}/queues/QUEUE/rps_cpus - relates to Receive Packet Steering (RPS)
  • /proc/softirqs - shows per-CPU softirq counts, including NET_RX and NET_TX
  • /proc/net/softnet_stat - for packet statistics, such as drops, time squeezes, CPU collisions, etc.
  • /proc/sys/net/core/flow_limit_cpu_bitmap - bitmap of CPUs with flow limits enabled (flow limits help keep large flows from crowding out small ones)
  • /proc/net/snmp
  • /proc/net/udp
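
Of the files listed above, /proc/net/softnet_stat is the hardest to read because its counters are printed in hexadecimal with no header. A minimal sketch that converts the first three columns (packets processed, packets dropped, time squeezes) to decimal follows; the function name is hypothetical, and the row order is assumed to match CPU numbering.

```sh
# Summarise a softnet_stat-format file, one output line per input row.
# $1: file to read (defaults to /proc/net/softnet_stat).
softnet_summary() {
    cpu=0
    while read -r processed dropped squeezed rest; do
        # Each column is a hex counter without a 0x prefix.
        echo "cpu=$cpu processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        cpu=$((cpu + 1))
    done < "${1:-/proc/net/softnet_stat}"
}

# Usage:
#   softnet_summary
```

A growing dropped count suggests net.core.netdev_max_backlog is too small for the traffic, while a growing time_squeeze count suggests the net.core.netdev_budget setting is being exhausted.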

Summary

Buffer space is the most likely culprit for dropped packets. There are numerous buffers strewn throughout the network stack, each having its own impact on sending and receiving packets. Network drivers, operating systems, kernel settings, and other factors can affect packet drops. There is no silver bullet.

Further Reading

Network testing tools