Use HAProxy to load balance 300k concurrent TCP socket connections: Port Exhaustion, Keep-alive and others

I’ve been building a push system recently. To increase the scalability of the system, the best practice is to make each connection as stateless as possible, so that when a bottleneck appears, the capacity of the whole system can be expanded simply by adding more machines. Speaking of load balancing and reverse proxying, Nginx is probably the most famous and widely acknowledged option. However, its TCP proxying is a rather recent addition: Nginx introduced TCP load balancing and reverse proxying in v1.9, released in late May this year, and it is still missing a lot of features. HAProxy, on the other hand, as the pioneer of TCP load balancing, is mature and stable. I chose HAProxy to build the system and eventually reached 300k concurrent TCP socket connections. I could have achieved a higher number if it were not for my rather outdated client PC.

Step 1. Tuning the Linux system

300k concurrent connections is no easy job, even for a high-end server. To begin with, we need to tune the Linux kernel configuration to make the most of our machine.

File Descriptors

Since sockets are treated as files from the system’s perspective, the default file descriptor limit is far too small for our 300k target. Modify /etc/sysctl.conf to add the following lines:

fs.file-max = 10000000 
fs.nr_open = 10000000

These lines raise the system-wide file descriptor limit to 10 million, comfortably above our 300k target.
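These kernel settings can be applied without a reboot, and a quick check never hurts (standard sysctl usage, nothing specific to this setup):

sudo sysctl -p
cat /proc/sys/fs/file-max    # should now print 10000000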
Next, modify /etc/security/limits.conf to add the following lines:

* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000

If you run HAProxy as a non-root user, the first two lines should do the job. However, if you run HAProxy as root, you need to declare the limit for root explicitly, because the * wildcard in limits.conf does not apply to the root user.
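After logging in again you can verify that the new limit is in effect, and you can also inspect a running HAProxy process directly (assuming a single haproxy process):

ulimit -n
grep "Max open files" /proc/$(pidof haproxy)/limits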

TCP Buffer

Holding such a huge number of connections costs a lot of memory. To reduce memory usage, modify /etc/sysctl.conf to add the following lines. Note that tcp_mem is measured in pages (4 KB each), while tcp_rmem and tcp_wmem give the min/default/max buffer sizes in bytes; lowering the default to 4096 bytes keeps the per-connection footprint small.

net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
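While the connections ramp up, you can keep an eye on how much memory TCP is actually consuming; the mem counter below is reported in pages, the same unit as tcp_mem:

cat /proc/net/sockstat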

Step 2. Tuning HAProxy

Having tuned the Linux kernel, we now need to tune HAProxy to better fit our requirements.

Increase Max Connections

In HAProxy there is a maximum connection cap, both globally and per backend. To raise the global cap, we need to add a line of configuration in the global section:

maxconn 2000000

Then we add the same line to our backend, which makes it look like this:

backend pushserver
        mode tcp
        balance roundrobin
        maxconn 2000000
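If you have a stats socket enabled (an assumption on my side; it requires a line such as stats socket /var/run/haproxy.sock in the global section), you can confirm that the new cap took effect:

echo "show info" | socat stdio /var/run/haproxy.sock | grep Maxconn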

Tuning Timeouts

By default, HAProxy detects dead connections and closes inactive ones, but the default timeouts are too short for connections that must be held open in a long-polling fashion. On my client side, the long-lived socket connection to the push server was always being closed by HAProxy, because the heartbeat interval in my client implementation is 4 minutes, and a more frequent heartbeat would be a heavy burden for both the client (actually an Android device) and the server. To raise the limit, add the following lines to your backend, making sure that timeout client and timeout server are longer than your heartbeat interval; with a 4-minute heartbeat, 300000 ms (5 minutes) leaves some headroom. These numbers are in milliseconds by default.

 timeout connect 5000
 timeout client 300000
 timeout server 300000
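Note that HAProxy treats timeout client as a frontend-side setting, so declaring it inside a backend draws a warning. A common alternative (my suggestion, not part of the original setup) is to put all three timeouts in a defaults section so they apply to every frontend and backend:

defaults
        mode tcp
        timeout connect 5000
        timeout client 300000
        timeout server 300000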

Configuring Source IP to solve port exhaustion

Once you are facing around 30k simultaneous connections, you will encounter the problem of “port exhaustion”. It results from the fact that each proxied connection occupies an ephemeral port on a local IP. The default port range available for outgoing connections is roughly 32768 to 61000, which leaves only about 28k ports per IP. This is not enough. We can widen the range by modifying /etc/sysctl.conf to add the following line:

net.ipv4.ip_local_port_range = 1000 65535

But this does not solve the root problem: we will still run out of ports once the roughly 64k cap of a single IP is reached.
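You can watch how close you are getting to that cap with ss (standard iproute2 tooling, not specific to this setup):

ss -s                                # overall socket summary
ss -tn state established | wc -l    # count established TCP connections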

The ultimate solution to this port exhaustion issue is to increase the number of available source IPs. First of all, we bind a new IP to a new virtual network interface:

ifconfig eth0:1 192.168.8.1

This command binds an intranet address to a virtual network interface eth0:1, whose underlying hardware interface is eth0. It can be executed several times to add an arbitrary number of virtual interfaces. Just remember that the IPs should be in the same subnet as your real application server; in other words, there must not be any kind of NAT on the link between HAProxy and the application server, otherwise this will not work.
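On newer distributions where ifconfig is deprecated, the iproute2 equivalent should look like the following (the /24 prefix is my assumption; use whatever matches your subnet):

ip addr add 192.168.8.1/24 dev eth0 label eth0:1
ip addr add 192.168.8.2/24 dev eth0 label eth0:2    # repeat for more source IPs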

Next, we need to configure HAProxy to use these fresh IPs. There is a source keyword that can be used either as a backend-level directive or as an argument to the server directive. In our experiment the backend-level form did not seem to work, so we chose the per-server argument. This is what the HAProxy config file looks like:

backend mqtt
        mode tcp
        balance roundrobin
        maxconn 2000000
        server app1 127.0.0.1:1883 source 192.168.8.1
        server app2 127.0.0.1:1883 source 192.168.8.2
        server app3 127.0.0.1:1883 source 192.168.8.3
        server app4 127.0.0.1:1884 source 192.168.8.4
        server app5 127.0.0.1:1884 source 192.168.8.5
        server app6 127.0.0.1:1884 source 192.168.8.6

Here is the trick: you need to declare them as multiple entries and give each one a different server name. If you set the same name on all six entries, HAProxy will simply not work. If you have a look at the HAProxy status report, you will see that even though some entries share the same backend address, HAProxy still treats them as distinct servers. With six source IPs at roughly 64k ports each, the proxy can open around 384k outgoing connections, comfortably above the 300k target.
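To verify that the connections are actually being spread across the source IPs, a quick sanity check with ss, using the addresses from the config above:

for ip in 192.168.8.1 192.168.8.2 192.168.8.3; do
        echo -n "$ip: "; ss -tn src $ip | wc -l
done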

That’s all for the configuration! Your HAProxy should now be able to handle over 300k concurrent TCP connections, just as mine does.