Use HAProxy to load balance 300k concurrent TCP socket connections: Port Exhaustion, Keep-alive and others

I have recently been building a push system. To increase the scalability of the system, the best practice is to make each connection as stateless as possible, so that when a bottleneck appears, the capacity of the whole system can easily be expanded by adding more machines. Speaking of load balancing and reverse proxying, Nginx is probably the most famous and widely acknowledged choice. However, its TCP proxying is a rather recent thing: Nginx only introduced TCP load balancing and reverse proxying in v1.9, released in late May this year, with a lot of features still missing. HAProxy, on the other hand, as the pioneer of TCP load balancing, is rather mature and stable. I chose HAProxy to build the system and eventually reached 300k concurrent TCP socket connections. I could have achieved a higher number if it were not for my rather outdated client PC.

Step 1. Tuning the Linux system

Holding 300k concurrent connections is not an easy job, even for high-end server hardware. To begin with, we need to tune the Linux kernel configuration to make the most of our server.

File Descriptors

Since sockets are considered equivalent to files from the system's perspective, the default file descriptor limit is rather small for our 300k target. Modify /etc/sysctl.conf to add the following lines:

fs.file-max = 10000000 
fs.nr_open = 10000000

These lines increase the system-wide file descriptor limit to 10 million.
Next, modify /etc/security/limits.conf to add the following lines:

* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000

If you run HAProxy as a non-root user, the first two lines should do the job. However, if you are running HAProxy as root, you need to declare the limits for root explicitly.
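
To apply the changes, reload the sysctl settings and start a fresh login session (limits.conf only affects new sessions), then verify. A quick sanity check, assuming a typical distribution:

sysctl -p
ulimit -n
cat /proc/sys/fs/file-max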

TCP Buffer

Holding such a huge number of connections costs a lot of memory. To reduce memory usage, modify /etc/sysctl.conf to add the following lines:

net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
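
The three values of tcp_rmem and tcp_wmem are the minimum, default, and maximum buffer sizes in bytes, while tcp_mem is measured in 4 KB pages (so the cap above works out to roughly 7.4 GB for all TCP buffers combined). A rough back-of-the-envelope estimate of why the small defaults matter, assuming every connection sits at the 4 KB default:

300,000 connections × (4 KB read + 4 KB write) ≈ 2.4 GB

With the stock kernel defaults (about 85 KB read and 16 KB write per socket), the same connections could claim closer to 30 GB.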

Step 2. Tuning HAProxy

Having finished tuning the Linux kernel, we need to tune HAProxy itself to better fit our requirements.

Increase Max Connections

In HAProxy, there is a "max connection" cap, both globally and per proxy. In order to raise the cap, we need to add a line of configuration under the global section:

maxconn 2000000

Then we add the same line to our backend scope, which makes our backend look like this:

backend pushserver
        mode tcp
        balance roundrobin
        maxconn 2000000
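
For context, here is a minimal sketch of how the pieces fit together; the frontend name and bind port are assumptions, not taken from the original setup. One caveat worth hedging: in the HAProxy versions I have checked, maxconn is a frontend-level setting and is ignored with a warning in a pure backend section, so placing it in defaults or the frontend is the safer bet.

global
        maxconn 2000000

defaults
        mode tcp
        maxconn 2000000

frontend push_in
        bind *:8883
        default_backend pushserver

backend pushserver
        mode tcp
        balance roundrobin

HAProxy will warn about missing timeouts with this sketch; the next section fills those in.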

Tuning Timeouts

By default, HAProxy detects dead connections and closes inactive ones. However, the default inactivity timeouts are far too low for connections that are held open in a long-polling fashion. On my client side, the long-lived socket connection to the push server was always being closed by HAProxy, because the heartbeat interval in my client implementation is 4 minutes; a more frequent heartbeat would be a heavy burden for both the client (actually an Android device) and the server. To raise the limit, add the following lines to your backend, making sure the client and server timeouts exceed your heartbeat interval. By default these numbers are in milliseconds.

 timeout connect 5000
 timeout client 300000
 timeout server 300000
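
If your HAProxy version supports it, the timeout tunnel directive is also worth a look for long-lived push connections: once a TCP connection becomes a plain byte tunnel, it supersedes both the client and server timeouts, so those can stay short for the connection-setup phase. A sketch, where the value is an assumption:

 timeout tunnel 600000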

Configuring Source IP to solve port exhaustion

Once you approach 30k simultaneous connections, you will run into the problem of "port exhaustion". It results from the fact that each reverse-proxied connection occupies an ephemeral port on a local IP. The default local port range available for outgoing connections is roughly 32768 to 61000, which gives us only about 28k usable ports per local IP. This is not enough. We can widen the range by modifying /etc/sysctl.conf to add the following line:

net.ipv4.ip_local_port_range = 1000 65535

But this does not solve the root problem: we will still run out of ports once the roughly 64k ports of a single local IP are used up.
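
You can watch how close you are to the limit by counting connections per local address. A rough check using ss (this assumes IPv4 addresses; netstat works too, just far slower at this scale):

ss -tan | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -rn

Any local IP whose count approaches 64k is about to run dry.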

The ultimate solution to this port-exhaustion issue is to increase the number of available source IPs. First of all, we bind a new IP to a virtual network interface:

ifconfig eth0:1 192.168.8.1

This command binds an intranet address to the virtual network interface eth0:1, whose underlying hardware interface is eth0. It can be executed several times to add an arbitrary number of virtual interfaces. Just remember that the IPs should be in the same subnet as your real application server; in other words, there must be no NAT of any kind on the link between HAProxy and the application server, or this will not work.
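
On newer distributions that no longer ship ifconfig, the iproute2 equivalent should do the same job. A sketch, where the /24 prefix is an assumption about your subnet and the label argument keeps the eth0:1 alias naming:

ip addr add 192.168.8.1/24 dev eth0 label eth0:1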

Next, we need to configure HAProxy to use these fresh IPs. There is a source keyword that can be used either in the backend scope or as an argument to a server line. In our experiment, the backend-scope form did not seem to work, so we chose the server-argument form. This is what the HAProxy config looks like:

backend mqtt
        mode tcp
        balance roundrobin
        maxconn 2000000
        server app1 127.0.0.1:1883 source 192.168.8.1
        server app2 127.0.0.1:1883 source 192.168.8.2
        server app3 127.0.0.1:1883 source 192.168.8.3
        server app4 127.0.0.1:1884 source 192.168.8.4
        server app5 127.0.0.1:1884 source 192.168.8.5
        server app6 127.0.0.1:1884 source 192.168.8.6

Here is the trick: you need to declare multiple server entries and give them different names. If you set the same name on all of the entries, HAProxy will simply not work. If you have a look at the HAProxy status report, you will see that even though several entries share the same backend address, HAProxy still treats them as different servers.
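
As a sanity check on capacity: each server line is a distinct (source IP, destination IP:port) combination, and each combination has roughly 64k local ports available with the widened port range, so the six entries above allow about 6 × 64k ≈ 384k proxied connections, comfortably above the 300k target.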

That’s all for the configuration! Now your HAProxy should be able to handle over 300k concurrent TCP connections, just like mine.

30 thoughts on “Use HAProxy to load balance 300k concurrent TCP socket connections: Port Exhaustion, Keep-alive and others”

  1. Ralf Wenzel

    I'm not sure about the IP source exhaustion solution:
    the "net.ipv4.ip_local_port_range = 1000 65535" tweak makes sense.
    This will allow ~60,000 conns targeting a single backend server (which has its own IP in a real-world scenario).

    The next 60,000 conns can target the next backend server (which has a different IP than the first backend, and so on).
    Adding additional IPs to the local network interface is only required when targeting a single backend.

    1. admin Post author

      Yeah, it's just as you said.

      Our backend server is able to handle over 60,000 connections; that's why we have to do this to maximize the capacity of the backend server.

  2. Slawek

    Hi there,

    Thanks for a great tutorial. I studied it twice trying to fix an issue we are having with our Chrome extension and a PHP Ratchet backend server. The problem is that there is a limit in HAProxy or PHP itself (or Debian?) that caps the number of concurrent connections.

    We had a PHP WebSocket server running on port 8080, and the limit was around 1000 concurrent connections (1024?), so we implemented HAProxy, and now it load-balances traffic from 8080 to 8081, 8082, 8083 and so on (multiple instances of the WebSocket server on different ports to handle more clients) … unfortunately, after hours of digging around (a few of the things from your tutorial were already implemented) and configuration changes, 2000 (2048?) is the highest number we can reach!

    Do you have any idea what might be wrong? Would you have time to have a look at our setup and infrastructure?

    Thanks!

      1. Exocomp

        I don't understand the significance of this comment:

        "Note that if you're connecting to 127.0.0.1, you don't need to bind to a "public" address, just use 127.X.Y.Z, they're all yours!"

        Can you explain in more detail?

      1. hos7ein

        Hi

        I use haproxy-1.5.14-3.el7.x86_64 on CentOS 7.2 with kernel 3.10.0-327.18.2.el7.x86_64.

        I set two IPs on the HAProxy server, for example eth0 = 10.10.10.1 and virtual interface eth0:1 = 10.10.10.2, and use one backend server with IP 10.10.10.11.

        I use "source" in the HAProxy configuration file to send requests from the two IP addresses (eth0 = 10.10.10.1 and eth0:1 = 10.10.10.2) to the backend side; please see this config:

        backend test
        mode tcp
        log global
        option tcplog
        option tcp-check
        balance roundrobin

        server myapp-A 10.10.10.11:9999 check source 10.10.10.1
        server myapp-B 10.10.10.11:9999 check source 10.10.10.2

        With this scenario I get 120k connections on the backend side (10.10.10.11) and everything is OK.
        To get more connections I added another backend server, for example 10.10.10.12; please see this config:

        backend test
        mode tcp
        log global
        option tcplog
        option tcp-check
        balance roundrobin

        server myapp-A 10.10.10.11:9999 check source 10.10.10.1
        server myapp-B 10.10.10.11:9999 check source 10.10.10.2

        server myapp-C 10.10.10.12:9999 check source 10.10.10.1
        server myapp-D 10.10.10.12:9999 check source 10.10.10.2

        In this scenario I expected to get 120k on each backend server, but no! Each backend server only gets 60k connections!

        What went wrong?
        Can you help me?
        Thanks

  3. Haven

    backend mqtt
    mode tcp
    balance roundrobin
    maxconn 2000000
    server app1 127.0.0.1:1883 source 192.168.8.1
    server app2 127.0.0.1:1883 source 192.168.8.2
    server app3 127.0.0.1:1883 source 192.168.8.3
    server app4 127.0.0.1:1884 source 192.168.8.4
    server app5 127.0.0.1:1884 source 192.168.8.5
    server app6 127.0.0.1:1884 source 192.168.8.6

    In the above configuration, does it mean that we have two MQTT nodes running on ports 1883 and 1884?

  4. Tom

    Setting the hard and soft limits to 10 million like you posted will result in a broken system; this is too much even for our Dell R630s running CentOS 6.7 (128 GB memory)!

    1 million is the maximum that you can set these to; I think you have a typo.

    1. Petrkr

      You need to raise the file descriptor limits to be able to set more than 1 million. I solved this the other day and it is hard to google. Take a look at sysctl fs.nr_open, which is set to 1 million by default, and fs.file-max. Then you will be able to set ulimit to more than 1 million.

      Petr

  5. Sushil

    Hello,

    We have two Redis web servers behind HAProxy, but I need all traffic to go to Redis-web1 only, and HAProxy should divert traffic to Redis-web2 only when Redis-web1 is down.

    Is this possible? Please suggest.

    Thanks
    Sushil R

  6. n00b Sys

    What happens if one is using HAProxy to proxy traffic to remote servers?

    Will the virtual network interfaces still work? I noticed you are using localhost, which means the apps run locally where HAProxy is, but for cases where the apps are running on another server, is this still possible?

    If so, does it mean I will have to create the virtual interfaces on the remote servers? I am guessing that will not be possible, right?

    Please let me know if you understand my question.
    Thanks!!!

    1. admin Post author

      It's definitely doable; creating the virtual interfaces will just be more complicated. Meanwhile, your remote server should be configured to accept multiple connections from the same host.

  7. usergoodvery

    Hi,

    What's the significance of having the server listen on two different port numbers in this setup? The server won't have any port-exhaustion issues, because it is not initiating outbound connections the way HAProxy is.

    regards,

    1. admin Post author

      I kind of forgot. It's just a normal server configuration, like 16 physical cores with 64GB RAM IIRC.

  8. Pingback: How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections | Cong Nghe Thong Tin - Quang Tri He Thong

  9. phil

    These lines increase the system-wide file descriptor limit to 10 million.
    Next, modify /etc/security/limits.conf to add the following lines:

    * soft nofile 10000000
    * hard nofile 10000000
    root soft nofile 10000000
    root hard nofile 10000000

    The above setting is harmful; it can prevent you from logging into your server. Apply it with caution.

  10. Arihant

    I am using HAProxy to load-balance my MQTT broker cluster. Each MQTT broker can easily handle up to 100,000 connections. But the problem I am facing with HAProxy is that it only handles up to 30k connections per node. Whenever any node gets near 32k connections, the HAProxy CPU suddenly spikes to 100% and all connections start dropping.

    The problem with this is that for every 30k connections, I have to roll out another MQTT broker. How can I increase it to at least 60k connections per MQTT broker node?

    Note: I cannot add virtual network interfaces in a DigitalOcean VPC.

    My config:
    bind 0.0.0.0:1883
    maxconn 1000000
    mode tcp

    #sticky session load balancing – new feature
    balance source
    stick-table type string len 32 size 200k expire 30m
    stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
    option clitcpka # For TCP keep-alive
    option tcplog

    timeout client 600s
    timeout server 2h
    timeout check 5000

    server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5

