There was once a time when dinosaurs roamed the earth and human beings first started working with tools such as rocks and layer 4 DSR-based load balancers.
These tools were useful because you could reach the servers with them, and because it was easy to ensure the application server was presented with the actual client IP address.
A load balancer sits between the client and the application, so looking back from the application, who is the sender? The client (yes, please) or the load balancer?
This was fairly easy, and you had a few choices open to you.
DSR mode – Direct Server Return.
- The client would connect to the load balancer.
- The load balancer would forward it to the application server with the source IP of the client.
- The application would then send its response directly to the client. Because the destination is an external IP, the response is routed via the default gateway, skipping the load balancer entirely.
So the dinosaurs liked this approach: it used fewer resources on the infrastructure, and the application got the real client IP.
The downside is that you need special networking configuration on the app server AND, the big reveal… the load balancer never sees the response, so any SSL offload or content switching is out of the window.
The other disadvantage is that the LB has no idea how well the app server is doing, so it's harder to steer traffic to the servers most likely to perform best. The ADC would fall back to fastest response time or least connections. (Stone Age Man could install an agent on the app server to report load, or the ADC could poll it via SNMP.)
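To make that "special networking configuration" concrete, here is a minimal sketch of the typical DSR setup on a Linux app server (the VIP 203.0.113.10 is an assumed example address): the virtual IP goes on the loopback interface so the server accepts traffic for it, and ARP is tuned so only the load balancer answers for the VIP on the wire.

```shell
# Hypothetical DSR setup on a Linux app server (run as root).
# Add the virtual IP to loopback so the server accepts packets sent to it:
ip addr add 203.0.113.10/32 dev lo

# Stop this server answering ARP for the VIP, so the network keeps
# delivering VIP traffic to the load balancer rather than to this box:
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```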
Anyhow, this setup worked well for many years, like the horse and cart, and indeed, there are still situations where this is the best option, like chariot racing or some older protocols, for example.
Next up is:
Gateway mode
- The client would connect to the load balancer
- The load balancer would forward it to the app server with the source IP of the client
- The application would then send its response addressed directly to the client, BUT VIA the load balancer, because you have changed the default gateway of the app server to that of the load balancer. This means the load balancer sees the response.
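That default-gateway change is a one-liner on a Linux app server (10.0.0.2 below is an assumed address for the load balancer):

```shell
# Hypothetical gateway-mode setup (run as root): send all outbound
# traffic from the app server through the load balancer.
ip route replace default via 10.0.0.2 dev eth0
```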
This is much better in some ways, but ALL external traffic from the app server, such as software updates, now runs through the load balancer, so it gets messy.
This was more common when the load balancers looked like switches, and everything was directly plugged in with high-speed cup and string network links.
Man learns to farm and use proxy-based load balancing
Around this time, networks and CPU speeds were getting faster, and we started seeing the Proxy approach.
We decided to come clean and start using the Load balancer as the source IP rather than tricking the application server. We did this for many reasons, including TCP performance, security, and the ability to provide additional services on the load balancer.
But this move away from hunter-gathering came with a big price and challenge.
How do you send the Client IP to the Application server?
Luckily, Good Old HTTP had a great answer as long as you were using HTTP.
X-Forwarded-For header – allows you to put the client IP in a header and send it on.
I won't explain this in too much detail as it's fairly straightforward, but suffice it to say it's a header anyone can easily set, and it can get complex. For example, what happens if the load balancer receives a request that already carries an X-Forwarded-For header? Should it use that, or replace it with the source IP it sees? These behaviours are normally configurable on a decent LB, but they all have security considerations.
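A minimal sketch of that decision (function name and IPs are assumed examples): append to an existing X-Forwarded-For chain only when the upstream hop is trusted, otherwise discard whatever it claimed and use the source IP the proxy actually saw.

```python
def build_xff(client_ip, incoming_xff, trust_upstream):
    """Decide what X-Forwarded-For value a proxy should forward.

    client_ip      -- the source IP of the connection the proxy accepted
    incoming_xff   -- any X-Forwarded-For value already on the request
    trust_upstream -- whether the connecting hop is a trusted proxy
    """
    if incoming_xff and trust_upstream:
        # Trusted hop: preserve the chain and append ourselves as a step.
        return incoming_xff + ", " + client_ip
    # Untrusted hop: anyone can set this header, so start fresh.
    return client_ip

print(build_xff("203.0.113.7", "198.51.100.1", trust_upstream=True))
# → 198.51.100.1, 203.0.113.7
print(build_xff("203.0.113.7", "198.51.100.1", trust_upstream=False))
# → 203.0.113.7
```

The "trusted hop" check is the whole security consideration in miniature: without it, a client can spoof any source IP it likes just by setting the header itself.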
Anyhow, 99% of the time, any time a web app needs the source, this is how they get it (Note: 99% of stats are made up)
What about other protocols?
A few have some protocol-specific hacks, but there is nothing generic that is more widely adopted.
Well, that was until the Proxy Protocol was proposed and widely adopted.
Proxy Protocol Explained
Proxy Protocol solves this problem by adding a header to the forwarded request that contains the original client’s connection details.
There are already two versions of this protocol.
- Proxy Protocol v1: Uses a simple, human-readable text format.
- Proxy Protocol v2: Uses a binary format, which is more efficient and supports additional features like Unix socket addresses and optional extra metadata. (Both versions handle IPv4 and IPv6.)
The catch is that most application servers will break if you enable this without configuring them to accept it. This is because Proxy Protocol literally prepends the connection details to the start of the data stream.
However, adoption is growing, and many vendors now support it.
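A minimal sketch of what that looks like on the wire for v1 (addresses are assumed examples): the human-readable header arrives before the first byte of application data, and the receiver has to strip it off before it can read the real request.

```python
def parse_proxy_v1(data):
    """Split a Proxy Protocol v1 header off the front of a byte stream.

    Returns (connection_info, remaining_application_data).
    Sketch only: does not handle the "PROXY UNKNOWN" case.
    """
    header, sep, rest = data.partition(b"\r\n")
    if not sep or not header.startswith(b"PROXY "):
        raise ValueError("no Proxy Protocol v1 header found")
    # Header looks like: PROXY TCP4 <src-ip> <dst-ip> <src-port> <dst-port>
    _, proto, src_ip, dst_ip, src_port, dst_port = header.decode("ascii").split(" ")
    info = {
        "proto": proto,
        "src": (src_ip, int(src_port)),
        "dst": (dst_ip, int(dst_port)),
    }
    return info, rest

# The real client details ride in front of the HTTP request itself:
raw = b"PROXY TCP4 192.0.2.10 203.0.113.5 56324 443\r\nGET / HTTP/1.1\r\n"
info, payload = parse_proxy_v1(raw)
print(info["src"])   # → ('192.0.2.10', 56324)
print(payload)       # → b'GET / HTTP/1.1\r\n'
```

This is also why an unaware application server breaks: it tries to parse `PROXY TCP4 …` as if it were the start of an HTTP request.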
It’s great for DNS proxying, for example.
As with X-Forwarded-For, there are still some decisions to be made:
For example, what do you want the load balancer to do if it is presented with a Proxy Protocol header? Remove it and set a new one, or keep it? If you do not expect one to be sent, you should discard it and set your own, but these decisions must be considered.
Proxy Protocol is a simple way to get the Client IP and is now widely supported, so go ahead and enjoy.
I’m going back to thinking about high availability in the cloud, Ciao!