From: Luke Meyer <lmeyer redhat com>
To: meghdoot bhattacharya <meghdoot_b yahoo com>
Cc: dev lists openshift redhat com
Sent: Tuesday, January 15, 2013 9:41 PM
Subject: Re: questions on haproxy and front proxy layer
Meghdoot, having read the rest of this thread, I think you have some pretty interesting requirements. Although it sounds like you're comfortable rolling your own on-premise solution with Origin, I wonder if you would be interested in an Enterprise trial? I think some of the things you are talking about (SSL between gears of a scaled app) would be concerns for many of our customers, so it would be nice to get your perspective on what we are / aren't providing for your use case.
Some guys who know their stuff have responded already, so I may be missing some things, but I thought a few points bore clarification. When talking about haproxy and httpd we have to be clear about which one - each operates at more than one level. Putting in my two cents below, and hopefully not too far off. Also, this all assumes the current implementation, not the stuff we are designing for the future:
----- Original Message -----
> From: "meghdoot bhattacharya" <meghdoot_b yahoo com>
> I had a few follow up questions based on my observations running
> openshift on Fedora 17 with broker/node running in same host [as per
> Krishna's blog post].
> If you folks can clarify and give your comments as seem fit it would
> be great. I am looking at deploying it on premise.
> 1. Creating multiple scaled apps result in dedicated haproxy load
> balancer per scaled app.
Each scaled app gets an haproxy to balance traffic between the multiple gears of that app and monitor/manage scaling, yes.
> However, both the apps had apache mod_proxy
> act as the front end proxy against their named virtual host?
There are multiple gears, each with a different hostname, running on each node host and all accessed at the same IP/port. The node host runs httpd with mod_proxy in order to translate from an external IP (10.0.0.1) and hostname (app-domain.example.com) to the internal IP and port the corresponding gear is listening on for web requests (127.1.2.3:8080).
Something has to do this translation job. httpd does it with vhosts and mod_proxy.
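To make that concrete, the node vhost amounts to roughly the following (the hostname and internal IP/port here are made up for illustration; the real vhost files are generated by OpenShift, not written by hand):

```apache
# One vhost per gear on the node host: requests arriving on the shared
# external IP/port for this hostname get proxied to the gear's loopback IP.
<VirtualHost *:80>
    ServerName app-domain.example.com
    ProxyPass        / http://127.1.2.3:8080/
    ProxyPassReverse / http://127.1.2.3:8080/
</VirtualHost>
```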
> In this
> fashion the effects of dedicated haproxy is greatly reduced given
> that we are sharing the traffic in front proxy layer?
Not sure what "effects" you mean, or for that matter which haproxy. The haproxy gear receives all the traffic for that app and distributes it out to all the gears of the app. It's the only part of the system that knows where to send that traffic; that's its whole job. Those gears may be on other node hosts, so the proxy request goes out to the external interface and back through the httpd proxy to get to its final destination.
> Or in multi node scenario, would there be only one haproxy loadbalancer per
If a scaled app has gears on multiple nodes (they're located with no consideration for grouping by app), all the traffic still goes first to the one haproxy gear pointed to by the app's hostname, from there to be proxied to other gears as needed.
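As a sketch, the haproxy gear's config contains one server line per gear of the app, something like the following (gear endpoints invented for the example; the actual config is managed by the haproxy cartridge):

```haproxy
# Load balancer inside the app's haproxy gear: one backend entry per gear
# of the scaled app. Gears on other node hosts are reached via their
# external hostname, i.e. back through that node's httpd proxy.
listen express
    bind 127.2.3.4:8080
    balance leastconn
    server gear1 app-gear1.example.com:80 check
    server gear2 app-gear2.example.com:80 check
```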
> 2. Haproxy load balancer is using haproxy port proxy to reach to the
> apps directly running in different gears.
The haproxy listening to the external interface on each node host is not a load balancer. It's a port proxy for all of the non-HTTP(S) traffic, e.g. DB connections, JBoss session replication, and who knows what in the future. It doesn't care about host names, it just maps a port on the external interface to a port on the gear's internal interface.
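In haproxy terms, each port-proxy entry is just a one-to-one TCP mapping with no hostname logic at all, roughly like this (ports and addresses invented for illustration):

```haproxy
# Port proxy on the node host: forwards one external port straight to one
# internal gear port, e.g. for a DB connection. mode tcp means no HTTP
# parsing and no vhost awareness - it's a dumb pipe.
listen proxy-35531
    mode tcp
    bind 10.0.0.1:35531
    server gear 127.1.2.3:3306
```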
> So, in that case what is
> the benefit of running apache in those nodes with a named virtual
> host of the gear other than to bypass haproxy for debug
The port proxy doesn't handle the multiple host names that are all coming in on port 80 and 443. The httpd proxy handles that. The port proxy handles everything else.
> messed with the named virtual host name and both haproxy and haproxy
> stats page worked fine proving apache was not in play. The intent is
> not to run that apache in production unless that node may be also
> hosting a haproxy load balancer itself. Hmm...
Not saying you have to run apache, but you have to run something on the node host that translates HTTP(S) requests for different hostnames to different gears.
> 3. On premise lets say there is just one domain www.yyy.com and we
> really may not need dns support.
What do you mean you don't need DNS support? Apps are currently distinguished by their hostnames. You need a new hostname for each new app. How do you plan to identify apps such that you won't
need to DNS resolve their names somewhere? Have a different external port for HTTPS for each gear? When a new app is created, how will people find it?
> And as in our case we have multi
> level hardware load balancers already in place. In that set up,
> ideally I want to kill at least one proxy layer, maybe the apache
> proxy layer?
Why? This proxy layer is all local to one host, it can't be adding more than a couple milliseconds to each request. If that kind of performance is a concern, I'm not sure a PaaS is right for you...
Host httpd and haproxy are the node IP -> gear translators. Gear haproxy is the app load balancer. You need both functions. And ideally, you don't want to have to restart one when the other changes.
> Can you guys suggest how will that setup look? Can
> instead of one haproxy load balancer in a gear can there be multiple
> clone haproxy gears and F5 loadbalances to haproxies which then
> routes to the nodes. These are secondary level F5 that we can kill
> in future but initially how we can work while keeping them?
Ah, now we're getting into future speculation :) I think the other responses covered that.
I'm not sure I understand what you want the F5 to do. Replace the HAproxy gear? In that case, for each gear created or destroyed in a scaled app, OpenShift would have to update the F5 configs to add backend destinations.
You later asked about making those inter-node connections SSL. Seems like we could do that for HTTP if we don't already. DB and other connection types, more complicated; guess that's one for IPSec.
You also asked (I think) about HA, given the HAproxy gear is a single point of failure (SPOF) for the app. Again this is the subject of future design. It can be done now, but it's not obvious. You can create multiple clones of the same app and load balance
them, but you can't easily control which node host they land on, so you could still end up with everything on one host and thus a SPOF. You basically have two approaches:
1. Create the apps, then (as an OpenShift admin) move the HAproxy gears for the different apps to different hosts if they don't land there.
2. Create another gear profile ("size"), or several, just to ensure your clone apps land on different hosts from each other.
In either case, you need an external load balancer, and the clone apps won't share state unless you configure them to use an external resource (...which could then itself be a SPOF).
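If you did front two clone apps with your own balancer, the haproxy equivalent would look something like the following (clone app hostnames invented; an F5 would express the same idea in its own terms):

```haproxy
# External balancer across clones of the same app. Each clone is a whole
# scaled app with its own haproxy gear; this layer only spreads traffic
# and fails over between them - it does not share session state.
frontend www
    bind *:80
    default_backend clones
backend clones
    balance roundrobin
    server clone1 myapp1-domain.example.com:80 check
    server clone2 myapp2-domain.example.com:80 check
```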
I hope this helps; I'll be interested to hear whether this clarifies things for you.