
Re: questions on haproxy and front proxy layer



Your colleagues did answer most of the questions; thanks for your response as well.
Comments below, in no particular order.

I got the response that httpd is running with the prefork MPM. So I think it's a valid concern that such a proxy server fronting haproxy will simply struggle during peak loads. I think the team did respond that they are evaluating a Node-based proxy server. Single-event-loop servers [nginx, Traffic Server, haproxy] hold up better than Apache mod_proxy, and for us milliseconds matter.

On the point of DNS, here is why I said we don't need it. Say we already have a DNS entry published for www.yyy.com. Now we have different URL paths, www.yyy.com/path1, www.yyy.com/path2, etc. In our current setup we just have one named vhost [yyy.com], and we have mod_proxy rules that target different context paths to different downstream app servers. These app servers just have different context roots for their apps, but the server portion of the URL never changes. So, the way I see it, creating a new app for us means adding a new proxy rule to haproxy for that app (see the sketch below), not necessarily creating a new vhost and new DNS entries.
And we have multiple levels of hardware LB with iRules [F5] that further split traffic based on paths, etc., before it trickles down to the httpd nodes.
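
To make that concrete, the path-based routing we are thinking of looks roughly like the sketch below; the backend names and addresses are placeholders, not our actual config:

    frontend www_yyy_com
        bind *:80
        # route by URL path; one ACL + backend per app context root
        acl is_path1 path_beg /path1
        acl is_path2 path_beg /path2
        use_backend app1_servers if is_path1
        use_backend app2_servers if is_path2

    backend app1_servers
        server app1 10.0.1.10:8080 check

    backend app2_servers
        server app2 10.0.1.11:8080 check

Adding a new app would then mean adding one ACL and one backend, with the server portion of the URL unchanged.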


About the SPOF around haproxy, I think Ram's solution was interesting.
Either export the haproxy map to external balancers, OR
create router clones [DIY with haproxy, say] and export the map to these router clones; a hardware LB then balances between these multiple router clones.
In both cases an auto-scaling operation will need a new sync, but initially we may launch enough gears at startup and turn off auto scaling.


I made the comment that for SSL the event MPM of Apache degrades to the worker MPM because I remembered reading it. I was told the event MPM is fine with SSL, so I looked back at the documentation:
http://httpd.apache.org/docs/current/mod/event.html
It does say: "The improved connection handling does not yet work for certain connection filters, in particular SSL. For SSL connections, this MPM will fall back to the behaviour of the worker MPM and reserve one worker thread per connection."
So I don't know if there is a fix for that.
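
For reference, the knobs that matter here are the event MPM thread limits; with the fallback above, every open SSL connection pins a worker thread for its lifetime, so the thread ceiling effectively caps concurrent SSL connections. An illustrative httpd 2.4 snippet (values are made up, not a recommendation):

    <IfModule mpm_event_module>
        ServerLimit             16
        StartServers            2
        ThreadsPerChild         25
        MaxRequestWorkers       400
        MaxConnectionsPerChild  0
    </IfModule>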

SSL between nodes is an INFOSEC requirement for us for most services. There are a few exceptions, like connections from DB connection pool servers to the DB, etc.
It does cost us to terminate and re-establish SSL multiple times, but the security concerns are paramount in our world.
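
For the HTTP hops, re-encryption at an Apache proxy layer is at least straightforward to express; a hedged sketch (backend host, port, and CA path are hypothetical):

    # re-encrypt proxied traffic so the hop to the backend is also SSL
    SSLProxyEngine on
    SSLProxyCACertificateFile /etc/pki/tls/certs/internal-ca.crt
    ProxyPass        /path1 https://app1.internal:8443/path1
    ProxyPassReverse /path1 https://app1.internal:8443/path1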

Thx.


From: Luke Meyer <lmeyer redhat com>
To: meghdoot bhattacharya <meghdoot_b yahoo com>
Cc: dev lists openshift redhat com
Sent: Tuesday, January 15, 2013 9:41 PM
Subject: Re: questions on haproxy and front proxy layer

Meghdoot, having read the rest of this thread, I think you have some pretty interesting requirements. Although it sounds like you're comfortable rolling your own on-premise solution with Origin, I wonder if you would be interested in an Enterprise trial? I think some of the things you are talking about (SSL between gears of a scaled app) would be concerns for many of our customers, so it would be nice to get your perspective on what we are / aren't providing for your use case.

Some folks who know their stuff have responded already, and I may be missing some things, but I thought a few points bore clarification. When talking about haproxy and httpd we have to be clear about which one we mean - each operates at more than one level. Putting in my two cents below and hopefully not too far off. Also, this assumes the current implementation, not the stuff we are designing for the future:

----- Original Message -----
> From: "meghdoot bhattacharya" <meghdoot_b yahoo com>
>
> Hi,
> I had a few follow up questions based on my observations running
> openshift on Fedora 17 with broker/node running in same host [as per
> Krishna's blog post].
> If you folks can clarify and give your comments as seem fit it would
> be great. I am looking at deploying it on premise.
>
>
>
> 1. Creating multiple scaled apps result in dedicated haproxy load
> balancer per scaled app.

Each scaled app gets an haproxy to balance traffic between the multiple gears of that app and monitor/manage scaling, yes.

> However, both the apps had apache mod_proxy
> act as the front end proxy against their named virtual host?

There are multiple gears, each with a different hostname, running on each node host and all accessed at the same IP/port. The node host runs httpd with mod_proxy in order to translate from an external IP (10.0.0.1) and hostname (app-domain.example.com) to the internal IP the corresponding gear is listening on for web requests (127.1.2.3:8080).

Something has to do this translation job. httpd does it with vhosts and mod_proxy.
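
Concretely, the translation is roughly of this shape - a simplified sketch using the example addresses above, not the literal OpenShift node config:

    <VirtualHost 10.0.0.1:80>
        ServerName app-domain.example.com
        # forward everything for this hostname to the gear's internal address
        ProxyPass        / http://127.1.2.3:8080/
        ProxyPassReverse / http://127.1.2.3:8080/
    </VirtualHost>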

> In this
> fashion the effects of dedicated haproxy is greatly reduced given
> that we are sharing the traffic in front proxy layer?

Not sure what "effects" you mean, or for that matter which haproxy. The haproxy gear receives all the traffic for that app and distributes it out to all the gears of the app. It's the only part of the system that knows where to send that traffic; that's its whole job. Those gears may be on other node hosts, so the proxy request goes out to the external interface and back through the httpd proxy to get to its final destination.

> Or in multi
> node scenario, would there be only one haproxy loadbalancer per
> node?

If a scaled app has gears on multiple nodes (they're located with no consideration for grouping by app), all the traffic still goes first to the one haproxy gear pointed to by the app's hostname, from there to be proxied to other gears as needed.

> 2. Haproxy load balancer is using haproxy port proxy to reach to the
> apps directly running in different gears.

The haproxy listening to the external interface on each node host is not a load balancer. It's a port proxy for all of the non-HTTP(S) traffic, e.g. DB connections, JBoss session replication, and who knows what in the future. It doesn't care about host names, it just maps a port on the external interface to a port on the gear's internal interface.
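
As an illustration of that mapping (not the literal node config; the external port number is made up), the port proxy does something of this shape:

    # plain TCP forwarding: external port -> gear's internal MySQL port
    listen gear_mysql_proxy
        bind 10.0.0.1:35531
        mode tcp
        server gear1 127.1.2.3:3306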

> So, in that case what is
> the benefit of running apache in those nodes with a named virtual
> host of the gear other than to bypass haproxy for debug purpose?

The port proxy doesn't handle the multiple host names that are all coming in on port 80 and 443. The httpd proxy handles that. The port proxy handles everything else.

> I
> messed with the named virtual host name and both haproxy and haproxy
> stats page worked fine proving apache was not in play. The intent is
> not to run that apache in production unless that node may be also
> hosting a haproxy load balancer itself. Hmm...

Not saying you have to run apache, but you have to run something on the node host that translates HTTP(S) requests for different hostnames to different gears.

> 3. On premise lets say there is just one domain www.yyy.com and we
> really may not need dns support.

What do you mean you don't need DNS support? Apps are currently distinguished by their hostnames. You need a new hostname for each new app. How do you plan to identify apps such that you won't need to DNS resolve their names somewhere? Have a different external port for HTTPS for each gear? When a new app is created, how will people find it?

> And as in our case we have multi
> level hardware load balancers already in place. In that set up,
> ideally I want to kill at least one proxy layer, maybe the apache
> proxy layer?

Why? This proxy layer is all local to one host, it can't be adding more than a couple milliseconds to each request. If that kind of performance is a concern, I'm not sure a PaaS is right for you...

Host httpd and haproxy are the node IP -> gear translators. Gear haproxy is the app load balancer. You need both functions. And ideally, you don't want to have to restart one when the other changes.

> Can you guys suggest how will that setup look? Can
> instead of one haproxy load balancer in a gear can there be multiple
> clone haproxy gears and F5 loadbalances to haproxies which then
> routes to the nodes. These are secondary level F5 that we can kill
> in future but initially how we can work while keeping them?

Ah, now we're getting into future speculation :) I think the other responses covered that.

I'm not sure I understand what you want the F5 to do. Replace the HAproxy gear? In that case, for each gear created or destroyed in a scaled app, OpenShift would have to update the F5 configs to add backend destinations.


You later asked about making those inter-node connections SSL. Seems like we could do that for HTTP if we don't already. DB and other connection types are more complicated; I guess that's one for IPsec.

You also asked (I think) about HA, given the HAproxy gear is a single point of failure (SPOF) for the app. Again this is the subject of future design. It can be done now, but it's not obvious. You can create multiple clones of the same app and load balance them, but you can't easily control which node host they land on, so you could still end up with everything on one host and thus a SPOF. You basically have two approaches:
1. Create the apps, then (as an OpenShift admin) move the HAproxy gears for the different apps to different hosts if they don't land there.
2. Create another gear profile ("size"), or several, just for the purpose of ensuring your clone apps live on different hosts from each other.
In either case, you need an external load balancer, and the clone apps won't share state unless you configure them to use an external resource (...which could then itself be a SPOF).
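
For what it's worth, the external balancing piece itself is simple; a hedged sketch with hypothetical clone hostnames (each clone answers to its own vhost name, hence sending the server name as the Host header):

    frontend clones_in
        mode http
        bind *:80
        default_backend clone_apps

    backend clone_apps
        mode http
        balance roundrobin
        # send the server name as the Host header so each clone's vhost matches
        http-send-name-header Host
        server clone1-domain.example.com 10.0.0.1:80 check
        server clone2-domain.example.com 10.0.0.2:80 check

The hard parts are gear placement and shared state, as above, not the balancing itself.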

I hope this helps, will be interested to see if this clarifies things for you.


