
TorqueBox scalable application problems?



I've been seeing some odd behavior that I'm starting to think is related to my switch to a scalable application.  The application starts up fine when I redeploy from git, but at some point -- apparently a few hours later -- it goes down.  I know this because instead of my application, I get the HAProxy dashboard page.  That page seems to show that one gear (I think the one running both the app and the proxy) has been down for the same length of time the proxy has been up (i.e. since the app was deployed), and that the other gear was up for a few hours before going down.

If I look at the logs, scale_events.log is quite large (5 MB) and full of messages like this:

D, [2012-09-26T17:17:40.790202 #11207] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 20/20
I, [2012-09-26T17:18:41.013913 #11207]  INFO -- : GEAR_DOWN - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 49.9%
D, [2012-09-26T17:18:42.547415 #11207] DEBUG -- : GEAR_DOWN - remove-gear: exit: 0  stdout: 
D, [2012-09-26T17:18:42.549978 #11207] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 20/20
I, [2012-09-26T17:19:42.675263 #11207]  INFO -- : GEAR_DOWN - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 49.9%
D, [2012-09-26T17:19:44.183711 #11207] DEBUG -- : GEAR_DOWN - remove-gear: exit: 0  stdout: 
D, [2012-09-26T17:19:44.185946 #11207] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 20/20
I, [2012-09-26T17:20:44.466615 #11207]  INFO -- : GEAR_DOWN - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 49.9%
D, [2012-09-26T17:20:45.979944 #11207] DEBUG -- : GEAR_DOWN - remove-gear: exit: 0  stdout: 
D, [2012-09-26T17:20:45.982999 #11207] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 20/20
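If it's useful for diagnosis, this churn is easy to quantify with a quick grep.  Nothing below is OpenShift-specific; it just counts and brackets the repeated remove-gear lines, run from wherever scale_events.log lives on the gear (the path will depend on your setup):

    # count remove-gear attempts and show the first/last occurrences
    grep -c 'GEAR_DOWN - remove-gear' scale_events.log
    grep 'GEAR_DOWN - remove-gear' scale_events.log | head -n 1
    grep 'GEAR_DOWN - remove-gear' scale_events.log | tail -n 1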

If I try to restart the app with ctl_app restart, it fails.  The only way to get my app running again is to make a meaningless change and push the repository again.
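A side note on that workaround: the "meaningless change" doesn't actually have to touch any files.  An empty commit is enough to trigger the rebuild and redeploy -- this is plain git, nothing OpenShift-specific:

    # force a redeploy without changing any files
    git commit --allow-empty -m "force redeploy"
    git push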

I suspect that the startup path used by ctl_app (and perhaps by the auto-scaler) is not identical to the one used when the app is initially built and started after a push.  Maybe my pre_start hook is not being called in that case, for example (or something along those lines)?  Then, at some point, the auto-scaling process decides to kill and restart a gear (I'm only guessing here, trying to work out what might be happening), and that restart fails.
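One cheap way I can think of to test the pre_start theory (the marker file location below is arbitrary, just an example): put a timestamp line at the top of the existing hook and check whether it shows up after a ctl_app restart versus after a git push.

    # first line of the existing pre_start hook; marker path is just an example
    echo "pre_start ran at $(date)" >> "$HOME/pre_start_debug.log"

If the marker only ever appears on pushes, that would suggest the hook is being skipped on ctl_app restarts.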

Any thoughts?  This does seem like something wrong on the OpenShift side rather than in my app: if the app can be started by redeploying it, I would expect it to also come up with ctl_app start or restart.  I don't know for certain that this is related to scaling, but I never saw the problem before switching, and a connection seems plausible.  FWIW, traffic to the app is low and should not be triggering a scale-up signal.  My understanding is that the system won't try to scale the app below the initial two gears, even if there is no activity.

Thanks for any help.

Chhi'mèd


