We've been running into app creation failures on small districts (where most of our users 'live'). This is a v3 system, and we are currently migrating to an M4 system. I am trying to determine whether we'll run into the same problem later on M4. (We also have OSE 2.2.)
The problem we've been experiencing is that OSO is generating TC class IDs outside the 16-bit range that TC accepts. The exact cause is not completely clear, but it may have to do with the UID limits set on those nodes, which are currently 1000 - 6500. We have approximately 900 gears in that district, spread over 5 nodes, and have been experiencing consistent failures when trying to build applications there. We tried increasing GEAR_MAX_UID to 10000 on every node in the district, to no avail.
I understand that OpenShift wraps the qdiscs to 64K, but what about the class IDs that are being fed to TC? It appears that the generated class IDs aren't wrapping, which causes the failures. I'd be very curious to know how you're handling this with Online. Unfortunately, we don't have the same user base/load in our OSE environment, so I can't confirm the issue there.
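To make the question concrete, here is a rough sketch of the kind of range check and wrap I have in mind. This is not OpenShift code: the function names are made up for illustration, and the wrap formula is only my assumption about what "wrapping" a class ID could look like. The only thing it relies on is that a TC handle is major:minor, with each part a 16-bit hex value.

```python
# Illustrative sketch only; none of these names come from OpenShift.
TC_ID_MAX = 0xFFFF  # each half of a tc major:minor handle is 16 bits


def fits_tc_minor(value: int) -> bool:
    """Return True if value can be used as a class minor ID.

    Minor 0 is treated as unusable here because it refers to the
    qdisc itself rather than to a class.
    """
    return 0 < value <= TC_ID_MAX


def wrap_to_tc_minor(value: int) -> int:
    """Fold an arbitrary positive integer into 1..0xFFFF.

    Assumed wrapping scheme, not necessarily what any OpenShift
    release does. Note that wrapping can make two inputs collide.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    return (value - 1) % TC_ID_MAX + 1


def format_classid(major: int, minor: int) -> str:
    """Format a classid the way tc expects it: hex major, colon, hex minor."""
    if not (0 <= major <= TC_ID_MAX):
        raise ValueError(f"major {major} is outside the 16-bit range")
    if not fits_tc_minor(minor):
        raise ValueError(f"minor {minor} is outside the 16-bit range")
    return f"{major:x}:{minor:x}"


if __name__ == "__main__":
    for candidate in (1000, 6500, 10000, 70000):
        status = "ok" if fits_tc_minor(candidate) else "out of range"
        print(candidate, status, format_classid(1, wrap_to_tc_minor(candidate)))
```

If the generated value is checked like this before it is handed to TC, an out-of-range ID would at least fail with a clear message instead of a TC error during app creation.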
A cursory search led me to Brenton's article: https://brenton-leanhardt.rhcloud.com/understanding-how-uids-are-used-in-openshift-origin/
We'd appreciate any thoughts you might have.