
Re: OKD 4 - A Modest Proposal

Dear Clayton,

Thank you very much for your insight and many details about the upcoming version 4.0 of OKD. Based on your mail it sounds nearly too good to be true.

As an ops person who has played around with OKD 3.10 and 3.11 on CentOS 7, I would like to emphasize the following negative points, which I have mainly seen with those versions of OKD and very much hope can be improved in 4.0:

- Changing a single cluster parameter requires re-running the whole ansible deployment, which in my case, with a small cluster of 3 nodes, takes over 20 minutes. This is frustrating and annoying.

- Upgrading from OKD 3.10 to 3.11 was a big pain: it first failed due to version incompatibilities of ansible on CentOS 7, then because of other timeout issues that can be worked around with ugly hacks, etc. I think it took me a few days or even weeks, with the help of the mailing list and GitHub issues, to finally upgrade successfully. This is IMHO unacceptable from a security standpoint. As you mention in your mail, upgrades should be painless and straightforward.

- Finally, there is a LOT of documentation available for OKD, which is great, but for the two main issues I mention above there is no clear documentation or guide that helps much. At best one can find several different upgrade scenarios, which is quite confusing. For instance, I still have not understood or found out the correct procedure with ansible to keep OKD 3.11 (or 3.10) at its latest patch level, especially in terms of security patches.

This is my standpoint and opinion as an ops guy operating OKD. I must also be honest: I have only been playing with OKD for a year now, so I don't have too much experience.

But again, if I understand correctly and based on your mail below, these issues should be addressed in OKD 4.0, so I am really looking forward to trying it out and to it making my life as an ops person easier. So thank you again so much for the effort.

Best regards,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, June 20, 2019 11:19 PM, Clayton Coleman <ccoleman redhat com> wrote:

TL;DR - I can’t even summarize this, but it’s worth it to read!

First, I’ll start this off with an apology - I intended to draft an OKD 4 proposal many months ago, but I kept pushing it back to fix “just one more bug”, and as a result there’s been a real gap in regular summarization across the project.  While I have talked to many community members one-on-one, and many of us interact with each other on GitHub and on Slack and at conferences, I was remiss in highlighting and concentrating the roadmap, design, and iteration proposals for a large chunk of the last 6 months and I’ll do my best to rectify that starting now.

OKD 3.11 has been out since the fall, and is still getting fixes. It should be no surprise to folks on this list that the acquisition of CoreOS last spring triggered a rethink / re-imagining of what OpenShift could / should be.  There was a broad agreement that we’ve all been doing Kubernetes The Hard Way™ (even the cloud providers) since the early days of Kube. Some of these hard things we accepted because Kubernetes was moving so fast.

But Kubernetes is maturing.  The code base is moving from a monorepo to a much larger set of individual services and extensions.  The ecosystem on top of Kubernetes is what is now innovating at a rapid pace. Contributors from both CoreOS and OpenShift asked what a v2 of Tectonic and what a v4 of OpenShift would look like if:

  • we built a platform anchored around Kubernetes

  • that allowed us to rapidly include and support the innovation in the broader ecosystem

  • all the way down to the operating system

  • that informed the evolution of operators (the natural way to extend Kubernetes)

That took longer than anticipated.  Many of those pieces were big bets that we weren’t positive could be well integrated, and if you’ve been following along in the almost a hundred repos that make up OKD you know that some of those pieces reached maturity only in the last month or so. Some of the aspects of Tectonic which weren’t open source weren’t immediately replaced, and, as we evolved the initial CoreOS operating system vision, it wasn’t clear whether it would be Fedora, or RHEL, or something in between.  Much of the change happened in the open, but not all of the planning or debate at the high level.

I’m sorry for that. I will make a concerted effort to summarize what is going on and what to expect more regularly, and also do more to move those discussions into the broader forums rather than stay in specific scopes or specific channels.


So - where are we now (June 2019), and where do we go from here?

The first question is philosophy - what sort of shared goals should we define for OKD?

I personally feel strongly that the CoreOS mission - secure the internet with up-to-date open source software - continues to be more relevant with each passing year. As a contributor to Kubernetes, I know how difficult keeping an up-to-date version of it can be. As a personal computing user, I’m horrified at the insecurity of our hardware, our software, and the services and clouds we use.  And it’s not going to get better unless we make a concerted effort to fix it.

The OKD mission has always been to create a developer and operations friendly Kubernetes distribution that isn’t afraid to be opinionated in order to make running software easier.  That opinionation started with development tools on top of Kubernetes and security underneath. But I think we should now take the next step - strengthen our opinions on how we ship and update (continuously!) and how we run the platform itself.  Not just to deliver new features, but to deliver fixes and plug security holes.

So I would propose our goal for OKD 4 to be:


The perfect Kubernetes distribution for those who want to continuously be on the latest Kubernetes and ecosystem components. It should combine an up-to-date OS, the Kubernetes control plane, and a large number of ecosystem operators to provide an easy-to-extend distribution of Kubernetes that is always on the latest released version of ecosystem tools.


Does that resonate with others?  What other goals do people believe in and are willing to support with their time and effort?


The second step (if we agree with the philosophy) is to articulate the choices that would describe Kubernetes The Probably Better Than Before Way:

*** First, Kubernetes is the best platform for running distributed apps, and since we believe that, we should use it to run Kubernetes itself.

You should never have to:

  • Restart your control plane by SSHing to a machine

  • Remember which components are running as a system service vs as a pod

  • Orchestrate pods + node services

  • Take downtime during a control plane reconfiguration

This ensures that the platform benefits from things we add (resiliency, debuggability, observability), and makes the platform (which HAS to run successfully 100% of the time) the best possible canary for when people introduce dumb bugs into Kubernetes that break that promise.  If a CI job fails because a control plane upgrade doesn’t maintain perfect availability, the contributor who made the change should be able to see that right there, before they merge their code.

Running Kube native things natively means operators, and there were a lot of tricky problems to solve - how do you run an operator that runs the control plane that happens to run the operator?  How do you recover when a node crashes? How do you manage etcd as a container? How do you get each component to have its own operator, and how do those present a coherent story? This part of the story took a long time to evolve and is still evolving, as anyone who has followed the operator SDK, operator lifecycle manager project, controller-runtime project, or the upstream addons project knows.

*** Second, the node and all the software on it exists solely to run containers for Kubernetes, and nodes should be an innate property of a cluster, not an external thing to manage.

You should never have to:

  • Build a golden VM image

  • See the error message “package conflict while updating kubelet”

  • Fix a broken VM on the cloud

  • Try to perform a manual rollback of packages on a host

To that end:

  1. The OS should contain all of the binaries necessary to function as a node

  2. All config on the node should come from the cluster

  3. The cluster decides when the node upgrades

  4. The cluster should be able to deliver updates to the nodes just like any other component

To accomplish this, we decided to bring many of the key components of both Container Linux and Atomic into a harmonious whole.  ostree is really good at treating general purpose OS packages exactly like immutable content sets. Ignition is the *best* first boot solution that is consistent across clouds and metal, and has to be part of the OS to do that job well.  The OS has to be ready to be a node, programmed to join the cluster, securely and safely, and once it’s joined it needs to be 100% focused on running containers. That end result is Fedora CoreOS and RHEL CoreOS - but each one is slightly different, driven by slightly different use cases, and on different schedules.  A key goal is that the cluster controls the software on the node, so the expectation should be that OKD4 will control and own the software on that node from kernel to kubelet.

We also knew that we needed to make node lifecycle trivial.  Not easy, trivial. We added dynamic compute (just like dynamic storage and dynamic load balancing) by adopting and stabilizing aspects of the Kubernetes cluster api (specifically the alpha Machine API) to let you quickly and easily add new nodes, delete them, update them, and in general treat them like pods.  Treating nodes like pods is AMAZING (replaceable, fungible, editable, and no big deal) and is life-changing from an operational perspective. That said, the upstream project is going through a ton of change and iteration right now, so for the purposes of having something stable we chose to expose a limited subset of the machine API as an OpenShift resource and will continue to help steer the upstream in a direction that keeps that amazing user experience.  Bringing infrastructure under the control of the cluster has always been at the heart of Kubernetes, and extending that to nodes is the biggest operational improvement next to using operators for managing components.
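
As a rough illustration of what "treating nodes like pods" means, here is a minimal Python sketch of a reconcile loop that converges a set of machines toward a desired replica count. It is not the Machine API itself: the MachineSet class, the reconcile function, and the machine naming scheme are all hypothetical stand-ins, and a real controller would call a cloud provider rather than edit a list.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MachineSet:
    """Hypothetical stand-in for a machine set: a desired count plus the machines that exist."""
    name: str
    replicas: int
    machines: list = field(default_factory=list)
    # Counter used only to hand out unique machine names in this sketch.
    _ids: count = field(default_factory=count, repr=False)


def reconcile(ms: MachineSet) -> list:
    """Converge the actual machines toward ms.replicas and return the actions taken."""
    actions = []
    while len(ms.machines) < ms.replicas:
        machine = f"{ms.name}-{next(ms._ids)}"
        ms.machines.append(machine)      # stand-in for provisioning a new node
        actions.append(("create", machine))
    while len(ms.machines) > ms.replicas:
        machine = ms.machines.pop()      # stand-in for draining and deleting a node
        actions.append(("delete", machine))
    return actions


workers = MachineSet(name="worker", replicas=3)
print(reconcile(workers))                # three "create" actions

workers.machines.remove("worker-1")      # a machine fails and disappears
print(reconcile(workers))                # one "create" action for the replacement

workers.replicas = 2                     # scale down by editing the desired state
print(reconcile(workers))                # one "delete" action
```

The point of the sketch is that a failed node and a scale-down are handled by the same loop: nobody repairs a machine, the cluster just replaces it.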

*** Third, we want to have only the configuration that matters (no pointless choices), make them trivial to change post install, and make the config of the cluster Just-Another-API-Object.

You should never have to:

  • SSH to the machine to change the config of the cluster

  • Look in filesystem directories or set filesystem permissions on config

  • Restore anything except your etcd backup

  • Ask what fields are supported on a config change

You should be able to declaratively configure a Kubernetes cluster exactly like you declaratively configure a Kubernetes application.  If you change this global configuration (the spec) you should get an update when it’s been applied (status). The hard choices here were how much config to expose - I’ll be honest, we may have gotten some of it wrong, and some things that work in 3.11 won’t work in OKD4.  We’ll definitely need to assess where we went too far and add back some of that config.

But the new pattern is a lot easier to reason about - config changes become:

   oc apply -f my-infrastructure-config-folder

Or the more interactive:

   oc edit authentication

That’s it!  Instead of 2400 ansible parameters, or 600 kubernetes component flags, we have ~50-60 core parameters on API objects, and a bunch of new API objects (like machinesets and ingress controllers) that can be adopted for your own use cases, and ALL of them can be managed with the same tools you use for your applications.
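
The spec/status pattern described above can be sketched in a few lines of Python. This is only an illustration under assumptions: ConfigObject, operator_sync, and the exampleSetting field are hypothetical, and the only detail borrowed from Kubernetes is the convention of comparing a generation number against an observedGeneration recorded in status.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigObject:
    """Hypothetical config resource: spec is what you asked for, status is what has been applied."""
    kind: str
    spec: dict
    generation: int = 1
    status: dict = field(
        default_factory=lambda: {"observedGeneration": 0, "applied": {}}
    )

    def edit(self, **changes):
        """Simulate a user edit: change the spec and bump the generation."""
        self.spec.update(changes)
        self.generation += 1


def operator_sync(obj: ConfigObject) -> bool:
    """One pass of a hypothetical operator. Returns True if it had work to do."""
    if obj.status["observedGeneration"] == obj.generation:
        return False                              # already converged
    obj.status["applied"] = dict(obj.spec)        # stand-in for rolling out the config
    obj.status["observedGeneration"] = obj.generation
    return True


# "exampleSetting" is a made-up field, not a real Authentication parameter.
auth = ConfigObject(kind="Authentication", spec={"exampleSetting": "a"})
operator_sync(auth)

auth.edit(exampleSetting="b")
print(auth.status["applied"] == auth.spec)   # spec changed, status not yet caught up
operator_sync(auth)
print(auth.status["applied"] == auth.spec)   # status now reflects the new spec
```

The user only ever writes the spec; learning whether the change has landed is a matter of reading status, not logging in to a machine.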

Since we planned to use CRDs for this, there was a LOT of work to make CRDs… well, frankly, be usable.  That included a ton of performance work at scale, validation on CRDs, getting server-side ‘get’ working, and supporting OpenAPI on CRDs:

   oc explain authentication

A huge shout-out to everyone who has been involved in CRDs in Kubernetes - there was a lot of drudge work to get them up to the standards you’d expect so that they can be used for config, and everyone will benefit from this.

*** Fourth, you have to believe that updates will Just Work in order to trust turning on automatic updates

You should never have to:

  • Click the upgrade button

  • Decide whether it is safe to upgrade

  • Forget to upgrade to get the latest security patches

  • Take downtime during an upgrade

This is a hard problem, and it involves a lot of practice.  It means better technology when deploying, but also better testing, better repeatability, and better standardization of *how* updates are rolled out. It means better discipline in upstream projects to avoid breaking changes, and the ability to catch when regressions happen and roll back.  Much of the effort involved in making updates “just work” comes from a community of users who are willing to test, offer feedback, and provide fixes. While the three previous items are a big help, without a community process that encourages early adoption and incentivizes participation this is the most likely to fail.

I think this is the area we have the most risk in achieving as a community, so experimentation and steady evolution are important.
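
To make "catch when regressions happen and roll back" a little more concrete, here is a small Python sketch of an update that proceeds one node at a time and reverts everything it touched on the first failed health check. The rolling_update function, the node names, and the version strings are all hypothetical, and this is not how OKD performs updates; it only shows the shape of a gated rollout.

```python
def rolling_update(nodes: dict, target: str, healthy) -> bool:
    """Update nodes one at a time; on the first failed health check,
    restore every node touched so far and report failure."""
    previous = dict(nodes)                 # remember versions for rollback
    touched = []
    for name in list(nodes):
        nodes[name] = target
        touched.append(name)
        if not healthy(name, target):
            for done in touched:
                nodes[done] = previous[done]
            return False
    return True


cluster = {"node-a": "v1", "node-b": "v1", "node-c": "v1"}

# A good release: every node passes its health check.
print(rolling_update(cluster, "v2", lambda name, version: True), cluster)

# A regression that only shows up on node-b: the update is rolled back.
print(rolling_update(cluster, "v3", lambda name, version: name != "node-b"), cluster)
```

How trustworthy such a gate is depends entirely on the health check, which is where broad testing and early adopters come in.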


There are other parts of this story, but these four form the core - operators and self management, machines and CoreOS, a simpler and internally consistent set of on-cluster configuration, and trustworthy automatic updates.  I believe that we should inherit and do justice to the CoreOS vision to have a continuously up to date version of Kubernetes that brings in the wider ecosystem where you are never afraid to upgrade - the “cloud service” model of Kubernetes anywhere (any platform, cloud or metal) that hides the boring details so you can Just Run Software.


The work to get from where we are to an OKD4 alpha is probably three major parts:

Because the operating system integration is so critical, we need to make sure that the major components (ostree, ignition, and the kubelet) are tied together in a CoreOS distribution that can be quickly refreshed with OKD - the Fedora CoreOS project is close to being ready for integration in our CI, so that’s a natural place to start.  That represents the biggest technical obstacle that I’m aware of to get our first versions of OKD4 out (the CI systems are currently testing on top of RHEL CoreOS but we have PoCs of how to take an arbitrary ostree based distro and slot it in).

We would also need to define a release process that benefits and reinforces the continuous model that operators enable.  There have been a lot of investments made to CI in OKD and I think that can help us deliver that fast update premise. I think we’ve also seen that a distribution is more than just its kernel, so concrete versions and version numbers are less meaningful than in the past.  I suspect a date-based release model where fixes regularly flow in, things are automatically tested, and you can easily upgrade might better reflect the reality of the environment we live in than the hard Kubernetes boundaries of the past, as well as help reduce the time from problem identification to resolution.
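
One way to picture a date-based release model is the following Python sketch, which picks the newest release that passed automated testing and is newer than what a cluster currently runs. The release list, the ISO-date tag format, and the next_upgrade function are assumptions made for illustration; no actual OKD tag scheme is implied.

```python
from datetime import date

# Hypothetical release stream: (ISO date tag, whether automated tests passed).
releases = [
    ("2019-06-10", True),
    ("2019-06-17", True),
    ("2019-06-24", False),   # a build that failed testing is never offered
]


def next_upgrade(current: str, stream: list):
    """Return the newest tested release dated after `current`, or None."""
    candidates = [
        tag for tag, passed in stream
        if passed and date.fromisoformat(tag) > date.fromisoformat(current)
    ]
    return max(candidates, key=date.fromisoformat) if candidates else None


print(next_upgrade("2019-06-10", releases))
print(next_upgrade("2019-06-17", releases))
```

With dates as the ordering, "am I up to date?" becomes a comparison anyone can read off the tag.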

Finally, I’m sensitive to feedback I’ve heard from many people who want the freedom to mix and match, so I think we can err on the side of “making forks easy” if you want to experiment with your own distribution.  That matches a similar desire in the Fedora CoreOS working group to target different use cases, including those that have nothing to do with Kubernetes - that freedom to have alternatives is part of what I like the most about open source, and I do think that everyone should have the opportunity to make those choices differently via tooling.

I will note that until OKD4 alpha is ready, those who are interested can use https://try.openshift.com to try an OCP4 cluster.  If there are things that you *don’t* like about OCP4 that OKD4 should learn from, that feedback is important.


This was a long email - sorry again - but I hope it can help start the discussion about what comes next.  I suggest we gather feedback and thoughts for a week or so here, and then try to draft something more concrete.  In the meantime, getting a readout of where Fedora CoreOS currently stands, thinking about which pieces of the current integration streams may not make sense for OKD4, and considering what additional points of view we should hear before moving forward would all be very concrete steps we could take.

I’ll be hosting an OpenShift Commons Briefing on June 26th at 9:00 am Pacific to further explore these topics with the wider community. I hope you’ll join the conversation and look forward to hearing from the others across the community.  Meeting details here: http://bit.ly/OKD4ReleaseUpdate

Thank you for your continued support and for being a part of the OKD community,

Clayton Coleman

Kubernetes and OKD contributor
