Rob Hirschfeld | On Computing, Containers, Cloud & Tech Culture
Rob Hirschfeld is CEO and co-founder of RackN, leaders in physical and hybrid DevOps software. He has been in the cloud and infrastructure space for nearly 15 years from working with early ESX Betas to serving four terms on the OpenStack Foundation Board and becoming an Execute at Dell. Follow this blog to get in depth knowledge on Openstack.
Note: OpenStack voting is limited to community members – if you registered by the deadline, you will receive your unique ballot by email. You have 8 votes to distribute as you see fit.
I believe open infrastructure software is essential for our IT future.
Open source has been a critical platform for innovation and creating commercial value for our entire industry; however, we have to deliberately foster communities for open source activities that connect creators, users and sponsors. OpenStack has built exactly that for people interested in infrastructure and that is why I am excited to run for the Foundation Board again.
OpenStack is at a critical juncture in transitioning from a code focus to a community focus.
We must allow the OpenStack code to consolidate around a simple mission while the community explores adjacent spaces. It will be a confusing and challenging transition because we’ll have to create new spaces that leave part of the code behind – what we’d call the Innovator’s Dilemma inside of a single company. And, I don’t think OpenStack has a lot of time to figure this out.
That change requires both strong and collaborative leadership by people who know the community but are not too immersed in the code.
I am seeking community support for my return to the OpenStack Foundation Board. In the two years since I was on the board, I’ve worked in the Kubernetes community to support operators. While on the board, I fought hard to deliver testable interoperability (DefCore) and against expanding the project focus (Big Tent). As a start-up and open source founder, I bring a critical commercial balance to a community that is too easily dominated by large vendor interests.
Re-elected or not, I’m a committed member of the OpenStack community who is enthusiastically supporting the new initiatives by the Foundation. I believe strongly that our industry needs to sponsor and support open infrastructure. I also believe that dominate place for OpenStack IaaS code has changed and we also need to focus those efforts to be highly collaborative.
OpenStack cannot keep starting with “use our code” – we have to start with “let’s understand the challenges.” That’s how we’ll keep building an strong open infrastructure community.
If these ideas resonate with you, then please consider supporting me for the OpenStack board. If they don’t, please vote anyway! There are great candidates on the ballot again and voting supports the community.
Seems fitting to start 2018 by finally posting this list I started in May while working on my DevOpsDays “SRE vs DevOps” presentation, I pulled an SRE and DevOps reading list from some of my favorite authors. I quickly realized that the actual influencer list needed to be expanded some – additional and suggestions welcome. A list like this is never complete.
Offered WITHOUT ordering… I’m sorry if I missed someone! I’ll make it up by podcasting with them!
I love great conversations about technology – especially ones where the answer is not very neatly settled into winners and losers (which is ALL of them in IT). I’m excited that RackN has (re)launched the L8ist Sh9y (aka Latest Shiny) podcast around this exact theme.
Please check out the deep and thoughtful discussion I just had with Mark Thiele (notes) of Aperca where we covered Mark’s thought on why public cloud will be under 20% of IT and culture issues head on.
We feel there’s still room for deep discussions specifically around automated IT Operations in cloud, data center and edge; consequently, we’re branching out to start including deep interviews in addition to our initial stable of IT Ops deep technical topics like Terraform, Edge Computing, GartnerSYM review, Kubernetes and, of course, our own Digital Rebar.
While the RackN team and I have been heads down radically simplifying physical data center automation, I’ve still been tracking some key cloud infrastructure areas. One of the more interesting ones to me is Edge Infrastructure.
This once obscure topic has come front and center based on coming computing stress from home video, retail machine and distributed IoT. It’s clear that these are not solved from centralized data centers.
While I’m posting primarily on the RackN.com blog, I like to take time to bring critical items back to my personal blog as a collection. WARNIING: Some of these statements run counter to other industry. Please let me know what you think!
By far the largest issue of the Edge discussion was actually agreeing about what “edge” meant. It seemed as if every session had a 50% mandatory overhead in definitioning. Putting my usual operations spin on the problem, I choose to define edge infrastructure in data center management terms. Edge infrastructure has very distinct challenges compared to hyperscale data centers. Read article for the list...
Running each site as a mini-cloud is clearly not the right answer. There are multiple challenges here. First, any scale infrastructure problem must be solved at the physical layer first. Second, we must have tooling that brings repeatable, automation processes to that layer. It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth. These requirements are outside the scope of cloud focused tools.
If “cloudification” is not the solution then where should we look for management patterns? We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases. We discussed this at a session at the OpenStack OpenDev Edge summit.
What do YOU think? This is an evolving topic and it’s time to engage in a healthy discussion.
I’m investing in these Site Reliability Engineering (SRE) discussions because I believe operations (and by extension DevOps) is facing a significant challenge in keeping up with development tooling. The links below have been getting a lot of interest on twitter and driving some good discussion.
Addressing this Ops debt is our primary mission at my company, RackN: we believe that integrated system level tooling is required. We also believe that new tools should not disrupt environments so we work very hard to adapt to requirements of individual sites.
SRE is urgent because it provides a pragmatic path and rationale for investment.
Even if you don’t agree with Google’s term or all their practices, I think fundamental concepts of system thinking, status/pay, automation investment and developer collaboration are essential. It should come as no surprise that these are all Lean/DevOps concepts; however, SRE has the pragmatic side of being a job function.
Here are some recent relevant discussions I’ve been having about SREs with links to both the audio and my text show notes.
TL;DR: infrastructure operations is hard and we need to do a lot more to make these systems widely accessible, easy to sustain and lower risk. We’re discussing these topics on twitter…please join in. Themes include “do we really have consensus and will to act” and “already a solved problem” and “this hurts OpenStack in the end.”
It’s essential to solve these problems in an open way so that we can work together as a community of operators.
It feels like developers are quick to rally around open platforms and tools while operators tend to be tightly coupled to vendor solutions because operational work is tightly coupled to infrastructure. From that perspective, I’m been very involved in OpenStack and Kubernetes open source infrastructure platforms because I believe the create communities where we can work together.
This week, I posted connected items on VMblog and RackN that layout a position where we bring together these communities.
Of course, I do have a vested interest here. Our open underlay automation platform, Digital Rebar, was designed to address a missing layer of physical and hybrid automation under both of these projects. We want to help accelerate these technologies by helping deliver shared best practices via software. The stack is additive – let’s build it together.
I’m very interested in hearing from you about these ideas here or in the context of the individual posts. Thanks!
Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday.
In my 2011 Boston Summit shirt.
In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community.
Continued Growth in Core Areas
Practical reasons for running dedicated infrastructure (compliance, control and cost) make OpenStack relevant for companies and governments with significant budgets. There is also a healthy shared infrastructure (aka public cloud) market living in the shadow of the big 3 players. It’s still unclear how this ecosystem will make money for the vendors.
What do customers buy? Should the Core be free?
My personal experience is that most customers are reluctant to (but grudgingly do) buy distros for the core open technology. They are much more willing to pay for adjacencies like security, storage and networking.
Emerging Challenges from Adjacent Technologies
Containers and Kubernetes are making a significant impact on the OpenStack community. At points, the OpenStack keynote was more about Kubernetes than OpenStack. It’s also clear that customers want to use containers as an abstraction layer to make infrastructure less visible or locked-in. That opens the market for using servers directly (bare metal) or other clouds. That portability is likely to help OpenStack more than hurt it because customers can exit workloads from the Big 3 players.
Friction for adoption remains a critical hurdle.
Containers, which are cloud first platforms, have much less friction than IaaS platforms. IaaS platforms, even managed ones, require physical infrastructure with the matching complexity and investment.
OpenStack: an open infrastructure software community
Overall, the summit remains an amazing community space for open infrastructure software and cloud alternatives to the Big 3 players. The Foundation’s pivot to embrace Kubernetes and foster several other open technologies helps maintain the central enthusiasm for open source infrastructure that gave birth to the platform in the first place.
A healthy pragmatic vibe
The summit may not have the same heady taking-on-the-world feeling as the early days; instead, it has a healthy pragmatic vibe. Considering how frothy this space remains, that may be a welcome relief.
What are your impressions? I’m looking forward to hearing from you!
We believe Cloud Native development disciplines are essential regardless of the infrastructure.
Today, RackN announce very low entry level support for Digital Rebar Provisioning – the RESTful Cobbler PXE/DHCP replacement. Having a company actually standing behind this core data center function with support is a big deal; however…
We’re making two BIG claims with Provision: breaking DevOps bottlenecks and cloud native physical provisioning. We think both points are critical to SRE and Ops success because our current approaches are not keeping pace with developer productivity and hardware complexity.
I’m going to post more about Provision can help address the political struggles of SRE and DevOps that I’ve been watching in our industry. A hint is in the release, but the Cloud Native comment needs to be addressed.
First, Cloud Native is an architecture, not an infrastructure statement.
There is no requirement that we use VMs or AWS in Cloud Native. From that perspective, “Cloud” is a useful but deceptive adjective. Cloud Native is born from applications that had to succeed in hands-off, lower SLA infrastructure with fast delivery cycles on untrusted systems. These are very hostile environments compared to “legacy” IT.
What makes Digital Rebar Provision Cloud Native? A lot!
The following is a list of key attributes I consider essential for Cloud Native design.
Micro-services Enabled: The larger Digital Rebar project is a micro-services design. Provision reflects a stand-alone bundling of two services: DHCP and Provision. The new Provision service is designed to both stand alone (with embedded UX) and be part of a larger system.
Swagger RESTful API: We designed the APIs first based on years of experience. We spent a lot of time making sure that the API conformed to spec and that includes maintaining the Swagger spec so integration is easy.
Remote CLI: We build and test our CLI extensively. In fact, we expect that to be the primary user interface.
Security Designed In: We are serious about security even in challenging environments like PXE where options are limited by 20 year old protocols. HTTPS is required and user or bearer token authentication is required. That means that even API calls from machines can be secured.
12 Factor & API Config: There is no file configuration for Provision. The system starts with command line flags or environment variables. Deeper configuration is done via API/CLI. That ensures that the system can be fully managed by remote and configured securely becausee credentials are required for configuration.
Fast Start / Golang: Provision is a totally self-contained golang app including the UX. Even so, it’s very small. You can run it on a laptop from nothing in about 2 minutes including download.
CI/CD Coverage: We committed to deep test coverage for Provision and have consistently increased coverage with every commit. It ensures quality and prevents regressions.
Documentation In-project Auto-generated: On-boarding is important since we’re talking about small, API-driven units. A lot of Provisioning documentation is generated directly from the code into the actual project documentation. Also, the written documentation is in Restructured Text in the project with good indexes and cross-references. We regenerate the documentation with every commit.
We believe these development disciplines are essential regardless of the infrastructure. That’s why we made sure the v3 Provision (and ultimately every component of Digital Rebar as we iterate to v3) was built to these standards.
What do you think? Is this Cloud Native? What did we miss?
Why has it been so hard to untie from Cobbler? Why can’t we just REST-ify these 1990s Era Protocols? Dealing with the limits of PXE, DHCP and TFTP in wide-ranging data centers is tricky and Cobbler’s manual pre-defined approach was adequate in legacy data centers.
Now, we have to rethink Physical Ops in Cloud-first terms. DevOps and SRE minded operators services that have need real APIs, day-2 ops, security and control as primary design requirements.
The Digital Rebar team at RackN is hunting for Cobbler, Stacki, MaaS and Forman users to evaluate our RESTful, Golang, Template-based PXE Provisioning utility. Deep within the Digital Rebar full life-cycle hybrid control was a cutting-edge bare metal provisioning utility. As part of our v3 roadmap, we carved out the Provisioner to also work as a stand-alone service.
Here’s 10 reasons why DR Provisioning kicks aaS:
Swagger REST API & CLI. Cloud-first means having a great, tested API. Years of provisioning experience went into this 3rd generation design and it shows. That includes a powerful API-driven DHCP.
Security & Authenticated API. Not an afterthought, we both HTTPS and user authentication for using the API. Our mix of basic and bearer token authentication recognizes that both users and automation will use the API. This brings a new level of security and control to data center provisioning.
Stand-alone multi-architecture Golang binary. There are no dependencies or prerequisites, plus upgrades are drop in replacements. That allows users to experiment isolated on their laptop and then easily register it as a SystemD service.
Nested Template Expansion. In DR Provision, Boot Environments are composed of reusable template snippets. These templates can incorporate global, profile or machine specific properties that enable users to set services, users, security or scripting extensions for their environment.
Configuration at Global, Group/Profile and Node level. Properties for templates can be managed in a wide range of ways that allows operators to manage large groups of servers in consistent ways.
Multi-mode (but optional) DHCP. Network IP allocation is a key component of any provisioning infrastructure; however, DHCP needs are highly site dependant. DR Provision works as a multi-interface DHCP listener and can also resolve addresses from DHCP forwarders. It can even be disabled if your environment already has a DHCP service that can configure a the “next boot” provider.
Dynamic Provisioner templates for TFTP and HTTP. For security and scale, DR Provision builds provisioning files dynamically based on the Boot Environment Template system. This means that critical system information is not written to disk and files do not have to be synchronized. Of course, when you need to just serve a file that works too.
Node Discovery Bootstrapping. Digital Rebar’s long-standing discovery process is enabled in the Provisioner with the included discovery boot environment. That process includes an integrated secure token sequence so that new machines can self-register with the service via the API. This eliminates the need to pre-populate the DR Provision system.
Multiple Seeding Operating Systems. DR Provision comes with a long list of Boot Environments and Templates including support for many Linux flavors, Windows, ESX and even LinuxKit. Our template design makes it easy to expand and update templates even on existing deployments.
Two-stage TFTP/HTTP Boot. Our specialized Sledgehammer and Discovery images are designed for speed with optimized install cycles the improve boot speed by switching from PXE TFTP to IPXE HTTP in a two stage process. This ensures maximum hardware compatibility without creating excess network load.
Digital Rebar Provision is a new generation of data center automation designed for operators with a cloud-first approach. Data center provisioning is surprisingly complex because it’s caught between cutting edge hardware and arcane protocols embedded in firmware requirements that are still very much alive.
I got to spend a few days hearing IBM’s cloud plans at IBM Interconnect including a presentation, dinner and guest blogging. Read below for links to that content.
As part of their CloudMinds group, we’re encouraged to look at the big picture of the conference and there’s a lot to take in. IBM has serious activity around machine learning, cognitive, serverless, functional languages, block chain, platform and infrastructure as a service. Frankly, that’s a confusing array of technologies.
Does this laundry list of technologies fit into a unified strategy? No, and that’s THE POINT.
Anyone who thinks they can predict a definitive right mix of technologies to solve customer problems is not paying attention to the pace of innovation. IBM is listening to their customers and hearing that needs are expanding not consolidating. In this type of market, limiting choice hurts customers.
That means that a hybrid strategy with overlapping offerings serves their customers interests.
IBM has the luxury and scale of being able to chase multiple technologies to find winners. Of course, there’s a danger of hanging on to losers too long too. So far, it looks like they are doing a good job of riding that sweet spot. Their agility here may be the only way that they can reasonably find a chink in Amazon’s cloud armour.
While the hybrid story is harder to tell, it’s the right one for this market.
Four Posts For Deeper Reading
The posts below cover a broad range of topics! Chris Ferris and I did some serious writing about collaboration and my DevOps/Hybrid post has been getting some attention. It’s all recommended reading so I’ve included some highlights.
“We’ve clearly learned that DevOps automation pays back returns in agility and performance. Originally, small-batch, lean thinking was counter-intuitive. Now it’s time to make similar investments in hybrid automation so that we can leverage the most innovation available in IT today.”