layout: true background-image: url(img/hex-screen.jpg) image-credit: A Joe Beda original photo --- template: full-background background-image: url(img/control-panel.jpg) image-credit: [flickr photo](http://flickr.com/photos/petercastleton/3894184210 "control panel") shared by [petercastleton](http://flickr.com/people/petercastleton) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) class: top, left # The Operations Dividend .cblock[ Joe Beda KubeCon 2015 ] .image-credit[ {{image-credit}} ] ??? --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/goincase/1808642975 "Red Vines") shared by [Incase.](http://flickr.com/people/goincase) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/red-vines.jpg) # Development + Operations .image-credit[ {{image-credit}} ] ??? I'm going to talk about development and ops and how I think they should relate in this new world that we are all building together. Hopefully this will get everyone thinking and talking over this second day of KubeCon. --- class: middle, left background-image: url(img/couch.jpg) image-credit: [flickr photo](http://flickr.com/photos/quinnanya/4989518063 "Couch") shared by [quinn.anya](http://flickr.com/people/quinnanya) under a [Creative Commons ( BY-SA ) license](http://creativecommons.org/licenses/by-sa/2.0/) # DevOps? .image-credit[ {{image-credit}} ] ??? But, if we are talking about Dev and Ops, are we talking about DevOps? Actually, I don't think so because, honestly, I don't know what DevOps is. A cynical definition that I heard once (from Kelsey over beers) is that DevOps is group therapy for big corps. The dev org and the ops org hate each other and don't talk. You can make a mint selling a seminar the just gets them to say hi to each other. In all seriousness, I think there is a role for a stronger alliance here. I'll get to that a little later. --- class: middle, left background-image: url(img/fuzzy.jpg) image-credit: [flickr photo](http://flickr.com/photos/starr-environmental/22634580632 "starr-100817-3836-Argyroxiphium_sandwicense_subsp_macrocephalum-furry_silvery_leaves-Sliding_Sands_Flats_Haleakala_National_Park-Maui") shared by [Starr Environmental](http://flickr.com/people/starr-environmental) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) # WTH is a Micro-Service? .image-credit[ {{image-credit}} ] ??? Let's start with microservices. I'm sure you've heard this term. There are probably a lot of you that think you actually know what it means. And some of you might actually agree. But like much in this world, these concepts are **fuzzy**, so I'll take some time to motivate what *I* mean by micro services. --- class: middle, left background-image: url(img/lego.jpg) image-credit: [flickr photo](http://flickr.com/photos/64604717@N03/6077604789 "LEGO Blue 1x1 Studs") shared by [Eric Lumsden](http://flickr.com/people/64604717@N03) under a [Creative Commons ( BY-ND ) license](http://creativecommons.org/licenses/by-nd/2.0/) # Building Blocks .image-credit[ {{image-credit}} ] ??? First, microservices are a way to break up an application into smaller pieces. It is componentization by another name. You remember object oriented programming? It's that applied to both the architecture of the application and how it is managed in production. In traditional componentization, the components end with the linker. With microservices (or as I like to call them, services) the componentization extends to the datacenter. Your discovery service is your linker. --- class: bottom, left background-image: url(img/tray.jpg) image-credit: [flickr photo](http://flickr.com/photos/lennartt/8696945818 "Queen's Day / King's Day") shared by [Lennart Tange](http://flickr.com/people/lennartt) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) # SoA? .image-credit[ {{image-credit}} ] ??? This is another hack at the idea of SOA. But this time we have JSON. And we won't let standards bodies rain on the parade with an alphabet soup of overgeneralization. --- name: bus-video image-credit: Based on [YouTube Video](https://youtu.be/rhZCyQ3emQg?t=3m22s) posted by [shanrick37](https://www.youtube.com/channel/UCZNyiaDj3e3H589Lj0AK_Uw)
Your browser does not support the video tag.
# The Service Bus .image-credit[ {{image-credit}} ] ??? We won't get hit by the service bus this time. Note: This is the 43 bus on Capitol Hill in Seattle. Seattle can be just as hilly as San Francisco but we also get snow. That is a bad combo. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/eflon/4404716468 "favelascape") shared by [eflon](http://flickr.com/people/eflon) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/slum.jpg) # Architectural Boundaries .image-credit[ {{image-credit}} ] ??? The big question is how and why do you break an application down into services. Often times this is based on existing systems and natural architectural boundaries. If you squint the right way, your database is a microservice. Any time you put a slow process behind a queue you are defining a service. Language boundaries are another example of natural service breaks. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/jeffreyww/5923456035 "Mmm... Pico de Gallo") shared by [jeffreyw](http://flickr.com/people/jeffreyww) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/pico.jpg) # Pico-Services? .image-credit[ {{image-credit}} ] ??? The big shift here, however, is to break the app down even further. At one extreme, folks have suggested creating microservices that are so small each function has its own service. Pico-services? Femto-services? I think that is probably an extreme. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/carbonnyc/4406858648 "Emptily Nested") shared by [CarbonNYC [in SF!]](http://flickr.com/people/carbonnyc) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/scaling.jpg) # Scaling Your Organization .image-credit[ {{image-credit}} ] ??? The biggest gains are when you can break down the services based on human boundaries in addition to architectural boundaries. In this way, microservices are about scaling your *organization*. Mr. Amazon himself (Bezos) has a "2 pizza rule" whereby the goal is to keep teams small enough so that they can be fed with two pizzas. This isn't just because he is cheap (they prefer the term "frugal") but because small teams are more effective. This also becomes natural ways to structure your organization. Conway's Law is that organizations "produce designs which are copies of the communication structures of [the] organization". You can choose to have the happen naturally or to structure your org based on good architectural breakdowns. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/kevandotorg/7928566734 "Velodrome") shared by [Kevan](http://flickr.com/people/kevandotorg) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/velodrome.jpg) # Velocity .image-credit[ {{image-credit}} ] ??? If you can keep a microservice to the size that a small team can own it you can unlock greater velocity across your organization. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/16801915@N06/7480982660 "Sleepy Calfs") shared by [Reading Tom](http://flickr.com/people/16801915@N06) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/calfs.jpg) # Mythical Person Month .image-credit[ {{image-credit}} ] ??? You've all read (or at least pretend to have read?) the "Mythical Man Month"? As an industry we keep trying to ignore it, but the truth is that adding people to a team can make that team less effective. You can have 9 cows make a calf in 1 month. (Google tells me that cattle have a gestation period of ~9 months. Your pet -- cat or dog -- is more of an MVP and has a gestational period of ~2 months. Keeping your cat in a container doesn’t make kittens any faster. Cats do have some parallelism though…) --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/pio1976/3330670980 "Bookshelf Spectrum 1.0: mission accomplished") shared by [p!o](http://flickr.com/people/pio1976) under a [Creative Commons ( BY-ND ) license](http://creativecommons.org/licenses/by-nd/2.0/) background-image: url(img/spectrum.jpg) # The Service Spectrum .image-credit[ {{image-credit}} ] ??? Thinking of "library as service" is a great way to start in this world but is probably too limited. I think that there are different types of microservices. I'm going to define four here quickly but I'm sure that there are more variations on this theme. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/andrepmeyer/4328963602 "behind the desk") shared by [andre.m(eye)r.vitali](http://flickr.com/people/andrepmeyer) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/behind-desk.jpg) # 1. Implementation Detail .image-credit[ {{image-credit}} ] ??? The first is what we've been talking about. In this case the microservice is an implementation detail of an application. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/ginnerobot/5539334903 "Krispy Kreme Conveyer Belt") shared by [ginnerobot](http://flickr.com/people/ginnerobot) under a [Creative Commons ( BY-SA ) license](http://creativecommons.org/licenses/by-sa/2.0/) background-image: url(img/donuts.jpg) # 2. Shared Artifact, Private Instance .image-credit[ {{image-credit}} ] ??? The second is what I'd call Shared Artifact, Private Instance. SAPI? Your MySQL database is most likely in this category. You are taking an off the shelf system and running a private instance of it. The artifact doesn't have to be totally off the shelf. Your big company may have some similar systems that are owned by a team where they deliver the code but each consumer runs their own private version of that code. "This service is mine. There are many like it, but this one is mine." --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/cogdog/8188824613 "Life is Sharing") shared by [cogdogblog](http://flickr.com/people/cogdog) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/sharing.jpg) # 3. Shared Instance .image-credit[ {{image-credit}} ] ??? The third is I think the most interesting. This is a shared instance inside of a larger organization. A single service may serve lots and lots of teams. This could be some heavy system (machine learning sentiment analysis) or something simpler like a service that creates and returns common HTML UI elements to be composited into a final rendered page. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/lukedetwiler/11926630474 "Tillamook Donkey-0145") shared by [LukeDetwiler](http://flickr.com/people/lukedetwiler) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/steam-donkey.jpg) # 4. Big-S Service .image-credit[ {{image-credit}} ] ??? Finally, the last is what I've called "Big S Services". These are the fully realized SaaS models that extend outside of the corporation. Think SendGrid or Twilio. That isn't really what we are talking about here. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/stendueland/16337035536 "Never let children climb in trees...") shared by [Sten Dueland](http://flickr.com/people/stendueland) under a [Creative Commons ( BY-SA ) license](http://creativecommons.org/licenses/by-sa/2.0/) background-image: url(img/playground.jpg) # Service Network .image-credit[ {{image-credit}} ] ??? Let's rewind back to #3. One interesting implication of this is that you end up with a complicated network of dependencies across your org. You end up with a network of services that spans otherwise independent systems. This is great because the tools that a team build can be widely leveraged. But this creates new challenges also. Understanding and managing this network (without getting overrun by the service bus) will, I suspect, be an active area of innovation. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/stephen-oung/6106479031 "Fire Dancer") shared by [SteFou!](http://flickr.com/people/stephen-oung) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/fire-dancer.jpg) # Operations .image-credit[ {{image-credit}} ] ??? Now let's change gears talk about ops for a bit. This is a bit of a non-sequitur here but I'll try to bring it back together. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/seattlemunicipalarchives/4050328720 "Medic One unit at hospital, circa 1973") shared by [Seattle Municipal Archives](http://flickr.com/people/seattlemunicipalarchives) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/medic-one.jpg) # SRE vs. Ops .image-credit[ {{image-credit}} ] ??? Inside of Google there are people called "Site Reliability Engineers" -- or SREs. These folks are awesome. They are the ones that sweat the details about making sure any specific application or system stays up and running well. In fact, SREs are so awesome at Google that there is a shortage. Not every product has an SRE team. People fight over getting staffed with limited SRE resources. Usually it is up to the SRE team to _qualify_ a product before taking responsibility for it. The dev team must demonstrate that they code they are handing off is ready to be managed in a sane way. Usually this is done by having the dev team run it in production for a while to establish a track record. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/egansnow/245460917 "2 am") shared by [Egan Snow](http://flickr.com/people/egansnow) under a [Creative Commons ( BY-SA ) license](http://creativecommons.org/licenses/by-sa/2.0/) background-image: url(img/2am-club.jpg) .image-credit[ {{image-credit}} ] ??? SRE are not ops. It's true, they carry a "pager". And they wake up in the middle of the night (or work on a follow-the-sun rotation) to answer pages. And, at 2am, they do what every other engineer does at 2am. They try to stop the bleeding and get things running again so that they can get back to sleep. The true difference comes in the morning (let's say 10am after that 2am page). Do you write the page off and complain about your lack of sleep? Or do you do a proper postmortem, take an honest look at what went wrong so you can identify the root cause? Do you actually have the mandate and support to fix that root cause? Do you run drills and inject faults into your systems to try to preempt the next outage? This is the difference between an SRE and a regular Ops person. Maybe this is what people mean by DevOps. I dunno. --- class: middle, left image-credit: From [Darwin's "Vorage of the Beagle"](https://en.wikipedia.org/wiki/Darwin%27s_finches#/media/File:Darwin%27s_finches_by_Gould.jpg). Public Domain. background-image: url(img/finches.jpg) # Ops Specialization .image-credit[ {{image-credit}} ] ??? I'm a huge fan of Cloud. It provides a huge boost when starting a product or a company. Being able to get a VM in 30 seconds really changes the way that people view infrastructure. However, ops has failed in the cloud world. IaaS is really "HW Ops as a service" and it is great as far as that goes. But everything inside the VM, by and large, has remained a big hairball. The answer here is "ops specialization". Let's break down the stuff running in the VM into layers and solve for those layers. Those layers have clean interfaces that let each individual ops team for that layer concentrate on doing a great job. What are those layers? I'm glad you asked. Here is one way to break it down. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/blakespot/11605981536 "From my first Linux install. Debian v1.2 "rex" floppy from Dec. '96. Installed on a 160MHz 5x86.") shared by [blakespot](http://flickr.com/people/blakespot) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/debian-disk.jpg) # OS Ops .image-credit[ {{image-credit}} ] ??? The bottom layer above the HW is getting a kernel and base system running and stable. We've seen a host of Linux distributions that have really done a great job of delivering here. This would include CoreOS, RedHat Atomic, RancherOS, Snappy Ubuntu Core, VMWare Photon. The idea is to strip out all of the stuff that isn't needed so that this core can be super stable. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/verzo/10071360486 "Blue grapes") shared by [Roberto Verzo](http://flickr.com/people/verzo) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/grapes.jpg) # Cluster Ops .image-credit[ {{image-credit}} ] ??? This is where things get interesting. Here we take the cluster system and elevate running it as a separate specialization. We can take something like Kubernetes and have some ops specialists ensure it is a well oiled machine. At Google, these clusters are generally shared across teams (lots of reasons to do that) but this model could also work with private cluster instances. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/rotron/3655734558 "Fremont Troll") shared by [Roshan Vyas](http://flickr.com/people/rotron) under a [Creative Commons ( BY-ND ) license](http://creativecommons.org/licenses/by-nd/2.0/) background-image: url(img/fremont-troll.jpg) # App Ops .image-credit[ {{image-credit}} ] ??? This is the top layer. Here we have a set of people that have lots of domain knowledge about the specific application/service that is being run. They know what specific metrics matter and have deep knowledge to stop the bleeding at 2am. If the ops and dev teams are separate, they have a great working relationship because all of the incentives are aligned. If you do this right, the app ops and apps dev team are talking all the time and coordinating. Issues and fixes can cross these boundaries. And the app ops team will talk to the dev team a heck of a lot more then they'll talk to the cluster ops team. --- class: bottom, left image-credit: [flickr photo](http://flickr.com/photos/tibbygirl/6814660428 "Pig") shared by [tibbygirl](http://flickr.com/people/tibbygirl) under a [Creative Commons ( BY-SA ) license](http://creativecommons.org/licenses/by-sa/2.0/) background-image: url(img/rachel-pig.jpg) # Ops Costing .image-credit[ {{image-credit}} ] ??? Now, any sophisticated systems that needs to be run in a production setting will have an operational cost. This is just a law of nature. Let's try and break down that cost a bit. First, more complex systems are more complex to operate. This makes an intuitive sense. I'd even suggest that the cost here is super linear. For example, take two systems where one is twice as complex as the other. I'm guessing that the more complex system will have an exponentially higher cost than the simpler system. These things tend to spiral out of control. But there is also a static cost. Any moving part in production comes with some cognitive overhead. It is another line on a dashboard. Another version that needs to be managed. When you take these together, you I think this determines how granular you want to make your microservices. --- # Hand Wavy Graphs .center[[![Pretty Graphs!](img/ops-cost-calc.png)](https://www.desmos.com/calculator/oo6iyjhzp2)] ??? I've tried to boil a little bit of this down into a formula you can play around with. This graph the cost based on the total system complexity. It assumes that you are dividing that cost evenly across all of your microservices. I'm sure you can think of other terms to introduce here -- some of these may end up dominating. This is all BS, of course, but hopefully it can give you an intuitive feel for some of the trade offs. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/adammeek/13210794463 "The Excavator") shared by [-mtnoxx-](http://flickr.com/people/adammeek) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/excavator.jpg) # Reducing the Constants .image-credit[ {{image-credit}} ] ??? I'd love to eliminate these costs but I'm not sure that is possible. What this whole conference is really about is reducing the constants in this equation. We want to make the static cost for running a service much much cheaper. We want to create new tools to instrument and discover what is going on so that we can handle more complexity at reduced cost. We want to make each SRE/ops person or developer much more effective. We want to create force multiplies. --- class: middle, left image-credit: [flickr photo](http://flickr.com/photos/stevensnodgrass/5548193945 "Reduce Reuse Recycle") shared by [Steve Snodgrass](http://flickr.com/people/stevensnodgrass) under a [Creative Commons ( BY ) license](http://creativecommons.org/licenses/by/2.0/) background-image: url(img/recycle.jpg) # The Operations Dividend .image-credit[ {{image-credit}} ] ??? To bring it back full circle, this is the operations dividend. As we create new tools we'll unlock potential. It is up to you on how you spend this dividend. You could keep moving at the pace you are moving and go to a 20 hour work week. Or you could take the staff you have and bask in increased productivity. You can take on more complexity and move faster. --- class: middle, left # Thank you! .cblock[ Joe Beda [@jbeda](https://twitter.com/jbeda) http://www.eightypercent.net ] .image-credit[ {{image-credit}} ]