Humans are primates.
(If you disagree with that statement, you should leave this talk right now. You will not be happy.)
That means, among other things:
Hey wow is that an Apple Watch?
You shouldn't let humans anywhere near your software deployments.
They will get it wrong.
Sometimes they just miss a step, or make a typo and some small-but-critical step doesn't work.
Some of this is harmless.
It's not the people who are so unambitious as to never bother to learn how to write shell scripts that are the problem.
That's not really a big deal. I mean, sure, the development team is unhappy, and the customer didn't get what the business promised them, but it's not, you know, bad.
It's the ones who do script things that you have to worry about.
    # Deploy the application
    DEP_ENV=prod whizzerate -infibulator --pkg-directory=$deploydir -q
    # Remove the now-deployed application package
    cd $deploydir
    rm -rf *
It saves time! It's perfect!
YOU: Me
PROJECT MANAGER: OVERLY EXCITABLE AND SHOUTS IN COMIC SANS AND ALSO DOESN'T USE PUNCTUATION!
DEVELOPER: Sullen and sleepy. This wasn't in the job description.
OPS GUY: For those whose favorite Guardians Of The Galaxy character was Groot.
(timestamps) and (crickets): The chorus (everyone else).
Ops: you only have two words, and they are always in the same order: "Herp" and "Derp". But you do have to vary the inflection. Chorus: when the script gives a timestamp, say "tick tick tick," and when it says "(crickets)", say "chirp chirp chirp."
(4:46 AM. Phone rings.)
YOU: Mrphgblargle.
PROJECT MANAGER: OH THANK GOD WE FINALLY MANAGED TO GET THROUGH TO YOU! FLORBLEFINGER 2.12.22 FAILED ITS DEPLOYMENT AND WE ONLY HAVE AN HOUR LEFT IN THE WINDOW WHY DID THIS BREAK LITTLE OLD LADIES CAN'T GET THEIR DRUGS!
YOU: Ugh. What time... Hang on. Gotta log in. Effin' VPN. Stand by.
(4:56 AM)
YOU: OK. I don't have access to the box. What's it saying in the logs?
(crickets)
(5:13 AM)
YOU: The stack trace seems to show that the JVM wants to use the vbc.florblefinger.enablePoorLifeDecisions property, and it's not set. Is there a developer here?
(crickets)
YOU: Could we get one, please?
(5:22 AM)
DEVELOPER: It works on my desktop.
YOU: Is your desktop in the data center? Oh boy, did I just say that in my outside voice?
DEVELOPER: I dunno, that variable gets set in the default Eclipse environment we got from Application Engineering.
YOU: How did this get through QA?
PROJECT MANAGER: THIS RELEASE WAS FAST-TRACKED BECAUSE BY THE TIME ARCHITECTURE WAS DONE WE NO LONGER HAD BUDGET FOR A FULL QA CYCLE.
YOU: And a non-full QA cycle would be...?
(crickets)
YOU: Can we get an App Eng res...you know what, never mind. Hey, ops guy?
OPS GUY: Herp! Derp!
YOU: Can you please use a text editor to open the florblefinger wrapper script?
OPS GUY: Herp? Derp?
YOU: Can you please share your screen?
(5:49 AM)
YOU: OK, now type the letter i. Now type the minus sign. Now type capital D. No. Ca... (sigh) Escape. H. R. Hold down shift. Type D. Let up shift. Vee-as-in-vomit...
(5:54 AM)
YOU: Escape. Colon....Colon. As in where your hea...er, hold down shift and press the key right of L. Now type W, then Q. Hit Enter. (deep breath) OK. Now, please follow the last three lines of the Implementation Plan.
OPS GUY: Herp. Derp.
(5:58 AM)
YOU: Welp, looks like it's working.
PROJECT MANAGER: GREAT! I'LL SEE YOU IN ROOM 211B AT 8:00 SHARP FOR THE POST-DEPLOYMENT DEBRIEF! GREAT JOB EVERYONE, WE GOT THE RELEASE INTO PRODUCTION!
(8:05 AM)
PROJECT MANAGER: GREAT! WE HAD A SUCCESSFUL DEPLOYMENT LAST NIGHT BECAUSE I DID SUCH A GREAT JOB MANAGING THE PROJECT THAT I AM THE BESTEST PROJECT MANAGER IN THE HISTORY OF PROJECT MANAGERS AND I AM SURE MY BONUS WILL...
(8:56 AM)
PROJECT MANAGER: ...SO CONGRATULATIONS TEAM AND LET'S GET FOCUSED ON OUR NEXT RELEASE WHICH WILL BE IN 86 DAYS AND...WHAT IS IT NOW?
YOU: So, Project Manager, can you please get Development's manager to pinky-swear that they will actually put the properties into the script wrapper for the next release? And maybe ask whoever owns Application Engineering to have someone update their docs to make it clearer that Development needs to do that in the general case, even if the values are provided in the development environment?
PROJECT MANAGER: SURE! I WILL SET UP ONE RECURRING MEETING BETWEEN YOU AND A LOW-LEVEL DEVELOPER TO WHOM NO ONE WILL LISTEN TO PRIORITIZE THE SWEARS AND PROPERTIES (click click) ON ALL MONDAYS AT 7:30 AM AND THEN ANOTHER RECURRING MEETING BETWEEN YOU AND A CHECKED-OUT CONTRACTOR WHO IS JUST WAITING FOR HIS PAPERWORK FROM ANOTHER VERY BIG ENTERPRISE TO COME THROUGH TO WRAPPING OF THE SCRIPTING GENERAL DOCUMENTATION CASE (click click) ALTERNATING FRIDAYS AT 5:30 PM AND TUESDAY AT NOON SORRY ABOUT THE LUNCH MEETING BUT THAT WAS THE ONLY TIME YOU WERE OPEN LOL! WELL WE'RE OUT OF TIME GOOD JOB EVERYONE!
(spoiler: the properties never go in the wrapper and this conversation happens every three months until you leave the company)
The same thing that left dev, and got tested in QA, is the thing that goes to production.
The deployment machinery is also well tested, and the same machinery functions in all environments after the developer's desktop.
Developers are given clear guidelines as to how they must package their applications to be picked up by the deployment machinery.
Their timelines include budget for getting the desktop-to-dev transition handled.
People aren't afraid of deployments anymore, because the software works reliably.
In fact you usually do rolling deployments in the middle of the day, and hardly ever start something big at 4:45 on Friday.
There is a strong cultural bias AGAINST middle-of-the-night heroics and FOR if-it-breaks-send-it-back: right is more important than Right Now.
Public domain image, http://www.publicdomainpictures.net/view-image.php?image=52498&picture=pony&large=1
The last time I tried this, I tried to drink the Kool-Aid.
- All artifacts were RPMs.
- All files were tracked in RPMs.
- Config files were not edited by hand.
These seemed like pretty simple precepts to follow.
It was a miserable failure.
The commitment to RPMs was superficial.
No one reads documentation. Ever.
- Or did that happen already, and everyone knows but me?
Most places are running some OS now that will support Linux kernel namespaces and hence Docker.
- You can also do the Docker-on-Windows-or-OS-X thing of running Docker on a Linux VM inside VirtualBox (or another full-machine-virtualization engine) on the host OS.
Let's figure out how to containerize.
Provide isolation between multiple services running (perhaps) on the same hardware.
Why is that not, just, y'know, a process?
Because people are stupid.
OK, OK, maybe that's a little harsh.
The short answer is: conflicting namespaces.
Linux kernel namespaces provide containers with their own set of each of these, as well as their own PIDs (which you don't generally care about directly), their own hostname (ditto), and their own System V IPC (which needs a stake through the heart).
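You can see what that isolation looks like without Docker in the picture: util-linux's unshare will hand a plain shell its own set of namespaces. A minimal sketch (the hostname and the exact flag set are just illustration):

    sudo unshare --uts --pid --net --ipc --mount --mount-proc --fork bash -c '
      hostname totally-not-the-real-host   # UTS: only visible inside this namespace
      hostname
      ps ax                                # PID: this bash is PID 1
      ip link                              # net: nothing but a lonely, down loopback
    '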
So, sure, you can overcome all these problems by adhering to rigorously followed conventions.
- You will use $HTTPS_LISTENER_PORT rather than hardcoding it.
- Yes, that's right, it's not in your Eclipse environment. You'll need to provide it there or use this script we've written for you...
- Yes, that's right, it's also not in the environment you see when you log in as the mule user to the dev box, because you can have up to four horizontal instances, and so the offset is calculated at....
- NO! You can't just set it in your .profile! Stop it!
See, above, "no one ever reads documentation," and "miserable failure."
Or you can just run the application in a container, which is only a little more heavyweight than a process, but looks like it has:
Which, OK, isn't as elegant as doing it all with nifty environmental setup scripts and clever shell-evaluations to get variable names, but might be a little more approachable.
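A sketch of how approachable, reusing the fictional application from earlier and inventing the port numbers: two copies of the same service can each believe they own port 8443, and the conflicting-namespace problem evaporates.

    docker run -d --name florble-a -p 18443:8443 example/florblefinger:2.12.22
    docker run -d --name florble-b -p 28443:8443 example/florblefinger:2.12.22
    # Each container binds 8443 inside its own network namespace;
    # the host maps them out to 18443 and 28443.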
So, in retrospect, maybe "Read the docs! Trust the environment! Don't act like flinging your feces at onlookers is the highest action of which you are capable!" wasn't the hill to die on.
You can't just drop DevOps into an existing organization without cultural and structural changes.
In a traditional enterprise with separate silos for Operations, Engineering, and Development, developers will almost certainly behave like a bunch of crack-addled gibbons whose only concern is to charge through eighty gajillion sprints all focused on feature introduction and none at all on fixing bugs and then get their thing running on their lovingly hacked-up personal box and then, with mere seconds to spare before the hard deadline for "you missed your window," chuck it over the wall to Operations, whose job it is to get it running in production, to support it, and to point fingers at Engineering (whose actual role, vis-à-vis Development, has been "to be ignored") when it inevitably immolates itself in a giant conflagration of suck.
Enabling this behavior is like giving a playground full of toddlers a handle of tequila, a barrel of sharp knives, a can of gasoline, a box of strike-anywhere matches, six ounces of crystal meth, and half a pound of plutonium.
You might have guessed that my background is not in software development.
That's just some idiot with a chainsaw, not a crack-addled gibbon.
Nope, that's Rob Ford, former (and perhaps future) mayor of Toronto. Getting warmer, though.
There we go.
It's not that developers really are crack-addled gibbons.
Rather, it is that enterprises provide them with perverse incentives that reward crack-addled gibbonoid performance.
Perverse Incentives is also the name of my Hank Greenberg / Jamie Dimon slashfic novel, available soon on Kindle.
No one cares about your infrastructure.
The only things people care about are your exposed endpoints.
That sounded dirty.
Continuous values (or even a large set of discrete values) are for chumps.
No one cares whether your host is 2% utilized or 78% utilized. [*]
All anyone cares about is the service behind the endpoint, and for that only:
- Which actually means, is it responding accurately to more than some threshold percentage of requests within a certain threshold of time? (There's a sketch of that check right after this list.)
- You may want to keep the measured values if you're trying to predict when it's going to go from OK to Not OK.
- But in that case, just warn the user that the threshold is approaching and your best guess as to when it's going to be crossed.
- Digging out the supporting data should be a rare event and it's OK not to cater to it.
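That binary check can be almost embarrassingly small. A sketch, with the endpoint URL, the path, and the two-second budget all invented:

    # Is the service behind the endpoint answering correctly, fast enough?
    if curl --silent --fail --max-time 2 https://api.example.com/healthz | grep -q '"status":"ok"'; then
        echo "OK"
    else
        echo "Not OK"
    fi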
If you're on your own hardware, and you're not virtualizing...wait, why are you not...oh, never mind. Let's go with, this should be true.
FINE you can have Green, Yellow, and Red, if you must: OK, Not Really OK, and Really Not OK.
Fair enough. So how about a story that goes like this?
- Embedded hardcoded paths relative to four different researchers' home directories...
- With some fancy /etc/group work and permissions to allow data exchange...
- And relies on some antediluvian version of some library, because it worked with libfoo 3.2 but broke with 4.0 and who has time to chase that down?
Well, nothing, as long as you and five people down the hall are actually the only ones using your code.
But then you need to collaborate with someone who doesn't sit twenty feet away and with whom you haven't been working for the last fifteen years.
Let's assume you can agree on a data format both sides of the collaboration will use.
There sure is a lot of data.
Ayup.
Way too much to ship back and forth.
Yup.
(There may be other political/priority/funding reasons you cannot exchange the data. But the size consideration would be sufficient.)
So you can't move the data.
You don't have much choice but to move the computation, do you?
How do you do that?
What you do is send your collaborator the analysis program, and then spend tens of hours over the next few weeks just making the software work.
- Someone destroys their own environment and can't work on their own stuff anymore, because Frobnoid 14.3, required by the program, is not compatible with Frobnoid++ 2008, which is what used to be on the machine before this ill-starred collaboration happened.
- Someone installs a virtual machine because they are (justifiably) scared of the above and then discovers the joy of trying to install archaeological software on a modern system.
- Novice: Dude, where's my CD?
- Intermediate: Dude, where's a working floppy drive?
- Expert: Dude, where's a nine-track tape drive and a bus-and-tag-to-Centronics-parallel adapter to plug into a parallel-to-USB adapter?
- Death Incarnate: Dude, has anyone seen a punch card reader around here?
This gets much worse once it seems to be working—it's running for a long time and emitting output—but their run of the test case doesn't agree with your run.
- eating broken glass
- setting yourself on fire
- gouging your eyes out with a spork
Well, obviously, the correct answer is, you rewrite all this old broken code into something modern and supportable.
And while you're at it, remove all the hardcoded stuff and make it configuration-service driven.
Oh, and parallelizable, preferably with some nifty annotation-based widget that automagically fires up instances on a public cloud and splits the data processing among them.
And document the data format and software version dependencies.
And a pony.
Public domain image, http://www.publicdomainpictures.net/view-image.php?image=52498&picture=pony&large=1
I meant that in the CS sense of a function that has an environment bundled with it, but thinking of it as a wound closure works too. Maybe better.
The great thing about containers is they let you get away with sloppiness.
I only mean this semi-ironically.
- No need to worry about doing it the right way: just throw the library binaries, framework templates, ancient, buggy, security-vulnerable versions of Java, whatever in there higgledy-piggledy!
First make sure it can run on Linux with a modern kernel. I'm looking at you, vital piece of equipment that talks over a Centronics parallel cable and for which the latest (and proprietary) device driver was written in 1990 for MS-DOS 3.3, by a company that hasn't existed since 1993.
- It doesn't matter HOW hideous or fragile the shell script [§] that gets it done turns out to be.
- You can pipe literal strings into it where you'd usually type things.
- Run expect.
- Use other hastily written shell scripts or text files as input. How gross it is really doesn't matter.
- Copy that automated process into your container as a COPY line in the Dockerfile.
- Have your Dockerfile run it as part of the build with RUN. (A sketch of such a Dockerfile follows below.)
Or, you know, a program to do all the packaging written in FORTRAN, if that's more your style. That's the point.
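Putting those COPY and RUN bullets together, a hypothetical Dockerfile might look something like this (every file and image name here is invented):

    FROM centos:7
    # The hideous-but-working automation, warts and all:
    COPY build-the-thing.sh /opt/build/build-the-thing.sh
    COPY expect-driven-answers.txt /opt/build/expect-driven-answers.txt
    # Run it once, at image build time; nobody ever types those answers again.
    RUN /opt/build/build-the-thing.sh < /opt/build/expect-driven-answers.txt
    CMD ["/opt/app/run-the-thing.sh"]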
Is that this whole containerization thing is just a way to hastily paper over shoddy software engineering and bad design decisions?
Yes. Yes I am.
No one cares about your software engineering either.
This test is not graded on a curve. It is straight up pass/fail.
Does the software work, or doesn't it?
(P.S. I'm done with those silly slide transitions. But, you know, I was using Hovercraft, so I had to try 'em.)
You may not get much control over this.
A generic application needs, probably, the following:
Often, however, the Orchestrator and the Service Locator are implicit or manually configured (either at installation, or every single time the service is used).
Some software that I might have some connection to through some organization I'm carefully not mentioning in this talk might have the following characteristics.
It exposes:
It has a bunch of components (most are "actuators"):
- Service location.
- Data store.
- Some kind of encrypted storage for sensitive data.
- Some method of ensuring secure and authorized communication between components.
- Orchestration.
Pick a primary language you're working in. It's not necessary that all your components be in the same language but it does make a lot of things easier.
Choose things with Open Source licenses. Maybe not for the obvious reason.
- Even if you Open Source your stuff, you're still just an add-on to their thing.
- Then you have to deal with either telling the customer how to configure the thing, or you have to sell a prebundled thing, and ugh.
I'm a big fan of Go right now. Go is a lovely language. It's like C, if C had been rebuilt after thirty-five years of observation of where C worked and where it didn't work, and with a 2007 view of what resources were cheap and what were expensive, not a 1970 view.
(which is pretty much what it is)
I now resent having to write stuff in Python. Think about that for a second.
Remember back in the old days, before Perl, when if you wanted to do something you couldn't easily do in a sh/awk/sed pipeline, you'd reach for the C compiler? Go makes that seem like a reasonable idea again.
I find that the gap between, "I have something that is syntactically valid," and "I have a correct program," is consistently way smaller in Go than in anything else I've used, and I've used a lot of languages.
Static linking is really nice in a containerized environment, since you don't end up with the dozens and dozens of supporting packages you would need for an application in, say, Python.
- This does require that container rebuilds and redeployments are actually trivial to perform, of course.
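A sketch of that static-linking point, with all the names invented: compile with cgo disabled and the binary carries everything it needs.

    # Statically linked Go binary: no libc, no site-packages, nothing to drag along.
    CGO_ENABLED=0 GOOS=linux go build -o myservice ./cmd/myservice
    # The container can then be little more than the binary itself
    # (e.g. a Dockerfile that starts FROM scratch and COPYs in myservice).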
(Yes, I realize the cool kids are on to Rust now; I haven't really wrapped my head around it but it feels like it puts a whole lot of the burden on making sure the compiler is really, really right. I'm not sure how good an idea this is.)
Your exposed endpoint should probably have a real TLS certificate, signed by a real certificate authority, so people's browsers don't get angry.
Your internal services do not need certificates signed by any place real.
You're probably going to want a proxy/load-balancing/distribution layer in front of your services anyway. I like HAProxy. Your mileage may vary.
DNS is pretty traditional for this. Not great if you have Java clients, since Java doesn't respect DNS TTL unless you jump through some hoops.
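The hoops in question, roughly (the TTL value and the jar name are just examples):

    # Either set networkaddress.cache.ttl in $JAVA_HOME/jre/lib/security/java.security,
    # or pass the legacy system property on the command line:
    java -Dsun.net.inetaddr.ttl=30 -jar florblefinger.jar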
You may want to farm service location out to your data store.
First, do you need a lightweight, small key-value store, or something that looks a lot more like a database, or both? Choose wisely.
If you use a key-value store, you will likely end up choosing between etcd, consul, and zookeeper. My experience with etcd has been pretty bad (it's not very robust against sudden poweroffs); my experience with consul has been good, though I hear it doesn't scale super-well; and I haven't used zk.
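The "lightweight key-value store" end of that spectrum really is just HTTP plus a path. A consul sketch, with the key name invented:

    # Store and fetch a value via consul's local agent:
    curl -s -X PUT -d '28443' http://127.0.0.1:8500/v1/kv/services/florblefinger/port
    curl -s http://127.0.0.1:8500/v1/kv/services/florblefinger/port?raw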
If you need a real database, do you need a relational database, a NoSQL database, or both? Choose wisely.
Check the licenses. And if you pick something that is commercial but with a freely available community edition, do your homework to see whether fixes ever get backported to the community edition. Couchbase, I'm looking at you.
You can write your own, but why?
Vault (https://vaultproject.io) seems pretty good.
Yes, fine, Keywhiz (https://square.github.io/keywhiz/). But then, Java. Ew.
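For flavor, storing and retrieving a secret with Vault's early-release generic backend is about this painless (path and value invented):

    vault write secret/florblefinger/db password='correct horse battery staple'
    vault read secret/florblefinger/db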
That's what TLS is for. Specifically, TLS with client authentication.
This is barely even its own thing. It's just OpenSSL with a thin wrapper around the CA stuff.
The motivation is pretty simple.
- Mutual-auth TLS seems like the right tool for that.
- If we burn the CA signing passphrase as soon as we're done with it, it's pretty secure.
Only the external-facing endpoint needs a "real" certificate.
It shouldn't be, should it?
The OpenSSL command-line interface is 💩 (U+1F4A9). Have you ever tried to:
This turns out to be a very tricky dance of the environment, the certificate signing options, and the configuration file.
How great is it that there's a Unicode Pile Of Poo character? As soon as Unicode 7.0 is widely adopted and everyone has U+1F595, I can consolidate my communications to exactly two characters.
Everything extant that I found fell into one of a few camps:
So I wrote uCA, after spending a long time figuring out the right set of incantations from a slurry of plaintive Stack Overflow questions, half-baked OpenSSL tutorials, forays into its so-called documentation, and sheer bloody-minded determination.
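For the record, this is roughly the incantation uCA wraps up for you: getting a SubjectAltName onto a certificate from the openssl command line means smuggling the extension in through an on-the-fly config fragment at both the request and the signing step. A sketch, not uCA's actual code, with every name invented:

    # CSR with a SAN, via a config fragment bolted onto the system openssl.cnf:
    openssl req -new -newkey rsa:2048 -nodes \
        -keyout service.key -out service.csr \
        -subj "/CN=service.internal.example" \
        -reqexts SAN \
        -config <(cat /etc/ssl/openssl.cnf; printf "\n[SAN]\nsubjectAltName=DNS:service.internal.example\n")
    # Sign it with the internal CA, re-supplying the SAN because it is NOT
    # copied over from the CSR by default:
    openssl x509 -req -in service.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
        -out service.crt -days 365 \
        -extfile <(printf "subjectAltName=DNS:service.internal.example")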
Apache License, Version 2.0
Vault actually can generate certificates with SubjectAltNames.
Your application is going to be, at some fundamental level, about doing particular operations in a particular order. Some of these operations are gated on other ones.
There are zillions of ways to do this.
Ultimately your choice of technologies here is very application-dependent.
Let's grant that we want containers because of all that stuff about namespaces I burbled earlier.
Bare LXC is kinda tough and fiddly to do.
Doing namespace/cgroup manipulation directly through system calls would be much worse.
Rocket doesn't need Docker, but it makes systemd your container-controlling process directly. Your call, I guess.
Docker pretty much works.
Docker can even be used like lightweight full-machine virtualization if you really want to do that, even if all the cool kids will sneer.
Any system is going to need to have some sort of credentials embedded in it; access control, securing communications, something. Even if you're using Vault, there's ultimately going to be some bootstrap stuff that ends up being plain old files in directories.
I recommend putting them in a data container.
Why not just put these inside whatever container needs them? Why the data container complexity?
It decouples credential persistence from app (and data) persistence.
Data Containers are the portable way to persist data across multiple installations of Docker.
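A sketch of the pattern, docker-1.x style, with the container and image names invented:

    # A container that exists only to own the /secrets volume; it never runs.
    docker create -v /secrets --name florble-creds busybox /bin/true
    # One-time load of the bootstrap credentials into that volume:
    docker run --rm --volumes-from florble-creds -v "$PWD":/src busybox \
        cp /src/ca.crt /src/client.pem /secrets/
    # Application containers mount the same volume at runtime:
    docker run -d --volumes-from florble-creds example/florblefinger:2.12.22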
A Data Container contains:
There are a few considerations when using them, which are not all obvious.
More to keep track of than an app that's just a process.
Talking to pretty much anything outside your process has to be treated as an off-box call.
- People will sneer at you for running a "fat container," but whatever. They're not the boss of you! (unless they are)
If you treat everything as living a TCP connection away, then you don't care where it's hosted, as long as you know how to get there. That's easy with linked Docker containers, but once you're running on more than one Docker host...
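On a single Docker host, "a TCP connection away" is nearly free. A sketch using the --link mechanism of that era, with all names invented:

    docker run -d --name datastore example/datastore
    docker run -d --name florblefinger --link datastore:db example/florblefinger:2.12.22
    # Inside 'florblefinger', the link shows up as an /etc/hosts entry for 'db'
    # plus DB_PORT_* environment variables, so the app just dials out.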
Networking gets weird fast with multiple Docker hosts:
- And then use iptables, a proxy, or Docker-running-with-privilege to hook those up to external well-known ports.
- Or, if you prefer, use DNS to return SRV records for your various services.
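If you go the SRV route, a client's lookup is a one-liner (service and domain names invented; the answer shown is purely hypothetical):

    dig +short SRV _florblefinger._tcp.services.example.com
    # hypothetical answer: priority weight port target
    # 10 60 28443 dockerhost2.example.com.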
Ummm, yeah. You have a lot of options and I don't think there's a clear general-purpose best bet yet.
This space is still pretty immature. Like workflow orchestration, this is one where I think the right answer depends very strongly on your specific application.