This past week was a crazy week. I came back from being out of the office with COVID. Felt OK, just couldn’t go back in until either a negative test or enough days had passed. Monday was my first day back.
We’ve been working on a major feature for the uber-project. Something that hits milestones that get reported to bigwigs. Tuesday was supposed to be the first demonstration of it – didn’t have to work fully, but the point was to get the various teams to integrate their stuff. The tech lead had been saying we were looking strong. I walk in on Monday and now the tech lead is out for COVID reasons, won’t be in all week, and has named me as his backup. OK, the demo’s looking strong, right?
It’s been a “big demo Tuesday, tech lead says everything looks like it’ll be ready, come in on Monday from being ill to find everything broken and said tech lead is now out all week with me named as his backfill” kind of week. And it’s only Tue.
— Tina Coleman (@colemanserious) May 17, 2022
In the full thread, I rant about what I walked into. Everything was broken. Meaning, some things weren’t even lined up to work: weren’t being built in the right environment, weren’t configured for deployment to that environment, etc. Worse, the things we did have built and deployed were suffering from two hairy problems not seen in our dev environment.
First, DevOps had rotated the keystores and even though we had the right location and password, our code was complaining that it couldn’t decrypt the key. Turns out they’d added an extra layer of passwords that our code wasn’t set up to handle. Scrambled to swap to an alternate form of keys that didn’t have as many layers, which meant I had to redeploy our base infrastructure. I hadn’t deployed it the first time – the tech lead had – so I was wading through deployment scripts and properties files trying to set things up correctly.
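For anyone who hasn’t hit that particular wrinkle: a Java keystore has a store password, and each private-key entry inside it can carry its own, separate key password – the “extra layer” our code wasn’t supplying. We sidestepped it by switching key formats, but for illustration, here’s roughly what diagnosing and aligning the two passwords looks like with keytool. The path, alias, and password variables here are all hypothetical:

```bash
# Hypothetical keystore path and alias -- stand-ins, not our real setup.
KEYSTORE=/opt/app/conf/service.jks
ALIAS=service-key

# Confirm the entry exists; listing only needs the store password.
keytool -list -keystore "$KEYSTORE" -storepass "$STORE_PASS" -alias "$ALIAS"

# If the key entry carries its own password, reset it to match the store
# password so code that supplies only one password can still use the key.
keytool -keypasswd -keystore "$KEYSTORE" -storepass "$STORE_PASS" \
        -alias "$ALIAS" -keypass "$OLD_KEY_PASS" -new "$STORE_PASS"
```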
OK, averted that problem. Had it in hand well before the demo on Tuesday. The one I didn’t have as well in hand: the infrastructure relies on Docker containers running in Kubernetes. We don’t launch the containers – the infrastructure does. In our dev environment, everything worked well. In our demo environment – crash and burn. The container failed to start and complained about a permissions error. The tech lead had mentioned the problem the previous week and said DevOps had fixed it. What he hadn’t realized was that they’d fixed it for a particular running container, not for the infrastructure overall. Whenever a new container got launched (because we deployed a new thing or changed the settings of a thing), we’d experience the same problem. I ended up applying DevOps’ same workaround for each container that was ready for the demo, so we’d at least have something to show.
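I never got a full root cause out of DevOps that day, but the shape of the problem is a common one: the container process runs as a user that can’t read or write something it needs, and the durable fix lives in the pod spec, not in any one running container. As a hypothetical sketch of the kind of per-deployment patch we were applying by hand (deployment name, namespace, and IDs all invented):

```bash
# Hypothetical deployment and namespace; the uid/gid values are placeholders.
# Adds a pod-level securityContext so new containers start with an identity
# that owns the mounted volumes, instead of fixing one live container.
kubectl -n demo patch deployment demo-service --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/securityContext",
   "value": {"runAsUser": 1000, "runAsGroup": 1000, "fsGroup": 1000}}
]'
```

That distinction is exactly what bit us: fix the spec and every future container inherits it; fix one running container and the next rollout resurrects the problem.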
Demo didn’t fail, but only because it wasn’t held. A note went out that meeting was being rescheduled at 10:59. I was already in the conference room for the 11:00 demo, ready for either the firing squad or to bob and weave.
— Tina Coleman (@colemanserious) May 17, 2022
You can read the rest of the thread for the remainder of my Tuesday ranting. I was steamed. But today’s Friday, and by Friday, I have conquered. I found a better workaround for the not-running-containers bit, one that doesn’t require us to hand-edit k8s YAML descriptors. I got all of the things we had working in dev built and deployed in the demo environment, tuned settings to hit the correct endpoints, made sure everything was running well, coordinated with other teams on which Kafka topics to use, and wrote bash scripts that make REST API calls to set up test data and trigger the calls that show our stuff in action (a sketch of those below). Oh, and did all that while coordinating with other members of the team on pressing support concerns, as well as writing new code. (That new code isn’t done yet, but…)
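Those bash scripts are nothing exotic – essentially curl in a loop against our REST endpoints. A stripped-down sketch, with the host, routes, and payload fields all invented for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical base URL and routes -- stand-ins for the demo environment.
set -euo pipefail
BASE_URL="${BASE_URL:-https://demo.example.com/api}"

# Seed a handful of test records for the demo.
for id in 001 002 003; do
  curl -sf -X POST "$BASE_URL/widgets" \
       -H 'Content-Type: application/json' \
       -d "{\"id\": \"widget-$id\", \"status\": \"ready\"}"
  echo "seeded widget-$id"
done

# Kick off the processing call the demo walks through.
curl -sf -X POST "$BASE_URL/widgets/widget-001/process"
```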
Successful week! The tech lead is coming back Monday, just in time for the rescheduled demo. Given that next week’s my last week on that particular project, it’s a great way to go out – saving their bacon in style!