“Listen, I don’t want to waste a lot of time here so I’ll try to keep it brief. I get you don’t have much experience supporting distributed mission-critical data stores at scale and honestly looking strictly at your resume I would never consider you for my data team, but…”
“Well, that’s a hell of a start there” – was my first thought as I listened to the data team lead’s distant, slightly breaking-up voice on my cell phone.
That was informal chat / interview round number 4 (or so) and the final step of a process that I had started a few weeks prior.
I knew there were multiple SRE teams there at HBO, but deep down I hoped to dive deeper into containers and Kubernetes (who wouldn’t, right?). Honestly, I was pretty much fine with any team, except for the one that supports databases.
Even though I dealt with a multitude of data stores at FastCompany, including MySQL, Mongo, Cassandra, Redis and even Riak, it was always kind of painful… I had to figure out how to take backups and restore them in case of disaster. On top of that, it was mostly semi-manual, error-prone procedures accompanied by the stress of dealing with production data.
Playing with apps, containers and regular servers was much easier and fun – you start them and you kill them, no state and no hard feelings involved (mostly).
“…, but at the same time I was looking for a person like you for the past three years. (…Oh, boy…)” – the team lead’s energetic voice brought me back to reality.
“See, when people think about databases, they think DBAs. I hate DBAs even though I worked as a DBA for more than 15 years (…Oh, this is going to be good…). Traditional DBAs know shit, they just operate in their own little realm. Man, I fu..ing hate it when someone calls our guys DBAs! Call me a DBA and I’ll punch you in the face (…Ha?!..). Seriously. We are database reliability engineers, people who solve complex data problems with reliable bullet-proof processes and automation.”
“This sounds amazing, but I was just wondering if maybe there are other openings within other teams that…” – I tried to squeeze in.
“There are no freaking openings!!! (…Ouch!…) Listen, Sergey, I know you probably think working with Kubernetes is cool, right? Sorry to ruin it for you, but it’s a job for kindergarten kids. Think about it, once you have it set up you’re pretty much done, the only thing you have to do is hook it to some sort of auto scheduler. Nobody cares about this shit! What we do is much more interesting, we deal with data, loads of data. We replicate it, identify anomalies, work with dev teams on instilling best practices and stuff like that. We are the cool kids, Sergey!”
“Yes, the data stuff definitely sounds interesting… it’s just… I was curious if, maybe, it’s possible to transfer between teams later on if…” – I tried again.
“No, no, no and no! There are no ifs and there are no buts! Man, you are killing me here. Listen, Sergey, as I told you before we need your help. We need someone with fresh perspective and automation background to get us to the next level. I’m willing to take a leap of faith here with you and teach you what we do over here at HBO and I promise in a few years you will become stronger than ever. Worst case you will add a solid DRE background to your resume. At the same time, I totally get it if you decide to pass. All I ask is for you to consider it for a day or so. I know you didn’t expect to hear any of that and I know you want your freaking Docker containers, but what we do here is much more important than that. I’m extending my hand to you, think about it and let me know.”
Wow, that was a lot to take in. Initially, I felt sad and disappointed and was about to pull the plug on the whole opportunity. It was clear that I wanted to avoid databases, but why?
Then it hit me – I was afraid because I lacked extensive knowledge in that particular area. My comfort zone was shaken and my status quo was under fire and I didn’t like any of that.
My first instinct was to run away from pain towards stability, predictability and comfort, but then, just like Frank Dux in Bloodsport, I had a series of flashbacks from different points of my life.
I remembered how years ago back in school I made a decision to join a soccer school where I didn’t know anyone. It’s always tough to join a new group of people, to establish yourself once again and find your place under the sun, but when you are a 12-year-old boy it’s another level of hard.
Teaching myself how to code, moving from my parents’ house to another country, learning a new language, choosing a startup as my first engineering job, switching to Linux for all day-to-day tasks, starting daily fast typing exercises, getting an Android phone even though the iPhone worked just fine, migrating to the cloud when it was still scary and unknown, buying my first house, becoming a father.
All of the decisions above made me feel uneasy and unsure, some more and some less, but all of them helped me to grow and conquer my fears.
Suddenly everything became crystal clear and I knew exactly what I had to do. That same day I called and accepted the offer to join the data team at HBO.
Lessons learned, baby
Prior to joining HBO I, like most engineers out there, firmly believed that I knew pretty much everything there is to know, at least on the conceptual level. It’s hard to be surprised after 12+ years of professional software development, right?
If you think the same way I did, it will be hard for you to fit in, absorb new ideas and notice hidden patterns that might become useful as you continue on your journey.
For a long time, I believed that big companies are much less organized. It wasn’t the case with HBO, not in my team anyway. The engineering teams here are pretty small and everything we do is visible, so if you are planning to hide out and do your own stuff, it may not work out (sadly).
I didn’t know that a team of five DRE engineers could maintain hundreds of databases spread across different technologies like Postgres, Oracle, Cassandra, Mongo and Dynamo, and by that I mean full production setups including backups, multi-region replication, log collection, point-in-time recovery and alerting/monitoring.
Sure, there are some communication issues here and there, as the company is pretty big and distributed, but Slack and email tie it all together pretty nicely.
Here I present some of the most important things that I learned over a short period of 1.5 years, which I would never have learned had I chosen to stay at my old comfortable place:
- Planning sessions / Corporate Scrum / Agile development at scale – no matter how much I hated it initially (and I still hate it from time to time), there is just no way around good planning. It totally makes sense to allocate significant time and map out the next month of work, instead of jumping straight into coding. This is especially important if you need to organize the effort of multiple teams.
- Small functional teams – at my previous company I did pretty much everything there is to do, ranging from database backups to application code modifications. Here we have teams organized around particular topics like: DRE (dealing with data), Telemetry (metrics and logs collection), Core (Kubernetes, deployments, build pipelines), CDN (yes, there is a CDN team, simply because we deal with a bunch of providers and do anomaly detection on top of viewership data), Microsites (develop and support a bunch of marketing and promotional sites), Infrastructure (Amazon AWS, networking, IAM), Security and of course Dev Teams.
- Microservices architecture – there are hundreds of different microservices here at HBO and the number keeps growing. There are fully dockerized boilerplates for different languages that come with standard tooling and tracing, allowing developers to get from nothing to a deployed service in a matter of minutes.
- Heavy use of Jenkins pipelines and automation – pretty much everything here (apart from some of the database stuff) is heavily automated. There are tons of shared Groovy libraries and sophisticated pipelines that make my mind spin every time I see them. Most builds go through multiple stages, environments and accounts. Lots of CI/CD pipelines are triggered by the merge to master after PR approval.
- Tight security and Single Sign-On layer – again, this is one of those love/hate items, but if someone asked me whether I would employ the same tools at my own company of that magnitude, I would answer yes, of course. You basically can’t do anything without your work laptop and cell phone (sorry, that means you can’t outsource your work…)
- Grafana dashboards and Prometheus scrapers at scale – pull over push for metric collection (we use both for now, but are leaning towards pull only), thousands of service boards, a lot of which get created and destroyed automatically. Company-level availability dashboards are built on top of service dashboards to give us a quick view of the mission-critical SLAs and service / project uptimes.
- Zero downtime multi-region database migrations and cluster upgrades – using tools like AWS DMS for live data migration and shadow cluster creation, plus a host of other custom scripts and automated jobs.
- Rebuild vs in-place upgrades policy – not only for applications but for databases as well. No need to run chef-client on the machines and maintain state any longer. The downside here is, of course, even more automation (remember, these are teams of 5-7 people)…
- On call rotations and incident review sessions – at my previous company I was pretty much always plugged in to receive critical prod alerts, because, well, I was the one responsible for the infrastructure. Here we have a real handoff process around it, with escalation policies and lots of automated phone calls (sometimes in the middle of the night). I still remember the panic of going on a weekly shift as the DRE primary on-call for the first time… Guess what, over time it becomes the norm.
- Money money money – almost forgot about that one, ha! Here is a simple truth: after switching jobs my bank account started to grow again, and thanks to a generous 401K match I was able to accumulate the same amount I had there before, but 9 times faster…
I’m sure I could easily list a few more items here, like exposure to golang or dealing with Game of Thrones traffic, but the article is getting too long and I don’t want to sound like a broken record, so let’s just jump to the MOST IMPORTANT item on my list:
- People – it’s a special feeling to be surrounded by super smart and talented engineers. It kind of makes you want to push yourself a bit harder in order to fit in. The only tricky part here is that you have to step out of your shell and get to know people, but once you do, it opens up a whole new world of opportunities. Think of it as a StackOverflow on steroids where you can get personalized real-time answers to pretty much any tech question!
Booom… That was quite a lot of stuff! If someone is still reading these lines, I want to personally congratulate you for sticking around. We are almost there.
Of course, not everything here is pink and shiny. There are quite a few downsides as well, like slow and methodical release cycles (possibly related to my db department in particular), some internal politics, and not that many unsolved problems (it’s mostly optimizing the last 5%).
Which brings us back to the most important questions:
Was it worth it? – absolutely yes*.
Do I have any regrets? – absolutely not.
Did it help me shake my boredom? – yes, but only to a certain extent*.
Did I shake hands with celebrities or star in an HBO show? – sadly, that would be a no to both questions.
* and what next?
It’s kind of weird to write a What Next section while still being employed, but it is what it is. Sure, the job switch helped to alleviate my mid-life crisis pain, but it did not eliminate it completely. In the process, I realized that you can’t simply run away from yourself. You might silence the internal voice temporarily but it always comes back.
The only right way to deal with the situation is to take time and dig deep to uncover what it is that really bothers you. And so I did.
Over the last few years of self-search and self-discovery, it became clear that my strengths lie at the intersection of the tech and entrepreneurial worlds, where regular work is more of a distraction / deviation from my path.
Don’t get me wrong, I’ll gladly pocket all of the experience and knowledge that I gathered at this stage of my life, as I’m sure it will come in handy as I continue my pursuit of happiness.
It is true that everyone is different, but here is the universal law – in order to figure out what you want or don’t want, you need to try different things.
Yes, it means you could fail.
Yes, it means your comfort zone may need to be stretched a bit.
Yes, you may go down into the trenches in order to rebuild the foundation.
But I promise you one thing – it will be a hell of a ride and you will come out stronger in the end! That’s all that matters, really.
If you feel like you stopped growing (and you know exactly when that happens), it’s time for you to try something new. Don’t wait years like I did.
It won’t go away on its own.
Get clear on a new target, take massive action and repeat until you hit it.