On September 28, 2021, Melissa Amaya and Jon Purnell from the Data & Dev podcast interviewed me and Aaron Blum, lead engineer of the security team at Cockroach Labs. The focus was to understand what it takes to start in a computer security career, either fresh out of school or via a career change.
Melissa: So we have with us today Raphael Poss and Aaron Blum, both from Cockroach Labs, which we will hear about shortly. I love the name and their logo is pretty cool too. We are very thankful for their time today. They’re going to talk to us about security, and databases, and how the two are connected, and the special considerations that databases need for security.
Jon: Thank you for joining us today.
Aaron, we can start with you. When we hear the term “security”, it can mean many things for people, it can be kind of a broad term. So what does security mean when talking about this kind of tech that you’re working on, with that basis?
Aaron: So I could run to the textbook definition, the whole CIA Triad, but in reality we’re looking to warehouse data and we’re looking to keep that data safe and available. Integrity, I think, goes with the territory. If you have a database and it doesn’t have good data integrity, you don’t really have a database. So security focuses more on the other two, for us. Specifically, availability and, you know, confidentiality. It’s really a very interesting space because, as things go more global and more web, you’re looking to secure more perimeters. It’s a very different problem from what it was, even a decade ago, because the database used to be inside a walled fortress, with its own datacenter and with its own network. And now it’s much, much more open. So there are more different things to try to secure than there used to be. So, it is a much broader topic.
Melissa: That makes sense. And how did you get into the field of security, what’s your background? And what do you do, currently?
Aaron: So I’ve always been a tinkerer. I like to poke systems and understand their rules and where they could be bent and broken, which led me towards more of the security things before there was a formal path of study. And so, once more formal routes of study showed up, I gravitated towards them, but I also found myself more often fixing or finding flaws within systems that need to be fixed for my customers and my clients. So I’ve been sort of close to that space for a long time before it was formally recognized.
Melissa: That’s interesting. When I hear the term security, it feels like such a big umbrella term. I kind of know what it means and I kind of don’t know what it means. If somebody were wanting to go that route, are there specific career trajectories? Talk to somebody who says: “Oh that sounds interesting. What do I need to know to pursue a career down that path?”
Aaron: Start with your passion. For me, there are a couple of things that were very interesting and I dug deep into them and understood them. And I don’t just mean from a security perspective. In order to understand how to break software, and secure software, you need to know how software is built. So don’t skimp on the basics. Don’t be afraid of learning to code. Maybe it’s not a glorious language. Maybe it’s not the most fashionable language of late. But find one that works for you, learn the fundamentals, and then you can go deeper, because you have that basic understanding.
Jon: That’s a great point. To put a few specifics on that, are there particular languages that you think are more prevalent than others in that field?
Aaron: I think that anything that is quick and scrappy, that can get you up and working is useful. Languages like Python, they’re lightweight, they have a lot of modules and libraries that you can just take off the shelf and there’s a lot of published information about them. But they also abstract you away from the way the machine does things. So a good crunchy language will help you if you want to go deep within a system. If you’re not interested in going deep into a system, it’s not as big a deal, but languages like Go will also teach you a lot about how a computer works as you learn the language. So it’s a trade-off, but if you’re interested in going deep, find a language that does that.
Melissa: “Crunchy language”. I really like that. I don’t think I’ve ever heard that before, but I’m going to use that.
So when it comes to kind of the day to day, either what you do specifically or what different security professionals might do, what does that look like?
Aaron: So, my day-to-day is a little bit frenetic right now. So I’m the lead and the lone security engineer at Cockroach [Labs] today (which we’re actually working on addressing). So I touch Product Security things, so that’s how are we going to make the product secure; I’ve worked with Raphael on improving our TLS experience within CockroachDB. I’ve worked with our Cloud Team on improving the Defense In Depth for that system. But I also field incidents off of the production Cloud system, in addition to our corporate environment. So if a laptop is infected, we have runbooks, but if there’s something weird about that infection or we need to contain the blast damage, I get pulled in and I get to help with that. So, I am first and foremost a generalist, which helps there, but that’s also not a sustainable position to be in, which is why we’re staffing aggressively.
Melissa: It sounds like you are the beeper guy all the time. So are you like, constantly on beeper duty?
Aaron: Right now, we have a temporary unsustainable rotation of two. We’re looking to build that out to about six people within the next three months, so we have a real rotation. This is not common and not optimal, and we’re aware of that. A more traditional security role would not be on call like this. This is special for us for right now.
Jon: I’ve got to say, with Azure, in our own ways, in Spectrum, we have those growing pains. We’ve kind of reached beyond our capacity and then sort of catch up a little bit afterwards. One thing I want to follow up on was, you mentioned that you kind of call yourself a generalist, but that’s not always the best fit. I wonder if you could elaborate, especially for people that might be looking to get into this. They might have that question about how specific or how general they should be?
Aaron: So I call myself a generalist but I have depth of knowledge in several distinct areas. So, there are some things that I know very well, but there are many, many things that I do not. So that breadth of knowledge is very useful, especially in situations like I am in today, but there are still some things that I know that I’m good at and I like, they’re aligned with the things I’m passionate about, and I stay sharp in them as best as I can.
If you just have a quarter-inch-deep, ten-mile-wide understanding, that’s useful for passing tests, that’s useful for getting certifications, but that’s not as practical or applicable in the real world.
Have something that you know you can do, and can do well. And then that generalist bent will allow you to be more effective in that vertical and also help in other spaces.
Jon: One thing that comes to mind is the idea of known unknowns. If I can paraphrase, you want to at least be aware of these different aspects of the career, of the job, but you have a depth of skill in particular areas. Which I could see, if you were on a team, then you can complement each other. You know where all the skill sets lie, what you each have, your particular niches that you excel at.
Melissa: For the place you are now in your career, again for somebody kind of coming up, how important do you think a formal degree is versus any kind of self-study that can get you passing the certifications and getting those licenses?
Aaron: So, I’m biased because I went the formal degree route, not in security, per se, but in crunchy computer science. So I have a theoretical comp sci degree and I loved it. But I was going to love that regardless of whether it landed me a job. So that’s my bias, up front.
Depending on what type of security work you’re interested in doing, it is as varied as the backgrounds that you can bring to it. If you want to go deep within the way the systems can be secured, that formal education is helpful, but not necessary. If you can find another way to develop a better understanding of the system—I have met many folks who don’t have anything beyond, like, some years of high school, who have that understanding of the way the systems interact—that’s all you needed to be able to do that.
And I think that Tech is increasingly welcoming to folks with interesting backgrounds. We do a blind resume process here at Cockroach [Labs]. So when I interview candidates, we evaluate them based on how they react to the prompt and how they solve the problem they’re presented with. And we have no knowledge of their background. I think that’s becoming more common, and I really like that. While I have that formal degree, I also had a fairly unconventional background in other places, and that probably would have counted against me in a traditional interview process.
Melissa: That’s helpful to hear.
Raphael, coming probably from a little bit of a different angle within your role at Cockroach [Labs]. What are your thoughts on traditional degrees vs self study?
Raphael: It’s funny you should ask. Not only do I have a degree; I even taught at the university and so I am also very biased in that way. But the more interesting contribution I have to this conversation is from all the people whom I’ve met who have started their career without a formal education; because I found that very important in my academic career to understand what makes someone successful, and all the ways where universities are not that useful and perhaps could be reformed as well.
The truth of the matter is that when I look at software engineering nowadays, it’s a very popular field with a lot of people who are looking at it as a way to elevate their status, especially economic status, in many areas. There are many formulas that have started to be optimized to bring someone to that point where they can have a job and have a good income. And what I noticed is that many of the trajectories that are now available to people who do not go through traditional education programs are optimized to deliver results that are available to companies who want to employ software engineers. And that is shaping people into a problem-solving mentality and the ability to communicate well, to understand problem statements that are vague, and shape them up into something that is actionable as a programming activity.
However, these programs and these outcomes are not optimizing for deep understanding of the kind that Aaron was referring to earlier. Even so, I am 100% behind Aaron’s opinion that many of the most successful security people in the tech industry do not have a formal education and probably would not have needed it. It is not an easy statement to make that anyone who is going to go to another trajectory to become a software engineer would make a good security person. And that difference lies in both the ability and the desire to switch away from problem-solving mentality into analysis mentality.
The way to get there is a combination of curiosity about how systems work, many hours of self-study, and an ability also to break systems. And not in the sense of making illegal things happen, but more in the sense of subverting the primary purpose of a machine, a computer, a software system, and making it do different things that it was not designed for. And the ability to understand that the tool actually doesn’t exist for the only purpose of doing the function it was designed for. And that there are many other functions that it can be used for. And that kind of transcending the software’s activity, or even hardware in some cases, is what I call “the tinkerer’s mind”.
And that is something that I found, from an educator perspective, is not easy to cultivate. It is much harder to cultivate than just training someone to become a good problem solver. So either you have it from nature or it needs a lot of additional self study.
Jon: On the point of self-study, are there particular resources that maybe you have found particularly helpful to finding content to stay sharp or to broaden your knowledge of security?
Raphael: There are two routes, I would say. I mean Aaron probably knows more about this. The two routes I know of, personally, are the white route and the black route. Sometimes you can mix them to get the gray route as well.
The white route is to go through online websites that are “Capture the Flag” quests, like sequences of exercises that are security oriented. Some of them are available on public websites that are run by hobbyists, and some are actually run by governments for preparing people to get hired for government services. These are very good activities to get trained. And there are also certain books that contain exercises, if someone is comfortable reading. And then another activity that I think is very practical and also very approachable: many cities beyond a certain size have workspaces, work labs, community spaces, where people can go and do programming activities or computer activities, together with other people. And many of these spaces have a population of people who are security-minded. And then it becomes a community of peers where information can be exchanged and suggestions can be made about exercises, or activities that promote this kind of expertise. That’s the white route.
And then the black route is breaking systems. Like, actively searching, both with personal equipment and online equipment, for ways to subvert systems to do things that they were not designed for. Especially in all the ways that are neither documented nor, in some cases, desirable. And then there are ethical ways to do these studies and there are less ethical ways to do these studies, but when it comes to educational outcomes, both are equally good.
Jon: I see on the white route, hopefully not straining the analogy too much, from my data science background, you have [?] that is a good source for challenges and ways to connect to a team to test out your skills and compete with others, kind of iron sharpening iron there. So it sounds like there are communities like that for going through exercises like that to build those skills.
Melissa: So to go out of order for a moment, from a traditional interview scope here. I think I skipped the “what is your background?” for both of you. So can you both give a brief “Who are you?” We have your name ready. What do you do at Cockroach [Labs] and whatever the highlights are from your background that are most interesting in your mind to share?
Aaron: So I’m Aaron Blum. I’m the lead security engineer at Cockroach Labs, which encompasses a lot of things today. The background that’s probably most interesting, I’ve spent four years working for the Department of Defense, that I can’t talk a lot about, but it was some very interesting problems at scale. Where at security conferences there’s always this looming “oh well, a nation-state and a Pringle can can get you completely compromised”, and looking at threats and trying to secure systems against genuinely well-funded nation-state actors is a very very different problem set from what most businesses see. It was very informative and really, really hard and a lot of fun. Other than that, I pranced around public and private sector and learned a lot in both.
Melissa: Has your professional working life, has it always been in the realm of security in one way or another?
Aaron: Depends on which resume I give you. <chuckle> So a lot of the basics, I picked up while doing jobs that did not have “security” in the title. I’ve worked to support teams that have built systems with data at scale and, as a result, I have a much better understanding of how those systems can fail naturally, never mind maliciously, and that’s helped me in securing those systems going forward.
Melissa: That’s neat. And how about you, Raphael?
Raphael: My background is in academic research and my specialties are computer architecture and operating systems. Before that, programming language design and compiler technology. And this kind of work has taught me that software is a construct made by people to answer certain questions, but usually any of those constructs has many more possible purposes than what it was designed for. Once you look at how the tools that are created by people to make other tools work, you can see that, most of the time, people make tools that have many other possible activities or usage than what they were intended for. And that’s a feature, in fact; that enables creativity in domains that were not envisioned initially.
But what’s really remarkable is that, in the environment where I was working as a researcher, I was, of course, surrounded by people who were exploring ways to use computers and software that were not planned for. And they were not doing that from a security perspective, at least not when I was working in that environment initially. But it came to me very clearly, quickly, that what many people consider to be exceptional behavior, like program crashes or there is an error and so on, is actually pretty common. And when a program stops with an error, that doesn’t mean that the Universe ends. Usually there is a remaining state after that error happens. And that state can be looked at, for troubleshooting purposes for example, to understand where the error is coming from. But in many cases in the running system, especially in a distributed continuous server environment, when there is an error that occurs and there is a remaining state on that system afterwards, that can still be used for… purposes … And depending on how the software’s constructed, the remaining state might actually not be working very well anymore and cause subsequent cascading failures, or, perhaps, “interesting” behavior.
Now, my interest in this was not security. It was just understanding the changing semantics of software in the face of what other people consider exceptional situations, which in my opinion are absolutely not exceptional. And as I was doing this kind of work, I was, of course, meeting people. And then, bit by bit, I started to meet people that I found very interesting that were working in the field of security. Including people explaining that all these failures can be exploited to get access to data that was not meant to be exposed, or to manipulate the software into doing things that users would preferably not want to see happen. That opened my eyes to the fact that these things I was looking at from a purely, I want to say, “contemplative” perspective actually had a purpose for certain people. And also had cost functions associated, what people know as risk analysis. It’s like, basically, what are the unforeseen costs that come from malfunction or misuse or use in a different direction than intended. And those costs are complicated to analyze. And so very quickly I started to understand there was a science behind this, or at least a couple of methodologies, and that probably there were people that were paid to do those things. And I discovered security in that way, incrementally.
But there is something that I do want to share though. Now I work with Cockroach Labs in multiple roles, a bit like Aaron but in a different domain than security: I coordinate certain teams, I do architectural work, I advise product management, but the essence of my work is really to educate people to the fact that the decisions they make or the designs they take for software are going to have unintended consequences when they look at the areas that are not their primary interest.
Some of the things I look at are security oriented, and then I talk to Aaron; some of them are usability oriented and then I talk to other people.
And it’s remarkable to me that many people in the industry, including people who are recognized by their peers to be very advanced in their understanding of systems and their productivity, and so on, typically have a blind spot for unintended behavior. And the fact that we always need to talk about this again and again, and again, is a reminder that security is just not optional.
That’s my contribution for that question.
Melissa: That’s great. The term “the happy path”, right? When you’re solving some kind of problem.
Raphael: Yes, correct.
Melissa: The default is, I’m just going to go down the happy path. Everything works nicely. There are no side effects that I didn’t expect. There are no errors. And maybe it’s like the innate optimism in humans that we don’t want to think about the unintended consequences and the possible negative ramifications.
I find it especially interesting that your research interests, which were purely, I think you said, “contemplative”, circled back around and had an industry usage. So many people go the PhD route (Jon did a PhD, I’ve pondered it for many years, Aaron I wouldn’t be surprised if you have one, I’m not sure) but it is often just research-based, these big lofty ideas. And then the question is, is it ever actually applicable? So really neat to hear how just your innate desires, kind of like Aaron was saying “pursue your interests”, you realized that it has a very direct application to the field of security.
Jon: One thought, to come back to a comment you had earlier Raphael. You were talking about the difference between problem solving and analysis, and I think you are kind of touching on that again there. If you could dive a little bit more into that, how you see the difference between those two skills?
Raphael: That is an amazing question. I can’t recall the number of times I’ve had that discussion with people around me. When I was teaching, that was one of the key questions that was challenging students. The way I look at it is that a problem solving approach, or solution oriented approach, is a methodology where steps are taken by someone until a satisfactory solution is reached. And then, at the moment a satisfactory solution has been reached, the work is complete. But at that moment, the only thing that matters is an alignment between the solution that’s usually built and the requirements that had been spelled out to start with. And then nobody really cares about what happens afterwards, as long as this thing continues to work as spelled out in requirements. And from an intellectual perspective, a solution-oriented approach is constructive. Like, you start from wherever you start and then you add bits, or remove bits, until… At every step of the way, you compare where you are with where you want to be, and then you add in the direction you want to be, or you subtract to get to the point you need to be. And then, at the moment you’re there, you’re done. That incremental nature of the work is useful because it allows someone to ignore a lot of unnecessary considerations and parameters. It makes it possible for someone to optimize their productivity, to have an asymptotic approach to their solution where they can even, possibly, if they’re good, estimate their time to completion.
Now, an analytic approach, what I call an “analytic approach”, is where the goal is not to reach a solution or to construct something towards certain requirements, but instead to ensure that the participants have a more complete understanding of what is going on, either in a solution or in a system or in a problem, sometimes. And the way you measure an understanding is not by seeing whether it matches requirements, because usually when you increase understanding, you don’t know what you’re going to learn yet.
So what happens when people study and do analysis, is that they are constructing a model, like a mental model of how things work or how things are going to possibly work. And then that mental model is going to become more detailed. And then we can evaluate it by comparing the predictions made by this model (like, you can run simulations, you can ask your model questions and see how the model, using its rules, would derive an answer from it) and then compare those estimations from the model’s perspective with the reality you have available.
Now, that is not an iterative approach. There is no asymptotic model to get there. But there is an optimization function which is, “is the model accurate for the questions we’re asking it?”
Now, in software engineering, this kind of analysis is usually constrained to only the phase where we are going to choose between multiple approaches; and then we use different models to see which model is going to give us a solution at the lowest cost, for example, or with a better performance. But when we do system analysis, we don’t look at constructive approaches; it’s not about comparing different approaches anymore. It’s about asking “what would the system do in this or that situation?” And in many cases we don’t actually want to try it out in practice because it might endanger the system or bring it to situations that are undesirable.
So, the real question is, okay “How can we imagine how the system works?” “How can we model it in a way that is going to give us strong predictive power on hypotheticals?” Like, “What if the component fails?” “What if a network request is made to that area using that protocol?” “What if this person is doing this and, at the same time, another person is doing that, and they access the same field in a database together, which one wins? And what is the resulting state?”
That system building in a mental image is an intellectually very different exercise than solution building.
And that’s what I call “analysis”, and it requires three different sets of skills.
One is the ability to build models and use them, like, the understanding that there is something that is a model that’s separate from the physical world, and you can also ask questions to that hypothetical thing. That’s one skill.
The second skill is the ability to choose the questions to ask the model, to not be led astray in an area that is not too fruitful. There is a lot of—that’s a problem, of course, with model building because, contrary to solution building, you don’t have exactly a result endpoint in mind, so there is some kind of path finding that is its own skill.
And, the third part is communication: to explain to other people what you’re finding in your understanding. Because all this work, of course, is useless if it doesn’t lead to new knowledge for a community of people. Like, if just one person knows, it doesn’t really help anyone. The company or the organization must know and the communication is to ensure that action is taken, if action needs to be taken. And that means a very good ability to communicate the shape of the model, its abilities, the predictions it’s making and so on to other people.
Jon: And so I imagine those are important skills to have. But in the day to day, you would also want to keep time for that analysis. Speaking from personal experience, I can see where the measurable, incremental, estimable constructive work can sometimes push out room and space for analysis, so I can imagine, particularly for security, that it’s important to keep time for both of those activities.
Aaron: Ticket work is antithetical to developing that model.
Melissa: Could you elaborate on that? Because the folks that hopefully will be listening to this may not even know what a ticket is, quite yet. So go down that rabbit trail for us.
Aaron: So closing tickets is the idea that usually you have some level of Ops work, or operations work. That’s going to be event-driven or interrupt-driven. That might be: “Hey, we’ve got an unexpected authorization event for your production system” or, you know, “Someone just got on a plane but left the laptop at the airport.” And you don’t control when these things are going to occur or how they’re going to reach you. And in security, you will often find yourself straddling the gap between dealing with that stream of Ops work tickets, and taking the actions that mitigate, or at least reduce the harm of, those events, while also trying to balance the model building, as Raphael eloquently explained.
It can sometimes be give or take; you can sometimes find yourself too far toward one and not the other. Maybe you’ve spent too long building a good understanding of your production system but then have to deal with a Bitcoin mining incident because you didn’t secure your AWS keys. So it’s a balance and finding that is challenging. If you figure out how, let me know.
Second part here.