On September 28, 2021, Melissa Amaya and Jon Purnell from the Data & Dev podcast interviewed me and Aaron Blum, lead engineer of the security team at Cockroach Labs. The focus was to understand what it takes to start in a computer security career, either fresh out of school or via a career change.
Melissa: What you just said brings to mind something I’ve learned: the answer to pretty much any question in tech is, “it depends.” That’s like the standard story to any question. And I would imagine, just like any other area of tech, security is all about the trade-offs. Given infinite time, given infinite resources, we can do anything, but that’s not the reality. So what are some of the main trade-offs, on a day-to-day, week-to-week, month-to-month basis, that you are working with?
Aaron: Always understand the business value of what you’re doing. At the end of the day, security is a business decision. You may believe it’s something else. You may believe it’s part of a holy crusade; it doesn’t matter. At the end of the day, you are protecting either business systems, business data, or, you know, the customers themselves. You need to make sure that any effort and energy that you spend to do that is aligned with the risk that you’re helping to offset. We could implement, you know, 16 different ways of authenticating to our system. But that may bring the cost of interacting with our system up such that the customers don’t want to do that. I don’t have the numbers today, but I know in retail, there’s a number they call shrinkage, which is just general loss. It’s a low single-digit percentage, but it’s an accepted single-digit percentage, because they know that getting it below that would cost more than the loss of those assets.
Melissa: That makes sense. That also makes me think. So, my school, where I’m getting a program, just recently went to two-factor authentication for our Gmail. So I can’t just log into my Gmail anymore. I have to get out my phone, pull up an app, find the code, and type it in, and it’s a minor annoyance, but it’s an annoyance. And so, in an effort to better secure my email, and ultimately the school, they’re putting a burden on the user. So how do you weigh, maybe even calculate, that trade-off? It’s going to be a burden on our customers, but it is necessary because it’s the right decision?
Aaron: So some of these things are measurable. It may take time and energy to do that. And to weigh the cost of, say, loss of confidentiality of a user’s email vs the perceived annoyance of enacting these controls. In general, if your usage of MFA is impacting your everyday usage, then the system may not have been implemented in a way that is optimal.
I’ll jump from MFA for a moment to something related. One of the first questions I got, when I joined Cockroach Labs, was…
You know, we launched Cockroach Cloud, we really want customers to use it, but one of the primary stumbling blocks is, we require them to set up an IP allow list. If you’ve ever worked at Amazon or GCP, they often want you to set that up: “you’re going to put resources on the internet, we recommend that you don’t allow the entire internet to talk to it.” And that’s been pretty much accepted as status quo.
But the first question I got from Product was, “What’s it going to take for us to remove that allow list requirement?”
And so [I was like] “why do we want to do that? You have a database. A database intrinsically holds like, sensitive information. It’s a huge attack surface. Why do you want to put that on the internet?”
They said, well, “our customers are struggling to connect to it.”
I said, “really? They can’t do this? Can’t we just help them?”
And I found that I didn’t understand the problem space that the customers were coming from. They were trying to use systems like AWS Lambda and Lambda comes from different IPs, all the time, and so they can’t say, “oh, this IP is where my function is coming from”. And it forced us to think about that problem from a usability standpoint. Say, okay, well, we do need to enable customers to interact with our system. What can we do to ensure that we’re protecting against the right threat. And so the right threat wasn’t “internet at large will connect to your system.” The threat was “some concerted set of attackers are going to try to either brute force access” or “they’re going to probe the system”, they’re going to do other things.
And if you take a step back and understand that threat, then you can sometimes right-size your solution. It’s hard and it’s expensive to come up with a solution. And sometimes the right answer is to pass the pain to your user base because you don’t have the resources to solve it a different way for now. But in an optimal world, and speaking sort of aspirationally, your security controls should align with your users’ use and expectations, because there are many technical controls that will enable that, that don’t require pain for the users. I aspire for security, in general, to be invisible to the users, or as light as possible and easily adapted to their workflows. It shouldn’t be a yes/and because users are innovative creatures and they will find ways to optimize the things they don’t like and these optimizations will often be circumvention of your security controls.
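To make the allow-list problem described above concrete, here is a minimal sketch of what an IP allow list check amounts to. It is an illustration only, not how Cockroach Cloud implements it; the function name and the example addresses (taken from the reserved documentation ranges) are assumptions.

```python
import ipaddress


def is_allowed(client_ip: str, allow_list: list[str]) -> bool:
    """Return True if client_ip falls inside any CIDR block in allow_list."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(block) for block in allow_list)


# Hypothetical allow list: one office network and one fixed application server.
ALLOW_LIST = ["203.0.113.0/24", "198.51.100.7/32"]

# A client with a stable address can be listed once and keeps working.
print(is_allowed("203.0.113.42", ALLOW_LIST))  # True

# A serverless function may connect from an address nobody could list in advance.
print(is_allowed("192.0.2.15", ALLOW_LIST))    # False
```

The second call is the situation the Lambda users were in: the check itself is simple, but the customer has no stable address to put in the list.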
Melissa: That’s really enlightening to hear that side of those decisions. And in your answer, it’s clear you’re not in this isolated box, hacking away on a machine, trying to find vulnerabilities and fix them. You actually have to listen to the customer and understand the customer side of things.
Melissa: Is there any realm for the complete stereotypical introvert, where they are just hacking away? Or is there always a need to take customer feedback and make sure you’re implementing real-world scenarios in your solutions?
Aaron: There is room for both. One of the things that I’ve been trying to do over the years is being that [?] that can lay eyes on who can speak the different languages in the different domains. However, I’ve had a lot of fun building teams that had individuals as you described, that were deep in their knowledge there. “Give me a nice well-defined problem and I will crunch this.” I remember there was a fairly senior individual in one of my teams that had a background in micro-electronics and I needed an optimization for a system and I said, “here is what [?] is going to look like, here is what I need, I need it to run in approximately this amount of time.” He looks back and says, “you know my background is micro-electronics” and I was like “yes, it’s a fixed memory, fixed throughput, fixed time problem.” He’s like: “Awesome! I’ll have you something in two weeks”. There’s absolutely room; it’s just finding where you’re going to be comfortable and then finding a team that will allow you to do your best work.
Melissa: That’s neat to see that there are different avenues for different personalities and strengths.
Melissa: Raphael, databases historically: there’s this giant room, it’s all physical, you have these servers and you’re probably in the same building or the next building over, it’s all wired. As Cloud computing has exploded, and Cloud database usage has exploded (that’s what CockroachDB is), how has database security changed? What are the new risks you’re looking at and trying to mitigate?
Raphael: Thank you for asking. There are multiple angles, I think, we can take to look at this. So from an end user perspective—understanding that the end-user here is not the user of the apps using the database, but the programmer that is creating an app that uses the database—there is something interesting happening with the Cloud, and that is that the traffic is going over the internet, and that’s new. In the past, apps were physically located next to their database and there was a very reasonable assumption that the network was secure because it was physically isolated from everything else. And good practices would suggest you have a firewall that isolates the application and database network from everything else. Now, with a Cloud solution where the application and the database lie in different machines, there will be traffic in between that is possibly routed through the internet. And that means there are actors in the middle that might be interested to look into those transactions and perhaps influence them. That’s new.
That is something that the application developers unfortunately have to know about. There are two reasons why they want to care about this. One is because they care, as providers of the service, that their end users are not going to experience problems. And so they want to safeguard their database activity against at least errors and, perhaps, malicious usage. And also probably they have confidentiality expectations from their end users—they want also to keep their data secret. And so they care about this.
And then the other thing they care about is predictability. Even if they didn’t care about confidentiality, security and other things, it’s very onerous in the design of a system to have a database that disappears from under your application, or maybe where some random activity (by accident or not) is deleting data or tables, or other things you would like to continue to exist.
And so this expectation of stability and reliability is actually a security concern because, of course, some of those reliability issues, when they occur, can translate into security incidents. That’s the challenge; that’s a new dimension that is going to be part of application design for the end users—for the developer end-users—that they didn’t have before. So it adds actually a layer of complexity in the creation of client apps. And that complexity is in fact a cost to using a Cloud database that you don’t have if you were to use a local database. So hopefully, that cost is going to be offset by other benefits of using Cloud systems, including, possibly, higher reliability, the ability to use the services of the provider for backups and other things, but it’s still new. And that needs to be explained, that needs to be taught, that needs to be considered. And for us as a company, Cockroach Labs, we do have now an interest to lower the cost for the end-users, by creating client-side frameworks, or SDKs, or other tools that will simplify and smoothen the experience of creating secure apps without having to understand all those things.
So that means more complex development costs and also perhaps an incentive for the database provider to create technology that will ease that cost for the end users, so that the technology becomes more attractive.
That’s like the business, the “user” side of the story.
Now, the other side of the story that interests me a lot is the impact on the people who build the database service, like the members of the Cockroach Labs team in my case. Because, now, we have a new situation, as database programmers, and that is that we need to build all these security controls where a regular old-school database would probably not have cared about that. And that is new work, that is a new dimension of the software engineering practices. Also, Cloud services fail in many different ways due to reliability (and again, I said these reliability concerns are actually security concerns), and every time we talk about reliability, we need to take a security lens to look at it and see what are all the ways security problems can occur as a result of a fault or a malfunction. And these are new exercises for the technologists inside our organization.
And next to this, we have the evaluation of the work. Like, is our product a good product? Just looking at whether the queries are satisfied and whether they are performing quickly is no longer sufficient to decide whether our service is adequate. We actually do need to run those security analyses: is the service we’re providing over the internet a secure service? And for a database programmer, that is very far outside of the regular skill set. That is something we need to teach, that’s something we need new quality controls for. So, the tech leads and the managers typically don’t think about this when they get hired, so we need to also train the organization to do those exercises, and they come really as an additional expense that an old-school database would not have to consider otherwise.
Aaron: I would like to jump in and add something to that, if I may. There is an additional angle, which is also support. Classic database support was, you gain access to a customer’s environment under their supervision, and you help them troubleshoot things and then they close out the access and you go home. But, as we moved to a cloud-hosted database product, our support engineers need the minimum friction to be able to get in and get access to the customer environment and troubleshoot things. So the default is, “yeah, give me access to everything, that way, if I need it I can have it”. But there is no reasonable customer that is going to say, “yeah your support engineer can have access to my database and everyone else’s all the time.” So it’s an additional thing that was not a prior engineering constraint that we must evaluate for the system as we build it.
Melissa: That makes sense.
Melissa: And I’ve got one question for you and then it’s kind of open ended. For somebody who’s like, “yeah, I want to pursue the security route, I’ve either gone through a program or I’m hacking my way through and I’ve got some reasonable skill set. Where is there an entryway?” It feels like—very ignorant feel—you have to already be pretty established in your career to land something security-wise. So where can the Newbie go to get their foot in the door on a security trajectory?
Aaron: So if you’re not already working in the security domain, or if you’re not already working, I recommend any technical type of work; it could be website development, it could be scripting, to just continue to build an understanding of the systems. Outside of that, look for anything that’s even security-adjacent. There is a tremendous amount of self-study available, tools and playgrounds to work with, that didn’t exist when I was learning this stuff. You know, go look for things like exploit exercises. They’ll give you intentionally weak VMs. Maybe you can’t solve them right off because you don’t know the Linux permissions model or how to abuse it. Go find a walk-through! There is no shame in finding a walk-through and then reading the walk-through and understanding why you could exploit a VM in this way, and it will further your understanding.
Melissa: Nice. I’ve got my to-do list for this weekend.
Melissa: Any last thoughts you want to share before you take off for the day?
Aaron: Thank you for this. It’s been interesting to me to be put on the spot for these things, because it also reminded me that, as I sit in my security world and my security perspective, some things that I reach to are this because okay, you can become too biased in your own domain, in your own fiefdom. And sometimes it’s good to have a reminder of someone that doesn’t think about security as a first principle. How would you explain that? This has been a really good refresher on that. Regardless of what you choose to do, step out of your comfort zone as often as you can afford to.
Melissa: It’s great. Thank you so much Aaron.
Aaron: Thank you so much.
Melissa: Great. So, going back to what you were just sharing, the comment you made that software engineers are not coming with this security background because it’s kind of a new field: do you see that as an area that should be added to traditional educational programs, or should that be more on the on-the-job learning side? What are your thoughts on bridging those gaps?
Raphael: I like the phrasing of this question. I have thought about it and the answer is, I wouldn’t add security as a field in education programs or traditional teaching activities. The world of security is so broad that just talking about it this way would not bring you actionable coursework or specific talking points during a lecture and so on. The focus that, I think, would need to be talked about more during either bootcamp activities, or self-study activities, or education programs, is this analysis standpoint I was talking about earlier.
And so the way this can come in self study or coursework and so on is asking questions of the shape: “What if?” That is really the essence of the situation here.
When we give course work or when we find exercises, usually there is like okay, “create a program that does something”. And so there is some spelling out of what is the expected output and then there is perhaps, sometimes, a discussion: “okay, you have an alternative choice to make between multiple solutions, what are you going to choose? What are your arguments?” All this should continue to exist; they are valuable.
One thing that we don’t have much today, and that would prime people to a security mindset, is the question “What if?” And that is the question: “Okay, outside of the bounds of the programming exercise, after you have constructed your program, what if you were subjecting your program to that kind of input? Or, if your program is reading from a file, what if the file is corrupted? How would your program react?”
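As an illustration of the “what if the file is corrupted?” question, here is a small sketch of an exercise in that spirit. The file format (`name,score` lines) and the function name are assumptions made up for this example; the point is only that malformed input is reported rather than crashing the program.

```python
def parse_scores(lines):
    """Parse 'name,score' lines; collect malformed lines instead of crashing."""
    scores, rejected = {}, []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # What if the file contains blank lines? Skip them.
        name, separator, value = line.partition(",")
        try:
            if not separator or not name.strip():
                raise ValueError("missing name or separator")
            scores[name.strip()] = int(value)  # int() raises ValueError on bad numbers
        except ValueError as error:
            rejected.append((line_number, raw, str(error)))
    return scores, rejected


# Hypothetical file content with two corrupted lines (2 and 4).
sample = ["alice,10", "bob,ten", "", "no-separator", "carol, 7"]
scores, rejected = parse_scores(sample)
print(scores)                                   # {'alice': 10, 'carol': 7}
print([number for number, _, _ in rejected])    # [2, 4]
```

Asking a learner to explain, in prose, what happens for each line of `sample` is one way to practice the “what if” questioning described here.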
Prime people to explain, usually in prose, what their understanding is of what they’re doing, to someone who is not there yet. Someone at their level, so that they can assume the language is the same (like the prose, the English language, if that is their language, is the same) but the other person doesn’t know the technology yet. And make that person exercise, in the phrasing of those explanations, those “what if” explanations.
There is an adjacent activity, that’s connected to this, which I think is also possible to integrate in programming exercises. And that is to start doing software engineering for error conditions, for all the things that are not on what you were calling “the happy path” earlier.
Many of the programming techniques that are currently taught for beginners, or perhaps in the coursework for exercises online, or challenges, programming challenges, always try to promote the idea that errors are exceptional and suggest that the programmer take a shortcut: if an exceptional error happens, then just terminate the program. Or, print an error message and exit with a certain exit code. And the common understanding of an exceptional situation is that the entire program will stop, and that is the end of the story.
We can change this by evolving the programming activities to say, well, if an exceptional condition happens, maybe the system needs to continue running and then we need to have a guarantee—that we can then exercise in testing—that the state of the system is maintained to a consistent situation. So that further requests are not going to encounter unforeseen situations. That is a new kind of phrasing for the programming specifications. Like, don’t stop the program when something happens, but instead handle it, and make something meaningful happen. And then, if the program is to terminate, then there is something that’s possible as well, and that is to ask the programmers to create unit tests that exercise all of the possible error paths in their program.
Like, can we take a program that doesn’t have error handling today, add a unit test that demonstrates that the lack of error handling is going to create undesired behavior, and then modify the program to add the error handling where it was missing. That is a very valid, legitimate regular programming activity for someone who is in the constructive problem-solving mindset. And by having more of those programming activities, we are going to prime people to think about those cases in their regular day-to-day programming. And that additional thinking is going to be sufficient to put people on a better mindset to understand security problems.
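The kind of exercise described above could look like the following sketch: a toy component whose error handling keeps its state consistent, plus unit tests that exercise the error paths. The `Inventory` class and its rules are hypothetical and exist only for this illustration.

```python
import unittest


class Inventory:
    """Toy in-memory state of a long-running service (hypothetical example)."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def ship_order(self, order):
        """Apply every item of an order, or none of them."""
        updated = dict(self.stock)  # work on a copy so a failure changes nothing
        for item, quantity in order.items():
            if quantity <= 0:
                raise ValueError(f"invalid quantity for {item!r}")
            if updated.get(item, 0) < quantity:
                raise ValueError(f"not enough stock for {item!r}")
            updated[item] -= quantity
        self.stock = updated  # commit only after every check has passed


class ShipOrderErrorPaths(unittest.TestCase):
    def setUp(self):
        self.inventory = Inventory({"widget": 5, "gadget": 1})

    def test_failed_order_leaves_state_untouched(self):
        # The first item is valid, the second is not: nothing may be applied.
        with self.assertRaises(ValueError):
            self.inventory.ship_order({"widget": 2, "gadget": 3})
        self.assertEqual(self.inventory.stock, {"widget": 5, "gadget": 1})

    def test_service_keeps_working_after_an_error(self):
        with self.assertRaises(ValueError):
            self.inventory.ship_order({"unknown": 1})
        self.inventory.ship_order({"widget": 2})  # a later request still succeeds
        self.assertEqual(self.inventory.stock["widget"], 3)
```

Saved in a file, the tests can be run with `python -m unittest`. Removing the copy in `ship_order` and mutating `self.stock` directly makes the first test fail, which is the "demonstrate the missing error handling, then fix it" loop described above.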
Jon: I think this is a great point about the ubiquity for the need for thinking about security.
Jon: One thing I’m not sure we touched on, with Cockroach [Labs] (I love your focus on databases): I think that technology is, without repeating my terms, a fundamental technology for just how the internet works, right? Because a lot of what we do online is some sort of transaction. Whether it’s exchanging ideas, or exchanging resources, or whatnot. So databases are needed for that. So, to kind of give a sense of how many different applications there are for databases and all the different areas, would you be able to enumerate a few to illustrate that breadth?
Raphael: I think I can answer your question in two ways. I don’t know which one you would prefer, but let me just try them out nonetheless.
One is to enumerate technologies or approaches to technology and one is to enumerate mental models to understand how to think about data and data storage.
Let me start with the technologies. I think today, in the database field, we can do a reasonably good job of explaining what’s going on. And that is the separation between, on one side, technologies that are there to support transactions and sources of truth for businesses, or record systems where people care about which data is stored about them, or for them, in the system.
And then, separately, data systems that exist as a support for other systems. And you can think here about caching technology where there is a copy held locally for something that’s complex to compute, so that the next request will be accelerated. Or, maybe, a database that contains a copy of the records in a store, like all the previous acquisitions or shipments or whatever, for the purpose of doing a data analysis. So, those “secondary stores” (I call them) do not have the same constraints on their design because it’s much less of a big deal if some data is lost. (It is a big deal if the data is compromised, like if there are illegitimate accesses to it, usually.) But at least, the loss of data or inadvertent modification is less of a risk for the business.
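A tiny sketch can illustrate why losing data in such a secondary store matters less: a cache can always be refilled from the source of truth. The class and the sample record below are hypothetical, and a plain dictionary stands in for the authoritative database.

```python
class ReadThroughCache:
    """Secondary store: keeps local copies of values owned by a source of truth."""

    def __init__(self, source_of_truth):
        self._source = source_of_truth  # authoritative records (first category)
        self._copies = {}               # cached copies (second category), safe to lose

    def get(self, key):
        if key not in self._copies:
            # Cache miss: fetch from the authoritative store and remember the copy.
            self._copies[key] = self._source[key]
        return self._copies[key]

    def drop_all(self):
        """Simulate losing the whole cache, for example after a restart."""
        self._copies.clear()


records = {"order-1": "shipped"}   # stand-in for the system of record
cache = ReadThroughCache(records)
cache.get("order-1")               # first access fills the cache
cache.drop_all()                   # the secondary store loses everything
print(cache.get("order-1"))        # shipped
```

Losing `records` itself would be a different story, which is why the two categories carry different design constraints.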
So in that way of thinking about things, you can say, well, there are SQL databases or relational database management systems with support for transactions; these will be in the first category. And then some people will even build their records of truth using NoSQL databases, like with document formats or document schemas that are looser. As long as those systems support resilience and consistency and data corruption protections, they can usually be used for the purpose of data retention, as a source of truth.
In the second category, we have a much wider diversity of systems. Some of them can be called databases, I think about…
Sorry, it’s complicated. If I start giving names here, people are going to be saying I’m biased, those systems can also be used in the first category. And I’m going to get a bad reputation. <chuckle> But if you think about many databases that can be configured in a weaker consistency model for the purpose of additional performance, usually, those weaker consistency models have been designed for those secondary system use cases.
I’m not going to give any names. People are going to have an opinion about things and I don’t want to land in trouble. Let’s put it this way: CockroachDB has been optimized for the first category, and much less for the second one.
So that was the technological standpoint: records of truth versus support systems.
And then, there is another separate perspective to understand data technology nowadays. And that is the way that people relate to the location of the data.
You have systems where the physical location where the data is stored matters.
So, in a system of truth, location can matter: if you have a regulatory environment that requires data to be within the boundaries of national borders, think about medical records that need to typically stay in the same country. For a support system, usually the physical location of the data matters because of the performance that’s going to be associated with it. If you have a caching system, you want to have multiple copies in different geographical areas to ensure that the accesses will be at a low latency wherever you are.
So those systems where the physical location is sensitive, they have a universe of design considerations attached to them. And that is a universe where people using and building those databases will be forced to maintain, in their head at least, a mapping between the logical data domain that they have in the apps and the physical location (in regions or servers, depending on the abstraction that you use). In that universe, there will be discussions about what is called “sharding”: which parts of the database, which shards of a database, go to which server or which region. And that is a layer of complexity that comes from this interest in keeping a close understanding of where the data is in the world.
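The mapping between logical data and physical location mentioned above can be sketched in a few lines. Everything here is made up for illustration (the region names, the choice of the customer’s country as sharding key, the fallback region); it does not describe how CockroachDB or any other product places data.

```python
# Hypothetical mapping from a logical attribute (country) to a physical region.
REGION_BY_COUNTRY = {"DE": "eu-central", "FR": "eu-west", "US": "us-east"}
DEFAULT_REGION = "us-east"  # simplification: real regulated data needs a stricter rule


def shard_for(customer):
    """Decide which region (shard) stores this customer's records."""
    return REGION_BY_COUNTRY.get(customer["country"], DEFAULT_REGION)


def group_by_shard(customers):
    """Return {region: [customer ids]} so each record goes to exactly one shard."""
    shards = {}
    for customer in customers:
        shards.setdefault(shard_for(customer), []).append(customer["id"])
    return shards


customers = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "BR"},  # no explicit rule, falls back to DEFAULT_REGION
]
print(group_by_shard(customers))  # {'eu-central': [1], 'us-east': [2, 3]}
```

Someone has to own and maintain `REGION_BY_COUNTRY`, and that is the layer of complexity this passage refers to.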
Separately from that angle, at that higher level, you have the angle where you can abstract the location of the data entirely, where the application does not need to know. And that means that the database can then take the initiative to multiply the data, or reduce the number of copies, or maybe migrate data automatically from one place to another without informing the user or the client application.
Those used to be called Cloud technologies in the past, but they’re not anymore. Because nowadays more and more people want Cloud technologies to provide the physical locality knowledge. So just saying “cloud” is not sufficient to understand that anymore.
So the new word that’s attached to that is “serverless”, where the word “serverless” is basically to say, “you don’t know where the server is.” Or you don’t even know that there is a server. It doesn’t matter if there’s one or many or zero in fact. You access your data with an internet name that is resolvable wherever you are on Earth and then this system will give you your data wherever it is, or, maybe, go to different copies of it. And that allows the designers of client applications to ignore the actual implementation of the database and how the data is located in the world.
And so to recap, again. In the first way of thinking about databases, you have record systems, sources of truth systems vs support systems, and these have different design constraints. And then at the other level, you have systems that expose physical location of the data because it matters and you have systems that remove this understanding, because it doesn’t matter. And that creates a two by two table. And those 4 cells, if you were to present them as a table, would have different technologies, different brands and different configurations for certain databases to satisfy all those needs.
Jon: That’s perfect. That was a great way to cover that broad topic. Thank you.
Melissa: Yeah, I learned a ton in that last little bit there. Really interesting.
Melissa: Is there anything that you want to elaborate on or anything we didn’t cover that you really wanted to share?
Raphael: I would like to come back to something Aaron said, because I think it’s super important and it’s valid. That is, people can get into security without having studied or researched security first. Like, you can get into security from an existing training, an existing job, without having to go through a retraining program. That is a valid approach. And in fact, something that maybe your audience will want to know is that there is currently a huge demand for security specialists. That is an area where the hiring is exploding right now. There are societal and circumstantial reasons why that is the case. (I don’t know that that is the purpose for today.) But nevertheless, the job market is very active, and that means that there are a lot of opportunities.
Now, unfortunately, people who hire for security, they’re still doing hiring using matching keywords. And so it’s very difficult to tailor a resume to the particular keywords that are used. Usually, the projects that someone has done don’t really match. But it doesn’t mean that it is unattainable.
What is the case is that people can join an organization that is hiring for security in a different job and then gravitate towards the security domain and then be hired internally for security. Like, move sideways in the organization into a security role. And that is pretty common, in fact.
And then the organization you want to have for your own training, or your education, your career path, is to find job opportunities that are technical in nature, that allow you to spend time with hardware, with software, at a reasonably high level of depth. And if you don’t have that to start with, find a group that does this and find a way to be taught by the existing experts.
And then, as you are doing that education in the context of a job that is not security related, talk to the security-minded people in the organization. Tell them that you have an interest there. Ask them the questions that are going to be interesting to you. But also listen to the questions they have, that they would like to be answered by a security expert. And then do the research that’s needed for that work. Even though you’re not the security expert yet, you can listen to these questions, take that as a suggestion to do research, then do the research and perhaps, if the question has not been answered yet, propose your answer or the outcome of your research to the people who are interested. And that will create an aura of security expertise around you, even though that is not your primary role. And as you get more and more acknowledged in that growing expertise, or curiosity, people are going to look more and more at you as perhaps a consultant or advisor for security-adjacent questions. And any time these conversations happen, you can record them for your own use later in a job interview. When someone asks you, “what is your security expertise?” you can say, “well, I have been used as an advisor in my organization while I was a software engineer, because I was interested in those things”.
And that is the right way to advertise for security expertise, when you transition into a security role. Because, in that case, it is OK and it’s perfectly understood that your primary position was not security-related but you can have gained that expertise that way.
Melissa: That’s really excellent advice. Thank you so much for that. I think that’ll be very helpful for anybody interested.
Jon: As you said, it speaks to Aaron’s point. Your first or several roles may not have “security” in the title, but that doesn’t mean that you’re not on the security career track.
Melissa: Do you have last thoughts that you want to share?
Raphael: Like Aaron, I would like to thank you for organizing this. We need more of this kind of content generally on the internet and in communities. This is a topic that is currently timely, and as I explained earlier, I wish that security concerns or curiosities were more broadly present in computer engineering / software engineering environments. And so your project here is going to help with that, and for that I am grateful.
Melissa: Thank you. We’re so glad that you gave us the time. We really appreciate your expertise and your very generous nature in sharing that expertise.
Jon: Yeah, this is great. I really appreciate your willingness, and also I appreciate your elaboration. Your education background, teaching others, that really comes through. So I really appreciate your answers.
Raphael: Thank you.
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.