This blog post provides an overview of the data flows inside CockroachDB and its data security architecture.
Note
This post was originally published in February 2020, targeting CockroachDB v20.1. It was edited in February 2022 to reflect changes in CockroachDB v22.1. A summary of changes is available on this page.
Disclaimers: this information is solely based on the public CockroachDB source code repository on GitHub. At the time of this writing, the most recent stable release of CockroachDB is 21.2.5. The present analysis reflects the `master` branch in the repository, which will become version 22.1. Although I also sometimes contribute to the CockroachDB project and consult for Cockroach Labs, the opinions enclosed herein are my own.
1 System overview
A running CockroachDB cluster looks like this: one or more unix processes running the cockroach program binary on different servers, and connected to each other.
```
 (client apps using SQL or web UI sessions in client browsers)
         |                     |                     |
+------- | --------+ +------- | --------+ +------- | --------+
|        v         | |        v         | |        v         |
|  +------------+  | |  +------------+  | |  +------------+  |
|  | cockroach  |<----->| cockroach  |<----->| cockroach  |  |
|  +------------+  | |  +------------+  | |  +------------+  |
|        |         | |        |         | |        |         |
|  +------------+  | |  +------------+  | |  +------------+  |
|  | node-local |  | |  | node-local |  | |  | node-local |  |
|  |  storage   |  | |  |  storage   |  | |  |  storage   |  |
|  +------------+  | |  +------------+  | |  +------------+  |
|                  | |                  | |                  |
+---- server 1 ----+ +---- server 2 ----+ +---- server 3 ----+
```
CockroachDB automatically replicates the data and handles failover: if a node is shut down or crashes, the other nodes “take over”. Data access is forwarded: clients connected to any node can access data on any other node. When a node goes down, clients can resume their work using any other node.
The diagram above reveals the two main categories of data flows inside CockroachDB:
- inter-node traffic.
- client traffic.
2 Architecture of a single node
A single process running the `cockroach` program contains multiple separate components with data flows between them.
We can model these components as a directed graph:
- nodes represent the components;
- edges represent the “uses” relationship: `A -> B` means “A sends requests to B and receives responses from B”.
We can then draw an inventory of the components and their data flows, as follows:
(See here for the source code to this diagram. This uses the Graphviz graph modeling language.)
In this diagram, the boxes with a double border correspond to server components that accept connections from outside of the `cockroach` process. There are 5 such server components, separated into 3 groups:
- The HTTP API server, which can serve the web UI assets on its own and delegates all the other API logic to the RPC administrative server (see below).
- The SQL protocol server (“pgwire” inside CockroachDB’s source code), for client apps compatible with the PostgreSQL wire protocol. This also includes a subset of CockroachDB’s own CLI utilities (e.g. `cockroach sql`).
- The RPC servers:
  - 3a. The RPC server for clients, which exposes data to the web UI and some of CockroachDB’s CLI utilities (e.g. `cockroach init`).
  - 3b. The SQL execution server, which executes (parts of) SQL queries on behalf of other nodes.
  - 3c. The internal RPC server for all the services of the KV layer and below. This includes replication, transactions, liveness, etc.
3 CockroachDB’s external interfaces
As revealed by the previous diagram, CockroachDB provides external connection opportunities as follows:
- over its HTTP port (`--listen-http-addr`), to its HTTP API server.
- over its SQL port (`--listen-sql-addr`), to its SQL protocol server.
- over a unix domain socket (`--socket-file`), to its SQL protocol server.
- over its RPC port (`--listen-addr`), to all its RPC servers: this port serves both client and inter-node traffic.
Additionally, an implicit interface is used: the boundary between the `cockroach` process and the underlying storage (filesystem / hard disk):
- data files maintained by the storage engine (Pebble, formerly RocksDB). This includes persistent data and spill-to-disk temporary query data.
- temporary files created during bulk I/O operations.
- log files, including but not limited to:
  - the SQL statement logs (`cockroach-sql-exec.xxx.log`),
  - the SQL “experimental” audit logs (`cockroach-sql-audit.xxx.log`),
  - the Pebble log files (`cockroach-pebble.xxx.log`),
  - the “main” operational logs.
3.1 A word about the log files
The Pebble log files only contain operational events (e.g. “the process is shutting down”). The other logs contain possibly sensitive application data, as follows:
- the SQL statement logs contain the SQL text of queries sent by clients, including the username, database/table names and the literal values passed as arguments to the query. The results are not logged; however, the number of rows in the result is. Error messages returned to SQL clients, if any, are also logged in full.
- The SQL audit logs contain usernames, and database/table names.
- The main (default) logs contain highly sensitive information:
  - Data range boundary keys are copied in full in various log messages. Boundary keys include all the data from indexed columns in SQL tables (both primary and secondary indexes).
  - Data value strings are included in various logged error messages.
  - The amount of data revealed in this way is already relatively high in the default configuration, and can become very detailed if the `vmodule` option is changed (which can happen dynamically, during troubleshooting).
  - This logging of keys and values can occur on any table, including system tables like `system.users`.
Overall, the main log files should be considered highly sensitive and access to them should be considered to offer pathways to privilege escalation up to complete control over the CockroachDB cluster.
4 CockroachDB’s STRIDE protections
Applying the STRIDE model, here is how CockroachDB provides security over its external interfaces.
4.1 Integrity (tamper-resistance)
Tamper resistance is arranged for network connections as follows:
- TLS integrity over all TCP connections.
- Unix process isolation for the unix socket.
Tamper resistance for storage is arranged as follows:
- Regular consistency checks between replicas stored on different nodes.
- Optional encryption-at-rest for data files.
Meanwhile, there is no integrity protection for generated log files, nor for temporary files during bulk I/O operations.
Generally, CockroachDB was designed with the assumption that storage integrity is provided by the underlying environment and that no malicious actors have network access to the OS where `cockroach` is running. Production deployments that care about tampering via physical access should use RAM and disk encryption.
Note: CockroachDB also uses checksums inside the data files. However, if my reading of the source code is right, Pebble does not use cryptographic hashes to maintain integrity, and programmatic tampering remains possible. As noted here, checksums are a protection against accidental damage to the data (e.g. disk corruption) and do not provide resistance against malicious tampering.
4.2 Confidentiality
Resistance against information disclosure over network links is arranged as follows:
- TLS encryption over TCP connections is enabled by default.
- Unix process security for the unix socket.
Resistance against information disclosure in storage is arranged as follows:
- for data files, using optional encryption-at-rest.
- however, there is no confidentiality protection for generated log files, nor for certain intermediate temporary files (not managed by the storage engine) during bulk I/O operations. An attacker with access to the logs can thus escalate privileges and disclose data using a client connection, regardless of whether storage encryption is enabled.
As with tamper resistance above, resistance against information disclosure (data leaks) of the storage underlying a `cockroach` server should be provided by the environment, e.g. via RAM and disk encryption. Until now, CockroachDB was designed with the assumption that the `cockroach` server is a confidential enclave.
This raises the question of the purpose of the encryption-at-rest feature, if access to the server running `cockroach` provides a realistic path to information disclosure of stored data. I assume that the feature was designed to maintain confidentiality of data files when they are manually copied elsewhere, or when `cockroach` is running against networked storage.
4.3 Non-repudiability
The logging subsystem is organized into channels, which carry events generated inside CockroachDB and route them to sinks, which decide how the logging events are reported externally to the process.
It is possible to mark a log sink as `auditable`, which makes CockroachDB synchronize all log writes. This ensures that the log events going to that sink are persisted before operations are allowed to go through.

By default, the “important” channels, including accesses to audited tables, are associated with a sink carrying the `auditable` bit.
It is also possible to route the log traffic to an external log collector, e.g. Fluentd, to ensure that the collected events cannot be tampered with if the server where `cockroach` is running gets compromised.
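To illustrate, here is a minimal sketch of the per-table audit feature mentioned above; the table name is hypothetical.

```sql
-- Minimal sketch (hypothetical table): route read/write access events
-- on this table to the SQL audit log; the audit sink carries the
-- auditable bit by default, so these events are flushed synchronously.
ALTER TABLE customers EXPERIMENTAL_AUDIT SET READ WRITE;
```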
4.4 Authentication
Authentication over network connections is arranged as follows:
- For HTTP clients, using either a password check (hash in `system.users`) or a valid session cookie (stored in `system.web_sessions`). The web interface, via an RPC, enables creation of a session cookie from a password or an OIDC handshake (“login workflow”); it is also possible to create a cookie manually using the `cockroach auth-session login` CLI command.
- For RPCs between nodes, two-way TLS certificate validation. Only node-to-node certificates are accepted.
- For RPC clients, TLS client certificate validation.
- For SQL clients, depending on the connection origin and the username, according to the current HBA configuration (see below):
  - Either TLS client certificate validation, or
  - A password check against the hash stored in `system.users`, or
  - No authentication (always trust), or
  - Via GSSAPI / Kerberos.
CockroachDB also supports OCSP for TLS certificate revocation.
Authentication between the `cockroach` process and the accompanying storage is arranged as follows:
- Access to log files and data files is authenticated by the unix permission system.
- With encryption-at-rest, using the configured encryption keys.
Authentication for outgoing requests performed by `cockroach`, e.g. when exporting data or backups to an external storage system, is performed using credentials specified in SQL when the bulk I/O job is created.
4.4.1 Aside: the Host-Based Authentication (HBA) configuration language
CockroachDB supports a subset of PostgreSQL’s Host-Based Authentication configuration language (`pg_hba.conf`), via its cluster setting `server.host_based_authentication.configuration`.
I wrote a separate article dedicated to this topic: Authentication configuration in PostgreSQL and CockroachDB.
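For a quick illustration nonetheless, here is a hedged sketch of what such a configuration can look like; the rules below are hypothetical and not a recommended policy.

```sql
-- A minimal sketch of an HBA configuration (hypothetical rules).
-- Each rule reads: connection type, database, user, origin, auth method.
SET CLUSTER SETTING server.host_based_authentication.configuration = '
    host all root all cert
    host all all  all cert-password
';
```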
4.5 Availability
Resistance to network-based denial of service attacks is arranged as follows:
- Memory usage upon incoming network connections is kept minimal until authentication succeeds (with a caveat, explained below).
- Nodes are “multi-active”: if one node goes away, another node can take over with close to zero time to recovery. This exploits consensus-based replication and thus requires a majority of nodes in every replication group to remain available.
- Automatic scale-up is possible: adding more nodes automatically adds more capacity.
As of v21.1, CockroachDB internally has a new multi-tenant architecture, which provides resource isolation between multiple “tenants” that are assumed to be mutually adversarial with regard to availability.
As of this writing, this architecture is only used in the CockroachCloud “serverless” offering. CockroachCloud “dedicated” clusters, as well as users who run CockroachDB themselves, do not have access to this feature. In “single-tenant” deployments, any application can lock the others out via resource exhaustion.
Finally, the note about resource usage until authentication succeeds: incoming requests are accepted by spawning a goroutine per incoming connection. There is only minimal throttling based on the total amount of memory allocated (each connection is initially counted as 10 KiB towards the configured `--max-sql-memory`). It is thus possible to open tens of thousands of pre-authentication connections to a single node. A semaphore is used internally to limit the CPU usage of authentication checks for incoming connections and prevent an attacker from using pre-auth connections to starve a node of CPU.
4.6 Authorization
Resistance against escalation of privileges via the storage system is solely maintained using the unix permission system. A user logged into the `cockroach` server’s OS with permission to access the `cockroach` data files can obtain a copy of the node-to-node TLS certificate to gain control over the entire cluster.
It is also possible to tamper with the data files to yield escalation of privilege. A user with write access to the data files of the leaseholder node for the `system.users` and `system.role_members` tables can spoof the existence of new security principals in the `admin` role to other nodes.
As explained in A word about the log files above, privilege escalation via information leaked in the log files is also theoretically possible.
Meanwhile, resistance against escalation of privilege using network connections is more complex and requires a deeper dive into CockroachDB’s privilege model. Keep reading.
4.7 Aside: CockroachDB’s “insecure” mode
When nodes are started with the `--insecure` flag, all the following mechanisms are entirely disabled (all-or-nothing):
- confidentiality: network and storage encryption are disabled.
- integrity: there are no digests on network messages.
- authentication: any client can authenticate as any principal, including spoofing the `node` identity.
- authorization: all non-`node` principals are assumed to have the “admin” bit. Since `node` connections can be spoofed, any client can effectively also obtain the “node” bit on demand.
This mode is only offered for testing and disaster recovery.
5 Authorization in CockroachDB
CockroachDB establishes authorization using a matrix between security principals (corresponding approximately to “user accounts”) and privilege bits.
5.1 Security principals
There are three pre-defined principals, with the option to create additional principals as entries in the `system.users` SQL table.
The three pre-defined principals are:
- the `node` pseudo-user, which represents the identity of a node to another node.
- the `root` SQL user.
- the `admin` SQL role.
Principals are either singular (“user accounts” or `node`) or roles, which correspond to user groups: every member of a role has the bits assigned to the role in addition to its own.
The architecture of CockroachDB currently validates the existence of a user against the `system.users` table; however, this is done only during the authentication step. During routine SQL and RPC operations, no further access is made to `system.users`, so that table is not deeply required as the provider of principals. I predict that CockroachDB will grow to recognize principals defined in external sources.
5.2 Privilege bits
The privilege bits currently defined include:
The “node” bit: can send inter-node RPC requests.
The “admin” bit (the `isAdmin` variable and `HasAdminPrivilege()` function in various parts of the source code):
- can use all the client RPCs, including administrative RPCs.
- via the administrative RPCs, can read and write log files and other intermediate/temporary files created or used by SQL.
- can create SQL databases.
- can create new principals via write access to `system.users`.
- can update cluster settings, and thus change the authentication rules via the HBA configuration.
- overrides all the other SQL-level privilege bits (see below).
- due to the points above, effectively gives “super-user” privilege on the CockroachDB cluster.
The “role admin” bit (the `isAdmin` column in `system.role_members`), aiming to mimic a PostgreSQL feature:
- is either set or unset for the membership of a principal in a role.
- when set, gives that principal the ability to add or remove members from the role.
The SQL bits, aiming to mimic the PostgreSQL privilege model. These are defined either per principal, or per SQL object (database, table, view, sequence).
Bits per principal (also called “role options”), which can each be associated with a value:
- (NO)CREATEROLE: ability to create other principals.
- (NO)CREATELOGIN: ability to create credentials for other principals.
- (NO)LOGIN: ability to open either an HTTP or a SQL session.
- (NO)SQLLOGIN: ability to open a SQL session.
- (NO)PASSWORD: password hash that can support password authentication.
- VALIDUNTIL: expiration of the password field. If expired, the principal can only log in if the LOGIN bit is set and a non-password credential is available.
- (NO)CONTROLJOB: ability to control jobs.
- (NO)CONTROLCHANGEFEED: ability to control changefeeds.
- (NO)CREATEDB: ability to create new databases.
- (NO)VIEWACTIVITY: ability to inspect the SQL sessions of other users and query statistics.
- (NO)CANCELQUERY: ability to cancel queries for other users.
- (NO)MODIFYCLUSTERSETTINGS: ability to change cluster settings.
Bits per SQL database:
- CREATE/DROP: creating/deleting new tables/views/sequences/indexes.
- CONNECT: can open a session with this database as current database.
- ZONECONFIG: can alter the zone configuration (mapping of data to nodes) of all tables in a database.
- GRANT: granting SQL bits to other users/roles over SQL databases and objects.
- SELECT/INSERT/UPDATE/DELETE: privileges that are inherited by individual objects inside the database.
Bits per SQL schema:
- USAGE: can access objects inside the schema.
- SELECT/INSERT/UPDATE/DELETE: privileges that are inherited by individual objects inside the schema.
Bits per SQL object (table/view/sequence):
- CREATE: altering the schema of existing tables/views/sequences/indexes.
- DROP: dropping existing tables/views/sequences/indexes.
- SELECT: can read from tables/views/sequences.
- DELETE: can delete rows from tables.
- UPDATE: can update rows in tables.
- INSERT: can add new rows in tables.
- ZONECONFIG: can alter the zone configuration (mapping of data to nodes) of tables.
- GRANT: granting SQL bits to other users/roles over SQL objects.
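To make the three levels concrete, here is a hedged sketch of how such bits are assigned in practice; the principal and object names are hypothetical.

```sql
-- Hypothetical examples of assigning role options and SQL privilege bits:
CREATE ROLE analyst WITH LOGIN VIEWACTIVITY;         -- per-principal role options
GRANT CONNECT ON DATABASE app TO analyst;            -- per-database bit
GRANT USAGE ON SCHEMA app.public TO analyst;         -- per-schema bit
GRANT SELECT ON TABLE app.public.orders TO analyst;  -- per-object bit
```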
The mechanism of the GRANT bit seems intended to turn CockroachDB into a capability-based authorization system over the set of SQL objects: a non-admin user should only be able to grant to other users the bits that they themselves already have. This makes sense because the SQL bits were designed in PostgreSQL to prevent privilege escalation of SQL sessions over the use of SQL objects.

This used to be incorrectly implemented in v20.1 (see this issue) but seems to have been fixed. However, the documentation does not highlight that CockroachDB is a capability-based system, so it remains unclear whether that is a product goal.

Also, remarkably, CockroachDB does not currently provide an “operator” privilege bit which would enable a principal to observe, but not modify, cluster metadata and configuration.
5.3 Authorization invariants
The following privilege-principal matrix is hard-coded inside CockroachDB and cannot be modified on a running cluster:
- the `node` principal has the “node” and “admin” bits.
- the `root` and `admin` principals have the “admin” bit.
- only non-role principals can authenticate via the client interfaces.
5.4 Security boundaries
CockroachDB’s security, in agreement with its name, has adopted the “insect” security model: a hard shell with soft insides.
- Authorization is (intended to be) enforced at the boundary between the cluster and clients.
- All inter-node cluster traffic is privileged and no authorization is performed internally.
Looking back at the architectural diagram from the beginning, this places the responsibility for authorization on the server components at the client boundary:
- the RPC server for clients (3a), because some of the other components that it uses do not perform authorization themselves.
- the SQL protocol server (2), or alternatively the SQL executor & planner component, which is the protocol server’s only interface to the rest of the CockroachDB cluster.
The HTTP server (1) could perform authorization but currently does not, instead delegating authorization to the RPC server for clients (3a).
The RPC servers for other nodes (the SQL execution server, 3b, and the KV APIs, 3c) are not responsible for authorization because they can only receive requests from an authenticated `node` principal, and `node` has the “node” bit and thus total authority over the cluster.
5.5 There is more to authorization than just SQL
A naive view of database authorization would assume that 1) the only data that can be read and used is SQL data, and 2) that the privilege bits configured for each SQL object directly dictate what an authenticated principal can or cannot do.
In this naive view, a SQL client connects “as” a particular SQL user (e.g. “`alice`”). When the SQL client issues `SELECT * FROM bobs_secret_table`, CockroachDB verifies whether `alice` has the SELECT bit over `bobs_secret_table`. If it does not, the SQL query is rejected with an error.
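As a sketch (hypothetical user and table; the error text is approximate):

```sql
-- The naive model in action:
CREATE USER alice;
CREATE TABLE bobs_secret_table (secret STRING);
-- As alice, without a grant, the query fails with an error like:
--   ERROR: user alice does not have SELECT privilege on relation bobs_secret_table
GRANT SELECT ON TABLE bobs_secret_table TO alice;
-- After the grant, the same query succeeds:
SELECT * FROM bobs_secret_table;
```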
The limitations of this naive view come from the assumptions made above:
- there is actually more data that can be accessed than just SQL data.
- there are intermediate authorization deputies inside CockroachDB which deliberately escalate the privilege of certain queries in controlled ways.
5.6 Non-SQL data
Non-SQL data is data for which the SQL privilege bits do not apply. The following non-SQL data stored inside or alongside CockroachDB can be accessed by client sessions over the network.
These are detailed in the following sub-sections.
5.6.1 Log files
The log files on each node can be accessed read-only via an administrative API over RPC and HTTP. Authorization is performed by the API endpoint by asserting that the principal has the “admin” bit.
5.6.2 Monitoring time series
CockroachDB’s internal monitoring time series can be accessed read-only via a status API over RPC and HTTP. No authorization is performed: any authenticated principal can read all the monitoring data.
5.6.3 Running sessions, queries and jobs
The list of currently running sessions and queries, and the list of jobs, can be read and controlled (cancelled/paused):
- read-only access via a status API over RPC and HTTP. Again, authorization is performed by the API endpoint itself.
- read-only access via the virtual tables `crdb_internal.{cluster,node}_{sessions,queries}` and `crdb_internal.jobs` in SQL. Authorization is performed by the table’s data generator.
- controlling access via the `CANCEL`, `PAUSE` and `RESUME` statements in SQL. Authorization is performed by those statements’ execution code.
For these, authorization is performed as follows: any principal with the “admin” bit can read and control all sessions, queries and jobs. Principals without the “admin” bit can read and control their own sessions, queries and jobs (for jobs, only starting in version 20.1), or those of other users if they have the CONTROLJOB, CONTROLCHANGEFEED, VIEWACTIVITY or CANCELQUERY role options, depending on the type of access desired.
There is no role propagation: members of a role cannot see nor control sessions, queries and jobs by other members just by virtue of being in the same role.
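Here is a hedged sketch of what this looks like over SQL for a non-admin principal; the query ID is hypothetical.

```sql
-- Listing sessions and queries: a non-admin only sees their own,
-- unless they hold the VIEWACTIVITY role option:
SELECT session_id, user_name, active_queries FROM crdb_internal.cluster_sessions;
SELECT query_id, user_name, query FROM crdb_internal.cluster_queries;
-- Control statements: authorized by their execution code
-- (own queries only, or CANCELQUERY for those of other users):
CANCEL QUERY '1624a40f8b7f';  -- hypothetical query ID
```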
5.6.4 Privileged cluster metadata
Additional privileged cluster metadata can be accessed read-only via:
- a status API over RPC and HTTP. Authorization is performed by the API endpoint.
- via virtual tables in SQL. Authorization is performed by the table’s data generator.
In both cases, authorization requires the “admin” bit from the authenticated principal, or an appropriate role option.
A relatively complete list of the RPC endpoints that retrieve privileged metadata over HTTP is available at the end of the Security updates section of the v19.2.2 release notes. (I was not able to find this information elsewhere.)
The list of privileged metadata available via SQL is, likewise, not yet documented. Here is the list I was able to gather from the source code (may be incomplete):
| Privileged metadata | SQL virtual table |
|---|---|
| Gossiped metadata | `crdb_internal.gossip_nodes`, `crdb_internal.gossip_liveness`, `crdb_internal.gossip_alerts`, `crdb_internal.gossip_network` |
| Stored node properties | `crdb_internal.kv_node_status` |
| Stored store properties | `crdb_internal.kv_store_status` |
| Server process variables | `crdb_internal.node_runtime_info` |
| Statement statistics | `crdb_internal.node_statement_statistics` |
| Transaction statistics | `crdb_internal.node_txn_stats` |
| Cluster settings | `crdb_internal.cluster_settings` |
| Node process metrics | `crdb_internal.node_metrics` |
| Range and range lease details | `crdb_internal.ranges{,_no_leases}` |
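For example (a sketch; both reads below require the “admin” bit, or in recent versions an appropriate role option):

```sql
-- Sketch: reading privileged cluster metadata from SQL:
SELECT variable, value FROM crdb_internal.cluster_settings;
SELECT range_id, start_pretty, end_pretty FROM crdb_internal.ranges;
```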
5.6.5 Non-privileged cluster metadata
Access to non-privileged cluster metadata is granted to every authenticated principal without requiring a particular privilege bit.
I was not able to find a complete list of the RPC endpoints delivering non-privileged metadata in a single place. One can scan the `.proto` files in the `pkg/server/serverpb` source code directory and cross-reference that with the other `.go` files in `pkg/server` to reconstruct such a list.
The list of endpoints in the SQL package is more clearly recognizable from the source code (but may be incomplete):
| Metadata | SQL virtual table |
|---|---|
| Table leases (see below for discussion) | `crdb_internal.leases` |
| Server process variables | `crdb_internal.node_runtime_info` |
| Current session trace | `crdb_internal.session_trace` |
| Current session variables | `crdb_internal.session_variables` |
| List of supported built-in SQL functions | `crdb_internal.builtin_functions` |
| Feature usage counters | `crdb_internal.feature_usage` |
| Predefined comments on virtual tables | `crdb_internal.predefined_comments` |
5.7 Authorization deputies
An authorization deputy is a component that has higher privilege than the other components it serves. It must thus perform additional authorization checks to prevent its client components from escalating their privileges in unwanted ways.
Note
authorization deputies are also sometimes called “surrogates” in the literature.
An example of this in CockroachDB is the `system.jobs` table. As a SQL table, it is accessible only via the “admin” privilege. However, for the benefit of UX, CockroachDB also offers the option for non-“admin” principals to control their own jobs. This is done by introducing four authorization deputies: the `crdb_internal.jobs` virtual table, and the three SQL statements RESUME/CANCEL/PAUSE JOB. These four deputies each operate internally with the “admin” bit, and perform an additional filtering step to only return rows owned by the requester.
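Here is a hedged sketch of these deputies at work from a non-admin session; the job ID is hypothetical.

```sql
-- The virtual table operates internally with the "admin" bit, then
-- filters to rows owned by the requester (admins see everything):
SELECT job_id, job_type, status FROM crdb_internal.jobs;
-- The control statements perform a similar ownership check
-- (or require the CONTROLJOB role option):
PAUSE JOB 123456789;   -- hypothetical job ID
RESUME JOB 123456789;
```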
Here are some examples of other authorization deputies in CockroachDB:
- The server components, as described in Architecture of a single node above. These perform authentication and authorization with the “admin” bit on behalf of yet-to-be-authenticated/authorized client sessions.
- All the SQL virtual tables that expose SQL schema metadata (database/table/column names and zone configurations). This includes all of the `information_schema` tables, certain `pg_catalog` tables and certain `crdb_internal` tables. These access `system.namespace`, `system.descriptors` and `system.zones` using the “admin” bit, then filter the results based on whether the requesting principal has any SQL privilege on the returned object.
- The SQL execution logic of various `SHOW` statements that do not otherwise defer their authorization to a virtual table. For example, `SHOW CLUSTER SETTINGS` verifies that the requester has the “admin” bit.
- The `ALTER ... CONFIGURE ZONE` statements operate on `system.zones` using the “admin” bit but first require the ZONECONFIG privilege from the requesting principal (new in version 20.1); see the sketch after this list.
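Here is the sketch for the last item, with hypothetical table and user names:

```sql
-- The CONFIGURE ZONE deputy: the statement writes to system.zones with
-- the "admin" bit, but first requires ZONECONFIG from the requester.
GRANT ZONECONFIG ON TABLE orders TO alice;
-- Then, as alice:
ALTER TABLE orders CONFIGURE ZONE USING num_replicas = 5;
```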
There are many such deputies throughout the source code, yet I was not able to find a single point of reference that lists them all in a systematic way.
6 Architecture of the authorization code
6.1 Reflections on the “hard shell, soft insides” model
In truth, I struggled to identify the authorization boundaries and deputies in the code. The reason for this struggle is that the code is not capability-based: all the code inside CockroachDB can theoretically operate over any and all the other components without any restriction. The node-to-node privileged RPC channel (with the “node” privilege bit, and thus able to control the entire cluster) is available internally to all source code components.
For example, the source code for a simple `SELECT` statement can freely send a KV request to write to an unrelated table.
I was not able to find a framework or structural enforcement of privilege checks throughout the layers of RPC and SQL execution logic. Nor was I able to find systematic testing of the authorization code. Certain privileged statements have dedicated multi-user unit tests, but this is not done for all of them.

Therefore, the existence of additional authorization code (and thus of authorization deputies) feels, to this reader of the source code, like a post-hoc model of how the code was written.
6.2 Unstructured and duplicated authorization
An architectural detail that caught my attention is that authorization policies are not systematically enforced in a single place.
Certain deputies do provide a single point of authorization in the source code for the various pathways that can request data through them. For example, the logic to list running sessions and queries is performed in one place (`(*statusServer) ListLocalSessions()`); that single code path is used for both the RPC and SQL endpoints, and it takes responsibility for authorizing access.
However, most sources of data have multiple deputies, each responsible for enforcing authorization policy. For example, the “Statement details” RPC (`(*statusServer) Statements()`) and SQL’s `crdb_internal.node_statement_details` data generator each require the “admin” bit from their client separately. There does not seem to be any structural enforcement that the various access paths enforce the same authorization.
In some cases, the code accidentally enforces the right authorization policy, even though that does not seem to be intended. For example, the virtual tables `crdb_internal.forward_dependencies` and `.backward_dependencies` report the tables/indexes that are referred to/from using foreign key (FK) relationships. The code acts as an authorization deputy by filtering out rows that should remain invisible to the requesting principal. However, as of this writing, it only validates the SQL privilege bits on one end of the relationship. At first glance, it thus seems theoretically possible that these two tables could be used to learn about the existence and the logical schema of tables over which the logged-in user has no SQL privilege. This is not the case, however! In a separate part of the source code, the DDL statements that establish FK relations and the execution code that enforces FK constraints during mutation statements do check these privileges. It thus so happens that CockroachDB does not support FK relationships over mixed-privilege tables, so there is no additional information disclosure available via the virtual tables. This is OK, but does not seem to be deliberately designed.
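A sketch of the FK scenario discussed above, with hypothetical tables:

```sql
-- Creating the FK requires privileges on both tables (e.g. REFERENCES
-- on the referenced table), so no mixed-privilege FK pair can exist
-- that would leak schema information through the dependency views:
CREATE TABLE parent (id INT PRIMARY KEY);
CREATE TABLE child (id INT PRIMARY KEY, parent_id INT REFERENCES parent (id));
SELECT * FROM crdb_internal.forward_dependencies;
```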
To summarise, to the credit of the CockroachDB developers, it appears to me that the necessary authorization checks are indeed performed in the right places (with some caveats shared below). Even when this feels serendipitous, the authors of the code have so far been methodical, and mistakes are promptly fixed when recognized.
Example past issues that directly stem from this lack of systematic authorization:
7 Areas where further investigation is needed
While casually browsing the source code, I found the following points of interest:
- The virtual table `crdb_internal.leases` reports all the leases on SQL objects throughout the cluster, and this virtual table is readable by any principal. This reveals the existence and the name of all the “currently active” tables/views/sequences.
- Any SQL session can request “session tracing” via the statement `SET tracing = on` (or `on,kv`). After a trace is collected, the session can view its own trace via `SHOW TRACE` or the virtual table `crdb_internal.session_trace`. The trace contains details of KV operations (including detailed key/values) prior to authorization and filtering by authorization deputies. This can be used e.g. to discover the list of all database and table names, by tracing a `SHOW TABLES` statement (see the sketch after this list). In general, SQL session tracing can be used to bypass authorization and read any stored data processed by deputies with the “admin” bit. This is a textbook confused deputy situation.
- Go `error` objects produced in one component flow through other components, including authorization deputies. While deputies filter the “positive” (non-error) data returned to requesters, they typically do not filter out sensitive data contained in errors produced with their “admin” bit. Given that KV-level error objects can contain key/value data, it is theoretically possible to exploit error handling to leak data through authorization deputies. This is another confused deputy situation.
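Here is the sketch for the session-tracing item above (hedged; the exact shape of the trace output varies across versions):

```sql
-- Confused-deputy sketch: capture KV-level details that deputies would
-- otherwise filter out, by tracing a statement that runs with "admin":
SET tracing = on,kv;
SHOW TABLES;                                -- the deputy reads system.namespace internally
SET tracing = off;
SELECT * FROM crdb_internal.session_trace;  -- raw trace, including KV keys/values
```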
8 Summary & things I would have done differently
CockroachDB is a distributed database where nodes communicate with each other and accept connections from database clients. From the STRIDE perspective, CockroachDB follows the state of the art and makes reasonable operational assumptions with regards to Authentication, Integrity (tamper-resistance) and “external” Confidentiality. There is little provision for Non-repudiability but this seems to remain out of scope for the project at this time.
CockroachDB’s Availability story is slightly more complex. While it was designed from the ground up as a Highly Available (HA) system in the face of accidents, there are some gaps regarding availability in the face of malicious behavior.
Finally, its Authorization story is “interesting” and offers the most opportunities for future improvements. CockroachDB uses a combination of SQL and non-SQL privilege bits, and the authorization policies that enforce privilege bits on service endpoints (through its HTTP, RPC and SQL APIs) are heterogeneously implemented, without a common structured framework. CockroachDB’s documentation is strong in its advertisement and explanation of the SQL privilege model over SQL resources, but does not yet systematically explain non-SQL data and non-SQL authorization rules. Although the currently released versions do offer reasonably strong authorization guarantees, there are very few guardrails in the source code that prevent the accidental introduction of privilege escalation avenues during the development of new features.
As an engineer with an interest in security, there are several low-hanging fruits I could recommend to lift the project into a better and stronger place:
- Immediately change the default configuration to separate the SQL and RPC ports and strongly recommend in documentation to firewall the RPC port to prevent malicious use or “node”-level privilege escalation.
- Separate the privilege bit required to access (and write to) `system` tables, which contain critical operational parameters, from the “admin” bit required to view privileged cluster metadata. This would strongly incentivize the engineers who add new views into cluster metadata to think about implementing and documenting a new authorization deputy with clear escalation rules.
- Systematically eliminate the redundancy in authorization enforcement between the RPC servers and the SQL code. Ideally, a single point in the code should be responsible for both enforcing authorization and logging controlled privilege escalations, such as those performed by SQL virtual tables that act as non-privileged views into privileged data.
- Gradually introduce a capability authorization sub-system, where the privileged components would require capability bits from their requesters, and where capability tokens would be delivered to a principal as part of their authentication. This system should be designed using cryptographic principles so that an engineer would not be tempted to synthesize a capability token “for convenience” when pressed for time implementing a new feature.