This blog post provides an overview of the data flows inside CockroachDB and its data security architecture.

Note

This post was originally published in February 2020, targeting CockroachDB v20.1. It was edited in February 2022 to reflect changes in CockroachDB v22.1. A summary of changes is available on this page.

Disclaimers: this information is based solely on the public CockroachDB source code repository on GitHub. At the time of this writing, the most recent stable release of CockroachDB is 21.2.5. The present analysis reflects the master branch in the repository, which will become version 22.1. Although I also sometimes contribute to the CockroachDB project and consult for Cockroach Labs, the opinions enclosed herein are my own.

1   System overview

A running CockroachDB cluster looks like this: one or more unix processes running the cockroach program binary on different servers, connected to each other.

(client apps using SQL or web UI sessions in client browsers)
        |                         |                        |
+------ | ---------+      +------ | ---------+     +------ | ---------+
|       v          |      |       v          |     |       v          |
|  +------------+  |      |  +------------+  |     |  +------------+  |
|  | cockroach  |<---------->| cockroach  |<--------->| cockroach  |  |
|  +------------+  |      |  +------------+  |     |  +------------+  |
|       |          |      |       |          |     |        |         |
|  +------------+  |      |  +------------+  |     |  +------------+  |
|  | node-local |  |      |  | node-local |  |     |  | node-local |  |
|  | storage    |  |      |  | storage    |  |     |  | storage    |  |
|  +------------+  |      |  +------------+  |     |  +------------+  |
|                  |      |                  |     |                  |
+----- server 1 ---+      +----- server 2 ---+     +----- server 3 ---+

CockroachDB automatically replicates the data and handles failover: if a node is shut down or crashes, the other nodes “take over”. Data access is forwarded: clients connected to any node can access data on any other node. When a node goes down, clients can resume their work using any other node.

The diagram above reveals the two main categories of data flows inside CockroachDB:

  • inter-node traffic.
  • client traffic.

2   Architecture of a single node

A single process running the cockroach program contains multiple separate components with data flows between them.

We can model these components as a directed graph:

  • nodes represent the components;
  • edges represent the “uses” relationship: A -> B means “A sends requests to B and receives responses from B”.

We can then draw an inventory of the components and their data flows, as follows:

(See here for the source code to this diagram. This uses the Graphviz graph modeling language.)

In this diagram, the boxes with a double border correspond to server components that accept connections from outside of the cockroach process. There are 5 such server components, separated into 3 groups:

  1. The HTTP API server, which serves the web UI assets on its own and delegates all the other API logic to the RPC administrative server (see below).
  2. The SQL protocol server (“pgwire” inside CockroachDB’s source code), for client apps compatible with the PostgreSQL wire protocol. This also includes a subset of CockroachDB’s own CLI utilities (e.g. cockroach sql).
  3. The RPC servers:
    • 3a. The RPC server for clients, which exposes data to the web UI and some of CockroachDB’s CLI utilities (e.g. cockroach init).
    • 3b. The SQL execution server, which executes (parts of) SQL queries on behalf of other nodes.
    • 3c. The internal RPC server for all the services of the KV layer and below. This includes replication, transactions, liveness, etc.

3   CockroachDB’s external interfaces

As revealed by the previous diagram, CockroachDB accepts external connections as follows:

  • over its HTTP port (--listen-http-addr), to its HTTP API server.
  • over its SQL port (--listen-sql-addr), to its SQL protocol server.
  • over a unix domain socket (--socket-file), to its SQL protocol server.
  • over its RPC port (--listen-addr), to all its RPC servers: this port serves both client and inter-node traffic.

Additionally, an implicit interface is used: the boundary between the cockroach process and the underlying storage (filesystem / hard disk):

  • data files maintained by the storage engine (Pebble). This includes persistent data and spill-to-disk temporary query data.
  • temporary files created during bulk I/O operations.
  • log files, including but not limited to:
    • the SQL statement logs (cockroach-sql-exec.xxx.log)
    • the SQL “experimental” audit logs (cockroach-sql-audit.xxx.log)
    • the Pebble log files (cockroach-pebble.xxx.log)
    • the “main” operational logs.

3.1   A word about the log files

The Pebble log files only contain operational events (e.g. “the process is shutting down”). The other logs contain possibly sensitive application data, as follows:

  • the SQL statement logs contain the SQL text of queries sent by clients, including the username, database/table names and the literal values passed as arguments to the query. The results are not logged; however, the number of rows in the result is. Error messages returned to SQL clients, if any, are also logged in full.
  • The SQL audit logs contain usernames, and database/table names.
  • The main (default) logs contain highly sensitive information:
    • Data range boundary keys are copied in full in various log messages. Boundary keys include all the data from indexed columns in SQL tables (both primary and secondary indexes).
    • Data value strings are included in various logged error messages.
    • The amount of data revealed in this way is already relatively high in the default configuration, and can increase to become very detailed if the vmodule option is changed (which can happen dynamically, during troubleshooting).
    • This logging of keys and values can occur on any table, including system tables like system.users.

Overall, the main log files should be considered highly sensitive and access to them should be considered to offer pathways to privilege escalation up to complete control over the CockroachDB cluster.

4   CockroachDB’s STRIDE protections

Applying the STRIDE model, here is how CockroachDB provides security over its external interfaces.

4.1   Integrity (tamper-resistance)

Tamper resistance is arranged for network connections as follows:

  • TLS integrity over all TCP connections.
  • Unix process isolation for the unix socket.

Tamper resistance for storage is arranged as follows:

  • Regular consistency checks between replicas stored on different nodes.
  • Optional encryption-at-rest for data files.

Meanwhile, there is no integrity protection for generated log files, nor for temporary files during bulk I/O operations.

Generally, CockroachDB was designed with the assumption that storage integrity is provided by the underlying environment and that no malicious actors have network access to the OS where cockroach is running. Production deployments that care about tampering via physical access should use RAM and disk encryption.

Note: CockroachDB also uses checksums inside the data files. However, if my reading of the source code is right, Pebble does not use cryptographic hashes to maintain integrity, and programmatic tampering remains possible. As noted here, checksums are a protection against accidental damage to the data (e.g. disk corruption) and do not provide resistance against malicious tampering.

4.2   Confidentiality

Resistance against information disclosure over network links is arranged as follows:

  • TLS encryption over TCP connections is enabled by default.
  • Unix process security for the unix socket.

Resistance against information disclosure in storage is arranged as follows:

  • for data files, using optional encryption-at-rest.
  • however, there is no confidentiality protection for generated log files nor for certain (non-Pebble) intermediate temporary files during bulk I/O operations. An attacker with access to logs can thus escalate privileges and disclose data using a client connection, regardless of whether storage encryption is enabled.

As with tamper resistance above, resistance against information disclosure (data leaks) of the storage underlying a cockroach server should be provided by the environment, e.g. via RAM and disk encryption. Until now, CockroachDB was designed with the assumption that the cockroach server is a confidential enclave.

This raises the question of the purpose of the encryption-at-rest feature, if access to the server running cockroach provides a realistic path to information disclosure of stored data. I assume that the feature was designed to maintain confidentiality of data files when they are manually copied elsewhere, or when cockroach is running against networked storage.

4.3   Non-repudiability

The logging subsystem is organized into channels, which carry events from inside CockroachDB to sinks, which in turn decide how the logging events are reported outside the process.

It is possible to mark a log sink as auditable, which makes CockroachDB synchronize all log writes. This ensures that the log events going to that sink are persisted before operations are allowed to go through.

By default, the “important” channels, including accesses to audited tables, are associated with a sink carrying the auditable bit.

It is also possible to route the log traffic to an external log collector, e.g. Fluentd, to ensure that the collected events cannot be tampered with if the server where cockroach is running gets compromised.

4.4   Authentication

Authentication over network connections is arranged as follows:

  • For HTTP clients, using either a password check (hash in system.users) or a valid session cookie (stored in system.web_sessions). The web interface, via an RPC, enables creation of a session cookie from a password or an OIDC handshake (“login workflow”); it is also possible to create a cookie manually using the cockroach auth-session login CLI.
  • For RPCs between nodes, two-way TLS certificate validation. Only node-to-node certificates are accepted.
  • For RPC clients, TLS client certificate validation.
  • For SQL clients, depending on the connection origin and the username, according to the current HBA configuration (see below):
    • Either TLS client certificate validation, or
    • A password check against the hash stored in system.users, or
    • No authentication (always trust), or
    • Via GSSAPI / Kerberos.

CockroachDB also supports OCSP for TLS certificate revocation.

Authentication between the cockroach process and the accompanying storage is arranged as follows:

  • Access to log files and the data files is authenticated by the unix permission system.
  • With encryption-at-rest, using configured encryption keys.

Authentication for outgoing requests performed by cockroach, e.g. when exporting data or backups to an external storage system, is performed using credentials specified in SQL when the bulk I/O job is created.

4.4.1   Aside: the Host-Based Authentication (HBA) configuration language

CockroachDB supports a subset of PostgreSQL’s Host-Based Authentication configuration language (pg_hba.conf), via its cluster setting server.host_based_authentication.configuration.

I wrote a separate article dedicated to this topic: Authentication configuration in PostgreSQL and CockroachDB.
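
For illustration, here is a minimal sketch of an HBA policy set through that cluster setting; the line format follows pg_hba.conf (connection type, database, user, origin, method), and the network range and method choices below are hypothetical:

    -- A minimal sketch: root must present a TLS client certificate,
    -- clients on an internal network may use passwords, and all other
    -- connection attempts are rejected.
    SET CLUSTER SETTING server.host_based_authentication.configuration = '
      host  all  root  all          cert
      host  all  all   10.0.0.0/8   password
      host  all  all   all          reject
    ';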

4.5   Availability

Resistance to network-based denial of service attacks is arranged as follows:

  • Memory usage upon incoming network connections is kept minimal until authentication succeeds (with a caveat, explained below).
  • Nodes are “multi-active”: if one node goes away, another node can take over with close to zero time to recovery. This exploits consensus-based replication and thus requires a majority of nodes in every replication group to remain available.
  • Automatic scale-up is possible: adding more nodes automatically adds more capacity.

As of v21.1, CockroachDB has a new multi-tenant architecture internally, which provides internal resource isolation between multiple “tenants” that are assumed to be adversaries with regards to availability.

As of this writing, this architecture is only used in the CockroachCloud “serverless” offering. CockroachCloud “dedicated” clusters, as well as users who run CockroachDB themselves, do not have access to this feature. In “single-tenant” deployments, any application can lock the others out via resource exhaustion.

Finally, the caveat about resource usage before authentication succeeds: incoming requests are accepted by spawning a goroutine per incoming connection. There is only minimal throttling based on the total amount of memory allocated (each connection is initially counted as 10KiB towards the configured --max-sql-memory). It is thus possible to open tens of thousands of pre-authentication connections to a single node. A semaphore is used internally to limit the CPU usage of authentication checks for incoming connections and prevent an attacker from using pre-auth connections to starve a node of CPU.

4.6   Authorization

Resistance against escalation of privileges via the storage system is solely maintained using the unix permission system. A user logged into the cockroach server’s OS with permission to access the cockroach data files can obtain a copy of the node-to-node TLS certificate to gain control over the entire cluster.

It is also possible to tamper with the data files to yield an escalation of privilege. A user with write access to the data files of the leaseholder node for the system.users and system.role_members tables can spoof the existence of new security principals in the admin role to other nodes.

As explained in A word about the log files above, privilege escalation via information leaked in the log files is also theoretically possible.

Meanwhile, resistance against escalation of privilege using network connections is more complex and requires a deeper dive into CockroachDB’s privilege model. Keep reading.

4.7   Aside: CockroachDB’s “insecure” mode

When nodes are started with the --insecure flag, all the following mechanisms are entirely disabled (all-or-nothing):

  • confidentiality: all network and storage encryption is disabled.
  • integrity: there are no digests on network messages.
  • authentication: any client can authenticate as any principal, including spoofing the node identity.
  • authorization: all non-node principals are assumed to have the “admin” bit. Since node connections can be spoofed, any client can effectively obtain the “node” bit on demand.

This mode is only offered for testing and disaster recovery.

5   Authorization in CockroachDB

CockroachDB establishes authorization using a matrix between security principals (corresponding approximately to “user accounts”) and privilege bits.

5.1   Security principals

There are three pre-defined principals, with the option to create additional principals as entries in the system.users SQL table.

The three pre-defined principals are:

  • the node pseudo-user, which represents the identity of a node to another node.
  • the root SQL user.
  • the admin SQL role.

Principals are either singular (“user accounts” or node) or roles, which correspond to user groups: every member of a role has the bits assigned to the role in addition to its own.

CockroachDB currently validates the existence of a user against the system.users table, but only during the authentication step. During routine SQL and RPC operations, no further access is made to system.users, so that table is not deeply required as a provider of principals. I predict that CockroachDB will grow to recognize principals defined in external sources.
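
For example, here is how additional principals and role memberships are defined in SQL (the names and password are hypothetical):

    -- Create a singular principal and a role (user group).
    CREATE USER carol WITH PASSWORD 'hunter2';
    CREATE ROLE analysts;
    -- carol now holds any bits granted to analysts, in addition to her own.
    GRANT analysts TO carol;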

5.2   Privilege bits

The privilege bits currently defined include:

  • The “node” bit: can send inter-node RPC requests.

  • The “admin bit” (the isAdmin variable and HasAdminPrivilege() function in various parts of the source code):

    • can use all the client RPCs, including administrative RPCs.
    • via the administrative RPCs, can read and write log files and other intermediate/temporary files created or used by SQL.
    • can create SQL databases.
    • can create new principals via write access to system.users.
    • can update cluster settings, and thus change the authentication rules via the HBA configuration.
    • overrides all the other SQL-level privilege bits (see below).
    • due to the points above, effectively gives “super-user” privilege on the CockroachDB cluster.
  • The “role admin bit” (the isAdmin column in system.role_members), aiming to mimic a PostgreSQL feature:

    • is either set or unset for the membership of a principal in a role.
    • when set, gives that principal the ability to add or remove members from the role.
  • The SQL bits, aiming to mimic the PostgreSQL privilege model. These are defined either per principal, or per SQL object (database, table, view, sequence).

    Bits per principal (also called “role options”), some of which can be associated with a value:

    • (NO)CREATEROLE: ability to create other principals.
    • (NO)CREATELOGIN: ability to create credentials for other principals.
    • (NO)LOGIN: ability to open either an HTTP or a SQL session.
    • (NO)SQLLOGIN: ability to open a SQL session.
    • (NO)PASSWORD: password hash that can support password authentication.
    • VALIDUNTIL: expiration of the password field. If expired, the principal can only log in if the LOGIN bit is set and a non-password credential is available.
    • (NO)CONTROLJOB: ability to control jobs.
    • (NO)CONTROLCHANGEFEED: ability to control changefeeds.
    • (NO)CREATEDB: ability to create new databases.
    • (NO)VIEWACTIVITY: ability to inspect the SQL sessions of other users and query statistics.
    • (NO)CANCELQUERY: ability to cancel queries for other users.
    • (NO)MODIFYCLUSTERSETTINGS: ability to change cluster settings.

    Bits per SQL database:

    • CREATE/DROP: creating/dropping tables/views/sequences/indexes.
    • CONNECT: can open a session with this database as current database.
    • ZONECONFIG: can alter the zone configuration (mapping of data to nodes) of all tables in a database.
    • GRANT: granting SQL bits to other users/roles over SQL databases and objects.
    • SELECT/INSERT/UPDATE/DELETE: privileges that are inherited by individual objects inside the database.

    Bits per SQL schema:

    • USAGE: can access objects inside the schema.
    • SELECT/INSERT/UPDATE/DELETE: privileges that are inherited by individual objects inside the schema.

    Bits per SQL object (table/view/sequence):

    • CREATE: altering the schema of existing tables/views/sequences/indexes.
    • DROP: dropping existing tables/views/sequences/indexes.
    • SELECT: can read from tables/views/sequences.
    • DELETE: can delete rows from tables.
    • UPDATE: can update rows in tables.
    • INSERT: can add new rows in tables.
    • ZONECONFIG: can alter the zone configuration (mapping of data to nodes) of tables.
    • GRANT: granting SQL bits to other users/roles over SQL objects.

The mechanism of the GRANT bit seems intended to turn CockroachDB into a capability-based authorization system over the set of SQL objects: a non-admin user should only be able to grant to other users the bits that they themselves already have. This makes sense because the SQL bits were designed in PostgreSQL to prevent privilege escalation of SQL sessions over the use of SQL objects.

This used to be incorrectly implemented in v20.1 (see this issue) but seems to have been fixed. However, the documentation does not highlight that CockroachDB is a capability-based system, so it remains unclear whether that is a product goal.

Also, remarkably, CockroachDB does not currently provide an “operator” privilege bit which would enable a principal to observe, but not modify, cluster metadata and configuration.

5.3   Authorization invariants

The following privilege-principal matrix is hard-coded inside CockroachDB and cannot be modified on a running cluster:

  • the node principal has the “node” and “admin” bits.
  • the root and admin principals have the “admin” bit.
  • only non-role principals can authenticate via the client interfaces.
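
Beyond these hard-coded invariants, additional principals acquire the “admin” bit through ordinary role membership, for example (the user name is hypothetical):

    CREATE USER alice;
    -- alice now holds the "admin" bit via membership in the admin role:
    GRANT admin TO alice;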

5.4   Security boundaries

CockroachDB’s security, in agreement with its name, has adopted the “insect” security model: a hard shell with soft insides.

  • Authorization is (intended to be) enforced at the boundary between the cluster and clients.
  • All inter-node cluster traffic is privileged and no authorization is performed internally.

Looking back at the architectural diagram from the beginning, this places the responsibility for authorization on the server components at the client boundary:

  • the RPC server for clients (3a), because some of the other components that it uses do not perform authorization themselves.
  • the SQL protocol server (2), or alternatively the SQL executor & planner component, which is the protocol server’s only interface to the rest of the CockroachDB cluster.

The HTTP server (1) could perform authorization but currently does not, instead delegating authorization to the RPC server for clients (3a).

The RPC servers for other nodes (the SQL execution server, 3b, and the KV APIs, 3c) are not responsible for authorization because they can only receive requests from an authenticated node principal, and node has the “node” bit and thus total authority over the cluster.

5.5   There is more to authorization than just SQL

A naive view of database authorization would assume that 1) the only data that can be read and used is SQL data, and 2) that the privilege bits configured for each SQL object directly dictate what an authenticated principal can or cannot do.

In this naive view, a SQL client connects “as” a particular SQL user (e.g. “alice”). When the SQL client issues SELECT * FROM bobs_secret_table, CockroachDB verifies whether alice has the SELECT bit over bobs_secret_table. If she does not, the SQL query is rejected with an error.
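
In other words, under the naive model the entire authorization decision hinges on a single SQL bit, granted for example as follows:

    -- The one bit the naive model checks:
    GRANT SELECT ON TABLE bobs_secret_table TO alice;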

The limitations of this naive view come from the assumptions made above:

  1. there is actually more data that can be accessed than just SQL data.
  2. there are intermediate authorization deputies inside CockroachDB which deliberately escalate the privilege of certain queries in controlled ways.

5.6   Non-SQL data

Non-SQL data is data for which the SQL privilege bits do not apply. The following non-SQL data stored inside or alongside CockroachDB can be accessed by client sessions over the network.

These are detailed in the following sub-sections.

5.6.1   Log files

The log files on each node can be accessed read-only via an administrative API over RPC and HTTP. Authorization is performed by the API endpoint by asserting that the principal has the “admin” bit.

5.6.2   Monitoring time series

CockroachDB’s internal monitoring time series can be accessed read-only via a status API over RPC and HTTP. No authorization is performed: any authenticated principal can read all the monitoring data.

5.6.3   Running sessions, queries and jobs

The list of currently running sessions, the list of currently running queries, and the list of jobs can be read and controlled (cancelled/paused):

  • read-only access via a status API over RPC and HTTP. Again, authorization is performed by the API endpoint itself.
  • read-only access via the virtual tables crdb_internal.{cluster,node}_{sessions,queries} and crdb_internal.jobs in SQL. Authorization is performed by the table’s data generator.
  • control access via the CANCEL, PAUSE and RESUME statements in SQL. Authorization is performed by those statements’ execution code.

For these, authorization is performed as follows: any principal with the “admin” bit can read and control all sessions, queries and jobs. Principals without the “admin” bit can read and control their own sessions, queries and jobs (for jobs, only starting in version 20.1), or those of other users if they have the CONTROLJOB, CONTROLCHANGEFEED, VIEWACTIVITY or CANCELQUERY role options, depending on the type of access desired.

There is no role propagation: members of a role cannot see nor control sessions, queries and jobs by other members just by virtue of being in the same role.
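
As a sketch of how these role options combine in practice (the user name and query ID are hypothetical):

    -- Let a non-admin user observe and cancel other users' queries:
    ALTER USER carol WITH VIEWACTIVITY CANCELQUERY;
    -- carol can now list queries cluster-wide...
    SELECT query_id, user_name, query FROM crdb_internal.cluster_queries;
    -- ...and cancel one of them:
    CANCEL QUERY '16723ac5f67210a8';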

5.6.4   Privileged cluster metadata

Additional privileged cluster metadata can be accessed read-only via:

  • a status API over RPC and HTTP. Authorization is performed by the API endpoint.
  • via virtual tables in SQL. Authorization is performed by the table’s data generator.

In both cases, authorization requires the “admin” bit from the authenticated principal, or an appropriate role option.

A relatively complete list of the RPC endpoints that retrieve privileged metadata over HTTP is available at the end of the Security updates section of the v19.2.2 release notes. (I was not able to find this information elsewhere.)

The list of privileged metadata available via SQL is, likewise, not yet documented. Here is the list I was able to gather from the source code (may be incomplete):

Privileged metadata             SQL virtual table
------------------------------  -----------------------------------------
Gossiped metadata               crdb_internal.gossip_nodes,
                                crdb_internal.gossip_liveness,
                                crdb_internal.gossip_alerts,
                                crdb_internal.gossip_network
Stored node properties          crdb_internal.kv_node_status
Stored store properties         crdb_internal.kv_store_status
Server process variables        crdb_internal.node_runtime_info
Statement statistics            crdb_internal.node_statement_statistics
Transaction statistics          crdb_internal.node_txn_stats
Cluster settings                crdb_internal.cluster_settings
Node process metrics            crdb_internal.node_metrics
Range and range lease details   crdb_internal.ranges{,_no_leases}
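
For example, a principal holding the “admin” bit can inspect gossiped liveness directly; a sketch, with the column selection assumed from the source code:

    -- Requires the "admin" bit:
    SELECT node_id, epoch, draining, decommissioning
      FROM crdb_internal.gossip_liveness;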

5.6.5   Non-privileged cluster metadata

Access to non-privileged cluster metadata is granted to every authenticated principal without requiring a particular privilege bit.

I was not able to find a complete list of the RPC endpoints delivering non-privileged metadata in a single place. One can scan the .proto files in the pkg/server/serverpb source code directory and cross-reference that with the other .go files in pkg/server to reconstruct such a list.

The list of endpoints in the SQL package is more clearly recognizable from the source code (but may be incomplete):

Metadata                                    SQL virtual table
------------------------------------------  ---------------------------------
Table leases (see below for discussion)     crdb_internal.leases
Server process variables                    crdb_internal.node_runtime_info
Current session trace                       crdb_internal.session_trace
Current session variables                   crdb_internal.session_variables
List of supported built-in SQL functions    crdb_internal.builtin_functions
Feature usage counters                      crdb_internal.feature_usage
Predefined comments on virtual tables       crdb_internal.predefined_comments
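
Any authenticated SQL session can query these without holding any particular privilege bit, for example:

    -- No privilege bit required beyond a valid login:
    SELECT variable, value
      FROM crdb_internal.session_variables
     WHERE variable IN ('database', 'application_name');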

5.7   Authorization deputies

An authorization deputy is a component that has higher privilege than the other components it serves. It must thus perform additional authorization checks to prevent its client components from escalating their privileges in unwanted ways.

Note

authorization deputies are also sometimes called “surrogates” in the literature.

An example of this in CockroachDB is the system.jobs table. As a SQL table, it is accessible only with the “admin” privilege. However, for the benefit of UX, CockroachDB also offers non-“admin” principals the option to control their own jobs. This is done by introducing four authorization deputies: the crdb_internal.jobs virtual table, and the three SQL statements RESUME/CANCEL/PAUSE JOB. These four deputies each operate internally with the “admin” bit, and perform an additional filtering step to only return rows owned by the requester.
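
A hedged sketch of this deputy at work (the job ID is hypothetical):

    -- A non-admin user sees only their own jobs through the deputy...
    SELECT job_id, job_type, status FROM crdb_internal.jobs;
    -- ...and can only control those:
    PAUSE JOB 678234018723534801;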

Here are some examples of other authorization deputies in CockroachDB:

  • The server components, as described in Architecture of a single node above. These perform authentication and authorization with the “admin” bit on behalf of yet-to-be-authenticated/authorized client sessions.
  • All the SQL virtual tables that expose SQL schema metadata (database/table/column names and zone configurations). This includes all of information_schema tables, certain pg_catalog tables and certain crdb_internal tables. These access system.namespace, system.descriptors and system.zones using the “admin” bit, then filter the results based on whether the requesting principal has any SQL privilege on the returned object.
  • the SQL execution logic of various SHOW statements that do not otherwise defer their authorization to a virtual table. For example, SHOW CLUSTER SETTINGS verifies that the requester has the “admin” bit.
  • The ALTER ... CONFIGURE ZONE statements operate on system.zones using the “admin” bit but first require the ZONECONFIG privilege from the requesting principal (new in version 20.1).
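
For instance, the last deputy above lets a suitably-granted non-admin principal modify system.zones indirectly (the table and user names are hypothetical):

    -- Requires the ZONECONFIG bit on the table, not the "admin" bit:
    GRANT ZONECONFIG ON TABLE app.accounts TO carol;
    -- carol can now run, through the deputy:
    ALTER TABLE app.accounts CONFIGURE ZONE USING num_replicas = 5;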

There are many such deputies throughout the source code, yet I was not able to find a single point of reference that lists them all in a systematic way.

6   Architecture of the authorization code

6.1   Reflections on the “hard shell, soft insides” model

In truth, I struggled to identify the authorization boundaries and deputies in the code. The reason for this struggle is that the code is not capability-based: all the code inside CockroachDB can theoretically operate over any and all of the other components without any restriction. The node-to-node privileged RPC channel (with the “node” privilege bit, and thus able to control the entire cluster) is available internally to all source code components.

For example, the source code for a simple SELECT statement can freely send a KV request to write to an unrelated table.

I was not able to find a framework or structural enforcement of privilege checks throughout the layers of RPC and SQL execution logic. Nor was I able to find systematic testing of the authorization code. Certain privileged statements have dedicated multi-user unit tests, but this is not done for all of them.

Therefore, the existence of additional authorization code (and thus of authorization deputies) feels, to this reader of the source code, like a post-hoc model of how the code was written.

6.2   Unstructured and duplicated authorization

An architectural detail that caught my attention is that authorization policies are not systematically enforced in a single place.

Certain deputies do provide a single point of authorization in the source code for the various pathways that issue requests through them. For example, the logic to list running sessions and queries is implemented in one place ((*statusServer) ListLocalSessions()); that single code path is used for both the RPC and SQL endpoints, and it takes responsibility for authorizing access.

However, most sources of data have multiple deputies, each responsible for enforcing the authorization policy. For example, the “Statement details” RPC ((*statusServer) Statements()) and the data generator for SQL’s crdb_internal.node_statement_statistics each require the “admin” bit from their client separately. There does not seem to be any structural enforcement that the various access paths enforce the same authorization.

In some cases, the code accidentally enforces the right authorization policy, even though that does not seem to be intended. For example, the virtual tables crdb_internal.forward_dependencies and .backward_dependencies report the tables/indexes that are referred to/from using foreign key (FK) relationships. The code acts as an authorization deputy by filtering out rows that should remain invisible to the requesting principal. However, as of this writing, it only validates the SQL privilege bits on one end of the relationship. At first glance, it thus seems theoretically possible that these two tables could be used to learn about the existence and the logical schema of tables over which the logged-in user has no SQL privilege. This is not the case, however! In a separate part of the source code, the DDL statements that establish FK relations and the execution code that enforces FK constraints during mutation statements do check these privileges. It so happens, therefore, that CockroachDB does not support FK relationships over mixed-privilege tables, so there is no additional information disclosure available via the virtual tables. This is OK, but does not seem to be deliberately designed.
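
A hedged illustration of the accidental protection described above (the schema is hypothetical):

    -- Establishing the FK already requires privileges on both tables:
    CREATE TABLE parent (id INT PRIMARY KEY);
    CREATE TABLE child (
      id INT PRIMARY KEY,
      parent_id INT REFERENCES parent (id)
    );
    -- The dependency tables only check privileges on one end:
    SELECT * FROM crdb_internal.forward_dependencies;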

To summarise, to the credit of the CockroachDB developers, it appears to me that the necessary authorization checks are indeed performed in the right places, with some caveats shared below. Even though it feels serendipitous, the authors of the code have been methodical so far, and mistakes are promptly fixed when recognized.

Several past issues directly stem from this lack of systematic authorization.

7   Areas where further investigation is needed

While casually browsing the source code, I found the following points of interest:

  • The virtual table crdb_internal.leases reports all the leases on SQL objects throughout the cluster, and this virtual table is readable by any principal. This reveals the existence and the name of all the “currently active” tables/views/sequences.
  • Any SQL session can request “session tracing” via the statement SET tracing = on (or on,kv). After a trace is collected, the session can view its own trace via SHOW TRACE or the virtual table crdb_internal.session_trace. The trace contains details of KV operations (including detailed key/values) prior to authorization and filtering by authorization deputies. This can be used e.g. to discover the list of all database and table names, by tracing a SHOW TABLES statement. In general, SQL session tracing can be used to bypass authorization and read any stored data processed by deputies with the “admin” bit (see the sketch after this list). This is a textbook confused deputy situation.
  • Go error objects produced in one component flow through other components, including authorization deputies. While deputies filter the “positive” (non-error) data returned to requesters, they typically do not filter out sensitive data contained in errors produced with their “admin” bit. Given that KV-level error objects can contain key/value data, it is theoretically possible to exploit error handling to leak data through authorization deputies. This is another confused deputy situation.
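
To make the session-tracing point concrete, here is a sketch of the bypass described above:

    -- Any SQL session can do this, regardless of privilege bits:
    SET tracing = on,kv;
    SHOW TABLES;   -- internally served by a deputy holding the "admin" bit
    SET tracing = off;
    -- The trace may contain raw KV keys/values captured before the
    -- deputy's filtering step:
    SELECT message FROM crdb_internal.session_trace;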

8   Summary & things I would have done differently

CockroachDB is a distributed database where nodes communicate with each other and accept connections from database clients. From the STRIDE perspective, CockroachDB follows the state of the art and makes reasonable operational assumptions with regards to Authentication, Integrity (tamper-resistance) and “external” Confidentiality. There is little provision for Non-repudiability but this seems to remain out of scope for the project at this time.

CockroachDB’s Availability story is slightly more complex. While it was designed from the ground up as a Highly Available (HA) system in the face of accidents, there are some gaps regarding availability in the face of malicious behavior.

Finally, its Authorization story is “interesting” and offers the most opportunities for future improvements. CockroachDB uses a combination of SQL and non-SQL privilege bits, and the authorization policies that enforce privilege bits on service endpoints (through its HTTP, RPC and SQL APIs) are heterogeneously implemented, without a common structured framework. CockroachDB’s documentation is strong in its advertisement and explanation of the SQL privilege model over SQL resources, but does not yet systematically explain non-SQL data and non-SQL authorization rules. Although the currently released versions do offer reasonably strong authorization guarantees, there are very few guardrails in the source code to prevent the accidental introduction of privilege escalation avenues during the development of new features.

As an engineer with an interest in security, there is some low-hanging fruit I would recommend picking to lift the project into a better and stronger place:

  • Immediately change the default configuration to separate the SQL and RPC ports and strongly recommend in documentation to firewall the RPC port to prevent malicious use or “node”-level privilege escalation.
  • Separate the privilege bit required to access (and write to) system tables, containing critical operational parameters, from the “admin” bit required to view Privileged cluster metadata. This would strongly incentivize the engineers who add new views into cluster metadata to think about implementing and documenting a new authorization deputy with clear escalation rules.
  • Systematically eliminate the redundancy in authorization enforcement between the RPC servers and the SQL code. Ideally, a single point in the code should be responsible for both enforcing authorization and logging controlled privilege escalations, such as those performed by SQL virtual tables that act as non-privileged views into privileged data.
  • Gradually introduce a capability authorization sub-system, where the privileged components would require capability bits from their requesters, and where capability tokens would be delivered to a principal as part of their authentication. This system should be designed using cryptographic principles so that an engineer would not be tempted to synthesize a capability token “for convenience” when pressed for time implementing a new feature.
