The idiomatic way to propagate errors in Go is using the predefined type error. Yet Go’s standard library only provides very rudimentary constructors for error objects, errors.New() and fmt.Errorf().

This article shows how the CockroachDB errors library, a drop-in replacement to Go’s own errors package, expands the vocabulary that Go programmers can use to describe and propagate errors in their code.

Go’s own (too) simple errors

The most common “simple” error object in Go, as constructed by fmt.Errorf(), is analogous to a string wrapped in a struct that implements the error interface: its Error() method retrieves the string that was put in when it was constructed.

err := fmt.Errorf("hello")
fmt.Println(err) // prints "hello"

Nothing less, nothing more. Printing the error object also reveals the string. This is also incidentally what is constructed by errors.New() when using Go’s default errors package.
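
To make the “string in a struct” analogy concrete, here is a minimal sketch of such an error type. The names simpleError and newSimple are hypothetical and chosen for illustration only; this is not the standard library’s actual source.

```go
package main

import "fmt"

// simpleError is a hypothetical type for illustration: a string
// enclosed in a struct that satisfies the error interface.
type simpleError struct {
	msg string
}

// Error returns the string that was stored at construction time.
func (e *simpleError) Error() string { return e.msg }

// newSimple plays the role of errors.New() in this sketch.
func newSimple(msg string) error { return &simpleError{msg: msg} }

func main() {
	err := newSimple("hello")
	fmt.Println(err)         // prints "hello"
	fmt.Println(err.Error()) // prints "hello"
}
```

The object carries no information beyond the message, which is why it says nothing about where it was created.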

Simple errors for everyday code

When using Dave Cheney’s errors library, or even better, the CockroachDB errors library via import "github.com/cockroachdb/errors", simple errors also automatically capture a stack trace at the time the error is constructed.

The stack trace is only displayed when printing the error verbosely. This makes it easier to troubleshoot where an error has originated:

package main

import (
   "fmt"
   "github.com/cockroachdb/errors"
)

func main() {
  err := errors.New("hello")
  fmt.Println(err) // still prints just "hello"

  fmt.Printf("%+v\n", err) // verbose mode
}

This prints:

hello
(1) attached stack trace
  -- stack trace:
  | main.main
  |     /home/kena/src/errors-tests/test.go:10
  | runtime.main
  |     /usr/lib/go-1.14/src/runtime/proc.go:203
  | runtime.goexit
  |     /usr/lib/go-1.14/src/runtime/asm_amd64.s:1373
Wraps: (2) hello
Error types: (1) *withstack.withStack (2) *errutil.leafError

This verbose output includes the result of .Error() on the first line, followed by the stack trace payload.

Experience shows again and again that the ability to extract the stack trace at the exact point an unexpected situation arises in a program is essential to pinpoint the exact cause and successfully troubleshoot an issue. Without this ability, programmers remain blind and immense amounts of time are wasted.
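
For readers curious about how a constructor can capture a stack trace at all, the following sketch uses only Go’s runtime package. The names stackError and newWithStack are hypothetical, and this is not how the CockroachDB library is implemented internally; it only illustrates the general idea of recording the call stack when the error is built.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// stackError is a hypothetical type for illustration: a message plus
// the program counters of the call stack at construction time.
type stackError struct {
	msg string
	pcs []uintptr
}

func (e *stackError) Error() string { return e.msg }

// newWithStack records the caller's stack. The skip count of 2 omits
// runtime.Callers and newWithStack itself.
func newWithStack(msg string) *stackError {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs)
	return &stackError{msg: msg, pcs: pcs[:n]}
}

// Stack renders the captured frames as "function, file:line" pairs.
func (e *stackError) Stack() string {
	var sb strings.Builder
	frames := runtime.CallersFrames(e.pcs)
	for {
		frame, more := frames.Next()
		fmt.Fprintf(&sb, "%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
		if !more {
			break
		}
	}
	return sb.String()
}

func main() {
	err := newWithStack("hello")
	fmt.Println(err)       // prints just "hello"
	fmt.Print(err.Stack()) // prints the captured frames
}
```

The message and the stack are kept separate, which matches the behavior described above: the plain output stays short and the trace only appears on request.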

For this reason alone, I would discourage anyone from using Go’s own fmt.Errorf() or errors.New(). Instead, import github.com/cockroachdb/errors and use the following:

  • errors.New(): drop-in replacement to Go’s own errors.New(), with stack traces;
  • errors.Errorf() or errors.Newf(): replacement to Go’s own fmt.Errorf(), with stack traces.

package github.com/cockroachdb/errors

// New constructs a simple error and attaches a stack trace.
func New(msg string) error

// Newf constructs a simple error whose message is composed using printf-like formatting.
// It also attaches a stack trace.
func Newf(format string, args ...interface{}) error

// Errorf is an alias for Newf for convenience
// and drop-in compatibility with github.com/pkg/errors.
func Errorf(format string, args ...interface{}) error

Adding a message prefix to errors to clarify context

When the same piece of logic is called from multiple places, and it can fail with an error, then it is desirable to add a message prefix to any returned error object.

This helps give more context about “where an error has been,” so that if/when an error is raised at run time, it remains clear which code path has yielded the error.

For example:

package main

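import (
   "fmt"
   "math/rand"

   "github.com/cockroachdb/errors"
)

// rollDice simulates a six-sided die. It is an assumed helper that
// stands in for any run-time condition selecting between call paths.
func rollDice() int {
     return rand.Intn(6) + 1
}

func foo() error {
     return errors.New("boo")
}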

func bar() error {
     if err := foo(); err != nil {
        return errors.Wrap(err, "bar")
     }
     return nil
}

func baz() error {
     if err := foo(); err != nil {
        return errors.Wrap(err, "baz")
     }
     return nil
}

func main() {
     r := rollDice()
     var err error
     if r < 4 {
        err = bar()
     } else {
        err = baz()
     }
     fmt.Println(err)
}

Thanks to errors.Wrap(), which adds a prefix to the message, the main function can report bar: boo or baz: boo and the (human) reader of the error message can know after the fact which function was called. Without errors.Wrap(), the call path that led to the error would be undiscoverable.

For convenience, errors.Wrap() returns nil when provided a nil error as input. This allows us to elide the if err != nil condition in many cases. For example:

func bar() error {
     return errors.Wrap(foo(), "bar")
}

func baz() error {
     return errors.Wrap(foo(), "baz")
}
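
The nil pass-through behavior can be sketched with the standard library alone. The helper wrapPrefix below is hypothetical: it mimics only the prefixing and the nil handling described above, and it does not capture a stack trace the way errors.Wrap() does.

```go
package main

import (
	"errors"
	"fmt"
)

// wrapPrefix is a hypothetical helper: it adds a message prefix to err
// and passes nil through unchanged.
func wrapPrefix(err error, msg string) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", msg, err)
}

func succeed() error { return nil }
func fail() error    { return errors.New("boo") }

func main() {
	fmt.Println(wrapPrefix(succeed(), "bar")) // prints "<nil>"
	fmt.Println(wrapPrefix(fail(), "bar"))    // prints "bar: boo"
}
```

Because a nil input yields a nil output, the caller can wrap the result of a call directly without first checking it.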

Finally, errors.Wrap() also attaches a secondary stack trace to the error object, which provides additional context when troubleshooting an error’s origin. This is especially useful if an error is communicated “sideways” in a program, for example through Go channels.

This stack trace, as for errors.New(), is only visible when displaying the error verbosely. For example:

import (
   "fmt"
   "github.com/cockroachdb/errors"
)

func foo() error { return errors.New("world") }
func bar(err error) error { return errors.Wrap(err, "hello") }
func baz() error { return bar(foo()) }

func main() {
  err := baz()
  fmt.Println(err) // still prints just "hello: world"

  fmt.Printf("%+v\n", err) // verbose mode
}

This prints:

hello: world
(1) attached stack trace
  -- stack trace:
  | main.bar
  |     /home/kena/src/errors-tests/test.go:10
  | [...repeated from below...]
Wraps: (2) hello
Wraps: (3) attached stack trace
  -- stack trace:
  | main.foo
  |     /home/kena/src/errors-tests/test.go:9
  | main.baz
  |     /home/kena/src/errors-tests/test.go:11
  | main.main
  |     /home/kena/src/errors-tests/test.go:14
  | runtime.main
  |     /usr/lib/go-1.14/src/runtime/proc.go:203
  | runtime.goexit
  |     /usr/lib/go-1.14/src/runtime/asm_amd64.s:1373
Wraps: (4) world
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.leafError

As before, the result of .Error() is printed on the first line. Then the stack trace for the outermost layer (the result of errors.Wrap()) is printed out. This reveals that the error was wrapped at line 10, but the call trace is otherwise shared with the one displayed below.

Then the verbose display reveals the inner error, with message world and its own stack trace. This inner stack trace reveals that the inner error was generated at line 9.

The error wrapping facility is versatile: one can also compose message prefixes using printf-like formatting. Here is the full API:

package github.com/cockroachdb/errors

// Wrap adds a message prefix and also attaches an additional stack trace.
// If the first argument is nil, it returns nil.
func Wrap(err error, msg string) error

// Wrapf adds a message prefix composed using printf-like formatting,
// and also attaches an additional stack trace.
// If the first argument is nil, it returns nil.
func Wrapf(err error, format string, args ...interface{}) error

Additionally, for compatibility with Go 1.13’s fmt.Errorf(), the errors.Newf() and errors.Errorf() functions we saw above also recognize the formatting verb %w, which triggers the wrap logic.

For example:

// The following is similar to errors.Wrapf(err, "hello").
// However, it does not return nil if err is nil!
err = errors.Newf("hello: %w", err)

Note that only Newf() / Errorf() recognize %w: it is not valid with errors.Wrap().

Tip

Prefer errors.Wrap() to the special verb %w: it properly ignores nil errors given as input, and it is computationally simpler.
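
The nil pitfall is easy to reproduce. The sketch below uses the standard library’s fmt.Errorf(), which is the Go 1.13 behavior that the %w support is compatible with; annotateWithVerb is a hypothetical name used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// annotateWithVerb applies %w unconditionally, without a nil check.
func annotateWithVerb(err error) error {
	return fmt.Errorf("hello: %w", err)
}

func main() {
	var noErr error // a successful result
	// The formatted result is always a non-nil error object, so a
	// success is silently turned into a failure.
	fmt.Println(annotateWithVerb(noErr) != nil) // prints "true"

	// With a real error the chain is preserved as expected.
	base := errors.New("world")
	wrapped := annotateWithVerb(base)
	fmt.Println(wrapped)                  // prints "hello: world"
	fmt.Println(errors.Is(wrapped, base)) // prints "true"
}
```

A caller that tests the result against nil would therefore take the error path even though nothing went wrong.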

Secondary error annotations

Every intermediate Go programmer quickly runs into this painful situation: what should one do if an error is encountered while handling an error?

A common example is cleaning up the filesystem after an error is encountered while processing a file:

func writeConfig(out string, cfgA, cfgB Config) (resErr error) {
    // Create the destination directory.
    if err := os.Mkdir(out, 0755); err != nil {
       return err
    }
    defer func() {
       // If an error is encountered below, remove
       // the destination directory upon exit.
       if resErr != nil {
          if dirErr := os.RemoveAll(out); dirErr != nil {
             // now... what?
             ...
          }
       }
    }()

    if err := writeCfg(out, cfgA, "a.json", "config A"); err != nil {
      return err
    }
    return writeCfg(out, cfgB, "b.json", "config B")
}

func writeCfg(outDir string, cfg Config, filename, desc string) error {
    j, err := json.Marshal(cfg)
    if err != nil {
       return errors.Wrapf(err, "marshaling %s", desc)
    }
    return ioutil.WriteFile(filepath.Join(outDir, filename), j, 0777)
}

The function in this example creates an output directory and writes the two config objects into it. However, an error could occur while writing either of the config objects. In that case, the function wants to clean up after itself by removing the directory that was just created.

What happens then if an error occurs during the directory removal? Which error should be returned?

  • if we return the original error, we become blind to the directory removal error.
  • if we return the directory removal error, we become blind to the file generation error.

Somehow, we want to return details about both errors to aid during troubleshooting. Meanwhile, we want to be mindful to preserve the first error encountered as the “main” error, for the purpose of cause analysis.

We can achieve this by tweaking the code as follows:

defer func() {
   // If an error is encountered below, remove
   // the destination directory upon exit.
   if resErr != nil {
      if dirErr := os.RemoveAll(out); dirErr != nil {
         // This attaches dirErr as an ancillary error
         // to the error object that was already stored in resErr.
         resErr = errors.WithSecondaryError(resErr, dirErr)
      }
   }
}()

With this programming pattern, we can be confident that we preserve the full picture of what happened when an error is encountered while handling another error.

Secondary error annotations do not influence the text returned by .Error() on the main error. From the perspective of the surrounding code, and that of the standard API errors.Is(), the code behaves as if only the primary error had occurred.
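
To illustrate why the surrounding code only “sees” the primary error, here is a minimal sketch of a wrapper with the same observable behavior, written against the standard library. The type withSecondary and the function attachSecondary are hypothetical stand-ins, not the library’s implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// withSecondary is a hypothetical wrapper: it remembers a secondary
// error but exposes only the primary one as its message and cause.
type withSecondary struct {
	cause     error
	secondary error
}

func (w *withSecondary) Error() string { return w.cause.Error() }
func (w *withSecondary) Unwrap() error { return w.cause }

// attachSecondary returns nil when there is no primary error.
func attachSecondary(primary, secondary error) error {
	if primary == nil {
		return nil
	}
	return &withSecondary{cause: primary, secondary: secondary}
}

var (
	errWrite   = errors.New("write failed")
	errCleanup = errors.New("cleanup failed")
)

func main() {
	err := attachSecondary(errWrite, errCleanup)
	fmt.Println(err)                        // prints "write failed"
	fmt.Println(errors.Is(err, errWrite))   // prints "true"
	fmt.Println(errors.Is(err, errCleanup)) // prints "false"
}
```

Since Unwrap() returns only the primary error, the secondary one stays out of the causal chain and is available purely as troubleshooting detail.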

However, the secondary error is revealed during verbose prints. For example:

package main

import (
   "fmt"
   "github.com/cockroachdb/errors"
)

func main() {
  err := errors.New("hello")
  err = errors.WithSecondaryError(err, errors.New("friend"))
  fmt.Println(err) // prints just "hello"

  fmt.Printf("%+v\n", err) // verbose mode
}

This prints:

hello
(1) secondary error attachment
  | friend
  | (1) attached stack trace
  |   -- stack trace:
  |   | main.main
  |   |         /home/kena/src/errors-tests/test.go:11
  |   | runtime.main
  |   |         /usr/lib/go-1.14/src/runtime/proc.go:203
  |   | runtime.goexit
  |   |         /usr/lib/go-1.14/src/runtime/asm_amd64.s:1373
  | Wraps: (2) friend
  | Error types: (1) *withstack.withStack (2) *errutil.leafError
Wraps: (2) attached stack trace
  -- stack trace:
  | main.main
  |     /home/kena/src/errors-tests/test.go:10
  | runtime.main
  |     /usr/lib/go-1.14/src/runtime/proc.go:203
  | runtime.goexit
  |     /usr/lib/go-1.14/src/runtime/asm_amd64.s:1373
Wraps: (3) hello
Error types: (1) *secondary.withSecondaryError (2) *withstack.withStack (3) *errutil.leafError

Like before, we see the text of .Error() on the first line. Then we see the verbose printout for the attached secondary error, indented to the right relative to the main error. The secondary error’s own .Error() is friend, which is printed first, followed by the secondary error’s embedded stack trace.

Then the printout continues, unindented, with the stack trace of the main error.

API overview:

package github.com/cockroachdb/errors

// WithSecondaryError attaches secondary as an annotation
// to the primary error. If primary is nil, nil is returned.
func WithSecondaryError(primary error, secondary error) error

// CombineErrors attaches err2 to err1 as secondary error
// if both err1 and err2 are not nil. If err1 is nil, err2
// is returned instead.
func CombineErrors(err1, err2 error) error

Smarter error handling for subtasks

The extension package errgroup provides a reusable library for “synchronization, error propagation, and Context cancelation for groups of goroutines working on subtasks of a common task.”

Its implementation can be found here: https://github.com/golang/sync/blob/master/errgroup/errgroup.go

At a high level, it uses a sync.WaitGroup to run multiple goroutines with a barrier at the end. Additionally, it cancels every other goroutine in the group as soon as any of them terminates with an error.

The issue with the logic there is that if two or more goroutines fail with an error, only the first error is reported. The other errors are “forgotten”:

func (g *Group) Go(f func() error) {
     g.wg.Add(1)

     go func() {
             defer g.wg.Done()

             if err := f(); err != nil {
                     // errOnce.Do executes its argument just once. The second time an
                     // error is encountered, it is simply forgotten altogether! Not nice.
                     g.errOnce.Do(func() {
                             g.err = err
                             if g.cancel != nil {
                                     g.cancel()
                             }
                     })
             }
     }()
}

We can fix this as follows:

type Group struct {
     ...
     errOnce sync.Once
     mu struct {
        sync.Mutex // makes .err race-free.
        err     error
     }
}

func (g *Group) Wait() error {
   ...
   return g.mu.err
}

func (g *Group) Go(f func() error) {
     ...
     go func() {
             ...
             if err := f(); !errors.Is(err, context.Canceled) {
                   g.mu.Lock()
                   defer g.mu.Unlock()
                   g.mu.err = errors.CombineErrors(g.mu.err, err)
             }
     }()
}

With this alternate version of the errgroup.Group, if there are two or more errors in the sub-tasks, the first one will become the “main” error and every other error after the first will be attached as a secondary error annotation.

The code further uses errors.Is(err, context.Canceled) to exclude error objects that were produced by the Group’s call to the shared context’s cancel() function; these are just noise and likely not useful during troubleshooting.
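
The accumulation logic can be tried in isolation with the following self-contained sketch. It is not a patch to errgroup: runAll, collector, combineErrors and withSecondary are hypothetical names, and combineErrors only mimics the behavior documented above for errors.CombineErrors().

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// withSecondary is a hypothetical stand-in for a secondary annotation.
type withSecondary struct{ cause, secondary error }

func (w *withSecondary) Error() string { return w.cause.Error() }
func (w *withSecondary) Unwrap() error { return w.cause }

// combineErrors keeps err1 as the main error and attaches err2 to it.
func combineErrors(err1, err2 error) error {
	if err1 == nil {
		return err2
	}
	if err2 == nil {
		return err1
	}
	return &withSecondary{cause: err1, secondary: err2}
}

// collector accumulates errors from concurrent tasks under a mutex.
type collector struct {
	mu  sync.Mutex
	err error
}

func (c *collector) add(err error) {
	if err == nil || errors.Is(err, context.Canceled) {
		return // successes and cancellation noise are skipped
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.err = combineErrors(c.err, err)
}

// runAll runs every task in its own goroutine and waits for all of them.
func runAll(tasks ...func() error) error {
	var c collector
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(f func() error) {
			defer wg.Done()
			c.add(f())
		}(task)
	}
	wg.Wait()
	return c.err
}

// countSecondary counts the secondary annotations along the causal chain.
func countSecondary(err error) int {
	n := 0
	for err != nil {
		if _, ok := err.(*withSecondary); ok {
			n++
		}
		err = errors.Unwrap(err)
	}
	return n
}

var (
	errTaskA = errors.New("task A failed")
	errTaskB = errors.New("task B failed")
)

func canceledTask() error { return context.Canceled }

func main() {
	err := runAll(
		func() error { return errTaskA },
		canceledTask,
		func() error { return errTaskB },
	)
	fmt.Println(err != nil, countSecondary(err)) // prints "true 1"
}
```

Which of the two task errors ends up as the main one depends on goroutine scheduling, but neither is lost.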

Checking the identity of errors

In the most common cases, errors are propagated to eventually be returned over a network connection, or printed to a log file.

Sometimes however, the code needs to inspect an error object to decide an alternate behavior.

For this purpose, libraries can define error predicates that can be used to detect specific situations. For example:

package os

// IsExist returns a boolean indicating whether the error is known to report
// that a file or directory already exists. It is satisfied by ErrExist as
// well as some syscall errors.
func IsExist(err error) bool

This can be used as follows:

func ensureDirectoryExists(path string) error {
    if err := os.Mkdir(path, 0755); err != nil {
       if os.IsExist(err) {
         // The directory already exists. This is OK,
         // no need to report an error.
         err = nil
       }
       return err
    }
    fmt.Println("directory created")
    return nil
}

This function attempts to create a directory. If it already exists, the function is a no-op. If another error is encountered (e.g. read-only filesystem, disk corruption, etc.), that error is reported.
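
The following sketch exercises the predicate in a throwaway temporary directory. The names ensureDir and demo are hypothetical. Note one assumption worth keeping in mind: os.IsExist() also reports true when the path exists as a regular file, so this simple form does not verify that the existing entry is a directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDir creates path and treats "already exists" as success.
func ensureDir(path string) error {
	err := os.Mkdir(path, 0o755)
	if err != nil && !os.IsExist(err) {
		return err
	}
	return nil
}

// demo calls ensureDir twice on the same path, then once on a path
// whose parent is missing, all inside a fresh temporary directory.
func demo() (first, second, missingParent error) {
	base, err := os.MkdirTemp("", "ensure-demo")
	if err != nil {
		return err, err, err
	}
	defer os.RemoveAll(base)

	target := filepath.Join(base, "data")
	orphan := filepath.Join(base, "no-such-parent", "data")
	return ensureDir(target), ensureDir(target), ensureDir(orphan)
}

func main() {
	first, second, missingParent := demo()
	fmt.Println(first, second, missingParent != nil) // prints "<nil> <nil> true"
}
```

The second call succeeds because the predicate recognizes the “already exists” condition, while the unrelated failure is still reported.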

Another technique is to use “sentinel” errors, and compare a returned error object to those sentinels to detect particular situations.

We saw an example of that above with errors.Is(err, context.Canceled); here is another example from a SQL client program:

func (c *sqlConn) Query(query string, args []driver.Value) (*sqlRows, error) {
     if err := c.ensureConn(); err != nil {
             return nil, err
     }
     rows, err := c.conn.Query(query, args)
     if errors.Is(err, driver.ErrBadConn) {
             // If the connection has been closed by the server or
             // there was some other kind of network error, close
             // the connection on our side so that the call to
             // ensureConn() above establishes a new connection
             // during the next query.
             c.Close()
             c.reconnecting = true
     }
     if err != nil {
             return nil, err
     }
     return &sqlRows{rows: rows.(sqlRowsI), conn: c}, nil
}

This code detects when the SQL driver returns driver.ErrBadConn and chooses a special behavior in just this case. Any other error is returned as-is, and causes the program to stop somewhere in the callers of this function.
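
The value of errors.Is() over a plain comparison shows up as soon as the sentinel is wrapped. The sketch below uses the standard library only; ErrNotFound, lookup and load are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error.
var ErrNotFound = errors.New("not found")

// lookup always fails with the sentinel, wrapped with some context.
func lookup(key string) error {
	return fmt.Errorf("lookup %q: %w", key, ErrNotFound)
}

// load adds a second layer of context on top of lookup's error.
func load(key string) error {
	if err := lookup(key); err != nil {
		return fmt.Errorf("load: %w", err)
	}
	return nil
}

// isNotFound reports whether the sentinel is anywhere in the chain.
func isNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

func main() {
	err := load("k1")
	fmt.Println(err)                // prints `load: lookup "k1": not found`
	fmt.Println(err == ErrNotFound) // prints "false": the sentinel is wrapped
	fmt.Println(isNotFound(err))    // prints "true": errors.Is follows Unwrap()
}
```

A direct == comparison only matches an unwrapped sentinel, so code that adds message prefixes needs errors.Is() to keep detection working.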

Note

errors.Is() can detect sentinel errors throughout the direct causal chain of an error by recursing on the error’s Unwrap() method. Any secondary error annotation found “on the way” is thus ignored. This behavior was chosen by design: a tree-like behavior would make it hard to reason about what it means for an error to be the “cause” of another; it would also raise difficult questions about the order of traversal in the other API errors.As(). On a personal note, experience has not yet shown me that anything more than a linear causal chain was useful in practice.

Differences with pkg/errors

The de facto standard replacement to Go’s errors package since 2016 was Dave Cheney’s pkg/errors library, available at https://github.com/pkg/errors.

This is the package that originally introduced the notion of error objects that are linked lists, and the idea of wrapping errors and adding stack traces automatically to provide more context during troubleshooting.

Unfortunately, the release of Go 1.13 forced pkg/errors into obsolescence: Dave Cheney defined his library to use a method called Cause() to extract the linear cause of an error chain; when Go 1.13 adopted the idea to make errors linked lists, it defined another method called Unwrap() to extract causes. Therefore, Go’s standard errors.Is() and other APIs are unable to understand errors originating from pkg/errors.
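
The incompatibility between the two conventions can be demonstrated with two toy wrapper types. Both types below are hypothetical and written for this illustration; they are not taken from pkg/errors.

```go
package main

import (
	"errors"
	"fmt"
)

var errBase = errors.New("base")

// causeWrapper exposes its cause only through a Cause() method.
type causeWrapper struct{ cause error }

func (w *causeWrapper) Error() string { return "wrapped: " + w.cause.Error() }
func (w *causeWrapper) Cause() error  { return w.cause }

// unwrapWrapper exposes its cause through the Go 1.13 Unwrap() method.
type unwrapWrapper struct{ cause error }

func (w *unwrapWrapper) Error() string { return "wrapped: " + w.cause.Error() }
func (w *unwrapWrapper) Unwrap() error { return w.cause }

// foundByStdlib reports whether the standard errors.Is() reaches errBase.
func foundByStdlib(err error) bool { return errors.Is(err, errBase) }

func main() {
	fmt.Println(foundByStdlib(&causeWrapper{cause: errBase}))  // prints "false"
	fmt.Println(foundByStdlib(&unwrapWrapper{cause: errBase})) // prints "true"
}
```

The standard errors.Is() only follows Unwrap(), so a chain linked exclusively through Cause() is opaque to it.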

Additionally, the objects from pkg/errors severely suffer from the Go error printing catastrophe, and that library thus makes the implementation of custom error types excessively difficult.

The CockroachDB errors library takes over where pkg/errors left off: it adopts the Go 1.13 conventions, provides a drop-in replacement to the Go 1.13 standard APIs, and averts the Go error printing catastrophe. It also implements most of the pkg/errors interfaces, so it can be used as a drop-in replacement in programs that were using Dave Cheney’s library previously.

Summary

The Go library provides simplistic implementations of the error interface via fmt.Errorf() and errors.New() from Go’s own errors package.

A better experience can be obtained by using the CockroachDB errors library instead, a drop-in replacement to Go’s errors package and Dave Cheney’s pkg/errors.

Its error constructors errors.New() / errors.Newf() (aliased as errors.Errorf()) automatically include stack traces in error objects, which can be displayed for troubleshooting using verbose formatting with fmt.Printf("%+v", err).

It also provides a vocabulary of error wrappers. The most common is a message prefix annotation with errors.Wrap() / errors.Wrapf() to annotate the call paths to a function called from multiple places. This also includes a stack trace behind the scenes.

Another common wrapper solves the puzzling problem of what to do in Go when an error is encountered while handling another error: with a secondary error annotation, attached using errors.WithSecondaryError() or errors.CombineErrors(), Go code can preserve both errors so that a programmer can see both during troubleshooting.

Errors from the CockroachDB errors library also avert the great Go error printing catastrophe, by providing consistent behavior and a helpful display structure when formatting errors verbosely. We will explore this topic more in a later article in this series dedicated to the implementation of custom errors.

Raphael ‘kena’ Poss is a computer scientist and software engineer specialized in compiler construction, computer architecture, operating systems and databases.