Go’s formatting APIs - The CockroachDB errors library, part 2

This writeup is part of the prologue to a series of articles that talk about the “CockroachDB errors library”, which is really a general-purpose, open source replacement for Go’s standard errors package.

Consider for example the following piece of code:

package main

import "fmt"

type T struct {
   x int
}

func main() {
   v := T{123}
   fmt.Println(v)
}

This program prints {123}, even though we haven’t taught Go how to print our type T. How does it do this?

Equivalence of printers

The logic in the fmt package is shared between all the printers, such that the following calls are all guaranteed to be equivalent:

fmt.Print(x)
fmt.Printf("%v", x)
os.Stdout.Write([]byte(fmt.Sprint(x)))
os.Stdout.Write([]byte(fmt.Sprintf("%v", x)))

In other words, the logic for fmt.Print is always the same as using Printf with the verb %v: both entry points go through the same internal printing code.

Likewise, fmt.Println shares the logic of fmt.Print and thus uses the %v verb under the hood, and ditto for fmt.Sprintln and fmt.Sprint.
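The equivalence can be checked with a small sketch. The helper names viaSprint, viaSprintf and viaSprintln are made up for this illustration; only the fmt calls come from the standard library.

```go
package main

import "fmt"

// T is the example type from above, with no custom printing methods.
type T struct {
	x int
}

// The three helpers below (hypothetical names) format the same value
// through different fmt entry points.
func viaSprint(v interface{}) string   { return fmt.Sprint(v) }
func viaSprintf(v interface{}) string  { return fmt.Sprintf("%v", v) }
func viaSprintln(v interface{}) string { return fmt.Sprintln(v) }

func main() {
	v := T{123}
	fmt.Println(viaSprint(v) == viaSprintf(v))       // true
	fmt.Println(viaSprintln(v) == viaSprint(v)+"\n") // true
}
```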

`fmt.Stringer` and `fmt.Formatter`

Now, add the following at the bottom of the code above:

func (t T) String() string { return "boo" }

And run the program again. What happens? It prints boo. The value 123 is nowhere to be seen.

What is happening here is that a method String() returning string implements the standard interface fmt.Stringer, and the functions in fmt try to use that if they can find it.

Separately, try removing the String() function definition above, and replace it with this:

func (t T) Format(s fmt.State, _ rune) {
  fmt.Fprint(s, "baa")
}

What happens then? The program now prints baa. Again the value 123 is nowhere to be seen.

What is happening here is that a method Format(fmt.State, rune) implements the fmt.Formatter interface, and the functions in fmt try to use that if they can find it.

What if both methods are available?

The program then prints baa: fmt.Formatter is preferred over fmt.Stringer if both are available.

And when neither method is available, the fmt logic “falls back” on its own internal display code, which does a best effort at representing the value.
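The three situations can be put side by side in one program. This is a minimal sketch: the type names Plain, OnlyStringer and Both, and the helper show, are invented for the illustration.

```go
package main

import "fmt"

// Plain has no printing methods: fmt falls back to its built-in display code.
type Plain struct{ x int }

// OnlyStringer implements fmt.Stringer only.
type OnlyStringer struct{ x int }

func (OnlyStringer) String() string { return "boo" }

// Both implements fmt.Stringer and fmt.Formatter.
type Both struct{ x int }

func (Both) String() string             { return "boo" }
func (Both) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

// show is a hypothetical helper that formats a value like fmt.Print would.
func show(v interface{}) string { return fmt.Sprint(v) }

func main() {
	fmt.Println(show(Plain{123}))        // {123}
	fmt.Println(show(OnlyStringer{123})) // boo
	fmt.Println(show(Both{123}))         // baa
}
```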

What `fmt` knows about `error`

Go’s standard error interface provides just an Error() method returning a string, and nothing else.

The fmt logic knows about error, and knows how to use its Error() method, by extending the preference rule explained above:

fmt.Formatter is preferred in all cases if present.
if fmt.Formatter is not present, but error is, then Error() is preferred.
otherwise fmt.Stringer is used if present.
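A small sketch of this three-level preference follows. The type names ErrAndStr and AllThree and the helper show are hypothetical, chosen only to make the output self-explanatory.

```go
package main

import "fmt"

// ErrAndStr implements both error and fmt.Stringer.
type ErrAndStr struct{}

func (ErrAndStr) Error() string  { return "from Error" }
func (ErrAndStr) String() string { return "from String" }

// AllThree additionally implements fmt.Formatter.
type AllThree struct{}

func (AllThree) Error() string              { return "from Error" }
func (AllThree) String() string             { return "from String" }
func (AllThree) Format(s fmt.State, _ rune) { fmt.Fprint(s, "from Format") }

// show is a hypothetical helper that formats a value like fmt.Print would.
func show(v interface{}) string { return fmt.Sprint(v) }

func main() {
	fmt.Println(show(ErrAndStr{})) // from Error
	fmt.Println(show(AllThree{}))  // from Format
}
```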

Relationship between `%s`, `%v`, `%q` and `%x` / `%X`

So far we’ve seen how the fmt logic can optionally use fmt.Stringer, error or fmt.Formatter under the hood for %v.

Yet perhaps the more common verb used in Go code is %s. How does %s relate to %v?

Generally, %s uses more or less the same logic as %v: if either fmt.Stringer, error or fmt.Formatter is present, it will use that with the same preference.

The difference appears when the object implements neither String(), Error() nor Format(). In this case, %v has some predefined representation (e.g. {123} in the example above), whereas %s complains that the argument has the wrong type: for the example above it prints {%!s(int=123)} instead of a usable representation.

This is why, unless the code is manipulating values with the specific type string, the Go idiom is to reach for %v in the general case instead of %s.

The additional verbs %q and %x / %X are variants of %s (with the same restrictions when neither String(), Error() nor Format() is available):

%q quotes the resulting string, so that fmt.Printf("%q", `he said "hi"`) prints "he said \"hi\"", including the surrounding double quotes.
%x / %X show a hexadecimal representation of the bytes in the string. I personally found this was extremely rarely used in practice (in contrast to using it for integer types, which is relatively common).
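The following sketch runs the four verbs over a type without printing methods and over one with a String() method. The type S and the helper render are made up for the illustration; render takes the format as a variable only so that the same call site can exercise several verbs.

```go
package main

import "fmt"

// T has no printing methods.
type T struct{ x int }

// S implements fmt.Stringer.
type S struct{ x int }

func (S) String() string { return "hi" }

// render is a hypothetical helper applying one formatting directive to v.
func render(format string, v interface{}) string {
	return fmt.Sprintf(format, v)
}

func main() {
	fmt.Println(render("%v", T{123})) // {123}
	fmt.Println(render("%s", T{123})) // {%!s(int=123)}
	fmt.Println(render("%s", S{1}))   // hi
	fmt.Println(render("%q", S{1}))   // "hi"
	fmt.Println(render("%x", S{1}))   // 6869
}
```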

By-value printing and by-reference methods

Now consider the program above, and the following combination of implementations (beware of the receiver types):

func (t T) String() string { return "boo" }
func (t *T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

This, now, prints boo again. What is happening? The code above passes the T instance by value. At that level, only the String() method is available, so the fmt logic prefers that. But now check out this:

func (t *T) String() string { return "boo" }
func (t *T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

What gives? The program now prints {123} again. Neither method is visible to the fmt logic.

Hence the following rule: if an object is printed by value, only its by-value methods are considered.
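The rule can be observed with separate types for each combination, so that everything fits in one program. The type names ValRecv, Mixed and PtrOnly and the helper show are assumptions made for this sketch.

```go
package main

import "fmt"

// ValRecv declares String() with a by-value receiver.
type ValRecv struct{ x int }

func (ValRecv) String() string { return "boo" }

// Mixed declares String() by value and Format() by reference.
type Mixed struct{ x int }

func (Mixed) String() string              { return "boo" }
func (*Mixed) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

// PtrOnly declares both methods by reference.
type PtrOnly struct{ x int }

func (*PtrOnly) String() string             { return "boo" }
func (*PtrOnly) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

// show is a hypothetical helper that formats a value like fmt.Print would.
func show(v interface{}) string { return fmt.Sprint(v) }

func main() {
	// All three values are passed by value, as in the main program above.
	fmt.Println(show(ValRecv{123})) // boo
	fmt.Println(show(Mixed{123}))   // boo
	fmt.Println(show(PtrOnly{123})) // {123}
}
```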

By-reference printing and by-value methods

Now let us switch things over, with the following main program instead:

func main() {
   v := &T{123}
   fmt.Println(v)
}

Now consider the following program variants:

Variant A:

func (t T) String() string { return "boo" }
func (t T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

Variant B:

func (t T) String() string { return "boo" }
func (t *T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

Variant C:

func (t *T) String() string { return "boo" }
func (t T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

Variant D:

func (t *T) String() string { return "boo" }
func (t *T) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

What is printed in each case?

All four variants print baa.

What is going on? The answer lies in Go’s method sets. The method set of the pointer type *T contains the methods declared with receiver *T and also those declared with receiver T. The fmt logic receives a *T and checks which interfaces that type implements. This clarifies what happens:

in variant D, both fmt.Stringer and fmt.Formatter are declared on *T, and fmt.Formatter is preferred.
in variants B and C, one method is declared on *T and the other on T. Both are part of the method set of *T, so fmt finds both and prefers fmt.Formatter.
in variant A, both methods are declared on T. They are still part of the method set of *T, so fmt finds both and prefers fmt.Formatter, as explained above.

The receiver type only makes a difference in the by-value case from the previous section: the method set of T does not include the methods declared with receiver *T.

Hence the following general rules:

if an object is printed by-reference, both its by-reference and its by-value methods are considered.
when implementing your own custom printer methods, keep in mind that by-value methods get picked up in more cases; if you implement them by-reference, make sure the object is always printed by-reference.
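To check the four variants in a single program, each one can be given its own type. This is a sketch: the type names A, B, C and D mirror the variant letters, and show is a hypothetical helper.

```go
package main

import "fmt"

type A struct{ x int }

func (A) String() string             { return "boo" }
func (A) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

type B struct{ x int }

func (B) String() string              { return "boo" }
func (*B) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

type C struct{ x int }

func (*C) String() string            { return "boo" }
func (C) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

type D struct{ x int }

func (*D) String() string             { return "boo" }
func (*D) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

// show is a hypothetical helper that formats a value like fmt.Print would.
func show(v interface{}) string { return fmt.Sprint(v) }

func main() {
	// Printed by reference: a pointer's method set includes the
	// methods declared on the value type, so Format is always found.
	fmt.Println(show(&A{123})) // baa
	fmt.Println(show(&B{123})) // baa
	fmt.Println(show(&C{123})) // baa
	fmt.Println(show(&D{123})) // baa

	// Printed by value, B only exposes String().
	fmt.Println(show(B{123})) // boo
}
```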

Verbose printing with `%+v`

The + flag for numeric types forces the display of a plus sign for positive values, so that the sign is always shown.

In combination with v however, it triggers “verbose printing”.

With the default fmt logic, this adds the name of fields to structs.

If just fmt.Stringer is implemented, + does not change anything; however if fmt.Formatter is implemented, then by convention the code in the Format() method includes more details in the output than when + is not specified.

The Go library does not prescribe how this should be achieved: different packages tend to do this in different ways. The lack of specification is not an issue however; in either case the output is intended for use by human eyes and so minor display inconsistencies are not (yet) considered consequential.
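Here is a short sketch of the + flag in its different roles. The type S and the helper render are made up for the illustration.

```go
package main

import "fmt"

// T has no printing methods.
type T struct{ x int }

// S implements only fmt.Stringer.
type S struct{ x int }

func (S) String() string { return "boo" }

// render is a hypothetical helper applying one formatting directive to v.
func render(format string, v interface{}) string {
	return fmt.Sprintf(format, v)
}

func main() {
	fmt.Println(render("%+d", 5))      // +5
	fmt.Println(render("%v", T{123}))  // {123}
	fmt.Println(render("%+v", T{123})) // {x:123}
	fmt.Println(render("%+v", S{123})) // boo
}
```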

Go representation and the `%#v` verb

Finally, change the original main program to use the %#v verb instead:

func main() {
   v := T{123}
   fmt.Printf("%#v\n", v)
}

What does this print?

if a String() method is available, it is ignored.
if a Format() method is available, that is used.
otherwise, if a GoString() method is available (from the fmt.GoStringer interface), that is used.
otherwise, a printout of the structure using Go syntax is produced.

What is happening here is that the %#v specifier intends to print out the “Go representation” of the value, not its “human representation.” The fmt logic knows how to do this, but a custom type can customize this behavior with the fmt.Formatter or fmt.GoStringer interfaces.

Note that I include this explanation of fmt.GoStringer for completeness; I have found in practice that it is only rarely used.
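The four cases can be sketched as follows. The type names and the returned strings are invented for the illustration; the package-qualified output assumes the code is compiled as package main.

```go
package main

import "fmt"

// T has no printing methods.
type T struct{ x int }

// WithString implements fmt.Stringer, which %#v ignores.
type WithString struct{ x int }

func (WithString) String() string { return "boo" }

// WithGoString implements fmt.GoStringer.
type WithGoString struct{ x int }

func (WithGoString) GoString() string { return "go-repr" }

// WithFormat implements fmt.Formatter and fmt.GoStringer; Format wins.
type WithFormat struct{ x int }

func (WithFormat) GoString() string           { return "go-repr" }
func (WithFormat) Format(s fmt.State, _ rune) { fmt.Fprint(s, "baa") }

// render is a hypothetical helper applying one formatting directive to v.
func render(format string, v interface{}) string {
	return fmt.Sprintf(format, v)
}

func main() {
	fmt.Println(render("%#v", T{123}))          // main.T{x:123}
	fmt.Println(render("%#v", WithString{123})) // main.WithString{x:123}
	fmt.Println(render("%#v", WithGoString{1})) // go-repr
	fmt.Println(render("%#v", WithFormat{1}))   // baa
}
```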

I also personally recommend the facility at https://github.com/kr/pretty, which is able to print Go representations much more clearly than Go’s standard library; for example: fmt.Printf("%# v", pretty.Formatter(x)).

Formatting verbs, flags and modifiers

We have seen so far how %v differs from %s in intent and purpose, and how e.g. %v differs from %+v.

What if we wanted to define our own customization with a different result for each of them?

The reliable customization mechanism, for all three cases, is the fmt.Formatter interface:

package fmt

// Formatter can be implemented by your custom types.
type Formatter interface {
     Format(s State, verb rune)
}

// An object of type State is provided by the fmt
// logic to your custom Format() method.
type State interface {
     io.Writer // inherits the Write() method

     Flag(int) bool

     Width() (int, bool)
     Precision() (int, bool)
}

What interests us most is:

the verb argument passed directly to our own custom Format() method. This indicates the main “formatting verb”: for %v, verb == 'v'. For %#v, verb == 'v' also. For %s, the verb is s, and so on.
the Flag() method on the fmt.State passed as argument to the Format() method. Flag() returns true iff the corresponding formatting flag has been set. For example, for %v, Flag('#') == false, whereas for %#v, Flag('#') == true.
the fact that fmt.State also implements io.Writer. This makes it possible to e.g. pass the State variable directly as first argument to another call to fmt.Fprint to further simplify the implementation of custom Format() methods.

The Width() and Precision() methods on fmt.State are also interesting as they give access to the additional numeric parameters, or modifiers, in a formatting string. For example, in %3.2f, we have width 3 and precision 2. However, I found that these were used less often in practice.
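To see what a Format() method actually receives, one can write a type that simply reports its inputs. The type Probe and the helper render below are hypothetical; the sketch only uses the verb argument and the Flag(), Width() and Precision() methods described above.

```go
package main

import "fmt"

// Probe is a made-up type whose Format method echoes what fmt passes in.
type Probe struct{}

func (Probe) Format(s fmt.State, verb rune) {
	fmt.Fprintf(s, "verb=%c", verb)
	// Check each of the flags fmt knows about.
	for _, c := range "+-# 0" {
		if s.Flag(int(c)) {
			fmt.Fprintf(s, " flag=%c", c)
		}
	}
	if w, ok := s.Width(); ok {
		fmt.Fprintf(s, " width=%d", w)
	}
	if p, ok := s.Precision(); ok {
		fmt.Fprintf(s, " prec=%d", p)
	}
}

// render is a hypothetical helper applying one formatting directive to v.
func render(format string, v interface{}) string {
	return fmt.Sprintf(format, v)
}

func main() {
	fmt.Println(render("%v", Probe{}))     // verb=v
	fmt.Println(render("%#v", Probe{}))    // verb=v flag=#
	fmt.Println(render("%+8.3s", Probe{})) // verb=s flag=+ width=8 prec=3
}
```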

Here is a rather idiomatic example:

type Response struct {
     code int
     msg string
}

func (r *Response) Format(s fmt.State, verb rune) {
   switch verb {
   case 'v':
       if s.Flag('+') {
          // With %+v, we print both the message and the code.
          fmt.Fprintf(s, "%s (%d)", r.msg, r.code)
          // Return here, otherwise the message would be printed twice.
          return
       }
       fallthrough
   case 's':
       // For %s, or %v without +, we just print the message.
       fmt.Fprint(s, r.msg)
   }
}

// String is provided for convenience.
func (r *Response) String() string { return fmt.Sprint(r) }

What is going on here?

the main representation function for type *Response is fmt.Formatter. When used with %+v, it prints the message followed by the code between parentheses. With just %v / %s, it prints just the message.
to make the type compatible with the fmt.Stringer interface, for use in other places where a String() method is required, String() is implemented by calling into fmt.Sprint. This is discussed further below.

An interesting aspect of this code is that it does not handle %q / %x / %X. For these verbs, it outputs nothing. fmt is OK with that.

Neither does it support other flags to %v than +; for example it treats %#v and %v the same.

In fact, the Go API does not make it easy to implement a custom Format() that is as general and powerful as its own internal logic, and Go packages “in the wild” often contain incomplete implementations like the one above.
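One way to narrow this gap, sketched here as an assumption rather than as what packages commonly do, is to hand every directive the type does not care about back to fmt. The sketch relies on fmt.FormatString, which requires Go 1.20 or later and rebuilds the directive (flags, width, precision, verb) from the fmt.State. The helper render is hypothetical.

```go
package main

import "fmt"

// Response mirrors the example type above.
type Response struct {
	code int
	msg  string
}

// Format handles %+v itself and delegates every other verb/flag
// combination to fmt, applied to the message string.
func (r *Response) Format(s fmt.State, verb rune) {
	if verb == 'v' && s.Flag('+') {
		fmt.Fprintf(s, "%s (%d)", r.msg, r.code)
		return
	}
	// fmt.FormatString (Go 1.20+) yields e.g. "%q" or "%12s".
	fmt.Fprintf(s, fmt.FormatString(s, verb), r.msg)
}

// render is a hypothetical helper applying one formatting directive to v.
func render(format string, v interface{}) string {
	return fmt.Sprintf(format, v)
}

func main() {
	r := &Response{code: 404, msg: "not found"}
	fmt.Println(render("%v", r))     // not found
	fmt.Println(render("%+v", r))    // not found (404)
	fmt.Println(render("%q", r))     // "not found"
	fmt.Println(render("[%12s]", r)) // [   not found]
}
```

With this approach %#v is also delegated, so it prints the Go representation of the message string rather than of the struct; whether that is desirable depends on the type.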

Custom formatters in practice

I have found in practice that the following properties hold well across packages in the ecosystem:

custom Format() methods always do something valid and useful for the v verb, regardless of the flags provided.
the behavior of Format() with verb v and no flags (i.e. a simple %v) is most often kept consistent with the behavior of String(), if it is also available.
if a custom formatter has both a “simple” and a “verbose” mode, it commonly recognizes + as the flag to access the verbose mode.
if both %s and %v (without flags) are recognized, they usually emit the same thing.
it’s uncommon to see %q, %x and %X handled properly in custom Format() methods, if at all.
custom formatters for non-numeric types nearly never handle the width and precision modifiers.

This last point in particular is the reason why code that cares about fixed-width string formatting should spell out the printing in two steps, as follows:

s := fmt.Sprint(v)
fmt.Printf("%30s", s)  // instead of printing v directly
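The difference is easy to reproduce with a custom formatter that ignores the width, as most do. The type Name and the helpers direct and twoStep are made up for this sketch.

```go
package main

import "fmt"

// Name is a hypothetical type whose Format ignores width and precision.
type Name struct{ s string }

func (n Name) Format(st fmt.State, _ rune) { fmt.Fprint(st, n.s) }

// direct asks fmt to pad the value itself; the request reaches Format(),
// which ignores it.
func direct(v interface{}) string { return fmt.Sprintf("[%6v]", v) }

// twoStep first turns the value into a string, then pads the string.
func twoStep(v interface{}) string {
	s := fmt.Sprint(v)
	return fmt.Sprintf("[%6s]", s)
}

func main() {
	fmt.Println(direct(Name{"ab"}))  // [ab]
	fmt.Println(twoStep(Name{"ab"})) // [    ab]
}
```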

Code reuse between `fmt.Stringer`, `fmt.Formatter` and `error`

An example above was implementing String() by calling fmt.Sprint, which in turn uses the Format() method on the same type. To simplify:

type T struct { msg string }

func (r *T) Format(s fmt.State, _ rune) {
   fmt.Fprint(s, r.msg)
}

func (r *T) String() string {
   // This causes fmt to call Format() above and ultimately
   // print r.msg.
   return fmt.Sprint(r)
}

Why would one choose to implement String() via return fmt.Sprint(r) instead of return r.msg in this case?

This is an instance of DRY: if later the logic needs to change to “print more stuff”, only the Format() method needs to be modified; the String() method automatically benefits from it.

This pattern is relatively common; but so is the following:

type T struct { msg string }

func (r *T) String() string {
   return r.msg
}

func (r *T) Format(s fmt.State, _ rune) {
   fmt.Fprint(s, r.String()) // or: s.Write([]byte(r.String()))
}

Again, one method is implemented “using the other”, so that one only needs to change either of them to get the same behavior in both.

Likewise, if the error interface is involved, we see all combinations of reuses in practice:

type T struct { msg string }

func (r *T) Error() string { return r.msg }
func (r *T) String() string { return r.Error() }
func (r *T) Format(s fmt.State, _ rune) { fmt.Fprint(s, r.Error()) }

type U struct { msg string }

func (r *U) String() string { return r.msg }
func (r *U) Error() string { return r.String() }
func (r *U) Format(s fmt.State, _ rune) { fmt.Fprint(s, r.String()) }

type V struct { msg string }

func (r *V) String() string { return fmt.Sprint(r) }
func (r *V) Error() string { return fmt.Sprint(r) }
func (r *V) Format(s fmt.State, _ rune) { fmt.Fprint(s, r.msg) }

Why do we see so much diversity?

I am not exactly sure, but I blame the lack of prescription in the Go library documentation. Also see the two answers below.

Does it matter since we get the same result in every case?

From a functional perspective these examples are all equivalent. From a performance perspective, one should consider which of the variants is used more often in the program. If the String() method is commonly used, more so than printing out the object, then having String() contain the simplest implementation may yield better performance. This is because the logic in the fmt package is a little heavyweight. However note that in practice I have not found this to be often the case, so I would say it does not matter much.

I am implementing my own custom type. What pattern should I aim for?

If your type has just one representation, then you can implement String() directly (or Error() if you are implementing an error type), and omit Format() entirely.

If you need to make a difference between “simple” and “verbose” displays, then implement Format() first then derive String() (and/or Error()) from it.

Summary and takeaways

Go provides a general-purpose formatting API in its standard fmt package.

All the functions in that API are powered by common logic, which is the logic used under the hood by Printf / Sprintf: each object is displayed in the context of some formatting “verb”.

The most common and reliable verb is v (tip: it is “v” like “value”), also used under the hood by Print() and Println(). It can print pretty much anything and is not picky about whether the value is nil or implements a particular interface.

Meanwhile, when implementing your own type, you can customize the behavior of fmt by implementing certain interfaces:

fmt.Stringer, a simple String() string method.
error, a simple Error() string method.
fmt.Formatter, a Format() method. This can be used to display different things when used via %v vs. %+v and other combinations of verbs and flags.

In practice, we see packages that provide both String() and Format() methods side-by-side, or Error() and Format(). One is often implemented by calling the other, to avoid code duplication. All combinations of reuse are allowed by Go’s standard library, and we actually can find all variants in the ecosystem.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
