Hillel proposed on Twitter today:

Expressiveness” is how easily we can ‘do’ things in a system. “Tractability” is how easily we can assert properties on things we ‘do’. F.ex DFAs are tractable, Turing machines are expressive. The expressiveness vs tractability tradeoff is one of the most fundamental in CS.

—Hillel Wayne

I am also a big fan of crafting vocabulary to catch nuance that’s otherwise too difficult to pin down or remember. But I think this particular choice is so misguided as to be useless in practice—untractable, if you will.


In the discussion that followed, Hillel seems to explain that “expressivity” is about the size of the language: how many sentences (or programs) are accepted as valid in the language:

The set of languages a turing machine accepts is far larger than the set of languages a DFA accepts.

—Hillel Wayne


Before I dive into the meat of my reaction, let us start with a textbook reminder: the question of what it means for a language to be “larger” than another.

The layman idea of size is related to numbers used to measure things.

Languages are usually defined as the set of all possible texts (sentences, utterances, programs) that are accepted as valid in that language.

Most interesting languages do not have a finite size. I can always build at text, sentence, program larger than the last one I built. Nearly all written languages, in particular all computer languages to-date (including both programming, specification and modeling languages) have the same “size” by the count-the-sentences metric: the cardinality of integers.

This include the language of DFAs (set of all possible DFAs).

There are a few finite languages, where there’s just so many valid texts. For example the language recognized by the regular expression a(b|c)d contains exactly two sentences abd and acd.

So we could naively think this is what Hillel refers to in the answer above: the set of phrases recognized by a DFA (singular in the quote) could be finite and small, compared to the (countably infinite) set of texts/programs one can recognize with a Turing machine.

This says nothing of the set of all DFAs (the language of DFA construction) which, in aggregate, has the same cardinality as that of the programs for a Turing machine.

Yet, there is something that distinguishes the expressivity of DFAs from the expressivity of Turing machines. If it is not the cardinality of programs, what is it?


I’ll do some charitable reading here and assume that Hillel wanted to refer to the fact some languages accept certain texts/programs that other languages do not.

For example, a program that takes a sentence as input and says yes or no to whether it is a palindrome can be programmed on a Turing machine, but not using a DFA.

So in that sense, I can do “more” of certain things with a Turing machine than with a DFA.

Now, in order to make this statement I need three things:

  1. I need the ability to talk about the difference between a program (the thing that I spell out) and its meaning (the thing that it does/mean).
  2. I need a way to compare sets of meanings (why is testing palindromes qualitatively “more” than, say, testing whether a word starts and ends with an “a”).
  3. I need a way to “draw a line”: the ability to say “you can’t do that with a DFA”.

None of these things are trivial. There’s a lot to anchor before we can even take a shot at explaining “expressivity”.


Regarding the first point: texts vs meanings. This sounds trivial for computer scientists, but alas in my experience it is not trivial even for professional programmers, let alone for the layman. Talking about the semantics of a language needs a meta-language, and for most humans the meta-language is not expressed in words; it is an artifact of their culture and subconscious. I know enough folk who balk at the idea that things—even programs—can have different meaning depending on context.

This brings me to the third point. DFAs have been invented originally to simplify reasoning about (and with) a particular mathematical theory, that of rational languages.

But the construction language of DFAs is, arguably, the same as that of flowcharts and other structured thinking languages: shapes linked together by arrows, with labels inside the shapes and the arrows. And I am not even talking about the visual representation of DFAs (or flowcharts) here! Humans are just very good at putting notions into discrete boxes, even metaphorically, and maintain edges between them to crystallize the associativity of their thoughts, so much that they become unable to associate thoughts that they didn’t relate together in the first place.

Now, take a DFA drawn by a mathematician to express some interesting traversal of coloured graphs. That DFA will contain boxes about colors “red”, “white”, “black” and arrows between them.

Now print that DFA out on paper, and hand it out to a child going to school that day. Expect: at the end of the day, an invitation by the head teacher to explain why you are instigating racial divide into your child, by giving them arbitrary notions of which skin color can marry which skin color.

“You can’t do that with a DFA.”

Sure, but try explaining that to the head teacher who never heard about DFAs.


On a more serious note, Perl extends regular expressions with back-references. For example /(\w)(?:(?R)|\w?)\1/ recognizes palindromes. Now we have a problem on our hands:

  • a sizeable chunk of folk assume that the language of regular expressions and that of DFAs is the same.
  • Perl regular expressions can “do” certain things that contradict what we said above.

What gives? “You can’t do that with DFAs!”

Yet “you can do this in Perl”, so somehows “Perl doesn’t use DFAs”. Indeed, it uses something else. That something else (its backtracking engine) is used for all regexes, not just the palindromic one above. Perl barely uses DFAs at all.

So now we have a language, that of regular expressions, which was intended and designed as a representation for DFAs, repurposed to program a Perl thing that has barely any relationship with DFAs any more.

If I can take any restricted language, lift it into a representation system, express all my things in that representation system, and then reuse that representation system for a different machine that can “do” more things, was that first language really less able to “do” things to start with?

Isn’t any restricted language in fact just “on its way” to become extended by innovative creators into revealing its full expressivity?

This would cause a road block on the second point above: if I tell you “I can do more thing in my language than yours” you can come back to me with “look how I extended my language!” and every time you do that, we both lose a little bit of our collective ability to talk about differences in semantic systems. In that back-and-forth, there is absolutely zero difference in “expressivity” between languages, just different stages of maturity or development between languages.


Now I think of it, I probably focused a bit too much on the line Hillel tried to draw between Turing machines and DFAs. A key lies in his first words: expressiveness is how easily we can do things.

Both Turing machines and the Python programming language recognize the same languages, they can “do” about the same things, but noone would refute that Python is more easy to work with.

To a human, that is.

For a compiler, for example, it is much easier to translate a program to Brainfuck than it is to translate it to Python.

So it seems like expressiveness depends on who does the expression.

This theory is strengthened by the recollection that the introduction of recursion in programming languages (in ALGOL, in the 1950s) caused serious dissent between engineers and mathematicians: Dijkstra and the mathematicians found that recursive procedures made the language more expressive (to themselves), whereas the engineers didn’t see any value in recursion (to themselves) and disliked them because of the added machine complexity.

I found this particularly instructive, given that ALGOL had loops and dynamically sized arrays and was thus, from a formal perspective, just as able to compute anything in either case: recursion added expressivity from the perspective of the (continental) mathematicians, but not from that of the (american) engineers, but their ability to “do work” in the end remained equal: you could program anything in ALGOL without recursion, even though the language eventually supported it.

It seems to me, therefore, that this variant of expressivity is entirely subjective and doesn’t really impact the ability of folk to “do” anything—as long as the language is sufficiently general and allows different styles for different individual preferences.


I think I’ll do some more charitable reading here and assume that Hillel cared about something closer to home, given that his job is literally to talk about automated testing and verification, and in particular automate the removal of errors from programs.

So let’s talk about “expressivity” as “how easy it is to make mistakes”.

By this (quite common) definition, C is notoriously “expressive,” so much so that it accepts all kinds of programs that we would rather prefer did not exist. This flaw of C weighs so heavy on society that considerable resources are being poured into e.g. Rust, which (in my opinion) only flourished because of these flaws of C: Rust makes it much harder to make those mistakes that we dislike about C.

By this definition (and only that definition), Rust is less expressive than C and that’s somewhat an advantage.

In fact, there are many more things Rust can express: traits, lifetimes, sum types, etc., directly increase the productivity of programmers by making it possible to express certain concepts more concisely, and carry over their meaning to other programmers more effectively. This is what is traditionally meant by the word “expressivity.” By that other definition, Rust is far more expressive than C has ever been.


This is the moment we can talk about “tractability,” the other key word in the original quote. There are many possible definitions for that word, but Hillel points at the one that interests us: “asserting properties on things.”

In Hillel’s world, we care about computing properties about programs, in particular getting an automated answer to the question “is this program correct or not.”

I’d personally argue that “correctness” is highly subjective; Hillel is only able to make a useful statement because he’s working with a very narrow definition, that of whether a program is an adequate translation of its specification: if the programmer “did their work correctly” then the program should do at least what it was meant to do, and not so much more as to cause unacceptable problems.

So computing things about correctness needs four things:

  1. a specification of what was intended, ahead of time.

    (This exludes exploratory programming where folk mash up program bits together without purpose until it “clicks” into something that’s vaguely agreeable.)

  2. a way to write the specification that’s sufficiently formal that a correctness-testing program can use it as input to test something.

    (This excludes most business programming where specs are half-formed international English sentences spoken over lossy phone lines.)

  3. the existence of algorithms/programs that can actually compute something about programs in relationship to their specification, for example whether they are different, how different, where the difference is, etc.

  4. a language to talk about the kind of mistakes that we care about, and their relative importance. For example, a program able to tell me that my other program did not sort the names of the patients in the hospital alphabetically, or does so using Bubble sort instead of Quicksort, is arguably less important than one that tells me the other program simply deletes the names of the last 3 patients from every output list.

    (This last example is painfully taken from experience: there are enough senior engineers who argue incessantly about the correctness of the “core” algorithms without caring about edge cases, even when lives are on the line in these edge cases.)

I’ll go on a limb here and argue that Hillel does not like “expressive” languages like C (for its expressivity of mistakes) or Rust (for its expressivity of abstraction) because they make point (3) here intractable to him: for these languages,

  • it is often impossible to even build an algorithm that derives interesting properties on the relationship between an arbitrary program and its specification,
  • when that specification is written in TLA,
  • and the desired output about correctness are boolean statements about termination, liveness and the presence/absence of particular results.

And so it is tempting to create a strawman trade-off between these “expressive” languages and the “tractability” of automating TLA proofs of TLA-flavored correctness about arbitrary programs expressed in them.

But that is just too specific!


So, to recap, a definition of “expressivity” cannot give us something useful to work with if it is about:

  • what is possible to express or not: most languages can express everything other languages can do, or can be evolved to achieve that.
  • how easy it is to build things: different folk will experience that expressivity differently due to diverse cultural backgrounds; and two languages can end up incomparably expressive as a result.

Instead, we can make it work as a measure of how easy it is to express certain abstractions (positive expressivity) and how easy it is to make mistakes (negative expressivity).

The art of programming language design is to maximize positive expressivity and minimize negative expressivity. Of course, in the general case that is doomed to fail (expressivity always cuts both ways) but we can optimize a language to facilitate the expression of certain abstractions while making it hard to express certain mistakes.

Meanwhile, “tractability” is (and always has been) an evaluation of how easy it is to compute certain things within a set of constraints.

For example, it is tractable to compute a mortgage plan on a phone in 2020, whereas it was intractable on a phone in 1980.

For folk like Hillel who care about proving correctness of programs, certain programming languages make correctness proofs more tractable because they were designed to facilitate the expression of programs that closely follow their specification language.

For that folk, “tractability” in a system is related to how much the system forces a relationship between specifications and realization. If the system (e.g. a programming language) has much more expressivity than what the specification language has, it becomes easy for the programmer to “say” more things that the person writing (and verifying) the spec can comprehend, both positive and/or negative.

But whose fault is that?

In my world, the specification language is defective. If engineers find themselves needing to use languages that can express more than the specification languages that correctness-minded folk care about, maybe the latter could design their spec languages better so they auto-translate to the machines that the former care about running.


Expressivity and (that particular notion of) tractability are not a trade-off. The distance between them changes over time, as a witness of how society invests into translating human intent into automation.

Like this post? Share on: TwitterHacker NewsRedditLinkedInEmail

Raphael ‘kena’ Poss Avatar Raphael ‘kena’ Poss is a computer scientist and software engineer specialized in compiler construction, computer architecture, operating systems and databases.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.

Keep Reading

Reading Time

~11 min read





Stay in Touch