8.18.0.18

2 Shrubbery Notation

Now that you’ve tried working with some Rhombus code, you probably have lots of questions about syntax—such as where newlines and indentation are required and how matching works inside ''.

2.1 S-Expressions as Inspiration

Rhombus syntax is like Lisp syntax in that it’s bicameral: there are two layers to the specification, and both layers must be satisfied to write a valid expression. In Lisp, the first layer is S-expression notation; the second layer is the grammar of definitions, conditionals, function calls, etc. That second layer is expressed in terms of the first layer. Rhombus is similarly layered, where the first layer is shrubbery notation.

S-expression notation has an especially simple grammar:

    term ::= atom
          |  ( term* )

The grammars here mingle abstract-syntax notation with concrete-syntax literals like ( and ). The literals are meant to stand in place of constructor tags while suggesting the relevant concrete syntax.

We’re simplifying a little bit here, as the atom nonterminal hides a lot of complexity and we’re ignoring the distinction between pairs and (improper) lists.

Still, this abstract syntax is simple, and the concrete version mimics the abstract version’s simplicity: a term sequence is combined into a single term using ( and ) around the sequence, and whitespace appears between the terms in the sequence. This simplicity is a great strength of the language design, but it also exacts a price from programmers, requiring them to write lots of parentheses.

The value of this simple, underlying S-expression layer in the syntax is that it lets us factor the definition of the syntax of the language. The first layer’s parsed form itself serves as the concrete syntax for the second layer. The second layer parses into an AST, i.e., abstract syntax as we normally mean it, that a compiler or interpreter can understand. The advantage of this split for a language like Lisp or Rhombus is that the S-expression or shrubbery notation layer is defined once and for all, applying both inside and outside of quoted forms. Then, we can build a second, extensible layer on top; because the first S-expression or shrubbery layer is fixed, the macro writer needs to consider only forms that have the first layer’s basic structure.

2.2 Shrubbery Goals

Shrubbery notation is designed to look conventional and lightweight, particularly relative to S-expressions. This goal is realized in two ways:

  • Shrubbery notation provides most tree structure, but it leaves some tree structure to a second layer of parsing, as in 1 + x * 3.

    Individual elements like 1, +, and x are called terms. A sequence of terms where extra parsing will be needed is called a group. Groups, meanwhile, are typically separated by newlines or commas.

    Forms written with (), [], {}, or '' are also terms on the outside, but they contain multiple groups on the inside. For example, f(1 + x) is a group of two terms, where the first term is f and the second term is (1 + x). The term (1 + x) contains a single group that has three terms. Similarly, f(3 * x, 4 * y) is also a group of two terms, but the term that is surrounded by parentheses contains two groups, 3 * x and 4 * y, and those two groups each consist of three terms (a number, the asterisk, and a variable).

  • Large-scale block structure relies on newlines and indentation, instead of relying on closing )s or }s that look noisy and redundant in code that is otherwise formatted in a reasonable way.

    For example, the end of a function body is determined by outdenting.

    fun f(x):
      println(x)
      x + 1

    f(0)

    A block, like the function body above, contains a sequence of groups. Roughly, a group spans one line of text within a block, but there are various ways for a group to span multiple lines. For example, an opening ( to create a term within a group might be closed with ) on a later line.

    fun f(x):
      println(
        x
      ) // still part of the first group in the function body's block
      x + 1
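To make the term and group vocabulary concrete, here is a small sketch that uses the shrubbery language described in the exercise at the end of this section. It reuses the f(3 * x, 4 * y) example from above. Nothing is evaluated by the shrubbery language, so f, x, and y need no definitions.

```rhombus
#lang shrubbery

f(3 * x, 4 * y)
```

Running this module should print an S-expression encoding equivalent to the following, although the line breaks may differ with the window width:

```
(multi
 (group f
        (parens (group 3 (op *) x)
                (group 4 (op *) y))))
```

The outer group has two terms, f and the parenthesized term, and the parenthesized term holds two comma-separated groups of three terms each.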

2.3 Shrubbery Abstract Syntax

Many previous attempts to make the abstract structure of S-expressions look less noisy also use newlines and whitespace. Shrubbery notation differs from previous efforts by intentionally adding complexity in the abstract structure. Shrubbery notation enables a second layer of syntax to make distinctions among forms that use (), [], {}, :, or |, instead of making distinctions based only on a leading identifier. Even so, the shrubbery abstract grammar is far simpler than the AST of any realistic programming language.

Here’s the abstract syntax that a shrubbery form describes:

 

    document ::= group*

    group    ::= item* [block] [alts]     must be nonempty

    term     ::= item  |  block  |  alts

    item     ::= atom
              |  ( group* )      comma-separated groups
              |  [ group* ]      comma-separated groups
              |  { group* }      comma-separated groups
              |  ' group* '      newline-separated groups

    block    ::= : group*        newline-separated groups

    alts     ::= {| group*}+     newline-separated groups

Why these particular textual elements?

  • Distinguishing (), [], and {} has clear advantages, and even some S-expression dialects (e.g., Clojure) make that distinction.

  • Quoting code is important for our goals, so we add '' to the set of bracketing forms, which both makes it distinct and highlights its “quoting” nature.

  • Using : plus indentation for block structure reads and writes well.

  • Using | to generalize : for multi-block forms helps highlight alternatives, such as the cases of an if, match, or algebraic-datatype declaration.

Although a group can end with both a block and an alts, usually only one of those (or neither) appears in a group. A block created with : is itself a term within an enclosing group, just like (), [], {}, and '' create terms. Each individual | form within alts is conceptually a block, since it has the same shape as block, but a term created with | encompasses a sequence of one or more | blocks, not each individual | block.

The atoms of shrubbery notation are mostly straightforward, and we’ll leave them to the documentation. Note that shrubbery distinguishes identifiers, such as x or to_string, from operators, such as + and ->. The tokens : and | are not operators, and neither are (, ), [, ], {, }, or '.
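As a small illustration of the identifier and operator distinction, the following sketch uses the shrubbery language described in the exercise at the end of this section. The names x and to_string are just placeholders; the shrubbery language only parses the text, so they need not be bound.

```rhombus
#lang shrubbery

x -> to_string(x)
```

The printed encoding should be equivalent to the following, with line breaks depending on the window width:

```
(multi (group x (op ->) to_string (parens (group x))))
```

Identifiers such as x and to_string appear as plain symbols, while the operator -> is wrapped as (op ->). The parentheses are neither: they produce a parens term that contains a group.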

2.4 Shrubbery Syntax Details

Shrubbery notation is designed to look simple and natural, but as we all know, natural language can be surprisingly complex. For shrubbery notation, the complexities mostly involve group formation: where exactly newlines are required and where they are allowed.

This section offers a compact summary of the key rules, but it’s still too much for a short tutorial. Rhombus programmers are mostly expected to infer sensible rules, anyway. We advise skipping to the exercise.

2.4.1 Separators and Terminators

The first set of rules is about separators and closers:

  • Spaces are used to separate items within a group, but groups in a group* sequence are separated either by a comma , or by a newline. Specifically, groups in (), [], or {} are comma-separated, while groups in other places (including in '') are separated by newlines.

    Examples:

    (group 1, group 2, group 3)

    [group 1, group 2, group 3]

    {group 1, group 2, group 3}

    'group 1
     group 2
     group 3'

    : group 1
      group 2
      group 3

    | group 1
      group 2
      group 3

  • Only some of the sequence-combining forms rely on paired characters to mark the start and end of the sequence. Sequences that start with : or | rely on a newline plus reduced indentation to mark the end of a sequence.

    Examples:

    (start sequence, sequence end)

    [start sequence, sequence end]

    {start sequence, sequence end}

    'start sequence
     sequence end'

    : start sequence: start nested sequence
                      nested sequence end
      sequence end

    | start sequence
      continue sequence | start nested sequence
                          nested sequence end
      sequence end

    : start sequence | start nested sequence
                       nested sequence end
      sequence end

  • There is one exception to the previous rule: | on the same line as another | ends the preceding |, instead of requiring a newline to end the preceding |. This exception applies only if the two |s are inside the same (), [], {}, or ''.

    Examples:

    | one alt | second alt | third alt

    | one alt (| nested alt) | second alt
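The separator rules above can be tried out with the shrubbery language described in the exercise at the end of this section. This is only a sketch: draw, circle, square, fill, stroke, red, and blue are made-up names, chosen to put comma-separated groups and newline-separated groups in one form.

```rhombus
#lang shrubbery

draw(circle, square):
  fill red
  stroke blue
```

The printed encoding should be equivalent to the following, with line breaks depending on the window width:

```
(multi
 (group draw
        (parens (group circle) (group square))
        (block (group fill red) (group stroke blue))))
```

The comma splits the content of () into two groups, while the newline splits the block after : into two groups. The block ends where the indentation ends, with no closing character.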

2.4.2 Required and Optional Newlines

Some newlines are required as group separators, but there are a few places where newlines are optional and allowed:

  • Extra newlines are allowed between groups in a group sequence, including sequences where , separates groups.

    Examples:

    (group 1, group 2,
     group 3)

    (group 1,

     group 2)

    'group 1

     group 2'

  • A newline is allowed before the first group in a group sequence, such as after a (, :, or |.

    Examples:

    (
      group 1,
      group 2
    )

    :
      group 1
      group 2

    |
      group 1
      group 2

  • A newline is allowed before each | in an alts. In fact, a newline may be needed to create a nested alts without ending an enclosing alts alternative, due to the rule about a | acting as a closer for an earlier | on the same line.

    Examples:

    group 1
    | alt 1 nested group 1
      alt 1 nested group 2
    | alt 2 | alt 3 nested group 1
              alt 3 nested group 2

Note that an alts is not just one | form, but a sequence of | forms, each with a group sequence. The individual |s within one alts are also separated by newlines. A newline is therefore potentially ambiguous as a group separator or a |-form separator; the ambiguity is resolved by always choosing to interpret the newline as a |-form separator.

: group 1
  | alt 1 within group 1
  | alt 2 within group 1
  group 2

A newline as a group separator can instead be written as ;. A ; cannot be used between groups in (), [], or {}, since newlines there are optional whitespace, not separators. Where ; is allowed, redundant ;s are allowed and ignored.

: group 1; group 2;
  group 3
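To see that ; and a newline play the same role inside a block, here is a sketch for the shrubbery language described in the exercise at the end of this section. The names start, a, b, and c are placeholders.

```rhombus
#lang shrubbery

start:
  a; b
  c
```

The printed encoding should be equivalent to the following:

```
(multi (group start (block (group a) (group b) (group c))))
```

Writing a, b, and c on three separate lines at the same indentation should produce the same encoding, since ; only stands in for the newline separator.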

2.4.3 Indentation Rules and Variations

When a newline is used, either to act as a separator or where allowed between ,-separated groups, the indentation of the newly formed line is constrained:

  • The first group in a group sequence must be more indented than its enclosing group when the new group sequence follows : or |. There is no constraint on the first group within (), [], {}, or ''.

    enclosing group:
        nested group (
      // not a standard layout, but allowed:
      more nested group
        )

  • If a group starts on a new line, and if it is not the first group in its sequence, it must begin at the same column as the first group in its sequence.

    enclosing group:
        nested group 1 (
      // not a standard layout, but allowed:
      more nested group A,
      more nested group B
        )
        nested group 2

  • If the first | of an alts starts on a new line, it must be indented the same as its enclosing group. The first | of an alts does not have to start on a new line.

    enclosing group 1
    | alt in enclosing group 1

    enclosing group 2 | alt in enclosing group 2

  • If a | that is not first in its alts starts on a new line, it must line up with the first | in the alts. A non-first | does not have to start on a new line.

    enclosing group
    | alt 1 | alt 2
    | alt 3

    enclosing group | alt 1 | alt 2
                    | alt 3

2.4.4 Two More Tricks

There’s more to shrubbery notation that you can learn at your leisure, including the #// group-commenting form, @ notation for working with text, and «» for whitespace-insensitive mode, but there are only two more details that are essential for practical use:

  • You can continue a group on another line with a combination of extra indentation and starting with an operator (as opposed to an identifier). After continuing a group once that way, continue again by using the same indentation and an operator.

    Examples:

    a + b
      + c
      + d

    obj.m1()
      .m2()
      .m3()
      + 10

  • Nesting '' inside of '' is a problem, since the same character is used as an opener and closer. Sometimes, you can put the inner '' inside (), [], or {}. If not, use '« »' for the outside quotes, or nest '« »' freely.

    Examples:

    'a ('b c')'

    '«a 'b c'»'

    '«a '«b c»'»'

    The first example here is different from the latter two. The first example is a '' term that contains a () term, and the () term contains a '' term. The latter two examples are each a '' term that contains an immediate '' term, and the «» disappear in the transition from concrete shrubbery syntax to abstract shrubbery syntax.
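The operator-continuation trick from the first item above can be checked with the shrubbery language described in the exercise that follows. This sketch reuses the a + b example; the names are placeholders that need no definitions.

```rhombus
#lang shrubbery

a + b
  + c
  + d
```

The printed encoding should be equivalent to the following:

```
(multi (group a (op +) b (op +) c (op +) d))
```

All three lines form a single group of seven terms, which is the same result as writing a + b + c + d on one line.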

2.5 Exercise

The shrubbery language permits any shrubbery form in a module body as a sequence of groups, and it prints the shrubbery’s abstract form using an S-expression encoding that is described in the documentation.

For example,

#lang shrubbery

start:
  "hello"
  "world"
end:
  "bye"

prints

(multi
 (group start (block (group "hello") (group "world")))
 (group end (block (group "bye"))))

where multi is used to combine a top-level sequence of groups in a module.

Experiment with #lang shrubbery, then reverse-engineer each of the following outputs by writing a shrubbery form that produces it. Note that if you make the DrRacket window narrower, the parenthesized output will contain more newlines, making it easier to see if you got the answer right.

  • (multi
     (group fun
            f
            (parens (group x))
            (block (group x (op +) 1)))
     (group f
            (parens (group 2))))

  • (multi
     (group
      (quotes
       (group math
              (op |.|)
              max
              (parens (group (op $) x) (group (op ...))))
       (group (op ...)))))

  • (multi
     (group
      match
      x
      (alts
       (block (group 1 (block (group "one"))))
       (block (group 2 (block (group "two")))))))