Balanced Parentheses as an Embedded Domain Specific Language

Let's look at how to use macros to extend the Lean 4 parser and embed a language for building balanced parentheses. This language accepts strings given by the BNF grammar

Dyck ::=
  "(" Dyck ")"
  | "{" Dyck "}"
  | end

We begin by defining an inductive data type of the grammar we wish to parse:

inductive Dyck : Type where
  | round : Dyck → Dyck  -- ( <inner> )
  | curly : Dyck → Dyck  -- { <inner> }
  | leaf : Dyck

We begin by declaring a syntax category using the declare_syntax_cat <category> command. This names our grammar and allows us to specify parsing rules associated with our grammar.

declare_syntax_cat brack

Next, we specify the grammar using the syntax <parse rule> command:

syntax "end" : brack

The above means that the token "end" lives in syntax category brack.

Similarly, we declare the rules "(" Dyck ")" and "{" Dyck "}" using the rules:

syntax "(" brack ")" : brack
syntax "{" brack "}" : brack

Finally, we need a way to build Lean 4 terms from this grammar -- that is, we must translate out of this grammar into a Dyck value, which is a Lean 4 term. For this, we create a new kind of "quotation" that consumes syntax in brack and produces a term.

syntax "`[Dyck| " brack "]" : term

To specify the transformation rules, we use macro_rules to declare how the syntax `[Dyck| <brack>] produces terms. This is written using a pattern-matching style syntax, where the left-hand side declares the syntax pattern to be matched, and the right-hand side declares the production. Syntax placeholders (antiquotations) are introduced via the $<var-name> syntax. The right-hand side is an arbitrary Lean term that we are producing.

macro_rules
  | `(`[Dyck| end])    => `(Dyck.leaf)
  | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b])  -- recurse
  | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b])  -- recurse
#check `[Dyck| end]      -- Dyck.leaf
#check `[Dyck| {(end)}]  -- Dyck.curl (Dyck.round Dyck.leaf)

In summary, we've seen:

  • How to declare a syntax category for the Dyck grammar.
  • How to specify parse trees of this grammar using syntax
  • How to translate out of this grammar into Lean 4 terms using macro_rules.

The full program listing is given below:

inductive Dyck : Type where
  | round : Dyck → Dyck  -- ( <inner> )
  | curly : Dyck → Dyck  -- { <inner> }
  | leaf : Dyck

-- declare Dyck grammar parse trees
declare_syntax_cat brack
syntax "(" brack ")" : brack
syntax "{" brack "}" : brack
syntax "end" : brack

-- notation for translating `brack` into `term`
syntax "`[Dyck| " brack "]" : term

-- rules to translate Dyck grammar into inductive value of type Dyck
macro_rules
  | `(`[Dyck| end])    => `(Dyck.leaf)
  | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b])  -- recurse
  | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b])  -- recurse

-- tests
#check `[Dyck| end]      -- Dyck.leaf
#check `[Dyck| {(end)}]  -- Dyck.curl (Dyck.round Dyck.leaf)