From MyInfoRepo
Jump to navigation Jump to search

Syntax Definition Formalisms 3 (SDF3) is an academic syntax definition, used for compiling languages. It is part of the Spoofax workbench.


none yet

Syntax and language basics

Formal language definitions are very much related to (context-free) grammars, as we know them from automata.


Programs are sentences, where a sentence consists of phrases. A phrase is either a terminal or a non-terminal. Sorts are the categorization of both terminals and non-terminals. By defining something as a sort, we can use the declaration in something called productions. A grammar of a language consists of rules aka productions to build a valid program structure. A production consists of a sort, or sort constructor, followed by a body. The body is a template and will be parsed. The body may also be a phrase or sub-phrase.

An example:

 sorts Expr
 context-free syntax
   Expr = <Expr * Expr> // Everything after = is the body, also known as template

Sub-phrases are separated by layout, where layout is defined in the language as well. Layout is mostly whitespace, but also contains comments and other elements that should not be compiled. The constructor is used to construct abstract syntax tree nodes. Thus, the Bind constructor creates trees with two arguments trees for the identifier (ID) and expression (Exp) subtrees; in abstract syntax we leave out the literals and layout.

 Expr.Bind = <<ID> = <Exp>;>

Tree syntax

Some productions contain templates that can be ignored by the compiler when constructing the Abstract Syntax Tree (AST). One such production is {bracket}.

sorts Exp
context-free syntax
  Exp = <(<Exp>)> {bracket}

An expression, for example x+y, may be wrapped in brackets: (x+y), without affecting the AST.

Other productions might be ambiguous without clarification. For those, we need to specify the preference. Take for example the following statement:

a + b + c

It may construct an AST of add(add(a,b),c) or add(a,add(a,b)) (AST is simplified). For additions we should add {left}, since addition is (typically) left-associative. For operations as raising to the power, we might use {right}.

Lexical vs Context free

Lexical syntax defines the allowed characters for sorts.

An example program

Inverse Quotation

All words in a production body <> are literals. When a sort is required, it must be wrapped in another set of angle brackets <<ID>>.



Formatting rules for the parsing tree