Specification
This is the formal specification of the Flect programming language. All implementations of the language are expected to conform to the minimum subset of the language that this specification requires.
When the terms required, must, shall, and similar are used in this specification, an implementation must behave as specified in order to conform to this specification. Conversely, when the terms must not, shall not, and so on are used, an implementation must not behave in the specified way in order to conform to this specification.
Only when the terms optional, can, and may are explicitly used in this specification does it indicate that an implementation does not have to implement the specified behavior in order to be conforming. It is, however, recommended that implementations also conform to all optional behaviors described in this specification.
Examples are provided throughout this specification. These demonstrate the expected behavior of an implementation and serve as clarification for possibly unclear or ambiguous statements. In other words, an implementation must behave in the way that examples in this specification demonstrate.
Finally, rationales are given in some parts of this specification where a design decision may not have an immediately obvious justification.
Throughout this specification, the Flect language grammar will be given where relevant. It is specified in a variation of the Extended Backus-Naur Form (EBNF). EBNF consists of a series of production rules (also called non-terminals) built on fundamental symbols, operators, and literals (called terminals). The meaning of the EBNF variant used in this specification is given here.
A production rule is defined as follows, using the `::=` operator:
rule-name ::= ...
A number of operators are used to build the right-hand side of production rules:
- `"A"`: Means the literal character `A`. Some literals may be escaped, e.g. `"\\"`, which means the `\` character, and `"\""`, which means the `"` character.
- `U+NNNNNNNN`: Means the Unicode code point specified by the hexadecimal value `NNNNNNNN`.
- `"A" .. "B"`: Constructs a code point range from `A` to `B` according to the order of Unicode code points. Such a range indicates that any one of the code points in the range is allowed.
- `A ... | B ...`: The pipe character constructs an alternation. This means that either `A ...` or `B ...` (be they production rules or terminal symbols) is allowed.
- `[ A ... ]`: Square brackets construct an option. This means that `A ...` may or may not appear.
- `{ A ... }`: Curly braces construct a repetition. This means that zero or more occurrences of `A ...` may appear.
- `< A ... >`: Angle brackets construct a repetition where at least one occurrence of `A ...` must appear, and possibly more.
- `A ... * N`: The asterisk indicates that `A ...` must occur exactly `N` times.
- `( A ... )`: Parentheses perform simple grouping (as in arithmetic) to resolve precedence issues.
- `? ... ?`: Specifies a special sequence. The meaning of the sequence is given explicitly in the `...` part.

(In the above rules, except for the `? ... ?` rule, three periods (`...`) indicate zero or more production rule names or terminal symbols.)
No production rules allow end of file (EOF) to occur unless explicitly specified.
For example, given the above definitions, one could specify production rules for integer and floating point literals as follows:
integer ::= < "0" .. "9" >
float ::= integer "." integer [ exponent ]
exponent ::= ("e" | "E") [ "+" | "-" ] integer
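The following C sketch shows how such rules can map onto a recognizer. It hand-translates the three example rules into one function each and reads `integer` as one or more decimal digits (the `< ... >` form). The function names are illustrative only; this specification does not prescribe any particular parsing technique.

```c
#include <stdbool.h>
#include <stddef.h>

// Each function mirrors one production rule of the example grammar.
// On success it advances *pos past the matched input and returns true;
// on failure it leaves *pos unchanged and returns false.

static bool match_integer(const char *s, size_t *pos) {
    size_t p = *pos;
    // integer: one or more of "0" .. "9"
    while (s[p] >= '0' && s[p] <= '9')
        p++;
    if (p == *pos)
        return false;
    *pos = p;
    return true;
}

static bool match_exponent(const char *s, size_t *pos) {
    size_t p = *pos;
    // ( "e" | "E" )
    if (s[p] != 'e' && s[p] != 'E')
        return false;
    p++;
    // [ "+" | "-" ]
    if (s[p] == '+' || s[p] == '-')
        p++;
    // integer
    if (!match_integer(s, &p))
        return false;
    *pos = p;
    return true;
}

static bool match_float(const char *s, size_t *pos) {
    size_t p = *pos;
    // integer "." integer
    if (!match_integer(s, &p) || s[p] != '.')
        return false;
    p++;
    if (!match_integer(s, &p))
        return false;
    // [ exponent ]: optional, so a failed match is not an error
    match_exponent(s, &p);
    *pos = p;
    return true;
}

// True if the whole NUL-terminated string matches the float rule.
bool is_float(const char *s) {
    size_t pos = 0;
    return match_float(s, &pos) && s[pos] == '\0';
}
```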
A Flect source file consists of a series of Unicode code points encoded as UTF-8. To interpret such a source file correctly, a compiler must first pass these code points through the Flect lexical grammar, which produces a series of tokens. These tokens are then passed through the preprocessor grammar, which produces a filtered series of tokens. Finally, the filtered tokens are passed through the Flect syntactic grammar. The end result is then (presumably) a syntax tree used for further semantic analysis and finally code generation and/or execution.
The terminal symbols of the lexical grammar are Unicode code points, while the terminal symbols of the syntactic grammar are the tokens produced from lexical analysis.
The lexical grammar is given in this section. The syntactic grammar is given throughout this specification in relevant sections, with a full grammar in the final section.
token ::= directive
| operator-or-separator
| identifier
| keyword
| integer-literal
| float-literal
| character-literal
| string-literal
All white space in the Unicode Zs (Separator, Space) category must be dropped during lexical analysis, as must the horizontal tab, vertical tab, and form feed characters.
white-space ::= ? any Unicode category Zs character ?
| U+00000009
| U+0000000B
| U+0000000C
White space is thus ignored in the other productions in the lexical grammar and assumed not to be present in between the parts that make up the right-hand side of production rules.
Comments must be stripped during lexical analysis as with white space.
comment ::= line-comment | block-comment
line-comment ::= "//" { ? any character except U+0000000A ? } ( U+0000000A | ? end of file ? )
block-comment ::= "/*" { ? any character sequence except "*/" ? } "*/"
An implementation may choose to preserve comments for the purpose of documentation generation, but they shall have no effect on program semantics.
As with white space, comments are ignored in the lexical grammar's production rules and assumed not to be present in between the parts that make up the right-hand side of production rules.
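The following C function is a minimal sketch of the stripping described above. It skips white space and comments starting from a given position. The sketch assumes a NUL-terminated byte string. For brevity, it recognizes only the ASCII members of the white space production: U+0020 (the only ASCII character in category Zs), plus tab, vertical tab, and form feed. A full implementation would also need Unicode category tables.

Block comments do not nest, matching the production above. Because that production does not allow end of file, an unterminated block comment is reported as an error. The name `skip_trivia` is hypothetical.

```c
#include <stddef.h>

// Returns the index of the first byte at or after `pos` that is neither
// white space nor part of a comment, or (size_t)-1 if a block comment is
// left unterminated. Only ASCII white space is recognized in this sketch.
size_t skip_trivia(const char *s, size_t pos) {
    for (;;) {
        char c = s[pos];
        if (c == ' ' || c == '\t' || c == '\v' || c == '\f') {
            // U+0020, U+0009, U+000B, U+000C
            pos++;
        } else if (c == '/' && s[pos + 1] == '/') {
            // Line comment: runs up to and including the next line feed,
            // or up to end of file.
            pos += 2;
            while (s[pos] != '\0' && s[pos] != '\n')
                pos++;
            if (s[pos] == '\n')
                pos++;
        } else if (c == '/' && s[pos + 1] == '*') {
            // Block comment: runs up to the first closing star-slash.
            pos += 2;
            while (!(s[pos] == '*' && s[pos + 1] == '/')) {
                if (s[pos] == '\0')
                    return (size_t)-1;  // end of file inside a block comment
                pos++;
            }
            pos += 2;
        } else {
            return pos;
        }
    }
}
```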
Preprocessor directives are used to control conditional compilation of code. Unlike in other languages, preprocessing in Flect happens between lexing and parsing, so preprocessor directives only have to be tokenized during lexical analysis.
directive ::= "\\" identifier
These are various operators used in expressions, and separators used to delineate code elements.
operator-or-separator ::= operator | separator
operator ::= "+"
| "-"
| "->"
| "*"
| "/"
| "%"
| "&"
| "&&"
| "|"
| "||"
| "|>"
| "^"
| "~"
| "!"
| "!="
| "!=="
| "."
| ".."
| "@"
| "="
| "=="
| "==="
| "<"
| "<<"
| "<="
| "<|"
| ">"
| ">>"
| ">="
separator ::= "("
| ")"
| "{"
| "}"
| "["
| "]"
| ","
| ";"
| ":"
| "::"
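Several spellings above are prefixes of others: `-` and `->`; `!`, `!=`, and `!==`; `:` and `::`. The grammar does not itself state how a lexer chooses between them. The C sketch below assumes the common longest-match (maximal munch) rule, under which `!==` is always one token rather than `!=` followed by `=`. The helper `match_punctuator` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

// All operator and separator spellings from the grammar above.
static const char *const PUNCTUATORS[] = {
    "+", "-", "->", "*", "/", "%", "&", "&&", "|", "||", "|>", "^", "~",
    "!", "!=", "!==", ".", "..", "@", "=", "==", "===", "<", "<<", "<=",
    "<|", ">", ">>", ">=",
    "(", ")", "{", "}", "[", "]", ",", ";", ":", "::",
};

// Returns the length of the longest punctuator that starts at `s`,
// or 0 if no punctuator matches there.
size_t match_punctuator(const char *s) {
    size_t best = 0;
    for (size_t i = 0; i < sizeof PUNCTUATORS / sizeof PUNCTUATORS[0]; i++) {
        size_t len = strlen(PUNCTUATORS[i]);
        if (len > best && strncmp(s, PUNCTUATORS[i], len) == 0)
            best = len;
    }
    return best;
}
```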
Identifiers are used for naming types, functions, variables, and so on. They are simple alphanumeric sequences (underscores are also allowed) that must not begin with a digit.
identifier ::= ( "a" .. "z" | "A" .. "Z" | "_" ) { "a" .. "z" | "A" .. "Z" | "0" .. "9" | "_" }
All identifiers prefixed with `__` (two underscores) are reserved for future expansion of the language and for implementation-specific features.
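The following C sketch checks a complete word against the identifier production and the reserved `__` prefix. It assumes a NUL-terminated ASCII string, and both function names are hypothetical. Keywords also match the identifier production, so a lexer still has to check the keyword list separately.

```c
#include <stdbool.h>
#include <string.h>

// "a" .. "z" | "A" .. "Z" | "_"
static bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// The start characters plus "0" .. "9"
static bool is_ident_continue(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

// True if `s` matches the identifier production in full.
bool is_identifier(const char *s) {
    if (!is_ident_start(s[0]))
        return false;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!is_ident_continue(s[i]))
            return false;
    return true;
}

// True if `s` is an identifier in the reserved "__" namespace.
bool is_reserved_identifier(const char *s) {
    return is_identifier(s) && strncmp(s, "__", 2) == 0;
}
```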
Keywords are special identifiers used to direct syntactic analysis of Flect programs. They shall not be treated as identifiers.
Some keywords are reserved for future expansion of the language.
keyword ::= used-keyword | reserved-keyword
used-keyword ::= "mod"
| "use"
| "pub"
| "priv"
| "trait"
| "impl"
| "struct"
| "union"
| "enum"
| "type"
| "fn"
| "ext"
| "ref"
| "glob"
| "tls"
| "mut"
| "imm"
| "let"
| "as"
| "if"
| "else"
| "cond"
| "match"
| "loop"
| "while"
| "for"
| "break"
| "goto"
| "return"
| "safe"
| "unsafe"
| "asm"
| "true"
| "false"
| "null"
| "new"
| "assert"
| "in"
| "meta"
| "test"
| "macro"
| "quote"
| "unquote"
reserved-keyword ::= "yield"
| "fixed"
| "pragma"
| "scope"
| "move"
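Every keyword also matches the identifier production, so a lexer typically scans a word first and then classifies it. The following C sketch does this with a linear search over the two lists above. The names are hypothetical, and a real implementation might prefer a hash table. Type names such as `self` and `i32` are not keywords, so they classify as identifiers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *const USED_KEYWORDS[] = {
    "mod", "use", "pub", "priv", "trait", "impl", "struct", "union", "enum",
    "type", "fn", "ext", "ref", "glob", "tls", "mut", "imm", "let", "as",
    "if", "else", "cond", "match", "loop", "while", "for", "break", "goto",
    "return", "safe", "unsafe", "asm", "true", "false", "null", "new",
    "assert", "in", "meta", "test", "macro", "quote", "unquote",
};

static const char *const RESERVED_KEYWORDS[] = {
    "yield", "fixed", "pragma", "scope", "move",
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static bool in_list(const char *word, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, list[i]) == 0)
            return true;
    return false;
}

typedef enum { WORD_IDENTIFIER, WORD_KEYWORD, WORD_RESERVED } WordKind;

// Classifies a word that already matched the identifier production.
WordKind classify_word(const char *word) {
    if (in_list(word, USED_KEYWORDS, COUNT(USED_KEYWORDS)))
        return WORD_KEYWORD;
    if (in_list(word, RESERVED_KEYWORDS, COUNT(RESERVED_KEYWORDS)))
        return WORD_RESERVED;
    return WORD_IDENTIFIER;
}
```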
Literals are the fundamental building block of the Flect language. These are the values that can only be expressed directly as terminal symbols to the syntactic grammar and not through any other language construct.
An integer literal is a number with a base of 2, 8, 10, or 16 and an optional type specifier.
integer-literal ::= < "0" .. "9" >
| "0" ( "b" | "B" ) < "0" .. "1" >
| "0" ( "o" | "O" ) < "0" .. "7" >
| "0" ( "x" | "X" ) < "0" .. "9" | "a" .. "f" | "A" .. "F" >
typed-integer-literal ::= integer-literal [ ":" ( "i" | "u" ) [ "8" | "16" | "32" | "64" ] ]
A literal prefixed with `0b` is a binary integer literal, `0o` is an octal literal, and `0x` is a hexadecimal literal (the prefix letter may also be uppercase). If no prefix is present, the literal is decimal.
The suffix `:i8` means that the literal is interpreted as a signed 8-bit integer, while `:u8` means it is interpreted as an unsigned 8-bit integer, and so on. The special suffixes `:i` and `:u` mean word-sized integer types, signed and unsigned respectively. If no suffix is given, the literal's type is inferred during semantic analysis.
A floating point literal is an IEEE 754 floating point number consisting of an integral part, a fractional part, an optional exponent part, an optional exponent sign, and an optional type specifier.
float-literal ::= float-part "." float-part [ float-exponent ]
typed-float-literal ::= float-literal [ ":f" ( "32" | "64" ) ]
float-part ::= < "0" .. "9" >
float-exponent ::= ( "e" | "E" ) [ "+" | "-" ] float-part
If the suffix `:f32` is used, the number is interpreted as an IEEE 754 binary32 value. If the `:f64` suffix is used, it is interpreted as an IEEE 754 binary64 value. If no suffix is given, the literal's type is inferred during semantic analysis.
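The same digits denote different values under `:f32` and `:f64` whenever the decimal value is not exactly representable. The C sketch below illustrates this with the standard `strtof` and `strtod` functions. On common IEEE 754 platforms these round to the nearest binary32 and binary64 value, and they read `.` as the decimal point in the default "C" locale.

Converting directly with `strtof` matters. Rounding to binary64 first and then to binary32 can, in rare cases, give a different result (double rounding). The helper names are hypothetical.

```c
#include <stdlib.h>

// Value of a float literal's digits (suffix already removed) when the
// literal carries :f32, i.e. rounded once to a binary32 value.
float literal_as_f32(const char *digits) {
    return strtof(digits, NULL);
}

// The same digits under :f64, rounded to a binary64 value.
double literal_as_f64(const char *digits) {
    return strtod(digits, NULL);
}
```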
A character literal is a single Unicode code point. Its value is the number of the code point.
character-literal ::= "'" ( ? any character except "'" and "\\" ? | character-escape-sequence ) "'"
character-escape-sequence ::= "\\" ( character-escape-code | character-escape-unicode )
character-escape-code ::= "0" | "a" | "b" | "f" | "n" | "r" | "t" | "v" | "'" | "\\"
character-escape-unicode ::= "u" ( ( "0" .. "9" | "a" .. "f" | "A" .. "F" ) * 8 )
An escape sequence can be used to form a special character as shown in the following table.
| Escape sequence | Character name | Unicode code point |
|---|---|---|
| `\0` | Null | U+00000000 |
| `\a` | Alert | U+00000007 |
| `\b` | Backspace | U+00000008 |
| `\f` | Form feed | U+0000000C |
| `\n` | Line feed | U+0000000A |
| `\r` | Carriage return | U+0000000D |
| `\t` | Horizontal tab | U+00000009 |
| `\v` | Vertical tab | U+0000000B |
| `\'` | Single quote | U+00000027 |
| `\\` | Backslash | U+0000005C |
| `\uPPPPPPPP` | Code point | U+PPPPPPPP |
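The following C sketch decodes the text between the quotes of a character literal into its code point value. It is hypothetical and handles only ASCII characters outside escape sequences; a real lexer would decode UTF-8. It also assumes that a `\u` escape above U+0010FFFF, the largest Unicode code point, is an error, which the grammar alone does not rule out.

```c
#include <stdint.h>

// Maps the character after a backslash to its code point, or -1.
static int32_t simple_escape(char c) {
    switch (c) {
    case '0':  return 0x00;  // null
    case 'a':  return 0x07;  // alert
    case 'b':  return 0x08;  // backspace
    case 'f':  return 0x0C;  // form feed
    case 'n':  return 0x0A;  // line feed
    case 'r':  return 0x0D;  // carriage return
    case 't':  return 0x09;  // horizontal tab
    case 'v':  return 0x0B;  // vertical tab
    case '\'': return 0x27;  // single quote
    case '\\': return 0x5C;  // backslash
    default:   return -1;
    }
}

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the text between the quotes of a character literal.
// Returns the code point, or -1 if the text is not exactly one ASCII
// character or one escape sequence.
int32_t decode_char_literal(const char *body) {
    unsigned char first = (unsigned char)body[0];
    if (first == '\0' || first == '\'' || first >= 0x80)
        return -1;  // empty, bare quote, or non-ASCII (not handled here)
    if (first != '\\')
        return body[1] == '\0' ? (int32_t)first : -1;
    if (body[1] == 'u') {
        // \u followed by exactly eight hexadecimal digits
        uint32_t cp = 0;
        for (int i = 2; i < 10; i++) {
            int h = hex_value(body[i]);
            if (h < 0)
                return -1;
            cp = cp * 16 + (uint32_t)h;
        }
        if (body[10] != '\0' || cp > 0x10FFFF)
            return -1;
        return (int32_t)cp;
    }
    int32_t cp = simple_escape(body[1]);
    return (cp >= 0 && body[2] == '\0') ? cp : -1;
}
```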
A string literal is a series of Unicode code points encoded as UTF-8.
string-literal ::= "\"" { ? any character except "\"" and "\\" ? | string-escape-sequence } "\""
string-escape-sequence ::= "\\" ( string-escape-code | string-escape-unicode )
string-escape-code ::= "0" | "a" | "b" | "f" | "n" | "r" | "t" | "v" | "\"" | "\\"
string-escape-unicode ::= "u" ( ( "0" .. "9" | "a" .. "f" | "A" .. "F" ) * 8 )
An escape sequence can be used to form a special character as shown in the following table.
| Escape sequence | Character name | Unicode code point |
|---|---|---|
| `\0` | Null | U+00000000 |
| `\a` | Alert | U+00000007 |
| `\b` | Backspace | U+00000008 |
| `\f` | Form feed | U+0000000C |
| `\n` | Line feed | U+0000000A |
| `\r` | Carriage return | U+0000000D |
| `\t` | Horizontal tab | U+00000009 |
| `\v` | Vertical tab | U+0000000B |
| `\\` | Backslash | U+0000005C |
| `\"` | Double quote | U+00000022 |
| `\uPPPPPPPP` | Code point | U+PPPPPPPP |
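A string literal is stored as UTF-8, so each `\uPPPPPPPP` escape has to be re-encoded into one to four bytes. The following C sketch shows that encoding step. The name `utf8_encode` is hypothetical. The sketch assumes that two kinds of value are rejected: code points above U+0010FFFF, and the surrogate range U+D800 to U+DFFF, which has no valid UTF-8 encoding. This specification does not state how such escapes are handled.

```c
#include <stddef.h>
#include <stdint.h>

// Writes the UTF-8 encoding of code point `cp` to `out` (which must have
// room for 4 bytes) and returns the number of bytes written, or 0 if `cp`
// is a surrogate or lies outside the Unicode code space.
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;  // surrogates cannot be encoded in UTF-8
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;  // beyond U+10FFFF
}
```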
This section lists a few grammar elements that are commonly used throughout the syntactic grammar.
A qualified identifier is an unambiguous name referring to a module, type, function, variable, or macro (depending on the lexical context).
qualified-identifier ::= identifier { "::" identifier }
A Flect program consists of one or more modules which make up a bundle (which is a static library, shared library, or executable). These concepts are the fundamental building blocks for abstractions in Flect.
program ::= { module-declaration }
The module is the fundamental unit of encapsulation and reuse in Flect. It is a container of declarations; that is, types, functions, variables, and macros. A module has a visibility (`pub` or `priv`) which indicates whether or not it can be imported outside its containing bundle. Each declaration in the module also has a visibility which indicates whether that declaration can be accessed outside the module at all.
A source file can contain multiple module declarations; modules have no particular semantic relationship with the source file(s) they reside in.
Modules cannot be lexically nested. However, nested module namespaces are allowed (that is, one module declaration's name can be `foo::bar` and another's `foo::bar::baz`).
module-declaration ::= "mod" qualified-identifier "{" { declaration } "}"
By convention, module names should contain at least one prefixing component corresponding to the bundle they are part of. For example, if a bundle `math` (`libmath`) has a module for vector math, this module might be `math::vector`.
Note that all module names prefixed with a component equal to `core` (core run-time modules), `std` (standard library modules), `etc` (implementation-specific modules), or `exp` (experimental modules for future inclusion into one of the former) are reserved for use by implementations of the Flect language.
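A compiler could check the reserved prefixes by comparing the first component of a module name, as in this hypothetical C sketch. Only whole components count, so `stdx::io` is not reserved while `std::io` is.

```c
#include <stdbool.h>
#include <string.h>

// True if the first component of a qualified module name (the text before
// the first "::", or the whole name) is reserved for implementations.
bool is_reserved_module_name(const char *name) {
    static const char *const reserved[] = { "core", "std", "etc", "exp" };
    const char *sep = strstr(name, "::");
    size_t first_len = sep ? (size_t)(sep - name) : strlen(name);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (strlen(reserved[i]) == first_len &&
            strncmp(name, reserved[i], first_len) == 0)
            return true;
    }
    return false;
}
```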
A bundle is a collection of Flect modules. A bundle can be either a static library, a shared library, or an executable. Bundles are the primary mechanism through which code is packaged and distributed for use or reuse.
Bundles have no direct representation in the language as they are purely an aspect of Flect's compilation model.
It shall be an error for multiple modules with the same full name to exist in the same bundle. It is also an error if two modules with the same full name are present during compilation of a bundle. Suppose, for instance, that bundles A and B both have modules called `foo`. Bundle C now links to bundles A and B. Since two modules named `foo` exist, name resolution is ambiguous in bundle C, and an error shall therefore be issued.
Note that the bundle names `core`, `std`, `etc`, and `exp` are reserved for implementations of the Flect language.
The type system in Flect consists of integers, floating point numbers, the `bool` and `unit` types, tuples, structures, discriminated unions, arrays, vectors, various pointer types, function pointer types (with closures), and finally, user-defined nominal types. Inner type qualifiers (`mut`, `imm`) are used to construct types with different levels of mutation guarantees. Traits and implementations (effectively a type class system) are used to aid in writing type-generic code.
type ::= nominal-type
| tuple-type
| function-type
| array-type
| vector-type
| pointer-type
Note that while some type names (`i8`, `f32`, `self`, etc.) get special treatment during semantic analysis, they are not keywords. They are treated as regular identifiers by lexical analysis and can be used as such.
There is a special grammar rule for function return types:
return-type ::= type | "!"
The `!` return type indicates that the function diverges; that is, it never returns to its caller, so it has no return type. This is primarily intended for functions such as the `abort` function in the standard C library.
nominal-type ::= named-type [ type-arguments ]
named-type ::= integer-type
| float-type
| bool-type
| unit-type
| self-type
| qualified-identifier
type-arguments ::= "[" type { "," type } "]"
integer-type ::= "i8" | "i16" | "i32" | "i64" | "int" | "u8" | "u16" | "u32" | "u64" | "uint"
float-type ::= "f32" | "f64"
bool-type ::= "bool"
unit-type ::= "unit"
self-type ::= "self"
tuple-type ::= "(" type < "," type > ")"
function-type ::= function-pointer-type | closure-pointer-type
function-pointer-type ::= "fn" [ function-type-convention ] function-type-parameters "->" return-type
function-type-convention ::= "ext" string-literal
function-type-parameters ::= "(" [ function-type-parameter { "," function-type-parameter } ] ")"
function-type-parameter ::= [ "mut" ] [ "ref" ] type
closure-pointer-type ::= "fn" "@" function-type-parameters "->" return-type
array-type ::= managed-array-type | unsafe-array-type | general-array-type
managed-array-type ::= "@" [ "mut" | "imm" ] "[" type "]"
unsafe-array-type ::= "*" [ "mut" | "imm" ] "[" type "]"
general-array-type ::= "&" [ "mut" | "imm" ] "[" type "]"
vector-type ::= "[" type ".." integer-literal "]"
pointer-type ::= managed-pointer-type | unsafe-pointer-type | general-pointer-type
managed-pointer-type ::= "@" [ "mut" | "imm" ] type
unsafe-pointer-type ::= "*" [ "mut" | "imm" ] type
general-pointer-type ::= "&" [ "mut" | "imm" ] type
attribute ::= "@" "[" attribute-name [ attribute-arguments ] "]"
attribute-name ::= keyword | identifier
attribute-arguments ::= "(" [ attribute-argument { "," attribute-argument } ] ")"
attribute-argument ::= attribute-name [ "=" attribute-value ]
attribute-value ::= attribute-literal | attribute-arguments
attribute-literal ::= typed-integer-literal | typed-float-literal | character-literal | string-literal
The application binary interface (ABI) specifies certain conventions that shall be followed when compiling Flect source code to machine code.
When compiled to an object file format (such as ELF or PE/COFF), Flect functions (`fn` declarations) should have their names mangled according to the following procedure:
- Start out with the string `fl__`.
- Take the full module name of the module containing the function and replace all instances of `::` with `_`.
- Append the adjusted module name.
- Append two underscores (`__`).
- Append the name of the function.
For example, a function `do_stuff` in a module `foo::bar` shall be mangled as `fl__foo_bar__do_stuff`.
Note that only functions with `flect` linkage need to be mangled. Functions with any other linkage, such as `cdecl`, follow the name mangling rules of the ABI on the target platform.
Global variables (`glob` declarations) and constants (`const` declarations) are also subject to name mangling according to this procedure:
- Start out with the string `fl_g__` (for global variables) or `fl_c__` (for constants).
- Take the full module name of the module containing the global variable or constant and replace all instances of `::` with `_`.
- Append the adjusted module name.
- Append two underscores (`__`).
- Append the name of the global variable or constant.
For example, a global variable `data` in a module `foo::bar` shall be mangled as `fl_g__foo_bar__data`. A constant `table` in a module `bar::baz` shall be mangled as `fl_c__bar_baz__table`.
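The mangling procedures for functions, global variables, and constants can be sketched as a single C function. It is a hypothetical illustration, not a required API. `mangle` writes a NUL-terminated symbol into a caller-supplied buffer and fails if the buffer is too small. Only functions with `flect` linkage are mangled this way.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SYM_FUNCTION, SYM_GLOBAL, SYM_CONSTANT } SymbolKind;

// Appends n bytes of s at out[*len], keeping out NUL-terminated.
// Returns 0 (and appends nothing) if the result would not fit in cap.
static int append(char *out, size_t cap, size_t *len, const char *s, size_t n) {
    if (*len + n >= cap)
        return 0;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 1;
}

// Mangles declaration `name` in module `module` (e.g. "foo::bar").
// Returns 1 on success, 0 if `out` (of size `cap`) is too small.
int mangle(SymbolKind kind, const char *module, const char *name,
           char *out, size_t cap) {
    const char *prefix = kind == SYM_FUNCTION ? "fl__"
                       : kind == SYM_GLOBAL   ? "fl_g__"
                                              : "fl_c__";
    size_t len = 0;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    if (!append(out, cap, &len, prefix, strlen(prefix)))
        return 0;
    // Copy the module name, replacing each "::" with "_".
    for (const char *p = module; *p != '\0'; p++) {
        int ok;
        if (p[0] == ':' && p[1] == ':') {
            ok = append(out, cap, &len, "_", 1);
            p++;  // skip the second ':'
        } else {
            ok = append(out, cap, &len, p, 1);
        }
        if (!ok)
            return 0;
    }
    // Two underscores, then the declaration's own name.
    return append(out, cap, &len, "__", 2) &&
           append(out, cap, &len, name, strlen(name));
}
```

For example, `mangle(SYM_FUNCTION, "foo::bar", "do_stuff", buf, sizeof buf)` stores `fl__foo_bar__do_stuff` in `buf`, matching the example above.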
The memory layout of the program stack is implementation-defined. It is, however, recommended that implementations follow the requirements of the target platform's C ABI.
Structures (`struct` declarations) shall map 1:1 to C `struct`s on the target platform; that is, they shall follow the same alignment and padding rules as the target platform's C ABI specifies.
For instance, consider this structure:
pub struct Foo {
    pub x : u32;
    pub y : f64;
}
This will compile to a C `struct` like this:
#include <stdint.h>

struct Foo {
    uint32_t x;
    double y;
};
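The padding that the target's C ABI inserts can be inspected directly with `offsetof`. In this C sketch, `struct Foo` mirrors the Flect structure above, using `uint32_t` for `u32`; `foo_padding_after_x` is a hypothetical helper. On x86-64 with the System V ABI, for instance, `double` is 8-byte aligned, so 4 bytes of padding follow `x`. Other ABIs may differ, which is why the layout is defined by the platform's C rules.

```c
#include <stddef.h>
#include <stdint.h>

// C equivalent of the Flect structure Foo.
struct Foo {
    uint32_t x;
    double y;
};

// Bytes of padding the C compiler inserted between x and y.
size_t foo_padding_after_x(void) {
    return offsetof(struct Foo, y) - sizeof(uint32_t);
}
```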
Enumerations (`enum` declarations) shall compile down to the underlying type specified as part of the declaration.
Take for instance this enumeration:
enum Foo : u16 {
    Bar = 0;
    Baz = 1;
    Qux = 2;
}
Whenever a value of type `Foo` is created, it shall compile directly to the `u16` equivalent. For instance, `Foo.Qux` shall compile to `2:u16`.
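A C header exposing `Foo` to C code might represent it as in the following sketch. The type is a plain `uint16_t` rather than a C `enum`, because the underlying type of a C enumeration is chosen by the compiler. The `FOO_*` names and `foo_name` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// C view of `enum Foo : u16`: the type is just its underlying integer,
// and each case is a constant of that type.
typedef uint16_t Foo;

#define FOO_BAR ((Foo)0)
#define FOO_BAZ ((Foo)1)
#define FOO_QUX ((Foo)2)

// Returns the case name for a Foo value, or NULL for a value with no
// matching case (a C caller can pass any uint16_t).
const char *foo_name(Foo value) {
    switch (value) {
    case FOO_BAR: return "Bar";
    case FOO_BAZ: return "Baz";
    case FOO_QUX: return "Qux";
    default:      return NULL;
    }
}
```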
Unions (`union` declarations) shall compile down to C `struct`s where the first field is a `uint` tag describing the union case the instance represents. The rest of the resulting `struct` is mostly opaque, but the remaining space must be large enough to hold all fields in the largest union case.
For example, consider this discriminated union:
pub union Union {
    Foo {
        pub x : i32;
    }
    Bar {
        pub x : i32;
        pub y : i32;
    }
}
This would compile down to this C code:
struct Union {
    size_t tag;
    char data[8];
};
The `data` field's size is the size of the largest case in the union. This size may differ depending on the alignment and padding rules of the target platform; the above is only what the `struct` would look like on a 32-bit x86 processor.
Whenever a `union` is matched against, the `data` field is simply reinterpreted as the relevant union case's memory. For the purposes of memory layout, a union case can be thought of as a structure by itself.
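The matching step can be sketched in C as follows, with each case written as its own structure as described above. The tag values (0 for `Foo`, 1 for `Bar`) are an assumption, since this section does not specify how cases are numbered. `make_bar` and `union_sum` are hypothetical helpers. The sketch copies bytes with `memcpy` instead of casting `data` to a case pointer, which avoids alignment and strict-aliasing problems in C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Each union case viewed as a standalone structure.
struct Union_Foo { int32_t x; };
struct Union_Bar { int32_t x; int32_t y; };

enum { UNION_FOO = 0, UNION_BAR = 1 };  // assumed tag numbering

struct Union {
    size_t tag;                                    // which case is stored
    unsigned char data[sizeof(struct Union_Bar)];  // room for the largest case
};

// Builds a Union holding the Bar case.
struct Union make_bar(int32_t x, int32_t y) {
    struct Union u = { UNION_BAR, {0} };
    struct Union_Bar bar = { x, y };
    memcpy(u.data, &bar, sizeof bar);
    return u;
}

// "Matching" on the union: check the tag, then reinterpret `data`
// as that case's structure.
int32_t union_sum(const struct Union *u) {
    if (u->tag == UNION_FOO) {
        struct Union_Foo foo;
        memcpy(&foo, u->data, sizeof foo);
        return foo.x;
    }
    struct Union_Bar bar;
    memcpy(&bar, u->data, sizeof bar);
    return bar.x + bar.y;
}
```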
The default (`flect`) linkage uses an implementation-defined calling convention. This specification does not dictate any aspects of it, but does recommend that implementations use a commonly supported calling convention such as `cdecl`.
All other linkage types are subject to whatever rules are dictated by the target platform's C ABI.