A new grammar for configuration #3099
This PR proposes an own grammar for KeY settings and configuration.
Status Quo
Currently, we store our settings inside of Java Properties files. Either for global settings inside $HOME/.key
or inside of proof files. Java Properties are a simple and easy solution, but also have some disadvantages. For example,
- the lack of type safety (everything is a string),
- missing positional information (we are not able to give error messages regarding incorrect settings),
- no support for adding comments, and
- no complex data structures.
Proposal
I propose an own grammar, written an ANTLR4, for KeY's configuration that is compatible with JSON and the grammar of KeY.
Why an own grammar? (or why not JSON, TOML or Co.)
There are many popular grammars for configuration files, e.g., JSON, TOML and YAML. I decided against them for the following reason:
- The grammar must be embeddable into our KeY files and grammar.
- No external dependencies and complexity. The grammar is small, and parsing is done in a few hundred boring LoC.
- No support for storing additional information, like comments.
- We can still use it to parse old properties files.
The grammar
The grammar is an extension of JSON, allowing comments, multiline strings, non-quoted keys, etc. An example is given in the comments below.
The grammar is an recursive definition of values, which can simple express as an ANTLR4 grammar.
antlr4
// Config
cfile: cvalue*;
ckv: doc=DOC_COMMENT? ckey ':' cvalue;
ckey: IDENT | STRING_LITERAL;
cvalue:
IDENT #csymbol
| STRING_LITERAL #cstring
| BIN_LITERAL #cintb
| HEX_LITERAL #cinth
| MINUS? INT_LITERAL #cintd
| FLOAT_LITERAL #cfpf
| DOUBLE_LITERAL #cfpd
| REAL_LITERAL #cfpr
| (TRUE|FALSE) #cbool
| LBRACE
(ckv (COMMA ckv)*)? COMMA?
RBRACE #table
| LBRACKET (cvalue (COMMA cvalue)*)? RBRACKET #986
Changes
- Settings inside proof files would no longer be strings instead the section follows the grammar above.
- All configuration files inside
$HOME/.key
will follow the new grammar. - The interface for settings (
#writeSettings
and#readSettings
) is extended with two new methods which take an object ofMap<String, Object>
instead of aProperties
object.
Migration strategies
- We keep the handling of properties in place. And have switches to load "json" over "props" files from the
.key
directory. - In KeY files, we decide based on the grammar, (STRING_LITERAL vs. JSON-object) which settings handling to trigger.
The properties handling will be marked as deprecated and will be removed in the future.
Builds upon #3031 which is currently not merged.
Request for comments: @unp1 @mattulbrich @WolframPfeifer @lks9
The grammar OLD & DEPRECATED
The grammar is an extension of the INI files, where the value of entries can be a type-safe and complex data structure. For example the entry for a solver can be look like this:
[SMT_SOLVER]
Z3 = {
name = z3,
path = "$HOME/bin/z3",
flags = [ "-T", 25, "-smt2" ]
}
- Categories. The grammar allows to have non-nested category marked with brackets:
[category]
. The scope of category is valid until the nest category or the end of file. - Entry. An entry is a key-value pair separated by an equal sign
=
. The key is just a string, whereas the value is one of the following data structures: - Boolean:
true
orfalse
- Integer: any number without a dot
- Float: any number with a dot
- String: marked with double quotes
- List: A list is given with brackets, e.g, [1,2,3]
- Map: A structure with named sub-elements.
- Fallback: A list of identifiers without any double quotes or something (e.g.,
a b c
), is treated as a string like in Properties files.
Note, this grammar is very similar to TOML files.
Grammar in detail
```antlr grammar Config;
file: (section? kv); section: LBRACKET SYMBOL RBRACKET; kv: (SYMBOL|STRING) EQUAL value; value: SYMBOL #symbol | STRING #string | NUMBER #num | FLOATING #fp | BOOLEAN #bool | LBRACE (kv (COMMA kv))? RBRACE #table | LBRACKET (value (COMMA value)*)? RBRACKET #list;
COMMENT: '#' ~('\n'|'\r') -> channel(HIDDEN); ML_COMMENT: '/' (ML_COMMENT|.)? '/'->channel(HIDDEN); WS: ('\n'|'\r'|' '|'\f'|'\t')+ -> channel(HIDDEN); COMMA:','; BOOLEAN: 'true'|'false'|'FALSE'|'TRUE'; FLOATING: [+-]? [0-9] '.' [0-9]+; NUMBER: [+-]? [0-9]+; STRING: '"' ~( '"' ) '"'; RBRACE:'}'; LBRACE:'{'; LBRACKET: '['; RBRACKET: ']'; EQUAL: '='; SYMBOL: [a-zA-Z] [a-zA-Z]*; ```
Artifacts
- 30. May 2023 15:33 (170.22 kB large)
- 21. Apr 2023 10:46 (9986.96 kB large)
- 17. Apr 2023 14:24 (5240.87 kB large)
- 17. Apr 2023 12:55 (170.07 kB large)
- 14. Apr 2023 12:34 (170.20 kB large)
- 07. Apr 2023 16:04 (7329.03 kB large)
- 06. Apr 2023 19:28 (7225.19 kB large)
- 05. Apr 2023 13:04 (6334.10 kB large)
- 05. Apr 2023 12:54 (6333.46 kB large)
- 03. Apr 2023 17:21 (167.77 kB large)
- 03. Apr 2023 13:40 (4454.34 kB large)