Artiweb > Pull Request #3099

A new grammar for configuration #3099

Github

Description

This PR proposes an own grammar for KeY settings and configuration.

Status Quo

Currently, we store our settings inside of Java Properties files. Either for global settings inside $HOME/.key or inside of proof files. Java Properties are a simple and easy solution, but also have some disadvantages. For example,

  • the lack of type safety (everything is a string),
  • missing positional information (we are not able to give error messages regarding incorrect settings),
  • no support for adding comments, and
  • no complex data structures.

Proposal

I propose an own grammar, written an ANTLR4, for KeY's configuration that is compatible with JSON and the grammar of KeY.

Why an own grammar? (or why not JSON, TOML or Co.)

There are many popular grammars for configuration files, e.g., JSON, TOML and YAML. I decided against them for the following reason:

  • The grammar must be embeddable into our KeY files and grammar.
  • No external dependencies and complexity. The grammar is small, and parsing is done in a few hundred boring LoC.
  • No support for storing additional information, like comments.
  • We can still use it to parse old properties files.

The grammar

The grammar is an extension of JSON, allowing comments, multiline strings, non-quoted keys, etc. An example is given in the comments below.

The grammar is an recursive definition of values, which can simple express as an ANTLR4 grammar.

antlr4 // Config cfile: cvalue*; ckv: doc=DOC_COMMENT? ckey ':' cvalue; ckey: IDENT | STRING_LITERAL; cvalue: IDENT #csymbol | STRING_LITERAL #cstring | BIN_LITERAL #cintb | HEX_LITERAL #cinth | MINUS? INT_LITERAL #cintd | FLOAT_LITERAL #cfpf | DOUBLE_LITERAL #cfpd | REAL_LITERAL #cfpr | (TRUE|FALSE) #cbool | LBRACE (ckv (COMMA ckv)*)? COMMA? RBRACE #table | LBRACKET (cvalue (COMMA cvalue)*)? RBRACKET #986

Changes

  1. Settings inside proof files would no longer be strings instead the section follows the grammar above.
  2. All configuration files inside $HOME/.key will follow the new grammar.
  3. The interface for settings (#writeSettings and #readSettings) is extended with two new methods which take an object of Map<String, Object> instead of a Properties object.

Migration strategies

  1. We keep the handling of properties in place. And have switches to load "json" over "props" files from the .key directory.
  2. In KeY files, we decide based on the grammar, (STRING_LITERAL vs. JSON-object) which settings handling to trigger.

The properties handling will be marked as deprecated and will be removed in the future.


Builds upon #3031 which is currently not merged.


Request for comments: @unp1 @mattulbrich @WolframPfeifer @lks9


The grammar OLD & DEPRECATED

The grammar is an extension of the INI files, where the value of entries can be a type-safe and complex data structure. For example the entry for a solver can be look like this:

[SMT_SOLVER] Z3 = { name = z3, path = "$HOME/bin/z3", flags = [ "-T", 25, "-smt2" ] }

  • Categories. The grammar allows to have non-nested category marked with brackets: [category]. The scope of category is valid until the nest category or the end of file.
  • Entry. An entry is a key-value pair separated by an equal sign =. The key is just a string, whereas the value is one of the following data structures:
  • Boolean: true or false
  • Integer: any number without a dot
  • Float: any number with a dot
  • String: marked with double quotes
  • List: A list is given with brackets, e.g, [1,2,3]
  • Map: A structure with named sub-elements.
  • Fallback: A list of identifiers without any double quotes or something (e.g., a b c), is treated as a string like in Properties files.

Note, this grammar is very similar to TOML files.

Grammar in detail

```antlr grammar Config;

file: (section? kv); section: LBRACKET SYMBOL RBRACKET; kv: (SYMBOL|STRING) EQUAL value; value: SYMBOL #symbol | STRING #string | NUMBER #num | FLOATING #fp | BOOLEAN #bool | LBRACE (kv (COMMA kv))? RBRACE #table | LBRACKET (value (COMMA value)*)? RBRACKET #list;

COMMENT: '#' ~('\n'|'\r') -> channel(HIDDEN); ML_COMMENT: '/' (ML_COMMENT|.)? '/'->channel(HIDDEN); WS: ('\n'|'\r'|' '|'\f'|'\t')+ -> channel(HIDDEN); COMMA:','; BOOLEAN: 'true'|'false'|'FALSE'|'TRUE'; FLOATING: [+-]? [0-9] '.' [0-9]+; NUMBER: [+-]? [0-9]+; STRING: '"' ~( '"' ) '"'; RBRACE:'}'; LBRACE:'{'; LBRACKET: '['; RBRACKET: ']'; EQUAL: '='; SYMBOL: [a-zA-Z] [a-zA-Z]*; ```

Artifacts