KDL v2 #21

jaxter184 · 2025-02-14T07:39:12Z

I've started working on the v2 changes and am opening this PR to communicate interest in working on them. If someone else is already on it, feel free to close this PR.

Below are the contents of https://github.com/kdl-org/kdl/blob/main/CHANGELOG.md, organized to show the relevant changes that need to be made:

general

keywords

null, true, and false are now #null, #true, and #false. Using the unprefixed versions of these values is a syntax error.
#inf, #-inf, and #nan have been added in order to properly support IEEE floats for implementations that choose to represent their decimals that way.
Correspondingly, the identifiers inf, -inf, and nan are now syntax errors.
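The keyword rules above could be checked with a small classifier along these lines. This is only a sketch to make the rule concrete, not knuffel's actual code: the `Keyword` type and the `classify_keyword` function are hypothetical names invented for illustration.

```rust
/// Hypothetical value type for KDL v2 keywords (not knuffel's AST).
#[derive(Debug, PartialEq)]
enum Keyword {
    Null,
    Bool(bool),
    Float(f64),
}

/// Classifies a bare (unquoted) token under the v2 keyword rules.
/// Returns `Ok(Some(_))` for a `#`-prefixed keyword, `Err(_)` for the
/// unprefixed spellings that v2 rejects, and `Ok(None)` for any other
/// token, which this sketch does not inspect further.
fn classify_keyword(token: &str) -> Result<Option<Keyword>, String> {
    match token {
        "#null" => Ok(Some(Keyword::Null)),
        "#true" => Ok(Some(Keyword::Bool(true))),
        "#false" => Ok(Some(Keyword::Bool(false))),
        "#inf" => Ok(Some(Keyword::Float(f64::INFINITY))),
        "#-inf" => Ok(Some(Keyword::Float(f64::NEG_INFINITY))),
        "#nan" => Ok(Some(Keyword::Float(f64::NAN))),
        // The unprefixed spellings are syntax errors in v2.
        "null" | "true" | "false" | "inf" | "-inf" | "nan" => Err(format!(
            "`{}` must be written as `#{}` in KDL v2",
            token, token
        )),
        _ => Ok(None),
    }
}
```

Since NaN never compares equal to itself, a caller checking for `#nan` would need `f64::is_nan` rather than `==`.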

comments

node-space is now allowed as whitespace after a slashdash, meaning line continuations will work now.
Slashdash (/-)-compatible locations adjusted to be more clear and intuitive. They can now be used in exactly three different places: before nodes, before entire entries, or before entire child blocks.
- Slashdash (/-)-compatible locations and related grammar adjusted to be more clear and intuitive. This includes some changes relating to whitespace, including comments and newlines, which are breaking changes. (duplicate)
- Furthermore, the ordering of slashdashed elements has been restricted such that a slashdashed child block cannot go before an entry (including slashdashed entries).
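To make the ordering restriction concrete, here is a rough sketch of a check over the elements that follow a node name. The `Element` enum and `check_slashdash_order` are hypothetical names, not part of knuffel, and the sketch assumes that a regular (non-slashdashed) child block may not precede an entry either.

```rust
/// Hypothetical model of what can follow a node name: an entry
/// (argument or property) or a child block, each optionally slashdashed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Element {
    Entry { slashdashed: bool },
    Children { slashdashed: bool },
}

/// Rejects any entry, slashdashed or not, that appears after a child
/// block. Assumption: this treats regular and slashdashed child blocks
/// the same way; other v2 rules are not checked here.
fn check_slashdash_order(elements: &[Element]) -> Result<(), String> {
    let mut seen_children = false;
    for (index, element) in elements.iter().enumerate() {
        match element {
            Element::Children { .. } => seen_children = true,
            Element::Entry { slashdashed } if seen_children => {
                let kind = if *slashdashed { "slashdashed entry" } else { "entry" };
                return Err(format!(
                    "{} at position {} follows a child block",
                    kind, index
                ));
            }
            Element::Entry { .. } => {}
        }
    }
    Ok(())
}
```

Under this model, something like `node 1 /-2 /-{ a } { b }` passes, while `node /-{ a } /-1` is rejected.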

string characters

Solidus/Forward slash (/) is no longer an escaped character.
Space (U+0020) can now be written into quoted strings with the \s escape.
- ~~\s is now a valid escape within a string, representing a space character.~~ (duplicate)
~~Single line comments (//) can now be immediately followed by a newline.~~ (seems to already have been the case in knuffel)
All literal whitespace following a \ in a string is now discarded.
Vertical tabs (U+000B) are now considered to be newlines.
The grammar syntax itself has been described, and some confusing definitions in the grammar have been fixed accordingly (mostly related to escaped characters).
Comma (,), <, and > are now legal identifier characters. They were previously reserved for KQL but this is no longer necessary.
# is no longer a legal identifier character.
~~Equals signs other than = are no longer supported in properties.~~ (seems to already have been the case in knuffel)
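The escape changes in this section (the new \s escape, the removal of \/, and whitespace being discarded after a backslash) might look roughly like this in an unescaping routine. This is a simplified sketch, not knuffel's implementation: `unescape` and `is_kdl_whitespace` are hypothetical helpers, `\u{...}` escapes are left out, and the whitespace test covers only a handful of code points.

```rust
/// Resolves backslash escapes in the body of a quoted string.
/// `\u{...}` is omitted for brevity and is reported as unknown.
fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('b') => out.push('\u{0008}'),
            Some('f') => out.push('\u{000C}'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            // New in v2: `\s` is a space (U+0020).
            Some('s') => out.push(' '),
            // New in v2: `\/` is no longer an escape.
            Some('/') => return Err("`\\/` is not an escape in KDL v2".to_string()),
            // New in v2: literal whitespace after a backslash is discarded.
            Some(w) if is_kdl_whitespace(w) => {
                while chars.peek().map_or(false, |&next| is_kdl_whitespace(next)) {
                    chars.next();
                }
            }
            Some(other) => return Err(format!("unknown escape `\\{}`", other)),
            None => return Err("string ends with a lone backslash".to_string()),
        }
    }
    Ok(out)
}

/// Simplified whitespace test (an assumption for this sketch). The spec
/// lists more whitespace and newline code points than are checked here.
/// Vertical tab (U+000B) is included because v2 treats it as a newline.
fn is_kdl_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r' | '\u{000B}' | '\u{000C}')
}
```

With this sketch, a backslash followed by a line break and indentation joins the two lines without inserting any whitespace.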

code points

Code points under 0x20 (except newline and whitespace code points), code points above 0x10FFFF, the Delete control character (0x7F), and the Unicode "direction control" characters are now completely banned from appearing literally in KDL documents. They can now only be represented in regular strings, and there are no facilities to represent them in raw strings. This should be considered a security improvement.
Code points have been constrained to Unicode Scalar Values only, including values used in string escapes (\u{}). All KDL documents and string values should be valid UTF-8 now, as was intended.
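A sketch of how the code point rules above could be expressed in Rust. Both function names are hypothetical, and the exact whitespace and direction-control ranges are my reading of the spec, so they should be double-checked against it. The byte order mark is left out because its validity depends on its position in the document.

```rust
/// Returns true if `c` may not appear literally in a KDL v2 document.
/// Rust's `char` is already a Unicode scalar value, so surrogates and
/// values above 0x10FFFF cannot reach this function at all.
fn is_disallowed_literal(c: char) -> bool {
    let cp = c as u32;
    // Assumed list of whitespace/newline code points below 0x20 that
    // remain legal: tab, LF, vertical tab, form feed, CR.
    let allowed_control = matches!(cp, 0x09 | 0x0A | 0x0B | 0x0C | 0x0D);
    // Assumed ranges for the Unicode "direction control" characters.
    let direction_control =
        matches!(cp, 0x200E | 0x200F | 0x202A..=0x202E | 0x2066..=0x2069);
    (cp < 0x20 && !allowed_control) || cp == 0x7F || direction_control
}

/// Converts the hex digits of a `\u{...}` escape into a `char`.
/// `char::from_u32` returns `None` for surrogates and for values above
/// 0x10FFFF, which matches the "Unicode Scalar Values only" constraint.
fn scalar_from_escape(hex: &str) -> Option<char> {
    u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
}
```

Because the parser's input is typically a `&str`, which is guaranteed to be valid UTF-8, only the literal-character check and the escape check need explicit code.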

raw strings

Raw strings no longer require an r prefix: they are now specified by using #""#.
Raw string productions are now explicitly non-greedy (and "fallible").
- ~~raw-string productions have been updated to be explicitly non-greedy and "fallible".~~ (duplicate)
Grammar has been fixed to disallow raw strings like #"""#, which are now properly treated as invalid multi-line raw strings (instead of the equivalent of "\"").
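For the raw string changes, a minimal single-line parser might look like the following. It is a sketch under stated assumptions rather than the approach taken in this PR: `parse_raw_string` is a hypothetical function, multi-line raw strings are rejected instead of parsed, and newline handling is not checked.

```rust
/// Parses a single-line raw string such as `#"..."#` from the start of
/// `input`. On success returns the body and the unparsed remainder.
fn parse_raw_string(input: &str) -> Result<(&str, &str), String> {
    // Count the opening hashes; v2 needs at least one and no `r` prefix.
    let hashes = input.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 {
        return Err("raw strings must start with `#`".to_string());
    }
    let after_hashes = &input[hashes..];
    if after_hashes.starts_with("\"\"\"") {
        // `"""` opens a multi-line raw string, which this sketch does
        // not handle, so `#"""#` fails instead of parsing as `"`.
        return Err("`\"\"\"` starts a multi-line raw string".to_string());
    }
    let body_and_rest = after_hashes
        .strip_prefix('"')
        .ok_or_else(|| "expected `\"` after the hashes".to_string())?;
    // Non-greedy: stop at the first quote followed by `hashes` hashes.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = body_and_rest
        .find(closer.as_str())
        .ok_or_else(|| "unterminated raw string".to_string())?;
    Ok((&body_and_rest[..end], &body_and_rest[end + closer.len()..]))
}
```

Searching for the first matching closer, rather than the last, is what makes the production non-greedy, and returning `Err` instead of panicking is one way to keep it "fallible" so a caller can try another production.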

multiline

Multi-line strings' literal Newline sequences are now normalized to single LFs.
Multi-line strings must now use """ as delimiters. The opening delimiter must be immediately followed by a newline, and the closing delimiter must be on its own line, prefixed by optional whitespace.
Multi-line strings are now automatically dedented, according to the common whitespace matching the whitespace prefix of the closing line.
Multiline strings, both Raw and Quoted, must now use """ instead of a single ". Using """ for a single-line string is a syntax error.
Multiline string escape rules have been tweaked significantly.
~~Some details have been clarified around the treatment of whitespace in multiline strings.~~
One or two consecutive double-quotes are now allowed in the bodies of multi-line quoted strings, without needing to be escaped.
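The newline normalization and dedent rules above can be sketched as follows. `dedent_multiline` is a hypothetical helper, not code from this PR. It assumes that only spaces and tabs count as indentation, that only CRLF and CR need normalizing, and that whitespace-only lines become empty lines regardless of their prefix (my understanding of the spec). Escape processing for quoted multi-line strings is not handled.

```rust
/// Dedents the text found between the `"""` delimiters of a multi-line
/// string. `inner` is everything after the opening `"""` and before the
/// closing `"""`.
fn dedent_multiline(inner: &str) -> Result<String, String> {
    // Normalize CRLF and lone CR to LF (a simplification of the spec's
    // full list of newline sequences).
    let normalized = inner.replace("\r\n", "\n").replace('\r', "\n");
    let mut lines: Vec<&str> = normalized.split('\n').collect();
    // The opening delimiter must be immediately followed by a newline,
    // so the first split piece has to be empty.
    if lines.len() < 2 || !lines[0].is_empty() {
        return Err("opening `\"\"\"` must be followed by a newline".to_string());
    }
    // The closing line may hold only whitespace; it defines the prefix.
    let prefix = lines.pop().unwrap();
    if !prefix.chars().all(|c| c == ' ' || c == '\t') {
        return Err("closing `\"\"\"` must be on its own line".to_string());
    }
    let mut out: Vec<&str> = Vec::new();
    for line in &lines[1..] {
        if line.chars().all(|c| c == ' ' || c == '\t') {
            // Assumption: whitespace-only lines become empty lines.
            out.push("");
        } else if let Some(stripped) = line.strip_prefix(prefix) {
            out.push(stripped);
        } else {
            return Err(format!(
                "line {:?} does not start with the closing line's prefix",
                line
            ));
        }
    }
    Ok(out.join("\n"))
}
```

Indentation beyond the closing line's prefix is preserved, so relative indentation inside the string survives the dedent.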

not relevant

The spec prose has more explicitly stated that whitespace and newlines are not valid identifier characters, even though the grammar already expressed this.
The spec prose now more explicitly states that strings and raw strings can be used as type annotations.
Removed a statement in the spec prose that said "It is reasonable for an implementation to ignore null values altogether when deserializing". This is no longer encouraged or desired.
A slew of additional slashdash and multi-line string compliance tests have been added. Have fun. :)
Fixed an issue with the unicode_silly test case.
Various updates to test suite to reflect changes.
Some tests have been added, others adjusted, some removed, after a cleanup pass.
Test suite has been updated to include a _fail suffix in all test cases which are expected to fail.
The organization of string types in the spec prose has been updated to a hopefully more helpful structure.
Some rewordings and clarification in the spec prose.

TheLostLambda · 2025-02-16T04:01:40Z

@jaxter184 As far as I'm aware, you're the only one working on it at the moment! This is looking exciting! Keep up the great work and let me know if you need anything!

jaxter184 added 9 commits February 13, 2025 23:33

make node terminator optional

3a692bc

remove r prefix for raw strings

b7d2bb7

add keywords

c240d21

bare identifiers as strings

531643a

Add u128/i128

1e35ad7

disallow \u{feff}

53667c6

forward slash no longer escaped, vertical tab is newline

795784b

from bare identifier chars, remove ,<>, add #

be485e9

add \s escape

4274c84
