KDL v2 #21

Draft: wants to merge 9 commits into main

jaxter184 commented Feb 14, 2025

Started working on v2 changes, and making a PR just to communicate interest in working on it. If someone else is already on it, feel free to close this PR.

Below are the contents of https://github.com/kdl-org/kdl/blob/main/CHANGELOG.md, reorganized to show the changes that need to be made:

general

  • Line continuations can be followed by an EOF now, instead of requiring a newline (or comment). node \<EOF> is now a legal KDL document.
  • Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
    • TODO tests
  • The last node in a child block no longer needs to be terminated with ;, even if the closing } is on the same line, so this is now a legal node: node{foo;bar;baz}
  • More places allow whitespace (node-spaces, specifically) now. With great power comes great responsibility:
    • Inside (foo) annotations: ( foo ) is now legal, but ( f oo ) is not, since it contains two identifiers
    • Between annotations and the thing they're annotating ((blah) node (thing) 1 y= (who) 2)
    • Around = for props (x = 1)
  • The BOM is now only allowed as the first character in a document. It was previously treated as generic whitespace.
    • Status: the BOM is currently disallowed everywhere; it still needs to be allowed as the first character.
  • .1, +.1, etc. are no longer valid identifiers, to prevent confusion and conflicts with numbers.
  • u128 and i128 have been added as well-known number type annotations.
    • 128-bit integer type annotations have been added to the list of "well-known" type annotations. (duplicate)
  • An optional version marker, /- kdl-version 2 (or 1), may appear as the first line of a document, optionally preceded by the BOM.
  • Slight grammar tweak where the pre-terminator node-space* for node and final-node have been moved into base-node.
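A rough sketch of the new bare-identifier rules (bare identifiers as string values, no number-like .1 or +.1, no #, reserved keyword spellings) might look like the Rust below. The function names are hypothetical and not part of the crate's API, and identifier characters are approximated with char::is_whitespace and char::is_control instead of the spec's exact code point lists.

```rust
/// Simplified identifier-character check (hypothetical helper). It excludes
/// whitespace, control characters, and the characters KDL v2 reserves for
/// syntax, including `#` and `=`. The spec's exact code point lists differ.
fn is_identifier_char(c: char) -> bool {
    !c.is_whitespace()
        && !c.is_control()
        && !matches!(c, '\\' | '/' | '(' | ')' | '{' | '}' | ';' | '[' | ']' | '"' | '#' | '=')
}

/// Returns true if `s` could be written as a bare identifier in KDL v2
/// (and therefore also used as a bare string argument or property value).
fn is_v2_bare_identifier(s: &str) -> bool {
    // These spellings are reserved: they must be written as #true, #inf, etc.
    let reserved = matches!(s, "true" | "false" | "null" | "inf" | "-inf" | "nan");
    if s.is_empty() || reserved || !s.chars().all(is_identifier_char) {
        return false;
    }
    // Skip an optional sign, then an optional dot; whatever follows must not
    // be a digit, otherwise the token would be confused with a number.
    let rest = s.strip_prefix(|c: char| c == '+' || c == '-').unwrap_or(s);
    let rest = rest.strip_prefix('.').unwrap_or(rest);
    !rest.starts_with(|c: char| c.is_ascii_digit())
}
```

With this check, Vec<u8>,x and -foo are accepted, while .1, +.1, #foo, true, and -inf are rejected.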

keywords

  • null, true, and false are now #null, #true, and #false. Using the unprefixed versions of these values is a syntax error.
  • #inf, #-inf, and #nan have been added in order to properly support IEEE floats for implementations that choose to represent their decimals that way.
  • Correspondingly, the identifiers inf, -inf, and nan are now syntax errors.
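The keyword changes boil down to a small lookup. This is a minimal sketch with a hypothetical Keyword enum and parse_keyword function (not the crate's real value type); it maps the #-prefixed literals to values and reports the old unprefixed spellings as errors.

```rust
/// Hypothetical value type for the v2 keyword literals.
#[derive(Debug, PartialEq)]
enum Keyword {
    Null,
    Bool(bool),
    Float(f64),
}

/// Parses a v2 keyword token. The unprefixed v1 spellings are rejected.
fn parse_keyword(token: &str) -> Result<Keyword, String> {
    match token {
        "#null" => Ok(Keyword::Null),
        "#true" => Ok(Keyword::Bool(true)),
        "#false" => Ok(Keyword::Bool(false)),
        // IEEE 754 special values, new in v2.
        "#inf" => Ok(Keyword::Float(f64::INFINITY)),
        "#-inf" => Ok(Keyword::Float(f64::NEG_INFINITY)),
        "#nan" => Ok(Keyword::Float(f64::NAN)),
        // v1 spellings (and the bare float identifiers) are now syntax errors.
        "null" | "true" | "false" | "inf" | "-inf" | "nan" => {
            Err(format!("`{token}` must be written as `#{token}` in KDL v2"))
        }
        _ => Err(format!("unknown keyword `{token}`")),
    }
}
```

Since NaN never compares equal to itself, tests for #nan should check f64::is_nan rather than use assert_eq!.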

comments

  • node-space is now allowed as whitespace after a slashdash, so line continuations now work there.
  • Slashdash (/-)-compatible locations adjusted to be more clear and intuitive. They can now be used in exactly three places: before nodes, before entire entries, or before entire child blocks.
    • Slashdash (/-)-compatible locations and related grammar adjusted to be more clear and intuitive. This includes some changes relating to whitespace, including comments and newlines, which are breaking changes. (duplicate)
    • Furthermore, the ordering of slashdashed elements has been restricted such that a slashdashed child block cannot go before an entry (including slashdashed entries).
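The ordering restriction can be checked with a little state. This sketch assumes a parser that has already split a node into its parts; the Part enum and check_order are hypothetical. Based on the v2 grammar, entries (slashdashed or not) come first, and at most one real child block may appear among any number of slashdashed ones.

```rust
/// Hypothetical classification of what follows a node's name.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Part {
    Entry,               // argument or property
    SlashdashedEntry,    // `/- arg` or `/- key=value`
    Children,            // `{ ... }`
    SlashdashedChildren, // `/- { ... }`
}

/// Checks the v2 ordering rules for a node's parts: no entry (slashdashed or
/// not) may follow any child block, and at most one real child block exists.
fn check_order(parts: &[Part]) -> Result<(), String> {
    let mut seen_block = false;
    let mut seen_children = false;
    for (i, part) in parts.iter().enumerate() {
        match part {
            Part::Entry | Part::SlashdashedEntry if seen_block => {
                return Err(format!("entry at position {i} follows a child block"));
            }
            Part::Children if seen_children => {
                return Err("a node can have only one child block".into());
            }
            Part::Children => {
                seen_block = true;
                seen_children = true;
            }
            Part::SlashdashedChildren => seen_block = true,
            Part::Entry | Part::SlashdashedEntry => {}
        }
    }
    Ok(())
}
```

For example, node /- { a } 1 is rejected because the entry 1 follows a slashdashed child block.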

string characters

  • Solidus/Forward slash (/) is no longer an escaped character.
  • Space (U+0020) can now be written into quoted strings with the \s escape.
    • \s is now a valid escape within a string, representing a space character. (duplicate)
  • Single line comments (//) can now be immediately followed by a newline. (seems to already have been the case in knuffel)
  • All literal whitespace following a \ in a string is now discarded.
  • Vertical tabs (U+000B) are now considered to be newlines.
  • The grammar syntax itself has been described, and some confusing definitions in the grammar have been fixed accordingly (mostly related to escaped characters).
  • Commas (,), <, and > are now legal identifier characters. They were previously reserved for KQL, but this is no longer necessary.
  • # is no longer a legal identifier character.
  • Equals signs other than = are no longer supported in properties. (seems to already have been the case in knuffel)
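The escape changes (\s added, \/ removed, escaped whitespace swallowing all following whitespace) could be handled roughly as below. unescape is a hypothetical helper working on the text between the quotes of a single-line string; it does not reject literal newlines or enforce the rest of the grammar.

```rust
/// Resolves escapes in the body of a single-line v2 quoted string (the text
/// between the quotes). A sketch: it does not reject literal newlines.
fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some('b') => out.push('\u{8}'),
            Some('f') => out.push('\u{C}'),
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('s') => out.push(' '), // new in v2
            Some('u') => {
                if chars.next() != Some('{') {
                    return Err("expected `{` after \\u".into());
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(h) if h.is_ascii_hexdigit() && hex.len() < 6 => hex.push(h),
                        _ => return Err("malformed \\u{...} escape".into()),
                    }
                }
                let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                // from_u32 rejects surrogates and values above 0x10FFFF, so only
                // Unicode Scalar Values get through.
                out.push(char::from_u32(code).ok_or("not a Unicode Scalar Value")?);
            }
            // `\` followed by whitespace: discard all of the literal whitespace.
            Some(w) if w.is_whitespace() => {
                while chars.next_if(|c| c.is_whitespace()).is_some() {}
            }
            // Anything else, including the v1 `\/`, is an invalid escape.
            Some(other) => return Err(format!("invalid escape `\\{other}`")),
            None => return Err("dangling backslash".into()),
        }
    }
    Ok(out)
}
```

char::is_whitespace is used as an approximation of the spec's whitespace and newline sets.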

code points

  • Code points under 0x20 (except newline and whitespace code points), code points above 0x10FFFF, the Delete control character (0x7F), and the Unicode "direction control" characters are now completely banned from appearing literally in KDL documents. They can now only be represented as escapes in regular strings, and there are no facilities to represent them in raw strings. This should be considered a security improvement.
  • Code points have been constrained to Unicode Scalar Values only, including values used in string escapes (\u{}). All KDL documents and string values should be valid UTF-8 now, as was intended.
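In Rust, decoding input into a &str (for example with std::str::from_utf8) already guarantees Unicode Scalar Values, so surrogates and code points above 0x10FFFF cannot appear. The remaining bans could be checked as below. The helper names are hypothetical, and the direction-control ranges are the ones we understand the v2 spec to list; the BOM case is left to separate handling.

```rust
/// Returns true for code points that KDL v2 bans from appearing literally.
/// The BOM (U+FEFF), allowed only as the first character, is not handled here.
fn is_disallowed_literal(c: char) -> bool {
    matches!(
        c,
        '\u{0}'..='\u{8}'             // C0 controls before TAB
            | '\u{E}'..='\u{1F}'      // C0 controls after CR (TAB..CR are allowed)
            | '\u{7F}'                // DELETE
            | '\u{200E}'..='\u{200F}' // LRM, RLM
            | '\u{202A}'..='\u{202E}' // embeddings and overrides
            | '\u{2066}'..='\u{2069}' // isolates
    )
}

/// Finds the byte offset and value of the first disallowed code point, if any.
fn first_disallowed(doc: &str) -> Option<(usize, char)> {
    doc.char_indices().find(|&(_, c)| is_disallowed_literal(c))
}
```

Vertical tab (U+000B) stays allowed, since v2 treats it as a newline.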

raw strings

  • Raw strings no longer require an r prefix: they are now specified by using #""#.
  • Raw string productions are now explicitly non-greedy (and "fallible").
    • raw-string productions have been updated to be explicitly non-greedy and "fallible". (duplicate)
  • Grammar has been fixed to disallow raw strings like #"""#, which are now properly treated as invalid multi-line raw strings (instead of the equivalent of "\"").
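A non-greedy lexer for single-line raw strings only needs to count the opening #s and stop at the first " followed by the same number of #s. lex_raw_string is a hypothetical helper; it returns the contents and the number of bytes consumed, and rejects #""" as the start of a multi-line raw string instead of parsing it.

```rust
/// Lexes a single-line v2 raw string at the start of `input`, e.g. `#"a"b"#`.
/// Returns (contents, bytes consumed). Multi-line raw strings are rejected.
fn lex_raw_string(input: &str) -> Result<(&str, usize), String> {
    let hashes = input.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 {
        return Err("raw strings start with at least one `#`".into());
    }
    let rest = input[hashes..]
        .strip_prefix('"')
        .ok_or("expected `\"` after the `#`s")?;
    // `#"""` opens a multi-line raw string; it is not the equivalent of "\"".
    if rest.starts_with("\"\"") {
        return Err("`#\"\"\"` opens a multi-line raw string".into());
    }
    // Non-greedy: the string ends at the FIRST `"` followed by the same number of `#`.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = rest.find(&closer).ok_or("unterminated raw string")?;
    Ok((&rest[..end], hashes + 1 + end + closer.len()))
}
```

Because the match is non-greedy, #"a"## stops after the first "# and leaves a stray #, which the surrounding parser must then reject; that is the "fallible" behavior the changelog mentions.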

multiline

  • Multi-line strings' literal Newline sequences are now normalized to single LFs.
  • Multi-line strings must now use """ as delimiters. The opening delimiter must be immediately followed by a newline, and the closing delimiter must be on its own line, prefixed by optional whitespace.
  • Multi-line strings are now automatically dedented, according to the common whitespace matching the whitespace prefix of the closing line.
  • Multiline strings, both Raw and Quoted, must now use """ instead of a single ". Using """ for a single-line string is a syntax error.
  • Multiline string escape rules have been tweaked significantly.
  • ~~Some details have been clarified around the treatment of whitespace in multiline strings.~~
  • One or two consecutive double-quotes are now allowed in the bodies of multi-line quoted strings, without needing to be escaped.
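The dedent rule can be sketched as: normalize newlines, require a newline right after the opening """, take the whitespace before the closing """ as the prefix, and strip it from every line. dedent is a hypothetical helper; it only normalizes CRLF and CR (not every KDL newline), turns whitespace-only lines into empty lines, and does not show escape processing for quoted strings.

```rust
/// Dedents the body of a v2 multi-line string: everything between the
/// opening `"""` and the closing `"""`.
fn dedent(body: &str) -> Result<String, String> {
    // Simplified newline normalization: CRLF and CR become LF.
    let normalized = body.replace("\r\n", "\n").replace('\r', "\n");
    let body = normalized
        .strip_prefix('\n')
        .ok_or("the opening `\"\"\"` must be followed by a newline")?;
    let mut lines: Vec<&str> = body.split('\n').collect();
    // The last line holds only the whitespace before the closing `"""`.
    let indent = lines.pop().unwrap_or("");
    if !indent.chars().all(char::is_whitespace) {
        return Err("the closing `\"\"\"` must be on its own line".into());
    }
    let mut out = Vec::with_capacity(lines.len());
    for line in lines {
        if line.chars().all(char::is_whitespace) {
            out.push(""); // whitespace-only lines become empty
        } else if let Some(stripped) = line.strip_prefix(indent) {
            out.push(stripped);
        } else {
            return Err(format!("line {line:?} does not start with the closing line's indentation"));
        }
    }
    Ok(out.join("\n"))
}
```

Note that the newline after the opening """ and the one before the closing """ are both dropped, so the result has no leading or trailing newline.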

not relevant

  • The spec prose has more explicitly stated that whitespace and newlines are not valid identifier characters, even though the grammar already expressed this.
  • The spec prose now more explicitly states that strings and raw strings can be used as type annotations.
  • Removed a statement in the spec prose that said "It is reasonable for an implementation to ignore null values altogether when deserializing". This is no longer encouraged or desired.
  • A slew of additional slashdash and multi-line string compliance tests have been added. Have fun. :)
  • Fixed an issue with the unicode_silly test case.
  • Various updates to test suite to reflect changes.
  • Some tests have been added, others adjusted, some removed, after a cleanup pass.
  • Test suite has been updated to include a _fail suffix in all test cases which are expected to fail.
  • The organization of string types in the spec prose has been updated to a hopefully more helpful structure.
  • Some rewordings and clarification in the spec prose.

TheLostLambda (Owner)

@jaxter184 As far as I'm aware, you're the only one working on it at the moment! This is looking exciting! Keep up the great work and let me know if you need anything!
