Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix header parser single quote bug #22996

Open
wants to merge 2 commits into
base: blead
Choose a base branch
from

Conversation

demerphq
Copy link
Collaborator

HeaderParser.pm - parse single quoted backslash properly

@khwilliamson gave me a test script which was breaking because we did not handle single quoted escapes like '\' and '\n' and the like. This fixes the tokenizer pattern to handle these without dieing, although we do not validate that the escape sequence itself is legal, we no longer choke on having backslash something in the single quote.

This patch sequence also includes a commit which teaches t/porting/header_parser.t how to update its own Data::Dumper based test. Updating this test manually was annoying, so I made the script do it for us when needed.

  • This set of changes does not require a perldelta entry.

@@ -115,7 +115,7 @@ BEGIN {
(?<literal>
(?<define> defined\(\w+\) )
| (?<func> \w+\s*\(\s*\w+(?:\s*,\s*\w+)*\s*\) )
| (?<const> (?:0x[a-fA-F0-9]+|\d+[LU]*|'.') )
| (?<const> (?:0x[a-fA-F0-9]+|\d+[LU]*|'.'|'\\.') )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of '.'. It matches ''' and '\', neither of which is valid C.

Also, multi-char constants are allowed: 'AB' is valid C (but IIRC the value of this constant is implementation-defined.)

(Also also, the [LU] numeric suffix should be [LUlu] because it's case insensitive.)

We should do something like for string literals:

(?<const> (?: ... |'(?:[^'\\]|\\.)+') )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind looking into these improvements as a separate patch, but I dont know when ill get time for it and im a little concerned we are opening a can of worms. The purpose of HeaderParser isnt to completely parse C, but rather to parse enough C pre-processor that we can do textual manipulation of it without horribly breaking things. So for now i'd rather we applied this part as is to get Karl working and then follow up with more validation of constructs like you mention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This is not a general purpose parser that we release for public use. It should be able to parse all the core hdr files including constructs that are likely to be added. I'm actually wanting this for use in Devel::PPPort, which parses header files for the current and previous Perl versions during its regeneration for each new release of the module. The current parser doesn't do nearly as good a job as this one. So multi-char constants may be valid C, but they aren't something we are going to use. Uu and string constants might be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, if you don't want to support multi-char constants, just remove the + from '(?:[^'\\]|\\.)+'. And if you don't want to factor out the common ' matches, you can write it in two parts: '[^'\\]'|'\\.'
But widening '[^'\\]' to '.' (as in the original code) is too sloppy to feel comfortable for me.

t/porting/header_parser.t Outdated Show resolved Hide resolved
Updating it manually is annoying, add support for UPDATE_DUMP=1
when running the test script, which will update the test script
with the latest output.
Karl gave me a test script which was breaking because we did not handle
single quoted escapes like '\\' and '\n' and the like. This fixes the
tokenizer pattern to handle these without dieing, although we do not
validate that the escape sequence itself is legal, we no longer
choke on having backslash something in the single quote.

This includes tests for this, as well as for constructs like

    #if 1

which we had no tests for previously.
@demerphq demerphq force-pushed the yves/fix_HeaderParser_single_quote_bug branch from 15bf1b6 to 107cc35 Compare February 13, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants