-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix header parser single quote bug #22996
base: blead
Are you sure you want to change the base?
Conversation
@@ -115,7 +115,7 @@ BEGIN { | |||
(?<literal> | |||
(?<define> defined\(\w+\) ) | |||
| (?<func> \w+\s*\(\s*\w+(?:\s*,\s*\w+)*\s*\) ) | |||
| (?<const> (?:0x[a-fA-F0-9]+|\d+[LU]*|'.') ) | |||
| (?<const> (?:0x[a-fA-F0-9]+|\d+[LU]*|'.'|'\\.') ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of '.'
. It matches '''
and '\'
, neither of which is valid C.
Also, multi-char constants are allowed: 'AB'
is valid C (but IIRC the value of this constant is implementation-defined.)
(Also also, the [LU]
numeric suffix should be [LUlu]
because it's case insensitive.)
We should do something like for string literals:
(?<const> (?: ... |'(?:[^'\\]|\\.)+') )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind looking into these improvements as a separate patch, but I dont know when ill get time for it and im a little concerned we are opening a can of worms. The purpose of HeaderParser isnt to completely parse C, but rather to parse enough C pre-processor that we can do textual manipulation of it without horribly breaking things. So for now i'd rather we applied this part as is to get Karl working and then follow up with more validation of constructs like you mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This is not a general purpose parser that we release for public use. It should be able to parse all the core hdr files including constructs that are likely to be added. I'm actually wanting this for use in Devel::PPPort, which parses header files for the current and previous Perl versions during its regeneration for each new release of the module. The current parser doesn't do nearly as good a job as this one. So multi-char constants may be valid C, but they aren't something we are going to use. Uu and string constants might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, if you don't want to support multi-char constants, just remove the +
from '(?:[^'\\]|\\.)+'
. And if you don't want to factor out the common '
matches, you can write it in two parts: '[^'\\]'|'\\.'
But widening '[^'\\]'
to '.'
(as in the original code) is too sloppy to feel comfortable for me.
Updating it manually is annoying, add support for UPDATE_DUMP=1 when running the test script, which will update the test script with the latest output.
Karl gave me a test script which was breaking because we did not handle single quoted escapes like '\\' and '\n' and the like. This fixes the tokenizer pattern to handle these without dieing, although we do not validate that the escape sequence itself is legal, we no longer choke on having backslash something in the single quote. This includes tests for this, as well as for constructs like #if 1 which we had no tests for previously.
15bf1b6
to
107cc35
Compare
HeaderParser.pm - parse single quoted backslash properly
@khwilliamson gave me a test script which was breaking because we did not handle single quoted escapes like '\' and '\n' and the like. This fixes the tokenizer pattern to handle these without dieing, although we do not validate that the escape sequence itself is legal, we no longer choke on having backslash something in the single quote.
This patch sequence also includes a commit which teaches t/porting/header_parser.t how to update its own Data::Dumper based test. Updating this test manually was annoying, so I made the script do it for us when needed.