memcollxfrm: Handle above-Unicode code points #22989

khwilliamson · 2025-02-10T20:04:48Z

As stated in the comments added by this commit, it is undefined behavior to call strxfrm() on above-Unicode code points, and especially to call it with Perl's invented extended UTF-8. This commit changes all such input into a legal value, replacing every above-Unicode code point with the highest permanently unassigned code point, U+10FFFF.
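Here is a minimal C sketch of the substitution. It assumes the input has already been decoded into an array of code points, and it is not the code in this commit. The names `SKETCH_MAX_UNICODE`, `sketch_encode_clamped()` and `sketch_sanitize_for_strxfrm()` are hypothetical. Any value above U+10FFFF is clamped to U+10FFFF before being re-encoded as standard UTF-8, so the bytes handed to strxfrm() never contain Perl's extended forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Highest Unicode code point; also a permanently unassigned noncharacter. */
#define SKETCH_MAX_UNICODE UINT64_C(0x10FFFF)

/* Hypothetical helper: clamp 'cp' to U+10FFFF, then encode it as
 * standard UTF-8 into 'out'.  Returns the number of bytes written (1-4).
 * Only above-Unicode values are altered; everything else passes through. */
static size_t sketch_encode_clamped(uint64_t cp, unsigned char *out)
{
    if (cp > SKETCH_MAX_UNICODE)
        cp = SKETCH_MAX_UNICODE;            /* the substitution itself */

    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: re-encode 'n' code points into 'out', which must
 * hold at least 4 * n + 1 bytes, and NUL-terminate the result so it can
 * be passed to strxfrm().  Returns the length excluding the NUL.
 * Embedded U+0000 values are not handled specially here. */
static size_t sketch_sanitize_for_strxfrm(const uint64_t *cps, size_t n,
                                          unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += sketch_encode_clamped(cps[i], out + len);
    out[len] = '\0';
    return len;
}
```

The sanitized buffer would then be given to strxfrm() in place of the original. In this sketch, clamping rather than rejecting keeps every such string collatable, at the cost of all above-Unicode code points comparing equal to one another and to a genuine U+10FFFF.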

This set of changes may require a perldelta entry; please state your opinion.


khwilliamson added 5 commits February 10, 2025 13:01

locale.c: Remove useless ++ increment

753d054

This value is not going to be used again. I put in the ++ out of habit.

utf8.h: Split a macro into components

207ec75

This creates an internal macro that skips some error checking, for use when we don't care whether the input is completely well-formed.
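Splitting a macro so that its unchecked part can be used on its own might look like the following minimal C sketch. It is not the actual utf8.h change: the macro and function names are hypothetical, and it assumes standard UTF-8 sequences of at most four bytes, whereas Perl's extended UTF-8 allows longer ones.

```c
#include <stddef.h>

/* Hypothetical component macro: the sequence length implied by a UTF-8
 * start byte alone.  It performs no validation, so it suits callers
 * that don't care whether the input is completely well-formed. */
#define SKETCH_UTF8_LEN_UNCHECKED(c)              \
    (  (unsigned char) (c) < 0x80 ? 1             \
     : (unsigned char) (c) < 0xE0 ? 2             \
     : (unsigned char) (c) < 0xF0 ? 3 : 4)

/* Hypothetical checked variant built on the component above.  Returns
 * the length of the sequence at 's', or 0 if it is structurally
 * malformed: a stray continuation byte, a start byte beyond the
 * four-byte range, truncation, or a bad continuation byte.  Overlong
 * and surrogate checks are omitted to keep the sketch short. */
static int sketch_utf8_len_checked(const unsigned char *s,
                                   const unsigned char *e)
{
    if (s >= e)
        return 0;
    if (s[0] >= 0x80 && s[0] < 0xC0)   /* continuation byte used as start */
        return 0;
    if (s[0] >= 0xF8)                  /* not a standard UTF-8 start byte */
        return 0;

    int len = SKETCH_UTF8_LEN_UNCHECKED(s[0]);
    if (e - s < len)                   /* sequence runs past the end */
        return 0;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)     /* not of the form 10xxxxxx */
            return 0;
    return len;
}
```

Because the checked version is composed from the unchecked component, the two cannot disagree about sequence lengths for well-formed input.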

run/locale.t: Add detail to test names

e1ae10b

run/locale.t: Hoist code out of a block

0328942

The next commit will want to use the results later.

khwilliamson force-pushed the locale_leak branch from 9b053d8 to 7d9b578, February 11, 2025 12:55

