builtins.toXML can return strings with invalid UTF-8 encoding #12061

Open
@NaN-git

Description

Describe the bug

Applying builtins.toXML to a string with invalid UTF-8 encoding returns

"<?xml version='1.0' encoding='utf-8'?>\n<expr>\n  <string value=\"[...]\" />\n</expr>\n"

where [...] is the input string with invalid UTF-8 encoding.

Steps To Reproduce

  1. Download the test file:
     wget -O - https://github.com/flenniken/utf8tests/raw/refs/heads/main/utf8tests.bin | head -n 208 > utf8tests.bin
  2. Evaluate the following Nix expression, e.g. in nix repl:
     builtins.toXML (builtins.readFile ./utf8tests.bin)
  3. The output contains the invalid UTF-8 input string.
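
To confirm the last step, the bytes of the result can be checked directly. The following is a minimal sketch in Python, not part of Nix; it assumes the toXML result has been captured as raw bytes. The helper name `first_invalid_utf8_offset` and the `sample` value are hypothetical stand-ins.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if data is valid."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


# Hypothetical stand-in for the captured output: the XML wrapper from the
# report around a lone 0xFF byte, which is never valid in UTF-8.
sample = (
    b"<?xml version='1.0' encoding='utf-8'?>\n"
    b"<expr>\n  <string value=\"\xff\" />\n</expr>\n"
)
offset = first_invalid_utf8_offset(sample)
if offset is not None:
    print(f"invalid UTF-8 at byte offset {offset}: {sample[offset]:#04x}")
```

A result of None would mean the captured bytes are well-formed UTF-8; any integer points at the first offending byte.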

Expected behavior

Either builtins.readFile or builtins.toXML should fail with a proper error message instead of passing the invalid bytes through.
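
The fail-fast behavior asked for here can be illustrated with a small sketch. It is written in Python purely for illustration and is not how Nix implements builtins.toXML. The function name `to_xml_string` and the error wording are hypothetical; only the output shape follows the report above.

```python
from xml.sax.saxutils import quoteattr


def to_xml_string(value: bytes) -> str:
    """Serialize a byte string as an XML <string> element, rejecting invalid UTF-8."""
    try:
        text = value.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Hypothetical error message: report where the input stops being valid UTF-8.
        raise ValueError(
            f"string is not valid UTF-8 (byte offset {exc.start})"
        ) from exc
    # quoteattr escapes XML special characters and adds the surrounding quotes.
    return (
        "<?xml version='1.0' encoding='utf-8'?>\n"
        f"<expr>\n  <string value={quoteattr(text)} />\n</expr>\n"
    )


print(to_xml_string(b"hello"))
try:
    to_xml_string(b"\xff")
except ValueError as err:
    print(f"error: {err}")
```

Validating before serializing keeps the `encoding='utf-8'` declaration in the output truthful: either the document really is UTF-8, or no document is produced.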

Additional context

Related to issue #12060.


Labels

bug, idea approved (the given proposal has been discussed and approved by the Nix team; an implementation is welcome), language (the Nix expression language: parser, interpreter, primops, evaluation, etc.)
