Reading a valid Unicode String with a limit #430

Open
@OptimumCode

Description

Hello,

Could you please advise what could be used in a KMP project to read a valid String from a Source, given an approximate limit in bytes for that String?

The use-case is the following:

I have a Source that is used for parsing data from a file (the file might contain non-ASCII characters). I need to read a portion of the content, parse it, and, if more data is needed, read another portion from the Source, and so on.

Right now there is a method Source.readString(byteCount: Long) that accepts a limit in bytes, but if the limit falls in the middle of a multi-byte code point, the incomplete bytes are substituted with the replacement character (U+FFFD). Those bytes are consumed, so I won't get that last code point on a second read attempt either.
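To make the problem concrete, here is a minimal sketch of the boundary check I would otherwise have to write myself. It uses only the Kotlin standard library; the function name `utf8SafePrefixLength` is hypothetical and not part of kotlinx-io. Given a chunk of bytes and a byte limit, it returns the largest length not exceeding the limit that does not cut a UTF-8 sequence in half. Malformed input is deliberately left for the decoder to handle.

```kotlin
// Hypothetical helper, not part of kotlinx-io: returns the largest prefix
// length <= limit that does not end in the middle of a UTF-8 sequence.
fun utf8SafePrefixLength(bytes: ByteArray, limit: Int): Int {
    val end = minOf(limit, bytes.size)
    if (end <= 0) return 0

    // Walk back over at most three continuation bytes (10xxxxxx)
    // to find the lead byte of the last sequence in the prefix.
    var lead = end - 1
    var steps = 0
    while (lead > 0 && steps < 3 && (bytes[lead].toInt() and 0xC0) == 0x80) {
        lead--
        steps++
    }

    // The lead byte tells how many bytes the sequence should have.
    val b = bytes[lead].toInt() and 0xFF
    val expected = when {
        (b and 0x80) == 0x00 -> 1
        (b and 0xE0) == 0xC0 -> 2
        (b and 0xF0) == 0xE0 -> 3
        (b and 0xF8) == 0xF0 -> 4
        else -> return end // malformed input: leave it to the decoder
    }

    // If the last sequence is truncated by the limit, cut before its lead byte.
    return if (lead + expected <= end) end else lead
}

fun main() {
    val bytes = "a\u00E9".encodeToByteArray() // 0x61, 0xC3, 0xA9
    val safe = utf8SafePrefixLength(bytes, limit = 2)
    println(bytes.decodeToString(0, safe))
}
```

A result of 0 for a non-empty chunk means the limit is smaller than the first code point, so more bytes are needed. With kotlinx-io, I assume the chunk could be obtained without consuming it (for instance through `peek()`) and the returned length then passed to `readString`, but that wiring is untested here.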

I wonder if there is a way to solve my use case without reimplementing the UTF-8 to UTF-16 decoding on my side (logic from here). For example, in Java I could use the java.io.Reader#read(char[]) method and, if the last char is a high surrogate, try to read another char to check whether the string is ill-formed (a real example is the StreamReader from SnakeYAML).
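For comparison, the surrogate check from the Java approach is tiny once the data is already decoded into chars. A rough sketch using only the Kotlin standard library (the function name `endsWithHighSurrogate` is hypothetical):

```kotlin
// Hypothetical helper: true if the first `length` chars of the buffer end
// with a high surrogate, i.e. one more char is needed to complete the pair.
fun endsWithHighSurrogate(chars: CharArray, length: Int): Boolean =
    length in 1..chars.size && chars[length - 1].isHighSurrogate()

fun main() {
    val buffer = charArrayOf('a', '\uD83D') // 'a' followed by a lone high surrogate
    if (endsWithHighSurrogate(buffer, buffer.size)) {
        println("need one more char to complete the surrogate pair")
    }
}
```

This only works on the char level, which is exactly what I am missing when the limit is expressed in bytes.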

Would really appreciate your thoughts and suggestions. Thank you!
