Reading a valid Unicode String with a limit (#430)

Hello,

Could you please advise what could be used in a KMP project to read a **valid** `String` from a `Source`, given an approximate limit in bytes for that `String`?

The use-case is the following:

I have a `Source` that is used for parsing data from a file (the file might contain non-ASCII characters). I need to read a portion of the content, parse it, and, if more data is needed, read another portion from the `Source`, and so on.

Right now there is `Source.readString(byteCount: Long)`, which accepts a limit in bytes, but if the limit falls in the middle of a multi-byte code point, the incomplete trailing bytes are substituted with the replacement character (U+FFFD). I won't get that code point on a second read attempt either, because its leading bytes have already been consumed.
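A minimal sketch of one possible workaround, written in plain Kotlin over `ByteArray` chunks so that it does not depend on any particular kotlinx-io API: before decoding, move the cut-off point back to the start of any incomplete UTF-8 sequence, decode only the complete prefix, and carry the leftover bytes into the next chunk. `completeUtf8PrefixLength` and `decodeChunks` are hypothetical helper names, not part of kotlinx-io.

```kotlin
/**
 * Hypothetical helper: returns the length of the longest prefix of bytes[0 until end]
 * that does not stop in the middle of a UTF-8 multi-byte sequence.
 * Malformed input is not validated here; the decoder replaces it with U+FFFD.
 */
fun completeUtf8PrefixLength(bytes: ByteArray, end: Int): Int {
    // A UTF-8 sequence is at most 4 bytes long, so only the last 4 bytes need inspecting.
    val stop = maxOf(0, end - 4)
    var i = end - 1
    while (i >= stop) {
        val b = bytes[i].toInt() and 0xFF
        if ((b and 0xC0) != 0x80) {
            // b is an ASCII byte or a lead byte: work out how long its sequence should be.
            val needed = when {
                (b and 0x80) == 0 -> 1
                (b and 0xE0) == 0xC0 -> 2
                (b and 0xF0) == 0xE0 -> 3
                (b and 0xF8) == 0xF0 -> 4
                else -> 1 // invalid lead byte: leave it to the decoder
            }
            // Complete sequence: keep everything. Incomplete: cut before its lead byte.
            return if (end - i >= needed) end else i
        }
        i-- // continuation byte (10xxxxxx): keep looking for the lead byte
    }
    // Only continuation bytes at the end (malformed): nothing sensible to hold back.
    return end
}

/**
 * Hypothetical streaming loop: decodes byte chunks into strings without ever
 * splitting a code point between two results.
 */
fun decodeChunks(chunks: List<ByteArray>): List<String> {
    val result = mutableListOf<String>()
    var pending = ByteArray(0) // incomplete trailing bytes carried over from the previous chunk
    for (chunk in chunks) {
        val bytes = pending + chunk
        val cut = completeUtf8PrefixLength(bytes, bytes.size)
        if (cut > 0) result += bytes.decodeToString(0, cut)
        pending = bytes.copyOfRange(cut, bytes.size)
    }
    // Bytes still pending at end of input form a truncated sequence; decoding yields U+FFFD.
    if (pending.isNotEmpty()) result += pending.decodeToString()
    return result
}
```

For example, if `"héllo 😀".encodeToByteArray()` is split into chunks at byte offsets 2 and 9 (inside `é` and inside the emoji), `decodeChunks` returns `"h"`, `"éllo "` and `"😀"` with nothing replaced. In the use case above, each chunk would come from a fixed-size byte read on the `Source`, with the carried-over bytes prepended to the next read; this still means handling UTF-8 lead bytes by hand, although only for the boundary check, not for the full decoding.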

I wonder whether there is a way to handle this use case without reimplementing UTF-8 decoding on my side (the logic from [here](https://github.com/Kotlin/kotlinx-io/blob/master/core/common/src/internal/-Utf8.kt)). For example, in Java I could use the `java.io.Reader#read(char[])` method, and if the last `char` read is a high surrogate, I could read one more `char` to check whether the string is ill-formed (a real example is `StreamReader` in [SnakeYAML](https://bitbucket.org/snakeyaml/snakeyaml/src/a51246b7bbc66394bdbb7d5ab23b7898a724054b/src/main/java/org/yaml/snakeyaml/reader/StreamReader.java#lines-183)).
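For comparison, the char-level check described above (hold back a trailing high surrogate until the next read shows whether a low surrogate follows) can be sketched in plain Kotlin on already-decoded text. `SurrogateSafeChunker` is a hypothetical name; it mirrors the idea behind the SnakeYAML `StreamReader` approach but is not its code.

```kotlin
/**
 * Hypothetical helper mirroring the java.io.Reader approach: text arrives in arbitrary
 * char chunks, and a high surrogate at the very end of a chunk is held back until the
 * next chunk arrives, so a surrogate pair is never split between two results.
 */
class SurrogateSafeChunker {
    // High surrogate left dangling at the end of the previous chunk, if any.
    private var heldHigh: Char? = null

    /** Returns the part of [chunk] (plus any held char) that is safe to parse now. */
    fun accept(chunk: String): String {
        val text = (heldHigh?.toString() ?: "") + chunk
        heldHigh = null
        if (text.isNotEmpty() && text.last().isHighSurrogate()) {
            // Its low surrogate, if any, is in the next chunk: keep it back for now.
            heldHigh = text.last()
            return text.dropLast(1)
        }
        return text
    }

    /** At end of input: true if the text ended with an unpaired (ill-formed) high surrogate. */
    fun endsIllFormed(): Boolean = heldHigh != null
}
```

Feeding `"ab\uD83D"` and then `"\uDE00c"` returns `"ab"` followed by `"\uD83D\uDE00c"`, so the emoji stays intact. This only works once bytes have already been decoded into chars without losing anything, which is exactly what `readString(byteCount)` cannot guarantee when the byte limit splits a code point.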

Would really appreciate your thoughts and suggestions. Thank you!
