Index out of range error with multibyte unicode on invalid column reported by linter #226

Open
@Maneren

Description

FAQ

  • I have checked the FAQ and it didn't resolve my problem.

Issues

  • I have checked existing issues and there are no issues with the same problem.

Neovim Version

NVIM v0.10.2

Dev Version?

  • I am using a stable Neovim release version, or if I am using a dev version of Neovim I have confirmed that my issue is reproducible on a stable version.

Operating System

GNU/Linux 6.12.6-1-MANJARO

Minimal Config

vim.g.loaded_remote_plugins = ""
vim.o.runtimepath = vim.env["VIMRUNTIME"]

local temp_dir = vim.fs.dirname(vim.fs.dirname(vim.fn.tempname()))

local package_root = vim.fs.joinpath(temp_dir, "nvim", "site")
vim.o.packpath = package_root

local install_path = vim.fs.joinpath(package_root, "pack", "deps", "start", "mini.deps")

local null_ls_config = function()
	local null_ls = require("null-ls")
	-- add only what you need to reproduce your issue
	null_ls.setup({
		sources = {
			null_ls.builtins.diagnostics.markdownlint,
		},
		debug = true,
	})
end

local function load_plugins()
	-- only add other plugins if they are necessary to reproduce the issue
	local deps = require("mini.deps")
	deps.setup({
		path = {
			package = package_root,
		},
	})
	deps.add({
		source = "nvimtools/none-ls.nvim",
		depends = { "nvim-lua/plenary.nvim" },
	})
	deps.later(null_ls_config)
end

if vim.fn.isdirectory(install_path) == 0 then
	vim.fn.system({ "git", "clone", "--filter=blob:none", "https://github.com/echasnovski/mini.deps", install_path })
end
load_plugins()

Steps to Reproduce

Put this line into a markdown file and open it:

- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff

Note that the italic letters are literal Unicode mathematical italic characters (originally copied from a PDF), each of which takes 4 bytes in UTF-8. They are part of the text, not an effect of code highlighting.

Reproducibility Check

  • I confirm that my minimal config is based on the minimal_init.lua template and that my issue is reproducible by running nvim --clean -u minimal_init.lua and following the steps above.

Expected Behavior

The diagnostic "Line length [Expected: 80; Actual: 85]" is displayed at the last character of the line.

Actual Behavior

[null-ls] failed to run generator: ...im/lazy/none-ls.nvim/lua/null-ls/helpers/diagnostics.lua:73: index out of range

Debug Log

[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/client.lua:127: starting null-ls client
[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/rpc.lua:117: received LSP request for method initialize
[DEBUG čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/client.lua:209: unable to notify client for method textDocument/didOpen (client not active): {
  textDocument = {
    uri = "file:///home/maneren/obsidian-vault/Z%c4%8cU/TI/minimal.md"
  }
}
[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/rpc.lua:144: received LSP notification for method initialized
[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/rpc.lua:144: received LSP notification for method textDocument/didOpen
[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/generators.lua:48: running generators for method NULL_LS_DIAGNOSTICS_ON_OPEN
[DEBUG čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/helpers/generator_factory.lua:352: spawning command "markdownlint" at /home/maneren/obsidian-vault with args { "--stdin" }
[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/helpers/generator_factory.lua:236: error output: stdin:1 MD041/first-line-heading/first-line-h1 First line in a file should be a top-level heading [Context: "- Dva blokové kódy $K$ a $K'$ ..."]
stdin:3:81 MD013/line-length Line length [Expected: 80; Actual: 85]

[TRACE čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/helpers/generator_factory.lua:237: output: nil
[WARN  čt 26. prosince 2024, 01:51:30] /tmp/nvim-minimal.maneren/nvim/site/pack/deps/opt/none-ls.nvim/lua/null-ls/generators.lua:148: failed to run generator: ...eps/opt/none-ls.nvim/lua/null-ls/helpers/diagnostics.lua:73: index out of range

Help

Yes

Implementation Help

From what I understand, this issue is similar to #51, except with Unicode on top.

The diagnostic itself is wrong because markdownlint's Unicode handling is also somewhat broken: JavaScript handles strings as UTF-16, so it reports a line length of 85 (at column 81) for a line that is only 74 characters long and takes 107 bytes in UTF-8 (I reported this there as well).
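The three numbers involved (107, 85 and 74) can be reproduced without Neovim. The following is a minimal sketch in plain Lua; `utf8_lengths` is a hypothetical helper written only for this illustration (it is not a none-ls or Neovim function), and it assumes the input is valid UTF-8.

```lua
-- Hypothetical helper: returns the byte length, the number of code points
-- and the number of UTF-16 code units of a valid UTF-8 string.
local function utf8_lengths(s)
	local codepoints, utf16_units = 0, 0
	for i = 1, #s do
		local b = s:byte(i)
		-- Continuation bytes have the form 10xxxxxx (0x80..0xBF); every
		-- other byte starts a new code point.
		if b < 0x80 or b >= 0xC0 then
			codepoints = codepoints + 1
			-- A lead byte >= 0xF0 starts a 4-byte sequence, i.e. a code point
			-- above U+FFFF, which needs a surrogate pair (2 units) in UTF-16.
			utf16_units = utf16_units + (b >= 0xF0 and 2 or 1)
		end
	end
	return #s, codepoints, utf16_units
end

-- The offending line as it appears in content_line.
local line = "  $\\forall 𝑣_1, 𝑣_2, \\ldots, 𝑣_𝑛 \\in 𝑇_𝑛: 𝑣_1, 𝑣_2, \\ldots, 𝑣_𝑛 \\in 𝐾 \\iff"

print(string.format("bytes=%d codepoints=%d utf16=%d", utf8_lengths(line)))
-- bytes=107 codepoints=74 utf16=85
```

The line contains 11 four-byte characters, so the UTF-16 length is 74 + 11 = 85 and the byte length is 74 + 3 × 11 = 107.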

However, I would expect none-ls not to hard-crash with "index out of range", but to suppress the error and highlight the end of the line (as the linter usually intends), or at least to show a more informative error message. Either way, it should continue processing the remaining diagnostics.

One way I found to prevent this error is to modify the check added in #36, replacing string.len (which counts bytes) with vim.fn.strdisplaywidth (which, afaik, counts characters in a similar way to vim.str_byteindex).

It seems to work fine in my case, but I don't know whether it's the right permanent solution, since I am only moderately knowledgeable about Unicode. For example, the Neovim docs talk about UTF-32 vs UTF-16 in relation to these functions, and I am not sure how that relates to UTF-8.

To illustrate

local col = tonumber(entries["col"]) or math.huge
col = math.min(col, vim.fn.strdisplaywidth(content_line))
local byte_index_col = vim.str_byteindex(content_line, col)
print(vim.inspect({
    byte_index_col = byte_index_col,
    entries_col = entries["col"],
    col = col,
    content_line = content_line,
    len = string.len(content_line),
    displaywidth = vim.fn.strdisplaywidth(content_line),
}))

produces

{ 
 byte_index_col = 107, 
 col = 74, 
 content_line = "  $\\forall 𝑣_1, 𝑣_2, \\ldots, 𝑣_𝑛 \\in 𝑇_𝑛: 𝑣_1, 𝑣_2, \\ldots, 𝑣_𝑛 \\in 𝐾 \\iff", 
 displaywidth = 74, 
 entries_col = "81", 
 len = 107 
} 
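
A possible variant of this workaround, as a sketch only: clamp the column to the number of characters instead of the display width, because `vim.fn.strdisplaywidth` returns screen cells (tabs and double-width characters count as more than one) and so may still exceed the character count on other lines. Wrapping the conversion in `pcall` is an extra backstop, so that an unexpected column falls back to the end of the line instead of aborting the generator. `safe_byte_col` is a hypothetical name, not existing none-ls code, and the call uses the `vim.str_byteindex(str, index)` form from the snippet above (Neovim 0.10.x).

```lua
-- Hypothetical helper, not part of none-ls; must run inside Neovim.
local function safe_byte_col(content_line, raw_col)
	local col = tonumber(raw_col) or math.huge
	-- strchars() counts characters, which is the unit str_byteindex()
	-- expects by default (a UTF-32 index), so clamp against that.
	col = math.min(col, vim.fn.strchars(content_line))
	-- Backstop: if the conversion still fails, fall back to the end of the line.
	local ok, byte_col = pcall(vim.str_byteindex, content_line, col)
	return ok and byte_col or #content_line
end
```

For the line above, `safe_byte_col(content_line, "81")` should clamp the column to 74 and return 107, matching `byte_index_col` in the output.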

Requirements

  • I have read and followed the instructions above and understand that my issue will be closed if I did not provide the required information.

Labels

bug (Something isn't working), help wanted (Extra attention is needed)
