Error Messages: on name resolution failure, suggest similar names #5522
+603
−144
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This change improves error messages when the user provides a name in Unison source code files that does not exactly match a known name in the codebase or within the file. Specifically, it suggests that the user try one of the top 3 closest name matches by Levenshtein edit distance. This is intended to aid users in more quickly resolving typos and similar errors.
The PR introduces a few new suggestion categories that align with this fuzzy matching behaviour. Suggestions for matches with a similar name and the right type are presented first if possible. Otherwise, suggestions for similar name but wrong type are presented.
For example,
Changes to Error Message Preambles
Additionally, suggestions in error messages on name resolution failure now use plurality-agnostic messages for simplicity and consistency. For example we now have,
For example, this simplifies some of the error message code from:
to
Relevant Issues
Partially addresses #713.
Implementation notes
How does it accomplish it, in broad strokes? i.e. How does it change the Haskell codebase?
Unison.FileParsers
, functioncomputeTypecheckingEnvironment
, to produce aTypechecker.Env
with a new fieldfreeNameToFuzzyTermsByShortName
.freeNameToFuzzyTermsByShortName
is a mapping from the free names in the Unison file that needs to be type-checked to a mapping from short names to terms (similar to the fieldtermsByShortname
).fuzzyFindByEditDistanceRanked
used to produce fuzzy matches ranked by edit distance. This function uses two new functions inUnison.Names
,queryEditDistances
andqueryEditDistances'
that search throughNames
andSet Name
respectively to get edit distnaces.Unison.Typechecker
such that theresolve
function used byresolveNote
and in turn bytypeDirectedNameResolution
to produce and append fuzzy match suggestions to theResolution
data type.Unison.Typechecker.Context
to include theSimilarNameRightType
andSimilarNameWrongType
cases.Unison.PrintError
to handle printing of the new suggestion types.Interesting/controversial decisions
1
I chose fuzzy matching by edit distance because it is usually more appropriate than FZF-style fuzzy finding for suggesting typos or other user errors. I tried FZF-style matching initially since existing code used it, but it would sometimes produce strange matches. Additionally, the FZF-style matching code is significantly slower and currently includes a prefilter that restricts matching to substrings and is not as useful.
2
A tricky piece of improving fuzzy match results was preprocessing names in the search space to the last N name segments where N is the number of name segments in the free name the user provided. This helped reduce edit distance bias towards shorter, fully-qualified names. For example, a deeply nested name like
foo.bar.biz.bite
would be deprioritized compared to a name likebyz.byte
if the user tried to usebete
orbaz.bete
despite the edit distance being the same when considering only the last N segments. This could perhaps be improved by also considering aliases and qualified imports, but the adjustment here proved relatively simple and effective.This behaviour is covered by one of the transcript tests.
3
Somewhat arbitrarily, I chose to show only the top 3 matches by edit distance.
Additionally, I decided to filter out matches with large edit distance. Here I made the decision after some manual testing to go with the following max score formula:
4
There may be some unintended impact to the dependence of some transcript tests to the global namespace. When the global namespace changes, fuzzy matching results could change and this may change transcript test outputs. This will usually change in ucm transcripts with the
:error
directive so the change should be automatically captured, but needs to be reviewed.Test coverage
Some existing transcript tests cover name resolution failure error messages.
A new transcript
name-resolution-fuzzy.md
was added to more thoroughly cover all fuzzy name resolution scenarios.As noted in this transcript, there is another case where the term matches with terms with the wrong name but the right type. This case was not added here because it is unclear how to produce a concrete example. This kind of suggestion does not seem to be produced in the code paths in
Unison.TypeChecker
.Loose ends
1
It would be nice to have LSP code actions for accepting suggestions. This should be a separate ticket, and the design here hopefully doesn't conflict with that goal, but I don't have enough experience in the codebase to know.
2
It would be nice if LSP error reporting did not stop on the first error in the file. Although in the UCM it may be overwhelming to print all errors, it is useful while working on source code to be able to see all error hints so the user can work through errors in an order that works for them.
3
When name resolution fails it normally prints
However, when suggestions are available this isn't printed. If none of the suggestions make sense, then it might be good to for the user to be shown more general guidance, perhaps after the suggestions.
4
This PR addresses some of the situations documented by @seagreen in #713, but not all of them.
Example 1 - Local Name Resolution (Directly Addressed)
Now gives the more helpful suggestions:
Example 2 - Type Symbol Resolution (Not Addressed)
The following case where the name of the type is wrong (
Bool
instead ofBoolean
) is not addressed here as type symbol resolution seems to occur before term resolution and needs additional work.Example 3 - Global Name Resolution (Partially Addressed)
Unfortunately, some builtins like
true
don't seem to be readily found in the parsing environment and global namespace so the following provides some awkward suggestions:However, many other global names can be found and this scenario usually works. See the example above with
truncate
in the overview.