Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Messages: on name resolution failure, suggest similar names #5522

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

xmbhasin
Copy link
Contributor

@xmbhasin xmbhasin commented Jan 4, 2025

Overview

This change improves error messages when the user provides a name in Unison source code files that does not exactly match a known name in the codebase or within the file. Specifically, it suggests that the user try one of the top 3 closest name matches by Levenshtein edit distance. This is intended to aid users in more quickly resolving typos and similar errors.

The PR introduces a few new suggestion categories that align with this fuzzy matching behaviour. Suggestions for matches with a similar name and the right type are presented first if possible. Otherwise, suggestions for similar name but wrong type are presented.

For example,

-- Should match truncate and truncate0 though they have different types
myFunction : Float -> Nat
myFunction = Flat.TRuncate2x4
  Loading changes detected in scratch.u.

  I couldn't figure out what Flat.TRuncate2x4 refers to here:

      3 | myFunction = Flat.TRuncate2x4

  I found one or more terms in scope with similar names but the wrong types.
  If you meant to use one these, try using it instead and then adjusting types:

  truncate : Float -> Int
  truncate0 : Int -> Nat

Changes to Error Message Preambles

Additionally, suggestions in error messages on name resolution failure now use plurality-agnostic messages for simplicity and consistency. For example we now have,

  I found one or more terms in scope with similar names and the right types.
  If you meant to use one of these, try using it instead:

For example, this simplifies some of the error message code from:

    wrongTypeText pl =
      Pr.paragraphyText
        ( mconcat
            [ "I found ",
              pl "a term" "some terms",
              " in scope with ",
              pl "a " "",
              "matching name",
              pl "" "s",
              " but ",
              pl "a " "",
              "different type",
              pl "" "s",
              ". ",
              "If ",
              pl "this" "one of these",
              " is what you meant, try using its full name:"
            ]
        )
        <> "\n\n"

to

  rightNameWrongTypeText _ =
    mconcat
      [ "I found one or more terms in scope with the ",
        Pr.bold "right names ",
        "but the ",
        Pr.bold "wrong types.",
        "\n",
        "If you meant to use one of these, try using it with its full name and then adjusting types",
        ":\n\n"
      ]

Relevant Issues

Partially addresses #713.

Implementation notes

How does it accomplish it, in broad strokes? i.e. How does it change the Haskell codebase?

  1. Updates Unison.FileParsers, function computeTypecheckingEnvironment, to produce a Typechecker.Env with a new field freeNameToFuzzyTermsByShortName.
    1. The field freeNameToFuzzyTermsByShortName is a mapping from the free names in the Unison file that needs to be type-checked to a mapping from short names to terms (similar to the field termsByShortname).
      • We need to group the terms by free name so that we don't mix fuzzy matches between one free name and another giving poor suggestions.
    2. Adds function fuzzyFindByEditDistanceRanked used to produce fuzzy matches ranked by edit distance. This function uses two new functions in Unison.Names, queryEditDistances and queryEditDistances' that search through Names and Set Name respectively to get edit distnaces.
  2. Updates Unison.Typechecker such that the resolve function used by resolveNote and in turn by typeDirectedNameResolution to produce and append fuzzy match suggestions to the Resolution data type.
    • Suggestion types were expanded in Unison.Typechecker.Context to include the SimilarNameRightType and SimilarNameWrongType cases.
  3. Updates Unison.PrintError to handle printing of the new suggestion types.

Interesting/controversial decisions

1

I chose fuzzy matching by edit distance because it is usually more appropriate than FZF-style fuzzy finding for suggesting typos or other user errors. I tried FZF-style matching initially since existing code used it, but it would sometimes produce strange matches. Additionally, the FZF-style matching code is significantly slower and currently includes a prefilter that restricts matching to substrings and is not as useful.

2

A tricky piece of improving fuzzy match results was preprocessing names in the search space to the last N name segments where N is the number of name segments in the free name the user provided. This helped reduce edit distance bias towards shorter, fully-qualified names. For example, a deeply nested name like foo.bar.biz.bite would be deprioritized compared to a name like byz.byte if the user tried to use bete or baz.bete despite the edit distance being the same when considering only the last N segments. This could perhaps be improved by also considering aliases and qualified imports, but the adjustment here proved relatively simple and effective.

This behaviour is covered by one of the transcript tests.

3

Somewhat arbitrarily, I chose to show only the top 3 matches by edit distance.

Additionally, I decided to filter out matches with large edit distance. Here I made the decision after some manual testing to go with the following max score formula:

    -- Expect edit distances (number of typos) to be about half the length of the name being queried
    -- But clamp max edit distance to work well with very short names
    -- and keep ranking reasonably fast when a verbose name is queried
    maxScore = clamp (3, 16) $ Text.length (nameToText name) `ceilingDiv` 2

4

There may be some unintended impact to the dependence of some transcript tests to the global namespace. When the global namespace changes, fuzzy matching results could change and this may change transcript test outputs. This will usually change in ucm transcripts with the :error directive so the change should be automatically captured, but needs to be reviewed.

Test coverage

Some existing transcript tests cover name resolution failure error messages.

A new transcript name-resolution-fuzzy.md was added to more thoroughly cover all fuzzy name resolution scenarios.

As noted in this transcript, there is another case where the term matches with terms with the wrong name but the right type. This case was not added here because it is unclear how to produce a concrete example. This kind of suggestion does not seem to be produced in the code paths in Unison.TypeChecker.

Loose ends

1

It would be nice to have LSP code actions for accepting suggestions. This should be a separate ticket, and the design here hopefully doesn't conflict with that goal, but I don't have enough experience in the codebase to know.

2

It would be nice if LSP error reporting did not stop on the first error in the file. Although in the UCM it may be overwhelming to print all errors, it is useful while working on source code to be able to see all error hints so the user can work through errors in an order that works for them.

3

When name resolution fails it normally prints

  I also don't know what type it should be.

  Some common causes of this error include:
    * Your current namespace is too deep to contain the
      definition in its subtree
    * The definition is part of a library which hasn't been
      added to this project
    * You have a typo in the name

However, when suggestions are available this isn't printed. If none of the suggestions make sense, then it might be good to for the user to be shown more general guidance, perhaps after the suggestions.

4

This PR addresses some of the situations documented by @seagreen in #713, but not all of them.

Example 1 - Local Name Resolution (Directly Addressed)

aaaaa : Boolean
aaaaa = true

f : ()
f =
  if aaaaA then () else ()

Now gives the more helpful suggestions:

  I couldn't figure out what aaaaA refers to here:

      6 |   if aaaaA then () else ()

  I found one or more terms in scope with similar names and the right types.
  If you meant to use one of these, try using it instead:

  aaaaa : Boolean

Example 2 - Type Symbol Resolution (Not Addressed)

The following case where the name of the type is wrong (Bool instead of Boolean) is not addressed here as type symbol resolution seems to occur before term resolution and needs additional work.

x : Bool
x = true

Example 3 - Global Name Resolution (Partially Addressed)

Unfortunately, some builtins like true don't seem to be readily found in the parsing environment and global namespace so the following provides some awkward suggestions:

x : Boolean
x = True
  I found one or more terms in scope with similar names but the wrong types.
  If you meant to use one of these, try using it instead and then adjusting types:

  Type : Type -> Link
  Pattern.run : Pattern a -> a -> Optional ([a], a)
  Scope.run : (Unit ->{Scope s, g} r) ->{g} r

However, many other global names can be found and this scenario usually works. See the example above with truncate in the overview.

@xmbhasin xmbhasin force-pushed the 713-fuzzy-name-res branch 4 times, most recently from e0120ee to 95893c6 Compare January 4, 2025 16:21
* Use plurality-agnostic messages for suggestions for simplicity and consistency
* Partially addresses unisonweb#713
@xmbhasin xmbhasin force-pushed the 713-fuzzy-name-res branch from 95893c6 to 25bead3 Compare January 4, 2025 16:33
@xmbhasin xmbhasin marked this pull request as ready for review January 4, 2025 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant