Alex Waygood 8eef2fcaeb [ty] Replace strsim with CPython-based Levenshtein implementation (#23291)
## Summary

A couple of our diagnostics currently add a "Did you mean...?" hint
when an obvious typo appears to have caused the error. The suggestion
is generated using the Levenshtein implementation from the `strsim`
crate on `crates.io`.

This PR replaces the `strsim` implementation of Levenshtein used to
create these hints with a custom Levenshtein implementation based on the
one that CPython itself uses to create these hints:

```pycon
>>> class Foo:
...     xyxy = 42
...     
>>> Foo.xyxyz
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    Foo.xyxyz
AttributeError: type object 'Foo' has no attribute 'xyxyz'. Did you mean: 'xyxy'?
```

The added tests are also derived from CPython's test suite.
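
For readers unfamiliar with the technique, here is a minimal Python
sketch of how an edit-distance-based "Did you mean...?" hint can work.
It is not the implementation added in this PR, nor CPython's: it uses
plain unit-cost Levenshtein distance, and the function names
(`levenshtein`, `suggest`) and the "one edit per three characters"
cutoff are assumptions made up for illustration.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Classic unit-cost edit distance (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # drop a character from `a`
                    current[j - 1] + 1,      # drop a character from `b`
                    previous[j - 1] + cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def suggest(name: str, candidates: list[str]) -> str | None:
    """Return the closest candidate, or None if nothing is close enough."""
    best = None
    best_distance = None
    for candidate in candidates:
        if candidate == name:
            continue
        # Hypothetical cutoff (an assumption of this sketch): allow
        # roughly one edit per three characters of the longer string.
        max_distance = max(len(name), len(candidate)) // 3
        distance = levenshtein(name, candidate)
        if distance > max_distance:
            continue
        if best_distance is None or distance < best_distance:
            best, best_distance = candidate, distance
    return best
```

The cutoff matters as much as the distance itself: without one, every
unknown name gets a suggestion, however implausible. With the cutoff
assumed here, `suggest("xyxyz", ["xyxy"])` returns `"xyxy"`, while
`suggest("pair", ["data"])` returns `None`.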

The motivation for copying CPython's implementation almost exactly is
that CPython has had this feature for several Python versions now, and
during that time many bug reports have been filed regarding incorrect
suggestions, which have since been fixed. This implementation is thus
very well "battle-tested" by this point; we can say with a reasonable
degree of confidence that it gives good suggestions for typos in the
Python context.

The ecosystem report on this PR bears out that this is an improvement.
We see bad suggestions going away:

```diff
- [error] invalid-key - Unknown key "pair" for TypedDict `RPCAnalyzedDFMsg` - did you mean "data"?
+ [error] invalid-key - Unknown key "pair" for TypedDict `RPCAnalyzedDFMsg`: Unknown key "pair"
```

and good suggestions being added:

```diff
- [error] invalid-key - Unknown key "old_entity_id" for TypedDict `_EventEntityRegistryUpdatedData_CreateRemove`: Unknown key "old_entity_id"
+ [error] invalid-key - Unknown key "old_entity_id" for TypedDict `_EventEntityRegistryUpdatedData_CreateRemove` - did you mean "entity_id"?
```

This Levenshtein implementation was originally proposed in #18705, and
then again in #18751. Those PRs also started using the Levenshtein
implementation in several new areas, however, and in those areas
computing the list of candidate suggestions to pass into the algorithm
turned out to be prohibitively expensive. This PR therefore _only_
replaces the Levenshtein implementation used for our existing
subdiagnostics; it does not add any new callsites.

## Test plan

Unit tests have been added in `levenshtein.rs`. Some mdtests and
snapshots were updated to ensure that they still test what they're meant
to be testing, even with the new Levenshtein implementation.

Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
2026-02-16 10:36:36 +00:00


[files]
# https://github.com/crate-ci/typos/issues/868
extend-exclude = [
"crates/ty_vendored/vendor/**/*",
"**/resources/**/*",
"**/snapshots/**/*",
"crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/collection_literal.rs",
# Completion tests tend to have a lot of incomplete
# words naturally. It's annoying to have to make all
# of them actually words. So just ignore typos here.
"crates/ty_ide/src/completion.rs",
# Same for the "Did you mean...?" Levenshtein tests.
"crates/ty_python_semantic/src/diagnostic/levenshtein.rs",
]
[default.extend-words]
"arange" = "arange" # e.g. `numpy.arange`
hel = "hel"
whos = "whos"
spawnve = "spawnve"
ned = "ned"
pn = "pn" # `import panel as pn` is a thing
poit = "poit"
BA = "BA" # acronym for "Bad Allowed", used in testing.
jod = "jod" # e.g., `jod-thread`
Numer = "Numer" # Library name 'NumerBlox' in "Who's Using Ruff?"
CPY = "CPY" # it's a Ruff rule category
[default]
extend-ignore-re = [
# Line ignore with trailing "spellchecker:disable-line"
"(?Rm)^.*#\\s*spellchecker:disable-line$",
"LICENSEs",
# Various third-party dependencies use `typ` as a struct field name (e.g., lsp_types::LogMessageParams)
"typ",
# TODO: Remove this once the `TYP` redirects are removed from `rule_redirects.rs`
"TYP",
"ntBre"
]
[default.extend-identifiers]
"FrIeNdLy" = "FrIeNdLy"