mirror of https://github.com/astral-sh/ruff.git synced 2026-05-06 08:56:57 -04:00

Carl Meyer 7d8c6422b1 [ty] do not union Unknown into unannotated container types (#23718)

## Summary

Part of https://github.com/astral-sh/ty/issues/1240

Stop unioning `Unknown` into the types of un-annotated container
literals.

We discussed perhaps continuing to union `Unknown` if the inferred type
is a singleton type like `None`. I'd like to explore this as a separate
change so we can see the ecosystem impact more clearly.

## Test Plan

Adjusted many mdtest expectations.

There's one test case that regresses with this change, because we don't
fully support union type contexts (it can require a lot of repeat
inference in pathological cases). So `x10: list[int | str] | list[int |
None] = [1, 2, 3]` previously passed only because we inferred the RHS as
`list[Unknown | int]` -- now we infer it as `list[int]` and the
assignment fails due to invariance. I've kept this test as a TODO since
it's not trivial to fix. Mypy errors in the same way we now do,
suggesting it's not necessarily a huge priority either.

## Ecosystem

This change is expected to cause new diagnostics and some false
positives, since we are replacing very-forgiving gradual types with
non-gradual inference heuristics.

Many of these issues could be solved or significantly mitigated by
https://github.com/astral-sh/ty/issues/1473, depending how far we are
able to go with that, and particularly whether we can afford to apply it
also to container literals which are not empty at construction. The
downside of broad application of this approach is that in some cases it
could cause us to widen container types when the user actually just made
a mistake and added the wrong thing to a container, and would prefer an
error at that location.

Some categories of new error that show up in the ecosystem report:

### Implicit TypedDicts

These are cases where the dictionary is heterogeneous and would ideally
be typed as a `TypedDict` but isn't, for example:

```py
def make_person(photo: bytes | None):
    person = {"name": "Pat", "age": 29}
    if photo is not None:
        person["photo"] = photo
```

We (and pyrefly, and pyright in strict mode) error on the last line here
because we already inferred `dict[str, str | int]`, so we can't add a
`bytes` value.

Mypy prefers common-base joins over union joins, so it infers `dict[str,
object]`, which avoids the error adding a `bytes` value. This means the
value type is less precise, which theoretically means potentially more
errors using values from the dict later. But in practice with this
heterogeneous pattern, either `object` or the union will cause similar
problems when using values from the dict -- in either case you'd
probably have to cast or narrow.

Pyright (in non-strict mode) has a special case where it falls back to
`Unknown` when it sees heterogeneous value types, so it infers this as
`dict[str, Unknown]`.

I think we could consider either the mypy or pyright approaches here,
but we don't need to do it in this PR; we can file an issue and consider
it as a follow-up.

Another symptom of this same root cause is repetitive diagnostics
arising from a large union inferred as value type; the same fixes would
address this.

### Negative intersections, particularly with e.g. `~AlwaysFalsy` or `~None`

Example:

```py
class A: ...

def _(a: A | None) -> dict[str, A]:
    if a:
        d = {"a": a}
        return d
    return {}
```

We error on `return d` because "expected `dict[str, A]`, found
`dict[str, A & ~AlwaysFalsy]`". This is an issue specific to
intersection types, so no other type checker has this problem.

I think when we "promote literals" (we may need to give this operation a
broader name -- it's really "type promotion to give a better inferred
type when invariance means too-precise is bad") we should also eliminate
all negative types from intersections. I would prefer to do this as a
separate PR for easier review and better visibility of ecosystem impact,
but I think it's high priority to land soon after this PR (ideally
before a release).

### Overly-precise inference for singleton `None`

This did show up, to the tune of ~100 new diagnostics
([example](https://github.com/pytorch/ignite/blob/b73a4c20e991b3e14949f2a69651ed2a7219f2fd/tests/ignite/engine/test_engine.py#L158)),
so I think it is worth addressing as a follow-up.

2026-03-05 15:40:48 -08:00


Cycles

Function signature

Deferred annotations can result in cycles in resolving a function signature:

from __future__ import annotations

# error: [invalid-type-form]
def f(x: f):
    pass

reveal_type(f)  # revealed: def f(x: Unknown) -> Unknown

Unpacking

See: https://github.com/astral-sh/ty/issues/364

class Point:
    def __init__(self, x: int = 0, y: int = 0) -> None:
        self.x = x
        self.y = y

    def replace_with(self, other: "Point") -> None:
        self.x, self.y = other.x, other.y

p = Point()
reveal_type(p.x)  # revealed: Unknown | int
reveal_type(p.y)  # revealed: Unknown | int

Self-referential bare type alias

[environment]
python-version = "3.12"  # typing.TypeAliasType

from typing import Union, TypeAliasType, Sequence, Mapping

A = list["A" | None]

def f(x: A):
    # TODO: should be `list[A | None]`?
    reveal_type(x)  # revealed: list[Divergent]
    # TODO: should be `A | None`?
    reveal_type(x[0])  # revealed: Divergent

JSONPrimitive = Union[str, int, float, bool, None]
JSONValue = TypeAliasType("JSONValue", 'Union[JSONPrimitive, Sequence["JSONValue"], Mapping[str, "JSONValue"]]')

def _(x: JSONValue):
    reveal_type(x)  # revealed: Sequence[JSONValue] | int | float | None | Mapping[str, JSONValue]

Self-referential legacy type variables

from typing import Generic, TypeVar

B = TypeVar("B", bound="Base")

class Base(Generic[B]):
    pass

Parameter default values

This is a regression test for https://github.com/astral-sh/ty/issues/1402. When a parameter has a default value that references the callable itself, we currently prevent infinite recursion by simply falling back to `Unknown` for the type of the default value, which does not have any practical impact except for the displayed type. We could also consider inferring `Divergent` when we encounter too many layers of nesting (instead of just one), but that would require a type traversal which could have performance implications. So for now, we mainly make sure not to panic or stack overflow for these seemingly rare cases.

Functions

class C:
    def f(self: "C"):
        def inner_a(positional=self.a):
            return
        self.a = inner_a
        # revealed: def inner_a(positional=...) -> Unknown
        reveal_type(inner_a)

        def inner_b(*, kw_only=self.b):
            return
        self.b = inner_b
        # revealed: def inner_b(*, kw_only=...) -> Unknown
        reveal_type(inner_b)

        def inner_c(positional_only=self.c, /):
            return
        self.c = inner_c
        # revealed: def inner_c(positional_only=..., /) -> Unknown
        reveal_type(inner_c)

        def inner_d(*, kw_only=self.d):
            return
        self.d = inner_d
        # revealed: def inner_d(*, kw_only=...) -> Unknown
        reveal_type(inner_d)

We do, however, still check assignability of the default value to the parameter type:

class D:
    def f(self: "D"):
        # error: [invalid-parameter-default] "Default value of type `Unknown | (def inner_a(a: int = ...) -> Unknown)` is not assignable to annotated parameter type `int`"
        def inner_a(a: int = self.a): ...
        self.a = inner_a

Lambdas

class C:
    def f(self: "C"):
        self.a = lambda positional=self.a: positional
        self.b = lambda *, kw_only=self.b: kw_only
        self.c = lambda positional_only=self.c, /: positional_only
        self.d = lambda *, kw_only=self.d: kw_only

        # revealed: (positional=...) -> Unknown
        reveal_type(self.a)

        # revealed: (*, kw_only=...) -> Unknown
        reveal_type(self.b)

        # revealed: (positional_only=..., /) -> Unknown
        reveal_type(self.c)

        # revealed: (*, kw_only=...) -> Unknown
        reveal_type(self.d)

Self-referential implicit attributes

class Cyclic:
    def __init__(self, data: str | dict):
        self.data = data

    def update(self):
        if isinstance(self.data, str):
            self.data = {"url": self.data}

# revealed: Unknown | str | dict[Unknown, Unknown] | dict[str, str]
reveal_type(Cyclic("").data)

Decorator defined on a base class with constrained typevars, accessed from a subclass with decorated generic parameters

This example was minimized from a real issue in robotframework. It created a complicated cycle with multiple cycle heads, which also involved a tricky Salsa behavior that comes up when a query oscillates between being a cycle head and not being one.

entry.py:

from derived import Derived

Derived.decorate
# revealed: bound method <class 'Derived'>.decorate[T](item_class: type[T]) -> type[T]
reveal_type(Derived.decorate)

derived.py:

from ty_extensions import reveal_mro
import bases

class Derived(bases.GenericBase["Foo", "Bar"]): ...

@Derived.decorate
class Foo(bases.Foo): ...

# revealed: <class 'Foo'>
reveal_type(Foo)
# revealed: (<class 'derived.Foo'>, <class 'bases.Foo'>, <class 'object'>)
reveal_mro(Foo)

@Derived.decorate
class Bar(bases.Bar): ...

# revealed: <class 'Bar'>
reveal_type(Bar)
# revealed: (<class 'derived.Bar'>, <class 'bases.Bar'>, <class 'object'>)
reveal_mro(Bar)

bases.py:

from typing import Generic, TypeVar, Type
from ty_extensions import reveal_mro

T = TypeVar("T")
B1 = TypeVar("B1", bound="Foo")
B2 = TypeVar("B2", bound="Bar")

class GenericBase(Generic[B1, B2]):
    @classmethod
    def decorate(cls, item_class: Type[T]) -> Type[T]:
        return item_class

# revealed: <class 'GenericBase'>
reveal_type(GenericBase)
# revealed: (<class 'GenericBase[Unknown, Unknown]'>, typing.Generic, <class 'object'>)
reveal_mro(GenericBase)
# revealed: (<class 'GenericBase[Foo, Bar]'>, typing.Generic, <class 'object'>)
reveal_mro(GenericBase["Foo", "Bar"])

class Foo: ...
class Bar: ...
