## Summary
Setting `CARGO_PROFILE_DEV_OPT_LEVEL=1` makes compile times much slower but
mdtest runtime much faster. Overall, for a full run of ty's mdtest
suite, it's faster to run mdtests (including compilation time) if you
set this environment variable.
This PR sets the environment variable in mdtest.py, but only if filters
weren't specified. If you specified a filter, only a (probably small)
subset of mdtests will be run, so the compile time is going to dominate
and setting this environment variable will be counterproductive. For a
full mdtest run, though, it'll be helpful.
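
A minimal sketch of the "only when no filters were given" logic, not the actual code in `mdtest.py`. The function name `cargo_env` and its parameters are made up for illustration. The variable name follows Cargo's documented `CARGO_PROFILE_<name>_OPT_LEVEL` override for a profile's `opt-level`.

```python
import os


def cargo_env(filters, base_env=None):
    """Return the environment to use for the cargo invocation.

    `filters` is the list of test filters given on the command line.
    `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not filters:
        # Full run: accept a slower compile in exchange for faster tests.
        # setdefault keeps any value the user has already exported.
        env.setdefault("CARGO_PROFILE_DEV_OPT_LEVEL", "1")
    return env
```

With a filter such as `["implicit"]` the environment is passed through unchanged, so the quicker unoptimized build is used.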
## Test Plan
I ran mdtest.py locally and observed that the whole mdtest suite
finished in <3s, compared to 13.77s on `main`.
## Summary
This change improves `mdtest.py` watch mode to automatically re-run
tests when snapshot files are rejected during review (e.g., via `cargo
insta review` in a separate process).
When a pending snapshot (`.snap.new` file) is deleted — which happens
when a snapshot is rejected — the watch mode now:
1. Detects the deletion of `.snap.new` files in the snapshots directory
2. Maps the snapshot filename back to its source `.md` file using a new
`_md_file_for_snapshot()` helper method
3. Re-runs the corresponding mdtest to regenerate the snapshot from the
current code state
This is useful because rejected snapshots are often stale (produced by
an earlier test run), and re-running the test ensures the snapshot is
regenerated with the latest code.
## Test Plan
This PR was Claude-generated (with guidance and iteration by me), but I
tested it manually locally and it does what I want it to! To test it
manually I:
- Started `mdtest.py` running in an embedded terminal inside VSCode
- Edited an mdtest with `<!-- snapshot-diagnostics -->` enabled so that
a new diagnostic was emitted
- In a separate process, ran `cargo insta review`
- Went back to VSCode and changed the `.md` file again so that the
diagnostic would change
- Went back to the separate process and rejected the snapshot change
(it's now stale, since I edited the `.md` file again)
- Observed that a new pending snapshot was instantly regenerated by
mdtest.py after the old (stale) snapshot was rejected. This differs
from what happens on `main`, where mdtest.py does not create a new
snapshot for you to accept/reject unless you press return (to rerun all
tests) or make another edit to the `.md` file.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Without this, I was getting build failures running `mdtest.py` because
`uv run` would choose Python 3.14, but the locked version of watchfiles
used an older PyO3 and wouldn't build with 3.14.
Alternative fixes could include pinning Python to `<3.14` (ugh), or just
updating the lockfile without pinning a minimum version of watchfiles --
but pinning a minimum version seems harmless?
## Summary
Add lockfiles for all mdtests which make use of external dependencies.
When running tests normally, we use this lockfile when creating the
temporary venv using `uv sync --locked`. A new
`MDTEST_UPGRADE_LOCKFILES` environment variable is used to switch to a
mode in which those lockfiles can be updated or regenerated. When using
the Python mdtest runner, this environment variable is automatically set
(because we use this command while developing, not to simulate exactly
what happens in CI). A command-line flag is provided to opt out of this.
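
As a rough illustration of the two modes (the real logic lives in the Rust test harness and may differ), the choice of `uv` invocation could be sketched like this. The function name is hypothetical, and treating only the value `1` as "enabled" and using `uv sync --upgrade` for the upgrade mode are assumptions.

```python
import os


def uv_sync_command(env=None):
    """Pick the `uv sync` invocation for the temporary venv."""
    env = os.environ if env is None else env
    if env.get("MDTEST_UPGRADE_LOCKFILES") == "1":
        # Upgrade mode: allow uv to re-resolve and rewrite the lockfile.
        return ["uv", "sync", "--upgrade"]
    # Normal mode: fail if the lockfile is missing or out of date.
    return ["uv", "sync", "--locked"]
```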
## Test Plan
### Using the mdtest runner
#### Adding a new test (no lockfile yet)
* Removed `attrs.lock` to simulate this
* Ran `uv run crates/ty_python_semantic/mdtest.py -e external/`. The
lockfile is generated and the test succeeds.
#### Upgrading/downgrading a dependency
* Changed pydantic requirement from `pydantic==2.12.2` to
`pydantic==2.12.5` (also tested with `2.12.0`)
* Ran `uv run crates/ty_python_semantic/mdtest.py -e external/`. The
lockfile is updated and the test succeeds.
### Using cargo
#### Adding a new test (no lockfile yet)
* Removed `attrs.lock` to simulate this
* Ran `MDTEST_EXTERNAL=1 cargo test -p ty_python_semantic --test mdtest
mdtest__external` "naively", which outputs:
> Failed to setup in-memory virtual environment with dependencies:
Lockfile not found at
'/home/shark/ruff/crates/ty_python_semantic/resources/mdtest/external/attrs.lock'.
Run with `MDTEST_UPGRADE_LOCKFILES=1` to generate it.
* Ran `MDTEST_UPGRADE_LOCKFILES=1 MDTEST_EXTERNAL=1 cargo test -p
ty_python_semantic --test mdtest mdtest__external`. The lockfile is
updated and the test succeeds.
#### Upgrading/downgrading a dependency
* Changed pydantic requirement from `pydantic==2.12.2` to
`pydantic==2.12.5` (also tested with `2.12.0`)
* Ran `MDTEST_EXTERNAL=1 cargo test -p ty_python_semantic --test mdtest
mdtest__external` "naively", which outputs an error message similar to
the one above.
* Ran the command suggested in the error message (`MDTEST_EXTERNAL=1
MDTEST_UPGRADE_LOCKFILES=1 cargo test -p ty_python_semantic --test
mdtest mdtest__external`). The lockfile is updated and the test
succeeds.
## Summary
This PR adds the possibility to write mdtests that specify external
dependencies in a `project` section of TOML blocks. For example, here is
a test that makes sure that we understand Pydantic's dataclass-transform
setup:
````markdown
```toml
[environment]
python-version = "3.12"
python-platform = "linux"
[project]
dependencies = ["pydantic==2.12.2"]
```
```py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

user = User(id=1, name="Alice")
reveal_type(user.id)  # revealed: int
reveal_type(user.name)  # revealed: str

# error: [missing-argument] "No argument provided for required parameter `name`"
invalid_user = User(id=2)
```
````
## How?
Using the `python-version` and the `dependencies` fields from the
Markdown section, we generate a `pyproject.toml` file, write it to a
temporary directory, and use `uv sync` to install the dependencies into
a virtual environment. We then copy the Python source files from that
venv's `site-packages` folder to a corresponding directory structure in
the in-memory filesystem. Finally, we configure the search paths
accordingly, and run the mdtest as usual.
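
To make the first step concrete, here is a small sketch of how a `pyproject.toml` could be rendered from the two fields. The actual generation happens in Rust. The function name, the project name `mdtest-env`, the version, and the exact `requires-python` specifier are all assumptions for illustration.

```python
import json
from typing import List


def render_pyproject(python_version: str, dependencies: List[str]) -> str:
    """Render a minimal pyproject.toml for a throwaway environment."""
    # json.dumps yields double-quoted strings, which are also valid TOML
    # basic strings for ordinary requirement specifiers.
    deps = ", ".join(json.dumps(dep) for dep in dependencies)
    return (
        "[project]\n"
        'name = "mdtest-env"\n'  # hypothetical placeholder name
        'version = "0.1.0"\n'
        f'requires-python = "=={python_version}.*"\n'
        f"dependencies = [{deps}]\n"
    )
```

Calling `render_pyproject("3.12", ["pydantic==2.12.2"])` produces a file that `uv sync` could then consume in the temporary directory.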
I fully understand that there are valid concerns here:
* Doesn't this require network access? (yes, it does)
* Is this fast enough? (`uv` caching makes this almost unnoticeable,
actually)
* Is this deterministic? ~~(probably not, package resolution can depend
on the platform you're on)~~ (yes, hopefully)
For this reason, this first version is opt-in, locally. ~~We don't even
run these tests in CI (even though they worked fine in a previous
iteration of this PR).~~ You need to set `MDTEST_EXTERNAL=1`, or use the
new `-e/--enable-external` command line option of the `mdtest.py`
runner. For example:
```bash
# Skip mdtests with external dependencies (default):
uv run crates/ty_python_semantic/mdtest.py
# Run all mdtests, including those with external dependencies:
uv run crates/ty_python_semantic/mdtest.py -e
# Only run the `pydantic` tests. Use `-e` to make sure it is not skipped:
uv run crates/ty_python_semantic/mdtest.py -e pydantic
```
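
On the Python side, wiring the flag to the environment variable might look like the following sketch. Only the `-e/--enable-external` option and the `MDTEST_EXTERNAL` variable come from this PR. The helper names and the overall structure are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the runner's command line (illustrative subset only)."""
    parser = argparse.ArgumentParser(prog="mdtest.py")
    parser.add_argument("filters", nargs="*", help="optional test filters")
    parser.add_argument(
        "-e",
        "--enable-external",
        action="store_true",
        help="also run mdtests that need external dependencies",
    )
    return parser.parse_args(argv)


def external_env(args, base_env):
    """Return a copy of base_env, opting in to external tests if requested."""
    env = dict(base_env)
    if args.enable_external:
        env["MDTEST_EXTERNAL"] = "1"
    return env
```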
## Why?
I believe that this can be a useful addition to our testing strategy,
which lies somewhere between ecosystem tests and normal mdtests.
Ecosystem tests cover much more code, but they have the disadvantage
that we only see second- or third-order effects via diagnostic diffs. If
we unexpectedly gain or lose type coverage somewhere, we might not even
notice (assuming the gradual guarantee holds, and ecosystem code is
mostly correct). Another disadvantage of ecosystem checks is that they
only test checked-in code that is usually correct. However, we also want
to test what happens on wrong code, like the code that is momentarily
written in an editor, before fixing it. On the other end of the spectrum
we have normal mdtests, which have the disadvantage that they do not
reflect the reality of complex real-world code. We experience this
whenever we're surprised by an ecosystem report on a PR.
That said, these tests should not be seen as a replacement for either of
these things. For example, we should still strive to write detailed
self-contained mdtests for user-reported issues. But we might use this
new layer for regression tests, or simply as a debugging tool. It can
also serve as a tool to document our support for popular third-party
libraries.
## Test Plan
* I've been locally using this for a couple of weeks now.
* `uv run crates/ty_python_semantic/mdtest.py -e`
## Summary
Allow users of `mdtest.py` to press enter to rerun all mdtests without
recompiling (thanks @AlexWaygood).
I swear I tried three other approaches (including a fully async version)
before I settled on this solution. It is indeed silly, but works just
fine.
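
For readers curious what such a mechanism can look like, here is one possible shape, which is not necessarily the approach taken in this PR: a daemon thread reads lines from an input stream and posts a "rerun everything" event to a queue that the main watch loop could poll. All names here are hypothetical.

```python
import queue
import threading


def watch_for_enter(stream, events):
    """Post "rerun-all" to `events` for every line read from `stream`."""

    def pump():
        # Iterating a text stream yields one item per line, i.e. one
        # item each time the user presses enter.
        for _line in stream:
            events.put("rerun-all")

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread
```

In a real runner `stream` would be `sys.stdin`. Passing it in as a parameter keeps the sketch easy to exercise with an in-memory stream.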
## Test Plan
Interactive playing around
## Summary
This change to the mdtest runner makes it easy to run on a subset of
tests/files. For example:
```
▶ uv run crates/ty_python_semantic/mdtest.py implicit
running 1 test
test mdtest__implicit_type_aliases ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 281 filtered out; finished in 0.83s
Ready to watch for changes...
```
Subsequent changes to either that test file or the Rust source code will
also only rerun the `implicit_type_aliases` test.
Multiple arguments can be provided, and filters can either be partial
file paths (`loops/for.md`, `loops/for`, `for`) or mangled test names
(`loops_for`):
```
▶ uv run crates/ty_python_semantic/mdtest.py implicit binary/union
running 2 tests
test mdtest__binary_unions ... ok
test mdtest__implicit_type_aliases ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 280 filtered out; finished in 0.85s
Ready to watch for changes...
```
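
One way to accept both filter styles is to normalize partial paths into the mangled form and rely on cargo's substring matching of test names. The sketch below assumes exactly that. The function name is made up, and the real runner may resolve filters differently.

```python
def mangle_filter(raw: str) -> str:
    """Turn a partial path such as `loops/for.md` into `loops_for`."""
    name = raw[:-3] if raw.endswith(".md") else raw
    # Path separators become underscores, as in the generated test names.
    return name.replace("/", "_").replace("\\", "_")
```

Under this scheme `binary/union` becomes `binary_union`, which is a substring of the `mdtest__binary_unions` test name shown above.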
## Test Plan
Tested it interactively for a while
## Summary
This change reduces mdtest compilation time from 6s to 3s on my laptop.
We don't need to build the unit tests and the corpus tests when we're
only interested in Markdown-based tests.
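
A sketch of what restricting the build to the mdtest target can look like from the Python runner. `--test mdtest` tells cargo to build and run only that integration test target. The function name is hypothetical, and the exact arguments used by `mdtest.py` may differ.

```python
def mdtest_cargo_command(filters):
    """Build a cargo invocation that compiles only the mdtest target."""
    cmd = ["cargo", "test", "-p", "ty_python_semantic", "--test", "mdtest"]
    if filters:
        # Everything after `--` goes to the test binary, which accepts
        # several name filters.
        cmd += ["--", *filters]
    return cmd
```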
## Test Plan
local benchmarks