Vlad Apostol f854f4fb7f Fix shell injection via shell=True in subprocess calls (#23894)
## Summary

Enable Ruff's own `S602` rule (`subprocess-popen-with-shell-equals-true`
from flake8-bandit) in `pyproject.toml` to enforce no-shell subprocess
calls going forward, and fix the two existing violations it catches.

**`pyproject.toml`** - adds `S602` to the lint `select` list so any
future `shell=True` usage in the scripts is caught automatically by the
linter.

**`scripts/setup_primer_project.py`** - `project.install_cmd` and
`project.deps` come from the mypy-primer project registry (an
externally-fetched third-party config). Both were joined into shell
strings and executed with `shell=True`, making them a supply-chain
injection vector. Fixed by tokenising with `shlex.split()` and dropping
`shell=True`.
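Below is a minimal sketch of the fixed pattern. It assumes `install_cmd` is a single command string and `deps` is a list of requirement strings; the actual shapes in the mypy-primer registry may differ. `build_install_argv` and `run_install` are hypothetical names, not the script's own functions.

```python
import shlex
import subprocess


def build_install_argv(install_cmd: str, deps: list[str]) -> list[str]:
    """Tokenise the install command and append the dependencies as argv entries.

    Hypothetical helper; the real script may structure this differently.
    """
    argv = shlex.split(install_cmd)
    for dep in deps:
        # Tokenise each entry too: quoting is honoured, but characters such as
        # ";" or "&&" remain plain text instead of becoming shell syntax.
        argv.extend(shlex.split(dep))
    return argv


def run_install(install_cmd: str, deps: list[str], cwd: str) -> None:
    # No shell=True: the list goes straight to the OS process API, so no shell
    # ever sees (or interprets) the registry-supplied strings.
    subprocess.run(build_install_argv(install_cmd, deps), cwd=cwd, check=True)
```

For example, `build_install_argv("pip install -e .", ["attrs", "typing-extensions>=4"])` returns `['pip', 'install', '-e', '.', 'attrs', 'typing-extensions>=4']`.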

**`scripts/ty_benchmark/src/benchmark/snapshot.py`** - `command.prepare`
was passed to `subprocess.run(..., shell=True)`. While no current caller
sets this field, it is a latent injection point. Fixed by tokenising
with `shlex.split()` and dropping `shell=True`; adds the missing `import
shlex`.

## Test Plan

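Both scripts are developer-only utilities. For inputs that use only whitespace
and quoting (no pipes, redirection, globbing, or variable expansion), the
changes are semantically equivalent: `shlex.split()` produces the same argument
list the shell would have constructed. Shell metacharacters are no longer
interpreted. They reach the program as literal argument text instead of being
acted on by a shell process.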
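The equivalence claim can be checked directly. For a command that uses only quoting and whitespace, `shlex.split()` yields the argv a POSIX shell would build, and shell metacharacters come out as ordinary argument text. The command strings below are made-up illustrations.

```python
import shlex

# Quoting and whitespace only: the same argv a POSIX shell would construct.
assert shlex.split("pip install 'black==24.1.0' \"mypy extensions\"") == [
    "pip", "install", "black==24.1.0", "mypy extensions",
]

# Metacharacters are not interpreted: ";" and "$HOME" stay literal text
# instead of acting as a command separator and a variable expansion.
tokens = shlex.split("echo hi; rm -rf $HOME")
print(tokens)  # ['echo', 'hi;', 'rm', '-rf', '$HOME']
```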

Verified no other `shell=True` uses remain in `scripts/` or `python/`
that would be newly flagged by S602.
2026-03-13 08:13:14 -06:00

Getting started

  1. Install uv
    • Unix: curl -LsSf https://astral.sh/uv/install.sh | sh
    • Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  2. Build ty: cargo build --bin ty --release
  3. cd into the benchmark directory: cd scripts/ty_benchmark
  4. Install Pyright: npm ci --ignore-scripts
  5. Run benchmarks: uv run benchmark

Requires hyperfine 1.20 or newer.
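A small sketch of checking the hyperfine requirement up front. It assumes `hyperfine --version` prints a line such as `hyperfine 1.20.0`. `parse_hyperfine_version`, `check_hyperfine`, and `MIN_HYPERFINE` are hypothetical names, not part of the benchmark script.

```python
import re
import shutil
import subprocess

MIN_HYPERFINE = (1, 20)  # from "Requires hyperfine 1.20 or newer"


def parse_hyperfine_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from output like 'hyperfine 1.20.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised hyperfine version output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_hyperfine() -> None:
    """Raise RuntimeError if hyperfine is missing or older than MIN_HYPERFINE."""
    path = shutil.which("hyperfine")
    if path is None:
        raise RuntimeError("hyperfine not found on PATH")
    out = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    ).stdout
    # Compare (major, minor) tuples; tuple comparison is lexicographic.
    if parse_hyperfine_version(out)[:2] < MIN_HYPERFINE:
        raise RuntimeError(f"hyperfine 1.20 or newer is required, found: {out.strip()}")
```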

Benchmarks

Cold check time

Run with:

uv run --python 3.14 benchmark

Measures how long it takes to type check a project without a pre-existing cache.

Run the benchmark with --single-threaded to measure the check time when each type checker uses only a single thread.
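In the real benchmark, hyperfine performs the timing; its `--prepare` option runs a command before each timing run. To illustrate what "cold" means, here is a simplified pure-Python stand-in that deletes a cache directory before every timed run. `time_cold` and the `cache_dir` argument are hypothetical, and the actual script's cache handling may differ.

```python
import shutil
import statistics
import subprocess
import sys
import time
from pathlib import Path


def time_cold(argv: list[str], cache_dir: Path, runs: int = 5) -> tuple[float, float]:
    """Return (mean, stdev) wall-clock seconds, deleting cache_dir before every run."""
    samples = []
    for _ in range(runs):
        # Start each run without a cache, as a fresh checkout would.
        shutil.rmtree(cache_dir, ignore_errors=True)
        start = time.perf_counter()
        # check=False: type checkers exit non-zero when they report errors,
        # which is expected for many projects and not a benchmark failure.
        subprocess.run(argv, check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), stdev


if __name__ == "__main__":
    # Stand-in command; a real run would invoke a type checker on a project.
    print(time_cold([sys.executable, "-c", "pass"], Path(".demo-cache"), runs=3))
```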

Warm check time

Run with:

uv run --python 3.14 benchmark --warm

Measures how long it takes to recheck a project if there were no changes.

Note: Of the benchmarked type checkers, only mypy supports caching.
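The warm measurement differs from the cold one only in leaving the cache in place and discarding initial runs. hyperfine exposes this as `--warmup`. Below is a simplified pure-Python sketch of the idea. `time_warm` is a hypothetical helper, not the benchmark's implementation.

```python
import statistics
import subprocess
import sys
import time


def time_warm(argv: list[str], runs: int = 5, warmup: int = 1) -> float:
    """Return the mean wall-clock seconds over `runs`, after `warmup` untimed runs."""
    # Untimed warm-up runs populate any on-disk cache (of the benchmarked
    # checkers, only mypy keeps one).
    for _ in range(warmup):
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # The project is unchanged between runs, so this times a no-op recheck.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in command; a real run would invoke a type checker on a project.
    print(time_warm([sys.executable, "-c", "pass"], runs=3))
```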

LSP: Time to first diagnostic

Measures how long it takes for a newly started LSP to return the diagnostics for the files open in the editor.

Run with:

uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_fetch_diagnostics

Note: Use -v -s to see the set of diagnostics returned by each type checker.
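For context, an LSP client talks to the server over JSON-RPC messages framed with a `Content-Length` header (LSP base protocol). For each file open in the editor, it sends a `textDocument/didOpen` notification, and the time to first diagnostic is measured against this exchange. The sketch below is generic LSP, not the code in `test_lsp_diagnostics.py`. The helper names and the example URI are hypothetical.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC message as required by the LSP base protocol."""
    body = json.dumps(payload).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse a single framed message such as one produced by encode_message."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


def did_open(uri: str, text: str, version: int = 1) -> dict:
    """Build a textDocument/didOpen notification (no "id": notifications get no reply)."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": "python",
                "version": version,
                "text": text,
            },
        },
    }
```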

LSP: Re-check time

Measures how long it takes to recheck all open files after making a single change in one file.

Run with:

uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_incremental_edit

Note: This benchmark uses pull diagnostics for type checkers that support this operation (ty), and falls back to publish diagnostics otherwise (Pyright, Pyrefly).
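The sketch below makes the pull/publish distinction concrete using the messages involved in a re-check, following the LSP 3.17 specification:

- an incremental `textDocument/didChange` notification;
- a `textDocument/diagnostic` pull request;
- a helper that accepts diagnostics from either a pull response or a `textDocument/publishDiagnostics` notification.

The helper names are hypothetical. Incremental edits assume the server advertised incremental text synchronization.

```python
from __future__ import annotations

import itertools

_ids = itertools.count(1)  # request ids; notifications carry none


def did_change(uri: str, version: int, line: int, start_char: int, end_char: int, new_text: str) -> dict:
    """textDocument/didChange with one incremental edit on a single line."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [
                {
                    "range": {
                        "start": {"line": line, "character": start_char},
                        "end": {"line": line, "character": end_char},
                    },
                    "text": new_text,
                }
            ],
        },
    }


def pull_diagnostics_request(uri: str) -> dict:
    """textDocument/diagnostic request (pull model); the server answers this id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "textDocument/diagnostic",
        "params": {"textDocument": {"uri": uri}},
    }


def extract_diagnostics(message: dict) -> list[dict] | None:
    """Return diagnostics from a pull response or a publish notification, else None."""
    if message.get("method") == "textDocument/publishDiagnostics":
        # Push model: the server sends diagnostics whenever it chooses.
        return message["params"]["diagnostics"]
    result = message.get("result")
    if isinstance(result, dict) and result.get("kind") == "full":
        # Pull model: a full document diagnostic report carries "items".
        return result["items"]
    # e.g. an "unchanged" report, or an unrelated message.
    return None
```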

Known limitations

The tested type checkers implement Python's type system to varying degrees, and some projects pass type checking cleanly only with a specific type checker.

Updating the benchmark

The benchmark script supports snapshotting the results when run with --snapshot and --accept. The goal of these snapshots is to catch accidental regressions, for example when a project adds new dependencies that we fail to install. They are not intended as a testing tool: the snapshot runner doesn't account for platform differences, so you might see differences when running the snapshots on your machine.
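Conceptually, a snapshot check compares the current output with a stored file and overwrites that file when accepting. The sketch below shows that idea. `check_snapshot` and its parameters are hypothetical, and it does not describe how the benchmark script actually stores or compares snapshots.

```python
from pathlib import Path


def check_snapshot(snapshot_path: Path, actual: str, accept: bool) -> bool:
    """Compare `actual` with the stored snapshot; overwrite it when `accept` is true.

    Returns True if the output matched (or was accepted), False otherwise.
    """
    if accept:
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(actual, encoding="utf-8")
        return True
    if not snapshot_path.exists():
        # No stored snapshot yet: treat as a mismatch until accepted.
        return False
    return snapshot_path.read_text(encoding="utf-8") == actual
```

An exact string comparison like this one makes any platform-dependent output show up as a mismatch.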