Commit Graph

94 Commits

Author SHA1 Message Date
Ken Jin e0fb278064 gh-143421: Allocate all JIT state in one go (GH-143626) 2026-01-09 19:00:49 +00:00
Ken Jin b852236b26 gh-143421: Lazily allocate tracer code and opt buffers (GH-143597) 2026-01-09 16:56:19 +00:00
Hai Zhu aeb3403563 gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation (GH-143536)
* move JitOptContext to _PyThreadStateImpl
* make _PyUOpInstruction buffer a part of _PyThreadStateImpl

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2026-01-08 19:38:21 +00:00
Nadeshiko Manju 0a5c04a5ce gh-134584: Eliminate redundant refcounting from TO_BOOL_STR (GH-143417)
Signed-off-by: Manjusaka <me@manjusaka.me>
2026-01-06 21:11:53 +00:00
Tomas R. 25c294b6ea gh-134584: Eliminate redundant refcounting from _CALL_TYPE_1 (GH-135818) 2025-12-23 17:01:10 +00:00
Ken Jin 0ac4e6c6cd gh-134584: Remove custom float decref ops (GH-142576) 2025-12-15 19:38:58 +00:00
Ken Jin 97f0a1f203 gh-142276: Watch attribute loads when promoting JIT constants (GH-142303)
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Savannah Ostrowski <savannah@python.org>
2025-12-08 18:03:15 +00:00
Ken Jin b3bf212898 gh-141976: Check stack bounds in JIT optimizer (GH-142201) 2025-12-04 20:28:08 +00:00
Mark Shannon 62423c9c36 GH-141794: Limit size of generated machine code. (GH-142228)
* Factor out bodies of the largest uops, to reduce jit code size.
* Factor out common assert, also reducing jit code size.
* Limit size of jitted code for a single executor to 1MB.
2025-12-03 17:43:35 +00:00
Ken Jin 4fa80ce74c gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Mark Shannon 3b83257366 GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) 2025-09-18 10:09:59 +01:00
Ken Jin 46f823bb81 gh-132732: Clear errors in JIT optimizer on error (GH-136048) 2025-09-15 17:24:37 +01:00
AN Long 1ff2cbbac8 gh-137136: Suppress build warnings when build on Windows with --experimental-jit-interpreter (GH-137137) 2025-09-03 15:42:26 +01:00
Ken Jin 7fda8b66de gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764) 2025-08-20 22:53:54 +08:00
Savannah Bailey f7c380ef67 GH-132732: Use pure op machinery to optimize COMPARE_OP_INT/FLOAT/STR (#137062)
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
2025-07-25 19:02:04 -07:00
Ken Jin 695ab61351 gh-132732: Automatically constant evaluate pure operations (GH-132733)
This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluations (such as ints, strings, and floats).


Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
2025-06-27 19:37:44 +08:00
Ken Jin 569fc6870f gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761) 2025-06-24 00:57:14 +08:00
Ken Jin b53b0c14da gh-135608: Add a null check for attribute promotion to fix a JIT crash (GH-135613)
Co-authored-by: devdanzin <74280297+devdanzin@users.noreply.github.com>
2025-06-20 14:33:35 +08:00
Mark Shannon 9731dd2c8d GH-135379: Specialize int operations for compact ints only (GH-135668) 2025-06-19 11:10:29 +01:00
Ken Jin fba5dded6d gh-134584: Decref elimination for float ops in the JIT (GH-134588)
This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly. Thus opening up the doorway to refcount elimination in the JIT.
2025-06-17 23:25:53 +08:00
Tomas R. 71dea74865 gh-131798: Small improvements to remove_unneeded_uops (GH-134554)
Improve remove_unneeded_uops
2025-05-23 20:48:45 +08:00
Tomas R. 484e00379b GH-131798: Optimize away isinstance calls in the JIT (GH-134369) 2025-05-22 12:52:47 -04:00
Brandt Bucher ec736e7dae GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403) 2025-05-22 11:15:03 -04:00
Brandt Bucher 2f0570caf4 GH-131798: Narrow types more aggressively in the JIT (GH-134373) 2025-05-20 18:09:51 -04:00
Adam Turner f21e42d906 Remove duplicate includes: Python/{bytecodes,ceval,optimizer_analysis}.c (#132622) 2025-05-01 12:07:53 +01:00
Brandt Bucher 4f7f72ce34 GH-130415: Improve the JIT's unneeded uop removal pass (GH-132333) 2025-04-21 09:58:55 -07:00
Brandt Bucher 3a8cefba0b GH-131726: Split up _CHECK_VALIDITY_AND_SET_IP (GH-131810) 2025-04-01 16:55:05 -07:00
mpage 053c285f6b gh-130704: Strength reduce LOAD_FAST{_LOAD_FAST} (#130708)
Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.
2025-04-01 10:18:42 -07:00
Mark Shannon 7ebd71ee14 GH-131498: Remove conditional stack effects (GH-131499)
* Adds some missing #includes
2025-03-20 15:39:38 +00:00
Brandt Bucher 7afa476874 GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659) 2025-03-02 13:21:34 -08:00
Mark Shannon 54965f3fb2 GH-130296: Avoid stack transients in four instructions. (GH-130310)
* Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE

* Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS

* Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE

* Remove stack transient in LOAD_ATTR_WITH_HINT
2025-02-28 18:00:38 +00:00
Brandt Bucher 5fa7e1b7fd GH-129715: Remove _DYNAMIC_EXIT (GH-129716) 2025-02-07 11:41:17 -08:00
Mark Shannon 75b4962157 GH-128914: Remove all but one conditional stack effects (GH-129226)
* Remove all 'if (0)' and 'if (1)' conditional stack effects

* Use array instead of conditional for BUILD_SLICE args

* Refactor LOAD_GLOBAL to use a common conditional uop

* Remove conditional stack effects from LOAD_ATTR specializations

* Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.

* Remove conditional stack effects from CALL_FUNCTION_EX
2025-01-27 16:24:48 +00:00
Sam Gross a10f99375e Revert "GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918)" (GH-129202)
The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f430.
2025-01-23 09:26:25 +00:00
Mark Shannon ab61d3f430 GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918) 2025-01-20 17:09:23 +00:00
Mark Shannon f0f7b978be GH-128939: Refactor JIT optimize structs (GH-128940) 2025-01-20 15:49:15 +00:00
mpage 2de048ce79 gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711)
We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.

_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
2024-12-13 10:17:16 -08:00
Ken Jin 6293d00e72 gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2024-11-09 11:35:33 +08:00
mpage f978fb4f8d gh-115999: Refactor LOAD_GLOBAL specializations to avoid reloading {globals, builtins} keys (gh-124953)
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:

1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).

This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).

This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
2024-10-09 15:18:25 +00:00
Ken Jin b84a763dca gh-120619: Optimize through _Py_FRAME_GENERAL (GH-124518)
* Optimize through _Py_FRAME_GENERAL

* refactor
2024-10-03 01:10:51 +08:00
Sam Gross 5aa91c56bf gh-124296: Remove private dictionary version tag (PEP 699) (#124472) 2024-10-01 12:39:56 -04:00
Sam Gross f4997bb3ac gh-123923: Defer refcounting for f_funcobj in _PyInterpreterFrame (#124026)
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
2024-09-24 20:08:18 +00:00
Mark Shannon df13a1821a GH-118095: Add tier two support for BINARY_SUBSCR_GETITEM (GH-120793) 2024-08-01 16:19:05 -07:00
Victor Stinner 9e4a81f00f gh-120642: Move private PyCode APIs to the internal C API (#120643)
* Move _Py_CODEUNIT and related functions to pycore_code.h.
* Move _Py_BackoffCounter to pycore_backoff.h.
* Move Include/cpython/optimizer.h content to pycore_optimizer.h.
* Remove Include/cpython/optimizer.h.
* Remove PyUnstable_Replace_Executor().

Rename functions:

* PyUnstable_GetExecutor() => _Py_GetExecutor()
* PyUnstable_GetOptimizer() => _Py_GetOptimizer()
* PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer()
* PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter()
* PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()
2024-06-26 13:54:03 +02:00
Mark Shannon 8f5a01707f GH-120982: Add stack check assertions to generated interpreter code (GH-120992) 2024-06-25 16:42:29 +01:00
Mark Shannon 274f844830 GH-120619: Clean up RETURN_VALUE instruction (GH-120624)
* Rename _POP_FRAME to _RETURN_VALUE as it returns a value as well as popping a frame.

* Remove remaining _POP_FRAMEs
2024-06-17 14:40:11 +01:00
Saul Shanabrook 55402d3232 gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365)
Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
Co-authored-by: dpdani <git@danieleparmeggiani.me>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
Co-authored-by: Ken Jin <kenjin@python.org>
2024-06-08 17:41:45 +08:00
Mark Shannon f5c6b9977a GH-118910: Less boilerplate in the tier 2 optimizer (#118913) 2024-05-10 17:43:23 +01:00
Guido van Rossum 7d83f7bcc4 gh-118335: Configure Tier 2 interpreter at build time (#118339)
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.

We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.

On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.

In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
2024-04-30 18:26:34 -07:00
Mark Shannon f180b31e76 GH-118095: Handle RETURN_GENERATOR in tier 2 (GH-118180) 2024-04-25 11:32:47 +01:00