Commit Graph

73 Commits

Author SHA1 Message Date
reiden e36f8db7e5 gh-143414: Implement unique reference tracking for JIT, optimize unpacking of such tuples (GH-144300)
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
2026-03-23 00:57:23 +08:00
Ken Jin acfb4528de gh-146099: Optimize _GUARD_CODE_VERSION+IP via function version symbols (GH-146101) 2026-03-20 12:26:41 +00:00
Hai Zhu 6fe91a9e80 gh-144888: JIT executor bloom filter wide-type optimization and function Inlining (GH-146114) 2026-03-19 00:58:14 +08:00
Ken Jin 3d0824aef2 gh-127958: Trace from RESUME in the JIT (GH-145905) 2026-03-17 00:18:59 +08:00
Hai Zhu 81ef1b7317 gh-144888: Replace bloom filter linked lists with continuous arrays to optimize executor invalidating performance (GH-145873) 2026-03-16 15:58:18 +00:00
Donghee Na 6908372fb8 gh-145214: Narrow _GUARD_TOS_ANY_{SET,DICT} by using probable type (gh-145215) 2026-03-03 09:58:38 +09:00
Mark Shannon 3f37b94c73 GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948)
* Handle dependencies in the optimizer, not the tracer
* Strengthen some checks to avoid relying on optimizer for correctness
2026-02-19 11:52:57 +00:00
Mark Shannon b53fc7caa6 GH-144179: Use recorded values to make optimizer more robust (GH-144437)
* Add three new symbol kinds
* Do not smuggle code object in _PUSH_FRAME operand
* Fix small bug in predicate analysis
2026-02-05 08:58:41 +00:00
Mark Shannon 141fd8b894 GH-144179: Add value recording to JIT tracing front-end (GH-144303) 2026-02-02 16:57:04 +00:00
Hai Zhu ebbb2ca81f gh-144145: Revert PR#144122 for performance and potential bugs. (GH-144391)
Revert "gh-144145: Track nullness of properties in the Tier 2 JIT optimizer (GH-144122)"

This reverts commit 1dc12b2883.
2026-02-02 14:09:54 +00:00
Hai Zhu 1dc12b2883 gh-144145: Track nullness of properties in the Tier 2 JIT optimizer (GH-144122) 2026-01-30 15:25:19 +00:00
Mark Shannon d77aaa7311 GH-139109: Partial reworking of JIT data structures (GH-144105)
* Halve size of buffers by reusing combined trace + optimizer buffers for TOS caching
* Add simple buffer struct for more maintainable handling of buffers
* Decouple JIT structs from thread state struct
* Ensure terminator is added to trace, when optimizer gives up
2026-01-22 10:55:49 +00:00
reiden 0b08438ea6 gh-130415: Narrowing to constants in branches involving is comparisons with a constant (GH-143895) 2026-01-22 09:37:45 +00:00
Hai Zhu 61ec66acd5 gh-143946: Show JitOptSymbol on abstract stack when set PYTHON_OPT_DEBUG > 4 (GH-143957) 2026-01-17 15:20:35 +00:00
Ken Jin 7e28ae550f gh-142913: Export JIT functions for _testinternalcapi (#143958)
* Export JIT functions for _testinternalcapi

* Add testinternalcapi to paths to run JIT CI on
2026-01-17 13:31:38 +00:00
Ken Jin e370c8db52 gh-143123: Protect against recursive tracer calls/finalization (GH-143126)
* Stronger check for recursive traces

* Add a stop_tracing field

* Stop early when tracing exceptions
2026-01-14 12:23:14 +00:00
Hai Zhu 94dbce1397 gh-138050: Use cold flag instead of warm flag in MAKE_WARM (GH-143827) 2026-01-14 10:27:33 +00:00
Nadeshiko Manju e2f0160026 gh-143604: Hold strong reference to executor during JIT tracing (GH-143646)
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
2026-01-10 11:15:48 +00:00
Ken Jin e0fb278064 gh-143421: Allocate all JIT state in one go (GH-143626) 2026-01-09 19:00:49 +00:00
Hai Zhu aeb3403563 gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation (GH-143536)
* move JitOptContext to _PyThreadStateImpl
* make _PyUOpInstruction buffer a part of _PyThreadStateImpl

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2026-01-08 19:38:21 +00:00
Mark Shannon 20aeb3a463 GH-143026: Fix assertion error in executor management. (GH-143104) 2025-12-23 17:19:34 +00:00
Mark Shannon e4058d7cb1 GH-142513: Reimplement executor management (GH-142931)
* Invalidating an executor does not cause arbitrary code to run
* Executors are only freed at safe points
2025-12-18 16:43:44 +00:00
Mark Shannon 469f191a85 GH-135379: Top of stack caching for the JIT. (GH-135465)
Uses three registers to cache values at the top of the evaluation stack
This significantly reduces memory traffic for smaller, more common uops.
2025-12-11 10:32:52 +00:00
Mark Shannon b420f6be53 GH-139109: Support switch/case dispatch with the tracing interpreter. (GH-141703) 2025-11-18 13:31:48 +00:00
Ken Jin ed73c909f2 gh-139109: JIT _EXIT_TRACE to ENTER_EXECUTOR rather than _DEOPT (GH-141573) 2025-11-15 20:19:41 +00:00
Ken Jin 4fa80ce74c gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
alm 1753ccb432 gh-138050: [WIP] JIT - Streamline MAKE_WARM - move coldness check to executor creation (GH-138240) 2025-10-27 16:37:37 +00:00
Mark Shannon 3b83257366 GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) 2025-09-18 10:09:59 +01:00
Donghee Na d873fb42f3 gh-137838: Move _PyUOpInstruction buffer to PyInterpreterState (gh-138918) 2025-09-17 18:50:16 +01:00
Donghee Na 5edfe55acf gh-137838: Fix JIT trace buffer overrun by increasing possible exit stubs (gh-138177) 2025-09-09 09:51:08 +09:00
Ken Jin 2402f84665 gh-138431: JIT Optimizer --- Fix round-tripping references for str and tuple (GH-138458)
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2025-09-04 02:05:06 +08:00
AN Long 1ff2cbbac8 gh-137136: Suppress build warnings when build on Windows with --experimental-jit-interpreter (GH-137137) 2025-09-03 15:42:26 +01:00
Mark Shannon a8d9d94784 GH-137959: Replace shim code in jitted code with a single trampoline function. (GH-137961) 2025-08-21 10:40:53 +01:00
Mark Shannon e7b55f564d GH-136410: Faster side exits by using a cold exit stub (GH-136411) 2025-08-01 16:26:07 +01:00
Ken Jin 695ab61351 gh-132732: Automatically constant evaluate pure operations (GH-132733)
This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluations (such as ints, strings, and floats).


Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
2025-06-27 19:37:44 +08:00
Mark Shannon 9731dd2c8d GH-135379: Specialize int operations for compact ints only (GH-135668) 2025-06-19 11:10:29 +01:00
Ken Jin fba5dded6d gh-134584: Decref elimination for float ops in the JIT (GH-134588)
This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly. Thus opening up the doorway to refcount elimination in the JIT.
2025-06-17 23:25:53 +08:00
Mark Shannon ac7d5ba96e GH-133231: Changes to executor management to support proposed sys._jit module (GH-133287)
* Track the current executor, not the previous one, on the thread-state. 

* Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
2025-05-04 10:05:35 +01:00
Bénédikt Tran 883c2f682b GH-131331: Rename "not" to "invert" (GH-131334) 2025-03-20 16:59:41 -07:00
Victor Stinner b8367e7cf3 gh-130931: Add pycore_typedefs.h internal header (#131396)
Declare _PyInterpreterFrame and _PyRuntimeState types before
declaring their structure members. Break reference cycles between
header files.
2025-03-19 15:23:32 +01:00
Brandt Bucher 7afa476874 GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659) 2025-03-02 13:21:34 -08:00
Brandt Bucher 5fa7e1b7fd GH-129715: Remove _DYNAMIC_EXIT (GH-129716) 2025-02-07 11:41:17 -08:00
Mark Shannon 2effea4dab GH-128682: Spill the stack pointer in labels, as well as instructions (GH-129618) 2025-02-04 12:18:31 +00:00
Brandt Bucher 828b27680f GH-126599: Remove the PyOptimizer API (GH-129194) 2025-01-28 16:10:51 -08:00
Mark Shannon f0f7b978be GH-128939: Refactor JIT optimize structs (GH-128940) 2025-01-20 15:49:15 +00:00
Xuanteng Huang b44ff6d0df GH-126599: Remove the "counter" optimizer/executor (GH-126853) 2025-01-16 15:57:04 -08:00
Mark Shannon e62e1ca455 GH-126833: Dumps graphviz representation of executor graph. (GH-126880) 2024-12-13 11:00:00 +00:00
Ken Jin 6293d00e72 gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2024-11-09 11:35:33 +08:00
Savannah Ostrowski 65f1237098 GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-27 00:35:42 +00:00
Brandt Bucher 9621a7d017 GH-118093: Handle some polymorphism before requiring progress in tier two (GH-122843) 2024-08-12 12:39:31 -07:00