Commit Graph

128 Commits

Author SHA1 Message Date
Neko Asakura 52a7f1b7f8 gh-148510: restore func_version check in _LOAD_ATTR_PROPERTY_FRAME (GH-148528) 2026-04-14 22:44:39 +08:00
Ken Jin 6f7bb297db gh-146261: JIT: protect against function version changes (#146300)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2026-04-13 02:23:47 +08:00
Sam Gross 3ab94d6842 gh-148393: Use atomic ops on _ma_watcher_tag in free threading build (gh-148397)
Fixes data races between dict mutation and watch/unwatch on the same dict.
2026-04-12 10:40:41 -04:00
Kumar Aditya ba1e1c192b gh-131798: do not watch immutable types in JIT (#148383) 2026-04-11 15:43:53 +00:00
Neko Asakura 9831dea3bf gh-148211: decompose _POP_TWO/_POP_CALL(_ONE/_TWO) in JIT (GH-148377) 2026-04-11 20:46:56 +08:00
Neko Asakura 72006a71b2 gh-148211: decompose [_POP_TWO/_INSERT_2]_LOAD_CONST_INLINE_BORROW in JIT (GH-148357) 2026-04-11 18:27:51 +08:00
Ken Jin 266247c9a6 gh-148171: Convert CALL_BUILTIN_FAST to leave inputs on the stack for refcount elimination in JIT (GH-148172) 2026-04-10 23:11:18 +08:00
Neko Asakura aea0b91d65 gh-148211: decompose [_POP_CALL_X/_SHUFFLE_2]_LOAD_CONST_INLINE_BORROW in JIT (GH-148313) 2026-04-10 21:57:01 +08:00
Neko Asakura 0f49232664 gh-148211: decompose _INSERT_1_LOAD_CONST_INLINE(_BORROW) in JIT (GH-148283) 2026-04-10 00:45:39 +08:00
Kumar Aditya 458aca9237 gh-131798: fold super method lookups in JIT (#148231) 2026-04-09 13:25:01 +05:30
Neko Asakura b5ccf00bd6 gh-148211: refactor bool to explicit uops in JIT (GH-148258) 2026-04-09 13:20:31 +08:00
Neko Asakura d2fa4b2b13 gh-148211: decompose _POP_TOP_LOAD_CONST_INLINE(_BORROW) in JIT (GH-148230) 2026-04-08 23:20:31 +08:00
Neko Asakura 756358524e gh-148235: remove duplicate uops _LOAD_CONST_UNDER_INLINE(_BORROW) in JIT (GH-148236) 2026-04-08 16:22:59 +08:00
Donghee Na 853dafe23a gh-148083: Prevent constant folding when lhs is container types (gh-148090) 2026-04-05 00:40:12 +09:00
Donghee Na 289f19adb0 gh-148083: Constant-fold _CONTAINS_OP_SET for frozenset (gh-148084) 2026-04-04 12:32:12 +00:00
Kumar Aditya 8e9d21c64b gh-146558: JIT optimize dict access for objects with known hash (#146559) 2026-03-30 14:23:29 +00:00
Kumar Aditya fc1f6b176e gh-GH-131798: optimize jit attribute loads on immutable types (#146449) 2026-03-26 10:12:10 +00:00
Ken Jin acfb4528de gh-146099: Optimize _GUARD_CODE_VERSION+IP via function version symbols (GH-146101) 2026-03-20 12:26:41 +00:00
Donghee Na 6908372fb8 gh-145214: Narrow _GUARD_TOS_ANY_{SET,DICT} by using probable type (gh-145215) 2026-03-03 09:58:38 +09:00
Mark Shannon 3f37b94c73 GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948)
* Handle dependencies in the optimizer, not the tracer
* Strengthen some checks to avoid relying on optimizer for correctness
2026-02-19 11:52:57 +00:00
Chris Eibl caac966b00 fix warnings in jit builds (GH-144817) 2026-02-14 17:39:10 +00:00
Hai Zhu 988286e510 gh-144623: Fix missing output uops in optimizer debug output (GH-144617) 2026-02-09 13:23:02 +00:00
Mark Shannon b53fc7caa6 GH-144179: Use recorded values to make optimizer more robust (GH-144437)
* Add three new symbol kinds
* Do not smuggle code object in _PUSH_FRAME operand
* Fix small bug in predicate analysis
2026-02-05 08:58:41 +00:00
Hai Zhu ebbb2ca81f gh-144145: Revert PR#144122 for performance and potential bugs. (GH-144391)
Revert "gh-144145: Track nullness of properties in the Tier 2 JIT optimizer (GH-144122)"

This reverts commit 1dc12b2883.
2026-02-02 14:09:54 +00:00
Hai Zhu 1dc12b2883 gh-144145: Track nullness of properties in the Tier 2 JIT optimizer (GH-144122) 2026-01-30 15:25:19 +00:00
Hai Zhu 26996b59ab gh-143946: Add more debug info in optimize_uops (GH-144262) 2026-01-29 16:58:01 +00:00
AN Long 6e55337f8a gh-143995: Eliminate redundant refcounting in the JIT from LOAD_ATTR_MODULE (GH-143996) 2026-01-25 18:24:44 +00:00
reiden 6d972e0104 gh-130415: Narrow types to constants in branches involving specialized comparisons with a constant (GH-144150) 2026-01-24 10:02:08 +00:00
Ken Jin ca99bfdefb gh-144016: Fix bad stack assert in the JIT optimizer (GH-144019) 2026-01-24 09:36:40 +00:00
Mark Shannon d77aaa7311 GH-139109: Partial reworking of JIT data structures (GH-144105)
* Halve size of buffers by reusing combined trace + optimizer buffers for TOS caching
* Add simple buffer struct for more maintainable handling of buffers
* Decouple JIT structs from thread state struct
* Ensure terminator is added to trace, when optimizer gives up
2026-01-22 10:55:49 +00:00
reiden 0b08438ea6 gh-130415: Narrowing to constants in branches involving is comparisons with a constant (GH-143895) 2026-01-22 09:37:45 +00:00
Mark Shannon 9eab67d507 GH-138245: Perform boolean guards by testing a single bit, rather than a full pointer comparison. (GH-143810) 2026-01-21 15:58:27 +00:00
Hai Zhu b4b73245d8 gh-143421: Use new buffer to save optimized uops (GH-143682) 2026-01-17 15:52:16 +00:00
Hai Zhu 61ec66acd5 gh-143946: Show JitOptSymbol on abstract stack when set PYTHON_OPT_DEBUG > 4 (GH-143957) 2026-01-17 15:20:35 +00:00
Ken Jin e0fb278064 gh-143421: Allocate all JIT state in one go (GH-143626) 2026-01-09 19:00:49 +00:00
Ken Jin b852236b26 gh-143421: Lazily allocate tracer code and opt buffers (GH-143597) 2026-01-09 16:56:19 +00:00
Hai Zhu aeb3403563 gh-143421: Move JitOptContext from stack allocation to per-thread heap allocation (GH-143536)
* move JitOptContext to _PyThreadStateImpl
* make _PyUOpInstruction buffer a part of _PyThreadStateImpl

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2026-01-08 19:38:21 +00:00
Nadeshiko Manju 0a5c04a5ce gh-134584: Eliminate redundant refcounting from TO_BOOL_STR (GH-143417)
Signed-off-by: Manjusaka <me@manjusaka.me>
2026-01-06 21:11:53 +00:00
Tomas R. 25c294b6ea gh-134584: Eliminate redundant refcounting from _CALL_TYPE_1 (GH-135818) 2025-12-23 17:01:10 +00:00
Ken Jin 0ac4e6c6cd gh-134584: Remove custom float decref ops (GH-142576) 2025-12-15 19:38:58 +00:00
Ken Jin 97f0a1f203 gh-142276: Watch attribute loads when promoting JIT constants (GH-142303)
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Savannah Ostrowski <savannah@python.org>
2025-12-08 18:03:15 +00:00
Ken Jin b3bf212898 gh-141976: Check stack bounds in JIT optimizer (GH-142201) 2025-12-04 20:28:08 +00:00
Mark Shannon 62423c9c36 GH-141794: Limit size of generated machine code. (GH-142228)
* Factor out bodies of the largest uops, to reduce jit code size.
* Factor out common assert, also reducing jit code size.
* Limit size of jitted code for a single executor to 1MB.
2025-12-03 17:43:35 +00:00
Ken Jin 4fa80ce74c gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Mark Shannon 3b83257366 GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) 2025-09-18 10:09:59 +01:00
Ken Jin 46f823bb81 gh-132732: Clear errors in JIT optimizer on error (GH-136048) 2025-09-15 17:24:37 +01:00
AN Long 1ff2cbbac8 gh-137136: Suppress build warnings when build on Windows with --experimental-jit-interpreter (GH-137137) 2025-09-03 15:42:26 +01:00
Ken Jin 7fda8b66de gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764) 2025-08-20 22:53:54 +08:00
Savannah Bailey f7c380ef67 GH-132732: Use pure op machinery to optimize COMPARE_OP_INT/FLOAT/STR (#137062)
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
2025-07-25 19:02:04 -07:00
Ken Jin 695ab61351 gh-132732: Automatically constant evaluate pure operations (GH-132733)
This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluations (such as ints, strings, and floats).


Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
2025-06-27 19:37:44 +08:00