Commit Graph

187 Commits

Author SHA1 Message Date
Ken Jin 6cb245d260 gh-143183: Link trace to side exits, rather than stop (GH-143268) 2025-12-29 15:10:42 +00:00
Petr Viktorin 7d81eab923 gh-142225: Add PyABIInfo_VAR to to _testcapimodule & _testinternalcapi (GH-142833) 2025-12-17 16:33:09 +01:00
Neil Schemenauer e38967ed60 gh-142531: Fix free-threaded GC performance regression (gh-142562)
If there are many untracked tuples, the GC will run too often, resulting
in poor performance.  The fix is to include untracked tuples in the
"long lived" object count. The number of frozen objects is also now
included since the free-threaded GC must scan those too.
2025-12-11 12:30:56 -08:00
Ken Jin 4fa80ce74c gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Victor Stinner b99db92dde gh-139653: Add PyUnstable_ThreadState_SetStackProtection() (#139668)
Add PyUnstable_ThreadState_SetStackProtection() and
PyUnstable_ThreadState_ResetStackProtection() functions
to set the stack base address and stack size of a Python
thread state.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-11-13 17:30:50 +01:00
Petr Viktorin 589a03a8ce gh-140550: Initial implementation of PEP 793 – PyModExport (GH-140556)
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2025-11-05 12:31:42 +01:00
Dino Viehland ff0cf0af10 gh-139525: Don't specialize functions which have a modified vectorcall (#139524)
Don't specialize functions which have a modified vectorcall
2025-10-03 09:58:32 -07:00
Bénédikt Tran 3779f2b95e gh-139393: fix _CALL_LEN JIT tests for tuples (#139394)
Fix a regression introduced in 7ce25edb8f
where `_PY_NSMALLPOSINTS` was changed from 257 to 1025.
2025-09-28 19:30:44 +02:00
Mark Shannon 16eae6d90d GH-137573: Add test to check that the margin used for overflow protection is larger than the stack space used by the interpreter (GH-137724) 2025-09-23 15:47:27 +02:00
Peter Bierma 2191497933 gh-136003: Execute pre-finalization callbacks in a loop (GH-136004) 2025-09-18 08:29:12 -04:00
Hood Chatham 7ae4749d06 gh-124621: Emscripten: Add support for async input devices (GH-136822)
This is useful for implementing proper `input()`. It requires the
JavaScript engine to support the wasm JSPI spec which is now stage 4.
It is supported on Chrome since version 137 and on Firefox and node
behind a flag.

We override the `__wasi_fd_read()` syscall with our own variant that
checks for a readAsync operation. If it has it, we use our own async
variant of `fd_read()`, otherwise we use the original `fd_read()`.
We also add a variant of `FS.createDevice()` called
`FS.createAsyncInputDevice()`.

Finally, if JSPI is available, we wrap the `main()` symbol with
`WebAssembly.promising()` so that we can stack switch from `fd_read()`.
If JSPI is not available, attempting to read from an AsyncInputDevice
will raise an `OSError`.
2025-07-19 17:14:29 +02:00
William S Fulton 7de8ea7be6 gh-136300: Modify C tests to conform to PEP-737 (GH-136301)
- Use %T format specifier instead of %s and Py_TYPE(x)->tp_name.
- Remove legacy %.200s format specifier for truncating type names.

Co-authored-by: Victor Stinner <vstinner@python.org>
2025-07-11 15:18:35 +02:00
Victor Stinner 28940e8e48 gh-130396: Move PYOS_LOG2_STACK_MARGIN to internal headers (#135928)
Move PYOS_LOG2_STACK_MARGIN, PYOS_STACK_MARGIN,
PYOS_STACK_MARGIN_BYTES and PYOS_STACK_MARGIN_SHIFT macros to
pycore_pythonrun.h internal header. Add underscore (_) prefix to the
names to make them private. Rename _PYOS to _PyOS.
2025-07-01 15:18:17 +02:00
Peter Bierma 10a3d43188 gh-135755: Move PyFunction_GET_BUILTINS to the private API (GH-135938) 2025-06-26 11:43:08 +02:00
Eric Snow 62143736b6 gh-134939: Add the concurrent.interpreters Module (gh-133958)
PEP-734 has been accepted (for 3.14).

(FTR, I'm opposed to putting this under the concurrent package, but
doing so is the SC condition under which the module can land in 3.14.)
2025-06-11 17:35:48 -06:00
sobolevn cebae977a6 gh-133891: Add missing error check to SET_COUNT macro in _testinternalcapi.c (#133892) 2025-06-01 00:33:02 +03:00
Eric Snow 88f8102a8f gh-132775: Support Fallbacks in _PyObject_GetXIData() (gh-133482)
It now supports a "full" fallback to _PyFunction_GetXIData() and then `_PyPickle_GetXIData()`.  There's also room for other fallback modes if that later makes sense.
2025-05-21 07:23:48 -06:00
b-pass f2de1e6861 gh-134144: Fix use-after-free in zapthreads() (#134145) 2025-05-18 20:32:29 +05:30
Eric Snow 8cf4947b0f gh-132775: Add _PyFunction_GetXIData() (gh-133481) 2025-05-12 22:10:56 +00:00
Eric Snow c81fa2b9cd gh-132775: Add _PyCode_GetScriptXIData() (gh-133480)
This converts functions, code, str, bytes, bytearray, and memoryview objects to PyCodeObject,
and ensure that the object looks like a script.  That means no args, no return, and no closure.
_PyCode_GetPureScriptXIData() takes it a step further and ensures there are no globals.

We also add _PyObject_SupportedAsScript() to the internal C-API.
2025-05-08 15:07:46 +00:00
Eric Snow 27128e4fa8 gh-132775: Unrevert "Add _PyCode_VerifyStateless()" (gh-133528)
This reverts commit 3c73cf5 (gh-133497), which itself reverted
the original commit d270bb5 (gh-133221).

We reverted the original change due to failing android tests.
The checks in _PyCode_CheckNoInternalState() were too strict,
so we've relaxed them.
2025-05-08 00:00:33 +00:00
Petr Viktorin 3c73cf51df gh-132775: Revert "gh-132775: Add _PyCode_VerifyStateless() (gh-133221)" (#133497) 2025-05-06 13:09:41 +03:00
Eric Snow ea598730ef gh-132775: Add _PyCode_GetXIData() (gh-133475) 2025-05-05 23:46:03 +00:00
Brandt Bucher b1aa515bd6 GH-133231: Add JIT utilities in sys._jit (GH-133233) 2025-05-05 15:25:22 -07:00
Eric Snow d270bb5792 gh-132775: Add _PyCode_VerifyStateless() (gh-133221)
"Stateless" code is a function or code object which does not rely on external state or internal state.
It may rely on arguments and builtins, but not globals or a closure. I've left a comment in
pycore_code.h that provides more detail.

We also add _PyFunction_VerifyStateless(). The new functions will be used in several later changes
that facilitate "sharing" functions and code objects between interpreters.
2025-05-05 21:48:58 +00:00
Eric Snow 24ebb9ccfd gh-132775: Unrevert "Add _PyCode_GetVarCounts()" (gh-133265)
This reverts commit 811edcf (gh-133232), which itself reverted the original commit 811edcf (gh-133128).

We reverted the original change due to failing s390 builds (a big-endian architecture).
It ended up that I had not accommodated op caches.
2025-05-05 13:24:29 -06:00
Eric Snow 811edcf9cd Revert "gh-132775: Add _PyCode_GetVarCounts() (gh-133128)" (gh-133232)
The change broke the s390 builds, so I'm reverting it while I investigate.

This reverts commit 94b4fcd806.
2025-05-01 02:35:20 +00:00
Eric Snow cb35c11d82 gh-132775: Add _PyPickle_GetXIData() (gh-133107)
There's some extra complexity due to making sure we we get things right when handling functions and classes defined in the __main__ module.  This is also reflected in the tests, including the addition of extra functions in test.support.import_helper.
2025-04-30 17:34:05 -06:00
Eric Snow 94b4fcd806 gh-132775: Add _PyCode_GetVarCounts() (gh-133128)
This helper is useful in a variety of ways, including in demonstrating how the different counts relate to one another.

It will be used in a later change to help identify if a function is "stateless", meaning it doesn't have any free vars or globals.

Note that a majority of this change is tests.
2025-04-30 18:19:20 +00:00
Eric Snow 219d8d24b5 gh-87859: Track Code Object Local Kinds For Arguments (gh-132980)
Doing this was always the intention. I was finally motivated to find the time to do it.

See #87859 (comment).
2025-04-29 02:21:47 +00:00
Eric Snow 96a7fb93a8 gh-132775: Add _PyCode_ReturnsOnlyNone() (gh-132981)
The function indicates whether or not the function has a return statement.

This is used by a later change related treating some functions like scripts.
2025-04-28 20:12:52 -06:00
Eric Snow bdd23c0bb9 gh-132775: Add _PyMarshal_GetXIData() (gh-133108)
Note that the bulk of this change is tests.
2025-04-28 17:23:46 -06:00
Neil Schemenauer 22f0730d40 gh-122320: Limit dict key versions used by test_opcache. (gh-132961)
The `test_load_global_module()` test consumes a lot of dict key versions.
Skip the test if we have consumed half of the available versions that can be
used for the "load global" cache.
2025-04-28 12:54:55 -07:00
Eric Snow 6f04325992 gh-132775: Cleanup Related to crossinterp.c Before Further Changes (gh-132974)
This change consists of adding tests and moving code around, with some renaming thrown in.
2025-04-28 11:55:15 -06:00
Eric Snow cd9536a087 gh-132781: Cleanup Code Related to NotShareableError (gh-132782)
The following are added to the internal C-API:

* _PyErr_FormatV()
* _PyErr_SetModuleNotFoundError()
* _PyXIData_GetNotShareableErrorType()
* _PyXIData_FormatNotShareableError()

We also drop _PyXIData_lookup_context_t and _PyXIData_GetLookupContext().
2025-04-25 14:43:38 -06:00
Sergey B Kirpichev 79f7c67bf6 gh-128813: hide mixed-mode functions for complex arithmetic from C-API (#131703) 2025-04-22 14:18:18 +02:00
Bénédikt Tran edbf7fb129 gh-111178: remove redundant casts for functions with correct signatures (#131673) 2025-04-01 17:18:11 +02:00
Mark Shannon 7ebd71ee14 GH-131498: Remove conditional stack effects (GH-131499)
* Adds some missing #includes
2025-03-20 15:39:38 +00:00
Victor Stinner b69da006a4 gh-131238: Remove includes from pycore_interp.h (#131495)
Remove also now unused includes in C files.
2025-03-20 11:35:23 +00:00
Victor Stinner 20c5f969dd gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
Sam Gross 45bc120d45 gh-130519: Fix crash in QSBR when destructor reenters QSBR (gh-130553)
The `free_work_item()` function in QSBR may call arbitrary code via
Python object destructors, which may reenter the QSBR code. Reorder
the processing of work items to be robust to reentrancy.

Also fix the TODO for the out of memory situation.
2025-02-26 14:55:15 -05:00
Mark Shannon 014223649c GH-130396: Use computed stack limits on linux (GH-130398)
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-25 09:24:48 +00:00
Petr Viktorin ef29104f7d GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH130413)
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now

Unfortunatlely, the change broke some buildbots.

This reverts commit 2498c22fa0.
2025-02-24 11:16:08 +01:00
Mark Shannon 2498c22fa0 GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)
* Implement C recursion protection with limit pointers

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-19 11:44:57 +00:00
Brandt Bucher 002c4e2982 GH-129386: Use symbolic constants for specialization tests (GH-129415) 2025-01-29 10:49:58 -08:00
Brandt Bucher 828b27680f GH-126599: Remove the PyOptimizer API (GH-129194) 2025-01-28 16:10:51 -08:00
Victor Stinner 1d485db953 gh-128863: Deprecate _PyLong_Sign() function (#129176)
Replace _PyLong_Sign() with PyLong_GetSign().
2025-01-23 03:11:53 +01:00
Mark Shannon f0f7b978be GH-128939: Refactor JIT optimize structs (GH-128940) 2025-01-20 15:49:15 +00:00
Victor Stinner 8ceb6cb117 gh-129033: Remove _PyInterpreterState_SetConfig() function (#129048)
Remove _PyInterpreterState_GetConfigCopy() and
_PyInterpreterState_SetConfig() private functions. PEP 741 "Python
Configuration C API" added a better public C API: PyConfig_Get() and
PyConfig_Set().
2025-01-20 16:31:33 +01:00
Xuanteng Huang b44ff6d0df GH-126599: Remove the "counter" optimizer/executor (GH-126853) 2025-01-16 15:57:04 -08:00