Commit Graph

238 Commits

Author SHA1 Message Date
Ken Jin 4fa80ce74c gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Dino Viehland ff7bb565d8 gh-139924: Add PyFunction_PYFUNC_EVENT_MODIFY_QUALNAME event for function watchers (#139925)
Add PyFunction_PYFUNC_EVENT_MODIFY_QUALNAME event for function watchers
2025-10-10 15:25:38 -07:00
Semyon Moroz 968f6e523a gh-130821: Add type information to error messages for invalid return type (GH-130835) 2025-08-14 11:04:41 +03:00
Xuanteng Huang b1056c2a44 gh-135607: remove null checking of weakref list in dealloc of extension modules and objects (#135614)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-06-30 11:14:31 +00:00
Peter Bierma 10a3d43188 gh-135755: Move PyFunction_GET_BUILTINS to the private API (GH-135938) 2025-06-26 11:43:08 +02:00
Eric Snow dafd14146f gh-132775: Fix _PyFunctIon_VerifyStateless() (#134900)
The problem we're fixing here is that we were using PyDict_Size() on "defaults",
which it is actually a tuple.  We're also adding some explicit type checks.

This is a follow-up to gh-133221/gh-133528.
2025-05-29 20:13:12 +00:00
Eric Snow 27128e4fa8 gh-132775: Unrevert "Add _PyCode_VerifyStateless()" (gh-133528)
This reverts commit 3c73cf5 (gh-133497), which itself reverted
the original commit d270bb5 (gh-133221).

We reverted the original change due to failing android tests.
The checks in _PyCode_CheckNoInternalState() were too strict,
so we've relaxed them.
2025-05-08 00:00:33 +00:00
Petr Viktorin 3c73cf51df gh-132775: Revert "gh-132775: Add _PyCode_VerifyStateless() (gh-133221)" (#133497) 2025-05-06 13:09:41 +03:00
Eric Snow d270bb5792 gh-132775: Add _PyCode_VerifyStateless() (gh-133221)
"Stateless" code is a function or code object which does not rely on external state or internal state.
It may rely on arguments and builtins, but not globals or a closure. I've left a comment in
pycore_code.h that provides more detail.

We also add _PyFunction_VerifyStateless(). The new functions will be used in several later changes
that facilitate "sharing" functions and code objects between interpreters.
2025-05-05 21:48:58 +00:00
Ivan Kirpichnikov a36367520e gh-132457: make staticmethod and classmethod generic (#132460)
Co-authored-by: sobolevn <mail@sobolevn.me>
2025-05-04 19:26:38 +03:00
Victor Stinner 1a082085ae gh-131238: Remove pycore_object_deferred.h from pycore_object.h (#131549)
Remove also pycore_function.h from pycore_typeobject.h.
2025-03-21 16:44:10 +00:00
Mark Shannon a45f25361d GH-131238: More refactoring of core header files (GH-131351)
Adds new pycore_stats.h header file to help break dependencies involving the pycore_code.h header.
2025-03-17 14:41:05 +00:00
Xuanteng Huang 55f17b77c3 gh-128714: Fix function object races in __annotate__, __annotations__ and __type_params__ in free-threading build (#129016) 2025-02-06 20:10:50 +05:30
mpage 255762c09f gh-127274: Defer nested methods (#128012)
Methods (functions defined in class scope) are likely to be cleaned
up by the GC anyway.

Add a new code flag, `CO_METHOD`, that is set for functions defined
in a class scope. Use that when deciding to defer functions.
2024-12-19 13:03:14 -08:00
Sam Gross f4f530804b gh-127582: Make object resurrection thread-safe for free threading. (GH-127612)
Objects may be temporarily "resurrected" in destructors when calling
finalizers or watcher callbacks. We previously undid the resurrection
by decrementing the reference count using `Py_SET_REFCNT`. This was not
thread-safe because other threads might be accessing the object
(modifying its reference count) if it was exposed by the finalizer,
watcher callback, or temporarily accessed by a racy dictionary or list
access.

This adds internal-only thread-safe functions for temporary object
resurrection during destructors.
2024-12-05 16:07:31 -05:00
mpage 09c240f20c gh-115999: Specialize LOAD_GLOBAL in free-threaded builds (#126607)
Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
Generation of new keys versions is made atomic in free-threaded builds.
Existing helpers are used to atomically modify the opcode.
Thread-safety of specialized instructions in free-threaded builds is provided by the following:

Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
2024-11-21 11:22:21 -08:00
Xuanteng Huang 35df4eb959 gh-126072: do not add None to co_consts if there is no docstring (GH-126101) 2024-10-30 09:01:09 +00:00
Sam Gross 3c4a7fa617 gh-124218: Avoid refcount contention on builtins module (GH-125847)
This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
2024-10-24 12:44:38 -04:00
Sam Gross 9b0bfba2a2 gh-124218: Use per-thread reference counting for globals and builtins (#125713)
Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
2024-10-21 12:51:29 -04:00
Zachary Ware c3164ae3cf gh-125017: Fix refleak from GH-125636 (GH-125664) 2024-10-17 17:21:32 -05:00
Jelle Zijlstra f203d1cb52 gh-125017: Fix crash on premature access to classmethod/staticmethod annotations (#125636) 2024-10-17 09:45:25 -07:00
Sam Gross 3ea488aac4 gh-124218: Use per-thread refcounts for code objects (#125216)
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
2024-10-15 15:06:41 -04:00
mpage e99f159be4 gh-115999: Stop the world when invalidating function versions (#124997)
Stop the world when invalidating function versions

The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.

Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.

We will stop the world the first time any of the following function attributes
are mutated:

- defaults
- vectorcall
- kwdefaults
- closure
- code

This should happen rarely and only happens once per function, so the performance
impact on majority of code should be minimal.

Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
2024-10-08 10:04:35 -04:00
Victor Stinner 7a178b7605 gh-111178: Fix function signatures in funcobject.c (#124908) 2024-10-02 19:29:56 +02:00
sobolevn e9681211b9 gh-122229: Add missing Py_DECREF in func_get_annotation_dict (#122230) 2024-07-24 05:47:52 -07:00
Jelle Zijlstra d28afd3fa0 gh-119180: Lazily wrap annotations on classmethod and staticmethod (#119864) 2024-05-31 14:05:51 -07:00
Jelle Zijlstra e9875ecb5d gh-119180: PEP 649: Add __annotate__ attributes (#119209) 2024-05-22 04:38:12 +02:00
mpage 37d0950022 gh-117657: Disable the function/code cache in free-threaded builds (#118301)
This is only used by the specializing interpreter and the tier 2
optimizer, both of which are disabled in free-threaded builds.
2024-05-03 16:21:04 -04:00
Sam Gross 4ad8f090cc gh-117376: Partial implementation of deferred reference counting (#117696)
This marks objects as using deferred refrence counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
2024-04-12 17:36:20 +00:00
Guido van Rossum 570a82d46a gh-117045: Add code object to function version cache (#117028)
Changes to the function version cache:

- In addition to the function object, also store the code object,
  and allow the latter to be retrieved even if the function has been evicted.
- Stop assigning new function versions after a critical attribute (e.g. `__code__`)
  has been modified; the version is permanently reset to zero in this case.
- Changes to `__annotations__` are no longer considered critical. (This fixes gh-109998.)

Changes to the Tier 2 optimization machinery:

- If we cannot map a function version to a function, but it is still mapped to a code object,
  we continue projecting the trace.
  The operand of the `_PUSH_FRAME` and `_POP_FRAME` opcodes can be either NULL,
  a function object, or a code object with the lowest bit set.

This allows us to trace through code that calls an ephemeral function,
i.e., a function that may not be alive when we are constructing the executor,
e.g. a generator expression or certain nested functions.
We will lose globals removal inside such functions,
but we can still do other peephole operations
(and even possibly [call inlining](https://github.com/python/cpython/pull/116290),
if we decide to do it), which only need the code object.
As before, if we cannot retrieve the code object from the cache, we stop projecting.
2024-03-21 12:37:41 -07:00
Guido van Rossum 7e1f38f2de gh-116916: Remove separate next_func_version counter (#116918)
Somehow we ended up with two separate counter variables tracking "the next function version".
Most likely this was a historical accident where an old branch was updated incorrectly.
This PR merges the two counters into a single one: `interp->func_state.next_version`.
2024-03-18 11:11:10 -07:00
Michael Droettboom ea3cd0498c gh-114312: Collect stats for unlikely events (GH-114493) 2024-01-25 11:10:51 +00:00
Nikita Sobolev 2ac4cf4743 gh-112640: Add kwdefaults parameter to types.FunctionType.__new__ (#112641) 2024-01-11 00:42:30 -08:00
Serhiy Storchaka 18203a6bc9 gh-111789: Use PyDict_GetItemRef() in Objects/ (GH-111827) 2023-11-14 11:25:39 +02:00
Serhiy Storchaka 1d75ef6b61 gh-111999: Add signatures and improve docstrings for builtins (GH-112000) 2023-11-13 09:13:49 +02:00
Irit Katriel 2f9cb7e095 gh-81137: deprecate assignment of code object to a function of a mismatched type (#111823) 2023-11-07 18:54:36 +00:00
Serhiy Storchaka 970e719a7a gh-108082: Use PyErr_FormatUnraisable() (GH-111580)
Replace most of calls of _PyErr_WriteUnraisableMsg() and some
calls of PyErr_WriteUnraisable(NULL) with PyErr_FormatUnraisable().

Co-authored-by: Victor Stinner <vstinner@python.org>
2023-11-02 09:16:34 +00:00
Raymond Hettinger 7f9a99e854 gh-89519: Remove classmethod descriptor chaining, deprecated since 3.11 (gh-110163) 2023-10-27 00:24:56 -05:00
Victor Stinner be5e8a0103 gh-110964: Remove private _PyArg functions (#110966)
Move the following private functions and structures to
pycore_modsupport.h internal C API:

* _PyArg_BadArgument()
* _PyArg_CheckPositional()
* _PyArg_NoKeywords()
* _PyArg_NoPositional()
* _PyArg_ParseStack()
* _PyArg_ParseStackAndKeywords()
* _PyArg_Parser structure
* _PyArg_UnpackKeywords()
* _PyArg_UnpackKeywordsWithVararg()
* _PyArg_UnpackStack()
* _Py_ANY_VARARGS()

Changes:

* Python/getargs.h now includes pycore_modsupport.h to export
  functions.
* clinic.py now adds pycore_modsupport.h when one of these functions
  is used.
* Add pycore_modsupport.h includes when a C extension uses one of
  these functions.
* Define Py_BUILD_CORE_MODULE in C extensions which now include
  directly or indirectly (via code generated by Argument Clinic)
  pycore_modsupport.h:

  * _csv
  * _curses_panel
  * _dbm
  * _gdbm
  * _multiprocessing.posixshmem
  * _sqlite.row
  * _statistics
  * grp
  * resource
  * syslog

* _testcapi: bad_get() no longer uses METH_FASTCALL calling
  convention but METH_VARARGS. Replace _PyArg_UnpackStack() with
  PyArg_ParseTuple().
* _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined
  by _testcapi sub-modules which need the internal C API
  (pycore_modsupport.h): exceptions.c, float.c, vectorcall.c,
  watchers.c.
* Remove Include/cpython/modsupport.h header file.
  Include/modsupport.h no longer includes the removed header file.
* Fix mypy clinic.py
2023-10-17 14:30:31 +02:00
Brandt Bucher 13380da91e GH-104584: Fix refleak when tracing through calls (GH-110593) 2023-10-10 08:29:48 +00:00
Mark Shannon 15d4c9fabc GH-108716: Turn off deep-freezing of code objects. (GH-108722) 2023-09-08 10:34:40 +01:00
Guido van Rossum 3107b453bc gh-108253: Fix reads of uninitialized memory in funcobject.c (#108383) 2023-08-23 22:36:19 +00:00
Guido van Rossum b8f96b5eda gh-108253: Fix bug in func version cache (#108296)
When a function object changed its version, a stale pointer might remain in the cache.
Zap these whenever `func_version` changes (even when set to 0).
2023-08-22 08:29:49 -07:00
Guido van Rossum 61c7249759 gh-106581: Project through calls (#108067)
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
2023-08-17 11:29:58 -07:00
Brandt Bucher 05a824f294 GH-84436: Skip refcounting for known immortals (GH-107605) 2023-08-04 16:24:50 -07:00
Victor Stinner 1a3faba9f1 gh-106869: Use new PyMemberDef constant names (#106871)
* Remove '#include "structmember.h"'.
* If needed, add <stddef.h> to get offsetof() function.
* Update Parser/asdl_c.py to regenerate Python/Python-ast.c.
* Replace:

  * T_SHORT => Py_T_SHORT
  * T_INT => Py_T_INT
  * T_LONG => Py_T_LONG
  * T_FLOAT => Py_T_FLOAT
  * T_DOUBLE => Py_T_DOUBLE
  * T_STRING => Py_T_STRING
  * T_OBJECT => _Py_T_OBJECT
  * T_CHAR => Py_T_CHAR
  * T_BYTE => Py_T_BYTE
  * T_UBYTE => Py_T_UBYTE
  * T_USHORT => Py_T_USHORT
  * T_UINT => Py_T_UINT
  * T_ULONG => Py_T_ULONG
  * T_STRING_INPLACE => Py_T_STRING_INPLACE
  * T_BOOL => Py_T_BOOL
  * T_OBJECT_EX => Py_T_OBJECT_EX
  * T_LONGLONG => Py_T_LONGLONG
  * T_ULONGLONG => Py_T_ULONGLONG
  * T_PYSSIZET => Py_T_PYSSIZET
  * T_NONE => _Py_T_NONE
  * READONLY => Py_READONLY
  * PY_AUDIT_READ => Py_AUDIT_READ
  * READ_RESTRICTED => Py_AUDIT_READ
  * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED
  * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
2023-07-25 15:28:30 +02:00
Serhiy Storchaka be1b968dc1 gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Serhiy Storchaka 93d292c2b3 gh-106303: Use _PyObject_LookupAttr() instead of PyObject_GetAttr() (GH-106304)
It simplifies and speed up the code.
2023-07-09 15:27:03 +03:00
Serhiy Storchaka 08c08d21b0 gh-106033: Get rid of PyDict_GetItem in _PyFunction_FromConstructor (GH-106044) 2023-06-29 12:31:08 +03:00
Jelle Zijlstra 3fadd7d585 gh-104600: Make function.__type_params__ writable (#104601) 2023-05-18 16:45:37 -07:00