* SEND specialization. Adds 2 new specialized instructions:
* SEND_VIRTUAL: for sends to virtual iterators e.g lists and tuples
* SEND_ASYNC_GEN: for sends to async generators
Tweak FOR_ITER_VIRTUAL so that SEND_VIRTUAL and FOR_ITER_VIRTUAL use equivalent guards
The function__entry and function__return probes stopped working in Python 3.11
when the interpreter was restructured around the new bytecode system. This change
restores these probes by adding DTRACE_FUNCTION_ENTRY() at the start_frame label
in bytecodes.c and DTRACE_FUNCTION_RETURN() in the RETURN_VALUE and YIELD_VALUE
instructions. The helper functions are defined in ceval.c and extract the
filename, function name, and line number from the frame before firing the probe.
This builds on the approach from https://github.com/python/cpython/pull/125019
but avoids modifying the JIT template since the JIT does not currently support
DTrace. The macros are conditionally compiled with WITH_DTRACE and are no-ops
otherwise. The tests have been updated to use modern opcode names (CALL, CALL_KW,
CALL_FUNCTION_EX) and a new bpftrace backend was added for Linux CI alongside
the existing SystemTap tests. Line probe tests were removed since that probe
was never restored after 3.11.
Add a keyword-only `max_threads` argument to `dump_traceback()` and
`dump_traceback_later()`, defaulting to 100 to preserve existing
behavior. Allows server processes with many worker threads to dump
beyond the historical 100-thread cap (previously a hardcoded
`MAX_NTHREADS = 100` in `Python/traceback.c`).
The cap matters in practice: tstates are prepended to the
PyInterpreterState linked list, so the dump walks newest-first. With
more than 100 threads alive, the main thread (oldest, at the tail) is
silently elided from watchdog dumps -- exactly the thread that's
usually wanted.
The hardcoded value is moved to a new internal macro
`_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree
fatal-signal callers all reference one source of truth.
* Records the same objects for each member of family before execution
* Records derived values when recording the trace
* This makes sure that specialization, or deoptimization, does not cause invalid values to be recorded
The add_const() function in flowgraph.c uses a linear search over the
consts list to find the index of a constant. After gh-126835 moved
constant folding from the AST optimizer to the CFG optimizer, this
function is now called N times for N inner tuple elements during
fold_tuple_of_constants(), resulting in O(N²) total time.
Fix by maintaining an auxiliary _Py_hashtable_t that maps object
pointers to their indices in the consts list, providing O(1) lookup.
For a file with 100,000 constant 2-tuples:
- Before: 10.38s (add_const occupies 83.76% of CPU time)
- After: 1.48s
Improve `hash()` builtin docstring with caveats.
Mention its return type and that the value can be expected to change between
processes (hash randomization).
Why? The `hash` builtin gets reached for and used by a lot of people whether it
is the right tool or not. IDEs surface docstrings and people use pydoc and
`help(hash)`.
* Replaces ad-hoc logic for ending traces with a simple inequality: `fitness < exit_quality`
* Fitness starts high and is reduced for branches, backward edges, calls and trace length
* Exit quality reflect how good a spot that instruction is to end a trace. Closing a loop is very, specializable instructions are very low and the others in between.
_PyRawMutex_UnlockSlow CAS-removes the waiter from the list and then
calls _PySemaphore_Wakeup, with no handshake. If _PySemaphore_Wait
returns Py_PARK_INTR, the waiter can destroy its stack-allocated
semaphore before the unlocker's Wakeup runs, causing a fatal error from
ReleaseSemaphore / sem_post.
Loop in _PyRawMutex_LockSlow until _PySemaphore_Wait returns Py_PARK_OK,
which is only signalled when a matching Wakeup has been observed.
Also include GetLastError() and the handle in the Windows fatal messages
in _PySemaphore_Init, _PySemaphore_Wait, and _PySemaphore_Wakeup to make
similar races easier to diagnose in the future.
* Make the `PY_UNWIND` monitoring event available as a code-local
event to allow trapping on function exit events when an exception
bubbles up. This complements the PY_RETURN event by allowing to
catch any function exit event.
* Allow `PY_UNWIND` to be `DISABLE`d; disabling it disables the event for the whole code object.
* Do the above for `PY_THROW`, `RAISE`, `EXCEPTION_HANDLED`, and `RERAISE` events.
Forbid marshalling recursive code, slice and frozendict objects which
cannot be correctly unmarshalled.
Reject invalid marshal data produced by marshalling recursive frozendict
objects which was previously incorrectly unmarshalled.
Add multiple tests for recursive data structures.
Fix AArch64 multi-instruction constants and relocations
* Elimates rendundant orr xN, xN, 0xffff after 16 or 32 bit loads
* Merges adrp (21rx) and ldr (12) relocations into single 33rx relocation, when safe to do so.
* Add FOR_ITER_VIRTUAL to specialize FOR_ITER for virtual iterators
* Add GET_ITER_SELF to specialize GET_ITER for iterators (including generators)
* Add GET_ITER_VIRTUAL to specialize GET_ITER for iterables as virtual iterators
* Add new (internal) _tp_iteritem function slot to PyTypeObject
* Put limited RESUME at start of genexpr for free-threading. Fix up exception handling in genexpr