* SEND specialization. Adds 2 new specialized instructions:
* SEND_VIRTUAL: for sends to virtual iterators, e.g. lists and tuples
* SEND_ASYNC_GEN: for sends to async generators
Tweak FOR_ITER_VIRTUAL so that SEND_VIRTUAL and FOR_ITER_VIRTUAL use equivalent guards (sketched below)
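A minimal sketch of the shared-guard idea in the bytecodes.c DSL; the stack effect and the `_tp_iteritem` test are assumptions based on the virtual-iterator work described below, not the actual diff:

```c
inst(SEND_VIRTUAL, (unused/1, receiver, v -- receiver, retval)) {
    PyObject *receiver_o = PyStackRef_AsPyObjectBorrow(receiver);
    // The same guard FOR_ITER_VIRTUAL uses: bail out unless the
    // receiver's type supports virtual iteration (assumed here to be
    // signalled by a non-NULL _tp_iteritem slot).
    DEOPT_IF(Py_TYPE(receiver_o)->_tp_iteritem == NULL);
    // ... fetch the next item and push it as retval ...
}
```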
The function__entry and function__return probes stopped working in Python 3.11
when the interpreter was restructured around the new bytecode system. This change
restores these probes by adding DTRACE_FUNCTION_ENTRY() at the start_frame label
in bytecodes.c and DTRACE_FUNCTION_RETURN() in the RETURN_VALUE and YIELD_VALUE
instructions. The helper functions are defined in ceval.c and extract the
filename, function name, and line number from the frame before firing the probe.
This builds on the approach from https://github.com/python/cpython/pull/125019
but avoids modifying the JIT template since the JIT does not currently support
DTrace. The macros are conditionally compiled with WITH_DTRACE and are no-ops
otherwise. The tests have been updated to use modern opcode names (CALL, CALL_KW,
CALL_FUNCTION_EX) and a new bpftrace backend was added for Linux CI alongside
the existing SystemTap tests. Line probe tests were removed since that probe
was never restored after 3.11.
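For reference, a sketch of what the helper/macro pair can look like, modeled on the pre-3.11 ceval.c DTrace code; the frame accessors used here are an assumption about the patch, not a quote from it:

```c
#ifdef WITH_DTRACE
static void
dtrace_function_entry(_PyInterpreterFrame *frame)
{
    // Pull location info out of the frame before firing the probe.
    PyCodeObject *code = _PyFrame_GetCode(frame);
    const char *filename = PyUnicode_AsUTF8(code->co_filename);
    const char *funcname = PyUnicode_AsUTF8(code->co_name);
    int lineno = PyUnstable_InterpreterFrame_GetLine(frame);
    PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);
}
// The common path only pays for a probe-enabled check, so the
// interpreter stays cheap when no tracer is attached.
#define DTRACE_FUNCTION_ENTRY() \
    do { \
        if (PyDTrace_FUNCTION_ENTRY_ENABLED()) { \
            dtrace_function_entry(frame); \
        } \
    } while (0)
#else
#define DTRACE_FUNCTION_ENTRY() ((void)0)
#endif
```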
* Replaces ad-hoc logic for ending traces with a simple inequality: `fitness < exit_quality`
* Fitness starts high and is reduced for branches, backward edges, calls and trace length
* Exit quality reflects how good a spot an instruction is to end a trace: closing a loop is very high, specializable instructions are very low, and the rest fall in between (see the sketch below)
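A minimal sketch of the resulting decision; all names and constants here are hypothetical, since the description above does not give the actual values:

```c
// Hypothetical trace-projection loop: fitness decays as the trace
// accumulates risk, and the trace ends at the first instruction whose
// exit quality beats the remaining fitness.
int fitness = MAX_FITNESS;
while (still_projecting) {
    fitness -= cost(instr);  // branches, backward edges, calls, length
    if (fitness < exit_quality(instr)) {
        // exit_quality(instr) is very high if instr closes a loop,
        // very low if instr is specializable, in between otherwise.
        break;  // end the trace at this instruction
    }
    instr = next(instr);
}
```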
* Add FOR_ITER_VIRTUAL to specialize FOR_ITER for virtual iterators
* Add GET_ITER_SELF to specialize GET_ITER for iterators (including generators)
* Add GET_ITER_VIRTUAL to specialize GET_ITER for iterables as virtual iterators
* Add a new (internal) _tp_iteritem function slot to PyTypeObject (a possible shape is sketched below)
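The slot's signature is not given above; one plausible shape, purely as an assumption, is an indexed fetch that lets the virtual-iterator instructions keep the position on the stack instead of allocating an iterator object:

```c
// Hypothetical signature; NULL in the slot would mean the type does
// not support virtual iteration, which is what the guards test.
typedef PyObject *(*iteritemfunc)(PyObject *seq, Py_ssize_t index);

// Sketch of what a list implementation could look like:
static PyObject *
list_iteritem(PyObject *seq, Py_ssize_t index)
{
    if (index >= PyList_GET_SIZE(seq)) {
        return NULL;  // iteration exhausted
    }
    return Py_NewRef(PyList_GET_ITEM(seq, index));
}
```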
* Put a limited RESUME at the start of genexprs for free-threading, and fix up exception handling in genexprs
Also fix error handling in the _BINARY_OP_ADD_FLOAT,
_BINARY_OP_SUBTRACT_FLOAT, and _BINARY_OP_MULTIPLY_FLOAT opcodes:
PyStackRef_FromPyObjectSteal() must not be called with a NULL
pointer.
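The shape of the fix, as a hedged sketch in the interpreter DSL; the surrounding macro names follow recent bytecodes.c conventions, but this is not the literal diff:

```c
op(_BINARY_OP_ADD_FLOAT, (left, right -- res)) {
    double dres =
        ((PyFloatObject *)PyStackRef_AsPyObjectBorrow(left))->ob_fval +
        ((PyFloatObject *)PyStackRef_AsPyObjectBorrow(right))->ob_fval;
    PyObject *res_o = PyFloat_FromDouble(dres);
    DECREF_INPUTS();
    // The allocation can fail; report the error here rather than
    // handing a NULL pointer to PyStackRef_FromPyObjectSteal().
    ERROR_IF(res_o == NULL, error);
    res = PyStackRef_FromPyObjectSteal(res_o);
}
```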
* Convert DEOPT_IFs to EXIT_IFs for guards. Keep DEOPT_IF for intentional drops to the interpreter.
* Modify BINARY_OP_SUBSCR_LIST_INT and STORE_SUBSCR_LIST_INT to handle negative indices, keeping EXIT_IFs and DEOPT_IFs in different uops (sketched below)
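An illustrative sketch of the resulting split; the uop names and the index helper are placeholders, not the real code:

```c
// Guard uop: a failing type check is a side exit, so EXIT_IF.
op(_GUARD_TOS_LIST, (tos -- tos)) {
    EXIT_IF(!PyList_CheckExact(PyStackRef_AsPyObjectBorrow(tos)));
}

// Action uop: after normalizing negative indices, an out-of-range
// index is an intentional drop back to the interpreter, so DEOPT_IF.
// Keeping the two kinds of checks in different uops is the point.
op(_BINARY_OP_SUBSCR_LIST_INT, (list_st, sub_st -- res)) {
    PyObject *list = PyStackRef_AsPyObjectBorrow(list_st);
    Py_ssize_t index = stackref_as_ssize(sub_st);  // hypothetical helper
    if (index < 0) {
        index += PyList_GET_SIZE(list);
    }
    DEOPT_IF(index < 0 || index >= PyList_GET_SIZE(list));
    res = PyStackRef_FromPyObjectNew(PyList_GET_ITEM(list, index));
    DECREF_INPUTS();
}
```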
Remove PyThread_type_lock (now uses PyMutex internally).
Add new benchmark options (see the loop sketch after this list):
- work_inside/work_outside: control work inside and outside the critical section to vary contention levels
- num_locks: use multiple independent locks with threads assigned round-robin
- total_iters: fixed iteration count per thread instead of time-based, useful for measuring fairness
- num_acquisitions: lock acquisitions per loop iteration
- random_locks: acquire a random lock on each iteration
Also return elapsed time from benchmark_locks() and switch lockbench.py to use argparse.
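A minimal sketch of how the new knobs could combine in the benchmark's per-thread loop; spin() and the surrounding variables are assumptions, not the actual C code behind benchmark_locks():

```c
// Hypothetical inner loop for one benchmark thread.
for (Py_ssize_t i = 0; i < total_iters; i++) {
    // Round-robin assignment by default, or a random lock per
    // iteration when random_locks is set.
    PyMutex *m = random_locks
        ? &locks[rand() % num_locks]
        : &locks[thread_id % num_locks];
    for (int a = 0; a < num_acquisitions; a++) {
        PyMutex_Lock(m);
        spin(work_inside);   // work while holding the lock
        PyMutex_Unlock(m);
    }
    spin(work_outside);      // work outside the critical section;
                             // more of it lowers contention
}
```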
When the interpreter is in a stop-the-world pause, critical sections
don't need to acquire locks since no other threads can be running.
This avoids a potential deadlock where lock fairness hands off ownership
to a thread that has already suspended for stop-the-world.
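A minimal sketch of the idea, assuming a world_stopped flag on the interpreter state; the actual field and function names may differ:

```c
static void
critical_section_begin(PyCriticalSection *c, PyMutex *m)
{
    PyInterpreterState *interp = _PyInterpreterState_GET();
    if (interp->stoptheworld.world_stopped) {  // assumed field name
        // All other threads are suspended, so the lock cannot be
        // contended; skipping it also avoids the fairness handoff
        // deadlock described above.
        return;
    }
    // ... normal path: acquire the mutex, registering the critical
    // section so it can be released around blocking operations ...
}
```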