Commit Graph

72 Commits

Author SHA1 Message Date
Ken Jin 6293d00e72 gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2024-11-09 11:35:33 +08:00
mpage 2e95c5ba3b gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Mark Shannon faa3272fb8 GH-125837: Split LOAD_CONST into three. (GH-125972)
* Add LOAD_CONST_IMMORTAL opcode

* Add LOAD_SMALL_INT opcode

* Remove RETURN_CONST opcode
2024-10-29 11:15:42 +00:00
Brandt Bucher b5b06349eb GH-125912: Teach the JIT's optimizer about _BINARY_OP_INPLACE_ADD_UNICODE (GH-125935) 2024-10-28 14:37:16 -07:00
mpage f978fb4f8d gh-115999: Refactor LOAD_GLOBAL specializations to avoid reloading {globals, builtins} keys (gh-124953)
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:

1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).

This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).

This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
2024-10-09 15:18:25 +00:00
Mark Shannon da071fa3e8 GH-119866: Spill the stack around escaping calls. (GH-124392)
* Spill the evaluation around escaping calls in the generated interpreter and JIT. 

* The code generator tracks live, cached values so they can be saved to memory when needed.

* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
2024-10-07 14:56:39 +01:00
Ken Jin b84a763dca gh-120619: Optimize through _Py_FRAME_GENERAL (GH-124518)
* Optimize through _Py_FRAME_GENERAL

* refactor
2024-10-03 01:10:51 +08:00
Savannah Ostrowski 65f1237098 GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-27 00:35:42 +00:00
Ken Jin 8810e286fa gh-121459: Deferred LOAD_GLOBAL (GH-123128)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
2024-09-14 00:23:51 +08:00
Mark Shannon 4ed7d1d6ac GH-123996: Explicitly mark 'self_or_null' as an array of size 1 to ensure that it is kept in memory for calls (GH-124003) 2024-09-12 15:32:45 +01:00
Mark Shannon a4fd7aa4a6 GH-115776: Allow any fixed sized object to have inline values (GH-123192) 2024-08-21 15:52:04 +01:00
Mark Shannon bb1d30336e GH-118093: Make CALL_ALLOC_AND_ENTER_INIT suitable for tier 2. (GH-123140)
* Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it

* Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.
2024-08-20 16:52:58 +01:00
Mark Shannon c13e7d98fb GH-118093: Specialize CALL_KW (GH-123006) 2024-08-16 17:11:24 +01:00
Mark Shannon eec7bdaf01 GH-120024: Remove CHECK_EVAL_BREAKER macro. (GH-122968)
* Factor some instructions into micro-ops to isolate CHECK_EVAL_BREAKER for escape analysis

* Eliminate CHECK_EVAL_BREAKER macro
2024-08-14 12:04:05 +01:00
Mark Shannon 1795d6ceba GH-122869: Add missing tier two optimizer cases (GH-122936) 2024-08-12 10:35:52 -07:00
Mark Shannon df13a1821a GH-118095: Add tier two support for BINARY_SUBSCR_GETITEM (GH-120793) 2024-08-01 16:19:05 -07:00
Mark Shannon a9d56e38a0 GH-122155: Track local variables between pops and pushes in cases generator (GH-122286) 2024-08-01 09:27:26 +01:00
Mark Shannon 1ca99ed240 Manually override bytecode definition in optimizer, to avoid build error (GH-122316) 2024-07-26 18:38:52 +01:00
Brandt Bucher 64857d849f GH-122294: Burn in the addresses of side exits (GH-122295) 2024-07-26 09:40:15 -07:00
Mark Shannon 95a73917cd GH-122029: Break INSTRUMENTED_CALL into micro-ops, so that its behavior is consistent with CALL (GH-122177) 2024-07-26 14:35:57 +01:00
Mark Shannon afb0aa6ed2 GH-121131: Clean up and fix some instrumented instructions. (GH-121132)
* Add support for 'prev_instr' to code generator and refactor some INSTRUMENTED instructions
2024-07-26 12:24:12 +01:00
Brandt Bucher d9efa45d74 GH-118093: Add tier two support for BINARY_OP_INPLACE_ADD_UNICODE (GH-122253) 2024-07-25 14:45:07 -07:00
Brandt Bucher 5f6001130f GH-118093: Add tier two support for LOAD_ATTR_PROPERTY (GH-122283) 2024-07-25 10:45:28 -07:00
Mark Shannon 2e14a52cce GH-122160: Remove BUILD_CONST_KEY_MAP opcode. (GH-122164) 2024-07-25 16:24:29 +01:00
Brandt Bucher 7b36b67b1e GH-118093: Add tier two support to several instructions (GH-121884) 2024-07-18 14:24:58 -07:00
Brandt Bucher 33903c53db GH-116017: Get rid of _COLD_EXITs (GH-120960) 2024-07-01 13:17:40 -07:00
Ken Jin 22b0de2755 gh-117139: Convert the evaluation stack to stack refs (#118450)
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.

This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2024-06-27 03:10:43 +08:00
Mark Shannon 8f5a01707f GH-120982: Add stack check assertions to generated interpreter code (GH-120992) 2024-06-25 16:42:29 +01:00
Nadeshiko Manju f385d99f57 gh-120437: Fix _CHECK_STACK_SPACE optimization problems introduced in gh-118322 (GH-120712)
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
2024-06-19 23:34:39 +08:00
Mark Shannon 9cefcc0ee7 GH-120507: Lower the BEFORE_WITH and BEFORE_ASYNC_WITH instructions. (#120640)
* Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions.

* Add LOAD_SPECIAL instruction

* Reimplement `with` and `async with` statements using LOAD_SPECIAL
2024-06-18 12:17:46 +01:00
Mark Shannon 274f844830 GH-120619: Clean up RETURN_VALUE instruction (GH-120624)
* Rename _POP_FRAME to _RETURN_VALUE as it returns a value as well as popping a frame.

* Remove remaining _POP_FRAMEs
2024-06-17 14:40:11 +01:00
Saul Shanabrook 55402d3232 gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365)
Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
Co-authored-by: dpdani <git@danieleparmeggiani.me>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
Co-authored-by: Ken Jin <kenjin@python.org>
2024-06-08 17:41:45 +08:00
Jelle Zijlstra 80a4e38994 gh-119821: Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS (#119822)
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS

The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.

This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.
2024-05-31 14:05:24 -07:00
Brandt Bucher 5cd3ffd6b7 GH-119258: Handle STORE_ATTR_WITH_HINT in tier two (GH-119481) 2024-05-28 12:47:54 -07:00
Brandt Bucher cfcc054dee GH-119476: Split _CHECK_FUNCTION_VERSION out of _CHECK_FUNCTION_EXACT_ARGS (GH-119510) 2024-05-28 12:45:11 -07:00
Jelle Zijlstra 98e855fcc1 gh-119180: Add LOAD_COMMON_CONSTANT opcode (#119321)
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.

This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
2024-05-22 00:46:39 +00:00
Mark Shannon f5c6b9977a GH-118910: Less boilerplate in the tier 2 optimizer (#118913) 2024-05-10 17:43:23 +01:00
Mark Shannon 1ab6356ebe GH-118095: Use broader specializations of CALL in tier 1, for better tier 2 support of calls. (GH-118322)
* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.

* Remove CALL_PY_WITH_DEFAULTS specialization

* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
2024-05-04 12:11:11 +01:00
Mark Shannon da2cfc4cb6 GH-113464: Remove the extra jump via _SIDE_EXIT in _EXIT_TRACE (GH-118545) 2024-05-04 08:50:24 +01:00
Mark Shannon 67bba9dd0f GH-117442: Check eval-breaker at start (rather than end) of tier 2 loops (GH-118482) 2024-05-02 13:10:31 +01:00
Mark Shannon 5b05d452cd GH-118095: Add tier 2 support for YIELD_VALUE (GH-118380) 2024-04-30 11:33:13 +01:00
Mark Shannon ab6eda0ee5 GH-118095: Allow a variant of RESUME_CHECK in tier 2 (GH-118286) 2024-04-29 07:54:05 +01:00
Mark Shannon 3e06c7f719 GH-118095: Add dynamic exit support and FOR_ITER_GEN support to tier 2 (GH-118279) 2024-04-26 18:08:50 +01:00
Mark Shannon f180b31e76 GH-118095: Handle RETURN_GENERATOR in tier 2 (GH-118180) 2024-04-25 11:32:47 +01:00
Mark Shannon a6647d16ab GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) 2024-04-22 13:34:06 +01:00
Mark Shannon e32f6e9e4b GH-115419: Tidy up tier 2 optimizer. Merge peephole pass into main pass (GH-117997) 2024-04-18 11:09:30 +01:00
Peter Lazorchak 1c43468886 gh-116168: Remove extra _CHECK_STACK_SPACE uops (#117242)
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
2024-04-03 17:14:18 +00:00
Mark Shannon c32dc47aca GH-115776: Embed the values array into the object, for "normal" Python objects. (GH-116115) 2024-04-02 11:59:21 +01:00
Mark Shannon bf82f77957 GH-116422: Tier2 hot/cold splitting (GH-116813)
Splits the "cold" path, deopts and exits, from the "hot" path, reducing the size of most jitted instructions, at the cost of slower exits.
2024-03-26 09:35:11 +00:00
Kirill Podoprigora eebea7e515 gh-117176: Fix compiler warning in Python/optimizer_bytecodes.c (GH-117199) 2024-03-24 20:34:55 +02:00