SVG/Font Glyph Analysis & Web DRM Deobfuscation (Raster Hashing + SSIM)

Tip

Learn & practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Azure Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

This page documents practical techniques for recovering text from web readers that ship positioned glyph runs together with per-request vector glyph definitions (SVG paths) and randomize glyph IDs on every request to prevent scraping. The core idea is to ignore the request-scoped numeric glyph IDs, fingerprint each visual shape with raster hashing, and then map shapes to characters by computing SSIM against a reference font atlas. The workflow generalizes beyond Kindle Cloud Reader to any viewer with similar protections.

Warning: Use these techniques only to back up content you legitimately own, and only in compliance with applicable laws and terms of service.

Acquisition (example: Kindle Cloud Reader)

Endpoint observed:

  • https://read.amazon.com/renderer/render

Required materials per session:

  • Browser session cookies (normal Amazon login)
  • Rendering token from a startReading API call
  • Additional ADP session token used by the renderer

Behavior:

  • Each request, when sent with browser-equivalent headers and cookies, returns a TAR archive limited to 5 pages.
  • For a long book you will need many batches; each batch uses a different randomized mapping of glyph IDs.

Typical TAR contents:

  • page_data_0_4.json: positioned text runs as sequences of glyph IDs (not Unicode)
  • glyphs.json: per-request SVG path definitions for each glyph and fontFamily
  • toc.json: table of contents
  • metadata.json: book metadata
  • location_map.json: logical→visual position mappings
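
A minimal sketch of inspecting one batch with Python's standard `tarfile` module, assuming the member names listed above. The helper names `load_batch` and `page_data_name` are hypothetical, and real archives may name the page-data member differently.

```python
import io
import json
import tarfile

def load_batch(tar_bytes: bytes) -> dict:
    """Return {member_name: parsed_json} for every .json member in a batch TAR."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tf:
        for member in tf.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                out[member.name] = json.load(tf.extractfile(member))
    return out

def page_data_name(batch: dict) -> str:
    """Find the page-data member by prefix, since the page range in its name varies."""
    return next(name for name in batch if name.startswith("page_data_"))
```

Matching the page-data member by prefix rather than by exact name avoids hard-coding the page range that appears in the file name.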

Example page run structure:

{
"type": "TextRun",
"glyphs": [24, 25, 74, 123, 91],
"rect": {"left": 100, "top": 200, "right": 850, "bottom": 220},
"fontStyle": "italic",
"fontWeight": 700,
"fontSize": 12.5
}

Example glyphs.json entry:

{
"24": {"path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...", "fontFamily": "bookerly_normal"}
}

Notes on anti-scraping path tricks:

  • Paths may include micro relative moves (e.g., m3,1 m1,6 m-4,-7) that confuse many vector parsers and naïve path sampling.
  • Always render the complete, filled path with a robust SVG engine (e.g., CairoSVG) instead of diffing commands or coordinates.

Why naïve decoding fails

  • Per-request randomized glyph substitution: the glyph ID→character mapping is re-randomized for every batch, so IDs carry no global meaning.
  • Direct SVG coordinate comparison is brittle: identical shapes may differ in numeric coordinates or command encoding from one request to the next.
  • OCR on isolated glyphs performs poorly (≈50%): it confuses punctuation and look-alike glyphs and ignores ligatures.
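
To make the first point concrete, here is a toy sketch with two invented batches that encode the same word under different glyph IDs. The `SHAPE_*` strings are placeholders for real glyph outlines, and using them directly as keys only works here because they are byte-identical. Real paths can differ textually for the same shape, which is why a raster hash is the more robust key.

```python
# Toy data: both batches spell the same text, but IDs are shuffled per request.
batch_a = {
    "glyphs": {"24": {"path": "SHAPE_c"}, "25": {"path": "SHAPE_a"}, "74": {"path": "SHAPE_t"}},
    "run": [24, 25, 74],
}
batch_b = {
    "glyphs": {"7": {"path": "SHAPE_t"}, "91": {"path": "SHAPE_c"}, "3": {"path": "SHAPE_a"}},
    "run": [91, 3, 7],
}

def shape_sequence(batch: dict) -> list[str]:
    """Replace request-scoped glyph IDs with a request-independent shape key."""
    return [batch["glyphs"][str(gid)]["path"] for gid in batch["run"]]

ids_differ = batch_a["run"] != batch_b["run"]                      # raw IDs disagree
shapes_match = shape_sequence(batch_a) == shape_sequence(batch_b)  # shapes agree
```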

Working pipeline: request-agnostic glyph normalization and mapping

  1. Rasterize per-request SVG glyphs
     • Build a minimal SVG document per glyph from the provided path and render it to a fixed canvas (e.g., 512×512) using CairoSVG or an equivalent engine that handles tricky path sequences.
     • Render filled black on white; avoid strokes to eliminate renderer- and anti-aliasing-dependent artifacts.
  2. Perceptual hashing for cross-request identity
     • Compute a perceptual hash (e.g., pHash via imagehash.phash) of each glyph image.
     • Treat the hash as a stable ID: the same visual shape collapses to the same perceptual hash across requests, which defeats the randomized IDs.
  3. Reference font atlas generation
     • Download the target TTF/OTF fonts (e.g., Bookerly normal/italic/bold/bold-italic).
     • Render candidates for A–Z, a–z, 0–9, punctuation, special marks (em/en dashes, quotes), and explicit ligatures: ff, fi, fl, ffi, ffl.
     • Keep a separate atlas per font variant (normal/italic/bold/bold-italic).
     • Use a proper text shaper (HarfBuzz) if you need glyph-level fidelity for ligatures; simple rasterization via Pillow ImageFont can be sufficient if you render the ligature strings directly and the shaping engine resolves them.
  4. Visual similarity matching with SSIM
     • For each unknown glyph image, compute SSIM (Structural Similarity Index) against the candidate images in every font-variant atlas.
     • Assign the character string of the best-scoring match. SSIM absorbs small anti-aliasing, scale, and coordinate differences better than pixel-exact comparison.
  5. Edge handling and reconstruction
     • When a glyph maps to a ligature (multiple characters), expand it during decoding.
     • Use the run rectangles (top/left/right/bottom) to infer paragraph breaks (Y deltas), alignment (X patterns), style, and size.
     • Serialize to HTML/EPUB, preserving fontStyle, fontWeight, fontSize, and internal links.
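
Step 2 can be illustrated without any dependencies. The sketch below swaps in a simple average hash (one bit per pixel, set where the pixel is darker than the image mean) for the DCT-based pHash that `imagehash.phash` computes, so it is a stand-in for the idea and not the same algorithm. The tiny 4×4 bitmaps are invented, and real glyph rasters would be downscaled to a small fixed size first.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 where the pixel is darker than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p < mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two renders of the same vertical bar with slightly different anti-aliasing,
# and one render of a horizontal bar (0 = ink, 255 = paper).
bar_v1 = [[255, 0, 0, 255]] * 4
bar_v2 = [[250, 10, 5, 245]] * 4
bar_h = [[255] * 4, [0] * 4, [0] * 4, [255] * 4]

cache = {}  # hash -> decoded character, shared across batches
cache[average_hash(bar_v1)] = "l"
```

Because both vertical-bar renders hash to the same value, a character resolved once can be looked up for later batches regardless of the glyph ID the server assigned. The Hamming distance helper is only there to show how far apart two different shapes land.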

Implementation tips

  • Normalize all images to the same size and to grayscale before hashing and SSIM.
  • Cache by perceptual hash to avoid recomputing SSIM for glyphs that repeat across batches.
  • Use a high-quality raster size (e.g., 256–512 px) for better discrimination; downscale as needed before SSIM to speed it up.
  • If you render TTF candidates with Pillow, use the same canvas size, center the glyph, and pad so that ascenders and descenders are not clipped.
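
The centering and padding tip can be sketched on plain nested lists. This is an assumed normalization, not something the source prescribes: the hypothetical helper `center_on_canvas` crops a grayscale bitmap to its ink bounding box and places it in the middle of a square canvas, so the same shape drawn at different offsets yields the same image.

```python
def center_on_canvas(pixels, size, ink_threshold=128, paper=255):
    """Crop a grayscale bitmap to its ink bounding box and center it on a size x size canvas."""
    rows = [y for y, row in enumerate(pixels) if any(p < ink_threshold for p in row)]
    cols = [x for x in range(len(pixels[0])) if any(row[x] < ink_threshold for row in pixels)]
    if not rows:  # blank glyph (e.g. a space): return an empty canvas
        return [[paper] * size for _ in range(size)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    h, w = bottom - top + 1, right - left + 1
    if h > size or w > size:
        raise ValueError("glyph larger than canvas; downscale first")
    canvas = [[paper] * size for _ in range(size)]
    oy, ox = (size - h) // 2, (size - w) // 2
    for y in range(h):
        for x in range(w):
            canvas[oy + y][ox + x] = pixels[top + y][left + x]
    return canvas
```

One trade-off of bounding-box centering is that it discards vertical position, so shapes that differ mainly by their height above the baseline (a comma and an apostrophe, for example) may become harder to tell apart. Aligning on a shared baseline is an alternative.
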
<details>
<summary>Python: end-to-end glyph normalization and matching (raster hash + SSIM)</summary>

```python
# pip install cairosvg pillow imagehash scikit-image uharfbuzz freetype-py
import io, json, tarfile
from glob import glob

import cairosvg
import imagehash
import numpy as np
from PIL import Image, ImageOps, ImageDraw, ImageFont
from skimage.metrics import structural_similarity as ssim

CANVAS = (512, 512)
BGCOLOR = 255  # white
FGCOLOR = 0    # black

# --- SVG -> raster ---
def rasterize_svg_path(path_d: str, canvas=CANVAS) -> Image.Image:
    # Build a minimal SVG document; rely on Cairo for correct path handling.
    # Assumed wrapper: white background, black nonzero fill, and a fixed
    # viewBox of 2048 units (adjust to the coordinate range seen in glyphs.json).
    svg = f'''<svg xmlns="http://www.w3.org/2000/svg" width="{canvas[0]}" height="{canvas[1]}" viewBox="0 0 2048 2048">
  <rect width="100%" height="100%" fill="white"/>
  <path d="{path_d}" fill="black" fill-rule="nonzero"/>
</svg>'''
    png_bytes = cairosvg.svg2png(bytestring=svg.encode('utf-8'))
    return Image.open(io.BytesIO(png_bytes)).convert('L')

# --- Perceptual hash ---
def phash_img(img: Image.Image) -> str:
    # Normalize to grayscale and fixed size
    img = ImageOps.grayscale(img).resize((128, 128), Image.LANCZOS)
    return str(imagehash.phash(img))

# --- Reference atlas from TTF ---
def render_char(candidate: str, ttf_path: str, canvas=CANVAS, size=420) -> Image.Image:
    # Render centered text on the same canvas to approximate glyph shapes
    font = ImageFont.truetype(ttf_path, size=size)
    img = Image.new('L', canvas, color=BGCOLOR)
    draw = ImageDraw.Draw(img)
    # textbbox returns (left, top, right, bottom); subtract the origin offset
    # so that the ink box, not the drawing origin, ends up centered.
    left, top, right, bottom = draw.textbbox((0, 0), candidate, font=font)
    dx = (canvas[0] - (right - left)) // 2 - left
    dy = (canvas[1] - (bottom - top)) // 2 - top
    draw.text((dx, dy), candidate, fill=FGCOLOR, font=font)
    return img

# --- Build atlases for variants ---
FONT_VARIANTS = {
    'normal':     '/path/to/Bookerly-Regular.ttf',
    'italic':     '/path/to/Bookerly-Italic.ttf',
    'bold':       '/path/to/Bookerly-Bold.ttf',
    'bolditalic': '/path/to/Bookerly-BoldItalic.ttf',
}
CANDIDATES = [
    *[chr(c) for c in range(0x20, 0x7F)],  # basic ASCII
    '\u2013', '\u2014', '\u201c', '\u201d', '\u2018', '\u2019', '\u2022',  # dashes, quotes, bullet
    'ff', 'fi', 'fl', 'ffi', 'ffl',        # ligatures
]

def build_atlases():
    atlases = {}  # variant -> list[(char, img)]
    for variant, ttf in FONT_VARIANTS.items():
        atlases[variant] = [(ch, render_char(ch, ttf)) for ch in CANDIDATES]
    return atlases

# --- SSIM match ---
def _normalize(img: Image.Image) -> np.ndarray:
    img = ImageOps.grayscale(img).resize((128, 128), Image.LANCZOS)
    return np.array(ImageOps.autocontrast(img))

def best_match(img: Image.Image, atlases) -> tuple[str, float, str]:
    # Returns (char, score, variant)
    best = ('', -1.0, '')
    cand_a = _normalize(img)
    for variant, entries in atlases.items():
        for ch, ref in entries:
            score = ssim(cand_a, _normalize(ref))
            if score > best[1]:
                best = (ch, score, variant)
    return best

# --- Putting it together for one TAR batch ---
def process_tar(tar_path: str, cache: dict, atlases) -> list[dict]:
    # cache: perceptual-hash -> mapping {'char', 'score', 'variant'}
    out_runs = []
    with tarfile.open(tar_path, 'r:*') as tf:
        glyphs = json.load(tf.extractfile('glyphs.json'))
        # page_data_0_4.json may differ in name; list members to find it
        pd_name = next(m.name for m in tf.getmembers() if m.name.startswith('page_data_'))
        page_data = json.load(tf.extractfile(pd_name))

    # 1. Rasterize + hash all glyphs for this batch
    id2hash = {}
    for gid, meta in glyphs.items():
        img = rasterize_svg_path(meta['path'])
        id2hash[int(gid)] = (phash_img(img), img)

    # 2. Ensure all hashes are resolved to characters in cache
    for h, img in {v[0]: v[1] for v in id2hash.values()}.items():
        if h not in cache:
            ch, score, variant = best_match(img, atlases)
            cache[h] = {'char': ch, 'score': float(score), 'variant': variant}

    # 3. Decode text runs
    for run in page_data:
        if run.get('type') != 'TextRun':
            continue
        decoded = [cache[id2hash[gid][0]]['char'] for gid in run['glyphs']]
        out_runs.append({
            'text': ''.join(decoded),
            'rect': run.get('rect'),
            'fontStyle': run.get('fontStyle'),
            'fontWeight': run.get('fontWeight'),
            'fontSize': run.get('fontSize'),
        })
    return out_runs

# Usage sketch:
# atlases = build_atlases()
# cache = {}
# for tar in sorted(glob('batches/*.tar')):
#     runs = process_tar(tar, cache, atlases)
#     # accumulate runs for layout reconstruction → EPUB/HTML
```

</details>

## Layout/EPUB reconstruction heuristics

- Paragraph breaks: If the next run's top Y exceeds the previous line's baseline by more than a threshold relative to the font size, start a new paragraph.
- Alignment: Group left-aligned paragraphs by similar left X; detect centered lines by symmetric margins; detect right-aligned lines by their right edges.
- Styling: Preserve italic/bold via `fontStyle`/`fontWeight`; vary CSS classes by `fontSize` bucket to approximate headings versus body text.
- Links: If runs include link metadata (e.g., `positionId`), emit anchors and internal hrefs.
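
The paragraph-break rule can be sketched as follows, assuming decoded runs that carry `text`, `rect`, and `fontSize` for a single page. The function name `group_paragraphs` and the `gap_factor` default are hypothetical, and the previous run's `bottom` is used here as an approximation of its baseline.

```python
def group_paragraphs(runs: list[dict], gap_factor: float = 0.6) -> list[str]:
    """Join decoded runs into paragraphs, splitting on large vertical gaps.

    A gap larger than gap_factor * fontSize between one run's bottom and the
    next run's top starts a new paragraph (gap_factor is an assumed value).
    """
    paragraphs, current, prev = [], [], None
    for run in sorted(runs, key=lambda r: (r["rect"]["top"], r["rect"]["left"])):
        if prev is not None:
            gap = run["rect"]["top"] - prev["rect"]["bottom"]
            if gap > gap_factor * prev["fontSize"]:
                paragraphs.append(" ".join(current))
                current = []
        current.append(run["text"])
        prev = run
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Scaling the threshold by font size keeps the same rule usable for headings and body text; the right factor depends on the book's line spacing and would need tuning.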

## Mitigating SVG anti-scraping path tricks

- Use filled paths with `fill-rule: nonzero` and a proper renderer (CairoSVG, resvg). Do not rely on path token normalization.
- Avoid stroke rendering; focus on filled solids to sidestep hairline artifacts caused by micro relative moves.
- Keep a stable `viewBox` for every render so that identical shapes rasterize consistently across batches.

## Performance notes

- In practice, a book converges to a few hundred unique glyphs (e.g., about 361 including ligatures). Cache SSIM results by perceptual hash.
- After the initial discovery phase, later batches mostly reuse known hashes, so decoding becomes I/O-bound.
- A mean SSIM of ≈0.95 is a strong signal; consider flagging low-scoring matches for manual review.
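
For intuition about the score and the review step, here is a dependency-free sketch. `global_ssim` computes SSIM over the whole image as a single window, which is a simplification of the sliding-window mean that `skimage.metrics.structural_similarity` returns, so its values will not match that library exactly. `flag_low_confidence` and its 0.90 threshold are hypothetical; it assumes cache entries that store a `score` per hash.

```python
def global_ssim(a: list[int], b: list[int], data_range: float = 255.0) -> float:
    """Single-window SSIM over two equal-length flattened grayscale images."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2  # standard stabilizers
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def flag_low_confidence(cache: dict, threshold: float = 0.90) -> list[str]:
    """Return the hashes whose best match scored below threshold (assumed value)."""
    return sorted(h for h, m in cache.items() if m["score"] < threshold)
```

Identical images score 1.0 and an inverted image scores below zero, so a threshold somewhat under the observed mean is one way to pick out glyphs worth checking by eye.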

## Generalization to other viewers

Any system that:
- Returns positioned glyph runs with request-scoped numeric IDs
- Ships per-request vector glyphs (SVG paths or subset fonts)
- Caps the number of pages per request to prevent bulk export

…can be handled with the same normalization:
- Rasterize per-request shapes → perceptual hash → shape ID
- Atlas of candidate glyphs/ligatures per font variant
- SSIM (or a similar perceptual metric) to assign characters
- Reconstruct layout from run rectangles/styles

## Minimal acquisition example (sketch)

Use your browser's DevTools to capture the exact headers, cookies, and tokens the reader sends when it requests `/renderer/render`. Then replicate them from a script or curl. Example outline:
```bash
curl 'https://read.amazon.com/renderer/render' \
-H 'Cookie: session-id=...; at-main=...; sess-at-main=...' \
-H 'x-adp-session: <ADP_SESSION_TOKEN>' \
-H 'authorization: Bearer <RENDERING_TOKEN_FROM_startReading>' \
-H 'User-Agent: <copy from browser>' \
-H 'Accept: application/x-tar' \
--compressed --output batch_000.tar
```

Adjust the parameters (book ASIN, page window, viewport) to match the reader's requests. Expect a cap of 5 pages per request.

Achievable results

  • Collapse 100+ randomized alphabets into a single glyph space via perceptual hashing
  • Map 100% of unique glyphs with a mean SSIM of ~0.95 when the atlases include ligatures and variants
  • Reconstructed EPUB/HTML is visually indistinguishable from the original
