Office file analysis

Reading time: 6 minutes

tip

Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

For further information check https://trailofbits.github.io/ctf/forensics/. This is just a sumary:

Microsoft has created many office document formats, with two main types being OLE formats (like RTF, DOC, XLS, PPT) and Office Open XML (OOXML) formats (such as DOCX, XLSX, PPTX). These formats can include macros, making them targets for phishing and malware. OOXML files are structured as zip containers, allowing inspection through unzipping, revealing the file and folder hierarchy and XML file contents.

To explore OOXML file structures, the command to unzip a document and the output structure are given. Techniques for hiding data in these files have been documented, indicating ongoing innovation in data concealment within CTF challenges.

For analysis, oletools and OfficeDissector offer comprehensive toolsets for examining both OLE and OOXML documents. These tools help in identifying and analyzing embedded macros, which often serve as vectors for malware delivery, typically downloading and executing additional malicious payloads. Analysis of VBA macros can be conducted without Microsoft Office by utilizing Libre Office, which allows for debugging with breakpoints and watch variables.

Installation and usage of oletools are straightforward, with commands provided for installing via pip and extracting macros from documents. Automatic execution of macros is triggered by functions like AutoOpen, AutoExec, or Document_Open.

bash
sudo pip3 install -U oletools
olevba -c /path/to/document #Extract macros

OLE Compound File exploitation: Autodesk Revit RFA – ECC recomputation and controlled gzip

Revit RFA models are stored as an OLE Compound File (aka CFBF). The serialized model is under storage/stream:

  • Storage: Global
  • Stream: LatestGlobal\Latest

Key layout of Global\Latest (observed on Revit 2025):

  • Header
  • GZIP-compressed payload (the actual serialized object graph)
  • Zero padding
  • Error-Correcting Code (ECC) trailer

Revit will auto-repair small perturbations to the stream using the ECC trailer and will reject streams that don’t match the ECC. Therefore, naïvely editing the compressed bytes won’t persist: your changes are either reverted or the file is rejected. To ensure byte-accurate control over what the deserializer sees you must:

  • Recompress with a Revit-compatible gzip implementation (so the compressed bytes Revit produces/accepts match what it expects).
  • Recompute the ECC trailer over the padded stream so Revit will accept the modified stream without auto-repairing it.

Practical workflow for patching/fuzzing RFA contents:

  1. Expand the OLE compound document
bash
# Expand RFA into a folder tree (storages → folders, streams → files)
CompoundFileTool /e model.rfa /o rfa_out
# rfa_out/Global/Latest is the serialized stream of interest
  1. Edit Global\Latest with gzip/ECC discipline
  • Deconstruct Global/Latest: keep the header, gunzip the payload, mutate bytes, then gzip back using Revit-compatible deflate parameters.
  • Preserve zero-padding and recompute the ECC trailer so the new bytes are accepted by Revit.
  • If you need deterministic byte-for-byte reproduction, build a minimal wrapper around Revit’s DLLs to invoke its gzip/gunzip paths and ECC computation (as demonstrated in research), or re-use any available helper that replicates these semantics.
  1. Rebuild the OLE compound document
bash
# Repack the folder tree back into an OLE file
CompoundFileTool /c rfa_out /o model_patched.rfa

Notes:

  • CompoundFileTool writes storages/streams to the filesystem with escaping for characters invalid in NTFS names; the stream path you want is exactly Global/Latest in the output tree.
  • When delivering mass attacks via ecosystem plugins that fetch RFAs from cloud storage, ensure your patched RFA passes Revit’s integrity checks locally first (gzip/ECC correct) before attempting network injection.

Exploitation insight (to guide what bytes to place in the gzip payload):

  • The Revit deserializer reads a 16-bit class index and constructs an object. Certain types are non‑polymorphic and lack vtables; abusing destructor handling yields a type confusion where the engine executes an indirect call through an attacker-controlled pointer.
  • Picking AString (class index 0x1F) places an attacker-controlled heap pointer at object offset 0. During the destructor loop, Revit effectively executes:
asm
rcx = [rbx]              ; object pointer (e.g., AString*)
rax = [rcx]              ; attacker-controlled pointer to AString buffer
call qword ptr [rax]     ; one attacker-chosen gadget per object
  • Place multiple such objects in the serialized graph so each iteration of the destructor loop executes one gadget (“weird machine”), and arrange a stack pivot into a conventional x64 ROP chain.

See Windows x64 pivot/gadget building details here:

Stack Pivoting - EBP2Ret - EBP chaining

and general ROP guidance here:

ROP & JOP

Tooling:

  • CompoundFileTool (OSS) to expand/rebuild OLE compound files: https://github.com/thezdi/CompoundFileTool
  • IDA Pro + WinDBG TTD for reverse/taint; disable page heap with TTD to keep traces compact.
  • A local proxy (e.g., Fiddler) can simulate supply-chain delivery by swapping RFAs in plugin traffic for testing.

References

tip

Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks