What You'll Learn
- Explain what static analysis is and why it is the first step in any malware investigation
- Identify the key components of the PE (Portable Executable) file format: DOS header, PE header, section table, and entry point
- Describe the purpose of common PE sections (.text, .data, .rsrc, .reloc) and what anomalies to look for in each
- Extract strings from a binary using FLOSS and the strings command on both Windows and Linux
- Identify suspicious string categories: URLs, IP addresses, file paths, API calls, registry keys, and encoded data
- Apply a string analysis workflow to perform initial triage on an unknown binary
- Connect static analysis findings to YARA rules (Module 10) and CyberChef for deeper investigation
- Interpret compilation timestamps and linker metadata to assess binary origin and age
Why Static Analysis Comes First
When a suspicious file lands on your desk — pulled from a quarantine folder, extracted from a phishing email, or flagged by Wazuh — you face a critical decision: do you run it, or do you read it?
Static analysis means examining a binary without executing it. You inspect its structure, read its strings, examine its imports, check its metadata — all without letting it touch a running system. This is always the first step because it is safe, repeatable, and often reveals enough to classify a sample before you ever need a sandbox.
| Analysis Type | What You Do | Risk Level | Speed |
|---|---|---|---|
| Static | Examine file structure, strings, imports, metadata | Zero — file never executes | Minutes |
| Dynamic | Execute in a sandbox and observe behavior | Contained — isolated environment | 10–30 minutes |
| Manual reverse engineering | Disassemble and read code logic | Zero — file never executes | Hours to days |
Static analysis is not a replacement for dynamic analysis — it is a prerequisite. The goal is to extract as much intelligence as possible before execution. A 10-minute static pass might reveal the C2 server, the malware family, and the persistence mechanism — all without booting a sandbox. In Lab 11.1, you will perform a complete static analysis workflow on a real PE binary and extract actionable IOCs before any execution.
The PE File Format: Windows Executables Under the Microscope
Every .exe, .dll, .sys, and .scr file on Windows follows the Portable Executable (PE) format. Understanding PE structure is fundamental because malware authors must work within this format — and every shortcut they take leaves artifacts you can detect.
DOS Header and DOS Stub
Every PE file begins with the DOS header, a legacy artifact from MS-DOS compatibility. The first two bytes are always 4D 5A (the ASCII characters "MZ" — named after Mark Zbikowski, a DOS architect). This magic number is how the operating system and analysis tools recognize a file as a PE executable.
The DOS header contains one critical field for analysts: e_lfanew — a 4-byte offset at position 0x3C that points to the PE header's location. Malware authors occasionally manipulate this value to confuse basic parsers.
Following the DOS header is the DOS stub — a small program that prints "This program cannot be run in DOS mode" if someone tries to run the executable in a DOS environment. Some malware replaces this stub with custom messages or junk data.
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 |MZ..............|
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00 |................|
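The e_lfanew lookup described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name read_e_lfanew is hypothetical, and the input is simply the 64 bytes from the hex dump.

```python
import struct

def read_e_lfanew(data: bytes) -> int:
    """Return the PE header offset stored in the DOS header's e_lfanew field."""
    if len(data) < 0x40:
        raise ValueError("file is too short to hold a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic (4D 5A)")
    # e_lfanew is a 4-byte little-endian unsigned integer at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

# The 64 bytes shown in the hex dump above (fromhex ignores the spaces)
DOS_HEADER = bytes.fromhex(
    "4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 "
    "B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00"
)

print(hex(read_e_lfanew(DOS_HEADER)))  # 0xe0
```

For this dump the PE signature should therefore sit at file offset 0xE0. A value that points past the end of the file, or into the middle of the DOS header itself, is worth noting in your triage record.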
PE Header (IMAGE_NT_HEADERS)
The PE header starts with the signature 50 45 00 00 ("PE\0\0") and contains two sub-structures:
File Header (COFF Header) — 20 bytes of critical metadata:
| Field | What It Tells You |
|---|---|
| Machine | Target architecture: 0x14C = x86, 0x8664 = x64 |
| NumberOfSections | How many sections the binary contains |
| TimeDateStamp | Compilation timestamp (Unix epoch format) |
| Characteristics | Flags: executable, DLL, large address aware, etc. |
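As a rough sketch of how these File Header fields are laid out, the following Python decodes the 20 bytes that follow the PE signature. The function name and the synthetic header values are illustrative assumptions; the field order and flag values follow the PE specification.

```python
import struct

MACHINE_NAMES = {0x14C: "x86", 0x8664: "x64"}

# Characteristics flags defined by the PE specification
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020
IMAGE_FILE_DLL = 0x2000

def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Decode the 20-byte COFF File Header that follows the PE signature."""
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at the given offset")
    (machine, num_sections, timestamp, _symtab_ptr, _num_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": num_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "is_executable": bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "large_address_aware": bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE),
    }

# Synthetic header for illustration only (the values are made up)
sample = b"PE\x00\x00" + struct.pack("<HHIIIHH", 0x8664, 5, 1700000000, 0, 0, 240, 0x0022)
header = parse_file_header(sample, 0)
print(header["machine"], header["number_of_sections"], header["is_dll"])
```

On a real file, pe_offset would be the e_lfanew value read from the DOS header.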
Optional Header — despite the name, it is mandatory for executables:
| Field | What It Tells You |
|---|---|
| AddressOfEntryPoint | RVA where execution begins — malware may point this to an unusual section |
| ImageBase | Preferred load address (typically 0x00400000 for 32-bit EXEs and 0x10000000 for 32-bit DLLs; 64-bit images default to 0x140000000 and 0x180000000) |
| SectionAlignment / FileAlignment | Memory and disk alignment values |
| SizeOfImage | Total size when loaded in memory |
| Subsystem | GUI (0x02) vs Console (0x03) — a "GUI" app with no window is suspicious |
| DataDirectory | Array of 16 entries pointing to imports, exports, resources, relocations, etc. |
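The sketch below reads the Optional Header fields from the table for both PE32 and PE32+ images. It assumes opt_offset is the start of the Optional Header (e_lfanew + 24); the function name and the synthetic buffer are hypothetical, and only the fields named above are decoded.

```python
import struct

SUBSYSTEMS = {2: "GUI", 3: "Console"}

def parse_optional_header(data: bytes, opt_offset: int) -> dict:
    """Read a few analyst-relevant Optional Header fields (PE32 and PE32+)."""
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    if magic == 0x10B:      # PE32: 4-byte ImageBase at offset 28
        fmt_name = "PE32"
        (image_base,) = struct.unpack_from("<I", data, opt_offset + 28)
    elif magic == 0x20B:    # PE32+: 8-byte ImageBase at offset 24
        fmt_name = "PE32+"
        (image_base,) = struct.unpack_from("<Q", data, opt_offset + 24)
    else:
        raise ValueError(f"unexpected Optional Header magic {magic:#x}")
    # The remaining fields sit at the same offsets in both variants
    (entry_rva,) = struct.unpack_from("<I", data, opt_offset + 16)
    section_align, file_align = struct.unpack_from("<II", data, opt_offset + 32)
    (size_of_image,) = struct.unpack_from("<I", data, opt_offset + 56)
    (subsystem,) = struct.unpack_from("<H", data, opt_offset + 68)
    return {
        "format": fmt_name,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "section_alignment": section_align,
        "file_alignment": file_align,
        "size_of_image": size_of_image,
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
    }

# Synthetic PE32+ Optional Header prefix (illustrative values only)
buf = bytearray(72)
struct.pack_into("<H", buf, 0, 0x20B)
struct.pack_into("<I", buf, 16, 0x1000)
struct.pack_into("<Q", buf, 24, 0x140000000)
struct.pack_into("<II", buf, 32, 0x1000, 0x200)
struct.pack_into("<I", buf, 56, 0x8000)
struct.pack_into("<H", buf, 68, 3)
info = parse_optional_header(bytes(buf), 0)
print(info["format"], hex(info["entry_point_rva"]), info["subsystem"])
```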
Compilation timestamps are trivially spoofed. Malware authors routinely set fake timestamps to mislead investigators. A timestamp of January 1, 1970 (epoch zero) or a date far in the future is an obvious fake. A timestamp that exactly matches another known-good binary suggests timestomping. Use timestamps as one data point, never as conclusive evidence. Cross-reference with other metadata like the linker version and Rich header hash.
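The two obvious-fake cases mentioned above (epoch zero and a future date) are easy to check mechanically. The following is a minimal sketch; the function name and labels are hypothetical, and a "plausible" result only means the value passed these two checks, not that it is genuine.

```python
from datetime import datetime, timezone
from typing import Optional

def assess_timestamp(time_date_stamp: int, now: Optional[datetime] = None) -> str:
    """Label a PE TimeDateStamp as 'epoch-zero', 'future', or 'plausible'."""
    now = now or datetime.now(timezone.utc)
    if time_date_stamp == 0:
        return "epoch-zero"
    # TimeDateStamp counts seconds since 1970-01-01 00:00:00 UTC
    compiled = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
    if compiled > now:
        return "future"
    return "plausible"

# Fixed reference date so the example is reproducible
reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
for stamp in (0, 4102444800, 1700000000):
    print(stamp, assess_timestamp(stamp, reference))
```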
Section Table and Common Sections
After the PE header comes the section table — an array of headers describing each section in the binary. Every section has a name, virtual address, virtual size, raw size, and characteristics flags.
| Section | Purpose | What to Watch For |
|---|---|---|
| .text | Executable code | Unusually small .text + large unknown section = packed binary |
| .data | Initialized global and static variables | Strings, configuration data, embedded payloads |
| .rdata | Read-only data, import/export tables | Import table analysis reveals API usage |
| .rsrc | Resources: icons, dialogs, version info, embedded files | Embedded executables, encrypted payloads hidden as resources |
| .reloc | Relocation table for ASLR | Missing .reloc with ASLR enabled = anomaly |
| UPX0, UPX1 | UPX packer sections | Clear indicator of UPX packing |
| .themida | Themida protector | Commercial packer/protector, common in crimeware |
Section names are cosmetic: the OS ignores them. Malware can name sections anything, such as .code, .xyz, or even an empty string. What matters is the characteristics flags. A section marked as both writable and executable (for example 0xE0000020, which combines the code, execute, read, and write flags) is a red flag, because legitimate software rarely needs self-modifying code outside of packers and JIT compilers.
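To make the flag check concrete, here is a small sketch that decodes 40-byte section headers and marks any section that is both writable and executable. The helper names and the two synthetic headers are assumptions for illustration; on a real file the table starts at e_lfanew + 24 + SizeOfOptionalHeader.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000
SECTION_HEADER = "<8sIIIIIIHHI"   # 40 bytes per entry

def parse_sections(data: bytes, table_offset: int, count: int) -> list:
    """Decode `count` section headers starting at `table_offset`."""
    sections = []
    for i in range(count):
        (raw_name, vsize, vaddr, raw_size, raw_ptr,
         _reloc_ptr, _line_ptr, _n_reloc, _n_line, flags) = struct.unpack_from(
            SECTION_HEADER, data, table_offset + i * 40)
        sections.append({
            "name": raw_name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "raw_size": raw_size,
            "raw_offset": raw_ptr,
            "characteristics": flags,
            "write_execute": bool(flags & IMAGE_SCN_MEM_WRITE)
                             and bool(flags & IMAGE_SCN_MEM_EXECUTE),
        })
    return sections

def _fake_header(name: bytes, flags: int) -> bytes:
    # Synthetic section header used only for this example
    return struct.pack(SECTION_HEADER, name, 0x1000, 0x1000, 0x200, 0x400, 0, 0, 0, 0, flags)

table = _fake_header(b".text", 0x60000020) + _fake_header(b"UPX0", 0xE0000020)
for section in parse_sections(table, 0, 2):
    print(section["name"], hex(section["characteristics"]), section["write_execute"])
```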
Entry Point Analysis
The AddressOfEntryPoint field tells the OS where to start executing code. In legitimate software, this points into the .text section. Anomalies to watch for:
- Entry point in a non-standard section (not .text): suggests packing or injection
- Entry point at the very end of a section: common in appended shellcode
- Entry point at offset 0 of a section with high entropy: likely packed or encrypted
- Entry point in a section with a suspicious name (UPX1, .packed, random characters)
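Two of the checks in this list can be sketched directly: finding which section holds the entry point, and measuring a section's entropy. The code below is an illustrative sketch with hypothetical function names and a made-up section layout. Entropy is Shannon entropy in bits per byte, and values approaching 8 are typical of compressed or encrypted data.

```python
import math
from collections import Counter
from typing import Optional

def section_containing(sections: list, rva: int) -> Optional[str]:
    """Return the name of the section whose virtual range contains `rva`."""
    for name, virtual_address, virtual_size in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return name
    return None

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (uniform) to 8.0 (random)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

# Made-up layout: (name, VirtualAddress, VirtualSize)
layout = [(".text", 0x1000, 0x5000), ("UPX1", 0x6000, 0x3000)]
entry_point = 0x7A10
print("entry point is in:", section_containing(layout, entry_point))
print("entropy of all-zero padding:", shannon_entropy(bytes(1024)) == 0.0)
```

In practice you would pass the bytes of each section (sliced using its raw offset and raw size) to shannon_entropy and compare the results across sections.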
Extracting and Analyzing Strings
Strings are the single most productive static analysis technique for initial triage. Embedded text in a binary reveals what the malware communicates with, what it modifies, and what tools or techniques it uses.
The strings Command
On Linux, the strings command extracts printable ASCII sequences of a minimum length (default 4 characters):
strings suspicious.exe | head -50
strings -n 8 suspicious.exe # minimum 8 characters (reduces noise)
strings -e l suspicious.exe # extract UTF-16LE strings (common in Windows binaries)
On Windows, Sysinternals strings.exe provides equivalent functionality:
strings64.exe -n 8 suspicious.exe
strings64.exe -accepteula suspicious.exe | Select-String -Pattern "http"
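To see what these tools do under the hood, the following sketch extracts printable ASCII and UTF-16LE runs with two regular expressions and keeps the byte offset of each hit. It approximates the behavior of strings rather than reimplementing it, the function name is hypothetical, and the sample buffer is synthetic.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return sorted (offset, encoding, text) tuples for printable runs."""
    # Printable ASCII: bytes 0x20 through 0x7E, at least min_len in a row
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE: each printable ASCII byte is followed by a 0x00 byte
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [(m.start(), "ascii", m.group().decode("ascii"))
             for m in ascii_re.finditer(data)]
    found += [(m.start(), "utf-16le", m.group().decode("utf-16-le"))
              for m in utf16_re.finditer(data)]
    return sorted(found)

# Synthetic buffer: one ASCII URL and one UTF-16LE DLL name
blob = (b"\x00\x01" + b"http://c2.example/gate" + b"\x00\xff"
        + "kernel32.dll".encode("utf-16-le") + b"\x00\x00")
for offset, encoding, text in extract_strings(blob):
    print(f"{offset:#06x} {encoding:9} {text}")
```

Keeping the offset alongside each string pays off later, when findings need to be documented and cross-referenced against section boundaries.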
FLOSS: Beyond Basic Strings
The FLARE Obfuscated String Solver (FLOSS) from Mandiant goes far beyond strings. In addition to static strings, it uses static analysis and code emulation to recover stack strings and strings that the malware decodes only at run time:
floss suspicious.exe
floss --no stack -- suspicious.exe # skip stack strings for faster results (FLOSS 2.x syntax)
floss -j -o floss_output.json suspicious.exe # JSON output for scripting
| Tool | Finds Static Strings | Finds Stack Strings | Deobfuscates Encoded Strings |
|---|---|---|---|
| strings | Yes | No | No |
| FLOSS | Yes | Yes | Yes |
Never run FLOSS against a suspected-malicious file outside an isolated environment. FLOSS emulates parts of the sample's code to decode strings, so treat it like any other tool that parses untrusted input. Always run string extraction tools inside your analysis VM or container, never on your host system.
Suspicious String Categories
When reviewing extracted strings, categorize them systematically:
Network Indicators:
- URLs: http://, https://, ftp://
- IP addresses: 192.168., 10.0., or public IPs
- Domain names: especially DGA-looking domains (xkjr2.duckdns.org)
- User-Agent strings: Mozilla/5.0, custom agents
File System Indicators:
- Windows paths: C:\Users\, C:\Windows\Temp\, %APPDATA%
- Linux paths: /tmp/, /etc/cron.d/, /var/log/
- File extensions: .bat, .ps1, .vbs, .dll
- Known malware drop locations: C:\ProgramData\, C:\Users\Public\
Windows API Calls:
- Process manipulation: CreateRemoteThread, VirtualAllocEx, WriteProcessMemory
- Execution: WinExec, ShellExecute, CreateProcess
- Network: InternetOpen, URLDownloadToFile, HttpSendRequest
- Registry: RegSetValueEx, RegCreateKey
- Crypto: CryptEncrypt, CryptDecrypt, BCryptEncrypt
Persistence Indicators:
- Registry keys: SOFTWARE\Microsoft\Windows\CurrentVersion\Run
- Service creation: CreateService, sc create
- Scheduled tasks: schtasks, at.exe
Encoded / Obfuscated Data:
- Base64 strings: long alphanumeric sequences, often ending in = or ==
- Hex-encoded data: continuous hex characters
- XOR keys: short repeated byte sequences
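The categories above map naturally onto a table of regular expressions. The sketch below is one possible starting point; the category names and patterns are illustrative assumptions, not a complete or authoritative rule set. The IPv4 pattern, for instance, will also match dotted version numbers.

```python
import re

CATEGORY_PATTERNS = {
    "url": re.compile(r"\b(?:https?|ftp)://\S+", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "windows_path": re.compile(r"[A-Za-z]:\\|%APPDATA%", re.IGNORECASE),
    "injection_api": re.compile(r"CreateRemoteThread|VirtualAllocEx|WriteProcessMemory"),
    "persistence": re.compile(r"CurrentVersion\\Run|schtasks|CreateService", re.IGNORECASE),
    "base64_like": re.compile(r"^[A-Za-z0-9+/]{20,}={0,2}$"),
}

def categorize(text: str) -> list:
    """Return the names of every category whose pattern matches `text`."""
    return [name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(text)]

# Example strings (made up for illustration)
examples = [
    "http://198.51.100.7/gate.php",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    "CreateRemoteThread",
    "TVqQAAMAAAAEAAAA//8AALgAAAA=",
    "This program cannot be run in DOS mode",
]
for s in examples:
    print(categorize(s) or ["uncategorized"], s)
```

Strings that match nothing are not necessarily benign; the point of the table is to surface likely indicators quickly so that manual review starts in the right place.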
String Analysis Workflow
Efficient string analysis follows a structured workflow that moves from broad extraction to targeted investigation:
Step 1: Extract — Run strings (ASCII and UTF-16) and FLOSS on the binary. Pipe output to a file for reference.
strings -n 6 sample.exe > strings_ascii.txt
strings -n 6 -e l sample.exe > strings_utf16.txt
floss sample.exe > strings_floss.txt
Step 2: Filter noise — Remove common library strings, compiler artifacts, and Windows API boilerplate. Focus on unique, unusual, or contextually suspicious strings.
# Single quotes keep the shell from altering backslashes; the glob searches all three extraction files
grep -iE '(http|ftp|\.[a-z]{2,4}/|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)' strings_*.txt
grep -iE '(CreateRemoteThread|VirtualAlloc|WriteProcessMemory|URLDownload)' strings_*.txt
grep -iE '(CurrentVersion\\Run|schtasks|cron)' strings_*.txt
Step 3: Categorize — Group findings into network IOCs, file system IOCs, behavioral indicators, and encoded data.
Step 4: Pivot — Take discovered IOCs and search for them in threat intelligence platforms. A URL found in strings can be checked in VirusTotal, MISP, or URLhaus. An API call pattern can be matched against known malware family profiles.
Step 5: Document — Record every finding with the offset where the string was found, the category, and its significance.
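One lightweight way to handle Step 5 is to keep findings as structured records and serialize them to JSON. The record layout below (offset, value, category, significance) mirrors the step description; the class name, function name, and sample values are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    offset: int          # byte offset of the string within the file
    value: str           # the extracted string
    category: str        # e.g. network, filesystem, behavioral, encoded
    significance: str    # analyst's note on why it matters

def build_report(sample_name: str, findings: list) -> str:
    """Serialize findings to JSON, ordered by file offset."""
    ordered = sorted(findings, key=lambda f: f.offset)
    return json.dumps(
        {
            "sample": sample_name,
            # Store offsets as hex strings so they match hex-editor views
            "findings": [dict(asdict(f), offset=hex(f.offset)) for f in ordered],
        },
        indent=2,
    )

# Made-up findings for illustration
findings = [
    Finding(0x4A20, "CreateRemoteThread", "behavioral", "possible process injection"),
    Finding(0x1F00, "http://c2.example/gate", "network", "candidate C2 URL"),
]
print(build_report("sample.exe", findings))
```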
Connecting Static Analysis to Your Toolkit
Static analysis does not exist in isolation. Every finding connects to tools you already know:
| Finding | Next Step | Tool |
|---|---|---|
| Suspicious string pattern | Write a detection rule for it | YARA (Module 10) |
| Base64-encoded payload | Decode and analyze the payload | CyberChef |
| C2 domain or IP | Search threat intelligence feeds | MISP (Module 5) |
| Compilation timestamp | Correlate with campaign timelines | MISP timeline / ATT&CK |
| API call pattern | Create endpoint detection | Velociraptor (Module 6) |
| File hash (MD5/SHA256) | Check reputation databases | VirusTotal / MalwareBazaar |
YARA and static analysis are natural partners. In Module 10, you wrote YARA rules that match on strings and hex patterns. Every suspicious string you extract during static analysis is a candidate for a YARA rule. In Lab 11.1, you will practice the full loop: extract strings → write a YARA rule → scan a directory for additional samples matching the same patterns.
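The extract-then-write-a-rule loop can be partly automated. The sketch below generates YARA rule text from a list of extracted strings. It assumes the strings are printable ASCII, and the rule name, threshold, and helper names are hypothetical. The condition requires an MZ header plus a minimum number of string matches.

```python
def yara_escape(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def build_yara_rule(rule_name: str, strings: list, min_matches: int = 2) -> str:
    """Build YARA rule text that matches PE files containing the given strings."""
    if not strings:
        raise ValueError("need at least one string")
    needed = min(min_matches, len(strings))
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for i, text in enumerate(strings):
        # 'ascii wide' matches both single-byte and UTF-16LE encodings
        lines.append(f'        $s{i} = "{yara_escape(text)}" ascii wide')
    lines += [
        "    condition:",
        f"        uint16(0) == 0x5A4D and {needed} of them",
        "}",
    ]
    return "\n".join(lines)

# Made-up input strings for illustration
rule = build_yara_rule(
    "Lab_Sample_Strings",
    [r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run", "CreateRemoteThread"],
)
print(rule)
```

Generated rules are a first draft. Strings that also appear in legitimate software will produce false positives, so review each one before deploying the rule.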
Linux ELF Binaries: The Other Side
While PE is the dominant format on Windows, Linux malware uses the ELF (Executable and Linkable Format). The same static analysis principles apply:
file suspicious_binary
# suspicious_binary: ELF 64-bit LSB executable, x86-64, dynamically linked
readelf -h suspicious_binary # ELF header (entry point, architecture, type)
readelf -S suspicious_binary # section headers (similar to PE sections)
readelf -d suspicious_binary # dynamic section (shared library dependencies)
strings -n 8 suspicious_binary | grep -iE "(http|/tmp/|/bin/|socket|connect)"
| PE Concept | ELF Equivalent |
|---|---|
| .text section | .text section |
| .data section | .data / .bss sections |
| .rsrc section | No direct equivalent (resources handled differently) |
| Import Address Table | .dynsym / .plt (dynamic symbols and procedure linkage table) |
| PE header | ELF header (readelf -h) |
| DLL dependencies | Shared library dependencies (readelf -d; avoid ldd on untrusted binaries because it may execute them) |
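The ELF header fields that readelf -h reports can also be read by hand, in the same way as the PE headers. This is a minimal sketch with a hypothetical function name and a synthetic 32-byte header. It decodes only the class, byte order, type, machine, and entry point.

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object / PIE", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def parse_elf_header(data: bytes) -> dict:
    """Decode the identification bytes and a few fields of an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic (7F 45 4C 46)")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry starts at offset 24 and is 8 bytes (64-bit) or 4 bytes (32-bit)
    (entry,) = struct.unpack_from(endian + ("Q" if is_64 else "I"), data, 24)
    return {
        "class": "64-bit" if is_64 else "32-bit",
        "byte_order": "LSB" if endian == "<" else "MSB",
        "type": ELF_TYPES.get(e_type, str(e_type)),
        "machine": ELF_MACHINES.get(e_machine, str(e_machine)),
        "entry_point": entry,
    }

# Synthetic header: 64-bit, little endian, executable, x86-64 (values made up)
elf = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
       + struct.pack("<HHIQ", 2, 62, 1, 0x401000))
print(parse_elf_header(elf))
```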
Key Takeaways
- Static analysis examines a binary without executing it — it is always the first step because it is safe, fast, and often reveals enough to classify a sample
- The PE format has a predictable structure: DOS header (MZ magic), PE header (compilation timestamp, entry point, characteristics), section table, and section data
- Section anomalies reveal packing and tampering: writable+executable sections, entry points outside .text, unusual section names, or entropy mismatches
- Compilation timestamps provide timeline intelligence but are trivially spoofed — always cross-reference with other metadata
- String extraction using strings and FLOSS is the highest-value static technique: URLs, IPs, API calls, registry keys, and encoded data all reveal malware intent
- Follow a structured string analysis workflow: extract → filter → categorize → pivot → document
- Every static finding connects to your existing toolkit: strings feed YARA rules, encoded data feeds CyberChef, network IOCs feed MISP, API patterns feed Velociraptor hunts
- ELF binaries on Linux follow the same analysis principles: use readelf, file, and strings instead of PE-specific tools
What's Next
You now know how to examine a binary's structure and extract strings — the "what is this file made of?" question. But two critical questions remain: "Have we seen this file before?" and "Is this file hiding something?" In Lesson 11.2, you will learn to hash files for reputation lookups, detect packers that compress and encrypt code, and analyze the Import Address Table to understand what Windows APIs a binary calls — the next layer of static analysis that separates commodity malware from sophisticated threats.
Knowledge Check: PE Structure & String Analysis
What is the primary advantage of static analysis over dynamic analysis as the first step in malware investigation?
What are the first two bytes (magic number) of every valid PE file?
Which PE section typically contains the executable code of a binary?
In Lab 11.1, you extract strings from a PE binary and find the string 'CreateRemoteThread'. What category of suspicious activity does this API call indicate?
What advantage does FLOSS provide over the standard 'strings' command?
You find a PE section with both the writable and executable characteristics flags set. Why is this a red flag?
During string analysis of a suspicious binary in Lab 11.1, you find 'SOFTWARE\Microsoft\Windows\CurrentVersion\Run'. What does this indicate?
Why should compilation timestamps in PE headers be treated with caution during analysis?
What is the correct order of steps in a string analysis workflow for malware triage?
Which command extracts UTF-16 Little Endian strings from a binary on Linux — a critical step since Windows binaries often store strings in this encoding?