- Explain what static analysis is and why it is the first step in any malware investigation
- Identify the key components of the PE (Portable Executable) file format: DOS header, PE header, section table, and entry point
- Describe the purpose of common PE sections (.text, .data, .rsrc, .reloc) and what anomalies to look for in each
- Extract strings from a binary using FLOSS and the `strings` command on both Windows and Linux
- Identify suspicious string categories: URLs, IP addresses, file paths, API calls, registry keys, and encoded data
- Apply a string analysis workflow to perform initial triage on an unknown binary
- Connect static analysis findings to YARA rules (Module 10) and CyberChef for deeper investigation
- Interpret compilation timestamps and linker metadata to assess binary origin and age

## Why Static Analysis Comes First

When a suspicious file lands on your desk (pulled from a quarantine folder, extracted from a phishing email, or flagged by Wazuh), you face a critical decision: **do you run it, or do you read it?**

Static analysis means examining a binary **without executing it**. You inspect its structure, read its strings, examine its imports, and check its metadata, all without letting it touch a running system. This is always the first step because it is safe, repeatable, and often reveals enough to classify a sample before you ever need a sandbox.

| Analysis Type | What You Do | Risk Level | Speed |
|---|---|---|---|
| **Static** | Examine file structure, strings, imports, metadata | Zero: the file never executes | Minutes |
| **Dynamic** | Execute in a sandbox and observe behavior | Contained: isolated environment | 10–30 minutes |
| **Manual reverse engineering** | Disassemble and read code logic | Zero: the file never executes | Hours to days |

**Static analysis is not a replacement for dynamic analysis — it is a prerequisite.** The goal is to extract as much intelligence as possible before execution. A 10-minute static pass might reveal the C2 server, the malware family, and the persistence mechanism — all without booting a sandbox. In Lab 11.1, you will perform a complete static analysis workflow on a real PE binary and extract actionable IOCs before any execution.

## The PE File Format: Windows Executables Under the Microscope

Every `.exe`, `.dll`, `.sys`, and `.scr` file on Windows follows the **Portable Executable (PE)** format. Understanding PE structure is fundamental because malware authors must work within this format, and every shortcut they take leaves artifacts you can detect.

![PE file structure: from DOS header through PE header, section table, and section data](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-malware/lesson-ma-1/pe-file-structure.png)

### DOS Header and DOS Stub

Every PE file begins with the **DOS header**, a legacy artifact from MS-DOS compatibility. The first two bytes are always `4D 5A` (the ASCII characters "MZ", named after Mark Zbikowski, a DOS architect). This magic number is how the operating system and analysis tools recognize a file as a PE executable.

The DOS header contains one critical field for analysts: **e_lfanew**, a 4-byte little-endian offset stored at position `0x3C` that points to the PE header's location. Malware authors occasionally manipulate this value to confuse basic parsers.

Following the DOS header is the **DOS stub**, a small program that prints "This program cannot be run in DOS mode" if someone tries to run the executable in a DOS environment. Some malware replaces this stub with custom messages or junk data.
```
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 |MZ..............|
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 E0 00 00 00 |................|
```

Here `4D 5A` at offset 0 is the MZ magic, and the four bytes at offset `0x3C` (`E0 00 00 00`, little-endian) give an e_lfanew of `0xE0`: the PE header starts 224 bytes into the file.

### PE Header (IMAGE_NT_HEADERS)

The PE header starts with the signature `50 45 00 00` ("PE\0\0") and contains two sub-structures.

**File Header (COFF Header)**, 20 bytes of critical metadata:

| Field | What It Tells You |
|---|---|
| **Machine** | Target architecture: `0x14C` = x86, `0x8664` = x64 |
| **NumberOfSections** | How many sections the binary contains |
| **TimeDateStamp** | Compilation timestamp (seconds since the Unix epoch, January 1, 1970 UTC) |
| **Characteristics** | Flags: executable, DLL, large address aware, etc. |

**Optional Header**, which despite the name is mandatory for executables:

| Field | What It Tells You |
|---|---|
| **AddressOfEntryPoint** | RVA (relative virtual address, an offset from ImageBase) where execution begins; malware may point this to an unusual section |
| **ImageBase** | Preferred load address (typically `0x00400000` for 32-bit EXEs and `0x10000000` for 32-bit DLLs; the 64-bit MSVC defaults are `0x140000000` and `0x180000000`) |
| **SectionAlignment / FileAlignment** | Memory and disk alignment values |
| **SizeOfImage** | Total size when loaded in memory |
| **Subsystem** | GUI (`0x02`) vs Console (`0x03`); a "GUI" app that never creates a window is suspicious |
| **DataDirectory** | Array of 16 entries pointing to imports, exports, resources, relocations, etc. |
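To make the header layout concrete, here is a minimal sketch that reads these fields with Python's standard `struct` module. It assumes a well-formed PE file already read fully into memory and does no bounds checking; `parse_pe_headers` is a hypothetical helper, not part of any library (production tooling would normally use a dedicated parser such as `pefile`). The offsets follow the layout above: `e_lfanew` at `0x3C`, the 20-byte COFF header right after the `PE\0\0` signature, and the Optional Header after that, with its linker version bytes that feed the "linker metadata" check.

```python
import struct

# Hypothetical helper: a teaching sketch, not a hardened PE parser (no bounds checks)
MACHINES = {0x14C: "x86", 0x8664: "x64"}
SUBSYSTEMS = {0x02: "GUI", 0x03: "Console"}


def parse_pe_headers(data: bytes) -> dict:
    """Read key DOS, COFF File Header, and Optional Header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic at offset 0")
    # e_lfanew: 4-byte little-endian offset to the PE signature, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature at e_lfanew")
    # COFF File Header: 20 bytes immediately after the 4-byte signature
    (machine, nsections, timestamp, _symtab_ptr, _nsymbols,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24  # Optional Header follows signature (4) + COFF header (20)
    (magic,) = struct.unpack_from("<H", data, opt)
    linker = f"{data[opt + 2]}.{data[opt + 3]}"  # MajorLinkerVersion.MinorLinkerVersion
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+ (64-bit): 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected Optional Header magic {magic:#x}")
    # Subsystem sits at offset 68 in both the PE32 and PE32+ layouts
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    return {
        "e_lfanew": e_lfanew,
        "machine": MACHINES.get(machine, hex(machine)),
        "number_of_sections": nsections,
        "timestamp": timestamp,
        "characteristics": characteristics,
        "size_of_optional_header": opt_size,
        "linker_version": linker,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "subsystem": SUBSYSTEMS.get(subsystem, subsystem),
    }
```

Inside your analysis VM you might call it as `parse_pe_headers(open("suspicious.exe", "rb").read())`. Reading the bytes never executes the sample, which is what keeps this a static technique.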

**Compilation timestamps are trivially spoofed.** Malware authors routinely set fake timestamps to mislead investigators. A timestamp of January 1, 1970 (epoch zero) or a date in the future is an obvious fake. A timestamp that exactly matches a known-good binary suggests deliberate timestomping. Some legitimate toolchains also produce odd values: binaries built with reproducible-build options (many modern Windows system files, for example) store a hash in this field rather than a real date. Use timestamps as one data point, never as conclusive evidence, and cross-reference them with other metadata such as the linker version and the Rich header hash.
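The two obvious-fake checks from this callout are easy to script. A minimal sketch, assuming you already have the raw `TimeDateStamp` value from a PE parser; `assess_pe_timestamp` is a hypothetical name, and an empty flag list only means "no obvious forgery", not that the date is genuine.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def assess_pe_timestamp(ts: int, now: Optional[datetime] = None) -> Tuple[str, List[str]]:
    """Convert a COFF TimeDateStamp (seconds since 1970-01-01 UTC) and flag obvious fakes.

    Hypothetical helper: an empty flag list means "no obvious forgery",
    not "this date is genuine".
    """
    now = now or datetime.now(timezone.utc)
    compiled = datetime.fromtimestamp(ts, tz=timezone.utc)
    flags = []
    if ts == 0:
        flags.append("epoch zero: field zeroed or forged")
    elif compiled > now:
        flags.append("future date: forged, or a reproducible-build hash rather than a date")
    return compiled.isoformat(), flags
```

Passing `now` explicitly makes the check repeatable, which matters when you re-run triage on an old case and want the same result.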

### Section Table and Common Sections

After the PE header comes the **section table**, an array of headers describing each section in the binary. Every section has a name, virtual address, virtual size, raw size, and characteristics flags.

| Section | Purpose | What to Watch For |
|---|---|---|
| **.text** | Executable code | Unusually small .text + large unknown section = packed binary |
| **.data** | Initialized global and static variables | Strings, configuration data, embedded payloads |
| **.rdata** | Read-only data, import/export tables | Import table analysis reveals API usage |
| **.rsrc** | Resources: icons, dialogs, version info, embedded files | Embedded executables, encrypted payloads hidden as resources |
| **.reloc** | Relocation table for ASLR | Missing .reloc with ASLR enabled = anomaly |
| **UPX0, UPX1** | UPX packer sections | Clear indicator of UPX packing |
| **.themida** | Themida protector | Commercial packer/protector, common in crimeware |
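A minimal sketch of walking the section table with Python's `struct` module, assuming a well-formed PE already read into memory. Each section header is 40 bytes, and the table starts right after the Optional Header, whose size comes from the COFF header's `SizeOfOptionalHeader` field. The hypothetical helpers also compute Shannon entropy per section (values near 8.0 bits per byte suggest compressed or encrypted data, typical of packed sections) and report which section contains a given RVA, such as AddressOfEntryPoint.

```python
import math
import struct
from collections import Counter
from typing import Dict, List, Optional


def shannon_entropy(blob: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly random bytes."""
    if not blob:
        return 0.0
    total = len(blob)
    return sum(n / total * math.log2(total / n) for n in Counter(blob).values())


def parse_sections(data: bytes) -> List[Dict]:
    """Hypothetical helper: walk the section table of a well-formed PE held in memory."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # NumberOfSections (COFF offset 2) and SizeOfOptionalHeader (COFF offset 16)
    (nsections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size  # signature (4) + COFF header (20) + Optional Header
    sections = []
    for i in range(nsections):
        # IMAGE_SECTION_HEADER is 40 bytes; relocation/line-number fields are ignored
        (name, vsize, vaddr, raw_size, raw_ptr,
         _, _, _, _, flags) = struct.unpack_from("<8sIIIIIIHHI", data, table + 40 * i)
        raw = data[raw_ptr:raw_ptr + raw_size]
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "raw_size": raw_size,
            "characteristics": flags,
            "entropy": round(shannon_entropy(raw), 2),
        })
    return sections


def section_for_rva(sections: List[Dict], rva: int) -> Optional[str]:
    """Name of the section whose virtual range contains rva (e.g. AddressOfEntryPoint)."""
    for s in sections:
        span = max(s["virtual_size"], s["raw_size"])
        if s["virtual_address"] <= rva < s["virtual_address"] + span:
            return s["name"]
    return None
```

A `UPX1` section with entropy close to 8.0 that also contains the entry point is exactly the "small .text + large unknown section" packing pattern from the table.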

**Section names are cosmetic — the OS ignores them.** Malware can name sections anything: `.code`, `.xyz`, or even an empty string. What matters is the **characteristics flags**. A section marked as both writable and executable (`0xE0000020`) is a red flag — legitimate software rarely needs self-modifying code outside of packers and JIT compilers.
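Because only the characteristics flags matter, the writable-and-executable check works directly on the raw 32-bit value. A minimal sketch decoding a handful of the documented `IMAGE_SCN_*` bits; the constant values are the standard ones from the PE specification, while the function names are hypothetical.

```python
# Standard IMAGE_SCN_* characteristic bits from the PE specification (subset)
SCN_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}
MEM_EXECUTE, MEM_WRITE = 0x20000000, 0x80000000


def decode_section_flags(characteristics: int) -> list:
    """Names of the known IMAGE_SCN_* bits set in a section's Characteristics field."""
    return [name for bit, name in SCN_FLAGS.items() if characteristics & bit]


def is_writable_and_executable(characteristics: int) -> bool:
    """True when both MEM_WRITE and MEM_EXECUTE are set (the red flag described above)."""
    wx = MEM_EXECUTE | MEM_WRITE
    return (characteristics & wx) == wx
```

For comparison, a typical `.text` section carries `0x60000020` (code, execute, read) and a typical `.data` section carries `0xC0000040` (initialized data, read, write); neither trips the check, while `0xE0000020` does.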

### Entry Point Analysis

The **AddressOfEntryPoint** field tells the OS where to start executing code. In legitimate software, this points into the `.text` section. Anomalies to watch for:

- Entry point in a non-standard section (not `.text`): suggests packing or injection
- Entry point at the very end of a section: common with appended shellcode
- Entry point at offset 0 of a section with high entropy: likely packed or encrypted
- Entry point in a section with a suspicious name (`UPX1`, `.packed`, random characters)

## Extracting and Analyzing Strings

Strings are the single most productive static analysis technique for initial triage. Embedded text in a binary reveals what the malware communicates with, what it modifies, and what tools or techniques it uses.

### The strings Command

On Linux, the `strings` command extracts printable ASCII sequences of a minimum length (default 4 characters):

```bash
strings suspicious.exe | head -50
strings -n 8 suspicious.exe    # minimum 8 characters (reduces noise)
strings -e l suspicious.exe    # extract UTF-16LE strings (common in Windows binaries)
```

On Windows, Sysinternals `strings.exe` provides equivalent functionality and searches for both ASCII and Unicode (UTF-16) strings by default:

```powershell
strings64.exe -accepteula -n 8 suspicious.exe
strings64.exe -accepteula suspicious.exe | Select-String -Pattern "http"
```

### FLOSS: Beyond Basic Strings

The **FLARE Obfuscated String Solver (FLOSS)** from Mandiant's FLARE team goes far beyond `strings`. Besides ordinary static strings, it combines static analysis with code emulation to recover strings that malware builds on the stack or stores encoded in the binary and decodes only at runtime:

```bash
floss suspicious.exe
floss --no stack -- suspicious.exe           # skip stack strings for faster results (FLOSS v2 syntax)
floss -j suspicious.exe > floss_output.json  # JSON output for scripting
```

| Tool | Finds Static Strings | Finds Stack Strings | Deobfuscates Encoded Strings |
|---|---|---|---|
| `strings` | Yes | No | No |
| FLOSS | Yes | Yes | Yes |
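To see what `strings -n` and `strings -e l` do under the hood, here is a minimal Python sketch that scans raw bytes for runs of printable ASCII and for the same characters interleaved with `0x00` bytes (UTF-16LE, the encoding Windows uses for most text). `extract_strings` is a hypothetical helper; like `strings`, it finds only static strings and recovers nothing that FLOSS would decode from stack-built or encoded data.

```python
import re
from typing import List, Tuple


def extract_strings(data: bytes, min_len: int = 4) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: return (offset, encoding, text) for static strings, sorted by offset."""
    results = []
    # Printable ASCII (space through tilde) plus tab, similar to the classic strings tool
    ascii_run = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for m in ascii_run.finditer(data):
        results.append((m.start(), "ascii", m.group().decode("ascii")))
    # UTF-16LE: each printable ASCII character followed by a 0x00 byte (like strings -e l)
    utf16_run = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len)
    for m in utf16_run.finditer(data):
        results.append((m.start(), "utf-16le", m.group().decode("utf-16-le")))
    return sorted(results)
```

Keeping the offset with each string makes the later documentation step trivial, since you can point back to exactly where in the file each indicator lives.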

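**Never handle a file you suspect is malicious on your host system.** FLOSS emulates portions of the sample's code to decode strings. The emulator does not run the sample natively, but the file itself is still live malware: a mis-click or a file-association handler can execute it, and any tool that parses hostile input can have exploitable bugs. Always run string extraction tools inside your analysis VM or container, never on your host system.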

### Suspicious String Categories

When reviewing extracted strings, categorize them systematically:

**Network Indicators:**

- URLs: `http://`, `https://`, `ftp://`
- IP addresses: `192.168.`, `10.0.`, or public IPs
- Domain names: especially DGA-looking domains (`xkjr2.duckdns.org`)
- User-Agent strings: `Mozilla/5.0`, custom agents

**File System Indicators:**

- Windows paths: `C:\Users\`, `C:\Windows\Temp\`, `%APPDATA%`
- Linux paths: `/tmp/`, `/etc/cron.d/`, `/var/log/`
- File extensions: `.bat`, `.ps1`, `.vbs`, `.dll`
- Known malware drop locations: `C:\ProgramData\`, `C:\Users\Public\`

**Windows API Calls** (many appear in import tables with an `A` (ANSI) or `W` (wide) suffix, such as `CreateProcessW`):

- Process manipulation: `CreateRemoteThread`, `VirtualAllocEx`, `WriteProcessMemory`
- Execution: `WinExec`, `ShellExecute`, `CreateProcess`
- Network: `InternetOpen`, `URLDownloadToFile`, `HttpSendRequest`
- Registry: `RegSetValueEx`, `RegCreateKey`
- Crypto: `CryptEncrypt`, `CryptDecrypt`, `BCryptEncrypt`

**Persistence Indicators:**

- Registry keys: `SOFTWARE\Microsoft\Windows\CurrentVersion\Run`
- Service creation: `CreateService`, `sc create`
- Scheduled tasks: `schtasks`, `at.exe`

**Encoded / Obfuscated Data:**

- Base64 strings: long runs of letters, digits, `+`, and `/`, often ending in `=` or `==` padding
- Hex-encoded data: continuous hex characters
- XOR keys: short repeated byte sequences

## String Analysis Workflow

Efficient string analysis follows a structured workflow that moves from broad extraction to targeted investigation:

![String analysis workflow: from extraction through categorization, pivoting, and IOC generation](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-malware/lesson-ma-1/string-analysis-workflow.png)

**Step 1: Extract.** Run `strings` (ASCII and UTF-16) and FLOSS on the binary. Redirect each output to a file for reference.
```bash
strings -n 6 sample.exe > strings_ascii.txt
strings -n 6 -e l sample.exe > strings_utf16.txt
floss sample.exe > strings_floss.txt
```

**Step 2: Filter noise.** Remove common library strings, compiler artifacts, and Windows API boilerplate. Focus on unique, unusual, or contextually suspicious strings. Search all three output files, because URLs and registry paths in Windows binaries are often stored as UTF-16:

```bash
# Network: protocol names, path-like domains, dotted-quad IPv4 addresses
grep -iE '(http|ftp|\.[a-z]{2,4}/|([0-9]{1,3}\.){3}[0-9]{1,3})' strings_*.txt
# Process injection and download APIs
grep -iE '(CreateRemoteThread|VirtualAlloc|WriteProcessMemory|URLDownload)' strings_*.txt
# Persistence: Run keys (\\ matches one literal backslash), scheduled tasks, cron
grep -iE '(CurrentVersion\\Run|schtasks|cron)' strings_*.txt
```

**Step 3: Categorize.** Group findings into network IOCs, file system IOCs, behavioral indicators, and encoded data.

**Step 4: Pivot.** Take discovered IOCs and search for them in threat intelligence platforms. A URL found in strings can be checked in VirusTotal, MISP, or URLhaus. An API call pattern can be matched against known malware family profiles.

**Step 5: Document.** Record every finding with the offset where the string was found, the category, and its significance. `strings -t x` prints each string's file offset in hexadecimal.

## Connecting Static Analysis to Your Toolkit

Static analysis does not exist in isolation. Every finding connects to tools you already know:

| Finding | Next Step | Tool |
|---|---|---|
| Suspicious string pattern | Write a detection rule for it | **YARA** (Module 10) |
| Base64-encoded payload | Decode and analyze the payload | **CyberChef** |
| C2 domain or IP | Search threat intelligence feeds | **MISP** (Module 7) |
| Compilation timestamp | Correlate with campaign timelines | **MISP timeline / ATT&CK** |
| API call pattern | Create endpoint detection | **Velociraptor** (Module 8) |
| File hash (MD5/SHA256) | Check reputation databases | **VirusTotal / MalwareBazaar** |
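Steps 2 and 3 of the string workflow can be scripted once the strings are extracted. The sketch below assumes a list of strings (for example, the lines of the files produced in Step 1) and sorts them into the categories from this lesson with regular expressions. The patterns and the `categorize` helper are hypothetical and illustrative, not a complete detection set: expect false positives, especially in the `encoded` bucket, and review every hit by hand.

```python
import re
from typing import Dict, Iterable, List

# Illustrative (hypothetical) patterns; tune them for your own triage
CATEGORIES = {
    "network": re.compile(r"(?i)\b(?:https?|ftp)://\S+|\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "filesystem": re.compile(r"(?i)\b[a-z]:\\|%appdata%|/tmp/|/etc/cron"),
    "api": re.compile(r"CreateRemoteThread|VirtualAlloc|WriteProcessMemory"
                      r"|URLDownloadToFile|WinExec|ShellExecute|CreateProcess"),
    "persistence": re.compile(r"(?i)CurrentVersion\\Run|schtasks|CreateService|sc create"),
    "encoded": re.compile(r"^[A-Za-z0-9+/]{24,}={0,2}$"),  # long Base64-like runs
}


def categorize(strings: Iterable[str]) -> Dict[str, List[str]]:
    """Sort strings into category buckets; one string may land in several buckets."""
    buckets: Dict[str, List[str]] = {name: [] for name in CATEGORIES}
    for s in strings:
        for name, pattern in CATEGORIES.items():
            if pattern.search(s):
                buckets[name].append(s)
    return buckets
```

For example, `categorize(open("strings_ascii.txt", encoding="utf-8", errors="replace").read().splitlines())` gives you per-category lists to carry into the pivot and documentation steps.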

**YARA and static analysis are natural partners.** In Module 10, you wrote YARA rules that match on strings and hex patterns. Every suspicious string you extract during static analysis is a candidate for a YARA rule. In Lab 11.1, you will practice the full loop: extract strings → write a YARA rule → scan a directory for additional samples matching the same patterns.
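The "write a YARA rule" step can be bootstrapped from your vetted string list. A minimal sketch, assuming printable-ASCII strings you have already reviewed by hand; `build_yara_rule` is a hypothetical helper. It escapes backslashes and double quotes as YARA text strings require, adds the `ascii wide` modifiers so each string matches in both its ASCII and UTF-16 form, and restricts matches to files that start with the MZ magic via `uint16(0) == 0x5A4D`.

```python
import re
from typing import Sequence


def yara_escape(text: str) -> str:
    """Escape backslashes first, then double quotes, for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name: str, strings: Sequence[str], min_matches: int = 1) -> str:
    """Hypothetical helper: render a starter YARA rule from vetted, printable-ASCII strings."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError("rule name must be a valid identifier")
    if not 1 <= min_matches <= len(strings):
        raise ValueError("min_matches must be between 1 and the number of strings")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, text in enumerate(strings):
        # ascii + wide: match both the ASCII and the UTF-16LE form of each string
        lines.append(f'        $s{i} = "{yara_escape(text)}" ascii wide')
    lines += [
        "    condition:",
        # 0x5A4D is "MZ" read as a little-endian 16-bit integer: PE files only
        f"        uint16(0) == 0x5A4D and {min_matches} of them",
        "}",
    ]
    return "\n".join(lines)
```

Save the returned text as a `.yar` file and scan a directory recursively with the YARA CLI, for example `yara -r triage.yar samples/`. Treat the output as a starting point: requiring several strings (`min_matches` above 1) usually cuts false positives from generic strings.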

## PE Inspection Tools for Analysts

While command-line tools (`strings`, FLOSS, `objdump`, `readelf`) are essential for lab and scripted workflows, GUI-based PE inspection tools provide visual, interactive analysis that accelerates triage, especially when examining headers, imports, and resources. These are standard tools on malware analysts' workstations.

| Tool | What It Shows | Best For | Platform |
|---|---|---|---|
| **PEStudio** | Imports, strings, resources, indicators, VirusTotal scores, section entropy | First-pass triage: open any PE and get an instant overview of suspicious indicators | Windows (free) |
| **Detect It Easy (DIE)** | Packer/compiler identification, entropy graph per section, file type detection | Identifying what packed or compiled the binary; answers "was this UPX, Themida, or Visual C++?" | Windows, Linux, macOS (free) |
| **CFF Explorer** | Full PE header editing, section viewer, import/export tables, resource browser | Deep header inspection: manually examining and understanding every PE field | Windows (free) |
| **PE-bear** | Visual section layout, import/export browser, resource tree, hex view, signature checks | Side-by-side comparison of sections and visual structure mapping | Windows (free) |

### Which Tool First?

For most investigations, the workflow is:

1. **PEStudio** first: it highlights suspicious indicators automatically (flagged imports, anomalous entropy, known-bad strings) and gives you a prioritized overview within seconds
2. **Detect It Easy** second: if PEStudio shows signs of packing (few imports, high entropy), DIE identifies the specific packer
3. **CFF Explorer or PE-bear** for deep dives: when you need to understand exact header values, manually inspect the import table, or examine resource sections

**Command-line vs GUI.** In the CyberBlue Academy lab environment, you will primarily use command-line tools (`strings`, FLOSS, `objdump`, `readelf`) because the labs run in browser-based terminals. On a dedicated malware analysis setup, however (typically a Windows VM with FLARE-VM installed, often alongside a REMnux Linux VM), PEStudio and DIE are usually the first tools you open. Both approaches extract the same information; the GUI tools simply make it faster to navigate visually, especially when examining PE binaries with hundreds of imports.

## Linux ELF Binaries: The Other Side

While PE is the dominant format on Windows, Linux malware uses the **ELF (Executable and Linkable Format)**. The same static analysis principles apply:

```bash
file suspicious_binary
# suspicious_binary: ELF 64-bit LSB executable, x86-64, dynamically linked
readelf -h suspicious_binary   # ELF header (entry point, architecture, type)
readelf -S suspicious_binary   # section headers (similar to PE sections)
readelf -d suspicious_binary   # dynamic section (shared library dependencies)
strings -n 8 suspicious_binary | grep -iE "(http|/tmp/|/bin/|socket|connect)"
```

| PE Concept | ELF Equivalent |
|---|---|
| .text section | .text section |
| .data section | .data (initialized) / .bss (uninitialized) sections |
| .rsrc section | No direct equivalent (resources handled differently) |
| Import Address Table | .dynsym / .dynstr (imported symbols) plus .plt / .got (call stubs and resolved addresses) |
| PE header | ELF header (`readelf -h`) |
| DLL dependencies | Shared library dependencies (`readelf -d`; avoid `ldd` on untrusted binaries, because it may execute them) |

## Key Takeaways

- **Static analysis** examines a binary without executing it. It is always the first step because it is safe, fast, and often reveals enough to classify a sample
- The **PE format** has a predictable structure: DOS header (MZ magic), PE header (compilation timestamp, entry point, characteristics), section table, and section data
- **Section anomalies** reveal packing and tampering: writable+executable sections, entry points outside .text, unusual section names, or entropy mismatches
- **Compilation timestamps** provide timeline intelligence but are trivially spoofed; always cross-reference them with other metadata
- **String extraction** using `strings` and FLOSS is the highest-value static technique: URLs, IPs, API calls, registry keys, and encoded data all reveal malware intent
- Follow a structured **string analysis workflow**: extract → filter → categorize → pivot → document
- Every static finding connects to your existing toolkit: strings feed YARA rules, encoded data feeds CyberChef, network IOCs feed MISP, API patterns feed Velociraptor hunts
- **ELF binaries** on Linux follow the same analysis principles; use `readelf`, `file`, and `strings` instead of PE-specific tools

## What's Next

Time to examine PE structures hands-on. In **Lab 11.1: PE File Analysis**, you'll analyze real PE files, examining DOS headers, section tables, and import directories to classify suspicious binaries and identify malware indicators in their structure.