What You'll Learn
- Calculate MD5, SHA1, and SHA256 hashes for malware samples on both Windows and Linux
- Explain why cryptographic hashing is fundamental to malware identification and tracking
- Use fuzzy hashing (ssdeep) to identify similar samples that share partial code
- Perform hash lookups on VirusTotal and MalwareBazaar to determine file reputation
- Identify packed binaries through high entropy sections, minimal imports, and small .text sections
- Recognize common packers (UPX, Themida, custom) and understand basic unpacking approaches
- Analyze the Import Address Table (IAT) to identify malicious API call patterns
- Assess section entropy to detect encrypted or compressed code regions
File Hashing: The Fingerprint of Every Binary
In Lesson 11.1, you extracted strings and examined PE structure. But before you share findings with your team, threat intel feeds, or incident reports, you need one thing: a unique identifier for the file. That identifier is a cryptographic hash.
A hash function takes a file of any size and produces a fixed-length string — a digital fingerprint. Change a single byte in the file, and the hash changes completely. This property makes hashes the universal language of malware identification.
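The "change one byte, change the whole hash" property is easy to see for yourself. The following is a minimal sketch using Python's standard `hashlib`; the byte string stands in for a file and is not a real sample, and the helper names are illustrative only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Illustrative stand-in for file contents (not a real sample).
original = b"MZ" + bytes(1022)
modified = bytearray(original)
modified[512] ^= 0x01  # flip a single bit in one byte
modified = bytes(modified)

h1 = sha256_hex(original)
h2 = sha256_hex(modified)
print(h1)
print(h2)
print(f"{differing_hex_positions(h1, h2)} of {len(h1)} hex characters differ")
```

The two inputs differ by a single bit, yet the digests share no useful structure, which is why a hash works as an identifier but tells you nothing about how similar two files are.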
| Algorithm | Output Length | Speed | Use Case |
|---|---|---|---|
| MD5 | 128 bits (32 hex chars) | Fastest | Legacy lookups, quick dedup — collisions exist, not cryptographically secure |
| SHA1 | 160 bits (40 hex chars) | Fast | Accepted by VirusTotal and still widely used in feeds and legacy tooling, but a practical collision was demonstrated in 2017 (SHAttered) |
| SHA256 | 256 bits (64 hex chars) | Moderate | Gold standard for malware identification — no known practical collisions |
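Because each algorithm has a fixed output length, you can usually tell which one produced a hash you find in a report just by counting hex characters. A small sketch of that idea follows; the function name and lookup table are illustrative, and the lengths come straight from the table above.

```python
import string
from typing import Optional

# Hex-digest lengths taken from the table above.
HEX_LENGTH_TO_ALGORITHM = {32: "MD5", 40: "SHA1", 64: "SHA256"}


def identify_hash(value: str) -> Optional[str]:
    """Guess the algorithm from the digest length, or return None if it does not look like a hash."""
    candidate = value.strip().lower()
    if not candidate or any(c not in string.hexdigits for c in candidate):
        return None
    return HEX_LENGTH_TO_ALGORITHM.get(len(candidate))


print(identify_hash("a" * 32))   # 32 hex chars
print(identify_hash("b" * 40))   # 40 hex chars
print(identify_hash("c" * 64))   # 64 hex chars
print(identify_hash("not-a-hash"))
```

This is only a length check. Other algorithms share these lengths, so treat the result as a hint when triaging indicators, not as proof.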
Generating Hashes
Linux:
md5sum suspicious.exe
sha1sum suspicious.exe
sha256sum suspicious.exe
sha256sum suspicious.exe | awk '{print $1}'
Windows (PowerShell):
Get-FileHash suspicious.exe -Algorithm MD5
Get-FileHash suspicious.exe -Algorithm SHA1
Get-FileHash suspicious.exe -Algorithm SHA256
Python (for automation):
import hashlib
# Open in binary mode so the bytes are hashed exactly as stored on disk
with open("suspicious.exe", "rb") as f:
    data = f.read()
print(f"MD5: {hashlib.md5(data).hexdigest()}")
print(f"SHA1: {hashlib.sha1(data).hexdigest()}")
print(f"SHA256: {hashlib.sha256(data).hexdigest()}")
Always calculate all three hashes. Different platforms index samples differently. VirusTotal uses SHA256 as the primary key but also accepts MD5 and SHA1 searches. MalwareBazaar primarily uses SHA256. Some legacy SIEM rules still reference MD5 hashes. In Lab 11.2, you will generate a hash report for a sample and cross-reference it across multiple platforms.
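For large samples or batch jobs, reading the whole file into memory is wasteful. One possible variation, sketched below with an assumed helper name `hash_file` and an arbitrary 1 MiB chunk size, computes all three digests in a single pass over the file.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Compute MD5, SHA1, and SHA256 of a file in one pass, reading it in chunks."""
    hashers = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    import sys

    # Usage: python hash_report.py suspicious.exe
    for target in sys.argv[1:]:
        for name, digest in hash_file(target).items():
            print(f"{name.upper():7}{digest}  {target}")
```

The result is identical to hashing the full file contents at once; only the memory footprint changes.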
Fuzzy Hashing with ssdeep
Cryptographic hashes have a limitation: change one byte and the hash changes completely. Malware authors exploit this by recompiling with minor modifications — a different C2 server, a changed variable name — producing a sample with identical behavior but a completely different SHA256 hash.
ssdeep solves this with context-triggered piecewise hashing (fuzzy hashing). Instead of hashing the entire file, ssdeep divides it into variable-length chunks and hashes each chunk independently. The result is a hash that can be compared for similarity, not just equality.
ssdeep suspicious.exe
# 768:Gj4TtmMbhQovVWq+x3RTHdha9cAqhbB7GoNai:Gj4TtHbhQovVN+xBTya9FqhZ7Gi
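# Build a fuzzy-hash database for the whole repository
ssdeep -r /malware/samples/ > hashes.txt
# Match a new sample against the known hashes
ssdeep -m hashes.txt suspicious.exe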
| Similarity Score | Interpretation |
|---|---|
| 100 | Identical files |
| > 75 | Very likely variants of the same malware family |
| 50–75 | Possibly related — shared code libraries or components |
| < 50 | Likely unrelated |
| 0 | No detectable similarity |
ssdeep is your variant hunter. When investigating an incident, take the fuzzy hash of the initial sample and compare it against your entire malware repository. You may discover variants the attacker deployed weeks before the one you found — giving you a fuller picture of the campaign timeline and scope.
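To build intuition for why piecewise hashing survives small edits, here is a deliberately simplified toy in pure Python. It is not the ssdeep algorithm and its scores are not comparable to ssdeep's. The window size, modulus, two-character piece hashes, and the use of `difflib` for comparison are all assumptions chosen for illustration.

```python
import difflib
import hashlib
import random

WINDOW = 7     # bytes in the rolling window (toy value)
MODULUS = 64   # boundaries occur roughly every 64 bytes on random data


def piecewise_signature(data: bytes) -> list:
    """Split data at content-defined boundaries and keep a short hash of each piece."""
    pieces, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window_sum = sum(data[i - WINDOW + 1 : i + 1])
        if window_sum % MODULUS == MODULUS - 1:  # the content itself triggers a boundary
            pieces.append(hashlib.md5(data[start : i + 1]).hexdigest()[:2])
            start = i + 1
    if start < len(data):
        pieces.append(hashlib.md5(data[start:]).hexdigest()[:2])
    return pieces


def similarity(a: bytes, b: bytes) -> int:
    """Return a 0-100 score based on how many piece hashes line up."""
    matcher = difflib.SequenceMatcher(
        None, piecewise_signature(a), piecewise_signature(b), autojunk=False
    )
    return round(matcher.ratio() * 100)


# Synthetic data standing in for a sample and a one-byte variant of it.
rng = random.Random(0)
sample = bytes(rng.getrandbits(8) for _ in range(8192))
variant = bytearray(sample)
variant[4000] ^= 0xFF
variant = bytes(variant)

print("SHA256 equal:", hashlib.sha256(sample).digest() == hashlib.sha256(variant).digest())
print("Toy similarity:", similarity(sample, variant))
```

Because boundaries depend on local content, a one-byte change only disturbs the pieces around it, so most piece hashes still match even though the SHA256 values have nothing in common.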
Hash Lookups: Has Anyone Seen This Before?
Once you have the hash, the first question is: has the security community already analyzed this file?
VirusTotal
VirusTotal aggregates results from 70+ antivirus engines and provides behavioral analysis, community comments, and network indicators.
# Search by SHA256 hash (API)
curl -s "https://www.virustotal.com/api/v3/files/SHA256_HASH_HERE" \
-H "x-apikey: YOUR_API_KEY" | python3 -m json.tool
# Web interface: https://www.virustotal.com/gui/file/SHA256_HASH_HERE
What to look for on VirusTotal:
| Tab | What It Tells You |
|---|---|
| Detection | How many engines flag it and what names they give it |
| Details | PE metadata: sections, imports, compilation timestamp, resources |
| Relations | Contacted domains, IPs, dropped files, parent samples |
| Behavior | Sandbox execution results: process tree, file operations, network connections |
| Community | Analyst comments with context, IOCs, and campaign attribution |
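If you script the API lookup, the Detection tab's headline number can be derived from the JSON response. The sketch below assumes the v3 file report exposes per-verdict counts under `data.attributes.last_analysis_stats`; the helper names are hypothetical and the example report is made-up data, not a real lookup result.

```python
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{sha256}"


def build_vt_request(sha256: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same request as the curl example above."""
    return urllib.request.Request(
        VT_FILE_URL.format(sha256=sha256),
        headers={"x-apikey": api_key},
    )


def summarize_detections(report: dict) -> str:
    """Summarize detections, assuming counts live in data.attributes.last_analysis_stats."""
    stats = report.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    malicious = stats.get("malicious", 0)
    total = sum(stats.values())
    return f"{malicious}/{total} engines flagged the file as malicious"


# Made-up example response fragment for illustration only.
example_report = {
    "data": {
        "attributes": {
            "last_analysis_stats": {
                "malicious": 52,
                "suspicious": 1,
                "undetected": 17,
                "harmless": 0,
            }
        }
    }
}
print(summarize_detections(example_report))
```

To send the request, pass the result of `build_vt_request` to `urllib.request.urlopen` and decode the body with `json.load`. A hash that VirusTotal has never seen returns an HTTP error instead of a report, so wrap the call accordingly.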
MalwareBazaar
MalwareBazaar (by abuse.ch) is a malware sample sharing platform focused on tracking active threats:
curl -s -X POST "https://mb-api.abuse.ch/api/v1/" \
-d "query=get_info&hash=SHA256_HASH_HERE"
MISP Integration
In Module 5, you learned to use MISP for threat intelligence. Every hash you extract during malware analysis should be checked against your MISP instance — and new hashes should be added as IOCs to share with your community.
Be careful what you upload. Submitting a file to VirusTotal makes it available to all premium subscribers — including threat actors who monitor VT for their own malware to detect when it has been discovered. For sensitive samples from active investigations, search by hash first (which does not reveal you have the file). Only upload if the hash returns no results and you need the analysis.
Packer Detection: When the Binary Hides Its True Self
Packing is a technique where malware is compressed, encrypted, or otherwise transformed so that static analysis reveals nothing useful. The packed binary contains a small unpacking stub that reconstructs the original code in memory at runtime.
Why Attackers Pack Binaries
- Evade signatures: Antivirus rules written for the original strings and byte patterns will not match the packed version
- Defeat static analysis: Strings, imports, and PE structure of the packed binary reveal the packer, not the malware
- Reduce file size: Some packers (like UPX) compress binaries to 30-50% of original size
- Slow down analysts: Unpacking adds time and complexity to the investigation
Identifying Packed Binaries
| Indicator | What to Check | Packed Binary |
|---|---|---|
| Import count | Number of imported functions | Very few imports (< 10), often just LoadLibrary + GetProcAddress |
| Section names | PE section table | Non-standard names: UPX0, UPX1, .themida, .vmp0, random strings |
| Section entropy | Shannon entropy per section | One or more sections with entropy > 7.0 (near-random data) |
| .text section size | Raw size of code section | Abnormally small — real code is compressed elsewhere |
| Section characteristics | Writable + executable flags | Unpacking stub needs to write and execute code |
| String count | Total extractable strings | Very few meaningful strings — most content is encrypted |
| Entry point location | Which section contains the entry point | Points to a section other than .text |
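No single row in this table is conclusive, so it helps to collect the indicators together. The sketch below is one way to do that. The `PeSummary` structure, the list of "standard" section names, and the thresholds are assumptions for illustration; in practice the values would come from a PE parser.

```python
from dataclasses import dataclass, field

# Section names commonly emitted by mainstream compilers (illustrative, not exhaustive).
STANDARD_SECTIONS = {
    ".text", ".data", ".rdata", ".rsrc", ".reloc",
    ".idata", ".edata", ".pdata", ".bss", ".tls",
}


@dataclass
class PeSummary:
    """Hypothetical container for values you would extract with a PE parser."""
    import_count: int
    section_entropy: dict          # section name -> Shannon entropy
    entry_point_section: str
    writable_executable_sections: list = field(default_factory=list)


def packing_indicators(pe: PeSummary) -> list:
    """Return the indicators from the table above that this binary triggers."""
    hits = []
    if pe.import_count < 10:
        hits.append(f"few imports ({pe.import_count})")
    odd = sorted(n for n in pe.section_entropy if n not in STANDARD_SECTIONS)
    if odd:
        hits.append("non-standard section names: " + ", ".join(odd))
    high = sorted(n for n, e in pe.section_entropy.items() if e > 7.0)
    if high:
        hits.append("high-entropy sections: " + ", ".join(high))
    if pe.entry_point_section != ".text":
        hits.append(f"entry point in {pe.entry_point_section}")
    if pe.writable_executable_sections:
        hits.append("writable + executable: " + ", ".join(pe.writable_executable_sections))
    return hits


packed = PeSummary(
    import_count=3,
    section_entropy={"UPX0": 0.0, "UPX1": 7.9, ".rsrc": 4.1},
    entry_point_section="UPX1",
    writable_executable_sections=["UPX0", "UPX1"],
)
for hit in packing_indicators(packed):
    print("-", hit)
```

The more indicators fire together, the stronger the case for packing. A single hit, such as one high-entropy resource section, is usually not enough on its own.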
Common Packers
| Packer | Type | Detection | Difficulty to Unpack |
|---|---|---|---|
| UPX | Open-source compressor | Section names UPX0/UPX1, or upx -t check | Easy — upx -d sample.exe |
| Themida / WinLicense | Commercial protector | Section .themida, anti-debug, VM detection | Hard — requires dedicated tools |
| VMProtect | VM-based protector | Section .vmp0/.vmp1, code virtualization | Very hard — code is virtualized |
| MPRESS | Compressor | Section .MPRESS1/.MPRESS2 | Moderate — similar to UPX |
| Custom packers | Malware-specific | No known signatures, unusual section layout | Varies — may require manual unpacking |
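The section-name column of this table translates directly into a lookup. The following sketch maps the names listed above to their packers; the function name is illustrative, and since section names are trivial for an attacker to rename, a match is a hint and a miss proves nothing.

```python
# Section-name hints taken from the table above.
SECTION_NAME_HINTS = {
    "UPX0": "UPX",
    "UPX1": "UPX",
    ".themida": "Themida / WinLicense",
    ".vmp0": "VMProtect",
    ".vmp1": "VMProtect",
    ".MPRESS1": "MPRESS",
    ".MPRESS2": "MPRESS",
}


def guess_packers(section_names) -> set:
    """Return the set of packers suggested by a binary's section names."""
    guesses = set()
    for raw in section_names:
        # PE section names are padded with NUL bytes to 8 characters.
        name = raw.rstrip("\x00")
        if name in SECTION_NAME_HINTS:
            guesses.add(SECTION_NAME_HINTS[name])
    return guesses


print(guess_packers(["UPX0\x00\x00\x00\x00", "UPX1\x00\x00\x00\x00", ".rsrc\x00\x00\x00"]))
print(guess_packers([".text", ".rdata", ".data"]))
```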
Entropy Analysis
Shannon entropy measures randomness in bits per byte, on a scale from 0 (perfectly ordered) to 8 (perfectly random). Encrypted or compressed data has high entropy. Normal compiled code has moderate entropy.
python3 -c "
import math, sys
with open(sys.argv[1], 'rb') as f:
    data = f.read()
# Count how often each byte value (0-255) occurs
freq = [0]*256
for b in data: freq[b] += 1
entropy = -sum((c/len(data)) * math.log2(c/len(data)) for c in freq if c > 0)
print(f'Entropy: {entropy:.4f}')
" suspicious.exe
| Entropy Range | Interpretation |
|---|---|
| 0.0 – 1.0 | Very structured (padding, null bytes) |
| 4.0 – 6.0 | Normal compiled code or text |
| 6.5 – 7.2 | Possibly compressed (could be legitimate resources like images) |
| 7.2 – 8.0 | Almost certainly encrypted or compressed — strong packing indicator |
High entropy alone does not prove packing. Legitimate binaries can contain high-entropy sections for compressed resources (images, fonts, embedded databases). The key is correlating entropy with other indicators: few imports + high entropy + unusual sections + small .text = packed. High entropy in .rsrc with normal everything else = probably just compressed resources.
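The two ends of the scale are easy to verify on synthetic data. The sketch below wraps the same formula in a reusable function and maps a value onto the bands from the table. The function names are illustrative, and values that fall between the table's bands are reported as such and left unclassified.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # p * log2(1/p) summed over every byte value that occurs
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def interpret_entropy(value: float) -> str:
    """Map an entropy value onto the bands from the table above."""
    if value <= 1.0:
        return "very structured (padding, null bytes)"
    if 4.0 <= value <= 6.0:
        return "normal compiled code or text"
    if 6.5 <= value < 7.2:
        return "possibly compressed"
    if value >= 7.2:
        return "strong packing indicator"
    return "between the bands listed in the table"


samples = {
    "null padding": bytes(4096),
    "every byte value equally often": bytes(range(256)) * 16,
    "ASCII text": b"The quick brown fox jumps over the lazy dog. " * 90,
}
for label, blob in samples.items():
    e = shannon_entropy(blob)
    print(f"{label:32} {e:.4f}  {interpret_entropy(e)}")
```

Run per section and not over the whole file, the same function gives the section-level values used in the packing indicators above. Slicing out each section's raw bytes requires a PE parser.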
Basic UPX Unpacking
UPX is the most common packer encountered in malware analysis. Unpacking is trivial when UPX headers are intact:
upx -t suspicious.exe
# suspicious.exe: UPX executable packer [OK]
upx -d suspicious.exe -o unpacked.exe
strings unpacked.exe | wc -l
# Now you see the real strings
Some malware authors modify UPX headers to prevent the standard upx -d command from working. In those cases, you need to fix the headers manually or use dynamic unpacking (dump the process memory after the unpacking stub runs).
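A quick way to spot tampering is to check whether the usual UPX markers are all present. The sketch below does a crude raw-byte scan for the `UPX0`/`UPX1` section names and the `UPX!` magic that stock UPX writes into its header. The function names and the wording of the verdicts are assumptions, and the marker bytes could also appear by coincidence in unrelated data.

```python
def upx_markers(data: bytes) -> dict:
    """Report which well-known UPX markers appear anywhere in the file bytes."""
    return {
        "section_names": b"UPX0" in data and b"UPX1" in data,
        "magic": b"UPX!" in data,
    }


def upx_assessment(data: bytes) -> str:
    """Turn the marker check into a rough triage verdict (heuristic only)."""
    markers = upx_markers(data)
    if markers["section_names"] and markers["magic"]:
        return "standard UPX markers present; try upx -d"
    if markers["section_names"] or markers["magic"]:
        return "partial UPX markers; headers may have been modified"
    return "no UPX markers found"


# Synthetic byte strings for illustration, not real binaries.
intact = b"MZ" + bytes(64) + b"UPX0\x00\x00\x00\x00" + b"UPX1\x00\x00\x00\x00" + b"UPX!" + bytes(64)
tampered = intact.replace(b"UPX!", b"XXXX")
print(upx_assessment(intact))
print(upx_assessment(tampered))
```

A "partial markers" verdict is a reasonable cue to stop retrying `upx -d` and move to header repair or dynamic unpacking.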
Import Address Table (IAT) Analysis
The Import Address Table lists every external function the binary calls from Windows DLLs. This is one of the most powerful static analysis artifacts because the APIs a binary imports directly reveal its capabilities.
Reading the Import Table
Linux (on a PE file):
objdump -x suspicious.exe | grep -A 100 "Import Address Table"
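python3 -c "
import pefile
pe = pefile.PE('suspicious.exe')
# A binary with no import directory has no DIRECTORY_ENTRY_IMPORT attribute
for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
    print(f'\n{entry.dll.decode()}:')
    for imp in entry.imports:
        # Imports by ordinal have no name, so print the ordinal number
        print(f'  {imp.name.decode() if imp.name else hex(imp.ordinal)}')
"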
Windows (dumpbin, run from a Visual Studio Developer PowerShell or Command Prompt):
dumpbin /imports suspicious.exe
Suspicious API Call Categories
Process Injection:
| API | What It Does | Why Malware Uses It |
|---|---|---|
| OpenProcess | Opens a handle to another process | Required first step for injection |
| VirtualAllocEx | Allocates memory in another process | Creates space for injected code |
| WriteProcessMemory | Writes data to another process's memory | Writes the payload |
| CreateRemoteThread | Creates a thread in another process | Executes the injected payload |
| NtUnmapViewOfSection | Unmaps a section of a process | Process hollowing technique |
Execution and Download:
| API | What It Does | Why Malware Uses It |
|---|---|---|
| WinExec / ShellExecuteA | Executes a command or program | Launches second-stage payloads |
| CreateProcessA/W | Creates a new process | Spawns child processes |
| URLDownloadToFileA | Downloads a file from a URL | Retrieves additional payloads |
| InternetOpenA / HttpSendRequestA | Opens HTTP connections | C2 communication |
Persistence:
| API | What It Does | Why Malware Uses It |
|---|---|---|
| RegSetValueExA | Sets a registry value | Autorun keys, configuration storage |
| CreateServiceA | Creates a Windows service | Service-based persistence |
Defense Evasion:
| API | What It Does | Why Malware Uses It |
|---|---|---|
| IsDebuggerPresent | Checks if a debugger is attached | Anti-analysis check |
| GetTickCount / Sleep | Time-based checks | Sandbox evasion (detect fast-forwarding) |
| VirtualProtect | Changes memory protection flags | Makes memory executable for unpacking |
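Once you have a list of imported names, the four tables above can be applied mechanically. The sketch below groups imports by the categories used in this lesson. The function name is illustrative, and the handling of the `A`/`W` suffix (ANSI and wide-character variants of the same API) is a simplifying assumption.

```python
# API names from the tables above, stored without the A/W suffix.
SUSPICIOUS_APIS = {
    "Process Injection": {
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
        "CreateRemoteThread", "NtUnmapViewOfSection",
    },
    "Execution and Download": {
        "WinExec", "ShellExecute", "CreateProcess",
        "URLDownloadToFile", "InternetOpen", "HttpSendRequest",
    },
    "Persistence": {"RegSetValueEx", "CreateService"},
    "Defense Evasion": {"IsDebuggerPresent", "GetTickCount", "Sleep", "VirtualProtect"},
}
API_TO_CATEGORY = {api: cat for cat, apis in SUSPICIOUS_APIS.items() for api in apis}


def categorize_imports(imports) -> dict:
    """Group imported API names by the capability category they suggest."""
    found = {}
    for api in imports:
        candidates = [api]
        if api.endswith(("A", "W")):
            candidates.append(api[:-1])  # try the name without its ANSI/wide suffix
        for name in candidates:
            if name in API_TO_CATEGORY:
                found.setdefault(API_TO_CATEGORY[name], []).append(api)
                break
    return found


imports = [
    "OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    "RegSetValueExA", "CreateProcessW", "ExitProcess",
]
for category, apis in categorize_imports(imports).items():
    print(f"{category}: {', '.join(apis)}")
```

Many of these APIs also appear in legitimate software, so the combination matters more than any single name. The four injection APIs together are far more telling than `Sleep` alone.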
A binary that imports only LoadLibraryA and GetProcAddress is almost certainly packed or using dynamic API resolution. These two functions allow a program to load any DLL and resolve any function at runtime — hiding the real imports from static analysis. In Lab 11.2, you will compare the import table of a packed sample (2 imports) versus its unpacked version (50+ imports) to see exactly what was hidden.
DLL Analysis
Malware does not always arrive as a standalone .exe. Many advanced threats use DLL side-loading or DLL injection — placing a malicious DLL where a legitimate application will load it.
When analyzing a DLL:
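python3 -c "
import pefile
pe = pefile.PE('suspicious.dll')
# A DLL without an export directory has no DIRECTORY_ENTRY_EXPORT attribute
export_dir = getattr(pe, 'DIRECTORY_ENTRY_EXPORT', None)
symbols = export_dir.symbols if export_dir else []
print(f'Exports ({len(symbols)}):')
for exp in symbols:
    name = exp.name.decode() if exp.name else '(ordinal only)'
    print(f'  {exp.ordinal}: {name}')
"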
| DLL Red Flag | What It Means |
|---|---|
| Export names that mimic legitimate Windows DLLs | Possible DLL hijacking/side-loading |
| Single export function with a generic name (ServiceMain, DllMain) | Minimal interface, likely a loader |
| Export by ordinal only (no function names) | Hiding function purposes |
| DLL with no exports at all | Designed to be loaded via LoadLibrary for DllMain execution only |
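Three of these red flags can be checked from the export list alone. The sketch below takes export names as a plain list, with `None` standing for an ordinal-only export; the function name and flag wording are illustrative. Detecting names that mimic legitimate Windows DLLs needs a reference list of real exports and is left out.

```python
from typing import List, Optional

GENERIC_EXPORT_NAMES = {"ServiceMain", "DllMain"}


def export_red_flags(export_names: List[Optional[str]]) -> List[str]:
    """Return the red flags from the table above that an export list triggers."""
    if not export_names:
        return ["no exports: likely loaded only for DllMain execution"]
    flags = []
    if all(name is None for name in export_names):
        flags.append("ordinal-only exports: function purposes are hidden")
    if len(export_names) == 1 and export_names[0] in GENERIC_EXPORT_NAMES:
        flags.append("single generic export: likely a loader")
    return flags


print(export_red_flags([]))
print(export_red_flags([None, None, None]))
print(export_red_flags(["ServiceMain"]))
print(export_red_flags(["InitPlugin", "RunPlugin", "FreePlugin"]))
```

An empty result only means none of these specific patterns matched; it says nothing about whether the DLL is benign.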
Key Takeaways
- Cryptographic hashing (MD5, SHA1, SHA256) provides unique file identification — always compute all three for cross-platform lookups
- Fuzzy hashing (ssdeep) identifies malware variants that share code despite different cryptographic hashes
- Hash lookups on VirusTotal, MalwareBazaar, and MISP reveal whether the security community has already analyzed your sample — search by hash before uploading files
- Packing compresses or encrypts binaries to defeat static analysis — detect it through high entropy, few imports, unusual sections, and small .text
- UPX is the most common packer and is trivially unpacked with upx -d; commercial protectors like Themida and VMProtect require advanced techniques
- Entropy analysis (Shannon entropy) quantifies randomness; sections above 7.0 are almost certainly encrypted or compressed
- Import Address Table analysis reveals a binary's capabilities: process injection APIs, network functions, persistence mechanisms, and evasion techniques
- A binary importing only LoadLibraryA and GetProcAddress is hiding its real imports through dynamic resolution, a key packing indicator
- DLL analysis extends the same techniques to side-loading and injection scenarios common in advanced threats
What's Next
Static analysis has told you what the file contains, how it is built, and what APIs it calls. But there are questions static analysis cannot answer: What does the malware actually do when it runs? What files does it create? What processes does it spawn? What network connections does it make? In Lesson 11.3, you will cross into dynamic analysis — executing malware in a controlled sandbox and monitoring its behavior with Process Monitor, Process Explorer, and Autoruns to build a complete behavioral profile.
Knowledge Check: Hashing, Packing & Imports
10 questions · 70% to pass
Which hashing algorithm is considered the gold standard for malware identification due to its resistance to collisions?
What problem does ssdeep (fuzzy hashing) solve that cryptographic hashing cannot?
In Lab 11.2, you analyze a packed binary and find it imports only LoadLibraryA and GetProcAddress. Why are these two imports significant?
What Shannon entropy range most strongly indicates that a PE section contains encrypted or compressed data?
Why should you search VirusTotal by hash rather than uploading the file during an active investigation?
Which packer can be trivially removed using its own command-line tool with the -d flag?
You find a binary in Lab 11.2 with sections named UPX0 and UPX1, high entropy in UPX1, and only 3 imported functions. What is the most likely explanation?
Which combination of Windows API imports most strongly suggests process injection capability?
On a Linux system, which command generates the SHA256 hash of a file?
A DLL has no named exports and only exports functions by ordinal. What does this suggest?