CyberBlue Academy — Blue Team & SOC Training

What You'll Learn

Calculate MD5, SHA1, and SHA256 hashes for malware samples on both Windows and Linux
Explain why cryptographic hashing is fundamental to malware identification and tracking
Use fuzzy hashing (ssdeep) to identify similar samples that share partial code
Perform hash lookups on VirusTotal and MalwareBazaar to determine file reputation
Identify packed binaries through high-entropy sections, minimal imports, and small .text sections
Recognize common packers (UPX, Themida, custom) and understand basic unpacking approaches
Analyze the Import Address Table (IAT) to identify malicious API call patterns
Assess section entropy to detect encrypted or compressed code regions

File Hashing: The Fingerprint of Every Binary

In Lesson 11.1, you extracted strings and examined PE structure. But before you share findings with your team, threat intel feeds, or incident reports, you need one thing: a unique identifier for the file. That identifier is a cryptographic hash.

A hash function takes a file of any size and produces a fixed-length string — a digital fingerprint. Change a single byte in the file, and the hash changes completely. This property makes hashes the universal language of malware identification.
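This avalanche behavior is easy to observe. The following Python sketch uses only the standard library. It hashes an arbitrary stand-in byte string, flips a single bit, and counts how many of the 256 SHA256 output bits change. For a well-behaved hash, roughly half of them should. The input bytes are made up for illustration and are not a real sample.

```python
import hashlib


def digest_as_int(data: bytes) -> int:
    """SHA256 digest of data as a single 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


original = b"MZ" + b"\x90" * 1022   # arbitrary stand-in bytes, not a real sample
modified = bytearray(original)
modified[512] ^= 0x01               # flip one bit in the middle of the data
modified = bytes(modified)

# XOR the two digests and count the 1 bits = number of output bits that differ
changed_bits = bin(digest_as_int(original) ^ digest_as_int(modified)).count("1")

print("original:", hashlib.sha256(original).hexdigest())
print("modified:", hashlib.sha256(modified).hexdigest())
print(f"{changed_bits} of 256 digest bits changed")
```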

Algorithm	Output Length	Speed	Use Case
MD5	128 bits (32 hex chars)	Fastest	Legacy lookups, quick dedup — collisions exist, not cryptographically secure
SHA1	160 bits (40 hex chars)	Fast	Still accepted for lookups by VirusTotal and many intel feeds, but practical collisions have been demonstrated (SHAttered, 2017)
SHA256	256 bits (64 hex chars)	Moderate	Gold standard for malware identification — no known practical collisions

Generating Hashes

Linux:

md5sum suspicious.exe
sha1sum suspicious.exe
sha256sum suspicious.exe

# Print only the hash (drop the filename column), handy in scripts
sha256sum suspicious.exe | awk '{print $1}'

Windows (PowerShell):

Get-FileHash suspicious.exe -Algorithm MD5
Get-FileHash suspicious.exe -Algorithm SHA1
Get-FileHash suspicious.exe -Algorithm SHA256

Python (for automation):

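import hashlib

# Read in 64 KB chunks so large samples are never loaded into memory at once
hashers = {"MD5": hashlib.md5(), "SHA1": hashlib.sha1(), "SHA256": hashlib.sha256()}
with open("suspicious.exe", "rb") as f:
    for chunk in iter(lambda: f.read(65536), b""):
        for h in hashers.values():
            h.update(chunk)

for name, h in hashers.items():
    print(f"{name + ':':<8}{h.hexdigest()}")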

ℹ

Always calculate all three hashes. Different platforms index samples differently. VirusTotal uses SHA256 as the primary key but also accepts MD5 and SHA1 searches. MalwareBazaar primarily uses SHA256. Some legacy SIEM rules still reference MD5 hashes. In Lab 11.2, you will generate a hash report for a sample and cross-reference it across multiple platforms.

Fuzzy Hashing with ssdeep

Cryptographic hashes have a limitation: change one byte and the hash changes completely. Malware authors exploit this by recompiling with minor modifications (a different C2 server, an altered string, or simply a new compile timestamp), producing a sample with near-identical behavior but a completely different SHA256 hash.

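ssdeep addresses this with context-triggered piecewise hashing (CTPH), commonly called fuzzy hashing. Instead of hashing the entire file at once, ssdeep slides a rolling hash over the bytes and ends a chunk wherever that rolling hash hits a trigger value. Each variable-length chunk is then hashed independently and contributes one character to the final signature. Because chunk boundaries depend on local content rather than fixed offsets, a small edit changes only the chunks around it, so two signatures can be compared for similarity, not just equality.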
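The chunking idea can be shown with a deliberately simplified CTPH sketch in Python. It is not ssdeep, and its signatures are not ssdeep-compatible. The rolling hash is a plain sum over a 7-byte window, the chunk hash is an FNV-style multiply-and-xor, each chunk contributes one base64 character, and similarity is a normalized Levenshtein distance. All function names, and the test data (seeded random bytes standing in for a sample), are hypothetical.

```python
import hashlib
import random

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                 # rolling-window length in bytes
FNV_PRIME = 0x01000193     # 32-bit FNV prime
CHUNK_INIT = 0x28021967    # start value for each chunk hash (arbitrary here)


def toy_ctph(data: bytes, block_size: int = 32) -> str:
    """Simplified CTPH signature: one base64 character per content-defined chunk."""
    signature = []
    chunk_hash = CHUNK_INIT
    for i, byte in enumerate(data):
        # Hash of the current chunk, FNV-style multiply-then-xor, kept to 32 bits
        chunk_hash = ((chunk_hash * FNV_PRIME) ^ byte) & 0xFFFFFFFF
        # Rolling hash over the last WINDOW bytes (a plain sum, for simplicity)
        rolling = sum(data[max(0, i - WINDOW + 1):i + 1])
        # Context trigger: whether a chunk ends here depends only on nearby bytes
        if rolling % block_size == block_size - 1:
            signature.append(B64[chunk_hash % 64])
            chunk_hash = CHUNK_INIT
    signature.append(B64[chunk_hash % 64])  # close the final, partial chunk
    return "".join(signature)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def similarity(sig_a: str, sig_b: str) -> int:
    """0-100 score from normalized edit distance (not ssdeep's scoring formula)."""
    longest = max(len(sig_a), len(sig_b)) or 1
    return round(100 * (1 - edit_distance(sig_a, sig_b) / longest))


rng = random.Random(1337)
sample = bytes(rng.randrange(256) for _ in range(8192))     # stand-in "sample"
variant = bytearray(sample)
variant[4000] ^= 0xFF                                        # patch a single byte
variant = bytes(variant)
unrelated = bytes(rng.randrange(256) for _ in range(8192))

print("SHA256 equal:   ", hashlib.sha256(sample).digest() == hashlib.sha256(variant).digest())
print("variant score:  ", similarity(toy_ctph(sample), toy_ctph(variant)))
print("unrelated score:", similarity(toy_ctph(sample), toy_ctph(unrelated)))
```

Patching one byte changes the SHA256 completely but disturbs only a character or two of the toy signature. As a result, the variant scores close to 100 while the unrelated buffer scores low. Real ssdeep adds refinements this sketch omits, such as choosing the block size from the file length and storing a second signature at double the block size.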

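# Fuzzy-hash a single file (signature format: blocksize:hash:hash)
ssdeep suspicious.exe
# 768:Gj4TtmMbhQovVWq+x3RTHdha9cAqhbB7GoNai:Gj4TtHbhQovVN+xBTya9FqhZ7Gi

# Recursively fuzzy-hash every sample in the repository and save the signatures
ssdeep -r /malware/samples/ > hashes.txt

# Match a new sample against the saved signatures; matches are listed with a 0-100 score
ssdeep -m hashes.txt suspicious.exe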

Similarity Score	Interpretation
100	Identical files
> 75	Very likely variants of the same malware family
50–75	Possibly related — shared code libraries or components
< 50	Likely unrelated
0	No detectable similarity

💡

ssdeep is your variant hunter. When investigating an incident, take the fuzzy hash of the initial sample and compare it against your entire malware repository. You may discover variants the attacker deployed weeks before the one you found — giving you a fuller picture of the campaign timeline and scope.

Hash Lookups: Has Anyone Seen This Before?

Once you have the hash, the first question is: has the security community already analyzed this file?

VirusTotal

VirusTotal aggregates results from 70+ antivirus engines and provides behavioral analysis, community comments, and network indicators.

# Search by SHA256 hash (API)
curl -s "https://www.virustotal.com/api/v3/files/SHA256_HASH_HERE" \
  -H "x-apikey: YOUR_API_KEY" | python3 -m json.tool

# Web interface: https://www.virustotal.com/gui/file/SHA256_HASH_HERE
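The same v3 endpoint can also be called from Python using only the standard library. This is a minimal sketch that assumes a valid API key sent in the `x-apikey` header, as in the curl example. It treats HTTP 404 as "hash not known to VirusTotal". The helper names `vt_lookup` and `detection_summary` are hypothetical. The summary adds the `malicious` and `suspicious` counts from `last_analysis_stats` and reports them against the sum of all verdict buckets. That is one reasonable reading of the report, not an official VirusTotal metric.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def vt_lookup(file_hash: str, api_key: str) -> Optional[dict]:
    """Fetch a VirusTotal v3 file report, or None if the hash is unknown."""
    request = urllib.request.Request(VT_FILE_URL.format(file_hash),
                                     headers={"x-apikey": api_key})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # VirusTotal has no record of this hash
            return None
        raise


def detection_summary(report: dict) -> str:
    """Condense last_analysis_stats into '<flagged>/<total> engines flag it'."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    return f"{flagged}/{sum(stats.values())} engines flag it"


# Usage (needs network access and your own API key):
# report = vt_lookup("SHA256_HASH_HERE", "YOUR_API_KEY")
# print("Not on VirusTotal" if report is None else detection_summary(report))
```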

What to look for on VirusTotal:

Tab	What It Tells You
Detection	How many engines flag it and what names they give it
Details	PE metadata: sections, imports, compilation timestamp, resources
Relations	Contacted domains, IPs, dropped files, parent samples
Behavior	Sandbox execution results: process tree, file operations, network connections
Community	Analyst comments with context, IOCs, and campaign attribution

MalwareBazaar

MalwareBazaar (by abuse.ch) is a malware sample sharing platform focused on tracking active threats:

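# abuse.ch APIs expect a free Auth-Key (from your abuse.ch account) in the request header
curl -s -X POST "https://mb-api.abuse.ch/api/v1/" \
  -H "Auth-Key: YOUR_AUTH_KEY" \
  -d "query=get_info&hash=SHA256_HASH_HERE"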
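Here is the same lookup in Python, as a sketch. It posts the `get_info` query shown above and reads the `query_status` and `data` fields of the JSON reply. The `Auth-Key` header and the `signature` and `first_seen` field names reflect our understanding of the current abuse.ch API and should be checked against its documentation. `mb_lookup` and `mb_summary` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

MB_API = "https://mb-api.abuse.ch/api/v1/"


def mb_lookup(file_hash: str, auth_key: str) -> dict:
    """POST a get_info query to MalwareBazaar and return the decoded JSON reply."""
    body = urllib.parse.urlencode({"query": "get_info", "hash": file_hash}).encode()
    request = urllib.request.Request(MB_API, data=body,            # data => POST
                                     headers={"Auth-Key": auth_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def mb_summary(reply: dict) -> Optional[str]:
    """'<family> (first seen <date>)' for a hit, or None when there is no match."""
    if reply.get("query_status") != "ok" or not reply.get("data"):
        return None
    entry = reply["data"][0]
    family = entry.get("signature") or "unknown family"
    return f"{family} (first seen {entry.get('first_seen')})"


# Usage (needs network access and an abuse.ch Auth-Key):
# print(mb_summary(mb_lookup("SHA256_HASH_HERE", "YOUR_AUTH_KEY")) or "Not in MalwareBazaar")
```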

MISP Integration

In Module 5, you learned to use MISP for threat intelligence. Every hash you extract during malware analysis should be checked against your MISP instance — and new hashes should be added as IOCs to share with your community.

⚠

Be careful what you upload. Submitting a file to VirusTotal makes it available to all premium subscribers — including threat actors who monitor VT for their own malware to detect when it has been discovered. For sensitive samples from active investigations, search by hash first (which does not reveal you have the file). Only upload if the hash returns no results and you need the analysis.

Packer Detection: When the Binary Hides Its True Self

Packing is a technique where malware is compressed, encrypted, or otherwise transformed so that static analysis of the file on disk reveals little beyond the packer itself. The packed binary contains a small unpacking stub that reconstructs the original code in memory at runtime.

Packer detection indicators — entropy analysis, section characteristics, import count, and tool signatures

Why Attackers Pack Binaries

Evade signatures: Antivirus rules written for the original strings and byte patterns will not match the packed version
Defeat static analysis: Strings, imports, and PE structure of the packed binary reveal the packer, not the malware
Reduce file size: Some packers (like UPX) compress binaries to 30-50% of original size
Slow down analysts: Unpacking adds time and complexity to the investigation
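The first two points can be demonstrated without any real malware. In this toy Python sketch, zlib compression stands in for a packer and `zlib.decompress` plays the role of the unpacking stub. The "payload" bytes, including the `c2.example.invalid` URL, are invented for illustration. A `strings`-style scan finds the URL in the original but not in the packed blob, even though the stub restores it byte for byte.

```python
import re
import zlib
from typing import List


def printable_strings(data: bytes, min_len: int = 6) -> List[bytes]:
    """Rough equivalent of the `strings` tool: runs of printable ASCII."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


# Invented stand-in payload: a C2 URL and an API name, padded with zeros
payload = (b"http://c2.example.invalid/gate.php\x00CreateRemoteThread\x00"
           + b"\x00" * 64) * 50

packed = zlib.compress(payload)       # the "packing" step done by the attacker
restored = zlib.decompress(packed)    # what the stub does in memory at runtime

print(len(payload), "bytes before packing,", len(packed), "after")
print("URL visible before packing:", any(b"gate.php" in s for s in printable_strings(payload)))
print("URL visible after packing: ", any(b"gate.php" in s for s in printable_strings(packed)))
print("Stub restores original:    ", restored == payload)
```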

Identifying Packed Binaries

Indicator	What to Check	Packed Binary
Import count	Number of imported functions	Very few imports (< 10), often just `LoadLibrary` + `GetProcAddress`
Section names	PE section table	Non-standard names: `UPX0`, `UPX1`, `.themida`, `.vmp0`, random strings
Section entropy	Shannon entropy per section	One or more sections with entropy > 7.0 (near-random data)
.text section size	Raw size of code section	Abnormally small — real code is compressed elsewhere
Section characteristics	Writable + executable flags	Unpacking stub needs to write and execute code
String count	Total extractable strings	Very few meaningful strings — most content is encrypted
Entry point location	Which section contains the entry point	Points to a section other than .text
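These indicators can be combined into a rough triage check. The Python sketch below is a hypothetical helper, not a standard tool. `packing_indicators` works on plain data, so it can be tested without a sample, and `load_from_pe` fills that data in using the `pefile` library (`section.get_entropy()`, `get_section_by_rva`). The thresholds (fewer than 10 imports, entropy above 7.0) come from the table above. The section-name list combines the names above with those in the packer table that follows. The "executable but empty on disk" check is our stand-in for the small-.text indicator, since UPX's first section typically has no raw data in the file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000
PACKER_SECTIONS = {"UPX0", "UPX1", ".themida", ".vmp0", ".vmp1", ".MPRESS1", ".MPRESS2"}


@dataclass
class Section:
    name: str
    entropy: float
    raw_size: int
    characteristics: int


def packing_indicators(sections: List[Section], import_count: int,
                       entry_section: Optional[str]) -> List[str]:
    """Return the packing indicators from the table that this file shows."""
    hits = []
    if import_count < 10:
        hits.append(f"only {import_count} imports")
    for s in sections:
        executable = bool(s.characteristics & IMAGE_SCN_MEM_EXECUTE)
        if s.name in PACKER_SECTIONS:
            hits.append(f"packer section name {s.name}")
        if s.entropy > 7.0:
            hits.append(f"{s.name} entropy {s.entropy:.2f}")
        if executable and s.characteristics & IMAGE_SCN_MEM_WRITE:
            hits.append(f"{s.name} is writable and executable")
        if executable and s.raw_size == 0:
            hits.append(f"{s.name} is executable but empty on disk")
    if entry_section is not None and entry_section != ".text":
        hits.append(f"entry point in {entry_section}")
    return hits


def load_from_pe(path: str) -> Tuple[List[Section], int, Optional[str]]:
    """Gather the inputs for packing_indicators using pefile (pip install pefile)."""
    import pefile  # imported here so the heuristic above has no dependency

    def clean(raw: bytes) -> str:
        return raw.rstrip(b"\x00").decode(errors="replace")

    pe = pefile.PE(path)
    sections = [Section(clean(s.Name), s.get_entropy(), s.SizeOfRawData, s.Characteristics)
                for s in pe.sections]
    imports = sum(len(e.imports) for e in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []))
    ep = pe.get_section_by_rva(pe.OPTIONAL_HEADER.AddressOfEntryPoint)
    return sections, imports, clean(ep.Name) if ep else None


# Usage: print(packing_indicators(*load_from_pe("suspicious.exe")))
```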

Common Packers

Packer	Type	Detection	Difficulty to Unpack
UPX	Open-source compressor	Section names `UPX0`/`UPX1`, or `upx -t` check	Easy — `upx -d sample.exe`
Themida / WinLicense	Commercial protector	Section `.themida`, anti-debug, VM detection	Hard — requires dedicated tools
VMProtect	VM-based protector	Section `.vmp0`/`.vmp1`, code virtualization	Very hard — code is virtualized
MPRESS	Compressor	Section `.MPRESS1`/`.MPRESS2`	Moderate — similar to UPX
Custom packers	Malware-specific	No known signatures, unusual section layout	Varies — may require manual unpacking

Entropy Analysis

Shannon entropy measures randomness in bits per byte, on a scale from 0 (perfectly ordered: a single repeated byte value) to 8 (perfectly random: all 256 byte values equally likely). Encrypted or compressed data has high entropy. Normal compiled code has moderate entropy.
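Written out, the entropy that the script below computes is the following. Here n_i is the number of times byte value i occurs, N is the number of bytes, and byte values that never occur are skipped (0 · log 0 is taken as 0). The two extremes show where the 0 and 8 bounds come from.

```latex
H = -\sum_{i=0}^{255} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}

H_{\min} = -1 \cdot \log_2 1 = 0 \quad \text{(one byte value repeated, } p = 1\text{)}

H_{\max} = -256 \cdot \tfrac{1}{256} \log_2 \tfrac{1}{256} = \log_2 256 = 8 \ \text{bits/byte} \quad \text{(all 256 values equally likely)}
```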

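python3 -c "
import math, sys
from collections import Counter
with open(sys.argv[1], 'rb') as f:
    data = f.read()
# Count how often each byte value (0-255) occurs
counts = Counter(data).values()
# H = sum of p * log2(1/p) over the byte values present; 0 for an empty file
entropy = sum((c / len(data)) * math.log2(len(data) / c) for c in counts)
print(f'Entropy: {entropy:.4f} bits/byte (whole file)')
" suspicious.exe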

Entropy Range (bits/byte)	Interpretation
0.0 – 1.0	Very structured (padding, null bytes)
4.0 – 6.0	Normal compiled code or text
6.5 – 7.2	Possibly compressed (could be legitimate resources like images)
7.2 – 8.0	Almost certainly encrypted or compressed — strong packing indicator
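A single whole-file number can hide a packed region inside an otherwise normal binary. The sketch below is a hypothetical helper rather than a standard tool. It applies the same formula to fixed-size blocks (4 KB by default, an arbitrary choice) and flags blocks above 7.2, the top band of the table above. The stand-in data is 8 KB of zero padding followed by 8 KB of seeded random bytes, not a real sample. For real PE files, per-section values (for example pefile's `section.get_entropy()`) are usually more meaningful than arbitrary blocks.

```python
import math
import random
from collections import Counter
from typing import List, Tuple


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0); 0.0 for empty input."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_profile(data: bytes, block_size: int = 4096) -> List[Tuple[int, float]]:
    """(offset, entropy) for each fixed-size block of the input."""
    return [(off, shannon_entropy(data[off:off + block_size]))
            for off in range(0, len(data), block_size)]


# Stand-in data: 8 KB of zero padding, then 8 KB of seeded random bytes
rng = random.Random(7)
blob = bytes(8192) + bytes(rng.randrange(256) for _ in range(8192))

for offset, h in entropy_profile(blob):
    note = "  <- above 7.2, check for packing" if h > 7.2 else ""
    print(f"0x{offset:06x}  {h:.3f}{note}")
```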

🚨

High entropy alone does not prove packing. Legitimate binaries can contain high-entropy sections for compressed resources (images, fonts, embedded databases). The key is correlating entropy with other indicators: few imports + high entropy + unusual sections + small .text = packed. High entropy in .rsrc with normal everything else = probably just compressed resources.

Basic UPX Unpacking

UPX is the most common packer encountered in malware analysis. Unpacking is trivial when UPX headers are intact:

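# Test whether the file is a valid UPX-packed executable; reports [OK] if so
upx -t suspicious.exe

# Decompress into a new file, leaving the original sample untouched
upx -d suspicious.exe -o unpacked.exe

# Compare string counts: the unpacked file should expose far more real strings
strings suspicious.exe | wc -l
strings unpacked.exe | wc -l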

Some malware authors modify UPX headers to prevent the standard upx -d command from working. In those cases, you need to fix the headers manually or use dynamic unpacking (dump the process memory after the unpacking stub runs).
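Before attempting manual fixes, it can help to check which UPX markers survive. This Python sketch is a crude heuristic of our own, not part of UPX. It looks for the default section names `UPX0`/`UPX1` and for the `UPX!` magic that UPX writes into its packed-file header. Finding only some of these markers suggests the headers were edited, in which case `upx -d` may refuse the file. Renamed sections and stripped magic can also defeat this check entirely, so a "no markers" result proves nothing.

```python
import sys


def upx_marker_report(data: bytes) -> str:
    """Classify a file by which default UPX markers are still present."""
    has_sections = b"UPX0" in data and b"UPX1" in data   # default section names
    has_magic = b"UPX!" in data                           # UPX packed-header magic
    if has_sections and has_magic:
        return "intact UPX markers: upx -d will likely work"
    if has_sections or has_magic:
        return "partial UPX markers: headers may be modified, upx -d may fail"
    return "no default UPX markers (not proof the file is unpacked)"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(upx_marker_report(f.read()))
```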

Import Address Table (IAT) Analysis

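The Import Address Table (IAT), together with the import directory that describes it, lists the external functions that the Windows loader resolves from DLLs when the binary starts. It is one of the most powerful static analysis artifacts because the APIs a binary imports directly reveal its capabilities. Functions the code resolves for itself at runtime (for example, with GetProcAddress) do not appear there.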

Suspicious API calls organized by malware capability — process injection, network, persistence, evasion, and crypto

Reading the Import Table

Linux (on a PE file):

# objdump -p prints the PE headers, including the interpreted import tables
objdump -p suspicious.exe | grep -A 100 "The Import Tables"

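python3 -c "
import pefile
pe = pefile.PE('suspicious.exe')
# Files with no import directory lack this attribute, so default to an empty list
for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):
    print(f'\n{entry.dll.decode()}:')
    for imp in entry.imports:
        # Some functions are imported by ordinal only and have no name
        print(f'  {imp.name.decode() if imp.name else hex(imp.ordinal)}')
"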

Windows (dumpbin, run from a Visual Studio Developer PowerShell or Developer Command Prompt):

dumpbin /imports suspicious.exe

Suspicious API Call Categories

Process Injection:

API	What It Does	Why Malware Uses It
`OpenProcess`	Opens a handle to another process	Required first step for injection
`VirtualAllocEx`	Allocates memory in another process	Creates space for injected code
`WriteProcessMemory`	Writes data to another process's memory	Writes the payload
`CreateRemoteThread`	Creates a thread in another process	Executes the injected payload
`NtUnmapViewOfSection`	Unmaps a section of a process	Process hollowing technique

Execution and Download:

API	What It Does	Why Malware Uses It
`WinExec` / `ShellExecuteA`	Executes a command or program	Launches second-stage payloads
`CreateProcessA/W`	Creates a new process	Spawns child processes
`URLDownloadToFileA`	Downloads a file from a URL	Retrieves additional payloads
`InternetOpenA` / `HttpSendRequestA`	Initialize WinINet and send HTTP requests	C2 communication

Persistence:

API	What It Does	Why Malware Uses It
`RegSetValueExA`	Sets a registry value	Autorun keys, configuration storage
`CreateServiceA`	Creates a Windows service	Service-based persistence

Defense Evasion:

API	What It Does	Why Malware Uses It
`IsDebuggerPresent`	Checks if a debugger is attached	Anti-analysis check
`GetTickCount` / `Sleep`	Time-based checks	Sandbox evasion (detect fast-forwarding)
`VirtualProtect`	Changes memory protection flags	Makes memory executable for unpacking

💡

A binary that imports only LoadLibraryA and GetProcAddress is almost certainly packed or using dynamic API resolution. These two functions allow a program to load any DLL and resolve any function at runtime — hiding the real imports from static analysis. In Lab 11.2, you will compare the import table of a packed sample (2 imports) versus its unpacked version (50+ imports) to see exactly what was hidden.
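The categories in these tables translate naturally into a lookup. This minimal Python sketch uses hypothetical names throughout. It folds `A`/`W` variants together (so `CreateProcessW` counts as `CreateProcess`) and groups imports by the capability buckets above. It also flags the allocate/write/remote-thread chain and reports the loader-only pattern from the tip. The API lists contain only the examples from this lesson, not a complete catalog, and a hit indicates capability, not proof of malicious intent. Import names can come from the pefile loop shown earlier.

```python
from typing import Dict, Iterable, List

# Example APIs from the tables above, grouped by capability (not exhaustive)
CAPABILITIES = {
    "process injection": {"OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                          "CreateRemoteThread", "NtUnmapViewOfSection"},
    "execution/download": {"WinExec", "ShellExecute", "CreateProcess",
                           "URLDownloadToFile", "InternetOpen", "HttpSendRequest"},
    "persistence": {"RegSetValueEx", "CreateService"},
    "defense evasion": {"IsDebuggerPresent", "GetTickCount", "Sleep", "VirtualProtect"},
}
KNOWN_APIS = set().union(*CAPABILITIES.values())
INJECTION_CHAIN = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}
LOADER_APIS = {"LoadLibraryA", "LoadLibraryW", "LoadLibraryExA", "LoadLibraryExW",
               "GetProcAddress"}


def base_name(api: str) -> str:
    """Fold ANSI/Unicode variants (CreateProcessA/W -> CreateProcess) of known APIs."""
    if api[-1:] in ("A", "W") and api[:-1] in KNOWN_APIS:
        return api[:-1]
    return api


def classify_imports(imports: Iterable[str]) -> Dict[str, List[str]]:
    """Group imported function names by capability and flag notable patterns."""
    names = {base_name(i) for i in imports}
    found = {cap: sorted(names & apis)
             for cap, apis in CAPABILITIES.items() if names & apis}
    if INJECTION_CHAIN <= names:
        found["classic injection chain"] = sorted(INJECTION_CHAIN)
    if "GetProcAddress" in names and names <= LOADER_APIS:
        found["dynamic resolution only"] = sorted(names)
    return found


print(classify_imports(["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                        "CreateRemoteThread", "RegSetValueExA", "lstrlenA"]))
print(classify_imports(["LoadLibraryA", "GetProcAddress"]))
```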

DLL Analysis

Malware does not always arrive as a standalone .exe. Many advanced threats ship a malicious DLL instead, delivered through DLL side-loading (placing it where a legitimate application will load it) or DLL injection (forcing it into a running process).

When analyzing a DLL, start with its export table:

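python3 -c "
import pefile
pe = pefile.PE('suspicious.dll')
# DLLs without an export directory lack this attribute (itself a red flag below)
exports = getattr(pe, 'DIRECTORY_ENTRY_EXPORT', None)
symbols = exports.symbols if exports else []
print(f'Exports ({len(symbols)}):')
for exp in symbols:
    name = exp.name.decode() if exp.name else '(ordinal only)'
    print(f'  {exp.ordinal}: {name}')
"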

DLL Red Flag	What It Means
Export names that mimic legitimate Windows DLLs	Possible DLL hijacking/side-loading
Single export function with a generic name (`ServiceMain`, `DllMain`)	Minimal interface, likely a loader
Export by ordinal only (no function names)	Hiding function purposes
DLL with no exports at all	Designed to be loaded via `LoadLibrary` for DllMain execution only
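Once you have the export list, these red flags can be checked mechanically. In this sketch, the function name, messages, and inputs are hypothetical. Exports are `(ordinal, name)` pairs with `None` for ordinal-only entries, in the shape the pefile script above can produce. `legit_exports` is whatever you collect from the genuine DLL the sample may be impersonating (for example, a clean copy from a reference system). The generic-name set contains just the two examples from the table.

```python
from typing import Iterable, List, Optional, Tuple

GENERIC_EXPORTS = {"ServiceMain", "DllMain"}  # the generic names from the table


def dll_red_flags(exports: List[Tuple[int, Optional[str]]],
                  legit_exports: Iterable[str] = ()) -> List[str]:
    """Check (ordinal, name) export pairs against the red flags in the table."""
    if not exports:
        return ["no exports: code likely runs from DllMain when loaded"]
    flags = []
    names = [name for _, name in exports if name]
    if not names:
        flags.append("exports by ordinal only")
    if len(exports) == 1 and names and names[0] in GENERIC_EXPORTS:
        flags.append(f"single generic export {names[0]}")
    mimicked = set(names) & set(legit_exports)
    if mimicked:
        flags.append(f"{len(mimicked)} export names match the legitimate DLL")
    return flags


# With pefile, after loading the DLL as pe (names_from_clean_copy is hypothetical):
# exports = [(s.ordinal, s.name.decode() if s.name else None)
#            for s in pe.DIRECTORY_ENTRY_EXPORT.symbols]
# print(dll_red_flags(exports, legit_exports=names_from_clean_copy))
```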

Key Takeaways

Cryptographic hashing (MD5, SHA1, SHA256) provides unique file identification — always compute all three for cross-platform lookups
Fuzzy hashing (ssdeep) identifies malware variants that share code despite different cryptographic hashes
Hash lookups on VirusTotal, MalwareBazaar, and MISP reveal whether the security community has already analyzed your sample — search by hash before uploading files
Packing compresses or encrypts binaries to defeat static analysis — detect it through high entropy, few imports, unusual sections, and small .text
UPX is the most common packer and is trivially unpacked with upx -d when its headers are intact; commercial protectors like Themida and VMProtect require advanced techniques
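Entropy analysis (Shannon entropy) quantifies randomness in bits per byte: sections above roughly 7.0 point to compressed or encrypted data (above 7.2, almost certainly), but high entropy needs corroborating indicators before you call a binary packed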
Import Address Table analysis reveals a binary's capabilities: process injection APIs, network functions, persistence mechanisms, and evasion techniques
A binary importing only LoadLibraryA and GetProcAddress is hiding its real imports through dynamic resolution — a key packing indicator
DLL analysis extends the same techniques to side-loading and injection scenarios common in advanced threats

What's Next

Static analysis has told you what the file contains, how it is built, and what APIs it calls. But there are questions static analysis cannot answer: What does the malware actually do when it runs? What files does it create? What processes does it spawn? What network connections does it make? In Lesson 11.3, you will cross into dynamic analysis — executing malware in a controlled sandbox and monitoring its behavior with Process Monitor, Process Explorer, and Autoruns to build a complete behavioral profile.

Knowledge Check: Hashing, Packing & Imports

10 questions · 70% to pass

Which hashing algorithm is considered the gold standard for malware identification due to its resistance to collisions?

What problem does ssdeep (fuzzy hashing) solve that cryptographic hashing cannot?

In Lab 11.2, you analyze a packed binary and find it imports only LoadLibraryA and GetProcAddress. Why are these two imports significant?

What Shannon entropy range most strongly indicates that a PE section contains encrypted or compressed data?

Why should you search VirusTotal by hash rather than uploading the file during an active investigation?

Which packer can be trivially removed using its own command-line tool with the -d flag?

You find a binary in Lab 11.2 with sections named UPX0 and UPX1, high entropy in UPX1, and only 3 imported functions. What is the most likely explanation?

Which combination of Windows API imports most strongly suggests process injection capability?

On a Linux system, which command generates the SHA256 hash of a file?

A DLL has no named exports and only exports functions by ordinal. What does this suggest?



Static Analysis: Hashing, Packing & Imports