CyberBlue Academy — Blue Team & SOC Training

What You'll Learn

Extract key artifacts from phishing emails: sender information, URLs, attachments, and embedded objects
Defang URLs and IP addresses following safe-handling conventions before sharing or documenting
Hash suspicious attachments using MD5 and SHA256 for comparison against threat intelligence databases
Analyze extracted URLs using URLScan.io and VirusTotal to assess reputation and hosting infrastructure
Analyze file attachments on VirusTotal and Hybrid Analysis for malware verdicts and behavioral indicators
Use CyberChef to decode Base64 payloads, URL-encoded strings, and extract URLs from raw HTML
Build a structured IOC table from a single phishing email that feeds downstream blocking and detection
Perform OSINT enrichment of extracted IOCs using free analyst tools

From Email to Evidence

In Lessons PH-1 through PH-3, you learned to identify phishing emails, read their headers, and verify authentication results. That analysis tells you whether an email is malicious. This lesson answers the next question: what exactly is the threat, and how do we weaponize that knowledge defensively?

Artifact extraction is the process of pulling every Indicator of Compromise (IOC) from a phishing email and analyzing each one to understand the attack's infrastructure, intent, and scope. A single phishing email can yield dozens of IOCs — sender addresses, reply-to addresses, originating IPs, URLs, domain names, attachment hashes, embedded scripts, and more.

The goal is not just to confirm "this is phishing." The goal is to build a complete picture: who is attacking, what infrastructure they are using, how the payload works, and what you need to block across your environment to protect every user — not just the one who reported it.

Figure: Artifact extraction workflow, from raw email through extraction, analysis, and IOC table creation.

ℹ

Every artifact you extract becomes an action. A malicious URL becomes a firewall block. A sender domain becomes an email gateway rule. An attachment hash becomes an EDR detection. The difference between a junior analyst who says "this is phishing" and a senior analyst who neutralizes the campaign is the quality of artifact extraction.

Extracting Sender Artifacts

Start with the envelope and header fields you already know from Lesson PH-3:

Artifact	Where to Find It	Why It Matters
From address	`From:` header (display name + email)	Often spoofed — compare with envelope sender
Envelope sender	`Return-Path:` or `smtp.mailfrom` in authentication headers	The actual sending address; may differ from display `From:`
Reply-To address	`Reply-To:` header	Attackers set this to a different address to capture responses
Originating IP	Earliest `Received:` header (the bottom-most one in the chain)	The IP that initiated the SMTP session
X-Originating-IP	Sometimes present in webmail-originated messages	Additional source IP indicator
Message-ID domain	`Message-ID:` header (domain after @)	Reveals the actual mail system that generated the message

From: "IT Support Team" <helpdesk@company-secure.com>
Return-Path: <attacker@evil-domain.xyz>
Reply-To: <credential-harvest@protonmail.com>
Received: from mail.evil-domain.xyz (198.51.100.42)
Message-ID: <abc123@evil-domain.xyz>

From this single header block, you extract five IOCs: the spoofed From domain (company-secure.com), the real sender domain (evil-domain.xyz), the reply-to address (credential-harvest@protonmail.com), the originating IP (198.51.100.42), and the Message-ID domain confirming the true origin.
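The same extraction can be scripted. The following is a minimal sketch using Python's standard `email` package, assuming the header block above is available as a string; the function name `extract_sender_artifacts` and the dictionary keys are illustrative choices, not part of any lab tooling.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# The example header block from this lesson, followed by the blank line
# that separates headers from the body.
RAW_HEADERS = (
    'From: "IT Support Team" <helpdesk@company-secure.com>\n'
    "Return-Path: <attacker@evil-domain.xyz>\n"
    "Reply-To: <credential-harvest@protonmail.com>\n"
    "Received: from mail.evil-domain.xyz (198.51.100.42)\n"
    "Message-ID: <abc123@evil-domain.xyz>\n"
    "\n"
)

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_sender_artifacts(raw: str) -> dict:
    """Pull the sender-related artifacts out of a raw header block."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    received = msg.get_all("Received") or []
    # Each hop prepends its Received header, so the last entry in the
    # list is the bottom of the chain (the earliest hop).
    ips = IPV4.findall(received[-1]) if received else []
    message_id = msg.get("Message-ID", "").strip().strip("<>")
    return {
        "from_domain": from_addr.rpartition("@")[2],
        "envelope_sender": parseaddr(msg.get("Return-Path", ""))[1],
        "reply_to": parseaddr(msg.get("Reply-To", ""))[1],
        "originating_ip": ips[0] if ips else None,
        "message_id_domain": message_id.rpartition("@")[2],
    }


artifacts = extract_sender_artifacts(RAW_HEADERS)
for name, value in artifacts.items():
    print(f"{name}: {value}")
```

Remember that lower `Received:` headers can be forged by the sender, so treat the extracted IP as a lead to verify rather than a confirmed origin.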

Extracting and Defanging URLs

Phishing emails almost always contain URLs — either in the body text, HTML hyperlinks, or disguised behind buttons. Extracting them requires examining both the visible text and the underlying HTML source.

⚠

Never click URLs from a phishing email on your workstation. Always work with raw source, copy-paste into analysis tools, or use a sandboxed browser. Phishing URLs may fingerprint your browser, log your IP, or trigger drive-by downloads.

Finding URLs in HTML Source

The visible text of a link and its actual destination are often different:

<a href="https://evil-domain.xyz/harvest?id=victim123">
  https://company.com/secure-login
</a>

The user sees https://company.com/secure-login. The actual destination is https://evil-domain.xyz/harvest?id=victim123. Always extract from the href attribute, not the display text.
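One way to automate this comparison is sketched below with Python's standard `html.parser`. It collects every anchor's `href` and visible text, then flags links whose visible text looks like a URL pointing at a different host. The class and function names are hypothetical, and the check only covers the case where the display text is itself a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_mismatched_links(html: str) -> list:
    """Return links whose display text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    mismatched = []
    for href, text in parser.links:
        if not text.lower().startswith(("http://", "https://")):
            continue  # display text is not a URL, nothing to compare
        if urlparse(text).hostname != urlparse(href).hostname:
            mismatched.append((href, text))
    return mismatched


SAMPLE = """<a href="https://evil-domain.xyz/harvest?id=victim123">
  https://company.com/secure-login
</a>"""

for href, text in find_mismatched_links(SAMPLE):
    print(f"display: {text}  ->  actual: {href}")
```

Run this against saved email source only; it parses text and never requests the URLs.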

Defanging Conventions

Before writing URLs or IPs in reports, tickets, chat messages, or IOC lists, defang them to prevent accidental clicks or auto-linking:

Original	Defanged	Method
`https://evil-domain.xyz/payload`	`hxxps://evil-domain[.]xyz/payload`	Replace `http` → `hxxp`, dots in domain → `[.]`
`198.51.100.42`	`198[.]51[.]100[.]42`	Wrap dots in brackets
`evil@attacker.com`	`evil[@]attacker[.]com`	Bracket the @ and domain dots

CyberChef has a built-in Defang URL operation that handles this automatically. In Lab PH-4, you will use it extensively.
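If you need to defang outside CyberChef, the conventions in the table are simple enough to script. The sketch below is one possible implementation, not a reproduction of CyberChef's operation; note that it brackets every dot, including dots in the URL path, which is a safe superset of the domain-only convention.

```python
import re


def defang(indicator: str) -> str:
    """Defang a URL, IP address, or email address for safe sharing."""
    # http:// or https:// at the start becomes hxxp:// or hxxps://
    out = re.sub(r"^(h)tt(ps?://)", r"\g<1>xx\g<2>", indicator, flags=re.IGNORECASE)
    # Bracket the @ in email addresses.
    out = out.replace("@", "[@]")
    # Bracket every dot so nothing auto-links.
    out = out.replace(".", "[.]")
    return out


for raw in ("https://evil-domain.xyz/payload", "198.51.100.42", "evil@attacker.com"):
    print(defang(raw))
```

The function is not idempotent: running it twice on the same value brackets the dots again, so apply it once at the point where an indicator enters your notes.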

💡

Defanging is not optional — it is a professional standard. Sharing a live malicious URL in a Slack channel, email, or Jira ticket can result in someone clicking it. Automated security scanners may also follow live URLs, tipping off the attacker that the campaign has been discovered.

Hashing Attachments

When a phishing email contains an attachment — a Word document, PDF, Excel file, ZIP archive, or executable — the first step is generating cryptographic hashes without opening the file.

# Generate MD5 and SHA256 hashes
md5sum suspicious_invoice.docx
sha256sum suspicious_invoice.docx

# On macOS
md5 suspicious_invoice.docx
shasum -a 256 suspicious_invoice.docx

Hash Algorithm	Length	Primary Use
MD5	32 hex characters	Quick lookup on VirusTotal (VT) and threat feeds (widely indexed)
SHA256	64 hex characters	Definitive identification (collision-resistant)

🚨

Never open suspicious attachments on your work machine. Even "just looking" at a Word document can run macros if content is enabled, or trigger an exploit on an unpatched system. Always hash first, check the hash against VirusTotal and your threat intel feeds, and only detonate in a sandbox if analysis is needed.

Why Both Hashes?

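MD5 is faster to compute and more widely indexed in legacy threat intelligence databases, but it is vulnerable to collisions, so never treat an MD5 match alone as proof of identity. SHA256 is cryptographically stronger and the standard for modern IOC sharing (STIX/TAXII, MISP). Always generate both, and treat the SHA256 as the authoritative identifier.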
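On systems without `md5sum` or `shasum`, or when hashing many attachments at once, the same result can be produced with Python's standard `hashlib`. This is a minimal sketch; the function name `hash_file` and the chunk size are arbitrary choices. The file is read as raw bytes and never parsed or opened by an application.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Compute MD5 and SHA256 of a file in one pass without loading it whole."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    import sys

    # Usage: python hash_file.py suspicious_invoice.docx
    for target in sys.argv[1:]:
        digests = hash_file(target)
        print(f"{target}\n  MD5:    {digests['md5']}\n  SHA256: {digests['sha256']}")
```

The output should match what `md5sum` and `sha256sum` print for the same file, which makes it easy to cross-check.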

Analyzing URLs: URLScan.io and VirusTotal

Once you have extracted and defanged URLs, analyze them using free tools before anyone clicks them.

URLScan.io

URLScan.io visits the URL in a sandboxed browser and captures:

Screenshot of the rendered page (see the phishing page without visiting it)
DOM content — the full HTML source of the destination page
Network requests — every resource loaded (scripts, images, redirects)
Redirect chain — the full path from initial URL to final destination
IP and hosting information — where the page is hosted
Verdict — community and automated classification

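Submit the URL (re-fang it for the search, or use URLScan's API) and examine the results. Public scans are visible to anyone, so choose a non-public visibility setting when the URL contains victim identifiers or other sensitive parameters. A credential harvesting page will typically show a login form mimicking a known brand, hosted on a recently registered domain or compromised site.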
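Re-fanging is the reverse of the defanging conventions covered earlier. A minimal sketch, assuming your notes use the `hxxp`, `[.]`, `[@]`, and `[://]` conventions (the function name `refang` is illustrative):

```python
import re


def refang(indicator: str) -> str:
    """Restore a defanged URL, IP, or email address to its live form."""
    out = indicator.replace("[.]", ".").replace("[@]", "@").replace("[://]", "://")
    # hxxp / hxxps at the start becomes http / https again.
    return re.sub(r"^hxxp", "http", out, flags=re.IGNORECASE)


print(refang("hxxps://evil-domain[.]xyz/harvest"))
print(refang("198[.]51[.]100[.]42"))
```

Only re-fang at the moment of submission to an analysis tool, and keep the defanged form everywhere else.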

VirusTotal URL Scan

VirusTotal aggregates results from 70+ security vendors. For URL analysis:

Paste the URL into the URL tab (not the file tab)
Review the detection ratio (e.g., 12/87 vendors flagged it as malicious)
Check the Community tab for analyst comments
Examine Relations — other URLs hosted on the same IP, associated files, redirects
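If you later script these lookups, VirusTotal's API addresses a URL report by an identifier rather than by the URL itself. The sketch below assumes the scheme described in VirusTotal's public API v3 documentation (unpadded, URL-safe Base64 of the URL) and only builds the endpoint string; it sends nothing. Verify the scheme against the current documentation before relying on it. The function names are hypothetical.

```python
import base64


def vt_url_id(url: str) -> str:
    """Compute the assumed VirusTotal v3 URL identifier (unpadded urlsafe Base64)."""
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def vt_url_report_endpoint(url: str) -> str:
    """Build the report endpoint for a URL; no request is made here."""
    return f"https://www.virustotal.com/api/v3/urls/{vt_url_id(url)}"


print(vt_url_report_endpoint("https://evil-domain.xyz/cred-harvest?user=target"))
```

An actual request would also need your API key in a request header, and API usage is subject to your account's quota.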

💡

Combine both tools. URLScan.io gives you the visual context (what the victim would see) and the technical context (network behavior). VirusTotal gives you vendor consensus and historical associations. Together, they paint a complete picture.

Analyzing Attachments: VirusTotal and Hybrid Analysis

VirusTotal File Analysis

Search VirusTotal by file hash rather than uploading the file itself; uploaded samples are shared with other VirusTotal users, which can expose sensitive data. The report includes:

Detection tab: How many AV engines detect it as malicious
Behavior tab: If the file has been detonated in VT's sandbox, you see process creation, file drops, network connections, and registry changes
Relations tab: Other files dropped, contacted domains, similar samples
Community tab: Analyst notes and YARA rule matches
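Before looking up a hash, it helps to confirm that the string is a well-formed MD5 or SHA256, since truncated or mistyped values return nothing and can be mistaken for "not seen before". The sketch below validates the hash and prepares, but does not send, a lookup request. The endpoint path and `x-apikey` header reflect VirusTotal's public API v3 as an assumption to verify; `YOUR_API_KEY` is a placeholder.

```python
import re
from urllib.request import Request

HASH_PATTERNS = {
    "MD5": re.compile(r"[0-9a-fA-F]{32}"),
    "SHA256": re.compile(r"[0-9a-fA-F]{64}"),
}


def classify_hash(value: str):
    """Return 'MD5' or 'SHA256' for a well-formed hex digest, else None."""
    value = value.strip()
    for name, pattern in HASH_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


def build_vt_file_lookup(file_hash: str, api_key: str) -> Request:
    """Prepare (not send) a hash lookup request; only the hash leaves your machine."""
    if classify_hash(file_hash) is None:
        raise ValueError(f"not a complete MD5 or SHA256 digest: {file_hash!r}")
    url = f"https://www.virustotal.com/api/v3/files/{file_hash.strip().lower()}"
    return Request(url, headers={"x-apikey": api_key})


request = build_vt_file_lookup("d41d8cd98f00b204e9800998ecf8427e", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

A "not found" response for a valid hash means only that VirusTotal has not seen the file; it is not a clean verdict.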

Hybrid Analysis

Hybrid Analysis (hybrid-analysis.com) by CrowdStrike provides deeper behavioral analysis:

Submit the file for sandbox detonation (Windows 7/10, Linux)
View process trees, network connections, DNS queries, and file system changes
See extracted strings, embedded URLs, and dropped payloads
Review MITRE ATT&CK technique mapping for observed behaviors

Feature	VirusTotal	Hybrid Analysis
AV vendor detections	70+ engines	CrowdStrike Falcon + selected engines
Sandbox behavior	Basic (VT sandbox)	Deep (full OS-level behavioral trace)
Network capture	DNS/HTTP summary	Full PCAP available for download
Process tree	Basic	Detailed with parent-child relationships
ATT&CK mapping	Limited	Comprehensive per-behavior mapping
Best for	Quick hash lookups and vendor consensus	Deep-dive behavioral analysis

Using CyberChef for Decoding

CyberChef (gchq.github.io/CyberChef) is the analyst's Swiss Army knife. Phishing emails frequently use encoding to evade detection, and CyberChef can decode virtually anything.

Common Decoding Operations

Base64 Decode: Attackers encode payloads, URLs, or entire scripts in Base64 to bypass email gateways.

Input:  aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0
Recipe: From Base64
Output: https://evil-domain.xyz/cred-harvest?user=target
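The equivalent of the From Base64 operation in Python is a single standard-library call. The sketch below adds two conveniences that are assumptions about what you will encounter: it restores missing padding, and it returns `None` instead of raising when the input is not valid Base64 or does not decode to UTF-8 text.

```python
import base64
import binascii


def decode_base64_text(blob: str):
    """Decode a Base64 string to text, or return None if it is not valid."""
    cleaned = blob.strip()
    # Attackers often strip the trailing '=' padding; restore it.
    padded = cleaned + "=" * (-len(cleaned) % 4)
    try:
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


encoded = "aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0"
print(decode_base64_text(encoded))
```

Binary payloads will fail the UTF-8 step and return `None`; for those, keep the raw bytes and hash them instead of printing them.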

URL Decode: Percent-encoded URLs hide the true destination.

Input:  https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345
Recipe: URL Decode
Output: https://evil-domain.xyz/payload?id=12345
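In Python, `urllib.parse.unquote` performs the same decoding. The sketch below also handles double-encoded values by decoding repeatedly until the string stops changing; the helper name `fully_unquote` and the round limit are illustrative. Be aware that a URL legitimately containing a literal `%25` would be over-decoded by this approach.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def fully_unquote(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly to unwrap nested (double) encoding."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break  # nothing left to decode
        value = decoded
    return value


decoded = fully_unquote("https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345")
parts = urlsplit(decoded)
print(decoded)
print("host:", parts.hostname)
print("query:", parse_qs(parts.query))
```

Splitting the decoded URL into host and query parameters gives you separate values for the Domain and URL rows of your IOC table.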

Extract URLs from HTML: When you have raw HTML source from an email, CyberChef's "Extract URLs" operation pulls every URL from href attributes, script sources, and embedded content.

Recipe: Extract URLs → Defang URL → Sort → Unique

This four-step recipe takes raw HTML and produces a clean, defanged, deduplicated URL list ready for your IOC table.
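For comparison, here is a rough Python approximation of the same four steps. It is not CyberChef's implementation: the regular expression is a simplified assumption that matches only `http` and `https` URLs and stops at whitespace, quotes, and angle brackets, so it may miss or over-capture URLs that CyberChef handles differently.

```python
import re

URL_PATTERN = re.compile(r'''https?://[^\s"'<>]+''', re.IGNORECASE)


def _defang(url: str) -> str:
    """Apply the hxxp and [.] conventions to a single URL."""
    url = re.sub(r"^(h)tt(ps?://)", r"\g<1>xx\g<2>", url, flags=re.IGNORECASE)
    return url.replace(".", "[.]")


def extract_defanged_urls(html: str) -> list:
    """Extract URLs -> defang -> sort -> unique, as one function."""
    found = URL_PATTERN.findall(html)
    # A set removes duplicates; sorted() gives a stable, ordered list.
    return sorted({_defang(url) for url in found})


SAMPLE_HTML = (
    '<a href="https://evil-domain.xyz/harvest?id=victim123">'
    "https://company.com/secure-login</a>"
    '<img src="https://cdn.evil-domain.xyz/logo.png">'
    '<a href="https://evil-domain.xyz/harvest?id=victim123">Sign in</a>'
)

for url in extract_defanged_urls(SAMPLE_HTML):
    print(url)
```

The list includes the legitimate-looking display URL as well as the real destinations, so each entry still needs review before it goes into the IOC table.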

ℹ

CyberChef recipes are shareable. You can save a recipe as a URL and share it with your team. In Lab PH-4, you will build several recipes and save them for reuse in future investigations.

Building the IOC Table

Every phishing investigation should produce a structured IOC table. This table becomes the input for blocking rules, SIEM detections, and threat intelligence sharing.

IOC Type	Value	Source	Context
Email Address	`attacker@evil-domain[.]xyz`	Return-Path header	Envelope sender
Domain	`evil-domain[.]xyz`	Return-Path, Message-ID	Attacker-controlled sending infrastructure
IP Address	`198[.]51[.]100[.]42`	Received header	Originating mail server
URL	`hxxps://evil-domain[.]xyz/harvest`	Email body (href)	Credential harvesting page
URL	`hxxps://cdn[.]evil-domain[.]xyz/logo[.]png`	Email body (img src)	Tracking pixel / brand impersonation asset
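File Hash (MD5)	`d41d8cd98f00b204e9800998ecf8427e`	Attachment	Malicious document (placeholder value; this is the MD5 of empty input)
File Hash (SHA256)	`e3b0c44298fc1c149afbf4c8996fb924....`	Attachment	Malicious document (placeholder value; prefix matches the SHA256 of empty input)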
File Name	`Invoice_Q4_2026.docm`	Attachment	Macro-enabled document

⚠

Always defang IOCs in your table. Even in internal documents. Some ticketing systems, wikis, and chat tools automatically convert URLs into clickable links. Defanging prevents accidental navigation.
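Keeping the table in a structured form makes it easier to hand off to blocking and detection tooling. The sketch below models one row as a small dataclass with the four columns used above and writes the table as CSV. The class name `IOC`, the field names, and the CSV layout are assumptions for illustration; adapt them to whatever format your SIEM or ticketing system expects.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class IOC:
    """One row of the IOC table; values are stored already defanged."""
    ioc_type: str
    value: str
    source: str
    context: str


def to_csv(iocs: list) -> str:
    """Serialize IOC rows to CSV, dropping exact duplicates but keeping order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(IOC)], lineterminator="\n"
    )
    writer.writeheader()
    for ioc in dict.fromkeys(iocs):  # frozen dataclasses are hashable
        writer.writerow(asdict(ioc))
    return buffer.getvalue()


table = [
    IOC("Email Address", "attacker@evil-domain[.]xyz", "Return-Path header", "Envelope sender"),
    IOC("IP Address", "198[.]51[.]100[.]42", "Received header", "Originating mail server"),
    IOC("URL", "hxxps://evil-domain[.]xyz/harvest", "Email body (href)", "Credential harvesting page"),
    IOC("IP Address", "198[.]51[.]100[.]42", "Received header", "Originating mail server"),
]

print(to_csv(table))
```

Storing values defanged means any consumer that needs live indicators, such as a firewall rule generator, must re-fang them deliberately.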

OSINT Enrichment of Extracted IOCs

Raw IOCs are useful for blocking. Enriched IOCs tell you the story of the attack — who is behind it, how long the infrastructure has been active, and what other campaigns use the same resources.

Figure: IOC enrichment pipeline, from raw extraction through OSINT lookups to actionable intelligence.

Domain Enrichment

Tool	What You Learn
WHOIS lookup (whois.domaintools.com)	Registration date, registrar, registrant info (often privacy-protected)
PassiveDNS (VirusTotal Relations tab)	Historical IP resolutions — see if the domain recently changed hosting
URLScan.io	Hosting provider, page content, SSL certificate details
Shodan (shodan.io)	Open ports, services, technologies running on the IP

A domain registered 48 hours ago hosting a "Microsoft 365 login page" on a bulletproof hosting provider is almost certainly malicious.
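Domain age is easy to turn into a repeatable check once you have the registration date from WHOIS. The sketch below is a heuristic only; the 30-day threshold is an assumed value, not an industry standard, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_newly_registered(created: datetime, now: datetime, threshold_days: int = 30) -> bool:
    """Return True if the domain was registered within the threshold window."""
    return timedelta(0) <= (now - created) <= timedelta(days=threshold_days)


# Hypothetical values: a domain registered 48 hours before the email arrived.
received_at = datetime(2026, 1, 15, 9, 0, tzinfo=timezone.utc)
registered_at = received_at - timedelta(hours=48)

print(is_newly_registered(registered_at, received_at))
```

A recent registration raises suspicion but does not prove malice, and an old domain can be compromised, so combine this signal with hosting and content evidence.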

IP Enrichment

Tool	What You Learn
AbuseIPDB (abuseipdb.com)	Abuse reports from other analysts worldwide
GreyNoise (greynoise.io)	Whether the IP is a known scanner/noise vs. targeted attacker
IPinfo (ipinfo.io)	ASN, geolocation, hosting provider
VirusTotal	Files communicating with this IP, URLs hosted on it

File Hash Enrichment

Tool	What You Learn
VirusTotal	Detection ratio, behavioral analysis, YARA matches, community notes
Hybrid Analysis	Full sandbox report — process tree, network calls, dropped files
MalwareBazaar (bazaar.abuse.ch)	Malware family classification, associated campaigns, download samples
Any.Run (any.run)	Interactive sandbox with visual process tree and network activity

💡

Enrichment reveals campaign scope. If five different phishing emails use five different sender addresses but all link to the same IP — that IP is the campaign's infrastructure. Blocking that single IP neutralizes all five variants. Without enrichment, you would block five addresses and miss the common denominator.
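This kind of pivot can be sketched as a simple grouping step: collect the sender seen with each resolved IP, then report any IP shared by more than one sender. The observations below are invented for illustration, and the function name and `min_senders` parameter are hypothetical.

```python
from collections import defaultdict


def shared_infrastructure(observations: list, min_senders: int = 2) -> dict:
    """Group senders by the IP their links resolve to; keep IPs shared by several."""
    senders_by_ip = defaultdict(set)
    for sender, ip in observations:
        senders_by_ip[ip].add(sender)
    return {
        ip: sorted(senders)
        for ip, senders in senders_by_ip.items()
        if len(senders) >= min_senders
    }


# Hypothetical (sender, linked IP) pairs from six reported emails.
observations = [
    ("billing@one.example", "198.51.100.42"),
    ("hr@two.example", "198.51.100.42"),
    ("it@three.example", "198.51.100.42"),
    ("ceo@four.example", "198.51.100.42"),
    ("docs@five.example", "198.51.100.42"),
    ("news@six.example", "203.0.113.7"),
]

for ip, senders in shared_infrastructure(observations).items():
    print(f"{ip} is shared by {len(senders)} senders: {', '.join(senders)}")
```

Before blocking a shared IP, check whether it belongs to a large hosting or CDN provider, where a block could also affect legitimate sites.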

Putting It All Together: The Extraction Workflow

Here is the systematic workflow you will follow in Lab PH-4:

Save the raw email — download the .eml file or copy the full source
Extract sender artifacts — From, Return-Path, Reply-To, originating IP, Message-ID domain
Extract URLs — both visible text and href destinations; use CyberChef "Extract URLs" on HTML source
Defang everything — all URLs, IPs, and email addresses in your working notes
Hash attachments — MD5 and SHA256 without opening the file
Analyze URLs — submit to URLScan.io and VirusTotal
Analyze attachments — submit hash to VirusTotal, detonate in Hybrid Analysis if needed
Decode obfuscated content — use CyberChef for Base64, URL encoding, HTML entities
Build the IOC table — structured, defanged, with source and context columns
Enrich IOCs — WHOIS, PassiveDNS, AbuseIPDB, GreyNoise for each indicator
Document findings — feed the IOC table into your investigation report

ℹ

This workflow is not just for phishing. The same extraction and enrichment process applies to any email-borne threat — BEC, malware delivery, invoice fraud, and even legitimate security notifications that need verification. Master this workflow once and you can apply it everywhere.

Key Takeaways

A single phishing email can yield dozens of IOCs: sender addresses, domains, IPs, URLs, file hashes, and embedded objects
Always defang URLs, IPs, and email addresses before sharing — this is a professional standard, not optional
Generate both MD5 and SHA256 hashes for attachments — MD5 for legacy lookups, SHA256 for modern IOC sharing
Use URLScan.io for visual and network analysis of URLs, and VirusTotal for vendor consensus and historical associations
CyberChef is essential for decoding Base64, URL-encoded strings, and extracting URLs from raw HTML source
Build a structured IOC table with type, value, source, and context columns — this feeds blocking rules and SIEM detections
OSINT enrichment transforms raw IOCs into campaign intelligence: WHOIS, PassiveDNS, AbuseIPDB, and sandbox analysis reveal the attacker's infrastructure and scope

What's Next

You now know how to tear apart a phishing email and extract every artifact it contains. In Lesson PH-5: Defensive Measures & Response, you will learn what to do with those artifacts — how to block them across your email gateway, firewall, and SIEM, how to check whether anyone else in your organization clicked the link or submitted credentials, and how to build a phishing response process that scales. In Lab PH-4, you will put this lesson into practice by extracting artifacts from a realistic phishing email, analyzing each IOC, and building a complete IOC table.

Knowledge Check: Artifact Extraction & Analysis

10 questions · 70% to pass

What is the primary goal of artifact extraction from a phishing email?

Why should analysts generate both MD5 and SHA256 hashes for a suspicious attachment?

What does defanging a URL accomplish, and which of the following is a correctly defanged URL?

In Lab PH-4, you extract URLs from a phishing email's HTML source using CyberChef. Which recipe chain produces a clean, defanged, deduplicated URL list?

What key difference exists between URLScan.io and VirusTotal for URL analysis?

An attacker encodes a malicious URL in Base64 within a phishing email. What CyberChef operation reveals the hidden URL?

When building an IOC table, what four columns should every entry include?

In Lab PH-4, you discover that five different phishing emails all link to the same IP address. What does OSINT enrichment reveal in this scenario?

Which tool is best suited for deep behavioral analysis of a suspicious attachment, including full process trees and PCAP downloads?

Why is the Return-Path header often more valuable than the From header when extracting sender IOCs?
