What You'll Learn
- Extract key artifacts from phishing emails: sender information, URLs, attachments, and embedded objects
- Defang URLs and IP addresses following safe-handling conventions before sharing or documenting
- Hash suspicious attachments using MD5 and SHA256 for comparison against threat intelligence databases
- Analyze extracted URLs using URLScan.io and VirusTotal to assess reputation and hosting infrastructure
- Analyze file attachments on VirusTotal and Hybrid Analysis for malware verdicts and behavioral indicators
- Use CyberChef to decode Base64 payloads, URL-encoded strings, and extract URLs from raw HTML
- Build a structured IOC table from a single phishing email that feeds downstream blocking and detection
- Perform OSINT enrichment of extracted IOCs using free analyst tools
From Email to Evidence
In Lessons PH-1 through PH-3, you learned to identify phishing emails, read their headers, and verify authentication results. That analysis tells you whether an email is malicious. This lesson answers the next question: what exactly is the threat, and how do we weaponize that knowledge defensively?
Artifact extraction is the process of pulling every Indicator of Compromise (IOC) from a phishing email and analyzing each one to understand the attack's infrastructure, intent, and scope. A single phishing email can yield dozens of IOCs — sender addresses, reply-to addresses, originating IPs, URLs, domain names, attachment hashes, embedded scripts, and more.
The goal is not just to confirm "this is phishing." The goal is to build a complete picture: who is attacking, what infrastructure they are using, how the payload works, and what you need to block across your environment to protect every user — not just the one who reported it.
Every artifact you extract becomes an action. A malicious URL becomes a firewall block. A sender domain becomes an email gateway rule. An attachment hash becomes an EDR detection. The difference between a junior analyst who says "this is phishing" and a senior analyst who neutralizes the campaign is the quality of artifact extraction.
Extracting Sender Artifacts
Start with the envelope and header fields you already know from Lesson PH-3:
| Artifact | Where to Find It | Why It Matters |
|---|---|---|
| From address | From: header (display name + email) | Often spoofed — compare with envelope sender |
| Envelope sender | Return-Path: or smtp.mailfrom in authentication headers | The actual sending address; may differ from display From: |
| Reply-To address | Reply-To: header | Attackers set this to a different address to capture responses |
| Originating IP | Earliest Received: header (the bottom of the chain) | The IP that initiated the SMTP session |
| X-Originating-IP | Sometimes present in webmail-originated messages | Additional source IP indicator |
| Message-ID domain | Message-ID: header (domain after @) | Reveals the actual mail system that generated the message |
From: "IT Support Team" <helpdesk@company-secure.com>
Return-Path: <attacker@evil-domain.xyz>
Reply-To: <credential-harvest@protonmail.com>
Received: from mail.evil-domain.xyz (198.51.100.42)
Message-ID: <abc123@evil-domain.xyz>
From this single header block, you extract five IOCs: the spoofed From domain (company-secure.com), the real sender domain (evil-domain.xyz), the reply-to address (credential-harvest@protonmail.com), the originating IP (198.51.100.42), and the Message-ID domain confirming the true origin.
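The extraction above can be sketched with Python's standard `email` package. This is a minimal illustration, not a full header parser: the function name `extract_sender_artifacts` and the dictionary keys are hypothetical, and the IPv4 regular expression is deliberately simple.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# The sample header block from this lesson, followed by a blank line.
RAW_HEADERS = "\n".join([
    'From: "IT Support Team" <helpdesk@company-secure.com>',
    'Return-Path: <attacker@evil-domain.xyz>',
    'Reply-To: <credential-harvest@protonmail.com>',
    'Received: from mail.evil-domain.xyz (198.51.100.42)',
    'Message-ID: <abc123@evil-domain.xyz>',
]) + "\n\n"

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_sender_artifacts(raw: str) -> dict:
    """Pull the sender-related IOCs out of a raw header block (hypothetical helper)."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    reply_to = parseaddr(msg.get("Reply-To", ""))[1]

    # Received headers are prepended by each hop, so the earliest one is last.
    received = msg.get_all("Received") or []
    ips = IPV4.findall(received[-1]) if received else []

    message_id = msg.get("Message-ID", "").strip().strip("<>")
    return {
        "from_domain": from_addr.rpartition("@")[2],
        "envelope_sender": return_path,
        "sender_domain": return_path.rpartition("@")[2],
        "reply_to": reply_to,
        "originating_ip": ips[0] if ips else None,
        "message_id_domain": message_id.rpartition("@")[2],
    }


artifacts = extract_sender_artifacts(RAW_HEADERS)
for name, value in artifacts.items():
    print(f"{name}: {value}")
```

Comparing `from_domain` with `sender_domain` and `message_id_domain` is the quickest programmatic check for the spoofing described above.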
Extracting and Defanging URLs
Phishing emails almost always contain URLs: in the body text, in HTML hyperlinks, or disguised behind buttons. Extracting them requires examining both the visible text and the underlying HTML source.
Never click URLs from a phishing email on your workstation. Always work with raw source, copy-paste into analysis tools, or use a sandboxed browser. Phishing URLs may fingerprint your browser, log your IP, or trigger drive-by downloads.
Finding URLs in HTML Source
The visible text of a link and its actual destination are often different:
<a href="https://evil-domain.xyz/harvest?id=victim123">
https://company.com/secure-login
</a>
The user sees https://company.com/secure-login. The actual destination is https://evil-domain.xyz/harvest?id=victim123. Always extract from the href attribute, not the display text.
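A minimal sketch of that check, using only Python's standard library: it collects each anchor's href and visible text, then flags links whose displayed URL points at a different host than the real destination. The class name `LinkCollector` and the function `mismatched_links` are hypothetical, and the sketch handles only simple, non-nested anchors.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a> tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html: str) -> list:
    """Return links whose visible text is a URL on a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        if text.lower().startswith(("http://", "https://")):
            if urlparse(text).hostname != urlparse(href).hostname:
                flagged.append((href, text))
    return flagged


SAMPLE = (
    '<a href="https://evil-domain.xyz/harvest?id=victim123">\n'
    "https://company.com/secure-login\n"
    "</a>"
)
for href, text in mismatched_links(SAMPLE):
    print(f"displayed: {text}")
    print(f"actual:    {href}")
```

Parsing the source as text never fetches the link, so this is safe to run on an analyst workstation.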
Defanging Conventions
Before writing URLs or IPs in reports, tickets, chat messages, or IOC lists, defang them to prevent accidental clicks or auto-linking:
| Original | Defanged | Method |
|---|---|---|
| https://evil-domain.xyz/payload | hxxps://evil-domain[.]xyz/payload | Replace http → hxxp, dots in domain → [.] |
| 198.51.100.42 | 198[.]51[.]100[.]42 | Wrap dots in brackets |
| evil@attacker.com | evil[@]attacker[.]com | Bracket the @ and domain dots |
CyberChef has a built-in Defang URL operation that handles this automatically. In Lab PH-4, you will use it extensively.
Defanging is not optional — it is a professional standard. Sharing a live malicious URL in a Slack channel, email, or Jira ticket can result in someone clicking it. Automated security scanners may also follow live URLs, tipping off the attacker that the campaign has been discovered.
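For scripted workflows, the conventions in the table can be approximated in a few lines of Python. This is a sketch under simple assumptions, not a replacement for CyberChef's operation: `defang` is a hypothetical helper, it brackets only the host portion of a URL, and for email addresses it brackets every dot, including any in the local part.

```python
import re


def defang(value: str) -> str:
    """Defang a URL, IP address, or email address (hypothetical helper)."""
    # Rewrite the scheme only when it is a real http:// or https:// prefix.
    value = re.sub(r"^http(s?)://", r"hxxp\1://", value, flags=re.IGNORECASE)
    if "://" in value:
        scheme, rest = value.split("://", 1)
        host, sep, path = rest.partition("/")
        # Bracket dots in the host only; the path is left untouched.
        return f"{scheme}://{host.replace('.', '[.]')}{sep}{path}"
    # Bare IPs, domains, and email addresses.
    return value.replace("@", "[@]").replace(".", "[.]")


for sample in ("https://evil-domain.xyz/payload", "198.51.100.42", "evil@attacker.com"):
    print(defang(sample))
```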
Hashing Attachments
When a phishing email contains an attachment — a Word document, PDF, Excel file, ZIP archive, or executable — the first step is generating cryptographic hashes without opening the file.
# Generate MD5 and SHA256 hashes
md5sum suspicious_invoice.docx
sha256sum suspicious_invoice.docx
# On macOS
md5 suspicious_invoice.docx
shasum -a 256 suspicious_invoice.docx
| Hash Algorithm | Length | Primary Use |
|---|---|---|
| MD5 | 32 hex characters | Quick lookup on VT/threat feeds (widely indexed) |
| SHA256 | 64 hex characters | Definitive identification (collision-resistant) |
Never open suspicious attachments on your work machine. Even "just looking" at a Word document can trigger macros. Always hash first, check the hash against VirusTotal and your threat intel feeds, and only detonate in a sandbox if analysis is needed.
Why Both Hashes?
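MD5 is faster to compute and more widely indexed in legacy threat intelligence databases, but practical collision attacks against it mean it should not be the only identifier you record for a file. SHA256 is cryptographically stronger and the standard for modern IOC sharing (STIX/TAXII, MISP). Always generate both.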
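Where the command-line tools are unavailable, both hashes can be produced in one pass with Python's `hashlib`. The sketch below reads the file as raw bytes in chunks, which does not open or render the document; `hash_file` is a hypothetical helper name and the 64 KiB chunk size is an arbitrary choice.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Return MD5 and SHA256 hex digests of a file, read as raw bytes."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for large attachments.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Calling `hash_file("suspicious_invoice.docx")` should return the same values as `md5sum` and `sha256sum` for that file.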
Analyzing URLs: URLScan.io and VirusTotal
Once you have extracted and defanged URLs, analyze them using free tools before anyone clicks them.
URLScan.io
URLScan.io visits the URL in a sandboxed browser and captures:
- Screenshot of the rendered page (see the phishing page without visiting it)
- DOM content — the full HTML source of the destination page
- Network requests — every resource loaded (scripts, images, redirects)
- Redirect chain — the full path from initial URL to final destination
- IP and hosting information — where the page is hosted
- Verdict — community and automated classification
Submit the URL (re-fang it for the search, or use URLScan's API) and examine the results. A credential harvesting page will typically show a login form mimicking a known brand, hosted on a recently registered domain or compromised site.
VirusTotal URL Scan
VirusTotal aggregates results from 70+ security vendors. For URL analysis:
- Paste the URL into the URL tab (not the file tab)
- Review the detection ratio (e.g., 12/87 vendors flagged it as malicious)
- Check the Community tab for analyst comments
- Examine Relations — other URLs hosted on the same IP, associated files, redirects
Combine both tools. URLScan.io gives you the visual context (what the victim would see) and the technical context (network behavior). VirusTotal gives you vendor consensus and historical associations. Together, they paint a complete picture.
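If you look URLs up programmatically rather than through the web interface, VirusTotal needs an identifier for each URL. The sketch below assumes the VirusTotal API v3 convention that a URL's identifier is its URL-safe Base64 encoding with the padding removed, and that reports live under `/api/v3/urls/{id}` with the key sent in an `x-apikey` header; check both against the current API documentation before relying on them. The function names are hypothetical, and no network request is made.

```python
import base64

VT_API_BASE = "https://www.virustotal.com/api/v3"  # assumed base path for API v3


def vt_url_id(url: str) -> str:
    """Unpadded URL-safe Base64 of the URL (assumed VirusTotal v3 identifier)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def vt_url_report_endpoint(url: str) -> str:
    """Build the report endpoint for a URL without sending any request."""
    return f"{VT_API_BASE}/urls/{vt_url_id(url)}"


# Use the live (re-fanged) URL here; defanged text would yield a different identifier.
print(vt_url_report_endpoint("https://evil-domain.xyz/harvest?id=victim123"))
```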
Analyzing Attachments: VirusTotal and Hybrid Analysis
VirusTotal File Analysis
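Search VirusTotal for the file's hash rather than uploading the file itself, because uploaded files are shared with other VirusTotal users and may contain sensitive data. The report includes: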
- Detection tab: How many AV engines detect it as malicious
- Behavior tab: If the file has been detonated in VT's sandbox, you see process creation, file drops, network connections, and registry changes
- Relations tab: Other files dropped, contacted domains, similar samples
- Community tab: Analyst notes and YARA rule matches
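When a report is retrieved as JSON, the detection ratio can be summarized in a few lines. This sketch assumes the report follows the VirusTotal API v3 layout, with per-verdict counts under `data.attributes.last_analysis_stats`. The sample dictionary is invented for illustration, and counting "suspicious" verdicts as flagged is a choice made here, not necessarily how the web interface computes its ratio.

```python
def detection_summary(vt_file_report: dict) -> str:
    """Summarize flagged engines over engines that returned a verdict."""
    stats = vt_file_report["data"]["attributes"]["last_analysis_stats"]
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    # Engines that timed out or failed are left out of the denominator.
    verdict_keys = ("malicious", "suspicious", "harmless", "undetected")
    total = sum(stats.get(key, 0) for key in verdict_keys)
    return f"{flagged}/{total}"


# Hypothetical response fragment, not real scan data.
SAMPLE_REPORT = {
    "data": {
        "attributes": {
            "last_analysis_stats": {
                "malicious": 12,
                "suspicious": 1,
                "harmless": 0,
                "undetected": 57,
                "timeout": 2,
            }
        }
    }
}
print(detection_summary(SAMPLE_REPORT))
```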
Hybrid Analysis
Hybrid Analysis (hybrid-analysis.com) by CrowdStrike provides deeper behavioral analysis:
- Submit the file for sandbox detonation (Windows 7/10, Linux)
- View process trees, network connections, DNS queries, and file system changes
- See extracted strings, embedded URLs, and dropped payloads
- Review MITRE ATT&CK technique mapping for observed behaviors
| Feature | VirusTotal | Hybrid Analysis |
|---|---|---|
| AV vendor detections | 70+ engines | CrowdStrike Falcon + selected engines |
| Sandbox behavior | Basic (VT sandbox) | Deep (full OS-level behavioral trace) |
| Network capture | DNS/HTTP summary | Full PCAP available for download |
| Process tree | Basic | Detailed with parent-child relationships |
| ATT&CK mapping | Limited | Comprehensive per-behavior mapping |
| Best for | Quick hash lookups and vendor consensus | Deep-dive behavioral analysis |
Using CyberChef for Decoding
CyberChef (gchq.github.io/CyberChef) is the analyst's Swiss Army knife. Phishing emails frequently use encoding to evade detection, and CyberChef can decode virtually anything.
Common Decoding Operations
Base64 Decode: Attackers encode payloads, URLs, or entire scripts in Base64 to bypass email gateways.
Input: aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0
Recipe: From Base64
Output: https://evil-domain.xyz/cred-harvest?user=target
URL Decode: Percent-encoded URLs hide the true destination.
Input: https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345
Recipe: URL Decode
Output: https://evil-domain.xyz/payload?id=12345
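The same two decodings can be reproduced outside CyberChef with Python's standard library, which is useful when you need to decode many strings in a script. The helper names below are hypothetical; the inputs are the two examples above.

```python
import base64
from urllib.parse import unquote


def decode_base64_text(encoded: str) -> str:
    """Decode Base64 to text, restoring any padding the attacker stripped."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")


def decode_percent(encoded: str) -> str:
    """Decode a percent-encoded (URL-encoded) string."""
    return unquote(encoded)


print(decode_base64_text("aHR0cHM6Ly9ldmlsLWRvbWFpbi54eXovY3JlZC1oYXJ2ZXN0P3VzZXI9dGFyZ2V0"))
print(decode_percent("https%3A%2F%2Fevil-domain.xyz%2Fpayload%3Fid%3D12345"))
```

Decoding only reveals the text. Defang the result before pasting it into notes or tickets.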
Extract URLs from HTML: When you have raw HTML source from an email, CyberChef's "Extract URLs" operation pulls every URL from href attributes, script sources, and embedded content.
Recipe: Extract URLs → Defang URL → Sort → Unique
This four-step recipe takes raw HTML and produces a clean, defanged, deduplicated URL list ready for your IOC table.
CyberChef recipes are shareable. You can save a recipe as a URL and share it with your team. In Lab PH-4, you will build several recipes and save them for reuse in future investigations.
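The Extract URLs, Defang URL, Sort, Unique chain can also be approximated in Python for batch processing. This is a rough equivalent, not CyberChef's implementation: the regular expression is a simplification that matches only http and https URLs, the helper names are hypothetical, and only dots in the host are bracketed, so output may differ slightly from CyberChef's.

```python
import re

# Simplified pattern: stop at whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def defang_url(url: str) -> str:
    """Rewrite the scheme to hxxp(s) and bracket the dots in the host."""
    scheme, _, rest = url.partition("://")
    host, sep, path = rest.partition("/")
    scheme = re.sub(r"^http", "hxxp", scheme, flags=re.IGNORECASE)
    return f"{scheme}://{host.replace('.', '[.]')}{sep}{path}"


def extract_defanged_urls(html: str) -> list:
    """Extract, defang, deduplicate, and sort every URL found in the HTML."""
    return sorted({defang_url(url) for url in URL_PATTERN.findall(html)})


SAMPLE_HTML = (
    '<a href="https://evil-domain.xyz/harvest?id=victim123">'
    "https://company.com/secure-login</a>"
    '<img src="https://cdn.evil-domain.xyz/logo.png">'
    '<a href="https://evil-domain.xyz/harvest?id=victim123">Login</a>'
)
for url in extract_defanged_urls(SAMPLE_HTML):
    print(url)
```

Note that the displayed decoy URL is extracted along with the real destinations, so each entry still needs its source recorded in the IOC table.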
Building the IOC Table
Every phishing investigation should produce a structured IOC table. This table becomes the input for blocking rules, SIEM detections, and threat intelligence sharing.
| IOC Type | Value | Source | Context |
|---|---|---|---|
| Email Address | attacker@evil-domain[.]xyz | Return-Path header | Envelope sender |
| Domain | evil-domain[.]xyz | Return-Path, Message-ID | Attacker-controlled sending infrastructure |
| IP Address | 198[.]51[.]100[.]42 | Received header | Originating mail server |
| URL | hxxps://evil-domain[.]xyz/harvest | Email body (href) | Credential harvesting page |
| URL | hxxps://cdn[.]evil-domain[.]xyz/logo[.]png | Email body (img src) | Tracking pixel / brand impersonation asset |
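| File Hash (MD5) | d41d8cd98f00b204e9800998ecf8427e | Attachment | Malicious document (placeholder value: this is the MD5 of an empty file) |
| File Hash (SHA256) | e3b0c44298fc1c149afbf4c8996fb924.... | Attachment | Malicious document (placeholder value: this prefix matches the SHA256 of an empty file) |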
| File Name | Invoice_Q4_2026.docm | Attachment | Macro-enabled document |
Always defang IOCs in your table. Even in internal documents. Some ticketing systems, wikis, and chat tools automatically convert URLs into clickable links. Defanging prevents accidental navigation.
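One way to keep the table consistent across investigations is to model each row as a small record and render the table from code. The sketch below is one possible shape, assuming the four columns used above; the `IOC` class and `to_markdown` function are hypothetical, and the live-URL check is a minimal guard rather than full validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOC:
    """One row of the IOC table; `value` is expected to be defanged already."""
    ioc_type: str
    value: str
    source: str
    context: str

    def __post_init__(self):
        # Minimal guard: refuse values that still carry a live URL scheme.
        if self.value.lower().startswith(("http://", "https://")):
            raise ValueError(f"live URL in IOC table: defang {self.value!r} first")


def to_markdown(iocs: list) -> str:
    """Render IOC records as a pipe table with the lesson's four columns."""
    lines = ["| IOC Type | Value | Source | Context |", "|---|---|---|---|"]
    for ioc in iocs:
        lines.append(f"| {ioc.ioc_type} | {ioc.value} | {ioc.source} | {ioc.context} |")
    return "\n".join(lines)


rows = [
    IOC("Domain", "evil-domain[.]xyz", "Return-Path, Message-ID",
        "Attacker-controlled sending infrastructure"),
    IOC("IP Address", "198[.]51[.]100[.]42", "Received header",
        "Originating mail server"),
    IOC("URL", "hxxps://evil-domain[.]xyz/harvest", "Email body (href)",
        "Credential harvesting page"),
]
print(to_markdown(rows))
```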
OSINT Enrichment of Extracted IOCs
Raw IOCs are useful for blocking. Enriched IOCs tell you the story of the attack — who is behind it, how long the infrastructure has been active, and what other campaigns use the same resources.
Domain Enrichment
| Tool | What You Learn |
|---|---|
| WHOIS lookup (whois.domaintools.com) | Registration date, registrar, registrant info (often privacy-protected) |
| PassiveDNS (VirusTotal Relations tab) | Historical IP resolutions — see if the domain recently changed hosting |
| URLScan.io | Hosting provider, page content, SSL certificate details |
| Shodan (shodan.io) | Open ports, services, technologies running on the IP |
A domain registered 48 hours ago hosting a "Microsoft 365 login page" on a bulletproof hosting provider is almost certainly malicious.
IP Enrichment
| Tool | What You Learn |
|---|---|
| AbuseIPDB (abuseipdb.com) | Abuse reports from other analysts worldwide |
| GreyNoise (greynoise.io) | Whether the IP is a known scanner/noise vs. targeted attacker |
| IPinfo (ipinfo.io) | ASN, geolocation, hosting provider |
| VirusTotal | Files communicating with this IP, URLs hosted on it |
File Hash Enrichment
| Tool | What You Learn |
|---|---|
| VirusTotal | Detection ratio, behavioral analysis, YARA matches, community notes |
| Hybrid Analysis | Full sandbox report — process tree, network calls, dropped files |
| MalwareBazaar (bazaar.abuse.ch) | Malware family classification, associated campaigns, download samples |
| Any.Run (any.run) | Interactive sandbox with visual process tree and network activity |
Enrichment reveals campaign scope. If five different phishing emails use five different sender addresses but all link to the same IP — that IP is the campaign's infrastructure. Blocking that single IP neutralizes all five variants. Without enrichment, you would block five addresses and miss the common denominator.
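The correlation described above amounts to grouping emails by the infrastructure they point to. A minimal sketch follows, assuming each email has already been reduced to a sender address and the set of IPs its links resolve to; the sample records and the function name `shared_infrastructure` are invented for illustration.

```python
from collections import defaultdict


def shared_infrastructure(emails: list, min_emails: int = 2) -> dict:
    """Map each IP to the senders that link to it, keeping IPs shared by several emails."""
    senders_by_ip = defaultdict(set)
    for email in emails:
        for ip in email["linked_ips"]:
            senders_by_ip[ip].add(email["sender"])
    return {
        ip: sorted(senders)
        for ip, senders in senders_by_ip.items()
        if len(senders) >= min_emails
    }


# Hypothetical enrichment results for five reported emails.
EMAILS = [
    {"sender": "billing@sender-one.example", "linked_ips": {"198.51.100.42"}},
    {"sender": "hr@sender-two.example", "linked_ips": {"198.51.100.42"}},
    {"sender": "it@sender-three.example", "linked_ips": {"198.51.100.42", "203.0.113.7"}},
    {"sender": "ceo@sender-four.example", "linked_ips": {"198.51.100.42"}},
    {"sender": "docs@sender-five.example", "linked_ips": {"198.51.100.42"}},
]
for ip, senders in shared_infrastructure(EMAILS).items():
    print(f"{ip} is shared by {len(senders)} senders")
```

An IP that appears for only one email, such as the second address in the third record, is filtered out, leaving the common denominator worth blocking.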
Putting It All Together: The Extraction Workflow
Here is the systematic workflow you will follow in Lab PH-4:
- Save the raw email — download the .eml file or copy the full source
- Extract sender artifacts — From, Return-Path, Reply-To, originating IP, Message-ID domain
- Extract URLs — both visible text and href destinations; use CyberChef "Extract URLs" on HTML source
- Defang everything — all URLs, IPs, and email addresses in your working notes
- Hash attachments — MD5 and SHA256 without opening the file
- Analyze URLs — submit to URLScan.io and VirusTotal
- Analyze attachments — submit hash to VirusTotal, detonate in Hybrid Analysis if needed
- Decode obfuscated content — use CyberChef for Base64, URL encoding, HTML entities
- Build the IOC table — structured, defanged, with source and context columns
- Enrich IOCs — WHOIS, PassiveDNS, AbuseIPDB, GreyNoise for each indicator
- Document findings — feed the IOC table into your investigation report
This workflow is not just for phishing. The same extraction and enrichment process applies to any email-borne threat — BEC, malware delivery, invoice fraud, and even legitimate security notifications that need verification. Master this workflow once and you can apply it everywhere.
Key Takeaways
- A single phishing email can yield dozens of IOCs: sender addresses, domains, IPs, URLs, file hashes, and embedded objects
- Always defang URLs, IPs, and email addresses before sharing — this is a professional standard, not optional
- Generate both MD5 and SHA256 hashes for attachments — MD5 for legacy lookups, SHA256 for modern IOC sharing
- Use URLScan.io for visual and network analysis of URLs, and VirusTotal for vendor consensus and historical associations
- CyberChef is essential for decoding Base64, URL-encoded strings, and extracting URLs from raw HTML source
- Build a structured IOC table with type, value, source, and context columns — this feeds blocking rules and SIEM detections
- OSINT enrichment transforms raw IOCs into campaign intelligence: WHOIS, PassiveDNS, AbuseIPDB, and sandbox analysis reveal the attacker's infrastructure and scope
What's Next
You now know how to tear apart a phishing email and extract every artifact it contains. In Lesson PH-5: Defensive Measures & Response, you will learn what to do with those artifacts — how to block them across your email gateway, firewall, and SIEM, how to check whether anyone else in your organization clicked the link or submitted credentials, and how to build a phishing response process that scales. In Lab PH-4, you will put this lesson into practice by extracting artifacts from a realistic phishing email, analyzing each IOC, and building a complete IOC table.
Knowledge Check: Artifact Extraction & Analysis
What is the primary goal of artifact extraction from a phishing email?
Why should analysts generate both MD5 and SHA256 hashes for a suspicious attachment?
What does defanging a URL accomplish, and which of the following is a correctly defanged URL?
In Lab PH-4, you extract URLs from a phishing email's HTML source using CyberChef. Which recipe chain produces a clean, defanged, deduplicated URL list?
What key difference exists between URLScan.io and VirusTotal for URL analysis?
An attacker encodes a malicious URL in Base64 within a phishing email. What CyberChef operation reveals the hidden URL?
When building an IOC table, what four columns should every entry include?
In Lab PH-4, you discover that five different phishing emails all link to the same IP address. What does OSINT enrichment reveal in this scenario?
Which tool is best suited for deep behavioral analysis of a suspicious attachment, including full process trees and PCAP downloads?
Why is the Return-Path header often more valuable than the From header when extracting sender IOCs?