- Identify the 8 core log source categories that feed a production SIEM, plus additional sources you'll encounter in enterprise environments
- Understand what each category tells you and which attack phases it covers
- Know the critical Windows Event IDs every SOC analyst must recognize on sight
- Explain what Linux log files reveal about system and user activity
- Map log sources to MITRE ATT&CK tactics to understand where your visibility starts and stops
- Recognize common log formats (Syslog, JSON, CEF, Windows Event XML) and why format matters for SIEM parsing
- Understand the five methods logs use to reach the SIEM (agents, syslog, APIs, forwarders, agentless)
- Recognize log source gaps and why no single source is enough

**Three Scenarios in Your Lab.** The Wazuh lab environment contains alerts from three concurrent incidents: **Operation APT Breach** (external attacker, 06:00-09:00, CDB-enriched), **Operation Inside Out** (insider threat, 18:00-21:00), and **Operation LOLBin Strike** (cryptominer via LOLBins, 10:00-13:00). Each scenario produces distinct log types and detection patterns that you'll analyze in the corresponding labs. The insider and LOLBin scenarios expand log source coverage with **PowerShell Script Block Logging (Event ID 4104)**, **Windows Defender events (1116/1117)**, **object access auditing (4663)**, and **explicit credential use (4648)** — giving you hands-on exposure to log sources beyond the core 8 categories.

## Why Log Sources Are Everything In Module 1, you learned that a SIEM is the central nervous system of the SOC. But a SIEM is only as good as the data flowing into it. If a log source isn't connected, the SIEM can't see it — and neither can you. Every blind spot in your log coverage is an opportunity for an attacker to operate undetected. In Lab 1.3, you explored **12 log sources** inside Wazuh across 12 agents and built a Log Source Reference Sheet. Now we're going to expand that foundation. A real enterprise SOC doesn't monitor 12 sources — it monitors **hundreds**, organized into 8 major categories. Understanding these categories is what separates a junior analyst who reacts to alerts from a senior analyst who understands the full picture. > **The #1 question every SOC analyst should ask on day one: "What log sources do we have — and what are we missing?"** ![8 Log Source Categories in the SOC](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-02/lesson-2-1/eight-log-source-categories.png) --- ## Category 1: Windows Event Logs Windows Event Logs are the single most important log source in most enterprise SOCs because the vast majority of corporate endpoints and servers run Windows. The Windows Security channel alone generates the events you'll spend 60-70% of your time investigating. 
![Windows Event IDs — The Essential Cheat Sheet](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-02/lesson-2-1/windows-event-ids-cheatsheet.png)

### The Must-Know Event IDs

| Event ID | Channel | What It Logs | ATT&CK Relevance |
|----------|---------|-------------|-------------------|
| **4624** | Security | Successful logon | Lateral movement (T1021), valid accounts (T1078) |
| **4625** | Security | Failed logon | Brute force (T1110), password spraying |
| **4688** | Security | New process created | Execution (T1059), command-line logging |
| **4697** | Security | Service installed | Persistence (T1543.003) |
| **4720** | Security | User account created | Create account (T1136) |
| **7045** | System | New service registered | Persistence (T1543.003); overlaps with 4697 |
| **1102** | Security | Audit log cleared | Defense evasion (T1070.001); always critical |
| **4648** | Security | Explicit credential logon | Credential use (runas, scheduled tasks) |
| **4672** | Security | Special privileges assigned | Admin logon; tracks who got elevated access |

### Logon Types: The Context That Changes Everything

When you see Event ID 4624 or 4625, the **logon type** field tells you *how* the authentication happened. This single field can mean the difference between a routine event and an active intrusion:

| Type | Name | What It Means | Suspicious When... |
|------|------|--------------|---------------------|
| **2** | Interactive | Keyboard login at the console | Happens outside business hours on a server |
| **3** | Network | Remote resource access (SMB, RPC, WMI) | Source IP is external or from an unexpected subnet |
| **5** | Service | Windows service starting | Normal, unless it's a service you didn't install |
| **7** | Unlock | Workstation unlocked | Rarely suspicious on its own |
| **10** | RemoteInteractive | RDP session | Source IP is external, or user doesn't normally RDP |

**Type 3 (Network) Is Where Lateral Movement Lives.** When an attacker uses stolen credentials to access file shares (SMB), run commands via WMI, or execute PsExec, it generates a Type 3 logon. If you see Type 3 from an unusual source IP — especially hopping between servers — that's your lateral movement indicator.
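The logon-type triage described above can be sketched in a few lines of Python. This is a minimal illustration, not how Wazuh evaluates rules: the event field names (`event_id`, `logon_type`, `source_ip`) and the `EXPECTED_SUBNETS` allow-list are assumptions made for the example.

```python
import ipaddress

# Hypothetical allow-list: subnets from which remote logons are expected.
EXPECTED_SUBNETS = [ipaddress.ip_network("10.0.1.0/24")]


def is_suspicious_logon(event):
    """Flag a successful Type 3 or Type 10 logon (4624) from outside the expected subnets.

    `event` is a plain dict with assumed keys: event_id, logon_type, source_ip.
    """
    if event.get("event_id") != 4624:
        return False
    if event.get("logon_type") not in (3, 10):
        return False
    try:
        src = ipaddress.ip_address(event.get("source_ip", ""))
    except ValueError:
        # Local logons often carry "-" or an empty source address.
        return False
    return not any(src in net for net in EXPECTED_SUBNETS)


if __name__ == "__main__":
    sample = {"event_id": 4624, "logon_type": 3, "source_ip": "185.220.101.42"}
    print(is_suspicious_logon(sample))
```

In practice the allow-list would come from your network documentation, and a hit is a lead to investigate rather than proof of lateral movement.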

### What Windows Event Logs Don't Tell You Windows Security logs are powerful, but they have blind spots: - **No command-line arguments** in 4688 unless you enable "Include command line in process creation events" via Group Policy - **No DLL loading, registry changes, or network connections per process** — that's what Sysmon adds (Lesson 2.10) - **No file content analysis** — you know a file was created but not what's in it (that's YARA, Module 10) --- ## Category 2: Linux / Syslog Linux systems generate logs through the **syslog** facility and dedicated log files. In a SOC, you'll encounter Linux logs from web servers, DNS servers, database servers, network appliances, and cloud instances. ### The Critical Log Files | Log File | What It Contains | SOC Relevance | |----------|-----------------|---------------| | **/var/log/auth.log** (Debian/Ubuntu) or **/var/log/secure** (RHEL/CentOS) | SSH logins, sudo commands, PAM (Pluggable Authentication Modules) authentication | Brute force (T1110), privilege escalation (T1548), valid accounts (T1078) | | **/var/log/syslog** or **/var/log/messages** | System events, service starts/stops, kernel messages | Persistence (cron, services), system manipulation | | **/var/log/audit/audit.log** | Detailed audit trail (if auditd is enabled) | Process execution, file access, syscalls — Linux's equivalent of deep telemetry | | **/var/log/cron** | Cron job execution | Persistence via scheduled tasks (T1053.003) | | **/var/log/kern.log** | Kernel-level messages | Firewall drops (iptables), hardware issues, kernel exploits | ### Reading auth.log — What Matters A single SSH brute force attempt in auth.log looks like this: ``` Feb 15 06:15:02 linux-web-01 sshd[5102]: Invalid user admin from 185.220.101.42 port 44891 Feb 15 06:15:14 linux-web-01 sshd[5104]: Failed password for root from 185.220.101.42 port 44903 ``` The fields that matter for triage: - **Timestamp** — when did it happen? - **Hostname** — which server was targeted? 
- **Program**: sshd, sudo, cron, etc.
- **Source IP**: internal (expected) or external (investigate)?
- **Username**: valid user or random guess?
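To make the triage fields concrete, here is a rough Python sketch that pulls them out of the two sample lines above. The regular expressions are assumptions fitted to this traditional syslog layout and to these two sshd messages; a SIEM decoder handles many more variants.

```python
import re

# Traditional syslog prefix: "Mon DD HH:MM:SS host program[pid]: message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Two sshd failure messages; other wordings are not covered by this sketch.
SSH_RE = re.compile(
    r"(?:Invalid user|Failed password for(?: invalid user)?) "
    r"(?P<username>\S+) from (?P<source_ip>\S+) port (?P<port>\d+)"
)


def parse_auth_line(line):
    """Return a dict of triage fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    ssh = SSH_RE.search(fields["message"])
    if ssh:
        fields.update(ssh.groupdict())
    return fields


if __name__ == "__main__":
    line = ("Feb 15 06:15:14 linux-web-01 sshd[5104]: "
            "Failed password for root from 185.220.101.42 port 44903")
    print(parse_auth_line(line))
```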

**Recognize This IP?** The source IP `185.220.101.42` is the "Operation Shadow Broker" attacker you'll track across every module in this course. You first encountered this IP in Lab 1.1, and you'll see it again in phishing (Module 5), network detection (Module 6), threat intelligence (Module 7), and the final capstone (Module 18).

### Linux Audit Framework (auditd) When auditd is enabled, it provides the deepest Linux visibility — comparable to what Sysmon gives on Windows. It can log: - Every process execution with full command lines - File access and permission changes - Network socket creation - System calls Most production Linux servers in security-conscious organizations run auditd. If you see `/var/log/audit/audit.log` events in your SIEM, you have premium Linux visibility.

**Career Note:** Many SOC job postings list "experience with Linux log analysis" as a requirement. What they really mean is: can you read auth.log, understand sudo events, and spot anomalies in syslog? Lab 1.3 already gave you hands-on practice with exactly this.

--- ## Category 3: Firewall & Network Device Logs Firewalls sit at network boundaries and log every connection they allow or block. They're your perimeter visibility — the first and last line of defense. ### What Firewall Logs Contain | Field | What It Tells You | |-------|-------------------| | **Source IP** | Who initiated the connection | | **Destination IP** | What internal system they targeted | | **Destination Port** | What service they tried to reach (22=SSH, 443=HTTPS, 3389=RDP) | | **Action** | Allow or deny/drop | | **Protocol** | TCP, UDP, ICMP | | **Bytes transferred** | Volume of data (relevant for exfiltration) | ### Common Firewall Platforms You'll Encounter | Platform | Log Format | How It Reaches the SIEM | |----------|-----------|------------------------| | **iptables/nftables** (Linux) | Kernel syslog messages | Syslog forwarding | | **Palo Alto Networks** | Structured CSV or CEF (Common Event Format) | Syslog or API integration | | **Fortinet FortiGate** | Key-value pairs | Syslog forwarding | | **Cisco ASA** | Syslog with message codes | Syslog forwarding | | **pfSense** | BSD syslog format | Syslog forwarding | | **AWS Security Groups / NACLs** | VPC Flow Logs (JSON) | CloudWatch → SIEM connector | ### What to Look For - **Repeated drops to the same port from one IP** — port scanning (reconnaissance) - **Drops on port 4444, 5555, or other non-standard ports** — reverse shell attempts - **Allowed connections to known-bad IPs** — C2 communication that got through - **Large outbound data transfers** — potential exfiltration - **Internal-to-internal drops** — misconfiguration or lateral movement attempts

**Firewall Logs Show You What Was Blocked — And What Got Through.** A "deny" event means the attack was stopped at the perimeter. An "allow" to a suspicious destination means it wasn't. Both are equally important: denies tell you who's knocking, allows tell you who got in.
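One way to act on the "What to Look For" patterns is to summarize denied connections per source IP. The sketch below assumes firewall events have already been parsed into dicts with hypothetical keys `src_ip`, `dst_port`, and `action`; the threshold of 20 denies is an arbitrary illustration, not a recommended value.

```python
from collections import defaultdict


def summarize_denies(events):
    """Count denied connections and distinct destination ports per source IP."""
    summary = defaultdict(lambda: {"denies": 0, "ports": set()})
    for e in events:
        if e["action"].lower() in ("deny", "drop"):
            entry = summary[e["src_ip"]]
            entry["denies"] += 1
            entry["ports"].add(e["dst_port"])
    return dict(summary)


def noisy_sources(events, min_denies=20):
    """Return source IPs with at least `min_denies` denied connections."""
    return sorted(
        ip for ip, s in summarize_denies(events).items() if s["denies"] >= min_denies
    )


if __name__ == "__main__":
    # Synthetic events: one source probing 25 different ports, all denied.
    events = [
        {"src_ip": "185.220.101.42", "dst_port": p, "action": "deny"}
        for p in range(20, 45)
    ]
    print(noisy_sources(events))
```

A source with many denies against many distinct ports suggests scanning, while many denies against one port points more toward a targeted service probe or a misconfiguration.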

--- ## Category 4: DNS Query Logs Every network connection starts with a DNS query. Before malware can call home to `evil-c2.example.com`, it has to resolve that domain to an IP address. DNS logs capture every one of these queries. ### Why DNS Logs Are a SOC Goldmine 1. **C2 Detection** — Malware must resolve its C2 domain. DNS logs record it even if the actual C2 traffic is encrypted. 2. **DNS Tunneling** — Attackers encode data inside DNS queries to exfiltrate information. These show up as unusually long subdomains or high query volumes to a single domain. 3. **DGA Detection** — Domain Generation Algorithms produce random-looking domains (`xk7gf2p9.net`). A spike in queries to newly-registered or algorithmically-generated domains is a strong malware indicator. 4. **Shadow IT Discovery** — DNS logs reveal what cloud services employees are using (Dropbox, personal email, unauthorized SaaS). ### What DNS Log Fields Matter | Field | What It Tells You | |-------|-------------------| | **Client IP** | Which internal host made the query | | **Queried domain** | What they tried to resolve | | **Query type** | A (IPv4), AAAA (IPv6), MX (mail), TXT (often used for tunneling) | | **Response code** | NOERROR (found), NXDOMAIN (doesn't exist), SERVFAIL | | **Timestamp** | When the query happened | ### Suspicious DNS Patterns | Pattern | What It May Indicate | |---------|---------------------| | Queries to domains with random characters | DGA malware (T1568.002) | | Very long subdomain strings (50+ chars) | DNS tunneling / exfiltration (T1071.004) | | High volume of NXDOMAIN responses | DGA probing or misconfigured malware | | Queries to recently registered domains (< 30 days) | Newly staged C2 infrastructure | | TXT record queries to unusual domains | DNS-based data exfiltration |

**DNS Never Lies.** An attacker can encrypt their C2 traffic, use legitimate cloud services for hosting, and blend into normal HTTPS traffic. But they can't avoid DNS resolution (unless they hardcode IPs, which limits flexibility). This makes DNS one of the most reliable detection sources across the entire kill chain.
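Two of the suspicious DNS patterns from the table, very long subdomains and a high share of NXDOMAIN responses, are simple enough to sketch in Python. The record keys (`client_ip`, `rcode`) are assumed names, and the subdomain check assumes a two-label registered domain such as `example.com`; multi-part suffixes like `co.uk` would need a public-suffix list.

```python
from collections import Counter


def long_subdomain(domain, min_len=50):
    """True if the part left of the registered domain is at least min_len characters.

    Assumes the registered domain is the last two labels.
    """
    labels = domain.rstrip(".").split(".")
    subdomain = ".".join(labels[:-2])
    return len(subdomain) >= min_len


def nxdomain_ratio_by_client(records):
    """Return {client_ip: share of that client's queries answered with NXDOMAIN}."""
    total, nx = Counter(), Counter()
    for r in records:
        total[r["client_ip"]] += 1
        if r["rcode"] == "NXDOMAIN":
            nx[r["client_ip"]] += 1
    return {ip: nx[ip] / total[ip] for ip in total}


if __name__ == "__main__":
    print(long_subdomain("a" * 60 + ".evil-c2.example.com"))
    print(long_subdomain("www.example.com"))
```

What counts as a "high" NXDOMAIN ratio depends on the environment, so compare each client against its own baseline before alerting.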

--- ## Category 5: Web Proxy / HTTP Logs A web proxy (also called a Secure Web Gateway) sits between users and the internet. It inspects, logs, and optionally blocks web traffic. In organizations that route all web traffic through a proxy, these logs are a treasure trove. ### What Proxy Logs Capture | Field | What It Tells You | |-------|-------------------| | **User / Client IP** | Who made the request | | **URL** | Full URL including path and parameters | | **HTTP Method** | GET, POST, PUT, DELETE | | **Response Code** | 200 (OK), 403 (blocked), 404 (not found), 500 (server error) | | **Content Type** | Was it HTML, JavaScript, an executable, a ZIP file? | | **User-Agent** | What browser or tool made the request | | **Bytes transferred** | How much data was uploaded or downloaded | | **Category** | Proxy's classification (business, social media, malware, uncategorized) | ### SOC Use Cases - **Malware delivery detection** — User visited a compromised website, proxy logged the URL and the `.exe` download - **C2 callback identification** — Infected host makes periodic HTTPS connections to `cdn-static-assets.xyz` every 60 seconds - **Data exfiltration** — Large POST requests to an uncategorized domain at 3 AM - **Policy violation** — Employee accessing unauthorized file-sharing or personal cloud storage - **Phishing follow-through** — After clicking a link in email, what did they actually browse to? ### Common Proxy Platforms | Platform | Deployment | Notes | |----------|-----------|-------| | **Zscaler** | Cloud-based | Very common in modern enterprises | | **Symantec/BlueCoat ProxySG** | On-prem appliance | Legacy but still widespread | | **Squid** | Open-source, on-prem | Often used in smaller orgs | | **McAfee Web Gateway** | On-prem or cloud | Enterprise-grade | | **Microsoft Defender for Cloud Apps** | Cloud (M365) | CASB functionality |

**Proxy + DNS = Network Visibility Duo.** DNS tells you *what domains* were resolved. Proxy tells you *what content was actually accessed*. Together, they give you near-complete visibility into outbound web activity, even when the traffic is encrypted, provided the proxy is configured to terminate and inspect TLS.
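The C2 callback use case in this category (a host contacting the same domain at a fixed interval) can be approximated by measuring how regular the gaps between requests are. This is a simplified sketch: the input is assumed to be epoch-second timestamps for one host and one destination, and the `max_jitter` and `min_events` thresholds are illustrative guesses.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Return (mean_interval, jitter_ratio) or None if there is too little data.

    jitter_ratio is the standard deviation of the intervals divided by their mean;
    values near 0 mean the requests are evenly spaced.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return None
    intervals = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(intervals)
    if avg == 0:
        return None
    return avg, pstdev(intervals) / avg


def looks_like_beacon(timestamps, max_jitter=0.1, min_events=5):
    """True if there are enough requests and they are nearly evenly spaced."""
    if len(timestamps) < min_events:
        return False
    score = beacon_score(timestamps)
    return score is not None and score[1] <= max_jitter


if __name__ == "__main__":
    every_minute = [0, 60, 120, 180, 240, 300]
    print(looks_like_beacon(every_minute))
```

Legitimate software (update checks, telemetry, keep-alives) also produces regular traffic, so a regular pattern is most useful when combined with the proxy's category and the destination's reputation.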

--- ## Category 6: Email Gateway Logs Phishing remains the #1 initial access vector in real-world attacks. In Lab 1.2, the APT29 scenario started with spearphishing emails (T1566.001). Email gateway logs are where you detect and investigate these attacks. ### What Email Gateway Logs Capture | Field | What It Tells You | |-------|-------------------| | **Sender address** | Who sent the email (and whether it's spoofed) | | **Recipient** | Who was targeted | | **Subject line** | Social engineering context | | **Attachment name / type** | `invoice.docm`, `urgent-review.pdf.exe` | | **URLs in body** | Phishing links, credential harvester URLs | | **Verdict** | Delivered, quarantined, blocked | | **SPF/DKIM/DMARC results** | Email authentication — did it pass? | | **Threat classification** | Phishing, malware, spam, BEC | ### Why Email Logs Are Critical for Investigation When a phishing campaign hits your organization, the SIEM alert might only show one user who clicked. Email gateway logs answer the harder questions: 1. **How many people received the same email?** (scope of the campaign) 2. **Did anyone else click the link or open the attachment?** (other potential victims) 3. **Was the email quarantined or delivered?** (do you need to pull it from mailboxes?) 4. **What was the sender domain and IP?** (IOCs for threat intel — Module 7) ### Common Email Security Platforms | Platform | Type | Notes | |----------|------|-------| | **Microsoft Defender for Office 365** | Cloud (M365) | Built into most enterprise email | | **Proofpoint** | Cloud gateway | Market leader in email security | | **Mimecast** | Cloud gateway | Strong attachment sandboxing | | **Barracuda** | Cloud/on-prem | Mid-market | | **Google Workspace Security** | Cloud (Gmail) | Built into Google Workspace |

**The Gap Between "Blocked" and "Delivered."** An email gateway might block 99% of phishing attempts. That sounds great until you realize that in a 10,000-employee organization receiving 1,000 phishing emails per day, "99% blocked" means **10 phishing emails land in inboxes every single day**. Those 10 are why SOC analysts exist.
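The campaign-scoping questions from this category (who received the message, and was it delivered or stopped?) amount to grouping gateway records by verdict. The sketch below assumes records are dicts with hypothetical keys `sender`, `recipient`, `subject`, and `verdict`; real gateways expose this through their own search or API.

```python
from collections import defaultdict


def scope_campaign(records, sender_domain, subject):
    """Group recipients of matching messages by gateway verdict.

    A message matches when the sender is in `sender_domain` and the subject
    is identical; real campaigns often vary both, so treat this as a start.
    """
    by_verdict = defaultdict(set)
    suffix = "@" + sender_domain.lower()
    for r in records:
        if r["sender"].lower().endswith(suffix) and r["subject"] == subject:
            by_verdict[r["verdict"]].add(r["recipient"])
    return {verdict: sorted(rcpts) for verdict, rcpts in by_verdict.items()}


if __name__ == "__main__":
    records = [
        {"sender": "billing@bad.example", "recipient": "alice@corp.example",
         "subject": "Invoice overdue", "verdict": "delivered"},
        {"sender": "billing@bad.example", "recipient": "bob@corp.example",
         "subject": "Invoice overdue", "verdict": "quarantined"},
    ]
    print(scope_campaign(records, "bad.example", "Invoice overdue"))
```

The "delivered" group is the one that needs follow-up: those mailboxes still hold the message, and those users may have clicked.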

---

## Category 7: Cloud Audit Trails

Modern organizations run hybrid environments: on-premises servers plus cloud infrastructure (AWS, Azure, GCP) plus SaaS applications (M365, Salesforce, Slack). Each of these generates audit logs that track who did what.

### The Major Cloud Audit Sources

| Source | Platform | What It Logs |
|--------|----------|-------------|
| **AWS CloudTrail** | AWS | API calls (EC2 launches, S3 access, IAM changes, console logins) |
| **Azure AD / Entra ID Sign-in Logs** | Microsoft | User sign-ins, MFA challenges, conditional access results, risky sign-ins |
| **Microsoft 365 Unified Audit Log** | M365 | Email access, SharePoint file operations, Teams activity, admin changes |
| **Google Workspace Audit** | Google | Gmail access, Drive sharing, admin console changes |
| **GCP Cloud Audit Logs** | GCP | Admin activity, data access, system events |

### Why Cloud Logs Matter More Every Year

- **Identity is the new perimeter.** In cloud environments, there's no firewall between the attacker and your data, just an identity (username + password + MFA). Cloud audit logs track every authentication and authorization decision.
- **Attackers target cloud directly.** Credential stuffing against Azure AD, phishing for M365 tokens, and compromised AWS access keys all skip your on-prem defenses entirely.
- **Data lives in the cloud.** If an attacker exfiltrates data from SharePoint or S3, the only log that records it is the cloud audit trail.

### Key Cloud Events to Monitor | Event | Platform | Why It Matters | |-------|----------|---------------| | Console login from unusual location | AWS CloudTrail / Azure AD | Compromised credentials (T1078.004) | | MFA bypass or disabled | Azure AD / M365 | Attacker removing security controls | | IAM policy changed | AWS CloudTrail | Privilege escalation in cloud (T1098) | | S3 bucket made public | AWS CloudTrail | Data exposure — accidental or malicious | | Mail forwarding rule created | M365 | Business Email Compromise (BEC) persistence | | Mass file download from SharePoint | M365 | Data exfiltration (T1530) |

**Cloud Logs Are the Fastest Growing Category.** Five years ago, most SOCs monitored only on-prem logs. Today, cloud audit trails often generate more events than traditional sources. In some cloud-native organizations, CloudTrail and Azure AD logs are the *primary* data sources in the SIEM.
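As a rough illustration of monitoring key cloud events, the sketch below classifies a single CloudTrail record that has already been parsed from JSON. The field names used (`eventName`, `responseElements.ConsoleLogin`, `additionalEventData.MFAUsed`) follow the CloudTrail record layout as I understand it; the short list of IAM event names and the returned labels are assumptions for this example and are far from complete.

```python
# A few IAM API actions that change permissions; deliberately not exhaustive.
IAM_POLICY_EVENTS = {
    "AttachUserPolicy",
    "AttachRolePolicy",
    "PutUserPolicy",
    "PutRolePolicy",
    "CreatePolicyVersion",
}


def classify_cloudtrail(event):
    """Return a short label for events worth a second look, else None."""
    name = event.get("eventName")
    if name == "ConsoleLogin":
        success = (event.get("responseElements") or {}).get("ConsoleLogin") == "Success"
        mfa_used = (event.get("additionalEventData") or {}).get("MFAUsed")
        if success and mfa_used == "No":
            return "console_login_without_mfa"
        return None
    if name in IAM_POLICY_EVENTS:
        return "iam_policy_changed"
    return None


if __name__ == "__main__":
    login = {
        "eventName": "ConsoleLogin",
        "sourceIPAddress": "185.220.101.42",
        "responseElements": {"ConsoleLogin": "Success"},
        "additionalEventData": {"MFAUsed": "No"},
    }
    print(classify_cloudtrail(login))
```

A label like this is only a starting point: the next step is checking `sourceIPAddress` and the identity involved against what is normal for that account.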

--- ## Category 8: Application & Database Logs Every application generates logs — web servers, databases, custom business applications, middleware. These logs are often overlooked in SOCs but contain evidence that no other source captures. ### Web Server Logs | Source | Log File | What It Records | |--------|----------|----------------| | **Apache** | access.log, error.log | Every HTTP request: IP, URL, method, status code, user-agent | | **Nginx** | access.log, error.log | Same as Apache with different format | | **IIS** | W3SVC logs | Windows web server access logs | Web server logs detect: - **SQL injection attempts** — `/api/users?id=1' OR '1'='1` (T1190) - **Web shell access** — Repeated requests to `/uploads/shell.php` from a single IP - **Directory traversal** — `/../../etc/passwd` in URL paths - **Vulnerability scanning** — Rapid requests to known vulnerable paths (`/wp-login.php`, `/phpmyadmin`, `/.env`) ### Database Audit Logs | Database | Audit Feature | What It Records | |----------|--------------|-----------------| | **MySQL / MariaDB** | General query log, audit plugin | Every SQL query executed | | **PostgreSQL** | pgaudit extension | Query logging with parameters | | **MSSQL** | SQL Server Audit | Login events, schema changes, query execution | | **Oracle** | Unified Auditing | Comprehensive query and access logging | Database audit logs detect: - **Unauthorized data access** — SELECT queries on sensitive tables from unusual users - **Data manipulation** — UPDATE/DELETE on critical records - **Schema changes** — DROP TABLE, ALTER TABLE from non-admin accounts - **Privilege escalation** — GRANT commands giving excessive permissions ### Custom Application Logs Many organizations build custom applications that generate their own logs. 
These often contain business context that no other log source provides:

- **Authentication events** specific to the application
- **Business logic violations** (e.g., transferring more than a threshold amount)
- **API access patterns** that indicate scraping or abuse

**The Overlooked Goldmine.** Web server logs are available on virtually every organization's web-facing systems but are often not forwarded to the SIEM. If you join a SOC and discover that Apache/Nginx access logs aren't being collected, flag it immediately — you're blind to web application attacks.
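The web attack patterns listed under Web Server Logs can be spotted with simple signatures once the request URL is decoded. The sketch below assumes Apache/Nginx common or combined log format; the three regular expressions are deliberately naive examples and would miss most real-world evasion.

```python
import re
from urllib.parse import unquote_plus

# Matches the start of a common/combined access log line.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) '
)

# Naive illustrative signatures, keyed by the attack type they hint at.
SIGNATURES = {
    "sql_injection": re.compile(r"'\s*or\s*'?\d+'?\s*=\s*'?\d+", re.I),
    "directory_traversal": re.compile(r"\.\./"),
    "scanner_path": re.compile(r"^/(wp-login\.php|phpmyadmin|\.env)\b", re.I),
}


def classify_request(line):
    """Return parsed fields plus the names of any signatures the URL matches."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    url = unquote_plus(m.group("url"))  # decode %27, %20, '+' and similar
    hits = sorted(name for name, rx in SIGNATURES.items() if rx.search(url))
    return {
        "ip": m.group("ip"),
        "status": int(m.group("status")),
        "url": url,
        "matches": hits,
    }


if __name__ == "__main__":
    # Made-up sample line for illustration.
    line = ('185.220.101.42 - - [15/Feb/2024:06:20:11 +0000] '
            '"GET /api/users?id=1%27%20OR%20%271%27=%271 HTTP/1.1" 200 512')
    print(classify_request(line))
```

The status code matters as much as the match: a 404 on a probe is noise, while a 200 on a SQL injection string or a web shell path deserves immediate attention.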

--- ## Connecting Log Sources to ATT&CK In Lab 1.2, you mapped 15 APT29 techniques to the ATT&CK framework and color-coded them by detection capability. Now let's see which log sources cover which tactics: ![Log Sources Mapped to ATT&CK Tactics](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-02/lesson-2-1/log-sources-attack-mapping.png) | ATT&CK Tactic | Primary Log Sources | Why | |----------------|--------------------|----| | **Initial Access** | Email gateway, web proxy, firewall | Phishing emails, drive-by downloads, and exploit attempts arrive through these channels | | **Execution** | Windows 4688, Linux audit.log, application logs | Process creation captures what ran; app logs capture web-based execution | | **Persistence** | Windows 7045/4697, Linux syslog/cron, cloud audit | New services, scheduled tasks, and IAM changes establish long-term access | | **Defense Evasion** | Windows events, cloud audit, FIM | Log clearing (1102), policy changes, and file modifications reveal evasion attempts | | **Credential Access** | Windows 4625, Linux auth.log, cloud sign-in logs | Failed authentication across all platforms tracks brute force and credential theft | | **Discovery** | DNS logs, Windows events, cloud audit | Network reconnaissance generates DNS queries and enumeration events | | **Lateral Movement** | Windows 4624 (Type 3), firewall logs | Network logons between internal systems and allowed internal traffic patterns | | **Exfiltration** | Web proxy, DNS logs, firewall logs | Outbound data transfers, DNS tunneling, and large file uploads to external services |

**No Single Log Source Covers the Full Kill Chain.** This is the most important takeaway of this lesson. Windows Event Logs alone miss initial access (email), network C2 (DNS/proxy), and cloud attacks entirely. A SOC that only monitors Windows events has massive blind spots. Defense in depth requires *log sources* in depth.
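The mapping table above can be turned into a small coverage check: given the sources a SOC actually collects, which tactics have no primary source at all? The dictionary below mirrors the table's rows; the short source identifiers (for example `windows_events`, `cloud_audit`) are labels invented for this sketch.

```python
# Primary log sources per ATT&CK tactic, transcribed from the table above.
TACTIC_SOURCES = {
    "Initial Access": {"email_gateway", "web_proxy", "firewall"},
    "Execution": {"windows_events", "linux_audit", "application_logs"},
    "Persistence": {"windows_events", "linux_syslog", "cloud_audit"},
    "Defense Evasion": {"windows_events", "cloud_audit", "fim"},
    "Credential Access": {"windows_events", "linux_auth", "cloud_audit"},
    "Discovery": {"dns", "windows_events", "cloud_audit"},
    "Lateral Movement": {"windows_events", "firewall"},
    "Exfiltration": {"web_proxy", "dns", "firewall"},
}


def coverage_report(collected):
    """Return {tactic: sorted list of collected sources that cover it}."""
    collected = set(collected)
    return {
        tactic: sorted(sources & collected)
        for tactic, sources in TACTIC_SOURCES.items()
    }


def blind_spots(collected):
    """Return the tactics for which none of the primary sources is collected."""
    return [t for t, covered in coverage_report(collected).items() if not covered]


if __name__ == "__main__":
    print(blind_spots({"windows_events"}))
```

Running it with only Windows events collected reports Initial Access and Exfiltration as uncovered, which is the blind spot this lesson warns about.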

---

## The Log Source Priority Matrix

Not every log source is equally important. If you're building a SOC from scratch or evaluating coverage, here's how to prioritize:

| Priority | Log Sources | Why First |
|----------|-----------|-----------|
| **Tier 1: Must Have** | Windows Security Events, Linux auth.log, Firewall logs, DNS logs | These cover authentication, process execution, network boundaries, and name resolution: the four pillars of visibility |
| **Tier 2: High Value** | Email gateway, Web proxy, Cloud audit trails (Azure AD / CloudTrail) | These cover the top initial access vector (email), outbound traffic inspection, and cloud identity, where modern attacks happen |
| **Tier 3: Deep Visibility** | Sysmon (Lesson 2.6), Application logs, Database audit, Endpoint telemetry (EDR, Endpoint Detection and Response) | These provide the deep technical detail needed for advanced investigation and threat hunting |

### What Happens When a Log Source Is Missing

| Missing Source | What You Can't See |
|---------------|-------------------|
| No email gateway logs | Phishing campaigns, BEC, malicious attachments; you only find out after the user clicks |
| No web proxy logs | C2 callbacks, malware downloads, data exfiltration over HTTPS |
| No DNS logs | DNS tunneling, DGA activity, C2 domain resolution |
| No cloud audit logs | Compromised cloud accounts, unauthorized data access, shadow IT |
| No process creation (4688) | What programs ran on compromised systems; you see the login but not what happened after |

---

## Practical Application: Your Lab Environment

In Lab 1.3, you worked with events from 12 agents covering 12 log sources across 4 subnets. Here's how they map to the 8 categories you just learned:

| Category | Covered in Lab? | Agent(s) | What You Saw |
|----------|----------------|----------|-------------|
| Windows Event Logs | Yes | WIN-DC-01, WIN-DB-01, WIN-FS-01 | 4624/4625/4688/7045/1102 |
| Linux / Syslog | Yes | linux-web-01, linux-web-02 | SSH auth, sudo, cron |
| Firewall & Network | Yes | fw-edge-01, fw-internal-01 | iptables allow/drop |
| DNS Query Logs | Yes | dns-server-01 | Named query logs |
| Web Proxy / HTTP | Yes | fw-internal-01 | Squid proxy access logs (browsing, C2, blocked malware) |
| Email Gateway | Yes | mail-gw-01 | Delivered, blocked, quarantined, phishing, BEC |
| Cloud Audit Trails | Yes | dns-server-01 (collector) | AWS CloudTrail + Azure AD events |
| Application & DB | Yes | linux-web-01 | Apache access/error logs (404 probes, SQL injection, web shell) |

Your lab environment covers all 8 categories. This full-spectrum visibility is exactly what you'll have in a real SOC, and it is what you need to detect multi-stage attacks that span email, endpoints, network, and cloud. Every category played a role in the "Operation Shadow Broker" APT narrative running through your lab data.

**What about the other 3 lab sources?** In Lab 1.3, you explored 12 individual log sources, but 9 of those map into the 8 categories above (Windows Security and Windows System both fall under "Windows Event Logs"). The remaining 3 are: **Sysmon** (deep endpoint telemetry that enhances Windows Event Logs — covered separately in Lesson 2.6), **File Integrity Monitoring (FIM)** (a Wazuh-native capability that watches for file changes), and **Agent Status** (operational health data from Wazuh agents). These are valuable detection sources, but in the industry they're considered *enhancements* to the core 8 categories rather than categories of their own. Sysmon enhances "Windows Event Logs," FIM is a specialized monitoring layer, and agent status is SIEM-specific telemetry.

--- ## Beyond the Core 8: Sources You'll Encounter in Production The 8 categories above cover the foundation, but production SOCs often monitor additional specialized sources that fill important detection gaps: ### VPN / Remote Access Logs With remote and hybrid work now standard, VPN concentrators generate logs that are critical for detecting account compromise: - **Successful/failed authentications** — brute force against the VPN is a common initial access vector (T1133) - **Geographic anomalies** — a user logged in from New York at 9 AM and from Singapore at 9:15 AM ("impossible travel") - **Concurrent sessions** — the same account connected from two locations simultaneously (legitimate user + attacker) - **Connection duration and data volume** — a VPN session at 3 AM transferring 50 GB suggests exfiltration Common platforms: Cisco AnyConnect, Palo Alto GlobalProtect, Fortinet FortiClient, Zscaler Private Access (ZPA). ### DHCP Logs DHCP logs map **IP addresses to MAC addresses and hostnames** at specific times. On their own they're not exciting — but during an investigation, they're irreplaceable. When a firewall log says `10.0.3.47` was communicating with a C2 server, DHCP logs tell you *which physical device* had that IP at that time. Without DHCP, IP-based alerts can be impossible to attribute in environments with dynamic addressing. ### Web Application Firewall (WAF) Logs A WAF sits in front of web applications (unlike a network firewall which guards the perimeter) and inspects HTTP traffic for application-layer attacks: - **SQL injection** (T1190), cross-site scripting (XSS), command injection - **Bot traffic** — credential stuffing, web scraping, API abuse - **Rate limiting** — repeated requests to login pages or API endpoints - **OWASP Top 10** violations Common platforms: AWS WAF, Cloudflare WAF, Akamai, F5 Advanced WAF, ModSecurity. 
WAF logs complement web server access logs: the access log shows every request, while the WAF highlights which ones were malicious. ### Identity Provider (IdP) Logs Beyond Azure AD (covered in Category 7), many organizations use dedicated identity providers that manage authentication across all applications: - **Okta** — sign-in events, MFA push approvals/denials, suspicious activity alerts, admin console changes - **Ping Identity** — federation events, SSO session tracking - **CyberArk / BeyondTrust** — privileged account checkout, session recording, password rotation events IdP logs are where you detect **MFA fatigue attacks** (T1621) — an attacker with stolen credentials spamming push notifications until the user accidentally approves. These events only appear in IdP logs, not Windows Event Logs. ### EDR Telemetry While "endpoint logs" are part of Category 1 (Windows Events), **EDR (Endpoint Detection and Response)** platforms generate their own telemetry stream that goes far beyond what standard OS logging provides: - Process creation with full command lines, parent chains, and digital signature verification - File creation/modification/deletion with hashes - Network connections per process (which process connected to which IP) - Registry modifications in real-time - In-memory activity detection (fileless attacks, process injection) Common platforms: CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne, Carbon Black. In our lab environment, **Sysmon** serves the same role on Windows endpoints — it's essentially open-source EDR telemetry. You'll explore it in detail in Lesson 2.10.

**Which Sources to Push For.** If you join a SOC and the core 8 are already covered, the next highest-impact additions are typically: (1) VPN logs — remote access is a top initial access vector; (2) IdP logs — identity attacks (MFA fatigue, token theft) are surging; (3) WAF logs — if the organization has public-facing web applications. Advocating for log source expansion is one of the most impactful things a junior analyst can do.
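The DHCP attribution step described earlier (which device held `10.0.3.47` at the time of the alert?) is essentially a time-range lookup. In this sketch the lease records are dicts with assumed keys `ip`, `mac`, `hostname`, `start`, and `end`; the hostnames and MAC addresses are invented.

```python
from datetime import datetime


def host_for_ip(leases, ip, when):
    """Return (hostname, mac) for the lease covering `ip` at time `when`, else None.

    A lease covers the half-open interval [start, end).
    """
    for lease in leases:
        if lease["ip"] == ip and lease["start"] <= when < lease["end"]:
            return lease["hostname"], lease["mac"]
    return None


if __name__ == "__main__":
    # Invented leases: the same IP handed to two devices on the same day.
    leases = [
        {"ip": "10.0.3.47", "mac": "00:11:22:33:44:55", "hostname": "laptop-a",
         "start": datetime(2024, 1, 1, 8, 0), "end": datetime(2024, 1, 1, 16, 0)},
        {"ip": "10.0.3.47", "mac": "66:77:88:99:aa:bb", "hostname": "laptop-b",
         "start": datetime(2024, 1, 1, 16, 0), "end": datetime(2024, 1, 2, 0, 0)},
    ]
    print(host_for_ip(leases, "10.0.3.47", datetime(2024, 1, 1, 9, 30)))
```

The same IP resolving to two different devices on one day is exactly why the alert timestamp has to be part of the lookup.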

--- ## Understanding Log Formats Logs from different sources arrive in different **formats**. Knowing the common formats helps you read raw logs and understand how the SIEM parses them: | Format | Description | Used By | |--------|------------|---------| | **Syslog (RFC 5424)** | Timestamp + hostname + facility + severity + message. The oldest and most universal log format. | Linux systems, network devices, firewalls | | **Windows Event XML** | Structured XML with EventID, channels, and EventData fields. Rich but verbose. | Windows Event Log (forwarded via agents or WEF) | | **JSON** | Key-value pairs in nested structures. Machine-readable, growing in popularity. | Cloud audit trails (CloudTrail), modern APIs, Wazuh alert format | | **CEF (Common Event Format)** | Pipe-delimited header + key=value extensions. Standardized by ArcSight. | Palo Alto, Check Point, many enterprise security tools | | **LEEF (Log Event Extended Format)** | Tab-delimited. IBM's equivalent of CEF. | QRadar ecosystem, IBM products | | **CSV / Key=Value** | Simple structured formats used by specific vendors. | Fortinet (key=value), many custom applications |
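The facility and severity in a syslog message travel together in the leading `<PRI>` value, which is computed as facility × 8 + severity. The following sketch decodes it; the function name is hypothetical, and the severity names follow the standard 0 to 7 ordering.

```python
import re

# Severity codes 0-7 in order.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(message):
    """Return (facility_number, severity_name) from a leading <PRI>, else None."""
    m = re.match(r"<(\d{1,3})>", message)
    if m is None:
        return None
    facility, severity = divmod(int(m.group(1)), 8)
    if facility > 23:  # valid facility codes are 0-23
        return None
    return facility, SEVERITIES[severity]


if __name__ == "__main__":
    msg = ("<34>Feb 15 06:15:14 linux-web-01 sshd[5104]: "
           "Failed password for root from 185.220.101.42 port 44903")
    print(decode_pri(msg))
```

For `<34>`, the division gives facility 4 (security/authorization messages) and severity 2 (critical). Log files on disk usually omit the PRI prefix; it appears on the wire when messages are forwarded.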

**Why Format Matters for SOC Analysts.** When logs arrive at the SIEM, **decoders** (covered in Lesson 2.3) parse each format and extract the fields you search on — source IP, destination port, username, etc. If a log format isn't decoded properly, the raw message arrives but the fields are empty, which means your DQL queries won't match. Understanding formats helps you troubleshoot "I know this event happened, but I can't find it in Threat Hunting" problems.
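To show what a decoder does with one of these formats, here is a rough CEF parser in Python. It splits the seven pipe-delimited header fields and then reads the `key=value` extensions. It is a sketch only: it ignores CEF's escaping rules for `|`, `=`, and `\`, and the sample vendor and product names are made up.

```python
import re

HEADER_KEYS = [
    "version", "vendor", "product", "device_version",
    "signature_id", "name", "severity",
]

# A value runs until the next " key=" or the end of the string.
EXTENSION_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line):
    """Parse a CEF line into header fields plus an 'extensions' dict, else None."""
    start = line.find("CEF:")
    if start == -1:
        return None
    parts = line[start + 4:].split("|", 7)
    if len(parts) < 8:
        return None
    event = dict(zip(HEADER_KEYS, parts[:7]))
    event["extensions"] = {
        m.group(1): m.group(2) for m in EXTENSION_RE.finditer(parts[7])
    }
    return event


if __name__ == "__main__":
    sample = ("CEF:0|ExampleVendor|ExampleFW|1.0|100|Connection denied|5|"
              "src=185.220.101.42 dst=10.0.3.47 dpt=3389 act=deny")
    print(parse_cef(sample))
```

If a decoder fails at this step, the raw line still reaches the SIEM but `src`, `dst`, and `dpt` never become searchable fields, which is the "event exists but I can't find it" problem described above.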

--- ## How Logs Reach the SIEM You now know *what* data feeds a SOC. But how does that data actually get from the source to your Threat Hunting screen? The **Wazuh Architecture** lesson (Lesson 2.1) covers five ingestion methods in detail: 1. **Agent-based collection** — software installed on the endpoint collects logs, monitors files, and sends everything to the SIEM Manager. The richest method — most of the 12 agents in your lab use this. 2. **Syslog forwarding** — network devices (firewalls, routers, switches) send log messages to a central server using the syslog protocol. Universal but limited to log messages only. 3. **API-based collection** — the SIEM polls cloud and SaaS APIs on a schedule (e.g., pulling CloudTrail from S3, Azure AD sign-ins from the Graph API). Essential for cloud-native sources. 4. **Log forwarders/shippers** — lightweight tools like Filebeat, Fluentd, or rsyslog aggregate and normalize logs from many sources before shipping them to the SIEM. Common in large environments. 5. **Agentless monitoring** — the SIEM connects to remote hosts via SSH and reads logs directly. A last resort for systems that can't support agents or syslog. Each method has different trade-offs in data richness, latency, and reliability. In the Architecture lesson, you'll see how each maps to the lab environment and understand why production SIEMs use all five in combination.

- A SIEM monitors 8 core categories of log sources: Windows Events, Linux/Syslog, Firewall, DNS, Web Proxy, Email Gateway, Cloud Audit Trails, and Application/Database logs
- Windows Event IDs 4624, 4625, 4688, 7045, and 1102 are the five you must recognize on sight; they cover authentication, process execution, persistence, and anti-forensics
- Logon Type in Windows 4624/4625 events tells you *how* the authentication happened; Type 3 (Network) is where lateral movement lives
- DNS logs are one of the most reliable detection sources because attackers can rarely avoid DNS resolution (hardcoded IPs are the main exception)
- Email gateway logs are critical because phishing remains the #1 initial access vector
- Cloud audit trails are the fastest-growing log category as organizations move to hybrid and cloud-first architectures
- Beyond the core 8, production SOCs also monitor VPN logs, DHCP (IP-to-host mapping), WAF (application-layer attacks), IdP logs (MFA fatigue, identity attacks), and EDR telemetry (deep endpoint visibility)
- Logs arrive at the SIEM in different formats (Syslog, JSON, CEF, XML), and decoders parse each format to extract the fields you search on
- Five ingestion methods (agents, syslog, APIs, log forwarders, agentless) each serve different source types; production SIEMs use all five
- No single log source covers the full ATT&CK kill chain, so defense in depth requires log sources in depth
- Your first question on day one of a new SOC job: "What log sources are we collecting, and what gaps do we have?"

## What's Next

With the 8 log source categories mapped, it's time to go hands-on. In **Lab 2.2 — Log Source Deep Dive**, you'll find specific examples of each critical event type in Wazuh (a failed logon, a new service, a web attack, a DNS query) and practice writing analyst notes that explain why each one matters.