Lesson 4 of 6·12 min read·Includes quiz

Search & Correlation

Query syntax, filtering, correlation

What You'll Learn

  • Write search queries in the Wazuh Dashboard (OpenSearch Query DSL and Lucene syntax)
  • Filter events by agent, rule ID, severity, time range, and custom fields
  • Correlate events across multiple log sources to reconstruct attack timelines
  • Build investigation queries that follow an attacker's trail across hosts
  • Understand time-based correlation and why sequence matters
  • Export search results for documentation and incident reports

From Dashboard to Investigation

In Lesson 2.3, you learned to read dashboards — spotting spikes, gaps, and anomalies at a glance. But when something catches your eye, you need to drill deeper. Searching is how you go from "something looks wrong" to "here's exactly what happened."

A SOC analyst's search ability is their most important technical skill. A fast analyst can write a query in 10 seconds, find the relevant events in 30, and have a hypothesis in 60. A slow analyst scrolls through pages of alerts hoping something jumps out. The difference is query literacy.

The 80/20 Rule of SOC Investigation: 80% of your investigation time should be spent reading and analyzing results, not writing queries. If you're spending more time constructing searches than reading results, you need to learn the query syntax better.

Search and Correlation Workflow


Search Syntax in Wazuh

Wazuh Dashboard uses OpenSearch as its backend, which supports two query syntaxes:

  1. Lucene Query Syntax (the search bar) — quick, simple, used for fast filtering
  2. OpenSearch Query DSL (Dev Tools) — JSON-based, powerful, used for complex queries

For daily analyst work, Lucene syntax in the search bar covers 90% of your needs. You'll use Query DSL when building automated searches or complex correlations.
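To see how the two syntaxes relate, here is an illustrative sketch (not official Wazuh tooling): the same filter written as a Lucene string for the search bar and as an equivalent Query DSL body of the kind you would paste into Dev Tools, expressed as plain Python dicts so the structure is easy to compare.

```python
# The Lucene string you would type into the Security Events search bar:
lucene = "rule.level:[10 TO 15] AND agent.name:linux-web-01"

# Equivalent Query DSL body #1: a query_string query reuses the
# Lucene syntax verbatim inside a JSON envelope.
dsl_query_string = {
    "query": {
        "query_string": {"query": lucene}
    }
}

# Equivalent Query DSL body #2: a bool query expresses the same
# logic structurally — one range clause and one term clause.
dsl_bool = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"rule.level": {"gte": 10, "lte": 15}}},
                {"term": {"agent.name": "linux-web-01"}},
            ]
        }
    }
}
```

The query_string form is the quickest bridge from search-bar habits to Dev Tools; the bool form is what you grow into when queries get complex.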

Lucene Syntax Essentials

The search bar in Wazuh Security Events accepts Lucene query strings. Here are the patterns you'll use every day:

| Query Pattern | Example | What It Does |
| --- | --- | --- |
| Field:value | rule.id:5551 | Find all alerts where rule.id is exactly 5551 |
| Field:"phrase" | rule.description:"brute force" | Find alerts where the description contains the exact phrase |
| Wildcard | agent.name:linux* | Match any agent name starting with "linux" |
| Range | rule.level:[10 TO 15] | Find alerts with severity 10 through 15 (inclusive) |
| Boolean AND | rule.level:10 AND agent.name:linux-web-01 | Both conditions must match |
| Boolean OR | rule.id:5551 OR rule.id:80790 | Either condition matches |
| NOT | rule.level:[10 TO 15] AND NOT rule.id:530 | Exclude heartbeat alerts from high-severity results |
| Grouping | (rule.id:5551 OR rule.id:5503) AND agent.name:linux-web-01 | Group conditions with parentheses |
| Exists | data.srcip:* | Find alerts that have a source IP field (any value) |
| Nested fields | data.win.eventdata.targetUserName:Administrator | Access deeply nested fields with dot notation |
💡 Pro Tip: Start Broad, Then Narrow. When investigating, start with a broad query (agent.name:linux-web-01) to see all activity on a host, then add conditions to narrow (agent.name:linux-web-01 AND rule.level:[7 TO 15]) until you have a focused set of relevant events.

Time Range — The Most Important Filter

Every search should have a time constraint. Without it, you're searching all historical data, which is slow and returns noise from weeks ago.

| Time Range | When to Use It |
| --- | --- |
| Last 15 minutes | Active incident — what's happening right now? |
| Last 1 hour | Recent investigation — following up on a dashboard spike |
| Last 24 hours | Shift overview — what happened today? |
| Last 7 days | Trend analysis — is this behavior new? |
| Custom range | Investigation — "show me everything between 2:00 AM and 3:00 AM on Tuesday" |
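In Query DSL, these dashboard presets correspond to relative date-math range filters on the timestamp field. The mapping below is a minimal sketch (the helper function and labels are illustrative, not part of Wazuh):

```python
# Map each dashboard time-range label to OpenSearch relative date math.
time_ranges = {
    "Last 15 minutes": "now-15m",
    "Last 1 hour": "now-1h",
    "Last 24 hours": "now-24h",
    "Last 7 days": "now-7d",
}

def time_range_filter(label):
    """Build a Query DSL range clause for the given dashboard label."""
    return {"range": {"timestamp": {"gte": time_ranges[label], "lte": "now"}}}

# Example: the filter clause behind "Last 1 hour".
recent = time_range_filter("Last 1 hour")
```

A clause like this is what you AND into a bool query's filter list to time-bound any Dev Tools search.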

The Investigation Query Toolkit

Here are the most common search queries a SOC analyst runs during investigations. Memorize these — you'll use them daily.

Query 1: All Activity on a Specific Host

agent.name:linux-web-01

Use this as your starting point when a host appears in a dashboard alert. Sort by timestamp to see the chronological story.

Query 2: High-Severity Alerts Only

rule.level:[10 TO 15]

Skip the noise and focus on what matters. Combine with a time range to see recent critical activity.

Query 3: Activity from a Specific Source IP

data.srcip:185.220.101.42

When you identify a suspicious IP, search for ALL events involving it — not just the alert that caught your attention. The attacker may have interacted with multiple systems.

Query 4: Failed Authentication Across All Systems

rule.groups:authentication_failed

This catches failed SSH logins, failed Windows logons, and any other authentication failure across all agents and log sources. It's broader than searching for a single rule ID.

Query 5: Specific Windows Event ID

data.win.system.eventID:4625 AND agent.name:WIN-SERVER-01

Drill into a specific Windows event type on a specific host. Change the event ID to investigate different activity types.

Query 6: File Integrity Changes

rule.groups:syscheck AND (syscheck.path:/etc/passwd OR syscheck.path:/etc/shadow)

Find all file integrity monitoring alerts for critical system files. Changes to these files almost always warrant investigation.

Query 7: Firewall Blocks to a Specific Port

rule.groups:firewall_drop AND data.dstport:445

Find all firewall events targeting a specific port. Port 445 (SMB) is a high-value target for lateral movement attempts.

The Investigation Query Toolkit — 7 Essential Searches

Build a Query Library. Experienced analysts maintain a personal collection of saved queries for common investigation scenarios. Every time you write a useful query, save it. Over time, this library becomes your most valuable tool — faster than writing queries from scratch every time.
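A personal query library can be as simple as a named collection of Lucene strings. This sketch stores the seven queries above under illustrative names and shows one way to combine them (the helper and names are assumptions for demonstration, not a Wazuh feature):

```python
# A minimal personal query library: name -> Lucene string.
QUERY_LIBRARY = {
    "host_activity": "agent.name:linux-web-01",
    "high_severity": "rule.level:[10 TO 15]",
    "source_ip": "data.srcip:185.220.101.42",
    "auth_failures": "rule.groups:authentication_failed",
    "win_event": "data.win.system.eventID:4625 AND agent.name:WIN-SERVER-01",
    "fim_critical": "rule.groups:syscheck AND (syscheck.path:/etc/passwd OR syscheck.path:/etc/shadow)",
    "fw_port": "rule.groups:firewall_drop AND data.dstport:445",
}

def combine(*names):
    """AND several saved queries together, parenthesizing each one."""
    return " AND ".join(f"({QUERY_LIBRARY[n]})" for n in names)

# Example: all high-severity activity on the web host.
focused = combine("host_activity", "high_severity")
```

Pasting a combined string like this into the search bar is much faster than reconstructing the logic from memory mid-incident.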


Event Correlation: Connecting the Dots

Individual events tell you what happened. Correlation tells you why — by connecting related events across time, hosts, and log sources into a coherent narrative.

Correlation is the skill that separates L1 analysts (who process alerts) from L2 analysts (who investigate incidents). It's the ability to look at a failed login, a new service installation, and a file change and ask: "Are these related?"

The Three Dimensions of Correlation

| Dimension | Question | Example |
| --- | --- | --- |
| Time | Did these events happen close together? | Failed logins at 2:00 AM → new service at 2:05 AM → file change at 2:07 AM |
| Entity | Do these events share an IP, user, or host? | Same source IP (185.220.101.42) in SSH logs AND firewall logs |
| Behavior | Do these events form a known attack pattern? | Brute force → lateral movement → persistence = classic intrusion chain |
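The time and entity dimensions can be captured in a few lines of logic. This toy sketch (event data is illustrative, mirroring the lesson's scenario) keeps only events that share an entity and fall inside a short window:

```python
from datetime import datetime, timedelta

# Illustrative alerts, shaped loosely like Wazuh search results.
events = [
    {"ts": datetime(2026, 2, 15, 6, 15), "srcip": "185.220.101.42", "desc": "Failed SSH login"},
    {"ts": datetime(2026, 2, 15, 6, 25), "srcip": "185.220.101.42", "desc": "SSH brute force (rule 5551)"},
    {"ts": datetime(2026, 2, 15, 6, 12), "srcip": "185.220.101.42", "desc": "Firewall block, port 445"},
    {"ts": datetime(2026, 2, 15, 3, 0),  "srcip": "10.0.9.9",       "desc": "Unrelated alert"},
]

def correlate_by_entity(events, srcip, window=timedelta(minutes=30)):
    """Entity + time correlation: same source IP, all within `window`."""
    related = sorted((e for e in events if e["srcip"] == srcip),
                     key=lambda e: e["ts"])
    if related and related[-1]["ts"] - related[0]["ts"] <= window:
        return related
    return []

# Three events share the IP and span only 13 minutes -> a candidate chain.
chain = correlate_by_entity(events, "185.220.101.42")
```

The behavior dimension is the part no script captures: recognizing that firewall block → failed login → brute force is a known attack sequence is the analyst's job.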

Building a Correlation: The Attack Timeline

Let's walk through how an analyst correlates events during an investigation. Imagine your dashboard shows a spike in high-severity alerts on linux-web-01 at 06:25 UTC.

Step 1: Identify the trigger alert

agent.name:linux-web-01 AND rule.level:[10 TO 15]

You find rule 5551 (SSH brute force) fired at 06:25. Source IP: 185.220.101.42.

Step 2: Search for all activity from this IP

data.srcip:185.220.101.42

You discover this IP also appears in firewall logs (blocked SMB attempts on port 445) and earlier failed SSH attempts starting at 06:15. The attacker was probing before the brute force.

Step 3: Check what happened on the target host AFTER the brute force

agent.name:linux-web-01 AND timestamp:[2026-02-15T06:25:00 TO 2026-02-15T07:30:00]

(OpenSearch range queries need full timestamps — for quick windows, it's usually easier to set the dashboard time picker instead.)

You find:

  • 06:30 — Successful SSH login (rule 5501) from a different internal IP (pivoted?)
  • 06:35 — sudo to root (rule 5402)
  • 06:37 — /etc/passwd modified (rule 550, syscheck)
  • 06:40 — /etc/shadow modified (rule 550, syscheck)

Step 4: Build the narrative

Timeline: Reconnaissance (06:15) → Brute force (06:25) → Login (06:30) →
          Privilege escalation (06:35) → Account manipulation (06:37-06:40)

This is a complete attack chain — from initial probing to credential theft to persistence. Without correlation, each event looks like an isolated alert. Together, they tell the story of a compromise.

Correlation — Building an Attack Timeline

Time Zones Will Trip You Up. Wazuh stores timestamps in UTC. Your local time might be different. When correlating events, always work in UTC to avoid confusion. A "2:00 AM UTC" event happened at "9:00 PM EST the previous day" — if you mix time zones, your timeline breaks.
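You can verify that exact trap with the standard library. This sketch uses a fixed UTC-5 offset for winter US Eastern time (a simplifying assumption; a real conversion would use a full time zone database):

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset EST (UTC-5), valid for mid-February dates.
EST = timezone(timedelta(hours=-5), name="EST")

# The "2:00 AM UTC" event from the lesson's example.
event_utc = datetime(2026, 2, 15, 2, 0, tzinfo=timezone.utc)
event_est = event_utc.astimezone(EST)

# Same instant, but the local wall-clock date is the PREVIOUS day:
# 2026-02-15 02:00 UTC == 2026-02-14 21:00 EST.
utc_day, est_day = event_utc.day, event_est.day
```

If half your timeline is in UTC and half in local time, a seven-hour "gap" or an impossible ordering appears out of nowhere — which is why you standardize on UTC before correlating anything.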


Cross-Source Correlation

The most powerful correlations connect events from different log sources. An attacker's actions leave traces across multiple systems — the SIEM is the only place where all those traces converge.

Example: Correlating Across Four Log Sources

| Time (UTC) | Source | Agent | Event | Query |
| --- | --- | --- | --- | --- |
| 06:12 | Firewall | fw-edge-01 | Blocked: 185.220.101.42 → port 445 | data.srcip:185.220.101.42 AND rule.groups:firewall_drop |
| 06:15 | SSH | linux-web-01 | Failed SSH login from 185.220.101.42 | data.srcip:185.220.101.42 AND rule.groups:sshd |
| 06:25 | SSH | linux-web-01 | Brute force detected (rule 5551) | rule.id:5551 |
| 06:30 | Auth | linux-web-01 | Successful login — new session | agent.name:linux-web-01 AND rule.id:5501 |
| 06:37 | FIM | linux-web-01 | /etc/passwd modified | rule.groups:syscheck AND agent.name:linux-web-01 |
| 06:42 | DNS | dns-server-01 | Query for suspicious domain from 10.0.2.15 | agent.name:dns-server-01 AND data.srcip:10.0.2.15 |

Notice how the attacker's trail crosses firewall → SSH → authentication → file integrity → DNS — five different log sources across three different agents. No single log source tells the full story. Only by correlating across all of them do you see the complete picture.

This Is Why Log Sources Matter. In Lesson 2.1, you learned the 8 log source categories. Now you see why: each source contributes a piece of the puzzle. The firewall shows the attacker's reconnaissance. SSH shows the attack vector. FIM shows persistence. DNS shows potential C2. Remove any source, and you have a blind spot in your investigation.


Practical Correlation Techniques

Technique 1: Pivot Searching

Start with one indicator and "pivot" to find related events:

  1. Start with an IP: data.srcip:185.220.101.42 — find all events from this IP
  2. Pivot to the target host: agent.name:linux-web-01 — see everything that happened on that host
  3. Pivot to the user: data.dstuser:root — see all activity for the targeted account
  4. Pivot to the time window: Narrow to the 30-minute window around the incident

Each pivot reveals new connections and expands your understanding of the incident scope.
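Each pivot step is really just a new Lucene query built from an indicator you found in the previous step's results. A minimal sketch (the helper function is illustrative; the values are the lesson's example indicators):

```python
def pivot_query(field, value):
    """Build the simple field:value Lucene query for the next pivot."""
    return f"{field}:{value}"

# The pivot sequence from the steps above, as search-bar strings.
pivots = [
    pivot_query("data.srcip", "185.220.101.42"),  # 1. start with the IP
    pivot_query("agent.name", "linux-web-01"),    # 2. pivot to the target host
    pivot_query("data.dstuser", "root"),          # 3. pivot to the user
]
```

The fourth pivot — narrowing the time window — is done with the dashboard time picker rather than the query string itself.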

Technique 2: Baseline Comparison

Compare current activity against what's normal:

agent.name:linux-web-01 AND rule.id:5503

Run this for "Last 7 days" and note the average count. Then run it for "Last 24 hours." If today's count is 10x the average, the brute force campaign is real and recent.
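The arithmetic behind that judgment is simple enough to sketch. The counts below are illustrative, not real lab data:

```python
# Hypothetical hit counts for rule 5503 on linux-web-01.
week_count = 140    # total hits over "Last 7 days"
today_count = 200   # hits over "Last 24 hours"

daily_average = week_count / 7             # baseline: 20 hits per day
spike_factor = today_count / daily_average # how far above baseline today is

# Treat a 10x deviation from baseline as a real, recent campaign.
is_anomalous = spike_factor >= 10
```

The exact threshold is a judgment call per environment; the point is to compare against a measured baseline instead of a gut feeling.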

Technique 3: Time Window Analysis

When you know the approximate incident time, create a tight window:

agent.name:linux-web-01 AND timestamp:[2026-02-15T06:00:00 TO 2026-02-15T07:00:00]

Sort by timestamp ascending. Read the events in order — this gives you the attack narrative in chronological sequence.


Exporting Search Results

Investigation findings need to be documented. Here's how to export from Wazuh:

| Method | Format | Best For |
| --- | --- | --- |
| CSV Export | Spreadsheet-compatible | Sharing with non-SOC stakeholders, creating reports |
| JSON Export | Machine-readable | Importing into case management (TheHive), automation |
| Dashboard Screenshot | Image | Quick email updates, shift handoff notes |
| Saved Search | Wazuh saved object | Rerunning the same investigation query later |
💡 Always Save Your Key Queries. Before closing a browser tab, save any investigation query that produced useful results. Name it descriptively: "2026-02-15 SSH brute force investigation — linux-web-01." Future analysts (including future you) will thank you when a similar incident occurs.
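If you ever need to reshape exported results yourself (for example, hits pulled via the API into a stakeholder-friendly CSV), the standard library is enough. The events below are illustrative sample results, not real lab data:

```python
import csv
import io

# Illustrative search hits, flattened to the fields a report needs.
hits = [
    {"timestamp": "2026-02-15T06:25:00Z", "agent": "linux-web-01",
     "rule_id": 5551, "description": "SSH brute force"},
    {"timestamp": "2026-02-15T06:37:00Z", "agent": "linux-web-01",
     "rule_id": 550, "description": "/etc/passwd modified"},
]

# Write the rows to an in-memory CSV (swap io.StringIO for a file to save).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["timestamp", "agent", "rule_id", "description"])
writer.writeheader()
writer.writerows(hits)
report_csv = buf.getvalue()
```

Keeping timestamps in ISO 8601 UTC in the export preserves correct sorting when the CSV lands in a spreadsheet.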


Common Search Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| No time filter | Returns millions of results, slow and unfocused | Always set a time range before searching |
| Too narrow too fast | Miss related events by filtering too aggressively | Start broad, then narrow progressively |
| Searching one log source | Miss cross-source correlations | Search by IP/user across ALL sources first |
| Exact string mismatch | No results because of case or format differences | Use wildcards (*brute*) and lowercase |
| Ignoring surrounding events | See the alert but miss the context before/after | Expand the time window ±15 minutes around an alert |
| Not saving queries | Rewrite the same search next time | Save every useful investigation query |

Key Takeaways

  • Search is the SOC analyst's core technical skill — learn Lucene syntax until you can write queries in your sleep
  • Start broad (all events on a host) then narrow (add severity, time, specific fields) to focus your investigation
  • Correlation connects isolated events into attack narratives by linking them across time, entities, and behavior
  • Cross-source correlation is the most powerful technique — attackers leave traces across firewall, authentication, FIM, and DNS logs
  • Pivot searching (IP → host → user → time window) systematically expands your investigation scope
  • Always set a time range, always save useful queries, and always work in UTC
  • The 7 essential search queries (host, severity, source IP, auth failures, Windows Event ID, FIM, firewall) cover 90% of investigations

Knowledge Check: Search & Correlation

10 questions · 70% to pass

1. You want to find all SSH brute force alerts (rule 5551) on linux-web-01 in the Wazuh search bar. Which Lucene query is correct?

2. What Lucene query would find all alerts with severity 10 or higher, excluding agent heartbeat events (rule 530)?

3. During an investigation, you find that source IP 185.220.101.42 appears in SSH logs, firewall logs, AND DNS logs across different agents. What correlation technique are you using?

4. An analyst investigating an alert starts by searching for the suspicious IP, then searches for all activity on the target host, then narrows to a specific user account. What technique is this?

5. Why is it critical to always work in UTC when correlating events from multiple agents in Wazuh?

6. An analyst searches for 'agent.name:linux-web-01 AND rule.level:15' and gets zero results. They conclude no critical events occurred. What mistake did they make?

7. You build a correlation timeline and see: Firewall block (06:12) → Failed SSH (06:15) → Brute force (06:25) → Successful login (06:30) → /etc/passwd modified (06:37). What attack phase does the /etc/passwd modification represent?

8. In Labs 1.1 and 1.3, the SSH brute force attack came from IP 185.220.101.42. If you search data.srcip:185.220.101.42 across all agents in the Wazuh lab, which additional log source (beyond SSH) would also contain events from this IP?

9. In Lab 1.3, you discovered that linux-web-01 had FIM alerts for both /etc/passwd and /etc/shadow being modified. What Lucene query would find these specific events?

10. In the lab environment, if you ran the query data.srcip:10.0.1.50 AND agent.name:linux-web-01, what would you find based on the pre-loaded data?
