Lesson 4 of 6·12 min read·Includes quiz

Search & Correlation

Query syntax, filtering, correlation

What You'll Learn

  • Write search queries in the Wazuh Dashboard (OpenSearch Query DSL and Lucene syntax)
  • Filter events by agent, rule ID, severity, time range, and custom fields
  • Correlate events across multiple log sources to reconstruct attack timelines
  • Build investigation queries that follow an attacker's trail across hosts
  • Understand time-based correlation and why sequence matters
  • Export search results for documentation and incident reports

From Dashboard to Investigation

In Lesson 2.3, you learned to read dashboards — spotting spikes, gaps, and anomalies at a glance. But when something catches your eye, you need to drill deeper. Searching is how you go from "something looks wrong" to "here's exactly what happened."

A SOC analyst's search ability is their most important technical skill. A fast analyst can write a query in 10 seconds, find the relevant events in 30, and have a hypothesis in 60. A slow analyst scrolls through pages of alerts hoping something jumps out. The difference is query literacy.

The 80/20 Rule of SOC Investigation: 80% of your investigation time should be spent reading and analyzing results, not writing queries. If you're spending more time constructing searches than reading results, you need to learn the query syntax better.

Search and Correlation Workflow


Search Syntax in Wazuh

Wazuh Dashboard uses OpenSearch as its backend, which supports two query syntaxes:

  1. Lucene Query Syntax (the search bar) — quick, simple, used for fast filtering
  2. OpenSearch Query DSL (Dev Tools) — JSON-based, powerful, used for complex queries

For daily analyst work, Lucene syntax in the search bar covers 90% of your needs. You'll use Query DSL when building automated searches or complex correlations.
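To see how the two syntaxes relate, here is an illustrative sketch (not official Wazuh tooling): the same filter written as a Lucene string for the search bar and as an equivalent Query DSL body of the kind you would paste into Dev Tools, expressed as plain Python dicts so the structure is easy to compare.

```python
# The Lucene string you would type into the Security Events search bar:
lucene = "rule.level:[10 TO 15] AND agent.name:linux-web-01"

# Equivalent Query DSL body #1: a query_string query reuses the
# Lucene syntax verbatim inside a JSON envelope.
dsl_query_string = {
    "query": {
        "query_string": {"query": lucene}
    }
}

# Equivalent Query DSL body #2: a bool query expresses the same
# logic structurally — one range clause and one term clause.
dsl_bool = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"rule.level": {"gte": 10, "lte": 15}}},
                {"term": {"agent.name": "linux-web-01"}},
            ]
        }
    }
}
```

The query_string form is the quickest bridge from search-bar habits to Dev Tools; the bool form is what you grow into when queries get complex.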

Lucene Syntax Essentials

The search bar in Wazuh Security Events accepts Lucene query strings. Here are the patterns you'll use every day:

| Query Pattern | Example | What It Does |
| --- | --- | --- |
| Field:value | rule.id:5551 | Find all alerts where rule.id is exactly 5551 |
| Field:"phrase" | rule.description:"brute force" | Find alerts where the description contains the exact phrase |
| Wildcard | agent.name:linux* | Match any agent name starting with "linux" |
| Range | rule.level:[10 TO 15] | Find alerts with severity 10 through 15 (inclusive) |
| Boolean AND | rule.level:10 AND agent.name:linux-web-01 | Both conditions must match |
| Boolean OR | rule.id:5551 OR rule.id:80790 | Either condition matches |
| NOT | rule.level:[10 TO 15] AND NOT rule.id:530 | Exclude heartbeat alerts from high-severity results |
| Grouping | (rule.id:5551 OR rule.id:5503) AND agent.name:linux-web-01 | Group conditions with parentheses |
| Exists | data.srcip:* | Find alerts that have a source IP field (any value) |
| Nested fields | data.win.eventdata.targetUserName:Administrator | Access deeply nested fields with dot notation |
💡 Pro Tip: Start Broad, Then Narrow. When investigating, start with a broad query (agent.name:linux-web-01) to see all activity on a host, then add conditions to narrow (agent.name:linux-web-01 AND rule.level:[7 TO 15]) until you have a focused set of relevant events.

Time Range — The Most Important Filter

Every search should have a time constraint. Without it, you're searching all historical data, which is slow and returns noise from weeks ago.

| Time Range | When to Use It |
| --- | --- |
| Last 15 minutes | Active incident — what's happening right now? |
| Last 1 hour | Recent investigation — following up on a dashboard spike |
| Last 24 hours | Shift overview — what happened today? |
| Last 7 days | Trend analysis — is this behavior new? |
| Custom range | Investigation — "show me everything between 2:00 AM and 3:00 AM on Tuesday" |
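In Query DSL, these dashboard presets correspond to relative date-math range filters on the timestamp field. The mapping below is a minimal sketch (the helper function and labels are illustrative, not part of Wazuh):

```python
# Map each dashboard time-range label to OpenSearch relative date math.
time_ranges = {
    "Last 15 minutes": "now-15m",
    "Last 1 hour": "now-1h",
    "Last 24 hours": "now-24h",
    "Last 7 days": "now-7d",
}

def time_range_filter(label):
    """Build a Query DSL range clause for the given dashboard label."""
    return {"range": {"timestamp": {"gte": time_ranges[label], "lte": "now"}}}

# Example: the filter clause behind "Last 1 hour".
recent = time_range_filter("Last 1 hour")
```

A clause like this is what you AND into a bool query's filter list to time-bound any Dev Tools search.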

The Investigation Query Toolkit

Here are the most common search queries a SOC analyst runs during investigations. Memorize these — you'll use them daily.

Query 1: All Activity on a Specific Host

agent.name:linux-web-01

Use this as your starting point when a host appears in a dashboard alert. Sort by timestamp to see the chronological story.

Query 2: High-Severity Alerts Only

rule.level:[10 TO 15]

Skip the noise and focus on what matters. Combine with a time range to see recent critical activity.

Query 3: Activity from a Specific Source IP

data.srcip:185.220.101.42

When you identify a suspicious IP, search for ALL events involving it — not just the alert that caught your attention. The attacker may have interacted with multiple systems.

Query 4: Failed Authentication Across All Systems

rule.groups:authentication_failed

This catches failed SSH logins, failed Windows logons, and any other authentication failure across all agents and log sources. It's broader than searching for a single rule ID.

Query 5: Specific Windows Event ID

data.win.system.eventID:4625 AND agent.name:WIN-SERVER-01

Drill into a specific Windows event type on a specific host. Change the event ID to investigate different activity types.

Query 6: File Integrity Changes

rule.groups:syscheck AND (syscheck.path:/etc/passwd OR syscheck.path:/etc/shadow)

Find all file integrity monitoring alerts for critical system files. Changes to these files almost always warrant investigation.

Query 7: Firewall Blocks to a Specific Port

rule.groups:firewall_drop AND data.dstport:445

Find all firewall events targeting a specific port. Port 445 (SMB) is a high-value target for lateral movement attempts.

The Investigation Query Toolkit — 7 Essential Searches

Build a Query Library. Experienced analysts maintain a personal collection of saved queries for common investigation scenarios. Every time you write a useful query, save it. Over time, this library becomes your most valuable tool — faster than writing queries from scratch every time.
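A personal query library can be as simple as a named collection of Lucene strings. This sketch stores the seven queries above under illustrative names and shows one way to combine them (the helper and names are assumptions for demonstration, not a Wazuh feature):

```python
# A minimal personal query library: name -> Lucene string.
QUERY_LIBRARY = {
    "host_activity": "agent.name:linux-web-01",
    "high_severity": "rule.level:[10 TO 15]",
    "source_ip": "data.srcip:185.220.101.42",
    "auth_failures": "rule.groups:authentication_failed",
    "win_event": "data.win.system.eventID:4625 AND agent.name:WIN-SERVER-01",
    "fim_critical": "rule.groups:syscheck AND (syscheck.path:/etc/passwd OR syscheck.path:/etc/shadow)",
    "fw_port": "rule.groups:firewall_drop AND data.dstport:445",
}

def combine(*names):
    """AND several saved queries together, parenthesizing each one."""
    return " AND ".join(f"({QUERY_LIBRARY[n]})" for n in names)

# Example: all high-severity activity on the web host.
focused = combine("host_activity", "high_severity")
```

Pasting a combined string like this into the search bar is much faster than reconstructing the logic from memory mid-incident.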


Event Correlation: Connecting the Dots

Individual events tell you what happened. Correlation tells you why — by connecting related events across time, hosts, and log sources into a coherent narrative.

Correlation is the skill that separates L1 analysts (who process alerts) from L2 analysts (who investigate incidents). It's the ability to look at a failed login, a new service installation, and a file change and ask: "Are these related?"

The Three Dimensions of Correlation

| Dimension | Question | Example |
| --- | --- | --- |
| Time | Did these events happen close together? | Failed logins at 2:00 AM → new service at 2:05 AM → file change at 2:07 AM |
| Entity | Do these events share an IP, user, or host? | Same source IP (185.220.101.42) in SSH logs AND firewall logs |
| Behavior | Do these events form a known attack pattern? | Brute force → lateral movement → persistence = classic intrusion chain |
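The time and entity dimensions can be captured in a few lines of logic. This toy sketch (event data is illustrative, mirroring the lesson's scenario) keeps only events that share an entity and fall inside a short window:

```python
from datetime import datetime, timedelta

# Illustrative alerts, shaped loosely like Wazuh search results.
events = [
    {"ts": datetime(2026, 2, 15, 6, 15), "srcip": "185.220.101.42", "desc": "Failed SSH login"},
    {"ts": datetime(2026, 2, 15, 6, 25), "srcip": "185.220.101.42", "desc": "SSH brute force (rule 5551)"},
    {"ts": datetime(2026, 2, 15, 6, 12), "srcip": "185.220.101.42", "desc": "Firewall block, port 445"},
    {"ts": datetime(2026, 2, 15, 3, 0),  "srcip": "10.0.9.9",       "desc": "Unrelated alert"},
]

def correlate_by_entity(events, srcip, window=timedelta(minutes=30)):
    """Entity + time correlation: same source IP, all within `window`."""
    related = sorted((e for e in events if e["srcip"] == srcip),
                     key=lambda e: e["ts"])
    if related and related[-1]["ts"] - related[0]["ts"] <= window:
        return related
    return []

# Three events share the IP and span only 13 minutes -> a candidate chain.
chain = correlate_by_entity(events, "185.220.101.42")
```

The behavior dimension is the part no script captures: recognizing that firewall block → failed login → brute force is a known attack sequence is the analyst's job.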

Building a Correlation: The Attack Timeline

Let's walk through how an analyst correlates events during an investigation. Imagine your dashboard shows a spike in high-severity alerts on linux-web-01 at 06:25 UTC.

Step 1: Identify the trigger alert

agent.name:linux-web-01 AND rule.level:[10 TO 15]

You find rule 5551 (SSH brute force) fired at 06:25. Source IP: 185.220.101.42.

Step 2: Search for all activity from this IP

data.srcip:185.220.101.42

You discover this IP also appears in firewall logs (blocked SMB attempts on port 445) and earlier failed SSH attempts starting at 06:15. The attacker was probing before the brute force.

Step 3: Check what happened on the target host AFTER the brute force

agent.name:linux-web-01 AND timestamp:[2026-02-15T06:25:00 TO 2026-02-15T07:30:00]

(OpenSearch range queries need full timestamps — for quick windows, it's usually easier to set the dashboard time picker instead.)

You find:

  • 06:30 — Successful SSH login (rule 5501) from a different internal IP (pivoted?)
  • 06:35 — sudo to root (rule 5402)
  • 06:37 — /etc/passwd modified (rule 550, syscheck)
  • 06:40 — /etc/shadow modified (rule 550, syscheck)

Step 4: Build the narrative

Timeline: Reconnaissance (06:15) → Brute force (06:25) → Login (06:30) →
          Privilege escalation (06:35) → Account manipulation (06:37-06:40)

This is a complete attack chain — from initial probing to credential theft to persistence. Without correlation, each event looks like an isolated alert. Together, they tell the story of a compromise.

Correlation — Building an Attack Timeline

Time Zones Will Trip You Up. Wazuh stores timestamps in UTC. Your local time might be different. When correlating events, always work in UTC to avoid confusion. A "2:00 AM UTC" event happened at "9:00 PM EST the previous day" — if you mix time zones, your timeline breaks.
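You can verify that exact trap with the standard library. This sketch uses a fixed UTC-5 offset for winter US Eastern time (a simplifying assumption; a real conversion would use a full time zone database):

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset EST (UTC-5), valid for mid-February dates.
EST = timezone(timedelta(hours=-5), name="EST")

# The "2:00 AM UTC" event from the lesson's example.
event_utc = datetime(2026, 2, 15, 2, 0, tzinfo=timezone.utc)
event_est = event_utc.astimezone(EST)

# Same instant, but the local wall-clock date is the PREVIOUS day:
# 2026-02-15 02:00 UTC == 2026-02-14 21:00 EST.
utc_day, est_day = event_utc.day, event_est.day
```

If half your timeline is in UTC and half in local time, a seven-hour "gap" or an impossible ordering appears out of nowhere — which is why you standardize on UTC before correlating anything.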


Cross-Source Correlation

The most powerful correlations connect events from different log sources. An attacker's actions leave traces across multiple systems — the SIEM is the only place where all those traces converge.

Example: Correlating Across Four Log Sources

| Time (UTC) | Source | Agent | Event | Query |
| --- | --- | --- | --- | --- |
| 06:12 | Firewall | fw-edge-01 | Blocked: 185.220.101.42 → port 445 | data.srcip:185.220.101.42 AND rule.groups:firewall_drop |
| 06:15 | SSH | linux-web-01 | Failed SSH login from 185.220.101.42 | data.srcip:185.220.101.42 AND rule.groups:sshd |
| 06:25 | SSH | linux-web-01 | Brute force detected (rule 5551) | rule.id:5551 |
| 06:30 | Auth | linux-web-01 | Successful login — new session | agent.name:linux-web-01 AND rule.id:5501 |
| 06:37 | FIM | linux-web-01 | /etc/passwd modified | rule.groups:syscheck AND agent.name:linux-web-01 |
| 06:42 | DNS | dns-server-01 | Query for suspicious domain from 10.0.2.15 | agent.name:dns-server-01 AND data.srcip:10.0.2.15 |

Notice how the attacker's trail crosses firewall → SSH → authentication → file integrity → DNS — five different log sources across three different agents. No single log source tells the full story. Only by correlating across all of them do you see the complete picture.

This Is Why Log Sources Matter. In Lesson 2.1, you learned the 8 log source categories. Now you see why: each source contributes a piece of the puzzle. The firewall shows the attacker's reconnaissance. SSH shows the attack vector. FIM shows persistence. DNS shows potential C2. Remove any source, and you have a blind spot in your investigation.


Practical Correlation Techniques

Technique 1: Pivot Searching

Start with one indicator and "pivot" to find related events:

  1. Start with an IP: data.srcip:185.220.101.42 — find all events from this IP
  2. Pivot to the target host: agent.name:linux-web-01 — see everything that happened on that host
  3. Pivot to the user: data.dstuser:root — see all activity for the targeted account
  4. Pivot to the time window: Narrow to the 30-minute window around the incident

Each pivot reveals new connections and expands your understanding of the incident scope.
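Each pivot step is really just a new Lucene query built from an indicator you found in the previous step's results. A minimal sketch (the helper function is illustrative; the values are the lesson's example indicators):

```python
def pivot_query(field, value):
    """Build the simple field:value Lucene query for the next pivot."""
    return f"{field}:{value}"

# The pivot sequence from the steps above, as search-bar strings.
pivots = [
    pivot_query("data.srcip", "185.220.101.42"),  # 1. start with the IP
    pivot_query("agent.name", "linux-web-01"),    # 2. pivot to the target host
    pivot_query("data.dstuser", "root"),          # 3. pivot to the user
]
```

The fourth pivot — narrowing the time window — is done with the dashboard time picker rather than the query string itself.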

Technique 2: Baseline Comparison

Compare current activity against what's normal:

agent.name:linux-web-01 AND rule.id:5503

Run this for "Last 7 days" and note the average count. Then run it for "Last 24 hours." If today's count is 10x the average, the brute force campaign is real and recent.
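The arithmetic behind that judgment is simple enough to sketch. The counts below are illustrative, not real lab data:

```python
# Hypothetical hit counts for rule 5503 on linux-web-01.
week_count = 140    # total hits over "Last 7 days"
today_count = 200   # hits over "Last 24 hours"

daily_average = week_count / 7             # baseline: 20 hits per day
spike_factor = today_count / daily_average # how far above baseline today is

# Treat a 10x deviation from baseline as a real, recent campaign.
is_anomalous = spike_factor >= 10
```

The exact threshold is a judgment call per environment; the point is to compare against a measured baseline instead of a gut feeling.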

Technique 3: Time Window Analysis

When you know the approximate incident time, create a tight window:

agent.name:linux-web-01 AND timestamp:[2026-02-15T06:00:00 TO 2026-02-15T07:00:00]

Sort by timestamp ascending. Read the events in order — this gives you the attack narrative in chronological sequence.


Exporting Search Results

Investigation findings need to be documented. Here's how to export from Wazuh:

| Method | Format | Best For |
| --- | --- | --- |
| CSV Export | Spreadsheet-compatible | Sharing with non-SOC stakeholders, creating reports |
| JSON Export | Machine-readable | Importing into case management (TheHive), automation |
| Dashboard Screenshot | Image | Quick email updates, shift handoff notes |
| Saved Search | Wazuh saved object | Rerunning the same investigation query later |
💡 Always Save Your Key Queries. Before closing a browser tab, save any investigation query that produced useful results. Name it descriptively: "2026-02-15 SSH brute force investigation — linux-web-01." Future analysts (including future you) will thank you when a similar incident occurs.
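If you ever need to reshape exported results yourself (for example, hits pulled via the API into a stakeholder-friendly CSV), the standard library is enough. The events below are illustrative sample results, not real lab data:

```python
import csv
import io

# Illustrative search hits, flattened to the fields a report needs.
hits = [
    {"timestamp": "2026-02-15T06:25:00Z", "agent": "linux-web-01",
     "rule_id": 5551, "description": "SSH brute force"},
    {"timestamp": "2026-02-15T06:37:00Z", "agent": "linux-web-01",
     "rule_id": 550, "description": "/etc/passwd modified"},
]

# Write the rows to an in-memory CSV (swap io.StringIO for a file to save).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["timestamp", "agent", "rule_id", "description"])
writer.writeheader()
writer.writerows(hits)
report_csv = buf.getvalue()
```

Keeping timestamps in ISO 8601 UTC in the export preserves correct sorting when the CSV lands in a spreadsheet.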


Common Search Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| No time filter | Returns millions of results, slow and unfocused | Always set a time range before searching |
| Too narrow too fast | Miss related events by filtering too aggressively | Start broad, then narrow progressively |
| Searching one log source | Miss cross-source correlations | Search by IP/user across ALL sources first |
| Exact string mismatch | No results because of case or format differences | Use wildcards (*brute*) and lowercase |
| Ignoring surrounding events | See the alert but miss the context before/after | Expand the time window ±15 minutes around an alert |
| Not saving queries | Rewrite the same search next time | Save every useful investigation query |

Key Takeaways

  • Search is the SOC analyst's core technical skill — learn Lucene syntax until you can write queries in your sleep
  • Start broad (all events on a host) then narrow (add severity, time, specific fields) to focus your investigation
  • Correlation connects isolated events into attack narratives by linking them across time, entities, and behavior
  • Cross-source correlation is the most powerful technique — attackers leave traces across firewall, authentication, FIM, and DNS logs
  • Pivot searching (IP → host → user → time window) systematically expands your investigation scope
  • Always set a time range, always save useful queries, and always work in UTC
  • The 7 essential search queries (host, severity, source IP, auth failures, Windows Event ID, FIM, firewall) cover 90% of investigations

Knowledge Check: Search & Correlation

10 questions · 70% to pass

1. You want to find all SSH brute force alerts (rule 5551) on linux-web-01 in the Wazuh search bar. Which Lucene query is correct?

2. What Lucene query would find all alerts with severity 10 or higher, excluding agent heartbeat events (rule 530)?

3. During an investigation, you find that source IP 185.220.101.42 appears in SSH logs, firewall logs, AND DNS logs across different agents. What correlation technique are you using?

4. An analyst investigating an alert starts by searching for the suspicious IP, then searches for all activity on the target host, then narrows to a specific user account. What technique is this?

5. Why is it critical to always work in UTC when correlating events from multiple agents in Wazuh?

6. An analyst searches for 'agent.name:linux-web-01 AND rule.level:15' and gets zero results. They conclude no critical events occurred. What mistake did they make?

7. You build a correlation timeline and see: Firewall block (06:12) → Failed SSH (06:15) → Brute force (06:25) → Successful login (06:30) → /etc/passwd modified (06:37). What attack phase does the /etc/passwd modification represent?

8. In Labs 1.1 and 1.3, the SSH brute force attack came from IP 185.220.101.42. If you search data.srcip:185.220.101.42 across all agents in the Wazuh lab, which additional log source (beyond SSH) would also contain events from this IP?

9. In Lab 1.3, you discovered that linux-web-01 had FIM alerts for both /etc/passwd and /etc/shadow being modified. What Lucene query would find these specific events?

10. In the lab environment, if you ran the query data.srcip:10.0.1.50 AND agent.name:linux-web-01, what would you find based on the pre-loaded data?
