- Construct YARA conditions using Boolean operators (and, or, not) to combine string matches
- Use string counting operators (#, any of, all of, N of) for flexible detection logic
- Apply file property checks (filesize, uint16, uint32) to restrict matches to specific file types
- Use positional operators ($string at N, $string in range) for precise byte-offset matching
- Compare weak rules with strong rules and identify techniques that reduce false positives
- Connect condition-building skills to the webshell detection challenge in Lab 7.3

## The Condition Section: Where Precision Lives

You have built a toolkit for writing YARA strings — text with modifiers, hex with wildcards and jumps, and regex for variable patterns. But strings alone do not make a good rule. The **condition** section determines when your rule fires, and the difference between a useful rule and a noisy one is almost always in the condition.

A rule with great strings and a weak condition (`any of them`) will match thousands of legitimate files. A rule with the same strings and a precise condition will match only the target. The condition is where you control the signal-to-noise ratio.

![YARA condition logic — Boolean operators, counting, file properties, string positions, and a complete example](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-07/lesson-7-3/yara-condition-logic.png)

## Boolean Operators

YARA conditions use three Boolean operators: `and`, `or`, and `not`.

### and — Both Must Be True

```
condition:
    $download and $webclient
```

The rule fires only if **both** `$download` and `$webclient` are found in the file. Adding more `and` clauses makes the rule more restrictive (fewer matches, fewer false positives).

### or — Either Can Be True

```
condition:
    $eval or $system or $exec
```

The rule fires if **any one** of the three strings is found. Using `or` makes the rule more permissive (more matches, potentially more false positives). Use `or` when different strings indicate the same behavior — a web shell might use `eval`, `system`, or `exec` to execute commands, but they all mean "command execution."

### not — Must NOT Be Present

```
condition:
    $suspicious_string and not $known_good_string
```

The `not` operator excludes files that contain a specific string. This is powerful for eliminating known false positives. For example, if your web shell rule keeps matching a legitimate PHP framework file that happens to contain `eval(`, you can add a `not` clause for a string unique to that framework:

```
strings:
    $eval = "eval(" nocase
    $laravel = "Illuminate\\Foundation\\Application"
condition:
    $eval and not $laravel
```

### Operator Precedence and Grouping

YARA follows standard operator precedence: `not` binds tightest, then `and`, then `or`. Use parentheses to make complex conditions readable:

```
condition:
    ($eval and $b64) or ($system and $post) or ($exec and $get)
```

Without parentheses, `$eval and $b64 or $system` would be parsed as `($eval and $b64) or $system` — which fires if `$system` alone is present. Always use parentheses when mixing `and` and `or`.

## String Counting Operators

Counting operators are the most powerful tools for flexible detection. Instead of specifying exact Boolean combinations of named strings, you can count how many strings matched.

### any of them / all of them

```
condition:
    any of them    // At least 1 string matches
    all of them    // Every defined string matches
```

`any of them` is the loosest possible condition (highest recall, lowest precision). `all of them` is the tightest (lowest recall, highest precision — but fails if the target is missing even one string).

### N of them

```
condition:
    3 of them    // At least 3 of all defined strings
```

This is the sweet spot for most rules. If you define 6 strings that characterize a malware family, requiring 3 matches means the rule catches variants where some strings have been changed while keeping precision.

### N of ($pattern*)

```
strings:
    $web_eval = "eval(" nocase
    $web_system = "system(" nocase
    $web_exec = "exec(" nocase
    $web_passthru = "passthru(" nocase
    $input_post = "$_POST"
    $input_get = "$_GET"
    $input_request = "$_REQUEST"
condition:
    2 of ($web_*) and 1 of ($input_*)
```

The `$web_*` wildcard matches all string names starting with `$web_`. This condition requires at least 2 execution functions **and** at least 1 user input source — a pattern that strongly indicates a web shell.

### String Occurrence Count (#)

The `#` operator counts how many times a string appears in the file:

```
condition:
    #eval > 3    // "eval" appears more than 3 times
```

Multiple occurrences of suspicious function calls are more indicative of malicious intent than a single occurrence. A legitimate PHP file might use `eval` once; a web shell often uses it repeatedly.
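To make the counting semantics concrete, here is a rough Python model of `N of ($pattern*)` and the `#` occurrence count. It is a sketch, not how YARA is implemented: the helper names `scan`, `n_of`, and `condition` are hypothetical, matching is plain case-sensitive substring search, and modifiers such as `nocase` are ignored.

```python
import fnmatch

# String table mirroring the rule above (modifiers like nocase are not modelled).
STRINGS = {
    "$web_eval": b"eval(",
    "$web_system": b"system(",
    "$web_exec": b"exec(",
    "$web_passthru": b"passthru(",
    "$input_post": b"$_POST",
    "$input_get": b"$_GET",
    "$input_request": b"$_REQUEST",
}


def scan(data: bytes) -> dict[str, int]:
    """Rough stand-in for YARA's '#' operator: occurrences per identifier.

    bytes.count() counts non-overlapping occurrences, which is adequate for
    these literals because none of them can overlap with itself.
    """
    return {name: data.count(literal) for name, literal in STRINGS.items()}


def n_of(counts: dict[str, int], pattern: str) -> int:
    """How many identifiers matching the wildcard pattern occurred at least once."""
    return sum(
        1
        for name, count in counts.items()
        if fnmatch.fnmatchcase(name, pattern) and count > 0
    )


def condition(counts: dict[str, int]) -> bool:
    """Model of: 2 of ($web_*) and 1 of ($input_*)."""
    return n_of(counts, "$web_*") >= 2 and n_of(counts, "$input_*") >= 1


shell = b'<?php eval($_POST["c"]); system($_POST["x"]); ?>'
benign = b'<?php echo "hello"; ?>'

shell_counts = scan(shell)
print(shell_counts["$input_post"])   # occurrence count, like #input_post
print(condition(shell_counts))       # two execution functions plus user input
print(condition(scan(benign)))       # no indicators present
```

Note that "N of" counts distinct strings that matched, while `#` counts occurrences of one string. The sample above has two occurrences of `$_POST` but that still counts as only one string toward `1 of ($input_*)`.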

**Avoid `any of them` in production rules.** It is useful for rapid triage and testing, but a production rule with `any of them` will almost certainly generate false positives. Every string you define could appear in legitimate software. The power of YARA comes from requiring **combinations** of indicators — `3 of them`, `2 of ($exec_*) and $post`, or explicit Boolean logic. Single-string matching is antivirus; multi-indicator matching is threat hunting.

## File Property Checks

File properties let you restrict your rule to specific file types and sizes without relying solely on string matching.

### filesize

```
condition:
    filesize < 50KB                      // Less than 50 kilobytes
    filesize > 100 and filesize < 2MB    // Between 100 bytes and 2 megabytes
    filesize < 10MB                      // Less than 10 megabytes
```

The `filesize` check is arguably the single most effective false-positive reducer in YARA. Common ranges by target type:

| Target | Typical Size | filesize Check |
|---|---|---|
| Web shell | 50 bytes - 50KB | `filesize < 50KB` |
| Malware dropper/stager | 10KB - 500KB | `filesize < 500KB` |
| RAT / backdoor | 50KB - 5MB | `filesize < 5MB` |
| Ransomware | 100KB - 2MB | `filesize < 2MB` |
| Legitimate enterprise app | 10MB - 500MB | (excluded by above ranges) |

### uint16 and uint32 — Magic Byte Checks

The `uint16(offset)` and `uint32(offset)` functions read 2 or 4 bytes at a specific file offset and return them as an integer. This is how you check file format magic bytes:

```
condition:
    uint16(0) == 0x5A4D        // PE executable (MZ header)
    uint32(0) == 0x464C457F    // ELF binary (\x7FELF)
    uint32(0) == 0x04034B50    // ZIP archive (PK header)
    uint16(0) == 0x8B1F        // GZIP compressed data
```

The `uint16(0) == 0x5A4D` check is the standard way to ensure you only match PE (Windows executable) files. Combined with filesize and string checks, this creates highly precise rules:

```
condition:
    uint16(0) == 0x5A4D and filesize < 1MB and 3 of them
```

This means: the file must be a PE executable, under 1MB, with at least 3 matching strings.
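The size suffixes are worth pinning down: in YARA, `KB` multiplies the number by 1024 and `MB` by 1024 × 1024, so `filesize < 50KB` means fewer than 51,200 bytes. The sketch below models the size and magic-byte parts of the combined condition in Python. The function names `yara_size` and `small_pe_prefilter` are hypothetical, and the string-count part (`3 of them`) is deliberately left out.

```python
# YARA size suffixes: KB = 1024 bytes, MB = 1024 * 1024 bytes.
UNITS = {"KB": 1024, "MB": 1024 * 1024}


def yara_size(literal: str) -> int:
    """Convert a YARA-style size literal such as '50KB' into bytes."""
    for suffix, factor in UNITS.items():
        if literal.endswith(suffix):
            return int(literal[: -len(suffix)]) * factor
    return int(literal)  # plain integer literal, already in bytes


def small_pe_prefilter(data: bytes, limit: str = "1MB") -> bool:
    """Model of: uint16(0) == 0x5A4D and filesize < 1MB (string checks omitted)."""
    return data[:2] == b"MZ" and len(data) < yara_size(limit)


print(yara_size("50KB"))                              # bytes behind 'filesize < 50KB'
print(small_pe_prefilter(b"MZ" + b"\x00" * 510))      # tiny buffer starting with MZ
print(small_pe_prefilter(b"PK\x03\x04" + b"\x00" * 60))  # ZIP magic, not a PE
```

Cheap checks like these are a natural first filter because they rule out most files before any string counting matters.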

**Note the byte order.** YARA reads `uint16` and `uint32` in **little-endian** format (least significant byte first), which matches how x86 processors store integers. The MZ header bytes are `4D 5A` in the file, but as a uint16 value they are `0x5A4D` (bytes reversed). The ELF header bytes are `7F 45 4C 46` in the file, but as a uint32 they are `0x464C457F`. This catches many beginners off guard.
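The byte reversal is easy to verify outside YARA. The following sketch uses Python's `struct` module to read the same magic bytes as little-endian integers; the helper names `uint16` and `uint32` are chosen here only to mirror the YARA functions, and the byte buffers are hand-built examples rather than real files.

```python
import struct


def uint16(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer, like YARA's uint16()."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32(data: bytes, offset: int) -> int:
    """Read an unsigned 32-bit little-endian integer, like YARA's uint32()."""
    return struct.unpack_from("<I", data, offset)[0]


pe_start = bytes.fromhex("4D5A9000")    # 'MZ' followed by two filler bytes
elf_start = bytes.fromhex("7F454C46")   # \x7F 'E' 'L' 'F'
zip_start = bytes.fromhex("504B0304")   # 'P' 'K' 03 04
gzip_start = bytes.fromhex("1F8B0800")  # 1F 8B, then method and flags

print(hex(uint16(pe_start, 0)))    # 0x5a4d
print(hex(uint32(elf_start, 0)))   # 0x464c457f
print(hex(uint32(zip_start, 0)))   # 0x4034b50
print(hex(uint16(gzip_start, 0)))  # 0x8b1f

# Reading the same two bytes big-endian gives the value in file order instead.
print(hex(struct.unpack_from(">H", pe_start, 0)[0]))  # 0x4d5a
```

If you prefer to write constants in file order, YARA also provides big-endian variants such as `uint16be` and `uint32be`, so `uint16be(0) == 0x4D5A` expresses the same MZ check.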

## Positional Operators

Sometimes you need a string to appear at a specific location in the file, not just anywhere.

### at — Exact Offset

```
condition:
    $mz_header at 0    // MZ must be at the very start
    $pe_sig at 128     // PE signature at offset 128
```

### in — Offset Range

```
condition:
    $mz_header at 0 and $pe_sig in (60..1024)    // PE signature within first 1KB
```

The `in (start..end)` operator restricts the string to a specific byte range. This is useful for file structure validation — you know the PE signature must be within a certain range of the MZ header.

### entrypoint — PE/ELF Entry Point

```
condition:
    $shellcode at entrypoint    // Shellcode starts at the entry point
```

The `entrypoint` variable holds the file offset of the PE or ELF entry point. If your shellcode pattern appears exactly at the entry point, the file is almost certainly malicious — legitimate programs do not start with raw shellcode. Note that the YARA documentation marks `entrypoint` as deprecated; in current rules the usual replacement is `pe.entry_point` from the `pe` module.

## Weak Rules vs. Strong Rules

![Reducing false positives — comparing a weak rule (high FP risk) with a strong rule (precise detection)](https://cyberblue-academy-content.s3.us-east-2.amazonaws.com/courses/cyberbluesoc-academy/module-07/lesson-7-3/yara-reducing-false-positives.png)

The difference between a noisy rule and a production-quality rule is almost always in the condition. Here is a concrete comparison:

### Weak Rule

```
rule Weak_WebShell {
    strings:
        $a = "eval("
    condition:
        $a
}
```

**Problems:** Matches ANY file containing `eval(` — including legitimate PHP frameworks (Laravel, WordPress, Drupal), JavaScript build tools, Python scripts, and configuration generators. This rule would fire thousands of times on a typical web server with zero malicious files.
### Strong Rule

```
rule Strong_WebShell {
    strings:
        $eval = "eval(" nocase
        $b64 = "base64_decode" nocase
        $system = "system(" nocase
        $exec = "exec(" nocase
        $passthru = "passthru(" nocase
        $post = "$_POST" nocase
        $get = "$_GET" nocase
        $request = "$_REQUEST" nocase
        $php = "
```

**Build conditions incrementally.** Start with a loose condition (`any of them`) to verify your strings match the target. Then add `filesize`. Then add a format check. Then increase the required count. After each change, re-test against both your malware corpus and your clean corpus. Stop when you have 100% detection of targets and 0% false positives on clean files.

## Combining Everything: A Complete Detection Rule

Here is a production-quality rule that combines several of the condition techniques covered above:

```
rule Ransomware_LockBit3_Indicator {
    meta:
        author = "CyberBlue Academy"
        description = "Detects LockBit 3.0 ransomware indicators"
        date = "2026-02-17"
        severity = "critical"
        mitre_att_ck = "T1486"
        tlp = "TLP:GREEN"
    strings:
        // Ransom note strings
        $note1 = "your data are stolen and encrypted" nocase wide ascii
        $note2 = ".onion" nocase
        $note3 = "restore-my-files" nocase wide ascii
        // Technical indicators
        $mutex = "Global\\lockbit" nocase wide ascii
        $ext = ".lockbit" nocase
        $shadow = "vssadmin delete shadows" nocase
        $bcdedit = "bcdedit /set {default} recoveryenabled no" nocase
        $wmic = "wmic shadowcopy delete" nocase
        // Hex pattern (the ASCII bytes of "LockBit 3.0")
        $lockbit_header = { 4C 6F 63 6B 42 69 74 20 33 2E 30 }
    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        (
            (2 of ($note*)) or
            ($mutex and 1 of ($shadow, $bcdedit, $wmic)) or
            ($lockbit_header and $ext) or
            (3 of ($note*, $mutex, $ext, $shadow, $bcdedit, $wmic))
        )
}
```

The original version of this rule also declared a `$pe_header` hex string that the condition never referenced. YARA rejects rules with unreferenced strings at compile time, and the `uint16(0)` check already covers the PE test, so that string is omitted here.

This rule uses:

- **uint16(0) == 0x5A4D** — only PE executables
- **filesize < 2MB** — ransomware is compact
- **Multiple detection paths** connected by `or` — catches different variants where some strings may be missing
- **Wildcard counting** (`2 of ($note*)`) — flexible matching within string groups
- **Named string combinations** — specific pairs that together are highly indicative

## YARA Import Modules: Professional-Grade Detection

Everything above uses YARA's built-in features. Professional YARA rules go further by using **import modules** — extensions that give YARA deep knowledge of file formats, mathematics, and cryptographic hashing.

### import "pe" — PE File Structure Analysis

The `pe` module lets your rule inspect the internal structure of Windows executables directly:

```
import "pe"

rule Suspicious_PE_Structure {
    strings:
        $cmd = "cmd.exe" nocase
        $shell = "ShellExecuteA"
    condition:
        pe.is_pe and
        pe.number_of_sections < 3 and
        pe.timestamp < 946684800 and    // Compiled before 2000-01-01 UTC
        any of them
}
```

Key `pe` module functions and fields:

| Function | What It Checks | Why It Matters |
|---|---|---|
| `pe.is_pe` | File is a valid PE | Replaces `uint16(0) == 0x5A4D` with structural validation |
| `pe.number_of_sections` | Section count | Packed malware often has 1-2 sections; legitimate apps have 4-8 |
| `pe.timestamp` | Compilation timestamp | Forged timestamps (year 1970, future dates) indicate tampering |
| `pe.imphash()` | Import hash (MD5 of the normalized import list, in import-table order) | Groups malware families that import the same DLLs/functions |
| `pe.imports("kernel32.dll", "VirtualAllocEx")` | Specific API import | Detects process injection capabilities without string matching |
| `pe.number_of_signatures` | Authenticode signature count | Unsigned PE in a directory of signed files is suspicious |

**Import hashing** (`pe.imphash()`) is particularly powerful for threat hunting. Malware authors change strings, obfuscate code, and repack binaries — but the set of Windows APIs they import often stays the same. Two samples with identical imphashes are likely built from the same code base or toolset, although common packers and runtime stubs can also give unrelated files the same imphash, so treat a match as a strong lead rather than proof.
### import "math" — Entropy and Statistical Analysis

The `math` module calculates statistical properties — most commonly **Shannon entropy**, which measures randomness in data:

```
import "math"
import "pe"

rule Packed_Executable {
    condition:
        pe.is_pe and
        math.entropy(pe.sections[0].raw_data_offset, pe.sections[0].raw_data_size) > 7.0
}
```

Entropy is measured in bits per byte and ranges from 0 (every byte identical) to 8 (perfectly random). Normal code has entropy 4.5-6.5. Encrypted or compressed data has entropy 7.0-8.0. A PE section with entropy above 7.0 is almost certainly packed, encrypted, or compressed — a strong indicator of evasion.

| Entropy Range | What It Means | SOC Relevance |
|---|---|---|
| 0-2.0 | Highly repetitive data (null padding, runs of a single byte) | Normal padding and sparse data sections |
| 4.5-6.5 | Compiled code, mixed content | Typical legitimate PE sections |
| 7.0-7.5 | Compressed or packed data | Likely UPX, Themida, or custom packer |
| 7.5-8.0 | Encrypted data | Likely encrypted payload or ransomware blob |

### import "hash" — Section-Level Hashing

The `hash` module computes cryptographic hashes of arbitrary byte ranges within a file:

```
import "hash"
import "pe"

rule Known_Malicious_Section {
    condition:
        pe.is_pe and
        hash.sha256(pe.sections[0].raw_data_offset, pe.sections[0].raw_data_size) == "a1b2c3d4e5f6..."
}
```

Section-level hashing detects malware even when the overall file hash changes (different compiler flags, different resources, different metadata) because the code section remains identical.

### Putting It All Together: A Production Import Rule

```
import "pe"
import "math"

rule APT_Cobalt_Strike_Beacon {
    meta:
        author = "CyberBlue Academy"
        description = "Cobalt Strike beacon with import and entropy analysis"
        mitre_att_ck = "T1071.001"
    strings:
        $sleep = "Sleep" fullword
        $http = "HttpOpenRequestA"
        $cfg = { 00 01 00 01 00 02 ?? ?? 00 02 }
    condition:
        pe.is_pe and
        filesize < 1MB and
        pe.imports("wininet.dll", "HttpOpenRequestA") and
        pe.imports("kernel32.dll", "VirtualAlloc") and
        math.entropy(pe.sections[0].raw_data_offset, pe.sections[0].raw_data_size) > 6.8 and
        2 of them
}
```

This rule combines everything: `pe.is_pe` validates the format, `pe.imports()` checks for specific API calls without relying on string matching (which obfuscation can defeat), `math.entropy()` flags packed or encrypted code sections, and string matching catches configuration artifacts. Each layer independently reduces false positives.
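For intuition about the entropy thresholds used above, here is a minimal Python sketch of Shannon entropy over bytes. It illustrates the standard formula H = Σ p · log2(1/p); it is not a claim about how `math.entropy()` is implemented internally, and the function name `shannon_entropy` and the sample buffers are illustrative assumptions.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(data).values()
    )


padding = b"\x00" * 1024                  # a single repeated byte value
all_values = bytes(range(256)) * 4        # every byte value equally often
english = b"The quick brown fox jumps over the lazy dog. " * 20
random_blob = os.urandom(65536)           # stand-in for encrypted data

print(shannon_entropy(padding))      # lowest possible value
print(shannon_entropy(all_values))   # highest possible value
print(shannon_entropy(english))      # plain text sits in the middle
print(shannon_entropy(random_blob))  # close to 8 for random bytes
```

Running a helper like this over a section extracted from a sample is a quick way to sanity-check a threshold such as `> 7.0` before committing it to a rule.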

**When to use import modules vs built-in checks.** Use built-in checks (`uint16(0)`, `filesize`, `entrypoint`) when writing cross-platform rules that target PE and ELF files. Use `import "pe"` when your rule is Windows-specific and needs structural analysis (imports, sections, timestamps). Use `import "math"` when you need to detect packing or encryption. Most production rules for Windows malware use at least `pe` and `math` modules — they are not optional extras but essential tools. You will use these extensively in **Module 11 (Malware Analysis)** when connecting YARA findings to deeper PE analysis.

- Boolean operators (`and`, `or`, `not`) combine string matches — use parentheses when mixing operators to ensure correct evaluation
- String counting (`N of them`, `N of ($pattern*)`, `#string > N`) provides flexible detection that catches malware variants
- `filesize` checks are the single most effective false-positive reducer — always include one based on the expected target size range
- `uint16(0)` and `uint32(0)` magic byte checks restrict rules to specific file formats (PE, ELF, ZIP, etc.)
- Positional operators (`at`, `in`, `entrypoint`) match strings at specific file offsets for structural validation
- Strong rules combine file format checks, size limits, multiple string types, flexible counting, and exclusion strings
- Weak rules use single strings with `any of them` — they match thousands of legitimate files
- **Import modules** (`pe`, `math`, `hash`) elevate rules from string matching to structural analysis — `pe.imports()` detects API usage, `math.entropy()` flags packing, and `hash.sha256()` identifies known code sections
- `pe.imphash()` groups malware families by their import table fingerprint — even when strings and file hashes change
- In Lab 7.3, you will write 3 YARA rules to detect 5 webshells hidden among 500 files with zero false positives — condition precision is the key

## What's Next

Put your conditions knowledge to the test. In **Lab 7.3 — Webshell Detection**, you'll apply YARA to a real-world scenario: writing rules to find PHP and JSP webshells hidden among legitimate web files — one of the most common YARA use cases in incident response.