Lesson 2 of 6 · 12 min read · Includes quiz

String Patterns & Matching

Wildcards, hex, case-insensitive

What You'll Learn

  • Apply modifiers (nocase, wide, ascii, fullword, xor, base64) to text strings for flexible pattern matching
  • Construct hex strings with wildcards (??), jumps ([N-M]), and alternation (|) for binary signature detection
  • Write regex patterns for matching variable-length indicators like URLs, IP addresses, and encoded data
  • Choose the correct string type (text, hex, or regex) based on what you are trying to detect
  • Explain how Unicode encoding (wide strings) affects malware detection on Windows systems
  • Connect string pattern techniques to the hex pattern hunting exercises in Lab 7.2

Beyond Basic Strings: Why Modifiers Matter

In Lesson 7.1, you learned the three string types: text, hex, and regex. A basic text string like $s = "cmd.exe" works, but it only matches that exact case and encoding. Real malware is not that cooperative.

Attackers deliberately vary their strings to evade detection, and naive substring matching brings problems of its own:

  • Case variation: CMD.EXE, Cmd.Exe, cMd.ExE — different cases defeat case-sensitive matching
  • Unicode encoding: Windows APIs often use UTF-16 (wide) strings internally — cmd.exe becomes c\x00m\x00d\x00.\x00e\x00x\x00e\x00 in memory
  • XOR encoding: A single-byte XOR turns readable strings into gibberish that text matching cannot find
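  • Base64 encoding: PowerShell's -EncodedCommand (-enc) parameter accepts Base64-encoded commands. The ASCII string cmd.exe encodes to Y21kLmV4ZQ==, but -EncodedCommand expects UTF-16 LE text, so inside an encoded PowerShell command the string is encoded from its UTF-16 LE form, which on its own becomes YwBtAGQALgBlAHgAZQA=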
  • Substring embedding: The string cmd might appear inside acmdline or tcmd_tool — matching substrings creates false positives

YARA's string modifiers address each of these cases, within limits (xor, for example, only covers single-byte keys). They turn a basic text match into a flexible detection engine that catches variants without requiring separate rules for each variation.

YARA string types — text, hex, and regex compared with their features, examples, and best use cases

Text String Modifiers

nocase — Case-Insensitive Matching

The nocase modifier makes a text string match regardless of case:

$s = "powershell" nocase

This matches powershell, PowerShell, POWERSHELL, pOwErShElL, and every other combination. Without nocase, YARA performs exact case matching by default.

When to use: Almost always for function names, command names, and tool names. Attackers routinely vary case to evade simple string matching. The only time you want case-sensitive matching is when the exact case is itself a distinguishing indicator (rare).
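To make the behavior concrete, here is a minimal Python sketch of case-insensitive matching at the byte level. It assumes, as a simplification, that only the ASCII letters A-Z and a-z are folded; find_nocase is a hypothetical helper name, and this is not how YARA implements nocase internally.

```python
def find_nocase(data: bytes, needle: bytes) -> list[int]:
    """Return every offset where needle occurs in data, ignoring ASCII case."""
    # bytes.lower() folds only A-Z, so non-ASCII bytes are compared unchanged.
    haystack = data.lower()
    target = needle.lower()
    hits = []
    pos = haystack.find(target)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(target, pos + 1)
    return hits


sample = b"start PowerShell; then POWERSHELL -nop; finally pOwErShElL"
print(find_nocase(sample, b"powershell"))  # [6, 23, 48]
```

One search over a lowercased copy finds all three spellings, which is why a single nocase string replaces a list of hand-written case variants.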

wide — UTF-16 Little-Endian Matching

Windows internally uses UTF-16 Little-Endian (LE) for many strings. In UTF-16 LE, each ASCII character is followed by a null byte:

ASCII:   c   m   d   .   e   x   e
Bytes:   63  6D  64  2E  65  78  65

UTF-16:  c       m       d       .       e       x       e
Bytes:   63 00   6D 00   64 00   2E 00   65 00   78 00   65 00

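The wide modifier matches the UTF-16 LE version. Used on its own, it replaces the default ASCII match instead of adding to it, so this string no longer matches plain ASCII cmd.exe: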

$s = "cmd.exe" wide

You almost always want both encodings:

$s = "cmd.exe" wide ascii

This matches both the ASCII and UTF-16 LE versions of the string in a single rule.
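The same idea as a short Python sketch: build the ASCII and UTF-16 LE byte forms of a string and search a buffer for both. It assumes ASCII input; for non-ASCII characters YARA's wide modifier just interleaves null bytes rather than performing real UTF-16 encoding. The helper names are hypothetical.

```python
def wide(text: str) -> bytes:
    """UTF-16 LE bytes; for ASCII text this puts a 0x00 after every byte."""
    return text.encode("utf-16-le")


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Every offset where pattern occurs in data (overlapping hits included)."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


def find_wide_ascii(data: bytes, text: str) -> dict[str, list[int]]:
    """Rough equivalent of a '"text" wide ascii' string: search both encodings."""
    return {
        "ascii": find_all(data, text.encode("ascii")),
        "wide": find_all(data, wide(text)),
    }


print(wide("cmd.exe").hex(" "))  # 63 00 6d 00 64 00 2e 00 65 00 78 00 65 00
buffer = b"run cmd.exe\x00\x00" + wide("cmd.exe")
print(find_wide_ascii(buffer, "cmd.exe"))  # {'ascii': [4], 'wide': [13]}
```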

When to use: Any time you scan Windows executables, DLLs, or memory dumps. PE files often contain both ASCII and wide strings, and because PowerShell runs on .NET, whose strings are UTF-16 in memory, script text recovered from memory dumps often appears in wide form.

fullword — Whole Word Matching

The fullword modifier ensures the string is not part of a larger word:

$s = "cmd" fullword
Input          Matches?
cmd.exe /c     Yes: "cmd" is bounded by non-alphanumeric characters
cmd /c whoami  Yes: "cmd" is a standalone word
acmdline       No: "cmd" is embedded in a larger word
tcmd_runner    No: "cmd" follows an alphanumeric character

When to use: Short strings that commonly appear as substrings of legitimate words. Without fullword, searching for $s = "net" would match internet, network, ethernet, and thousands of other legitimate strings.
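The boundary rule behind fullword can be sketched in a few lines of Python: a hit counts only if the byte before it and the byte after it are not ASCII letters or digits (the start and end of the data count as boundaries). This sketch handles single-byte text only, and the helper names are hypothetical.

```python
def _is_alnum_at(data: bytes, index: int) -> bool:
    """True if index is inside data and that byte is an ASCII letter or digit."""
    return 0 <= index < len(data) and data[index : index + 1].isalnum()


def find_fullword(data: bytes, needle: bytes) -> list[int]:
    """Offsets where needle is not preceded or followed by an alphanumeric byte."""
    hits = []
    pos = data.find(needle)
    while pos != -1:
        end = pos + len(needle)
        if not _is_alnum_at(data, pos - 1) and not _is_alnum_at(data, end):
            hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


for line in (b"cmd.exe /c", b"cmd /c whoami", b"acmdline", b"tcmd_runner"):
    print(line.decode(), find_fullword(line, b"cmd"))
```

The same check explains the "net" example: inside "internet" the match is preceded by "r", so it is rejected, while a standalone "net" survives.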

Combining Modifiers

Modifiers stack:

$s = "CreateRemoteThread" nocase wide ascii fullword

This matches CreateRemoteThread in any case, in both ASCII and UTF-16 encoding, but only as a complete word — not as part of a larger identifier.

YARA string modifiers — nocase, wide/ascii, fullword, xor, and base64 explained with visual examples

Advanced Modifiers: xor and base64

xor — XOR-Encoded String Matching

Attackers frequently XOR-encode strings to hide them from simple text searches. A single-byte XOR with key 0x55 transforms:

Original:  T   h   i   s       p   r   o   g   r   a   m
Hex:       54  68  69  73  20  70  72  6F  67  72  61  6D
XOR 0x55:  01  3D  3C  26  75  25  27  3A  32  27  34  38

The result is unreadable gibberish. But YARA's xor modifier catches it:

$s = "This program" xor

This tests all 256 possible single-byte XOR keys (0x00 through 0xFF) and matches if any key produces the target string. You can also specify a key range:

$s = "This program" xor(0x01-0xff)

This skips key 0x00 (which produces the original plaintext string — no actual encoding).

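The xor modifier can noticeably increase compile time, memory use, and scan time, because YARA has to search for a separate variant of the string for every key in the range (up to 256 of them). Use it judiciously: apply it only to strings that you have evidence are XOR-encoded in the samples you are targeting, not to every string in a rule by default. Note also that xor cannot be combined with nocase, base64, or base64wide on the same string.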
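The XOR table above can be reproduced, and the search reversed, with a few lines of Python. This sketch (xor_bytes and find_xor are hypothetical names) follows the general idea behind the modifier: encode the needle once per candidate key and search for each encoded form, instead of decoding the scanned data. It is an illustration, not YARA's implementation.

```python
from typing import Optional


def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply the same single-byte key to every byte."""
    return bytes(b ^ key for b in data)


def find_xor(data: bytes, needle: bytes, keys=range(0x00, 0x100)) -> Optional[tuple[int, int]]:
    """Return (offset, key) for the first key whose XOR-encoded needle occurs in data.

    One encoded pattern is built per key and searched for directly, so the
    scanned data never has to be decoded.
    """
    for key in keys:
        offset = data.find(xor_bytes(needle, key))
        if offset != -1:
            return offset, key
    return None


encoded = xor_bytes(b"This program", 0x55)
print(encoded.hex(" "))  # 01 3d 3c 26 75 25 27 3a 32 27 34 38
blob = b"\x90" * 8 + encoded
print(find_xor(blob, b"This program"))  # (8, 85): key 85 is 0x55
# With key 0x00 excluded, as in xor(0x01-0xff), the plaintext no longer matches:
print(find_xor(b"This program", b"This program", range(0x01, 0x100)))  # None
```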

base64 — Base64-Encoded String Matching

PowerShell's -EncodedCommand parameter and many malware loaders use Base64 encoding. The base64 modifier matches the Base64-encoded version of a string:

$s = "cmd.exe" base64

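YARA searches for the Base64 form of "cmd.exe" at each of the three possible alignments within a larger encoded blob. For each alignment it keeps only the characters that are fully determined by the string itself and trims the ones that also depend on neighboring bytes. At the first alignment that means searching for Y21kLmV4Z, the stable part of Y21kLmV4ZQ==, so the rule still fires when cmd.exe is buried inside a longer Base64 blob. The plaintext cmd.exe itself is not matched by this string; add a separate string if you want both.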

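$s = "cmd.exe" wide base64

When base64 is combined with wide, YARA applies wide first and then Base64-encodes the result, so this string matches the Base64 encoding of the UTF-16 LE text. That is the form PowerShell's -EncodedCommand carries, because PowerShell decodes the Base64 payload as UTF-16 LE. The separate base64wide modifier does something different: it takes the Base64 output and stores it as a wide string (a null byte after each character), which helps when a binary holds the encoded blob itself as UTF-16 text.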
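The three-alignment behavior can be sketched in Python. The helper base64_variants is hypothetical; it assumes that, for each alignment, only the Base64 characters whose 6 bits come entirely from the target string are kept, and it is meant to show the idea rather than reproduce YARA's code. The last line shows the UTF-16 LE form that PowerShell's -EncodedCommand carries.

```python
import base64


def base64_variants(needle: bytes) -> list[str]:
    """Base64 forms of needle for the three possible alignments in a larger blob.

    Characters that would also depend on bytes before or after needle are
    trimmed, so each variant is valid wherever needle appears at that alignment.
    """
    variants = []
    for offset in range(3):
        # Placeholder bytes stand in for whatever precedes needle in the blob.
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        first = -(-8 * offset // 6)               # ceil(8 * offset / 6)
        last = (8 * (offset + len(needle))) // 6  # characters fully covered by data
        variants.append(encoded[first:last])
    return variants


print(base64_variants(b"cmd.exe"))  # ['Y21kLmV4Z', 'NtZC5leG', 'jbWQuZXhl']
# Base64 of the UTF-16 LE text, which is what -EncodedCommand carries:
print(base64.b64encode("cmd.exe".encode("utf-16-le")).decode())  # YwBtAGQALgBlAHgAZQA=
```

Whatever bytes surround the string, one of the three variants appears in the encoded blob, which is why three search patterns are enough.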

Hex Strings: Binary Pattern Mastery

Hex strings are your primary tool for matching binary content — shellcode, file headers, executable signatures, and encoded payloads. They use byte-level matching with powerful wildcards.

Basic Hex Patterns

$pe_header = { 4D 5A 90 00 }

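This matches the classic DOS/PE executable header: "MZ" followed by 0x90 0x00. Every Windows .exe and .dll starts with "MZ" (4D 5A); the 90 00 that follows is what standard linkers typically emit, but it is not guaranteed, so some packed or hand-crafted files will not match all four bytes.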

Wildcards (??)

$call_pattern = { E8 ?? ?? ?? ?? C3 }

This matches the x86 instruction sequence CALL rel32 followed by RET. E8 is the CALL opcode, the four ?? wildcards match the 4-byte relative offset (which differs whenever the code layout changes), and C3 is the RET instruction. The sequence appears wherever a function ends by calling another routine and then returning.
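Because the four wildcard bytes are a relative offset, the real call target moves from build to build while E8 and C3 stay fixed. The Python sketch below scans a buffer for the same shape and computes where each CALL lands: the destination is the address of the next instruction plus the signed 32-bit offset. decode_call_ret is a hypothetical helper, and the base address 0x401000 is an assumed load address for illustration.

```python
import struct


def decode_call_ret(code: bytes, base: int = 0) -> list[tuple[int, int]]:
    """Find E8 <rel32> C3 byte sequences and compute each CALL's destination.

    rel32 is a signed little-endian offset measured from the end of the
    5-byte CALL, so destination = address_of_CALL + 5 + rel32.
    """
    results = []
    for i in range(len(code) - 5):
        if code[i] == 0xE8 and code[i + 5] == 0xC3:
            (rel32,) = struct.unpack_from("<i", code, i + 1)
            results.append((base + i, base + i + 5 + rel32))
    return results


code = b"\x90" + b"\xe8\x10\x00\x00\x00\xc3" + b"\xe8\xf0\xff\xff\xff\xc3"
for at, target in decode_call_ret(code, base=0x401000):
    print(f"CALL at {at:#x} -> {target:#x}")
```

Note that a byte-level scan like this cannot tell code from data, so short patterns such as this one also hit coincidental byte runs.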

Jumps ([N-M])

Jumps let you match patterns with a variable number of bytes between known anchors:

$pattern = { 4D 5A [20-60] 50 45 00 00 }

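This matches "MZ" followed by "PE\0\0" somewhere between 20 and 60 bytes later. The [20-60] jump says "skip 20 to 60 bytes of anything." Jumps suit PE header matching because the distance from the MZ header to the PE signature varies between executables: it is stored in the 32-bit e_lfanew field at offset 0x3C. Treat this particular range as illustrative. The DOS header alone is 64 bytes, so in ordinary executables the PE signature starts at offset 0x40 or later (often 0x80 or beyond), and [20-60] would only match unusual files with overlapping headers. Production rules use a wider jump or a condition such as uint32(uint32(0x3C)) == 0x00004550. A hex string can also match anywhere in the file, so anchor it with a condition like $pattern at 0 when you mean the file header.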
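A short Python sketch makes the e_lfanew point concrete. It builds a hypothetical, minimal header with the PE signature at offset 0x80 (a value seen in many executables, used here only as an example), reads the 32-bit little-endian e_lfanew field at 0x3C, and reports how many bytes a hex jump would have to skip between "MZ" and the PE signature. The helper names are illustrative.

```python
import struct
from typing import Optional


def pe_signature_offset(data: bytes) -> Optional[int]:
    """Return e_lfanew if data starts with MZ and the PE signature sits there."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # 32-bit LE field at 0x3C
    if data[e_lfanew : e_lfanew + 4] == b"PE\x00\x00":
        return e_lfanew
    return None


def gap_after_mz(e_lfanew: int) -> int:
    """Bytes a hex jump must skip between "MZ" (offsets 0-1) and the signature."""
    return e_lfanew - 2


# Hypothetical minimal header: PE signature at 0x80.
header = bytearray(0x100)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"

offset = pe_signature_offset(bytes(header))
print(hex(offset), gap_after_mz(offset))  # 0x80 126
print(20 <= gap_after_mz(offset) <= 60)   # False: a [20-60] jump misses this file
```

Following the pointer, as this sketch does, is the same idea as the uint32(uint32(0x3C)) condition: it works for any e_lfanew value instead of guessing a jump range.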

Fixed-length jumps are also supported:

$pattern = { E8 [4] C3 }

This is equivalent to { E8 ?? ?? ?? ?? C3 } — a CALL with exactly 4 bytes of offset followed by a RET.

Alternation (|)

Alternation lets you match one of several byte sequences at a position:

$shellcode = { (6A 40 | 6A 20) 68 00 10 00 00 }

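This matches either 6A 40 (push 0x40, PAGE_EXECUTE_READWRITE) or 6A 20 (push 0x20, PAGE_EXECUTE_READ) followed by 68 00 10 00 00 (push 0x1000, MEM_COMMIT): the protection and allocation-type arguments that 32-bit code pushes before calling VirtualAlloc. Different shellcode generators request different memory protection, but the surrounding bytes often stay the same, so one alternation covers both.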

Combining Hex Features

$complex = { 4D 5A [20-200] 50 45 00 00 [0-500] (2E 74 65 78 74 | 2E 63 6F 64 65) }

This matches PE files where the .text or .code section header appears within 500 bytes after the PE signature. It combines a fixed header, a variable jump, a fixed signature, another variable jump, and alternation — all in a single hex string.
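Every hex feature in this section can be checked by translating a hex string into an ordinary regular expression over bytes. The Python sketch below does that for the subset of syntax covered here. hex_to_regex is a hypothetical helper; YARA compiles hex strings its own way, so treat this as a mental model rather than an exact equivalent.

```python
import re

# One token at a time: "??", a hex byte, a jump such as [4] or [20-60], or ( | ).
TOKEN = re.compile(r"\s*(\?\?|[0-9A-Fa-f]{2}|\[\d+(?:-\d+)?\]|[()|])")


def hex_to_regex(hex_string: str) -> re.Pattern:
    """Translate a simple YARA hex string into a bytes regular expression.

    Handles byte literals, ?? wildcards, [N] and [N-M] jumps, and ( | )
    alternation. Nibble wildcards such as 4? and unbounded jumps are rejected.
    """
    body = hex_string.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1].strip()
    parts = []
    pos = 0
    while pos < len(body):
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unsupported hex syntax near {body[pos:]!r}")
        token = match.group(1)
        pos = match.end()
        if token == "??":
            parts.append(b".")  # any single byte
        elif token.startswith("["):
            low, _, high = token[1:-1].partition("-")
            high = high or low  # [N] means exactly N bytes
            parts.append(f".{{{low},{high}}}".encode("ascii"))
        elif token == "(":
            parts.append(b"(?:")  # group without capturing
        elif token in (")", "|"):
            parts.append(token.encode("ascii"))
        else:
            parts.append(re.escape(bytes.fromhex(token)))  # literal byte
    # DOTALL lets "." match 0x0A like any other byte.
    return re.compile(b"".join(parts), re.DOTALL)


alternation = hex_to_regex("{ (6A 40 | 6A 20) 68 00 10 00 00 }")
print(bool(alternation.search(b"\x6a\x20\x68\x00\x10\x00\x00")))  # True
print(bool(alternation.search(b"\x6a\x04\x68\x00\x10\x00\x00")))  # False
```

The translation also shows why { E8 [4] C3 } and { E8 ?? ?? ?? ?? C3 } are interchangeable: both become "E8, any four bytes, C3".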

💡

Use hexdump and CyberChef to extract hex patterns from samples. When you have a malware sample, run hexdump -C sample.exe | head -50 to see the first bytes, or paste the file into CyberChef and use the "To Hex" operation. Look for unique byte sequences that do not appear in legitimate software. File headers, embedded strings, and shellcode preambles are the best candidates for hex string rules.

Regex Strings: Variable Pattern Detection

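Regex strings handle patterns that cannot be expressed as fixed text or hex. They are enclosed in forward slashes and use PCRE-like syntax. YARA runs its own regex engine, which implements most PCRE features but does not support backreferences or POSIX character classes such as [[:digit:]], and its groups are not captured.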

IP Address Patterns

$ip = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

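Matches any dotted-quad pattern in a file, which is useful for finding hardcoded C2 IP addresses. It is loose: it also matches out-of-range values such as 999.999.999.999 and four-part version numbers such as 1.0.0.2, so pair it with other indicators.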
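To see how loose the pattern is, this Python sketch runs the same expression over a byte buffer and then checks each hit with the standard ipaddress module. The sample data and the ipv4_candidates name are hypothetical, and 203.0.113.7 comes from a range reserved for documentation. YARA cannot perform that second validation step itself, so range checking has to happen in a stricter regex or in post-processing.

```python
import ipaddress
import re

# Same pattern as the YARA string, compiled for bytes with Python's re module.
IPV4_LIKE = re.compile(rb"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def ipv4_candidates(data: bytes) -> list[tuple[str, bool]]:
    """Return each dotted-quad match and whether it is a valid IPv4 address."""
    results = []
    for match in IPV4_LIKE.finditer(data):
        text = match.group().decode("ascii")
        try:
            ipaddress.IPv4Address(text)
            valid = True
        except ipaddress.AddressValueError:
            valid = False
        results.append((text, valid))
    return results


sample = b"c2=203.0.113.7;build=999.12.0.300;ver=1.0.0.2"
print(ipv4_candidates(sample))
# [('203.0.113.7', True), ('999.12.0.300', False), ('1.0.0.2', True)]
```

The version string 1.0.0.2 passes both steps, which is why an IP match alone is weak evidence.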

URL Patterns

$url = /https?:\/\/[a-z0-9\-\.]+\.(ru|cn|tk|xyz|top)\/[a-z0-9]{5,}/i

Matches HTTP/HTTPS URLs pointing to suspicious TLDs whose first path segment is at least five alphanumeric characters, a shape typical of randomized paths. The /i flag makes the regex case-insensitive.
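A quick way to sanity-check a URL regex before putting it in a rule is to run it against hand-made samples. This Python sketch uses the same expression, with re.IGNORECASE standing in for YARA's /i flag; the sample URLs are made up.

```python
import re

# The lesson's URL pattern; re.IGNORECASE plays the role of YARA's /i flag.
SUSPICIOUS_URL = re.compile(
    rb"https?://[a-z0-9\-\.]+\.(ru|cn|tk|xyz|top)/[a-z0-9]{5,}", re.IGNORECASE
)

samples = [
    b"GET http://update-cdn.xyz/a8f3k2 HTTP/1.1",  # made-up C2-style URL
    b"https://Files.Example.TOP/AbCdE9",           # mixed case, still matches
    b"https://example.com/abcdef",                 # TLD not in the list
    b"http://short.ru/ab1",                        # path segment too short
]
for sample in samples:
    print(sample.decode(), bool(SUSPICIOUS_URL.search(sample)))
```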

Base64 Content Detection

$b64 = /[A-Za-z0-9+\/]{50,}={0,2}/

Matches Base64-encoded blobs of at least 50 characters. Useful for finding encoded payloads, commands, or configuration data embedded in scripts.
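Matches from this pattern are only candidates, since long alphanumeric runs that are not Base64 also qualify. A common follow-up, sketched here in Python with hypothetical names and sample data, is to try decoding each candidate and keep only the ones that decode cleanly.

```python
import base64
import binascii
import re

# Same expression as the YARA string, compiled for bytes.
B64_BLOB = re.compile(rb"[A-Za-z0-9+/]{50,}={0,2}")


def decode_candidates(data: bytes) -> list[bytes]:
    """Decode every Base64-looking run of 50+ characters that decodes cleanly."""
    decoded = []
    for match in B64_BLOB.finditer(data):
        try:
            decoded.append(base64.b64decode(match.group(), validate=True))
        except binascii.Error:
            continue  # right alphabet, but not valid Base64 (e.g. bad length)
    return decoded


config = b"example configuration block tucked inside a script file"
script = b'$cfg = "' + base64.b64encode(config) + b'"; $junk = "' + b"A" * 51 + b'"'
print(decode_candidates(script))
```

Both runs match the regex, but only the genuine Base64 blob survives decoding.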

PowerShell Obfuscation

$ps_obfuscated = /\-[eE][nN][cC][oO]?[dD]?[eE]?[dD]?[cC]?[oO]?[mM]?[mM]?[aA]?[nN]?[dD]?\s+[A-Za-z0-9+\/=]{20,}/

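Matches PowerShell's -EncodedCommand parameter written as -enc or a longer prefix (-enco, -encoded, -EncodedCommand) followed by a Base64 string. Because each letter after -enc is optional on its own, it also tolerates a few odd spellings. PowerShell accepts even shorter forms such as -e and -ec, which this pattern does not catch.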
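To check which switch spellings the pattern covers, this Python sketch builds an encoded command the way -EncodedCommand expects (Base64 over UTF-16 LE text) and runs the lesson's expression, compiled with Python's re module, against several command lines. It includes -ec and -e, two shorter forms PowerShell also accepts. The script text and the encode_command helper are hypothetical.

```python
import base64
import re

# The lesson's pattern, compiled with Python's re module for illustration.
PS_ENCODED = re.compile(
    rb"\-[eE][nN][cC][oO]?[dD]?[eE]?[dD]?[cC]?[oO]?[mM]?[mM]?[aA]?[nN]?[dD]?"
    rb"\s+[A-Za-z0-9+/=]{20,}"
)


def encode_command(script: str) -> str:
    """Base64 over UTF-16 LE text: the form -EncodedCommand expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


payload = encode_command("Write-Output 'hello from an encoded command'")
coverage = {}
for switch in ("-enc", "-EncodedCommand", "-ENCODED", "-ec", "-e"):
    command_line = f"powershell.exe -NoProfile {switch} {payload}".encode("ascii")
    coverage[switch] = bool(PS_ENCODED.search(command_line))
print(coverage)
# {'-enc': True, '-EncodedCommand': True, '-ENCODED': True, '-ec': False, '-e': False}
```

Adding the shorter spellings, or pairing the regex with a "wide base64" text string for known command content, would close that gap.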

Regex strings are the most flexible but usually the slowest. YARA speeds up scanning by first looking for short fixed byte sequences (atoms) taken from each string, then verifying the full pattern where one is found. A regex built mostly from character classes and quantifiers, such as \d{1,3}, offers few useful atoms, so the regex engine runs far more often. A rule with 20 complex regex patterns will typically scan much slower than a rule with 20 text strings. Use regex only when text and hex cannot express the pattern you need, and prefer text strings for known, fixed values.

Choosing the Right String Type

What you know                           Best string type            Example
The exact string in the file            Text string                 $s = "CreateRemoteThread" nocase wide ascii
The exact byte sequence                 Hex string                  $h = { 4D 5A 90 00 03 00 00 00 }
A pattern with variable bytes           Hex string with wildcards   $h = { E8 ?? ?? ?? ?? C3 }
A format pattern (IPs, URLs, emails)    Regex                       $r = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
The string exists but encoding varies   Text with modifiers         $s1 = "payload" xor wide ascii  plus  $s2 = "payload" base64
A binary header with variable offsets   Hex string with jumps       $h = { 4D 5A [20-200] 50 45 00 00 }
One of several possible byte sequences  Hex with alternation        $h = { (6A 40 | 6A 20) 68 00 10 }

The xor and base64 modifiers cannot be applied to the same string (nor combined with nocase), so covering several encodings takes separate strings. Because xor's default key range includes 0x00, $s1 also matches the plaintext in both ASCII and wide form.

The best rules combine multiple string types. A rule with text strings (for readable indicators), hex strings (for binary signatures), and regex strings (for variable patterns) covers more ground than any single type alone.

Practical Example: Cobalt Strike Beacon Detection

Let us apply everything we have learned to a realistic detection rule. Cobalt Strike is one of the most common C2 frameworks encountered in enterprise intrusions. Here is one way an analyst might combine multiple string types and modifiers in a rule for it:

rule CobaltStrike_Beacon_Indicator
{
    meta:
        author = "CyberBlue Academy"
        description = "Detects Cobalt Strike Beacon indicators"
        date = "2026-02-17"
        severity = "critical"
        mitre_attack = "T1071.001"

    strings:
        // Text strings with modifiers
        $sleep = "SleepMask" nocase wide ascii
        $beacon = "beacon.dll" nocase wide ascii
        $pipe = "\\\\.\\pipe\\msagent_" nocase  // named pipe prefix \\.\pipe\msagent_ with YARA escapes

        // Hex strings for binary signatures
        $config_header = { 00 01 00 01 00 02 ?? ?? 00 03 }
        $xor_key = { 69 68 69 68 69 68 69 68 }
        $pe_header = { 4D 5A [20-200] 50 45 00 00 }

        // Regex for C2 URL patterns
        $c2_uri = /\/[a-zA-Z]{4,8}\.(js|html|php|asp)\?id=[0-9]{6,}/

    condition:
        $pe_header at 0 and
        filesize < 1MB and
        (2 of ($sleep, $beacon, $pipe) or
         $config_header or
         ($xor_key and $c2_uri))
}

This rule uses:

  • Text strings with nocase + wide + ascii for known Cobalt Strike strings
  • Hex strings for the beacon configuration header and XOR key
  • Hex string with jump for the PE header validation
  • Regex for the C2 URI pattern
  • Flexible condition with multiple detection paths

Key Takeaways

  • The nocase modifier catches case variations — use it on nearly all function/command/tool name strings
  • The wide ascii combination catches both ASCII and UTF-16 LE encodings — essential for Windows PE and memory scanning
  • The fullword modifier prevents substring false positives — critical for short strings like "cmd", "net", "run"
  • The xor modifier detects single-byte XOR-encoded strings but increases scan time — use only when you have evidence of XOR encoding
  • The base64 modifier detects Base64-encoded strings; pair it with wide (wide base64) to catch text inside PowerShell -EncodedCommand payloads, which are Base64-encoded UTF-16 LE
  • Hex wildcards (??), jumps ([N-M]), and alternation ((xx|yy)) match variable binary patterns
  • Regex strings handle format patterns (IPs, URLs, Base64 blobs) but are slower than text and hex — use only when needed
  • The best rules combine multiple string types with appropriate modifiers for comprehensive detection
  • In Lab 7.2, you will use hex patterns and CyberChef to decode an embedded C2 URL and write a YARA rule that detects it in binary form

What's Next

You now have a complete toolkit for writing YARA strings that catch malware variants, encoded payloads, and binary signatures. In Lesson 7.3, you will master the condition section — Boolean logic, string counting, file properties, and position operators that control precisely when your rules fire. The condition is where you eliminate false positives and build rules that are precise enough for production deployment.

Knowledge Check: String Patterns & Matching

10 questions · 70% to pass

1

What does the 'nocase' modifier do when applied to a YARA text string?

2

Why is the 'wide ascii' modifier combination important when scanning Windows executables?

3

Which YARA modifier prevents the string 'cmd' from matching inside the word 'acmdline'?

4

An attacker XOR-encoded the string 'This program' with a single-byte key. Which YARA modifier detects the encoded version?

5

In the hex string { E8 ?? ?? ?? ?? C3 }, what does the pattern represent in x86 assembly?

6

What is the purpose of hex string jumps like [20-60] in the pattern { 4D 5A [20-60] 50 45 00 00 }?

7

Which string type is the SLOWEST for YARA to process and should be used only when text and hex strings cannot express the needed pattern?

8

In Lab 7.2, you will decode a hex-encoded C2 URL using CyberChef and write a YARA rule. Which string type is most appropriate for matching the C2 URL in its encoded (hex) form inside the binary?

9

What does hex alternation (|) allow you to do in a YARA hex string like { (6A 40 | 6A 20) 68 00 10 }?

10

An analyst wants to detect the string 'cmd.exe' in all of these forms: plain text, UTF-16, XOR-encoded, and Base64-encoded. Which modifier combination achieves this?
