What You'll Learn
- Explain why post-incident reviews (PIRs) are essential for improving detection, response, and organizational resilience
- Conduct a blameless PIR using timeline reconstruction, root cause analysis, and the 5-Whys technique
- Document actionable lessons learned and convert them into detection rule updates, playbook revisions, and process improvements
- Calculate and interpret key IR metrics — Mean Time to Detect (MTTD), Mean Time to Respond (MTTR), and dwell time
- Share incident-derived threat intelligence through MISP using appropriate TLP markings
- Apply PIR methodology in Lab 13.4 to produce a complete post-incident documentation package
Why Post-Incident Reviews Matter
Every incident — whether a ransomware outbreak, a phishing compromise, or a false alarm that consumed four hours of analyst time — contains information your team can use to improve. The post-incident review (PIR) is the mechanism that extracts that information and converts it into concrete improvements.
Organizations that skip PIRs repeat the same mistakes. The same detection gaps that let an attacker dwell for 72 hours will let the next attacker dwell for 72 hours. The same playbook ambiguity that delayed containment by 45 minutes will delay it again. The PIR breaks this cycle.
PIR vs. Post-Mortem: These terms are often used interchangeably. Some organizations reserve "post-mortem" for major incidents and "PIR" for all incidents. The process is identical — what matters is that it happens consistently, regardless of incident severity.
The Blameless Culture Imperative
A PIR fails if people are afraid to speak honestly. If the analyst who missed the initial alert fears punishment, they will not disclose what happened. If the responder who ran the wrong containment command fears blame, the team will never learn from that mistake.
Blameless does not mean accountable-less. It means:
- Focus on systems and processes, not individuals
- Ask "what allowed this to happen?" not "who let this happen?"
- Treat human error as a symptom of process gaps, not a root cause
- Document findings as opportunities for improvement, not evidence for disciplinary action
| Blame-Oriented Language | Blameless Language |
|---|---|
| "The analyst failed to detect the alert" | "The alert was not visible in the analyst's queue due to filter configuration" |
| "The responder made a mistake during containment" | "The containment playbook did not cover this scenario, leading to an improvised response" |
| "The admin left the port open" | "The change management process did not include a security review step for firewall changes" |
Conducting an Effective PIR
When to Hold the PIR
Schedule the PIR within 48-72 hours of incident closure — soon enough that details are fresh, but far enough from the crisis that participants can reflect objectively. For major incidents, hold an initial "hot wash" within 24 hours for immediate fixes, followed by a deeper PIR within a week.
Who Should Attend
| Role | Why They Attend |
|---|---|
| Incident Commander / Lead | Provides the authoritative timeline and decision log |
| All responding analysts (L1, L2, L3) | Each tier saw different aspects of the incident |
| Detection engineering | Evaluates why existing rules did or did not fire |
| IT/sysadmin (if involved) | Explains infrastructure context and containment actions taken |
| Management (observer) | Understands resource and process gaps without directing the conversation |
The PIR Agenda
A structured agenda prevents the PIR from becoming a rambling war story session:
- Timeline reconstruction (20 min) — Build the authoritative chronological record
- What went well (10 min) — Identify strengths to preserve
- What needs improvement (15 min) — Identify gaps without blame
- Root cause analysis (15 min) — Determine why the incident happened and why response gaps existed
- Action items (15 min) — Assign specific, measurable improvements with owners and deadlines
Timeline Reconstruction
The timeline is the foundation of every PIR. Reconstruct it from multiple sources:
| Source | What It Provides |
|---|---|
| SIEM/Wazuh logs | Alert timestamps, rule IDs, severity |
| TheHive case log | Analyst actions, task assignments, notes |
| Communication records | Slack/Teams messages, email threads, phone logs |
| Endpoint telemetry | Velociraptor artifacts, process timelines |
| Network logs | Firewall blocks, DNS queries, proxy logs |
| Change management | Any infrastructure changes during the window |
Build the timeline collaboratively on a shared screen. Each participant fills in their piece. The result should answer: What happened, in what order, and what was the gap between each event and the team's awareness/response?
| Time (UTC) | Event | Source | Gap |
|---|---|---|---|
| Day 1 09:00 | Phishing email delivered | Email gateway | — |
| Day 1 09:04 | User clicks link, credential harvested | Proxy logs | — |
| Day 1 09:15 | Attacker logs in via VPN | VPN logs | Not detected |
| Day 1 10:30 | Lateral movement to file server | Wazuh | Alert fired |
| Day 1 10:45 | L1 analyst triages alert | TheHive | 15 min |
| Day 1 11:20 | L2 escalation — confirms compromise | TheHive | 35 min |
| Day 1 11:45 | Containment — VPN session killed, password reset | TheHive | 25 min |
| Day 1 12:00 | Sweep begins for lateral movement | Velociraptor | — |
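The gap column above is simple arithmetic on consecutive timestamps, but doing it by hand across dozens of events is error-prone. A minimal sketch of automating it (the timeline entries and labels below are illustrative, taken from the example incident):

```python
from datetime import datetime

# Illustrative timeline entries from the example incident: (time, event).
timeline = [
    ("09:00", "Phishing email delivered"),
    ("09:04", "User clicks link, credential harvested"),
    ("10:30", "Lateral movement alert fired (Wazuh)"),
    ("10:45", "L1 analyst triages alert"),
    ("11:20", "L2 escalation confirms compromise"),
    ("11:45", "Containment: VPN session killed"),
]

def gaps(entries):
    """Minutes elapsed between each consecutive pair of timeline events."""
    times = [datetime.strptime(t, "%H:%M") for t, _ in entries]
    return [
        (entries[i + 1][1], int((times[i + 1] - times[i]).total_seconds() // 60))
        for i in range(len(entries) - 1)
    ]

for event, minutes in gaps(timeline):
    print(f"{minutes:>4} min before: {event}")
```

Feeding the full timeline through a script like this surfaces the largest gaps immediately, which is where the PIR discussion should focus.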
Root Cause Analysis: The 5-Whys Technique
The 5-Whys is a simple but powerful technique for drilling past symptoms to find root causes. Start with the problem statement and ask "why?" repeatedly:
Problem: Attacker dwelled in the environment for 90 minutes before detection.
- Why? The initial VPN login from the attacker's IP did not trigger an alert.
- Why? The VPN authentication rule only alerts on failed logins, not successful logins from new locations.
- Why? The detection rule was written two years ago when geo-impossible travel detection was not available.
- Why? There is no scheduled review cycle for detection rules.
- Why? Detection engineering resources are allocated 100% to new rule development with no maintenance budget.
Root cause: No process exists for periodic detection rule review and update.
Action item: Implement quarterly detection rule review cycle. Assign detection engineering 20% maintenance time.
Stop when you reach a systemic cause. If your 5-Whys chain ends at "the analyst did not notice" — you have not gone deep enough. Ask why the analyst did not notice: alert fatigue? Poor dashboard design? Inadequate staffing? The root cause is always a process, tooling, or resource gap.
Documenting Lessons Learned
A PIR without documented action items is a conversation that changes nothing. Every lesson learned must be:
- Specific — Not "improve detection" but "add geo-impossible travel rule to VPN authentication alerts"
- Assigned — Every action item has a named owner
- Deadlined — Every action item has a completion date
- Tracked — Action items live in a tracking system (Jira, TheHive tasks, or a dedicated PIR tracker), not in a document that gets forgotten
Converting Lessons into Improvements
| Lesson Category | Example Finding | Concrete Action |
|---|---|---|
| Detection gap | VPN login from new country was not detected | Write Sigma rule for geo-impossible VPN logins; deploy to Wazuh within 2 weeks |
| Playbook gap | No playbook for compromised VPN credentials | Create VPN credential compromise playbook; include session kill, password reset, MFA re-enrollment |
| Tool gap | No way to isolate VPN users remotely | Evaluate Velociraptor network isolation capability; test in staging within 30 days |
| Process gap | Escalation from L1 to L2 took 35 minutes | Implement auto-escalation for alerts matching known campaign IOCs; reduce target to <10 min |
| Communication gap | IT team was not notified of containment actions | Add IT notification step to all containment playbooks |
Updating Playbooks and Detection Rules
The most valuable PIR output is immediate improvement to your defensive posture:
Detection Rule Updates
After every true positive incident, ask:
- Did an existing rule fire? If yes, was the alert clear enough for triage? Could it fire earlier in the kill chain?
- Should a new rule be created? What behavior would have detected this attack at an earlier stage?
- Should existing rules be tuned? Did false positives from unrelated rules slow down triage?
For example, the Sigma rule written in response to this PIR's detection gap might look like:

```yaml
title: VPN Login from New Country - Post-PIR Rule
status: test
description: |
  Created after PIR-2026-017. Detects VPN authentication
  from a country not seen in the user's 30-day history.
  Addresses the 90-minute detection gap identified in the
  credential harvesting incident.
references:
  - internal:PIR-2026-017
logsource:
  product: vpn
  service: authentication
detection:
  selection:
    action: "login_success"
  filter_known:
    src_country|expand: "%user_country_history_30d%"
  condition: selection and not filter_known
level: high
tags:
  - attack.initial_access
  - attack.t1078.004
```
Playbook Updates
After every incident where the playbook was insufficient:
- Add the missing scenario or decision branch
- Update containment actions with the correct tool commands
- Add the "lessons learned" reference so future analysts know why the step exists
- Test the updated playbook in a tabletop exercise within 30 days
Measuring IR Effectiveness
You cannot improve what you do not measure. Three metrics form the foundation of IR performance tracking:
Mean Time to Detect (MTTD)
Definition: Average time between an attack action and the SOC becoming aware of it.
MTTD = (Time of first detection) - (Time of initial compromise)
Industry benchmarks:
| Category | MTTD |
|---|---|
| Best-in-class SOC | < 1 hour |
| Average enterprise | 24-72 hours |
| Mandiant global median (2024) | 10 days |
| Target for CyberBlue analysts | < 4 hours |
Mean Time to Respond (MTTR)
Definition: Average time between first detection and containment of the threat.
MTTR = (Time of containment) - (Time of first detection)
| Category | MTTR |
|---|---|
| Best-in-class SOC | < 30 minutes |
| Average enterprise | 4-24 hours |
| Target for CyberBlue analysts | < 2 hours |
Dwell Time
Definition: Total time an attacker has access to the environment before being fully eradicated.
Dwell Time = MTTD + MTTR + Eradication Time
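The three formulas above reduce to subtraction on four timestamps. A minimal sketch, using illustrative timestamps loosely based on the credential-harvesting example (initial compromise, first detection, containment, and eradication times are assumptions for demonstration):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M"
# Illustrative timestamps for a credential-harvesting incident.
initial_compromise = datetime.strptime("2026-03-01 09:00", fmt)
first_detection    = datetime.strptime("2026-03-01 10:30", fmt)
containment        = datetime.strptime("2026-03-01 11:45", fmt)
eradication        = datetime.strptime("2026-03-01 13:00", fmt)

mttd  = first_detection - initial_compromise   # detection gap
mttr  = containment - first_detection          # response time
dwell = eradication - initial_compromise       # total attacker access

print(f"MTTD: {mttd}  MTTR: {mttr}  dwell time: {dwell}")
```

Here MTTD is 90 minutes, MTTR is 75 minutes, and dwell time is 4 hours, matching the PIR-2026-017 row in the tracking table below.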
Tracking Metrics Over Time
Track these metrics per incident and plot trends quarterly:
| Incident ID | Type | MTTD | MTTR | Dwell Time |
|---|---|---|---|---|
| PIR-2026-012 | Phishing | 45 min | 2 hr | 3.5 hr |
| PIR-2026-013 | Malware | 6 hr | 1.5 hr | 9 hr |
| PIR-2026-014 | Brute Force | 10 min | 20 min | 35 min |
| PIR-2026-015 | Lateral Mvmt | 3 hr | 4 hr | 12 hr |
| PIR-2026-016 | Data Exfil | 18 hr | 6 hr | 30 hr |
| PIR-2026-017 | Cred Harvest | 90 min | 75 min | 4 hr |
Trends reveal systemic issues. If MTTD is consistently high for phishing incidents, invest in email gateway detection. If MTTR is high for lateral movement, invest in network segmentation and automated containment.
Track improvement after PIR action items are completed. If your Q1 average MTTD for phishing was 2 hours and you deployed a new email detection rule in the Q1 PIR, measure Q2 phishing MTTD to validate the improvement. Metrics without action are vanity; action without metrics is guesswork.
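Spotting per-type trends means grouping incidents by type and averaging. A minimal sketch, assuming metrics are kept in minutes (the incident records below mirror the tracking table; with more than one incident per type, quarter-over-quarter comparison of these averages reveals the trend):

```python
from collections import defaultdict

# Per-incident metrics (MTTD, MTTR in minutes) from the tracking table.
incidents = [
    ("PIR-2026-012", "Phishing",      45,  120),
    ("PIR-2026-013", "Malware",      360,   90),
    ("PIR-2026-014", "Brute Force",   10,   20),
    ("PIR-2026-015", "Lateral Mvmt", 180,  240),
    ("PIR-2026-016", "Data Exfil",  1080,  360),
    ("PIR-2026-017", "Cred Harvest",  90,   75),
]

def avg_mttd_by_type(rows):
    """Average MTTD in minutes per incident type."""
    by_type = defaultdict(list)
    for _, itype, mttd, _ in rows:
        by_type[itype].append(mttd)
    return {t: sum(v) / len(v) for t, v in by_type.items()}

print(avg_mttd_by_type(incidents))
```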
Sharing Threat Intelligence from Incidents
Every incident produces IOCs, TTPs, and behavioral patterns that may be valuable to other teams and organizations. Sharing this intelligence completes the feedback loop.
Internal Sharing
Update your MISP instance with incident-derived IOCs:
| IOC Type | Example | MISP Category |
|---|---|---|
| IP addresses | C2 servers, exfiltration endpoints | Network activity |
| Domains | Phishing domains, C2 domains | Network activity |
| File hashes | Malware samples, tools dropped | Payload delivery |
| Email addresses | Phishing sender addresses | Social network |
| URLs | Credential harvesting pages | Network activity |
| YARA rules | Detection signatures created during investigation | Payload delivery |
TLP Markings for Sharing
Apply the appropriate Traffic Light Protocol (TLP) marking before sharing outside your organization:
| TLP Level | Sharing Scope | When to Use |
|---|---|---|
| TLP:RED | Named recipients only | Incident involves sensitive business data or ongoing investigation |
| TLP:AMBER | Organization + need-to-know partners | IOCs from confirmed incidents, not yet public |
| TLP:GREEN | Community (sector ISAC, trusted groups) | Anonymized IOCs and TTPs useful to peers |
| TLP:CLEAR | Public | Fully anonymized indicators, published advisories |
External Sharing via MISP
1. Create a MISP event for the incident
2. Add IOCs with appropriate types and categories
3. Add ATT&CK tags for the techniques observed
4. Set TLP marking based on sensitivity
5. Add a narrative description (sanitized of internal details)
6. Publish to your sharing groups (sector ISAC, partner organizations)
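The six steps above produce an event payload that MISP's REST API accepts as JSON. A minimal sketch of that payload, built with the standard library: field names follow MISP's event schema in spirit, but verify them against your MISP version's API documentation, and every value here (domain, IP, event info) is hypothetical:

```python
import json

# Sketch of a sanitized MISP event payload; all values are hypothetical.
event = {
    "Event": {
        "info": "Credential harvesting campaign - sanitized IOCs (PIR-2026-017)",
        "distribution": 2,     # connected communities
        "threat_level_id": 2,  # medium
        "analysis": 2,         # analysis complete
        "Attribute": [
            {"type": "domain", "category": "Network activity",
             "value": "login-portal.example.net", "to_ids": True},
            {"type": "ip-dst", "category": "Network activity",
             "value": "203.0.113.50", "to_ids": True},
        ],
        "Tag": [
            {"name": "tlp:amber"},  # step 4: TLP marking as a tag
            {"name": 'misp-galaxy:mitre-attack-pattern="Valid Accounts - T1078"'},
        ],
    }
}

print(json.dumps(event, indent=2))
```

In practice most teams build this through the MISP web UI or the PyMISP client library rather than raw JSON, but the sanitization rule is the same either way: nothing organization-specific goes into `info`, attribute values, or the narrative.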
Sanitize before sharing. Never include internal hostnames, IP ranges, employee names, or business-specific details in externally shared intelligence. Share the IOCs and TTPs, not the story of your organization's vulnerability.
Building Institutional Memory
Individual analysts come and go. Institutional memory — the collective knowledge of past incidents, decisions, and improvements — must persist regardless of staff turnover.
The PIR Knowledge Base
Maintain a searchable archive of all PIRs. When a new incident occurs, analysts should search past PIRs for similar scenarios:
- "Have we seen this malware family before? What worked last time?"
- "We had a similar VPN compromise in Q3 — what did the PIR recommend?"
- "This looks like the same TTP as PIR-2026-013 — check if the action items were implemented"
Connecting PIRs to Detection Coverage
Map every PIR to the ATT&CK techniques involved. Over time, this creates a heat map showing which techniques your organization has encountered and whether you have adequate detection for each:
| ATT&CK Technique | Incidents | Detection Rule Exists | Rule Quality |
|---|---|---|---|
| T1566.001 Phishing | 5 | Yes | High (tuned) |
| T1078.004 Cloud Acc | 2 | Yes (post-PIR) | Medium (new) |
| T1021.001 RDP | 3 | Yes | High (tuned) |
| T1053.005 Sched Task | 1 | No | — COVERAGE GAP |
| T1071.001 Web C2 | 4 | Yes | High (tuned) |
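Once PIRs are mapped to techniques, finding coverage gaps is a simple filter: any technique seen in a real incident that has no detection rule. A minimal sketch, assuming the heat map is kept as structured records (the records below mirror the table above):

```python
# Hypothetical PIR-to-ATT&CK coverage records mirroring the heat map.
coverage = [
    {"technique": "T1566.001", "incidents": 5, "rule": True,  "quality": "high"},
    {"technique": "T1078.004", "incidents": 2, "rule": True,  "quality": "medium"},
    {"technique": "T1021.001", "incidents": 3, "rule": True,  "quality": "high"},
    {"technique": "T1053.005", "incidents": 1, "rule": False, "quality": None},
    {"technique": "T1071.001", "incidents": 4, "rule": True,  "quality": "high"},
]

def coverage_gaps(records):
    """Techniques observed in real incidents but lacking any detection rule."""
    return [r["technique"] for r in records if r["incidents"] > 0 and not r["rule"]]

print(coverage_gaps(coverage))  # techniques needing new rules first
```

Gaps surfaced this way should feed straight back into the detection rule review cycle described earlier, since they represent techniques the organization has already encountered.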
Key Takeaways
- Post-incident reviews transform every incident — true positive or false positive — into concrete improvements to detection, response, and organizational process
- Blameless culture is non-negotiable: focus on systems and processes, not individuals, or people will hide information that the team needs to improve
- The 5-Whys technique drills past symptoms to systemic root causes — keep asking until you reach a process, tooling, or resource gap
- Every PIR action item must be specific, assigned, deadlined, and tracked — undocumented lessons are forgotten lessons
- MTTD, MTTR, and dwell time are the three foundational IR metrics; track them per incident and trend quarterly to identify systemic improvement opportunities
- Share incident-derived intelligence through MISP with appropriate TLP markings — your IOCs may prevent the same attack at a partner organization
- Build institutional memory by maintaining a searchable PIR archive linked to ATT&CK techniques and detection coverage
What's Next
You have learned how to extract maximum value from every incident through structured post-incident reviews. In Lesson 13.5 — Incident Reporting & Communication, you will learn to write professional incident reports for different audiences — from technical deep-dives for your security team to executive summaries for leadership — and understand regulatory reporting requirements that apply when incidents involve protected data.
Knowledge Check: Post-Incident Review
10 questions · 70% to pass
1. What is the primary purpose of a blameless post-incident review?
2. Using the 5-Whys technique, an analyst determines that a 90-minute detection gap occurred because there is no process for periodic detection rule review. What type of root cause is this?
3. When should a post-incident review be scheduled after incident closure?
4. Which IR metric measures the time between an attack action and the SOC becoming aware of it?
5. What TLP marking should you apply when sharing incident IOCs with your sector ISAC (Information Sharing and Analysis Center)?
6. In Lab 13.4, you conduct a PIR for a simulated incident. Which of the following is the FIRST step in the PIR agenda?
7. An organization's MTTD for phishing incidents averaged 2 hours in Q1. After implementing a new email detection rule from a Q1 PIR, what should they measure in Q2?
8. What must every PIR action item include to be effective?
9. In Lab 13.4, you reconstruct an incident timeline from multiple sources. Why is timeline reconstruction done collaboratively with all responders rather than by a single person?
10. Before sharing incident-derived IOCs externally through MISP, what critical step must you perform?