Data Collection Methods for CTI (Cyber Threat Intelligence)

Author: Shahab Al Yamin Chawdhury

Date: March 4, 2024

Version: 1.0

Executive Summary

This blueprint provides a strategic methodology for designing and operationalizing a Cyber Threat Intelligence (CTI) data collection program. Rather than simply listing sources, it establishes a framework grounded in the intelligence lifecycle and driven by Priority Intelligence Requirements (PIRs). The goal is a CTI capability that lets an organization move toward a predictive security posture by combining internal telemetry, external intelligence (OSINT, HUMINT), and analytical frameworks such as MITRE ATT&CK.

Part I: The Strategic Imperative

A successful CTI program is defined by its ability to provide targeted, actionable insights. This requires a strategy where data collection is deliberately aligned with the organization’s risk profile and security objectives.

1.1 The Intelligence Tiers

CTI operates on three distinct levels:

  • Strategic: High-level, long-term view for executives, focusing on adversary intent and business risk.
  • Operational: Context on specific threat actors and campaigns for security managers, focusing on Tactics, Techniques, and Procedures (TTPs).
  • Tactical: Immediate, technical Indicators of Compromise (IoCs) for SOC analysts and automated systems.

1.2 The Intelligence Lifecycle

This six-step cycle is the architectural blueprint for a CTI capability:

  1. Planning & Direction: Defining goals and stakeholder requirements (PIRs).
  2. Collection: Gathering raw data from selected sources.
  3. Processing: Normalizing, enriching, and structuring data.
  4. Analysis: Synthesizing information into intelligence.
  5. Dissemination: Delivering finished intelligence to stakeholders.
  6. Feedback: Refining requirements based on the utility of the intelligence provided.
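The Processing step (step 3) is where raw collection becomes comparable data. The following is a minimal sketch of that step, assuming indicators arrive as plain strings that may be "defanged" (for example `hxxp://` or `[.]`). The function names `refang`, `classify`, and `process` are hypothetical, and the type detection is deliberately rough.

```python
import re


def refang(value: str) -> str:
    """Undo common defanging so indicators can be compared and matched."""
    value = value.strip()
    value = value.replace("[.]", ".").replace("(.)", ".")
    # "hxxp" and "hxxps" both become their real scheme.
    return re.sub(r"^hxxp", "http", value, flags=re.IGNORECASE)


def classify(value: str) -> str:
    """Rough type detection; a production pipeline would validate more strictly."""
    if re.fullmatch(r"[0-9a-fA-F]{64}", value):
        return "sha256"
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", value):
        return "ipv4"
    if value.lower().startswith(("http://", "https://")):
        return "url"
    return "domain"


def process(raw_indicators: list[str]) -> list[dict]:
    """Normalize, type, and de-duplicate raw indicators (order is preserved)."""
    seen = set()
    records = []
    for raw in raw_indicators:
        value = refang(raw)
        kind = classify(value)
        # Hashes and domains are case-insensitive; URL paths may not be.
        if kind in ("sha256", "domain"):
            value = value.lower()
        key = (kind, value)
        if key in seen:
            continue
        seen.add(key)
        records.append({"type": kind, "value": value})
    return records


if __name__ == "__main__":
    sample = ["hxxp://evil[.]example/a", "EVIL[.]example", "evil.example", "203.0.113.7"]
    for record in process(sample):
        print(record)
```

Enrichment (adding context such as first-seen dates or related campaigns) would follow the same pattern, with each record gaining extra fields before it reaches analysis.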

Part II & III: Data Sources & Internal Collection

The selection of data sources dictates the scope and quality of a CTI program. Internal data provides high-fidelity, ground-truth evidence, while external data offers a broader view of the threat landscape.

CTI Data Source Evaluation Matrix (Summary)

| Data Source | Category | Timeliness (1-5) | Accuracy (1-5) | Uniqueness (1-5) | Actionability (1-5) | Complexity (1-5) |
|---|---|---|---|---|---|---|
| EDR Telemetry | Internal | 5 | 5 | 5 | 5 | 3 |
| DNS Query Logs | Internal | 5 | 5 | 5 | 4 | 2 |
| Security Blogs/Reports | OSINT | 2 | 4 | 2 | 3 | 1 |
| Dark Web Forums | OSINT | 3 | 3 | 4 | 3 | 4 |
| Ransomware Negotiation | HUMINT | 5 | 5 | 5 | 4 | 5 |
| Recorded Future | Commercial | 5 | 4 | 4 | 5 | 1 |
| CrowdStrike Falcon X | Commercial | 5 | 5 | 4 | 5 | 1 |
| ISAC/ISAO Feeds | Community | 3 | 4 | 3 | 4 | 2 |
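One way to turn the matrix into a ranking is a weighted score. The sketch below uses the ratings from the matrix; the weights, the complexity penalty, and the reading of Complexity as "higher means harder to operate" are all assumptions made for illustration, not part of the blueprint. An organization would tune them to its own PIRs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRating:
    name: str
    category: str
    timeliness: int
    accuracy: int
    uniqueness: int
    actionability: int
    complexity: int


# Ratings copied from the evaluation matrix.
RATINGS = [
    SourceRating("EDR Telemetry", "Internal", 5, 5, 5, 5, 3),
    SourceRating("DNS Query Logs", "Internal", 5, 5, 5, 4, 2),
    SourceRating("Security Blogs/Reports", "OSINT", 2, 4, 2, 3, 1),
    SourceRating("Dark Web Forums", "OSINT", 3, 3, 4, 3, 4),
    SourceRating("Ransomware Negotiation", "HUMINT", 5, 5, 5, 4, 5),
    SourceRating("Recorded Future", "Commercial", 5, 4, 4, 5, 1),
    SourceRating("CrowdStrike Falcon X", "Commercial", 5, 5, 4, 5, 1),
    SourceRating("ISAC/ISAO Feeds", "Community", 3, 4, 3, 4, 2),
]

# Hypothetical weights (sum to 1.0) and penalty; adjust to local priorities.
WEIGHTS = {"timeliness": 0.25, "accuracy": 0.30, "uniqueness": 0.15, "actionability": 0.30}
COMPLEXITY_PENALTY = 0.10  # assumes a higher complexity rating means more effort


def priority_score(rating: SourceRating) -> float:
    """Weighted value of a source minus a penalty for operational complexity."""
    value = sum(getattr(rating, field) * weight for field, weight in WEIGHTS.items())
    return round(value - COMPLEXITY_PENALTY * rating.complexity, 2)


def rank(ratings: list[SourceRating]) -> list[SourceRating]:
    """Return sources ordered from highest to lowest priority score."""
    return sorted(ratings, key=priority_score, reverse=True)


if __name__ == "__main__":
    for r in rank(RATINGS):
        print(f"{priority_score(r):>5}  {r.name} ({r.category})")
```

Changing the weights changes the order, which is the point: the matrix records judgments about each source, while the weights record what the organization currently cares about.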

Internal Log Mapping to MITRE ATT&CK (Examples)

| MITRE ATT&CK Technique | Tactic(s) | Data Source | Detection Value (1-5) |
|---|---|---|---|
| T1059.001: PowerShell | Execution | PowerShell Logs | 5 |
| T1003.001: LSASS Memory | Credential Access | Sysmon | 5 |
| T1021.001: RDP | Lateral Movement | Windows Security Log | 4 |
| T1574.002: DLL Side-Loading | Persistence, Privilege Escalation | EDR Telemetry | 4 |
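Kept as data rather than as a document, the mapping above can answer practical questions such as "what do we stop seeing if this log source goes away?". The sketch below encodes the four example rows; the helper names `techniques_by_source` and `visibility_lost` are hypothetical.

```python
from collections import defaultdict

# (technique_id, name, tactics, data_source, detection_value) from the examples above.
ATTACK_LOG_MAP = [
    ("T1059.001", "PowerShell", ("Execution",), "PowerShell Logs", 5),
    ("T1003.001", "LSASS Memory", ("Credential Access",), "Sysmon", 5),
    ("T1021.001", "RDP", ("Lateral Movement",), "Windows Security Log", 4),
    ("T1574.002", "DLL Side-Loading", ("Persistence", "Privilege Escalation"), "EDR Telemetry", 4),
]


def techniques_by_source(rows):
    """Invert the mapping: data source -> list of (technique_id, name, value)."""
    index = defaultdict(list)
    for technique_id, name, _tactics, source, value in rows:
        index[source].append((technique_id, name, value))
    return dict(index)


def visibility_lost(rows, missing_source):
    """Technique IDs that depend on a data source that is not being collected."""
    return [tid for tid, _name, _tactics, source, _value in rows if source == missing_source]


if __name__ == "__main__":
    for source, techniques in techniques_by_source(ATTACK_LOG_MAP).items():
        print(source, "->", techniques)
    print("Without Sysmon:", visibility_lost(ATTACK_LOG_MAP, "Sysmon"))
```

In this simplified form each technique has a single data source; a fuller mapping would list several sources per technique, so that losing one does not necessarily mean losing visibility.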

Part IV & VI: External Intelligence & Analytical Frameworks

External data provides vital context, but it must be structured within analytical frameworks to be truly valuable.

The Pyramid of Pain

This model illustrates that focusing collection on adversary behaviors (TTPs) provides a more resilient defense than focusing on simple indicators (IoCs). A mature collection strategy prioritizes data sources that reveal insights from the top of the pyramid.

  • (Top) TTPs: Adversary behavior. Hardest for them to change.
  • Tools: Software used by adversaries.
  • Network/Host Artifacts: Traces left by the adversary.
  • Domain Names: C2 or phishing domains.
  • IP Addresses: Malicious server addresses.
  • (Bottom) Hash Values: File fingerprints. Trivial for them to change.
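The ordering of the pyramid can be expressed directly in code, which makes it easy to sort a mixed set of collected items by how costly they are for an adversary to change. This is an illustrative sketch: the `PainLevel` enum and the sample findings are hypothetical, and the numeric values only encode relative position in the pyramid.

```python
from enum import IntEnum


class PainLevel(IntEnum):
    """Pyramid of Pain levels; a higher value is harder for an adversary to change."""
    HASH_VALUE = 1
    IP_ADDRESS = 2
    DOMAIN_NAME = 3
    ARTIFACT = 4  # network/host artifacts
    TOOL = 5
    TTP = 6


def sort_by_pain(items):
    """Order (PainLevel, description) pairs from most to least durable."""
    return sorted(items, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical findings from a single investigation.
    findings = [
        (PainLevel.HASH_VALUE, "SHA-256 of a dropped binary"),
        (PainLevel.TTP, "LSASS memory access for credential theft"),
        (PainLevel.DOMAIN_NAME, "C2 domain seen in DNS logs"),
    ]
    for level, description in sort_by_pain(findings):
        print(f"{level.name:<12} {description}")
```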

MITRE ATT&CK Framework

This is a globally recognized knowledge base of adversary TTPs. Mapping collected data to ATT&CK provides a common lexicon to describe adversary behavior and, critically, allows an organization to perform a gap analysis of its defensive visibility, providing a data-backed case for new collection investments.
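A gap analysis of this kind reduces to set arithmetic: compare the techniques the organization cares about with the techniques its current data sources can observe. The sketch below assumes a hypothetical priority list and a hypothetical `detections` mapping; only the technique IDs themselves come from ATT&CK.

```python
def coverage_gap(priority_techniques, detections):
    """Return (coverage_percent, uncovered_ids).

    priority_techniques: set of ATT&CK technique IDs derived from the PIRs.
    detections: dict of technique ID -> set of data sources that can observe it.
    """
    priority = set(priority_techniques)
    if not priority:
        return 0.0, []
    covered = {tid for tid in priority if detections.get(tid)}
    gaps = sorted(priority - covered)
    return 100.0 * len(covered) / len(priority), gaps


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    priority = {"T1059.001", "T1003.001", "T1021.001", "T1574.002", "T1566.001"}
    detections = {
        "T1059.001": {"PowerShell Logs"},
        "T1003.001": {"Sysmon"},
        "T1021.001": {"Windows Security Log"},
        "T1574.002": {"EDR Telemetry"},
        "T1566.001": set(),  # no data source collected yet
    }
    percent, gaps = coverage_gap(priority, detections)
    print(f"Coverage: {percent:.0f}%  Gaps: {gaps}")
```

The list of uncovered techniques is the data-backed case for investment: each entry names a behavior of concern that no current source can observe.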

Part VII & VIII: Operationalizing & Strategic Recommendations

A modern CTI architecture is an ecosystem of integrated platforms designed for automation and data quality assurance.

Architectural Blueprint

  • Collection Layer: All internal and external sources.
  • Aggregation/Storage: Data Lakes for raw storage and SIEMs for real-time analysis.
  • Intelligence Management: A Threat Intelligence Platform (TIP) to process, enrich, and analyze data.
  • Action/Orchestration: A SOAR platform to automate response actions based on intelligence.
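The four layers can be pictured as stages of a pipeline. The sketch below is a toy stand-in, not a description of any particular product: the function names, the corroboration-based confidence formula, and the blocking threshold are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    value: str
    source: str
    confidence: int = 0  # 0-100, assigned by the intelligence management stage


def collect(feeds):
    """Collection layer: flatten {source: [values]} into Indicator objects."""
    return [Indicator(value, source) for source, values in feeds.items() for value in values]


def aggregate(indicators):
    """Aggregation/storage: merge duplicates, remembering every reporting source."""
    store = {}
    for indicator in indicators:
        store.setdefault(indicator.value, []).append(indicator.source)
    return store


def enrich(store, internal_sources):
    """Intelligence management (TIP stand-in): score by corroboration."""
    enriched = []
    for value, sources in store.items():
        unique = set(sources)
        confidence = min(100, 40 * len(unique))      # hypothetical scoring rule
        if unique & internal_sources:                # seen in our own telemetry
            confidence = min(100, confidence + 20)
        enriched.append(Indicator(value, ",".join(sorted(unique)), confidence))
    return enriched


def orchestrate(enriched, block_threshold=80):
    """Action/orchestration (SOAR stand-in): pick a playbook per indicator."""
    return {
        i.value: ("block" if i.confidence >= block_threshold else "watchlist")
        for i in enriched
    }


if __name__ == "__main__":
    feeds = {"edr": ["203.0.113.7"], "isac": ["203.0.113.7", "bad.example"]}
    actions = orchestrate(enrich(aggregate(collect(feeds)), internal_sources={"edr"}))
    print(actions)
```

Keeping each layer behind a narrow interface like this is what allows a source or platform to be swapped out without rebuilding the rest of the pipeline.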

Strategic Recommendations

  1. Measure ROI Quantitatively: Use a Cost Avoidance model to justify CTI investments. Track defender-side metrics such as Mean Time to Detect (MTTD) and ATT&CK coverage percentage to demonstrate value.
  2. Future-Proof Collection: Build an agile architecture to adapt to emerging trends like AI-driven analysis, Extended Threat Intelligence (XTI) for OT/IoT, and threats targeting cloud-native infrastructure.
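Of the metrics named in recommendation 1, MTTD is the most straightforward to compute: it is the average time between the start of a compromise and its detection. The sketch below assumes incident records are available as pairs of timestamps; the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average of (detection_time - compromise_time) over all incidents.

    incidents: iterable of (compromise_time, detection_time) datetime pairs.
    """
    deltas = [detected - started for started, detected in incidents]
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


if __name__ == "__main__":
    # Hypothetical incidents: detected after 2 hours and after 4 hours.
    incidents = [
        (datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 10, 0)),
        (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 13, 0)),
    ]
    print("MTTD:", mean_time_to_detect(incidents))
```

Tracking this figure per quarter, alongside ATT&CK coverage percentage, gives a trend line that can be set against what the CTI program costs.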