Data Collection Methods for CTI (Cyber Threat Intelligence)

Author: Shahab Al Yamin Chawdhury

Date: March 4, 2024

Version: 1.0

Executive Summary

This blueprint provides a strategic methodology for designing and operationalizing a Cyber Threat Intelligence (CTI) data collection program. Rather than simply listing sources, it establishes a framework grounded in the intelligence lifecycle and driven by Priority Intelligence Requirements (PIRs). The goal is a CTI capability that lets an organization anticipate threats instead of only reacting to them, by combining internal telemetry, external intelligence (OSINT, HUMINT), and analytical frameworks such as MITRE ATT&CK.

Part I: The Strategic Imperative

A successful CTI program is defined by its ability to provide targeted, actionable insights. This requires a strategy where data collection is deliberately aligned with the organization’s risk profile and security objectives.

1.1 The Intelligence Tiers

CTI operates on three distinct levels:

Strategic: High-level, long-term view for executives, focusing on adversary intent and business risk.
Operational: Context on specific threat actors and campaigns for security managers, focusing on Tactics, Techniques, and Procedures (TTPs).
Tactical: Immediate, technical Indicators of Compromise (IoCs) for SOC analysts and automated systems.

1.2 The Intelligence Lifecycle

This six-step cycle is the architectural blueprint for a CTI capability:

Planning & Direction: Defining goals and stakeholder requirements (PIRs).
Collection: Gathering raw data from selected sources.
Processing: Normalizing, enriching, and structuring data.
Analysis: Synthesizing information into intelligence.
Dissemination: Delivering finished intelligence to stakeholders.
Feedback: Refining requirements based on the utility of the intelligence provided.
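
As an illustration of the Processing step, the following minimal Python sketch normalizes raw indicator strings before analysis. The Indicator class, the refang and normalize helpers, the sample domains, and the "PIR-01" identifier are all hypothetical names chosen for this example; a production pipeline would likely rely on a TIP or an established parsing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    kind: str     # e.g. "domain", "ipv4", "sha256"
    value: str    # normalized indicator value
    pir_id: str   # hypothetical identifier of the PIR this collection supports


def refang(text: str) -> str:
    """Undo common defanging (hxxp, [.], (.)) so equal values compare equal."""
    return (text.replace("hxxp", "http")
                .replace("[.]", ".")
                .replace("(.)", "."))


def normalize(raw_values, kind, pir_id):
    """Trim, refang, lowercase, and de-duplicate raw indicator strings."""
    seen = set()
    result = []
    for raw in raw_values:
        value = refang(raw.strip()).lower()
        if value and value not in seen:
            seen.add(value)
            result.append(Indicator(kind, value, pir_id))
    return result


# Illustrative input only: two spellings of one domain plus a second domain.
indicators = normalize(
    ["Evil[.]example.com", " evil.example.com ", "cdn[.]example[.]net"],
    kind="domain",
    pir_id="PIR-01",
)
for indicator in indicators:
    print(indicator)
```

Tagging each indicator with the requirement that motivated its collection is one way to keep the Feedback step tractable, since unused indicators can be traced back to the PIR that produced them.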

Part II & III: Data Sources & Internal Collection

The selection of data sources dictates the scope and quality of a CTI program. Internal data provides high-fidelity, ground-truth evidence, while external data offers a broader view of the threat landscape.

CTI Data Source Evaluation Matrix (Summary)

Data Source	Category	Timeliness (1-5)	Accuracy (1-5)	Uniqueness (1-5)	Actionability (1-5)	Complexity (1-5)
Internal Sources
EDR Telemetry	Internal	5	5	5	5	3
DNS Query Logs	Internal	5	5	5	4	2
OSINT Sources
Security Blogs/Reports	OSINT	2	4	2	3	1
Dark Web Forums	OSINT	3	3	4	3	4
HUMINT Sources
Ransomware Negotiation	HUMINT	5	5	5	4	5
Commercial Sources
Recorded Future	Commercial	5	4	4	5	1
CrowdStrike Falcon X	Commercial	5	5	4	5	1
Community Sources
ISAC/ISAO Feeds	Community	3	4	3	4	2
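
One way to turn the matrix into a ranking is a simple net score. The sketch below uses the rows exactly as listed above and makes two assumptions that the matrix itself does not state: all four benefit criteria carry equal weight, and Complexity is a cost (higher means harder to operate), so it is subtracted. The net_score function is a hypothetical helper, not part of any published scoring method.

```python
# Rows copied from the matrix:
# (source, timeliness, accuracy, uniqueness, actionability, complexity)
MATRIX = [
    ("EDR Telemetry", 5, 5, 5, 5, 3),
    ("DNS Query Logs", 5, 5, 5, 4, 2),
    ("Security Blogs/Reports", 2, 4, 2, 3, 1),
    ("Dark Web Forums", 3, 3, 4, 3, 4),
    ("Ransomware Negotiation", 5, 5, 5, 4, 5),
    ("Recorded Future", 5, 4, 4, 5, 1),
    ("CrowdStrike Falcon X", 5, 5, 4, 5, 1),
    ("ISAC/ISAO Feeds", 3, 4, 3, 4, 2),
]


def net_score(row):
    """Sum the four benefit scores and subtract complexity (assumed to be a cost)."""
    _, timeliness, accuracy, uniqueness, actionability, complexity = row
    return timeliness + accuracy + uniqueness + actionability - complexity


# Highest net score first; ties keep their original table order.
ranked = sorted(MATRIX, key=net_score, reverse=True)
for row in ranked:
    print(f"{row[0]:<24} {net_score(row)}")
```

Organizations with different priorities would replace the equal weights with their own, for example weighting Actionability more heavily for a SOC-focused program.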

Internal Log Mapping to MITRE ATT&CK (Examples)

MITRE ATT&CK Technique	Tactic(s)	Data Source	Detection Value (1-5)
T1059.001: PowerShell	Execution	PowerShell Logs	5
T1003.001: LSASS Memory	Credential Access	Sysmon	5
T1021.001: RDP	Lateral Movement	Windows Security Log	4
T1574.002: DLL Side-Loading	Persistence, Privilege Escalation, Defense Evasion	EDR Telemetry	4
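
The mapping can also be read in the other direction, by data source, to see what each log source contributes. The sketch below encodes the example rows from the table and builds that inverted index; ROWS and index_by_source are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Example rows from the table: (technique ID, name, data source, detection value)
ROWS = [
    ("T1059.001", "PowerShell", "PowerShell Logs", 5),
    ("T1003.001", "LSASS Memory", "Sysmon", 5),
    ("T1021.001", "RDP", "Windows Security Log", 4),
    ("T1574.002", "DLL Side-Loading", "EDR Telemetry", 4),
]


def index_by_source(rows):
    """Group techniques under the data source that provides visibility into them."""
    by_source = defaultdict(list)
    for technique_id, name, source, value in rows:
        by_source[source].append((technique_id, name, value))
    return dict(by_source)


by_source = index_by_source(ROWS)
for source, techniques in by_source.items():
    print(source, "->", techniques)
```

With a fuller mapping, the same index shows which single log source would add visibility into the most techniques if onboarded next.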

Part IV & VI: External Intelligence & Analytical Frameworks

External data provides vital context, but it must be structured within analytical frameworks to be truly valuable.

The Pyramid of Pain

This model illustrates that focusing collection on adversary behaviors (TTPs) provides a more resilient defense than focusing on simple indicators (IoCs), because behaviors are hard for an adversary to change while simple indicators are trivial to replace. A mature collection strategy prioritizes data sources that reveal the upper levels of the pyramid.

(Top) TTPs: Adversary behavior. Hardest for them to change.
Tools: Software used by adversaries.
Network/Host Artifacts: Traces left by the adversary.
Domain Names: C2 or phishing domains.
IP Addresses: Malicious server addresses.
(Bottom) Hash Values: File fingerprints. Trivial for them to change.
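
The ordering of the pyramid can be encoded directly so that collected observations are triaged by how costly they are for an adversary to change. This is a minimal sketch: the PainLevel enum, the numeric values, the prioritize helper, and the sample observations are assumptions made for the example, and only the relative order comes from the model.

```python
from enum import IntEnum


class PainLevel(IntEnum):
    """Pyramid levels, bottom (1) to top (6); higher is harder to change."""
    HASH_VALUE = 1
    IP_ADDRESS = 2
    DOMAIN_NAME = 3
    NETWORK_HOST_ARTIFACT = 4
    TOOL = 5
    TTP = 6


def prioritize(observations):
    """Sort (level, description) pairs so the most durable intelligence comes first."""
    return sorted(observations, key=lambda item: item[0], reverse=True)


# Illustrative observations only.
observations = [
    (PainLevel.HASH_VALUE, "file hash from a malware report"),
    (PainLevel.TTP, "LSASS memory access (T1003.001)"),
    (PainLevel.DOMAIN_NAME, "suspected C2 domain"),
]

ordered = prioritize(observations)
for level, description in ordered:
    print(f"{level.name:<22} {description}")
```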

MITRE ATT&CK Framework

MITRE ATT&CK is a globally recognized knowledge base of adversary TTPs. Mapping collected data to ATT&CK provides a common lexicon for describing adversary behavior. It also lets an organization perform a gap analysis of its defensive visibility, which gives a data-backed case for new collection investments.
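
A gap analysis of this kind reduces to set arithmetic over technique IDs. The sketch below assumes two hypothetical inputs, a list of priority techniques and a list of techniques the organization can currently detect; the coverage function and the sample lists are illustrative, not drawn from a real assessment.

```python
def coverage(priority_techniques, detected_techniques):
    """Return (coverage percentage, sorted list of uncovered technique IDs)."""
    priority = set(priority_techniques)
    covered = priority & set(detected_techniques)
    gaps = sorted(priority - covered)
    percent = 100.0 * len(covered) / len(priority) if priority else 0.0
    return percent, gaps


# Hypothetical inputs: four priority techniques, one without any visibility.
priority = ["T1059.001", "T1003.001", "T1021.001", "T1574.002"]
detected = ["T1059.001", "T1003.001", "T1021.001"]

percent, gaps = coverage(priority, detected)
print(f"ATT&CK coverage: {percent:.0f}%")
print("Visibility gaps:", gaps)
```

The list of gaps is the part that supports an investment case, since each uncovered technique can be tied to the data source that would reveal it.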

Part VII & VIII: Operationalizing & Strategic Recommendations

A modern CTI architecture is an ecosystem of integrated platforms designed for automation and data quality assurance.

Architectural Blueprint

Collection Layer: All internal and external sources.
Aggregation/Storage: Data Lakes for raw storage and SIEMs for real-time analysis.
Intelligence Management: A Threat Intelligence Platform (TIP) to process, enrich, and analyze data.
Action/Orchestration: A SOAR platform to automate response actions based on intelligence.
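
The four layers can be pictured as stages of a pipeline. The following sketch is a toy model under stated assumptions: every function name, the corroboration rule (an indicator reported by two or more distinct sources), and the sample events are hypothetical stand-ins for what a data lake, SIEM, TIP, and SOAR platform would do in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str      # which feed or log produced the event
    indicator: str   # observed value, e.g. a domain


def collect():
    """Collection layer: stand-in for internal and external sources."""
    return [
        Event("dns_logs", "bad.example.com"),
        Event("isac_feed", "bad.example.com"),
        Event("edr", "benign.example.org"),
    ]


def aggregate(events):
    """Aggregation/storage: group raw events by indicator."""
    store = {}
    for event in events:
        store.setdefault(event.indicator, []).append(event)
    return store


def enrich(store):
    """Intelligence management (TIP role): mark indicators seen by 2+ distinct sources."""
    verdicts = {}
    for indicator, events in store.items():
        distinct_sources = {event.source for event in events}
        verdicts[indicator] = "corroborated" if len(distinct_sources) >= 2 else "single-source"
    return verdicts


def orchestrate(verdicts):
    """Action/orchestration (SOAR role): choose a response for each verdict."""
    return {
        indicator: "block" if verdict == "corroborated" else "monitor"
        for indicator, verdict in verdicts.items()
    }


actions = orchestrate(enrich(aggregate(collect())))
print(actions)
```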

Strategic Recommendations

Measure ROI Quantitatively: Use a Cost Avoidance model to justify CTI investments. Track defender-based metrics like Mean Time to Detect (MTTD) and ATT&CK coverage percentage to demonstrate value.
Future-Proof Collection: Build an agile architecture to adapt to emerging trends like AI-driven analysis, Extended Threat Intelligence (XTI) for OT/IoT, and threats targeting cloud-native infrastructure.
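
The two quantitative measures named above can be computed with a few lines of arithmetic. This sketch makes explicit assumptions: MTTD is taken as the mean of (detection time minus start time) per incident, and cost avoidance ROI as (avoided cost minus program cost) divided by program cost. All timestamps and monetary figures are invented for illustration.

```python
from datetime import datetime
from statistics import mean


def mttd_hours(incidents):
    """Mean Time to Detect in hours, from (started, detected) datetime pairs."""
    return mean(
        (detected - started).total_seconds() / 3600
        for started, detected in incidents
    )


def cost_avoidance_roi(avoided_incidents, avg_cost_per_incident, program_cost):
    """ROI under the assumed model: (avoided cost - program cost) / program cost."""
    avoided_cost = avoided_incidents * avg_cost_per_incident
    return (avoided_cost - program_cost) / program_cost


# Illustrative figures only.
incidents = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 6, 0)),   # detected after 6 hours
    (datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 10, 0)),  # detected after 10 hours
]
mttd = mttd_hours(incidents)
roi = cost_avoidance_roi(avoided_incidents=3,
                         avg_cost_per_incident=200_000,
                         program_cost=400_000)

print(f"MTTD: {mttd:.1f} hours")
print(f"Cost avoidance ROI: {roi:.0%}")
```

Tracking the same calculation per quarter shows whether MTTD is falling as new collection sources come online.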
