Cross-drive Analysis (CDA) (with a side of Anomaly Detection)

Prior to 2006, investigators focused on forensically analyzing one drive at a time. Why? They wanted to avoid cross-contamination and the mixing of information: the more drives, the more confusing things get. Simson Garfinkel disagreed. What if there were many drives, say 200, that had to be examined? There had to be a better way. In 2006, Garfinkel delivered the results of his team’s research at the Digital Forensic Research Workshop (DFRWS), and this spurred on the practice of cross-drive analysis (presentation here: https://dfrws.org/sites/default/files/session-files/2006_USA_pres-cross-drive_analysis.pdf and text version here: https://www.sciencedirect.com/science/article/pii/S1742287606000697).

What is cross-drive analysis (CDA)? CDA, sometimes loosely referred to as Anomaly Detection (AD) even though the two are distinct techniques, is a digital forensics technique that finds similarities across multiple disk images or forensic data sources to provide context and establish baselines for an investigation. These data sources include hard disks, cell phones, and memory cards. [1] CDA uses statistical techniques to correlate information extracted from the various sources, which makes it possible to identify patterns, clusters, and relationships that may not be apparent when analyzing a single source. [2]

Key steps in cross-drive analysis typically include:

    1. Imaging: Creating forensic disk images or acquiring data from multiple sources.
    2. Feature extraction: Using lexical techniques to extract relevant information from the bulk data.
    3. First-order cross-drive analysis: Analyzing each data source individually to identify notable features or anomalies.
    4. Cross-drive correlation: Comparing the extracted features across multiple data sources to identify similarities, clusters, and relationships.
    5. Report generation: Presenting the findings and insights gained from the cross-drive analysis.
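
As a rough illustration of steps 2 and 4 above, the sketch below pulls email-like strings out of raw bytes and then reports which pairs of drives share any of them. It is a minimal sketch, not Garfinkel's implementation: the function names, the simplified email pattern, and the in-memory byte strings standing in for disk images are all assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import combinations

# Simplified, assumed pattern for email-like strings. Real feature
# extractors cover many more feature types and edge cases.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_features(raw: bytes) -> set:
    """Step 2 (feature extraction): scan raw bytes for email-like strings."""
    return {m.decode("ascii").lower() for m in EMAIL_RE.findall(raw)}


def correlate(drives: dict) -> dict:
    """Step 4 (cross-drive correlation): features shared by each pair of drives."""
    features = {name: extract_features(raw) for name, raw in drives.items()}
    shared = {}
    for a, b in combinations(sorted(features), 2):
        common = features[a] & features[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical stand-ins for three disk images.
drives = {
    "drive_a": b"\x00\x01alice@example.com\x00junk bob@example.org\xff",
    "drive_b": b"xx bob@example.org yy carol@example.net",
    "drive_c": b"no addresses here",
}

for pair, common in correlate(drives).items():
    print(pair, sorted(common))
```

In practice the bytes would be read from forensic images in chunks rather than held in memory, and the pairwise comparison would be weighted so that very common features (a vendor support address, for instance) do not link unrelated drives.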

CDA can be particularly useful in investigations involving large data sets or cases where multiple systems or individuals are involved. It can help prioritize work, automatically identify members of social networks under investigation, and provide a more comprehensive understanding of the threat landscape relevant to the case.

Are Cross-drive Analysis and Anomaly Detection really the same?

CDA is technically different from AD, though they are related and often used together. Here’s a comparison of the two:

Cross-drive analysis (CDA):
    • is a technique in digital forensics that correlates and analyzes information across multiple disk images or data sources simultaneously.
    • aims to identify similarities, connections, and relationships between the data sources by correlating features like email addresses, credit card numbers, message IDs, etc.
    • helps uncover clusters of related drives/data that likely originated from the same entity or were involved in the same activities.
    • provides additional context and information that may not be evident when analyzing a single source in isolation.

Anomaly detection (AD):
    • is a data analysis technique used to identify unusual patterns, outliers, or anomalous instances that deviate significantly from the expected behavior or norm in a dataset.
    • is commonly applied for fraud detection, system health monitoring, intrusion detection, etc.
    • focuses on finding individual data points, events or observations that are statistically different from the majority of the data within a single dataset.

Anomaly detection methods look for global outliers, contextual outliers, or collective outliers based on the specific use case.
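
To make the idea of a global outlier concrete, here is a minimal sketch that flags values far from the median of a single dataset. The choice of method is an assumption for illustration: it uses the modified z-score based on the median absolute deviation (MAD), with the commonly used 0.6745 scaling constant and 3.5 cutoff. The function name and the sample data are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; this score is undefined.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical hourly login counts taken from a single system.
logins_per_hour = [4, 5, 3, 6, 4, 5, 97, 4, 5, 3]
print(mad_outliers(logins_per_hour))  # index of the 97-login hour
```

A median-based score is used here rather than a mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself. Contextual and collective outliers need more than a one-dimensional check like this one.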

While CDA correlates information across multiple data sources in digital forensics, anomaly detection identifies unusual or deviating data points within a single dataset for various analytical purposes. They are distinct techniques with different goals, but can be complementary:

    • CDA could potentially identify anomalous drives or data sources by correlating features across the corpus.
    • Anomaly detection could flag suspicious events or data points within a single drive, which could then be correlated using CDA to uncover broader connections.

Timeline Analysis

Timeline Analysis (TA) plays an important complementary role in CDA by providing temporal context and enabling the correlation of events across multiple data sources.

Here are some key ways that TA enhances CDA:

    1. Temporal correlation: Timeline analysis allows investigators to correlate events and activities across multiple drives based on their timestamps. This helps identify related actions that occurred around the same time on different systems.
    2. High-level event summarization: Advanced timeline analysis techniques can automatically produce high-level summaries of activity on computer systems by combining sets of low-level events into meaningful high-level events. For example, reducing several low-level events into a single “USB stick was connected” event.
    3. Contextual understanding: Timelines provide contextual information about the sequence and timing of events, which is crucial for understanding the relationships between activities observed across multiple drives.
    4. Identification of patterns: By analyzing timelines from multiple drives simultaneously, investigators can identify patterns of behavior or activity that may not be apparent when examining drives in isolation.
    5. Prioritization of analysis: Timeline analysis can help prioritize which drives or time periods warrant closer examination in a large-scale investigation involving multiple data sources.
    6. Enhanced cross-drive correlation: Timelines enable the correlation of specific types of events across drives, such as USB device connections, Skype calls, or access to files on removable media.
    7. Visualization of cross-drive relationships: Timeline visualizations can help investigators more easily perceive temporal relationships between events on different drives.
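
The temporal correlation described in item 1 can be sketched as follows: events from all drives are merged into one sorted timeline, and events from different drives that fall within a short window of each other are paired. The `Event` type, the five-minute window, and the sample events are assumptions for illustration, not the output format of any particular timeline tool.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    drive: str      # which image the event came from
    when: datetime  # event timestamp, assumed already normalized to one time zone
    what: str       # short description


def correlate_in_time(events, window=timedelta(minutes=5)):
    """Pair events from different drives that occur within `window` of each other."""
    ordered = sorted(events, key=lambda e: e.when)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.when - first.when > window:
                break  # timeline is sorted, so later events are further away
            if second.drive != first.drive:
                pairs.append((first, second))
    return pairs


# Hypothetical events recovered from two drives.
events = [
    Event("laptop", datetime(2024, 3, 1, 9, 0), "USB device connected"),
    Event("desktop", datetime(2024, 3, 1, 9, 3), "USB device connected"),
    Event("laptop", datetime(2024, 3, 1, 9, 4), "file copied to removable media"),
    Event("desktop", datetime(2024, 3, 1, 14, 30), "Skype call started"),
]

for first, second in correlate_in_time(events):
    print(f"{first.drive}: {first.what}  <->  {second.drive}: {second.what}")
```

Clock skew between systems is the main practical caveat: timestamps usually need to be normalized, and the window widened, before pairs like these mean anything.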

TA enhances cross-drive analysis by providing a temporal framework for correlating and understanding events across multiple data sources. This combination of techniques allows for more comprehensive and insightful digital forensic investigations, especially when dealing with large sets of disk images or data sources.
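
The high-level event summarization mentioned earlier (collapsing several low-level events into a single “USB stick was connected” event) could be approximated with a simple rule table. This is only a sketch under stated assumptions: the low-level event type names, the rule, and the 30-second window are hypothetical and do not correspond to the artifacts or logic of any specific tool.

```python
from datetime import datetime, timedelta

# Hypothetical rule: when all of these low-level event types appear close
# together in time, report them as one high-level event.
RULES = {
    "USB stick was connected": {
        "driver_install_logged",
        "device_registry_key_written",
        "volume_mounted",
    },
}


def summarize(low_level, window=timedelta(seconds=30)):
    """Collapse (timestamp, event_type) tuples into (timestamp, label) tuples."""
    ordered = sorted(low_level)
    high_level = []
    for label, required in RULES.items():
        i = 0
        while i < len(ordered):
            start = ordered[i][0]
            seen = set()
            j = i
            # Collect every event type that falls inside the window.
            while j < len(ordered) and ordered[j][0] - start <= window:
                seen.add(ordered[j][1])
                j += 1
            if required <= seen:
                high_level.append((start, label))
                i = j  # skip past the events this match consumed
            else:
                i += 1
    return sorted(high_level)


# Hypothetical low-level events from one drive.
low_level = [
    (datetime(2024, 3, 1, 9, 0, 0), "driver_install_logged"),
    (datetime(2024, 3, 1, 9, 0, 2), "device_registry_key_written"),
    (datetime(2024, 3, 1, 9, 0, 5), "volume_mounted"),
    (datetime(2024, 3, 1, 11, 0, 0), "volume_mounted"),
]

for when, label in summarize(low_level):
    print(when.isoformat(), label)
```

Once each drive's timeline has been reduced this way, the resulting high-level events are far easier to compare across drives than the raw low-level entries.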

Tools

Here are a few tools that can be used for cross-drive analysis:

In general, many digital forensics tools can be adapted for cross-drive analysis. The key is to use tools that can process multiple disk images simultaneously and correlate information across them. Custom scripts and data analysis techniques are often employed alongside these tools to perform specific cross-drive analysis tasks.

Sources and Resources: