Introduction
Data carving is a file recovery technique that extracts files from raw data without relying on file system metadata. When files are deleted, a volume is formatted, or the file system is damaged, carving can recover data by identifying file signatures and structures directly in the disk image.
By the end of this part, you will understand the principles of data carving, recognize common file signatures (magic numbers), know the main techniques for handling fragmented files, and be able to use specialized carving tools such as Scalpel and PhotoRec.
Data Carving Principles
Data carving works independently of the file system. Instead of using MFT entries, inodes, or file allocation tables, it searches raw disk data for recognizable patterns that indicate the start and end of files.
When Carving is Needed
- File system is corrupted or damaged
- Disk has been formatted
- File system metadata has been overwritten
- Recovering from unallocated space
- MFT entries have been reused
- Unknown or unsupported file system
Carving vs File System Recovery
| Aspect | File System Recovery | Data Carving |
|---|---|---|
| Relies on metadata | Yes (MFT, inodes, etc.) | No |
| Recovers file names | Yes | No (usually) |
| Recovers timestamps | Yes | Sometimes (embedded in file) |
| Handles fragmentation | Yes (has data run info) | Difficult |
| Works after format | Limited | Yes |
Always attempt file system recovery first. If MFT entries exist for deleted files, you'll get more complete information (file names, timestamps, paths). Use carving as a secondary technique for data that file system recovery cannot find.
File Signatures (Magic Numbers)
Most file formats have characteristic byte sequences, usually at the beginning (header) and sometimes at the end (footer). These signatures enable carving tools to identify file types and boundaries.
Common File Signatures
| File Type | Header (Hex) | Footer (Hex) |
|---|---|---|
| JPEG | FF D8 FF | FF D9 |
| PNG | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82 |
| GIF | 47 49 46 38 (GIF8) | 00 3B |
| PDF | 25 50 44 46 (%PDF) | 25 25 45 4F 46 (%%EOF) |
| ZIP/DOCX/XLSX | 50 4B 03 04 (PK..) | 50 4B 05 06 |
| RAR | 52 61 72 21 1A 07 | Variable |
| MP3 | FF FB or 49 44 33 (ID3) | None standard |
| MP4 | 00 00 00 xx 66 74 79 70 | None standard |
| Windows EXE | 4D 5A (MZ) | None standard |
| ELF (Linux) | 7F 45 4C 46 | None standard |
Modern Microsoft Office files (DOCX, XLSX, PPTX) are actually ZIP archives containing XML files, so they share the ZIP signature and cannot be told apart from plain ZIP files by header alone. Legacy Office formats (DOC, XLS, PPT) use the OLE Compound File format, whose signature is D0 CF 11 E0 A1 B1 1A E1.
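The lookup implied by the signature table can be sketched in Python as follows. This is a minimal illustration, not how any particular carving tool is implemented: the `SIGNATURES` dictionary and the `identify` function are hypothetical names, and only a subset of the table is included.

```python
from typing import Optional

# Header signatures copied from the table above (subset only).
SIGNATURES = {
    "jpeg": bytes.fromhex("FFD8FF"),
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "gif": b"GIF8",
    "pdf": b"%PDF",
    "zip": b"PK\x03\x04",               # also matches DOCX, XLSX, PPTX
    "ole": bytes.fromhex("D0CF11E0"),   # legacy DOC, XLS, PPT
}

def identify(data: bytes) -> Optional[str]:
    """Return the name of the first signature that `data` starts with."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

For example, `identify(b"%PDF-1.7 ...")` returns `"pdf"`, while data with no known header returns `None`. A match on the header alone only suggests a file type; it does not prove that a complete, valid file follows.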
Carving Techniques
Different carving approaches handle various scenarios with different trade-offs between speed, accuracy, and complexity.
Header-Footer Carving
The simplest technique: search for header, then search for footer, extract everything between them.
- Pros: Simple, fast, accurate file boundaries
- Cons: Requires known footer, fails with fragmented files
- Best for: JPEG, PNG, PDF files
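A rough sketch of header-footer carving over an in-memory image is shown below. It assumes the image fits in memory and that files are contiguous; the function name `carve_header_footer` and the `max_size` cap are illustrative choices, not taken from any tool.

```python
from typing import Iterator, Tuple

JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

def carve_header_footer(image: bytes, header: bytes, footer: bytes,
                        max_size: int = 10_000_000) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) for each header..footer span found in `image`."""
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            return
        # Look for the footer only within max_size bytes of the header.
        end = image.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1          # no footer in range; keep scanning
            continue
        end += len(footer)
        yield start, image[start:end]
        pos = end                    # resume after the carved file
```

This stops at the first footer it sees, which can truncate a JPEG that contains an embedded thumbnail with its own FF D9 marker. That is one reason carved output still needs validation.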
Header-Max Size Carving
Find header, then extract a maximum expected size. Useful when files have no footer.
- Pros: Works without footer signature
- Cons: May include extra data, wastes space
- Best for: MP3, AVI, executables
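The header-max size idea can be sketched in a few lines. The function name `carve_header_max_size` is hypothetical, and the example assumes the whole image is held in memory.

```python
from typing import List, Tuple

def carve_header_max_size(image: bytes, header: bytes,
                          max_size: int) -> List[Tuple[int, bytes]]:
    """Return (offset, data) pairs, taking up to max_size bytes per header hit."""
    carved = []
    start = image.find(header)
    while start != -1:
        # Slicing past the end of the image simply returns fewer bytes.
        carved.append((start, image[start:start + max_size]))
        start = image.find(header, start + 1)
    return carved
```

Short signatures such as `MZ` occur by chance in unrelated data, so this approach tends to produce many candidates that must be filtered afterwards.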
Structure-Based Carving
Parse internal file structure to determine size and validate content.
- Pros: Most accurate, validates file integrity
- Cons: Complex, slow, format-specific
- Best for: ZIP, Office documents, databases
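As an illustration of structure-based carving, the sketch below uses PNG rather than ZIP because its layout is simple: an 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type and data. Walking the chunks gives the exact file length and validates integrity along the way. The function name `png_length` is an assumption for this example.

```python
import struct
import zlib
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def png_length(data: bytes, start: int = 0) -> Optional[int]:
    """Return the length of the PNG at `start`, or None if it is invalid."""
    if data[start:start + 8] != PNG_MAGIC:
        return None
    pos = start + 8
    while pos + 12 <= len(data):               # smallest chunk is 12 bytes
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return None                        # chunk runs past the buffer
        (crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        if zlib.crc32(data[pos + 4:body_end]) != crc:
            return None                        # corrupted or fragmented
        pos = body_end + 4
        if ctype == b"IEND":
            return pos - start                 # exact end of the file
    return None
```

A CRC failure partway through is also a useful hint: it often marks the point where a fragmented file stops being contiguous.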
Handling Fragmented Files
Fragmentation is the biggest challenge in data carving. When files are not stored in contiguous clusters, simple header-footer carving fails because the data between header and footer includes unrelated content.
Fragmentation Scenarios
- Bifragmented: File split into exactly two pieces
- Multi-fragmented: File split across many non-contiguous areas
- Interleaved: Multiple files' fragments mixed together
Advanced Carving Techniques
Semantic Carving
Uses understanding of file format structure to validate and reassemble fragments. For example, validating JPEG markers or ZIP central directory consistency.
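One way to picture JPEG marker validation is the simplified check below. It assumes a well-formed JPEG header is a chain of segments, each starting with FF plus a marker byte and a 2-byte big-endian length, ending at the Start of Scan marker (FF DA). The function name is hypothetical, and the check ignores rarer cases such as standalone markers appearing before the scan.

```python
import struct

def jpeg_header_is_consistent(data: bytes) -> bool:
    """Return True if the segment chain from SOI reaches SOS cleanly."""
    if data[:2] != b"\xff\xd8":                # Start of Image
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False                       # not at a marker: chain broken
        marker = data[pos + 1]
        if marker == 0xFF:                     # fill byte before a marker
            pos += 1
            continue
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if seg_len < 2:
            return False                       # length includes its own 2 bytes
        if marker == 0xDA:                     # Start of Scan reached
            return True
        pos += 2 + seg_len                     # jump to the next segment
    return False
```

A candidate fragment that passes this check is more likely to be the genuine first fragment of a JPEG than one that merely begins with FF D8 FF.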
Statistical Analysis
Analyzes entropy and statistical properties to identify boundaries between different file fragments.
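A simple form of this analysis is per-block Shannon entropy, sketched below. Compressed or encrypted data tends to score close to 8 bits per byte, while text and zero-filled regions score much lower, so a sharp change between adjacent blocks can hint at a fragment boundary. The function names and the 4096-byte block size are assumptions for the example.

```python
import math
from collections import Counter
from typing import List

def shannon_entropy(block: bytes) -> float:
    """Entropy of `block` in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def block_entropies(data: bytes, block_size: int = 4096) -> List[float]:
    """Entropy of each consecutive block of `data`."""
    return [shannon_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

Entropy alone cannot distinguish two high-entropy formats from each other (for example JPEG scan data and ZIP contents), so it is usually combined with format-aware checks.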
Graph-Based Reassembly
Treats fragments as nodes and uses content analysis to determine which fragments connect, building a graph of possible reassemblies.
Studies of used drives suggest that while most files are stored contiguously, a significant percentage are fragmented; on heavily used systems, fragmentation rates of 20-30% are reported. However, most fragmented files are split into only 2-3 fragments, which makes recovery feasible with advanced tools.
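For the two-fragment case, one approach (often called bifragment gap carving) is to take the region from the header to the footer and try removing every possible run of blocks in between until a validator accepts the result. The sketch below assumes block-aligned fragments and a caller-supplied `is_valid` function; both the function name and its parameters are illustrative.

```python
from typing import Callable, Optional

def bifragment_gap_carve(region: bytes, block_size: int,
                         is_valid: Callable[[bytes], bool]) -> Optional[bytes]:
    """Try removing each contiguous gap of blocks; return the first valid file.

    `region` starts at the block holding the header and ends at the footer.
    """
    n_blocks = -(-len(region) // block_size)        # ceiling division
    for gap_len in range(1, n_blocks - 1):
        # Keep at least one block before and one block after the gap.
        for gap_start in range(1, n_blocks - gap_len):
            head = region[:gap_start * block_size]
            tail = region[(gap_start + gap_len) * block_size:]
            candidate = head + tail
            if is_valid(candidate):
                return candidate
    return None
```

The search is quadratic in the number of blocks, and its usefulness depends entirely on how strict the validator is: a weak validator will accept the wrong reassembly.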
Data Carving Tools
Several specialized tools exist for data carving, ranging from simple signature-based carvers to sophisticated analysis platforms.
PhotoRec
Powerful open-source carver supporting 480+ file formats. Works on disk images or directly on devices. Excellent for images, documents, and multimedia.
Availability: Free, cross-platform
Scalpel
Fast, efficient header-footer carver. Highly configurable with custom signature definitions. Based on Foremost.
Availability: Free, Linux
Foremost
Original header-footer carver developed for US Air Force. Simple but effective. Good for basic carving tasks.
Availability: Free, Linux
Bulk Extractor
Extracts features like email addresses, URLs, credit card numbers, and embedded JPEG images. High-speed parallel processing.
Availability: Free, cross-platform
EnCase/FTK
Commercial forensic suites with integrated carving capabilities. Structure-aware carving with validation.
Availability: Commercial, Windows
Autopsy
Open-source forensic platform with PhotoRec integration. GUI-based carving with result organization.
Availability: Free, cross-platform
Configuring Custom Signatures
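Most carving tools allow custom signature definitions. Scalpel reads them from a configuration file in which each line gives the file extension, whether the match is case sensitive, the maximum carve size in bytes, the header, and an optional footer. Non-printable bytes are written as \x hex escapes:
# extension case_sensitive max_size header [footer]
jpg y 5000000 \xff\xd8\xff \xff\xd9
pdf y 20000000 %PDF %%EOF
doc y 10000000 \xd0\xcf\x11\xe0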
Practical Carving Workflow
A systematic approach to data carving maximizes recovery while managing large volumes of output.
Step 1: Target Selection
Identify what to carve from:
- Full disk image - comprehensive but slow
- Unallocated space only - faster, focuses on deleted data
- Specific partitions - targeted recovery
Step 2: Configure File Types
Select file types relevant to the investigation. Carving everything generates massive output and takes longer.
Step 3: Run Carving Tool
Execute the carving process. This can take hours for large images.
Step 4: Review Results
Carving generates many false positives - fragments that match signatures but aren't valid files. Manually review or use validation scripts.
Step 5: Validate and Hash
Test that carved files open correctly. Calculate hashes for evidence tracking.
Carving can produce thousands of files, many of which are partial or corrupted. Organize output by file type, use duplicate detection to reduce volume, and focus validation on file types most relevant to your case. Document which carved files were actually usable.
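Hashing and duplicate detection over a carver's output directory might look like the sketch below. It uses SHA-256 from the Python standard library; the function names and the directory layout are assumptions, and a real case would also record the hashes in the examiner's notes or case management system.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large carved files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def group_by_hash(output_dir: Path) -> Dict[str, List[Path]]:
    """Map each SHA-256 digest to the carved files that share it."""
    groups: Dict[str, List[Path]] = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            groups.setdefault(sha256_of(path), []).append(path)
    return groups
```

Any digest that maps to more than one path is a set of byte-identical duplicates, so only one file from each group needs manual review.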
Carving Limitations
Understanding carving limitations helps set appropriate expectations and choose the right recovery approach.
Technical Limitations
- No file names: Carved files get sequential names, not original names
- No timestamps: Unless embedded in file itself (like EXIF)
- No folder structure: Files recovered as flat collection
- Fragmentation: Severely fragmented files often unrecoverable
- Overwritten data: Cannot recover data that's been overwritten
Format-Specific Issues
- Encrypted files: Carved but useless without decryption key
- Compressed archives: Partial recovery rarely usable
- Database files: Need complete structure for queries
- Video containers: Complex structures often fragment badly
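On SSDs with TRIM enabled, deleted data is often truly gone. The TRIM command tells the SSD which blocks the file system no longer uses, and the controller is then free to erase them during garbage collection, after which reads of those blocks typically return zeros and leave nothing for carving to find. Carving on an SSD may only succeed for very recently deleted files or for files on partitions where TRIM was not active.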
Key Takeaways
- Data carving recovers files using signatures, not file system metadata
- File signatures (magic numbers) identify file types and boundaries
- Header-footer carving works well for files like JPEG and PDF with known endings
- Fragmented files are the biggest challenge - advanced techniques partially address this
- PhotoRec and Scalpel are powerful free carving tools
- Carved files lack original names, timestamps, and folder structure
- Carving produces many false positives - validation is essential
- SSDs with TRIM make carving largely ineffective for deleted data