Images are often stored in formats other than raw data dumps (sometimes called dd format). This is done either to provide compression, or to store case-related metadata together with the original image.
PyFlag supports many different image formats through the use of IO Source Drivers. Before we can analyse an image using PyFlag, we must define an IO Source. An IO Source is simply a collection of information that is used to get at the raw image data.
For example, to read a set of EnCase files, we need to know to use the ewf driver, together with the different files that make up the set. We might also need to know where in the image the filesystem begins, which we find by reading the partition table. This group of necessary information is termed an "IO Source", and we can give it a meaningful name. Later, whenever we want to process this image again, we do not need to re-enter all of this information. Instead, we simply use the name given to the IO Source, and PyFlag retrieves all the related information from that.
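To make the idea concrete, here is a minimal sketch of an IO Source as a named record that is defined once and looked up by name later. The class `IOSource` and the functions `define_io_source` and `get_io_source` are hypothetical illustrations, not PyFlag's actual API; the fields simply mirror the information described above (driver, files, offset).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOSource:
    """Hypothetical record of everything needed to read one image."""
    name: str         # meaningful label chosen by the investigator
    driver: str       # which IO Source Driver decodes the files, e.g. "ewf"
    files: tuple      # the image file(s) making up the set, in order
    offset: int = 0   # byte offset of the filesystem inside the image


# Registry of defined IO Sources, keyed by their name.
_io_sources = {}


def define_io_source(source):
    """Register a new IO Source; names must be unique."""
    if source.name in _io_sources:
        raise ValueError("IO Source %r already exists" % source.name)
    _io_sources[source.name] = source


def get_io_source(name):
    """Later analyses only need the name to recover the full definition."""
    return _io_sources[name]
```

For instance, defining `IOSource("Test_Disk_1", "ewf", ("disk.E01", "disk.E02"), 32256)` once (the file names here are made up) lets any later step recover the driver, files and offset from the name `Test_Disk_1` alone.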
An example of how an IO Source is loaded is shown below:
In the figure above, we load a new IO Source into PyFlag. We choose the ewf driver (Expert Witness Format, as used by EnCase), select the image file, and name the IO Source Test_Disk_1.
This is an image of an entire disk, so we do not know where in the image the filesystem starts and need to examine the partition table. We do this by clicking on the icon next to the offset field.
A pop-up window appears, showing which partitions are available in this image. Clicking on one of the offsets fills the correct value into the offset field:
Offsets can generally be specified with one of the following suffixes: s, k, m or G. These correspond to the units of sectors (512 bytes), kilobytes (1024 bytes), megabytes (1024 KB) and gigabytes (1024 MB).
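The suffix arithmetic can be sketched as follows. `parse_offset` is a hypothetical helper rather than PyFlag's own parser; it accepts the suffixes case-insensitively and assumes that a bare number means bytes, neither of which the text above specifies.

```python
# Multipliers for the offset suffixes described above.
UNITS = {"s": 512, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_offset(text):
    """Convert an offset such as '63s', '32k' or '1G' to a byte count.

    Assumption: a value with no suffix is already in bytes.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty offset")
    suffix = text[-1].lower()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)
```

For example, a partition starting at sector 63 can be entered as `63s`, which is 63 × 512 = 32256 bytes.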
The Standard IO Source is the most basic. It represents a simple linear file stored on disk, for example a dd image of a partition, ready to load directly into FLAG.
To take a standard forensic image suitable for importing into FLAG, image a single partition (not an entire hard disk) with dd, for example:
dd if=/dev/hda1 of=image.dd
The Advanced IO Source allows the image to be split into multiple files. We can also specify an offset into the image at which to start the analysis. This allows us to analyse a dd image of an entire hard disk. For example:
dd if=/dev/hda of=image.dd
We can find the offset of the partition within the image (which must be given to FLAG) by listing the partition table with start positions in sectors:
sfdisk -luS /dev/hda
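The sketch below shows, under stated assumptions, what an Advanced IO Source has to do: join split files into one linear image, and start reading at a partition offset taken from the partition table. It parses only the four primary entries of a classic DOS/MBR partition table (extended partitions are not followed), and `mbr_partitions` and `SplitImage` are hypothetical names, not PyFlag code.

```python
import bisect
import os
import struct

SECTOR_SIZE = 512  # the "s" unit used for offsets


def mbr_partitions(sector0):
    """Return (number, type, start_sector, sector_count) for each used
    primary entry of a classic DOS/MBR partition table."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("sector 0 has no MBR signature")
    parts = []
    for i in range(4):
        entry = sector0[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]                                   # partition type byte
        start, count = struct.unpack("<II", entry[8:16])   # LBA start, length
        if ptype != 0 and count != 0:                      # skip empty slots
            parts.append((i + 1, ptype, start, count))
    return parts


class SplitImage:
    """Present several split files as one linear image, with reads
    shifted by a fixed byte offset (e.g. the start of a partition)."""

    def __init__(self, paths, offset=0):
        self.paths = list(paths)
        self.offset = offset
        self.starts = []              # where each part begins in the joined image
        total = 0
        for path in self.paths:
            self.starts.append(total)
            total += os.path.getsize(path)
        self.size = total             # size of the joined image

    def read(self, pos, length):
        pos += self.offset
        end = min(pos + length, self.size)
        out = bytearray()
        while pos < end:
            # The last part whose start is <= pos holds this byte.
            i = bisect.bisect_right(self.starts, pos) - 1
            with open(self.paths[i], "rb") as f:
                f.seek(pos - self.starts[i])
                data = f.read(end - pos)  # stops at the end of this part
            if not data:
                break
            out += data
            pos += len(data)
        return bytes(out)
```

Reading sector 0 through `SplitImage(parts)` and passing it to `mbr_partitions` gives start sectors in the same units that `sfdisk -luS` prints; opening `SplitImage(parts, offset=start * SECTOR_SIZE)` then reads from the beginning of the chosen filesystem.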
With today's very large hard disks, dd images can be difficult to manipulate. Since dd images are uncompressed, analysing a 120 GB hard disk (now common in most PCs) means the analysis platform must handle a single 120 GB file, which may also need to be archived.
Many people archive very large dd images using a standard compression program such as gzip or bzip2. This helps with archiving, but the compressed file cannot be used directly in the analysis without decompressing it first. This is because most general-purpose compression formats are not designed for seeking randomly through the file.
Most industry-standard forensic packages provide a way to work with compressed hard disk images directly. FLAG currently supports two such formats: sgzip, and the Expert Witness Compression Format (which is mainly used by EnCase(tm) and FTK(tm)).
The sgzip format is based on gzip, but adds the ability to seek. This is achieved by compressing blocks (32 KB by default) individually, so that a seek operation only needs to locate the right 32 KB block and decompress it. The specific details of the file format can be found in the file sglib.h.
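A minimal sketch of the block-wise idea follows, assuming zlib compression for each block and an in-memory index of where each compressed block starts. It illustrates the technique only; it does not reproduce the actual sgzip on-disk layout defined in sglib.h, and `compress_blocks` and `read_at` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 32 * 1024  # default sgzip block size according to the text


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress each block independently.

    Returns the concatenated compressed blob and an index of
    (position in blob, compressed length) for every block.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_at(blob, index, offset, length, block_size=BLOCK_SIZE):
    """Return `length` uncompressed bytes starting at `offset`,
    decompressing only the blocks that overlap the requested range."""
    if length <= 0:
        return b""
    end = offset + length
    first = offset // block_size
    last = min((end - 1) // block_size, len(index) - 1)
    out = bytearray()
    for n in range(first, last + 1):
        pos, size = index[n]
        block = zlib.decompress(blob[pos:pos + size])
        base = n * block_size  # uncompressed offset of this block
        out += block[max(offset - base, 0):min(end - base, len(block))]
    return bytes(out)
```

The block size is a trade-off: smaller blocks make random reads cheaper because less data is decompressed per seek, while larger blocks give the compressor more context and usually a better ratio.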
Sgzip is a robust format: if the image file is damaged in some way (e.g. part of it is corrupted, or it is truncated), it is still possible to retrieve most of the data from it (contrast this with EnCase, for example, which cannot recover from a corrupted evidence file). To create an sgzip file, use the supplied sgzip utility:
dd if=/dev/hda | sgzip -v > image.sgz
It is also possible to decompress the sgzip file back into a regular dd image:
sgzip -vd image.sgz
EWF (Expert Witness Format)
The Expert Witness Format is a proprietary format which is mainly used by EnCase and FTK. It also compresses data in 32 KB chunks to produce a seekable compressed file. An EWF image must also be split across files smaller than 2 GB (640 MB is commonly used).
Although FLAG can also create EWF files, at this stage they are not (yet) readable by EnCase. It is perfectly valid to generate EWF files with FLAG for use within FLAG; however, since the EWF format is fragile (it cannot tolerate corruption), this is not recommended, and sgzip is the better choice for this purpose. The other major disadvantage of the EWF format is that an EWF file cannot be written into a pipe, so it is not possible to image over the network (using netcat, for example). Sgzip is the better choice here as well.
Most of the time FLAG is used to analyse images taken with EnCase, or to repair corrupted EnCase images (FLAG's EWF implementation is quite flexible and can repair EnCase images, whereas EnCase itself will in most cases refuse to import them). See evtool for examples of how EWF images can be manipulated. To use EWF images in FLAG, simply select all the image files (with extensions .E01, .E02, etc.) in the IO Source.
PyFlag uses The Sleuth Kit as the underlying engine for reading filesystems. The Sleuth Kit supports a number of filesystems, but not as many as the Linux kernel can mount. To make up for this shortfall, the Mounted IO Source is designed to incorporate the contents of a directory (or a mount point) into PyFlag. There are a number of critical points to note when using the Mounted IO Source:
Linux kernel filesystem drivers do not know or care about deleted inodes. They are not designed for forensic work, so it is impossible to recover deleted files from mounted filesystems.
Mounting a filesystem is a privileged operation. Typically only root is allowed to mount filesystems.
Since the filesystem may contain files whose ownership prevents regular users from accessing them, it is best to run PyFlag as root when using the Mounted IO Source. This is the only case in which running PyFlag as a privileged user is necessary; otherwise the practice is discouraged.
Since PyFlag is not responsible for actually mounting the filesystem, it is entirely the user's responsibility to ensure that the filesystem is mounted sensibly, i.e. read only (watch out for journaling filesystems, which may modify images even when mounted read only). It is recommended that the user compute an MD5 sum of the image before and after mounting it.
Since the Linux kernel does not support sgzip, EWF or split dd images, the image files must be simple raw dd images (possibly with an offset specified) that the loop driver understands.
Since a Mounted IO Source is not a real IO Source, there are some limitations when using it. Namely, reports that need access to the raw device will not work, for example Extract Files and Indexing. It is recommended that a separate IO Source representing the raw device be used for these purposes.
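The before-and-after check and a read-only loop mount can be scripted. This is a sketch under assumptions: `md5_of_file` and `loop_mount_command` are hypothetical helpers, and the command relies on the util-linux `mount` options `ro`, `loop` and `offset=`. The block only builds the argument list; it does not run `mount`.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of a (possibly very large) image, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def loop_mount_command(image, mount_point, offset_bytes=0):
    """Argument list for a read-only loop mount of a raw dd image,
    optionally starting at a byte offset inside the image."""
    options = "ro,loop"
    if offset_bytes:
        options += ",offset=%d" % offset_bytes
    return ["mount", "-o", options, image, mount_point]
```

In practice one would record `md5_of_file("image.dd")`, run the command as root (for example with `subprocess.run(cmd, check=True)`), unmount after the analysis, and compare the two digests. For ext3/ext4 images, adding the `noload` option is one way to stop the kernel replaying the journal.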
The Mounted IO Source can be used to analyse CD-ROM images (ISO9660), which are not currently supported by The Sleuth Kit. Even NFS or SMB mounts can be analysed this way if physical access to the hard disk is not possible.
PyFlag can automatically process a RAID set, provided you supply the RAID reassembly map. This advanced topic is covered in more detail in the paper RAID Reassembly - A forensic Challenge.
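As a rough illustration of what a reassembly map encodes, the sketch below rebuilds a logical image from the member disks, given, for one period of the layout, which disk and which row hold each logical data block. The map representation and the `reassemble` function are hypothetical, chosen for clarity; they are not the format used by PyFlag or the paper, and parity blocks are simply skipped rather than used to rebuild a missing disk.

```python
def reassemble(disks, raid_map, period_rows, block_size):
    """Rebuild a logical image from the raw contents of each member disk.

    raid_map lists one (disk, row) pair per logical data block in a single
    period of the layout, in logical order. `row` counts blocks from the
    start of the period on that disk; parity blocks are not listed.
    The pattern repeats every `period_rows` rows.
    """
    rows_per_disk = min(len(d) for d in disks) // block_size
    periods = rows_per_disk // period_rows  # only complete periods are rebuilt
    out = bytearray()
    for period in range(periods):
        for disk, row in raid_map:
            start = (period * period_rows + row) * block_size
            out += disks[disk][start:start + block_size]
    return bytes(out)
```

For a simple two-disk stripe set the map is just `[(0, 0), (1, 0)]` with `period_rows=1`; layouts with rotating parity need a longer period in which each row lists only its data blocks.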