
Hooking IO Calls for Multi-Format Image Support

Author: Michael Cohen

This article was published in the Sleuthkit Informer #19: http://www.sleuthkit.org/informer/index.php

Revised in April 2005 to use the new command line switches for PyFlag 0.76.

Overview

When analysing hard disk images, the image is often provided in a slightly different format from the expected dd partition image. The image may have been split into multiple files, or it may have been acquired using Encase(tm), which uses its own proprietary image file format.

Many forensic tools require the image to be in a specific format. For example, the Sleuthkit requires an uncompressed partition image, such as one obtained with the following dd command:

dd if=/dev/hda1 of=image.dd

If the raw disk (e.g. /dev/hda) was imaged instead, the investigator must use dd to "slice" the image into partitions according to the partition table. (The 63-sector skip below is normally read from the partition table using sfdisk, mmls or a similar tool.)

dd if=disk_image.dd of=partition_image.dd bs=512 skip=63
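The 63 in this command is the first sector of the first partition, as recorded in the DOS partition table (MBR) held in the first 512-byte sector of the disk. As a minimal sketch of what tools like sfdisk and mmls read, assuming a classic MBR and covering only the four primary entries (logical partitions inside an extended partition, such as partition 6 later in this article, require following the extended tables as well), the entries can be decoded as follows; parse_mbr and struct mbr_entry are illustrative names only:

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded entry from a DOS (MBR) primary partition table. */
struct mbr_entry {
    uint8_t  type;         /* partition type byte, e.g. 0x83 = Linux */
    uint32_t start_sector; /* first sector (LBA), relative to start of disk */
    uint32_t num_sectors;  /* partition length in sectors */
};

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the four primary entries of a 512-byte MBR sector.
 * Returns the number of non-empty entries written to out[0..3],
 * or -1 if the 0x55AA boot signature is missing. */
int parse_mbr(const uint8_t sector[512], struct mbr_entry out[4]) {
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    int n = 0;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i; /* table starts at byte 446 */
        if (e[4] == 0)                            /* type 0 marks an unused slot */
            continue;
        out[n].type = e[4];
        out[n].start_sector = le32(e + 8);
        out[n].num_sectors = le32(e + 12);
        n++;
    }
    return n;
}
```

Multiplying start_sector by 512 gives the byte offset (63 × 512 = 32256). Note that the dd command above copies everything from sector 63 to the end of the disk; adding count= with the entry's num_sectors would stop it at the end of the partition.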

If the original disk is very large, this is a time-consuming operation. It would be better to have an abstraction layer that converts between image formats (for example, a partition image vs. a disk image) on the fly, without having to copy the image again.

This functionality becomes even more desirable when analysing images that have been stored using compression. For example, the popular forensic package Encase(tm) stores images in a proprietary format called the Expert Witness Compression Format, which provides compression and also splits large images into manageable parts. A transparent abstraction layer makes it possible for any tool to support such formats automatically.

Hooking IO for fun and profit

The PyFlag forensic package used to have an IO subsystem patch for the Sleuthkit that enabled it to operate on a number of different image formats. Although the Sleuthkit is an excellent tool, it soon became obvious that the same functionality was also needed in other tools, such as strings and sfdisk.

Modifying an application's source code meant that the IO subsystem patch had to be retrofitted each time a new version of the Sleuthkit was released, which increased the maintenance burden. The PyFlag developers needed a better approach: ideally one requiring no source code modification, which would allow arbitrary programs to handle the supported file formats transparently.

The obvious solution to this problem was an abstraction layer based on library hooking techniques.

When a program wishes to perform an IO operation on a file (for example, to open, read or write it), it rarely issues the kernel system call directly. Instead, most programs call the C library's open(), read() and write() functions as required. Since most programs are dynamically linked rather than statically linked, the C library code is linked in at run time by the dynamic linker.

Most dynamic linker implementations (in particular the GNU libc dynamic loader) allow a library to be loaded before the other system libraries. Moreover, once the linker finds a required symbol in one library, it stops searching for that symbol in the others. This allows a preloaded library to "hook" a library function simply by defining a function of the same name, which masks the original.

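An example illustrates the technique. Assume we have the following C program (error checking is omitted for brevity):

#include <fcntl.h>
#include <unistd.h>

int main(void) {
  char buffer[512];
  int fd = open("somefile", O_RDONLY);
  read(fd, buffer, sizeof(buffer));
  close(fd);
  return 0;
}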

When this program is executed, it calls the C library's open function, which in turn issues the system call. The program then reads some data from the file descriptor by calling the C library's read function, and finally calls the library's close function to close it.

In the glibc implementation of the dynamic loader (the one used on most Linux systems), the LD_PRELOAD environment variable tells the linker to load the named library before any other libraries. If a symbol is defined in the preloaded library, it masks functions of the same name in other libraries.

In our case, we wish to hook the open(), read() and close() functions, so we need to create a shared object (a library; we shall call it the hooker object) that defines these functions. Once LD_PRELOAD is set to the location of the hooker object, our library traps all calls to the specified functions:

External program ---> Hooker object ---> real libc functions

As a result, as far as the external program is concerned, it is operating on a simple partition image, as would have been obtained using dd. In practice, however, the hooker object reads more complex images and emulates a simple partition image for the external program.
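A minimal sketch of such a hooker object, intercepting only open(), is shown below. It is not PyFlag's libio_hooker.so: the environment variable names HOOK_PREFIX and HOOK_IMAGE are hypothetical stand-ins for the settings iowrapper passes to its library (compare the -f option), and a real hooker must also cover read(), lseek(), the stdio functions and variants such as open64() and openat(). It assumes glibc on Linux, where dlsym(RTLD_NEXT, ...) returns the next definition of a symbol in library search order, i.e. the real libc function:

```c
/* Sketch of an LD_PRELOAD hook for open(). Build as a shared object with
 *   gcc -shared -fPIC -o libhook.so hook.c -ldl
 * HOOK_PREFIX and HOOK_IMAGE are hypothetical variable names. */
#undef _FORTIFY_SOURCE   /* fortify headers may define open() as an inline wrapper */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE      /* needed for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

typedef int (*open_fn)(const char *, int, ...);

int open(const char *path, int flags, ...) {
    static open_fn real_open = NULL;
    if (real_open == NULL) {  /* look up libc's open() once */
        real_open = (open_fn)dlsym(RTLD_NEXT, "open");
        if (real_open == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    /* The mode argument is only passed when a file may be created. */
    mode_t mode = 0;
    int needs_mode = (flags & O_CREAT) != 0;
#ifdef O_TMPFILE
    needs_mode |= (flags & O_TMPFILE) == O_TMPFILE;
#endif
    if (needs_mode) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    /* Redirect any path starting with HOOK_PREFIX to the HOOK_IMAGE file;
     * everything else (e.g. a magic file) passes through untouched. */
    const char *prefix = getenv("HOOK_PREFIX");
    const char *image = getenv("HOOK_IMAGE");
    if (prefix && image && *prefix && strncmp(path, prefix, strlen(prefix)) == 0)
        path = image;

    return real_open(path, flags, mode);
}
```

With the shared object built, a command such as `LD_PRELOAD=./libhook.so HOOK_PREFIX=foo HOOK_IMAGE=/tmp/test.dd some_program foo` (some_program being any dynamically linked tool) would open /tmp/test.dd whenever the program asks for foo. In glibc, fopen() opens files through internal calls rather than the public open() symbol, which is one reason a complete hooker must also hook the stream functions.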

Implementation

The PyFlag iohooker tool implements this technique. It hooks not only open, read, write etc., but also the stream functions fopen, fread, fwrite etc. It currently supports many external programs, such as dd, disktype, all Sleuthkit executables, strings and many more.

IOHooker is distributed as two components. The main component is a shared object called libio_hooker.so. This object is controlled through environment variables, which are set by a wrapper program, iowrapper.

For the purposes of demonstration, we download the binary version of PyFlag, untar the distribution in our home directory and change into the resulting directory.

Before we can use iowrapper, we must set the LD_LIBRARY_PATH environment variable so that the dynamic linker can find libio_hooker.so. If it is not set properly, the linker cannot run iowrapper:

~/pyflag$ ./bin/iowrapper -h
./bin/iowrapper: error while loading shared libraries: libio_hooker.so: cannot open shared object file: No such file or directory

After setting the LD_LIBRARY_PATH environment variable, we are able to run the iowrapper normally:

~/pyflag$ export LD_LIBRARY_PATH=`pwd`/libs/
~/pyflag$ ./bin/iowrapper

This program wraps library calls to enable binaries to operate
on images with various formats. NOTE: Ensure that libio_hooker.so
is in your LD_LIBRARY_PATH before running this wrapper.

Usage: ./bin/iowrapper -i subsys -o option prog arg1 arg2 arg3...
      -i subsys: The name of a subsystem to use (help for a list)
      -o optionstr: The option string for the subsystem (help for an example)
      -f wrapped filename: All wrapped filenames will start 
with this string. This is useful for programs that need to 
open other files as well as the target file (for example 
/usr/bin/file needs to open magic files as well).
Loading library now for hooking

The final message "Loading library now for hooking" confirms that the hooker object is properly initialised and ready. Let us first check which IO subsystems iowrapper supports:

~/pyflag$ ./bin/iowrapper -i help
Loading library now for hooking
Available Subsystems:

      standard - Standard Sleuthkit IO Subsystem
      advanced - Advanced Sleuthkit IO Subsystem
      sgzip - Seekable Gzip format
      ewf - Expert Witness Compression format
      raid - Raid 5 implementation
Unhandled Exception(IO Error): No such IO subsystem: help

Each subsystem takes its own set of options. The advanced IO subsystem allows users to specify arbitrary offsets, as well as sets of split image files. We can get a more detailed explanation of these options:

~/pyflag$ ./bin/iowrapper -i advanced -o help
Loading library now for hooking
Advanced io subsystem options

      offset=bytes            Number of bytes to seek to in 
the image file. Useful if there is some extra data at the start
of the dd image (e.g. partition table/other partitions)
      file=filename           Filename to use for split files.
If your dd image is split across many files, specify this parameter
in the order required as many times as needed for seamless 
integration
      A single word without an = sign represents a filename 
to use
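Conceptually, the advanced subsystem presents the split files, concatenated in order and with the leading offset skipped, as a single image. The following sketch is our own illustration rather than PyFlag's code (map_offset is a hypothetical helper); it translates an offset in that virtual image into a segment number and an offset within that segment:

```c
#include <stddef.h>

/* Map an offset in the virtual (emulated) image to a position in one of
 * several split files. sizes[] holds each segment's length in bytes, in
 * the order the file= options were given; base is the offset= value.
 * Returns 0 and fills *seg / *seg_off, or -1 if past the end of the image. */
int map_offset(const long long *sizes, size_t nsegs, long long base,
               long long virt, size_t *seg, long long *seg_off) {
    if (virt < 0 || base < 0)
        return -1;
    long long pos = base + virt;  /* absolute offset in the concatenated files */
    for (size_t i = 0; i < nsegs; i++) {
        if (pos < sizes[i]) {     /* falls inside this segment */
            *seg = i;
            *seg_off = pos;
            return 0;
        }
        pos -= sizes[i];          /* skip over this whole segment */
    }
    return -1;
}
```

A read() that straddles a segment boundary would be split into one read per segment, calling map_offset again at each boundary.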

For our first example, we use the Sleuthkit's fls tool to list the files present in partition 6 of a hard disk image. fls does not provide an option to specify an offset into the image for the start of the filesystem, so we need to wrap it. First we find where the partition starts:

/pyflag# sfdisk -uS -l /tmp/test.dd
Disk /tmp/test.dd: cannot get geometry

Disk /tmp/test.dd: 0 cylinders, 0 heads, 0 sectors/track
read: Inappropriate ioctl for device

Warning: The partition table looks like it was made
  for C/H/S=*/255/63 (instead of 0/0/0).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/tmp/test.dd1            63     96389      96327  de  Dell Utility
/tmp/test.dd2   *     96390  19647494   19551105   7  HPFS/NTFS
/tmp/test.dd3      19647495  58733639   39086145   c  W95 FAT32 (LBA)
/tmp/test.dd4      58733640 117210239   58476600   5  Extended
/tmp/test.dd5      58733703  59328044     594342  82  Linux swap
/tmp/test.dd6      59328108 117210239   57882132  83  Linux

Partition 6 starts at sector 59328108 (sectors are 512 bytes). We can therefore use the wrapper to make fls read the file system located at that offset (note that the s suffix specifies the offset in sectors):

~/pyflag$ ./bin/iowrapper -i advanced -offset 59328108s \
              -filename /tmp/test.dd -- fls foo
Set file to read from as /tmp/test.dd
d/d 11: lost+found
d/d 32769:      etc
l/l 12: cdrom
d/d 131073:     var
...
d/d 3211272:    opt
d/d 3555336:    initrd
l/l 16: vmlinuz

Note that as far as fls is concerned, it is opening and reading the file foo. It does not realise that foo does not exist, since the wrapper provides it with valid data.

For the next example, we used Encase(tm) to create an evidence file of a floppy disk. The file command is unable to determine what is stored inside the image, because it is encoded in the proprietary EWF format:

~/pyflag$ file test.e01
test.e01: data
~/pyflag$ hexdump -C test.e01 | head 
00000000  45 56 46 09 0d 0a ff 00  01 01 00 00 00 68 65 61  |EVF...ÿ......hea|
00000010  64 65 72 00 00 00 00 00  00 00 00 00 00 b2 00 00  |der..........²..|
00000020  00 00 00 00 00 a5 00 00  00 00 00 00 00 80 00 10  |.....¥..........|
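The first eight bytes of this dump, 45 56 46 09 0d 0a ff 00 ("EVF" followed by 0x09, CR, LF, 0xFF and 0x00), form the signature of this EWF segment. A check for it, for example when deciding which subsystem to pass to -i, might look like this small sketch (is_ewf is a hypothetical helper):

```c
#include <stddef.h>
#include <string.h>

/* Signature seen at offset 0 of the test.e01 segment above. */
static const unsigned char EWF_SIGNATURE[8] = {
    0x45, 0x56, 0x46, 0x09, 0x0d, 0x0a, 0xff, 0x00  /* "EVF\t\r\n\xff\0" */
};

/* Return 1 if buf (len bytes read from the start of a file) begins with
 * the EWF signature, 0 otherwise. */
int is_ewf(const unsigned char *buf, size_t len) {
    return len >= sizeof EWF_SIGNATURE &&
           memcmp(buf, EWF_SIGNATURE, sizeof EWF_SIGNATURE) == 0;
}
```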

Let's wrap the hexdump program to show the contents of the raw image:

~/pyflag$ ./bin/iowrapper -i ewf -filename test.e01 -- hexdump -C test.e01 | head
00000000  eb 3c 90 4d 53 44 4f 53  35 2e 30 00 02 01 01 00  |ë<.MSDOS5.0.....|
00000010  02 e0 00 40 0b f0 09 00  12 00 02 00 00 00 00 00  |.à.@.ð..........|
00000020  00 00 00 00 00 00 29 fc  02 29 08 4e 4f 20 4e 41  |......)ü.).NO NA|
00000030  4d 45 20 20 20 20 46 41  54 31 32 20 20 20 33 c9  |ME    FAT12   3É|

From this hexdump, the image appears to be a FAT12 floppy disk. To confirm this, we can run the file command over the image. Since file opens files other than the image (it needs to open its magic file), we must prevent the hooker from hooking those files; otherwise, when file tries to open its magic file, it will get the image instead. The -f flag restricts hooking to files whose names start with a given string:

~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -filename test.e01 -- file test.e01
test.e01: x86 boot sector, code offset 0x3c, OEM-ID "MSDOS5.0", root entries 224, 
sectors 2880 (volumes <=32 MB) , sectors/FAT 9, serial number 0x82902fc, unlabeled, 
FAT (12 bit)
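The values reported by file come straight from the BIOS Parameter Block (BPB) in the boot sector shown in the earlier hexdump of the wrapped image. As an illustration (struct fat_bpb and parse_bpb are our own names), the fields can be decoded from their standard little-endian offsets:

```c
#include <stdint.h>
#include <string.h>

/* A few FAT12/16 BIOS Parameter Block fields from a boot sector. */
struct fat_bpb {
    uint16_t bytes_per_sector;    /* offset 11 */
    uint8_t  sectors_per_cluster; /* offset 13 */
    uint16_t reserved_sectors;    /* offset 14 */
    uint8_t  num_fats;            /* offset 16 */
    uint16_t root_entries;        /* offset 17 */
    uint16_t total_sectors16;     /* offset 19 (0 means: use the 32-bit field) */
    uint16_t sectors_per_fat;     /* offset 22 */
    uint32_t serial;              /* offset 39, valid if byte 38 is 0x29 */
    char     fs_type[9];          /* offset 54, 8 chars, space padded */
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the BPB from at least the first 62 bytes of a boot sector. */
void parse_bpb(const uint8_t *bs, struct fat_bpb *out) {
    out->bytes_per_sector = le16(bs + 11);
    out->sectors_per_cluster = bs[13];
    out->reserved_sectors = le16(bs + 14);
    out->num_fats = bs[16];
    out->root_entries = le16(bs + 17);
    out->total_sectors16 = le16(bs + 19);
    out->sectors_per_fat = le16(bs + 22);
    out->serial = (bs[38] == 0x29) ? le32(bs + 39) : 0;
    memcpy(out->fs_type, bs + 54, 8);
    out->fs_type[8] = '\0';
}
```

Fed the 64 bytes from the dump, this yields 512 bytes per sector, 224 root entries, 2880 sectors, 9 sectors per FAT, serial number 0x082902fc and type "FAT12   ", matching the output of file.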

The Sleuthkit's fls can also be used on this Encase image:

~/pyflag$ ./bin/iowrapper -i ewf -f foo -filename test.e01 -- \
           ./bin/fls foo
r/r 9:  gunzip.exe
r/r 11: Hiew.exe
r/r 12: tar.exe
r/r 22: cygwin1.dll
..

Finally, we wish to extract the Encase image into a standard dd image. We wrap dd and write its output to a file:

~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -filename test.e01 -- \
           dd if=test.e01 of=/tmp/test.dd

Note

Encase images often span many segments, each stored in its own file. Shell globbing can be used to specify all segments at once. In the following example, -f specifies that files called foo should be hooked (i.e. when fls attempts to open foo, it gets the Encase image):

~/pyflag$ ./bin/iowrapper -i ewf -f foo -filename test.e* -- ./bin/fls foo

Remote Access to live systems

Sometimes we wish to analyse a live Unix system remotely, for example to check quickly whether the system is compromised without first acquiring an entire image. We can use our forensic tools to examine the remote raw device through the remote IO subsystem.

Note

This type of analysis is quite fragile because the system is still live and using its file system. The forensic tools access the raw device while it is being modified, which makes the analysis susceptible to race conditions. For example, if a file is removed just as the forensic utility is reading its directory inode, inconsistent data may be obtained.

As a result, forensic tools may crash or produce inconsistent results. However, the IO subsystem cannot alter the live system in any way, since the raw device is opened read-only.

Two common problems when accessing a remote system are authentication and encryption. Access to the raw device over the network could easily lead to a root compromise by disclosing sensitive system information (e.g. the shadow file). Authentication and encryption are best left to dedicated programs, such as Secure Shell (ssh). This is the approach taken by the remote access IO subsystem. The only requirements on the live system are an ssh server and the remote_server program (which may be compiled statically).

These are the steps required to access remote raw devices over the network:

  1. Install a statically linked remote_server (the remote server component) on the remote system.
  2. Make an ssh server available with root logins allowed.
  3. On the local system, access the remote raw device by running the forensic tools through iowrapper, which wraps their library calls.
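To make the division of labour concrete, here is a hedged sketch of what the core of a small remote block server could look like. The 12-byte request format, MAX_REQUEST and all function names are hypothetical (the article does not describe remote_server's actual protocol). When a command is run over ssh, its standard input and output are carried by the ssh connection, so a server built this way needs no networking or cryptography code of its own:

```c
/* HYPOTHETICAL request format: 8-byte offset then 4-byte length, big-endian.
 * PyFlag's remote_server may use a different protocol. */
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t even on 32-bit systems */
#define _XOPEN_SOURCE 700     /* for pread() */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_REQUEST (1u << 20)  /* hypothetical cap on one read: 1 MiB */

void encode_request(uint64_t offset, uint32_t len, unsigned char out[12]) {
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(offset >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        out[8 + i] = (unsigned char)(len >> (24 - 8 * i));
}

/* Returns 0 for a valid request, -1 if the length is zero or too large. */
int decode_request(const unsigned char in[12], uint64_t *offset, uint32_t *len) {
    uint64_t o = 0;
    uint32_t l = 0;
    for (int i = 0; i < 8; i++) o = (o << 8) | in[i];
    for (int i = 8; i < 12; i++) l = (l << 8) | in[i];
    if (l == 0 || l > MAX_REQUEST)
        return -1;
    *offset = o;
    *len = l;
    return 0;
}

/* The device is only ever opened read-only, so the server cannot modify it. */
int open_device(const char *path) {
    return open(path, O_RDONLY);
}

/* Read len bytes at offset, retrying after short reads and EINTR.
 * Returns the number of bytes read (fewer than len only at end of device)
 * or -1 on error. */
ssize_t read_at(int fd, uint64_t offset, void *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          (off_t)(offset + done));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)  /* end of device */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A main loop would read 12-byte requests from standard input, call decode_request and read_at, and write the data to standard output. Keeping the server this small is what makes a statically linked build, as in step 1, straightforward.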

The following is an example session examining a remote target machine:

~/pyflag$ ./bin/iowrapper -i remote -host target \
  -server_path /path/to/remote_server -device /dev/hda -- \
  mmls -t dos foo

DOS Partition Table
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  -----   0000000000   0000000000   0000000001   Primary Table (#0)
01:  -----   0000000001   0000000062   0000000062   Unallocated
02:  00:00   0000000063   0000096389   0000096327   Dell Utilities FAT (0xde)
03:  00:01   0000096390   0019647494   0019551105   NTFS (0x07)
04:  00:02   0019647495   0058733639   0039086145   Win95 FAT32 (0x0C)
05:  00:03   0058733640   0117210239   0058476600   DOS Extended (0x05)
06:  -----   0058733640   0058733640   0000000001   Extended Table (#1)
07:  -----   0058733641   0058733702   0000000062   Unallocated
08:  01:00   0058733703   0059328044   0000594342   Linux Swap / Solaris x86 (0x82)
09:  01:01   0059328045   0117210239   0057882195   DOS Extended (0x05)
10:  -----   0059328045   0059328045   0000000001   Extended Table (#2)
11:  -----   0059328046   0059328107   0000000062   Unallocated
12:  02:00   0059328108   0117210239   0057882132   Linux (0x83)

We can now list the contents of the Windows partition:

~/pyflag$ ./bin/iowrapper -i remote -host target \
  -server_path /path/to/remote_server -device /dev/hda \
  -offset 0000096390s -- fls foo

d/d 12763-144-4:        Documents and Settings
d/d 6672-144-3: DRIVERS
d/d 6941-144-6: I386
r/r 6915-128-3: IO.SYS
d/d 62628-144-5:        LDIR
r/r 6916-128-3: MSDOS.SYS
d/d 16844-144-1:        My Music
r/r 6671-128-3: NTDETECT.COM
r/r 6670-128-3: NTLDR
d/d 13231-144-4:        Program Files
...

In the above analysis we use the following parameters:

host
The host we should try to log on to.
server_path
The path to the remote_server program. This program must reside on the remote machine.
device
The raw device to export
offset
An offset to use on the remote device. This can be specified in sectors (s), kilobytes (k) or megabytes (m), depending on the suffix.
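The suffix handling can be expressed as a small parser. This is a sketch under assumptions: it treats k and m as binary units (1024 and 1024 × 1024 bytes), treats a bare number as bytes, and parse_offset is a hypothetical name rather than PyFlag's own function:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert an offset such as "59328108s", "64k" or "1m" to bytes.
 * Assumed units: s = 512-byte sector, k = 1024, m = 1024*1024, none = bytes.
 * Returns 0 on success, -1 on a malformed string or overflow. */
int parse_offset(const char *str, long long *bytes) {
    char *end;
    errno = 0;
    /* Base 10, so leading zeros (as in "0000096390s") are not read as octal. */
    long long n = strtoll(str, &end, 10);
    if (end == str || errno == ERANGE || n < 0)
        return -1;

    long long unit;
    switch (tolower((unsigned char)*end)) {
    case '\0': unit = 1; break;
    case 's':  unit = 512; break;
    case 'k':  unit = 1024; break;
    case 'm':  unit = 1024LL * 1024; break;
    default:   return -1;
    }
    if (*end != '\0' && end[1] != '\0')  /* reject trailing characters */
        return -1;
    if (n > LLONG_MAX / unit)            /* would overflow */
        return -1;
    *bytes = n * unit;
    return 0;
}
```

For the examples in this article, 59328108s is 59328108 × 512 = 30375991296 bytes, and 0000096390s is 96390 × 512 = 49351680 bytes.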

Note

This analysis would readily reveal hidden files or directories, even where kernel-level rootkits are installed. This is because most kernel-level rootkits trap the system calls that access files on the filesystem, but do not filter access to raw devices. Since fls reads the filesystem structures directly from the raw device, it is independent of the kernel's filesystem driver and of filesystem-related system calls.

Although a rootkit could conceivably filter the raw device to hide files, doing so would dramatically increase its complexity.
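The hidden-file check amounts to a cross-view comparison: list a directory once through the kernel (readdir(), which a rootkit may filter) and once from the raw device (the fls output), then report names that only the raw view contains. A simplified sketch, assuming the fls names have already been extracted into an array; names_hidden_from_kernel is a hypothetical helper, names are compared only by string, and the directory is reopened per name for simplicity:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name appears in the kernel's view of directory dir. */
static int kernel_sees(const char *dir, const char *name) {
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;
    struct dirent *e;
    int found = 0;
    while (!found && (e = readdir(d)) != NULL)
        found = strcmp(e->d_name, name) == 0;
    closedir(d);
    return found;
}

/* Print and count the raw-device names that readdir() on dir does not show. */
int names_hidden_from_kernel(const char *dir, const char *const *raw_names, int n) {
    int hidden = 0;
    for (int i = 0; i < n; i++) {
        if (!kernel_sees(dir, raw_names[i])) {
            printf("only visible on raw device: %s/%s\n", dir, raw_names[i]);
            hidden++;
        }
    }
    return hidden;
}
```

On a live system, a file created or deleted between the two listings will also be reported, so any hit deserves a second look. The kernel listing must also be taken on the machine being examined (for example with ls over ssh), since this sketch calls readdir() locally.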

Conclusions

Library hooking is a powerful technique that enables a wrapper to be inserted between an arbitrary executable and the image. PyFlag has developed an image abstraction layer that allows arbitrary programs to support a variety of forensic image formats automatically and transparently.

The remote IO subsystem allows for the remote access and analysis of raw devices by forensic tools, making it possible to detect some kernel level rootkits remotely.