
Log Analysis

Log analysis is an important forensic technique, and logs are often the only substantial evidence in a particular case. The technique is also widely used in non-forensic contexts.

In any analysis, a general high-level process is followed. The analyst must start with a high-level theory of what happened; this can be as simple as the reason the investigation was prompted in the first place.

The analyst then poses questions to confirm or deny this theory. For example:

  • We suspect a user stole another user's credentials: Who logged onto the server within a time window? From where?
  • We suspect a DoS against our web server: Which IP downloaded the most bytes? Over what time period?
  • Our marketing department wants to know which document is most popular: What documents are most popular on this site?

These questions are then answered, leading the investigator to revise their theory or reinforce it - leading to further questions and so on.

When analysing log files, the analyst often faces a number of problems:

  1. Log files have no standard format: different applications produce different log files. Often the same application produces differently formatted log files depending on its configuration.
  2. Log files are typically very large. This makes it difficult to find important information within the mass of data.

PyFlag aims to solve these two issues as follows:

  1. Log file presets are templates used to load a log file into a standard format. Once generated, the same preset may be reused for all log files from the same source (a specific server, for example).
  2. All data is placed in the database with appropriate indexes, so searching and grouping data within log files is extremely fast. This allows for an interactive session in which the analyst can analyse the log data quickly and efficiently.
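
To illustrate why indexed database storage makes the earlier questions quick to answer, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PyFlag's MySQL backend. The table name, column names and sample rows are hypothetical, and the addresses come from the reserved documentation ranges.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL case database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weblog ("
    "ip_address TEXT, time TEXT, url TEXT, status INTEGER, bytes INTEGER)"
)
# Index the columns we expect to filter or group on.
conn.execute("CREATE INDEX idx_ip ON weblog (ip_address)")
conn.execute("CREATE INDEX idx_time ON weblog (time)")

# Hypothetical sample rows, not taken from any real log.
rows = [
    ("192.0.2.10", "2004-10-10 13:55:36", "/index.html", 200, 2326),
    ("192.0.2.10", "2004-10-10 13:55:40", "/big.iso", 200, 700000),
    ("198.51.100.7", "2004-10-10 13:56:01", "/index.html", 200, 2326),
    ("198.51.100.7", "2004-10-10 14:10:12", "/docs/a.pdf", 200, 51200),
    ("203.0.113.5", "2004-10-10 14:11:00", "/index.html", 304, 0),
]
conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?, ?)", rows)


def bytes_per_ip(conn, start, end):
    """Total bytes downloaded per IP within [start, end], largest first."""
    return conn.execute(
        "SELECT ip_address, SUM(bytes) AS total FROM weblog "
        "WHERE time BETWEEN ? AND ? "
        "GROUP BY ip_address ORDER BY total DESC",
        (start, end),
    ).fetchall()


def most_popular_urls(conn, limit=3):
    """URLs ranked by number of requests (ties broken alphabetically)."""
    return conn.execute(
        "SELECT url, COUNT(*) AS hits FROM weblog "
        "GROUP BY url ORDER BY hits DESC, url LIMIT ?",
        (limit,),
    ).fetchall()


top_talkers = bytes_per_ip(conn, "2004-10-10 13:00:00", "2004-10-10 14:00:00")
popular = most_popular_urls(conn)
print(top_talkers)
print(popular)
```

Each question from the list above becomes a single grouped query, and the analyst can narrow the time window or change the grouping column and rerun it immediately.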

The following sections will go through a number of examples. Readers are encouraged to follow these examples to gain an idea of the different features offered.

Preparation

PyFlag is very simple to install (just download the mysql binary release, untar it somewhere and run). There are, however, a number of optional features that need to be installed to enhance the PyFlag user experience.

When analysing log files it is often useful to know who is responsible for a specific IP address. Analysts often perform a whois query on the IP address in question. This process is typically slow, however, and the number of queries allowed is typically limited.

PyFlag allows for performing off-line whois queries by downloading the entire assigned IP database when possible and quickly searching through it. The whois database can be downloaded from the relevant registry (e.g. APNIC, RIPE etc). Due to copyright considerations, we are unable to ship these databases with PyFlag. A script is provided with the PyFlag distribution to download the freely available whois databases.

To invoke the script:

mic@dell:~/pyflag$ ./utilities/whois_load.sh
searching for /var/tmp//apnic.db.inetnum.gz
retrieving ftp://ftp.apnic.net/apnic/whois-data/APNIC/split/apnic.db.inetnum.gz \
into /var/tmp/results//apnic.db.inetnum.gz

...

As we can see, the script first checks for the APNIC database file in the temporary directory. If the transfer is interrupted, the existing database files may be used to rebuild the whois database. After the download is complete, the script will parse the database files and upload them into the PyFlag MySQL database. Once loaded, PyFlag may be used to resolve addresses very quickly.
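
The idea behind an off-line lookup can be sketched as follows. This is not PyFlag's actual implementation, only an illustration of searching a locally stored set of address ranges. The records and registrant names are hypothetical, and the sketch assumes the ranges do not overlap.

```python
import bisect
import ipaddress

# Hypothetical records: (network in CIDR form, registrant name).
RECORDS = [
    ("198.51.100.0/24", "EXAMPLE-NET-2"),
    ("192.0.2.0/24", "EXAMPLE-NET-1"),
    ("203.0.113.0/24", "EXAMPLE-NET-3"),
]


def build_index(records):
    """Return (starts, ranges), both sorted by first address.

    ranges holds (first, last, name) tuples with addresses as integers;
    starts holds only the first addresses so bisect can search it.
    """
    ranges = []
    for cidr, name in records:
        net = ipaddress.ip_network(cidr)
        ranges.append((int(net.network_address), int(net.broadcast_address), name))
    ranges.sort()
    starts = [first for first, _, _ in ranges]
    return starts, ranges


def lookup(index, address):
    """Return the registrant for address, or "Unknown" if no range matches."""
    starts, ranges = index
    value = int(ipaddress.ip_address(address))
    # Find the last range whose first address is <= value.
    pos = bisect.bisect_right(starts, value) - 1
    if pos >= 0:
        first, last, name = ranges[pos]
        if first <= value <= last:
            return name
    return "Unknown"


index = build_index(RECORDS)
print(lookup(index, "198.51.100.77"))
print(lookup(index, "10.0.0.1"))
```

Because the ranges are sorted once up front, each lookup is a binary search rather than a network round trip, which is what makes resolving every address in a large log practical.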

Note

This is an optional feature. If the whois databases are not loaded, whois queries will be resolved to "Unknown" in the following examples.

Creating a Log Preset

The first step in analysing a new log file is to create a log preset which describes the format of the log entries. We describe the format to PyFlag so that it can extract and index the individual fields using appropriate data types. This then provides the investigator with a powerful interface for searching, sorting, graphing and so on.

To create a new log preset, select the 'Create Log Preset' report in the 'Log Analysis' report family. A form will be displayed.

The first step asks for a sample log file. Choose a file (I am using the pyflag_apache_standard_log), leave the delimiter (Step 2) at the default value of space, and hit submit to continue to the next step. The form will be redrawn as shown below. The following screen-shots show the screen portion below Step 2, with the top part of the page removed to save space.

images/logload1.png

An example of loading a log file with a simple space delimiter

Below Step 2 in the form, a preview is given of the first four lines of the logfile. This preview helps to fill in the form correctly. In our sample apache log, space is the most appropriate delimiter even though it is not perfect. We will now use prefilters to clean up the lines and make the space delimiter work better.

In our file we see characters such as [, ] and " surrounding some fields. We do not want these characters in the resulting database, so we select the prefilter "Remove ["] Chars" to remove them.

There is also a prefilter which we can use to change the text month names to month numbers (e.g. Jan becomes 01). Finally we select the prefilter "DD/MM/YYYY->YYYY/MM/DD" to finish converting our data to the format required by the MySQL database. Multiple filters can be selected by holding the Ctrl key and clicking each desired filter. Press submit to redraw the form with the new options. It should now look as shown below (shown from Step 3 down).

images/logload2.png

Sample log file after prefiltering.
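
As a rough approximation of what the three prefilters described above do to a line, consider the following sketch. The function names are hypothetical and the sample line is a generic Apache common-log entry, not a line from the sample file, so the real output in PyFlag may differ.

```python
import re

MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}


def remove_chars(line):
    """Drop the [, ] and " characters that wrap some fields."""
    return re.sub(r'[\[\]"]', "", line)


def month_names_to_numbers(line):
    """Replace a /Mon/ month name with its two-digit number."""
    pattern = r"/(" + "|".join(MONTHS) + r")/"
    return re.sub(pattern, lambda m: "/" + MONTHS[m.group(1)] + "/", line)


def reorder_date(line):
    """Rewrite DD/MM/YYYY as YYYY/MM/DD."""
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3/\2/\1", line)


def prefilter(line):
    """Apply the filters in order; month conversion must precede reordering."""
    for step in (remove_chars, month_names_to_numbers, reorder_date):
        line = step(line)
    return line


# Hypothetical log line in Apache common log format.
sample = '192.0.2.10 - - [10/Oct/2004:13:55:36 +1000] "GET /index.html HTTP/1.0" 200 2326'
cleaned = prefilter(sample)
print(cleaned)
```

The order matters here: the date can only be reordered once the month is numeric, which is why the month-name filter runs before the date filter.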

In Step 4, a table is provided showing the text with filters applied, then split into fields according to the delimiter. Use this table to assign names and types to each field and set indexes. To ignore a column, leave the field name as ignore. Here is how I chose to set fields in my preset:

Field  Name        Type          Index
1      ip_address  IP Address    Y
2      ignore
3      ignore
4      ignore
5      time        datetime      Y
6      ignore
7      ignore
8      ignore
9      method      varchar(250)  Y
10     url         varchar(250)  N
11     version     varchar(250)  Y
12     ignore
13     status      int           Y
14     bytes       int           Y
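
The column assignments listed above can be sketched as a mapping from column number to a name and a type converter. This is only an illustration: the helper names, the timestamp format and the sample columns are assumptions, since the exact contents of the ignored columns in the sample log are not shown here.

```python
from datetime import datetime


def parse_time(text):
    """Parse a timestamp; the format string is an assumption for this sketch."""
    return datetime.strptime(text, "%Y/%m/%d:%H:%M:%S")


# Column number -> (field name, converter), following the table above.
# Columns missing from the mapping correspond to "ignore".
FIELDS = {
    1: ("ip_address", str),
    5: ("time", parse_time),
    9: ("method", str),
    10: ("url", str),
    11: ("version", str),
    13: ("status", int),
    14: ("bytes", int),
}


def to_row(columns):
    """Convert a list of split columns into a dict of named, typed fields."""
    row = {}
    for number, (name, convert) in FIELDS.items():
        # Column numbers in the table start at 1, list indexes at 0.
        row[name] = convert(columns[number - 1])
    return row


# Hypothetical 14-column line, with "-" standing in for ignored columns.
columns = [
    "192.0.2.10", "-", "-", "-", "2004/10/10:13:55:36", "-", "-", "-",
    "GET", "/index.html", "HTTP/1.0", "-", "200", "2326",
]
row = to_row(columns)
print(row)
```

Converting each field to its declared type at load time is what lets the database later sort by time or sum the bytes column instead of treating everything as text.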

Once you have finished entering names and types, select the checkbox under the table and hit 'Submit' to see a final preview. Here is my example:

images/logload3.png

Final confirmation of table structure set by the preset.

PyFlag parses the first few lines of the log file according to your settings and actually inserts them into a temporary MySQL table exactly as they would be when you begin a real case. The result is a 'WYSIWYG' table. Note that if there is a problem with the database interpreting the data (for example datetime data), this should be evident at this stage.

Once you are happy with the final preview, enter a name for the new preset, check the 'finished' box and hit submit. The preset is now complete. It can be used in future for any number of logs in the same format, so a preset can serve as a standard for your particular log formats or servers.

Note

Presets are stored in the master PyFlag database. As such they are available to all new cases. Please choose a preset name that describes the product that produced the log rather than a particular server or case name.

Creating a Case

PyFlag uses the concept of a case to group all evidence relating to a single job or investigation. In practice, PyFlag creates a new MySQL database for each new case. This database will hold all analysis results relating to that case.

Note

A single case can have many different evidence sources added to it. For example a single case may have a hard disk image and several log files. The different sources are distinguished by unique identifiers, which are used to prefix the table names in the case database.

To create a new case, click on the 'Case Management' report family in the PyFlag main menu, then the 'Create New Case' report. Enter a name for your case and click 'Submit'.

You can now proceed to the 'Load Preset Log File' report in the 'Load Data' family. There may be a shortcut to this report on screen after you create the new case.

Loading a Log File

Select your case, preset type and file to load. Now enter a name for your new table. Click 'Submit' to see a preview of what the log table will look like. This lets you confirm that the correct preset and log file have been chosen.

images/logload4.png

Loading a log file using a previously generated preset.

Click 'Submit' to begin loading the logfile. Loading may take a while for very large log files. The screen should refresh regularly to display progress. Hit the browser's refresh button if the page does not refresh automatically. The progress display shows how many lines have been loaded so far. As the log is loaded one line at a time, PyFlag does not know how many lines are present, so a percentage cannot be given. The sample apache log has approximately 150,000 lines and takes about 2-3 minutes to load on my Athlon 2600.
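
The reason only a running line count can be shown is easy to see in a sketch of a line-at-a-time loader. The following is a hypothetical illustration using sqlite3 rather than PyFlag's own loader. The function name, table layout and reporting interval are assumptions, and the table name is taken as trusted input.

```python
import io
import sqlite3


def load_log(lines, conn, table, report_every=2):
    """Insert lines one at a time, yielding the running count as progress.

    The total is unknown until the input is exhausted, so the only
    progress figure available along the way is "lines loaded so far".
    """
    conn.execute(f"CREATE TABLE {table} (line TEXT)")
    count = 0
    for line in lines:
        conn.execute(f"INSERT INTO {table} VALUES (?)", (line.rstrip("\n"),))
        count += 1
        if count % report_every == 0:
            yield count
    conn.commit()
    # Report the final count unless it was already reported in the loop.
    if count % report_every != 0:
        yield count


# Hypothetical five-line log held in memory.
log = io.StringIO("line one\nline two\nline three\nline four\nline five\n")
conn = sqlite3.connect(":memory:")
progress = list(load_log(log, conn, "weblog"))
total = conn.execute("SELECT COUNT(*) FROM weblog").fetchone()[0]
print(progress, total)
```

Counting the lines first would allow a percentage, but at the cost of reading a very large file twice.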

Once the loading is finished you will see a shortcut to the 'List Log File' report from the 'Log Analysis' family. Click this link to begin the investigation.

Log File Analysis

To be continued