Log Analysis

Log analysis is an important forensic technique, and log files are often the only substantial evidence in a particular case. The technique is often used in non-forensic contexts as well. Any analysis follows the same general high-level process. The analyst starts with a high-level theory of what happened; this can simply be an idea of why the investigation was prompted in the first place. The analyst then poses questions to confirm or refute this theory. For example:
Answering these questions leads the investigator to revise or reinforce the theory, which in turn raises further questions, and so on. When analysing log files, the analyst often faces a number of problems:
PyFlag aims to solve these two issues by:
The following sections will go through a number of examples. Readers are encouraged to follow these examples to gain an idea of the different features offered.

Preparation

Although PyFlag is very simple to install (just download the mysql binary release, untar it somewhere and run it), a number of optional features can be installed to enhance the PyFlag user experience.

When analysing log files it is often useful to know who is responsible for a specific IP address. Analysts often perform a whois query on the required IP address. This process is typically slow, however, and the number of queries allowed is typically limited. PyFlag can perform off-line whois queries by downloading the entire assigned IP database, when possible, and quickly searching through it. The whois database can be downloaded from the relevant registry (e.g. APNIC, RIPE etc). Due to copyright considerations, we are unable to ship these databases with PyFlag.

A script is provided with the PyFlag distribution to download the freely available whois databases. To invoke the script:

    mic@dell:~/pyflag$ ./utilities/whois_load.sh
    searching for /var/tmp//apnic.db.inetnum.gz
    retrieving ftp://ftp.apnic.net/apnic/whois-data/APNIC/split/apnic.db.inetnum.gz \
    into /var/tmp/results//apnic.db.inetnum.gz
    ...

As we can see, the script first checks for the APNIC database file in the temporary directory. If the transfer is interrupted, the existing database files may be used to rebuild the whois database. After the download is complete, the script parses the database files and uploads them into the PyFlag MySQL database. Once loaded, PyFlag can resolve addresses very quickly.

Note: This is an optional feature. If the whois databases are not loaded, whois queries will be resolved to "Unknown" in the following examples.

Creating a Log Preset

The first step in analysing a new log file is to create a log preset, which describes the format of the log entries.
We describe the format to PyFlag so that it can extract and index the individual fields using appropriate data types. This then provides the investigator with a powerful interface for searching, sorting, graphing and so on. To create a new log preset, select the 'Create Log Preset' report in the 'Log Analysis' report family. A form will be displayed. The first step asks for a sample log file. Choose a file (I am using the pyflag_apache_standard_log), leave the delimiter (Step 2) at the default value of space, and hit submit to continue to the next step. The form will be redrawn as shown below. The following screen-shots show the screen portion below Step 2, with the top part of the page removed to save space.
An example of loading a log file with a simple space delimiter.

Below Step 2 in the form, a preview is given of the first four lines of the logfile. This preview helps to fill in the form correctly. In our sample apache log, space is the most appropriate delimiter even though it is not perfect. We will now use prefilters to clean up the lines and make the space delimiter work better. In our file we see that characters such as [, ] and " surround some fields. We do not want these characters in the resulting database, so we can select the prefilter "Remove ["] Chars" to remove them. There is also a prefilter which changes the text month names to month numbers (e.g. Jan becomes 01). Finally, we select the prefilter "DD/MM/YYYY->YYYY/MM/DD" to finish converting our data to the format required by the MySQL database. Multiple filters can be selected by holding the Ctrl key and clicking each desired filter. Press submit to redraw the form with the new options; it should now look as shown below (from Step 3 down).
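To make the effect of the three prefilters described above more concrete, here is a minimal Python sketch of a comparable filter chain. It is only an illustration: the function names, the regular expressions and the sample log line are assumptions made for this example and are not taken from PyFlag's source.

```python
import re

# Hypothetical stand-ins for the three prefilters named in the text.
# These are NOT PyFlag's actual implementations.
MONTHS = {name: f"{i:02d}" for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def remove_chars(line):
    """Drop the [, ] and " characters that wrap some Apache fields."""
    return re.sub(r'[\[\]"]', "", line)


def month_to_number(line):
    """Replace a /Jan/ style month name with its two-digit number."""
    return re.sub(
        r"/([A-Z][a-z]{2})/",
        lambda m: "/" + MONTHS.get(m.group(1), m.group(1)) + "/",
        line,
    )


def reorder_date(line):
    """Rewrite DD/MM/YYYY as YYYY/MM/DD."""
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})", r"\3/\2/\1", line)


def prefilter(line):
    """Apply the filters in the order they are selected in the text."""
    for step in (remove_chars, month_to_number, reorder_date):
        line = step(line)
    return line


# Made-up sample line in Apache common log format (illustration only).
sample = ('192.168.0.1 - - [09/Oct/2004:13:55:36 +1000] '
          '"GET /index.html HTTP/1.0" 200 2326')

raw_fields = sample.split(" ")
clean_fields = prefilter(sample).split(" ")
print(raw_fields[3], "->", clean_fields[3])
```

Splitting the unfiltered sample on spaces leaves the bracket and quote characters attached to fields such as `[09/Oct/2004:13:55:36` and `"GET`. After the filter chain, the same positions hold `2004/10/09:13:55:36` and `GET`.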
Sample log file after prefiltering.

In Step 4, a table is provided showing the text with filters applied, then split into fields according to the delimiter. Use this table to assign names and types to each field and to set indexes. To ignore a column, leave the field name as 'ignore'. Here is how I chose to set fields in my prefilter:
Once you have finished entering names and types, select the checkbox under the table and hit 'Submit' to see a final preview. Here is my example:
Final confirmation of table structure set by the prefilter.

PyFlag parses the first few lines of the log file according to your settings and actually inserts them into a temporary MySQL table exactly as they would be when you begin a real case. The result is a 'WYSIWYG' table. Note that if there is a problem with the database interpreting the data (for example datetime data), it should be evident at this stage. Once you are happy with the final preview, enter a name for the new preset, check the 'finished' box and hit submit. The preset is now complete. The preset can be used in future for any number of logs in the same format, so it could become a standard for your particular log formats or servers.

Note: Presets are stored in the master PyFlag database. As such they are available to all new cases. Please choose a preset name that describes the product that produced the log rather than a particular server or case name.

Creating a Case

PyFlag uses the concept of a case to group all evidence relating to a single job or investigation. Literally, PyFlag creates a new MySQL database for each new case. This database will hold all analysis results relating to that case.

Note: A single case can have many different evidence sources added to it. For example, a single case may have a hard disk image and several log files. The different sources are distinguished by unique identifiers, which are used to prefix the table names in the case database.

To create a new case, click on the 'Case Management' report family in the PyFlag main menu, then the 'Create New Case' report. Enter a name for your case and click 'Submit'. You can now proceed to the 'Load Preset Log File' report in the 'Load Data' family; there may be a shortcut to this report on screen after you create the new case.

Loading a Log File

Select your case, preset type and file to load. Now enter a name for your new table. Click 'Submit' to see a preview of what the log table will look like.
This lets you confirm that the correct preset and log file have been chosen.
Loading a log file using a previously generated prefilter.

Click 'Submit' to begin loading the logfile. Loading may take a while for very large log files. The screen should refresh regularly to display progress. Hit the browser's refresh button if the page does not refresh automatically. The progress display shows how many lines have been loaded so far. Because the log is loaded one line at a time, PyFlag does not know how many lines are present, so a percentage progress cannot be given. The sample apache log has approximately 150,000 lines and takes about 2-3 minutes to load on my Athlon 2600. Once loading has finished you will see a shortcut to the 'List Log File' report from the 'Log Analysis' family. Click this link to begin the investigation.

Log File Analysis

To be continued