Posts

Introduction

A frequent question when dealing with browser forensics is ‘Does the Hit Count value mean that the user visited site ‘x’, on ‘y’ occasions?’ Most browsers record a ‘Hit Count’ value in one or more of the files they use to track browser activity, and it is important that an analyst understands any potential pitfalls associated with the accuracy, or otherwise, of this value.

We recently received a support request from an analyst who was analysing Internet Explorer data. They had found a record relating to a Bing Images search, which showed a hit count of 911. The particular search string was significant, and very damning had it actually been used 911 times. The analyst wanted to know if the hit count value could be relied upon.

The following experiment was carried out in order to establish how this surprisingly high hit count value could have been generated. In order to obtain a data set which contained as little extraneous data as possible, a brand new VMWare virtual machine was created. The machine was setup from the Microsoft Windows XP SP3 installation disc, which installed Internet Explorer v 6.0.2900.5512.xpsp.080413-2111 by default. Two user accounts were created on the machine – one to be used as an Admin account, for installing software etc; and the other to be used as the ‘browsing’ account. This separation of the accounts further assisted with minimising the possibility of any unwanted data being present within the ‘browsing’ account. Using the Admin account, the version of Internet Explorer in use on the virtual machine was upgraded to IE v 8.0.6001.18702. The ‘browsing’ account was then used for the first time. Starting Internet Explorer immediately directed the user to the MSN homepage. The address ‘www.bing.com’ was typed into the address bar, which led to the Bing search engine homepage. The ‘Images’ tab was clicked. This Auto Suggested a search criterion of ‘Beautiful Britain’, as can be seen in the figure below:

 

IE_2520Bing_2520Images_2520search_2520-_2520aston_2520martin_25202_thumb

Figure 1

The term ‘aston martin’ was then typed into the search box, as shown below:

 

Figure 2

None of the images were clicked or zoomed, nor was the result screen scrolled. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are from Master History INDEX.DAT files:

 

Figure 3

 

As can be seen, both entries show a hit count of 5. Both of these pages were visited only once, so it is immediately apparent that the hit count value maintained by Internet Explorer may not be an accurate count of how many times a particular page has been visited. However, this still did not explain how Internet Explorer had produced a hit count of 911.

The virtual machine was started again, and the browsing account logged on. The previous steps were repeated; typing ‘www.bing.com’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. Once again, Bing Auto Suggested the search criterion of ‘Beautiful Britain’, and displayed the same thumbnail results page. The search criterion ‘aston martin’ was again typed into the search box and the same thumbnail results page was produced. None of the images were clicked or zoomed. The results page was scrolled using the side scroll bar, which generated more thumbnails as it went. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are again from Master History INDEX.DAT files:

 

Figure_204_20-_20NetAnalysis_20showing_20511_20hit_20count_thumb

Figure 4

As can be seen, the ‘Beautiful Britain’ search now has a hit count of 13 – it is not at all clear how Internet Explorer determined this figure. Moreover, the ‘aston martin’ search now shows a hit count of 511. This page was not visited 511 times, nor were 511 of the thumbnail images clicked. The contents of the INDEX.DAT for the local cache folders (Content.IE5) were checked to see how many records were held relating to thumbnails that had been cached. The results were as follows:

 

Figure 5

So it does not even appear that there are 511 thumbnails held in the local cache. The result page was scrolled quickly, so the user did not see a large proportion of the thumbnail images.

In conclusion, it is apparent that the ‘Hit Count’ maintained by Internet Explorer cannot be relied upon. Although this experiment involved a quite specific process relating solely to image searches carried out on one particular search engine, the disparity between results and reality makes it clear that unquestioning acceptance of what Internet Explorer is recording as a ‘Hit Count’ could lead to significant errors if presented in evidence.

To complete the experiment, two further identical Virtual Machines were created. On one, the Google Chrome browser (v 15.0.874.106 m) was installed and used. On the other, the Mozilla Firefox browser (v 8.0) was installed and used. The same steps were repeated: typing ‘www.bing.com’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. The results from these processes are shown below:

Chrome:

Figure_206_20-_20NetAnalysis_20with_20Google_20Chrome_20Search_thumb

Figure 6

 

Firefox:

Figure_207_20-_20NetANalysis_20with_20Mozilla_20Firefox_20Search_thumb

Figure 7

It is apparent that both of these browsers seem to maintain a more accurate ‘Hit Count’.

Introduction

Over the past few weeks, there has been worldwide interest in the trial of Casey Anthony which was held in Orlando, Florida.  Anthony was indicted on charges of murder following the discovery of the body of her daughter Caylee Marie Anthony in 2008.  On Tuesday 5th July 2011, the jury returned a not guilty verdict and she was cleared of murdering her child.

Those of you who have followed this case and listened to the expert testimony may have been intrigued and possibly confused as to some of the alleged facts as the case unfolded.

Casey Anthony Digital Evidence

The digital forensic evidence in this case is of particular interest to me as it involved the recovery and analysis of a Mozilla Firefox history database.  The Internet history records within this database turned out to be extremely important to the prosecution case as the existence of Google searches relating to “chloroform” and other possibly relevant records prior to the child’s disappearance could have indicated premeditation.  This, of course, could have meant the difference between a conviction for murder in the first degree and manslaughter if found guilty.  The State of Florida also has the death penalty as a punishment option for capital crimes.

During a keyword search of Anthony’s computer, a hit was found for the word “chloroform”.  The hit was identified in what appeared to be a Mork database belonging to Mozilla Firefox.  The file was identified as residing in unallocated clusters, and rather surprisingly, is reported to have been intact.  Furthermore, all of the blocks belonging to the file were said to be contiguous.

Mork Database

The Mork database structure used by Mozilla Firefox v1-2 is unusual to say the least.  It was originally developed by Netscape for their browser (Netscape v6) and the format was later adopted by Mozilla to be used in Firefox.  It is a plain text format which is not easily human readable and is not efficient in its storage structures.  For example, a single Unicode character can take many bytes to store.  The developers themselves complained it was extremely difficult to parse correctly and from Firefox v3, it was replaced by MozStorage which is based on an SQLite database.

Forensic Analysis of the Digital Evidence

It is a matter of record that our software NetAnalysis® (v1.37) was used during the initial examination of this data, and then at a later stage another tool was used.  This is, of course, good forensic practice and is often referred to as “dual tool verification”.

Within a Mork database, the timestamp information relating to visits are stored as a micro-second count from an epoch of 1st January 1970 at 00:00:00 hours UTC (Universal Coordinated Time).  In NetAnalysis® v1.37, the forensic examiner had an option to leave the timestamps as they were recorded in the original evidence or to apply a bias to the UTC value to translate it to a local “Standard Time”.  In this older version, there was no option to present the timestamp as a local value adjusted for DST (Daylight Saving Time).  This changed in NetAnalysis® v1.50 when a further date column was introduced which presented the examiner with UTC and local times adjusted for DST.

According to video footage of the trial testimony, the forensic examiner wanted the output to reflect local time and not standard time and tried another tool.  This second tool was unable to recover any records from the Mork file.  The forensic examiner then approached the developer during a training course and discussed the issues he was having with the software.  The developer of the second tool then reviewed the Mork database over a period of a few nights and corrected the problem.  That software then managed to recover 8,557 records (320 less than NetAnalysis® was able to recover at the time).

Discrepancies between Forensic Tools

During testimony, the defence picked up on the fact that there were some major differences in the results produced by both tools.  The defence assertion was that the initial results produced by NetAnalysis® were in fact correct, and that the results from the second tool were flawed.  This was discussed at some lengths in the video testimony on 1st July 2011 when the forensic examiner was questioned regarding the differences.

According to CNN, Jose Baez, the lead counsel for the defence said:

“the state’s computer forensic evidence involving chloroform research, a central element of their premeditation argument, was used to mislead the jury and that the flaws in that evidence infected their entire case like a cancer.”

He pointed out the discrepancy between the first analysis the sheriff’s office did that showed one visit to a website about chloroform and an analysis done later with a second program that appeared to show 84 visits.  However, according to Baez, the first report showed a progression that made it clear that the 84 visits were actually to MySpace.

This was a major discrepancy with critical digital evidence presented in an extremely serious trial.  As the software developer of NetAnalysis®, I was extremely anxious to review the raw data and confirm the facts.

The first time I was made aware of this case (and the discrepancy between both tools) was around 9th June 2011.  To date, I have not been asked by any party representing the prosecution (or defence) to comment on the discrepancies between both tools.   I have however, since the conclusion of the trial, obtained a copy of the recovered “History.dat” Mork database file.

Mork Database File

Using this data, I will walk through the deconstruction of the critical elements of the file and verify the evidence presented during the trial.  The file is 3,338,603 bytes in length and contains data from a Mork database.

Mork Database Header

Figure 1

The block in Figure 1 shows the definition of the database table holding the history data.  The definition identifies the fields in each row as: “URL”, “Referrer”, “LastVisitDate”, “FirstVisitDate”, “VisitCount”, “Name”, “Hostname”, “Hidden”, “Typed”, “LastPageVisited”, and “ByteOrder”.  Not all of these fields will be present in every history record.  Each field is allocated an integer value for identification purposes.  For example, the “URL” field has been allocated the value 82.

According to the Mozilla Developers Network, the model is described as:

“The basic Mork content model is a table (or synonymously, a sparse matrix) composed of rows containing cells, where each cell is a member of exactly one column (col). Each cell is one attribute in a row. The name of the attribute is a literal designating the column, and the content of the attribute is the value. The content value of a cell is either a literal (lit) or a reference (ref). Each ref points to a lit or row or table, so a cell can “contain” another shared object by reference.”

Deconstructing the Mork Database

To demonstrate how this works, and to verify the data, we will walk through a couple of examples.  As we have no access to the SYSTEM registry hive from the suspect system, we must assume the computer was correctly set to Eastern Time in 2008 during these visits (time zone verification is always one of the first tasks for the forensic examiner prior to examining any time related evidence).

Figure 2 shows a screen shot of NetAnalysis® with the data loaded and filtered showing some of the records identified in the testimony from the trial.

NetAnalysis Screen with Mork Database Loaded

Figure 2

The first record (at the bottom of the screen) shows a visit to MySpace on 2008-03-21 15:16:13 (local time).  The visit count shows the value as 84.  The Mork record for this entry is shown in Figure 3.

Mork record 6E2F

Figure 3

The record is enclosed within square brackets and the individual fields for the record are enclosed within round brackets.  The data stored within the brackets contain name/value pairs.  Moving from left to right, the first block of data “-6E2F” identifies the Mork record ID (record ID values are not unique).  The first name/value pair shows (^82^B1).  If you refer back to the Mork header in Figure 1, we can see that field 82 refers to the “URL” (Uniform Resource Locator).   The data for this field is stored in cell B1.  The data cell is enclosed in brackets as shown in Figure 4 (line 47).  The cell data shows (B1=http://www.myspace.com/).

Mork Field B1

Figure 4

Using the same methodology, we can see that field 84 refers to “LastVisitDate” and is stored in cell 27F42 as shown in Figure 5 (2008-03-21 19:16:13 UTC / 2008-03-21 15:16:13 Local Time).  This integer represents the number of micro-seconds from the 1st  January 1970, 00:00:00 UTC.

Mork Field 27F42

Figure 5

Field 85 refers to “FirstVisitDate” and is stored in cell BAF8 as shown in Figure 6 (2007-12-26 20:25:56 UTC / 2007-12-26 20:25:56 15:25:56 Local Time).

Mork Field BAF8

Figure 6

Field 88 refers to “Hostname” and is stored in cell 16F as shown in Figure 7.

Mork Field 16F

Figure 7

Field 87 refers to “Name” and is stored in cell DA as shown in Figure 8.

Mork Field DA

Figure 8

Further examination of the Index in Figure 3 shows field 86.  This refers to the “VisitCount” and has been assigned the value 84.  This data is actually stored in the Index record and not a separate cell.  If an Index record does not have a field 86, then the “VisitCount” is 1.  Once the visit count is 2 or above, field 86 is assigned a value.  The last field 8A refers to the “Typed” flag and has been assigned the value 1.  This is a Boolean field 0 = False and 1 = True.

Decoded Record 6E2F

Figure 9

The data from this record has been gathered together in Figure 9.  The Name field relates to the Page Title and is stored in pseudo Unicode format with $00 representing 0x00 values.

According to the testimony during the trial, this record was not recovered by the second tool.

Visit Count Discrepancy

At various times during the trial, the prosecution referred to a visit to a page (“http://www.sci-spot.com/Chemistry/chloroform.htm”) which allegedly took place at 15:16:13 hours (local time) on 21st March 2008.  This record was recovered by the second forensic tool and indicated a visit count of 84.  This visit was as a result of a Google search for “how to make chloroform”.

This evidence contradicts the data recovered by NetAnalysis® which showed a single visit at 19:16:34 hours UTC (15:16:34 hours local time).  Figure 9 shows a visit to MySpace, which has been verified manually above, and shows 84 visits as of 21st March 2008 at 15:16:13 hours (local time).  This is the record highlighted in NetAnalysis® in Figure 2.

The Mork record containing “http://www.sci-spot.com/Chemistry/chloroform.htm” is identified as record 174EF.  The Index record from the original file is highlighted and shown in Figure 10 below.

Mork Record 174EF

Figure 10

The entire record is contained within square brackets.  The highlighted line above shows the full record.  The first field 82 (“URL”) is stored in cell 27F4B, as shown in Figure 11.

Mork Field 27F4B

Figure 11

The second field 84 (“LastVisitDate”) is stored in cell 27F4C, as shown in Figure 12 (2008-03-21 19:16:34 UTC / 2008-03-21 15:16:34 Local Time).  Once again, this integer represents the number of micro-seconds from the 1st  January 1970, 00:00:00 UTC.

Mork Field 27F4C

Figure 12

The third field 85 (“FirstVisitDate”) is stored in cell 27F4C.  This is the same cell value as for (“LastVisitDate”) and indicates this is the first visit to this web site during the scope of the current recorded history.  The First and Last visit times are the same.

The fourth field 83 (“Referrer”) is stored in cell 27F49, as shown in Figure 13.

Mork Field 27F49

Figure 13

The referrer field is very interesting from a forensic point of view as it shows the referring page.  As the HTTP GET is sent to the web server for a page, the browser also sends the referring page as part of the request.  This allows web masters to log the route by which visitors land on their pages.  Mozilla Firefox records this information for each record.  It is therefore relatively easy to track the actions of a user from page to page.  In this case, the referring site was a Google search for “how to make chloroform”.  With this information (which NetAnalysis® shows in the “Referral URL” Column) there really is no need to “guess” how a user arrived at a specific page.

The fifth field 88 (“Hostname”) is stored in cell 27F4D, as shown in Figure 14.

Mork Field 27F4D

Figure 14

The last field 87 (“Name”) is stored in cell 27F4E, as shown in Figure 15.  The decoded value for this string is “New Page 1”.

Mork Field 27F4E

Figure 15

Once again, I have gathered together the data for this record and presented it in a table format for easy review.  This can be seen in Figure 16.

Decoded Record 174EF

Figure 16

There are two critical points to make with this record.  Firstly, there is no field 86 (“VisitCount”) therefore this URL has only been visited once (not 84 times).  This is further corroborated by the fact that field 85 (“FirstVisitDate”) shows the exact same date/time as the “LastVisitDate”.  The second point is that the visit was recorded at 15:16:34 hours (local time) and NOT at 15:16:13 hours as was stated during the trial (from the report produced by the second forensic tool).

Validity of the Recovered File

With the release of NetAnalysis® v1.50, the Mork database parser was completely re-written from the ground up (as were the other parsing modules).  This was primarily to make the code easier to migrate and maintain and to ensure we were recovering as much data as possible.  I tested the current release of NetAnalysis® v1.52 against the Casey Anthony data.  I know from manually examining the data, there are 9,075 individual Index records.  Loading the data into NetAnalysis® resulted in 9,060 records being recovered.  This initially caused me some concern.  However, further examination of the data revealed that there was nothing to be concerned about.  There were 15 records which had missing “URL” cells; 14 of these records also had missing “LastVisitDate” cells.

If there are missing data cells within the file, this is a strong indicator that the file is not intact.

Conclusion

There are a number of conclusions to be drawn from the digital evidence presented in this trial; however, I will leave this to the members of the digital forensic community.  Forensic tool validation is certainly at the forefront of our thoughts.  Whilst it may not be possible to verify a tool, it is possible to verify the results against known data sets.  If two forensic tools produce completely different results, this should at least warrant further investigation.

References