Introduction to Character Encoding

Understanding how character encoding works is an essential part of understanding digital evidence, and it forms part of the common core of skills and knowledge.

A character set is a collection of letters and symbols used in a writing system. For example, the ASCII character set covers letters and symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and the Unicode character set contains characters for most of the living languages and scripts in the world.

Characters in a character set are stored as one or more bytes. Each byte or sequence of bytes represents a given character. A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.

There are many different character encodings. If the wrong encoding is applied to a sequence of bytes, the result will be unintelligible text.
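As a quick illustration, the following minimal Python sketch (the sample text is chosen purely for illustration) shows the same bytes decoded with a correct and an incorrect encoding:

    # Minimal sketch: the same byte sequence decoded with different encodings.
    data = "Café £10".encode("utf-8")     # store some non-ASCII text as UTF-8 bytes

    print(data.decode("utf-8"))           # Café £10   (correct encoding)
    print(data.decode("cp1252"))          # CafÃ© Â£10 (wrong encoding produces mojibake)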

ASCII

The American Standard Code for Information Interchange (ASCII) was created in 1963 by an American Standards Association committee. The code was developed by reordering and expanding a set of symbols and characters already used in telegraphy at that time by the Bell Company.

At first it included only capital letters and numbers; in 1967, lowercase letters and some control characters were added, forming what is known as US-ASCII. This encoding uses the values 0 through 127.

7-bit ASCII is sufficient for encoding the letters, numbers and punctuation used in English, but it is insufficient for other languages.

Extended ASCII

Extended ASCII uses the full 8 bits of each byte, adding a further 128 characters for non-English characters and symbols.

 

Hex viewer showing extended ASCII character encoding

Unicode

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, Europe alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages. Unicode 10.0 adds 8,518 characters, for a total of 136,690 characters.

Unicode can be implemented by different character encodings; the Unicode standard defines UTF-8, UTF-16, and UTF-32 (Unicode Transformation Format).

Codepoint

The number assigned to a character is called a codepoint. A coded character set defines which codepoints exist and which abstract characters they represent, e.g. “Latin Capital Letter A”. A character encoding then defines how each codepoint is represented as one or more bytes.
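For example, the following short Python sketch (illustrative only) shows the codepoint assigned to “Latin Capital Letter A” and how codepoints are turned into bytes:

    # Minimal sketch: a codepoint and its byte representation.
    print(ord("A"))                # 65, i.e. codepoint U+0041 "Latin Capital Letter A"
    print("A".encode("ascii"))     # b'A' - a single byte, 0x41
    print("€".encode("utf-8"))     # b'\xe2\x82\xac' - one codepoint, three bytes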

The following image shows the encoding of an uppercase letter A using standard ASCII.

 

Image showing character encoding and the transition from Character A to binary and codepoints

 

UTF-8, UTF-16 and UTF-32

UTF-8 is the most widely used encoding and is variable in length. It is capable of encoding all valid Unicode code points and can use between 1 and 4 bytes for each code point. The first 128 code points require 1 byte and match ASCII.

UTF-16 is also a variable-length encoding and is capable of encoding all valid Unicode code points. Characters are encoded with one or two 16-bit code units. UTF-16 was developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set).

UTF-32 is a fixed length encoding that requires 4 bytes for every Unicode code point.
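The difference is easy to demonstrate. The following Python sketch (illustrative only) encodes the same two characters, U+0041 and U+20AC, with each transformation format:

    # Minimal sketch: the same text encoded with UTF-8, UTF-16 and UTF-32.
    text = "A€"                          # U+0041 and U+20AC

    print(text.encode("utf-8"))          # 1 + 3 bytes (variable length)
    print(text.encode("utf-16-le"))      # 2 + 2 bytes (one 16-bit code unit each)
    print(text.encode("utf-32-le"))      # 4 + 4 bytes (fixed length)
    print(len(text.encode("utf-16")))    # 6 bytes - includes a 2-byte byte order mark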

Browser Data Analysis

It is important to understand character encoding when examining Internet and browser data. Browser applications use a variety of different encoding methods for storing data. For example, some browsers use UTF-16 for storing page titles and the default Windows code page (e.g. Windows 1252) for storing URL data. Windows 1252 is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and some other Western languages.
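As a simple illustration (the sample values below are invented and are not taken from any particular browser database), stored bytes only make sense when decoded with the code page that was used to write them:

    # Illustrative values only: a page title stored as UTF-16LE and a URL
    # stored using the Windows-1252 code page.
    title_bytes = "Ære og ansvar".encode("utf-16-le")
    url_bytes = "http://example.com/café".encode("cp1252")

    print(title_bytes.decode("utf-16-le"))                # correct code page
    print(url_bytes.decode("cp1252"))                     # correct code page
    print(url_bytes.decode("utf-8", errors="replace"))    # wrong code page: the é is lost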

Selecting a Code Page in NetAnalysis®

An appropriate Code Page can be selected when creating a New Case in NetAnalysis®.

Digital Detective NetAnalysis® new case screen and option to set character encoding

Clicking the button next to the code page shows the following window. This allows the user to select the appropriate code page (if required).

 

Digital Detective NetAnalysis® code page screen to select character encoding


Introduction

When using third party image mounting tools to perform the forensic examination of NTFS file systems, it is extremely important to understand NTFS Junction Points so that you do not find yourself making a critical mistake during your analysis. An issue has been identified with third party image mounting software where NTFS junction points are hard linked to folders on the forensic investigator’s own hard disk. If you use software to process a file system (such as NetAnalysis® or anti-virus software) and the file system is mounted with junction points, the operating system on the forensic workstation may point the software to folders which are not contained within the suspect volume. This leads to an extremely serious situation in which the investigator may inadvertently process their own file system.

This is possible with the following operating systems and file systems:

  • Microsoft Windows Vista with NTFS volumes (and the corresponding server operating systems)
  • Microsoft Windows 7 with NTFS volumes (and the corresponding server operating systems)
  • Microsoft Windows 8 with NTFS volumes (and the corresponding server operating systems)

Symbolic Links

Windows 2000 and later support directory junctions, where a directory serves as a link to another directory on the computer. By using junction points, you can graft a target folder onto another NTFS folder or “mount” a volume onto an NTFS junction point. Junction points are transparent to software applications.

An NTFS symbolic link (symlink) is a file system object in the NTFS file system that points to another file system object; the object being pointed to is called the target. Symbolic links should be transparent to users: the links appear as normal files or directories, and can be acted upon by the user or application in exactly the same manner. Symbolic links are designed to aid in migration and application compatibility with POSIX operating systems, and were introduced with the modifications made to the NTFS file system in Windows Vista. Unlike an NTFS junction point (available since Windows 2000), a symbolic link can also point to a file or to a remote SMB network path, and while NTFS junction points support only absolute paths on local drives, NTFS symbolic links allow linking using relative paths. Additionally, the NTFS symbolic link implementation provides full support for cross file system links. However, the functionality enabling cross-host symbolic links requires that the remote system also support them, which effectively limits their support to Windows Vista and later Windows operating systems.

Junction Points

In Windows Vista, Windows Server 2008 and Windows 8, the default locations for user data and system data have changed. For example, user data that was previously stored in the %SystemDrive%\Documents and Settings directory is now stored in the %SystemDrive%\Users directory. For backward compatibility, the old locations have junction points that point to the new locations. For example, C:\Documents and Settings is now a junction point that points to C:\Users. Backup applications must be capable of backing up and restoring junction points. These junction points can be identified as follows:

  • They have the FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_HIDDEN, and FILE_ATTRIBUTE_SYSTEM file attributes set.
  • They also have their access control lists (ACLs) set to deny read access to everyone.

Applications that call out a specific path can traverse these junction points if they have the required permissions. However, attempts to enumerate the contents of the junction points will result in failures. It is important that backup applications do not traverse these junction points, or attempt to back up data under them, for two reasons:

  • Doing so can cause the backup application to back up the same data more than once.
  • It can also lead to cycles (circular references).

Warning

Some mounting tools do not respect these permissions and therefore allow software applications to follow the links. As the links are hard coded into the file system, they can point to actual folder locations on the forensic workstation.
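As a precaution, reparse points on a mounted volume can be identified programmatically before any recursive processing is run. The following is a minimal, Windows-only Python sketch, not a definitive safeguard; F:\ is simply a placeholder for the mounted suspect volume:

    import os
    import stat

    # Walk a mounted volume and report any directory carrying the
    # reparse-point attribute, without descending into it.
    def find_reparse_points(root):
        for dirpath, dirnames, filenames in os.walk(root):
            for name in list(dirnames):
                full = os.path.join(dirpath, name)
                attrs = os.lstat(full).st_file_attributes
                if attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
                    print("Reparse point (not traversed):", full)
                    dirnames.remove(name)   # stop os.walk descending into the link

    find_reparse_points("F:\\")

Any path reported by such a check should be treated with caution, as it may resolve to a location on the forensic workstation rather than within the suspect volume.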

Per-User Junctions and System Junctions

The junction points that are used to provide file and registry virtualisation in Windows Vista, Windows Server 2008 and Windows 8 can be divided into two classes: per-user junctions and system junctions.

Per-user junctions are created inside each individual user’s profile to provide backward compatibility for user applications. The junction point at C:\Users\[username]\My Documents that points to C:\Users\[username]\Documents is an example of a per-user junction. Per-user junctions are created by the Profile service when the user’s profile is created.

The other junctions are system junctions that do not reside under the Users\[username] directory. Examples of system junctions include:

  • Documents and Settings
  • Junctions within the All Users, Public, and Default User profiles

Examining Junction Points

The following image shows a volume containing junction points. You can also see the corresponding hard link.

Even though this volume is mounted as F:, accessing the folder F:\Users\All Users opens the link and presents the files from C:\ProgramData as if they were actually contained within F:\Users\All Users.

 

 


Good Practice for e-Crime Investigations

Criminal behaviour has shifted to take advantage of electronic media, and serious and organised criminal networks have become increasingly sophisticated. Corporations, government departments and businesses now need to invest considerable sums in order to protect their assets and data. Lloyd’s of London has stated that it is defending up to sixty attacks a day on its corporate infrastructure. Policing needs to equip itself with the necessary skills and knowledge to meet this new challenge.

The Internet, computer networks and automated data systems present wide ranging opportunities for criminals, including organised criminal networks, to partake in an extensive variety of offences; this presents a significant challenge to law enforcement globally.

One of the principal difficulties facing the law enforcement community is how best to tackle the complex and dynamic developments associated with the Internet, digital information and evolutions in communications technology. This creates difficulties in the consistency of approach and enforcement nationally; there is a clear need to harmonise practices and procedures throughout the UK. At the same time, it should be possible to develop and share the experience and skills that British policing already has.

The ACPO Good Practice Guide for Managers of Hi-Tech Crime Units used to be a restricted document; this is no longer the case. The current guide can be downloaded from the link below:

Downloads

The latest version can be downloaded here:

Digital Evidence Good Practice

The ACPO good practice guide for dealing with computer based evidence was first released in the late 1990s. Since then, there have been five iterations, with changes that include an update to the document title. The guide is essential reading for anyone involved in the field of digital forensics. The latest version, “ACPO Good Practice Guide for Digital Evidence”, has been updated to include more than just evidence from computers.

According to DAC Janet Williams QPM, ACPO lead for the e-Crime Portfolio:

This guide has changed from version 4, where it centred on computer based evidence; the new revision reflects digital based evidence and attempts to encompass the diversity of the digital world. As such this guide would not only assist law enforcement but the wider family that assists in investigating cyber security incidents. I commend all to read and make use of the knowledge and learning contained in this guide to provide us with the right tools to carry out our role.

 

Foreword

It seems that whenever a review of ACPO guidance is carried out, we are in the middle of technological changes that have a vast impact on the work that is done within digital forensic units. It is a testament to the authors of the original four guiding principles for digital forensics that they still hold today, and one of the key early decisions of the review board was to keep those four principles, with only a slight change of wording to principle four.

We work in an area of constant change. There is a continuing need to re-evaluate and revise our capacities to perform our duties. There is a need to recover and analyse digital data that can now be found within the many devices that are within day to day use, and can supply vital evidence in all our investigations.

Hence a second key early decision was to change the title of the document to ACPO Good Practice Guide for Digital Evidence. This would hopefully encompass all aspects of digital evidence and remove the difficulty about trying to draw the line to what is or isn’t a computer and thus falling within the remit of this guide.

It is important that people who work within the arena of digital forensics do not just concentrate on the technology, as essential as that is, but that the processes we use are fit for the purpose, and that skills and capacities within units reflect the demands that are made on them.

A prime example of this is the use of the word ‘triage’, which has been the subject of much discussion within the forensic community. It should be noted that triage does not mean a single triage tool; rather, it is a complete process in which certain tools will play a part but are not the whole solution.

This guide is not intended to be an A-Z of digital forensics, or a specific “how to do” instruction manual. It should paint an overall picture and provide an underlying structure to what is required within Digital Forensic Units (DFUs). Therefore, the guide has been produced as a high-level document without the specific guidance included in previous versions, as this guidance is now available elsewhere. Where relevant, links to other guidance documents will be given.

In this document Digital Forensic Unit is used to cover any type of group that is actively involved in the processing of digital evidence.

Downloads

The latest version can be downloaded here:

Introduction to Blade® v1.9

We are pleased to announce the release of Blade v1.9.

Digital Detective Software - Blade Professional - Forensic Data Recovery

 

This release of Blade® brings a number of fixes and some great new features.  This is the first release of Blade® to have evaluation capabilities, which allow the user to test and evaluate our software for 30 days.  When Blade is installed on a workstation for the first time (and a valid USB dongle licence is not inserted), the software will function in evaluation mode.

The following list contains a summary of the new features:

  • Support for Advanced Forensic Format (AFF®)
  • Hiberfil.sys converter – supports XP, Vista, Windows 7 32 and 64bit
  • Accurate hiberfil.sys memory mapping, not just Xpress block decompression
  • Hiberfil.sys slack recovery
  • Codepage setting for enhanced multi-language support
  • SQLite database recovery
  • 30-day evaluation version of Blade® Professional
  • New recovery profile parameters for more advanced and accurate data recovery
  • Support for Logicube Forensic Dossier®
  • Support for OMA DRM Content Format for Discrete Media Profile (DCF)

We have also been working on the data recovery engines to make them more efficient and much faster than before. The searching speed has been significantly increased.

Release Information

For further information, please see the following:

Introduction

One of the growth areas in digital forensics is the use of USB dongles for the licensing of software. Every single practitioner now finds themselves in dongle hell trying to manage a veritable menagerie of tiny USB devices just to enable them to carry out their day-to-day work.

Of course, where dongles for core forensic software are concerned, most people will possess their own Digital Detective, EnCase or FTK dongles and these will be jealously guarded, with practitioners unwilling to let their prized (and in some cases, very expensive) hardware leave their sight. But what about some of the lesser used, but no less valuable, licensing dongles out there? At the moment, most labs will resound to the cries of “who’s got the X dongle? I need it to do Y”. Several minutes of frantic searching and head scratching then ensues, until someone remembers that they borrowed it to use in the imaging lab for five minutes, two weeks ago. One solution to this problem is a network attached dongle server.

 

Avoiding dongle hell with a network attached SEH MyUTN-80 Dongle Server

Network Attached Dongle Servers

Dongle servers from SEH make USB dongles available over a network. You use your copy-protected software as usual but you don’t need to connect the license dongles directly to your client.

As with locally connected dongles, only one user can use the respective dongle over the point-to-point network connection.

The SEH UTN Manager software tool for Windows, OS X and Linux gives you access to your dongles as if they were connected directly to your computer. The SEH UTN Manager is installed on all notebooks, PCs, servers, and terminals that require dongle access.

 

SEH UTN Dongle Manager

This means that all of your licensing dongles can be stored in one location, and accessible to all of your staff via your forensic network. The port area of the dongle server is lockable, meaning that no-one is able to remove dongles without the key; and if you use the rack-mounting kit, the dongle server can even go in your server rack for further security.

 

Avoid dongle hell with a rack mounted SEH MyUTN-800 Network Attached Dongle Server

If working practices allow, the dongle server can be accessed over the Internet, meaning that on-site working doesn’t have to involve carrying around thousands of pounds worth of dongles. A remote worker can also have temporary access to a dongle when required. The server works with all the common forensic dongles such as Feitian, Aladdin HASP, SafeNet and Wibu CodeMeter. This means that even your core forensic function dongles can be kept securely locked away, safe from loss or damage.

Main Benefits

  • Easily share any licensing dongle via the local area network
  • Lock away expensive dongles to prevent theft
  • Prevent damage through constant insertion and removal
  • Easily share, and provide dongle access to remote workers
  • Easily share licensing dongles in the lab without having to constantly plug/unplug and throw them around

This would be an ideal purchase for small offices that cannot afford to buy licences for everyone, particularly for expensive software which may not be used every day.

Where to Buy

So, if you too would like to avoid dongle hell, please contact us for a quote.

Introduction

A frequent question when dealing with browser forensics is ‘Does the Hit Count value mean that the user visited site ‘x’, on ‘y’ occasions?’ Most browsers record a ‘Hit Count’ value in one or more of the files they use to track browser activity, and it is important that an analyst understands any potential pitfalls associated with the accuracy, or otherwise, of this value.

We recently received a support request from an analyst who was analysing Internet Explorer data. They had found a record relating to a Bing Images search, which showed a hit count of 911. The particular search string was significant, and very damning had it actually been used 911 times. The analyst wanted to know if the hit count value could be relied upon.

The following experiment was carried out in order to establish how this surprisingly high hit count value could have been generated. In order to obtain a data set which contained as little extraneous data as possible, a brand new VMWare virtual machine was created. The machine was set up from the Microsoft Windows XP SP3 installation disc, which installed Internet Explorer v 6.0.2900.5512.xpsp.080413-2111 by default. Two user accounts were created on the machine – one to be used as an Admin account, for installing software, etc.; and the other to be used as the ‘browsing’ account. This separation of the accounts further assisted with minimising the possibility of any unwanted data being present within the ‘browsing’ account.

Using the Admin account, the version of Internet Explorer in use on the virtual machine was upgraded to IE v 8.0.6001.18702. The ‘browsing’ account was then used for the first time. Starting Internet Explorer immediately directed the user to the MSN homepage. The address ‘www.bing.com’ was typed into the address bar, which led to the Bing search engine homepage. The ‘Images’ tab was clicked. This Auto Suggested a search criterion of ‘Beautiful Britain’, as can be seen in the figure below:

 


Figure 1

The term ‘aston martin’ was then typed into the search box, as shown below:

 

Figure 2

None of the images were clicked or zoomed, nor was the result screen scrolled. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are from Master History INDEX.DAT files:

 

Figure 3

 

As can be seen, both entries show a hit count of 5. Both of these pages were visited only once, so it is immediately apparent that the hit count value maintained by Internet Explorer may not be an accurate count of how many times a particular page has been visited. However, this still did not explain how Internet Explorer had produced a hit count of 911.

The virtual machine was started again, and the browsing account logged on. The previous steps were repeated; typing ‘www.bing.com’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. Once again, Bing Auto Suggested the search criterion of ‘Beautiful Britain’, and displayed the same thumbnail results page. The search criterion ‘aston martin’ was again typed into the search box and the same thumbnail results page was produced. None of the images were clicked or zoomed. The results page was scrolled using the side scroll bar, which generated more thumbnails as it went. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are again from Master History INDEX.DAT files:

 

NetAnalysis® showing a hit count of 511

Figure 4

As can be seen, the ‘Beautiful Britain’ search now has a hit count of 13 – it is not at all clear how Internet Explorer determined this figure. Moreover, the ‘aston martin’ search now shows a hit count of 511. This page was not visited 511 times, nor were 511 of the thumbnail images clicked. The contents of the INDEX.DAT for the local cache folders (Content.IE5) were checked to see how many records were held relating to thumbnails that had been cached. The results were as follows:

 

Figure 5

So it does not even appear that there are 511 thumbnails held in the local cache. The result page was scrolled quickly, so the user did not see a large proportion of the thumbnail images.

In conclusion, it is apparent that the ‘Hit Count’ maintained by Internet Explorer cannot be relied upon. Although this experiment involved a quite specific process relating solely to image searches carried out on one particular search engine, the disparity between results and reality makes it clear that unquestioning acceptance of what Internet Explorer is recording as a ‘Hit Count’ could lead to significant errors if presented in evidence.

To complete the experiment, two further identical Virtual Machines were created. On one, the Google Chrome browser (v 15.0.874.106 m) was installed and used. On the other, the Mozilla Firefox browser (v 8.0) was installed and used. The same steps were repeated: typing ‘www.bing.com’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. The results from these processes are shown below:

Chrome:

NetAnalysis® showing the Google Chrome search results

Figure 6

 

Firefox:

NetAnalysis® showing the Mozilla Firefox search results

Figure 7

It is apparent that both of these browsers seem to maintain a more accurate ‘Hit Count’.

Internet Explorer Data

As forensic examiners will be aware, Microsoft Internet Explorer stores cached data within randomly assigned folders. This behaviour was designed to prevent Internet data being stored in predictable locations on the local system in order to foil a number of attack types. Prior to the release of Internet Explorer v9.0.2, cookies were an exception to this behaviour and their location was insufficiently random in many cases.

Cookie Files

Generally, for Vista and Windows 7, cookie files are stored in the location shown below:

\AppData\Roaming\Microsoft\Windows\Cookies

The cookie filename format was the user’s login name, the @ symbol and then a partial hostname for the domain of the cookie.

Cookie Files with Standard Name

With sufficient information about a user’s environment, an attacker might have been able to establish the location of any given cookie and use this information in an attack.

Random Cookie Filenames

To mitigate the threat, Internet Explorer 9.0.2 now names the cookie files using a randomly-generated alphanumeric string. Older cookies are not renamed during the upgrade, but are instead renamed as soon as any update to the cookie data occurs. The image below shows an updated cookie folder containing the new files.

Random Cookie Names

This change will have no impact on the examination of cookie data; however, it will no longer be possible to identify which domain a cookie belongs to from the file name alone.

NetAnalysis showing Random Cookie Names
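Since the file name no longer reveals the owning domain, the domain has to be read from the content of each cookie file. The following is a hedged Python sketch: it assumes the legacy Internet Explorer text cookie layout, in which each record stores the cookie name, value and host/path on its first three lines, and it uses a placeholder profile path:

    import glob
    import os

    # Assumed legacy layout: name, value, host/path, flags, expiry and
    # creation times, with each record terminated by a line containing '*'.
    cookie_dir = os.path.expandvars(r"%APPDATA%\Microsoft\Windows\Cookies")

    for path in glob.glob(os.path.join(cookie_dir, "*.txt")):
        with open(path, "r", errors="replace") as f:
            lines = f.read().splitlines()
        if len(lines) >= 3:
            print(os.path.basename(path), "->", lines[2])   # host/path of the cookie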

Introduction

Over the past few weeks, there has been worldwide interest in the trial of Casey Anthony which was held in Orlando, Florida.  Anthony was indicted on charges of murder following the discovery of the body of her daughter Caylee Marie Anthony in 2008.  On Tuesday 5th July 2011, the jury returned a not guilty verdict and she was cleared of murdering her child.

Those of you who have followed this case and listened to the expert testimony may have been intrigued and possibly confused as to some of the alleged facts as the case unfolded.

Casey Anthony Digital Evidence

The digital forensic evidence in this case is of particular interest to me as it involved the recovery and analysis of a Mozilla Firefox history database.  The Internet history records within this database turned out to be extremely important to the prosecution case as the existence of Google searches relating to “chloroform” and other possibly relevant records prior to the child’s disappearance could have indicated premeditation.  This, of course, could have meant the difference between a conviction for murder in the first degree and manslaughter if found guilty.  The State of Florida also has the death penalty as a punishment option for capital crimes.

During a keyword search of Anthony’s computer, a hit was found for the word “chloroform”.  The hit was identified in what appeared to be a Mork database belonging to Mozilla Firefox.  The file was identified as residing in unallocated clusters, and rather surprisingly, is reported to have been intact.  Furthermore, all of the blocks belonging to the file were said to be contiguous.

Mork Database

The Mork database structure used by Mozilla Firefox v1-2 is unusual to say the least.  It was originally developed by Netscape for their browser (Netscape v6) and the format was later adopted by Mozilla to be used in Firefox.  It is a plain text format which is not easily human readable and is not efficient in its storage structures.  For example, a single Unicode character can take many bytes to store.  The developers themselves complained it was extremely difficult to parse correctly and from Firefox v3, it was replaced by MozStorage which is based on an SQLite database.

Forensic Analysis of the Digital Evidence

It is a matter of record that our software NetAnalysis® (v1.37) was used during the initial examination of this data, and then at a later stage another tool was used.  This is, of course, good forensic practice and is often referred to as “dual tool verification”.

Within a Mork database, the timestamp information relating to visits is stored as a micro-second count from an epoch of 1st January 1970 at 00:00:00 hours UTC (Coordinated Universal Time).  In NetAnalysis® v1.37, the forensic examiner had an option to leave the timestamps as they were recorded in the original evidence or to apply a bias to the UTC value to translate it to a local “Standard Time”.  In this older version, there was no option to present the timestamp as a local value adjusted for DST (Daylight Saving Time).  This changed in NetAnalysis® v1.50, when a further date column was introduced which presented the examiner with UTC and local times adjusted for DST.
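For reference, the conversion itself is straightforward. The following Python sketch converts a Mork visit timestamp to UTC; the sample value corresponds to the visit time decoded later in this article:

    from datetime import datetime, timedelta, timezone

    # A Mork LastVisitDate/FirstVisitDate value is a count of microseconds
    # from 1970-01-01 00:00:00 UTC (PRTime).
    def mork_time_to_utc(microseconds):
        return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)

    print(mork_time_to_utc(1206126973000000))   # 2008-03-21 19:16:13+00:00

Translating that UTC value to local time (with or without a DST adjustment) is then a presentation decision for the forensic tool, which is exactly the distinction discussed above.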

According to video footage of the trial testimony, the forensic examiner wanted the output to reflect local time and not standard time and tried another tool.  This second tool was unable to recover any records from the Mork file.  The forensic examiner then approached the developer during a training course and discussed the issues he was having with the software.  The developer of the second tool then reviewed the Mork database over a period of a few nights and corrected the problem.  That software then managed to recover 8,557 records (320 fewer than NetAnalysis® was able to recover at the time).

Discrepancies between Forensic Tools

During testimony, the defence picked up on the fact that there were some major differences in the results produced by both tools.  The defence assertion was that the initial results produced by NetAnalysis® were in fact correct, and that the results from the second tool were flawed.  This was discussed at some length in the video testimony on 1st July 2011 when the forensic examiner was questioned regarding the differences.

According to CNN, Jose Baez, the lead counsel for the defence said:

“the state’s computer forensic evidence involving chloroform research, a central element of their premeditation argument, was used to mislead the jury and that the flaws in that evidence infected their entire case like a cancer.”

He pointed out the discrepancy between the first analysis the sheriff’s office did that showed one visit to a website about chloroform and an analysis done later with a second program that appeared to show 84 visits.  However, according to Baez, the first report showed a progression that made it clear that the 84 visits were actually to MySpace.

This was a major discrepancy with critical digital evidence presented in an extremely serious trial.  As the software developer of NetAnalysis®, I was extremely anxious to review the raw data and confirm the facts.

The first time I was made aware of this case (and the discrepancy between both tools) was around 9th June 2011.  To date, I have not been asked by any party representing the prosecution (or defence) to comment on the discrepancies between the two tools.  I have, however, since the conclusion of the trial, obtained a copy of the recovered “History.dat” Mork database file.

Mork Database File

Using this data, I will walk through the deconstruction of the critical elements of the file and verify the evidence presented during the trial.  The file is 3,338,603 bytes in length and contains data from a Mork database.

Mork Database Header

Figure 1

The block in Figure 1 shows the definition of the database table holding the history data.  The definition identifies the fields in each row as: “URL”, “Referrer”, “LastVisitDate”, “FirstVisitDate”, “VisitCount”, “Name”, “Hostname”, “Hidden”, “Typed”, “LastPageVisited”, and “ByteOrder”.  Not all of these fields will be present in every history record.  Each field is allocated an integer value for identification purposes.  For example, the “URL” field has been allocated the value 82.

According to the Mozilla Developers Network, the model is described as:

“The basic Mork content model is a table (or synonymously, a sparse matrix) composed of rows containing cells, where each cell is a member of exactly one column (col). Each cell is one attribute in a row. The name of the attribute is a literal designating the column, and the content of the attribute is the value. The content value of a cell is either a literal (lit) or a reference (ref). Each ref points to a lit or row or table, so a cell can “contain” another shared object by reference.”

Deconstructing the Mork Database

To demonstrate how this works, and to verify the data, we will walk through a couple of examples.  As we have no access to the SYSTEM registry hive from the suspect system, we must assume the computer was correctly set to Eastern Time in 2008 during these visits (time zone verification is always one of the first tasks for the forensic examiner prior to examining any time related evidence).

Figure 2 shows a screen shot of NetAnalysis® with the data loaded and filtered showing some of the records identified in the testimony from the trial.

NetAnalysis Screen with Mork Database Loaded

Figure 2

The first record (at the bottom of the screen) shows a visit to MySpace on 2008-03-21 15:16:13 (local time).  The visit count shows the value as 84.  The Mork record for this entry is shown in Figure 3.

Mork record 6E2F

Figure 3

The record is enclosed within square brackets and the individual fields for the record are enclosed within round brackets.  The data stored within the brackets contain name/value pairs.  Moving from left to right, the first block of data “-6E2F” identifies the Mork record ID (record ID values are not unique).  The first name/value pair shows (^82^B1).  If you refer back to the Mork header in Figure 1, we can see that field 82 refers to the “URL” (Uniform Resource Locator).   The data for this field is stored in cell B1.  The data cell is enclosed in brackets as shown in Figure 4 (line 47).  The cell data shows (B1=http://www.myspace.com/).

Mork Field B1

Figure 4

Using the same methodology, we can see that field 84 refers to “LastVisitDate” and is stored in cell 27F42 as shown in Figure 5 (2008-03-21 19:16:13 UTC / 2008-03-21 15:16:13 Local Time).  This integer represents the number of micro-seconds from the 1st  January 1970, 00:00:00 UTC.

Mork Field 27F42

Figure 5

Field 85 refers to “FirstVisitDate” and is stored in cell BAF8 as shown in Figure 6 (2007-12-26 20:25:56 UTC / 2007-12-26 15:25:56 Local Time).

Mork Field BAF8

Figure 6

Field 88 refers to “Hostname” and is stored in cell 16F as shown in Figure 7.

Mork Field 16F

Figure 7

Field 87 refers to “Name” and is stored in cell DA as shown in Figure 8.

Mork Field DA

Figure 8

Further examination of the Index in Figure 3 shows field 86.  This refers to the “VisitCount” and has been assigned the value 84.  This data is actually stored in the Index record and not a separate cell.  If an Index record does not have a field 86, then the “VisitCount” is 1.  Once the visit count is 2 or above, field 86 is assigned a value.  The last field, 8A, refers to the “Typed” flag and has been assigned the value 1.  This is a Boolean field: 0 = False and 1 = True.
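The name/value structure of such a row lends itself to simple mechanical extraction. The following Python sketch is illustrative only: the row text is reconstructed from the cells described above rather than copied from the evidence file, and the regular expression is a simplification of the full Mork grammar:

    import re

    # (^col^ref) -> the value is held in a separate cell keyed by "ref"
    # (^col=lit) -> the value is stored inline as a literal
    ROW = "[-6E2F(^82^B1)(^84^27F42)(^85^BAF8)(^86=84)(^87^DA)(^88^16F)(^8A=1)]"
    CELL = re.compile(r"\(\^([0-9A-F]+)([\^=])([^)]*)\)")

    row_id = re.match(r"\[-([0-9A-F]+)", ROW).group(1)
    print("record id:", row_id)
    for column, kind, value in CELL.findall(ROW):
        form = "reference to cell" if kind == "^" else "literal value"
        print(f"field {column}: {form} {value}")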

Decoded Record 6E2F

Figure 9

The data from this record has been gathered together in Figure 9.  The Name field relates to the Page Title and is stored in pseudo Unicode format with $00 representing 0x00 values.
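As an illustration of that escaping, the short Python sketch below decodes a pseudo-Unicode value of this kind. The sample string is invented for illustration rather than copied from the case file, and little-endian byte order is assumed:

    import re

    # "$XX" escapes represent raw byte values; "$00" is the 0x00 byte of a
    # UTF-16 code unit. Little-endian byte order is assumed here.
    def decode_mork_name(value):
        raw = re.sub(r"\$([0-9A-Fa-f]{2})",
                     lambda m: chr(int(m.group(1), 16)), value).encode("latin-1")
        return raw.decode("utf-16-le")

    print(decode_mork_name("M$00y$00S$00p$00a$00c$00e$00"))   # MySpace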

According to the testimony during the trial, this record was not recovered by the second tool.

Visit Count Discrepancy

At various times during the trial, the prosecution referred to a visit to a page (“http://www.sci-spot.com/Chemistry/chloroform.htm”) which allegedly took place at 15:16:13 hours (local time) on 21st March 2008.  This record was recovered by the second forensic tool and indicated a visit count of 84.  This visit was as a result of a Google search for “how to make chloroform”.

This evidence contradicts the data recovered by NetAnalysis® which showed a single visit at 19:16:34 hours UTC (15:16:34 hours local time).  Figure 9 shows a visit to MySpace, which has been verified manually above, and shows 84 visits as of 21st March 2008 at 15:16:13 hours (local time).  This is the record highlighted in NetAnalysis® in Figure 2.

The Mork record containing “http://www.sci-spot.com/Chemistry/chloroform.htm” is identified as record 174EF.  The Index record from the original file is highlighted and shown in Figure 10 below.

Mork Record 174EF

Figure 10

The entire record is contained within square brackets.  The highlighted line above shows the full record.  The first field 82 (“URL”) is stored in cell 27F4B, as shown in Figure 11.

Mork Field 27F4B

Figure 11

The second field 84 (“LastVisitDate”) is stored in cell 27F4C, as shown in Figure 12 (2008-03-21 19:16:34 UTC / 2008-03-21 15:16:34 Local Time).  Once again, this integer represents the number of micro-seconds from the 1st  January 1970, 00:00:00 UTC.

Mork Field 27F4C

Figure 12

The third field 85 (“FirstVisitDate”) is stored in cell 27F4C.  This is the same cell value as for (“LastVisitDate”) and indicates this is the first visit to this web site during the scope of the current recorded history.  The First and Last visit times are the same.

The fourth field 83 (“Referrer”) is stored in cell 27F49, as shown in Figure 13.

Mork Field 27F49

Figure 13

The referrer field is very interesting from a forensic point of view as it shows the referring page.  As the HTTP GET is sent to the web server for a page, the browser also sends the referring page as part of the request.  This allows web masters to log the route by which visitors land on their pages.  Mozilla Firefox records this information for each record.  It is therefore relatively easy to track the actions of a user from page to page.  In this case, the referring site was a Google search for “how to make chloroform”.  With this information (which NetAnalysis® shows in the “Referral URL” Column) there really is no need to “guess” how a user arrived at a specific page.

The fifth field 88 (“Hostname”) is stored in cell 27F4D, as shown in Figure 14.

Mork Field 27F4D

Figure 14

The last field 87 (“Name”) is stored in cell 27F4E, as shown in Figure 15.  The decoded value for this string is “New Page 1”.

Mork Field 27F4E

Figure 15

Once again, I have gathered together the data for this record and presented it in a table format for easy review.  This can be seen in Figure 16.

Decoded Record 174EF

Figure 16

There are two critical points to make with this record.  Firstly, there is no field 86 (“VisitCount”) therefore this URL has only been visited once (not 84 times).  This is further corroborated by the fact that field 85 (“FirstVisitDate”) shows the exact same date/time as the “LastVisitDate”.  The second point is that the visit was recorded at 15:16:34 hours (local time) and NOT at 15:16:13 hours as was stated during the trial (from the report produced by the second forensic tool).

Validity of the Recovered File

With the release of NetAnalysis® v1.50, the Mork database parser was completely re-written from the ground up (as were the other parsing modules).  This was primarily to make the code easier to migrate and maintain and to ensure we were recovering as much data as possible.  I tested the current release of NetAnalysis® v1.52 against the Casey Anthony data.  I know from manually examining the data that there are 9,075 individual Index records.  Loading the data into NetAnalysis® resulted in 9,060 records being recovered.  This initially caused me some concern.  However, further examination of the data revealed that there was nothing to be concerned about.  There were 15 records which had missing “URL” cells; 14 of these records also had missing “LastVisitDate” cells.

If there are missing data cells within the file, this is a strong indicator that the file is not intact.
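As a rough cross-check, the number of candidate index rows in a recovered Mork file can be estimated mechanically. The following Python sketch is only a sanity check: the pattern is a simplification of the row syntax described above, and the file name is a placeholder:

    import re

    # Count row openings of the form "[-<hex id>(" as a rough estimate of the
    # number of index records present in the recovered file.
    with open("History.dat", "rb") as f:
        text = f.read().decode("latin-1")

    rows = re.findall(r"\[-[0-9A-Fa-f]+\(", text)
    print("candidate index rows:", len(rows))

If such an estimate differs markedly from the record count reported by a forensic tool, the discrepancy should be investigated rather than ignored.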

Conclusion

There are a number of conclusions to be drawn from the digital evidence presented in this trial; however, I will leave this to the members of the digital forensic community.  Forensic tool validation is certainly at the forefront of our thoughts.  Whilst it may not be possible to verify a tool, it is possible to verify the results against known data sets.  If two forensic tools produce completely different results, this should at least warrant further investigation.


Introduction

Safari is a web browser developed by Apple and is included as part of the Apple Macintosh OS X operating system.  It has been the default browser on all Apple computers since Mac OS X version 10.3 Panther, and its first public release was in 2003.  Safari is currently at major version 5, released in June 2010.

In June 2007, Apple released a version of Safari for Microsoft Windows operating systems.  The version of Safari at this time was version 3.  Windows versions have been updated in parallel with Mac OS X versions ever since and are also, at the time of writing, at version 5.

Forensic Analysis of Safari

NetAnalysis® v1 currently supports the analysis of all versions of Safari.  Safari runs on Microsoft Windows and Apple Macintosh OS X operating systems.  The data created by Safari is file based and the structure of the data it creates is similar between operating systems.

Safari Browser v3 – 5

Safari, like all web browsers, aggressively prompts the user to update to the latest version to incorporate new security patches.  This means that you are likely to find the most recent version on computers currently in use, which at the time of writing is Version 5.

Internet History and Cache data is stored within each user’s profile; the exact location will vary depending on the operating system in use.

Safari stores Internet history records within an Apple property list file entitled history.plist (as shown in Figure 1).  Property list files have the file extension .plist and therefore are often referred to as plist files.  Plist files may be in either an XML format or a binary format.  For earlier versions of Safari (both Windows and Macintosh variants) the history.plist file was in the XML format.  Later and current versions utilise the binary plist format.  NetAnalysis parses both the XML and binary formatted history plist files.

Apple History Folder

Figure 1
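The history records themselves can be inspected with standard tooling. The following Python sketch is a hedged example: plistlib reads both the XML and binary formats, but the key names (“WebHistoryDates”, “lastVisitedDate”, “title”, “visitCount”) and the 2001-01-01 epoch are assumptions based on commonly documented history.plist layouts and should be verified against your own data:

    import plistlib
    from datetime import datetime, timedelta, timezone

    MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

    with open("history.plist", "rb") as f:        # work on an exported copy
        plist = plistlib.load(f)

    for entry in plist.get("WebHistoryDates", []):
        url = entry.get("", "")                   # the URL sits under an empty key
        visited = MAC_EPOCH + timedelta(seconds=float(entry["lastVisitedDate"]))
        print(visited, entry.get("visitCount", 1), url, entry.get("title", ""))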

Safari versions 3 to 5 store the cache in SQLite 3 database files entitled cache.db (as shown in Figure 2).  Earlier versions of Safari stored cache in files that had the file extension .cache.  These files are not currently supported.

Apple Cache Folder

Figure 2
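Because cache.db is an ordinary SQLite 3 database, its structure can be examined without assuming a particular schema. A minimal Python sketch (always run against an exported copy, never the original evidence):

    import sqlite3

    # List the tables in an exported cache.db and report how many rows each holds.
    con = sqlite3.connect("cache.db")
    for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"):
        count = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        print(f"{name}: {count} rows")
    con.close()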

Stage 1 – Recovery of Live Safari Data

To process and examine Safari live Internet history and cache with NetAnalysis, the following methodology should be used.  In the first instance, it is important to obtain the live data still resident within the file system (web pages can only be rebuilt from live cache data).

This can be done in one of the following three ways:

  • Export all of the data (preferably in the original folder structure) utilising a mainstream forensic tool
  • Mount the image using a forensic image tool
  • Access the original disk via a write protection device

Once the data has been extracted to an export folder, open NetAnalysis® and select File » Open All History From Folder.  Select the folder containing your exported Safari data.

BrowseForFolder

Figure 3

 

Stage 2 – Recovery of Deleted Safari Data

HstEx® is a Windows-based, advanced professional forensic data recovery solution designed to recover deleted browser artefacts and Internet history from a number of different source evidence types.  HstEx® supports all of the major forensic image formats.

HstEx® currently supports the recovery of Safari XML and binary plist data.  It cannot at the moment recover cache records (research and development is currently being conducted).  Figure 4 shows HstEx® processing Apple Safari data.

HstEx Processing Apple

Figure 4

Please see the following link for information on using HstEx® to recover browser data:

Please ensure you select the correct Data Type prior to processing.  Safari v5 stores history data in binary plist files.  When HstEx® has finished processing, it will open a window similar to the one shown in Figure 5.  These files can now be imported into NetAnalysis® v1 by either selecting File » Open History and selecting all of the files, or selecting File » Open All History From Folder and selecting the root recovery folder.

 

HstEx Output Folder for Apple Safari Extraction

Figure 5

Default Folder Locations

Apple Safari data can be found in the following default folder locations (Figure 6):

Apple Safari default folder locations

Figure 6

Further Reading