Introduction to Character Encoding

Understanding how Character Encoding works is an essential part of understanding digital evidence. It is part of the common core of skills and knowledge.

A character set is a collection of letters and symbols used in a writing system. For example, the ASCII character set covers letters and symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and the Unicode character set contains characters for most of the living languages and scripts in the world.

Characters in a character set are stored as one or more bytes. Each byte or sequence of bytes represents a given character. A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.

There are many different character encodings. If the wrong encoding is applied to a sequence of bytes, the result will be unintelligible text.


The American Standard Code for Information Interchange, or ASCII code, was created in 1963 by the American Standards Association Committee. This code was developed from the reorder and expansion of a set of symbols and characters already used in telegraphy at that time by the Bell Company.

At first, it only included capital letters and numbers, however, in 1967 lowercase letters and some control characters were added forming what is known as US-ASCII. This encoding used the characters 0 through to 127.

7-bit ASCII is sufficient for encoding characters, number and punctuation used in English, but is insufficient for other languages.

Extended ASCII

Extended ASCII uses the full 8-bit character encoding and adds a further 128 characters for non-English characters and symbols.


Hex viewer showing extended ASCII character encoding


Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, Europe alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages. Unicode 10.0 adds 8,518 characters, for a total of 136,690 characters.

Unicode can be implemented by different character encodings; the Unicode standard defines UTF-8, UTF-16, and UTF-32 (Unicode Transformation Format).


The number assigned to a character is called a codepoint. An encoding defines how many codepoints there are, and which abstract letters they represent e.g. “Latin Capital Letter A”. Furthermore, an encoding defines how the codepoint can be represented as one or more bytes.

The following image shows the encoding of an uppercase letter A using standard ASCII.


Image showing character encoding and the transition from Character A to binary and codepoints


UTF-8, UTF-16 and UTF-32

UTF-8 is the most widely used encoding and is variable in length. It is capable of encoding all valid Unicode code points and can use between 1 and 4 bytes for each code point. The first 128 code points require 1 byte and match ASCII.

UTF-16 is also a variable-length and is capable of encoding all valid Unicode code points. Characters are encoded with one or two 16-bit code units. UTF-16 was developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set).

UTF-32 is a fixed length encoding that requires 4 bytes for every Unicode code point.

Browser Data Analysis

It is important to understand character encoding when examining Internet and browser data. Browser applications use a variety of different encoding methods for storing data. For example, some browsers use UTF-16 for storing page titles and the default Windows encoding for storing URL data (e.g. Windows 1252). Windows 1252 is a 1-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages.

Selecting a Code Page in NetAnalysis®

An appropriate Code Page can be selected when creating a New Case in NetAnalysis®.

Digital Detective NetAnalysis® new case screen and option to set character encoding

Clicking the button next to the code page shows the following window. This allows the user to select the appropriate code page (if required).


Digital Detective NetAnalysis® code page screen to select character encoding



When using third party image mounting tools to perform the forensic examination of NTFS file systems, it is extremely important to understand NTFS Junction Points so that you don’t find yourself making a critical mistake during your analysis. An issue has been identified with third party image mounting software where NTFS junction points are hard linked to folders on the forensic investigator’s own hard disk. If you use software to process a file system (such as NetAnalysis® or Anti-Virus software) and the file system is mounted with junction points, the Operating System on the forensic workstation may point the software to folders which are not contained within the suspect volume. This leads to the extremely serious situation, where the investigator may inadvertently process their own file system.

This is possible with the following Operating Systems and filesystems:

Operating / File System
Microsoft Windows Vista with NTFS volumes (and server Operating Systems)
Microsoft Windows 7 with NTFS volumes (and server Operating Systems)
Microsoft Windows 8 with NTFS volumes (and server Operating Systems)

Symbolic Links

Windows 2000 and higher supports directory symbolic links, where a directory serves as a symbolic link to another directory on the computer. By using junction points, you can graft a target folder onto another NTFS folder or “mount” a volume onto an NTFS junction point. Junction points are transparent to software applications.

An NTFS symbolic link (symlink) is a file system object in the NTFS file system that points to another file system object. The object being pointed to is called the target. Symbolic links should be transparent to users; the links appear as normal files or directories, and can be acted upon by the user or application in exactly the same manner. Symbolic links are designed to aid in migration and application compatibility with POSIX operating systems, and were introduced with the modifications made to the NTFS file system with Windows Vista. Unlike an NTFS junction point (available since Windows 2000), a symbolic link can also point to a file or remote SMB network path. Additionally, the NTFS symbolic link implementation provides full support for cross file system links. However, the functionality enabling cross-host symbolic links requires that the remote system also support them, which effectively limits their support to Windows Vista and later Windows operating systems.

Unlike an NTFS junction point, a symbolic link can also point to a file or remote SMB network path. While NTFS junction points support only absolute paths on local drives, the NTFS symbolic links allow linking using relative paths. Additionally, the NTFS symbolic link implementation provides full support for cross file system links. However, the functionality enabling cross-host symbolic links requires that the remote system also support them, which effectively limits their support to Windows Vista and later Windows operating systems.

Junction Points

In Windows Vista, Windows Server 2008 and Windows 8, the default locations for user data and system data have changed. For example, user data that was previously stored in the %SystemDrive%\Documents and Settings directory is now stored in the %SystemDrive%\Users directory. For backward compatibility, the old locations have junction points that point to the new locations. For example, C:\Documents and Settings is now a junction point that points to C:\Users. Backup applications must be capable of backing up and restoring junction points. These junction points can be identified as follows:

  • They also have their access control lists (ACLs) set to deny read access to everyone.

Applications that call out a specific path can traverse these junction points if they have the required permissions. However, attempts to enumerate the contents of the junction points will result in failures. It is important that backup applications do not traverse these junction points, or attempt to backup data under them, for two reasons:

  • Doing so can cause the backup application to back up the same data more than once.
  • It can also lead to cycles (circular references).

Some mounting tools do not respect these permissions and therefore allow software applications to follow the links. As the links are hard coded into the file system, they can point to actual folder locations on the forensic workstation.

Per-User Junctions and System Junctions

The junction points that are used to provide file and registry virtualisation in Windows Vista, Windows Server 2008 and Windows 8 can be divided into two classes: per-user junctions and system junctions.

Per-user junctions are created inside each individual user’s profile to provide backward compatibility for user applications. The junction point at C:\Users\[username]\My Documents that points to C:\Users\[username]\Documents is an example of a per-user junction. Per-user junctions are created by the Profile service when the user’s profile is created.

The other junctions are system junctions that do not reside under the Users\[username] directory. Examples of system junctions include:

  • Documents and Settings
  • Junctions within the All Users, Public, and Default User profiles

Examining Junction Points

The following image shows a volume containing junction points. You can also see the corresponding hard link.

Even though, this volume is mounted as F, accessing the folder F:\Users\All Users opens the link and presents the files from C:\ProgramData as if they were actually contained within F:\Users\All Users.





A frequent question when dealing with browser forensics is ‘Does the Hit Count value mean that the user visited site ‘x’, on ‘y’ occasions?’ Most browsers record a ‘Hit Count’ value in one or more of the files they use to track browser activity, and it is important that an analyst understands any potential pitfalls associated with the accuracy, or otherwise, of this value.

We recently received a support request from an analyst who was analysing Internet Explorer data. They had found a record relating to a Bing Images search, which showed a hit count of 911. The particular search string was significant, and very damning had it actually been used 911 times. The analyst wanted to know if the hit count value could be relied upon.

The following experiment was carried out in order to establish how this surprisingly high hit count value could have been generated. In order to obtain a data set which contained as little extraneous data as possible, a brand new VMWare virtual machine was created. The machine was setup from the Microsoft Windows XP SP3 installation disc, which installed Internet Explorer v 6.0.2900.5512.xpsp.080413-2111 by default. Two user accounts were created on the machine – one to be used as an Admin account, for installing software etc; and the other to be used as the ‘browsing’ account. This separation of the accounts further assisted with minimising the possibility of any unwanted data being present within the ‘browsing’ account. Using the Admin account, the version of Internet Explorer in use on the virtual machine was upgraded to IE v 8.0.6001.18702. The ‘browsing’ account was then used for the first time. Starting Internet Explorer immediately directed the user to the MSN homepage. The address ‘’ was typed into the address bar, which led to the Bing search engine homepage. The ‘Images’ tab was clicked. This Auto Suggested a search criterion of ‘Beautiful Britain’, as can be seen in the figure below:



Figure 1

The term ‘aston martin’ was then typed into the search box, as shown below:


Figure 2

None of the images were clicked or zoomed, nor was the result screen scrolled. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are from Master History INDEX.DAT files:


Figure 3


As can be seen, both entries show a hit count of 5. Both of these pages were visited only once, so it is immediately apparent that the hit count value maintained by Internet Explorer may not be an accurate count of how many times a particular page has been visited. However, this still did not explain how Internet Explorer had produced a hit count of 911.

The virtual machine was started again, and the browsing account logged on. The previous steps were repeated; typing ‘’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. Once again, Bing Auto Suggested the search criterion of ‘Beautiful Britain’, and displayed the same thumbnail results page. The search criterion ‘aston martin’ was again typed into the search box and the same thumbnail results page was produced. None of the images were clicked or zoomed. The results page was scrolled using the side scroll bar, which generated more thumbnails as it went. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The below image shows some of the results. Both of these entries are again from Master History INDEX.DAT files:



Figure 4

As can be seen, the ‘Beautiful Britain’ search now has a hit count of 13 – it is not at all clear how Internet Explorer determined this figure. Moreover, the ‘aston martin’ search now shows a hit count of 511. This page was not visited 511 times, nor were 511 of the thumbnail images clicked. The contents of the INDEX.DAT for the local cache folders (Content.IE5) were checked to see how many records were held relating to thumbnails that had been cached. The results were as follows:


Figure 5

So it does not even appear that there are 511 thumbnails held in the local cache. The result page was scrolled quickly, so the user did not see a large proportion of the thumbnail images.

In conclusion, it is apparent that the ‘Hit Count’ maintained by Internet Explorer cannot be relied upon. Although this experiment involved a quite specific process relating solely to image searches carried out on one particular search engine, the disparity between results and reality makes it clear that unquestioning acceptance of what Internet Explorer is recording as a ‘Hit Count’ could lead to significant errors if presented in evidence.

To complete the experiment, two further identical Virtual Machines were created. On one, the Google Chrome browser (v 15.0.874.106 m) was installed and used. On the other, the Mozilla Firefox browser (v 8.0) was installed and used. The same steps were repeated: typing ‘’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. The results from these processes are shown below:



Figure 6




Figure 7

It is apparent that both of these browsers seem to maintain a more accurate ‘Hit Count’.

Internet Explorer Data

As forensic examiners will be aware, Microsoft Internet Explorer stores cached data within randomly assigned folders. This behaviour was designed to prevent Internet data being stored in predictable locations on the local system in order to foil a number of attack types. Prior to the release of Internet Explorer v9.0.2, cookies were an exception to this behaviour and their location was insufficiently random in many cases.

Cookie Files

Generally, for Vista and Windows 7, cookie files are stored in the location shown below:


The cookie filename format was the user’s login name, the @ symbol and then a partial hostname for the domain of the cookie.

Cookie Files with Standard Name

With sufficient information about a user’s environment, an attacker might have been able to establish the location of any given cookie and use this information in an attack.

Random Cookie Filenames

To mitigate the threat, Internet Explorer 9.0.2 now names the cookie files using a randomly-generated alphanumeric string. Older cookies are not renamed during the upgrade, but are instead renamed as soon as any update to the cookie data occurs. The image below shows an updated cookie folder containing the new files.

Random Cookie Names

This change will have no impact on dealing with the examination of cookie data; however. it will no longer be possible to identify which domain a cookie belongs to from just the file name.

NetAnalysis showing Random Cookie Names

NetAnalysis showing Random Cookie Names


Safari is a web browser developed by Apple and is included as part of the Apple Macintosh OS X operating system.  It has been the default browser on all Apple computers since Mac OS X version 10.3 Panther and its first public release was in 2003.  Safari is currently at major version 5 released in June 2010.

In June 2007 Apple released a version of Safari for Microsoft Windows operating systems.  The version of Safari at this time was version 3.  Windows versions have been updated in parallel with Mac OS X versions ever since and are also at the time of writing at version 5.

Forensic Analysis of Safari

NetAnalysis® v1 currently supports the analysis of all versions of Safari.  Safari runs on Microsoft Windows and Apple Macintosh OS X operating systems.  The data created by Safari is file based and the structure of the data it creates is similar between operating systems.

Safari Browser v3 – 5

Safari, like all web browsers, aggressively prompts the user to update to the latest version to incorporate new security patches.  This means that you are likely to find the most recent version on computers currently in use, which at the time of writing is Version 5.

Internet History and Cache data is stored within each users profile, the exact location will vary depending on the operating system in use.

Safari stores Internet history records within an Apple property list file entitled history.plist (as shown in Figure 1).  Property list files have the file extension .plist and therefore are often referred to as plist files.  Plist files may be in either an XML format or a binary format.  For earlier versions of Safari (both Windows and Macintosh variants) the history.plist file was in the XML format.  Later and current versions utilise the binary plist format.  NetAnalysis parses both the XML and binary formatted history plist files.

Apple History Folder

Figure 1

Safari versions 3 to 5 store the cache in SQLite 3 database files entitled cache.db (as shown in Figure 2).  Earlier versions of Safari stored cache in files that had the file extension .cache.  These files are not currently supported.

Apple Cache Folder

Figure 2

Stage 1 – Recovery of Live Safari Data

To process and examine Safari live Internet history and cache with NetAnalysis, the following methodology should be used.  In the first instance, it is important to obtain the live data still resident within the file system (web pages can only be rebuilt from live cache data).

This can be done in either of the following three ways:

  • Export all of the data (preferably in the original folder structure) utilising a mainstream forensic tool
  • Mount the image using a forensic image tool
  • Access the original disk via a write protection device

Once the data has been extracted to an export folder, open NetAnalysis® and select File » Open All History From Folder.  Select the folder containing your exported Safari data.


Figure 3


Stage 2 – Recovery of Deleted Safari Data

HstEx® is a Windows-based, advanced professional forensic data recovery solution designed to recover deleted browser artefacts and Internet history from a number of different source evidence types.  HstEx® supports all of the major forensic image formats.

HstEx® currently supports the recovery of Safari XML and Binary plist data.  It cannot at the moment recover cache records (research and development is currently being conducted).  Figure 4 shows HstEx® processing

HstEx Processing Apple

Figure 4

Please see the following link for information on using HstEx® to recover browser data:

Please ensure you select the correct Data Type prior to processing.  Safari v5 stores history data in binary plist files.  When HstEx has finished processing, it will open a window similar to the one shown in Figure 5.  These files can now be imported into NetAnalysis® v1 by either selecting File» Open History and selecting all of the files, or select File » Open All History From Folder and selecting the root recovery folder.


HstEx Output Folder for Apple Safari Extraction

Figure 5

Default Folder Locations

Apple Safari data can be found in the following default folder locations (Figure 6):


Figure 6

Further Reading

Introduction to Userdata

Internet Explorer 8+ user data persistence is a function which allows online forms to save a small file to the system with information about values entered in a particular form.  This allows the user to retrieve a half filled web based form when they revisit.

Persistence creates new opportunities for website authors.  Information that persists beyond a single page without support from the server, or within the finite scope of cookies, can increase the speed of navigation and content authoring.

The folder structure where the data is actually stored is very much like the standard Internet Explorer cache folder structure. Inside the cache folder you will find the files containing data attributed to the associated website.

{user}\AppData\Roaming\Microsoft\Internet Explorer\UserData\

To demonstrate how this works, we have created a page which allows you to save a string to the local drive.

Once you have saved some string data using the above page, open the UserData INDEX.DAT file in NetAnalysis and review the entries.  Selecting F8 will bring up the search/filter dialogue.  Change the field name to ‘Type’ and enter ‘userdata’ in the filter text box.  When this is executed, you should find an entry as shown below:

If you navigate to the corresponding folder, you will find an XML file which contains the string you entered into the website.  This is shown below.


Some of you will have noticed that from NetAnalysis® v1.50 there have been numerous new date and time columns added. These new timestamps were identified during months of research and development and are now included with the latest release. The image below shows some of the new fields from Internet Explorer. This article will look at each of the new columns and explain what they mean.


Last Visited [UTC]

This column should be self explanatory. It is the timestamp which reflects the last known recorded visit to a webpage (or object) in Coordinated Universal Time (UTC). Normally, this timestamp is extracted directly from the source record and not changed in any way by the time zone information set in NetAnalysis. With the exception of Internet Explorer Weekly INDEX.DAT records, all other records have their timestamps saved as UTC values. Weekly records are stored as local times and therefore have to be converted to UTC to fill this column.

Last Visited [Local]

This column contains the timestamp which reflects the last known recorded visit to a webpage (or object) in Local time. This timestamp is calculated by using the data from the Last Visited [UTC] column and converting it to Local time using the time zone information set in NetAnalysis prior to extraction (with the exception of Daily INDEX.DAT records which is already stored in Local time).

Date Expiration [UTC]

This column contains a timestamp (in UTC) which reflects the date and time when the object or record is no longer regarded as valid by the browser. For example, in History records, you will see that the expiration time is set according to the amount of days the browser is set to keep history records, whilst the cache expiration time can be set by the web developer and is delivered to the browser during the HTTP response. This column reflects the ExpireTime field in the INTERNET_CACHE_ENTRY_INFO Structure.

Date Last Modified [UTC]

This column contains a timestamp (in UTC) which reflects the date and time the webpage (or object) was last modified (last written). This information is passed back to the browser as part of the HTTP response. Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time.

Date Index Created [UTC]

This column contains a timestamp (in UTC) which reflects the date and time the Weekly INDEX.DAT file from Internet Explorer was created.

Date Last Synch [UTC]

This column contains a timestamp (in UTC) which reflects the last date and time at which an object was checked for freshness with the origin server. LastSyncTime is initially set as the time at which an object is added to the cache, and is updated every time the browser verifies freshness of the object with the server.

Date First Visited [UTC]

This column contains a timestamp (in UTC) which is available during the extracting of Netscape and Firefox v1-2 History. It reflects the first date and time at which a web page (or object) was visited.

Date Added [UTC]

This column contains a timestamp (in UTC) which is available during the extracting of Netscape, Firefox and Mozilla bookmark files. It reflects the date and time at which an entry was added to the bookmark file.

Further Information

For a breakdown of NetAnalysis® v2 Date/Time fields, please see a breakdown of the grid columns: NetAnalysis® v2 Grid Columns.


The Internet Explorer disk cache is a storage folder for temporary Internet files that are written to the hard disk when a user views page from the Internet.Internet Explorer uses a persistent cache and therefore has to download all of the content of a page (such as graphics, sound files or video) before it can be rendered and displayed to the user.Even when the cache is set to zero percent, Internet Explorer requires a persistent cache for the current session. The persistent cache requires 4 MB or 1 percent of the logical drive size, whichever is greater.

Disk Cache Storage Location

The disk cache location varies across operating systems and can usually be found in the following default locations:

Digital Detective NetAnalysis Cache Location Windows 98

Figure 1

Digital Detective NetAnalysis Cache Location Windows 2K and XP

Figure 2

Digital Detective NetAnalysis Cache Location Windows Vista and 7

Figure 3

To identify the correct location of the cache for each user, the registry hive for the particular user must be examined. You cannot rely upon the live cache folder being in the default location. Figure 4 shows the registry key containing the cache folder location. Users have been known to move their cache location in an attempt to hide browsing activity.

Digital Detective NetAnalysis Cache Examination Shell Folder Location

Figure 4

The “User Shell Folders” subkey stores the paths to Windows Explorer folders for each user of the computer. The entries in this subkey can appear in both the “Shell Folders” subkey and the “User Shell Folders” and in both HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER. The entries that appear in user “User Shell Folders” take precedence over those in “Shell Folders”. The entries that appear in HKEY_CURRENT_USER take precedence over those in HKEY_LOCAL_MACHINE.

Temporary Internet Files Folder Structure

Within the Temporary Internet Files folder, you will find a Content.IE5 folder.  This folder is the main disk cache for Internet Explorer.  Outlook Express also writes data to this location as is explained later in this article.Inside the Content.IE5 folder, you will find a minimum of 4 cache folders.  The file names for these cache folders comprise of 8 random characters.  When further cache space is required, Internet Explorer will add additional folders in multiples of 4.  Figure 5 shows the layout of a typical Internet Explorer disk cache.

Digital Detective Windows Forensic Analysis Microsoft Internet Explorer Cache Folders

Figure 5

Content.IE5 INDEX.DAT File

The cache INDEX.DAT file is a database of cache entries.  It holds information relating to individual cached items so that the browser can check whether the resource needs to be updated (eTag) and information relating to the location of the cached item.  It also stores the HTTP (Hypertext Transfer Protocol) response header for the resource.At the start of the INDEX.DAT file, you will find a zero based array holding the names of the cache folders.  Each individual URL record contains an index value which refers to a specific folder in the array.  This allows Internet Explorer to correctly identify the location of a cached item.  This can be seen in Figure 6 below.

Digital Detective NetAnalysis INDEX.DAT Forensic Analysis

Figure 6

As files are saved to the cache, Internet Explorer uses a specific naming convention as shown in Figure 7.  The only exception to this rule is when data is written to this location by Outlook Express.

Digital Detective Forensic Analysis of Internet Explorer Cache Naming Convention

Figure 7

As pages and resources are cached, they can easily end up in different folders. Internet Explorer attempts to keep the volume of data and number of cached items across each cache folder as level as possible. As Internet Explorer writes files to the cache folders, it checks to see if a file with the same name already exists.  This is frequently the case when web developers do not use imaginative or descriptive names for their files.  If the file already exists within the folder, Internet Explorer will increment the counter.  If no file exists, the counter portion of the file name is set to 1. Files with the same naming structure within a cache folder do not necessarily belong to the same web site.  Also, multiple visits to the same web site can easily result in files with similar naming conventions being spread across all the cache folders.  Cached resources can also become orphaned when the INDEX.DAT entry is not written to disk (such as when Internet Explorer crashes before the entries are written).  Figure 8 shows a typical cache folder.  There are two files within this folder which would have had the original name “images.jpg”.  Internet Explorer has renamed them “images[1].jpg” and “images[2].jpg”.

Digital Detective Internet Explorer Forensic Cache Analysis Folder Structure

Figure 8

Outlook Express Email Client

Microsoft Outlook Express also uses the Content.IE5 folder as a temporary cache.  When a user selects an email message within the client, Outlook Express reads the data from the corresponding DBX file and caches the components of the email so that it can be rendered on screen for the user as an email message. The structure of the DBX files is such that the email message is broken down into blocks and has to be rebuilt before it can be rendered.  If the message contains any attachments, they are stored in Base64 format and stored within the file.  The attachments also have to be extracted, decoded and cached prior to rendering on screen.As the message is rebuilt, Outlook Express saves the different elements of the message to the disk cache as a temporary file.  The naming convention is different to Internet Explorer.  In the past, some forensic examiners have not been aware of this and have incorrectly attributed data in the cache to a visit to a web page when it in fact was there as the result of viewing an email message. The file structure is shown in Figure 9.  The asterisk values represent hexadecimal digits.  Figure 9 also contains some example file names.

Digital Detective Forensic Analysis of Internet Explorer Cache Outlook Express Temporary File Naming Convention

Figure 9


Microsoft Internet Explorer maintains a number of INDEX.DAT files to record visits to web sites as well as to maintain cache and cookie data.  In this article, we will look at the Daily and Weekly files.

Daily INDEX.DAT Entries

The Daily INDEX.DAT file maintains a Daily record of visited pages.  This INDEX.DAT file has an unusual HOST record entry which helps the investigator analyse the pattern of visits to a particular web site.

The HOST record entry is used by Internet Explorer to display the hierarchical history structure when showing the user which web sites have been visited.  Each record contains a number of timestamps with the important data being stored in a FILETIME structure.  This timestamp structure contains a 64-bit value representing the number of 100-nanosecond intervals since 1st January 1601 (UTC).  The Digital Detective DCode utility can be used to convert these and other timestamp formats.

On the first daily visit to a particular web site, Internet Explorer creates a HOST entry in the INDEX.DAT record.  In effect, this entry represents the first visit to a particular HOST on specific day.  With further visits to the same web site, the HOST entry remains unchanged.  Examining the entries for the Daily INDEX.DAT will show when a web site was first and last visited during the period.  Figure 1 below shows an example of this when using the HOST filter view in NetAnalysis® v1 to look for visits to the Digital Detective web site.


Figure 1

Daily INDEX.DAT Timestamps

The Last Visited timestamp information is stored as two 64-bit FILETIMES located at offset 0x08 and 0x10 (Decimal 8,16).  They are stored as UTC and Local time values.  As there is no requirement to alter these timestamps, they are presented in an unaltered state in NetAnalysis® v1 as the “Last Visited [UTC]” and “Last Visited [Local]” columns.  Figure 2 and 3 summarise these timestamp values.


Figure 2


Figure 3

Establishing the Time Zone ActiveBias

As the URL records contain UTC and Local timestamps, it is possible to establish the Time Zone ActiveBias by establishing the time difference between both timestamps.  We discussed in a previous article on manually establish the system Time Zone settings.  The calculated ActiveBias information is represented in NetAnalysis® v1 by the ActiveBias column as shown in Figure 4.


Figure 4

NetAnalysis further uses this information to confirm the selected Time Zone is correct.  If the Time Zone ActiveBias is in conflict with the Time Zone setting in NetAnalysis®, the resulting timestamps may not be represented accurately.  The calculated ActiveBias is logged to the Audit Log as shown in Figure 5.


Figure 5

If NetAnalysis® detects that the Time Zone settings for the current forensic investigation are not correct, a warning dialogue will be shown immediately after the data has been imported.  Figure 4 shows the warning dialogue.


Figure 4

Examination of the ActiveBias column will show which entries are in conflict with the Time Zone Settings.

Weekly INDEX.DAT Entries

At the commencement of a new browsing week, the content from the Daily INDEX.DAT files is archived into a single Weekly INDEX.DAT file.  The actual timestamp information within the binary file changes for this file type when compared to the other files.

When the Weekly INDEX.DAT file is generated, the file created timestamp is saved at offset 0x10 of every URL record.  This is different to the other INDEX.DAT records as this location usually represents the Last Visited UTC Timestamp.  Many applications (including some software which claim to be for forensic purposes) get this wrong and misrepresent this timestamp as the “Last Visited Date”.

This timestamp is in FILETIME format and is saved as a UTC value.  This timestamp is presented within NetAnalysis in the “Date Index Created [UTC]” column.

The last visited timestamp is saved at offset 0x08 within the record as a LOCAL timestamp.  This is unusual, as FILETIME timestamps are normally saved as UTC values and the other INDEX.DAT files all contain a Last Visited timestamp with a UTC value.  With this timestamp, NetAnalysis takes the unaltered LOCAL time and saves it to the “Last Visited [Local]” column.  Unfortunately, the Last Visited UTC FILETIME value which was present in the Daily INDEX.DAT is not saved within the record and therefore has to be converted from a Local timestamp.

To calculate the UTC timestamp for the “Last Visited [UTC]” column, NetAnalysis takes the LOCAL timestamp at record offset 0x08 and converts it to UTC.  This conversion is calculated using the Time Zone value set in NetAnalysis prior to importing any data.  In doing so, dynamic daylight settings are also taken into account (as well as any year on year differences).

If a Weekly record is imported with the “No Time Zone Date/Time Adjustment” setting activated, NetAnalysis will show the LOCAL Last Visited timestamp but will not attempt to calculate the UTC timestamp.  In this case, the “Last Visited [UTC]” column will remain empty.  The “Last Visited [Local]” timestamp for Weekly entries is not changed or affected by NetAnalysis Time Zone settings.  It is left in an unaltered state.

Weekly INDEX.DAT Timestamps

The timestamp representation in NetAnalysis is shown in Figure 5 and 6 below.


Figure 5


Figure 6

Useful Links

Introduction to Time Zone Identification

In a digital forensic examination, establishing which Time Zone the system had been set to should one of the first examination tasks.  If this information is not established at an early stage and taken into account, the validity of Date/Time evidence may be brought into question.  Not only is this true for the examination of Browser History and related artefacts, it is also important when examining file system metadata.

I also believe this is something every examiner should be able to do manually, as opposed to relying on point and click or script forensics.  Whilst point and click certainly has a place and software tools can greatly increase the efficiency of the examination process, digital forensic practitioners need to possess the skills and ability to verify the results.

Some Date/Time values stored in binary files are affected by the Time Zone setting of the original suspect computer and many digital forensic applications can alter the representation of these dates by the Time Zone setting of the forensic workstation.

This becomes particularly complicated when the suspect computer was set to an incorrect Time Zone and the computer clock was set to correspond to the Local Time Zone.  Many of the Date/Time stamps store the data as UTC values.  In such circumstances, the Operating System (or application) has to convert the value from Local time to UTC.

Case Example

This was demonstrated in a case I was asked to review a number of years ago.  A computer had been seized as part of an investigation into abusive images of children.  The police had examined the computer correctly and the individual involved had been charged with offences under the Protection of Children Act 1978.

A defence expert examined the forensic image from the computer had declared in his report that the police had tampered with the evidence and alleged that they were responsible for the illegal material as the Date/Time stamps show the material was created on the disk some four hours after it had been seized by police.

My initial examination revealed that the defence expert had not established the Time Zone settings for the system nor had he taken them into account during his examination and subsequent report.  If he had, he would have seen that the system was incorrectly set to Pacific Time and not GMT.  As far as the Operating System was concerned, the system was in Pacific Time and added 8 hours to the Local times to convert them to UTC.  This resulted in the Date/Time stamps being 8 hours in advance of the correct time.

When the defence expert stated the computer had illegal material written to the disk after the system was seized, it was in fact that this had happened some 4 hours prior to the warrant being executed at the home of the suspect.

Establishing the Current Time Zone

To establish the Time Zone setting for a Microsoft Windows system, the forensic examiner can examine the SYSTEM registry hive.  To do this, you need to establish which ControlSet was active when the computer was seized.


Figure 1

There you will find 4 keys detailing the Current, Default, Failed and LastKnownGood control sets.  The current control set in the screen below is set to 3.  You can also see the there are three ControlSets numbered 001 to 003.


Figure 2

Now that this current control set has been identified, we can navigate to that location in the registry and calculate the different values as stored.  In this case, the Time Zone settings are stored here:


Figure 3

The Time Zone Information for this Control Set is shown in Figure 4.


Figure 4

The keys are explained below.  Please note that the bias settings are stored in minutes as a signed integer.  The bias is the difference, in minutes, between UTC and local time.  All translations between UTC and local time are based on the following formula:


Figure 5


This value is the current time difference from UTC in minutes, regardless of whether daylight saving is in effect or not. It is this value that helps establish the current Time Zone settings. Using the formula above, take this value and add it to local time to get the UTC value.


This value is the normal Time difference from UTC in minutes. This value is the number of minutes that would need to be added to a local time to return it to a UTC value. This value will identify the Master Time Zone (Standard Time).


This value is added to the value of the Bias member to form the bias used during standard time. In most time zones the value of this member is zero.


This value specifies a bias value to be used during local time translations that occur during daylight time. This value is added to the value of the Bias member to form the bias used during daylight time. In most time zones the value of this member is –60.


The Operating System uses this name during daylight saving months to display the current time Zone setting.


Binary data in SYSTEMTIME structure used to identify the date/time that Daylight Saving will commence in this time zone.


The Operating System uses this name during daylight saving months to display the current time zone setting.


Binary data in SYSTEMTIME format used to identify the date/time that Standard Time will commence in this time zone.


This will only be visible if the setting to automatically adjust clock for daylight saving has been switched OFF.

Calculating Signed Integer Bias Values

Within digital systems, all data, whether they be numbers or characters are represented by strings of binary digits. A problem arises when you want to store negative numbers.

Over the years, hardware designers have developed three different schemes for representing negative numbers: sign and magnitude, ones complement, and twos complement. The most common method for storing negative numbers is twos complement. With this method, the Most Significant Bit (MSB) is used to store the sign.

If the MSB is set, then this represents a NEGATIVE number. This method affords natural arithmetic with no special rules. To represent a negative number in twos complement notation the process is simple:

• Decide upon the number of bits (n)
• Find the binary representation of the +ve value in n-bits
• Flip all the bits (change 1 to 0 and vice versa)
• Add 1

Figure 5 below shows the binary representation of the positive number 5.


Figure 5

To represent this as a negative number (using 8 bits) then the procedure above is followed.  Flip the bits as shown above and add one as shown in Figure 6.


Figure 6

This method makes it extremely easy to add positive and negative numbers together.  For example:


Figure 7

It also makes it extremely easy to convert between positive and negative numbers:


Figure 8


If we look once again at the ActiveTimeBias in Figure 9, you will see a signed hexadecimal value.  This can be calculated using twos complement.


Figure 9

This value is stored in hexadecimal as a 32 bit value, so to work out the value it will need to be converted to binary.  Ignore the fact that on this occasion, the registry editor is showing the decimal value (4294967236) next to it; this is purely because the registry editor does not realise the value has been stored as a signed integer.

The twos complement calculation is as follows:


Convert this to binary:


The MSB is set so we know that the above value will be negative.  The next stage is to flip all the bits.  This involves changing 1 to 0 and vice versa.  This can be achieved quickly using the logical NOT function on a scientific calculator.  You must ensure that it is set to deal with the correct number of bits.


Add 1 bit to the value above


And then convert that value back to decimal, remembering that we are dealing with a negative number:



Daylight Saving / Standard Time Start Dates

Looking at Figure 10 below, you can see two keys entitled DaylightStart and StandardStart.  They hold encoded data showing the exact commencement date/time of Daylight Saving and Standard Time.   To establish when daylight saving starts and ends, both keys will need to be decoded.


Figure 10


This data is stored in a common structure called SYTEMTIME. This structure specifies a date and time, using individual members for the month, day, year, weekday, hour, minute, second, and millisecond.


Figure 11

The data in DalylightStart is as follows:


Figure 12

Bytes 0 & 1 (0x0000 ) refer to the year from a 1900 time base.  This is only required if the change is year specific and will normally be zero.

Bytes 2 & 3 (0x0003 ) refer to the month, in this case March.

Bytes 4 & 5 (0x0005) refer to the week (starts at 1 and 5 means last).  In this case the last week.

Bytes 6 & 7 (0x0001) refer to the Hour.  In this case it is 0100 Hours.

Bytes 8 & 9 (0x0000) refer to the Minutes.  In this case it is Zero minutes.

Bytes 10 & 11 (0x0000) refer to the Seconds; in this case it is Zero seconds.

Bytes 12 & 13 (0x0000) refer to the Milliseconds, in this case it is Zero milliseconds.

Bytes 14 & 15 (0x0000) refer to the actual Day of the Week (Sunday = 0).  In this case it is Sunday

For our example in Figure 12, Daylight Saving Time (DST) will start on Sunday of the Last Week in March at 0100 Hours.  If we had decoded StandardStart, we would see that DST would end on Sunday of the last week of October at 0200 hours.

Further Reading