Introduction to Character Encoding

Understanding how character encoding works is an essential part of understanding digital evidence, and belongs to the common core of forensic skills and knowledge.

A character set is a collection of letters and symbols used in a writing system. For example, the ASCII character set covers letters and symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and the Unicode character set contains characters for most of the living languages and scripts in the world.

Characters in a character set are stored as one or more bytes. Each byte or sequence of bytes represents a given character. A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.

There are many different character encodings. If the wrong encoding is applied to a sequence of bytes, the result will be unintelligible text.
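A quick sketch in Python shows the effect: the same bytes decoded with the wrong encoding turn into unintelligible "mojibake" (the byte values shown are simply what UTF-8 produces for "café").

```python
# The same four bytes decoded with two different encodings
# produce very different text.
data = "café".encode("utf-8")      # b'caf\xc3\xa9'

print(data.decode("utf-8"))        # café   (correct encoding)
print(data.decode("iso-8859-1"))   # cafÃ©  (wrong encoding: mojibake)
```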


The American Standard Code for Information Interchange, or ASCII, was created in 1963 by the American Standards Association Committee. The code was developed from the reordering and expansion of a set of symbols and characters already used in telegraphy at that time by the Bell Telephone Company.

At first, it only included capital letters and numbers; however, in 1967 lowercase letters and some control characters were added, forming what is known as US-ASCII. This encoding used the values 0 through 127.

7-bit ASCII is sufficient for encoding the characters, numbers and punctuation used in English, but is insufficient for other languages.

Extended ASCII

Extended ASCII uses the full 8-bit character encoding and adds a further 128 characters for non-English characters and symbols.


Hex viewer showing extended ASCII character encoding


Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, Europe alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages. Unicode 10.0 adds 8,518 characters, for a total of 136,690 characters.

Unicode can be implemented by different character encodings; the Unicode standard defines UTF-8, UTF-16, and UTF-32 (Unicode Transformation Format).


The number assigned to a character is called a codepoint. A character set defines which codepoints exist and which abstract characters they represent, e.g. “Latin Capital Letter A”. A character encoding then defines how each codepoint is represented as one or more bytes.
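As a minimal illustration in Python, `ord()` and `chr()` move between a character and its codepoint, and for ASCII the codepoint coincides with the stored byte value:

```python
# “Latin Capital Letter A” is codepoint 65 (U+0041).
assert ord("A") == 65
assert chr(65) == "A"

# In ASCII the codepoint and the stored byte value coincide.
assert "A".encode("ascii") == b"\x41"
```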

The following image shows the encoding of an uppercase letter A using standard ASCII.


Image showing character encoding and the transition from Character A to binary and codepoints


UTF-8, UTF-16 and UTF-32

UTF-8 is the most widely used encoding and is variable in length. It is capable of encoding all valid Unicode code points and can use between 1 and 4 bytes for each code point. The first 128 code points require 1 byte and match ASCII.

UTF-16 is also a variable-length encoding and is capable of encoding all valid Unicode code points. Characters are encoded with one or two 16-bit code units. UTF-16 was developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set).

UTF-32 is a fixed length encoding that requires 4 bytes for every Unicode code point.
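The difference in storage requirements can be demonstrated with a short Python sketch; the characters below are arbitrary examples drawn from the 1-, 2-, 3- and 4-byte UTF-8 ranges.

```python
# Bytes needed per character under each Unicode encoding.
for ch in ("A", "é", "€", "😀"):
    print(ch,
          len(ch.encode("utf-8")),      # 1 to 4 bytes
          len(ch.encode("utf-16-le")),  # 2 or 4 bytes
          len(ch.encode("utf-32-le")))  # always 4 bytes
```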

Browser Data Analysis

It is important to understand character encoding when examining Internet and browser data. Browser applications use a variety of different encoding methods for storing data. For example, some browsers use UTF-16 for storing page titles and the default Windows encoding for storing URL data (e.g. Windows-1252). Windows-1252 is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and some other Western languages.
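A sketch of what this means in practice: the same extract may require two different code pages, one for titles and one for URL data (the byte strings below are illustrative, not taken from a real history file).

```python
# A page title stored as UTF-16LE and a URL stored as Windows-1252.
title_bytes = "Café Menu".encode("utf-16-le")
url_bytes = b"http://example.com/caf\xe9"   # 0xE9 is 'é' in Windows-1252

print(title_bytes.decode("utf-16-le"))  # Café Menu
print(url_bytes.decode("cp1252"))       # http://example.com/café
```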

Selecting a Code Page in NetAnalysis®

An appropriate Code Page can be selected when creating a New Case in NetAnalysis®.

Digital Detective NetAnalysis® new case screen and option to set character encoding

Clicking the button next to the code page shows the following window. This allows the user to select the appropriate code page (if required).


Digital Detective NetAnalysis® code page screen to select character encoding



When using third party image mounting tools to perform the forensic examination of NTFS file systems, it is extremely important to understand NTFS junction points so that you don’t make a critical mistake during your analysis. An issue has been identified with third party image mounting software where NTFS junction points are hard linked to folders on the forensic investigator’s own hard disk. If you use software to process a file system (such as NetAnalysis® or anti-virus software) and the file system is mounted with junction points, the Operating System on the forensic workstation may point the software to folders which are not contained within the suspect volume. This leads to the extremely serious situation where the investigator may inadvertently process their own file system.

This is possible with the following Operating Systems and filesystems:

  • Microsoft Windows Vista with NTFS volumes (and server Operating Systems)
  • Microsoft Windows 7 with NTFS volumes (and server Operating Systems)
  • Microsoft Windows 8 with NTFS volumes (and server Operating Systems)

Symbolic Links

Windows 2000 and higher support directory junctions, where a directory serves as a link to another directory on the computer. By using junction points, you can graft a target folder onto another NTFS folder or “mount” a volume onto an NTFS junction point. Junction points are transparent to software applications.

An NTFS symbolic link (symlink) is a file system object in the NTFS file system that points to another file system object. The object being pointed to is called the target. Symbolic links should be transparent to users; the links appear as normal files or directories, and can be acted upon by the user or application in exactly the same manner. Symbolic links are designed to aid in migration and application compatibility with POSIX operating systems, and were introduced with the modifications made to the NTFS file system with Windows Vista. Unlike an NTFS junction point (available since Windows 2000), a symbolic link can also point to a file or remote SMB network path. Additionally, the NTFS symbolic link implementation provides full support for cross file system links. However, the functionality enabling cross-host symbolic links requires that the remote system also support them, which effectively limits their support to Windows Vista and later Windows operating systems.

While NTFS junction points support only absolute paths on local drives, NTFS symbolic links allow linking using relative paths.

Junction Points

In Windows Vista, Windows Server 2008 and Windows 8, the default locations for user data and system data have changed. For example, user data that was previously stored in the %SystemDrive%\Documents and Settings directory is now stored in the %SystemDrive%\Users directory. For backward compatibility, the old locations have junction points that point to the new locations. For example, C:\Documents and Settings is now a junction point that points to C:\Users. Backup applications must be capable of backing up and restoring junction points. These junction points can be identified as follows:

  • They have the FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_HIDDEN, and FILE_ATTRIBUTE_SYSTEM file attributes set.
  • They also have their access control lists (ACLs) set to deny read access to everyone.

Applications that call out a specific path can traverse these junction points if they have the required permissions. However, attempts to enumerate the contents of the junction points will result in failures. It is important that backup applications do not traverse these junction points, or attempt to backup data under them, for two reasons:

  • Doing so can cause the backup application to back up the same data more than once.
  • It can also lead to cycles (circular references).

Some mounting tools do not respect these permissions and therefore allow software applications to follow the links. As the links are hard coded into the file system, they can point to actual folder locations on the forensic workstation.
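The traversal risk can be illustrated with a short Python sketch. It uses ordinary POSIX symbolic links rather than NTFS junction points (so it runs anywhere), but the principle is the same: software that blindly follows links on a mounted volume can end up reading the examiner’s own disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workstation, \
     tempfile.TemporaryDirectory() as mounted_image:
    # A file that lives on the examiner's own machine...
    evidence = os.path.join(workstation, "examiner_file.txt")
    with open(evidence, "w") as f:
        f.write("data on the forensic workstation")

    # ...and a link inside the "mounted image" pointing at it.
    link = os.path.join(mounted_image, "Documents and Settings")
    os.symlink(workstation, link)

    # Naively walking the image follows the link off the volume.
    leaked = open(os.path.join(link, "examiner_file.txt")).read()
    print(leaked)  # contents came from the workstation, not the image

    # Defence: test for links before descending into a directory.
    assert os.path.islink(link)
```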

Per-User Junctions and System Junctions

The junction points that are used to provide file and registry virtualisation in Windows Vista, Windows Server 2008 and Windows 8 can be divided into two classes: per-user junctions and system junctions.

Per-user junctions are created inside each individual user’s profile to provide backward compatibility for user applications. The junction point at C:\Users\[username]\My Documents that points to C:\Users\[username]\Documents is an example of a per-user junction. Per-user junctions are created by the Profile service when the user’s profile is created.

The other junctions are system junctions that do not reside under the Users\[username] directory. Examples of system junctions include:

  • Documents and Settings
  • Junctions within the All Users, Public, and Default User profiles

Examining Junction Points

The following image shows a volume containing junction points. You can also see the corresponding hard link.

Even though this volume is mounted as F:, accessing the folder F:\Users\All Users opens the link and presents the files from C:\ProgramData as if they were actually contained within F:\Users\All Users.




Good Practice for e-Crime Investigations

Criminal behaviour has shifted to take advantage of electronic media, and serious and organised criminal networks have become increasingly sophisticated. Corporations, Government departments and businesses now need to invest considerable sums in order to protect their assets and data. Lloyds of London have stated that they defend up to sixty attacks a day on their corporate infrastructure. Policing needs to equip itself with the necessary skills and knowledge to meet this new challenge.

The Internet, computer networks and automated data systems present wide ranging opportunities for criminals, including organised criminal networks, to partake in an extensive variety of offences; this presents a significant challenge to law enforcement globally.

One of the principal difficulties facing the law enforcement community is how best to tackle the complex and dynamic developments associated with the Internet, digital information and evolutions in communications technology. This creates difficulties in the consistency of approach and enforcement nationally; there is a clear need to harmonise practices and procedures throughout the UK. At the same time, it should be possible to develop and share the experience and skills that exist within British policing.

The ACPO Good Practice Guide for Managers of Hi-Tech Crime Units used to be a restricted document; this is no longer the case. The current guide can be downloaded from the link below:



Digital Evidence Good Practice

The ACPO good practice guide for dealing with computer based evidence was first released in the late 1990s. Since then, there have been five iterations, with changes that include updates to the document title. The guide is essential reading for anyone involved in the field of digital forensics. The latest version, the “ACPO Good Practice Guide for Digital Evidence”, has been updated to cover more than just evidence from computers.

According to DAC Janet Williams QPM, ACPO lead for the e-Crime Portfolio:

This guide has changed from version 4, where it centred on computer based evidence; the new revision reflects digital based evidence and attempts to encompass the diversity of the digital world. As such this guide would not only assist law enforcement but the wider family that assists in investigating cyber security incidents. I commend all to read and make use of the knowledge and learning contained in this guide to provide us with the right tools to carry out our role.



It seems that whenever a review of ACPO guidance is carried out we are in the middle of technological changes that have vast impact on the work that is done within digital forensic units. It is a testament to the authors of the original four guiding principles for digital forensics that they still hold today, and one of the key early decisions of the review board was to keep those four principles, with only a slight change of wording to principle four.

We work in an area of constant change. There is a continuing need to re-evaluate and revise our capacities to perform our duties. There is a need to recover and analyse digital data that can now be found within the many devices that are within day to day use, and can supply vital evidence in all our investigations.

Hence a second key early decision was to change the title of the document to the ACPO Good Practice Guide for Digital Evidence. This would hopefully encompass all aspects of digital evidence and remove the difficulty of trying to draw a line between what is and isn’t a computer, and thus what falls within the remit of this guide.

It is important that people who work within the arena of digital forensics do not concentrate solely on the technology, essential as that is, but also ensure that the processes they use are fit for purpose, and that the skills and capacities within units reflect the demands made on them.

A prime example of this is the use of the word ‘triage’, which has been the subject of much discussion within the forensic community. It does not refer to a single triage tool; rather, it is a complete process in which certain tools play a part but are not the whole solution.

This guide is not intended to be an A-Z of digital forensics, or a specific “how to” instruction manual. Rather, it paints an overall picture and provides an underlying structure for what is required within Digital Forensic Units (DFUs). The guide has therefore been produced as a high-level document without the specific guidance included in previous versions, as this guidance is now available elsewhere. Where relevant, links to other guidance documents will be given.

In this document Digital Forensic Unit is used to cover any type of group that is actively involved in the processing of digital evidence.


The latest version can be downloaded here:


One of the growth areas in digital forensics is the use of USB dongles for the licensing of software. Every single practitioner now finds themselves in dongle hell trying to manage a veritable menagerie of tiny USB devices just to enable them to carry out their day-to-day work.

Of course, where dongles for core forensic software are concerned, most people will possess their own Digital Detective, EnCase or FTK dongles and these will be jealously guarded, with practitioners unwilling to let their prized (and in some cases, very expensive) hardware leave their sight. But what about some of the lesser used, but no less valuable, licensing dongles out there? At the moment, most labs will resound to the cries of “who’s got the X dongle? I need it to do Y”. Several minutes of frantic searching and head scratching then ensues, until someone remembers that they borrowed it to use in the imaging lab for five minutes, two weeks ago. One solution to this problem is a network-attached dongle server.


Avoiding dongle hell with a network attached SEH MyUTN-80 Dongle Server

Network Attached Dongle Servers

Dongle servers from SEH make USB dongles available over a network. You use your copy-protected software as usual but you don’t need to connect the license dongles directly to your client.

As with locally connected dongles, only one user can use the respective dongle over the point-to-point network connection.

The SEH UTN Manager software tool for Windows, OS X and Linux gives you access to your dongles as if they were connected directly to your computer. The SEH UTN Manager is installed on all notebooks, PCs, servers, and terminals that require dongle access.


SEH UTN Dongle Manager

This means that all of your licensing dongles can be stored in one location, and accessible to all of your staff via your forensic network. The port area of the dongle server is lockable, meaning that no-one is able to remove dongles without the key; and if you use the rack-mounting kit, the dongle server can even go in your server rack for further security.


Avoid dongle hell with a rack mounted SEH MyUTN-800 Network Attached Dongle Server

If working practices allow, the dongle server can be accessed over the Internet, meaning that on-site working doesn’t have to involve carrying around thousands of pounds worth of dongles. A remote worker can also have temporary access to a dongle when required. The server works with all the common forensic dongles such as Feitian, Aladdin HASP, SafeNet and Wibu CodeMeter. This means that even your core forensic function dongles can be kept securely locked away, safe from loss or damage.

Main Benefits

  • Easily share any licensing dongle via the local area network
  • Lock away expensive dongles to prevent theft
  • Prevent damage through constant insertion and removal
  • Easily share, and provide dongle access to remote workers
  • Easily share licensing dongles in the lab without having to constantly plug/unplug and throw them around

This would be an ideal purchase for small offices that cannot afford to buy licences for everyone, particularly for expensive software which may not be used every day.

Where to Buy

So, if you too would like to avoid dongle hell, please contact us for a quote.


A frequent question when dealing with browser forensics is ‘Does the Hit Count value mean that the user visited site ‘x’, on ‘y’ occasions?’ Most browsers record a ‘Hit Count’ value in one or more of the files they use to track browser activity, and it is important that an analyst understands any potential pitfalls associated with the accuracy, or otherwise, of this value.

We recently received a support request from an analyst who was analysing Internet Explorer data. They had found a record relating to a Bing Images search, which showed a hit count of 911. The particular search string was significant, and very damning had it actually been used 911 times. The analyst wanted to know if the hit count value could be relied upon.

The following experiment was carried out in order to establish how this surprisingly high hit count value could have been generated. To obtain a data set containing as little extraneous data as possible, a brand new VMware virtual machine was created. The machine was set up from the Microsoft Windows XP SP3 installation disc, which installed Internet Explorer v6.0.2900.5512.xpsp.080413-2111 by default. Two user accounts were created on the machine: one to be used as an Admin account for installing software etc., and the other to be used as the ‘browsing’ account. This separation of accounts further assisted with minimising the possibility of any unwanted data being present within the ‘browsing’ account. Using the Admin account, the version of Internet Explorer in use on the virtual machine was upgraded to IE v8.0.6001.18702.

The ‘browsing’ account was then used for the first time. Starting Internet Explorer immediately directed the user to the MSN homepage. The address ‘’ was typed into the address bar, which led to the Bing search engine homepage. The ‘Images’ tab was clicked. This Auto Suggested a search criterion of ‘Beautiful Britain’, as can be seen in the figure below:



Figure 1

The term ‘aston martin’ was then typed into the search box, as shown below:


Figure 2

None of the images were clicked or zoomed, nor was the result screen scrolled. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The image below shows some of the results; both of these entries are from Master History INDEX.DAT files:


Figure 3


As can be seen, both entries show a hit count of 5. Both of these pages were visited only once, so it is immediately apparent that the hit count value maintained by Internet Explorer may not be an accurate count of how many times a particular page has been visited. However, this still did not explain how Internet Explorer had produced a hit count of 911.

The virtual machine was started again, and the browsing account logged on. The previous steps were repeated: typing ‘’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. Once again, Bing Auto Suggested the search criterion of ‘Beautiful Britain’, and displayed the same thumbnail results page. The search criterion ‘aston martin’ was again typed into the search box and the same thumbnail results page was produced. None of the images were clicked or zoomed. The results page was scrolled using the side scroll bar, which generated more thumbnails as it went. Internet Explorer was closed, and the browsing account logged off. The Admin account was used to extract the browser data for processing in NetAnalysis. The image below shows some of the results; both of these entries are again from Master History INDEX.DAT files:



Figure 4

As can be seen, the ‘Beautiful Britain’ search now has a hit count of 13 – it is not at all clear how Internet Explorer determined this figure. Moreover, the ‘aston martin’ search now shows a hit count of 511. This page was not visited 511 times, nor were 511 of the thumbnail images clicked. The contents of the INDEX.DAT for the local cache folders (Content.IE5) were checked to see how many records were held relating to thumbnails that had been cached. The results were as follows:


Figure 5

So it does not even appear that there are 511 thumbnails held in the local cache. The result page was scrolled quickly, so the user did not see a large proportion of the thumbnail images.

In conclusion, it is apparent that the ‘Hit Count’ maintained by Internet Explorer cannot be relied upon. Although this experiment involved a quite specific process relating solely to image searches carried out on one particular search engine, the disparity between results and reality makes it clear that unquestioning acceptance of what Internet Explorer is recording as a ‘Hit Count’ could lead to significant errors if presented in evidence.

To complete the experiment, two further identical Virtual Machines were created. On one, the Google Chrome browser (v 15.0.874.106 m) was installed and used. On the other, the Mozilla Firefox browser (v 8.0) was installed and used. The same steps were repeated: typing ‘’ into the URL bar; visiting the Bing homepage; and clicking on the ‘Images’ tab. The results from these processes are shown below:



Figure 6




Figure 7

It is apparent that both of these browsers seem to maintain a more accurate ‘Hit Count’.

Internet Explorer Data

As forensic examiners will be aware, Microsoft Internet Explorer stores cached data within randomly assigned folders. This behaviour was designed to prevent Internet data being stored in predictable locations on the local system in order to foil a number of attack types. Prior to the release of Internet Explorer v9.0.2, cookies were an exception to this behaviour and their location was insufficiently random in many cases.

Cookie Files

Generally, for Vista and Windows 7, cookie files are stored in the location shown below:


The cookie filename format was the user’s login name, the @ symbol and then a partial hostname for the domain of the cookie.

Cookie Files with Standard Name

With sufficient information about a user’s environment, an attacker might have been able to establish the location of any given cookie and use this information in an attack.

Random Cookie Filenames

To mitigate the threat, Internet Explorer 9.0.2 now names the cookie files using a randomly-generated alphanumeric string. Older cookies are not renamed during the upgrade, but are instead renamed as soon as any update to the cookie data occurs. The image below shows an updated cookie folder containing the new files.

Random Cookie Names

This change will have no impact on the examination of cookie data; however, it will no longer be possible to identify which domain a cookie belongs to from the file name alone.

NetAnalysis showing Random Cookie Names


Introduction to NetAnalysis® v1.52

Digital Detective is pleased to announce the release of NetAnalysis® v1.52 (and HstEx v3.6).  The release of this version has been eagerly awaited, so we are glad to say the wait is finally over.

NetAnalysis® v1.52 adds a number of new features and fixes some minor bugs.  Some of the major new features released in this version are the ability to export and rebuild the entire cache for each browser in a single process, new support for all versions of Google Chrome, support for Apple Safari Cache.db files and the ability to import recovered Apple Safari binary plist files recovered by HstEx® v3.6.

Export and Rebuild Cached Data

On 24th October 2002, we introduced the ability to rebuild and export cached pages in NetAnalysis v1.25. Over the years, this functionality has been extremely helpful in providing the necessary evidence in a whole host of forensic investigations.

In v1.52, we have responded to your requests and added the ability to extract and rebuild all of the cached content in one single process. This new functionality can be accessed from the Tools menu as shown below:

In addition to rebuilding and writing out the cached pages, NetAnalysis® exports all of the individual cached files and groups them by file type. This allows the contents of the cache to be quickly reviewed for evidence. The output also includes a full audit for each rebuilt web page and contains relative paths to allow the export folder to be archived to external media. You can also click on the hyperlinks within the audit log to access the cached content.

USB Licence Dongle Support

As some of you may be aware, our Blade® data recovery product is licensed via a USB licence dongle. We are now offering the option to license NetAnalysis/HstEx with a USB licence dongle too. The USB licence dongle provides you with much greater flexibility than a licence key file, which is restricted to a single workstation, as the dongle can be easily moved from machine to machine. Existing licence key file holders can purchase a USB licence dongle upgrade through our store. Please see the following link for further information on USB Dongle Licences.

TSV / CSV Exporting

Our support for exporting to TSV (tab-separated values) and CSV (comma-separated values) files has been completely rewritten and enhanced. We now include the field column headers in the output and have added a progress bar for the export process. The export engines are also considerably faster than in previous versions.

It is also possible to switch off or hide any columns you do not need or to change the column order prior to exporting. This ensures the output format is in the same order as the grid columns. The export engine will also only output the current filtered records. The HTML export function will be updated in a future release.
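As a generic sketch (not NetAnalysis’ actual export engine), the pattern described above, headers first, current column order, filtered records only, looks like this in Python; the column names and filter are hypothetical:

```python
import csv
import io

columns = ["Last Visited", "URL", "Hit Count"]   # hypothetical fields
records = [
    {"Last Visited": "2011-11-01", "URL": "http://example.com", "Hit Count": 5},
    {"Last Visited": "2011-11-02", "URL": "http://example.org", "Hit Count": 13},
]
keep = lambda r: r["Hit Count"] > 10             # the active filter

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns)
writer.writeheader()                              # field column headers
writer.writerows(r for r in records if keep(r))   # filtered records only
print(out.getvalue())
```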

Restricting Import Date Range

This is a new feature which was requested by some of our colleagues working within the corporate environment. In some investigations, they may only be permitted to import data within a certain date range. By selecting to restrict the import range, any data outside the target date range is not added to the workspace.

This functionality can be found by selecting Options from the Tools menu and selecting Restrict Data Range (as shown below).


F2 Find Next Tagged Record

This new function was added as a result of a request from a forensic examiner. During an investigation, it may be necessary to tag certain records of interest and then review the activity on either side of each record. This can be achieved by tagging the required records and then pressing F5 to remove all filters. Selecting F2 (or Searching » Find Next Tagged Record) will move the record pointer to the next tagged record allowing you to examine that record and the data surrounding it.

HstEx® v3.6 – Recovering Apple Safari History Binary Plist

NetAnalysis® now has the ability to import Apple Safari History binary plists recovered by HstEx® v3.6.

The Apple Safari browser stores Internet history records in an Apple Property List (plist). With the earlier versions of the Safari browser (version varies depending on operating system), this file was in XML format. In later versions, Apple switched to using a proprietary binary plist format. NetAnalysis supports both XML and binary plist files and now supports the recovery of this data direct from a forensic image or write protected physical/logical device.
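For readers who want to inspect such files themselves, Python’s standard plistlib module reads both the XML and the binary plist formats (the keys used below are illustrative only, not Safari’s actual history schema):

```python
import plistlib

record = {"URL": "http://example.com", "visitCount": 3}
binary = plistlib.dumps(record, fmt=plistlib.FMT_BINARY)

assert binary[:8] == b"bplist00"          # binary plist magic number
assert plistlib.loads(binary) == record   # round-trips without loss
```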

The data is recovered by HstEx® and output in the form of *.hstx files. These files can then be loaded directly into NetAnalysis® v1.52. As of the publication of this article, NetAnalysis® and HstEx® are the only forensic tools capable of recovering this data.

Further Information


Not so long ago, one email client which increased in popularity (particularly amongst paedophiles) in the United Kingdom was that provided with America Online (AOL).

Email extraction and analysis causes significant problems for digital forensic examiners. Almost all of the forensic software designed for extracting email is tailored for dealing with mail-store files which are intact. This means they have not been designed to extract email data from other areas of a suspect hard drive, such as unallocated clusters, cluster slack, page files, hibernation files and other binary source files. Nor have they been designed to extract data fragments when the mail-store index has been overwritten.

From an evidential point of view, it is likely that a large quantity of email evidence is not being extracted.  In addition, as there is limited documentation available regarding the proprietary binary file structures, there is wide variance in the output from many of the commercial forensic tools currently available.

Recovery of AOL (Personal Filing Cabinet) Email Messages

Digital Detective’s forensic data recovery software Blade® contains a Data Recovery module (with Intelli-Carve®) which has been designed to recover AOL email messages from a number of sources.

The AOL Professional Recovery Module can recover live and deleted email messages (including attachments), whether from a forensic image (such as an EnCase® E01 compressed image) or from a physical disk or volume. The output from the software allows the forensic investigator to identify the exact location from which the data was recovered.
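
The general principle of reporting the exact offset of each recovered item can be illustrated with a minimal signature scan over raw image data. This is only a sketch of the technique: the signature passed in would depend on the structure being carved (the AOL PFC format is proprietary and is not reproduced here):

```python
def find_signatures(image_bytes, signature):
    """Return the byte offset of every occurrence of `signature`
    in a raw image, so each hit can be reported with its exact
    physical location. A simplified illustration of carving; real
    carvers must also validate and reassemble the structures found.
    """
    offsets = []
    pos = image_bytes.find(signature)
    while pos != -1:
        offsets.append(pos)
        pos = image_bytes.find(signature, pos + 1)
    return offsets
```

For example, `find_signatures(data, b"SIG")` returns a list of offsets such as `[2, 7]`, each of which can then be reported back to the investigator alongside the recovered record.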

The carving engine for this module is the result of many years of research and development. It was originally released in the Digital Detective product EMLXtract. When this software was released to law enforcement in 2004, it was the first software product to recover AOL email messages from an image or a physical/logical device (as opposed to a single PFC file). When compared against other tools, it recovered more email messages than any other, and it works particularly well against corrupted data where many other tools fail to recover anything at all.

The research and development behind recovering AOL email messages from a forensic image took a considerable amount of time. AOL email messages contain many different elements, such as compressed and non-contiguous data blocks, and embedded attachments can be split and have to be stitched back together. When this module was originally designed, the goal was not simply to recover live and deleted email messages from a Personal Filing Cabinet, but to recover emails from a disk image.

Through research and development, the recovery engine has been enhanced further and is now part of Blade®.


The Internet Explorer disk cache is a storage folder for temporary Internet files that are written to the hard disk when a user views a page from the Internet. Internet Explorer uses a persistent cache and therefore has to download all of the content of a page (such as graphics, sound files or video) before it can be rendered and displayed to the user. Even when the cache is set to zero percent, Internet Explorer requires a persistent cache for the current session. The persistent cache requires 4 MB or 1 percent of the logical drive size, whichever is greater.
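
This minimum size rule can be expressed as a simple calculation (sizes in bytes):

```python
def min_persistent_cache_bytes(logical_drive_bytes):
    """Minimum persistent cache required by Internet Explorer:
    4 MB or 1 percent of the logical drive size, whichever is greater."""
    return max(4 * 1024 * 1024, logical_drive_bytes // 100)
```

For a 100 MB volume, 1 percent is only 1 MB, so the 4 MB floor applies; for a 10 GB volume, the 1 percent figure (roughly 102 MB) wins.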

Disk Cache Storage Location

The disk cache location varies across operating systems and can usually be found in the following default locations:

Figure 1: Internet Explorer cache location on Windows 98

Figure 2: Internet Explorer cache location on Windows 2000 and XP

Figure 3: Internet Explorer cache location on Windows Vista and 7

To identify the correct location of the cache for each user, the registry hive for the particular user must be examined. You cannot rely upon the live cache folder being in the default location. Figure 4 shows the registry key containing the cache folder location. Users have been known to move their cache location in an attempt to hide browsing activity.

Figure 4: Registry key containing the cache folder location

The “User Shell Folders” subkey stores the paths to Windows Explorer folders for each user of the computer. The entries in this subkey can appear in both the “Shell Folders” subkey and the “User Shell Folders” subkey, and in both HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER. The entries in “User Shell Folders” take precedence over those in “Shell Folders”, and the entries in HKEY_CURRENT_USER take precedence over those in HKEY_LOCAL_MACHINE.
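
These precedence rules can be sketched as follows. The hive contents are represented here as plain dictionaries rather than live registry reads (on a running Windows system the `winreg` module would be used instead), and the value name "Cache" is the entry that holds the Temporary Internet Files path:

```python
def resolve_cache_folder(hkcu_user_shell, hkcu_shell,
                         hklm_user_shell, hklm_shell,
                         value_name="Cache"):
    """Resolve the cache folder path, honouring registry precedence:
    "User Shell Folders" beats "Shell Folders", and HKEY_CURRENT_USER
    beats HKEY_LOCAL_MACHINE. Each argument is a dict of registry
    values from the corresponding subkey."""
    for subkey in (hkcu_user_shell, hkcu_shell,
                   hklm_user_shell, hklm_shell):
        if value_name in subkey:
            return subkey[value_name]
    return None  # fall back to the OS default location
```

In an examination, each dictionary would be populated from the suspect's NTUSER.DAT and the machine hives, so a relocated cache (as described above) is still found.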

Temporary Internet Files Folder Structure

Within the Temporary Internet Files folder, you will find a Content.IE5 folder. This folder is the main disk cache for Internet Explorer; Outlook Express also writes data to this location, as explained later in this article. Inside the Content.IE5 folder, you will find a minimum of four cache folders. The names of these cache folders consist of eight random characters. When further cache space is required, Internet Explorer adds additional folders in multiples of four. Figure 5 shows the layout of a typical Internet Explorer disk cache.
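
When working with an extracted file system, the cache folders can be enumerated programmatically. The sketch below assumes the folder names are eight alphanumeric characters, as described above:

```python
import os
import re

# Eight-character cache folder names, e.g. "4A5B6C7D".
CACHE_DIR_PATTERN = re.compile(r"^[A-Z0-9]{8}$", re.IGNORECASE)

def list_cache_folders(content_ie5_path):
    """Return the 8-character cache subfolders of a Content.IE5
    directory, ignoring other files and folders."""
    return sorted(
        name for name in os.listdir(content_ie5_path)
        if CACHE_DIR_PATTERN.match(name)
        and os.path.isdir(os.path.join(content_ie5_path, name))
    )
```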

Figure 5: Typical Internet Explorer cache folder layout

Content.IE5 INDEX.DAT File

The cache INDEX.DAT file is a database of cache entries. It holds information relating to individual cached items so that the browser can check whether a resource needs to be updated (via its ETag), together with information relating to the location of the cached item. It also stores the HTTP (Hypertext Transfer Protocol) response header for the resource. At the start of the INDEX.DAT file, you will find a zero-based array holding the names of the cache folders. Each individual URL record contains an index value which refers to a specific folder in the array, allowing Internet Explorer to identify the location of a cached item. This can be seen in Figure 6 below.
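
The relationship between a URL record's directory index and the folder array can be illustrated as follows (the folder names shown are invented examples, not values from a real INDEX.DAT file):

```python
# Folder-name array as it would be read from the INDEX.DAT header.
cache_folders = ["4A5B6C7D", "0QRSTUVW", "K1L2M3N4", "X9Y8Z7W6"]

def cached_file_path(record_dir_index, record_filename):
    """Resolve a URL record's cached item location: the record's
    zero-based directory index selects a folder from the header
    array, and the stored file name completes the path."""
    return cache_folders[record_dir_index] + "\\" + record_filename
```

So a record carrying directory index 2 and file name "logo[1].gif" resolves to the item stored in the third cache folder.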

Figure 6: Cache folder array at the start of the INDEX.DAT file

As files are saved to the cache, Internet Explorer uses a specific naming convention, as shown in Figure 7. The only exception to this rule is when data is written to this location by Outlook Express.

Figure 7: Internet Explorer cache file naming convention

As pages and resources are cached, they can easily end up in different folders: Internet Explorer attempts to keep the volume of data and the number of cached items as level as possible across the cache folders. As Internet Explorer writes a file to a cache folder, it checks whether a file with the same name already exists; this is frequently the case when web developers do not use imaginative or descriptive file names. If the file already exists within the folder, Internet Explorer increments the counter; if not, the counter portion of the file name is set to 1.

Files with the same naming structure within a cache folder do not necessarily belong to the same web site, and multiple visits to the same web site can easily result in files with similar names being spread across all the cache folders. Cached resources can also become orphaned when the INDEX.DAT entry is not written to disk (for example, when Internet Explorer crashes before the entries are written). Figure 8 shows a typical cache folder. There are two files within this folder which would both originally have been named “images.jpg”; Internet Explorer has renamed them “images[1].jpg” and “images[2].jpg”.
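
The renaming behaviour can be sketched as follows, assuming the name[counter].ext convention shown in Figure 7:

```python
def cache_file_name(original_name, existing_names):
    """Return the name a resource would be stored under in a cache
    folder: the counter starts at 1 and is incremented until the
    name no longer collides with an existing file."""
    stem, dot, ext = original_name.rpartition(".")
    if not dot:                      # no extension
        stem, ext = original_name, ""
    counter = 1
    while True:
        candidate = f"{stem}[{counter}]" + (f".{ext}" if ext else "")
        if candidate not in existing_names:
            return candidate
        counter += 1
```

With an empty folder, "images.jpg" is stored as "images[1].jpg"; if that name is already taken, the next copy becomes "images[2].jpg", matching the example in Figure 8.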

Figure 8: Typical cache folder contents

Outlook Express Email Client

Microsoft Outlook Express also uses the Content.IE5 folder as a temporary cache. When a user selects an email message within the client, Outlook Express reads the data from the corresponding DBX file and caches the components of the email so that it can be rendered on screen as an email message. The structure of the DBX files is such that the email message is broken down into blocks and has to be rebuilt before it can be rendered. If the message contains any attachments, they are stored within the file in Base64 format; they too have to be extracted, decoded and cached prior to rendering on screen. As the message is rebuilt, Outlook Express saves the different elements of the message to the disk cache as temporary files. The naming convention is different from that of Internet Explorer. In the past, some forensic examiners, unaware of this, have incorrectly attributed data in the cache to a visit to a web page when it was in fact the result of viewing an email message. The file structure is shown in Figure 9, where the asterisk values represent hexadecimal digits; Figure 9 also contains some example file names.

Figure 9: Outlook Express temporary file naming convention