Monday, December 07, 2009

"Data Carving" and Metadata

This post is about an argument that was made in U.S. v. Haymond, 2009 WL 3029592 (U.S. District Court for the Northern District of Oklahoma 2009).

In December of 2008, Andre Ralph Haymond was indicted on federal charges of possessing and attempting to possess child pornography.

The indictment charged Haymond with possessing “child pornography `including, but not limited to’ five specific files.” U.S. v. Haymond, supra.

The government apparently found an additional 135 images of child pornography after the indictment was returned:

There are 140 images at issue here. Of these, the Government claims to have identified 78 pornographic images on Defendant's computer hard drive and 62 images (including one video clip) that allegedly were made available through LimeWire from a location associated with Defendant.

U.S. v. Haymond, supra.

The opinion cited above deals with Haymond’s efforts to gain access to these images and other evidence in the case against him. Haymond filed a second Motion to Compel the government to give him access to images

allegedly contained on his computer that was seized by the Government pursuant to a search warrant . . . . The Government made a mirror image of Defendant's computer's hard drive. For convenience, this mirrored hard drive was made available to Defendant's expert, David Penrod (`Penrod’) at the Regional Computer Forensic Laboratory (`RCFL’) in Denver, . . . near Penrod's home. . . .

Penrod complained he was unable, using his Encase software, to find any pornographic images on the mirrored hard drive. This generated Defendant's first Motion to Compel. On Sept. 2, after the Court directed Penrod to return and work with the RCFL to try again to access the images, he was able to find 14,000 images on the hard drive using the Government's Forensic Tool Kit (`FTK’) software. Allegedly, 78 of these images are unlawful child pornography. Defendant now complains that he cannot tell which of the 14,000 images are the 78 images on the hard drive that the Government contends constitute child pornography. Defendant further complains that the images have been `stripped’ of all metadata that would enable him to prepare a forensic defense to the pending charge.

U.S. v. Haymond, supra.

Haymond filed his Motion to Compel under Rule 16 of the Federal Rules of Criminal Procedure. Rule 16(a)(1)(E) states that

[u]pon a defendant's request, the government must permit the defendant to inspect and to copy . . . books, papers, documents, data . . . or copies or portions of any of these items, if the item is within the government's possession, custody, or control and:

(i) the item is material to preparing the defense;

(ii) the government intends to use the item in its case-in-chief at trial; or

(iii) the item was obtained from or belongs to the defendant.

As this court noted, and I explained in an earlier post, Congress modified this part of Rule 16 in 2006, when it adopted the Adam Walsh Act, which added § 3509(m) to Title 18 of the U.S. Code. Under § 3509(m) a judge must

deny a defendant's requests to copy or otherwise reproduce child pornography as long as the material is made `reasonably available’ to the defendant. `Reasonably available’ requires the Government to provide an `ample opportunity’ for inspection at a Government facility.

U.S. v. Haymond, supra. Haymond’s argument was that even though the government made the mirrored image of his hard drive “available” to Penrod at the Denver RCFL, Penrod still couldn’t find “the 78 images on the mirrored hard drive that the Government contends constitute child pornography.” U.S. v. Haymond, supra. Haymond therefore made three arguments as to why he was entitled to more comprehensive discovery.

The first was the “missing metadata” argument. Haymond said his expert couldn’t “find any metadata associated with the files on the mirrored hard drive and implie[d] that the Government may have `stripped’ this metadata when it `data carved’ images from the hard drive.” U.S. v. Haymond, supra. The judge didn’t buy this argument:

At the hearing on Sept. 16, 2009, the Government stated that the hard drive provided to Defendant was a `complete, exact’ copy of the hard drive the Government accessed and if Defendant did not have metadata, it is because the metadata is simply not there. . . . Defendant's own expert, Penrod, represented to the Court at the . . . hearing that he did not expect and was not looking for metadata. . . . [T]he Court specifically asked Penrod if he was seeking metadata on the mirrored hard drive. He replied:

`Metadata is simply data such as the logical path to a file and the date and time of creation. . . . And you're not going to find that with any kind of-or with about 95 percent of data-carved items. All you're going to get is the image in this particular situation and the physical location of where that data is actually located on the hard drive.’

Later when Defense counsel queried Penrod about getting metadata . . . Penrod stated that he was `99.99-percent sure we won't find any metadata associated with these files.’

U.S. v. Haymond, supra. The judge therefore held that since Haymond hadn’t produced evidence showing that the government “`stripped’ metadata from the hard drive before creating the mirror-image”, there was “no basis for [Haymond’s] complaint in this regard”, i.e., no reason to grant what the Motion to Compel sought. U.S. v. Haymond, supra.

Haymond’s second argument was that he was entitled to additional discovery because his expert couldn’t find the 78 images “the Government contends are on the mirrored hard drive without poring over all 14,000 images found there.” U.S. v. Haymond, supra. The judge didn’t have to rule on this argument because the parties had worked it out:

At the Sept. 16 hearing, the Government agreed to send to the RCFL CDs containing all 140 images at issue in this case. These will be made available to Penrod at the RCFL and will remain in the custody and possession of the Government. Penrod will be able to access these images and compare them to data on the mirrored hard drive. Defense counsel stated that this will obviate the need for any cluster/sector information to locate the images as he had previously requested.

U.S. v. Haymond, supra.

That brings us to the third argument Haymond made in support of his Motion to Compel further discovery of digital evidence. He wanted the prosecution to create

redacted copies of the images at issue so he can use them with subpoenas to the internet web sites where the images originated. Defendant plans to subpoena information from the web site owner to the effect that the persons portrayed in the pornographic images are adults, not minors.

U.S. v. Haymond, supra. If the people in the images were adults, then the material was not child pornography and the charges against Haymond would fail. It might sound like a good argument, but it didn’t work, at least not completely:

[T]he Government stated that the core of its case at trial will be based on 10-12 of the 140 images. These include one video file and photographs where the Government has identified a juvenile victim. These images have been identified in the Child Victim Identification Program (`CVIP’) report available to Defendant. While the Government has not absolutely restricted itself to only using these 10-12 images at trial, it has confirmed that these will be the central core of the case. . . . Defense counsel has estimated that only about 15 percent of the images he has reviewed have any web site information embedded in them that would provide a basis for a subpoena. Accordingly, Defendant should focus on the 10-12 images identified in the CVIP report. If there are images with embedded web site information Defendant wants to subpoena, he shall identify those files well before trial and the Government will prepare redacted images for his use.

U.S. v. Haymond, supra. The Haymond judge noted that this procedure had been used in an earlier, similar case: U.S. v. Dobbs. U.S. v. Haymond, supra. He also noted that at the September 16 hearing in the Haymond case, Haymond’s attorney “conceded . . . that the subpoena issued in Dobbs to an internet site in Holland resulted in a `wild goose chase’” because the subpoena recipient ignored the subpoena. U.S. v. Haymond, supra.

Haymond is the only reported case I can find in which data carving was the basis for an altered/destroyed evidence claim. Not being a computer forensics expert, I can’t opine on the viability of such an argument, but my suspicion is that it’s not particularly sound.


Professor Don said...


I suspect you are right. Something is missing here.

Data carving is used to find deleted files. Undeleted files are located from an entry in what amounts to a table of contents. This is commonly referred to as metadata. (There are other sources of metadata but I think this is what the expert was talking about.)

When a file is deleted, only the table of contents entry is erased. The actual file exists until overwritten.

So, right off the bat, it's far fetched to assert that government experts deleted the evidence (erased the metadata).

Data carving works, in part, by ignoring the table of contents information and looking at each disk sector individually. This can be a major mess since (1) files are not necessarily contiguous and (2) only portions of files may be overwritten.

This is analogous to scrambling all the pages of a book, with each page having a "Continued on page N" link at the bottom. The table of contents points to the first page of a story (file) and the reader then follows the links after that.

Now erase the table of contents and the links. Data carving means looking at each page and trying to figure out which story (file) it belongs to.
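To make the idea concrete, here is a minimal sketch of signature-based carving in Python. It is not what FTK or EnCase actually do; it only illustrates the general technique of scanning raw bytes for a known file header and footer while ignoring the table of contents. The function name `carve_jpegs` and the toy "disk" are hypothetical, and the sketch assumes each file is stored contiguously, which, as noted above, is often not true on a real drive.

```python
# Minimal signature-based carving sketch (illustrative only).
# Assumption: files are contiguous; real carvers must cope with fragmentation.

JPEG_SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus first byte of the next marker
JPEG_EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(raw: bytes):
    """Return a list of (physical_offset, data) for each header-to-footer match."""
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_SOI, pos)
        if start == -1:
            break  # no more headers
        end = raw.find(JPEG_EOI, start + len(JPEG_SOI))
        if end == -1:
            break  # header without a footer: truncated or overwritten
        end += len(JPEG_EOI)
        found.append((start, raw[start:end]))
        pos = end
    return found


# Toy "disk": one sector of zeros, a fake JPEG, then more zeros.
fake_jpeg = JPEG_SOI + b"\xe0" + b"toy image payload" + JPEG_EOI
disk = b"\x00" * 512 + fake_jpeg + b"\x00" * 100

carved = carve_jpegs(disk)
for offset, data in carved:
    print(f"candidate at byte offset {offset}, {len(data)} bytes")
```

Note what the carver returns: the bytes of the image and the physical offset where they were found. There is no file name, folder path, or timestamp, because those live in the table of contents that carving bypasses. That matches Penrod's description of getting only "the image" and "the physical location" for carved items.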

Nunc est bibendum.

JoelKatz said...

I guess I fundamentally just don't get this. Doesn't possession require some level of control? If it took forensic analysis to even find this content, how was he able to exercise any control over them?

Presumably, without going to extraordinary lengths, he couldn't view them. He couldn't delete them, as he already had deleted them.

How can you be charged with possessing something when you took just about every possible step to divest yourself of its possession?!

Anonymous said...

I would like to help clarify this since I am pursuing a Master's Degree in Cyber Forensics. First of all, metadata is used to compare photographs in a database that are known child porn images. If there is no metadata for a photograph, it can be difficult sometimes to know the age of the victim in the photo or if it's something from a legitimate legal porn site. That is the reason this data is relevant. If you download an application such as PhotoME, you can go ahead and open your own photos and view tons of metadata such as the date and time the picture was taken and even the location, type of camera used and geo location. So you can see how this kind of data is important in helping to identify victims.

Secondly, when evidence is seized at a crime scene, the first responder follows something called an order of volatility, which means he captures the most volatile data first, such as system RAM and registry files. Hard drives are further down the chain.

Third, once all evidence has been seized, each drive has an MD5 hash value calculated and a chain of custody is established. The forensic examiner makes a working bit stream copy (bit-by-bit) and then hashes that copy. If the hash matches with the original evidence, then you know that the hard drives are exact copies of each other. Then the examiner re-hashes the preservation copy to make sure no data was altered. If someone were to even alter one file in the drive, even adding a period to a sentence, the hashes won't match and the integrity of the evidence is compromised. So all 3 hashes must match exactly before any investigation even begins. The examiner then works off the image of the original. The original drive is stored in an anti-static bag and placed in a steel safe locker.
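The hash comparison described above can be sketched in a few lines of Python using the standard `hashlib` module. This is only an illustration: the helper name `md5_of_stream` and the in-memory byte strings standing in for drives are assumptions, and a real examiner would hash the device or image file through a write blocker using validated tools.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a file-like object in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-ins for drive contents (hypothetical data, not real evidence).
original = b"The quick brown fox" * 1000
working_copy = bytes(original)   # a bit-for-bit copy
tampered = original + b"."       # one extra period added

h_original = md5_of_stream(io.BytesIO(original))
h_copy = md5_of_stream(io.BytesIO(working_copy))
h_tampered = md5_of_stream(io.BytesIO(tampered))

print("copy matches original:    ", h_original == h_copy)
print("tampered matches original:", h_original == h_tampered)
```

Identical bytes always produce identical digests, so the copy matches the original, while the single added period produces a different digest. The hash covers every byte of the image, deleted-file remnants included, which is why a matching hash supports the Government's statement that the mirrored drive was a "complete, exact" copy.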

Finally, we do not know if the data they are referring to in this article is a RAM dump or something deleted from a hard drive. Either way, a forensic examiner can find this data if they know where to look and they are good at carving. It seems like the first examiner was not competent enough to find the images so they had to go back and look again. More than likely the suspect deleted these images when the police came and they were able to recover them. But once the files have been deleted, much of the metadata is gone, so it's hard to identify those victims and verify that they are of an illegal nature.