Is there any form of ESI production worse than .tiffs and load files? If you've experienced the ease of e-discovery with tools purpose-built for native review, you know what I'm talking about. Once you "go native," you'll never go back!
By native, I mean data in the original electronic formats the producing party uses for email, word processing, spreadsheets and presentations.
A native file is inherently electronically searchable and functional until it's converted to .tiff images, at which point it loses both searchability and functionality. It's like photographing a steak. You can see it, but you can't smell, taste or touch it, you can't hear the sizzle, and you surely can't eat it.
Because converting to .tiff takes so much away, parties producing .tiff images attempt to restore a measure of electronic searchability by extracting text from the electronic document and supplying it in a load file accompanying the .tiff images. A recipient must then run searches against the extracted text file and seek to correlate the hits in the text to the corresponding page image. It's clunky, costly and incomplete.
The irony of .tiff and load file productions is that what was once a cutting-edge technology has become an albatross around the neck of electronic data discovery. To understand how we got to this unenviable place requires a brief history lesson.
Before the turn of the century, when most items sought in discovery were paper documents, .tiff and load file productions made lawyers' lives easier by grafting rudimentary electronic searchability onto unsearchable paper documents. Documents were scanned to .tiff images and coded by reviewers, and their text was extracted via optical character recognition (OCR) software. It was expensive and crude, but speedier than poring over thousands or millions of pieces of paper.
The coding and text had to be stored in separate files because .tiff images are just pictures of pages, incapable of carrying added content. So, in "single page .tiff" productions, each page of a document became its own image file, another file held aggregate extracted OCR text, and yet another held the coded data about the data, i.e., its metadata.
The metadata would include information about the content and origin of the paper evidence, along with names and locations of the various images and files on the media (i.e., CD or DVD) used to transmit same. Thus, adding a measure of searchability yielded a dozen or more electronic files to carry the pieces of a 10-page document.
To put Humpty Dumpty back together again demanded a database and picture viewer capable of correlating the extracted text to its respective page image and running word searches. Thus was born a new category of document management software called "review platforms." Because the files holding the document's OCR'ed text and metadata were destined to be loaded onto a review platform, they came to be called "load files."
Different review platforms used different load file formats to order and separate information according to guidelines called load file specifications. Load files are plain text files employing characters called delimiters to separate the various information items in the load file. Thus, a load file specification might require that information about a document be transmitted in the order: Box No., Beginning Bates No., Ending Bates No., Date, and Custodian. The resulting single line of text, delimited by commas, would appear: 57,ABC0003123,ABC0003134,19570901,Ball C.
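The delimited record above can be parsed programmatically. Here is a minimal sketch in Python, assuming the comma-delimited, five-field specification given in the example (real load file specifications vary by platform and often use other delimiters); the field names and the `parse_load_file` helper are illustrative, not any platform's actual API:

```python
import csv
from io import StringIO

# Field order per the hypothetical specification described above.
FIELDS = ["BoxNo", "BegBates", "EndBates", "Date", "Custodian"]

def parse_load_file(text: str) -> list[dict]:
    """Split each comma-delimited line into a field-name -> value record."""
    reader = csv.reader(StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader]

records = parse_load_file("57,ABC0003123,ABC0003134,19570901,Ball C.")
print(records[0]["BegBates"])  # ABC0003123
```

Using a CSV parser rather than a bare `split(",")` matters in practice, because delimiter characters can appear inside quoted field values.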