Your client’s dreaded day has arrived. His or her beloved company has been subpoenaed, sued, or threatened with impending litigation. Chances are your client’s first thought will not be wondering about exactly where the company keeps its electronically stored information (ESI). But it should be. All litigation provides for a discovery period, in which evidence will be sought by the opposite party. In the not so distant past, document production consisted solely of making available or reproducing paper records, such as agreements, contracts, letters, and miscellaneous financial information. Not so today. According to a recent University of California, Berkeley survey, 93% of all information is now created in an electronic format. Considering email alone, the average user can easily generate between 50,000 and 100,000 documents per year. With numbers like these, it is no wonder that electronic discovery has become a matter receiving acute focus, with the goal being to find a way to effectively manage and sift through massive amounts of electronic data to locate the key information.
Traditional Search Methodologies
Not too long ago, attorneys would comb through repositories of electronic evidence in one of two distinct ways. The first involved conducting electronic searches using relational, or Boolean, methods, which search for words according to how they connect to one another. The second was via keyword searches, which simply target a known term. Both methods have served attorneys well for many years; they are well understood and generally accepted.
Relational and keyword searches, however, have their drawbacks as well. These searches can recognize only electronic data that contains the specific search words, either individually or in tandem. Each method thus is incapable of recognizing documents that contain similar terms or variants that do not exactly match the chosen search terms. Examples of missed items are initials, misspelled words, nicknames, and synonyms.
Jacques Nack Ngue, founder and lead ediscovery specialist at eClaris, Inc., has commented that relying solely on Boolean or keyword-search technologies, without employing other search methods that have been developed, “is akin to using a typewriter when computers are available and accessible.” Litigators who fail to leverage these new possibilities run the risk of being out-searched by more technologically savvy opponents. Whereas traditional search methods are adequate for small databases, Ngue emphasizes that they are invariably lacking when dealing with the legal analysis of massive databases involving complex queries. There must be, and is, another more thorough, more powerful alternative.
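The limitation described above can be made concrete with a minimal sketch. The documents, names, and helper functions below are hypothetical and invented purely for illustration; they do not reflect any particular review platform.

```python
import re

# Hypothetical mini-collection; IDs and text are invented for illustration.
DOCUMENTS = {
    "doc1": "Robert Smith approved the merger agreement.",
    "doc2": "Bob signed off on the deal yesterday.",
    "doc3": "R.S. reviewed the mergr terms with counsel.",
    "doc4": "The agreement was never sent to Smith.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(term, documents):
    """Return the IDs of documents that contain the exact term."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if term.lower() in tokenize(text)
    )

def boolean_and_search(terms, documents):
    """Return the IDs of documents that contain every one of the terms."""
    wanted = {term.lower() for term in terms}
    return sorted(
        doc_id for doc_id, text in documents.items()
        if wanted <= tokenize(text)
    )

print(keyword_search("smith", DOCUMENTS))
print(boolean_and_search(["smith", "merger"], DOCUMENTS))
```

In this toy collection, neither search reaches the document that uses the nickname “Bob” and the synonym “deal,” nor the one that uses the initials “R.S.” and the misspelling “mergr,” even though both plainly concern the same person and transaction.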
A Third Way: Concept Searching
Given the limitations of mere keyword and Boolean search methods, the legal industry has recently turned to “concept searching” as a potential solution. The producers of this technique maintain that concept searching has the power to more effectively and efficiently winnow out that handful of significant documents from millions of pages of electronic discovery. The primary advantage is that this method, if effectively used, can significantly reduce the need for laborious and expensive page-by-page attorney review.
As one might imagine, some concept-search technologies are better than others. In order to determine whether a specific technology is a viable option, it is first instructive to understand how it operates. Each concept-search technology will likely include some or all of the following three tools: (1) taxonomy abilities; (2) clustering functions; and (3) Bayesian markers.
“Taxonomy abilities” enable the concept search to classify data containing subcategories of language or terminology. In particular, this technique is used to categorize documents containing words that are subsets of issues directly relevant to a particular case. As an example, if Major League Baseball were a relevant subject, taxonomy abilities could also identify documents that use such terms as “Yankees,” “Dodgers,” and “Cubs.” Taxonomy abilities are vital for effectively pinpointing and managing large volumes of subset relationships.
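One rough way to picture a taxonomy tool is as a lookup that expands a broad topic into its narrower terms before searching. The sketch below assumes a hand-built, hypothetical taxonomy and invented memos; commercial tools presumably maintain far larger and more carefully curated hierarchies.

```python
import re

# Hypothetical taxonomy: a parent topic mapped to its narrower terms.
TAXONOMY = {
    "major league baseball": ["yankees", "dodgers", "cubs"],
}

# Hypothetical memos, invented for illustration.
MEMOS = {
    "memo1": "The Yankees contract renewal is due in March.",
    "memo2": "Quarterly sales figures are attached.",
    "memo3": "Tickets for the Cubs game were expensed.",
}

def expand_topic(topic, taxonomy):
    """Return the topic itself followed by any narrower terms listed under it."""
    key = topic.lower()
    return [key] + taxonomy.get(key, [])

def taxonomy_search(topic, documents, taxonomy):
    """Return the IDs of documents mentioning the topic or any narrower term."""
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b")
        for term in expand_topic(topic, taxonomy)
    ]
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(pattern.search(text.lower()) for pattern in patterns)
    )

print(taxonomy_search("Major League Baseball", MEMOS, TAXONOMY))
```

Neither matching memo contains the phrase “Major League Baseball” itself; each is found only because the taxonomy links the team names back to the parent subject.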
A second tool is “clustering functions.” This technique operates in a manner directly opposite to the conventional Boolean and keyword search techniques, which automatically recognize potentially relevant data via directly identifying terms either individually or within a defined relation. Conversely, clustering functions use arithmetical relationships, which makes it possible to identify data containing a penumbra of words grouped or clustered together in pertinent categories. In essence, via the use of clustering functions, documents are selected based on the greater or lesser likelihood that their overall terminology pertains to a relevant topic; the more words a document has that correspond with the collection of relevant terms, the greater the likelihood the document will relate to the same topic and thus be relevant to some important issue in the litigation.
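The intuition that more matching words means a greater likelihood of relevance can be sketched as a simple overlap score. This is only an illustration under stated assumptions: the term list and documents are hypothetical, and actual clustering products rely on more sophisticated statistical grouping than the word counting shown here.

```python
import re

# Hypothetical collection of terms that characterize a relevant topic.
RELEVANT_TERMS = {"merger", "acquisition", "shares", "valuation", "board"}

# Hypothetical documents, invented for illustration.
DOCS = {
    "a": "The board discussed the valuation of shares before the acquisition.",
    "b": "Lunch order for the board meeting.",
    "c": "Parking passes are available at the front desk.",
}

def overlap_score(text, relevant_terms):
    """Return the fraction of the relevant terms that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & relevant_terms) / len(relevant_terms)

def rank_by_overlap(documents, relevant_terms):
    """Return document IDs ordered from highest to lowest overlap score."""
    scored = [
        (overlap_score(text, relevant_terms), doc_id)
        for doc_id, text in documents.items()
    ]
    # Sort by descending score, breaking ties by document ID.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

print(rank_by_overlap(DOCS, RELEVANT_TERMS))
```

No single word decides the outcome here. A document rises in the ranking because its overall vocabulary resembles the topic, which is the contrast with keyword searching drawn above.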
Third, there are “Bayesian markers.” Named after the 18th-century statistician Thomas Bayes, Bayesian markers use probability to identify relevant documents. They make skilled assumptions about the probable significance of data, based on the history of relevant documents already spotted in the case. Bayesian search results are then sorted and ranked according to the estimated probability that certain kinds of documents are significant to the litigated issues.
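As a rough illustration of probability-based ranking, the sketch below uses a small naive Bayes classifier, a standard textbook technique that is swapped in here as an assumption and is not necessarily what any given vendor implements. The previously reviewed documents and their labels are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic words in order."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_docs):
    """Count words and documents per label from (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, is_relevant in labeled_docs:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    return word_counts, doc_counts

def relevance_probability(text, word_counts, doc_counts):
    """Estimate P(relevant | text); assumes each label has at least one example."""
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    total_docs = doc_counts[True] + doc_counts[False]
    log_scores = {}
    for label in (True, False):
        # Start from the prior: how often reviewers chose this label before.
        log_score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in review carry no evidence
            # Add-one smoothing keeps unseen word/label pairs from zeroing out.
            log_score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        log_scores[label] = log_score
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])

# Hypothetical review history: documents already coded by attorneys.
REVIEWED = [
    ("merger valuation sent to the board", True),
    ("board approved the merger price", True),
    ("lunch menu for friday", False),
    ("parking garage closed friday", False),
]

word_counts, doc_counts = train(REVIEWED)
unreviewed = ["friday lunch plans", "draft merger valuation"]
ranked = sorted(
    unreviewed,
    key=lambda text: relevance_probability(text, word_counts, doc_counts),
    reverse=True,
)
print(ranked)
```

The earlier coding decisions supply the evidence: documents whose wording resembles what reviewers previously marked relevant receive a higher estimated probability and move to the top of the review queue.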
So, Which Approach Wins The Day?
Concept searching seems promising. The breadth, efficiency, and exactness that can be potentially accomplished by using this technology are truly remarkable. Nonetheless, many have wondered if concept searching is superior to the aforementioned Boolean and keyword approaches.
There is a reason that Boolean and keyword searching have become standard: prevalence. All of the major legal-research search engines, such as LexisNexis®, Loislaw® and Thomson Reuters Westlaw®, use these search technologies. As a result, both court and counsel are quite familiar with the way they operate. In addition, the straightforwardness of these search techniques makes them readily understandable.
The simplicity of Boolean and keyword searching, however, cuts both ways. Boolean searches can interrogate only the data containing specific, pre-identified terms. In other words, before a document can be identified as relevant, the attorney must identify in advance each and every specific word that will be searched for. In reality, of course, people communicate with a variety of terms. This limitation of Boolean and keyword searching almost guarantees that relevant data will be passed over. Moreover, keyword searches can be over-inclusive. Keyword searches necessarily target every single document containing the chosen term, regardless of whether the term’s actual use in context is relevant to the case.
By comparison, concept-searching tools do not rely on identifying the mere presence of specific terms within a given document. Instead, concept searching is smarter than that, for it includes techniques for determining whether a word’s use in context is likely to be relevant. As a result, for analyzing massive electronic databases, concept searching is capable of identifying highly relevant information that keyword and Boolean searches cannot identify.
That said, there are nevertheless drawbacks to concept searching. In particular, the possible benefits of concept searching must be weighed against the cost, both in money and in resources, necessary to employ the method. For example, concept-search techniques, like Boolean and keyword searches, can and often do yield many documents that are not truly significant. Counsel must, as always, weigh the costs and benefits.
The Verdict
While concept-search technologies potentially exceed the performance of Boolean and keyword searches, their time has not yet arrived. For effectiveness, speed, and accuracy, as of today nothing beats Boolean and keyword searches, especially when employed in iterative progressions, in which subsequent searches further winnow previous search results. Yet for highly significant matters involving millions of pages of electronic data, concept-search technologies are worth deploying, whether separately or in concert with keyword and Boolean searches.
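An iterative progression of the kind just mentioned can be sketched as a loop in which each search runs only against the survivors of the previous one. The documents and search terms below are hypothetical, and the simple substring match stands in for whatever search a real platform would run at each step.

```python
# Hypothetical documents, invented for illustration.
DOCUMENTS = {
    "d1": "Board minutes: merger valuation approved.",
    "d2": "Merger rumors discussed at lunch.",
    "d3": "Board approved the holiday schedule.",
    "d4": "Valuation of the merger target sent to the board.",
}

def winnow(documents, search_terms):
    """Apply each search to the survivors of the previous search.

    Returns the final matches and the number of documents left after each round.
    """
    remaining = dict(documents)
    counts = []
    for term in search_terms:
        remaining = {
            doc_id: text
            for doc_id, text in remaining.items()
            if term.lower() in text.lower()
        }
        counts.append(len(remaining))
    return remaining, counts

final_docs, counts = winnow(DOCUMENTS, ["merger", "board", "approved"])
print(sorted(final_docs), counts)
```

In practice the value lies in the pauses between rounds: counsel can inspect how many documents survive each pass and choose the next term accordingly, rather than committing to every term in advance.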
The evidence introduced at trial is inevitably a product of the discovery process. Even in complex lawsuits involving millions of pages of electronic data, a judge or jury can only view and digest a limited amount of data. This limitation makes attorneys’ analysis of produced documents all the more important. Given the proliferation of electronic data, winnowing out that handful of truly significant documents has become harder to accomplish. When a lawsuit involves millions of pages of documents, attorneys who use smart search methods gain an advantage over attorneys who know only how to work hard.
In order to best employ advanced search methods, counsel should learn how the concept-search technologies operate and take into consideration their potential benefits. In the end, the lawyer who better understands how to effectively identify the important documents may well win the day.
This article appeared in the Summer 2009 edition of Proof, the newsletter of the ABA's Trial Evidence Committee.