Hybrid Document Review
April 24, 2008 by Ben
A recent post on law.com discusses the maturity of computer-assisted search and review technologies. It describes recent tests comparing the predominant Boolean search methodology with newer approaches such as concept searching, clustering, taxonomies, and um… Bayesian classifiers of course (calculating probabilities based on known relevant results). In summary, no single search algorithm outperformed Boolean searching on its own, but combining an alternative algorithm with Boolean search returned more responsive documents. Unfortunately, as you may have guessed, more nonresponsive documents were also returned.
So is there hope for automated computer review technology? I wouldn’t get rid of your manual review process, but I’m a firm believer that computer-assisted review will continue to improve and play a valuable role in the future of document review. One possibility is a hybrid approach: perform a narrow keyword search, manually review the results, and then run clustering and/or a Bayesian algorithm on the remaining document set. Unfortunately, there are still some critical issues preventing this hybrid approach from taking hold.
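To make the hybrid idea concrete, here is a minimal sketch in Python of the three steps: a narrow keyword search, a manual review of the hits, and a Bayesian ranking of everything else. It is only an illustration under stated assumptions. The four sample documents, the function names, and the `reviewer_decisions` dictionary (standing in for a human reviewer) are all made up, and the classifier is a bare-bones naive Bayes with add-one smoothing, not any vendor's actual algorithm.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_search(docs, required_terms):
    """Boolean AND search: ids of documents containing every required term."""
    terms = set(required_terms)
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))]

def train_naive_bayes(labeled):
    """Learn per-word log-odds of 'responsive' from (text, is_responsive) pairs."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(counts[label].values()) for label in counts}
    log_odds = {}
    for word in vocab:
        # Add-one smoothing so a word seen in only one class never gives log(0).
        p_resp = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_nonresp = (counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds[word] = math.log(p_resp / p_nonresp)
    prior = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    return log_odds, prior

def score(text, log_odds, prior):
    """Log-odds that a document is responsive; words never seen in training add 0."""
    return prior + sum(log_odds.get(word, 0.0) for word in tokenize(text))

# Hypothetical document set (ids and text are invented for illustration).
docs = {
    1: "Brochure: tobacco is good for a healthy lifestyle",
    2: "Brochure promoting a healthy lifestyle with every pack",
    3: "Parking garage maintenance schedule",
    4: "Tobacco shipment invoice and warehouse maintenance schedule",
}

# Step 1: narrow keyword search.
hits = keyword_search(docs, ["tobacco"])

# Step 2: manual review of the hits (a human's calls, hard-coded here).
reviewer_decisions = {1: True, 4: False}
labeled = [(docs[doc_id], reviewer_decisions[doc_id]) for doc_id in hits]

# Step 3: train on the reviewed hits, then rank the remaining documents.
log_odds, prior = train_naive_bayes(labeled)
remaining = [doc_id for doc_id in docs if doc_id not in hits]
ranked = sorted(((doc_id, score(docs[doc_id], log_odds, prior))
                 for doc_id in remaining),
                key=lambda pair: pair[1], reverse=True)

for doc_id, doc_score in ranked:
    print(doc_id, round(doc_score, 2), docs[doc_id])
```

In this toy set the second brochure never mentions tobacco, so the keyword search misses it, yet it ranks above the parking schedule because it shares words with the document the reviewer marked responsive. With only two reviewed documents the scores mean very little; a real matter would need far more reviewed examples before the ranking could be trusted.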
First and foremost, we need transparency in the search algorithms. Transparency is really what makes Boolean searching so powerful: anyone can read the query and see exactly why a document was returned. If you could implement an alternative review algorithm with the same level of transparency, then it begins to work. For example, you could run a search that shows documents with a layout similar to those containing some critical keywords. Perhaps that helps you find other brochure documents that market tobacco as beneficial to your health but don’t explicitly state “good health” or “tobacco”.
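One way to picture transparency for a statistical method is a per-word breakdown of why a document scored the way it did, much like reading the terms of a Boolean query. The sketch below assumes a classifier that assigns each word a weight (positive leans responsive, negative leans nonresponsive). The `explain_score` function, the weights, and the sample text are all hypothetical.

```python
import re

def explain_score(text, word_weights, top_n=3):
    """Return the total score and the words that contributed most to it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    contributions = {}
    for word in tokens:
        if word in word_weights:
            # A word that appears several times contributes once per occurrence.
            contributions[word] = contributions.get(word, 0.0) + word_weights[word]
    total = sum(contributions.values())
    top = sorted(contributions.items(),
                 key=lambda item: abs(item[1]), reverse=True)[:top_n]
    return total, top

# Hypothetical weights, as if learned from previously reviewed documents.
word_weights = {"brochure": 0.6, "healthy": 0.6, "invoice": -0.7}

total, top = explain_score("Brochure promoting a healthy lifestyle", word_weights)
print(total, top)
```

A report like this is something both parties could read and argue about, which is closer to how a Boolean query gets negotiated than a bare relevance score would be.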
Second, there needs to be an open platform for applying these algorithms in your evidence management repository. We run into occasions where it makes sense to implement a clustering algorithm for a case, but in doing so we need to port 10 million pages of documents into a separate repository. It’s difficult to justify the time and cost when a review team is readily available.
So keep your finger on the pulse of alternative search and review technologies, but I wouldn’t count on manual review going away anytime soon. Without transparency, you’ll never reach agreement between parties on the scope of the review. And without an open platform for applying multiple approaches, you probably won’t save any money compared to manual review.