Research

A new “semantic data” service: “Document Discovery”?

The New York Times groups a number of data-sieving and semantic-sorting services under a new type of application, “document discovery.” The aim of this new service is the Holy Grail of automatic content analysis: concepts are derived from context by clustering synonyms. A leader seems to be Cataphora. The New York Times identifies several contenders in the field of document discovery:

Now, thanks to advances in artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost. In January, for example, Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000.

More advanced programs filter documents through a large web of word and phrase definitions. A user who types “dog” will also find documents that mention “man’s best friend” and even the notion of a “walk.”
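The “web of word and phrase definitions” idea can be illustrated with a toy query-expansion search. This is only a minimal sketch, not how any of the vendors named here actually work: the `CONCEPTS` map, the function names, and the sample documents are all hypothetical, and a real system would presumably draw on a far larger lexicon.

```python
# Hypothetical concept map: every entry here is illustrative only.
CONCEPTS = {
    "dog": ["dog", "man's best friend", "walk"],
}


def expand_query(term):
    """Return the related phrases for a term, or the term itself if unknown."""
    key = term.lower()
    return CONCEPTS.get(key, [key])


def search(term, documents):
    """Return the documents that mention the term or any related phrase."""
    phrases = expand_query(term)
    return [doc for doc in documents
            if any(phrase in doc.lower() for phrase in phrases)]


# Made-up sample documents.
docs = [
    "The dog barked all night.",
    "He took man's best friend to the park.",
    "We went for a walk after lunch.",
    "The quarterly report is due Friday.",
]

matches = search("dog", docs)
for doc in matches:
    print(doc)
```

A plain keyword search for “dog” would return only the first document; with the expanded phrase list the second and third match as well, while the unrelated report is still excluded.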

The sociological approach adds an inferential layer of analysis, mimicking the deductive powers of a human Sherlock Holmes. Engineers and linguists at Cataphora, an information-sifting company based in Silicon Valley, have their software mine documents for the activities and interactions of people — who did what when, and who talks to whom. The software seeks to visualize chains of events. It identifies discussions that might have taken place across e-mail, instant messages and telephone calls.

Then the computer pounces, so to speak, capturing “digital anomalies” that white-collar criminals often create in trying to hide their activities.
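The “who talks to whom” analysis can be pictured as counting interactions between pairs of people across channels. The sketch below is a loose illustration under invented assumptions: the message records, field layout, and function names are hypothetical, and nothing here reflects Cataphora's actual software.

```python
from collections import Counter

# Hypothetical message records: (sender, recipient, channel).
messages = [
    ("alice", "bob", "email"),
    ("alice", "bob", "im"),
    ("bob", "carol", "phone"),
    ("alice", "bob", "phone"),
]


def interaction_counts(records):
    """Count messages per (sender, recipient) pair, ignoring the channel."""
    return Counter((sender, recipient) for sender, recipient, _ in records)


def channels_used(records, sender, recipient):
    """Return the set of channels a given pair used to communicate."""
    return {channel for s, r, channel in records
            if (s, r) == (sender, recipient)}


counts = interaction_counts(messages)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
print(sorted(channels_used(messages, *top_pair)))
```

A pair that exchanges many messages over several channels is the kind of discussion thread a reviewer might want to reconstruct first; spotting genuine anomalies would require much more than these counts.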

Another e-discovery company in Silicon Valley, Clearwell, has developed software that analyzes documents to find concepts rather than specific keywords, shortening the time required to locate relevant material in litigation.

Last year, Clearwell software was used by the law firm DLA Piper to search through a half-million documents under a court-imposed deadline of one week. Clearwell’s software analyzed and sorted 570,000 documents (each document can be many pages) in two days. The law firm used just one more day to identify 3,070 documents that were relevant to the court-ordered discovery motion.

Sorin Adam Matei

Assistant Vice President for Partnerships in Strategic Defense Innovation, Professor of Communication at Purdue University, and Director of the FORCES initiative, Sorin Adam Matei leads research teams that study the relationship between technological and social systems using big data, simulation, and mapping approaches. He has published papers and articles in Journal of Communication, Communication Research, Information Society, National Interest, and Foreign Policy. He is the author or co-editor of several books. The most recent is Structural differentiation in social media. He also co-edited Ethical Reasoning in Big Data, Transparency in social media, and Roles, Trust, and Reputation in Social Media Knowledge Markets: Theory and Methods (Computational Social Sciences), all three products of the NSF-funded KredibleNet project. Dr. Matei's teaching portfolio includes technology and strategy, online interaction, and digital media analytics classes. A former BBC World Service journalist, he has contributed to Esquire and several leading Romanian newspapers. In Romania, he is known for his books Boierii Mintii (The Mind Boyars), Idolii forului (Idols of the forum), and Idei de schimb (Spare ideas).
