
Why so Similar? Identifying Semantic Organizing Processes in Large Textual Corpora by Drew Margolin, Yu-Ru Lin, David Lazer :: SSRN

N-Gram Recordings Desktop Wallpaper (Photo credit: endless lazlo)

An interesting paper on inferring social processes and networks of interaction from raw text. Luo Si might be intrigued by this.

This paper introduces the concept of semantic organizing processes as a means of inferring theoretically meaningful behavior from the observation of raw text. Semantic organizing processes are mechanisms by which a set of authors come to produce texts that are similar in some observable, quantifiable way. We introduce three broad semantic organizing processes — authors sharing subject matter, authors sharing goals, and authors sharing sources — and argue that each of these processes will lead to texts that tend to share n-grams at different lengths: short n-grams for shared subject matter, moderate length n-grams for shared goals, and long n-grams for shared sources. To test these hypotheses, we develop a novel n-gram extraction technique to capture text similarity based on n-grams of different lengths. We then apply our technique to a corpus where the author attributes are observable: the public statements of the Members of the U.S. Congress. Our results support the hypothesis that these three processes are reflected in distinct kinds of textual similarity. This article presents the first empirical finding that different social processes are detectable through the structure of overlapping textual features. The finding has important implications for modeling text and understanding underlying social processes.

via Why so Similar?: Identifying Semantic Organizing Processes in Large Textual Corpora by Drew Margolin, Yu-Ru Lin, David Lazer :: SSRN.
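To make the core idea concrete, here is a minimal sketch of comparing two texts by the n-grams they share at several lengths. It is not the authors' extraction technique, which the abstract does not spell out. It assumes whitespace tokenization and Jaccard overlap, and the function names and the lengths 1, 3, and 6 are illustrative choices of mine.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_profile(text_a, text_b, lengths=(1, 3, 6)):
    """Map each n-gram length to the overlap between the two texts.

    The lengths are assumptions standing in for "short", "moderate",
    and "long" n-grams; the paper may define these differently.
    """
    return {n: jaccard(ngrams(text_a, n), ngrams(text_b, n)) for n in lengths}


if __name__ == "__main__":
    a = "we must cut taxes for working families now"
    b = "we must cut taxes for small businesses now"
    for n, score in similarity_profile(a, b).items():
        print(f"n={n}: overlap={score:.2f}")
```

In this toy pair the overlap falls as n grows: the two statements share vocabulary and a short phrase, but no long verbatim run. Under the paper's hypotheses, that pattern would point to shared subject matter or goals rather than a shared source.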

Sorin Adam Matei

Assistant Vice President for Partnerships in Strategic Defense Innovation, Professor of Communication at Purdue University, and Director of the FORCES initiative, he leads research teams that study the relationship between technological and social systems using big data, simulation, and mapping approaches. He has published papers and articles in Journal of Communication, Communication Research, Information Society, National Interest, and Foreign Policy. He is the author or co-editor of several books. The most recent is Structural Differentiation in Social Media. He also co-edited Ethical Reasoning in Big Data, Transparency in Social Media, and Roles, Trust, and Reputation in Social Media Knowledge Markets: Theory and Methods (Computational Social Sciences), all three the product of the NSF-funded KredibleNet project. Dr. Matei's teaching portfolio includes technology and strategy, online interaction, and digital media analytics classes. A former BBC World Service journalist, his contributions have been published in Esquire and several leading Romanian newspapers. In Romania, he is known for his books Boierii Mintii (The Mind Boyars), Idolii forului (Idols of the forum), and Idei de schimb (Spare ideas).
