
What is NodeXL and how can it be used? A review of the NodeXL book by Hansen, Shneiderman, and Smith

INTL. JOURNAL OF HUMAN-COMPUTER INTERACTION, 27(4), 405–408, 2011 http://doi.org/10.1080/10447318.2011.544971

Book Review

Derek Hansen, Ben Shneiderman, and Marc A. Smith. Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Morgan Kaufmann, 2011. 284 pages. ISBN: 978-0-12-382229-1

Reviewed by Sorin Adam Matei, Department of Communication, Purdue University

Who were the most important actors in the ENRON scandal? Could the pattern of e-mails exchanged between the top employees of the energy company whose demise shook U.S. markets in 2001 answer this question? What makes people share content on Flickr? Who are the most influential members of a Twitter “follower” group? Why do some people answer thousands of questions posted on Questions and Answers sites? Who are these people? Who are the most influential editors of Lostpedia.org, the popular reference site dedicated to the TV series LOST? What types of cliques can be discerned on this fan site, and what does this tell us about online interaction in general? What patterns of sharing can be deciphered on the YouTube video-sharing site?

The volume Analyzing Social Media Networks with NodeXL not only provides specific answers to some of these questions but also proposes that the best way to answer them is by utilizing Social Network Analysis (SNA). Furthermore, the book proposes a new, simple, and free (if you own a copy of Microsoft Excel) SNA tool, NodeXL, which can be used by any spreadsheet-literate analyst. The book is a great resource and how-to manual for academics or trainers in search of a handy tool for deploying simple SNA research projects. It is also a “ready-to-wear” textbook for an introductory network analysis class, either at the undergraduate or graduate level. Practitioners can equally benefit from the tool and from the volume for the same reasons.

The most exciting part of the project proposed by the book is, of course, NodeXL. A relative newcomer [at the time of its publication], the software has a lot going for it. Offering some basic but essential capabilities within a simple interface, it is easy to learn and deploy. More important, it was developed on an open, free software architecture, which allows the user community to contribute its own enhancements. The software and the book are, so to speak, an appetizer for a far more bountiful meal. They are meant, in the first instance, to present and educate the reader not about NodeXL but about SNA in general. In the background there is, however, a more ambitious plan: to create a great platform for more sophisticated network analytic tools that can be applied to problems of any size, from the smallest (a few hundred nodes and their linkages) to the largest (many thousands of nodes and millions of linkages). A true benefit would come with the “virtualization” of Microsoft Excel, for which NodeXL is an extension, which will create the possibility of moving data analysis and storage in the cloud. One can certainly imagine a scenario in which average users can analyze gigantic data sets from the comfort of a desktop computer.

Yet, before we get there, what are the immediate NodeXL goals and capabilities as they can be gleaned from this volume? These are clearly mapped out by the very structure of the book. This is organized, according to the authors, in the form of a tree with roots, a trunk, and branches. The roots (Part I: Chapters 1 through 3) provide grounding in the history and core concepts of social media and social network analysis. The trunk (Part II: Chapters 4 through 7) focuses on the practical details of operating the free and open source NodeXL extension of Microsoft Excel. And the branches (Part III: Chapters 8 through 15) each focus on one form of social media [Facebook, Twitter, Flickr, etc.]. (p. 2)

According to the authors, the goal of the software and of the case studies is mainly to teach users how to identify key actors, documents, or networks. Thus, at least at this point, the package and the book focus on descriptive procedures. As we explain next, NodeXL can accommodate, and at some point will include, far more powerful inferential statistical capabilities.

The first part of the book aims to define social media and to explain to a relatively lay reader (e.g., a researcher not well versed in social network analysis) what social network analysis is all about and how its core concepts map onto specific heuristic problems. Although useful, the level of detail is a little excessive and is presented in a rather pedantic manner. As is the case with books that address a rather knowledgeable audience, which is mainly interested in the “goodies,” that is, new ways and tools for exploring emerging data, extended definitions of areas of research can get in the way. On the other hand, if this book were to be picked up by someone who has never heard of social media or social network analysis, the long-winded explanations can in fact distract him or her from the business at hand, which is to get as fast as possible to the main utility of the software and of the book. In brief, a second edition of the volume would benefit from an abbreviated first section, which would transition much faster to the more important business of the book, namely, to teach how to effectively use NodeXL.

This being said, the real meat of the volume, found in Parts 2 and 3, is of high quality. The chapters dedicated to explaining what NodeXL is and does provide straightforward and highly instructive tutorials on installing and using NodeXL. Albeit a mere adolescent, with much to contribute and space to grow, the software itself is a true contender in a rather sparse field of desktop-based social network analysis tools. True, it has yet to reach the level of methodological sophistication of UCINET, the mainstay of SNA academic research, which is more mature and includes a lot more analytic procedures, including those that allow comparing networks in a statistically significant manner. Yet the UCINET core software is at least 10 years old, and in terms of usability and flexibility, it shows its age. On the other hand, although limited to a set of descriptive procedures (calculation of basic network metrics and statistics, such as a variety of centrality measures), NodeXL is infinitely extensible and upgradeable via Microsoft Excel’s plug-in architecture. The base code itself is free and open source, being licensed under the terms of the Microsoft Public License, which according to some observers is as simple and transparent as the BSD licensing format.
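For readers unfamiliar with the descriptive metrics mentioned above, the simplest centrality measure, degree centrality, can be sketched in a few lines of Python. This is an illustration only, not NodeXL's own code; the function name and the sample edge list are assumptions made for the example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list.

    Each node's score is its number of distinct neighbors divided by
    the largest possible number of neighbors (n - 1).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop adds no neighbor
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical four-person network in which "ann" is tied to everyone.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
scores = degree_centrality(edges)
```

In this toy network "ann" receives the maximum score of 1.0 because she is connected to all three other actors, which is the kind of "key actor" reading the book's case studies aim for.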

The software itself is fully integrated into Microsoft Excel’s main functions. Its installation is as simple as opening an Excel template. Its use is driven by drop-down menus embedded in the Excel toolbar, where it has its own top-level category. Data are presented in spreadsheets, directly accessible to visualization and editing. Sorting is facilitated by drop-down menus built into column headers. Separate sheets are offered for visualizing and, if needed, editing a variety of metrics. The networks (graphs) themselves are visualized using a special chartlike interface that is fully interactive. The color of the links or of the nodes, their size, and meaning can be adjusted via menus, filters, and option windows. The one, and at this point significant, visualization limitation is that networks that are larger than a few thousand linkages (in social network analysis parlance, “edges”) do not display well or at all. There are only so many pixels on a screen, which can present only so much information. A workaround is to combine redundant edges if there are any, an eventuality that NodeXL can handle very well. According to the software designers, in the future visualization limitations might be overcome by employing tiling or data simplification algorithms, both to be deployed automatically, as the size of the network demands. This code, however, has yet to be written, remaining for now a projected addition.
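The idea behind combining redundant edges can be shown with a minimal Python sketch: repeated links between the same pair of nodes collapse into a single edge that carries a weight. This is not how NodeXL itself implements the feature; the function name, the `directed` parameter, and the sample data are assumptions for illustration.

```python
from collections import Counter


def merge_duplicate_edges(edges, directed=True):
    """Collapse repeated (source, target) pairs into weighted edges.

    Returns a list of (source, target, weight) tuples, where weight is
    the number of times the pair appeared in the input.
    """
    counts = Counter()
    for source, target in edges:
        if directed:
            key = (source, target)
        else:
            # In an undirected graph, a->b and b->a are the same edge.
            key = tuple(sorted((source, target)))
        counts[key] += 1
    return [(s, t, w) for (s, t), w in counts.items()]


# Hypothetical raw edge list with repeats.
raw = [("a", "b"), ("a", "b"), ("b", "a"), ("c", "a")]
merged_directed = merge_duplicate_edges(raw)
merged_undirected = merge_duplicate_edges(raw, directed=False)
```

Four raw rows shrink to three directed edges, or to two undirected ones, so fewer lines compete for the same pixels while the weight preserves how often each tie occurred.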

At this point, NodeXL excels at calculating and displaying basic network metrics (especially centrality related) and cliques. Using relatively new algorithms, described in Chapter 7, NodeXL is fast and effective at visualizing how many communities are present in a given network and who the main actors that anchor them are. The NodeXL community has been using these tools with some success for defining a new problem space in social network analysis, namely, that in which social structures are associated with functional roles (see Chapter 15). This is a promising research area, and given its ability to change our understanding of how networks grow over time, it could be a direction for further extending the software. One can imagine adding specific algorithms to NodeXL, which can identify roles and network structures and/or determine if they occur more than what chance alone would predict.

Once NodeXL basics have been mastered, the volume proposes a number of methods and tools for extracting and analyzing data from a large variety of sources. The most impressive are the data import “spigots” for Twitter or a locally saved e-mail account. For example, NodeXL users can use a built-in import menu to retrieve any given Twitter account’s follower network. The process is rather straightforward; all you need is the Twitter account username and password. Once imported, data are stored in a spreadsheet that shows which followers are connected to each other. It is equally simple to import e-mail addresses associated with a Microsoft Outlook or Mozilla Thunderbird e-mail account. In this, as in the previous situation, spreadsheets are populated with pairs of data points, indicating e-mail exchanges between individuals. Of course, data still need to be manually cleaned to eliminate the “chaff” that can be found in e-mail messages, such as when a sender’s own e-mail address appears in a CC field, but this process is rather straightforward and highly rewarding.
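The pairs-of-data-points layout and the cleaning step described above can be sketched in Python as follows. The message structure (dictionaries with `from`, `to`, and `cc` keys), the function name, and the addresses are hypothetical; the sketch does not reproduce NodeXL's importer, only the kind of edge list it yields and the removal of a sender's own address from the recipients.

```python
def email_edges(messages):
    """Turn e-mail headers into (sender, recipient) pairs.

    Addresses are lower-cased so that case variants match, and a
    sender who appears among his or her own recipients (for example
    in the CC field) is dropped rather than recorded as a self-loop.
    """
    edges = []
    for msg in messages:
        sender = msg["from"].strip().lower()
        recipients = msg.get("to", []) + msg.get("cc", [])
        seen = set()
        for address in recipients:
            address = address.strip().lower()
            if address == sender or address in seen:
                continue  # skip self-addressed and repeated entries
            seen.add(address)
            edges.append((sender, address))
    return edges


# Hypothetical message in which the sender copied herself.
messages = [
    {
        "from": "Ann@example.com",
        "to": ["bob@example.com"],
        "cc": ["ann@example.com", "cat@example.com"],
    }
]
pairs = email_edges(messages)
```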

Other social media networks that can be directly accessed from NodeXL spigots include Flickr and YouTube. Furthermore, NodeXL has in the works new spigots for Facebook and MediaWiki.

NodeXL is also compatible with a variety of existing UCINET and similar network data formats. Of them, the most significant is GraphML, which is an XML schema for encoding network data. As this format becomes more popular, NodeXL promises to be strategically placed to absorb any data set that is thrown at it. For example, those interested in analyzing Facebook data with NodeXL will find out that a Facebook export application, Name Gen Web (http://apps.facebook.com/namegenweb/), provides a GraphML version of one’s friend network, which can then be imported into NodeXL.
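To give a sense of what a GraphML file contains, here is a minimal Python sketch that reads nodes and edges from a small GraphML document using only the standard library. The sample document and the function name are made up for illustration; real exports typically carry additional attribute data that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical three-person friend network.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="friends" edgedefault="undirected">
    <node id="ann"/>
    <node id="bob"/>
    <node id="cat"/>
    <edge source="ann" target="bob"/>
    <edge source="bob" target="cat"/>
  </graph>
</graphml>
"""


def read_graphml(text):
    """Return (node_ids, edge_pairs) from a GraphML string."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.findall("g:graph/g:node", GRAPHML_NS)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.findall("g:graph/g:edge", GRAPHML_NS)
    ]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
```

The resulting edge pairs have the same two-column shape as the spreadsheet rows described earlier, which is why a GraphML export slots naturally into a tool built on Excel.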

Each type of social media data import and analysis is illustrated by a separate chapter of the book. The editors chose a very smart strategy for designing these chapters. Instead of using hypothetical examples, they invited a number of academic researchers to present real-life projects that utilize NodeXL. Most social media platforms are featured, including Flickr, YouTube, Facebook, Twitter, and MediaWiki. The true challenge in shepherding these chapters was to rein in the researchers’ natural penchant to focus on the social scientific aspect of their projects at the expense of explaining how the tool was used. The chapters generally succeed in serving more as real-life tutorials than dry academic papers. They effectively combine step-by-step instructions on how to use the tool, and what capabilities can be employed for most commonly encountered data challenges, with broader “so-what questions.” Especially useful were the examples provided for analyzing Flickr, Twitter, and MediaWiki data.

In conclusion, Analyzing Social Media Networks with NodeXL is a valuable resource for the research and education community. The software is a much-needed and valuable addition to any researcher’s SNA toolkit. Although at the beginning of its career, the tool promises much. Further developments and new features will be posted on the project website, http://codeplex.com/nodexl, where a lively support forum can also be accessed. NodeXL is a tool that should be followed and supported by all those interested in SNA.

Sorin Adam Matei

Assistant Vice President for Partnerships in Strategic Defense Innovation and Professor of Communication at Purdue University, and Director of the FORCES initiative, he leads research teams that study the relationship between technological and social systems using big data, simulation, and mapping approaches. He has published papers and articles in Journal of Communication, Communication Research, Information Society, National Interest, and Foreign Policy. He is the author or co-editor of several books. The most recent is Structural differentiation in social media. He also co-edited Ethical Reasoning in Big Data, Transparency in social media, and Roles, Trust, and Reputation in Social Media Knowledge Markets: Theory and Methods (Computational Social Sciences), all three the product of the NSF-funded KredibleNet project. Dr. Matei's teaching portfolio includes technology and strategy, online interaction, and digital media analytics classes. A former BBC World Service journalist, his contributions have been published in Esquire and several leading Romanian newspapers. In Romania, he is known for his books Boierii Mintii (The Mind Boyars), Idolii forului (Idols of the forum), and Idei de schimb (Spare ideas).
