Tag Archives: network analysis

Making Edgesense: two online communities at a glance

During the summer, the Wikitalia group worked hard to improve Edgesense, the tool for real-time network analysis we are building as part of the CATALYST project. As we worked on our “official” test bed community, that of Matera 2019, I happened to mention it to Salvatore Marras. He proposed to deploy Edgesense on Innovatori PA. Edgesense is still a very raw alpha, but the curiosity of trying it on a much larger community than the one in Matera (over ten thousand registered users) made us try anyway.

Surprise: despite using the same software as Matera 2019 (Drupal 7), Innovatori PA is not just bigger: it is really different. Even greater surprise: Edgesense lets you literally see the difference with the naked eye (click here for a larger image with an English caption).

Metrics confirm what the eye sees. Innovatori PA, with over 700 active nodes (active meaning they wrote at least one post or one comment), gives rise to a rather sparse network with only 1127 relationships. Average distance is quite high, at 3.76 degrees of separation (Facebook, with a billion-plus users, has only 4.74 – source); modularity, the ease with which the network partitions into subcommunities, is very high.

Conversely, the Matera 2019 community gives rise to a quite dense network: 872 relationships, about 80% of those in Innovatori PA, but with fewer than a third of its active users. Degrees of separation are 2.50, and modularity is much lower.
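A quick back-of-the-envelope calculation makes the density gap concrete. The average degree of a network is twice the number of edges divided by the number of nodes; the sketch below plugs in the figures above, with 230 active users for Matera 2019 as an assumption on my part (just under a third of Innovatori PA's 700):

```python
# Average degree = 2 * edges / nodes (each relationship touches two users).
def average_degree(edges, nodes):
    return 2 * edges / nodes

# Edge counts from the post; 230 active users for Matera 2019 is an
# assumption ("fewer than a third" of Innovatori PA's 700+).
innovatori_pa = average_degree(1127, 700)  # roughly 3.2 relationships per user
matera_2019 = average_degree(872, 230)     # roughly 7.6 relationships per user
```

Under these assumptions, the average Matera 2019 user holds more than twice as many relationships as the average Innovatori PA user.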

If you want to play with Edgesense – among other things, it helps you see the growth of the network over time – go here for Matera 2019. No need to install anything: you access it with your browser. I recommend the tutorial we prepared to teach basic network analysis for online communities (click on the “tutorial” link at the top right of the page). The Innovatori PA installation is still being tweaked; I will update this post as it becomes available.

Near-real time network analysis with Python and Tulip

Tulip network

Regular readers may remember that I am trying to build software for near-real time network analysis of online conversations – think Google Analytics, but focused on relationships between users of a website rather than on counting pageviews. I hope to use it to research the mathematical signatures of different social phenomena that happen online. My overarching goal is to contribute towards keeping conversations healthy and useful at a large scale, and so finally make participatory democracy work at the level of the large city, the nation state, even the planet.
I now have a new strategy, based on a network analysis package called Tulip.

Why Tulip? I have been using Gephi so far, and there is much to be said for Gephi – it has a larger community than Tulip’s and more features. The answer is Python: the core of Tulip, stripped of the GUI, is a Python library. This means I can have a very short chain to monitor changes in the network representing a conversation. My favorite configuration would look something like this:

  • The analysis concerns a community website running on Drupal + MySQL. 
  • A module called views_datasource gives the core Views module the ability to query the database and export the results of the query as a JSON file.
  • A Python script (the part I am writing now), armed with Python’s json library, parses the JSON and maps elements in the database to relationships, i.e. directed edges from some entity to some other. Which elements you want to map onto which relationships depends on the problem you are investigating, but once you set it up you can simply re-run the same View and refresh your JSON dataset with new elements.
  • With the Tulip library loaded into Python, the same script can also compute network metrics and visualize the graph.
  • The script’s output can be served as a web page through a standard web server. A library such as sigma.js can take care of visualizing the graph.
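The mapping step could look something like the sketch below. It is a minimal illustration, not the actual script: the field names (`commenter`, `author`) and the JSON shape are invented for the example, since what views_datasource actually emits depends entirely on how the View is configured.

```python
import json

def edges_from_views_json(raw):
    """Map each exported row to a directed edge: commenter -> post author.

    Assumes a hypothetical export shaped like
    {"nodes": [{"node": {"commenter": "alice", "author": "bob"}}, ...]};
    the real field names depend on the View's configuration.
    """
    data = json.loads(raw)
    edges = []
    for row in data["nodes"]:
        fields = row["node"]
        edges.append((fields["commenter"], fields["author"]))
    return edges
```

Re-running the View and feeding the fresh JSON back through this function is all it takes to refresh the edge list.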

The number crunching can be done server-side in just one pass. The Python script calls a curl of the Drupal view and loads the JSON dataset. Then it does the parsing, the building of the network and the computing of network metrics (and possibly non-network metrics too, which is another thing I am working on now – more on this in forthcoming posts) in a single step. It then passes the results to a web server for packaging into a dashboard of some kind and some pretty visuals (many networks are beautiful!).
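The one-pass flow can be sketched in plain Python. In the real chain Tulip would supply the serious network metrics; here degree counts and density stand in as placeholders, and the returned dictionary is the kind of blob a dashboard page could consume as JSON:

```python
from collections import Counter

def metrics_pass(edges):
    """One pass over a directed edge list: degree counts plus a summary dict."""
    degree = Counter()
    nodes = set()
    for source, target in edges:
        degree[source] += 1  # out-degree contribution
        degree[target] += 1  # in-degree contribution
        nodes.update((source, target))
    n, m = len(nodes), len(edges)
    density = m / (n * (n - 1)) if n > 1 else 0.0  # directed-graph density
    return {"nodes": n, "edges": m, "density": density,
            "top_hubs": degree.most_common(3)}

# json.dumps(metrics_pass(edges)) would then feed the dashboard directly.
```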

To be fair, Python + Tulip is not the only solution. Gephi is available as a Java library (known as “the toolkit” to Gephi enthusiasts), so you could build a similar workflow with Java + Gephi. I chose Python + Tulip because I can now do a little Python (and absolutely no Java), and because Guy Melançon, Benjamin Renoust, Bruno Pinaud and Marie-Luce Viaud at the University of Bordeaux and INRIA are such great collaborators. They like Tulip and I like them, so Tulip it is 🙂

Why network science is humbling

(dedicated to Benjamin Renoust)

For several years now I have been fascinated with networks. While I have grown to appreciate the internal coherence and beauty of the math, as soon as I lift my gaze from the models and try to use them to tell complicated, real-world stories I am a part of (like Edgeryders, or the unMonastery), I struggle with counterintuition. Duncan Watts’ beautiful book hits the nail on the head: since we are humans, we tend to overestimate the role of humans in how things unfold. By implication, we underestimate the role of other factors at play, like chance or, indeed, network effects. Highly connected individuals in a scale-free social network (say, people with millions of Twitter followers) are, understandably, tempted to claim credit for their privileged position. And yet, we have rock-solid models that explain the emergence of hubs based purely on the (realistic) characteristics of the growth process of a network – even when nodes are identical.
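A toy growth model makes this concrete. The sketch below is a minimal preferential-attachment simulation in the spirit of the Barabási–Albert model (my choice of illustration, not necessarily the specific model referred to above): every node is identical, each newcomer attaches to one existing node with probability proportional to its degree, and hubs emerge anyway.

```python
import random

def grow_network(steps, seed=42):
    """Preferential attachment with identical nodes: each new node links to
    one existing node, chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    # Start from a single edge between nodes 0 and 1.
    endpoints = [0, 1]  # each node appears once per incident edge,
                        # so uniform sampling here is degree-proportional
    degree = {0: 1, 1: 1}
    for new_node in range(2, steps + 2):
        target = rng.choice(endpoints)
        endpoints += [new_node, target]
        degree[new_node] = 1
        degree[target] += 1
    return degree

deg = grow_network(1000)
# Hubs emerge: the best-connected node ends up far above the average degree,
# which stays just under 2 (each step adds exactly one edge).
```

No node is "better" than any other; the rich-get-richer dynamics of the growth process alone produce the hubs.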

Of course, you could build more sophisticated models, in which nodes are different from each other. That would make them even more realistic: indeed, people do have different abilities, and in many domains these abilities can be ranked. Clay Shirky’s blog posts are better than mine. He deserves to have more incoming links than I have. But here’s the thing: network math can explain rich, complex behavior by assuming identical nodes and focusing only on patterns of connectivity. In fact, that’s the whole point. As you make that move, your math gets much more elegant and tractable: you get a model building strategy that carries through to a very broad range of phenomena (networks of genes, of food ingredients in recipes, of intermediate goods in an economy, of relay stations in a power grid…). But most importantly, if you, like me, are ultimately interested in networks of humans, you find yourself staring at a counterintuitive, yet probably fundamental, conclusion:

Identity. Does. Not. Matter.

Or, more accurately, your pattern of connectivity – for modelling purposes – is your identity. In most models, you can start with identical nodes, add some randomness and watch the system create hubs of influence and power. Given how uncanny the predictive power of these models is, it is hard to escape the conclusion that they describe reality to some degree; in other words, that who we are is largely the product of chance and network math.

I find this thought beautiful and humbling, in a way that I can only describe as almost religious (even though I am not a believer in any faith). As I contemplate it, I feel somehow closer to my fellow humans, the powerful and connected as well as the weak and isolated. This may not sound like a very scientific conclusion, but I feel it is not a bad stance for social scientists and economists. Our disciplines can always use some extra empathy. Within the context of the Crossover project, I have been advocating for network analysis to be included in the toolkit of the modern policy maker; empathy is yet another argument for doing so.