Tag Archives: networks

Networks, swarms, policy: travels across the weird, dark landscape of 21st century policy making

I used to be an economist. Then, in the two-thousands, I started to read about complexity science. I chased an intuition telling me networks are important (it was 2009, and I still remember the epiphany when I saw the network analysis of interactions in Kublai) and started to study them. I was – still am – looking for a sort of Holy Grail: to design and build online communities that can deploy collective intelligence to attack problems too complex for individuals (even very smart ones) or small groups to crack. To burrow deeper into the issue I had to re-learn some linear algebra and probability theory; and that unlocked paths entirely new to me, dark passageways across computational biology and experimental psychology.

The landscape got really strange, a far cry from the orderly, well-lit architecture of standard economics. Quite dangerous, too: it is full of philosophical traps (if it really is collective intelligence, will we individuals be able to recognize it? Would that not be like a neuron trying to understand the brain?) and even moral dilemmas (it is possible that the well-being of a system implies sacrificing its components, just like a species evolves by killing off its weakest members: what happens if the system is society and we are its components? Do we sacrifice the whole or the parts?).

But here’s the craziest thing: I am not the only one wandering in this place, wherever it is. In the world of public policies, where I have worked for years, with every passing month I recognise new fellow travellers. I find myself talking of esoteric stuff like evolving networks, smart swarms, online ethnographies, variability engines. I feel like a sixteenth-century alchemist: we do stuff, and it seems to work; we are not quite sure why, but it works too well to be just random luck. We feel on the verge of an important discovery, something like the seventeenth-century scientific revolution. This weird, dark world is behind my talk at Personal Democracy Forum, held a month ago in Rome. If you want a taste, the video is below, both in the English original (left audio channel) and in Italian translation (right).

(Dedicated to Giulio Quaggiotto)

Algorithmic detection of specialization in online conversations

This is a writeup of the Team 1 hackathon at Masters of Networks 2. Participants were: Benjamin Renoust, Khatuna Sandroshvili, Luca Mearelli, Federico Bo, Gaia Marcus, Kei Kreutler, Jonne Catshoek and myself. I promise you it was great fun!

The goal

We would like to learn whether groups of users in Edgeryders are self-organizing in specialized conversations, in which (a) people gravitate towards one or two topics, rather than spreading their participation effort across all topics, and (b) the people that gravitate towards a certain topic also gravitate towards each other.

Why is this relevant?

Understanding social network dynamics, and learning to see the patterns in their structure, can give policy makers a useful tool to rethink the way policies are developed and implemented. Furthermore, it could ensure that policies reflect both the needs and the possible solutions put forward by people themselves. The ability to decode linkages between members of social networks, based on their areas of specialization, can allow decision makers and development organisations to:

  1. Tap into existing networks of knowledge and expertise to gain increased understanding of a policy issue and of the groups most affected (i.e. the target population of a policy)
  2. Identify pre-existing bottom-up (ideas for) solutions relevant to the policy issue at hand
  3. Bring together networks with a proven interest in a policy issue and leverage their engagement to design new solutions and bring about change

Compared to traditional models of policy development, this method can allow for more effective and accountable policy interventions. Rather than spending considerable resources on developing a knowledge base and building new communities around a policy theme, the methodology would enable decision makers and development organisations alike to tap into available knowledge bases and to work with these existing networks of interested specialists, saving time and resources. Moreover, pre-existing networks of specialists are expected to be more sustainable as a resource of information and collective action than ad-hoc networks built around emerging policy issues.

The data

Edgeryders is a project rolled out by the Council of Europe and the European Commission in late 2011. Its goal was to generate a proposal for the reform of European youth policy that encoded the point of view of youth themselves. This was done by launching an open conversation on an online platform (more information).

The conversation was hosted on a Drupal 6 platform. Using a Drupal module called Views Datasource, we exported three JSON files encoding information about, respectively, users, posts, and comments.
These data are sufficient to build the social network of the conversation. In it, users represent nodes; comments represent edges. Anna and Bob are connected by an edge if Anna has written at least one comment to a piece of content authored by Bob. We used a Python script with the Tulip library for network analysis to build the graph and analyze it. The result was a network with 260 active people and about 1600 directed edges, encoding about 4000 comments.
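A minimal sketch of this graph-building step, using the Tulip Python bindings. The JSON field names ("uid", "nid", "author", "name") are placeholders – the real Views Datasource export schema will differ – and edges are deduplicated, so any number of comments between the same pair of users yields a single directed edge:

    import json
    from tulip import tlp

    users = json.load(open("users.json"))
    posts = json.load(open("posts.json"))
    comments = json.load(open("comments.json"))

    graph = tlp.newGraph()
    label = graph.getStringProperty("viewLabel")

    # One node per user
    nodes = {}
    for user in users:
        n = graph.addNode()
        label[n] = user["name"]
        nodes[user["uid"]] = n

    # Map each post to its author
    post_author = {p["nid"]: p["author"] for p in posts}

    # One directed edge per commenter/author pair, however many comments they exchanged
    seen = set()
    for c in comments:
        pair = (c["author"], post_author[c["nid"]])
        if pair not in seen:
            seen.add(pair)
            graph.addEdge(nodes[pair[0]], nodes[pair[1]])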

To move towards our goal, we needed to enrich this dataset with extra information concerning the semantics of that conversation (see below).

What we did

To measure the degree to which people gravitate towards certain topics, and towards each other, we carried out “entanglement analysis” on a dataset containing all conversations carried out between members of the Edgeryders network. Entanglement analysis was proposed by Benjamin Renoust in 2013; we performed it using a program called Data Detangler (accessible at http://tulipposy.labri.fr:31497/).

1. Understanding Edgeryders as a social network of comments

These data can be interpreted as a social network: people write posts and comment on them; moreover, they can comment on other people’s comments. Within this dataset, each comment can be interpreted as an edge, connecting the author of the comment to the author of the post or comment she is commenting on. Alternatively, we could interpret the data as a bipartite network that connects people to content: comments are edges that connect their authors to the unit of content they are commenting on.

2. Posts are written in response to briefs

Each of the posts written on Edgeryders is a response to set briefs, or missions, that sit under higher-level campaigns. This means that many posts – and associated comments – live under the higher-level ‘topic’ of one of nine campaigns.

3. Keywords indexing briefs

In order to understand how the various topics and briefs connect to each other, we analysed the keywords that defined each mission/brief. This was carried out by manually analysing the significance of word frequency for each brief. Word frequency was ascertained using the in-browser software TagCrowd (http://tagcrowd.com/faq.html#whatis) to work out the top 12-15 words per mission. We then manually verified these words and kept those that are semantically relevant (removing, for example, names, words that were too general, and words that were a function of the Edgeryders platform itself – e.g. ‘comment’ or ‘add post’).
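The frequency-count step is simple enough to sketch in a few lines of Python; this is a rough stand-in for what TagCrowd does, with a made-up stopword list taking the place of the manual filtering:

    import re
    from collections import Counter

    # Made-up stopword list standing in for the manual filtering step
    STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "how", "you", "us", "by", "tell"}

    def top_keywords(text, k=15):
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(w for w in words if w not in STOPWORDS)
        return [word for word, _ in counts.most_common(k)]

    # Hypothetical brief text
    brief = "Tell us how you learn: education in school, open online learning, learning by doing."
    print(top_keywords(brief, k=3))  # ['learning', 'learn', 'education'] -- note: no stemming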

The combination of these three elements gives us a multiplex social network indexed by keywords. A multiplex social network is one where there are multiple relations among the same set of actors. The process is visualized in Fig. 1.

Fig. 1 – Building a multiplex social network where edges carry semantics.

4. Drop one-off interactions

We dropped edges that are linked to only one brief. These are edges of “degenerate specialistic” interactions: since the two users interact only in the context of a single brief, they are specialistic only by default.

5. Remove generalist conversations

At this point, we had a multiplex social network of users and keywords. Users were connected by edges carrying different keywords – indeed, each keyword can be seen as a “layer” of the multiplex network, inducing its own social network: the network of the conversation about employment, the network of the conversation about education, and so on. Many of the interactions going on are non-specialized: the same two users talk about several different things. In order to isolate specialized conversation, for each edge of the multiplex we removed all keywords except those that appear in all interactions between the two users. In other words, we rebuilt the network by assigning to each edge the intersection of the sets of keywords encoded in each of the individual interactions. In many cases the intersection is empty: it only takes two interactions happening in the context of two briefs with no keywords in common for this to happen. In this case, the edge is dropped altogether.
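A compact sketch of steps 4 and 5 (the data layout and values are invented for illustration): keep only edges spanning more than one brief, then reduce each edge’s keywords to the intersection across all of its interactions, dropping the edge when the intersection is empty.

    from collections import defaultdict

    # Toy data: each interaction is (commenter, author, brief, keywords of that brief)
    interactions = [
        ("anna", "bob", "brief-1", {"education", "open"}),
        ("anna", "bob", "brief-2", {"education", "employment"}),
        ("carol", "dan", "brief-3", {"mobility"}),   # single brief only:
        ("carol", "dan", "brief-3", {"mobility"}),   # a "degenerate specialistic" edge
    ]

    edges = defaultdict(list)
    for src, tgt, brief, keywords in interactions:
        edges[(src, tgt)].append((brief, keywords))

    specialized = {}
    for edge, hits in edges.items():
        if len({brief for brief, _ in hits}) < 2:    # step 4: one brief only -> drop
            continue
        shared = set.intersection(*(kw for _, kw in hits))
        if shared:                                   # step 5: keep shared keywords only
            specialized[edge] = shared

    print(specialized)  # {('anna', 'bob'): {'education'}}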

A nice side-effect of steps 4 and 5 is to greatly reduce the influence of the Edgeryders team of moderators on the results. Moderators are among the most active users; while this is as it should be, they tend to “skew” the behaviour of the online community. However, step 4 removes the one-off interactions they tend to have with users who are not very active; and step 5 removes the edges connecting moderators to each other because, by virtue of being very active, they interact with one another across many different briefs, so the intersection of keywords across all their interactions tends to be empty.

6. Look for groups of specialists

We then identified groups of specialists: users interacting with each other solely around a small number of keywords (in our example, n(keywords) = 2).
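Continuing the toy sketch from step 5, the group extraction is just a filter on the size of each edge’s surviving keyword set:

    # Users whose surviving edges carry at most max_k shared keywords
    max_k = 2
    specialists = set()
    for (src, tgt), keywords in specialized.items():
        if len(keywords) <= max_k:
            specialists.update([src, tgt])

    print(specialists)  # {'anna', 'bob'}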

Figure 2. Detecting specialized conversations on education and learning.

Conclusions

The method does indeed seem to be able to identify groups of specialists. “Groups” is used here in the social sense of a collection of people that not only write content related to the keywords, but interact with one another in doing so – this is to capture the collective intelligence dimension of large-scale conversations. Figure 2 shows some conversations between people (highlighted on the left) that only interact on the “education” and “learning” keywords (shown on the right). Highlighted individuals that are not connected to any highlighted edges are users who do write contributions related to those keywords, but are not party to specialized interactions on them.

Once a group of specialists is identified, the next step is to look for the keywords that co-occur on the edges connecting them. An example of this is Figure 3, which shows the keywords co-occurring on the edges of the conversations involving our specialist group on education and learning. The size of an edge on the right-hand part of the figure indicates that keyword’s contribution to entanglement, i.e. to making that group of keywords a cohesive one. Unsurprisingly, “education” and “learning” are among the most important ones. More interestingly, there is another keyword that seems to be deeply entangled with these two: “open”. We can interpret this as follows: specialized interaction on education and learning is deeply entangled with the notion of openness. The education specialists in this community think that openness is important when talking about education.

Figure 3. Discovering more keywords entangled with the original two in the specialized conversation.

This method is clearly scalable. It can be used to identify “surprising” patterns of entanglement, which can then be further investigated by qualitative research.

Scope for improvement

The main problem with our method is that it is quite sensitive to the coding by keyword. Assigning the keywords was done by way of a quick hack based on occurrence counts. The method should work much better with proper ethnographic coding. Note that folksonomies (unstructured tagging) typically won’t work, as they introduce a lot of noise into the system (for example, with no stemming you get a lot of false, “degenerate” specialists).


The dampened contagion: spreading memes in an economy of attention

I really enjoyed a recent paper by Nathan Hodas and Kristina Lerman called The Simple Rules of Social Contagion. It resonates strikingly with my own work. They start by asking themselves why “social contagion” (the spreading of memes) does not behave like contagion proper, as described by SIR models – in the sense that, for a given network of interactions, social contagion spreads slower than, and not as far as, actual epidemics. The way they answer this question is really nice, as are their results.

Their result is the following: social contagion effects can be broken down into two components. One is indeed a simple SIR-style epidemic model; the other is a dampening factor that takes into account the cognitive limits of highly connected individuals. The idea here is that catching the flu does not require any expenditure of energy, whereas resharing something on the web does: you had to devote some attention to it before you could decide it was worth resharing. The critical point is this: highly connected individuals (network hubs) are exposed to more information than less connected ones, because their richer web of relationships entails more exposure. Therefore, they end up with a higher attention threshold. So, in contagion proper, whenever the infection hits a network hub, diffusion skyrockets: hubs unambiguously help the infection spread. In social contagion, on hitting a hub, diffusion can still skyrocket if the meme makes it past the hub’s attention threshold, but it can also decrease if it does not. Hubs are both enhancers (via connectivity) and dampeners (via attention deficit) of contagion. This way of looking at things resonates with economists: their models work well only where there is a scarce resource (attention).
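A toy illustration of that qualitative point (my own sketch, not the paper’s model): SI-style spreading in which a node’s probability of adopting the meme falls with its number of neighbours, so that hubs amplify reach but dampen per-contact adoption.

    import random

    def spread(neighbours, seed, base_p=0.5, rounds=10, rng=None):
        """SI-style toy: adoption probability decays with the receiver's degree."""
        rng = rng or random.Random(42)
        infected = {seed}
        for _ in range(rounds):
            newly = set()
            for node in infected:
                for friend in neighbours[node]:
                    if friend in infected or friend in newly:
                        continue
                    # Attention dampening: more friends means more memes
                    # competing for attention, so lower adoption probability
                    if rng.random() < base_p / len(neighbours[friend]):
                        newly.add(friend)
            infected |= newly
        return infected

    # A star network: node 0 is a hub with four spokes. Seeded at a spoke,
    # the meme must pass the hub's dampened threshold to reach anyone else.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(len(spread(star, seed=1)))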

Their method is also sweet. They consider two social networks, Twitter and Digg. For each, they build an exposure response function, which gives the probability that a user exposed to a certain URL retweets it (Twitter) or votes for it (Digg). This function is in turn broken into two components: the visibility of incoming messages (exposures) and a social enhancement factor – if you know that your friends are spreading a certain content, you might be more likely to spread it yourself. So, the paper tracks down the visibility of each exposure through a time response function (the probability that a user retweets or votes for a URL as a function of the time elapsed since exposure and their number of friends). At the highest level, this is modeled as a multiplication: the probability of becoming infected by the meme, for an individual with n_f friends after n_e exposures, is the product of the social enhancement factor and the probability of finding n of the n_e exposures occurring during the time interval considered.

At this point, the authors do something neat: they model the precise form of the user response function based on the specific characteristics of the user interfaces of, respectively, Twitter and Digg. For example, on Twitter, they reason, the user is going to scan the screen top to bottom. Her probability of becoming infected by one tweet can reasonably be assumed to be independent of her probability of becoming infected by any other tweet. Suppose the same URL is exposed twice in the user’s feed (which would mean two of the people she follows have retweeted the same URL): then, the overall probability of the user not becoming infected is given by the probability of not becoming infected by the first tweet times that of not becoming infected by the second. For Digg, they model explicitly the social signal given by “a badge next to the URL that shows the number of friends who voted for the URL”. So, they are accounting for design choices in social software to model how information spreads across it – something I have myself been going on about for a few years now.
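In code, that independence assumption is just a product of survival probabilities (illustrative numbers, not the paper’s fitted values):

    # Each exposure fails to infect with probability (1 - p), so n_e
    # independent exposures infect with probability 1 - (1 - p) ** n_e.
    def infection_probability(p_single, n_exposures):
        return 1.0 - (1.0 - p_single) ** n_exposures

    # The same URL retweeted by two followees, each exposure infecting with p = 0.1:
    print(infection_probability(0.1, 1))  # ~0.10
    print(infection_probability(0.1, 2))  # ~0.19 -- repeated exposure helps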

This kind of research can be elusive: for example, Twitter is at its core a set of APIs that can be queried in a zillion different ways. Accounting for the user interfaces of the different apps people use to look at the Twitter stream can be challenging: the paper itself at some point mentions that “the Twitter user interface offered no explicit social feedback”, and that is not quite the way I perceive it. But never mind that: the route is traced. If you can quantify the effects of user interfaces on the spreading of information in networks, you can also design for the desired effects in a rigorous way. The implications for those of us who care about collective intelligence are straightforward: important conversations should be moved online, where stewardship is easy(-ier) and cheap(-er).

Noted: there are some brackets missing from equation (2).