
Is your online community sustainable? A network science approach


Recent years have seen an unmistakable drive by government agencies, local and regional authorities and public institutions in general to spawn online communities. For many reasons (the desire to harness collective intelligence online; the need for legitimacy through open participation; a top-down policy drive for modernization) they come, and will probably keep coming. This, however, raises the issue of funding. How much do public-sector funded online communities really cost? How do their running costs evolve over time? Some commentators think that citizen engagement online is very cheap to maintain – after all, citizen engagement is the civic equivalent of user generated content, and user generated content is made by users, for free. There can be significant costs to get one going, associated with purchasing and deploying technology and investing in its startup, but then one can relax, sit back and enjoy the flight.

My experience, and that of numerous colleagues, is that this is largely a myth. It is probably true for very large communities, in which even a tiny share of active users can drive a lot of activity, because even a tiny minority is still a lot of people in absolute numbers. But the communities typically spawned by government authorities are generally small: fewer than a thousand members for mobility in Milano, a few thousand for peer-to-peer collaboration on business plans for the creative industries in Italy, maybe a few tens of thousands somewhere else – nowhere near large enough for activity to sustain itself for free. I learned this the hard way, as administrative uncertainty almost crashed the previously buzzing community of Kublai.

But, if public policy-oriented online communities are not 100% self-sustaining, many do display some degree of sustainability – and therefore, all other things being equal, some cost advantage. This was certainly true of Kublai: almost three years of administrative uncertainty and false starts undermined it, but did not kill it. The community was still showing signs of vitality in July 2012, when the new team was finally hired. So, how can we measure the degree of sustainability?

An intuitive way to do it is to look at user generated content vs. content created by paid staff. It works like this: even if you have the best technology and the best design in the world, a social website is by definition useless if no one uses it. The result is that nobody wants to be the first to enter a newly launched online community. Caterina Fake, co-founder of the photo sharing website Flickr, found a clever workaround: she asked her employees to use the site after they had built it. In this way, the first “real” users that wandered in found a website already populated with people who were passionate about photography – they were also paid employees of the company, but this might not have been obvious to the casual surfer. So the newcomers stayed in and enjoyed it, making the website even more attractive for other newcomers, kickstarting a virtuous cycle. With more than 50 million registered users, Flickr presumably no longer needs its employees to stand in as users.

Let me share with you some data from Edgeryders. This project, just like many others, employs a small team of animators to prime the pump of the online conversation. Think of it as a blogging community with writing assignments: people participate by writing essays on the proposed topics, and by commenting on one another’s submissions. At the time I took the measurement (July 19th 2012) there were 478 posts with 3,395 comments in the Edgeryders database. The community had produced a vast majority of the posts – 80% exactly – and a much smaller majority of the comments – 55%. Over time, the community evolved much as one would expect: the role of the paid team in generating the platform’s content was much stronger at the beginning, and then declined as the community got up to speed. So, the share of community-generated content over the total is clearly increasing (see the chart above). Activity indicators in absolute terms also increased quite fast until June, then dropped in July as part of a (planned) break while the research team digests results. In this perspective, the Edgeryders community seems to display signs of being at least partly sustainable, and of its sustainability increasing. However, I would like to suggest a different point of view.

When talking about the sustainability of an online community, a relevant question is: what is it that is being sustained? In a community like Edgeryders (and, I would argue, in many others that are policy-oriented) it is conversation. The content being uploaded onto the platform is not a gift from the heavens; rather, it is both the result of an ongoing dialogue among participants and its driver. As long as the dialogue keeps going, it keeps materializing in the form of new content. So, a better way to look at sustainability is to treat the conversation as a network and ask what would happen to that conversation if the team were removed from it.

We can address this question quantitatively with network analysis. My team and I extracted network data from the Edgeryders database. The conversation network is specified as follows:

  • users are modeled as nodes in the network
  • comments are modeled as edges in the network
  • an edge from Alice to Bob is created every time Alice comments on a post or a comment by Bob
  • edges are weighted: if Alice writes 3 comments on Bob’s content, an edge of weight 3 is created connecting Alice to Bob
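
To make the specification concrete, here is a minimal sketch of this construction in Python with networkx – my choice of tooling for illustration, not necessarily what the project uses; the comment records are made up:

```python
import networkx as nx

# Hypothetical comment records: (commenter, author of the commented content)
comments = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "alice"),
]

G = nx.DiGraph()
for commenter, author in comments:
    if G.has_edge(commenter, author):
        G[commenter][author]["weight"] += 1  # one more comment: bump the weight
    else:
        G.add_edge(commenter, author, weight=1)

print(G["alice"]["bob"]["weight"])  # 3: Alice commented Bob's content three times
```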

Thus specified, the Edgeryders network in mid-July 2012 consists of 3,395 comments, and looks like this:

Colors represent connectedness: redder nodes are more connected (higher degree). What would happen to the conversation if we suddenly removed the contribution of the Edgeryders team? This:



I call this representation of an online community its induced conversation. It selects only the interactions that do not involve the members of the team – and yet it is induced in the sense that these interactions would not have happened at all if the community managers had not created a context for them to take place in.

Even from simple visual inspection, it seems clear that the paid team plays a large role in the Edgeryders conversation. Once you drop the nine people who, at various stages, received compensation to animate the community, all indicators of network cohesion drop. An intuitive way to look at what is happening is this:

  • the average active participant in the full Edgeryders network interacts directly with 6.5 other people (this means she either comments or receives comments from 6.5 other members on average). The intensity of the average interaction is a little over 2 (this means that, on average, people on Edgeryders exchange two comments with each person they interact with). Dropping the team members, the average number of interactants per participant drops to 2.4, and the average intensity of interactions to just above 1.5. Though most active participants are involved in the induced conversation, for many of them the team members are an important part of what fuels the social interaction. Dropping them is likely to significantly change the experience of Edgeryders, from a lively conversation to a community where one has the feeling of not knowing anyone anymore.
  • more than three quarters of active participants do interact with other community members. However, only a little more than one third of the interactions happen between non-team community members and do not involve the team at all. Notice how these shares are lower than the corresponding shares of community generated vs. team generated content.
  • 49 out of 219 non-team active members are “active singletons”: they do contribute user generated content, but they only interact with the Edgeryders team. Removing the latter means disconnecting these members from the conversation. There is probably a life-cycle effect at work here: new members are first engaged by the team, which then tries to introduce the newcomers to others with similar interests. This is definitely what we try to do in Edgeryders, and I have every intention of using longitudinal data to explore the life-cycle hypothesis at some later stage.
  • the average distance between two members is 2.296 in the full network, but increases to 3.560 when we drop the team. The team plays an important role in facilitating the propagation of information across the network, shaving off more than one degree of separation on average.
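
Here is a rough sketch of how indicators like these can be computed, under the same assumptions as the sketch above (networkx, the comment graph G; the team usernames are placeholders):

```python
import networkx as nx

def conversation_stats(G):
    """Average interactants per member, interaction intensity, and distance."""
    # Merge reciprocal directed edges into one undirected tie, summing weights
    U = nx.Graph()
    for u, v, w in G.edges(data="weight"):
        if U.has_edge(u, v):
            U[u][v]["weight"] += w
        else:
            U.add_edge(u, v, weight=w)
    avg_interactants = sum(d for _, d in U.degree()) / U.number_of_nodes()
    avg_intensity = sum(w for _, _, w in U.edges(data="weight")) / U.number_of_edges()
    # Average distance in hops, computed on the largest connected component
    core = U.subgraph(max(nx.connected_components(U), key=len))
    return avg_interactants, avg_intensity, nx.average_shortest_path_length(core)

TEAM = {"team_member_1", "team_member_2"}  # hypothetical usernames
induced = G.subgraph([n for n in G if n not in TEAM])

print(conversation_stats(G))        # the full conversation
print(conversation_stats(induced))  # the induced conversation: team removed
```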

From an induced conversation perspective, it seems unlikely that the Edgeryders community could be fully self-sustaining. The willingness of its members to contribute content rests at least in part on the role played by the team in sustaining the conversation, which makes the experience of participating in Edgeryders much more rewarding even in the presence of a small number of active users.

That said, it seems that the community has been moving towards a higher degree of sustainability. If we look at the share of Edgeryders active participants who take part in the induced conversation, as well as the share of all interactions that constitute the induced conversation itself, we find clear upward trends:


Based on the above, I would argue that these data can be very helpful in making management decisions that concern sustainability. If you find yourself in a situation like that of Edgeryders in July and you run out of funding, for example, my recommendation would be to “quit while you are ahead”: shut the project down in a very public way while participants have a good perception of it, rather than letting it die a slow death after its team is removed. On the other hand, if you are trying to achieve a self-sustainable community, you might want to target indicators like average degree, average intensity of interactions (weighted degree), average distance and rates of participation in the induced conversation, and try out management practices until you have established which ones affect your target indicators.

It’s trial and error, I know, but still a notch above the steering purely by gut feeling that prevails in this line of work. And it will get better, if we keep at it. Which is why I am involved in building Dragon Trainer.

See also: how online conversations scale. Forthcoming: another post on conversation diversity, all based on the same data as this one.

How online conversations scale, and why this matters for public policies

I care about public policies, and try to contribute to their betterment. The road I am exploring is to take advantage of the social Internet to connect citizens among themselves and with government institutions to assess governance problems, design solutions and implement them – all in a decentralized fashion. I wrote a book to show it has been done, and to argue for it to be done more.

But it remains a tough sell. Many decision makers remain skeptical: why should online conversations converge onto evidence-based consensus? A few people who share a common work method can make an effective group, but a large number of very diverse and self-selected citizens – what I have been arguing for – is likely to collapse under the weight of trolling, controversy and sheer information overload. We have examples in which this did not happen; but we don’t have a theory to guide us in designing conversation environments that produce the desired results. Not good enough.

Some work I have been doing recently might provide a lead. As the director of Edgeryders, I marveled at the uncanny ability of that community to process complex problems – as I had done many times before in my years as a participant in online conversations. But this time I had access to the database, and – together with my colleagues at the Council of Europe and the Dragon Trainer project – I used it to reconstruct a full model of the Edgeryders conversation as a network. The network works like this:

  • users are modeled as nodes in the network
  • comments are modeled as edges in the network
  • an edge from Alice to Bob is created every time Alice comments on a post or a comment by Bob
  • edges are weighted: if Alice writes 3 comments on Bob’s content, an edge of weight 3 is created connecting Alice to Bob

I looked at the growth over time of the Edgeryders network as defined above, taking nine snapshots at 30-day intervals, working backwards from July 17th 2012. For each snapshot I looked at three parameters:

  1. number of connected components (“islands” in the network)
  2. Louvain modularity of the network. This parameter identifies the network’s subcommunities and computes the difference between its subcommunity structure and what you would expect in a random network. Modularity takes values between 0 and 1: higher values indicate a topology that is unlikely to have emerged by chance, so they are the signature of some force giving the network its actual shape; low values mean that the breakdown into subcommunities is weak, and could well have emerged by chance.
  3. for modularity values indicating significance (above 0.4), the number of subcommunities into which the Louvain algorithm breaks the network down
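
Here is a sketch of how one snapshot can be scored on these parameters, assuming networkx 2.8 or later (which bundles a Louvain implementation) and a comment graph G built as in the previous post:

```python
import networkx as nx

def snapshot_params(G):
    """Connected components, Louvain modularity, number of subcommunities."""
    U = nx.Graph(G)  # treat the conversation as undirected for partitioning
    n_components = nx.number_connected_components(U)
    partition = nx.community.louvain_communities(U, weight="weight", seed=42)
    Q = nx.community.modularity(U, partition, weight="weight")
    # Only report a subcommunity count when modularity suggests significance
    n_subcommunities = len(partition) if Q > 0.4 else None
    return n_components, Q, n_subcommunities
```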

These indicators for Edgeryders agree that there is no partitioning in the network. All active members are connected in one giant component, whose modularity stays consistently low (around 0.2–0.3) throughout the period analyzed. This is not surprising: my team at Edgeryders had clear instructions to engage all newcomers in the conversation, commenting on their work (and thereby connecting them to the giant component). From a network perspective, the job of the team was exactly to connect every user to the rest of the community, and this means compressing modularity.

Next, I looked at the induced conversation: the network of comments that were neither written by nor directed towards members of the Edgeryders team. It includes the conversations that the Council of Europe got “for free”, without involving paid staff – and in a sense the most diverse, and therefore the most interesting, ones. To do this, I dropped from the network the nodes representing myself and the other team members and recomputed the three parameters above. Results:

  • there is a significant number of “active singletons”: active nodes that only talk to team members, not to each other. This might indicate a user life-cycle effect: when a new user becomes active, she is first engaged by a member of the paid team, who tries to facilitate her connection to the rest of the community (by making introductions etc.; my team has specific instructions to do this). The percentage of active singletons decreases over time, from about 10% to less than 5%.
  • not counting active singletons, there are several components in the induced conversation network. A giant component emerges in February; from that moment on, the number of components is roughly constant.
  • the modularity of the induced conversation network (excluding singletons) is high throughout the observation period (over 0.5).
  • the modularity of the giant component is also high throughout the period (over 0.5). Interestingly, modularity grows in the November–April period, indicating self-organization of the giant component. In February it crosses the 0.4 significance threshold.
  • the number of subcommunities in which the Louvain algorithm partitions the giant component also grows over time, from 3 in April to 11 in July
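
Under the same assumptions as the sketches above (networkx, the comment graph G, hypothetical team usernames), dropping the team and recomputing is a one-liner:

```python
TEAM = {"team_member_1", "team_member_2"}  # hypothetical usernames
print(snapshot_params(G.subgraph([n for n in G if n not in TEAM])))
```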

The Edgeryders induced conversation network

Subcommunities are color coded. Knowing Edgeryders and being part of its community (and having access to non-anonymized data), I can easily see that some of those subcommunities correspond to subjects of conversation. For example, the yellow group in the upper part of the graph is involved in a web of conversation about the Occupy movement and how to build and share a pool of common resources. Also, looking at the growth of the graph over time, subcommunities seem to grow sequentially rather than simultaneously. This might be related to the management structure of Edgeryders: we launched campaigns (roughly one every four weeks) to explore broad issues that have to do with the transition of youth to adulthood. Examples of issues are employment/income generation and learning. So, an interpretation could be this: each campaign summoned users interested in the campaign’s issue. These users connected to each other in clusters of conversation, and some of them acted as “bridges” across the different clusters, giving rise to a connected, yet highly modular structure. The video above has some nice visualizations of the network’s growth and of the most relevant metrics.

This looks very much like parallel computing (except this computer is made of humans), and could be the engine of scalability. As more people join, the online conversation does not necessarily become unmanageable: it can self-organize into clusters of conversation, increasing its ability to process a given issue from many angles at the same time. Also, this interpretation is consistent with the idea that such an outcome can be helped along by appropriate community management techniques.

Ten years ago, Clay Shirky warned us that communities don’t scale. He was right, by his own definition of community – which is what network science calls a clique, a structure in which everybody is connected to everybody else. I would argue, however, that his definition is not the most appropriate for online communities. Communities do scale, by self-organizing into structures of tight clusters only weakly connected to each other.

If we could generalize what happens in Edgeryders, the implications for online policies would be significant. It would mean we can attack almost any problem by throwing an online community at it; and that we can effectively tune how smart our governance is by recruiting more citizens, appropriately connected, into it. We at the Dragon Trainer project are following this line of investigation and developing tools for data-powered online community management. If you care about this issue too, you are welcome to join us on the Dragon Trainer Google Group; if you want to play with the Edgeryders data, you can find them on our Github repository.

Coming soon: posts about conversation diversity and community sustainability based on the same data.

Meet the dragon hatchling: herding emergent behavior in an online community (long)

This is not a post, but rather an essay, longer than my normal posts. It concerns the Dragon Trainer project, which I have already mentioned in this blog: together with my colleagues at the University of Alicante and the European Center for Living Technology, I am researching software for early-stage diagnosis of emergent social dynamics in online communities, and for helping community managers make informed decisions. I think this essay will be only the first of a series.

Why is this important?

For some time now policy makers have been fascinated by entities like Wikipedia: non-organizations, loose communities of individuals with almost no money and no command structure that manage, despite this apparent lack of cohesion, to collaborate every day in producing complex, coherent artifacts. Such phenomena are made even more tantalizing by the uncanny speed and efficiency with which they do what they do. Can we summon Wikipedia-like entities into existence, and order them to produce public goods? Can we steer them? Can we do public policy with them?

In order to do so, we will need to learn to craft policy in a new space, which – following Lane and others – we call the meso level. Such policies will not be targeted at individual behavioral change (micro policies), nor at the economy or society as a whole (macro policies). They will be targeted at achieving certain patterns of interaction within a large group of people. Individuals may and will move in and out of these patterns, just as individual water molecules move in and out of clouds; but this does not much affect the behavior of the cloud.

Operating at the meso level – running an online community of innovators, for example – entails managing a paradox. Structuring interaction among participants as a network of relationships, of which the participants themselves are the nodes, can result in extremely effective and rewarding participation, because – under certain circumstances – each participant is exposed to the information that is relevant to her without having to browse everything the community knows. This results in a very high signal-to-noise ratio from the participants’ point of view; they often report experiences of greatly enhanced serendipity, as they seem to stumble into useful information they did not know they were looking for, sent their way by other participants.

This extraordinary efficiency cannot be planned a priori by community managers, who – after all – do not and cannot know what each individual participant knows and wants to know. The desirable properties of networks as information sharing tools arise from the link structure being emergent from the community’s endogenous social dynamics. The paradox stems from the fact that endogenous social dynamics can and often do steer online communities away from their goals and into idle chitchat or “hanging out”, which seems to be the default attractor for large online networks. So, managers of communities of innovators need to let endogenous dynamics create a link structure that transports information efficiently across the network, while ensuring that the community does not lose its focus on helping members do what they participate in it to do.

Building Dragon Trainer: a case study

With this in mind, I joined forces with emergence theorists, network scientists and developers to build the prototype of Dragon Trainer, an online community management augmentation tool. It models an online community as a network of relationships, and uses network analysis as its main tool for drawing inferences about what goes on in the community. Generally speaking, community managers build knowledge of their communities by spending a lot of time participating rather than by formal analysis; and they act on that knowledge by resorting to a repertoire of steering techniques learned by trial and error. The error component in trial and error is usually fairly large, because by construction there is no top-down control in online communities; the community manager can only attempt to direct emergent social dynamics towards the result she sees as desirable. Control over the software does give her top-down control in the trivial case of prohibition: by disabling access, or comments, she can always dampen activity directly. What she cannot do without directing emergence is enhance activity – which is what online communities of innovators are for.

DT aims at augmenting this approach in two ways. Firstly, it allows the community manager to enrich the “local” knowledge she acquires simply by spending time interacting with the community. Such knowledge is extremely rich and fairly accurate for small communities, but it does not scale well as the network grows. Network analysis, on the other hand, scales well: computing network metrics on large networks is conceptually no harder than on small ones, though it can get computationally more intensive. In an ideal situation, a community manager might start using DT when her network is still small and she has a good informal understanding of what goes on in it simply by participating; she could then build a repertoire of recipes. We define recipes as formalisms that map changes in the mathematical characteristics of the network to social phenomena in the community represented by that network. Recipes of this kind enhance the community manager’s diagnostic abilities, and take the form:

Network metric A is a signature of social phenomenon B.

As she tries out different management techniques to achieve her desired results, she would then proceed to build more recipes, this time mapping from management techniques to their outcomes – the latter also being measured in terms of changes in the metrics of the network representing the community. Recipes of this kind enhance the community manager’s policy making, and take the form:

To get to social outcome C, try doing D. Success or failure would show up in network metric E.
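
To make the formalism concrete, here is one hypothetical way recipes of both kinds could be encoded – a sketch of the idea, not DT’s actual data model:

```python
from dataclasses import dataclass
from typing import Any, Callable

import networkx as nx

@dataclass
class DiagnosticRecipe:
    """'Network metric A is a signature of social phenomenon B.'"""
    metric: Callable[[Any], float]  # computed on the community's network
    threshold: float
    phenomenon: str                 # the social reading of the signal

    def fires(self, network) -> bool:
        return self.metric(network) >= self.threshold

@dataclass
class PolicyRecipe:
    """'To get social outcome C, try doing D; watch network metric E.'"""
    outcome: str
    technique: str
    indicator: Callable[[Any], float]  # success or failure shows up here

# Hypothetical example: high modularity as a signature of topic clustering
clustering_signal = DiagnosticRecipe(
    metric=lambda G: nx.community.modularity(
        G, nx.community.louvain_communities(G, seed=1)),
    threshold=0.4,
    phenomenon="the community has self-organized into conversation clusters",
)
```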

She might then be able to lean on repertoires of recipes of both kinds to run the network as it gets larger, because the software does not lose its ability to monitor those changes. These repertoires of correspondences are going to be built by integrating inputs from two different sources. The first one is theoretical: the systemic theory of emergence in the social world that some of my colleagues are engaged in developing. The second one is practical: the firsthand experience of community managers, myself included. Once built, the two repertoires will make up DT’s knowledge base, its computational intelligence core.

What follows is an account of a concrete case in which network science helped formulate a policy goal, whose attainment could then be monitored through, again, network analysis. It is only a small example, but we believe directed emergence is at work in it. And if emergence can really be directed, then yes, in principle public policies can happen at the meso level and become closer to Wikipedia.

Context

In March 2009 I was the director of an online community called Kublai, a project of the Italian Ministry of Economic Development. People use Kublai to develop business plans for creative industries projects or companies. At the time it was about 10 months old; we had about 600 registered members working on 80 projects. I directed a small team that did its best to encourage people to try and think out new things, and to help each other to do so. Most creatives find it hard to achieve critical distance from their pet ideas, and an external, disenchanted eye might help them become aware of weaknesses or untapped sources of strength. Even simple questions like “why do we need your project?” or “how do I know this is better than the competition?” can help.

These conversations happen online, in the context of a small, dedicated social network that used the Ning software-as-a-service platform. We customized Ning’s translation to change the name of the “groups” functionality into “projects”: the object in the database was still the same, but rhetorically we were encouraging people to come together to collaborate on a project’s business plan. Ning groups lent themselves well to the task, because each sports (1) a project wall for comments; (2) a forum for threaded discussion; (3) group-wide broadcast messaging at the disposal of the group creator. In March 2009 the largest projects/groups in Kublai had about 60-70 members.

Ruggero Rossi, like me, is passionate about self-organizing behavior in the social world. When he proposed to do his thesis by running a network analysis of the Kublai social graph, I supported him in every way I could. The thesis was supervised by David Lane, a complexity economist I admire – an added bonus.

March 2009: diagnosis

The first problem was to specify our network. We decided nodes would correspond to people: each user is a node. The links could be several things, since there are several types of relationships between members of a Ning social network: relationships might be created by adding someone as a friend, leaving a comment on her wall, sending her a message, joining the same group/project etcetera. We decided to focus on collaboration in writing business plans, which is Kublai’s core business; we also decided that, in the context of Kublai, only writing in the context of a group/project counts as collaboration.

So we defined the link as follows: Alice is connected to Bob if both have posted a comment on the same project. This is a somewhat bold assumption: positing communication between the two implies that everybody who ever posted anything within a project reads everything that is posted in that project. I thought that was reasonable in the context of Kublai, also given the short time frame in which the comments had been piling up. This implies a bidirectional relationship: in network parlance, the graph is called undirected, and its links are called edges. The edges are weighted: the edge connecting Alice to Bob has an intensity, or weight, equal to the number of comments posted by Alice on the project times the number of comments posted by Bob on the same project. If they collaborate on more than one project, we simply add up the weights of the links created across all projects on which they both post comments.
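
A minimal sketch of this construction, again in Python with networkx (my choice for illustration; project names and comment counts are made up):

```python
import networkx as nx
from itertools import combinations

# Hypothetical input: comments posted per project, per user
comments_per_project = {
    "project_a": {"alice": 4, "bob": 2},
    "project_b": {"alice": 1, "bob": 1, "carol": 3},
}

K = nx.Graph()  # undirected: collaboration is assumed bidirectional
for counts in comments_per_project.values():
    for (u, cu), (v, cv) in combinations(counts.items(), 2):
        w = cu * cv  # weight within one project: Alice's comments times Bob's
        if K.has_edge(u, v):
            K[u][v]["weight"] += w  # summed across all shared projects
        else:
            K.add_edge(u, v, weight=w)

print(K["alice"]["bob"]["weight"])  # 4*2 + 1*1 = 9
```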

Eventually Ruggero crunched the data and showed me his results, which boil down to this:

  • all active (posting) members of Kublai were connected in a giant component: there was no “island”.
  • a kernel of people who were highly connected to each other acted as Kublai’s hub, connecting each participant to everybody else in the network. All of the paid team members were part of this kernel: no surprise here. More surprisingly, many non-team members were also part of the kernel. So many, in fact, that if you removed all of the team members from the graph it would still not break down; everybody would still be connected to everybody else.

Summer-fall 2009: policy

This was an epiphany for me. I discussed these results with the team, and our interpretation was this: a core of dedicated community members was forming that was buying into Kublai’s peering ethics. They took time off developing their own projects to help others with theirs. This was a very good thing, in two ways.

  1. it implied efficiency. With more people participating in more than one project, Kublai could do a better job of transporting information from one project to another – and that is the whole point of the exercise. Alice is stuck on some issue with her project, and it turns out that Charlie, somewhere else in the network, has run into the same problem before. Alice does not know this, but she does collaborate with Bob, and Bob is a collaborator on Charlie’s project. So Bob can point Alice to what Charlie already did: Alice need not walk Charlie’s learning curve all over again.
  2. it implied resilience. If enough people do this, we thought, maybe the Ministry can turn Kublai over to the community, which will keep running it at little or no cost to the taxpayer. This would have created a public good out of thin air. Not bad!

So, we decided to encourage this self-organizing feature. How to do it? One way to go about it was to encourage people who were developing projects (progettisti) in particular to interact more. What could bring them together? Purely social stuff like football or celebrity discussion groups was not even considered: it would mar the informal, yet hardworking atmosphere of Kublai. According to my readings on the early days of online communities, one thing that any community loves to do is discuss itself. So we thought we would turn over some of the control over the rules of Kublai to the community, and we would put significant effort into it. We created a special group in Kublai, the only one at that point that was not a project, and called it “Club dei Progettisti”. Joining was unrestricted; also, we actively invited everybody in the kernel and everybody who had started a project up to that point. We did things like coordinating to welcome newcomers and discussing the renovation of the Second Life island we used for synchronous meetings. The atmosphere was that of an inner circle, the “tribe elders” of our community. This went on from about May to the end of 2009.

December 2009: policy assessment

Was the policy working? It was hard to say. The Club dei Progettisti grew to be by far the largest and most active group in Kublai, but that does not mean that people interacting more in that context would then go on to collaborate on business plans in the context of individual projects – which was our real goal. It did feel like we had a vibrant community going – but not all the time. And then, vibrant with respect to what? And how does vibrancy translate into effectiveness? We spent a lot of time online, and sailed by instinct. Instinct checked green, but let’s face it – past one thousand users and 150 projects it was hard not to lose the overview.

With another round of network analysis I would have been able to have a stab at policy assessment. In network terms, I wanted the kernel to be bigger: more people not from the Kublai team, collaborating across more projects, would facilitate information flow across projects and improve efficiency. But Ruggero had finished his thesis, and the administrative structure running Kublai was by this point so rigid that contracting him was next to impossible.

Only recently, two years later, did I get the chance to crunch an export of the Kublai database. We at the Dragon Trainer group extracted one snapshot of Kublai at March 23rd 2009 (the same day Ruggero scraped it for his thesis) and another at December 31st 2009.

In these two images the nodes, representing members of the Kublai community, are color coded according to a measure called betweenness centrality, which indicates how often a node sits on the shortest paths connecting other nodes (it is often interpreted as an indicator of brokerage efficacy). Yellow nodes are the least central and blue nodes the most central, with orange ones in an intermediate position; nodes representing the Kublai team employees (typically very central) have been dropped from the graph altogether. In March 2009, a handful of community members, fewer than ten, collaborated on several projects on a regular basis – and, as a result, did most of the brokerage of information across the network. By December, however, their number had roughly doubled, despite the fact that attaining orange or blue “status” required a lot more work (the most central node in the March network has betweenness centrality 1791; the most central one in the December network, 7740). At the end of the year, Kublai’s kernel was both larger and more connected than it had been in March. This growth is an emergent social dynamic: there is no top-down control in these graphs, as anybody I could tell “go form a link with X” has been dropped from the dataset. But this emergence is somehow directed: we wanted to get to a social arrangement whose graph looks like this.
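
For the curious, this is roughly how such a map can be produced, assuming networkx, the co-commenting graph K sketched above, and hypothetical team usernames:

```python
import networkx as nx

TEAM = {"team_member_1"}  # hypothetical; team nodes are dropped before scoring
H = K.subgraph([n for n in K if n not in TEAM])

# Betweenness centrality: how often a node lies on shortest paths between
# other nodes; raw (non-normalized) values, comparable to those cited above
bc = nx.betweenness_centrality(H, normalized=False)

# The most central members would get the "blue" end of the color scale
for member, score in sorted(bc.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{member}: {score:.0f}")
```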

You can see how powerful this thing is. We can already say something just by looking at the graph; we have not even started to crunch the numbers, let alone do more sophisticated things. We could (and we will) compute and compare measures of network centralization; respecify the network in many different ways, allowing for link impermanence (if Alice and Bob are linked but don’t keep interacting, after a while the edge fades out), bipartite networks (what about a people-projects graph?), directed graphs (links representing one-directional help rather than bidirectional collaboration: if Alice posts on Bob’s project, she is helping him, but Bob might not reciprocate); and play with the data in as many ways as we can think of.

We keep working on this, and we will continue to share our results and our thoughts. If you want to be a part of this effort, you can, and are absolutely welcome. Everything we do (the data, the code, the papers, even the mailing list) is open source and reusable. Download what takes your fancy and let us know what you are doing. We are looking forward to learning from you.

  • Download the raw data (database dump, anonymized).
  • Download and improve the code to export the data into file formats supported by the main network analysis software.
  • Download the exported data if you would like to jump right into the network analysis. .net files (Pajek projects) can also be imported into Gephi and Tulip.
  • Join our mailing list if you want to be involved in our discussion. Everyone is welcome; no technical background required. We are online community managers, mathematicians, coders and public policy practitioners, committed to being interdisciplinary and therefore to going out of our way to make anyone feel welcome.