Author Archives: Alberto

Photo: Marco Giacomassi

Missing out: why we don’t have a European open data community (yet)

The last weekend of March was SOD14, the second yearly gathering of the Spaghetti Open Data mailing list. The acronym in English may be awkward (it was just too funny to pass on!), but the event was just great. We had 182 people registered over the three days; attendance peaked at the conference on Friday 28, with 139 people in the room at the same time. About 100 people attended the hackathon Saturday 29 and the training session on Sunday 30. We produced 12000 tweets (and, being geeks, we archived them all). Everyone came on their own time and money.

The hackathon was spectacular: we had planned for four tracks, but so many people showed up that we ended up doing seven. We hacked things like data on goods confiscated from mafia bosses and the Open Knowledge Foundation’s open data census; we designed a sort of peer-to-peer service for civil servants wishing to release open data; there was a track for lawyers and one for civic monitoring.

Everything, from conference program to hackathon tracks, was built from the bottom up. Spaghetti Open Data is a community: it has no money, no corporate structure, no leaders, so it can’t help being bottom-up. SOD14 was completely organized by volunteers, though our host city of Bologna and its regional government stepped in with free venues, free coffee, flawless connectivity and two (community-designed and delivered) mini-courses, for a grand total of 1500 euro. The community provided video trailers, logos, jingles and ringtones, t-shirts, stickers and even superheroes; there was a very diverse attendance (data geeks, data lawyers, developers, data journalists, policy makers, even some open data archeologists), with a strong female presence. SOD14 had the playful energy of truly grassroots events. And when the event was over, people simply retreated to the mighty mailing list: at the time of writing, Spaghetti Open Data has three and a half years of life, 894 subscribers, 1,840 threads, an estimated 20,000 posts (well over 20 a day in 2014). It is far and away the largest open data resource in the Italian language.

So all was well, except that something was missing. There was no Europe in SOD14.

We did our best to stay in touch with our European brothers- and sisters-in-arms. We had our only keynote in English – with Wikimedia Germany’s Adam Shorland telling us about Wikidata. I personally called EPSI, DG CNECT’s initiative for promoting open data across the European Union, and asked them for support – not in the form of money, which we can’t accept anyway, but embodied in someone to come to our gathering and say “you are not alone, we are happy you are doing this work”. Even though we had updated and verified the EPSI scoreboard for Italy during 2013, nobody showed up at SOD14 to say “thank you” in person: they agreed to do so initially, but then they decided they were covered by Matteo Brunati, EPSI’s correspondent for Italy, present at SOD14.

Dear European Commission, as a European patriot and an open data activist, I feel it is my duty to let you know you’ve wasted an opportunity, and to advise you never to do that again. In SOD14 we were not discussing Italian open data problems. All our problems were at least European. For example, we had a fascinating session about open data in archeology and cultural heritage. Italy is hardly the only European country to deal with these kinds of issues; we are struggling with very conservative cultural institutions here, and could benefit a lot from comparing notes with people doing equivalent work in, say, Greece or France. That’s where you could have made a difference – but didn’t. I could give ten more examples like this from SOD14 alone, and so could you.

Matteo is a high-level civic hacker, and EPSI is very fortunate to have him on board. We, on the other hand, are his home community, and talk to him every day. There is no value added to our event if you just put a different hat on his head. The way you add value to Matteo’s European commitment is to dispatch him to events like ours in Estonia, Belgium or Ireland; and the way you add value to Italian events like SOD14 is to dispatch people like Matteo, but with experience in Denmark and Spain and Austria. It’s horizontal relationships that make a community. I know you know this, because you have been doing Erasmus-like stuff in many variants and for a long time. But horizontal relationships are slow to build, and no one is working on building them now – not even you. And so, things that should be taken for granted don’t happen. Why don’t we have civic hackers from across the continent cooperating in doing some open data project about the European elections? Because European civic hackers don’t get the chance to hang out together all that much. Even TweetYourMEP was built exclusively by Italians. So, there is no such thing as a solid European civic hacking community.

But don’t give up just yet. Europe played a key role in unlocking the supply side of the open data scene. The EPSI Directive was fundamental in nudging less data-friendly governments like ours onto the right path. Europeana is a great idea. You have done well on those fronts: why should you not do equally well in helping unlock the demand side of open data? A year ago, EPSI interviewed me and asked me “what do you think Europe should do around open data?”. And I replied “invest in the community. Give them free venues, free travel and something to do” (this video, at 6:08). I still think that would be the best way to use your EPSI infrastructure. Actually, tell you what: why don’t you go all the way and start an “Erasmus for Open Data” program. A few hundred international exchanges, with people from across the continent actually working together on data projects, would go a long way towards creating the small world network we need to be a community at the European level. Spaghetti Open Data stands ready to help. Are you game?

The Edgeryders team at the unMonastery: left to right, Matthias Ansorg, Nadia El-Imam, Alberto Cottica, Noemi Salantiu, Arthur Doohan, Ben Vickers. Photo: Sam Muirhead CC-BY

The business corporation as a symbiont to a community: Edgeryders crosses a watershed

Last week Edgeryders LBG, the company I co-founded, closed its first substantial deal. We are going to be working with the United Nations Development Programme, scanning the horizon in three countries (Armenia, Egypt and Georgia) in the hope of detecting trends that will shape our common future as they start to unfold. We are very excited: this is exactly the kind of cutting-edge work we aspire to do, and Giulio Quaggiotto and his posse at UNDP-CIS are exactly the kind of people we aspire to work with.

This deal marks a watershed in the Edgeryders trajectory. We were a joint project of the Council of Europe and the European Commission from launch in late 2011 to sunset at the end of 2012. In January 2013 some of us, enamoured with what we had come to see as a uniquely valuable community, stepped in and spun it off onto a newly built online platform. In May 2013 we founded a non-profit social enterprise, Edgeryders LBG, to provide the infrastructure and the sense of direction we felt were needed to keep the community together.

We wanted to do this by providing work opportunities to our community on the edge (many of us are close to uncontractable for various reasons: too young and inexperienced, too old, too minority, too anti-authoritarian, too inclined towards being self-taught rather than academic achievers…). And not just any work opportunities: meaningful ones, cutting-edge, high-risk, potentially world-changing, one-step-removed-from-crazy work opportunities. We want to be the skunkworks of the global society, the Foreign Legion of social innovation, the people that have little to lose, and so can afford to go to the ugliest places and take on the scariest work.

We would do this in part directly, by going out and pitching our community as a “distributed think tank” that swarms near-instantly around any interesting problem you throw at it; but the most innovative part of the model was that we would also help members of the community to provide those opportunities for themselves and each other. To secure this, we built our company so that it can serve as a vehicle for anyone in the community to use. This way, people would be able to quickly prototype their ideas without worrying about having to start a company: if they needed an incorporation they could simply use us as a “corporate shell”, an interface towards a world that understands corporations but not communities. Basically, anyone who wishes to do so (with minimal limitations) can put on an Edgeryders hat and talk to potential clients or funders as if representing the company – this makes us the first (to my knowledge) corporation without permission. When people launch a successful project, we simply hire them to run it: this is a process we describe as hiring yourself. Of course, we also informally try and help people with ideas and the will to work hard, mostly by connecting them with others in the community with relevant skills and experience.

We gave ourselves a year to find out whether this plan had a chance of working. We were not too worried – we had learnt our lesson from the tech industry so many of us gravitate around, and had made it really cheap to fail.

Three months to go to that deadline. Here’s where we are:

  1. On the corporate front, we have secured the UNDP contract. Two more contracts are in the pipeline, and we expect them to come through well before May.
  2. We have secured a deal with the Italian city of Matera to provide a (spectacular!) building and some seed funding for the world’s first unMonastery, a project of some visionary edgeryders led by Ben Vickers. After much preparation, unMonastery Matera went live on February 1st.
  3. We have served as a corporate shell for several community projects. Two of them succeeded in raising seed funding: these are Matthias Ansorg’s Economy App, winner of the first European Social Innovation Competition in 2013, and David Bovill’s Viral Academy, recipient of a Nominet Trust grant on digital innovation in 2014. I am confident that many more will come through, for reasons explained below. Another project just launched is Said Hamideh’s EdgeLance, a communication agency that leverages the unusual brand of creativity of many edgeryders to build cutting-edge communication services. Said, a professional freelance communicator, has chosen to wrap EdgeLance into the Edgeryders LBG corporate shell. News of more initiatives is coming in daily.
  4. Meanwhile, the community has thrived despite the end of the Council of Europe’s tenure. We have been able to organize, with no funding at all, the third Living On The Edge event, which gathered over 100 edgeryders from all over the continent in the (then unfinished) unMonastery premises. Over the past year, the community has gathered 700 new members and produced about 1,000 posts, wikis and tasks and well over 3,000 comments.

My conclusion: our proof of concept is done. Edgeryders can indeed be a viable business. But we are well aware that proving a concept is not the same thing as making it work in practice. We may be fast and smart, but incumbent consulting conglomerates are big, and scary. Can we really carve a niche for ourselves, expand it and keep the McKinseys, Accentures and Gartners of the world away from it?

Time will tell. But we do have one thing going for us: we are not a predator, we are a mutualistic symbiont to our community. We don’t just recruit the smartest people from the community; we hate digital sharecropping, and try very hard never to be the slightest bit exploitative. We invest in the community and serve it as best we can; we believe we can only be a viable business because we serve it. Investments in this community pay back tenfold, because it is so smart and fast as to be almost frightening. New conventions and tools continue to be proposed: some are adopted and spread, like the community call, the “call a human” button, the Twitterstorm, the Task Manager.

Among the potentially most significant are the FormStorm and its Recycling Bin, dreamed up by Ksenia Serova and her crew: the idea is to socialize application writing, helping each other take part in contests and competitions. This was tested very successfully with the European Social Innovation Competition: the community got together (virtually) and produced 13 applications (about 1% of the total applications submitted throughout Europe). Two of them, Giacomo Neri’s Moove and Epelia’s Food Supply Unchained, were shortlisted for the semi-finals (Lois is prototyping the latter in unMonastery Matera, another sign that a whole ecosystem is emerging from what we do). More, much more is cooking.

While many edgeryders are individually very smart, we believe this kind of performance to be an emergent property of the whole community, with its tools and its values. It is, truly, collective intelligence. And if this is what happens with fewer than two thousand registered users, we can only imagine how fast this crowd can move as that number scales to a mere twenty thousand. We can’t wait to find out.

A phrase from Chris Anderson’s famous article about the makers movement’s next industrial revolution comes to mind. In that article, he describes his own company, DIY Drones, as a typical small, family-run business, initially run from Anderson’s garage. Then he adds:

But the difference between this kind of small business and the dry cleaners and corner shops that make up the majority of micro-enterprise in the country is that we’re global and high tech. Two-thirds of our sales come from outside the US, and our products compete at the low end with defense contractors like Lockheed Martin and Boeing. Although we don’t employ many people or make much money, our basic model is to lower the cost of technology by a factor of 10 (mostly by not charging for intellectual property). [...] When you take an order of magnitude out of pricing in any market, you can radically reshape it, bringing in more and different customers.

This describes accurately what we are trying to do to consulting. We are tiny, barely starting to bootstrap from sweat equity, and yet we are already global – we are doing work in Armenia, Egypt, Georgia, Germany, Italy, the UK; we are negotiating deals in South Africa, Sweden, Uganda, the United Arab Emirates; we participate in conferences in places like Thailand and Montenegro (not to mention the fact that our community lives in 40 countries). We are resolutely open, both in content and in software, hence we don’t charge for intellectual property. And yes, we are cheap, and we aim to get people and orgs who do good work, but can’t afford to pay for standard consulting, to turn to us.

If you like this vision, you can help make it come true.

  • If you run a business, a public sector or a third sector organization, you can join UNDP as one of our “founding clients”: you will be an early adopter of our open consulting services, and we will strive to reward your belief in us by overdelivering and sharing with you our learning journey. If you wish to find out more about how this would work, just contact me.
  • If you are building a project for a better world, or want to collaborate on one, consider joining the Edgeryders community. Be sure to contact Noemi to say hello; she’ll help you make the most of the community.

We scan the horizon for UNDP, to discern the shadow of the future. But the feeling is very strong that a warm, glowing piece of future is right here.

The Edgeryders conversation network in December 2012

Farming online conversations: assessing moderators’ impact with panel data econometrics (long)

Online communities have been my Swiss army knife for the best part of a decade. I tend to throw an online community at every problem I face – and that’s not limited to work: even my wedding party was organized that way.

We all know online communities have interesting properties, but how much can we control them? Can we “farm” them, growing one around each problem we are interested in? How expensive is this likely to be? These questions are relevant for my work, because we can’t use them as tools, as I tend to do, without some degree of control over them. I simply need to know if my intuition of online communities as general purpose collective intelligence tools is grounded, or if I am just delusional. As part of obsessing over this problem, two years ago I started a semi-structured research project, which is supposed to become a Ph.D. thesis. Some of the results are now in: the executive summary is that there seems to be evidence for our ability to artificially grow an online conversation about a specific problem.

Data source: the Council of Europe’s Edgeryders

I am using data from a project called Edgeryders. It was meant as a “distributed think tank”, an attempt to grow and harvest a large-scale online conversation around the task of building a proposal for reforming European youth policy. The project was led by the Council of Europe and co-funded by the European Commission; it took place between late 2011 and the end of 2012. I was its director, as I worked at the Council of Europe at the time.

Edgeryders was based on an online interactive platform; its database is the source of all data described in this post. It worked like this: my team would ask research questions grouped by topics (topics were broad issues like employment and income generation, education and learning etc.). Anyone was free to join and provide answers in the form of blog posts. All such blog posts were commentable for validation. By the end of the exercise, the platform had about 1200 registered users. Most were “lurkers” who never wrote anything on the platform. Those that did contribute were 260; collectively they wrote 500 posts and over 4000 comments. This material was analyzed by ethnographers, who used it to construct a publication on youth issues as seen by the youth themselves. The researchers obviously found Edgeryders material relevant for the task at hand. This, however, does not per se prove that we could really “farm” an online conversation around the issue of youth policy: maybe we simply intercepted a need to discuss the issue, and our efforts to grow and steer the conversation were in vain, or even counterproductive.

A small team of moderators devoted part of their time to encouraging users to share and discuss their point of view using positive reinforcement. This was a policy in the strict sense of the word: they were paid and instructed to engage users, especially first-time ones (“Hey, X, this is really interesting!”); ask them questions conducive to extracting issue-relevant information from the conversation; and connect them with other users with similar interests or approaches. Sometimes this would happen spontaneously, as the members of the Edgeryders community engaged one another; but when this did not happen by itself within hours of somebody writing a new post, the team of moderators was tasked with “breaking the ice.”

Modelling strategy

To assess the impact of the policy on the conversation, we proceed as follows. First, we model the Edgeryders conversation as a social network, whose nodes are active users and whose edges are comments. The network is directed: user i is connected to user j if i has commented on some content (one or more posts or comments) written by j (comments are threaded in Edgeryders, so users can comment on another comment). This was done by

  • using Datasource on the (Drupal) Edgeryders platform to extract relevant information from the database in JSON form;
  • writing a Python script to read the data and build the Edgeryders network, in conjunction with Tulip for graph analysis.
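The two steps above can be illustrated with a minimal sketch. It is not the original script: the JSON record layout (`commenter`, `author`, `date`) and the user names are hypothetical, since the post does not describe the export format, and it uses only the Python standard library rather than Tulip.

```python
import json

# Hypothetical export: one record per comment, naming who commented,
# whose content was commented on, and when.
SAMPLE_JSON = """
[
  {"commenter": "anna", "author": "bob",  "date": "2011-11-02"},
  {"commenter": "bob",  "author": "anna", "date": "2011-11-05"},
  {"commenter": "anna", "author": "bob",  "date": "2011-11-20"},
  {"commenter": "carl", "author": "anna", "date": "2011-12-01"}
]
"""

def build_comment_network(raw_json):
    """Return (nodes, edges) for the directed comment network.

    Each edge is a (source, target, date) tuple: source commented on
    content written by target. Parallel edges are kept, one per comment.
    """
    records = json.loads(raw_json)
    nodes = set()
    edges = []
    for rec in records:
        source, target = rec["commenter"], rec["author"]
        nodes.update((source, target))
        edges.append((source, target, rec["date"]))
    return nodes, edges

nodes, edges = build_comment_network(SAMPLE_JSON)
```

Keeping one dated edge per comment preserves the parallel edges and makes it possible to cut the network by time period later.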

With dynamic network analysis still in its infancy, the main challenge to address this question was how to capture the time dimension of the data. I adopted the approach of “slicing” the course of the project into 57 one-week time periods, ranging from late October 2011 to December 2012. For each period I extracted the corresponding subgraph, using the following conventions:

  • “edges don’t die”. Let user i comment user j‘s content for the first time in period t. The edge from i to j appears in all graphs described from period t onwards. The interpretation of this convention is that, by interacting, i and j transform their relationships (among other changes, i is giving j a more or less explicit permission to interact with him or her) and this transformation is permanent in the context of the project (about one year).
  • nodes appear from the period in which they first create their Edgeryders account, even if they will only write their first contribution in subsequent periods. Nodes corresponding to not-yet active users will of course show up in the network as singletons, whereas nodes corresponding to users that will never become active are simply dropped from the network.
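A sketch of how these two conventions could be applied follows. The start date, the function names and the sample users are assumptions made for illustration (the post only says the first period falls in late October 2011), and the set of users who ever become active is passed in explicitly.

```python
from datetime import date

# Assumed start of period 0; the post only says "late October 2011".
START = date(2011, 10, 24)

def period_of(day, start=START):
    """Zero-based index of the one-week period that contains `day`."""
    return (day - start).days // 7

def slice_network(signups, ever_active, comments, t):
    """Return (nodes, edges) of the subgraph for period t.

    signups:     {user: date the account was created}
    ever_active: users who contribute at some point in the project
    comments:    list of (source, target, date) tuples
    """
    # Nodes appear from their signup period; never-active users are dropped.
    nodes = {u for u, d in signups.items()
             if u in ever_active and period_of(d) <= t}
    # "Edges don't die": keep every edge created in period t or earlier.
    edges = [(s, g, d) for s, g, d in comments if period_of(d) <= t]
    return nodes, edges

signups = {
    "anna": date(2011, 10, 25),
    "bob": date(2011, 10, 30),
    "carl": date(2011, 11, 10),
    "dora": date(2011, 11, 1),   # signs up but never contributes
}
ever_active = {"anna", "bob", "carl"}
comments = [
    ("anna", "bob", date(2011, 11, 2)),
    ("carl", "anna", date(2011, 11, 20)),
]
```

In this toy data, `slice_network(signups, ever_active, comments, 2)` contains carl as a singleton: his account exists by period 2, but his first comment only arrives in period 3.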

The final subgraph has 260 nodes and 4041 edges, many of them parallel to each other (same source and destination, different dates).

Once I had the slices, I needed a model of individual user behaviour that I could run against the data. The natural thing to do, with network data, would be to stay with a network approach, and estimate

 P(e_{i,j,t} \mid S_t) \quad \forall i,j \in N

where N is the set of nodes, P(e_{i,j,t} \mid S_t) is the probability of user i forming an edge e_{i,j} to user j at time t, and S_t denotes the state of the system in terms of non-network variables at t. Unfortunately, this is not computationally viable. So, I collapsed network information into a vector of variables attached to each user, and allowed it to vary over time. This transforms the problem into one of estimating

 A_{i,t} = f(A_{j \ne i, t}, EgoNetwork_{i,t}, GlobalNetwork_{i,t})

where:

  • A_{i,t} is the activity of user i at time t
  • A_{j \ne i, t} is the activity of other users at t
  • EgoNetwork_{i,t} is a vector of ego network variables (for example in-degree and clustering coefficient)
  • GlobalNetwork_{i,t} is a vector of global network variables (like density and modularity).
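To make the "flattening" concrete, here is a minimal sketch that computes, for one time slice, two ego network variables (in-degree and clustering coefficient) and one global variable (density). It is an illustration, not the code used for the study. Three choices are assumptions: parallel edges are collapsed before counting, in-degree counts distinct commenters, and the clustering coefficient is taken on the undirected projection of the graph.

```python
from itertools import combinations

def ego_and_global_metrics(nodes, edges):
    """Compute per-user and whole-network metrics for one time slice.

    nodes: iterable of users; edges: iterable of (source, target) pairs,
    possibly repeated. Every endpoint is assumed to be listed in `nodes`.
    Returns (in_degree, clustering, density).
    """
    nodes = set(nodes)
    # Collapse parallel edges and ignore self-loops.
    unique = {(s, t) for s, t in edges if s != t}

    in_degree = {u: 0 for u in nodes}
    neighbours = {u: set() for u in nodes}   # undirected projection
    for s, t in unique:
        in_degree[t] += 1
        neighbours[s].add(t)
        neighbours[t].add(s)

    clustering = {}
    for u in nodes:
        k = len(neighbours[u])
        if k < 2:
            clustering[u] = 0.0
            continue
        # Count pairs of u's neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(neighbours[u], 2)
                    if b in neighbours[a])
        clustering[u] = 2 * links / (k * (k - 1))

    n = len(nodes)
    density = len(unique) / (n * (n - 1)) if n > 1 else 0.0
    return in_degree, clustering, density

in_degree, clustering, density = ego_and_global_metrics(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("d", "a")],
)
```

Running this once per weekly slice yields one row of regressors per user and period, which is the shape a panel estimator expects.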

Once ego- and global network metrics have been computed for each time slice, this problem is tractable by panel data statistical techniques. Incidentally, notice that the relatively small size of each time slice – only one week – was chosen to capture the signal of the many lagged variables I had to use to avoid endogeneity issues. This, however, has the flip side of making most users inactive in most periods (13710 observations out of 14820 take value zero). This makes the dependent variable almost, but not quite, binary (taking value “nothing” in most cases and “something” in the rest). Therefore, the estimate was computed using a negative binomial model with fixed effects. This model estimates mostly the effect of regressors on activation (the probability of users becoming active), but – unlike fully binary models like logits – it also uses the extra information encoded in users writing more than one post or comment in a given period.
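The shape of the panel can be checked with a few lines of arithmetic: 260 active users times 57 weekly periods gives the 14820 observations mentioned above, of which about 92.5% are zeros. The sketch below also shows one way to pair each observation with a lagged regressor. The function names, the one-period lag and the toy data are assumptions for illustration; the post does not say which lags were used.

```python
N_USERS, N_PERIODS = 260, 57          # figures reported in the post
N_OBSERVATIONS = N_USERS * N_PERIODS  # 14820 user-period observations
ZERO_SHARE = 13710 / N_OBSERVATIONS   # roughly 0.925

def lagged_panel(activity, received, lag=1):
    """Build rows (user, t, activity at t, comments received at t - lag).

    activity and received map each user to a list with one count per period.
    The first `lag` periods of each user have no lagged value and are skipped.
    """
    rows = []
    for user, counts in activity.items():
        for t in range(lag, len(counts)):
            rows.append((user, t, counts[t], received[user][t - lag]))
    return rows

def share_of_zeros(rows):
    """Fraction of rows whose dependent variable (activity) is zero."""
    return sum(1 for _, _, y, _ in rows if y == 0) / len(rows)

# Toy data: two users observed over four periods.
activity = {"anna": [0, 2, 0, 1], "bob": [0, 0, 0, 0]}
received = {"anna": [1, 0, 3, 0], "bob": [0, 0, 1, 0]}
rows = lagged_panel(activity, received)
```

Rows of this kind would then be handed to a count-data estimator; fitting the negative binomial model itself is outside the scope of this sketch.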

An even more fundamental flip side is that this “flattening” of the network into a vector loses key information about the identity of who is connected with whom. In this first iteration, I kept track of whether inbound and outbound comments for each user come from moderators or non-moderators. Other refinements can be added.

Results

Preliminary data exploration shows evidence of structural change in early April 2012 (period 21). This corresponds to the date when a major Edgeryders conference, to be held in June the same year, was announced. People involved in the project confirm that this announcement was a game-changer in the project, as it made the Edgeryders proposition to its community much clearer and more credible – we referred to it as “addressing the what’s-in-it-for-you question”. We promised to cover travel expenses for about 50 contributors to the platform to get together in Strasbourg and flesh out the policy document that the Council of Europe would then advocate with respect to the European Commission and its own member states. We could spot structural differences across three subsets of the data: the first one describing activity in the interval before period 21; the second one describing activity of “old” users (who became active before period 21) after period 21; and the third one describing activity of users who became active in periods 21 and later.

The results of our negative binomial estimation for the whole dataset and for each of the three subsets are visible below.

To a first approximation, these results hold two lessons.

  1. Policy works. In all subsets as well as in the whole dataset, receiving comments from moderators as well as from non-moderator community members has a positive and strongly significant impact (p-value < 0.01). This result is unambiguous.
  2. The network’s shape influences activity. All models show strongly significant influence of some variables capturing the shape of users’ ego networks as well as the global network. This is nontrivial, because some of these variables, like modularity, cannot be perceived directly, even by the most attentive users, without access to the database and network analysis software. However, these results are (still) ambiguous, and not consistent across subsets of data.

Implications

Result 1 has clear implications for online community managers running collective intelligence exercises. It says that user activity propagates across the conversation network, with each user receiving an impulse from in-neighbours and retransmitting it with its own input added (with some probability) to its out-neighbours. This is consistent with this paper by Nathan Hodas and Kristina Lerman, who find that contagion models explain well the spreading of information across online social networks after accounting for meme visibility as constructed by the designers of the online social networks themselves. The overall picture is reminiscent of neural networks, a topic about which I know little; it might be worth exploring this similarity further, as neural networks are in a sense a natural modelling choice for collectively intelligent organizational arrangements.

Contagion in a random directed network. Starting from the bright red node in the center (the moderator), the signal travels across the network.


In a policy perspective, the result is saying that you can indeed “farm” an online conversation by deploying one or more moderators/animators to interact with users. It works like this: a moderator engages directly some users; this increases the probability that they will become active; if they become active, this increases the probability that these first users’ out-neighbours will also become active and so on. This confirms anecdotal evidence: for example, as Shirky relates (source), photo sharing web 2.0 company Flickr deployed its own employees as the first users of the website, so that, when unpaid users came online, they found a lively conversation going on. Caterina Fake, Flickr’s co-founder, reportedly remarked “You have to greet the first ten thousand users personally”.

For online community managers, such a policy has two attractive properties.

  1. Firstly, there is a sort of multiplier effect: moderators activate users, whose activity in turn activates other users – remember, the coefficients on both comments received from moderators and comments received from non-moderators are strongly significant.
  2. Secondly, it does not matter which users in particular are targeted by moderators initially, because the activation signal propagates across the network according to some contagion model (video). To a first approximation, propagation will be limited only by the size of the strongly connected component that the signal starts in (though of course the size and topology of the giant component are themselves endogenous to user activity; by becoming active, users can choose to activate new edges). Most real-life social networks tend to evolve towards a topology that features a giant component gathering a large share of all participants. In December 2012, the Edgeryders conversation network featured a giant strongly connected component that gathered more than 50% of active users. 100% of active users were part of a weakly connected giant component.
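The multiplier effect described in these two points can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the model estimated above: it uses a simple independent-cascade rule in which each newly active user activates each out-neighbour with a fixed probability `p`, and the function name, the graph and the value of `p` are all hypothetical.

```python
import random
from collections import deque

def simulate_contagion(out_neighbours, seed_node, p, rng):
    """Spread activation from seed_node along directed edges.

    out_neighbours: {user: list of users that user has commented on}
    p:              assumed probability that an active user activates
                    a given out-neighbour
    rng:            a random.Random instance, for reproducible runs
    Returns the set of users that end up active.
    """
    active = {seed_node}
    queue = deque([seed_node])
    while queue:
        user = queue.popleft()
        for neighbour in out_neighbours.get(user, ()):
            if neighbour not in active and rng.random() < p:
                active.add(neighbour)
                queue.append(neighbour)
    return active

# Toy network: the moderator comments on anna; anna and bob comment on
# each other; carl comments on anna but nobody comments on carl.
graph = {
    "moderator": ["anna"],
    "anna": ["bob"],
    "bob": ["anna"],
    "carl": ["anna"],
}
everyone_reached = simulate_contagion(graph, "moderator", 1.0, random.Random(0))
nobody_reached = simulate_contagion(graph, "moderator", 0.0, random.Random(0))
```

With `p = 1.0` the signal reaches every user reachable from the moderator (here anna and bob, but not carl); with lower values of `p` it dies out earlier, which is why the connectivity of the network matters as much as the moderators' initial effort.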

So, the model’s implications for the community managers wishing to “farm” an online conversation around a specific topic can be translated into two simple rules:

  1. Deploy moderators to engage users. This policy increases activity.
  2. Try to connect users with other users. This policy increases connectivity and, critically, the size of the network’s giant component. Moderators in Edgeryders were instructed to do just that, trying to match new users to existing ones on the basis of their first contributions (“Hey, Anna, this is really interesting! Something similar, but with a very different slant, has been proposed by Bob at this link…”).

The more activity and connectivity there are in the network, the less the conversation needs moderator effort to keep going. This is not so intuitive, and indeed online community managers tend to think it is the number of users, not the number of connections among them, that drives a conversation towards being self-sustaining. Again, if true this is excellent news for people who, like me, deploy online networks to attack problems by a collective intelligence approach, because it implies that you can get nearly self-sustaining (hence cheap to maintain) conversations even with relatively small networks, in the hundreds or low thousands of active users. This is because the potential connections in a graph grow quadratically with respect to the number of its nodes, so for a given level of performance, moderators can reduce costs both by attracting new users and by connecting existing users to each other.
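The quadratic growth is easy to check numerically. In a directed network without self-loops or parallel edges, n users can form at most n(n-1) distinct connections; the short sketch below (the function name is my own) applies this to the 260 active users of Edgeryders and to a network twice that size.

```python
def potential_edges(n):
    """Maximum number of distinct directed edges among n nodes (no self-loops)."""
    return n * (n - 1)

edgeryders = potential_edges(260)   # 67340 potential connections
doubled = potential_edges(520)      # 269880 potential connections
growth = doubled / edgeryders       # about 4: doubling users quadruples potential edges
```

Doubling the number of users roughly quadruples the room for connections, which is the sense in which connecting existing users and attracting new ones both raise the ceiling on self-sustaining activity.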

What to monitor?

In the context of the CATALYST project, I am involved in an effort to build easy-to-install social network analysis software for common CMSs. Based on the above discussion, what should this software monitor? I would suggest:

  1. a visualization mode that makes it easy to tell the activity of moderators from that of non-moderators.
  2. the number of strongly connected components, making sure that moderators engage each one.
  3. the size of the largest strongly connected component, both absolute and relative to the conversation network, as an indicator of conversation self-sustainability.
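As a rough illustration of indicators 2 and 3, the sketch below finds strongly connected components with Kosaraju's two-pass algorithm and reports their number and the size of the largest one. It is not CATALYST code: the function names and the toy graph are hypothetical, and a production tool would more likely rely on an existing graph library.

```python
def strongly_connected_components(nodes, edges):
    """Return the strongly connected components as a list of sets (Kosaraju)."""
    out_nb = {u: [] for u in nodes}
    in_nb = {u: [] for u in nodes}
    for s, t in edges:
        out_nb[s].append(t)
        in_nb[t].append(s)

    # First pass: iterative DFS, recording nodes in order of completion.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(out_nb[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(out_nb[nxt])))
                    break
            else:
                # All successors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Second pass: walk the reversed graph in reverse completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for prev in in_nb[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

def monitoring_summary(nodes, edges):
    """Number of components, plus absolute and relative size of the largest."""
    components = strongly_connected_components(nodes, edges)
    largest = max((len(c) for c in components), default=0)
    share = largest / len(nodes) if nodes else 0.0
    return {"n_components": len(components),
            "largest_size": largest,
            "largest_share": share}

summary = monitoring_summary(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "d")],
)
```

In the toy graph, a, b and c comment on each other in a cycle and form one component, while d and e are components of size one; these small components are the ones a moderator would want to engage.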

What do you think?

I wish to acknowledge the role of the Council of Europe in launching the project and hiring me to lead it; of the University of Alicante, the INSITE project and IUAV in supporting my research; and to thank Giovanni Ponti, Luigi Di Prinzio, Guy Melançon, Benjamin Renoust and Raffaele Miniaci for their invaluable help and generosity. Financial support from the Spanish Ministerio de Economía y Competitividad (ECO2012-34928) is gratefully acknowledged.