Category Archives: Network Notebook

Semi-structured notes on my Ph.D. thesis work on network science. For an intro, see

The Horizon 2020 tribes. Partnership building and network assortativity in European research funding

Highly innovative economies are characterised by intense cooperation between academia and industry. It makes sense: university researchers are good at discovery and invention, industry engineers are good at product and business development. Together, they have more chances of coming up with innovative products and bringing them to market. So, many governments would like to see more of it. They have rolled out policies to encourage academics and business people to work together across the culture chasm.

Horizon 2020 is one such policy. With its 80 billion Euro budget, it is the European Union’s flagship research and innovation funding programme. It is an interesting point of observation on cooperation between industry and academia because of its size, and also because it grants funding not to individual organisations, but to consortia. Each consortium is an opportunity for academia and industry to work together. To what extent do European universities and companies seize those opportunities? How effective is Horizon 2020 in bringing together academia and industry?

With my sisters- and brothers-in-arms in the Spaghetti Open Data community, we have tried to address these questions. We started this work as a hackathon track at Open Data Fest, in June 2017. Here’s what we did and what we found out.

What we did

  1. Fortunately, the data on funding under Horizon 2020 are open. We downloaded the CORDIS dataset from the European Open Data Portal. Our dataset includes 16,592 organisations and 11,068 projects.
  2. We used them to induce a network. Its nodes are the 16,592 organisations. Two organisations are connected by an edge if they participated in at least one project together. There turn out to be 493,014 edges in this network.
  3. We filtered the network to keep what we call “stable partnerships”. Two organisations have a stable partnership if they participated together in at least two Horizon 2020 projects. Organisations with no stable partners were dropped. This yielded a network with 3,414 nodes and 46,632 edges. It is important to note that, for computational reasons, each connected pair of organisations (A, B) is represented by two edges: one from A to B and one from B back to A. Edges can be interpreted as decisions to build a stable partnership: A has decided to participate in more projects in which B is present, and B has made the same decision with regard to A.
  4. CORDIS data distinguish between five types of organisations: private companies (PRC), higher education establishments (HES), research organisations (REC), public sector (PUB) and others (OTH). With this information, we could look at the patterns of partnership generation within and across types of organisations.
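The logic of steps 2 and 3 can be sketched in a few lines of Python. The project-to-participants mapping below is invented for illustration, and the real CORDIS CSV layout differs; this is a minimal sketch of the co-participation counting and stable-partnership filtering, not our actual scripts.

```python
# Minimal sketch of steps 2-3, assuming a simplified CORDIS extract in
# which each project maps to the set of participating organisation IDs.
from itertools import combinations
from collections import Counter

def stable_partnership_network(projects):
    """projects: dict mapping project id -> set of organisation ids.
    Returns the organisations and edges of the 'stable partnership'
    network: pairs that co-participated in at least two projects."""
    pair_counts = Counter()
    for orgs in projects.values():
        # Each project connects every pair of its participants.
        for a, b in combinations(sorted(orgs), 2):
            pair_counts[(a, b)] += 1
    # Keep only pairs that co-participated in >= 2 projects.
    stable = {pair for pair, n in pair_counts.items() if n >= 2}
    # Organisations with no stable partners are dropped.
    nodes = {org for pair in stable for org in pair}
    return nodes, stable

projects = {
    "P1": {"uniA", "firmB", "labC"},
    "P2": {"uniA", "firmB"},
    "P3": {"labC", "firmD"},
}
nodes, edges = stable_partnership_network(projects)
# Only the (firmB, uniA) pair co-occurs twice, so it is the single
# stable edge; labC and firmD are dropped as having no stable partner.
```

On the real data, the same counting over 11,068 projects yields the 3,414-node, 46,632-edge network described above (with each undirected pair stored as two directed edges for computational reasons).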

What we learned

Organisations in Horizon 2020 show a marked preference for partnering with other organisations of the same type. This pattern of behaviour is called assortativity, and is common in many social networks. However, it plays out in very different ways across different types of organisations.

Type   Same-type edges, actual (%)   Same-type edges, random (%)   Difference (points)
PRC                45                            40                        +5
HES                59                            18                       +41
REC                38                            22                       +16
PUB                46                            10                       +36
OTH                14                             8                        +6
ALL                46                            26                       +20

The second column of this table shows how many within-type partnerships we actually observe. Organisations of type PRC (companies) choose to partner up with other PRCs 45% of the time. Organisations of type HES (universities) choose to partner up with other HESs 59% of the time, and so on.

The third column shows what these percentages would be if organisations were to choose partners at random from the population of Horizon 2020 participants. Choosing partners at random of course makes no sense: but it gives us a useful mathematical benchmark to compare our observations against. Companies, for example, account for 40% of all the organisations in the stable partnership network: so, if they choose a partner at random, they will pick another company 40% of the time. The difference between observed choice and random choice (45% – 40% = 5%) is a measure of each organisation type's preference for in-type partnership.
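The random benchmark can be computed directly from each type's share of the network's nodes. In the sketch below, the counts are illustrative rather than the exact CORDIS figures; only the 40% PRC share is taken from the text.

```python
# Random-partner benchmark: if partners were chosen at random, any
# organisation would pick a partner of type T with probability equal
# to T's share of the population. Counts here are illustrative.
def random_baseline(type_counts):
    total = sum(type_counts.values())
    return {t: n / total for t, n in type_counts.items()}

counts = {"PRC": 40, "HES": 18, "REC": 22, "PUB": 10, "OTH": 10}
shares = random_baseline(counts)

# A PRC node choosing at random picks another PRC 40% of the time, so
# an observed 45% within-type share means a +5 point preference.
preference_prc = 45 - 100 * shares["PRC"]
```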

This preference is strong for the network in general, but weak for companies and very strong indeed for public sector organisations and, especially, universities. You can perceive it visually by looking at the picture that opens this post: edges are grey when they connect partners of different types. When they connect partners of the same type, they take the colour of that type, shown in the legend. There are very clear clusters of public sector organisations (yellow) and, right in the centre of the action, universities (blue).

These organisations evidently see some advantage in investing mostly in partnerships within their own “tribe”. This tendency is an indicator of the width of the cultural chasm that academics and business people need to overcome if they are to work together.

How effective is the set of incentives incorporated in Horizon 2020 in overcoming it? Not very effective, it turns out. Out of the 46,632 edges in the stable partnership network, only 3,254 (7%) involve one company and one university. This is exactly half of the partnerships of this type you would get if organisations were to choose their partners at random. To give a visual appreciation of this, we drew the network and coloured the edges connecting universities and companies in red.

The giant component of the Horizon 2020 stable partnership graph. Red edges encode a partnership between a university and a company.

Thanks to Open Data Sicilia (especially the mighty Giuseppe La Mensa) and Spaghetti Open Data for organising the hackathon. Thanks to Baya Remaoun, web and data manager at CORDIS, for her support.

Code, data and images are available on GitHub. You can find a more detailed explanation of this and other paths of exploration across the CORDIS dataset on the wiki. You are free to use this post and the GitHub repo under the terms of the respective licenses, but if you want to write a paper about this please consider involving me as a co-author.

The quest for collective intelligence: a research agenda

I am knee-deep in the research work for opencare. I think I am learning new things about how to use collective intelligence in practice. This has far-reaching implications for my own work in Edgeryders, and beyond. Far beyond, in fact. If we crack collective intelligence, we gain access to a new source of cognition. Forget my own work; this has profound implications for the future of our species. If you think that’s radical, go read the work of cultural evolution scholars, like Boyd, Richerson or Henrich. They think homo sapiens has started a major transition: evolutionary forces are pulling us towards a larger, more integrated “collective brain”. We are en route to becoming to primates what ants are to flies.

Collective intelligence is an elusive concept. It appeals to intuition, but it is hard to define and harder to measure and model. And yet, model it we must if we are to go forward. The good news is: I think I see a possible way. What follows is just a back-of-the-envelope note, plotting a rough course for the next three years or so.

1. Data model: semantic social networks

I submit that the raw data of collective intelligence are in the form of semantic social networks. By this term I mean a way to represent human conversation. The representation is a social network, because it involves humans connected to each other by interactions. And it is semantic, because those interactions encode meaning.
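To make the idea concrete, here is one possible minimal encoding of a semantic social network in Python. The names, codes and plain data structures are hypothetical, chosen for illustration rather than taken from our actual schema; a graph library such as networkx would serve just as well.

```python
# A semantic social network as a directed multigraph: each edge is one
# conversational interaction, annotated with the ethnographic codes
# (the "meanings") it carries.
class SemanticSocialNetwork:
    def __init__(self):
        self.edges = []  # list of (speaker, addressee, set_of_codes)

    def add_interaction(self, speaker, addressee, codes):
        self.edges.append((speaker, addressee, set(codes)))

    def codes_between(self, a, b):
        """All meanings exchanged between two people, either direction."""
        out = set()
        for s, t, codes in self.edges:
            if {s, t} == {a, b}:
                out |= codes
        return out

net = SemanticSocialNetwork()
net.add_interaction("alice", "bob", {"care", "community"})
net.add_interaction("bob", "alice", {"care", "funding"})
# codes_between("alice", "bob") collects {"care", "community", "funding"}
```

The social layer is the (speaker, addressee) pairs; the semantic layer is the code sets riding on the edges. Most of the analyses sketched below operate on one layer while conditioning on the other.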

2. Network science: it’s all in the links.

Collective intelligence is not additive: it’s interactional. We can only generate new insight when the information in my head comes into contact with the information in yours. So, what makes a collectivity more or less smart is the pattern of linking across its members. Network science is what allows a rigorous study of that linking, looking for the patterns of interaction which associate to the smartest behaviors.

3. Ethnography: harvesting smart outcomes

Suppose we accept that the hive mind can generate powerful insights and breakthroughs. How can we, individual human beings, lift them from the surrounding noise? Looking at what individual members of the community say and do would likely be fruitless. The problem is understanding how the group represents to itself the issue at hand; no individual you ask will be able to hold all the complexity in her head. We do have a discipline that specializes in this task: ethnography. Ethnographers are good at representing a collective point of view on something. Their skills are useful to understand just what the collective intelligence is saying.

4. “Shallow” text analytics: casting your net wider

Ethnography is like a surgical knife: super sharp and precise. But sometimes what you need is a machete. As I write this, the opencare conversation consists of over 300,000 words, authored by 137 people. This is a very big study by ethnography standards, and these numbers are likely to double again. We are already pushing the envelope of what ethnographers can process.

So, the next step is giving them prosthetics. The natural tool is text analytics, a branch of data analysis centered on text-as-data. It comes in two flavors: shallow-and-robust and deep-and-ad-hoc. I like the shallow flavor best: it is intuitive and relatively easy to make into standard tools. When the time of your ethnographers is scarce and the raw data is abundant, you can use text analysis to find and discard contributions that are likely to be irrelevant or off topic.
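As a toy illustration of the machete, here is a minimal relevance filter that scores each contribution by its overlap with a hand-picked set of on-topic seed terms. The seed terms and posts below are invented; a real pipeline would use TF-IDF or a similar weighting, but the shape of the tool is the same.

```python
# "Shallow" relevance filtering: score each post by overlap with a set
# of on-topic seed terms, and flag low scorers for ethnographers to skip.
import re

SEED_TERMS = {"care", "health", "community", "patient"}  # assumed topic

def relevance(text, seeds=SEED_TERMS):
    """Fraction of seed terms that appear in the text (0.0 to 1.0)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & seeds) / len(seeds)

posts = [
    "Community care for each patient",
    "My favourite pizza recipes",
]
scores = [relevance(p) for p in posts]
# The first post overlaps three of the four seed terms; the second, none.
```

In practice the threshold below which a post is set aside is a judgment call: the point is to spend scarce ethnographer time on the contributions most likely to matter.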

5. Machine learning: weak AI for more cost-effective analysis

Beyond the simplest levels, text analytics uses a lot of machine learning techniques. It comes with the territory: human speech does not come easy to machines. At best, computers can evolve algorithms that mimic classification decisions made by skilled humans. A close cooperation between humans and machines just makes sense.
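For instance, a naive Bayes classifier, one of the simplest of these techniques, can be trained on a handful of human-coded snippets and then mimic the coder's decisions on new text. Everything below (the labels, the snippets) is invented for illustration.

```python
# Toy "weak AI" mimicking a human classifier: naive Bayes with Laplace
# smoothing, trained on a few human-labelled snippets.
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labelled):
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        for w in tokens(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total)  # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(text):
            # Laplace smoothing keeps unseen words from zeroing the score.
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("shared the wound care protocol", "relevant"),
    ("community health support group", "relevant"),
    ("selling my old bicycle", "offtopic"),
])
label = classify("wound care support", model)
```

The division of labour is the point: humans make a small number of expensive, high-quality coding decisions; the machine extrapolates them cheaply across the rest of the corpus.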

6. Agent-based modelling: understanding emergence by simulation

We do not yet have a strong intuition for how interacting individuals give rise to emergent collective intelligence. Agent-based models can help us build that intuition, as they have done in the past for other emergent phenomena. For example, Craig Reynolds’s Boids model explains flocking behaviour very well.
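A toy model can convey the flavour of this, even without implementing Boids itself. In the sketch below, which is my own simplification and not Reynolds's model, agents on a ring each repeatedly average their "heading" with their two neighbours: purely local rules produce global alignment, with no agent ever seeing the whole flock.

```python
# Minimal agent-based model of emergence: agents on a ring nudge their
# heading toward the average of themselves and their two neighbours.
def step(headings):
    n = len(headings)
    return [
        (headings[(i - 1) % n] + headings[i] + headings[(i + 1) % n]) / 3
        for i in range(n)
    ]

headings = [0.0, 90.0, 180.0, 270.0, 40.0, 200.0]
spread0 = max(headings) - min(headings)  # initial disorder: 270 degrees
for _ in range(50):
    headings = step(headings)
spread = max(headings) - min(headings)
# After many steps the spread shrinks toward zero: local averaging
# yields global consensus, an emergent property of the interaction.
```

No rule in the model mentions consensus, yet consensus is what the population produces; that gap between individual rules and collective outcome is exactly the intuition agent-based models help build.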

The above defines the “long game” research agenda for Edgeryders. And it’s already under way.

  • I have been knee-deep in network science since 2009. We run real-time social network analysis on Edgeryders with Edgesense. We have developed an event format called Masters of Networks to spread the culture beyond the usual network nerds like myself. All good.
  • We have been collaborating with ethnographers since 2012. We have developed OpenEthnographer, our own tool for in-database ethno coding. I’d love to have a blanket agreement with an anthropology department: there is potential for groundbreaking methodological innovation in the discipline.
  • We are working with the University of Bordeaux to build a dashboard for semantic social network analysis.
  • I still need to learn a lot. I am studying agent-based modelling right now. Text analytics and machine learning are next, probably starting towards the end of 2016.

With that said, it’s early days. We are several breakthroughs short of a real mastery of collective intelligence. And without a lot of hard, thankless wrangling with the data, we will have no breakthrough at all. So… better get down to it. It is a super-interesting journey, and I am delighted and honoured to be along for the ride. I look forward to making whatever modest contribution I can.

Photo credit: jbdodane on CC-BY-NC

Masters of Networks 3: designing the future of online debate

Back in the day, the emergence of the global Internet was saluted with joy and hope by lovers of democracy. Many activists saw an opportunity for an electronic agora, endowed with an always-on operations mode and total recall, that would finally deliver an Athenian-style participatory democracy at the planetary scale and empower the collective intelligence of the people. It turned out things were not so simple. Online communities have been around for at least 30 years: some of them have led interesting, deep debates, and even built amazing things like Wikipedia or OpenStreetMap; others, not so much. A large-scale participatory democracy is very far from being realized.

Masters of Networks 3: communities is an event that tries to learn from the experience of 30 years of online debate. Why is debate fruitful and creative in some contexts, sterile and conflictual in others? Are there reliable tests for a debate’s good health? Can we predict how conversations will evolve? We will tackle these questions starting from a key idea: any conversation, both on- and offline, is a network of interactions across humans, i.e. a social network. In the course of the CATALYST project, Wikitalia and its partners have built Edgesense, a simple software for real-time, interactive network analysis of online communities (video demo, example).

Masters of Networks 3: communities is a two-day hackathon for network scientists, active members of online communities and people interested in participatory democracy to get together, discuss these themes and make sense of what we already know about them. We will visualize and analyze the networks of several online communities, using the deep knowledge of their active members and moderators as our guiding star; our goal is to figure out what a “healthy” conversation network looks like, and whether we can tell it apart from the networks of “sick” conversations (too conflictual, superficial, polarized, etc.).

Masters of Networks 3: communities happens in Rome on 10-11 March 2015. Several scientists, developers and community managers from the CATALYST project will attend, but we have set aside about ten places to allow any interested person to participate. In particular, if you are running an online community and would like to visualize and analyze its interaction network, we can probably help – get in touch and we will see what we can do. Participation is free, but registration is necessary – go here to register. The working language will be English.

I will be there. I think this is a central issue; I tried to argue as much in the video below.