Category Archives: Network Notebook

Semi-structured notes on my Ph.D. thesis work on network science. For an intro, see http://youtu.be/KKrM2c-ww_k

Salzburg Global Seminar: a social network of session 593

The Horizon 2020 tribes. Partnership building and network assortativity in European research funding

What we did

Fortunately, the data on funding under Horizon 2020 are open. We downloaded the CORDIS dataset from the European Open Data Portal. Our dataset includes 16,592 organisations and 11,068 projects.
We used them to induce a network. Its nodes are the 16,592 organisations. Two organisations are connected by an edge if they participated in at least one project together. There turn out to be 493,014 edges in this network.
We filtered the network to include what we call “stable partnerships”. Two organisations are said to have a stable partnership if they participated together in at least two Horizon 2020 projects. Organisations that have no stable partners were dropped. This yielded a network with 3,414 nodes, and 46,632 edges. It is important to note that, for computational reasons, there are two edges for each connected pair of organisations (A, B) in the network: one that connects A to B and the other that connects B back to A. Edges can be interpreted as decisions to build a stable partnership: A has decided to participate in more projects in which B is present, and B has made the same decision with regard to A.
CORDIS data distinguish between five types of organisations: private companies (PRC), higher education establishments (HES), research organisations (REC), public sector (PUB) and others (OTH). With this information, we could look at the patterns of partnership generation within and across types of organisations.
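
The steps above can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not the code from the GitHub repo: the `projects` dictionary and the `stable_partnerships` function are hypothetical names, and the real CORDIS files would need parsing first.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: project id -> set of participating organisations.
projects = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
    "P4": {"D", "E"},
}


def stable_partnerships(projects, min_shared=2):
    """Return directed edges (a, b) for pairs sharing at least min_shared projects."""
    shared = Counter()
    for members in projects.values():
        # Every pair of organisations in the same project co-participates once.
        for a, b in combinations(sorted(members), 2):
            shared[(a, b)] += 1
    edges = set()
    for (a, b), count in shared.items():
        if count >= min_shared:
            # Two edges per connected pair, one in each direction.
            edges.add((a, b))
            edges.add((b, a))
    return edges


edges = stable_partnerships(projects)
# Organisations with no stable partner never appear in an edge, so they drop out.
nodes = {org for edge in edges for org in edge}
print(sorted(edges))
print(sorted(nodes))
```

In the toy data only A-B and B-C share two projects, so D and E are dropped from the filtered network.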

What we learned

Organisations in Horizon 2020 show a marked preference for partnering with other organisations of the same type. This pattern of behaviour is called assortativity, and is common in many social networks. However, it plays out in very different ways across different types of organisations.

Type	% edges w/orgs of same type (actual)	% edges w/orgs of same type (random)	Difference (percentage points)
PRC	45	40	+5
HES	59	18	+41
REC	38	22	+16
PUB	46	10	+36
OTH	14	8	+6
ALL	46	26	+20

The second column of this table shows how many within-type partnerships we actually observe. Organisations of type PRC (companies) choose to partner up with other PRCs 45% of the time. Organisations of type HES (universities) choose to partner up with other HESs 59% of the time, and so on.

The third column shows what these percentages would be if organisations were to choose partners at random from the population of Horizon 2020 participants. Choosing partners at random of course makes no sense: but it gives us a useful mathematical benchmark to compare our observations against. Companies, for example, account for 40% of all the organisations in the stable partnership network: so, if they choose a partner at random, they will pick another company 40% of the time. The difference between observed choice and random choice (45% – 40% = 5 percentage points) is a measure of the preference for in-type partnership of each type of organisation.
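
To make the comparison concrete, here is a small sketch of how the three columns could be computed. It runs on invented toy data, and `org_type`, `edges` and `same_type_shares` are hypothetical names; the random benchmark is taken, as in the text, to be each type's share of organisations in the network.

```python
from collections import Counter

# Hypothetical toy data: organisation -> type, plus directed edges.
org_type = {"A": "HES", "B": "HES", "C": "PRC", "D": "PRC", "E": "PUB"}
edges = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "A"),
    ("C", "D"), ("D", "C"),
    ("D", "E"), ("E", "D"),
]


def same_type_shares(org_type, edges):
    """Per type: (observed %, random %, difference) of same-type edges."""
    type_counts = Counter(org_type.values())
    out_edges = Counter()   # edges leaving organisations of each type
    same_edges = Counter()  # of those, edges landing on the same type
    for source, target in edges:
        source_type = org_type[source]
        out_edges[source_type] += 1
        if org_type[target] == source_type:
            same_edges[source_type] += 1
    table = {}
    for type_name, count in type_counts.items():
        if out_edges[type_name] == 0:
            continue  # no edges, so no observed share to report
        observed = 100 * same_edges[type_name] / out_edges[type_name]
        # Random benchmark: the type's share of all organisations.
        benchmark = 100 * count / len(org_type)
        table[type_name] = (observed, benchmark, observed - benchmark)
    return table


table = same_type_shares(org_type, edges)
for type_name, row in sorted(table.items()):
    print(type_name, [round(value, 1) for value in row])
```

A positive difference means organisations of that type partner with their own kind more often than chance would suggest; a negative one means less often.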

This preference is strong for the network in general, but weak for companies and very strong indeed for public sector organisations and, especially, universities. You can perceive it visually, by looking at the picture that opens this post: edges are grey when they connect partners of different types. When they connect partners of the same type, they take the colour of that type, shown in the legend. There are very clear clusters of public sector organisations (yellow) and, right in the center of the action, universities (blue).

These organisations obviously see some advantage in investing mostly in partnerships within their own “tribe”. This tendency is an indicator of the width of the cultural chasm that academics and business people need to overcome if they are to work together.

How effective is the set of incentives incorporated in Horizon 2020 in overcoming it? Not very effective, it turns out. Out of the 46,632 edges in the stable partnership network, only 3,254 (7%) involve one company and one university. This is exactly half of the partnerships of this type you would get if organisations were to choose their partners at random. To give a visual appreciation of this, we drew the network, and coloured the edges connecting universities and companies in red.
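
A back-of-the-envelope check of that benchmark is below. The post does not spell out how the random figure was computed, so this rests on an assumption: both ends of an edge are drawn in proportion to each type's share of organisations (40% companies, 18% universities, from the table above). The variable names are illustrative only.

```python
# Figures quoted in the post.
total_edges = 46_632
company_university_edges = 3_254
share_prc = 0.40  # companies, from the "random" column of the table
share_hes = 0.18  # universities, from the "random" column of the table

observed = company_university_edges / total_edges

# Assumption: an edge is company -> university or university -> company,
# with both endpoints drawn independently according to the type shares.
expected = 2 * share_prc * share_hes

ratio = observed / expected
print(f"observed {observed:.1%}, expected {expected:.1%}, ratio {ratio:.2f}")
```

Under this assumption, the observed share comes out at roughly half the random one, in line with the claim above.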

The giant component of the Horizon 2020 stable partnership graph. Red edges encode a partnership between a university and a company.

Thanks to Open Data Sicilia (especially the mighty Giuseppe La Mensa) and Spaghetti Open Data for organising the hackathon. Thanks to Baya Remaoun, web and data manager at CORDIS, for her support.

Code, data and images are available on GitHub. You can find a more detailed explanation of this and other paths of exploration across the CORDIS dataset on the wiki. You are free to use this post and the GitHub repo under the terms of the respective licenses, but if you want to write a paper about this please consider involving me as a co-author.

The quest for collective intelligence: a research agenda

1. Data model: semantic social networks

I submit that the raw data of collective intelligence are in the form of semantic social networks. By this term I mean a way to represent human conversation. The representation is a social network, because it involves humans connected to each other by interactions. And it is semantic, because those interactions encode meaning.
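
One possible way to hold such data in code is sketched below. It is only an illustration of the idea: the `Interaction` class, its fields and the sample conversation are all hypothetical, and a real project would more likely live in a database.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """One person addressing another, annotated with the concepts it touches."""
    author: str
    recipient: str
    text: str
    codes: frozenset = field(default_factory=frozenset)


# Hypothetical sample conversation.
conversation = [
    Interaction("anna", "ben", "Our clinic is run by volunteers.",
                frozenset({"volunteering", "health"})),
    Interaction("ben", "anna", "Who pays for the medicines?",
                frozenset({"funding", "health"})),
    Interaction("carla", "anna", "We crowdfund ours.",
                frozenset({"funding"})),
]


def social_edges(interactions):
    """The social layer: who talks to whom, and how often."""
    return Counter((i.author, i.recipient) for i in interactions)


def authors_by_code(interactions):
    """The semantic layer: which people wrote about which concept."""
    result = defaultdict(set)
    for interaction in interactions:
        for code in interaction.codes:
            result[code].add(interaction.author)
    return dict(result)


print(social_edges(conversation))
print(authors_by_code(conversation))
```

The same records feed both views: drop the codes and you have an ordinary social network; keep them and each edge also says what the exchange was about.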

2. Network science: it’s all in the links

Collective intelligence is not additive: it’s interactional. We can only generate new insight when the information in my head comes into contact with the information in yours. So, what makes a collectivity more or less smart is the pattern of linking across its members. Network science is what allows a rigorous study of that linking, looking for the patterns of interaction that are associated with the smartest behaviors.

3. Ethnography: harvesting smart outcomes

Suppose we accept that the hive mind can generate powerful insights and breakthroughs. How can we, individual human beings, lift them from the surrounding noise? Looking at what individual members of the community say and do would likely be fruitless. The problem is understanding how the group represents to itself the issue at hand; no individual you ask will be able to hold all the complexity in her head. We do have a discipline that specializes in this task: ethnography. Ethnographers are good at representing a collective point of view on something. Their skills are useful to understand just what the collective intelligence is saying.

4. “Shallow” text analytics: casting your net wider

Ethnography is like a surgical knife: super sharp and precise. But sometimes what you need is a machete. As I write this, the opencare conversation consists of over 300,000 words, authored by 137 people. This is a very big study by ethnography standards, and these numbers are likely to double again. We are already pushing the envelope of what ethnographers can process.

So, the next step is giving them prosthetics. The natural tool is text analytics, a branch of data analysis centered on text-as-data. It comes in two flavors: shallow-and-robust and deep-and-ad-hoc. I like the shallow flavor best: it is intuitive and relatively easy to make into standard tools. When the time of your ethnographers is scarce and the raw data is abundant, you can use text analysis to find and discard contributions that are likely to be irrelevant or off topic.
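
As an illustration of the shallow flavor, here is a deliberately crude triage sketch: it scores each contribution by how many topic terms it mentions and sets aside the ones that score too low. The topic terms, the threshold and the function names are all assumptions made up for the example; it is not a tool we actually use.

```python
import re

# Hypothetical topic vocabulary for a conversation about care.
TOPIC_TERMS = {"care", "health", "community", "hospital", "patient"}


def tokens(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(text, topic_terms=TOPIC_TERMS):
    """Fraction of topic terms that appear in the text (0.0 to 1.0)."""
    return len(tokens(text) & topic_terms) / len(topic_terms)


def triage(posts, threshold=0.2):
    """Split posts into (keep, discard) by their relevance score."""
    keep, discard = [], []
    for post in posts:
        if relevance(post) >= threshold:
            keep.append(post)
        else:
            discard.append(post)
    return keep, discard


posts = [
    "Our community runs a small health clinic.",
    "Anyone watching the football tonight?",
    "The hospital sent the patient home without care instructions.",
]
keep, discard = triage(posts)
print("keep:", keep)
print("discard:", discard)
```

Exact word matching like this misses plurals and synonyms, so in practice the discarded pile would still deserve a quick human skim.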

5. Machine learning: weak AI for more cost-effective analysis

Beyond the simplest levels, text analytics uses a lot of machine learning techniques. It comes with the territory: human speech does not come easy to machines. At best, computers can evolve algorithms that mimic classification decisions made by skilled humans. A close cooperation between humans and machines just makes sense.

6. Agent-based modelling: understanding emergence by simulation

We do not yet have a strong intuition for how interacting individuals give rise to emergent collective intelligence. Agent-based models can help us build that intuition, as they have done in the past for other emergent phenomena. For example, Craig Reynolds’s Boids model explains flocking behaviour very well.
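
For readers who have never seen one, here is a stripped-down, two-dimensional sketch in the spirit of Boids. It keeps only the three local rules Reynolds described (cohesion, alignment, separation); the weights, radii and function names are arbitrary choices for illustration, not values from the original model.

```python
import math
import random


def step(boids, radius=5.0, min_dist=1.0, max_speed=1.0,
         w_cohesion=0.01, w_align=0.05, w_separate=0.1):
    """Advance every boid one tick. A boid is a tuple (x, y, vx, vy)."""
    new_boids = []
    for i, (x, y, vx, vy) in enumerate(boids):
        neighbours = [b for j, b in enumerate(boids)
                      if j != i and math.hypot(b[0] - x, b[1] - y) < radius]
        if neighbours:
            n = len(neighbours)
            centre_x = sum(b[0] for b in neighbours) / n
            centre_y = sum(b[1] for b in neighbours) / n
            mean_vx = sum(b[2] for b in neighbours) / n
            mean_vy = sum(b[3] for b in neighbours) / n
            # Cohesion: drift towards the neighbours' centre.
            # Alignment: nudge velocity towards the neighbours' average.
            vx += w_cohesion * (centre_x - x) + w_align * (mean_vx - vx)
            vy += w_cohesion * (centre_y - y) + w_align * (mean_vy - vy)
            # Separation: push away from neighbours that are too close.
            for bx, by, _, _ in neighbours:
                if math.hypot(bx - x, by - y) < min_dist:
                    vx += w_separate * (x - bx)
                    vy += w_separate * (y - by)
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_boids.append((x + vx, y + vy, vx, vy))
    return new_boids


def polarisation(boids):
    """Length of the mean heading vector: 1.0 means everyone flies the same way."""
    hx = hy = 0.0
    for _, _, vx, vy in boids:
        speed = math.hypot(vx, vy)
        if speed > 0:
            hx += vx / speed
            hy += vy / speed
    return math.hypot(hx, hy) / len(boids)


rng = random.Random(42)
flock = [(rng.uniform(0, 10), rng.uniform(0, 10),
          rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
print("polarisation at start:", round(polarisation(flock), 2))
for _ in range(200):
    flock = step(flock)
print("polarisation after 200 ticks:", round(polarisation(flock), 2))
```

No boid knows anything about the flock as a whole; whatever order shows up in the polarisation figure comes only from the local rules, which is the point of the exercise.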

The above defines the “long game” research agenda for Edgeryders. And it’s already under way.

I have been knee-deep in network science since 2009. We run real-time social network analysis on Edgeryders with Edgesense. We have developed an event format called Masters of Networks to spread the culture beyond the usual network nerds like myself. All good.
We have been collaborating with ethnographers since 2012. We have developed OpenEthnographer, our own tool to do in-database ethno coding. I’d love to have a blanket agreement with an anthropology department: there is potential for groundbreaking methodological innovation in the discipline.
We are working with the University of Bordeaux to build a dashboard for semantic social network analysis.
I still need to learn a lot. I am studying agent-based modelling right now. Text analytics and machine learning are next, probably starting towards the end of 2016.

With that said, it’s early days. We are several breakthroughs short of a real mastery of collective intelligence. And without a lot of hard, thankless wrangling with the data, we will have no breakthrough at all. So… better get down to it. It is a super-interesting journey, and I am delighted and honoured to be along for the ride. I look forward to making whatever modest contribution I can.

Photo credit: jbdodane on flickr.com CC-BY-NC