Author Archives: Alberto

Photo credit: McTrent on flickr.com

What counts as evidence in interdisciplinary research? Combining anthropology and network science

Intro: why bother?

Over the past few years, it turns out, three of the books that most influenced my intellectual journey were written by anthropologists. This comes as something of a surprise, as I find myself in the final stages of a highly quantitative, data- and network science heavy Ph.D. programme. The better I become at constructing mathematical models and building quantitatively testable hypotheses around them, the more I find myself fascinated by the (usually un-quantitative) way of thinking great anthro research deploys.

This raises two questions. The first one is: why? What is calling to me from in there? The second one is: can I use it? Could one, at least in principle, see the human world simultaneously as a network scientist and as an anthropologist? Can I do it in practice?

The two questions are related at a deep level. The second one is hard, because the two disciplines simplify human issues in very different ways: they each filter out and zoom in to different things. Also, what counts as truth is different. Philosophers would say that network science and anthropology have different ontologies and different epistemologies. In other words, on paper, a bad match. The answer to the first one, of course, is that this same difference makes for some kind of added value. Good anthro people see on a wavelength that I, as a network scientist, am blind to. And I long for it… but I do not want to lose my own discipline’s wavelength.

Before I attempt to answer these questions, I need to take a step back, and explain why I chose network science as my main tool to look at social and economic phenomena in the first place. I’m supposed to be an economist. Mainstream economists do not, in general, use networks much. They imagine that economic agents (consumers, firms, labourers, employers…) are faced with something called objective functions. For example, if you are a consumer, your objective is pleasure (“utility”). The arguments of this function are things that give you pleasure, like holidays, concert tickets and strawberries. Your job is, given how much money you have, to figure out exactly which combination of concert tickets and strawberries will yield the most pleasure. The operative word is “most”: formally, you are maximising your pleasure function, subject to your budget constraint. The mathematical tool for maximising functions is calculus: and calculus is what most economists do best and trust the most.
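For readers who like to see it written down, the consumer's problem described above can be sketched as follows. The notation is my own assumption, chosen for illustration: U is the pleasure ("utility") function, x_i is the quantity of good i (concert tickets, strawberries…), p_i is its price and m is the money available.

```latex
\max_{x_1,\dots,x_n} \; U(x_1,\dots,x_n)
\quad \text{subject to} \quad
\sum_{i=1}^{n} p_i x_i \le m
```

Note that nobody else appears in this expression: other agents can only enter through the prices p_i and the budget m, which the consumer takes as given.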

This way of working is mathematically plastic. It allows scholars to build a consistent array of models covering just about any economic phenomenon. But it has a steep price: economic agents are cast as isolated. They do not interact with each other: instead, they explore their own objective functions, looking for maxima. Other economic agents are buried deep inside the picture, in that they influence the function’s parameters (not even its variables). Not good enough. The whole point of economic and social behaviour is that it involves many people that coordinate, fight, trade, seduce each other in an eternal dance. The vision of isolated monads duly maximising functions just won’t cut it. Also, it flies in the face of everything we know about cognition, and of decades of experimental psychology.

The networks revolution

You might ask how it is that economics insists on such a subpar theoretical framework. Colander and Kupers have a great reconstruction of the historical context in which this happened, and how it got locked in with university departments and policy makers. What matters to the present argument is this: I grasped at network science because it promised a radical fix to all this. Networks have their own branch of math: per se, they are no more relevant to the social world than calculus is. But in the 1930s, a Romanian psychiatrist called Jacob Moreno came up with the idea that the shape of relationships between people could be the object of systematic analysis. We now call this analysis social network analysis, or SNA.

Take a moment to consider the radicality and elegance of this intellectual move. Important information about a person is captured by the pattern of her relationships with others, whoever the people in question are. Does this mean, then, that individual differences are unimportant? It seems unlikely that Moreno, a practicing psychiatrist, could ever hold such a bizarre belief. A much more likely interpretation of social networks is that an individual’s pattern of linking to others, in a sense, is her identity. That’s what a person is.

Three considerations:

  1. The ontological implications of SNA are polar opposites of those of economics. Economists embrace methodological individualism: everything important in identity (individual preferences, for consumer theory; a firm’s technology, in production theory) is given a priori with respect to economic activity. In sociometry, identity is constantly recreated by economic and social interaction.
  2. The SNA approach does not rule out the presence of irreducible differences across individuals. A few lines above I stated that an individual’s pattern of linking to others, in a sense, is her identity. By “in a sense” I mean this: it is the part of the identity that is observable. This is a game changer: in economics, individual preferences are blackboxed. This introduces the risk of economic analysis becoming tautological. If you observe an economic system that seems to plunge people into misery and anxiety, you can always claim this springs directly from people maximising their own objective functions because, after all, you can’t know what they are. This kind of criticism is often levelled at neoliberal thinkers. But social networks? They are observable. They are data. No fooling around, no handwaving. And even though there remains an unobservable component of identity, modern statistical techniques like fixed effects estimation can make system-level inferences on what is observable (though they were invented after Moreno’s times).
  3. Moreno’s work is all the more impressive because the mathematical arsenal around networks was then in its infancy. The very first network paper was published by Euler in 1736, but it seems to have been considered a kind of amusing puzzle, and left brewing for over a century. In the times of Moreno there had been significant progress in the study of trees, a particular class of graphs used in chemistry. But basically Moreno relied on visual representations of his social networks, which he called sociograms, to draw systematic conclusions.

By Martin Grandjean (Own work), strictly based on Moreno, 1934 [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

With SNA, we have a way of looking at social and economic phenomena that is much more appealing than that of standard economics. It puts relationships, surely the main raw material of societies and economies, right under the spotlight. And it is just as mathematically plastic – more, in fact, because you can more legitimately make the assumption that all nodes in a social network are identical, except for the links connecting them to other nodes. I embraced it enthusiastically, and spent ten years teaching myself the new (to me) math and other relevant skills, like programming and agent-based modelling.

Understanding research methods in anthropology

As novel as network science felt to me, anthropology is far stranger. From where I stand, it breaks off from scholarship as I was trained to understand it in three places. These are: how it treats individuals; how it treats questions; and what counts as legitimate answers.

Spotlight on individuals

A book written by an anthropologist is alive with actual people. It resonates with their voices, with plenty of quotations; the reader is constantly informed of their whereabouts and even names. Graeber, for example, towards the beginning of Debt introduces a fictitious example of a bartering deal between two men, Henry and Joshua; a hundred pages later he shows us a token of credit issued by an actual 17th century English shopkeeper, actually called Henry. This historical Henry did his business in a village called Stony Stratford, in Buckinghamshire. The token is there to make the case that the real Henry would do business in a completely different way than the fictional one (credit, not barter). 300 pages later (after sweeping over five millennia of economic, religious and cultural history in two continents) he informs us that Henry’s last name was Coward, that he also engaged in some honourable money lending, and that he was held in high standing by his neighbours. To prove the case, he quotes the writing of one William Stout, a Quaker businessman from Lancashire, who started off his career as Henry’s apprentice.

To an economist, this is theatrical, even bizarre. The author’s point is that it was normal for early modern trade in European villages to take place in credit, rather than cash. Why do we need to know this particular shopkeeper’s name and place of establishment, and the name and birthplace of his apprentice as well? Would the argument not be even stronger, if it applied to general trends, to the average shopkeeper, instead of this particular man?

I am not entirely sure what is going on here. But I think it is this: to build his case, the author had to enter in dialogue with real people, and make an effort to see things through their eyes. Ethnographers do this by actually spending time with living members of the groups they wish to study; in the case of works like Debt, the author appears to spend a great deal of time reading letters and diaries, and piecing things together (“Let me tell you how Cortés had gotten to be in that predicament…”). If the reader wishes to fully understand and appreciate the argument, she, too, needs to make that effort. And that means spending time with informants, even in the abridged form of reading the essay, and getting to know them. So, detailed descriptions of individual people are a device for empathy and understanding.

All this makes reading a good anthro book great fun. It also is the opposite of what network scientists do: we build models with identical agents to tease out the effect of the pattern of linking. Anthropologists zoom in on individual agents and make a point of keeping track of their unique trajectories and predicaments.

Asking big questions

Good anthropologists are ambitious, fearless. They zero in on big, hairy, super-relevant questions and lay siege to them. Look at James Scott:

I aim, in what follows, to provide a convincing account of the logic behind the failure of some of the great utopian social engineering schemes of the twentieth century.

That’s a big claim right there. It means debugging the whole of development policies, most urban regeneration projects, villagization of agriculture schemes, and the building of utopian “model cities” like Chandigarh or Brasilia. It means explaining why large, benevolent, evidence-based bureaucracies like the United Nations, the International Monetary Fund and the World Bank fail so often and so predictably. Yet Scott, in his magisterial Seeing Like a State, pushes on – and, as far as I am concerned, delivers the goods. David Graeber’s own ambition is in the title: Debt – The first 5,000 years.

Economists don’t do that anymore. You need to be very, very senior (Nobel-grade, or close) to feel like you can tackle a big question. Researchers are encouraged to act as laser beams rather than searchlights, focusing tightly on well-defined problems. It was not always like that: Keynes’s masterpiece is immodestly titled The General Theory of Employment, Interest and Money. But that was then, and this is now.

What counts as “evidence”?

Ethnographic analysis – the main tool in the anthropologist’s arsenal – is not exactly science. Science is about building a testable hypothesis, and then testing it. But testing implies reproducibility of experiments, and that is generally impossible for meso- and macroscale social phenomena, because they have no control group. You cannot re-run the Roman Empire 20 times to see what would have happened if Constantine had not embraced the Christian faith. This kind of research is more like diagnosis in medicine: pathologies exist as mesoscale phenomena and studying them helps. But in the end each patient is different, and doctors want to get it right this time, to heal this patient.

How do you do rigorous analysis when you can’t do science? When I first became intrigued with ethnography, someone pointed me to Michael Agar’s The Professional Stranger. This book started out as a methodological treatise for anthropologists in the field; much later, Agar revisited it and added a long chapter to account for how the discipline had evolved since its original publication. This makes it a sort of meta-methodological guide. Much of Agar’s argument in the additional chapter is dedicated to cautiously suggesting that ethnographers can maintain some kind of a priori categories as they start their work. This, he claims, does not make an ethnographer a “hypothesis-testing researcher”, which would obviously be really bad. When I first read this expression, I did a double take: how could a researcher do anything other than test hypotheses? But no: a “hypothesis-testing researcher” is, to ethnographers, some kind of epistemological fascist. What they think of as good epistemology is to let patterns emerge from immersion in, and identification with, the world in which informants live. They are interested in finding out “what things look like from out here”.

It sounds pretty vague. And yet, good anthropologists get results. They make fantastic applied analysts, able to process diverse sources of evidence from archaeological remains to statistical data, and tie them up into deep, compelling arguments about what we are really looking at when we consider debt, or the metric system, or the particular pattern with which cypress trees are planted in certain areas. A hard-nosed scientist will scoff at many of the pieces (for example, Graeber writes things like “you can’t help feeling that there’s more to this story”. Good luck getting a sentence like that past my thesis supervisor), but those pieces make a very convincing whole. To anthropologists, evidence comes in many flavours.

Coda: where does it all go?

You can see why interdisciplinary research is avoided like the plague by researchers who wish to publish a lot. Different disciplines see the world with very different eyes; combining them requires methodological innovation, with a high risk of displeasing practitioners of both.

But I have no particular need to publish, and remain fascinated by the potential of combining ethnography with network science for empirical research. I have a specific combination in mind: large scale online conversations, to be harvested with ethnographic analysis. Harvested content is then rendered as a type of graph called a semantic social network, and reduced and analysed via standard quantitative methods from network science. With some brilliant colleagues, we have outlined this vision in a paper (a second one is in the pipeline) so I won’t repeat it here.

I want, instead, to remark how this type of work is, to me, incredibly exciting. I see a potential to combine ethnography’s empathy and human centricity, anthropology’s fearlessness and network science’s exactness, scalability and emphasis on the mesoscale social system. The idea of “linking as identity” is a good example of methodological innovation: it reconciles the idea of identity as all-important with that of interdependence within the social context, and it enables simple(r) quantitative analysis. All this implies irreducible methodological tensions, but I think in most cases they can be managed (not solved) by paying attention to the context. The work is hard, but the rewards are substantial. For all the bumps in the road, I am delighted that I can walk this path, and look forward to what lies beyond the next turns.


The Horizon 2020 tribes. Partnership building and network assortativity in European research funding

Highly innovative economies are characterised by intense cooperation between academia and industry. It makes sense: university researchers are good at discovery and invention, industry engineers are good at product and business development. Together, they have more chances of coming up with innovative products and bringing them to market. So, many governments would like to see more of it. They have rolled out policies to encourage academics and business people to work together across the culture chasm.

Horizon 2020 is one such policy. With its 80 billion Euro budget, it is the European Union’s flagship research and innovation funding programme. It is an interesting point of observation on cooperation between industry and academia because of its size, and also because it grants funding not to individual organisations, but to consortia. Each consortium is an opportunity for academia and industry to work together. To what extent do European universities and companies seize those opportunities? How effective is Horizon 2020 in bringing together academia and industry?

With my sisters- and brothers-in-arms in the Spaghetti Open Data community we have tried to address these questions. We started this work as a hackathon track at Open Data Fest, in June 2017. Here’s what we did and what we found out.

What we did

  1. Fortunately, the data on funding under Horizon 2020 are open. We downloaded the CORDIS dataset from the European Open Data Portal. Our dataset includes 16,592 organisations and 11,068 projects.
  2. We used them to induce a network. Its nodes are the 16,592 organisations. Two organisations are connected by an edge if they participated in at least one project together. There turn out to be 493,014 edges in this network.
  3. We filtered the network to include what we call “stable partnerships”. Two organisations are said to have a stable partnership if they participated together in at least two Horizon 2020 projects. Organisations that have no stable partners were dropped. This yielded a network with 3,414 nodes, and 46,632 edges. It is important to note that, for computational reasons, there are two edges for each connected pair of organisations (A, B) in the network: one that connects A to B and the other that connects B back to A. Edges can be interpreted as decisions to build a stable partnership: A has decided to participate in more projects in which B is present, and B has made the same decision with regard to A.
  4. CORDIS data distinguish between five types of organisations: private companies (PRC), higher education establishments (HES), research organisations (REC), public sector (PUB) and others (OTH). With this information, we could look at the patterns of partnership generation within and across types of organisations.
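The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code we ran at the hackathon: the participation records, the organisation names and the `stable_partnerships` function are all invented for the example, and the real CORDIS files have many more columns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical participation records: (project_id, organisation_id, organisation_type).
# These rows are invented for illustration; they are not CORDIS data.
participations = [
    ("P1", "UniA", "HES"), ("P1", "UniB", "HES"), ("P1", "FirmX", "PRC"),
    ("P2", "UniA", "HES"), ("P2", "UniB", "HES"),
    ("P3", "UniA", "HES"), ("P3", "FirmX", "PRC"), ("P3", "LabQ", "REC"),
]

def stable_partnerships(rows, min_shared_projects=2):
    """Return directed edges (A, B) for pairs sharing at least `min_shared_projects` projects."""
    members = {}  # project_id -> set of organisation ids
    for project, org, _org_type in rows:
        members.setdefault(project, set()).add(org)

    shared = Counter()  # unordered pair -> number of projects in common
    for orgs in members.values():
        for a, b in combinations(sorted(orgs), 2):
            shared[(a, b)] += 1

    edges = set()
    for (a, b), count in shared.items():
        if count >= min_shared_projects:
            edges.add((a, b))  # A -> B
            edges.add((b, a))  # B -> A: two edges per connected pair
    return edges

edges = stable_partnerships(participations)
# Organisations with no stable partner never appear in an edge, so they drop out here.
nodes = {org for edge in edges for org in edge}
```

In this toy example UniA partners twice with both UniB and FirmX, so those two pairs survive the filter, while LabQ (one shared project with each partner) is dropped.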

What we learned

Organisations in Horizon 2020 show a marked preference for partnering with other organisations of the same type. This pattern of behaviour is called assortativity, and is common in many social networks. However, it plays out in very different ways across different types of organisations.

Type   % edges w/ orgs of same type (actual)   % edges w/ orgs of same type (random)   Difference (percentage points)
PRC    45                                      40                                      +5
HES    59                                      18                                      +41
REC    38                                      22                                      +16
PUB    46                                      10                                      +36
OTH    14                                      8                                       +6
ALL    46                                      26                                      +20

The second column of this table shows how many within-type partnerships we actually observe. Organisations of type PRC (companies) choose to partner up with other PRCs 45% of the time. Organisations of type HES (universities) choose to partner up with other HESs 59% of the time, and so on.

The third column shows what these percentages would be if organisations were to choose partners at random from the population of Horizon 2020 participants. Choosing partners at random of course makes no sense: but it gives us a useful mathematical benchmark to compare our observations against. Companies, for example, account for 40% of all the organisations in the stable partnership network: so, if they choose a partner at random, they will pick another company 40% of the time. The difference between observed choice and random choice (45% – 40% = 5 percentage points) is a measure of the preference for in-type partnership of each type of organisation.
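The benchmark logic can be sketched as follows. Again, this is a minimal illustration on an invented toy network rather than our actual analysis code: the organisation names, the edge list and the `in_type_preference` function are assumptions made for the example. The random benchmark for a type is simply that type's share of all nodes, as described above.

```python
from collections import Counter

# Hypothetical toy network: organisation types and directed stable-partnership edges.
org_type = {"UniA": "HES", "UniB": "HES", "UniC": "HES", "FirmX": "PRC", "FirmY": "PRC"}
edges = [
    ("UniA", "UniB"), ("UniB", "UniA"),
    ("UniA", "UniC"), ("UniC", "UniA"),
    ("UniA", "FirmX"), ("FirmX", "UniA"),
    ("FirmX", "FirmY"), ("FirmY", "FirmX"),
]

def in_type_preference(org_type, edges):
    """Per type: (observed % same-type edges, random benchmark %, difference in points)."""
    population = Counter(org_type.values())  # number of organisations of each type
    total_nodes = sum(population.values())
    outgoing = Counter()  # edges leaving organisations of each type
    same = Counter()      # of those, edges landing on the same type
    for source, target in edges:
        outgoing[org_type[source]] += 1
        if org_type[source] == org_type[target]:
            same[org_type[source]] += 1

    result = {}
    for t in population:
        observed = 100 * same[t] / outgoing[t] if outgoing[t] else 0.0
        benchmark = 100 * population[t] / total_nodes  # chance of picking type t at random
        result[t] = (observed, benchmark, observed - benchmark)
    return result

preference = in_type_preference(org_type, edges)
```

A positive difference means that organisations of that type partner within their own type more often than the random benchmark would predict.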

This preference is strong for the network in general, but weak for companies and very strong indeed for public sector organisations and, especially, universities. You can perceive it visually, by looking at the picture that opens this post: edges are grey when they connect partners of different types. When they connect partners of the same type, they take the colour of that type, shown in the legend. There are very clear clusters of public sector organisations (yellow) and, right in the center of the action, universities (blue).

These organisations obviously see some advantage in investing mostly in partnerships within their own “tribe”. This tendency is an indicator of the width of the cultural chasm that academics and business people need to overcome if they are to work together.

How effective is the set of incentives incorporated in Horizon 2020 in overcoming it? Not very effective, it turns out. Out of the 46,632 edges in the stable partnership network, only 3,254 (7%) involve one company and one university. This is exactly half of the partnerships of this type you would get if organisations were to choose their partners at random. To give a visual appreciation of this, we drew the network, and coloured the edges connecting universities and companies in red.

The giant component of the Horizon 2020 stable partnership graph. Red edges encode a partnership between a university and a company.
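The comparison with random partner choice can be roughly checked with back-of-the-envelope arithmetic. The sketch below rests on two assumptions of mine: that the "random" column of the table doubles as each type's share of nodes (the post says so explicitly only for companies), and that the two ends of a random edge are drawn independently in proportion to those shares. The inputs are rounded percentages, so the result is only approximate and may differ slightly from the figure computed on the full data.

```python
# Figures quoted in the post.
total_edges = 46_632        # directed edges in the stable partnership network
company_university = 3_254  # edges joining one company and one university

# Assumption: node shares read off the "random" column of the table (rounded).
share_prc = 0.40  # companies
share_hes = 0.18  # universities

observed_share = company_university / total_edges

# A random directed edge is either company -> university or university -> company.
random_share = 2 * share_prc * share_hes

ratio = observed_share / random_share
print(f"observed {observed_share:.1%}, random {random_share:.1%}, ratio {ratio:.2f}")
```

With these rounded inputs the observed share comes out at about 7%, against a random benchmark of about 14%, which puts the ratio close to one half.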

Thanks to Open Data Sicilia (especially the mighty Giuseppe La Mensa) and Spaghetti Open Data for organising the hackathon. Thanks to Baya Remaoun, web and data manager at CORDIS, for her support.

Code, data and images are available on GitHub. You can find a more detailed explanation of this and other paths of exploration across the CORDIS dataset on the wiki. You are free to use this post and the GitHub repo under the terms of the respective licenses, but if you want to write a paper about this please consider involving me as a co-author.

The Edgeryders guide to starting a company based on Estonia’s e-residency scheme

In early 2017 my partners and I founded a new company in Estonia, Edgeryders Osaühing. This has become much easier since the launch of Estonia’s e-residency scheme in 2014. We explained our reasons in a separate post. It took quite a lot of research to figure out how to do it: it makes sense to share it, in the hope that it will help those of you considering the same move.

Preparation

  1. Budget time and effort. In theory, starting a company through the Estonian e-residency scheme is fast and low-effort. In practice, it takes time and effort. You are, after all, dealing with a completely new (to you) legal system. Estonian professionals are still getting up to speed with e-residency: expect glitches and misunderstandings. For us, doing it was one of those important-not-urgent things. We chose to do it on the side, with little to no disruption of our day-to-day work. It took us about 8 months to go from decision to foundation, and another two months to get the bank account up. We spent about 1,500 EUR, including one year of assistance but excluding the e-residency charges themselves and one trip to Tallinn.
  2. Take some time to understand the e-residency scheme. It’s a novel concept, and many people misunderstand it. The main things you need to know:
    • e-residency is a government-guaranteed digital identity scheme. Through it, e-residents can access Estonian online services such as company foundation, banking, taxation. They can also sign documents and contracts.
    • e-residency does not give you the right to enter Estonia, or live in it.
    • e-residents do not pay personal income tax in Estonia, but in their country of physical residence.
    • e-residency is quite well documented (starting a company through it, not so much). The official website is a good place to start.
  3. Seek help. Establishing a legal entity is useless unless you can use it to do actual business. As a foreigner, Estonia’s rules and regulations will look alien to you. Also, any company needs a physical legal address in Estonia, and PO boxes are not allowed. Here is a list of business service providers you can hire to assist you and provide you with an address. If you are starting a one-man business, our Estonian friends recommend Leapin. Anything else will need an accountant: we are small and simple, but Leapin turned us down saying our needs are too sophisticated for their offer. Our accountant is Witismann and Partners, in Tallinn. They have been very patient and helpful. If you contact them, please mention us.
  4. Beware KYC. The Estonian government is committed to fast, frictionless online services. But Estonia still has international obligations to watch out for money laundering operations. In practice, banks and accountants need you to prove your identity and residency before they can take you on as a customer. This process is called Know Your Customer (KYC). Consequences for you: paperwork, with literal paper implied. This was the single most frustrating part of our experience of incorporating in Estonia.
    • At the time of writing, banks in Estonia insist on a visit to one of their branches in person. This should change, so check back on it. Every person who wants to be a user of online banking needs to visit. Our provisional solution: one of us flew to Tallinn, opened the account and got login credentials for the online banking. This makes us operational. We will add other users later.
    • All partners had to go through KYC with the accountant, but here you can do it remotely. We tried several solutions – all bad, because this is bureaucracy at its most dreadful. The least bad we found is this:
      • Authenticate a (paper) photocopy of your passport at an Estonian embassy (see below).
      • Get a utility bill in English (or Russian or Estonian – good luck with that). If this is not possible, ask your bank to write you a letter in English certifying that you have a personal account there. Make sure it shows your physical address.
      • Send your accountant the authenticated copy of the passport and the original of the bill/bank letter.

Execution

  1. Get e-residency. All partners in the company need to become e-residents. For practical reasons, it’s best to also have anyone you want on the management board do so. Application is online and very simple, and costs 100 EUR. Once it comes through, every e-resident needs to collect her physical e-residency card in person. Cards are not mailed, and you are not allowed to send someone else in your place. You can do this either in Tallinn or at Estonian diplomatic representations. This can be an inconvenience, because Estonia is a small country with relatively few embassies around the world (list of Estonian diplomatic missions). If you do not have one of them near, get your KYC obligations out of the way on the same day as you pick up your card. We recommend you make 2-3 copies of your passport and get them authenticated by the embassy while you are there. In Brussels this service is by appointment, and it costs 30 EUR per copy. Also ask the accountant if they insist on other authenticated documents, then get all the authentications done in one visit.
  2. Install, test and debug the software and the hardware. E-residency cards communicate with your computer via a card reader and some software. You should not expect this to Just Work. Take time to install, test and debug. For all its pathfinding ambitions, the Estonian government is still a government, and so it runs mostly on Windows. Mac users should restart their computers before attempting to do anything with e-residency cards. Linux users should do the same, and also look at these improved installation instructions that @Matthias wrote. The cards come with cheap card readers. In general, these work, but I already had to replace mine. Pro tip: the e-residency help desk is really great and very responsive, via email or phone.
  3. Incorporate. You can do this through the Company registration portal. The way we did it: we set up a Skype meeting with the accountant, and they guided us step-by-step. There are three steps.
    • One person fills a form called a “petition”. This is a request to the government to authorise the establishment of the new company.
    • Each shareholder needs to digitally sign the petition. You can save the process at any time in case one of the partners struggles with the tech. The last partner to sign sends the petition to the relevant authority by pressing a button on the site.
    • Pay a tax (at the moment 190 EUR). If you are working with an Estonian accountant (as you should), they take care of this for you. A few days later the government informs you that your company is now live, and you receive a registry number.
  4. Get a bank account. At the moment, three banks support the e-residency scheme, but more should be added. Before you apply to one of the three, make sure you have a business plan ready. They will ask you questions: how much money do you think you will receive? From which countries? What are your estimates based on? Where is your money going to come from? Your accountant can help you smooth things out. In our case, I think it helped a lot that I sent them a recent profit-and-loss statement downloaded from our cloud accounting platform. We bank with LHV Pank. Our Estonian friends recommend it, because it is the only Estonian bank currently in this game, so it is the fastest to adopt government innovations as they come through. But it seems all banks in Estonia offer very competitive conditions and good service.
  5. Register for VAT. Estonia has very low ceilings for operating without a VAT registration, so you will need to do this soon. Beware: whatever your turnover, you are not allowed to file quarterly returns for VAT. VAT is always monthly. To register:
    • Go to this page of the Tax and Customs Board’s website. Download the PDF form called “Application for registration as a person liable to VAT”.
    • Enter the relevant information, sign the file digitally and e-mail it to the TCB (kmkr@emta.ee). Processing takes about a week.
  6. (Optional) change your articles of association. When you incorporate, the E-Registry portal generates standard articles of association for you. You have no way to customise them. But you can do it later. The procedure is this:
    • Download your own articles of association from the e-Business register. To do that, access your company’s record (for example by searching for it). Then select “Documents in the business file” from the “Choose information” drop-down menu. Then add to cart and pay (2 EUR) to download the files. Two of them are the articles of association, in Estonian (pohikiri) and English.
    • Make the changes you need on the English version.
    • Get the changes translated into Estonian. Articles of association have to be in Estonian, sorry. Save this into a PDF.
    • Log into the Company Registration portal. Choose “Submission of application”. Click on “Changing the data of an enterprise” and choose the name of your company.
    • In the page that opens, click on the button “Start the petition for an entry regarding alteration”. You are taken to a page where you can alter any data of the company. Click on “Alter the articles of association”, then “Add the articles of association as a file”. Select the PDF you saved from your computer and click on “+Add the articles of association as a file”.
    • At this point you have created a petition, like the one you created for incorporation. Next, shareholders need to digitally sign the petition and send it to the authorities for approval.