Tag Archives: evaluation

Is the impact of social innovations measurable?

Last week I was in London for the first jury meeting of the European Social Innovation Competition. We were guests of NESTA; its CEO Geoff Mulgan presided over the meeting. Geoff is, in my humble opinion, one of the most interesting policy makers in Europe: even if we don’t always agree, when he speaks I pay attention. On this occasion, I was struck by his insistence on the importance of measuring the impact of social innovation initiatives. Rigour and quantitative measurement, he finds, are essential to get rid of the hype and the fakery that have accumulated around the concept of social innovation over the last few years (his defining phrase: social innovation is “a bulls#!t attractor”).

I agree on the rigour part. On quantitative measurement I have doubts. Social innovators – the real-deal, disruptive ones – want deep change in society, so it does not make sense to assess what they do in terms of the very society they are trying to change. Take, for example, Bitcoin (Wikipedia) – an electronic form of cash designed to work in purely peer-to-peer fashion, with no central entity that can manipulate its value. It uses unbreakable cryptography to prevent people from spending their Bitcoins more than once, and it allows (unbreakably) anonymous transactions. I personally know people who advocate for it passionately: they are idealistic, generous folks, moved by the idea that bank-created “fiat money” is inherently flawed. There is only a small problem: unbreakable crypto and anonymous transactions are likely to lead to 100% safe tax evasion. Its detractors claim that Bitcoin has the potential to strike at the very heart of states, destroying their ability to impose taxes. When you put this argument to Bitcoin supporters, many shrug it off: states, they say, are only good insofar as they solve problems for people. If their existence becomes a roadblock to problem solving, well – it might be time to look for something that works better.

Let me attempt to reformulate that. It’s not that these people are diehard revolutionaries. It is that, for the innovator, the status quo has zero value, less than zero in some cases. He or she assesses the impact of what s/he is trying to do in terms of the world that will contain the innovation at hand. Professional evaluators, working for government or private foundations, run their own assessment in terms of the world we have now, and ask the proposed innovation to improve it without changing it too much (they themselves are part of the ruling class in the existing world, and it makes sense for them to treasure it). They are like Henry Ford’s customers who, in his own words, “would have asked for faster horses” because, simply, they could not possibly see the car civilization clearly without giving up important parts of their identity. Or like the Archbishop of Mainz, Gutenberg’s ultimate sponsor (through the “angel investor” Johann Fust), who supported the development of the printing press in the hope of producing an impact (cheap, fast indulgence certificates and bibles) that ended up, ex post, being insignificant; whereas its true impact (democratization of reading and writing, diffusion of heterodox religious material and ultimate victory for Luther’s reformation) would have made him recoil in horror. We consider the printing press a great step forward in the history of humanity, but this is because we are the children of the civilization that the printing press has spawned. We won that particular battle, so we get to write its history – if only because the losers have gone extinct.

Am I exaggerating? I don’t think so. Much of social innovation is out to redesign welfare. Welfare is a very important part of European identity: free and compulsory education, health care, provisions for the socially excluded. This stuff is, therefore, politically explosive. Try telling an Italian or a Swede “hey, looks like mass university is not working. Let’s scrap it and replace it with a system of Massive Open Online Courses”: what you’ll get, more often than not, is not a serene discussion, but an entrenched defense of the values allegedly underpinning the existing system (like “open and fair access to education for all”). Good luck arguing that the system is not particularly good at realizing those values, and that it makes sense to explore alternative routes: you are likely to be treated with suspicion and irritation (“There must be something fishy. Just in case, hands off our mass university”). So, the evaluator of social innovation projects finds herself in an uncomfortable position: if a project is low impact, there is no point in supporting it. But if it is high impact, supporting it could be very dangerous for the society in which the evaluation happens.

How to solve the dilemma? A technical solution could be to separate completely the function of promoting social innovation from that of evaluating it. In this scenario, you’d get a small scene of government agencies and private foundations tasked with maximizing the creative potential of social innovation, with a “take no prisoners” attitude and a complete disregard for existing societal equilibria; and a watchdog filtering out projects that threaten to be too costly in terms of forgone stability. But such a system is likely to be politically untenable. Besides, forecasting disruptive effects is at a minimum very hard, and could well be impossible even in theory because of positive feedback dynamics. While we wait for a better idea, I am afraid we will have to live with policies for social innovation that promote vanilla ideas and cater to the usual suspects, who stand guard over the existing order.

Is evaluation overrated?


Public policy experts insist on quantitative evaluation as an accountability mechanism. The European Commission has put itself at the head of a campaign for the adoption of quantitative evaluation techniques even in traditionally “soft” areas such as social cohesion or social innovation. The message is simple: these are hard times for public budgets. If you want us to fund something, you have to explain why it matters more than other things. It makes sense. How could it be wrong?

And yet I am not convinced. Evaluation has solid theoretical foundations when it measures results in the same unit in which the input is measured. The gold standard here is the famous return on investment (ROI): you invest dollars. You reap dollars. You divide the dollars reaped by the dollars invested. Easy. If you invest dollars to obtain, say, an increase in the heron population of a wetland, or an expected reduction in lung cancer mortality, things start to get more complicated. And if you try to compare an increase in the heron population with a reduction in lung cancer mortality, they get much more complicated.

I know this well. I am a veteran of a very similar battle.

In the 1980s, led by influential thinkers such as David Pearce, Mrs. Thatcher’s environmental adviser, environmental economists tried to quantify the economic value of environmental goods. The aim was to teach humanity to abandon the idea that the environment can be taken for granted, and to start treating it as a scarce resource. The stronghold of this scene was University College London, where Pearce directed a research center and a Master’s program. I enrolled in the latter in 1992. Our main tool was an extension of cost-benefit analysis, a well-tested implement of the evaluators of the New Deal era. We had a whole series of clever tricks for translating environmental benefits into dollars or pounds: hedonic pricing, contingent valuation, the travel cost method. Once converted into monetary units, environmental costs and benefits could be compared with anything, making rigorous evaluation possible. Or could they?

Moving from our London classrooms to practice, we discovered that things were much more complicated. First of all there was a big theoretical problem: we were trying to emulate markets in order to value environmental benefits because, according to standard economic theory, well-functioning markets assign to goods exactly the prices that maximize collective welfare. Unfortunately, the mathematical conditions for this to happen are very restrictive, so much so that they practically never occur in real life. Joseph Stiglitz, one of my favorite economists, won a Nobel prize by showing that, when a single condition (perfect and symmetric information) is removed, the virtuous properties of markets collapse completely. Secondly, even if we are willing to make an act of faith in the theoretical foundations, getting to actual numbers is hard. Very. The necessary data are generally not available, and generating them is very expensive, so many researchers took refuge in opinion surveys (called “contingent valuations”, which sounds more scientific). Wrong move: we immediately got bogged down in the cognitive psychology paradoxes explored in detail by Daniel Kahneman and Amos Tversky, who showed conclusively that humans do not value things the way markets do – and won another Nobel.

On top of that, the political situation was very unfavorable for this kind of research. The only parties willing to fund environmental evaluation generously were the biggest and most aggressive polluting companies. An entire branch of the literature flourished in the shadow of the infamous wreck of the oil tanker Exxon Valdez: in London we studied the papers of the experts Exxon had commissioned to produce a valuation of the damage caused to the Arctic environment by one hundred million liters of crude oil spilled into the sea. These experts had the means to do a real evaluation, but those paying them were not exactly neutral about the results. It cannot have been an easy situation.

And yet, evaluation had to be done. So we tried. And we discovered something interesting: for all its limits, carrying out an evaluation exercise on a project gets you to understand it much better. In the end you obtain a result, and you are able to defend it. Unfortunately, this result is never a scalar (like “this lake is worth 20 million euro”); it almost always takes the form “if you carry out this project you will gain A but lose B and C”, with A, B and C measured in completely different and irreducible units. Moreover, the only ones who really learn from an evaluation are the evaluators: everyone else sees only the final result, not the sophisticated logic needed to produce it.

The cause of evaluation as a requirement for public works has made undeniable progress. Environmental impact assessment, used in America since the 1960s, was made compulsory in Europe for many public projects by a 1985 directive. Money was invested. Many consultants took some unlikely course and started selling environmental impact assessments. Did this favor the advent of objective, evidence-backed evaluation? I don’t think so. Even now, twenty-five years later, environmentalists and contractors keep fighting each other in court, each brandishing its own environmental impact assessment, or simply insisting that the other side commissioned a biased EIA to support its own position (this is what is happening over the Turin-Lyon high-speed rail link). This does not mean that EIA is useless: it does mean, however, that it is not objective. The promise of quantitative and therefore impartial evaluation was illusory. I suspect this is not an accident but part of the fundamental structure of evaluation: to evaluate, after all, implies values. ROI, too, embeds a set of values: in particular, it implies that all the relevant information is contained in price signals, so that if you are making money you are increasing society’s welfare.

I would be curious to try an alternative approach to evaluation: the emergence of a community that participates in a project, contributes time, brings gifts. For example, in the course of a Council of Europe project called Edgeryders, I recorded a short introductory video in English. A member of our community uploaded it to Universal Subtitles, transcribed the audio into English subtitles and translated them into Spanish. Two weeks later, they had been translated into nine languages. Things like this do not happen every day to public servants: our small group of Eurocrats was very happy about it, but above all – together with the engagement on our online platform, the continuous appreciation on Twitter and other community initiatives such as the map of civic engagement – we took it as a signal that we were doing something good. We took it as an evaluation, a vote expressed in man-hours and commitment. An evaluation of this kind is not an activity carried out by an evaluator, but an emergent property of the project itself; and therefore fast, low-cost, and merciless towards projects that fail to make their usefulness clear to citizens.

Of course, projects built around online communities such as Edgeryders or Kublai lend themselves particularly well to being evaluated this way – they contain thousands of hours of high-quality human labor donated by citizens, a natural unit of account for evaluation. But it is a criterion that may be more generalizable than it seems. Recently a friend, who runs a small software company, surprised me with this remark:

These days, half of a programmer’s job consists of growing and motivating a community on Github.

So it is not just an error of perspective on my part: in an ever greater number of fields, the complexity of problems has become unmanageable unless you tackle it with the tools of collective, swarm intelligence. More and more problems can – and perhaps must – be conceived in terms of an online community growing around them. If this is true, that community can be used as the basis for an evaluation. It should really be obvious: I have never met an ecologist or a social worker who thinks that evaluating an environmental or social impact in terms of ROI makes the slightest sense. If we manage to invent a theoretically sound and low-cost path to evaluation, we can and should get rid of ROI for nonprofit activities. I don’t think we will miss it.

Is evaluation overrated?


Policy wonks everywhere insist on hard, quantitative evaluation as an accountability device. The European Commission is spearheading the effort to drive the adoption of quantitative evaluation in traditionally “soft” areas, like social cohesion or social innovation. The message is quite simple: these are tough times for public budgets. You want something funded, you’d better make a strong case for it. It makes sense. How could it be wrong?

And yet, I wonder. Evaluation is theoretically rock-solid when it measures output in the same units as its input. The gold standard of that would be the famed Return on Investment (ROI): invest dollars. Reap dollars. Compute a ratio. Easy. When you invest dollars to reap, say, an increase in the heron population, or an expected reduction in lung cancer incidence, things start to get blurred. And if you are comparing an increase in the heron population with an expected reduction in lung cancer incidence, they get really blurred.

I should know. I am a veteran of a similar battle.

In the 1980s, led by influential thinkers like the late David Pearce, Mrs. Thatcher’s environmental advisor, environmental economists tried to quantify the economic value of environmental goods. Their goal was to teach humanity to abandon the idea that the environment was there for free, and to start treating it as a scarce resource. This scene had its stronghold at University College London, where Pearce directed a research center and an M.Sc. program. I joined the latter in 1992. Our main tool was an augmentation of cost-benefit analysis, that old evaluation workhorse of the New Deal era. We had all kinds of clever hacks to translate environmental benefits into dollars or pounds: hedonic pricing, contingent valuation, travel cost analysis. Once something is measured in money, it can be compared against anything else. Hard-nosed, quantitative evaluation ensued. Or did it?

As we moved on from our London classrooms to practice, we found out things were not nearly that simple. First, we had a very big theoretical issue: we were trying to emulate markets in order to value environmental goods, because, according to standard economic theory, well-behaved markets assign to goods exactly those prices that maximize collective well-being. However, the mathematical conditions for that to hold are very restrictive, such that they are rarely, if ever, observed in real life. Joseph Stiglitz, one of my favorite economists, was awarded a Nobel prize for showing that, if you remove just one of those conditions (perfect and symmetric information), the properties of the model go down in a big way. But even if you were prepared to take a leap of faith in the underpinning theory, man, getting to those values was hard. Very. Data are usually not available and impossibly expensive to generate, so people resorted a lot to surveys (“contingent valuation” as we called them – it sounds more scientific). Bad move: that just got us entangled in cognitive psychology paradoxes explored in detail by Daniel Kahneman and Amos Tversky, who showed conclusively that humans simply do not value as (theoretical) markets do – and earned another Nobel.

Then there was the very unfortunate politics. Just about the only people prepared to generously fund environmental evaluation were the biggest, baddest polluters. A whole body of literature sprang up as a consequence of the infamous Exxon Valdez oil spill, as Exxon fought in court to avoid having to pay for damages to the Arctic environment: we studied those papers in London. Their authors had the means to do a real evaluation exercise, but the people footing their bill had very strong preferences over its outcome. Not an easy situation.

We certainly succeeded in advancing the cause of evaluation as a requirement. Environmental impact assessment, used in America since the late 1960s, was made a requirement for many public projects in Europe by a 1985 directive. Money was spent. A lot of consultants took some random course and started offering environmental impact evaluation as a service. But as to bringing about objective, evidence-backed evaluation, I am not so sure. Even now, 25 years later, environmentalists and general contractors are fighting court battles, each wielding their own environmental impact assessment, or simply claiming that the other side has intentionally commissioned a partial EIA to rig the debate (this is happening around the planned high speed rail link from Turin to Lyon). That does not mean EIA is not useful: it does mean, however, that it is not objective. The promise of “hard-nosed evidence” was delusional. I suspect this is fundamental, not merely contingent: evaluation implies, you know, values. The ROI embeds a set of values, too: namely, it implies that all the information that matters is embedded in price signals, so if you are making money you must be advancing social well-being.

I am curious to try an alternative path to evaluation: the emergence of a community that participates in a project, volunteers time, offers gifts. For example: in the course of a project I manage at the Council of Europe, called Edgeryders, I created a short introductory video in English. A member of our community loaded it onto Universal Subtitles, transcribed the audio into English subtitles and created a first translation into Spanish. Two weeks later, the video had been translated into nine languages, just as a gift. That does not happen every day: it made our lonely bunch of Eurocrats very happy, and – alongside a veritable stream of Twitter kudos, engagement on our online platform and other community initiatives like the map of citizen engagement – we took it as a sign we were doing something right. That’s evaluation: a vote, expressed in man-hours, commitment, good thinking. Such an evaluation is not an add-on activity performed by an evaluator, but rather an emergent property of the project itself; as such, quite likely, very fast, relatively cheap, and merciless in exposing failures to convince citizens of the value the project is bringing to the table.

Granted, online community projects like Edgeryders or Kublai lend themselves particularly well to being assessed this way – they contain thousands of hours of citizen-donated high quality human labor, a quite natural accounting unit for evaluation. But this criterion might be more generalizable than we think, or become so relatively soon. Recently a friend – the CEO of a software development company – astonished me with the following remark:

In the present day and age, half of a programmer’s work is nurturing a community on Github.

So it’s not just me: in more and more areas of human activity, complexity has become unmanageable unless you tackle it by collective “swarm” intelligence. In other words, more and more problems can – and maybe have to – be framed in terms of growing an online community around them. If that is true, that community can be used as a basis for evaluation activities. It should be a no-brainer: I have never met an ecologist or a social worker who thinks that assessing an environmental or social cohesion impact with ROI makes the slightest sense. If we can figure out a theoretically sound and practically feasible path to evaluation, we can and should get rid of ROI for nonprofit activities altogether. And good riddance.