Tag Archives: Drupal

Near-real time network analysis with Python and Tulip


Regular readers may remember that I am trying to build software for near-real time network analysis of online conversations – think Google Analytics, but focused on relationships between users of a website rather than counting pageviews. I hope to use it to research the mathematical signatures of different social phenomena that happen online. My overarching goal is to help keep conversations healthy and useful at a large scale, and so finally make participatory democracy work at the level of the large city, the nation state, even the planet.
I now have a new strategy, based on a network analysis package called Tulip.

Why Tulip? I have been using Gephi so far, and there is much to be said for Gephi – it has a larger community than Tulip’s and more features. The answer is: Python. The core of Tulip, stripped of the GUI, can be used as a Python library. This means that I can have a very short chain to monitor changes in the network representing a conversation. My favorite configuration would look something like this:

  • The analysis concerns a community website running on Drupal + MySQL. 
  • A module called views_datasource gives the core module Views the ability to query the database and export the results of the query as a JSON file.
  • A Python script, armed with the Python JSON library, parses the JSON and maps elements in the database to relationships, i.e. directed edges from some entity to some other. (This script is the part I am writing now.) Which elements you want to map onto which relationships depends on the problem you are investigating, but once you set it up you can simply re-run the same View and refresh your JSON dataset with new elements.
  • With the Tulip library loaded into Python, the same script can also compute network metrics and visualize the graph.
  • The script’s output can be visualized as a web page, through a standard web server. A library such as sigma.js can take care of visualizing the graph.
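The parsing step in the list above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the shape of the JSON (a top-level "nodes" list wrapping "node" records) and the field names comment_author and post_author are hypothetical, and a real View would export whatever fields you configure in it.

```python
import json

# Hypothetical sample of what a JSON export of a View might look like.
# The wrapper keys and field names are assumptions for illustration only.
SAMPLE_JSON = """
{"nodes": [
  {"node": {"comment_author": "alice", "post_author": "bob"}},
  {"node": {"comment_author": "carol", "post_author": "bob"}},
  {"node": {"comment_author": "alice", "post_author": "bob"}},
  {"node": {"comment_author": "bob", "post_author": "alice"}}
]}
"""


def edges_from_view(raw_json, source_field="comment_author",
                    target_field="post_author"):
    """Map each exported row to a directed edge (source -> target).

    Returns a dict {(source, target): weight}, where weight counts how
    many rows produced that same edge.
    """
    data = json.loads(raw_json)
    edges = {}
    for row in data.get("nodes", []):
        record = row.get("node", {})
        source = record.get(source_field)
        target = record.get(target_field)
        if not source or not target:
            continue  # skip rows that lack one end of the relationship
        key = (source, target)
        edges[key] = edges.get(key, 0) + 1
    return edges


edges = edges_from_view(SAMPLE_JSON)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, "weight", weight)
```

Changing source_field and target_field is how you would map different database elements onto different relationships; re-running the same View then refreshes the edge list with new elements.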

The number crunching can be done server-side in just one step. The Python script fetches the JSON export of the Drupal view (with curl, for example) and loads the dataset. Then it parses the data, builds the network and computes network metrics (and possibly non-network metrics too, which is another thing I am working on now; more on this in forthcoming posts) in just one pass. It then passes the results to a web server for packaging into a dashboard of some kind, with some pretty visuals (many networks are beautiful!).
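A rough sketch of that single pass, again with the standard library only. Everything here is an assumption made for illustration: the helper names are hypothetical, the metrics are plain weighted in- and out-degree standing in for what Tulip would compute, and the output layout (a "nodes" list and an "edges" list) is just one plausible shape for handing a graph to a browser-side library.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_view(url, timeout=10):
    """Download the JSON export of a View. The URL is yours to supply."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def degree_metrics(edges):
    """Compute weighted in- and out-degree for every user in the edge list."""
    in_deg, out_deg = Counter(), Counter()
    for (source, target), weight in edges.items():
        out_deg[source] += weight
        in_deg[target] += weight
    users = sorted(set(in_deg) | set(out_deg))
    return {u: {"in": in_deg[u], "out": out_deg[u]} for u in users}


def to_dashboard_json(edges, metrics):
    """Package nodes and edges as JSON for a web page to draw."""
    nodes = [{"id": u, "label": u, "size": 1 + m["in"]}
             for u, m in metrics.items()]
    links = [{"id": f"e{i}", "source": s, "target": t, "weight": w}
             for i, ((s, t), w) in enumerate(sorted(edges.items()))]
    return json.dumps({"nodes": nodes, "edges": links})


# Edges as a parsing step might produce them: {(source, target): weight}.
# In production they would come from fetch_view() plus parsing instead.
edges = {("alice", "bob"): 2, ("carol", "bob"): 1, ("bob", "alice"): 1}
metrics = degree_metrics(edges)
payload = to_dashboard_json(edges, metrics)
print(payload)
```

The download is kept in its own function so that the rest of the pass can be tested on a small hand-made edge list before it is pointed at a live site.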

To be fair, Python + Tulip is not the only solution. Gephi is available as a Java library (known as “the toolkit” to Gephi enthusiasts), so you could build a similar workflow with Java + Gephi. I chose Python + Tulip because I can now do a little Python (and absolutely no Java) and because Guy Melançon, Benjamin Renoust, Bruno Pinaud and Marie-Luce Viaud at the University of Bordeaux and INRIA are such great collaborators. They like Tulip and I like them, so Tulip it is 🙂

Zen and the art of website procurement: why bureaucrats should get their hands dirty with technology

In the past few years I worked for several public sector agencies. Much of my work consists of thinking up and delivering projects that happen mostly through Internet channels. This is a good time to take a step back and muse on what I learned. As always, the most valuable lessons come from mistakes made – so it’s a good thing I made a lot of them.

  • Software-as-a-service is a bad idea, though there are exceptions. My team and I made this mistake with Kublai, as we decided to deploy our platform on Ning. That allowed us to be up and running in half an hour, no small advantage; but we paid for it by sequestering our own database, procured and paid for by the Italian government, and handing it over to an American private company forever. A year later, Ning changed its CEO and business model: it moved its platform from the open source to the full copyright domain, disabled APIs and blocked migration tools. Just to do a network analysis, Ruggero Rossi had to write a web crawler – a bit like picking the lock to the door of our own home. It could have been worse: we were using a free service (that was before Ning rolled out pricing plans). If the company had simply shut down business, formatted the hard drives and walked away we could not have stopped them, since we were not in any contractual agreement. They would not even answer our emails. I am never again going to consider rolling out a public sector project in which my agency does not have root access to the server hosting the database.

  • Using proprietary software is not a good idea either, again with some exceptions. It is expensive and it amounts to an open-ended commitment to your supplier. If a large software house develops custom software for you and then sells you the license, no one, except that same supplier, is ever going to be able to tweak that code. You risk finding yourself disempowered and stuck in a situation in which changing the color of the background or the font is expensive (as in billable hours expensive) and involves a lot of administrative friction. Furthermore, it is politically questionable: proprietary software is not reusable for free by other administrations, and that is not good – especially in a time of budget cuts and of (justified) skepticism vis-a-vis the effectiveness of administrations in spending taxpayer money.

  • That leaves free/open source software. I have been using WordPress in public sector projects since 2007; for the Edgeryders platform, more or less finished as of this week, my team ventured into Drupal. Working with open source software can be hard and frustrating. Features that are supposed to work by simply installing a module or a plug-in turn out to have horrible bugs in practice; everything takes longer than you think; most of the work is not developing, but debugging. Meanwhile, the rest of the project’s activities are stalled. It feels horrible. I think experience can mitigate the problem, but never really solve it. Free software is by definition organic and gritty: it works by hacks and duct tape as well as by elegant, rational solutions.

Despite all the problems, my experience of Drupal procurement is going to be positive in the end, as with WordPress before. The reason is this: these platforms allow, and even require, a hybrid figure of “power admin” to emerge, somebody who is less skilled than a developer but more so than a normal user. This happens because the admin interfaces of WordPress and Drupal are intuitive and very powerful; Drupal, especially, allows fine-grained control over your website. You can query the database, format the return of the query and send it to a page, a block or even an email message; you can tell the website to execute instructions of the kind IF [condition] THEN [action], not quite programming but on the border. Furthermore – and here I am thinking about my standing love affair with WordPress – when the admin interface is not enough, it is easy to find online resources and tutorials to get your hands into non-core parts of the code. I am technically incompetent, but still I have been able to teach myself to tweak the CSS in a blog’s style sheet, and even the PHP code for very simple tasks, like assigning different headers to different pages or inserting a line of JavaScript. That required a small-ish investment, to which the proliferation of “For Dummies” books in my library is testimony. This gives you an incredibly important freedom: that of developing in a quick-and-dirty fashion, launching, and then continuing to tweak as your project evolves. Trust me, you will feel the need from day one.

Here’s the trick: the hackerish power admin role is perfect for a public servant who needs to procure software. Getting to know the architecture of these platforms well and taking full advantage of their scope for customization does not make you a developer, but it does mean being able to have a constructive conversation with your developers and get real on what can and can’t be done, how long it takes and how much it costs. Furthermore, a power admin can rethink her goals in terms of the software, and so come up with highly sophisticated terms of reference for the procurement effort. For example, on Edgeryders we need to constantly reinvolve users in the conversation: this is done through email notifications and the recent activity feed. In Drupal, these functionalities are carried out by certain non-core modules. If the public servant knows this, she can procure not “a website that feels buzzing”, but “a website in which the activity stream module logs activities that are not logged out-of-the-box”, which is much clearer.

When I got into motorcycle riding, I read the obligatory Zen and the Art of Motorcycle Maintenance. The lesson of that book is the following: the act of riding a motorbike is not really separable from that of doing its maintenance. “Romantic” bikers, who do not enjoy getting their hands dirty, don’t accept this, and delegate even the simplest maintenance operation to professional mechanics. But they pay the price of disempowerment when their machines stop by the roadside and won’t start again, and they don’t have a clue what’s wrong or how to fix it. This kind of system failure can be disastrous in public policy: in the projects I manage technology typically accounts for less than 10% of the budget, yet if the technology is not there the entire project grinds to a halt.

Summing up, high quality procurement is impossible until you know what you are buying. In my experience the free/open source software community is up for sharing its knowledge; corporates producing proprietary software much less so. If, like me, you find yourself in the position of procuring a simple technological solution for the public sector, I recommend you turn to this community, arm yourself with patience and get your hands dirty with the technology the developers intend to use. Install and configure sandbox sites, add functionalities, tweak their look and feel. Spend time with hackers, show that you are eager to learn, and grill them with questions. Above all, don’t yield to the temptation of going “this is not my job, just make it work and send me the invoice”. It doesn’t work like that. This is very time consuming, but you will save that time, with interest, once you are in production. I know it’s not a perfect system, but it is still better than the available alternatives. Truth be told, I think it would be really useful if somebody started a course on website procurement for public servants. Is anybody out there interested? I would certainly sign up.

Thanks to Freddy Mascheretti, Ivan Vaghi, Paolo Mainardi and Claudio Beatrice for their patience and suggestions.