My Visit to Socrata, and Data Analysis About Data Analysis

August 30, 2013 8:00 am PDT | Data as a Service

By Thomas Levine


A couple of months ago, I downloaded the metadata files for all of the datasets on 60 Socrata open data portals. Interesting things happen when you do craziness like this.

Last Friday, I visited Socrata in Seattle. The most important part of my visit was of course the teriyaki lunch; it was so important that Developer Evangelist Chris Metcalf sent me a Google Calendar invitation for it. The Socrata team has been holding this lunch for about 19 months and has only missed two weeks!

Of secondary importance was my discovery of a glowing ampersand.


(I like ampersands.)

I also met everyone, talked about my findings, and learned a lot about the implementation of open data portals.


After my talk, VP of Product Saf Rabah asked me why I had done all this study of open data. That was a good question; I regularly forget that I tend to do strange things.

I think most things do better when knowledge is shared. I run and write free software, I post most of my work quite publicly, and I pay attention to media licensing. I read copyright law when I was in high school, and I started a Free Culture Foundation chapter in college. Naturally, I also care about open data. But that still doesn’t quite explain it.

Hackathon Apps

At the first hackathon for the NYC Big Apps competition, Ashley Williams and I started talking about the sort of apps that come out of contemporary hackathons. We quickly decided to algorithmically generate random hackathon apps.

We noticed that many apps follow a search-for-things-on-a-map paradigm: you fill out a search form and get results on a map. We automated the creation of these cliché hackathon apps in AppGen. (Being cliché hackathon apps, AppGen-generated apps also broke a month after being created.)

Dataset Dataset

In building AppGen, I wound up downloading all of the datasets on the New York City data portal. I generalized that downloader to run on any Socrata open data portal, and then I downloaded the metadata for all of the datasets on 60 Socrata portals.

I started treating datasets as data points and doing analysis across datasets. I think this is the main novelty of all my open data studies: I’ve just been quantifying things that other people hadn’t thought to quantify.
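To make "datasets as data points" a bit more concrete, here is a minimal Python sketch of the idea, not the code I actually ran. It assumes the metadata has already been downloaded and flattened into one record per dataset. The portal names, the sample records, and the summarize helper are all made up for illustration, and the field names (viewCount, viewType) are assumptions about what a metadata file contains.

```python
from collections import Counter
from statistics import median

# Hypothetical records: one per dataset, not real portal data.
# The field names "viewCount" and "viewType" are assumptions.
datasets = [
    {"portal": "portal-a.example", "name": "Parking tickets", "viewCount": 1200, "viewType": "tabular"},
    {"portal": "portal-a.example", "name": "Street trees", "viewCount": 300, "viewType": "tabular"},
    {"portal": "portal-a.example", "name": "Zoning map", "viewCount": 50, "viewType": "geo"},
    {"portal": "portal-b.example", "name": "Budget", "viewCount": 10, "viewType": "tabular"},
    {"portal": "portal-b.example", "name": "Annual report", "viewCount": 30, "viewType": "blobby"},
]


def summarize(records):
    """Treat each dataset as one row and summarize across datasets, per portal."""
    by_portal = {}
    for record in records:
        by_portal.setdefault(record["portal"], []).append(record)

    summary = {}
    for portal, rows in by_portal.items():
        summary[portal] = {
            "datasets": len(rows),
            "median_views": median(row["viewCount"] for row in rows),
            "view_types": Counter(row["viewType"] for row in rows),
        }
    return summary


summary = summarize(datasets)
for portal, stats in sorted(summary.items()):
    print(portal, stats["datasets"], stats["median_views"], dict(stats["view_types"]))
```

Nothing here is clever. The point is only that once every dataset is a row, questions like "how many datasets does each portal have?" or "how skewed are the view counts?" become ordinary data analysis.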


Someone recently suggested that I’m generally interested in understanding how information gets created and shared.

Fun related story: this one time, I posted “EMERGENCY EXIT ONLY ALARM WILL SOUND” signs on doors that weren’t emergency exits only. Those emergency signs are scary! Even I was afraid of opening the doors afterwards! I continue to be intrigued by how signs are seen as authoritative sources of information even though anyone can put up a sign.

When so much information passes through a central and open resource like a data portal, we acquire rich data about how different people produce and interpret information. This ecosystem of open data doesn’t just give us more data to analyze; it also allows us to analyze data analysis.

