Open Data ROI
After repeatedly arguing with open data managers that the number of data sets on a platform should not be a relevant success metric for open data, I’ve come around. I now see the metric’s relative value. Quotes citing data set counts abound, from U.S. Chief Information Officer Steve VanRoekel to San Francisco Chief Innovation Officer Jay Nath. My initial argument was that quantity does not equal quality; governments should instead measure the impact of data. I still firmly believe that, but the number of data sets also conveys a sense of scale for new opportunities and, when measured over time, is an indicator of government transparency.
Consider this benchmark: the Library of Congress had to rebuild its holdings after the original items burned in the War of 1812. It did so by purchasing Thomas Jefferson’s entire collection of 6,487 books. Since then, the Library of Congress has grown to hold approximately 147 million items, including 33 million books that constitute about 15 terabytes of data. Is there any doubt that the Library of Congress has valuable data inside?
Here’s a personal benchmark: when I was at the Department of Energy, it took an average of 7.5 hours to post one new data set to Data.gov, the official open data platform of the U.S. government. That figure would drop a bit when data sets were posted in bulk and would rise substantially for really interesting data. So, as a measure of progress, is it any wonder that the DOE open data stakeholders wanted to see how many data sets were getting published each month? (Side note: that cycle time will drop substantially with the President’s recent Executive Order on Open Data.)
Measuring the success of open data is particularly interesting for Socrata. This is not only because we want to make sure that our customers are successful, but also because one of our main products is GovStat, an open performance metrics platform. In fact, we recently revised our own definitions for the number of data sets on a platform, in response to customer requests, so that the count more accurately reflects the cloud’s contents. (More information on Socrata’s data set platform metrics is available here.)
To put data set counts in perspective, there are three other categories we see as valuable when measuring the open data return on investment:
1. Cost Savings & Avoidance
Governments that break down data silos can remove duplication and increase the efficiency of their operations. The savings show up in both outgoing dollars and internal labor-hours. For example, the Oregon Secretary of State was able to avoid $500,000 in IT costs just by publishing data directly to community groups instead of standing up a duplicative internal data exchange system. Similarly, San Francisco cut down on manual data manipulation by setting up an automatic data connector that linked disparate data systems.
2. Revenue Generation
Open data has the power to create jobs, which in turn creates new government revenue. OPower and iTriage, two examples the President recently cited, stand as archetypal illustrations of the new businesses that can be built on open data and the outside investment they can bring in. However, those who are active in the open data community know that there are many more businesses beyond those two. That is why Socrata is actively working with its partners to map out all of the businesses that leverage its open data portals.
3. The Intangible “You Know It When You See It”
Going back to the Library of Congress example, what exactly is the return on investment for any public library? Even if the ROI is a bit fuzzy, no one has a substantial argument for shutting down all physical and digital libraries. That’s because we fundamentally recognize their collective worth. From a political perspective, there is a measurable increase in trust between elected leaders and citizens when a mayor makes education a priority and then shares actionable information on high school graduation rates. Treating open data as a public asset also means that it should contribute to results like more affordable health care, improved school systems, safer neighborhoods, and a cleaner planet.
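To make the two quantifiable categories concrete, here is a minimal sketch in Python of how a government might tally them into a rough benefit-to-cost multiple. The function name, the "gross multiple" definition (benefits divided by cost), and every number except Oregon's $500,000 are assumptions made for illustration; they are not real program data or a Socrata formula.

```python
def roi_multiple(cost_savings, new_revenue, platform_cost):
    """Return quantifiable benefits divided by platform cost (a gross multiple).

    This is one simple way to define the multiple, assumed for illustration.
    Intangible benefits (category 3) are deliberately left out.
    """
    if platform_cost <= 0:
        raise ValueError("platform_cost must be positive")
    return (cost_savings + new_revenue) / platform_cost


# The $500,000 avoided cost is the Oregon figure cited above.
oregon_cost_avoided = 500_000

# Hypothetical placeholders, NOT real data: revenue is left at zero and
# the platform cost is an invented round number.
hypothetical_new_revenue = 0
hypothetical_platform_cost = 50_000

multiple = roi_multiple(
    oregon_cost_avoided,
    hypothetical_new_revenue,
    hypothetical_platform_cost,
)
print(f"Benefit-to-cost multiple: {multiple:.1f}x")  # prints 10.0x for these inputs
```

Swapping in a government's own savings, revenue, and platform cost figures gives a first-pass number to track alongside the data set count.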
When effectively deployed, an open data platform delivers at least a 10x return on investment. Early on, the largest single contributor is often cost savings and internal efficiency gains. Whatever measure a government chooses, I now recognize the value of the “Number of Data Sets” metric.