The Data Plan

Which Data Should You Publish First?

Which data do you publish first, and which do you focus on later? There is no standard formula that applies to all organizations. However, our aim in this chapter is to present you with a set of practical guidelines about what data to publish and in what format. You can then apply these guidelines to your own context. In addition, we will provide you with examples of choices made by open data leaders so you can learn from their successes.

Data Release Guidelines: “What Data Should I Publish?”

We advocate a very deliberate approach to open data publishing. Requests for a data set you hadn’t thought of may come along the way, but we believe it’s best that you at least align your data release plans with your goals to start. This gives the project purpose, discipline, and measurability.

We suggest the following eight-step approach. You can apply it again and again as you go through different cycles in your open data initiative.

In an ideal world, all of your organization’s public data would be online, accessible, usable for everyone, and available in formats that developers can reuse. Progressive open data policies, such as New York City’s, foster openness and ensure that every department and agency participates.

However, even with a strong mandate, open data takes work. Here is an approach you can follow to get your data available to the public, step by step, so that you maximize outcomes and cost savings along the way.

8 Steps to a Successful Data Plan

  1. Identify the data that supports your strategic goals.

    As mentioned in Guidelines for Goal Setting, open data should support your existing initiatives and strategic goals, whether at a department level or organization-wide. Start with the recommendations in that section.

  2. Adapt your open data goals to your local context.

    Also in Guidelines for Goal Setting, we presented practical examples of how you can adapt your data release plans to your local needs. Whether it’s information about extreme weather or environmental issues, or health events like a flu outbreak, open data helps your community build tools to plan better.

  3. Start with the data already on your site.

    Open data may be new, but publishing data online is not. You already have a wealth of valuable data on your .gov site(s) in various formats. Much of it is available as PDFs, Excel spreadsheets, various offline database extracts, shapefiles, and KML files. These are all excellent candidates for your open data project. Additionally, some of the most valuable datasets are tightly coupled with your Web apps, most of which were built many years ago when “Web-ifying” everything was the order of the day.

    We highly recommend using a systematic approach that consists of three steps:

    1. Separate the data from the legacy Web application used to surface it.

    2. Publish that data on your open data platform, which gives you an instant API to that data.

    3. Reuse your own API to modernize your legacy apps, or retire them in favor of community-developed apps or purpose-built cloud apps from Web 2.0 companies such as Socrata.

  4. Analyze your site traffic.

    Your constituents are your best guides. You can figure out what data they find most useful by using simple Web analytics software like Google Analytics. Granted, if you operate in a decentralized environment, you might have many websites to analyze, but your Web managers should be able to tell you what people find most useful based on Web traffic.

  5. Analyze your FOIA and public information requests.

    One of the goals of an effective open data strategy is to proactively provide the information people are looking for. Not only will this increase citizen satisfaction, but it can also help reduce the costs of repetitively handling every information request in a high-touch process. As part of your internal collaboration with public information officers, clerks, and disclosure teams, we recommend that you create a prioritized list of commonly requested information and be sure to post it to your open data portal. Create a self-service experience for citizens.

  6. Request feedback from citizens.

    You can get feedback from citizens in several ways, including your existing channels for public engagement and community feedback. As our interview with civic hacker Derek Eder in Chicago points out, efforts by the open government team in Chicago to connect with citizens and developers have paid huge dividends in making data releases more impactful.

    In addition, your open data site should provide a clear way for your constituents to request data you haven’t yet published. This can be done openly and in a managed environment that you control.

  7. Interview your co-workers.

    Every department or agency in your organization will have some great ideas on what information they would like to share with the public. Often, your colleagues will not think of this as data, but will gravitate towards Web and mobile information that can help them execute on their service mandate. Asking them what data they would like to provide on their agency websites, visualize, “appify,” and make useful to their own customers and partners will drive their participation and increase the flow of data.

  8. Don’t reinvent the wheel. Copy what works.

    Section 3 in this chapter gives more specific advice about copying what works. What have pioneers done that has proven successful? What data is fueling innovative apps? What’s being recognized by the press as valuable? The key is to copy liberally, shamelessly, and build on the success of others.
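
Steps 4 through 6 each produce a signal about what people want: Web traffic, information requests, and direct feedback. One possible way to combine those signals into a prioritized release list is sketched below in Python. The dataset names, counts, equal weights, and the `normalize` and `rank_datasets` helpers are all assumptions made up for illustration; this is a starting point to adapt, not a prescribed scoring method.

```python
from collections import defaultdict

# Hypothetical demand signals gathered in steps 4 through 6.
# All dataset names and counts are invented for illustration.
page_views = {"restaurant inspections": 12000, "towed vehicles": 8000, "budget": 3000}
foia_requests = {"restaurant inspections": 40, "budget": 55, "police reports": 70}
citizen_votes = {"towed vehicles": 15, "police reports": 30}


def normalize(counts):
    """Scale counts to the 0-1 range so signals on different scales are comparable."""
    top = max(counts.values(), default=0)
    if top == 0:
        return {name: 0.0 for name in counts}
    return {name: value / top for name, value in counts.items()}


def rank_datasets(signals, weights):
    """Return (dataset, score) pairs sorted by weighted demand, highest first."""
    scores = defaultdict(float)
    for key, counts in signals.items():
        for name, share in normalize(counts).items():
            scores[name] += weights[key] * share
    # Sort by descending score, then by name so ties are ordered predictably.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_datasets(
    {"traffic": page_views, "foia": foia_requests, "votes": citizen_votes},
    {"traffic": 1.0, "foia": 1.0, "votes": 1.0},  # assumed equal weights
)
for name, score in ranking:
    print(f"{score:.2f}  {name}")
```

Adjusting the weights lets you favor, say, formal information requests over raw page views. A dataset that appears in only one source still gets ranked; it simply scores zero for the signals where it is absent.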

What Are Open Data Leaders Publishing?

Towed vehicles. Restaurant health inspections. Outstanding warrants. The leaders of the open data movement have learned through trial and error which datasets the public finds most useful. You can follow their example when publishing your data. See this adjacent list of datasets published by major contributors to the open data movement, like the city of Edmonton, the state of Oregon, and Kenya.


Data Format and Standards

One of the most common questions we hear is, “In what formats should I publish my data?” This is a great question. The open data ethos says to publish data in open, machine-readable formats. That means, among other things, no PDFs or other closed formats as the default.

Machine-Readable Formats

We believe that to make the machine-readable requirement easy to follow, we should let machines figure out how to take any data source and output it in all the needed formats. This is why the Socrata platform uses data serialization technology. This takes any piece of structured data on the platform and makes it automatically available in the following formats:

  • Comma-Separated Values (CSV)
  • Tab-Separated Values (TSV)
  • Excel XLS/XLSX
  • Extensible Markup Language (XML)
  • JavaScript Object Notation (JSON)
  • Resource Description Framework (RDF)
  • RDF Site Summary (RSS feeds)
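
To make the idea of serialization concrete, here is a minimal Python sketch that takes one small structured dataset and emits it as CSV, TSV, JSON, and XML using only the standard library. It is not how the Socrata platform is implemented; the sample rows, field names, and helper functions are assumptions for illustration. The sketch also assumes a non-empty list of records whose field names are valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample rows; the field names are invented for illustration.
rows = [
    {"id": "1", "name": "Cafe Uno", "grade": "A"},
    {"id": "2", "name": "Taco, Inc.", "grade": "B"},
]


def to_delimited(records, delimiter):
    """Write records as delimited text (CSV or TSV) with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_xml(records, root_tag="rows", row_tag="row"):
    """Serialize records as XML, one element per row and one child per field."""
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for field, value in record.items():
            ET.SubElement(row, field).text = value
    return ET.tostring(root, encoding="unicode")


# One source, several machine-readable outputs.
outputs = {
    "csv": to_delimited(rows, ","),
    "tsv": to_delimited(rows, "\t"),
    "json": to_json(rows),
    "xml": to_xml(rows),
}
print(outputs["csv"])
```

The point of the sketch is that the publisher maintains a single structured source and every format is derived from it, so the formats cannot drift out of sync with one another.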

Open Data Standards

A key to success for your open data initiative is the adoption of open data standards in three areas:

  • Data Catalog Interoperability. Enable universal federation of different open data catalogs using a standard catalog schema, based on the W3C Data Catalog Vocabulary (DCAT).

  • Data Portability Based on Standard Data Formats. Standardize outputs including JSON, XML, and CSV, as well as RDF and other Linked Data standards. The goal is to move towards standard schemas that developers can use for popular datasets, based on real-world examples and collaboration between data publishers.

  • Application Portability Based on Open Data API Standards. Standardize the application programming interfaces (APIs) used to programmatically access open data, using established paradigms and protocols such as REST, HTTP, and Structured Query Language (SQL).
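
As a rough illustration of catalog interoperability, the Python sketch below builds a single catalog entry as JSON-LD using terms from the W3C DCAT vocabulary and Dublin Core. The `dcat_dataset` helper, the example.gov URLs, and the dataset details are hypothetical, and the entry is deliberately simplified (for instance, the media type is given as a plain string); consult the DCAT specification for the full set of properties and their expected value types.

```python
import json

# Namespaces for the W3C DCAT vocabulary and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dcat_dataset(identifier, title, description, keywords, distributions):
    """Build a simplified DCAT dataset entry as a JSON-LD dictionary.

    `distributions` is a list of (download_url, media_type) pairs.
    """
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,  # simplified: plain string
            }
            for url, media_type in distributions
        ],
    }


# Hypothetical dataset and URLs, for illustration only.
entry = dcat_dataset(
    identifier="https://data.example.gov/datasets/restaurant-inspections",
    title="Restaurant Health Inspections",
    description="Results of routine restaurant health inspections.",
    keywords=["health", "restaurants", "inspections"],
    distributions=[
        ("https://data.example.gov/files/restaurant-inspections.csv", "text/csv"),
        ("https://data.example.gov/files/restaurant-inspections.json", "application/json"),
    ],
)
print(json.dumps(entry, indent=2))
```

Because every catalog that follows the same vocabulary describes its datasets with the same property names, a federated catalog can harvest entries like this one without custom mapping for each publisher.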

Read more on the open data standards discussion.

Containers, Facebook, Baseball & the Dark Matter around Open Data

Read this blog post by David Eaves for further insight into open data standards.

Application Programming Interfaces (APIs)

One of the fundamental tenets of open data is reuse. When you publish your data, it is always with the intent that somebody else can reuse that data to create added value. Application programming interfaces (APIs) provide a modern, inexpensive, and scalable way for organizations to expose their data to external systems, third-party developers, and partner organizations.

APIs have become a key requirement for successful open data initiatives. This is especially true for large datasets, or frequently changing data like 911, 311, transportation, and many other near real-time applications.
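
For large or frequently changing datasets, developers typically ask the API for just the slice of data they need rather than downloading the whole file. The Python sketch below builds such a request URL without sending it. The endpoint is hypothetical, and the `$where`, `$order`, `$limit`, and `$offset` parameter names are modeled on the SQL-like query style of Socrata's API; treat them as assumptions and check your own platform's documentation for the parameters it actually supports.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, where=None, order=None, limit=100, offset=0):
    """Build a filtered, paged query URL for a REST-style open data endpoint."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where  # SQL-like filter expression
    if order:
        params["$order"] = order  # sort column and direction
    # urlencode percent-escapes spaces, quotes, and other reserved characters.
    return f"{base_url}?{urlencode(params)}"


def page_urls(base_url, page_size, pages, **filters):
    """Yield one URL per page so a large dataset can be fetched in chunks."""
    for page in range(pages):
        yield build_query_url(
            base_url, limit=page_size, offset=page * page_size, **filters
        )


# Hypothetical endpoint and column names, for illustration only.
ENDPOINT = "https://data.example.gov/resource/abcd-1234.json"

url = build_query_url(
    ENDPOINT,
    where="status = 'open'",
    order="created_date DESC",
    limit=50,
)
print(url)

# Decode the query string again to confirm what would be sent.
print(parse_qs(urlsplit(url).query))
```

Paging with a limit and an offset keeps each response small, which matters most for near real-time feeds such as 311 requests where clients poll repeatedly.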

“A decade ago, businesses were still working to understand the importance of having a website. Today, businesses need to understand the importance of the API.”

Kin Lane, API Evangelist

Human Accessible Interfaces

Open data is not just about machines consuming data. The primary goal is to make the data accessible and to create the interface or experience that makes the data most useful to those who need it.

For data to reach the broadest possible audience and be immediately useful to your constituents, anyone, regardless of technical proficiency, needs to be able to access it and use it the same way they access and use other information on the Web. In the “An Outstanding Citizen Experience” chapter of this guide, the “Curating the Data Experience” section gives you a detailed breakdown of how to create a winning interactive experience for your users.