New York City Releases Data on Millions of Cab Rides

August 12, 2015 12:00 pm PST | Data as a Service

New York City’s taxicabs are both iconic and ever-present. For residents and tourists, it can easily feel like cabs are everywhere: stretching from the Hudson River to the East River during rush hour, speeding downtown on Broadway, and traversing the bridges and tunnels to the outer boroughs around the clock. The easily recognizable yellow cabs serve the entire city; green cabs, introduced in 2013, wander the outer boroughs for fares.

But for all their ubiquity, how much is really known about the vehicles’ travels? With the release of a massive dataset on August 1, including data from 2014 and the first half of 2015, New York City’s Taxi and Limousine Commission (TLC), in partnership with the New York City Department of Information Technology and Telecommunications (DOITT), has made information about taxi rides available publicly for the first time.

The TLC posted the data on millions of cab rides in three ways: as downloadable spreadsheets of monthly trip data, on Google’s BigQuery (Google account required), and through the City’s OpenData portal, which allows users to easily download the data and create visualizations of it. The TLC plans to release data dating back to 2009 in the future.

“Making public data easier to access is a win-win. It’s good for the public and good for government,” says Minerva Tantoco, Chief Technology Officer for the City of New York.

Previously Available Only Through FOIL

This is not the first time that data about New York City’s cabs has been available. In 2014, Chris Whong filed a Freedom of Information Law (FOIL) request for the data. The request was quickly fulfilled, although he had to provide his own brand-new hard drive and make two in-person visits to the TLC’s office in downtown Manhattan. As Whong, a self-described data junkie and civic hacker, commented at the time, “this data should be open, API accessible, downloadable, and free for all to use. Size and complexity of the data are not an acceptable excuse…” for not making it publicly and readily available.

Ben Wellington, a data scientist who runs the popular blog IQuantNY, agrees, finding FOIL requests “incredibly inefficient,” and adding that “Putting data online makes that process easier for all, including the government agencies themselves.”

Millions of Lines of Data — And More to Come

In a pedestrian-friendly city, where streets and public transportation are easy to navigate, and parking spaces are a fought-over commodity, taxis are an essential part of New York City’s transportation. This makes information about taxi rides very revealing. Wellington comments, “Taxi data gives us an incredibly detailed view of our city. Developers can use it to help people find taxis easier, understand if their neighborhood is getting fair access to transportation, understand the time and cost of getting between any two points in the city by time of day and more.”

The TLC’s release of data from last year and the start of 2015 already amounts to millions of data points, a treasure trove to comb through. Included within the dataset is information on:

  • Pick-up and drop-off date, time, and location
  • Trip distances
  • Fares (which are itemized to include surcharges for extra passengers, tolls, etc.)
  • Passenger counts
  • Payment types

Information about the yellow cabs, which pick up fares across the city’s five boroughs, and the green cabs, which are relegated to the outer boroughs, is stored separately.
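
As a rough illustration of how the fields listed above might be explored, the sketch below averages fares by pick-up hour using only Python's standard library. The column names (pickup_datetime, fare_amount), the timestamp format, and the sample rows are assumptions made for this example, not the TLC's actual schema; check the header row of whichever file you download and adjust accordingly.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical column names and timestamp format; the real files may differ.
PICKUP_COL = "pickup_datetime"
FARE_COL = "fare_amount"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def average_fare_by_hour(csv_file):
    """Return {hour_of_day: mean fare} for the trip rows in csv_file."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        hour = datetime.strptime(row[PICKUP_COL], TIME_FORMAT).hour
        totals[hour] += float(row[FARE_COL])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Made-up rows for illustration only; these are not real TLC records.
SAMPLE = """pickup_datetime,fare_amount
2015-01-01 16:05:00,10.0
2015-01-01 16:40:00,14.0
2015-01-01 09:15:00,8.5
"""

if __name__ == "__main__":
    print(average_fare_by_hour(io.StringIO(SAMPLE)))
```

The same function should accept any open text file, so a downloaded monthly spreadsheet saved as CSV could be passed in place of the in-memory sample. Because yellow and green cab records are stored separately, each file would be processed on its own.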

How Will the Data Be Used?

After Whong made the data he received from his FOIL request available, Ben Wellington was able to dig in and brainstorm solutions for a problem that plagues New Yorkers: the lack of cabs available to hail at 4pm, when shifts change. But that was just one year’s worth of data, and available only to those who sought it out on Whong’s website.

With more data available, there will be opportunities to evaluate taxi availability, cabs’ contribution to congestion, and general patterns of taxi usage across the years the data covers.

Wellington comments that releasing the data publicly and openly serves an important purpose: it “allows us to get unbiased access to raw data. This is important because often statistics are spun to tell a particular side of a story…. once raw data is introduced, it becomes much harder to spin the numbers in any one way.”

For more details on where New York’s taxis travel, and for how long, explore the dataset online at New York’s OpenData portal.



