[cryptome] NYC Taxicab Log Dump

  • From: Aftermath <aftermath.thegreat@xxxxxxxxx>
  • To: "cryptome@xxxxxxxxxxxxx" <cryptome@xxxxxxxxxxxxx>
  • Date: Thu, 26 Jun 2014 10:19:51 -0700


from the second link...

Recently, thanks to a Freedom of Information request, Chris Whong received
and made public a complete dump of historical trip and fare logs from NYC
taxis. It’s pretty incredible: there are over 20GB of uncompressed data
comprising more than 173 million individual trips. Each trip record
includes the pickup and dropoff location and time, anonymized hack licence
number and medallion number (i.e. the taxi’s unique id number, 3F38, in my
photo above), and other metadata.

These data are a veritable trove for people who love cities, transit, and
data visualization. But there’s a big problem: the personally identifiable
information (the driver’s licence number and taxi number) hasn’t been
anonymized properly — what’s worse, it’s trivial to undo, and with other
publicly available data, one can even figure out which person drove each
trip. In the rest of this post, I’ll describe the structure of the data,
what the person/people who released the data did wrong, how easy it is to
deanonymize, and the lessons other agencies should learn from this. (And
yes, I’ll also explain how rainbows fit in).
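
To make the "trivial to undo" claim concrete, here is a minimal sketch of the kind of attack hinted at above. The excerpt does not say how the identifiers were anonymized, so this assumes, purely for illustration, that each one was replaced by its unsalted MD5 hash. It also assumes every medallion follows the digit-letter-digit-digit pattern of the 3F38 example; real medallions may use other formats. The names build_lookup_table and deanonymize are hypothetical.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def build_lookup_table():
    """Precompute hash -> medallion for every candidate medallion.

    Assumptions (not confirmed by the excerpt): identifiers were hashed
    with unsalted MD5, and medallions look like '3F38'
    (digit, letter, digit, digit), giving 10 * 26 * 10 * 10 candidates.
    """
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        medallion = f"{d1}{letter}{d2}{d3}"
        digest = hashlib.md5(medallion.encode("ascii")).hexdigest()
        table[digest] = medallion
    return table


# The whole keyspace is tiny, so the table is built almost instantly.
TABLE = build_lookup_table()


def deanonymize(hashed_id):
    """Return the medallion behind a hashed id, or None if it is not in the table."""
    return TABLE.get(hashed_id.lower())
```

The point of the sketch is that hashing does not hide values drawn from a small, well-structured set: anyone can hash every possible value once and then reverse each record with a dictionary lookup.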

The NYC taxi data consist of a number of CSV-files.....
