[cryptome] Re: NYC Taxicab Log Dump

  • From: doug <douglasrankine2001@xxxxxxxxxxx>
  • To: cryptome@xxxxxxxxxxxxx
  • Date: Thu, 26 Jun 2014 19:48:03 +0100

On 26/06/14 18:19, Aftermath wrote:
Dear Aftermath...or is it Bill?

I thought this posting was absolutely brilliant. My congratulations to you for bringing it to our attention, and to Mr. Whong, who did the initial research. Why? Because...if you think on it, you have saved the NSA, the CIA and the intelligence and security departments of the New York Police Force huge amounts of money on expenditure of resources...and foreign organisations such as the Chinese, the Israelis, the Russians and any ole Tom, Dick and Harry of a security organisation a lot of time. Even the United Nations security and intelligence services will be all agog at your posting when they read it. I hope you are not claiming copyright on it... ;-) At the click of a button, they can all find out who has been doing what in New York...United Nations officials, ambassadors, envoys, consuls and all sorts of public and civil service officials, secret service, intelligence services, even the private sector. All this metadata is available to link up with known associations and contacts...and all open source too: who visited whom and when, who was where and when. With a little help from the smart mobile phone, the most valuable source of ID, location and contacts, you have provided the world with a wealth of information. Just think...the amount of money you have saved the world's intelligence organisations. Absolutely f*cking brilliant...My congratulations...
P.S. Who needs privacy and encryption when we have all this stuff about the world leaders in our very own hands? It is all a question of pressing the right button...


from the second link...

Recently, thanks to a Freedom of Information request, Chris Whong received and made public a complete dump of historical trip and fare logs from NYC taxis. It’s pretty incredible: there are over 20GB of uncompressed data comprising more than 173 million individual trips. Each trip record includes the pickup and dropoff location and time, anonymized hack licence number and medallion number (i.e. the taxi’s unique id number, 3F38, in my photo above), and other metadata.

These data are a veritable trove for people who love cities, transit, and data visualization. But there’s a big problem: the personally identifiable information (the driver’s licence number and taxi number) hasn’t been anonymized properly — what’s worse, it’s trivial to undo, and with other publicly available data, one can even figure out which person drove each trip. In the rest of this post, I’ll describe the structure of the data, what the person/people who released the data did wrong, how easy it is to deanonymize, and the lessons other agencies should learn from this. (And yes, I’ll also explain how rainbows fit in).

The NYC taxi data consist of a number of CSV-files.....
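
The excerpt breaks off before it says how the identifiers were obscured, so the following is only a minimal sketch of why a small ID space makes this kind of anonymization easy to reverse. It assumes (these are assumptions, not details from the excerpt) that each identifier was replaced by an unsalted MD5 hash of itself, and that medallion numbers follow the digit-letter-digit-digit shape of the "3F38" example. The names build_lookup and deanonymize are hypothetical.

```python
import hashlib
import string
from itertools import product


def build_lookup():
    """Precompute hash -> ID for every candidate shaped like '3F38'.

    Assumption: IDs are digit, uppercase letter, digit, digit, which gives
    only 10 * 26 * 10 * 10 = 26,000 candidates to hash.
    """
    table = {}
    for d1, letter, d2, d3 in product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        candidate = f"{d1}{letter}{d2}{d3}"
        # Assumption: an unsalted MD5 of the plain ID was used as the pseudonym.
        digest = hashlib.md5(candidate.encode("ascii")).hexdigest()
        table[digest] = candidate
    return table


LOOKUP = build_lookup()


def deanonymize(hashed_id):
    """Return the original ID for a hex digest, or None if it is not in the table."""
    return LOOKUP.get(hashed_id.lower())


if __name__ == "__main__":
    pseudonym = hashlib.md5(b"3F38").hexdigest()
    print(pseudonym, "->", deanonymize(pseudonym))
```

Because every possible ID can be hashed ahead of time, recovering the original is a single dictionary lookup per record. A salted or keyed hash, or random replacement identifiers, would not be reversible by this kind of precomputed table.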
