Google Refine lets you fix and handle huge, messy sets of data

By Erez Zukerman, Download Squad, November 17, 2010 at 10:30AM

Google has just introduced a new product, and this time it’s a PC application (with a browser-based UI). It’s called Google Refine, and it solves a problem that is enormous for some people: it lets you take massive sets of “messy data” and massage them into shape so that they’re uniform, make sense, and can be statistically analyzed.

The video after the jump shows a very good example, which is based on a CSV file exported from a publicly available data source (a government contract system, in this case). The data is very realistic – descriptions are inconsistent (Firm Fixed Price on some rows and FFP on other rows), and even the number formats are inconsistent (you get 0.78 on one row and a number in the millions on another row).
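To make the kind of cleanup described above concrete, here is a minimal Python sketch. It is not how Google Refine works internally, and nothing in it comes from the video: the column names (`contract_type`, `amount`), the sample rows, and the mapping table are all hypothetical, and the assumption that the number inconsistency is about thousands separators is ours.

```python
import csv
import io

# Hypothetical sample rows; the column names and values are assumptions
# for illustration, not taken from the real government export.
SAMPLE = """contract_type,amount
Firm Fixed Price,0.78
FFP,"1,250,000"
firm fixed price ,2500000
"""

# Assumed mapping from normalized variants to one canonical label.
CANONICAL = {
    "ffp": "Firm Fixed Price",
    "firm fixed price": "Firm Fixed Price",
}


def normalize_type(value):
    """Lowercase, collapse whitespace, then look up the canonical label."""
    key = " ".join(value.lower().split())
    return CANONICAL.get(key, value.strip())


def parse_amount(value):
    """Strip thousands separators and convert to a float."""
    return float(value.replace(",", "").strip())


def clean_rows(text):
    """Return the rows with uniform descriptions and numeric amounts."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "contract_type": normalize_type(row["contract_type"]),
            "amount": parse_amount(row["amount"]),
        }
        for row in reader
    ]


rows = clean_rows(SAMPLE)
```

After this pass every row carries the same description label and a plain number, which is the state the data needs to be in before any statistical analysis.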

Google Refine lets you very easily home in on those inconsistencies and fix them in a myriad of ways. This is an important data tool because those heaps of messy data are often public records, which are available but not transparent; being able to quickly analyze them could expose some very interesting patterns and anomalies in the way that public institutions and governments behave.
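One simple way to spot such inconsistencies is to tally the distinct values in a column and look at the counts. The Python sketch below illustrates that idea only; it is not Google Refine's own code, and the function name `text_facet` and the sample values are made up for the example.

```python
from collections import Counter


def text_facet(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical column values with the kind of variation described above.
values = [
    "Firm Fixed Price",
    "FFP",
    "Firm Fixed Price",
    "FFP",
    "Firm Fixed Price",
    "firm fixed price",
]

facet = text_facet(values)
for value, count in facet:
    print(f"{count:>3}  {value}")
```

Variants that mean the same thing show up as separate entries in the tally, which makes them easy to see and then merge.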

[Thanks, Yanksy, for the tip!]

Google Refine lets you fix and handle huge, messy sets of data originally appeared on Download Squad on Wed, 17 Nov 2010 10:30:00 EST.
