Data Cleaning IS Analysis, Not Grunt Work.

Let’s start by stating the problem with existing writing on “Data Cleaning”. Wikipedia’s article on data cleaning gives a decent summary of the big, important dimensions of data quality: Validity, Accuracy, Completeness, Consistency, Uniformity. It also has a section on “process” that’s really dry and academic (in a negative way) and won’t help you clean any data at all. Next, I’m just gonna sample posts from the top links on Google when I search “Data cleaning”. I’ll provide links as reference so you know what I’m griping about. This highly PageRanked one reads like a friendlier expansion of the Wikipedia page at the start. Luckily, it redeems itself in the process section with a big list of example techniques for cleaning data, things like cleaning up spaces, dropping irrelevant values, etc. It even has some examples and illustrations!
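
To make those two techniques concrete, here is a minimal Python sketch of cleaning up spaces and dropping irrelevant values. It is not taken from any of the posts mentioned above. The `clean_rows` helper, its `keep_if` parameter, and the sample records are all made up for illustration.

```python
def clean_rows(rows, keep_if=lambda row: True):
    """Tidy whitespace in string fields, then drop rows the caller marks irrelevant.

    Hypothetical helper for illustration only, not from any library.
    """
    cleaned = []
    for row in rows:
        tidy = {
            # split() + join() trims the ends and collapses runs of whitespace
            key: " ".join(value.split()) if isinstance(value, str) else value
            for key, value in row.items()
        }
        # keep_if encodes what "irrelevant" means for this particular dataset
        if keep_if(tidy):
            cleaned.append(tidy)
    return cleaned


# Made-up sample records, purely for illustration.
raw = [
    {"city": "  New   York ", "visits": 12},
    {"city": "Boston", "visits": 7},
    {"city": "test", "visits": 0},
]

result = clean_rows(raw, keep_if=lambda row: row["city"] != "test")
```

Notice that the whitespace fix is mechanical, but the `keep_if` rule is a judgment call about what the data means. Deciding that a row is irrelevant is already an analysis decision.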

Unix News
