Wednesday, December 28, 2022

Data Cleaning IS Analysis, Not Grunt Work.

First let’s start with stating the problem with existing writing on “Data Cleaning”. Wikipedia's post on data cleaning does a decent summary of the big important qualities of data quality: Validity, Accuracy, Completeness, Consistency, Uniformity. It’s also got a section on “process” that’s really dry and academic (in a negative way) and won’t help you clean any data at all. Next I’m just gonna sample posts from the top links on Google when I search “Data cleaning”. I’ll provide links as reference so you know what I’m griping about. This highly PageRanked one is like a friendlier expansion of the Wikipedia page at the start. Luckily it redeems itself in the process section by listing a big list of example techniques to use to clean data, things like cleaning spaces, dropping irrelevant values, etc. Has some examples and illustrations!. read more...

No comments:

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

  This project is about how to systematically persuade LLMs to jailbreak them. The well-known ...