Unix news is nothing but open source. This site features general information about Linux, Unix, and the Python and Perl programming languages. Lately I am adding more posts about data and data analytics in general.
Wednesday, December 28, 2022
Data Cleaning IS Analysis, Not Grunt Work.
First, let’s state the problem with existing writing on “Data Cleaning”.
Wikipedia's article on data cleaning gives a decent summary of the big, important dimensions of data quality: Validity, Accuracy, Completeness, Consistency, Uniformity. It’s also got a section on “process” that’s really dry and academic (in a negative way) and won’t help you clean any data at all.
Next I’m just gonna sample posts from the top links on Google when I search “Data cleaning”. I’ll provide links as reference so you know what I’m griping about.
This highly PageRanked one is like a friendlier expansion of the Wikipedia page at the start. Luckily it redeems itself in the process section with a long list of example techniques for cleaning data, things like cleaning up spaces, dropping irrelevant values, etc. It has some examples and illustrations!
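To make "cleaning up spaces" and "dropping irrelevant values" concrete, here is a minimal sketch in plain Python. It is not taken from any of the posts mentioned; the function name `clean_rows`, the sample records, and the choice to treat a row as irrelevant when all kept columns are empty are assumptions made purely for illustration.

```python
def clean_rows(rows, keep_columns):
    """Trim whitespace, drop unwanted columns, and skip rows left empty.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for row in rows:
        trimmed = {}
        for key in keep_columns:
            value = row.get(key)
            text = "" if value is None else str(value)
            # split() + join() strips the ends and collapses inner runs of spaces.
            trimmed[key] = " ".join(text.split())
        # Assumption: a row with nothing in any kept column is irrelevant.
        if any(trimmed.values()):
            cleaned.append(trimmed)
    return cleaned


# Made-up sample records; "debug_id" stands in for an irrelevant column.
raw = [
    {"name": "  Ada   Lovelace ", "city": "London ", "debug_id": "x1"},
    {"name": "   ", "city": "", "debug_id": "x2"},
    {"name": "Grace Hopper", "city": None, "debug_id": "x3"},
]

result = clean_rows(raw, keep_columns=["name", "city"])
print(result)
```

Even in a toy like this, deciding what counts as "irrelevant" is a judgment about the data, not a mechanical step.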