You know how they say 80% of Machine Learning is data preparation? Well, I decided to write a book about the 80%.
I started with the goal of writing a short, practical guide. I blinked, and now it's a 26-chapter, 6-appendix monstrosity that covers everything from Andrew Ng's philosophy to the mathematical proofs for things I barely understood myself. It's less of a book and more of a strategic manual for navigating the data jungle armed with more than just a rusty df.dropna().
The core philosophy is simple: "Garbage In, Garbage Out" is the silent killer of AI projects. This book is an attempt to perform an exorcism on the "Garbage In" demon.
I've stared at this table of contents for so long that the words "Imputation" and "Dimensionality" have lost all meaning. I need fresh eyes before I send this behemoth to an editor.
I'm looking for your brutal, merciless, and invaluable feedback on:
- Is This Insane? Is a 26-chapter book on data prep a noble quest or a sign of a complete psychological break?
- What Did I Miss? Did I forget your favorite esoteric encoding technique? Is there a new data architecture from Netflix that renders Chapter 3 obsolete?
- Does It Make Sense? Does the flow from "The Data-Centric Revolution" to "Quantum Computing Implications" feel like a logical progression or a fever dream?
- Technical Blunders: Have I misinterpreted a core concept? Is my explanation of MCAR vs. MAR going to get me laughed out of the next data science meetup?
- Typos & Gibberish: Point out anything that looks like I fell asleep on the keyboard.
- Read a Chapter (or a section): Pick a topic that interests you or that you're an expert in. The Notion link for Chapter 1 is in the outline below.
- Open an Issue: Create a new GitHub Issue for any feedback. Please prefix the issue title with the chapter number, e.g., [Chapter 10] Hashing Encoder explanation is confusing.
- No Feedback is Too Small: From a simple typo to a full-blown philosophical disagreement about Feature Stores, I want to hear it all.
Thank you for being brave enough to look. Now, please tell me my baby isn't (that) ugly.
If you've read this far and would like to share your thoughts, head to The Monstrous Table of Contents.