Synopsis: A practical guide to performing common data mining tasks in RapidMiner.
The first thing you need to know about “Data Mining for the Masses” is that, despite protestations to the contrary by author Dr. Matthew North, it’s basically an instructional text for using the popular data mining tool RapidMiner. (Spoiler alert: we’ll be discussing the installation and operation of RapidMiner later this week.)
North does spend a little time at the beginning of the book covering such basics as the CRISP-DM model and data warehousing, but in general there is very little theory in the book. Instead there are excellent chapters covering the practical application of the usual suspects (k-means, linear regression, neural networks, etc.) in statistical modeling. Since RapidMiner is a popular tool, and since North does such a good job of covering how to use it, this book is definitely a must-have for those pursuing a career in data.
I am particularly pleased that North saw fit to include a chapter on ethics, and he hits the nail squarely on the head with this:
“In all seriousness, when we are dealing with data, those data represent people’s lives… Ethics is the set of moral codes, above and beyond the legally required minimums, that an individual uses to make right and respectful decisions. When mining data, questions of an ethical nature will invariably arise. Simply because it is legal to gather and mine certain data does not make it ethical.”
This is a message we as practitioners of the craft need to constantly hammer home, not only to ourselves but to others as well.
The second thing you need to know about this book is that it is available on Amazon.com for $39.99 USD, but it is released under a Creative Commons license and as such is also available for free on the Internet. I downloaded the PDF version some time back, but found the information so valuable that I later bought the book to ensure North was compensated. I would encourage you to do the same.