On a project for a recent client, I needed to apply some common Natural Language Processing (NLP) techniques to surveys they had gathered. One of the project's requirements was that the source document had to remain in Word’s .docx format and couldn’t be exported to .txt. RapidMiner was the tool of choice […]
I’m loving Seahorse, a GUI frontend for Spark by deepsense.io. The interface is simple, elegant, and beautiful, and its drag-and-drop nature has the potential to significantly speed up development of a machine learning workflow. Thus far I haven’t run into any major bugs that affect the results, so naturally that shoots it near the […]
You’ll probably see a lot of CSV files in the workplace, or generate them from the vast ocean of spreadsheets that are floating around the average office. But that won’t always be the case, and sometimes you’re going to need to tap directly into an existing database or build your own. So how do you […]
Learn to use machine learning to solve real-world business problems.
In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.
If you want to learn Data Science, there's no escaping the need for hands-on experience with the tools you'll be using on a daily basis. This email course takes you through installing the free, open source tools that top professionals use regularly. Start getting your feet wet today.