Data Science is the kind of field where you can’t really learn without doing. Sure, you can learn about statistics and models in a classroom, but to be functional in a job you have to know how to use the tools. So what kind of equipment are we going to need to gather data from the web, store and manipulate it, break it down for analysis, and learn the tools along the way? That’s a lot of work, so essentially you’re going to need... a laptop.
That’s it? You can’t be serious.
Actually, I am.
Learning Data Science doesn’t have to cost a fortune, it doesn’t require racks full of servers, and it doesn’t mean getting a PhD (although a lot of PhDs would like us to believe that it does). Roughly 80% of what a Data Scientist does can be accomplished with a well-specified laptop and some very cost-effective software.
For example, some of the best tools for data science are open source. These are the same versions in use at Fortune 500 companies, so there’s no lack of functionality, but they are free for anyone to use. Other tools are commercial but have free versions for educational or non-commercial purposes.
Over the next few weeks we’ll be looking at some of the leading software for data analysis. The common criteria are that each tool has to be free for educational use and has to fit on a reasonably specified laptop. Some of the tools we’ll examine include:
- Hadoop (which also gets you Hive, Sqoop, and a host of others)
As the series develops I will undoubtedly add to this list, but I want to get you started on these first.