In the data science world, some of the best stuff is free. I’ve already posted about free books and some of the better videos on YouTube, so now let’s put together a list of software tools. Some of these are limited versions of commercial software. Others, like R, are Open Source packages that have become the go-to standards in their area. Enjoy.
MicroStrategy Analytics Desktop – Stripped-down version of their full analytics package, designed for use on a PC or laptop.
MicroStrategy Analytics Express – Cloud version of MicroStrategy, with a few more features than Analytics Desktop. Only free for the first year.
Tableau Public – Very limited version of a very popular visualization package. Great for learning to assemble dashboards but doesn’t allow local storage of data, which is a royal pain in the ass and limits its usefulness.
RapidMiner Studio – If you want to move past visualization and get into data mining, it’s hard to do better than RapidMiner. The free download won’t import anything other than text files or CSVs and it’s limited to 1 GB of memory, but when you think about it there’s a lot you can do with that. You can also get a two-week trial of the full version. The best part? No coding knowledge is necessary.
WEKA – Similar in purpose to RapidMiner, but completely different in use. Written in Java and runs on almost any operating system.
KNIME – Yet another free data mining app. Can use all of the modules in WEKA, and also incorporates plugins that allow integration with R. Powerful stuff.
Cloudera Distribution of Hadoop – All of the Big Three (Cloudera, MapR, and Hortonworks) have free versions of their distributions because Hadoop is Open Source. I’m partial to the Cloudera distribution so I’ve included it in this list. Includes such gems as Hive, HBase, Spark, etc.
MySQL – Sooner or later you’re going to need to work with a database, and there are few free choices in such widespread use as MySQL.
MySQL Workbench – A GUI for MySQL. Optional, but extremely useful.
R – This free software package remains the most popular programming language for data science, despite Python’s recent surge in popularity. If you’re going to be a data scientist, sooner or later you’re going to need R skills.
R Commander – A popular GUI for R. Comes as a collection of plugins for different uses.
RStudio – If the idea of developing in R from the command line makes you want to poke yourself in the eye with a sharp stick, save yourself the pain and get an IDE (RStudio) instead.
Pandas – Python is included in OS X and nearly every distribution of Linux, but if you want to use it for data science you’ll need to augment the basic package. Pandas and NumPy are what you’re looking for.
NumPy – Ditto above.
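To give a feel for what Pandas and NumPy add on top of plain Python, here is a minimal sketch. It assumes both packages are already installed, and the little sales table (its column names and numbers) is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this example only.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [10, 20, 30, 40],
})

# NumPy: math on a whole array at once, no explicit loop needed.
units = sales["units"].to_numpy()
mean_units = np.mean(units)

# Pandas: group rows by a column and aggregate, much like SQL's GROUP BY.
units_by_region = sales.groupby("region")["units"].sum()

print(mean_units)
print(units_by_region)
```

In practice you would more likely load the table from a file (for example with pd.read_csv) than type it in by hand, but the grouping and array math work the same way.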
If you know of some others I’ve missed, please list them in the comments below.