Introduction to RapidMiner Part 1

RapidMiner is widely used and has a free version that is ideal for learning (and even useful enough to solve some real day-to-day problems). In this three-part tutorial series we’ll download the software, load data into it, and learn how to do simple customer segmentation. Let’s go!

To get the free version of RapidMiner, go to rapidminer.com and click on the “Downloads” menu option. This will take you to a sign-up form that asks for the usual information (your name, email address, etc.). Fill in the form and RapidMiner will send you a confirmation email.

Click the link in that email to confirm creation of your account.

Now that you have an account, you can log in to the site and go to the downloads page, which lists the operating systems that can run the software and auto-selects the one you are using. Click the selected download and it will save the installer to your computer. Double-click the installer and accept the defaults for any options it provides.

You would think that the installation is done at this point, but no. The first time you start the application, it asks for a license key and gives you a link to the website where you can get one. Choose the Studio edition, then copy the key and paste it into the application.

Once the key has been loaded, you’re ready to go. The main screen looks like this:

By way of introduction to the main screen, let’s quickly go over some terminology:
• The big empty space in the middle is where you’ll build your Process, which is nothing more than a graphical representation of the flow your data will take through RapidMiner to get the end result you’re looking for.
• A Process is made up of Operators, which are individual functions represented by squares in the Process screen. These do anything from loading data, to munging data, to applying a statistical model like Linear Regression to the data. If you want RapidMiner to do something with data, there’s probably an Operator for that. Operators can be found in the upper left-hand corner of the screen and dragged into the Process building area, where you can chain them together into a logical progression.
• Repositories are where you’ll store your data and the processes you build. The repositories are shown in the bottom left of the screen.
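
If it helps to think of these terms in code, here is a rough Python sketch of the same idea: a Process is just an ordered chain of Operators, and each Operator takes data in and passes data out. To be clear, this is not RapidMiner's API. Every name below (load_data, filter_high_spend, add_spend_band, run_process) and the sample customer rows are made up purely for illustration.

```python
from typing import Callable, Dict, List, Optional

# A "row" of data and an "Operator" that turns rows into rows.
# These types are illustrative assumptions, not anything RapidMiner defines.
Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]


def load_data(_: List[Row]) -> List[Row]:
    """Hypothetical 'load' Operator: ignores its input and returns sample rows."""
    return [
        {"customer": 1, "spend": 120.0},
        {"customer": 2, "spend": 40.0},
        {"customer": 3, "spend": 300.0},
    ]


def filter_high_spend(rows: List[Row]) -> List[Row]:
    """Hypothetical 'munging' Operator: keep customers who spent at least 100."""
    return [r for r in rows if r["spend"] >= 100.0]


def add_spend_band(rows: List[Row]) -> List[Row]:
    """Hypothetical Operator that labels each remaining customer with a band."""
    return [
        {**r, "band": "high" if r["spend"] >= 200.0 else "medium"}
        for r in rows
    ]


def run_process(operators: List[Operator],
                rows: Optional[List[Row]] = None) -> List[Row]:
    """Run the Operators in order, feeding each one's output to the next."""
    data: List[Row] = list(rows or [])
    for op in operators:
        data = op(data)
    return data


# The "Process": Operators chained in a logical progression.
process = [load_data, filter_high_spend, add_spend_band]
result = run_process(process)

for row in result:
    print(row)
```

In RapidMiner you build the same kind of chain by dragging Operators into the Process area and connecting them, rather than by writing code.
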
In the next installment we’ll set up a new repository, put some data in it, then drag some Operators into the Process and see what they do with our data. See you soon!