Blog Posts
-
Best practices for building COVID-19 dashboards that track cases, visualize trends, and communicate rapidly changing pandemic data clearly and responsibly.
-
Migrating from Tableau Desktop to Tableau Online
A complete guide to migrating from Tableau Desktop and Server to Tableau Online. Covers planning, data source strategies, security setup, and post-migration optimization for remote teams.
-
Tableau 2020.1: Set Controls and Dynamic Parameters
Deep dive into Tableau 2020.1’s game-changing features: set controls and dynamic parameters. Learn how to build interactive dashboards with multi-select filters and self-updating parameter lists.
-
Performance Optimization: Speeding Up Large Tableau Workbooks
Tips for optimizing the performance of large Tableau workbooks.
-
Getting Started with TabPy: Bringing Python Magic to Your Tableau Dashboards
Start using Tableau’s free Python server.
-
Infer Schema when Importing to PostgreSQL or MySQL
Postgres and MySQL are great open source databases in use by organizations of all sizes. Both of them are flexible and somewhat scalable. Both of them have nice GUI front-ends available to make them even easier to use. Sequel Pro…
-
Getting Logistic Regression Right with Scikit-Learn
So you want to do some logistic regression? Cool! It’s like linear regression’s slightly more complicated cousin who went to business school. Instead of predicting continuous values, logistic regression predicts probabilities and categories. Perfect for questions like “Will this email…
-

Data Warehousing with Hadoop
Almost from the moment Hadoop was first introduced, organizations have sought to replace their expensive data warehousing systems with it. Hadoop’s distributed nature and the fact that it uses commodity hardware make it cheap, massively scalable, and highly available. However,…
-

Using SQL Commands in Spark with SparkSQL
Spark has become a standard for performing analysis on huge amounts of data due to its distributed nature. SparkSQL evolved as a necessary component of Spark due to the need for working with structured data. There are many times when there…
-

Importing Word Docs into RapidMiner
On a project for a recent client I needed to apply some common Natural Language Processing (NLP) techniques to surveys they had gathered, but one of the requirements for the project was that the source document had to remain in…
-

Using Seahorse for Spark on a Cloudera HA Cluster
I’m loving Seahorse, a GUI frontend for Spark by deepsense.io. The interface is simple, elegant, and beautiful, and has the potential to significantly speed up development on a machine learning workflow by its drag-and-drop nature. Thus far I haven’t run…
-
Installing MySQL from Scratch
You’ll probably see a lot of CSV files in the workplace, or generate them from the vast ocean of spreadsheets that are floating around the average office. But that won’t always be the case, and sometimes you’re going to need…
-
Free Data Science Software
In the data science world, some of the best stuff is free. I’ve already posted about free books and some of the better videos on YouTube, so now let’s put together a list of software tools. Some of these are…
-
Educational Videos on Data Science
Here’s a list of some of the better videos I’ve stumbled across over the past couple of years. They range from forward-looking glimpses into the future, to software tutorials. I’d love to grow this list, so if you have a…
-
Introduction to RapidMiner Part 3
Now that you know how to import data and examine it, it’s time to get to the meat of RapidMiner: building a Process. And when you’re building a Process, the Design screen becomes really important. Open RapidMiner, and if it doesn’t…