Author: Scott King
-
Demystifying Large Language Models: How ChatGPT and Similar AI Work
Ever had a conversation with ChatGPT and wondered, “How does this thing know what to say?” You’re not alone. These AI systems seem almost magical; they can write poetry, debug code, explain quantum physics, and somehow know exactly what you mean even when you’re being vague about it. But here’s the thing: there’s no actual…
-
Random Forest: The Swiss Army Knife of Machine Learning
So you’ve heard about Random Forest and you’re wondering what all the fuss is about? Well, buckle up because we’re about to dive into one of the most reliable and versatile algorithms in the machine learning toolbox. What’s This Random Forest Thing Anyway? Think of Random Forest as that friend who always gives solid advice…
-
Multiple Regression with Scikit-learn: When One Variable Isn’t Enough
So you’ve mastered simple linear regression and you’re feeling pretty good about yourself. You can predict house prices based on square footage, estimate salaries from years of experience, and impress your friends at parties with your newfound ML skills. But then reality hits: the real world is messy, and one variable rarely tells the whole…
-
Embedded Analytics with Tableau: Bring Insights Into Your App or Website
How to embed Tableau visualizations into webpages or apps.
-
Tableau Prep vs Alteryx: Which is Best?
So you’re drowning in messy data and need a lifeline? Welcome to the club! If you’re trying to decide between Tableau Prep and Alteryx for your data prep needs, you’ve come to the right place. Let’s break down these two heavy hitters and figure out which one deserves a spot in your data toolkit. Full…
-
Performance Optimization: Speeding Up Large Tableau Workbooks
Tips to optimize performance of Tableau workbooks.
-
Getting Started with TabPy: Bringing Python Magic to Your Tableau Dashboards
Start using Tableau’s free Python server.
-
Infer Schema when Importing to PostgreSQL or MySQL
Postgres and MySQL are great open source databases in use by organizations of all sizes. Both of them are flexible and somewhat scalable. Both of them have nice GUI frontends available to make them even easier to use. Sequel Pro for MySQL is a handy GUI frontend that runs on Macs, and one of the…
-
Getting Logistic Regression Right with Scikit-Learn
So you want to do some logistic regression? Cool! It’s like linear regression’s slightly more complicated cousin who went to business school. Instead of predicting continuous values, logistic regression predicts probabilities and categories. Perfect for questions like “Will this email be spam?” or “Is this customer going to buy something?” Let’s dive into how to…
-
Data Warehousing with Hadoop
Almost from the moment Hadoop was first introduced, organizations have sought to replace their expensive data warehousing systems with it. Hadoop’s distributed nature and the fact that it uses commodity hardware make it cheap, massively scalable, and highly available. However, data warehousing with Hadoop is often ill-advised and the projects have ended badly. HDFS, the…
-
Using SQL Commands in Spark with SparkSQL
Spark has become a standard for performing analysis on huge amounts of data due to its distributed nature. SparkSQL evolved as a necessary component of Spark due to the need for working with structured data. There are many times when there is a need to query data in Spark with SQL commands. Doing so isn’t complicated…
-
Importing Word Docs into RapidMiner
On a project for a recent client I needed to apply some common Natural Language Processing (NLP) techniques to surveys they had gathered, but one of the requirements for the project was that the source document had to remain in Word’s .docx format and couldn’t be exported to .txt. RapidMiner was the tool of choice…
-
Using Seahorse for Spark on a Cloudera HA Cluster
I’m loving Seahorse, a GUI frontend for Spark by deepsense.io. The interface is simple, elegant, and beautiful, and has the potential to significantly speed up development on a machine learning workflow thanks to its drag-and-drop nature. Thus far I haven’t run into any major bugs that affect the results, so naturally that shoots it near the…
-
Installing MySQL from Scratch
You’ll probably see a lot of CSV files in the workplace, or generate them from the vast ocean of spreadsheets that are floating around the average office. But that won’t always be the case, and sometimes you’re going to need to tap directly into an existing database or build your own. So how do you…