Category: Data Platforms
-
A Review of Snowflake Snowpark
After spending several months using Snowflake Snowpark, I’m really impressed with how it enhances the data engineering and data science experience within the Snowflake ecosystem. Essentially, Snowpark allows you to write and execute code directly inside Snowflake using languages like Python, Scala, and Java. This eliminates the need for external processing engines, which reduces complexity…
-
What’s the Deal with Snowflake Cortex AI?
Snowflake Cortex is basically Snowflake’s way of saying, “Hey, we do AI now!” It’s a fully managed service that brings machine learning and generative AI right into your Snowflake environment. So instead of shipping your data out to some other service for analysis, you can just do the smart stuff right there, right where your…
-
Snowflake Query Optimization: Tips for Faster Performance
Nobody likes slow queries – they’re the digital equivalent of waiting in line at the DMV. Let’s speed things up with some proven optimization techniques. The Low-Hanging Fruit: -- Fast: SELECT customer_id, order_date, total_amount FROM large_table; Advanced Optimization Techniques: Use Clustering Keys for Large Tables: If you’re repeatedly filtering or joining on specific columns, clustering keys…
-
Wrangling Data with Databricks Delta Live Tables
The Medallion process with Databricks Delta Live Tables.
-
Data Warehousing with Hadoop
Almost from the moment Hadoop was first introduced, organizations have sought to replace their expensive data warehousing systems with it. Hadoop’s distributed nature and its use of commodity hardware make it cheap, massively scalable, and highly available. However, data warehousing with Hadoop is often ill-advised, and such projects have frequently ended badly. HDFS, the…
-
Using Seahorse for Spark on a Cloudera HA Cluster
I’m loving Seahorse, a GUI frontend for Spark by deepsense.io. The interface is simple, elegant, and beautiful, and its drag-and-drop nature has the potential to significantly speed up development of a machine learning workflow. Thus far I haven’t run into any major bugs that affect the results, so naturally that shoots it near the…