
Building a Production-Ready RAG with Databricks Mosaic AI Agent Framework

Ever tried building a chatbot that seemed brilliant in demos but completely fell apart when real users started asking it questions? Yeah, we’ve all been there. The gap between “cool AI prototype” and “actually useful enterprise application” is where dreams go to die, and where Databricks Mosaic AI Agent Framework comes to the rescue.

What’s the Big Deal?

Building a proof-of-concept AI agent is surprisingly easy these days. Throw some data at an LLM, add a chat interface, and you’ve got something that looks impressive in a PowerPoint deck. But when it’s time to put that thing in front of actual users, reality hits hard.

Databricks announced the public preview of Mosaic AI Agent Framework and Agent Evaluation at the Data + AI Summit 2024, and it’s specifically designed to bridge this gap. Instead of leaving you stranded in prototype purgatory, it gives you the tools to build AI agents that are actually ready for prime time.

The Problem It’s Solving

Let’s be frank about what usually goes wrong with AI applications in the real world:

Quality Issues: Your agent confidently tells users that penguins live in the Arctic (they don’t), or worse, hallucinates completely made-up information about your company’s policies.

No Feedback Loop: You deploy your agent and then… crickets. You have no idea if it’s helping users or driving them crazy, and no systematic way to improve it.

Development Chaos: Every iteration feels like starting from scratch. There’s no clear path from “this works on my laptop” to “this is serving thousands of users reliably.”

Governance Nightmares: Your legal team is breathing down your neck about data security, model behavior, and compliance, but you have no good way to monitor or control what your AI is actually doing.

Enter Mosaic AI Agent Framework

This isn’t just another AI tool; it’s more like a complete production line for AI agents. The framework is designed to help developers build and deploy high-quality agentic and Retrieval-Augmented Generation (RAG) applications within the Databricks Data Intelligence Platform.

Here’s what makes it different:

Built-in Quality Control

The framework comes with AI judges that automatically evaluate your agent’s responses for accuracy, helpfulness, and safety. It’s like having a team of quality assurance experts working 24/7, except they never get tired or miss subtle issues.

But here’s the clever part: Agent Evaluation lets you define what high-quality answers look like for your GenAI application by inviting subject matter experts from across your organization to review it and give feedback on response quality, even if they aren’t Databricks users. Your domain experts can review responses through a simple web interface, and the system learns from their feedback to get better at judging quality automatically.
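
To make that concrete, here’s a minimal sketch of invoking the built-in AI judges through MLflow, assuming you’ve already logged an agent; the eval rows and the model URI are placeholders, not real values.

```python
import mlflow
import pandas as pd

# A small, hand-written evaluation set using the request / expected_response convention.
eval_set = pd.DataFrame([
    {
        "request": "Where do penguins live?",
        "expected_response": "Penguins live almost entirely in the Southern Hemisphere, not the Arctic.",
    },
])

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_set,
        model="runs:/<run_id>/agent",   # URI of a previously logged agent (placeholder)
        model_type="databricks-agent",  # turns on the built-in AI judges
    )
    print(results.metrics)
```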

Real Feedback from Real People

Remember that chatbot that seemed great until actual users started complaining? Agent Framework solves this by making it easy to collect and analyze user feedback. You get a browser-based review application that you can share with stakeholders immediately, with no complex setup required.

The feedback gets automatically stored in Delta tables in Unity Catalog, so you can analyze patterns, identify common issues, and track improvement over time. It’s like having user research built right into your AI development process.
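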
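
Because the logs are just Delta tables, digging into that feedback is ordinary Spark work. A rough sketch, assuming a Databricks notebook where `spark` is predefined; the table and column names below are hypothetical, since the real payload and assessment tables are generated in Unity Catalog when the agent is deployed.

```python
# Hypothetical table and column names; the actual assessment log tables are
# created automatically in Unity Catalog alongside the deployed agent.
feedback = spark.table("main.agents.my_rag_agent_assessment_log")

# How are reviewers rating responses overall?
(
    feedback
    .groupBy("rating")   # assumed column holding thumbs-up / thumbs-down feedback
    .count()
    .orderBy("count", ascending=False)
    .show()
)
```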

Production-Ready from Day One

Agent Framework is integrated with MLflow, so developers can use standard MLflow APIs like log_model and mlflow.evaluate to log a GenAI application and evaluate its quality. This means you’re not building a prototype that needs to be completely rewritten for production; you’re building production-quality code from the start.
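
In code, that’s roughly the familiar MLflow pattern. A sketch assuming a LangChain-based RAG chain; the `chain` object and the artifact name are placeholders.

```python
import mlflow

with mlflow.start_run():
    # `chain` stands in for whatever RAG chain or agent object you've built.
    logged_agent = mlflow.langchain.log_model(
        lc_model=chain,
        artifact_path="agent",
    )
    print(logged_agent.model_uri)  # reuse this URI for evaluation and registration
```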

Deploy with one line of code, get automatic observability, and have everything logged and traceable. Your DevOps team will actually like you.
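
That one-liner looks roughly like this, assuming the agent has already been registered in Unity Catalog; the three-level model name and version are placeholders.

```python
from databricks import agents

# Creates a serving endpoint, sets up inference logging, and stands up the review app.
deployment = agents.deploy("main.agents.my_rag_agent", 1)
```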

Real Companies, Real Results

This isn’t just theoretical. Companies are already seeing serious benefits:

Corning used the framework to build an AI research assistant that indexes hundreds of thousands of documents, including US patent data. “By leveraging the Databricks Data Intelligence Platform, we significantly improved retrieval speed, response quality, and accuracy,” says Denis Kamotsky, their Principal Software Engineer.

Lippert found that the framework was “a game-changer for us because it allowed us to evaluate the results of our GenAI applications and demonstrate the accuracy of our outputs while maintaining complete control over our data sources.”

Ford Direct created a unified chatbot for Ford and Lincoln dealerships to help dealers assess performance and inventory. The integration with Delta Tables and Unity Catalog meant their vector indexes update in real time as source data changes, without touching the deployed model.

How It Actually Works

Let’s say you’re building a RAG application (because let’s face it, most enterprise AI applications are some flavor of RAG). Here’s the typical workflow:

  1. Build and Trace: Add three lines of MLflow code to your application to enable tracing and observability (see the sketch after this list)
  2. Deploy for Testing: Register your app in Unity Catalog and deploy it as a POC
  3. Collect Feedback: Share the review app with stakeholders and start collecting feedback immediately
  4. Evaluate Quality: Use built-in AI judges to automatically assess responses against quality criteria
  5. Iterate and Improve: Use the feedback to identify issues and improve your application
  6. Deploy to Production: Once you hit your quality threshold, deploy with confidence
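
Step 1 really is that small. Here’s a sketch of enabling tracing for a LangChain-based agent; the experiment path is a placeholder, and other frameworks have their own autolog hooks.

```python
import mlflow

mlflow.set_experiment("/Shared/rag-agent-poc")  # placeholder experiment path
mlflow.langchain.autolog()                      # records a trace for every chain invocation
```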

The whole process is integrated with Unity Catalog for governance and MLflow for lineage and metadata management, and it includes LLM Guardrails for safety.
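
The Unity Catalog side of that governance story is just model registration against the UC registry, sketched here with a placeholder run URI and three-level model name.

```python
import mlflow

mlflow.set_registry_uri("databricks-uc")  # register models in Unity Catalog
uc_model = mlflow.register_model(
    model_uri="runs:/<run_id>/agent",     # placeholder URI from the logging step
    name="main.agents.my_rag_agent",      # catalog.schema.model, governed by UC access controls
)
```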