· Miguel Ángel García · Analytics  · 7 min read

Building a Semantic Layer for Agentic and Embedded Analytics

Text-to-SQL pipelines are prone to hallucination. A semantic layer fixes that — by giving AI agents a governed, structured vocabulary instead of a raw schema to guess at. This is why I built DataPlexer, and what I've learned using it in production.

One of the main areas where companies are realizing AI can be a game-changer is data analytics. Data analysis is about finding patterns in data and figuring out how inputs (behavior, events, etc.) affect outputs (performance, trends, results), and AI can be extremely good at this task.

The problem companies and researchers in this area find difficult to tackle is not training AI to interpret data; that comes almost naturally. The challenge lies in giving AI tools access to the data they need to explore, so they can derive insights in a reliable, controlled manner.

Solving this problem involves not only getting a model to understand a database schema and write SQL (a challenge in its own right), but also setting up proper guardrails to ensure these tools have access only to the data they are permitted to use. And then there’s the hallucination problem: the all-too-common, sometimes hard-to-spot behavior where the model simply starts making up facts.

In a Text-to-SQL pipeline, a common hallucination occurs when the model invents database fields, tables, and relationships, producing inaccurate-at-best results that can easily go unnoticed. In data analysis and decision-making, accuracy is paramount and even small errors can lead to major problems.

The Semantic Layer: A Governed Approach to Data Access

The idea of using a semantic layer for providing access to data has been around for ages and has been used mostly in conjunction with enterprise BI tools to enable self-service analytics.

More recently, we’ve also seen this same concept used to provide data access to applications. This essentially turns the semantic layer into a central engine for organization-wide data access, with clear benefits like centralized governance, unified metric definitions, and consistent KPI calculation across all users and tools. The proverbial and sometimes elusive single source of truth.

A Decades-Old Solution to an AI-Era Problem

A semantic layer is, at its core, a translation layer; and translation is exactly what AI needs. It sits between business language and database structures: you define your data models with business-friendly names, centralized metric definitions, and rich context, and any consumer (whether a BI tool, a custom application, or now an AI agent) accesses the data through a consistent, well-defined interface.

Instead of using a Text-to-SQL approach for agentic analytics, we can leverage the semantic layer to bridge the gap between natural language business questions and AI-generated queries.

We can think of it this way: instead of asking an AI to figure out that amt_01 in the txn_fact table means “Net Revenue after discounts, excluding taxes and shipping,” you tell it upfront. You give it the vocabulary, the relationships, and the available datapoints; that is, the business context. The AI doesn’t have to guess; it just has to ask the right questions using the language you’ve already defined.
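As a rough illustration of what "telling it upfront" can mean, here is a minimal sketch of a field definition in Python. This is not DataPlexer's actual schema: the Field class, the SALES_MODEL name, and the cat_cd column are assumptions made up for the example. Only amt_01, txn_fact, and the Net Revenue description come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """One business-facing field and the physical column behind it."""
    name: str                          # business-friendly name shown to consumers
    table: str                         # physical table
    column: str                        # physical column
    description: str                   # context the AI reads instead of guessing
    aggregation: Optional[str] = None  # set for measures, None for dimensions


# Hypothetical model: cat_cd and the Category field are invented for illustration.
SALES_MODEL = {
    "Net Revenue": Field(
        name="Net Revenue",
        table="txn_fact",
        column="amt_01",
        description="Net Revenue after discounts, excluding taxes and shipping",
        aggregation="SUM",
    ),
    "Category": Field(
        name="Category",
        table="txn_fact",
        column="cat_cd",
        description="Product category of the transaction",
    ),
}


def describe(model: dict) -> list:
    """Return the vocabulary an agent would see: names and meanings only."""
    return [f"{field.name}: {field.description}" for field in model.values()]


for line in describe(SALES_MODEL):
    print(line)
```

The point of the sketch is that the agent only ever sees the output of describe; the cryptic physical names stay inside the model definition.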

The idea of a semantic layer isn’t new — it’s been around in various forms for decades. But three things are converging right now that make it more critical than ever:

  1. The rise of AI and LLMs. AI agents need structured, well-documented data access to answer business questions accurately. Without it, they hallucinate.

  2. The composable data stack. Modern architectures demand interoperable, API-first components. Monolithic BI platforms are giving way to specialized tools that work together.

  3. Embedded analytics everywhere. Every SaaS product now needs built-in analytics, and building custom data endpoints for every chart is not sustainable.

Enter DataPlexer

There’s one more piece to this puzzle. With the rise of the Model Context Protocol, AI agents don’t even have to query data using SQL — they call tools using JSON, a language they’re already naturally good at. A semantic layer that speaks JSON natively maps directly onto how AI agents work, which makes it a much better fit than a Text-to-SQL pipeline for agentic use cases.
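To make "calling tools using JSON" concrete, here is a small sketch of the message an MCP client sends when it invokes a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the tools/call method with a tool name and an arguments object. The tool name run_query and the helper build_tool_call are hypothetical, used only for illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# "run_query" is a hypothetical tool name, not a documented DataPlexer tool.
request = build_tool_call(
    1,
    "run_query",
    {"dimensions": ["Month"], "measures": [{"field": "Net Revenue"}]},
)
print(request)
```

The arguments object is plain structured data, so the agent never has to produce a query string in a database dialect.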

This is the context in which I started building DataPlexer, an AI-native semantic layer designed with two core use cases in mind:

  1. Embedded Analytics — a single, unified API endpoint for all the data queries powering the charts, dashboards, and reports built into the systems users already interact with.
  2. Agentic Analytics — purpose-built MCP tools that give AI agents governed, structured access to your data, so they can answer business questions accurately without writing a line of SQL.

What makes DataPlexer architecturally interesting is that both use cases communicate with the semantic engine using the exact same JSON query structure. That convergence unlocks a third capability that didn’t need to be engineered separately, but rather emerges naturally from the design:

  3. AI-driven dashboard development: because AI agents already know how to interact with the semantic layer through MCP, they can construct and validate dashboard queries autonomously, then use those exact same queries to power live visualizations that interact with the semantic engine programmatically. The LLM builds a query, tests it against real data via MCP, confirms the results match the intent, then commits it to a dashboard configuration. A closed loop, from natural language to production chart.
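The closed loop described above can be sketched in a few lines of Python. This is an assumption-heavy illustration, not how DataPlexer stores dashboards: commit_chart, the dashboard dictionary shape, and the run_query callable are all hypothetical, and "results match the intent" is reduced here to checking that the requested columns come back.

```python
def commit_chart(dashboard: dict, title: str, query: dict, run_query) -> bool:
    """Add a chart only if its query returns rows with the requested columns."""
    expected = set(query.get("dimensions", [])) | {
        measure["field"] for measure in query.get("measures", [])
    }
    rows = run_query(query)  # test the query against data first
    if not rows or not expected.issubset(rows[0]):
        return False  # reject: the result does not match the request
    # The same query object that was tested is what the chart will use.
    dashboard.setdefault("charts", []).append({"title": title, "query": query})
    return True


def fake_run_query(query: dict) -> list:
    """Stand-in for a call to the semantic engine; returns canned rows."""
    return [{"Month": "2025-01", "Net Revenue": 1200.0}]


dashboard = {}
query = {"dimensions": ["Month"], "measures": [{"field": "Net Revenue"}]}
print(commit_chart(dashboard, "Revenue by month", query, fake_run_query))
```

Because the tested query and the committed query are the same object, there is no translation step between what the agent validated and what the dashboard runs.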

The core idea is simple: you define your data models and DataPlexer provides a unified API that translates structured JSON queries into optimized SQL for your database.

Here’s what a simplified query looks like:

{
  "dimensions": ["Month", "Category"],
  "measures": [{"field": "Net Revenue"}],
  "filters": [
    {"Status": "Active"},
    {"Year": "2025"}
  ]
}

The consumer asks for data using business terms (“Net Revenue,” “Month,” “Category”) and DataPlexer handles the rest: the joins, the metric definitions, the aggregations, and the dialect-specific SQL generation.
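To give a feel for that translation step, here is a deliberately tiny sketch of a JSON-to-SQL compiler in Python. It is not DataPlexer's engine: the MODEL mapping, the physical column names other than amt_01 and txn_fact, and the compile_query function are assumptions, and the sketch handles a single table with equality filters only (no joins, no dialects).

```python
# Hypothetical mapping from business names to SQL expressions.
MODEL = {
    "table": "txn_fact",
    "dimensions": {
        "Month": "txn_month",
        "Category": "category",
        "Status": "status",
        "Year": "txn_year",
    },
    "measures": {"Net Revenue": "SUM(amt_01)"},
}


def compile_query(query: dict, model: dict = MODEL) -> tuple:
    """Translate a structured query into parameterized SQL.

    Returns the SQL text and the list of bound parameter values.
    A KeyError is raised for any name the model does not define.
    """
    dims = [model["dimensions"][name] for name in query.get("dimensions", [])]
    measures = [
        f'{model["measures"][m["field"]]} AS "{m["field"]}"'
        for m in query.get("measures", [])
    ]
    where, params = [], []
    for condition in query.get("filters", []):
        for name, value in condition.items():
            where.append(f'{model["dimensions"][name]} = ?')
            params.append(value)  # values are bound, never inlined into SQL

    sql = f'SELECT {", ".join(dims + measures)} FROM {model["table"]}'
    if where:
        sql += " WHERE " + " AND ".join(where)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql, params


sql, params = compile_query({
    "dimensions": ["Month", "Category"],
    "measures": [{"field": "Net Revenue"}],
    "filters": [{"Status": "Active"}, {"Year": "2025"}],
})
print(sql)
print(params)
```

Even in this toy form, the consumer's query contains no table or column names; every physical identifier comes from the model.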

I already use DataPlexer to power the analytics dashboards in Pulso Fiscal, my own tax analytics platform, so everything described here is running in production, not just a design exercise.

Agentic Analytics: Teaching AI to Be a Data Analyst

One of the most exciting aspects of this project has been building for what I’ve been calling agentic analytics: enabling AI agents to perform sophisticated business intelligence tasks autonomously.

The workflow looks like this:

  1. Discovery. The AI agent explores what data is available using MCP tools like list_models and search_catalog.

  2. Understanding. It reads the semantic context (the business definitions, field names, and descriptions) to understand what the data actually means.

  3. Query construction. It builds a structured JSON query using the vocabulary from the semantic layer.

  4. Execution and analysis. It runs the query and interprets the results.
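The four steps above can be sketched as a short Python script with in-memory stand-ins for the tools. The names list_models and search_catalog are the ones mentioned in step 1, but their signatures and return shapes here are my assumptions for the example; run_query, the CATALOG contents, and the canned rows are hypothetical.

```python
# Hypothetical catalog standing in for the semantic layer's metadata.
CATALOG = {
    "Sales": {
        "Net Revenue": "Net Revenue after discounts, excluding taxes and shipping",
        "Month": "Calendar month of the transaction",
    }
}


def list_models() -> list:
    """Assumed shape: return the names of the available models."""
    return sorted(CATALOG)


def search_catalog(term: str) -> list:
    """Assumed shape: return fields whose name or description matches the term."""
    term = term.lower()
    return [
        {"model": model, "field": field, "description": description}
        for model, fields in CATALOG.items()
        for field, description in fields.items()
        if term in field.lower() or term in description.lower()
    ]


def run_query(query: dict) -> list:
    """Hypothetical tool; returns canned rows instead of hitting a database."""
    return [
        {"Month": "2025-01", "Net Revenue": 1200.0},
        {"Month": "2025-02", "Net Revenue": 1500.0},
    ]


def answer_revenue_trend() -> str:
    models = list_models()                          # 1. Discovery
    hits = search_catalog("revenue")                # 2. Understanding
    measure = hits[0]["field"]
    query = {                                       # 3. Query construction
        "dimensions": ["Month"],
        "measures": [{"field": measure}],
    }
    rows = run_query(query)                         # 4. Execution and analysis
    change = rows[-1][measure] - rows[0][measure]
    first, last = rows[0]["Month"], rows[-1]["Month"]
    return f"{measure} in {models[0]} changed by {change:+.1f} from {first} to {last}"


print(answer_revenue_trend())
```

In a real agent the model decides which tool to call at each step; the fixed sequence here only shows how the steps feed into one another.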

The key insight here is that the AI never writes SQL. It works with a constrained, structured format where field names come from the model definition, and the query is validated by the engine before it hits the database. This approach eliminates an entire class of errors: a query that references an invented field can be rejected before it ever runs.
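Here is a minimal sketch of what that pre-execution validation could look like. It is an illustration under assumptions, not DataPlexer's validator: the KNOWN_DIMENSIONS and KNOWN_MEASURES sets and the validate_query function are hypothetical.

```python
# Hypothetical vocabulary taken from a model definition.
KNOWN_DIMENSIONS = {"Month", "Category", "Status", "Year"}
KNOWN_MEASURES = {"Net Revenue"}


def validate_query(query: dict) -> list:
    """Return a list of problems; an empty list means the query may run."""
    errors = []
    for dim in query.get("dimensions", []):
        if dim not in KNOWN_DIMENSIONS:
            errors.append(f"unknown dimension: {dim}")
    for measure in query.get("measures", []):
        field = measure.get("field")
        if field not in KNOWN_MEASURES:
            errors.append(f"unknown measure: {field}")
    for condition in query.get("filters", []):
        for name in condition:
            if name not in KNOWN_DIMENSIONS:
                errors.append(f"unknown filter field: {name}")
    return errors


# A query that uses only defined names passes.
print(validate_query({"dimensions": ["Month"], "measures": [{"field": "Net Revenue"}]}))

# A query with invented names is rejected with specific messages.
print(validate_query({"dimensions": ["Region"], "measures": [{"field": "Gross Margin"}]}))
```

Returning the specific unknown names, rather than a generic failure, gives the agent something concrete to correct on its next attempt.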

DataPlexer implements this through 12 purpose-built MCP (Model Context Protocol) tools. These are not a chat proxy, but direct, discrete tools that an AI agent can call independently. MCP has become the industry standard for AI tool integration, with over 97 million monthly SDK downloads, and building native support for it was a deliberate decision from the start.

What’s Next

I’ll be sharing more about DataPlexer in upcoming posts — diving deeper into the schema design, the MCP tools, the embedded analytics use case, and real-world deployments. If you’re building data experiences into your product or exploring how to give AI agents reliable access to your data, I think you’ll find this space worth watching.

The analytics landscape is changing fast. AI is reshaping how we interact with data, and the companies that get the foundation right — the semantic layer, the data model, the single source of truth — are the ones that will be best positioned to take advantage of what’s coming next.

Thanks for reading.
