Dinobase: Why the Creator of PostHog AI Built a Database for AI Agents

The creator of PostHog AI built Dinobase, an AI agent database that uses SQL instead of MCPs. Benchmarks show 2-3x accuracy gains with 16-22x fewer tokens.

The creator of PostHog AI just launched Dinobase, a SQL-first database designed specifically for AI agents that challenges the prevailing MCP architecture paradigm. After leaving PostHog three weeks ago, he spent that time building a system that exposes business data through DuckDB and annotated schemas rather than per-source MCP tools. His benchmarks across 11 LLMs, from Kimi 2.5 to Claude Opus 4.6, reveal a striking advantage: SQL-based access achieved 2-3x higher accuracy while consuming 16-22x fewer tokens per correct answer than traditional MCP approaches. Dinobase syncs data from 101 connectors into Parquet via dlt, uses Claude agents for schema annotation, and integrates with OpenClaw, Claude Code, Cursor, and major frameworks like LangChain and CrewAI.

What Is Dinobase and Who Built It?

Dinobase is a specialized database layer for AI agents built by the former lead behind PostHog AI, the business analyst agent at PostHog. After departing the company three weeks ago to pursue side projects, he identified a critical gap in how agents access structured data. Existing solutions force agents to navigate complex tool call sequences or MCP servers for each data source, creating overhead and error-prone context management that degrades accuracy and drives up operational costs.

Dinobase takes a fundamentally different approach. It acts as a centralized query layer where agents write standard SQL against a unified schema spanning multiple data sources. The system handles 101 different connectors ranging from SaaS APIs like Stripe and HubSpot to databases and file storage systems. Data syncs to Parquet via dlt and is queried through DuckDB. This architecture eliminates the need for agents to understand multiple API formats or manage cross-source relationships in their limited context windows. The creator chose the name because he loves dinosaurs and the domain was available, signaling a builder-first mentality over marketing polish and complex branding.

Why SQL Outperforms MCP for Agent Data Access

The core insight driving Dinobase is that SQL handles data relationships more effectively and efficiently than MCP tool calls. When agents use per-source MCPs, they must join information in-context, forcing the LLM to track relationships between, for example, Stripe transactions, HubSpot contacts, and internal database records within its context window. This approach is fragile and often fails on complex queries: context windows are limited in size, and LLMs are unreliable at matching and joining records token by token.

SQL, on the other hand, moves the join logic into the database engine where it belongs. DuckDB executes cross-source JOINs natively, returning only the final, aggregated result set to the agent. This significantly reduces the cognitive load on the LLM and eliminates the token-heavy, iterative back-and-forth of multiple tool calls. The PostHog AI team discovered this after experimenting extensively with raw SQL access versus tool-based approaches, consistently finding that SQL won on accuracy and reliability. Dinobase extends this pattern with automatically annotated schemas that help agents understand table relationships, PII constraints, and column meanings without guessing, further enhancing query precision. For builders running OpenClaw or Claude Code, this means writing familiar SQL instead of orchestrating complex multi-step tool chains, leading to more robust and predictable agent behavior.

The Benchmark Results: Quantifying the SQL Advantage

The creator ran benchmarks pitting Dinobase against per-source MCP access across 11 different LLMs, from Kimi 2.5 to Claude Opus 4.6. The results were striking and consistent across the diverse range of models tested. When answering complex business questions, the SQL-based approach achieved 2-3x higher accuracy rates than MCP-based queries. The gap held across models, which points to an architectural advantage rather than a model-specific quirk.

This accuracy gap primarily stems from how agents handle data relationships. MCP architectures require the agent to make multiple tool calls, store intermediate results, and perform logical joins within its context. Each of these steps introduces potential failure points: a missed relationship, an incorrect join condition, or a misinterpretation of data can cascade into wrong answers. Dinobase removes these intermediate, error-prone steps. The agent writes a single SQL query, and DuckDB executes the joins and returns the correct dataset. This architecture proves particularly effective for multi-hop questions like “What is the lifetime value of customers who churned last quarter after viewing the pricing page?” Such queries require joins across analytics, CRM, and payment data, which MCPs struggle to coordinate reliably but SQL handles natively.

Token Efficiency: How Dinobase Cuts Costs by Up to 95%

Beyond the accuracy gains, Dinobase also delivers massive token efficiency. The benchmarks show 16-22x fewer tokens consumed per correct answer when using SQL versus MCP access. For production AI agents running on API-based LLMs, this translates directly into cost reductions of roughly 94-95% for data retrieval operations (the inverse of 16-22x fewer tokens). This level of efficiency can make the difference between a proof-of-concept and a scalable, economically viable production deployment.

The token savings primarily come from eliminating conversation overhead. MCP-based agents burn tokens on tool call descriptions, formatting intermediate results, and attempting error correction when joins fail. For instance, each Stripe API call might return 500 tokens of customer data, then the agent must pass that to a HubSpot query, creating token duplication and unnecessary chatter. Dinobase compresses this entire process into a single, concise SQL query and a streamlined result set. A complex 3-way join that might require 3,000 tokens of MCP chatter could cost as little as 150 tokens for the SQL query itself, plus the tokens for the final result. For high-volume agents processing thousands of queries daily, this efficiency is a critical factor in determining whether a deployment is economically viable or prohibitively expensive, making Dinobase a compelling choice for cost-conscious organizations.
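
The arithmetic above can be sketched directly. The per-token price below is a hypothetical placeholder (real API pricing varies by model), and the token counts are the article's illustrative figures:

```python
# Back-of-envelope cost comparison using the article's illustrative numbers.
# PRICE_PER_1K_TOKENS is a hypothetical placeholder, not a real model's rate.
PRICE_PER_1K_TOKENS = 0.01  # USD per 1,000 tokens, hypothetical

mcp_tokens_per_query = 3_000   # multi-step tool calls plus intermediate results
sql_tokens_per_query = 150     # single SQL query (plus the final result set)
queries_per_day = 10_000

def daily_cost(tokens_per_query: int) -> float:
    """Daily spend for one retrieval pattern at the assumed price."""
    return tokens_per_query * queries_per_day / 1_000 * PRICE_PER_1K_TOKENS

mcp_cost = daily_cost(mcp_tokens_per_query)   # 300.0 USD/day
sql_cost = daily_cost(sql_tokens_per_query)   # 15.0 USD/day
savings = 1 - sql_cost / mcp_cost             # 0.95, i.e. a 95% reduction

print(f"MCP: ${mcp_cost:.2f}/day, SQL: ${sql_cost:.2f}/day, savings: {savings:.0%}")
```

At these illustrative figures, the 20x token gap lands squarely inside the benchmarked 16-22x range.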

Architecture Deep Dive: dlt, Parquet, and DuckDB

Under the hood, Dinobase leverages a modern data stack optimized for agent queries and efficiency. The data pipeline begins with dlt (data load tool), an open-source library that facilitates extraction and loading of data from 101 diverse connectors. These connectors cover a wide range of sources, including popular SaaS APIs, various types of databases, and common file storage systems. This ingested data then lands in Parquet format, which can be stored either locally on the agent’s machine or in cloud object storage solutions like Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage.

DuckDB serves as the embedded analytical query engine. Its in-process architecture allows it to handle cross-source JOINs directly without the need for external servers, making it ideal for self-contained agent deployments. Parquet, a columnar storage format, provides significant benefits such as compression and predicate pushdown, ensuring that agents querying large datasets only read the most relevant rows and columns, further boosting performance. The architecture supports both read-heavy analytics and safe mutations through guardrails. Unlike traditional data warehouses that often require complex ETL (Extract, Transform, Load) pipelines, Dinobase treats data synchronization as a lightweight operation. The dlt connectors automatically handle schema evolution, meaning that if a source like Stripe adds a new field, your agents see it immediately without manual schema updates. This entire setup can run anywhere from a powerful Mac mini to a cloud instance, fitting perfectly with local-first agent workflows, such as those employing OpenClaw.
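
The sync-then-query flow can be sketched in a few lines. Dinobase lands connector output as Parquet and queries it with DuckDB; since neither ships with the Python standard library, this sketch uses sqlite3 as a stand-in embedded engine, and every table and column name is hypothetical:

```python
import sqlite3

# Minimal sketch of the sync-then-query flow. Dinobase lands connector data
# as Parquet and queries it with DuckDB; sqlite3 (stdlib) stands in here so
# the example stays self-contained.
con = sqlite3.connect(":memory:")

def sync(table: str, columns: str, rows: list) -> None:
    """Stand-in for a dlt connector run: land source rows in the store."""
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    placeholders = ", ".join("?" * len(rows[0]))
    con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

# A "connector" landing Stripe-shaped rows into the unified layer.
sync("stripe_charges", "customer_id TEXT, amount REAL",
     [("cus_1", 1200.0), ("cus_2", 80.0), ("cus_3", 4500.0)])

# Agents then query the landed data with plain SQL.
total = con.execute(
    "SELECT COUNT(*) FROM stripe_charges WHERE amount > 1000"
).fetchone()[0]
print(total)  # 2
```

Swapping sqlite3 for DuckDB and the in-memory tables for Parquet files recovers the real architecture; the shape of the flow is the same.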

Schema Annotation with Claude Agents

One of Dinobase’s most innovative features is its automated schema annotation process. After each data synchronization completes, a dedicated Claude agent is invoked to process the newly updated schema. This agent intelligently generates comprehensive documentation for the data, including high-level table descriptions, detailed column-level documentation, PII (Personally Identifiable Information) flags, and crucial relationship maps between tables.

This annotation layer addresses a critical and common problem in agent data access: context confusion and ambiguity. When an AI agent encounters a column named uid, it might represent a user ID, a unique identifier, or a universal index. Without explicit documentation, agents are forced to guess, often leading to failed queries or incorrect interpretations. The Claude annotator reads sample data and schema metadata, determining that, for example, uid in the users table refers to PostHog user IDs, while uid in the stripe_customers table maps to Stripe customer IDs. It also flags PII columns like email and phone so that querying agents can apply appropriate masking or adhere to data privacy regulations. For OpenClaw builders, this means agents are equipped with a built-in, dynamic data dictionary, eliminating the need for manual schema documentation and mitigating the risks of ambiguous column interpretations.
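
A concrete picture helps here. The structure below is a guess at what such annotations might look like (Dinobase's actual annotation format is not documented in this article); the point is that a querying agent can consult it mechanically:

```python
# Hypothetical shape of the annotations a Claude agent might emit after a
# sync completes. Field names are illustrative, not Dinobase's real format.
annotations = {
    "stripe_customers": {
        "description": "One row per Stripe customer, synced via dlt.",
        "columns": {
            "uid":   {"doc": "Stripe customer ID (cus_...)", "pii": False},
            "email": {"doc": "Customer billing email", "pii": True},
            "phone": {"doc": "Customer phone number", "pii": True},
        },
        "relationships": [
            {"column": "uid", "references": "stripe_charges.customer_id"},
        ],
    },
}

def pii_columns(table: str) -> list:
    """Columns a querying agent should mask before returning results."""
    cols = annotations[table]["columns"]
    return sorted(name for name, meta in cols.items() if meta["pii"])

print(pii_columns("stripe_customers"))  # ['email', 'phone']
```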

The 101 Connectors Strategy for Comprehensive Data Access

Dinobase ships with an impressive suite of 101 pre-built connectors, providing comprehensive coverage of the modern SaaS landscape and various data sources. This extensive collection includes popular services like Stripe for payments, HubSpot for CRM, various relational databases such as Postgres and MySQL, and diverse file storage systems. All these connectors are powered by dlt, which efficiently handles complex tasks such as authentication, rate limiting, and automatic schema extraction.

This breadth of connector support is a significant advantage for AI agent builders. Most AI agents require data from multiple sources to be truly useful and provide holistic insights. For instance, a sales agent needs access to HubSpot contacts, Stripe billing history, and product usage data from an analytics warehouse to provide accurate recommendations. Building custom MCP servers for each of these integrations can be a time-consuming and resource-intensive process, often taking weeks or even months. Dinobase solves this by providing these integrations out of the box with standardized schemas, significantly accelerating agent development. The connectors sync data incrementally, updating Parquet files without requiring full refreshes, which optimizes resource usage. For local deployments, users can run the sync on their own hardware, ensuring sensitive data remains in-house. The number 101 is not arbitrary; it represents a strategic effort to cover the most common business tools, ensuring that most agent builders can connect their entire data stack without writing custom integration code.
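
Incremental syncing reduces to tracking a cursor per table and fetching only rows past it. dlt manages this state internally; the stdlib sketch below, with made-up source rows, just illustrates the idea:

```python
# Sketch of incremental syncing: track a per-table cursor (here, a
# last-updated counter) and fetch only rows past it. dlt manages this
# state internally; this stdlib version just illustrates the idea.
state = {}  # table name -> last cursor value seen

SOURCE = [  # pretend API rows: (updated_at, payload)
    (1, "row-a"), (2, "row-b"), (3, "row-c"),
]

def incremental_sync(table, source):
    """Return only rows newer than the stored cursor, then advance it."""
    cursor = state.get(table, 0)
    new_rows = [payload for updated_at, payload in source if updated_at > cursor]
    if source:
        state[table] = max(updated_at for updated_at, _ in source)
    return new_rows

first = incremental_sync("stripe_charges", SOURCE)   # all three rows
second = incremental_sync("stripe_charges", SOURCE)  # nothing new yet
print(first, second)  # ['row-a', 'row-b', 'row-c'] []
```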

Cross-Source JOINs vs. In-Context Joining: A Fundamental Difference

The fundamental architectural difference between Dinobase and traditional MCP approaches lies in where data joining happens. MCP architectures force the AI agent to perform joins in-context: the agent calls the Stripe MCP to get customer data, holds that data in its limited context window, then calls the HubSpot MCP to get contact data, and finally attempts to match records using string similarity or ID mapping. This process is prone to errors, especially as data volume and complexity grow.

Dinobase, in contrast, pushes this joining work to DuckDB, the embedded database engine. When an agent asks for “customers who paid over $1000 and opened a support ticket,” Dinobase executes a JOIN operation between the Stripe payments table and the Zendesk tickets table at the database level. DuckDB, being an optimized analytical database, chooses efficient algorithms such as hash or merge joins and returns only the final, filtered, correctly joined result. This approach eliminates the substantial token overhead of passing large intermediate datasets to the LLM and removes the error-prone step of the agent attempting to align records. For complex queries involving 4-5 data sources, in-context joining reliably fails, while native SQL JOINs succeed with high accuracy and efficiency.

SELECT 
    s.customer_id,
    s.amount,
    z.ticket_id,
    h.contact_name
FROM stripe_charges s
JOIN zendesk_tickets z ON s.customer_id = z.requester_id
JOIN hubspot_contacts h ON s.customer_id = h.external_id
WHERE s.amount > 1000
    AND z.status = 'open';

The above SQL query exemplifies how Dinobase handles a multi-source join, allowing the database engine to perform the heavy lifting, rather than burdening the LLM.
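
To make the example concrete, here is the same 3-way join run end to end against sqlite3 (standard library) with toy rows; DuckDB accepts the same SQL against Parquet-backed tables:

```python
import sqlite3

# The article's 3-way join, run against sqlite3 (stdlib) with toy rows.
# Dinobase would execute the same SQL in DuckDB over Parquet files.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stripe_charges  (customer_id TEXT, amount REAL);
    CREATE TABLE zendesk_tickets (requester_id TEXT, ticket_id TEXT, status TEXT);
    CREATE TABLE hubspot_contacts(external_id TEXT, contact_name TEXT);

    INSERT INTO stripe_charges  VALUES ('cus_1', 2500.0), ('cus_2', 40.0);
    INSERT INTO zendesk_tickets VALUES ('cus_1', 'T-9', 'open'),
                                       ('cus_2', 'T-3', 'closed');
    INSERT INTO hubspot_contacts VALUES ('cus_1', 'Ada'), ('cus_2', 'Grace');
""")

rows = con.execute("""
    SELECT s.customer_id, s.amount, z.ticket_id, h.contact_name
    FROM stripe_charges s
    JOIN zendesk_tickets z ON s.customer_id = z.requester_id
    JOIN hubspot_contacts h ON s.customer_id = h.external_id
    WHERE s.amount > 1000 AND z.status = 'open'
""").fetchall()
print(rows)  # [('cus_1', 2500.0, 'T-9', 'Ada')]
```

The engine filters and aligns all three sources; the agent only ever sees the single qualifying row.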

Seamless Integration with OpenClaw and Local Agents

Dinobase is explicitly designed to support and enhance the local-first agent movement. It seamlessly integrates with leading local agent platforms like OpenClaw, Claude Code, Cursor, and Codex. Furthermore, it maintains compatibility with hosted frameworks such as LangChain, CrewAI, LlamaIndex, Pydantic AI, and Mastra. This dual compatibility is crucial as developers increasingly opt to run agents on local hardware for enhanced privacy, security, and cost-effectiveness.

For OpenClaw users specifically, Dinobase provides a standard SQL interface that naturally aligns with the framework’s tool-calling patterns. To integrate, you configure an OpenClaw agent with a database tool that points to your Dinobase DuckDB instance. The agent can then generate SQL queries based on the detailed, annotated schema, execute these queries efficiently through Dinobase, and receive structured results for further processing. Because DuckDB operates as an embedded database, you can host Dinobase on the same local machine, whether it’s a Mac mini or a Linux box, that is running OpenClaw. This local data processing setup avoids sending sensitive business data to external cloud APIs during the analysis phase, a critical requirement for many security-conscious deployments and organizations with strict data governance policies.
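
A minimal sketch of such a database tool follows. The function name, return shape, and wiring are illustrative (consult your framework's documentation for its actual tool-registration API), and sqlite3 again stands in for a DuckDB instance:

```python
import json
import sqlite3

# Hypothetical "database tool" an agent framework could call. Everything
# here is illustrative; sqlite3 (stdlib) stands in for a DuckDB instance.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (uid TEXT, plan TEXT)")
con.execute("INSERT INTO users VALUES ('u1', 'pro'), ('u2', 'free')")

def run_sql(query: str) -> str:
    """Execute a SQL query and return rows as JSON for the agent to read."""
    cur = con.execute(query)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    return json.dumps(rows)

result = run_sql("SELECT uid FROM users WHERE plan = 'pro'")
print(result)  # [{"uid": "u1"}]
```

The agent generates the query string from the annotated schema; the tool returns a compact, structured result rather than raw API payloads.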

Guardrails for Safe Mutations and Reverse ETL Capabilities

Dinobase is not confined to read-only operations; it also provides guardrails for safe mutations and reverse ETL, the practice of pushing computed data from the query layer back into source systems. This lets AI agents update CRM records, trigger refunds, or modify user permissions in a controlled manner, transforming them from passive analytical tools into active operational entities. This closed-loop functionality significantly expands the scope and impact of AI agents.

The guardrails layer is designed to intercept and validate all SQL INSERT, UPDATE, and DELETE statements before their execution. It rigorously checks these operations against the schema annotations, preventing writes to critical PII columns or sensitive tables without explicit, pre-defined confirmation or authorization. For Reverse ETL, Dinobase can push computed metrics, such as churn risk scores or customer segments, back to platforms like HubSpot or Stripe, utilizing the same dlt connectors that manage data ingestion. For example, an agent could calculate detailed churn risk scores by analyzing data from multiple disparate sources, then write these scores directly to a custom field within your CRM system. Furthermore, comprehensive mutation logs capture every change with detailed before/after states, providing essential audit trails required for compliance and operational transparency in production systems. This write capability fundamentally distinguishes Dinobase from simpler query layers, enabling truly closed-loop agent workflows and automated business processes.
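
A guardrail of this kind can be as simple as inspecting the statement before execution. The rules and annotation shape below are illustrative, not Dinobase's actual implementation:

```python
import re

# Sketch of a mutation guardrail: reject writes that touch columns the
# schema annotations flag as PII unless the caller explicitly confirms.
# The annotation shape and policy here are illustrative, not Dinobase's.
PII_COLUMNS = {"hubspot_contacts": {"email", "phone"}}

def check_mutation(sql: str, confirmed: bool = False) -> bool:
    """Return True if the statement may run; raise on a blocked write."""
    verb = sql.strip().split()[0].upper()
    if verb not in {"INSERT", "UPDATE", "DELETE"}:
        return True  # reads pass through untouched
    for table, cols in PII_COLUMNS.items():
        if table in sql and any(re.search(rf"\b{c}\b", sql) for c in cols):
            if not confirmed:
                raise PermissionError(f"write touches PII columns in {table}")
    return True

assert check_mutation("SELECT email FROM hubspot_contacts")  # reads are fine
try:
    check_mutation("UPDATE hubspot_contacts SET email = 'x@y.z' WHERE id = 1")
except PermissionError as e:
    print("blocked:", e)
```

A production version would parse the SQL properly instead of pattern-matching, and log before/after states for the audit trail described above.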

From PostHog AI to Dinobase: The Genesis Story

The Dinobase project directly stems from the creator’s hands-on experience building PostHog AI, the company’s internal business analyst agent. Over the past year, he extensively experimented with two primary approaches for data access: providing the agent with raw SQL access to PostHog’s underlying databases versus exposing sophisticated tools and MCPs for data retrieval. This practical, real-world experimentation provided invaluable insights.

The SQL approach won decisively and consistently. PostHog AI performed significantly better and more reliably when it could query tables directly, rather than being forced to navigate multiple layers of abstraction. This critical insight emerged from production usage where complex business questions frequently required joining data across the product analytics warehouse, CRM exports, and billing systems. The tool-based approach, while seemingly intuitive, required intricate orchestration that often broke down on edge cases and complex multi-step queries. After leaving PostHog three weeks ago, the creator was driven to explore and formalize this SQL-first pattern. He quickly built an MVP, exposing business data through DuckDB with richly annotated schemas, and then conducted comparative benchmarks to rigorously validate his hypothesis. The results emphatically confirmed his PostHog experience: SQL access dramatically outperformed MCP architectures on real-world business questions, providing the impetus for Dinobase’s creation.

Why He Left PostHog to Build This Solution

Three weeks ago, the creator made the significant decision to depart from PostHog, a successful company and Y Combinator alum, to dedicate his full attention to side projects. This choice to leave a stable and prominent role signals a profound conviction in the technical insights that underpin Dinobase and the pressing need for a better approach to AI agent data access.

At PostHog, he witnessed the inherent limitations of current AI agent architectures firsthand. PostHog AI served real business users asking complex, nuanced questions, which starkly exposed where MCPs fall short on multi-source queries. The persistent gap between what AI agents could theoretically achieve and what they reliably delivered in practice became a source of significant frustration. Rather than merely patching the problem with more elaborate tool calls or complex prompt engineering, he recognized that a fundamental architectural shift was required. This shift involved moving away from brittle API abstraction layers and returning to direct, database-native SQL access, but implemented with modern data infrastructure principles. The remarkably short three-week timeline from his departure to the delivery of a functional MVP clearly demonstrates the urgency and conviction he feels about solving this critical problem. For the broader AI agent ecosystem, this move validates that innovation at the database layer remains paramount, even as much attention focuses on advancements in model capabilities and prompt engineering techniques.

Dinobase vs. MCP: A Technical Comparison Table

The debate between SQL-based data access and MCP (Model Context Protocol) architectures is a central theme in current AI agent data strategies. Dinobase represents a SQL-native approach, leveraging the power of relational query engines, while per-source MCP servers, such as official integrations from Stripe or HubSpot, embody a tool-native approach. Understanding the tradeoffs between these two paradigms helps builders select the most appropriate pattern for their specific use case.

| Feature | Dinobase (SQL-Native) | Per-Source MCP (Tool-Native) |
| --- | --- | --- |
| Cross-source joins | Native and optimized in the DuckDB engine | Performed in-context by the LLM, error-prone |
| Token usage | 16-22x lower due to efficient data retrieval | High per-call overhead, token-intensive orchestration |
| Accuracy on complex queries | 2-3x higher, robust for multi-source questions | Degrades significantly with join complexity and data volume |
| Schema discovery | Automated via Claude agent, rich annotations | Manual tool descriptions, often incomplete or ambiguous |
| Data freshness | Sync-based, configurable intervals | Real-time API calls, but with token costs |
| Setup complexity | Single connection to a unified Dinobase layer | Multiple API keys, endpoints, and tool configurations |
| Data governance & PII | Automated PII flagging, guardrails for mutations | Manual management per tool, inconsistent enforcement |
| Maintainability | Centralized schema, automated documentation | Distributed tool definitions, prone to drift |
| Scalability | DuckDB and Parquet scale well for analytical workloads | LLM context window limits, API rate limits |
| Cost efficiency | Significantly lower operational costs for data access | Higher operational costs due to token consumption |

Dinobase effectively trades real-time API freshness for superior query power, efficiency, and accuracy. For analytical workloads where slightly stale data (e.g., data updated hourly or daily) is acceptable, SQL-native approaches like Dinobase offer a clear advantage. However, for truly real-time operational tasks requiring immediate consistency, traditional MCPs might still hold value, albeit with higher costs and complexity. The compelling 2-3x accuracy advantage makes SQL the unequivocal choice for decision-support agents and complex business intelligence tasks.

Performance Metrics: Speed and Accuracy Breakdown

The rigorous benchmarks supporting Dinobase’s claims utilized a diverse array of 11 different LLMs, spanning from accessible open-weight models like Kimi 2.5 to cutting-edge frontier models such as Claude Opus 4.6. Across this comprehensive spectrum of models, the SQL-native approach consistently demonstrated significant advantages across three key performance dimensions: accuracy, token consumption, and overall query latency.

The observed accuracy improvements, ranging from 2-3x, are particularly critical. This means that in scenarios where MCP-based agents frequently failed on complex business questions due to orchestration difficulties or context limitations, SQL-based agents, powered by Dinobase, consistently succeeded. This reliability gap is often more important than raw speed for production deployments, where correctness is paramount. However, Dinobase also proved to be 2-3x faster in end-to-end query resolution. These latency gains are primarily attributed to the elimination of numerous network round-trips between multiple MCP servers and a substantial reduction in the LLM’s processing time. When the database engine efficiently handles complex joins and data aggregation, the LLM receives data that is already processed and ready for immediate analysis, minimizing its computational burden. For agents running on OpenClaw with local LLMs, this combined efficiency enables an interactive analysis experience that feels responsive and fluid, rather than sluggish. The synergy of higher accuracy and lower latency creates a user experience where AI agents genuinely deliver on the promise of autonomous and intelligent business analysis.

Deployment Flexibility: Local, S3, GCS, and Azure Support

Dinobase is engineered to support a wide array of deployment topologies, offering exceptional flexibility to meet diverse organizational needs and infrastructure preferences. Users can run the entire stack locally on a development machine, sync data to popular cloud object storage services such as Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage for seamless team sharing and collaboration, or deploy it within private Virtual Private Clouds (VPCs) for enhanced security and control. This inherent flexibility is crucial for addressing the stringent data sovereignty and compliance concerns that often hinder the adoption of many enterprise AI agent projects.

The local mode allows dlt connectors and DuckDB to operate entirely on your hardware, ensuring that all data processing remains securely within your network perimeter. This setup is ideal for OpenClaw deployments on Mac minis or Linux workstations, particularly where data privacy and minimal external dependencies are paramount. For collaborative teams, syncing Parquet files to object storage creates a shared, versioned data layer without the high cost and operational overhead of a traditional data warehouse. DuckDB can query objects stored in S3, GCS, or Azure directly using its httpfs extension, enabling agents to analyze terabytes of data without requiring local storage constraints. The schema annotations automatically travel with the data, ensuring that any team member connecting to the same cloud bucket benefits from the same comprehensive column documentation and PII flags. This robust architecture effectively avoids the vendor lock-in often associated with proprietary agent data platforms, while simultaneously providing the collaborative features and scalability that modern teams require.
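
For reference, querying Parquet in S3 from DuckDB via httpfs looks like the following (bucket name and path are placeholders; credentials can also be supplied via DuckDB secrets or environment configuration):

```sql
-- Query Parquet directly in object storage; no local copy needed.
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-east-1';

SELECT customer_id, amount
FROM read_parquet('s3://your-bucket/dinobase/stripe_charges/*.parquet')
WHERE amount > 1000;
```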

Current Limitations and Call for Community Feedback

The creator of Dinobase is transparent and explicit about the project’s current status: it is an MVP (Minimum Viable Product) built in just three weeks and is not yet production-ready for all use cases. This candidness is a refreshing contrast to many AI infrastructure projects that often overpromise stability and features. The project is actively seeking community feedback to identify and address major architectural gaps, bugs, and areas for improvement.

Known limitations currently include the absence of advanced Role-Based Access Control (RBAC) beyond basic SQL permissions, a lack of comprehensive audit logging capabilities essential for enterprise compliance requirements, and no built-in horizontal scaling for the DuckDB layer itself. The data synchronization process currently relies on dlt’s standard batch scheduling rather than real-time streaming, which might not suit applications requiring immediate data freshness. Error handling for failed connector syncs presently requires manual intervention, and the automated schema annotation feature depends on the availability and rate limits of the Claude API. For OpenClaw builders evaluating Dinobase, these constraints imply that it is currently best suited for analytical agents operating on non-critical data, rather than high-stakes, real-time operational systems. The creator’s active solicitation of feedback on platforms like Hacker News and GitHub underscores a genuine commitment to an open-source development approach, fostering community-driven improvements rather than a stealth commercial product launch.

Implications for Production AI Agent Builders

For teams developing and deploying AI agents in production, Dinobase validates a critical architectural principle: prioritizing robust query power and data integrity over absolute real-time freshness. Many business agents fail not for lack of model intelligence, but because they cannot reliably access, join, and interpret data across disparate systems.

The demonstrated 2-3x accuracy improvement and a staggering 95% token cost reduction fundamentally alter the economics and feasibility of AI agent deployment. Projects that previously required expensive, high-end LLMs (like GPT-4 class models) to handle complex tool orchestration can now potentially run on smaller, more cost-effective models when paired with SQL-native access provided by Dinobase. This enables more practical local deployment patterns using OpenClaw with models like LLaMA or Qwen, which might lack the extensive context windows necessary for MCP-heavy workflows. Furthermore, the automated schema annotation feature significantly reduces the ongoing maintenance burden of keeping agent documentation synchronized with constantly evolving data structures. Builders should consider Dinobase not just as a specific implementation, but as a foundational reference architecture for future agent data layers. The paradigm of annotated SQL over raw, fragmented APIs is highly likely to become a standard across the AI agent ecosystem, driving more reliable and efficient deployments.

The Future Trajectory of Agent Data Infrastructure

Dinobase emerges at a pivotal juncture for AI agent infrastructure. As open-source frameworks like OpenClaw mature and local-first deployment models become increasingly practical and desirable, the data layer is quickly becoming the primary bottleneck for advanced agent capabilities. The Dinobase project strongly suggests a convergence between traditional data engineering principles and the evolving needs of AI agent architectures.

Future iterations of Dinobase, or similar architectural patterns, will likely incorporate real-time streaming connectors to address use cases demanding immediate data freshness. We can also anticipate the introduction of materialized views for common agent queries, significantly boosting performance for frequently accessed data patterns. The integration of vector search capabilities for hybrid SQL/semantic retrieval will further enhance agents’ ability to understand and query data using natural language. The automated schema annotation approach could be expanded to include example queries for each table, empowering agents to generate correct SQL more efficiently and accurately. We may also see the emergence of hosted versions that maintain the open-source query engine while managing the underlying infrastructure, offering a managed service option. For the broader AI ecosystem, Dinobase serves as compelling evidence that SQL remains an optimal and highly efficient interface for structured data, even in an era dominated by natural language processing and AI agents. The key innovators in agent infrastructure will be those who embrace and build upon database-native operations rather than attempting to circumvent them with complex, token-intensive abstraction layers. The three-week build timeline also sets a new, inspiring standard for the speed of MVP development in the AI infrastructure space.

Frequently Asked Questions

What is Dinobase and how does it differ from MCP architectures?

Dinobase is a SQL-first database for AI agents built by the creator of PostHog AI. Unlike MCP architectures that force agents to join data in-context through multiple tool calls, Dinobase uses DuckDB to handle cross-source JOINs natively in the database engine. This architectural difference achieved 2-3x higher accuracy and used 16-22x fewer tokens in benchmarks compared to per-source MCP access. While MCPs require agents to orchestrate multiple API calls and manage intermediate results, Dinobase lets agents write standard SQL queries that return final datasets ready for analysis.

Why did the creator choose SQL over MCP for agent data access?

After building PostHog AI, he discovered that giving agents raw SQL access to databases consistently outperformed exposing tools or MCPs. SQL handles joins natively in the database engine rather than forcing the LLM to manage relationships in its limited context window, reducing token consumption and improving accuracy on complex business questions. The benchmarks across 11 different LLMs confirmed this hypothesis, showing that SQL eliminated the error-prone steps of API orchestration and in-context joining that plague MCP-based agent architectures.

What is the architecture behind Dinobase?

Dinobase uses dlt to sync data from 101 connectors into Parquet files, stored locally or in S3, GCS, or Azure buckets. DuckDB serves as the embedded query engine for cross-source JOINs. After each sync completes, a Claude agent automatically annotates schemas with table descriptions, column documentation, PII flags, and relationship maps. This annotation layer helps querying agents understand foreign key relationships and data types without manual documentation. The architecture supports both read operations and safe mutations through guardrails that validate SQL before execution.

How does Dinobase integrate with OpenClaw and other agent frameworks?

Dinobase works with all major agent frameworks including LangChain, CrewAI, LlamaIndex, Pydantic AI, and Mastra. It also supports local agents like Claude Code, Cursor, Codex, and OpenClaw. You connect your agent to Dinobase’s DuckDB instance via standard SQL connectors or ODBC drivers. For OpenClaw specifically, you configure a database tool pointing to the local DuckDB file or remote endpoint, allowing agents to generate and execute SQL queries against your unified data layer. This integration pattern fits naturally with OpenClaw’s tool-calling architecture while providing the performance benefits of native SQL execution.

What are the current limitations of Dinobase?

The creator explicitly states Dinobase is not bug-free and is currently seeking community feedback. As an MVP built over three weeks, it lacks enterprise features like advanced RBAC, comprehensive audit logging, and horizontal scaling for the DuckDB layer. The sync process runs on batch schedules rather than real-time streaming, and error handling for failed connector syncs requires manual intervention. The project currently focuses on query correctness and token efficiency over production hardening, making it best suited for analytical agents and proof-of-concept deployments rather than critical operational systems handling sensitive real-time data.

Conclusion

The creator of PostHog AI built Dinobase, an AI agent database that uses SQL instead of MCPs. Across 11 LLMs, his benchmarks showed 2-3x accuracy gains with 16-22x fewer tokens per correct answer, a strong case that agent data access belongs in annotated SQL rather than per-source tool orchestration.