Grab’s Analytics Data Warehouse Team Deploys Multi-Agent AI System to Revolutionize Engineering Support and Boost Innovation

Sagoh, May 20, 2026


Grab, the leading superapp in Southeast Asia, has announced that its Analytics Data Warehouse (ADW) team has deployed a multi-agent AI system to improve operational efficiency and engineering productivity. The system automates complex engineering support workflows across Grab’s data platform, reducing the burden of repetitive operational tasks and speeding up issue resolution. The aim is to free engineers to move from reactive problem-solving to proactive, high-value development and system design, and so to foster more innovation within the company’s critical data infrastructure.

The Escalating Challenge of Scale within Grab’s Data Ecosystem

Grab’s journey to becoming a dominant force in Southeast Asia’s digital economy has been underpinned by an ever-growing, sophisticated data infrastructure. The ADW platform, a core analytics component, supports over 1,000 internal users and manages a staggering 15,000-plus tables. This infrastructure is not merely a repository; it is the lifeblood for critical functions across Grab’s diverse services, from optimizing ride-hailing routes and personalizing food delivery recommendations to enabling secure financial transactions and driving strategic business intelligence. The sheer volume and complexity of data, coupled with the rapid expansion of Grab’s services across multiple markets, inevitably led to an escalating demand for engineering support.

As the platform scaled, the ADW engineering team found itself increasingly mired in a cycle of operational firefighting. A substantial portion of their collective effort was consumed by a steady stream of repetitive support tasks and ad hoc investigations. These included common yet time-intensive activities such as data warehouse troubleshooting, debugging complex SQL queries, and providing general platform assistance. While essential for maintaining platform stability and user satisfaction, these tasks diverted critical engineering bandwidth away from strategic initiatives. Engineers, whose expertise was invaluable for developing new features, enhancing system architecture, and driving long-term platform improvements, were instead dedicating significant hours to routine support tickets. This not only created bottlenecks in development cycles but also posed a risk to engineer morale, as the focus shifted from creative problem-solving to reactive maintenance.

A Strategic Pivot: From Firefighting to System Building

Recognizing this operational inefficiency as a significant impediment to growth and innovation, Grab’s Central Data Team embarked on a mission to re-engineer their support paradigm. The vision was to leverage advanced artificial intelligence to offload the predictable, repetitive aspects of engineering support, thereby unlocking the latent potential of their human engineers. Sneh Agrawal, Head of Analytics at Grab, succinctly captured this transformative goal in a LinkedIn post, stating, "Grab’s Central Data Team is leveraging a multi-agent system to automate repetitive operational work, reclaiming hundreds of engineering hours each month. This shift is unlocking critical engineering bandwidth and enabling a transition from reactive firefighting to higher-value system building." This statement underscored not just a technical solution but a strategic organizational imperative to empower engineers and accelerate platform evolution.

Unpacking the Multi-Agent Architecture: Investigation and Enhancement

To address the multifaceted nature of engineering support requests, the Grab team implemented a sophisticated multi-agent architecture. This design intelligently segregates incoming engineering requests into two primary, specialized workflows: Investigation and Enhancement. This deliberate separation was a key architectural decision aimed at reducing complexity in agent reasoning and improving the reliability and predictability of outputs in production environments.

Investigation Workflows: These workflows are designed for diagnostic tasks. When an engineer submits a query or reports an issue, the system can automatically initiate a series of investigative steps. These include query analysis to identify performance bottlenecks or syntax errors, log retrieval across system components to pinpoint anomalies, schema lookup to understand data structures, and issue summarization that compiles all relevant findings into a coherent report. The agents within this workflow act as digital detectives, sifting through large amounts of data to diagnose the root cause of a problem.

Enhancement Workflows: Complementing the diagnostic capabilities, enhancement workflows are geared towards generating actionable outputs. Once an issue is identified or a modification is requested, these agents focus on creating concrete solutions. This can involve generating code changes to resolve bugs or implement minor features, crafting optimized SQL fixes for inefficient queries, and even opening automated merge requests for review within Grab’s Git-based version control system. Human-in-the-loop oversight of these automated changes ensures that, while the system accelerates development, engineering judgment and quality control remain paramount.
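
As a rough illustration of the two-workflow split described above, the Python sketch below routes an incoming request to one workflow or the other. The keyword rule and the names `Workflow`, `ENHANCEMENT_HINTS`, and `classify_request` are hypothetical; the source does not say how Grab classifies requests, and a production system would more plausibly use a model-based classifier.

```python
from enum import Enum


class Workflow(Enum):
    INVESTIGATION = "investigation"
    ENHANCEMENT = "enhancement"


# Hypothetical trigger phrases; a stand-in for whatever classifier is really used.
ENHANCEMENT_HINTS = ("add column", "fix", "change", "update", "create")


def classify_request(text: str) -> Workflow:
    """Send change requests to Enhancement and everything else to Investigation."""
    lowered = text.lower()
    if any(hint in lowered for hint in ENHANCEMENT_HINTS):
        return Workflow.ENHANCEMENT
    return Workflow.INVESTIGATION
```

Under these assumptions, a question such as "Why is my query on the orders table so slow?" lands in the Investigation workflow, while "Please add column city_id to the trips table" lands in Enhancement.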

The Technical Underpinnings: LangGraph, FastAPI, and Specialized Agents

At the heart of Grab’s multi-agent system lies a robust orchestration layer built on modern AI and software development frameworks. The system leverages a LangGraph-based workflow engine, which provides a flexible and powerful way to define and manage complex agent interactions and decision-making processes. LangGraph’s capabilities allow for the creation of cyclic graphs, enabling agents to communicate, iterate, and refine their outputs based on feedback within the system, mimicking a collaborative human team.
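
The idea of a cyclic workflow graph can be sketched without any framework. The example below is plain Python, not LangGraph code, and every name in it (`draft`, `review`, `next_node`, `run_graph`) is an assumption made for illustration. It shows a draft node and a review node connected by a conditional edge that loops back until the reviewer accepts the output.

```python
from typing import Callable, Dict

State = Dict[str, object]


def draft(state: State) -> State:
    # Stand-in for an LLM call: each pass produces a new revision.
    attempts = int(state["attempts"]) + 1
    return {**state, "attempts": attempts, "answer": f"draft v{attempts}"}


def review(state: State) -> State:
    # Stand-in reviewer: accepts the answer once it has been revised twice.
    return {**state, "approved": int(state["attempts"]) >= 2}


def next_node(current: str, state: State) -> str:
    """Conditional edge: review loops back to draft until the answer is approved."""
    if current == "draft":
        return "review"
    return "END" if state["approved"] else "draft"


NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}


def run_graph(max_steps: int = 10) -> State:
    state: State = {"attempts": 0, "answer": "", "approved": False}
    current = "draft"
    for _ in range(max_steps):  # hard cap so the cycle cannot run forever
        state = NODES[current](state)
        current = next_node(current, state)
        if current == "END":
            break
    return state
```

The step cap is a design choice worth keeping in any cyclic workflow: it bounds cost and latency even when the reviewer never approves.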

This workflow engine is seamlessly integrated with FastAPI services. FastAPI, known for its high performance and ease of use in building APIs, coordinates crucial functions across the system. It handles the initial routing of requests to the appropriate agents, manages the execution of various internal tools, and maintains the state across different interactions, ensuring a consistent and coherent operational flow.

Upon receiving a request, the system first classifies its nature and then intelligently routes it to one or more specialized agents. These agents are designed with deliberately constrained responsibilities. For instance, a dedicated agent might be responsible solely for context retrieval, sifting through documentation and past solutions. Another might specialize in code search, identifying relevant code snippets or functions. Yet another could be tasked with solution generation, proposing fixes or enhancements. This modular approach, where each agent operates with a narrow, defined scope, significantly reduces ambiguity in their decision-making processes and vastly improves the predictability and reliability of their outputs. An overarching Supervisor agent plays a critical role in controlling the communication flow between these specialized agents and delegating tasks, much like a project manager overseeing a team.
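
A minimal sketch of this supervisor pattern, assuming three narrowly scoped agents that share a task object, might look like the following. The agent functions, the `Task` type, and the fixed delegation order are all hypothetical; the source does not describe how Grab's Supervisor agent chooses the next agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    question: str
    notes: List[str] = field(default_factory=list)


def context_agent(task: Task) -> None:
    # Only retrieves context; it never proposes a fix.
    task.notes.append(f"context: docs related to '{task.question}'")


def code_search_agent(task: Task) -> None:
    # Only searches code.
    task.notes.append("code: candidate snippets")


def solution_agent(task: Task) -> None:
    # Only proposes a solution, based on what earlier agents collected.
    task.notes.append(f"solution: proposal based on {len(task.notes)} notes")


class Supervisor:
    """Owns the delegation order; agents never call each other directly."""

    def __init__(self) -> None:
        self.agents: Dict[str, Callable[[Task], None]] = {
            "context": context_agent,
            "code_search": code_search_agent,
            "solution": solution_agent,
        }
        self.plan = ["context", "code_search", "solution"]

    def handle(self, question: str) -> Task:
        task = Task(question)
        for name in self.plan:
            self.agents[name](task)
        return task
```

Because each agent touches only its own slice of the task, a faulty output can be traced to a single agent rather than to one large prompt.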

Optimizing the Tool Ecosystem for Enhanced Performance

A significant technical challenge encountered during the system’s development was managing the vast array of internal tools Grab’s engineers utilized. Initially, the multi-agent system had access to over 30 internal tools spanning data access, logging, and code systems. While this provided broad capabilities, it also introduced complexities. The sheer number of tools made maintainability difficult and often led to unpredictable tool selection by the agents, as they struggled to consistently identify the most appropriate tool for a given task.

To address this, the team made a strategic decision to consolidate and curate the tool ecosystem. They reduced the extensive list to a smaller, more manageable, and highly optimized set of tools. This curated tool layer now includes controlled SQL execution environments, ensuring queries are run safely and efficiently; robust metadata access systems for understanding data lineage and structure; sophisticated log retrieval systems for diagnostics; and seamless integration with Git-based workflows for automated change management and version control. This streamlined approach not only enhanced the system’s maintainability but also significantly improved the agents’ ability to make accurate and efficient tool selections, leading to more reliable and faster problem resolution.
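
One way to picture a curated tool layer is as an allow-list that agents must go through. The sketch below is an assumption-laden illustration: the `ToolRegistry` class and the four tool names are invented to mirror the categories named above, and the lambdas are placeholders rather than real integrations.

```python
from typing import Callable, Dict, List

ToolFn = Callable[[str], str]


class ToolRegistry:
    """Allow-list of tools; agents cannot call anything that is not registered."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        if name in self._tools:
            raise ValueError(f"duplicate tool: {name}")
        self._tools[name] = fn

    def call(self, name: str, argument: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](argument)

    def names(self) -> List[str]:
        return sorted(self._tools)


# Hypothetical tools, one per category mentioned in the article.
registry = ToolRegistry()
registry.register("run_sql", lambda query: f"[sql result for] {query}")
registry.register("get_metadata", lambda table: f"[schema of] {table}")
registry.register("fetch_logs", lambda job: f"[logs for] {job}")
registry.register("open_merge_request", lambda title: f"[merge request] {title}")
```

A small, explicit registry keeps the choice an agent has to make short, which is the maintainability and predictability benefit the team describes.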

Prioritizing Safety, Governance, and Human Oversight

Given the sensitive nature of data within Grab’s operations and the critical role of its data platform, safety and governance were not afterthoughts but integral components of the multi-agent system’s design. Grab implemented several layers of control and oversight to ensure responsible AI deployment.


SQL execution, a potentially powerful and risky operation, is strictly constrained through multiple validation layers. These layers prevent malicious or erroneous queries from impacting the production environment. Furthermore, sophisticated mechanisms are in place for sensitive data handling, including capabilities for detecting and mitigating potential exposure risks, ensuring data privacy and compliance with regulatory standards.
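
The article does not list the actual validation layers, so the following is only a sketch of what layered SQL checks could look like. All three rules (single statement, read-only, mandatory row limit) and all function names are assumptions, and the keyword matching is deliberately naive; a real guard would use a proper SQL parser.

```python
import re
from typing import Callable, List, Optional

# Keywords that indicate a write or DDL statement (illustrative list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.IGNORECASE
)


def single_statement(sql: str) -> Optional[str]:
    body = sql.strip().rstrip(";")
    return "multiple statements are not allowed" if ";" in body else None


def read_only(sql: str) -> Optional[str]:
    if not sql.strip().lower().startswith(("select", "with")):
        return "only SELECT queries are allowed"
    if FORBIDDEN.search(sql):
        return "write or DDL keyword found"
    return None


def has_row_limit(sql: str) -> Optional[str]:
    if re.search(r"\blimit\s+\d+\b", sql, re.IGNORECASE):
        return None
    return "missing LIMIT clause"


VALIDATORS: List[Callable[[str], Optional[str]]] = [
    single_statement,
    read_only,
    has_row_limit,
]


def validate_sql(sql: str) -> List[str]:
    """Run every layer and return all violations; an empty list means allowed."""
    errors = []
    for check in VALIDATORS:
        error = check(sql)
        if error is not None:
            errors.append(error)
    return errors
```

Running every layer instead of stopping at the first failure gives the requesting agent, or a human reviewer, the full list of reasons a query was rejected.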

Crucially, Grab maintained a "human-in-the-loop" (HITL) model for all enhancement workflows that produce code changes. This means that while the AI system can generate code changes or SQL fixes, these automated outputs are never deployed directly to production without prior human review and approval. This engineering oversight mechanism is vital for maintaining high code quality, ensuring logical correctness, and building trust in the AI system’s capabilities. It balances the efficiency gains of automation with the necessary human judgment for critical system modifications.
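
The approval gate can be pictured as a small state machine in which an agent-generated change starts as pending and cannot be merged until a named human approves it. The sketch below assumes hypothetical names (`MergeRequest`, `Status`, `review`, `can_merge`) and does not reflect Grab's actual Git tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class MergeRequest:
    title: str
    diff: str
    status: Status = Status.PENDING_REVIEW  # agent output always starts here
    reviewer: Optional[str] = None


def review(mr: MergeRequest, reviewer: str, approve: bool) -> None:
    """Record a human decision on an agent-generated change."""
    mr.reviewer = reviewer
    mr.status = Status.APPROVED if approve else Status.REJECTED


def can_merge(mr: MergeRequest) -> bool:
    # Merging requires an explicit approval by a named reviewer.
    return mr.status is Status.APPROVED and mr.reviewer is not None
```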

Navigating Technical Hurdles: Context Management

One of the most significant technical challenges in developing a multi-step agent reasoning system is effective context management. For agents to perform complex, multi-step investigations or generate intricate solutions, they need to maintain relevant state and information across multiple interactions. However, large language models (LLMs), which often power such agents, operate under token constraints, limiting the amount of information they can process at any given time.

Grab’s engineering team addressed this through innovative strategies, including structured context compression and selective retrieval. Context compression involves intelligently summarizing and distilling vast amounts of information into more concise representations, allowing the agents to retain necessary details without exceeding token limits. Selective retrieval ensures that only the most pertinent information is brought into the agent’s active context at any given step, avoiding information overload and improving the efficiency and accuracy of reasoning. These techniques were crucial for enabling the agents to perform complex, multi-turn interactions effectively and reliably.
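
The two techniques can be sketched together under some strong simplifications. In the example below, tokens are approximated by whitespace-separated words, compression is plain truncation, and relevance is word overlap with the query; all function names are hypothetical, and a real system would use a tokenizer, model-generated summaries, and a proper retriever.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace-separated words, not a real tokenizer.
    return len(text.split())


def compress(text: str, max_tokens: int) -> str:
    """Truncating 'summary'; a real system would summarize with a model."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


def relevance(query: str, chunk: str) -> int:
    # Number of distinct query words that also appear in the chunk.
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def build_context(query: str, chunks: List[str], budget: int, per_chunk: int) -> List[str]:
    """Pick the most relevant chunks, compress each, and stay within the budget."""
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    selected: List[str] = []
    used = 0
    for chunk in ranked:
        if relevance(query, chunk) == 0:
            break  # remaining chunks share no words with the query
        short = compress(chunk, per_chunk)
        cost = count_tokens(short)
        if used + cost > budget:
            continue  # skip chunks that would exceed the token budget
        selected.append(short)
        used += cost
    return selected
```

Selective retrieval decides which chunks are considered at all, while compression decides how much of each chunk is kept; the budget check ties both to the model's token limit.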

Tangible Impact and Broader Implications

The deployment of Grab’s multi-agent AI system has already yielded tangible benefits. Specific performance metrics, such as percentage reductions in resolution time or cost savings, were not publicly disclosed, but the team reports a significant reduction in the time engineers spend on routine support tasks, which Agrawal’s post puts at hundreds of engineering hours each month. This translates to faster resolution cycles for common issues and improves the overall efficiency and reliability of the ADW platform.

More importantly, the system has facilitated a shift in engineering effort. According to the team, engineers are moving away from the reactive "firefighting" that previously dominated their days and are increasingly able to dedicate their expertise to higher-value activities such as platform engineering, architectural improvements, and system design. This reallocation of engineering time is expected to accelerate the evolution of Grab’s data platform, leading to more robust, scalable, and feature-rich infrastructure.

From a broader industry perspective, Grab’s initiative serves as a powerful case study for the practical application of multi-agent AI in large-scale enterprise environments. It demonstrates how AI can augment human capabilities, not merely replace them, by automating the mundane and freeing humans for creative and strategic work. As companies globally grapple with the dual challenges of managing increasingly complex data infrastructures and maximizing engineering productivity, Grab’s model offers a compelling blueprint. It highlights the potential for AI to tackle "tech debt" and operational overhead, fostering a culture of innovation and continuous improvement. The emphasis on safety, governance, and human oversight also sets a crucial precedent for responsible AI deployment in mission-critical systems, underscoring that the future of enterprise AI lies in intelligent automation complemented by robust human control. This strategic investment positions Grab not only as a leader in Southeast Asia’s digital economy but also as an innovator in leveraging advanced AI for operational excellence and long-term technological advancement.