Introduction
Retrieval-Augmented Generation (RAG) may have been crucial for the first wave of enterprise AI, but it's rapidly evolving into something much larger. Over the past two years, organizations have realized that simply retrieving text snippets using vector search isn't enough. Context needs to be governed, explainable, and adaptive to an agent's role.
This post explores how that evolution is taking shape and what it means for data and AI leaders building systems that can reason responsibly.
You'll come away with answers to some key questions:
How do knowledge graphs improve RAG?
They provide structure and meaning to enterprise data, linking entities and relationships across documents and databases to make retrieval more accurate and explainable for both humans and machines.
How do semantic layers help LLMs retrieve better answers?
Semantic layers standardize data definitions and governance policies so AI agents can understand, retrieve, and reason over all kinds of data as well as AI tools, memories, and other agents.
How is RAG evolving in the age of agentic AI?
Retrieval is becoming one step in a broader reasoning loop (increasingly called "context engineering") where agents dynamically write, compress, isolate, and select context across data and tools.
TL;DR
Retrieval-Augmented Generation (RAG) rose to prominence following the launch of ChatGPT and the realization that there's a limit on the context window: you can't just copy all your data into the chat interface. Teams used RAG, and its variants like GraphRAG (RAG using a graph database), to bring additional context into prompts at query time. RAG's popularity quickly exposed its weaknesses: putting incorrect, irrelevant, or just too much information into the context window can actually degrade rather than improve results. New techniques like re-rankers were developed to overcome these limitations, but RAG wasn't built to survive in the new agentic world.
As AI shifts from single prompts to autonomous agents, retrieval and its variants are just one tool in an agent's toolbelt, alongside writing, compressing, and isolating context. As the complexity of workflows, and the knowledge required to complete those workflows, grows, retrieval will continue to evolve (though it may be called context engineering, RAG 2.0, or agentic retrieval). The next era of retrieval (or context engineering) will require metadata management across data structures (not just relational) as well as tools, memories, and agents themselves. We'll evaluate retrieval not just for accuracy but also for relevance, groundedness, provenance, coverage, and recency. Knowledge graphs will be key for retrieval that's context-aware, policy-aware, and semantically grounded.
The Rise of RAG
What’s RAG?
RAG, or Retrieval-Augmented Generation, is a technique for retrieving relevant information to augment a prompt that's sent to an LLM in order to improve the model's response.
Shortly after ChatGPT went mainstream in November 2022, users realized that LLMs weren't (hopefully) trained on their own data. To bridge that gap, teams began developing ways to retrieve relevant data at query time to augment the prompt – an approach called retrieval-augmented generation (RAG). The term came from a 2020 Meta paper, but the popularity of the GPT models brought the term and the practice into the limelight.
Tools like LangChain and LlamaIndex helped developers build these retrieval pipelines. LangChain was released at around the same time as ChatGPT as a way of chaining different components like prompt templates, LLMs, agents, and memory together for generative AI applications. LlamaIndex was also released around the same time as a way to manage the limited context window in GPT-3, thereby enabling RAG. As developers experimented, they realized that vector databases provide a fast and scalable way to power the retrieval part of RAG, and vector databases like Weaviate, Pinecone, and Chroma became standard components of the RAG architecture.
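Stripped to its essentials, the pattern above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, and the document set, query, and helper names are all invented.

```python
# Minimal RAG sketch: embed documents, embed the query the same way,
# retrieve the closest chunks, and augment the prompt before calling an LLM.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The cafeteria opens at 8am on weekdays.",
    "Refunds are issued to the original payment method.",
]
prompt = build_prompt("What is the refund policy?", docs)
```

The same skeleton underlies every RAG system; production versions swap in a real embedding model, a vector database, and chunking logic.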
What’s GraphRAG?
GraphRAG is a variation of RAG where the underlying database used for retrieval is a knowledge graph or a graph database.
One variation of RAG became especially popular: GraphRAG. The idea here is that the underlying knowledge used to enrich LLM prompts is stored in a knowledge graph. This allows the model to reason over entities and relationships rather than flat text chunks. In early 2023, researchers began publishing papers exploring how knowledge graphs and LLMs could complement each other. In late 2023, Juan Sequeda, Dean Allemang, and Bryon Jacob from data.world released a paper demonstrating how knowledge graphs can improve LLM accuracy and explainability. In July 2024, Microsoft open-sourced its GraphRAG framework, which made graph-based retrieval accessible to a wider developer audience and solidified GraphRAG as a recognizable category within RAG.
The rise of GraphRAG reignited interest in knowledge graphs in a way not seen since Google launched its Knowledge Graph in 2012. The sudden demand for structured context and explainable retrieval gave them new relevance.
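To make the contrast with chunk-based retrieval concrete, here is a hedged sketch of the idea: context comes from traversing an entity's neighborhood in a graph of triples rather than from matching text. The triples, entity names, and two-hop expansion are invented for illustration; real GraphRAG implementations sit on graph databases and do far more.

```python
# GraphRAG-style sketch: look up an entity in a tiny in-memory knowledge graph
# (subject, predicate, object triples) and serialize its neighborhood as
# context for the prompt, instead of retrieving flat text chunks.
triples = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "headquartered_in", "Berlin"),
    ("Acme Corp", "ceo", "Jane Doe"),
    ("Berlin", "located_in", "Germany"),
]

def neighborhood(entity: str, hops: int = 2) -> list[tuple[str, str, str]]:
    # Breadth-first expansion over the triples, following edges in both directions.
    frontier, seen, found = {entity}, set(), []
    for _ in range(hops):
        next_frontier = set()
        for s, p, o in triples:
            if (s in frontier or o in frontier) and (s, p, o) not in seen:
                seen.add((s, p, o))
                found.append((s, p, o))
                next_frontier.update({s, o})
        frontier = next_frontier - {entity}
    return found

def graph_context(entity: str) -> str:
    return "\n".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in neighborhood(entity))

context = graph_context("Acme Corp")
```

Note how the two-hop traversal pulls in "Widget Inc headquartered in Berlin" even though Berlin never appears alongside Acme Corp in any single triple; that relationship-following is what flat chunk retrieval cannot do.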
From 2023–2025, the market responded quickly:
- January 23, 2023 – Digital Science acquired metaphacts, creators of the metaphactory platform: “a platform that supports customers in accelerating their adoption of knowledge graphs and driving knowledge democratization.”
- February 7, 2023 – Progress acquired MarkLogic, a multimodal NoSQL database with a particular strength in managing RDF data, the core data format for graph technology.
- July 18, 2024 – Samsung acquired Oxford Semantic Technologies, makers of the RDFox graph database, to power on-device reasoning and personal knowledge capabilities.
- October 23, 2024 – Ontotext and Semantic Web Company merged to form Graphwise, explicitly positioning around GraphRAG. “The announcement is significant for the graph industry, as it elevates Graphwise as the most comprehensive knowledge graph AI organization and establishes a clear path towards democratizing the evolution of Graph RAG as a category.”
- May 7, 2025 – ServiceNow announced its acquisition of data.world, integrating a graph-based data catalog and semantic layer into its enterprise workflow platform.
These are just the events related to knowledge graphs and related semantic technology. If we expand the list to include metadata management and/or semantic layers more broadly, there are more deals, most notably the $8 billion acquisition of metadata leader Informatica by Salesforce.
These moves mark a clear shift: knowledge graphs are no longer just metadata management tools; they've become the semantic backbone for AI, closer to their origins as expert systems. GraphRAG made knowledge graphs relevant again by giving them a critical role in retrieval, reasoning, and explainability.
In my day job as the product lead for a semantic data and AI company, we work to close the gap between data and its actual meaning for some of the world's largest companies. Making their data AI-ready is a combination of making it interoperable, discoverable, and usable so it can feed LLMs contextually relevant information and produce safe, accurate results. That is no small order for large, highly regulated, and complex enterprises managing exponentially growing amounts of data.
The fall of RAG and the rise of context engineering
Is RAG dead? No, but it has evolved. The original version of RAG relied on a single dense vector search and fed the top results directly into an LLM. GraphRAG built on this by adding graph analytics and entity and/or relationship filters. These implementations almost immediately ran into constraints around relevance, scalability, and noise. Those constraints pushed RAG forward into new evolutions known by many names: agentic retrieval, RAG 2.0, and most recently, context engineering. The original, naive implementation is largely dead, but its descendants are thriving and the term itself is still extremely popular.
Following the RAG hype cycle in 2024, there was inevitable disillusionment. While it's possible to build a RAG demo in minutes, and many people did, getting your app to scale in an enterprise is quite a bit dicier. “People think that RAG is easy because you can build a nice RAG demo on a single document very quickly now and it will be pretty good. But getting this to actually work at scale on real world data where you have business constraints is a very different problem,” said Douwe Kiela of Contextual AI, one of the authors of the original RAG paper from Meta in 2020.
One challenge with scaling a RAG app is the volume of data needed at retrieval time. “I think the issue that people get into with it is scaling it up. It's great on 100 documents, but now all of a sudden I have to go to 100,000 or 1,000,000 documents,” says Rajiv Shah. But as LLMs matured, their context windows grew. The size of the context window was the original pain point RAG was built to address, which raises the question of whether RAG is still necessary or useful. As Dr. Sebastian Gehrmann from Bloomberg points out, “If I'm able to just paste in more documents or more context, I don't have to rely on as many tricks to narrow down the context window. I can just rely on the large language model. There's a tradeoff here though,” he notes, “where longer context usually comes at a cost of significantly increased latency and cost.”
It isn't just cost and latency that you risk by arbitrarily dumping more information into the context window; you can also degrade performance. RAG can improve responses from LLMs, provided the retrieved context is relevant to the initial prompt. If the context is not relevant, you can get worse results, something called “context poisoning” or “context clash”, where misleading or contradictory information contaminates the reasoning process. Even if you are retrieving relevant context, you can overwhelm the model with sheer volume, leading to “context confusion” or “context distraction.” While terminology varies, multiple studies show that model accuracy tends to decline beyond a certain context size. This was found in a Databricks paper back in August 2024 and reinforced by recent analysis from Chroma, something they termed “context rot”. Drew Breunig's post usefully categorizes these issues as distinct “context fails”.
To address the problem of overwhelming the model, or of providing incorrect or irrelevant information, re-rankers have grown in popularity. As Nikolaos Vasiloglou from RelationalAI states, “a re-ranker is, after you bring the knowledge, how do you decide what to keep and what to throw away, [and that] has a big impact.” Popular re-rankers include Cohere Rerank, Voyage AI Rerank, Jina Reranker, and BGE Reranker. But re-ranking alone is not enough in today's agentic world. The latest generation of RAG has become embedded into agents, something increasingly referred to as context engineering.
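A re-ranker is conceptually simple: take the candidates a cheap first-pass retriever returned, re-score them against the query with a stricter model, and keep only the best few. Here is a minimal sketch, with a toy Jaccard-overlap score standing in for the cross-encoders that products like Cohere Rerank or BGE Reranker actually use; the candidate texts and query are invented:

```python
# Second-stage re-ranking sketch: re-score first-pass candidates against the
# query and keep only the top results, discarding noise before it reaches the LLM.
import re

def rerank(query: str, candidates: list[str], keep: int = 2) -> list[str]:
    q_terms = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> float:
        c_terms = set(re.findall(r"\w+", chunk.lower()))
        return len(q_terms & c_terms) / len(q_terms | c_terms)  # Jaccard overlap

    return sorted(candidates, key=score, reverse=True)[:keep]

candidates = [
    "Quarterly revenue grew 12% year over year.",
    "Revenue growth was driven by the cloud segment.",
    "The office dress code was updated in March.",
]
kept = rerank("What drove revenue growth?", candidates)
```

The off-topic dress-code chunk is dropped even though the first-pass retriever surfaced it, which is exactly the "what to keep and what to throw away" decision Vasiloglou describes.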
What’s Context Engineering?
“the art and science of filling the context window with just the right information at each step of an agent’s trajectory.” – Lance Martin of LangChain
I want to focus on context engineering for two reasons: the originators of the terms RAG 2.0 and Agentic Retrieval (Contextual AI and LlamaIndex, respectively) have started using the term context engineering, and it's a much more popular term based on Google search trends. Context engineering can also be thought of as an evolution of prompt engineering. Prompt engineering is about crafting a prompt in a way that gets you the results you want; context engineering is about supplementing that prompt with the right context.
RAG grew to prominence in 2023, eons ago on the timeline of AI. Since then, everything has become ‘agentic’. RAG was created under the assumption that the prompt would be generated by a human and the response would be read by a human. With agents, we need to rethink how this works. Lance Martin breaks down context engineering into four categories: write, compress, isolate, and select. Agents need to write (or persist, or remember) information from task to task, just like humans. Agents will often accumulate too much context as they move from task to task and need to compress or condense it somehow, usually through summarization or ‘pruning’. Rather than giving all the context to the model, we can isolate it, or split it across agents so they can, as Anthropic describes it, “explore different parts of the problem simultaneously”. Rather than risk context rot and degraded results, the idea is to not give the LLM enough rope to hang itself.
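The four operations can be sketched as plain functions over an agent's state. Everything here (the function signatures, the character budget, the keyword-based select) is a hypothetical simplification of what agent frameworks actually do, intended only to make the categories concrete:

```python
# Toy versions of the four context-engineering operations:
# write, compress, isolate, and select.

def write(memory: dict, key: str, value: str) -> None:
    memory[key] = value  # persist information between tasks

def compress(notes: list[str], budget: int) -> list[str]:
    # Crude stand-in for summarization/pruning: keep only the most
    # recent notes that fit within a character budget.
    kept, used = [], 0
    for note in reversed(notes):
        if used + len(note) > budget:
            break
        kept.append(note)
        used += len(note)
    return list(reversed(kept))

def isolate(task: str, subagents: list[str]) -> dict[str, str]:
    # Split one task across sub-agents so each sees only its own slice.
    return {agent: f"{task} (focus: {agent})" for agent in subagents}

def select(memory: dict, query: str) -> list[str]:
    # Retrieve only the memory entries whose key appears in the query.
    return [v for k, v in memory.items() if k in query.lower()]

memory: dict = {}
write(memory, "refunds", "Returns accepted within 30 days.")
write(memory, "shipping", "Orders ship within 2 business days.")
relevant = select(memory, "What is our refunds policy?")
```

In real systems, compress is an LLM summarization call, isolate is a multi-agent orchestrator, and select is the retrieval step RAG pioneered; the point is that retrieval is now one of four moves rather than the whole game.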
Agents need to use their memories when needed or call upon tools to retrieve additional information, i.e. they need to select (retrieve) what context to use. One of those tools could be vector-based retrieval, i.e. traditional RAG. But that is just one tool in the agent's toolbox. As Marc Brooker from AWS put it, “I do expect what we're going to see is some of the flashy newness around vector kind of calm down and us go to a world where we have this new tool in our toolbox, but a lot of the agents we're building are using relational interfaces. They're using these document interfaces. They're using lookup by primary key, lookup by secondary index. They're using lookup by geo. All of these things that have existed in the database space for decades, now we also have this one more, which is kinda lookup by semantic meaning, which is very exciting and new and powerful.”
Those at the forefront are already doing this. Martin quotes Varun Mohan of Windsurf, who says, “we […] rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and … a re-ranking step where [context] is ranked in order of relevance.”
Naive RAG may be dead, and we're still figuring out what to call the modern implementations, but one thing seems certain: the future of retrieval is bright. How do we ensure agents are able to retrieve different datasets across an enterprise, from relational data to documents? The answer is increasingly being called the semantic layer.
Context engineering needs a semantic layer
What’s a Semantic Layer?
A semantic layer is a way of attaching metadata to all data in a form that is both human and machine readable, so that people and computers can consistently understand, retrieve, and reason over it.
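One minimal sketch of that definition: a catalog of datasets annotated with descriptions, business definitions, and policy flags that both a person and an agent can query. The catalog structure and field names below are illustrative, not any real standard:

```python
# Toy semantic layer: datasets carry human- and machine-readable metadata
# (descriptions, business definitions, policy flags) that agents can query.
catalog = [
    {
        "name": "orders",
        "kind": "relational",
        "description": "Customer orders with one row per line item.",
        "definitions": {"revenue": "unit_price * quantity, net of discounts"},
        "policy": {"contains_pii": False},
    },
    {
        "name": "support_tickets",
        "kind": "unstructured",
        "description": "Free-text customer support conversations.",
        "definitions": {},
        "policy": {"contains_pii": True},
    },
]

def find_datasets(term: str, allow_pii: bool = False) -> list[str]:
    # An agent discovers data by meaning, while policy metadata is enforced.
    return [
        d["name"]
        for d in catalog
        if term.lower() in d["description"].lower()
        and (allow_pii or not d["policy"]["contains_pii"])
    ]

hits = find_datasets("customer")
```

Notice that both datasets describe customers, but only the PII-free one is returned by default; the same metadata serves discovery and governance at once, which is the core promise of the semantic layer.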
There's a recent push from those in the relational data world to build a semantic layer over relational data. Snowflake even created an Open Semantic Interchange (OSI) initiative to try to standardize the way companies document their data to make it ready for AI.
But focusing only on relational data is a narrow view of semantics. What about unstructured and semi-structured data? That's the kind of data large language models excel at, and what started all the RAG rage. If only there were a precedent for retrieving relevant search results across a ton of unstructured data 🤔.
Google has been retrieving relevant information across the entire internet for decades using structured data. By structured data, here, I mean machine-readable metadata, or as Google describes it, “a standardized format for providing information about a page and classifying the page content.” Librarians, information scientists, and SEO practitioners have been tackling the unstructured data retrieval problem through knowledge organization, information retrieval, structured metadata, and Semantic Web technologies. Their methods for describing, linking, and governing unstructured data underpin today's search and discovery systems, both publicly and in the enterprise. The future of the semantic layer will bridge the relational and the structured data worlds by combining the rigor of relational data management with the contextual richness of library sciences and knowledge graphs.
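Google's structured data typically takes the form of schema.org JSON-LD embedded in pages. A small example of the shape, built here as a Python dict (the field values are made up; the `@context`/`@type` keys follow the JSON-LD convention):

```python
# A schema.org-style JSON-LD description of a page: machine-readable
# metadata about content, the kind of thing search engines consume.
import json

page_metadata = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Q3 Earnings Summary",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2025-01-15",
    "about": ["earnings", "finance"],
}

jsonld = json.dumps(page_metadata, indent=2)
```

The same pattern of attaching typed, linkable metadata to unstructured content is what the enterprise semantic layer generalizes beyond web pages.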

The future of RAG
Here are my predictions for the future of RAG.
RAG will continue to evolve into more agentic patterns. This means that retrieval of context is just one part of a reasoning loop that also includes writing, compressing, and isolating context. Retrieval becomes an iterative process rather than one-shot. Anthropic's Model Context Protocol (MCP) treats retrieval as a tool that can be given to an agent through MCP. OpenAI offers File search as a tool that agents can call. LangChain's agent framework LangGraph lets you build agents using a node and edge pattern (like a graph). In their quickstart guide, you can see that retrieval (in this case a web search) is just one of the tools the agent can be given to do its job; they list retrieval as one of the actions an agent or workflow can take. Wikidata also has an MCP server that lets users interact directly with public data.
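The pattern of retrieval as one tool among many can be sketched with a toy tool registry and a naive router. The tool names, descriptions, stub implementations, and word-overlap routing below are all invented for illustration; MCP, OpenAI's tool calling, and LangGraph each provide real mechanisms for this:

```python
# Retrieval as one tool in an agent's toolbox: the agent routes a task to
# whichever registered tool best matches it, and vector retrieval (classic
# RAG) is just one entry in the registry.
from typing import Callable

def web_search(query: str) -> str:
    return f"[search results for: {query}]"   # stub

def sql_lookup(query: str) -> str:
    return f"[rows matching: {query}]"        # stub

def vector_retrieve(query: str) -> str:
    return f"[similar chunks for: {query}]"   # stub: classic RAG

TOOLS: dict[str, Callable[[str], str]] = {
    "search the web for current information": web_search,
    "query relational tables": sql_lookup,
    "retrieve similar documents": vector_retrieve,
}

def route(task: str) -> str:
    # Naive routing: pick the tool whose description shares the most
    # words with the task. Real agents use the LLM itself to choose.
    def overlap(desc: str) -> int:
        return len(set(desc.split()) & set(task.lower().split()))
    best = max(TOOLS, key=overlap)
    return TOOLS[best](task)

result = route("retrieve similar documents about refund policy")
```

The interesting shift is structural: the retrieval code is unchanged from classic RAG, but it no longer owns the loop; the agent decides when, and whether, to call it.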
Retrieval will broaden to encompass all kinds of data (aka multimodal retrieval): relational, content, and then images, audio, geodata, and video. LlamaIndex offers four ‘retrieval modes’: chunks, files_via_metadata, files_via_content, and auto_routed. They also offer composite retrieval, allowing you to retrieve from multiple sources at once. Snowflake offers Cortex Search for content and Cortex Analyst for relational data. LangChain offers retrievers over relational data, graph data (Neo4j), lexical indexes, and vectors.
Retrieval will expand to include metadata about tools themselves, as well as “memories”. Anthropic's MCP standardized how agents call tools using a registry of tools, i.e. tool metadata. OpenAI, LangChain, LlamaIndex, AWS Bedrock, Azure, Snowflake, and Databricks all have capabilities for managing tools, some through MCP directly, others through their own registries. On the memory side, both LlamaIndex and LangChain treat memories as retrievable data (short term and long term) that agents can query across workflows. Projects like Cognee push this further with dedicated, queryable agent memory.
Knowledge graphs will play a key role as a metadata layer between relational and unstructured data, replacing the narrow definition of semantic layer currently in use with a more robust metadata management framework. The market consolidation described above is, I believe, a signal of the market's growing recognition that knowledge graphs and metadata management are going to be critical as agents are asked to do more complicated tasks across enterprise data. Gartner's May 2025 report “Pivot Your Data Engineering Discipline to Efficiently Support AI Use Cases” recommends that data engineering teams adopt semantic techniques (such as ontologies and knowledge graphs) to support AI use cases. Knowledge graphs, metadata management, and reference data management are already ubiquitous in large life sciences and financial services companies, largely because they are highly regulated and require fact-based, grounded data to power their AI initiatives. Other industries are going to start adopting the tried and true methods of semantic technology as their use cases mature and require explainable answers.
Evaluation metrics for context retrieval will gain popularity. Ragas, Databricks Mosaic AI Agent Evaluation, and TruLens all provide frameworks for evaluating RAG. Evidently offers open source libraries and educational material on RAG evaluation. LangChain's evaluation product LangSmith has a module focused on RAG. What's important is that these frameworks are not just evaluating the accuracy of the answer given the prompt; they evaluate context relevance and groundedness (how well the response is supported by the context). Some vendors are building out metrics to evaluate provenance (citations and sourcing) of the retrieved context, coverage (did we retrieve enough?), and freshness or recency.
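Two of these metrics can be sketched with toy term-overlap scores. Frameworks like Ragas typically use LLM judges rather than overlap, so treat this purely as an illustration of what each metric measures, with invented example strings:

```python
# Toy retrieval-evaluation metrics: context relevance (does the retrieved
# context relate to the question?) and groundedness (is the answer
# supported by the context?).
import re

def _terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def context_relevance(question: str, context: str) -> float:
    # Fraction of question terms that appear in the retrieved context.
    q = _terms(question)
    return len(q & _terms(context)) / len(q) if q else 0.0

def groundedness(answer: str, context: str) -> float:
    # Fraction of answer terms supported by the context.
    a = _terms(answer)
    return len(a & _terms(context)) / len(a) if a else 0.0

context = "Refunds are issued within 30 days of purchase."
answer = "Refunds are issued within 30 days."
relevance = context_relevance("When are refunds issued?", context)
support = groundedness(answer, context)
```

The distinction matters operationally: low relevance points at a retrieval problem, while low groundedness points at the model hallucinating beyond its context.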
Policy-as-code guardrails will ensure retrieval respects access control, policies, regulations, and best practices. Snowflake and Databricks already enable row level access control and column masking. Policy engines like Open Policy Agent (OPA) and Oso are embedding access control into agentic workflows. As Dr. Sebastian Gehrmann of Bloomberg has found, “RAG is not necessarily safer,” and can introduce new governance risks. I expect the need for guardrails to grow to include more complicated governance rules (beyond access control), policy requirements, and best practices.
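The core idea of a retrieval-time guardrail fits in a few lines: filter by policy before ranking, so restricted content never reaches the prompt at all. The document structure and role model here are invented; engines like OPA express such rules declaratively rather than in application code:

```python
# Policy-as-code sketch: retrieved documents are filtered by the caller's
# role before they ever reach the prompt.
docs = [
    {"text": "Public FAQ: returns accepted within 30 days.",
     "allowed_roles": {"employee", "customer"}},
    {"text": "Internal memo: Q3 layoffs planned.",
     "allowed_roles": {"executive"}},
]

def retrieve_with_policy(query: str, role: str) -> list[str]:
    # Enforce the policy first, then retrieve only within the permitted set.
    permitted = [d for d in docs if role in d["allowed_roles"]]
    # (Relevance ranking of `permitted` omitted for brevity.)
    return [d["text"] for d in permitted]

visible = retrieve_with_policy("What is planned for Q3?", role="customer")
```

Filtering before retrieval, rather than redacting afterwards, is the safer design: a document the policy excludes can never leak into the context window, which is precisely the RAG governance risk Gehrmann warns about.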
Conclusion
RAG was never the end goal, just the starting point. As we move into the agentic era, retrieval is evolving into part of a fuller discipline: context engineering. Agents don't just need to find documents; they need to understand which data, tools, and memories are relevant for each step of their reasoning. This understanding requires a semantic layer: a way to understand, retrieve, and govern data across the entire enterprise. Knowledge graphs, ontologies, and semantic models will provide that connective tissue. The next generation of retrieval won't just be about speed and accuracy; it will also be about explainability and trust. The future of RAG is not retrieval alone, but retrieval that is context-aware, policy-aware, and semantically grounded.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for TopBraid EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.