Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost

by Admin · April 5, 2026 · Machine Learning
The recent launch of PageIndex is part of a broader shift in AI architecture toward "Vectorless RAG" or "Reasoning-Based Retrieval." Instead of the standard method of splitting documents into arbitrary chunks and searching via mathematical similarity, PageIndex builds a "Smart Table of Contents" (a hierarchical tree) that lets LLMs navigate documents the way a human expert would. Numerous blogs (including this one from Microsoft) outline the working principles (no vector database, no chunking, enhanced explainability) along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is best suited to deep-dive queries on complex structured or semi-structured documents (such as financial statements), rather than searching across many independent documents, such as customer support knowledge bases, where we should continue to use Vector RAG.

Why is that?

If Vectorless RAG using PageIndex provides better (or at least as good) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex's tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index that is a prerequisite ingestion step is slow and expensive to build with an LLM. Moreover, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response-synthesis step, again using the LLM.

In comparison, building a vector index is fast and cheap, and the retrieval step uses an LLM only once, during synthesis. Also, ingestion using an embedding model costs much less than summarization of the full document by an LLM.

What if you could get the excellent structure-aware reasoning accuracy of Vectorless RAG, together with the low latency and cost of Vector RAG, in a way that scales across an enterprise knowledge base? In this article, I will walk through a real use case on a large, complex document to build Proxy-Pointer RAG, an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we will explore and demonstrate the following:

  • Why exactly is PageIndex so accurate, and why is it difficult to scale the concept to multi-document knowledge bases in practice?
  • A quick comparison of Vectorless RAG using PageIndex vs. flat Vector RAG to establish a baseline.
  • How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
  • A comparison across a variety of queries using PageIndex and Proxy-Pointer to test the quality of retrievals.

Use Case Setup

We will use a World Bank report named South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). This is a 131-page report comprising several chapters, complex charts, tables, content in boxes, etc., and is a good candidate for PageIndex to demonstrate its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a markdown file using the Adobe PDF Extract API, but any other method, such as a VLM that preserves the integrity of the tables, charts, etc., would work just as well. For the vector database, FAISS is used.

How does PageIndex work?

Instead of the "chunk your document, embed the chunks, retrieve the top-K, feed them to an LLM" pipeline of Vector RAG, PageIndex takes a radically different approach to document retrieval. Rather than treating a document as a flat sequence of chunks, it builds a semantic skeleton tree, a hierarchical map of every section, sub-section, and content block in the document, and then uses an LLM to navigate that tree at query time.

Phase 1: Indexing (once per document)

PageIndex parses the document's heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:

  • A title (extracted from the heading)
  • A node ID (a unique identifier like 0012)
  • Line boundaries (start and end line in the source document)
  • A summary (generated by an LLM; this is the expensive and time-consuming part)

The result is a JSON that looks like this:

{
  "node_id": "0011",
  "title": "Chapter 1. Deceptive Strength",
  "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
  "line_num": 621,
  "nodes": [
    {
      "node_id": "0012",
      "title": "Introduction",
      "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
      "line_num": 625
    },
    ...
  ]
}

Phase 2: Retrieval (per query)

When a user asks a question, PageIndex hands the full tree of summaries to an LLM and asks, "Which nodes contain the answer?" This is unlike Vector RAG, which relies on mathematical similarity between query and chunk embeddings to build the relevant context.

The LLM reads the summaries, not the full text, and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the exact, contiguous, full section from the original markdown file and passes it to the synthesis LLM.
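This two-step flow is easy to picture in code. Below is a minimal sketch under stated assumptions: the tree has been flattened into node dicts carrying start_line/end_line boundaries (0-based, end-exclusive), and relevant_ids stands in for the node IDs returned by the LLM's tree walk.

```python
def slice_section(md_lines, node):
    # Return the full, contiguous section the node points to
    # (0-based, end-exclusive line boundaries assumed).
    return "\n".join(md_lines[node["start_line"]:node["end_line"]])

def build_context(md_lines, nodes, relevant_ids):
    # The LLM picked relevant_ids by reading summaries only;
    # we slice the exact sections from the original markdown.
    by_id = {n["node_id"]: n for n in nodes}
    return [slice_section(md_lines, by_id[i]) for i in relevant_ids]
```

The LLM never sees the full text during navigation; it reasons over summaries, and only the sliced sections reach the synthesis step.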

Why does this work so well?

PageIndex excels due to three architectural advantages:

1. Structural Navigation, Not Pattern Matching

When we ask "What are the main messages of Chapter 1?", PageIndex doesn't search for chunks containing those words. It reads the summary of node 0011 ("Chapter 1. Deceptive Strength"), which says "Covers growth outlook, inflation, financial vulnerabilities, climate risks, and policy challenges", and immediately knows this is the right node. It reasons about relevance, not semantic and lexical similarity.

2. Contiguous Context Extraction

Once the right nodes are identified, PageIndex extracts the full, unbroken section that the node represents from the original Markdown: headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.

3. Zero Chunk Boundary Artifacts

There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Every piece of context has a natural beginning (the section header) and a natural end (the next section's start). This dramatically reduces hallucination from ambiguous context.

However, this power comes at a cost, and this is where the approach starts to break down.

Why is this difficult to scale?

The reasons are cost and latency (at ingestion as well as at retrieval time). For our 131-page report, we get 137 structural nodes in the tree, so PageIndex needs 137 LLM calls just for the indexing phase: one summary per node. For a corpus of 50 documents, that is ~7,000 LLM calls before we have used it to answer a single question.

Here is a comparative cost of PageIndex vs. a vector index:

| Metric | PageIndex | Standard Vector RAG |
| --- | --- | --- |
| Indexing LLM calls | ~137 per document | 0 |
| Indexing time | 5-10 minutes per document | 30 seconds per document |
| Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
| Multi-document support | Requires per-document tree traversal during retrieval | Single unified index, fast and efficient |
| Retrieval quality | ★★★★★ | ★★★☆☆ |

For a single critical document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base on customer-service SOPs that updates weekly? It's a non-starter.

This is where PageIndex starts becoming impractical for real systems at scale.

Before proposing an alternative, let's establish a baseline: how much better is PageIndex compared to standard Vector RAG?

Comparison of Vectorless vs. Flat Vector RAG

To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat Vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:

Query about table content

Here is the response from Vectorless RAG:

=== VECTORLESS RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Nodes Retrieved: 0125, 0131
Response:
Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers used to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).

The table catalogs 25 specific studies, detailing the following information for each:
*   **Paper:** The author(s) and publication source.
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
*   **Sample coverage:** The geographical or organizational scope (e.g., International, Europe, South Asia, Africa, Asia).
*   **Region:** The specific region analyzed.
*   **Economies:** The specific countries included in the study sample.

According to the provided text, the articles listed in this table were selected based on specific criteria: they focus on the growth elasticity of employment using country-level estimates. The selection excludes articles that present only subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and the unemployment elasticity of growth (Annex 2.1).

And here is the same query using Vector RAG, which completely misses the content:

=== VECTOR RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Retrieved 5 chunks.
Response:
Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."

The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:

*   **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "a detailed literature search was conducted of studies on the elasticity of employment with respect to output growth."
*   **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates on the growth elasticity of total employment."
*   **Annex Table A2.3.3:** Provides specific regression data regarding the robustness tests for the female employment ratio.

There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.

Query requiring structural knowledge of the document

The query is "what are the questions answered by chapter 2?" Here is the response from Vectorless RAG. It precisely located Chapter 2 and the "Questions" section within it, and reproduced it verbatim, along with the grounding information for explainability.

=== VECTORLESS RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Nodes Retrieved: 0098
Response:
Based on the provided excerpts, Chapter 2 addresses the following questions:

*   **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?**
*   **What are the correlates of long-run employment ratios among EMDEs?**
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?**

These questions are explicitly listed in the section titled **"Questions."**

The chunks in Vector RAG do not preserve the structural hierarchy and section integrity, so while it locates the "Questions" section, it is not able to find the specific questions in it.

=== VECTOR RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Retrieved 5 chunks.
Response:
Based on the provided document, Chapter 2, titled "Jobless Development," is structured to address specific inquiries regarding labor markets and economic growth.

The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the specific text of these questions, it indicates that the chapter covers the following thematic areas and analytical components:

*   **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
*   **Employment Ratios:** A significant portion of the chapter is devoted to "Long-run employment ratios" (page 75) and their various "Correlates," including:
    *   **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
    *   **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
    *   **Human Capital and Firm Characteristics:** Education (page 83) and establishment size (page 78).
*   **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
*   **Policy Implications:** The research concludes by addressing "Policy options to boost employment" (page 85).

So the question becomes: can we retain these advantages without paying the cost?

Engineering a Better Retriever: Proxy-Pointer RAG

Let's build Proxy-Pointer RAG to answer that question. The core insight is intuitive:

You don't need LLM summaries to give a vector database structural awareness. You just need to encode the structure into the embeddings themselves.

The system uses the same structural tree as PageIndex, but without the expensive summarization flag set. Building this skeletal tree requires no LLM calls during indexing; the tree is built purely from regex-based heading detection, which runs in milliseconds.

Then, instead of asking an LLM to navigate the tree, we let FAISS do the retrieval, but we engineer the chunks so that FAISS "understands" where each chunk lives in the document's hierarchy.

Here is a view of the ingestion pipeline:

Ingestion Pipeline

Construct a Skeleton Tree

PageIndex's tree parser doesn't actually need an LLM to build the structural hierarchy. The heading detection is regex-based: it finds Markdown headers (#, ##, ###) and builds the nesting from heading levels. The LLM is only used to summarize each node.

We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries, but no summaries.

# Build the skeleton tree: no LLM, runs in milliseconds
pageindex = PageIndex(doc_path, enable_ai=False)
tree = pageindex.build_structure()  # Pure regex parsing

The skeleton tree and the summarized tree produced for the earlier Vectorless RAG have identical structures: same 137 nodes, same nesting depths, same line numbers, same titles. The only difference is the missing summary field.

Cost: $0. Time: < 1 second.
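PageIndex's actual parser is more elaborate, but the core regex pass can be sketched from scratch. Everything below is illustrative: it assumes '#'-style Markdown headings and emits nodes in the same shape as the JSON shown earlier, minus summaries.

```python
import re

# Match 1-6 '#' characters followed by a heading title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)")

def build_skeleton(md_lines):
    nodes, stack = [], []  # stack holds (level, node) pairs for nesting
    counter = 0
    for i, line in enumerate(md_lines):
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        node = {"node_id": f"{counter:04d}", "title": title,
                "start_line": i, "end_line": len(md_lines), "nodes": []}
        counter += 1
        # A new heading closes every open section at the same or deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()[1]["end_line"] = i
        (stack[-1][1]["nodes"] if stack else nodes).append(node)
        stack.append((level, node))
    return nodes
```

Each node's end_line is set when the next heading at the same or a higher level appears, so every section's boundaries are known without any LLM call.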

Structural Metadata Pointers (The Core Differentiator)

This is the heart of why PageIndex works so well, and the trick we will adopt.

In standard vector RAG, the retrieved chunk is the context. Whatever 500 words FAISS returns, that's what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response will miss the intent of the query entirely (as illustrated in the earlier Vectorless vs. Vector RAG comparison).

PageIndex does something fundamentally different: the chunk isn't the context. Each node in the tree knows its exact position in the original document: its title, its node ID, and crucially, the start and end line numbers of the entire section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.

We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from its tree node:

metadata = {
    "doc_id": "SADU",           # Which document
    "node_id": "0012",          # Which structural node
    "title": "Introduction",    # Section heading
    "start_line": 624,          # Where the section starts in the original file
    "end_line": 672             # Where the section ends
}

At retrieval time, we don't feed the matched chunks to the LLM. Instead, we:

  1. Use the chunks as proxies: they are only there to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get the unique top-k.
  2. Follow the metadata pointers: open the original Markdown and slice the nodes' lines, e.g., 624 to 672.
  3. Send the full sections: the LLM receives the complete, pristine, structurally intact text.

Here is a view of the retrieval pipeline:

Retrieval Pipeline

This means that even if a chunk only matched on a single sentence deep inside a section, the synthesis LLM gets the full section, with its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.

This is why I call it Proxy-Pointer RAG: the vectors are proxies for location, and the metadata are pointers to the real content.

Cost: $0. Impact: transforms context quality from fragmented chunks to complete document sections.

Breadcrumb Injection (Structural Context)

This is key to answering queries about a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:

"While private investment growth has slowed in both South Asia and other EMDEs..."

FAISS has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks for the "main messages of Chapter 1," this chunk won't rank highly: it doesn't contain the phrases "Chapter 1" or "main messages."

Breadcrumb injection prepends the full ancestry path from the Skeleton Tree to every chunk before embedding:

"[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
While private investment growth has slowed in both South Asia and other EMDEs..."

Now the embedding vector encodes both the content AND its structural location. When someone asks about "Chapter 1," FAISS knows which chunks belong to Chapter 1, because the words "Chapter 1. Deceptive Strength" are present in the embedding.

# Build the breadcrumb from the node's ancestry
current_crumb = f"{parent_breadcrumb} > {node_title}"

# Prepend it to the chunk text before embedding
enriched_text = f"[{current_crumb}]\n{section_text}"
chunks = text_splitter.split_text(enriched_text)

This is a zero-cost encoding of the tree structure into the vector space. We are using the same embeddings API, the same FAISS index, the same retrieval code. The only difference is what we feed into the embedder.

Cost: $0 extra. Impact: transforms retrieval quality for structural queries.

Structure-Guided Chunking (No Blind Sliding Windows)

Standard vector RAG applies a sliding window across the full document: a 2000-character window that moves forward with some overlap, completely oblivious to the document's structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a figure caption. The boundaries are arbitrary, and each chunk is an island, with no knowledge of its place in the overall document structure.

Proxy-Pointer does something fundamentally different: we walk the tree, not the text.

For each node in the skeleton tree, we extract only its own section text, from start_line to end_line, and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it's longer, the splitter divides it, but strictly within that section's boundaries.

Standard RAG:  Blind sliding window across the entire document
[====chunk1====][====chunk2====][====chunk3====]...
    ↑ might start in the Introduction, end in a figure caption

Proxy-Pointer: Chunk within each node's boundaries
Introduction (lines 624-672)       → [chunk A] [chunk B]
Economic Activity (lines 672-676)  → [chunk C]
BOX 1.1 (lines 746-749)            → skipped (< 100 chars)
Inflation (lines 938-941)          → [chunk D]

This ensures three things:

  1. Chunks never cross section boundaries: a chunk from the Introduction will never overlap with Economic Activity.
  2. Each chunk belongs to exactly one node, so the node_id metadata is always precise.
  3. Breadcrumbs are accurate per chunk: they reflect the exact structural container, not a guess.

Importantly, when a node is skipped (because its text is too short, e.g., a "BOX 1.1" heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like "Introduction," "Features," and "Figures," all of which get embedded with the parent's title in their breadcrumb (e.g., BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 > Features of...). No content is ever lost; only empty structural headers are excluded.

Cost: $0. Impact: every chunk is structurally traceable, enabling precise metadata pointers.
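Combining structure-guided chunking with breadcrumb injection, the tree walk can be sketched as below. This is a simplification under assumptions: a node's own text is taken to run up to its first child, min_chars plays the role of the 100-character skip threshold, and a plain fixed-size splitter stands in for the real text splitter.

```python
def walk_and_chunk(node, md_lines, parent_crumb="", chunk_size=2000, min_chars=100):
    crumb = f"{parent_crumb} > {node['title']}" if parent_crumb else node["title"]
    # A node's own text runs from its start up to its first child (if any),
    # so chunks never cross section boundaries or duplicate child content.
    children = node.get("nodes", [])
    own_end = children[0]["start_line"] if children else node["end_line"]
    section = "\n".join(md_lines[node["start_line"]:own_end])
    chunks = []
    if len(section) >= min_chars:  # skip empty structural headers
        enriched = f"[{crumb}]\n{section}"  # breadcrumb injection
        chunks = [{"text": enriched[i:i + chunk_size], "node_id": node["node_id"]}
                  for i in range(0, len(enriched), chunk_size)]
    # Even when a node is skipped, recurse: the content lives in its
    # children, which inherit the parent's title through the breadcrumb.
    for child in children:
        chunks.extend(walk_and_chunk(child, md_lines, crumb, chunk_size, min_chars))
    return chunks
```

Because splitting happens per section, every emitted chunk maps to exactly one node_id, which is what makes the metadata pointers of the previous section reliable.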

Noise Filtering (Removing Distractors)

Some sections of any document are irrelevant for retrieval and contaminate the context: they contain keywords that match almost every query but provide no useful content.

  • Table of Contents: mentions every chapter title (matches "Chapter 1," "Chapter 2," everything)
  • Executive Summary: paraphrases every key finding (matches every topic query)
  • Abbreviations: lists every acronym used in the document
  • Acknowledgments: mentions organizations, countries, and themes

These sections act as distractors in vector space. They sit near the centroid of the document's embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.

We apply a simple title-based filter during indexing:

NOISE_TITLES = {
    "contents", "summary of contents", "table of contents",
    "abbreviations", "acknowledgments", "foreword",
    "executive summary", "references"
}

if title.strip().lower() in NOISE_TITLES:
    continue  # Skip this node entirely

This filter removed 7 nodes from our test document. The impact was immediate: for the query "What are the main messages of Chapter 1?", the system stopped retrieving the Executive Summary and Table of Contents and started retrieving the actual Introduction section.

Cost: $0. Impact: eliminates the most common retrieval failure mode.

That's it. Proxy-Pointer RAG is now ready to test.

Testing the Proxy-Pointer Pipeline

While a detailed comparison follows in the next section, here are the results of the same two queries cited in the "Vectorless vs. Flat Vector RAG" section. As you can see, the answers are as accurate and detailed as Vectorless RAG using PageIndex.

Query about table content

=== PROXY-POINTER RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Unique Context Nodes Extracted:
  - [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
  - [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal probability of studies documenting above-average effect for poor-average households' | Lines: 2537-2745
  - [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the likelihood of private investment accelerations starting' | Lines: 1663-1899

Response:
Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**

This table provides a summary of various research studies, detailing the following information for each:
*   **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
*   **Sample coverage:** The scale or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
*   **Region:** The geographic scope of the research (e.g., International, South Asia, Africa, Europe, Middle East, Asia).
*   **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...", "G20 countries").

The table documents a diverse set of research spanning from 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and specific regions such as South Asia and Africa.

Query requiring structural knowledge

=== PROXY-POINTER RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Unique Context Nodes Extracted:
  - [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
  - [SADU] Node 0098: 'Questions' | Lines: 3115-3127
  - [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812

Response:
Based on the provided text, Chapter 2 addresses the following questions regarding South Asia's economic growth and labor market:

*   **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
*   **What are the correlates of long-run employment ratios among EMDEs?** (Source: *Chapter 2, Questions section*)
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)

Additionally, the chapter explores these issues with the goal of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.

We have demonstrated the cost advantage so far, but does the architecture overcome the scalability challenge of PageIndex?

Is Proxy-Pointer Scalable?

Here is why the architecture is scalable across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, and an additional LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.

  • No LLM at indexing. The skeleton tree is regex-built in milliseconds. The only API calls are to the embedding model, just like standard vector RAG.
  • No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.

Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers), not in an LLM reasoning loop. It inherits all of vector RAG's scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.

Fail-safe for unstructured documents: if a document has no headings, or the skeleton tree produces only a single root node, the system detects this during chunking and falls back to a standard sliding window. Chunks are flagged with an empty node_id and line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG: no errors, no special handling required.
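That fallback can be sketched as follows; sliding_window_fallback is a hypothetical helper name, and the 2000-character window with overlap mirrors the standard settings described earlier.

```python
def sliding_window_fallback(md_text, window=2000, overlap=200):
    # No usable structure: emit plain windows flagged with empty pointers,
    # so retrieval knows to use the chunk text directly as context.
    step = window - overlap
    return [{"text": md_text[i:i + window],
             "node_id": "", "start_line": None, "end_line": None}
            for i in range(0, len(md_text), step)]
```

The empty node_id is the signal that distinguishes fallback chunks from structurally traceable ones at retrieval time.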

Let's compare Vectorless RAG and Proxy-Pointer head-to-head.

Vectorless vs. Proxy-Pointer RAG

I ran a variety of queries (broad structural, cross-reference, specific factual, figure-specific, etc.) and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer, along with the full Quality Comparison report, here.

The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries, all at the scalability and cost of flat Vector RAG.

Here is the summary verdict:

| # | Query Type | Winner |
| --- | --- | --- |
| 1 | Broad structural (Ch. 1 messages) | 🔴 PageIndex |
| 2 | Broad structural (Ch. 2 messages) | 🔴 PageIndex (narrow) |
| 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
| 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
| 5 | Comparative (India vs. region) | 🟢 Proxy-Pointer |
| 6 | Figure-specific (B1.1.1 trends) | 🟢 Proxy-Pointer |
| 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
| 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
| 9 | Navigational (Ch. 2 questions) | 🟡 Tie |
| 10 | Inferential/policy (govt vs. shocks) | 🟢 Proxy-Pointer |

And here is the cost comparison:

| Metric | PageIndex | Proxy-Pointer | Standard Vector RAG |
| --- | --- | --- | --- |
| Indexing LLM calls | ~137 per doc | 0 | 0 |
| Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
| Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs. PageIndex) | ★★★☆☆ |
| Multi-doc scalability | Poor (per-doc tree navigation) | Excellent (unified vector index) | Excellent |
| Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
| Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
| Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |

Key Takeaways

  1. Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn't about better embeddings; it's about preserving hierarchy.
  2. You don't need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no cost.
  3. Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
  4. Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesizer LLM sees.

Conclusion

Proxy-Pointer RAG demonstrates a simple thesis: you don't need an expensive LLM to make a retriever structurally aware; you just need to be clever about what you embed.

Five zero-cost engineering techniques (skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering) close the quality gap with a full LLM-navigated system, while keeping the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of standard vector RAG.
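The "pointer" half of the name can be sketched as follows. This is a hypothetical illustration under my own assumptions (the store layout, field names, and IDs are all invented): each indexed chunk carries a pointer (doc ID plus section ID) back to its full section, and at retrieval time the pointers are dereferenced so the synthesizer LLM receives complete, deduplicated sections rather than fragments.

```python
# Full-section store, keyed by (doc_id, section_id); IDs are illustrative.
sections = {
    ("sadu-2024", "2.3"): "Full text of Section 2.3: Inflation Outlook ...",
}

# Vector-index entries: the chunk text that was embedded, plus pointer metadata.
index_hits = [
    {"chunk": "Inflation eased across the region.",
     "doc_id": "sadu-2024", "section_id": "2.3"},
    {"chunk": "Food prices drove the decline.",
     "doc_id": "sadu-2024", "section_id": "2.3"},  # same section, dedup below
]

def retrieve_context(hits: list[dict]) -> list[str]:
    """Dereference each hit's pointer and return deduplicated full sections."""
    seen, context = set(), []
    for hit in hits:
        key = (hit["doc_id"], hit["section_id"])
        if key not in seen:
            seen.add(key)
            context.append(sections[key])
    return context

print(retrieve_context(index_hits))
```

Two hits from the same section collapse into one full-section context block, which is what keeps the synthesis prompt both complete and compact.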

The next time you're building RAG for your structured (or unstructured) document repository, don't reach for a bigger model. Reach for a Proxy-Pointer index.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience. License: CC BY 3.0 IGO.

Images used in this article were generated using Google Gemini. Code created by me.
