Neighboring chunks may still be completely different.
Language models come with a context limit. For newer OpenAI models, this is around 128k tokens, roughly 80k English words. That may sound large enough for most use cases. However, large production-grade applications often need to refer to more than 80k words, not to mention images, tables, and other unstructured data.
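To make the limit concrete, here is a minimal sketch of counting tokens before sending a prompt. It assumes the tiktoken package; the "cl100k_base" encoding and the 128k limit are illustrative values, not tied to any specific model in the article.

```python
# Minimal sketch: check whether a prompt fits a model's context window.
# Assumes tiktoken is installed; the encoding and limit are illustrative.
import tiktoken

CONTEXT_LIMIT = 128_000  # tokens, approximate limit for newer OpenAI models

def fits_in_context(text: str) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens} tokens")
    return n_tokens <= CONTEXT_LIMIT

fits_in_context("Your very long document goes here...")
```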
Even if we pack everything into the context window, padding it with irrelevant information makes LLM performance drop significantly.
This is where RAG helps. RAG retrieves the relevant information from an embedded source and passes it as context to the LLM. To retrieve that 'relevant information,' we must first have divided the documents into chunks. Thus, chunking plays a vital role in a RAG pipeline.
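Below is a minimal sketch of that retrieve-then-generate flow, assuming the documents are already chunked, numpy is available, and the OpenAI Python client (v1+) is used. The model names, example chunks, and `top_k` value are illustrative assumptions, not recommendations from the article.

```python
# Minimal RAG sketch: embed chunks, retrieve the closest ones, pass as context.
# Model names, chunks, and top_k are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Embed the document chunks once; this is the "embedded source".
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Returns require the original receipt.",
]
chunk_vectors = embed(chunks)

# 2. At query time, embed the question and retrieve the most similar chunks.
def retrieve(question: str, top_k: int = 2) -> list[str]:
    q = embed([question])[0]
    scores = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

# 3. Pass the retrieved chunks to the LLM as context.
def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is the refund policy?"))
```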
Chunking lets RAG retrieve specific pieces of a large document. However, small changes in the chunking strategy can significantly impact the responses the LLM produces.
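The following sketch illustrates how much the chunking parameters alone change what the retriever sees. The word-based splitter and the sample document are assumptions for illustration; real pipelines typically use recursive or semantic splitters.

```python
# Minimal sketch: the same document chunked two ways yields different
# retrieval units. The word-based splitter here is purely illustrative.
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = (
    "Refunds are issued within 14 days. "
    "Shipping takes 3 to 5 business days. "
    "Returns require a receipt."
)

print(chunk_words(doc, chunk_size=8, overlap=0))  # hard cuts across sentences
print(chunk_words(doc, chunk_size=8, overlap=4))  # overlapping chunks keep more context
```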