In this article we'll see why models with 128K-token (and larger) context windows can't fully replace RAG.
We'll start with a brief reminder of the problems RAG solves, before looking at the improvements in LLMs and their impact on the need for RAG.
RAG isn't really new
The idea of injecting context to give a language model access to up-to-date data is quite "old" (at LLM scale). It was first introduced by Facebook AI/Meta researchers in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". By comparison, the first version of ChatGPT was only released in November 2022.
In this paper they distinguish two kinds of memory:
- parametric memory, which is inherent to the LLM: what it learned while being fed lots and lots of text during training,
- non-parametric memory, which is the memory you can provide by feeding context into the prompt.
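The non-parametric side can be sketched in a few lines: retrieve relevant documents, then inject them into the prompt. This is a minimal illustration only; the retriever here is a naive keyword-overlap stand-in (the original RAG paper uses a dense neural retriever), and the document store and function names are hypothetical.

```python
# Minimal sketch of non-parametric memory: retrieved documents are
# injected into the prompt so the model can use up-to-date knowledge
# it never saw during training.

# Hypothetical document store (in practice, a vector database).
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "RAG was introduced by Facebook AI/Meta researchers in 2020.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query
    (a toy stand-in for a real dense retriever)."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context (non-parametric memory) to the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("When was RAG introduced?"))
```

The resulting string would then be sent to the LLM, whose parametric memory handles the reasoning while the injected context supplies the facts.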