Self-Hosting RAG Applications on Edge Devices with Langchain

By Admin | August 28, 2024


Introduction 

In the second part of our series on building a RAG application on a Raspberry Pi, we'll expand on the foundation we laid in the first part, where we created and tested the core pipeline. Now we're going to take things a step further by building a FastAPI application to serve our RAG pipeline and creating a Reflex app to give users a simple, interactive way to access it. This part will guide you through setting up the FastAPI back-end, designing the front-end with Reflex, and getting everything up and running on your Raspberry Pi. By the end, you'll have a complete, working application that's ready for real-world use.

Learning Objectives

  • Set up a FastAPI back-end that integrates with the existing RAG pipeline and processes queries efficiently.
  • Design a user-friendly interface using Reflex to interact with the FastAPI back-end and the RAG pipeline.
  • Create and test API endpoints for querying and document ingestion, ensuring smooth operation with FastAPI.
  • Deploy and test the complete application on a Raspberry Pi, ensuring both back-end and front-end components function seamlessly.
  • Understand the integration between FastAPI and Reflex for a cohesive RAG application experience.
  • Implement and troubleshoot FastAPI and Reflex components to provide a fully operational RAG application on a Raspberry Pi.

If you missed the previous edition, be sure to check it out here: Self-Hosting RAG Applications on Edge Devices with Langchain and Ollama – Part I.

This article was published as a part of the Data Science Blogathon.

Creating the Python Environment

Before we start creating the application, we need to set up the environment. Create an environment and install the dependencies below:

deeplake 
boto3==1.34.144 
botocore==1.34.144 
fastapi==0.110.3 
gunicorn==22.0.0 
httpx==0.27.0 
huggingface-hub==0.23.4 
langchain==0.2.6 
langchain-community==0.2.6 
langchain-core==0.2.11 
langchain-experimental==0.0.62 
langchain-text-splitters==0.2.2 
langsmith==0.1.83 
marshmallow==3.21.3 
numpy==1.26.4 
pandas==2.2.2 
pydantic==2.8.2 
pydantic_core==2.20.1 
PyMuPDF==1.24.7 
PyMuPDFb==1.24.6 
python-dotenv==1.0.1 
pytz==2024.1 
PyYAML==6.0.1 
reflex==0.5.6 
requests==2.32.3
reflex-hosting-cli==0.1.13

Once the required packages are installed, we need to have the required models present on the device. We will do this using Ollama. Follow the steps from Part 1 of this article to download both the language and embedding models. Finally, create two directories, one each for the back-end and front-end applications.
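
For reference, a minimal setup on the device might look like the following. This is only a sketch: the virtual environment name and the two directory names are assumptions, while the model tags match the ones used later in config.py (phi3 and nomic-embed-text).

python -m venv rag_env
source rag_env/bin/activate
pip install -r requirements.txt

# Pull the language and embedding models used by the app (same as in Part 1)
ollama pull phi3
ollama pull nomic-embed-text

# Create the directories for the two applications
mkdir backend frontend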

Once the models are pulled using Ollama, we are ready to build the final application.

Developing the Back-End with FastAPI

In Part 1 of this article, we built the RAG pipeline with both the Ingestion and QnA modules. We tested both pipelines using some documents, and they worked perfectly. Now we need to wrap the pipeline with FastAPI to create a consumable API. This will help us integrate it with any front-end application like Streamlit, Chainlit, Gradio, Reflex, React, Angular, etc. Let's start by building a structure for the application. Following this structure is completely optional, but make sure to check the dependency imports if you follow a different structure to create the app.

Below is the tree structure we will follow:

backend
├── app.py
├── requirements.txt
└── src
    ├── config.py
    ├── doc_loader
    │   ├── base_loader.py
    │   ├── __init__.py
    │   └── pdf_loader.py
    ├── ingestion.py
    ├── __init__.py
    └── qna.py

Let's start with config.py. This file will contain all the configurable options for the application, like the Ollama URL, the LLM name, and the embeddings model name. Below is an example:

LANGUAGE_MODEL_NAME = "phi3"
EMBEDDINGS_MODEL_NAME = "nomic-embed-text"
OLLAMA_URL = "http://localhost:11434"

The base_loader.py file contains the parent document loader class that will be inherited by the child document loaders. In this application we are only working with PDF files, so a child PDFLoader class will be created that inherits the BaseLoader class.

Below are the contents of base_loader.py and pdf_loader.py:

# base_loader.py
from abc import ABC, abstractmethod

class BaseLoader(ABC):
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    @abstractmethod
    async def load_document(self):
        pass


# pdf_loader.py
import os

from .base_loader import BaseLoader
from langchain.schema import Document
from langchain.document_loaders.pdf import PyMuPDFLoader
from langchain.text_splitter import CharacterTextSplitter


class PDFLoader(BaseLoader):
    def __init__(self, file_path: str) -> None:
        super().__init__(file_path)

    async def load_document(self):
        self.file_name = os.path.basename(self.file_path)
        loader = PyMuPDFLoader(file_path=self.file_path)

        text_splitter = CharacterTextSplitter(
            separator="n",
            chunk_size=1000,
            chunk_overlap=200,
        )
        pages = await loader.aload()
        total_pages = len(pages)
        chunks = []
        for idx, page in enumerate(pages):
            chunks.append(
                Document(
                    page_content=page.page_content,
                    metadata=dict(
                        {
                            "file_name": self.file_name,
                            "page_no": str(idx + 1),
                            "total_pages": str(total_pages),
                        }
                    ),
                )
            )

        final_chunks = text_splitter.split_documents(chunks)
        return final_chunks

We discussed the working of pdf_loader in Part 1 of the article.

Next, let's build the Ingestion class. This is the same as the one we built in Part 1 of this article.

Code for Ingestion Class

import os
import config as cfg

from pinecone import Pinecone
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from .doc_loader import PDFLoader

class Ingestion:
    """Doc Ingestion pipeline."""
    def __init__(self):
        attempt:
            self.embeddings = OllamaEmbeddings(
                mannequin=cfg.EMBEDDINGS_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                show_progress=True,
            )
            self.vector_store = DeepLake(
                dataset_path="knowledge/text_vectorstore",
                embedding=self.embeddings,
                num_workers=4,
                verbose=False,
            )
        besides Exception as e:
            increase RuntimeError(f"Did not initialize Ingestion system. ERROR: {e}")

    async def create_and_add_embeddings(
        self,
        file: str,
    ):
        attempt:
            loader = PDFLoader(
                file_path=file,
            )

            chunks = await loader.load_document()
            dimension = await self.vector_store.aadd_documents(paperwork=chunks)
            return len(dimension)
        besides (ValueError, RuntimeError, KeyError, TypeError) as e:
            increase Exception(f"ERROR: {e}")

Now that we have set up the Ingestion class, we'll go ahead and create the QnA class. This too is the same as the one we created in Part 1 of this article.

Code for QnA Class

import os
import config as cfg

from pinecone import Pinecone
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain_community.llms.ollama import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from .doc_loader import PDFLoader

class QnA:
    """Document QnA pipeline."""
    def __init__(self):
        try:
            self.embeddings = OllamaEmbeddings(
                model=cfg.EMBEDDINGS_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                show_progress=True,
            )
            self.model = Ollama(
                model=cfg.LANGUAGE_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                verbose=True,
                temperature=0.2,
            )
            self.vector_store = DeepLake(
                dataset_path="data/text_vectorstore",
                embedding=self.embeddings,
                num_workers=4,
                verbose=False,
            )
            self.retriever = self.vector_store.as_retriever(
                search_type="similarity",
                search_kwargs={
                    "k": 10,
                },
            )
        except Exception as e:
            raise RuntimeError(f"Failed to initialize QnA system. ERROR: {e}")

    def create_rag_chain(self):
        try:
            system_prompt = """\n\nContext: {context}"""
            prompt = ChatPromptTemplate.from_messages(
                [
                    ("system", system_prompt),
                    ("human", "{input}"),
                ]
            )
            question_answer_chain = create_stuff_documents_chain(self.model, prompt)
            rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)

            return rag_chain
        except Exception as e:
            raise RuntimeError(f"Failed to create retrieval chain. ERROR: {e}")

With this, we have finished creating the core functionality of the RAG app. Now let's wrap the app with FastAPI.

Code for the FastAPI Application

import sys
import os
import uvicorn

from src import QnA, Ingestion
from fastapi import FastAPI, Request, File, UploadFile
from fastapi.responses import StreamingResponse

app = FastAPI()

ingestion = Ingestion()
chatbot = QnA()
rag_chain = chatbot.create_rag_chain()


@app.get("https://www.analyticsvidhya.com/")
def hi there():
    return {"message": "API Working in server 8089"}


@app.put up("/question")
async def ask_query(request: Request):
    knowledge = await request.json()
    query = knowledge.get("query")

    async def event_generator():
        for chunk in rag_chain.choose("reply").stream({"enter": query}):
            yield chunk

    return StreamingResponse(event_generator(), media_type="textual content/plain")


@app.put up("/ingest")
async def ingest_document(file: UploadFile = File(...)):
    attempt:
        os.makedirs("information", exist_ok=True)
        file_location = f"information/{file.filename}"
        with open(file_location, "wb+") as file_object:
            file_object.write(file.file.learn())

        dimension = await ingestion.create_and_add_embeddings(file=file_location)
        return {"message": f"File ingested! Doc rely: {dimension}"}
    besides Exception as e:
        return {"message": f"An error occured: {e}"}


if __name__ == "__main__":
    attempt:
        uvicorn.run(app, host="0.0.0.0", port=8089)
    besides KeyboardInterrupt as e:
        print("App stopped!")

Let's break down the app by each endpoint:

  • First we initialize the FastAPI app and the Ingestion and QnA objects. We then create a RAG chain using the create_rag_chain method of the QnA class.
  • Our first endpoint is a simple GET method. This will help us know whether the app is healthy or not. Think of it like a 'Hello World' endpoint.
  • The second is the query endpoint. This is a POST method that is used to run the chain. It takes in a request parameter, from which we extract the user's question. Then we create an asynchronous method that acts as a wrapper around the chain.stream function call. We need to do this to let FastAPI handle the LLM's streamed output and get a ChatGPT-like experience in the chat interface. We then wrap the asynchronous method with the StreamingResponse class and return it.
  • The third endpoint is the ingestion endpoint. It is also a POST method that takes the entire file as bytes as input. We store this file in the local directory and then ingest it using the create_and_add_embeddings method of the Ingestion class.

Finally, we run the app using the uvicorn package, specifying the host and port. To test the app, simply run the application using the following command:

python app.py

Use an API testing tool like Postman, Insomnia, or Bruno for testing the application. You can also use the Thunder Client extension to do the same.
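
If you prefer the command line, both endpoints can also be exercised with curl; the file name and question below are placeholders, and the host and port match the uvicorn settings above:

# Ingest a document
curl -X POST -F "file=@manual.pdf" http://localhost:8089/ingest

# Ask a question and stream the answer as plain text
curl -N -X POST http://localhost:8089/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'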

Testing the Ingestion endpoint:


Testing the query endpoint:


Designing the Front-End with Reflex

We've successfully created a FastAPI app for the backend of our RAG application. It's time to build our front-end. You can choose any front-end library for this, but for this particular article we'll build the front-end using Reflex. Reflex is a Python-only front-end library, created to build web applications purely using Python. It provides us with templates for common applications like a calculator, image generation, and a chatbot. We will use the chatbot application template as the starting point for our user interface. Our final app will have the following structure, so let's keep it here for reference.

Frontend Directory

We will have a frontend directory for this:

frontend
├── assets
│   └── favicon.ico
├── docs
│   └── demo.gif
├── chat
│   ├── components
│   │   ├── chat.py
│   │   ├── file_upload.py
│   │   ├── __init__.py
│   │   ├── loading_icon.py
│   │   ├── modal.py
│   │   └── navbar.py
│   ├── __init__.py
│   ├── chat.py
│   └── state.py
├── requirements.txt
├── rxconfig.py
└── uploaded_files

Steps for the Final App

Follow these steps to prepare the groundwork for the final app.

Step 1: Clone the chat template repository into the frontend directory

git clone https://github.com/reflex-dev/reflex-chat.git .

Step 2: Run the following command to initialize the directory as a Reflex app

reflex init

This will set up the Reflex app, and it will be ready to run and develop.
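
Among other things, reflex init generates an rxconfig.py in the frontend directory. Below is a rough sketch of what it usually contains; the app_name shown here is an assumption chosen to match the reflex_demo imports used in the components later, so keep whatever name your init step produced:

# rxconfig.py
import reflex as rx

config = rx.Config(
    app_name="reflex_demo",
)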

Step 3: To test the app, use the following command from inside the frontend directory

reflex run

Let's start modifying the components. First, let's modify the chat.py file.

Below is the code for the same:

import reflex as rx
from reflex_demo.components import loading_icon
from reflex_demo.state import QA, State

message_style = dict(
    display="inline-block",
    padding="0 10px",
    border_radius="8px",
    max_width=["30em", "30em", "50em", "50em", "50em", "50em"],
)


def message(qa: QA) -> rx.Component:
    """A single question/answer message.

    Args:
        qa: The question/answer pair.

    Returns:
        A component displaying the question/answer pair.
    """
    return rx.box(
        rx.box(
            rx.markdown(
                qa.question,
                background_color=rx.color("mauve", 4),
                color=rx.color("mauve", 12),
                **message_style,
            ),
            text_align="right",
            margin_top="1em",
        ),
        rx.box(
            rx.markdown(
                qa.answer,
                background_color=rx.color("accent", 4),
                color=rx.color("accent", 12),
                **message_style,
            ),
            text_align="left",
            padding_top="1em",
        ),
        width="100%",
    )


def chat() -> rx.Component:
    """List all the messages in a single conversation."""
    return rx.vstack(
        rx.box(rx.foreach(State.chats[State.current_chat], message), width="100%"),
        py="8",
        flex="1",
        width="100%",
        max_width="50em",
        padding_x="4px",
        align_self="center",
        overflow="hidden",
        padding_bottom="5em",
    )


def action_bar() -> rx.Component:
    """The action bar to send a new message."""
    return rx.center(
        rx.vstack(
            rx.chakra.form(
                rx.chakra.form_control(
                    rx.hstack(
                        rx.input(
                            rx.input.slot(
                                rx.tooltip(
                                    rx.icon("info", size=18),
                                    content="Enter a question to get a response.",
                                )
                            ),
                            placeholder="Type something...",
                            id="question",
                            width=["15em", "20em", "45em", "50em", "50em", "50em"],
                        ),
                        rx.button(
                            rx.cond(
                                State.processing,
                                loading_icon(height="1em"),
                                rx.text("Send", font_family="Ubuntu"),
                            ),
                            type="submit",
                        ),
                        align_items="center",
                    ),
                    is_disabled=State.processing,
                ),
                on_submit=State.process_question,
                reset_on_submit=True,
            ),
            rx.text(
                "ReflexGPT may return factually incorrect or misleading responses. Use discretion.",
                text_align="center",
                font_size=".75em",
                color=rx.color("mauve", 10),
                font_family="Ubuntu",
            ),
            rx.logo(margin_top="-1em", margin_bottom="-1em"),
            align_items="center",
        ),
        position="sticky",
        bottom="0",
        left="0",
        padding_y="16px",
        backdrop_filter="auto",
        backdrop_blur="lg",
        border_top=f"1px solid {rx.color('mauve', 3)}",
        background_color=rx.color("mauve", 2),
        align_items="stretch",
        width="100%",
    )

The changes are minimal compared to the version present natively in the template.

Next, we'll edit the chat.py app. This is the main chat component.

Code for the Main Chat Component

Below is the code for it:

import reflex as rx
from reflex_demo.components import chat, navbar, upload_form
from reflex_demo.state import State


@rx.page(route="/chat", title="RAG Chatbot")
def chat_interface() -> rx.Component:
    return rx.chakra.vstack(
        navbar(),
        chat.chat(),
        chat.action_bar(),
        background_color=rx.color("mauve", 1),
        color=rx.color("mauve", 12),
        min_height="100vh",
        align_items="stretch",
        spacing="0",
    )


@rx.page(route="/", title="RAG Chatbot")
def index() -> rx.Component:
    return rx.chakra.vstack(
        navbar(),
        upload_form(),
        background_color=rx.color("mauve", 1),
        color=rx.color("mauve", 12),
        min_height="100vh",
        align_items="stretch",
        spacing="0",
    )


# Add state and page to the app.
app = rx.App(
    theme=rx.theme(
        appearance="dark",
        accent_color="jade",
    ),
    stylesheets=["https://fonts.googleapis.com/css2?family=Ubuntu&display=swap"],
    style={
        "font_family": "Ubuntu",
    },
)
app.add_page(index)
app.add_page(chat_interface)

This is the code for the chat interface. We've only added the font family to the app config; the rest of the code is the same.

Next, let's edit the state.py file. This is where the frontend makes calls to the API endpoints for a response.

Modifying state.py File

import requests
import reflex as rx


class QA(rx.Base):
    question: str
    answer: str


DEFAULT_CHATS = {
    "Intros": [],
}


class State(rx.State):
    chats: dict[str, list[QA]] = DEFAULT_CHATS
    current_chat = "Intros"
    url: str = "http://localhost:8089/query"
    question: str
    processing: bool = False
    new_chat_name: str = ""

    def create_chat(self):
        """Create a new chat."""
        # Add the new chat to the list of chats.
        self.current_chat = self.new_chat_name
        self.chats[self.new_chat_name] = []

    def delete_chat(self):
        """Delete the current chat."""
        del self.chats[self.current_chat]
        if len(self.chats) == 0:
            self.chats = DEFAULT_CHATS
        self.current_chat = list(self.chats.keys())[0]

    def set_chat(self, chat_name: str):
        """Set the name of the current chat.

        Args:
            chat_name: The name of the chat.
        """
        self.current_chat = chat_name

    @rx.var
    def chat_titles(self) -> list[str]:
        """Get the list of chat titles.

        Returns:
            The list of chat names.
        """
        return list(self.chats.keys())

    async def process_question(self, form_data: dict[str, str]):
        # Get the question from the form
        question = form_data["question"]

        # Check if the question is empty
        if question == "":
            return

        model = self.openai_process_question

        async for value in model(question):
            yield value

    async def openai_process_question(self, question: str):
        """Get the response from the API.

        Args:
            question: The current question.
        """
        # Add the question to the list of questions.
        qa = QA(question=question, answer="")
        self.chats[self.current_chat].append(qa)
        payload = {"question": question}

        # Clear the input and start the processing.
        self.processing = True
        yield

        response = requests.post(self.url, json=payload, stream=True)

        # Stream the results, yielding after every chunk.
        for answer_text in response.iter_content(chunk_size=512):
            # Ensure answer_text is not None before concatenation
            answer_text = answer_text.decode()
            if answer_text is not None:
                self.chats[self.current_chat][-1].answer += answer_text
            else:
                answer_text = ""
                self.chats[self.current_chat][-1].answer += answer_text
            self.chats = self.chats
            yield

        # Toggle the processing flag.
        self.processing = False

In this file, we have defined the URL for the query endpoint. We have also modified the openai_process_question method to send a POST request to the query endpoint and consume the streaming response, which is then displayed in the chat interface.
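
Outside of Reflex, you can sanity-check the same streaming behavior with a small standalone script. The snippet below is just a sketch using the requests library, with the URL and the sample question hard-coded as assumptions:

# stream_client.py - minimal sketch of consuming the /query streaming endpoint
import requests

url = "http://localhost:8089/query"
payload = {"question": "What is this document about?"}

with requests.post(url, json=payload, stream=True) as response:
    # Print each streamed chunk as it arrives, without buffering the full answer
    for chunk in response.iter_content(chunk_size=512):
        if chunk:
            print(chunk.decode(), end="", flush=True)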

Writing Contents of the file_upload.py File

Finally, let's write the contents of the file_upload.py file. This component is displayed first and allows us to upload a file for ingestion.

import reflex as rx
import os
import time

import requests


class UploadExample(rx.State):
    uploading: bool = False
    ingesting: bool = False
    progress: int = 0
    total_bytes: int = 0
    ingestion_url = "http://127.0.0.1:8089/ingest"

    async def handle_upload(self, files: list[rx.UploadFile]):
        self.ingesting = True
        yield
        for file in files:
            file_bytes = await file.read()
            file_name = file.filename
            files = {
                "file": (os.path.basename(file_name), file_bytes, "multipart/form-data")
            }
            response = requests.post(self.ingestion_url, files=files)
            self.ingesting = False
            yield
            if response.status_code == 200:
                # yield rx.redirect("/chat")
                self.show_redirect_popup()

    def handle_upload_progress(self, progress: dict):
        self.uploading = True
        self.progress = round(progress["progress"] * 100)
        if self.progress >= 100:
            self.uploading = False

    def cancel_upload(self):
        self.uploading = False
        return rx.cancel_upload("upload3")


def upload_form():
    return rx.vstack(
        rx.upload(
            rx.flex(
                rx.text(
                    "Drag and drop file here or click to select file",
                    font_family="Ubuntu",
                ),
                rx.icon("upload", size=30),
                direction="column",
                align="center",
            ),
            id="upload3",
            border="1px solid rgb(233, 233, 233, 0.4)",
            margin="5em 0 10px 0",
            background_color="rgb(107,99,246)",
            border_radius="8px",
            padding="1em",
        ),
        rx.vstack(rx.foreach(rx.selected_files("upload3"), rx.text)),
        rx.cond(
            ~UploadExample.ingesting,
            rx.button(
                "Upload",
                on_click=UploadExample.handle_upload(
                    rx.upload_files(
                        upload_id="upload3",
                        on_upload_progress=UploadExample.handle_upload_progress,
                    ),
                ),
            ),
            rx.flex(
                rx.spinner(size="3", loading=UploadExample.ingesting),
                rx.button(
                    "Cancel",
                    on_click=UploadExample.cancel_upload,
                ),
                align="center",
                spacing="3",
            ),
        ),
        rx.alert_dialog.root(
            rx.alert_dialog.trigger(
                rx.button("Proceed to Chat", color_scheme="green"),
            ),
            rx.alert_dialog.content(
                rx.alert_dialog.title("Redirect to Chat Interface?"),
                rx.alert_dialog.description(
                    "You will be redirected to the Chat Interface.",
                    size="2",
                ),
                rx.flex(
                    rx.alert_dialog.cancel(
                        rx.button(
                            "Cancel",
                            variant="soft",
                            color_scheme="gray",
                        ),
                    ),
                    rx.alert_dialog.action(
                        rx.button(
                            "Proceed",
                            color_scheme="green",
                            variant="solid",
                            on_click=rx.redirect("/chat"),
                        ),
                    ),
                    spacing="3",
                    margin_top="16px",
                    justify="end",
                ),
                style={"max_width": 450},
            ),
        ),
        align="center",
    )

This component allows us to upload a file and ingest it into the vector store. It uses the ingest endpoint of our FastAPI app to upload and ingest the file. After ingestion, the user can simply move to the chat interface to ask queries.

With this, we have completed building the front-end for our application. Now we need to test the application using some documents.

Testing and Deployment

Now let's test the application on some manuals or documents. To use the application, we need to run both the back-end app and the Reflex app separately. Run the back-end app from its directory using the following command:

python app.py

Wait for the FastAPI app to start running. Then, in another terminal instance, run the front-end app using the following command:

reflex run

Once the apps are up and running, go to the Reflex app's URL (Reflex serves the front-end at http://localhost:3000 by default). Initially we will land on the File Upload page. Upload a file and press the upload button.


The file will be uploaded and ingested. This will take a while depending on the document size and the device specs. Once it's done, click on the 'Proceed to Chat' button to move to the chat interface. Write your query and press Send.

Conclusion

In this two-part series, you have built a complete and functional RAG application on a Raspberry Pi, from creating the core pipeline to wrapping it with a FastAPI back-end and developing a Reflex-based front-end. With these tools, your RAG pipeline is accessible and interactive, providing real-time query processing through a user-friendly web interface. By mastering these steps, you've gained valuable experience in building and deploying end-to-end applications on a compact, efficient platform. This setup opens the door to numerous possibilities for deploying AI-driven applications on resource-constrained devices like the Raspberry Pi, making cutting-edge technology more accessible and practical for everyday use.

Key Takeaways

  • A detailed guide is provided on setting up the development environment, including installing the necessary dependencies and models using Ollama, ensuring the application is ready for the final build.
  • The article explains how to wrap the RAG pipeline in a FastAPI application, including setting up endpoints for querying the model and ingesting documents, making the pipeline accessible through a web API.
  • The front-end of the RAG application is built using Reflex, a Python-only front-end library. The article demonstrates how to modify the chat application template to create a user-friendly interface for interacting with the RAG pipeline.
  • The article walks through integrating the FastAPI backend with the Reflex front-end and deploying the complete application on a Raspberry Pi, ensuring seamless operation and user accessibility.
  • Practical steps are provided for testing both the ingestion and query endpoints using tools like Postman or Thunder Client, along with running and testing the Reflex front-end to make sure the complete application functions as expected.

Frequently Asked Questions

Q1: How can I make the app accessible to myself from anywhere in the world without compromising security?

A. There is a platform named Tailscale that allows your devices to be connected to a private, secure network accessible only to you. You can add your Raspberry Pi and other devices to Tailscale and connect to the VPN to access your apps from anywhere in the world.
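
As a rough sketch, the standard Tailscale setup on the Raspberry Pi looks like the commands below (these follow Tailscale's generic Linux instructions; check their documentation for your exact OS). Once the Pi has joined your tailnet, the FastAPI and Reflex ports can be reached from your other devices through the Pi's Tailscale IP or MagicDNS name.

# Install and bring up Tailscale on the Raspberry Pi
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up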

Q2: My application is very slow in terms of ingestion and QnA.

A. That is a constraint due to the low hardware specs of the Raspberry Pi. This article is just a heads-up tutorial on how to start building a RAG app using a Raspberry Pi and Ollama.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

READ ALSO

OpenAI needs to construct a subscription OS in your life • The Register

Yolk’s on you – eggs break much less after they land sideways • The Register


Introduction 

Within the second a part of our collection on constructing a RAG utility on a Raspberry Pi, we’ll develop on the inspiration we laid within the first half, the place we created and examined the core pipeline. Within the first half, we created the core pipeline and examined it to make sure every part labored as anticipated. Now, we’re going to take issues a step additional by constructing a FastAPI utility to serve our RAG pipeline and making a Reflex app to provide customers a easy and interactive method to entry it. This half will information you thru organising the FastAPI back-end, designing the front-end with Reflex, and getting every part up and operating in your Raspberry Pi. By the top, you’ll have a whole, working utility that’s prepared for real-world use.

Studying Targets

  • Arrange a FastAPI back-end to combine with the prevailing RAG pipeline and course of queries effectively.
  • Design a user-friendly interface utilizing Reflex to work together with the FastAPI back-end and the RAG pipeline.
  • Create and take a look at API endpoints for querying and doc ingestion, guaranteeing clean operation with FastAPI.
  • Deploy and take a look at the whole utility on a Raspberry Pi, guaranteeing each back-end and front-end parts operate seamlessly.
  • Perceive the combination between FastAPI and Reflex for a cohesive RAG utility expertise.
  • Implement and troubleshoot FastAPI and Reflex parts to offer a completely operational RAG utility on a Raspberry Pi.

In the event you missed the earlier version, remember to test it out right here: Self-Internet hosting RAG Purposes on Edge Gadgets with Langchain and Ollama – Half I.

This text was revealed as part of the Knowledge Science Blogathon.

Creating Python Surroundings

Earlier than we begin with creating the applying we have to setup the surroundings. Create an surroundings and set up the beneath dependencies:

deeplake 
boto3==1.34.144 
botocore==1.34.144 
fastapi==0.110.3 
gunicorn==22.0.0 
httpx==0.27.0 
huggingface-hub==0.23.4 
langchain==0.2.6 
langchain-community==0.2.6 
langchain-core==0.2.11 
langchain-experimental==0.0.62 
langchain-text-splitters==0.2.2 
langsmith==0.1.83 
marshmallow==3.21.3 
numpy==1.26.4 
pandas==2.2.2 
pydantic==2.8.2 
pydantic_core==2.20.1 
PyMuPDF==1.24.7 
PyMuPDFb==1.24.6 
python-dotenv==1.0.1 
pytz==2024.1 
PyYAML==6.0.1 
reflex==0.5.6 
requests==2.32.3
reflex==0.5.6
reflex-hosting-cli==0.1.13

As soon as the required packages are put in, we have to have the required fashions current within the gadget. We are going to do that utilizing Ollama. Comply with the steps from Half-1 of this text to obtain each the language and embedding fashions. Lastly, create two directories for the back-end and front-end functions.

As soon as the fashions are pulled utilizing Ollama, we’re able to construct the ultimate utility.

Growing the Again-Finish with FastAPI

Within the Half-1 of this text, we have now constructed the RAG pipeline having each the Ingestion and QnA modules. We’ve examined each the pipelines utilizing some paperwork they usually have been completely working. Now we have to wrap the pipeline with FastAPI to create consumable API. This can assist us combine it with any front-end utility like Streamlit, Chainlit, Gradio, Reflex, React, Angular and so forth. Let’s begin by constructing a construction for the applying. Following the construction is totally optionally available, however be sure that to verify the dependency imports in the event you comply with a unique construction to create the app.

Under is the tree construction we’ll comply with:

backend
├── app.py
├── necessities.txt
└── src
    ├── config.py
    ├── doc_loader
    │   ├── base_loader.py
    │   ├── __init__.py
    │   └── pdf_loader.py
    ├── ingestion.py
    ├── __init__.py
    └── qna.py

Let’s begin with the config.py. This file will comprise all of the configurable choices for the applying, just like the Ollama URL, LLM identify and the embeddings mannequin identify. Under is an instance:

LANGUAGE_MODEL_NAME = "phi3"
EMBEDDINGS_MODEL_NAME = "nomic-embed-text"
OLLAMA_URL = "http://localhost:11434"

The base_loader.py file accommodates the mum or dad doc loader class that will probably be inherited by youngsters doc loader. On this utility we’re solely working with PDF information, so a Little one PDFLoader class will probably be
created that may inherit the BaseLoader class.

Under are the contents of base_loader.py and pdf_loader.py:

# base_loader.py
from abc import ABC, abstractmethod

class BaseLoader(ABC):
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    @abstractmethod
    async def load_document(self):
        cross


# pdf_loader.py
import os

from .base_loader import BaseLoader
from langchain.schema import Doc
from langchain.document_loaders.pdf import PyMuPDFLoader
from langchain.text_splitter import CharacterTextSplitter


class PDFLoader(BaseLoader):
    def __init__(self, file_path: str) -> None:
        tremendous().__init__(file_path)

    async def load_document(self):
        self.file_name = os.path.basename(self.file_path)
        loader = PyMuPDFLoader(file_path=self.file_path)

        text_splitter = CharacterTextSplitter(
            separator="n",
            chunk_size=1000,
            chunk_overlap=200,
        )
        pages = await loader.aload()
        total_pages = len(pages)
        chunks = []
        for idx, web page in enumerate(pages):
            chunks.append(
                Doc(
                    page_content=web page.page_content,
                    metadata=dict(
                        {
                            "file_name": self.file_name,
                            "page_no": str(idx + 1),
                            "total_pages": str(total_pages),
                        }
                    ),
                )
            )

        final_chunks = text_splitter.split_documents(chunks)
        return final_chunks

We’ve mentioned the working of pdf_loader within the Half-1 of the article.

Subsequent, let’s construct the Ingestion class. That is identical because the one we constructed within the Half-1 of this text.

Code for Ingestion Class

import os
import config as cfg

from pinecone import Pinecone
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from .doc_loader import PDFLoader

class Ingestion:
    """Doc Ingestion pipeline."""
    def __init__(self):
        attempt:
            self.embeddings = OllamaEmbeddings(
                mannequin=cfg.EMBEDDINGS_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                show_progress=True,
            )
            self.vector_store = DeepLake(
                dataset_path="knowledge/text_vectorstore",
                embedding=self.embeddings,
                num_workers=4,
                verbose=False,
            )
        besides Exception as e:
            increase RuntimeError(f"Did not initialize Ingestion system. ERROR: {e}")

    async def create_and_add_embeddings(
        self,
        file: str,
    ):
        attempt:
            loader = PDFLoader(
                file_path=file,
            )

            chunks = await loader.load_document()
            dimension = await self.vector_store.aadd_documents(paperwork=chunks)
            return len(dimension)
        besides (ValueError, RuntimeError, KeyError, TypeError) as e:
            increase Exception(f"ERROR: {e}")

Now that we have now setup the Ingestion class, we’ll go ahead with creating the QnA class. This too is identical because the one we created within the Half-1 of this text.

Code for QnA Class

import os
import config as cfg

from pinecone import Pinecone
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain_community.llms.ollama import Ollama
from .doc_loader import PDFLoader

class QnA:
    """Doc Ingestion pipeline."""
    def __init__(self):
        attempt:
            self.embeddings = OllamaEmbeddings(
                mannequin=cfg.EMBEDDINGS_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                show_progress=True,
            )
            self.mannequin = Ollama(
                mannequin=cfg.LANGUAGE_MODEL_NAME,
                base_url=cfg.OLLAMA_URL,
                verbose=True,
                temperature=0.2,
            )
            self.vector_store = DeepLake(
                dataset_path="knowledge/text_vectorstore",
                embedding=self.embeddings,
                num_workers=4,
                verbose=False,
            )
            self.retriever = self.vector_store.as_retriever(
                search_type="similarity",
                search_kwargs={
                    "okay": 10,
                },
            )
        besides Exception as e:
            increase RuntimeError(f"Did not initialize Ingestion system. ERROR: {e}")

    def create_rag_chain(self):
        attempt:
            system_prompt = """nnContext: {context}"
            """
            immediate = ChatPromptTemplate.from_messages(
                [
                    ("system", system_prompt),
                    ("human", "{input}"),
                ]
            )
            question_answer_chain = create_stuff_documents_chain(self.mannequin, immediate)
            rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)

            return rag_chain
        besides Exception as e:
            increase RuntimeError(f"Did not create retrieval chain. ERROR: {e}")

With this we have now completed creating the code functionalities of the RAG app. Now let’s wrap the app with FastAPI.

Code for the FastAPI Utility

import sys
import os
import uvicorn

from src import QnA, Ingestion
from fastapi import FastAPI, Request, File, UploadFile
from fastapi.responses import StreamingResponse

app = FastAPI()

ingestion = Ingestion()
chatbot = QnA()
rag_chain = chatbot.create_rag_chain()


@app.get("https://www.analyticsvidhya.com/")
def hi there():
    return {"message": "API Working in server 8089"}


@app.put up("/question")
async def ask_query(request: Request):
    knowledge = await request.json()
    query = knowledge.get("query")

    async def event_generator():
        for chunk in rag_chain.choose("reply").stream({"enter": query}):
            yield chunk

    return StreamingResponse(event_generator(), media_type="textual content/plain")


@app.put up("/ingest")
async def ingest_document(file: UploadFile = File(...)):
    attempt:
        os.makedirs("information", exist_ok=True)
        file_location = f"information/{file.filename}"
        with open(file_location, "wb+") as file_object:
            file_object.write(file.file.learn())

        dimension = await ingestion.create_and_add_embeddings(file=file_location)
        return {"message": f"File ingested! Doc rely: {dimension}"}
    besides Exception as e:
        return {"message": f"An error occured: {e}"}


if __name__ == "__main__":
    attempt:
        uvicorn.run(app, host="0.0.0.0", port=8089)
    besides KeyboardInterrupt as e:
        print("App stopped!")

Let’s breakdown the app by every endpoints:

  • First we initialize the FastAPI app, the Ingestion and the QnA objects. We then create a RAG chain utilizing the create_rag_chain technique of QnA class.
  • Our first endpoint is a straightforward GET technique. This can assist us know whether or not the app is wholesome or not. Consider it like a ‘Howdy World’ endpoint.
  • The second is the question endpoint. This can be a POST technique and will probably be used to run the chain. It takes in a request parameter, from which we extract the consumer’s question. Then we create a asynchronous technique that acts as an asynchronous wrapper across the chain.stream operate name. We have to do that to permit FastAPI to deal with the LLM’s stream operate name, to get a ChatGPT-like expertise within the chat interface. We then wrap the asynchronous technique with StreamingResponse class and return it.
  • The third endpoint is the ingestion endpoint. It is also a POST technique that takes in the whole file as bytes as enter. We retailer this file within the native listing after which ingest it utilizing the create_and_add_embeddings technique of Ingestion class.

Lastly, we run the app utilizing uvicorn bundle, utilizing host and port. To check the app, merely run the applying utilizing the next command:

python app.py
Code for the FastAPI Application

Use a API testing IDE like Postman, Insomnia or Bruno for testing the applying. You may as well use Thunder Consumer extension to do the identical.

Testing the Ingestion endpoint:

Testing the Ingestion endpoint

Testing the question endpoint:

Testing the query endpoint

Designing the Entrance-Finish with Reflex

We’ve efficiently created a FastAPI app for the backend of our RAG utility. It’s time to construct our front-end. You’ll be able to selected any front-end library for this, however for this specific article we’ll construct the front-end utilizing Reflex. Reflex is a python-only front-end library, created to construct internet functions, purely utilizing python. It proves us with templates for widespread functions like calculator, picture era and chatbot. We are going to use the chatbot utility template as a begin for our consumer interface. Our last app can have the next construction, so let’s have it right here for reference.

Frontend Listing

We can have a frontend listing for this:

frontend
├── property
│   └── favicon.ico
├── docs
│   └── demo.gif
├── chat
│   ├── parts
│   │   ├── chat.py
│   │   ├── file_upload.py
│   │   ├── __init__.py
│   │   ├── loading_icon.py
│   │   ├── modal.py
│   │   └── navbar.py
│   ├── __init__.py
│   ├── chat.py
│   └── state.py
├── necessities.txt
├── rxconfig.py
└── uploaded_files

Steps for Remaining App

Comply with the steps to organize the grounding for the ultimate app.

Step1: Clone the chat template repository within the frontend listing

git clone https://github.com/reflex-dev/reflex-chat.git .

Step2: Run the next command to initialize the listing as a reflex app

reflex init
Run the following command to initialize the directory as a reflex app

This can setup the reflex app and will probably be able to run and develop. 

Step3: Check the app, use the next command from contained in the frontend listing

reflex run
Test the app, use the following command from inside the frontend directory

Let’s begin modifying the parts. First let’s modify the chat.py file.

Under is the code for a similar:

import reflex as rx
from reflex_demo.parts import loading_icon
from reflex_demo.state import QA, State

message_style = dict(
    show="inline-block",
    padding="0 10px",
    border_radius="8px",
    max_width=["30em", "30em", "50em", "50em", "50em", "50em"],
)


def message(qa: QA) -> rx.Part:
    """A single query/reply message.

    Args:
        qa: The query/reply pair.

    Returns:
        A part displaying the query/reply pair.
    """
    return rx.field(
        rx.field(
            rx.markdown(
                qa.query,
                background_color=rx.shade("mauve", 4),
                shade=rx.shade("mauve", 12),
                **message_style,
            ),
            text_align="proper",
            margin_top="1em",
        ),
        rx.field(
            rx.markdown(
                qa.reply,
                background_color=rx.shade("accent", 4),
                shade=rx.shade("accent", 12),
                **message_style,
            ),
            text_align="left",
            padding_top="1em",
        ),
        width="100%",
    )


def chat() -> rx.Part:
    """Checklist all of the messages in a single dialog."""
    return rx.vstack(
        rx.field(rx.foreach(State.chats[State.current_chat], message), width="100%"),
        py="8",
        flex="1",
        width="100%",
        max_width="50em",
        padding_x="4px",
        align_self="middle",
        overflow="hidden",
        padding_bottom="5em",
    )


def action_bar() -> rx.Part:
    """The motion bar to ship a brand new message."""
    return rx.middle(
        rx.vstack(
            rx.chakra.type(
                rx.chakra.form_control(
                    rx.hstack(
                        rx.enter(
                            rx.enter.slot(
                                rx.tooltip(
                                    rx.icon("information", dimension=18),
                                    content material="Enter a query to get a response.",
                                )
                            ),
                            placeholder="Kind one thing...",
                            id="query",
                            width=["15em", "20em", "45em", "50em", "50em", "50em"],
                        ),
                        rx.button(
                            rx.cond(
                                State.processing,
                                loading_icon(peak="1em"),
                                rx.textual content("Ship", font_family="Ubuntu"),
                            ),
                            sort="submit",
                        ),
                        align_items="middle",
                    ),
                    is_disabled=State.processing,
                ),
                on_submit=State.process_question,
                reset_on_submit=True,
            ),
            rx.textual content(
                "ReflexGPT might return factually incorrect or deceptive responses. Use discretion.",
                text_align="middle",
                font_size=".75em",
                shade=rx.shade("mauve", 10),
                font_family="Ubuntu",
            ),
            rx.emblem(margin_top="-1em", margin_bottom="-1em"),
            align_items="middle",
        ),
        place="sticky",
        backside="0",
        left="0",
        padding_y="16px",
        backdrop_filter="auto",
        backdrop_blur="lg",
        border_top=f"1px stable {rx.shade('mauve', 3)}",
        background_color=rx.shade("mauve", 2),
        align_items="stretch",
        width="100%",
    )

The adjustments are minimal from the one current natively within the template.

Subsequent, we’ll edit the chat.py app. That is the primary chat part.

Code for Essential Chat Part

Under is the code for it:

import reflex as rx
from reflex_demo.parts import chat, navbar, upload_form
from reflex_demo.state import State


@rx.web page(route="/chat", title="RAG Chatbot")
def chat_interface() -> rx.Part:
    return rx.chakra.vstack(
        navbar(),
        chat.chat(),
        chat.action_bar(),
        background_color=rx.shade("mauve", 1),
        shade=rx.shade("mauve", 12),
        min_height="100vh",
        align_items="stretch",
        spacing="0",
    )


@rx.web page(route="https://www.analyticsvidhya.com/", title="RAG Chatbot")
def index() -> rx.Part:
    return rx.chakra.vstack(
        navbar(),
        upload_form(),
        background_color=rx.shade("mauve", 1),
        shade=rx.shade("mauve", 12),
        min_height="100vh",
        align_items="stretch",
        spacing="0",
    )


# Add state and web page to the app.
app = rx.App(
    theme=rx.theme(
        look="darkish",
        accent_color="jade",
    ),
    stylesheets=["https://fonts.googleapis.com/css2?family=Ubuntu&display=swap"],
    model={
        "font_family": "Ubuntu",
    },
)
app.add_page(index)
app.add_page(chat_interface)

That is the code for the chat interface. We’ve solely added the Font household to the app config, the remainder of the code is identical.

Subsequent let’s edit the state.py file. That is the place the frontend will make name to the API endpoints for response.

Modifying state.py File

import requests
import reflex as rx


class QA(rx.Base):
    query: str
    reply: str


DEFAULT_CHATS = {
    "Intros": [],
}


class State(rx.State):
    chats: dict[str, list[QA]] = DEFAULT_CHATS
    current_chat = "Intros"
    url: str = "http://localhost:8089/question"
    query: str
    processing: bool = False
    new_chat_name: str = ""

    def create_chat(self):
        """Create a brand new chat."""
        # Add the brand new chat to the record of chats.
        self.current_chat = self.new_chat_name
        self.chats[self.new_chat_name] = []

    def delete_chat(self):
        """Delete the present chat."""
        del self.chats[self.current_chat]
        if len(self.chats) == 0:
            self.chats = DEFAULT_CHATS
        self.current_chat = record(self.chats.keys())[0]

    def set_chat(self, chat_name: str):
        """Set the identify of the present chat.

        Args:
            chat_name: The identify of the chat.
        """
        self.current_chat = chat_name

    @rx.var
    def chat_titles(self) -> record[str]:
        """Get the record of chat titles.

        Returns:
            The record of chat names.
        """
        return record(self.chats.keys())

    async def process_question(self, form_data: dict[str, str]):
        # Get the query from the shape
        query = form_data["question"]

        # Examine if the query is empty
        if query == "":
            return

        mannequin = self.openai_process_question

        async for worth in mannequin(query):
            yield worth

    async def openai_process_question(self, query: str):
        """Get the response from the API.

        Args:
            form_data: A dict with the present query.
        """
        # Add the query to the record of questions.
        qa = QA(query=query, reply="")
        self.chats[self.current_chat].append(qa)
        payload = {"query": query}

        # Clear the enter and begin the processing.
        self.processing = True
        yield

        response = requests.put up(self.url, json=payload, stream=True)

        # Stream the outcomes, yielding after each phrase.
        for answer_text in response.iter_content(chunk_size=512):
            # Guarantee answer_text will not be None earlier than concatenation
            answer_text = answer_text.decode()
            if answer_text will not be None:
                self.chats[self.current_chat][-1].reply += answer_text
            else:
                answer_text = ""
                self.chats[self.current_chat][-1].reply += answer_text
            self.chats = self.chats
            yield

        # Toggle the processing flag.
        self.processing = False

On this file, we have now outlined the URL for the question endpoint. We’ve additionally modified the openai_process_question technique to ship a POST request to the question endpoint and get the streaming
response, which will probably be displayed within the chat interface.

Writing Contents of the file_upload.py File

Lastly, let’s write the contents of the file_upload.py file. This part will probably be displayed at first which can enable us to add the file for ingestion.

import reflex as rx
import os
import time

import requests


class UploadExample(rx.State):
    importing: bool = False
    ingesting: bool = False
    progress: int = 0
    total_bytes: int = 0
    ingestion_url = "http://127.0.0.1:8089/ingest"

    async def handle_upload(self, information: record[rx.UploadFile]):
        self.ingesting = True
        yield
        for file in information:
            file_bytes = await file.learn()
            file_name = file.filename
            information = {
                "file": (os.path.basename(file_name), file_bytes, "multipart/form-data")
            }
            response = requests.put up(self.ingestion_url, information=information)
            self.ingesting = False
            yield
            if response.status_code == 200:
                # yield rx.redirect("/chat")
                self.show_redirect_popup()

    def handle_upload_progress(self, progress: dict):
        self.importing = True
        self.progress = spherical(progress["progress"] * 100)
        if self.progress >= 100:
            self.importing = False

    def cancel_upload(self):
        self.importing = False
        return rx.cancel_upload("upload3")


def upload_form():
    return rx.vstack(
        rx.upload(
            rx.flex(
                rx.text(
                    "Drag and drop file here or click to select file",
                    font_family="Ubuntu",
                ),
                rx.icon("upload", size=30),
                direction="column",
                align="center",
            ),
            id="upload3",
            border="1px solid rgb(233, 233, 233, 0.4)",
            margin="5em 0 10px 0",
            background_color="rgb(107,99,246)",
            border_radius="8px",
            padding="1em",
        ),
        rx.vstack(rx.foreach(rx.selected_files("upload3"), rx.text)),
        rx.cond(
            ~UploadExample.ingesting,
            rx.button(
                "Upload",
                on_click=UploadExample.handle_upload(
                    rx.upload_files(
                        upload_id="upload3",
                        on_upload_progress=UploadExample.handle_upload_progress,
                    ),
                ),
            ),
            rx.flex(
                rx.spinner(size="3", loading=UploadExample.ingesting),
                rx.button(
                    "Cancel",
                    on_click=UploadExample.cancel_upload,
                ),
                align="center",
                spacing="3",
            ),
        ),
        rx.alert_dialog.root(
            rx.alert_dialog.trigger(
                rx.button("Continue to Chat", color_scheme="green"),
            ),
            rx.alert_dialog.content(
                rx.alert_dialog.title("Redirect to Chat Interface?"),
                rx.alert_dialog.description(
                    "You will be redirected to the Chat Interface.",
                    size="2",
                ),
                rx.flex(
                    rx.alert_dialog.cancel(
                        rx.button(
                            "Cancel",
                            variant="soft",
                            color_scheme="gray",
                        ),
                    ),
                    rx.alert_dialog.action(
                        rx.button(
                            "Continue",
                            color_scheme="green",
                            variant="solid",
                            on_click=rx.redirect("/chat"),
                        ),
                    ),
                    spacing="3",
                    margin_top="16px",
                    justify="end",
                ),
                style={"max_width": 450},
            ),
        ),
        align="center",
    )

This component allows us to upload a file and ingest it into the vector store. It uses the ingest endpoint of our FastAPI app to upload and ingest the file. After ingestion, the user can simply move to the chat interface to ask queries.
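
If you want to verify the ingest endpoint without going through the browser, you can also post a document to it directly with a small script (or with a tool such as Postman or Thunder Client). Below is a minimal sketch that reuses the ingestion URL shown above; the sample file name manual.pdf is just a placeholder.

# test_ingest.py -- a minimal sketch for exercising the ingest endpoint directly.
# "manual.pdf" is a placeholder; point it at any document your pipeline supports.
import requests

INGESTION_URL = "http://127.0.0.1:8089/ingest"

with open("manual.pdf", "rb") as f:
    files = {"file": ("manual.pdf", f, "application/pdf")}
    response = requests.post(INGESTION_URL, files=files)

print(response.status_code, response.text)

A 200 status code confirms the document was received and ingested by the back-end.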

With this, we have completed building the front-end for our application. Now we need to test the application using some documents.

Testing and Deployment

Now let’s take a look at the applying on some manuals or paperwork. To make use of the applying, we have to run each the back-end app and the reflex app individually. Run the back-end app from it’s listing utilizing the
following command:

python app.py

Wait for the FastAPI server to start running. Then, in another terminal instance, run the front-end app using the following command:

reflex run

Once both apps are up and running, go to the front-end URL shown in the Reflex terminal output (http://localhost:3000 by default) to access the app. Initially, we will be on the File Upload page. Upload a file and press the Upload button.


The file will be uploaded and ingested. This can take a while depending on the document size and the device specifications. Once it is finished, click the 'Continue to Chat' button to move to the chat interface. Write your query and press Send.

Conclusion

In this two-part series, you have built a complete and functional RAG application on a Raspberry Pi, from creating the core pipeline to wrapping it with a FastAPI back-end and developing a Reflex-based front-end. With these tools, your RAG pipeline is accessible and interactive, providing real-time query processing through a user-friendly web interface. By mastering these steps, you have gained valuable experience in building and deploying end-to-end applications on a compact, efficient platform. This setup opens the door to numerous possibilities for deploying AI-driven applications on resource-constrained devices like the Raspberry Pi, making cutting-edge technology more accessible and practical for everyday use.

Key Takeaways

  • A detailed guide is provided on setting up the development environment, including installing the necessary dependencies and models using Ollama, ensuring the application is ready for the final build.
  • The article explains how to wrap the RAG pipeline in a FastAPI application, including setting up endpoints for querying the model and ingesting documents, making the pipeline accessible through a web API.
  • The front-end of the RAG application is built using Reflex, a Python-only front-end library. The article demonstrates how to modify the chat application template to create a user-friendly interface for interacting with the RAG pipeline.
  • The article guides you through integrating the FastAPI back-end with the Reflex front-end and deploying the complete application on a Raspberry Pi, ensuring seamless operation and user accessibility.
  • Practical steps are provided for testing both the ingestion and query endpoints using tools like Postman or Thunder Client, along with running and testing the Reflex front-end to ensure the complete application functions as expected.

Frequently Asked Questions

Q1: How can I make the app accessible to myself from anywhere in the world without compromising security?

A. There’s a platform named Tailscale that permits your gadgets to be related to a non-public safe community, accessible solely to you. You’ll be able to add your Raspberry Pi and different gadgets to Tailscale gadgets and hook up with the VPN to entry your apps, from wherever throughout the world. 

Q2: My application is very slow in terms of ingestion and QnA.

A. That’s the constraint on account of low {hardware} specs of Raspberry Pi. The article is only a head up tutorial on the way to begin constructing RAG app utilizing Raspberry Pi and Ollama. 

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
