Having a local AI to manage your personal documents can be incredibly beneficial. Imagine having a chatbot that thoroughly reads all your important files and can quickly answer questions like:
"What is the deductible for my car insurance?"
or
"Is my supplemental dental insurance valid for inlays?"
If you’re into board games, you could feed the AI all of your game manuals and then ask it questions such as:
"Where can I place tiles in Qwirkle?"
We’ve experimented with how effectively this works on standard home computers.
Further reading: 4 free AI chatbots to operate directly on your PC
What You Need
To make inquiries about your files using a locally executed AI, you primarily need three components: a local AI model, a database for your documents, and a chatbot interface.
AI programs like Anything LLM and Msty provide all three of these components, free of charge.
To get started, install these applications on a computer with at least 8GB of RAM and a relatively modern CPU. You should also have 5GB or more free space on your SSD.
A good graphics card from Nvidia or AMD can greatly enhance performance, and you can refer to this list of compatible models for guidance.
Once you install Anything LLM or Msty, you’ll have a chatbot on your machine. Both programs load an AI language model, known as a large language model (LLM), to answer your queries.
Which AI model you can operate in your chatbot will depend largely on your PC’s capabilities. Getting familiar with the chatbot interface is straightforward, but tweaking the extensive settings may require advanced knowledge.
Nevertheless, even using default settings is simple. Besides the AI model and chatbot, Anything LLM and Msty also include an embedding model that ingests your documents and organizes them into a local database for the language model’s use.
Bigger is Better: Limitations of Smaller AI Models
While some AI language models can operate on weaker hardware, “weaker” here describes computers with only 8GB RAM and dated CPUs lacking robust Nvidia or AMD graphics cards.
Models compatible with such systems typically feature 2 to 3 billion parameters, having been optimized through a process known as quantization.
This method reduces both memory and processing power needs but also compromises output quality. Examples include variants like Gemma 2 2B or Llama 3.2 3B.
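To see the idea behind quantization, here is a toy sketch in Python. It is a deliberately simplified illustration; real models like Gemma or Llama use more sophisticated schemes than this naive 8-bit rounding:

```python
import numpy as np

# Toy 8-bit quantization: store each 32-bit weight as a single byte.
weights = np.array([0.82, -1.37, 0.05, 2.11, -0.64], dtype=np.float32)

scale = np.abs(weights).max() / 127          # map the largest weight to +/-127
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale

print(weights)    # original values
print(restored)   # slightly off: 4x less memory, at the cost of precision
```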
Although these smaller models can generate surprisingly valid answers for various queries and text outputs entirely locally, their effectiveness diminishes when processing your documents.
During our initial tests with local AI using personal documents, the results were disappointing, leading us to suspect that there was an issue with the document embedding process.
Only after switching to a model with 7 billion parameters did we see a significant improvement in the responses, and only the online model ChatGPT-4o showed how good the answers could really get.
The decisive factor for local AI performance is the AI model itself: the larger the model and the more parameters it has, the better the results will likely be. The embedding model and the chatbot itself play a much smaller role.
Understanding Embedding and Retrieval-Augmented Generation
The cloud provider Ionos illustrates how retrieval-augmented generation (RAG) operates. This method allows an AI language model to integrate local documents into its responses. Ionos provides chatbots that operate entirely in the cloud.
Ionos
Your data is accessed by the AI through a technique called embedding and retrieval-augmented generation (RAG).
When you use Anything LLM or Msty, your documents are analyzed by an embedding model, which breaks their content down and stores it as vectors.
Besides document embeddings, this model can also work with database information or other sources of knowledge.
The outcome is a vector database that encapsulates the essence of your documents. This database enables the AI to accurately locate relevant data.
This process is distinct from a traditional word search, where an index tracks the position of each word within a document. A vector database instead stores a representation of the meaning of the text itself.
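As a minimal sketch of what happens under the hood, the following Python snippet embeds a few text snippets and finds the best match for a question by vector similarity. It assumes the sentence-transformers package and uses all-MiniLM-L6-v2, the same model Anything LLM ships as its default embedder; the document snippets are made-up stand-ins for chunks of your files:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Made-up stand-ins for chunks of your documents
chunks = [
    "The deductible for collision coverage is 300 euros per claim.",
    "Dental inlays are reimbursed at 80 percent of the invoice amount.",
    "In Qwirkle, tiles must extend an existing line of matching color or shape.",
]
question = "What is the deductible for my car insurance?"

# The embedding model turns each text into a vector that captures its meaning
chunk_vecs = model.encode(chunks, normalize_embeddings=True)
q_vec = model.encode(question, normalize_embeddings=True)

# Cosine similarity (dot product of normalized vectors) finds the closest chunk
scores = chunk_vecs @ q_vec
print(chunks[int(np.argmax(scores))])  # -> the collision-deductible sentence
```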
This means that a question like:
"What is stated on page 15 of my car insurance document?"
may not yield effective results with RAG since references to “page 15” are typically excluded from the vector database. Such inquiries could lead the AI model to generate fictitious information instead of accurate answers.
Creating the vector database is only the first phase. When you ask a question, the passages that match it are fetched from the database (retrieval, the “R” in RAG). These results are handed to the AI model together with your original question (augmented). Finally, the model combines its own knowledge with the retrieved passages to produce an answer (generation).
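In code, the “augment” step amounts to little more than prompt assembly. A sketch under the same assumptions as above, with a hypothetical retrieve() helper standing in for the vector-database lookup:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Augment: splice the retrieved passages into the prompt for the LLM."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# passages = retrieve(question)                           # hypothetical vector-DB lookup
# answer = llm.generate(build_prompt(question, passages)) # the "generate" step
```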
Comparison: Choosing Between Anything LLM and Msty
We’ve tried both chatbots—Anything LLM and Msty. While they share similarities, they differ notably in the speed at which they embed local documents, a crucial step that can take time.
For example, Anything LLM managed to embed a PDF file of roughly 150 pages in 10 to 15 minutes during our testing, while Msty frequently took three to four times longer.
Both tools were assessed using their respective default embedding models: mxbai-embed-large (“Mixed Bread”) for Msty and all-MiniLM-L6-v2 for Anything LLM.
Although Msty’s embedding process is time-consuming, its detailed user guidance and accurate citation sources make it worthwhile, particularly for faster computers.
If your hardware isn’t as robust, try Anything LLM first and see if you achieve satisfying results. The AI language model is crucial here, and both platforms provide access to similar options.
Notably, both Anything LLM and Msty allow alternative embedding models, though this can complicate configuration. Online embedding models, sourced from providers like OpenAI, are also available, although an API key is required to access them.
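For illustration, an online embedding call looks roughly like this sketch against OpenAI’s embeddings endpoint (it assumes the openai Python package and an OPENAI_API_KEY environment variable; unlike the local models above, every text chunk leaves your machine):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Dental inlays are reimbursed at 80 percent of the invoice amount.",
)
vector = response.data[0].embedding  # a plain list of floats for the vector DB
print(len(vector))                   # 1536 dimensions for this model
```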
Anything LLM: Fast and Straightforward
To use the Anything LLM chatbot, start the installation process and be aware that Microsoft Defender SmartScreen might flag the installer. You can proceed by selecting “Run anyway.”
Once installed, choose a preferred AI language model within Anything LLM. We suggest starting with Gemma 2 2B, which can be changed later on (see “Change AI language model” below).
Create a workspace in the configuration wizard or by clicking “New workspace” later, where you can import documents. Name the workspace and save it.
Your new workspace will appear in the left sidebar of Anything LLM. Click its icon near the gear symbol to import your documents. Use the option to either upload or drag and drop files.
After a few seconds, your documents will appear on the list. Click on them to move them to the right, then click “Save and Embed” to initiate the embedding process, which may take some time depending on your documents’ size and your computer’s speed.
Tip: Begin with a simple text document rather than attempting to upload extensive files like the past 30 years of PCWorld in PDF format. This helps you gauge whether larger documents are feasible for your machine.
Once embedding completes, close the window and pose your first question to the chatbot. To ensure it uses your documents, you must select the workspace you created on the left panel, then type your query in the main window.
To change the AI language model, click the gear icon at the bottom left, then “LLM.” Under “LLM provider,” you can select from various suggested models.
New models, such as those from DeepSeek, appear here as well. By clicking “Import model from Ollama or Hugging Face,” you can reach nearly all currently available free AI models.
Downloading these models can take a while because of their size, and server speeds fluctuate. If you’d rather use an online AI model, you can select it from the “LLM provider” dropdown.
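If you would rather script the download than click through the interface, the same model catalog is reachable via Ollama. A sketch assuming the ollama Python package and a running Ollama server:

```python
import ollama  # pip install ollama; requires a running Ollama server

ollama.pull("gemma2:2b")  # one-time download, a couple of gigabytes

reply = ollama.chat(
    model="gemma2:2b",
    messages=[{"role": "user", "content": "Where can I place tiles in Qwirkle?"}],
)
print(reply["message"]["content"])
```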
Things to note about Anything LLM: Some options might be tricky to navigate. Always confirm changes by clicking “Save.” The button can be hard to locate on longer configuration pages.
Also, you can adjust the user interface language in Anything LLM under “Open settings > Customize > Display Language.”

We also switched to the ChatGPT-4o online language model as a test, which yielded excellent results for questions about our supplementary dental insurance contract and other documents.
IDG
Msty: A Versatile Chatbot for More Capable Hardware
The Msty chatbot is more flexible in its applications than Anything LLM. It can also serve as a plain local AI chatbot without ingesting your files at all.
Msty allows multiple AI models to be loaded simultaneously. Its installation and setup mirror that of Anything LLM.

With the Msty chatbot, selecting a local AI model is straightforward. Each model is shown with an estimate of how well it fits your PC’s hardware; values below 75 percent suggest the model will struggle on your machine.
IDG
In Msty, a “Knowledge Stack” takes the place of Anything LLM’s “Workspace.” You configure it from the menu at the bottom left.
To begin, create a new knowledge stack and choose your documents before starting the embedding process by clicking “Compose.”
Expect some time for completion. Back in Msty’s main window, enter your question in the input field below.
Ensure the chatbot references your documents by selecting the knowledge stack icon and checking the box for your chosen stack.
Troubleshooting Inaccurate or Missing Answers
If the responses regarding your documents aren’t satisfactory, try selecting a more advanced AI model. For instance, upgrade from Gemma 2 2B (2 billion parameters) to Gemma 2 9B, or Llama 3.1 with 8 billion parameters.
If results are still lacking, or your computer takes too long to respond, consider switching to an online language model. Such a model has no direct access to your local files, but the chatbot still sends it the relevant excerpts retrieved from your local vector database. In Anything LLM, each workspace can be switched to an online model individually: click the gear icon for that workspace and choose “Open AI” under “Chat settings > Workspace LLM provider” to access ChatGPT models.
You will need a paid OpenAI API key for this, which costs $12. How many responses that buys depends on the language model you choose; more details can be found at openai.com/api/pricing.
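Before wiring the key into Anything LLM, you can verify it works with a quick call against OpenAI’s chat endpoint (same openai package assumption as in the embedding sketch above):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with a single word: ready?"}],
)
print(response.choices[0].message.content)  # any reply means the key is valid
```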
If data protection concerns rule out an online model, Anything LLM offers a troubleshooting guide that explains the embedding and RAG processes and gives configuration tips for getting better results.