
Architecture of AI Knowledge Assistants


AI knowledge assistants combine comprehensive, evolving knowledge databases with inference engines to make informed decisions, learning and adjusting as new data arrives. Typically, the overall architecture of a knowledge assistant consists of four parts: Context Data, LLM Gateway, Chatbot, and Monitoring and Reporting.

Context Data

Contextual data is the foundation of a knowledge assistant: vast amounts of data are processed and prepared for retrieval here, and it is what provides the enterprise-specific intelligence. This data is derived from various sources, chunked, and stored alongside embeddings in a vector store. Access to this data must be controlled and monitored.

Context data is usually prepared following common data-pipeline principles. A landing zone stores incoming data in its original formats, which can be structured, semi-structured, unstructured, or even binary. The input data is then split into smaller, consumable chunks from which embeddings are generated. Chunks and vectors are stored together so that each piece of contextual information can be traced back to its source. Data access should be carefully governed to prevent unauthorized access, for example by creating multiple search indexes secured with privileges at the database or application level.
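The chunk-and-embed step above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `toy_embedding` is a hash-based stand-in for a real embedding model, and the "vector store" is just a list of records that keep the chunk, its vector, and a reference to the source document together.

```python
import hashlib
import math


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split raw text into overlapping character chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


def toy_embedding(chunk: str, dim: int = 8) -> list[float]:
    """Stand-in for a real embedding model: a hash-derived unit vector."""
    digest = hashlib.sha256(chunk.encode()).digest()
    vec = [digest[i] / 255.0 for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(source: str, text: str) -> list[dict]:
    """Store each chunk next to its embedding, keeping the source reference."""
    return [
        {"source": source, "chunk": c, "embedding": toy_embedding(c)}
        for c in chunk_text(text)
    ]
```

Keeping `source` on every record is what later allows the assistant to cite which document a retrieved chunk came from.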

LLM Gateway

The LLM component provides a gateway to different embedding and language models, depending on the use case and the type of data being embedded. An LLM service encapsulates the interaction with LLMs and selects the most appropriate model for each use case.
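A gateway of this kind can be sketched as a small registry that maps use cases to models. The class name, use-case keys, and the lambda "models" below are hypothetical placeholders; in practice each callable would wrap a real model client.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMGateway:
    """Routes each request to the model registered for its use case."""
    _models: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, use_case: str, model: Callable[[str], str]) -> None:
        self._models[use_case] = model

    def complete(self, use_case: str, prompt: str) -> str:
        # Fall back to a default model when no use-case-specific one exists.
        model = self._models.get(use_case) or self._models.get("default")
        if model is None:
            raise KeyError(f"no model registered for {use_case!r}")
        return model(prompt)


gateway = LLMGateway()
gateway.register("default", lambda p: f"[general] {p}")
gateway.register("code", lambda p: f"[code-tuned] {p}")
```

Centralizing model selection behind one interface keeps callers unaware of which provider or model version serves a given use case.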

LLM logging mainly tracks the costs associated with using LLMs (e.g. tokens generated, subscriptions). It helps manage the operational budget and optimize resource allocation. Additionally, all interactions are logged to understand usage patterns and to support troubleshooting and improvements.
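One way to implement such logging is a small accumulator that records token counts per user and derives an estimated cost. The class, the per-1k-token price, and the event fields are illustrative assumptions, not a specific billing API.

```python
import time
from collections import defaultdict


class LLMUsageLog:
    """Accumulates per-user token counts and estimated cost for reporting."""

    def __init__(self, cost_per_1k_tokens: float = 0.002):
        self.cost_per_1k = cost_per_1k_tokens
        self.tokens = defaultdict(int)   # user -> total tokens
        self.events = []                 # raw interaction log

    def record(self, user: str, prompt_tokens: int, completion_tokens: int) -> None:
        total = prompt_tokens + completion_tokens
        self.tokens[user] += total
        self.events.append({"user": user, "tokens": total, "ts": time.time()})

    def cost(self, user: str) -> float:
        return self.tokens[user] / 1000 * self.cost_per_1k
```

Keeping the raw `events` list alongside the aggregates is what later enables the usage-pattern analysis mentioned above.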

Chatbot

The chatbot interface provided to users is usually a web or mobile application consisting of multiple components:

  • The input handler analyzes the request and enforces initial guardrails (there may be questions we don’t want to answer).
  • The response formation retrieves the relevant context and enriches the prompt with it.
  • The output handler enforces final guardrails and grounds the results in the retrieved context to avoid undesired answers and reduce hallucinations.
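The three stages above can be sketched as plain functions. Everything here is a simplified assumption: the blocked-topic list is a hypothetical policy, `retrieve` stands in for vector-store lookup, and the grounding check is a crude word-overlap heuristic rather than a real grounding model.

```python
BLOCKED_TOPICS = {"competitor pricing", "employee salaries"}  # hypothetical policy


def input_handler(question: str) -> str:
    """Guardrail: refuse questions that touch blocked topics."""
    if any(topic in question.lower() for topic in BLOCKED_TOPICS):
        raise ValueError("question violates usage policy")
    return question.strip()


def response_formation(question: str, retrieve) -> str:
    """Enrich the question with retrieved context before calling the LLM."""
    context = retrieve(question)
    return f"Context: {context}\nQuestion: {question}"


def output_handler(answer: str, context: str) -> str:
    """Grounding check: fall back when the answer shares nothing with the context."""
    if not (set(answer.lower().split()) & set(context.lower().split())):
        return "I could not find a grounded answer in the knowledge base."
    return answer
```

In a real deployment each stage would be far richer (policy classifiers, re-ranking, citation checks), but the flow — filter, enrich, verify — is the same.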

Learn more about AI-powered chatbots >

Monitoring and Reporting

Monitoring and reporting are crucial for understanding actual system usage (usage reports), the costs incurred by the different components and users (cost reports), and the data sources being used (data reports).

Monitoring and reporting are divided into three core components:

  • Usage monitoring aims to monitor closely how the solution is utilized across the organization (metrics: number of user interactions, peak usage times, types of queries being processed). Understanding usage patterns is crucial for effective scaling and to meet the evolving needs of the company.
  • Cost analysis serves to track and analyze all operational expenses (token consumption by LLMs, data processing, and other computational resources). This promotes effective budget management and assists in identifying opportunities for cost optimization.
  • Data analytics provides a comprehensive view of the performance and effectiveness, including response accuracy, user satisfaction, and overall efficiency of operations. It plays a pivotal role in guiding future improvements, ensuring the solution remains a cutting-edge tool for the company.
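The three report types can be derived from a single stream of interaction events. The event schema below (`user`, `query_type`, `cost`, `sources`) is an illustrative assumption; a real system would pull these fields from the gateway and chatbot logs.

```python
from collections import Counter


def build_reports(events: list[dict]) -> dict:
    """Roll raw interaction events up into usage, cost, and data reports."""
    usage = Counter(e["query_type"] for e in events)          # usage report
    cost_by_user = Counter()                                  # cost report
    for e in events:
        cost_by_user[e["user"]] += e["cost"]
    sources = Counter(s for e in events for s in e["sources"])  # data report
    return {
        "usage_report": dict(usage),
        "cost_report": dict(cost_by_user),
        "data_report": dict(sources),
    }
```

Feeding all three reports from one event stream keeps them consistent with each other, which matters when usage spikes need to be reconciled against cost spikes.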

Learn more about the architecture of AI knowledge assistants in this white paper: How to Build AI-driven Knowledge Assistants
