The overall architecture of a knowledge assistant usually consists of four parts: Context Data, the LLM Gateway, the Chatbot, and Monitoring and Reporting.
Context Data
Contextual data is the foundation of a knowledge assistant: vast amounts of data are processed and prepared for retrieval, providing the enterprise-specific intelligence. The data is derived from various sources, chunked, and stored alongside embeddings in a vector store. Access to this data must be controlled and monitored.
Context data is usually prepared following common principles for building data pipelines. A landing zone stores incoming data in various formats (structured, semi-structured, unstructured, and sometimes even binary). The input data is then split into smaller consumable chunks from which embeddings are generated. Chunks and vectors are stored together so that each piece of contextual information can be traced back to its source. Data access should be carefully governed to prevent unauthorized access, for example by creating multiple search indexes secured with privileges at the database or application level.
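The ingestion steps above can be sketched as follows. This is a minimal illustration, not a production pipeline: the `embed` function is a deterministic stand-in for a real embedding model, the chunker is a naive fixed-size splitter, and the vector store is a plain in-memory list.

```python
import hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in for a real embedding model: a deterministic pseudo-vector
    # derived from a hash, so the example runs without external services.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def chunk(text: str, size: int = 100) -> list[str]:
    # Naive fixed-size chunking; real pipelines usually split on sentence
    # or paragraph boundaries, often with overlap between chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(doc_id: str, text: str, index: list[dict]) -> None:
    # Store each chunk alongside its vector and a reference to the source
    # document, so retrieved context can be traced back to its origin.
    for n, c in enumerate(chunk(text)):
        index.append({"source": doc_id, "chunk_id": n,
                      "text": c, "vector": embed(c)})

vector_store: list[dict] = []
ingest("handbook.pdf",
       "Employees may work remotely up to three days per week. " * 4,
       vector_store)
```

Keeping the `source` and `chunk_id` fields next to each vector is what allows a knowledge assistant to cite which document a retrieved passage came from.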
For more complex data pipelines, knowledge APIs provide access to additional data sources to vectorize (e.g. wikis) or to directory services for data access control.
LLM Gateway
The LLM component provides a gateway to different embedding models, depending on the use case and the type of data being embedded. An LLM service encapsulates the interaction with LLMs and selects the most appropriate model for the particular use case.
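A gateway's model selection can be as simple as a routing table. The sketch below is illustrative only: the model names and routing rules are hypothetical and not tied to any specific provider.

```python
# Hypothetical model registry; names are placeholders, not real products.
MODELS = {
    "code": "code-embed-v1",
    "text": "text-embed-v2",
    "chat": "chat-large-v3",
}

def select_model(task: str, content_type: str = "text") -> str:
    # Route embedding requests by content type, and everything else to the
    # chat model; a real gateway might also weigh cost, latency, and quotas.
    if task == "embed":
        return MODELS.get(content_type, MODELS["text"])
    return MODELS["chat"]
```

Centralizing this choice in one service means a model can be swapped or upgraded without touching any of the consuming applications.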
LLM logging primarily tracks the costs associated with using LLMs (e.g. tokens generated, subscriptions), which helps manage the operational budget and optimize resource allocation. Additionally, all interactions are logged to reveal usage patterns and to support troubleshooting and improvements.
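A minimal token-and-cost ledger might look like the following sketch. The price table is a placeholder, and the whitespace-based token estimate is a deliberate simplification: real gateways use the token counts the provider returns with each response.

```python
from dataclasses import dataclass, field

@dataclass
class LLMLog:
    # Illustrative per-1k-token prices; real figures come from the provider.
    price_per_1k: dict = field(default_factory=lambda: {"chat-large-v3": 0.02})
    tokens: dict = field(default_factory=dict)

    def record(self, model: str, prompt: str, completion: str) -> None:
        # Crude token estimate via whitespace split; a production logger
        # would use the tokenizer counts reported by the LLM API.
        used = len(prompt.split()) + len(completion.split())
        self.tokens[model] = self.tokens.get(model, 0) + used

    def cost(self, model: str) -> float:
        # Accumulated tokens converted to spend for budget reporting.
        return self.tokens.get(model, 0) / 1000 * self.price_per_1k.get(model, 0.0)
```

Aggregating these records per user or per department is what makes the cost reports described later possible.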
Chatbot
The chatbot interface provided to users is usually a web or a mobile application consisting of multiple components:
- The input handler analyses the request and enforces guardrails (there may be questions we don't want to answer).
- The response formation retrieves and enriches the context.
- The output handler enforces final guardrails and grounds the results to avoid undesired answers and reduce hallucinations.
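The three stages can be wired together as one request pipeline. This is a minimal sketch under stated assumptions: the blocked-topic list, the refusal messages, and the word-overlap grounding check are all hypothetical stand-ins; `retrieve` and `llm` are injected callables representing the vector store and the LLM gateway.

```python
BLOCKED = {"password", "salary"}  # illustrative guardrail terms

def answer(question: str, retrieve, llm) -> str:
    # Input handler: refuse questions on topics we don't want to answer.
    if any(word in question.lower() for word in BLOCKED):
        return "I can't help with that topic."
    # Response formation: enrich the prompt with retrieved context.
    context = retrieve(question)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    draft = llm(prompt)
    # Output handler: a crude grounding check based on word overlap with
    # the context; real systems use dedicated verification models.
    grounded = any(word in " ".join(context).lower()
                   for word in draft.lower().split())
    if context and not grounded:
        return "I couldn't find a grounded answer in the knowledge base."
    return draft
```

Separating the three handlers keeps guardrail policy, retrieval logic, and grounding checks independently testable and replaceable.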
Configuration stores and operational stores are used for conversation history, user settings, feedback, and other critical operational data essential for the knowledge assistant to be functional. Conversation history is particularly important for providing historic context to the LLM, and enhancing the relevance of responses in ongoing interactions.
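A sketch of such an operational store, reduced to conversation history. The in-memory dictionary and the sliding `history_window` are illustrative assumptions; a production system would back this with a database keyed by user and conversation.

```python
from collections import defaultdict

class OperationalStore:
    # In-memory stand-in for the operational store described above.
    def __init__(self, history_window: int = 10):
        self.history = defaultdict(list)  # conversation_id -> [(role, text)]
        self.window = history_window

    def add_turn(self, conversation_id: str, role: str, text: str) -> None:
        self.history[conversation_id].append((role, text))

    def context_for(self, conversation_id: str) -> list[tuple[str, str]]:
        # Only the most recent turns are passed to the LLM, trading
        # historical context against the model's context-window limit.
        return self.history[conversation_id][-self.window:]
```

The windowing in `context_for` is one simple way to keep conversation history useful without exceeding the model's context limit; summarizing older turns is a common refinement.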
Monitoring and Reporting
Monitoring and reporting are crucial to understand actual system usage (usage reports), the costs incurred by the different components and users (cost reports), and the data sources used (data reports).
Monitoring and reporting are divided into three core components:
- Usage monitoring aims to monitor closely how the solution is utilized across the organization (metrics: number of user interactions, peak usage times, types of queries being processed). Understanding usage patterns is crucial for effective scaling and to meet the evolving needs of the company.
- Cost analysis tracks and analyzes all operational expenses (token consumption by LLMs, data processing, and other computational resources). This promotes effective budget management and helps identify opportunities for cost optimization.
- Data analytics provides a comprehensive view of performance and effectiveness, including response accuracy, user satisfaction, and overall operational efficiency. It plays a pivotal role in guiding future improvements, ensuring the solution remains a cutting-edge tool for the company.
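Usage monitoring of the kind described above amounts to aggregating raw interaction events into a few metrics. The event schema (`ts`, `type`) below is an assumption chosen for illustration, not a prescribed format.

```python
from collections import Counter
from datetime import datetime

def usage_report(events: list[dict]) -> dict:
    # Aggregate raw interaction events into the usage metrics named above:
    # interaction counts, peak usage hour, and query-type breakdown.
    hours = Counter(datetime.fromisoformat(e["ts"]).hour for e in events)
    return {
        "interactions": len(events),
        "peak_hour": hours.most_common(1)[0][0] if hours else None,
        "query_types": Counter(e["type"] for e in events),
    }
```

Running such an aggregation on a schedule, and joining it with the cost data from LLM logging, yields the usage and cost reports the architecture calls for.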