AI Assistant
RAG Chat
Basic Prototype
Uses Retrieval-Augmented Generation (RAG) to ground chat responses in retrieved, relevant information.
Data Flow & Handling
What happens to user data
Processed
Message content, optional conversation memory, retrieval context, model output, and request metadata.
Models Used
Generation model: qwen2.5:3b
Embedding model: nomic-embed-text
Rerank model: qwen2.5:3b or disabled
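The model assignments above can be captured in a small configuration. This is a hedged sketch: the dictionary shape and the `rerank_enabled` helper are illustrative assumptions, not the prototype's actual config format.

```python
# Hypothetical configuration mapping pipeline roles to the models listed above.
MODEL_CONFIG = {
    "generation": "qwen2.5:3b",       # answer generation
    "embedding": "nomic-embed-text",  # query/chunk embeddings
    "rerank": "qwen2.5:3b",           # set to None to disable reranking
}

def rerank_enabled(config: dict) -> bool:
    """Reranking is optional per the description; None means disabled."""
    return config.get("rerank") is not None
```

Disabling the rerank stage then amounts to setting `"rerank": None` rather than removing the key.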
System
Deployment: self-hosted
CPU: 6 vCPUs
GPU: none (CPU-only inference)
Technical pipeline details
- Validate input
- Create request tracking
- Detect tool usage
- Classify request type
- Choose knowledge collection
- Embed query
- Retrieve relevant chunks from Chroma
- Convert matches into sources
- Optionally rerank top sources
- Filter weak matches
- Trim final context
- Build grounded prompt with context and conversation memory
- Call LLM: Generate and stream answer
- Normalize and verify grounding
- Save conversation turn
- Log timings and token usage
- Finish
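The retrieval-related steps above (embed, retrieve, filter weak matches, trim context, build grounded prompt) can be sketched in plain Python. This is a minimal illustration, not the prototype's implementation: the function names, the similarity threshold, the character budget, and the chunk dictionary shape are all assumptions, and the in-memory cosine scoring stands in for the actual Chroma query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, chunks, top_k=3):
    """Score stored chunks against the query embedding (stand-in for Chroma)."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:top_k]

def filter_weak(sources, query_vec, min_score=0.3):
    """'Filter weak matches': drop sources below an assumed similarity threshold."""
    return [s for s in sources if cosine(query_vec, s["embedding"]) >= min_score]

def trim_context(sources, max_chars=1000):
    """'Trim final context': keep sources until an assumed character budget is spent."""
    kept, used = [], 0
    for s in sources:
        if used + len(s["text"]) > max_chars:
            break
        kept.append(s)
        used += len(s["text"])
    return kept

def build_prompt(question, sources, memory=""):
    """'Build grounded prompt': inline retrieved context and conversation memory."""
    context = "\n".join(f"- {s['text']}" for s in sources)
    return f"Context:\n{context}\n\nMemory: {memory}\n\nQuestion: {question}"
```

In the real pipeline, the resulting prompt would then be sent to the generation model and the answer streamed back, followed by grounding verification and logging.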