Basic Prototype

RAG Chat

Uses RAG (Retrieval-Augmented Generation) to enhance chat responses with relevant retrieved information.


Data Flow & Handling

What happens to user data

Processed

Message content, optional conversation memory, retrieval context, model output, and request metadata.
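As a sketch, the processed fields listed above could be grouped into a single request record. The field and class names here are illustrative assumptions, not the service's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChatRequest:
    # Hypothetical grouping of the data the service says it processes.
    message: str                                       # message content
    memory: Optional[list[str]] = None                 # optional conversation memory
    context: list[str] = field(default_factory=list)   # retrieval context
    metadata: dict = field(default_factory=dict)       # request metadata (IDs, timings)

req = ChatRequest(message="What is RAG?", metadata={"request_id": "r-1"})
```

Model output would be attached after generation; keeping memory and context optional mirrors the "optional conversation memory" wording above.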

Models Used

Generation model: qwen2.5:3b

Embedding model: nomic-embed-text

Rerank model: qwen2.5:3b (reranking can also be disabled)
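Both model names (qwen2.5:3b, nomic-embed-text) are distributed through Ollama, so the self-hosted deployment is plausibly driven by an Ollama-style local HTTP API. This is an assumption; the sketch below only constructs the request payloads and does not contact a server:

```python
import json

# Assumed local endpoint; adjust for the actual deployment.
OLLAMA_URL = "http://localhost:11434"

def embed_payload(text: str) -> dict:
    # Payload shape for Ollama's /api/embeddings endpoint.
    return {"model": "nomic-embed-text", "prompt": text}

def generate_payload(prompt: str, stream: bool = True) -> dict:
    # Payload shape for Ollama's /api/generate endpoint; stream=True
    # matches the pipeline's "generate and stream answer" step.
    return {"model": "qwen2.5:3b", "prompt": prompt, "stream": stream}

body = json.dumps(generate_payload("Hello"))
```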

System

Deployment: self-hosted

CPU: 6 vCPUs

GPU: none (CPU-only inference)

Technical pipeline details
  1. Validate input
  2. Create request tracking
  3. Detect tool usage
  4. Classify request type
  5. Choose knowledge collection
  6. Embed query
  7. Retrieve relevant chunks from Chroma
  8. Convert matches into sources
  9. Optionally rerank top sources
  10. Filter weak matches
  11. Trim final context
  12. Build grounded prompt with context and conversation memory
  13. Call LLM: Generate and stream answer
  14. Normalize and verify grounding
  15. Save conversation turn
  16. Log timings and token usage
  17. Finish
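The retrieval stages above (embed the query, retrieve chunks, keep the top matches, filter weak ones, trim the context, build a grounded prompt) can be sketched end to end. The embedding function, similarity threshold, and character budget are placeholders, and an in-memory scan stands in for the Chroma query so the sketch is self-contained:

```python
import math

def embed(text: str) -> list[float]:
    # Stub embedding for illustration; the real pipeline uses nomic-embed-text.
    vec = [0.0] * 16
    for i, ch in enumerate(text.lower()):
        vec[i % 16] += ord(ch) / 1000.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], k: int = 3,
             min_score: float = 0.5, budget_chars: int = 200) -> list[str]:
    q = embed(query)                                        # step 6: embed query
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks),
                    reverse=True)                           # step 7: score (Chroma stand-in)
    kept = [(s, c) for s, c in scored[:k] if s >= min_score]  # steps 9-10: top-k, drop weak
    context, used = [], 0
    for _, c in kept:                                       # step 11: trim to budget
        if used + len(c) > budget_chars:
            break
        context.append(c)
        used += len(c)
    return context

def build_prompt(query: str, context: list[str]) -> str:
    # Step 12: grounded prompt combining retrieved context with the question.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"
```

The resulting prompt would then go to the generation model (step 13), with the answer streamed back and logged as described above.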