Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.chatzy.ai/llms.txt

Use this file to discover all available pages before exploring further.

Chunking is the process of splitting large documents or datasets into smaller, manageable pieces called chunks.

Why Chunking Matters

AI models can only process a limited number of tokens at once.
Chunking ensures that long documents remain searchable, indexable, and retrievable.

Best Practices

  • Smaller chunks → Higher precision (AI focuses better) but may lose broader context
  • Larger chunks → Preserve context but may retrieve less relevant information
  • The right balance depends on your data and use case

Summary

Chunking makes your data searchable and structured, preparing it for retrieval systems like RAG.
RAG (Retrieval-Augmented Generation) improves chatbot accuracy by combining knowledge retrieval with LLM generation.
Instead of relying only on the model’s memory, responses are grounded in your uploaded data.

How RAG Works in Chatzy

1) Knowledge Base Upload & Processing

  • Users upload their Knowledge Base in the Data section
  • The system:
    • Splits content into chunks
    • Generates vector embeddings
    • Stores them in the database
Chunking can be configured in Advanced Bot Settings:
  • Number of chunks
  • Chunk length

2) Query Understanding

When a user sends a message:
  • The system checks if KB(knowledge base) sources are available
  • A lightweight GPT model analyzes:
    • User message
    • Conversation history
  • It generates a search query based on user intent
👉 If no relevant intent is found, the search query may be null.

3) Retrieval Step

If query is null
  • No KB context is attached
  • The LLM answers normally
If query exists
  • Vector search is performed on the KB
  • Top N relevant chunks are retrieved
Similarity and relevance thresholds can be configured in Advanced Settings.

4) Hybrid Search (Optional)

Hybrid Search combines similarity search and text (keyword) search into a final relevancy score.How it works:
  • Extra chunks are fetched as a broad match
  • This step may include some noise
  • The system then reranks results
  • Only the top N most relevant chunks are kept
This ensures:
  • Better precision
  • Less irrelevant context
  • Higher response quality

5) Generation Step

  • Retrieved chunks are combined
  • Added to the system prompt as context
  • Sent to the LLM with instructions
The LLM then generates a grounded, context-aware response.

RAG Process Flow

User Query


Intent → Search Query Generation


Vector / Hybrid Search on KB


Reranking (Top N Chunks)


Context + Prompt → LLM


Final Answer


Benefits of RAG

✅ Reduces hallucinations
✅ Grounds answers in your data
✅ Works well for large knowledge bases
✅ Configurable retrieval behavior
✅ More accurate and trustworthy responses

In short: Chunking makes your data searchable, and RAG ensures the model uses the right information at the right time.

Location Request

If your chatbot needs the user’s live location during the conversation, you can instruct it to send a location request button. This is useful for scenarios such as booking a service visit, delivery verification, or assigning the nearest agent.Important:
You must clearly mention the use case inside your Base Prompt, explaining when the bot should send this JSON.
Use this when:
  • You need the user’s address or live location for delivery, service visits, event check-ins, etc.
JSON format (use exactly as shown):
{
      "type": "location_request", 
      "content": "Replace with the message that you want to display"
}

Call Permission Request

Use this when your AI agent needs to ask the user for permission to place a WhatsApp call (required for business-initiated calls).Why it’s needed:
  • WhatsApp requires explicit user consent before initiating an outbound call.
Best used for:
  • Sales follow-ups
  • Demo or consultation calls
  • Appointment confirmation calls
Important:
Specify when the bot should request permission inside your Base Prompt along with json so it triggers appropriately.(e.g., “If the user requests a callback, use ONLY the following for call permission request”).
JSON format:
{
      "type": "call_permission_request", 
      "content": "Replace with the message that you want to display"
}

WhatsApp Call Button

This option allows the agent to display a Call Now button directly inside the conversation.Use this when:
  • You want the user to start the call themselves (no approval template required).
  • Ideal for support lines, quick escalation, or urgent help.
When the user taps the button, the call is placed immediately from their device.Important:
In your Base Prompt, describe in which situations this button should appear (e.g., “If the user asks to talk to support, use ONLY the following for call button”).
JSON format:
{
      "type": "voice_call", 
      "content": "Replace with the message that you want to display"
}