Additional concepts

Chunking

Chunking is the process of splitting large documents or datasets into smaller, manageable pieces called chunks.

Why Chunking Matters

AI models can only process a limited number of tokens at once.
Chunking ensures that long documents remain searchable, indexable, and retrievable.

Best Practices

Smaller chunks → Higher precision (AI focuses better) but may lose broader context
Larger chunks → Preserve context but may retrieve less relevant information
The right balance depends on your data and use case
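
The size trade-off above can be tried out on a small scale. The following is a minimal sketch, not Chatzy's actual implementation: it splits by character count, whereas a production system would more likely count tokens. The function name `chunk_text` and the `overlap` parameter are assumptions introduced for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so content that falls on a
    chunk boundary also appears at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Lowering `chunk_size` produces more, narrower chunks; raising it produces fewer chunks that each carry more surrounding context.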

Summary

Chunking makes your data searchable and structured, preparing it for retrieval systems like RAG.

RAG (Retrieval-Augmented Generation)

RAG (Retrieval-Augmented Generation) improves chatbot accuracy by combining knowledge retrieval with LLM generation.
Instead of relying only on the model’s memory, responses are grounded in your uploaded data.

How RAG Works in Chatzy

1) Knowledge Base Upload & Processing

Users upload their Knowledge Base in the Data section
The system:
- Splits content into chunks
- Generates vector embeddings
- Stores them in the database

Chunking can be configured in Advanced Bot Settings:

Number of chunks
Chunk length

2) Query Understanding

When a user sends a message:

The system checks whether knowledge base (KB) sources are available
A lightweight GPT model analyzes:
- User message
- Conversation history
It generates a search query based on user intent

👉 If no relevant intent is found, the search query may be null.

3) Retrieval Step

If query is null

No KB context is attached
The LLM answers normally

If query exists

Vector search is performed on the KB
Top N relevant chunks are retrieved

Similarity and relevance thresholds can be configured in Advanced Settings.
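
The retrieval step can be pictured as a similarity ranking with a cutoff. The sketch below is illustrative only and assumes cosine similarity over plain Python lists; the source does not state which metric or vector store Chatzy uses, and the names `retrieve`, `top_n`, and `min_similarity` are hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Optional[list[float]],
    chunks: list[tuple[str, list[float]]],
    top_n: int = 3,
    min_similarity: float = 0.5,
) -> list[tuple[float, str]]:
    """Return up to top_n (score, text) pairs at or above the threshold.

    A null query means no KB context is attached, so the result is empty.
    """
    if query_vec is None:
        return []
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    kb = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(retrieve([1.0, 0.0], kb, top_n=2))
```

Raising `min_similarity` trades recall for precision: fewer chunks pass the cutoff, but those that do are closer to the query.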

4) Hybrid Search (Optional)

Hybrid Search combines similarity search and text (keyword) search into a final relevancy score.

How it works:

Extra chunks are fetched as a broad match
This step may include some noise
The system then reranks results
Only the top N most relevant chunks are kept

This ensures:

Better precision
Less irrelevant context
Higher response quality
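
One way to picture the combined relevancy score is a weighted sum of the two signals followed by a rerank. This is a sketch under stated assumptions: the source does not specify how Chatzy weights or computes either score, so the `alpha` weight, the term-overlap keyword score, and the function names are all hypothetical.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear as words in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def hybrid_rerank(
    query: str,
    candidates: list[tuple[str, float]],
    top_n: int = 3,
    alpha: float = 0.7,
) -> list[str]:
    """Rerank broad-match candidates and keep the top_n texts.

    candidates: (text, vector_similarity) pairs from the broad first pass.
    alpha: weight of vector similarity; (1 - alpha) goes to the keyword score.
    """
    scored = [
        (alpha * similarity + (1 - alpha) * keyword_score(query, text), text)
        for text, similarity in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


if __name__ == "__main__":
    broad_match = [
        ("shipping times vary", 0.60),
        ("our refund policy lasts 30 days", 0.55),
        ("refund requests", 0.50),
    ]
    print(hybrid_rerank("refund policy", broad_match, top_n=2, alpha=0.5))
```

In this example the chunk with the highest vector similarity is dropped after reranking because it shares no keywords with the query, which is the kind of noise the rerank step is meant to filter out.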

5) Generation Step

Retrieved chunks are combined
Added to the system prompt as context
Sent to the LLM with instructions

The LLM then generates a grounded, context-aware response.
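
As a rough illustration of how retrieved chunks might be merged into the system prompt, consider the sketch below. The prompt wording, the numbering of chunks, and the function name `build_system_prompt` are assumptions; the source does not show Chatzy's actual prompt template.

```python
def build_system_prompt(base_prompt: str, chunks: list[str]) -> str:
    """Append retrieved chunks to the base prompt as numbered context.

    With no chunks (for example, when the search query was null),
    the base prompt is returned unchanged.
    """
    if not chunks:
        return base_prompt
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(chunks, start=1)
    )
    return (
        f"{base_prompt}\n\n"
        "Use the following knowledge base context when answering:\n"
        f"{context}"
    )


if __name__ == "__main__":
    print(build_system_prompt("You are a support agent.", ["Refunds take 5 days."]))
```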

RAG Process Flow

User Query
   │
   ▼
Intent → Search Query Generation
   │
   ▼
Vector / Hybrid Search on KB
   │
   ▼
Reranking (Top N Chunks)
   │
   ▼
Context + Prompt → LLM
   │
   ▼
Final Answer

Benefits of RAG

✅ Reduces hallucinations
✅ Grounds answers in your data
✅ Works well for large knowledge bases
✅ Configurable retrieval behavior
✅ More accurate and trustworthy responses

In short: Chunking makes your data searchable, and RAG ensures the model uses the right information at the right time.

Advanced Settings for Conversational AI Agent

Location Request

If your chatbot needs the user’s live location during the conversation, you can instruct it to send a location request button. This is useful for scenarios such as booking a service visit, delivery verification, or assigning the nearest agent.

Important:
You must clearly mention the use case inside your Base Prompt, explaining when the bot should send this JSON.

Use this when:

You need the user’s address or live location for delivery, service visits, event check-ins, etc.

JSON format (use exactly as shown):

{
  "type": "location_request",
  "content": "Replace with the message that you want to display"
}

Call Permission Request

Use this when your AI agent needs to ask the user for permission to place a WhatsApp call (required for business-initiated calls).

Why it’s needed:

WhatsApp requires explicit user consent before initiating an outbound call.

Best used for:

Sales follow-ups
Demo or consultation calls
Appointment confirmation calls

Important:
In your Base Prompt, specify when the bot should request permission and include the JSON below so that it triggers appropriately (e.g., “If the user requests a callback, use ONLY the following for call permission request”).

JSON format:

{
  "type": "call_permission_request",
  "content": "Replace with the message that you want to display"
}

WhatsApp Call Button

This option allows the agent to display a Call Now button directly inside the conversation.

Use this when:

You want the user to start the call themselves (no approval template required).
Ideal for support lines, quick escalation, or urgent help.

When the user taps the button, the call is placed immediately from their device.

Important:
In your Base Prompt, describe the situations in which this button should appear (e.g., “If the user asks to talk to support, use ONLY the following for call button”).

JSON format:

{
  "type": "voice_call",
  "content": "Replace with the message that you want to display"
}
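
Because these snippets are pasted into the Base Prompt by hand, a stray quote in the message text can produce invalid JSON. The helper below is a hypothetical convenience, not part of Chatzy: it builds any of the three payloads shown in this section with Python's standard `json` module, which escapes the message text. The name `make_action_payload` is an assumption.

```python
import json

# The three "type" values documented in this section.
ALLOWED_TYPES = {"location_request", "call_permission_request", "voice_call"}


def make_action_payload(action_type: str, message: str) -> str:
    """Return a JSON string with the documented "type" and "content" keys."""
    if action_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {action_type!r}")
    payload = {"type": action_type, "content": message}
    # ensure_ascii=False keeps non-English message text readable.
    return json.dumps(payload, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(make_action_payload("location_request", "Please share your location"))
```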
