GenAI Core Concepts Explained (RAG, Function Calling, MCP, AI Agent)
Overview
Large language models (LLMs) are getting smarter by the day, but they still have their limits. They can't access information newer than their training data, and on their own they can't do anything beyond producing text. Hallucination is also a persistent problem.
To fill these gaps, a few key ideas have started to take off: RAG (Retrieval-Augmented Generation), Function Calling, MCP (Model Context Protocol), and AI Agents. These tools help models stay up to date, connect with real-world tools, and even complete tasks for you.
BladePipe recently launched a new feature called RagApi, which brings these ideas together in a powerful and easy-to-use way. In this blog, we’ll walk you through what each of these terms means, how they work, and how BladePipe puts them into action.
RAG: Retrieval-Augmented Generation
RAG (Retrieval-Augmented Generation) is an AI architecture that combines two things: retrieving information and generating answers. Instead of having the LLM reply based only on what it “remembers” from training, RAG lets it look up relevant info from external sources, like documents or databases, and then use that info to answer your question more accurately and relevantly.
So, it’s like giving your AI access to a private library every time you ask it something.
RAG's Strengths
- Up-to-date knowledge: The model doesn’t just rely on old training data. It can pull in fresh or domain-specific content.
- Private data support: You can keep sensitive content in-house, with enhanced security and customized services.
- Fewer made-up answers: Since it’s referencing real content, it’s less likely to hallucinate or guess.
RAG Workflow
- Build a knowledge base: Take your documents, slice them into smaller parts, and turn them into vectors. Store them in a vector database like PGVector.
- User asks a question: That question is also turned into a vector.
- Do a similarity search: The system finds the most relevant text chunks from the database.
- Feed results to the model: These chunks are added to the model prompt to help it generate a better answer.
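The four steps above can be sketched end-to-end in a few lines of Python. This is a toy in-memory version for intuition only: the `embed()` function here is a stand-in for a real embedding model, and a real deployment would store vectors in a database like PGVector rather than a Python list.

```python
import math

# Hypothetical embedding function -- in practice this calls an embedding
# model; here we fake it with a tiny bag-of-words vector over a toy vocab.
VOCAB = ["price", "refund", "shipping", "weather"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Build the knowledge base: chunk documents and store their vectors.
chunks = [
    "Our refund policy allows returns within 30 days.",
    "Standard shipping takes 3-5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2-3. Turn the question into a vector and run a similarity search.
question = "How long does shipping take?"
q_vec = embed(question)
best_chunk, _ = max(index, key=lambda item: cosine(q_vec, item[1]))

# 4. Feed the retrieved chunk to the model inside the prompt.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}\nAnswer:"
```

The structure is the same at any scale; only the embedding model and the vector store change.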
What is a Vector?
In RAG, turning text into vectors is step one. So what exactly is a vector?
Think of a vector as a way for a computer to understand meaning using numbers.
To make it easy to understand, let’s take the word “apple.” We humans know what it is from experience. But for a computer, it has to be turned into a vector through a process called embedding—like:
[0.12, 0.85, -0.33, ..., 0.07] (say, 768 dimensions)
Each number (dimension) represents a hidden meaning:
- The 12th dimension might say “is it a fruit?”
- The 47th dimension might say “is it food?”
- The 202nd dimension might say “is it a company?”
- The 588th dimension might say “is it red in color?”
Each dimension stands for a feature, and the number at each dimension is like the "score" for it. The higher the score, the more prominent the feature.
Based on the scores for all dimensions, every word or sentence gets a position in a semantic space, like a pin on a multi-dimensional map.
How do We Measure Similarity?
Let’s say we turn “apple” and “banana” into vectors. Even though they’re different words, their scores for many dimensions are similar, because they are similar semantically.
To make it understandable, we use three dimensions [category, edible or not, color] to measure the semantic similarity of the words: apple, banana and plane.
| Word | [Category, Edible or Not, Color] | Vector | Description |
|---|---|---|---|
| Apple | Food + Edible + Red | [1.0, 1.0, 0.8] | It's edible food, and the color is red |
| Banana | Food + Edible + Yellow | [1.0, 1.0, 0.3] | It's edible food, and the color is yellow |
| Plane | Transport Vehicle + Not Edible + Silver | [0.1, 0.1, 0.9] | It's a metallic-colored transport vehicle. It's not edible. |
When measuring semantic similarity, we don't compare the raw magnitudes of the numbers. Instead, we use something called cosine similarity to check how close the vectors' “directions” are. The smaller the angle between two vectors, the more similar their meanings.
cos(θ) = (A · B) / (||A|| × ||B||)
If they point in the same direction → very similar (cosine ≈ 1)
If they point in different directions → not so similar (cosine ≈ 0, or even negative)
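Using the three-dimensional toy vectors from the table above, the formula can be computed in plain Python. This is a sketch for intuition only; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: [category, edible or not, color]
apple = [1.0, 1.0, 0.8]
banana = [1.0, 1.0, 0.3]
plane = [0.1, 0.1, 0.9]

# Apple and banana point in nearly the same direction (cosine close to 1);
# apple and plane diverge much more.
print(cosine_similarity(apple, banana))
print(cosine_similarity(apple, plane))
```

Note that apple and plane still have some similarity: their color scores are both fairly high, which is exactly what the shared "color" dimension captures.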
Function Calling: Enable LLMs to Use Tools
Normally, LLMs just reply with text. But what if you ask “Can you check tomorrow’s weather in California?” The model might not know the answer—because it doesn’t have real-time access to weather data.
That’s where Function Calling comes in. It lets the model call external tools or APIs to get real answers.
With function calling, the model can:
- Decide if a task needs a tool (like a weather API, calculator, or database).
- Extract the right parameters from your question (like the city name “California” and time “tomorrow”).
- Generate a tool call—usually in JSON format.
- Pass the call to your system, which runs the function and sends the result back to the model.
- Reply with a natural-language answer based on the tool result.
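The application side of that loop can be sketched in a few lines. The tool call format and the get_weather function below are hypothetical; real APIs (OpenAI, Anthropic, etc.) each define their own wire format, but the dispatch logic looks much the same.

```python
import json

# Hypothetical tool implementation -- a real one would call a weather API.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "high_f": 75}

# Registry mapping tool names to functions.
TOOLS = {"get_weather": get_weather}

# Suppose the model emitted this JSON tool call (format is illustrative).
model_output = '{"tool": "get_weather", "args": {"city": "California"}}'

# Parse the call, look up the tool, and run it with the extracted args.
call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])

# 'result' would then be sent back to the model for the final reply.
```

The model never runs anything itself: it only emits structured text, and your application decides what to execute.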
Simple Example: Weather Query
Let’s say the user says:
“I’m going to California tomorrow. Can you check the weather for me?”
The model does this behind the scenes:
- Pull out the parameters: city “California” and time “tomorrow”
- Make a plan: use the `get_weather` tool
- Generate a tool call: output a call to `get_weather`, together with the necessary parameters
Prompt for Weather Query
To help you understand the principle and process of Function Calling more intuitively, here is a prompt template for demonstration. Just copy it into Cherry Studio, and you can see how the model analyzes user requests, extracts parameters, and generates tool-calling instructions.
{
"role": "AI Assistant",
"description": "You are an AI assistant. Your primary goal is to analyze user queries and respond in a structured JSON format. If a query requires a tool and all necessary parameters are present, prepare for tool use. If a query requires a tool but essential parameters are missing, you MUST ask the user for clarification. If no tool is needed, answer directly. Your entire output MUST be a single JSON object at the root level, strictly adhering to the 'response_format'. Ensure all required fields from the schema (like 'requires_tools') are always present in your JSON output.",
"capabilities": [
"Analyzing user queries for intent and necessary parameters.",
"Identifying when required parameters for a tool are missing.",
"Strictly following instructions to set 'requires_tools' to false and use 'direct_response' to ask *only* for the specific missing information required by the tool.",
"Remembering the initial query context (e.g., 'weather' intent) when a user provides previously missing information, and then proceeding to tool use if all tool requirements are met.",
"Preparing and executing tool calls when the query intent matches a tool and all its defined required parameters are satisfied. Do not ask for details beyond the tool's documented capabilities.",
"Formulating direct answers for non-tool queries or clarification questions.",
"Detailing internal reasoning in 'thought' and, if calling a tool, a step-by-step plan in 'plan' (as an array of strings)."
],
"instructions": [
"1. Analyze the user's query and any relevant preceding conversation turns to understand the full context and intent.",
"2. **Scenario 1: No tool needed (e.g., greeting, general knowledge).**",
" a. Set 'requires_tools': false.",
" b. Populate 'direct_response' with your answer.",
" c. Omit 'thought', 'plan', 'tool_calls'. Ensure 'requires_tools' and 'direct_response' are present.",
"3. **Scenario 2: Tool seems needed, but *required* parameters are missing (e.g., 'city' for weather).**",
" a. **You MUST set 'requires_tools': false.** (Because you cannot call the tool yet).",
" b. **You MUST populate 'direct_response' with a clear question to the user asking *only* for the specific missing information required by the tool's parameters.** (e.g., if 'city' is missing for 'get_weather', ask for the city. Do not ask for additional details not specified in the tool's parameters like 'which aspect of weather').",
" c. Your 'thought' should explain that information is missing, what that information is, and that you are asking the user for it.",
" d. **You MUST Omit 'plan' and 'tool_calls'.** Ensure 'requires_tools', 'thought', and 'direct_response' are present.",
" e. **Do NOT make assumptions** for missing required parameters.",
"4. **Scenario 3: Tool needed, and ALL required parameters are available (this includes cases where the user just provided a missing parameter in response to your clarification request from Scenario 2).**",
" a. Set 'requires_tools': true.",
" b. Populate 'thought' with your reasoning for tool use, acknowledging how all parameters were met (e.g., 'User confirmed city for weather query.').",
" c. Populate 'plan' (array of strings) with your intended steps (e.g., ['Initial query was for weather.', 'User specified city: Chicago.', 'Call get_weather tool for Chicago.']).",
" d. Populate 'tool_calls' with the tool call object(s).",
" e. **If the user just provided a missing parameter, combine this new information with the original intent (e.g., 'weather'). If all parameters for the relevant tool are now met, proceed DIRECTLY to using the tool. Do NOT ask for further, unrelated, or overly specific clarifications if the tool's defined requirements are satisfied by the information at hand.** (e.g., if tool gets 'current weather', don't ask 'which aspect of current weather').",
" f. Omit 'direct_response'. Ensure 'requires_tools', 'thought', 'plan', and 'tool_calls' are present.",
"5. **Schema and Output Integrity:** Your entire output *must* be a single, valid JSON object provided directly at the root level (no wrappers). This JSON object must strictly follow the 'response_format' schema, ensuring ALL non-optional fields defined in the schema for the chosen scenario are present (especially 'requires_tools'). Respond in the language of the user's query for 'direct_response'."
],
"tools": [
{
"name": "get_weather",
"description": "Gets current weather for a specified city. This tool provides a general overview of the current weather. It takes only the city name as a parameter and does not support queries for more specific facets of weather (e.g., asking for only humidity or only wind speed). Assume it provides a standard, comprehensive current weather report.",
"parameters": {
"city": {
"type": "string",
"description": "City name",
"required": true
}
}
}
],
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "MUST be false if asking for clarification on missing parameters (Scenario 2) or if no tool is needed (Scenario 1). True only if a tool is being called with all required parameters (Scenario 3)."
},
"direct_response": {
"type": "string",
"description": "The textual response to the user. Used when 'requires_tools' is false (Scenario 1 or 2). This field MUST be omitted if 'requires_tools' is true (Scenario 3).",
"optional": true // Optional because it's not present in Scenario 3
},
"thought": {
"type": "string",
"description": "Your internal reasoning. Explain parameter absence if asking for clarification, or tool choice if calling a tool. Generally present unless it's an extremely simple Scenario 1 case.",
"optional": true // Optional for very simple direct answers
},
"plan": {
"type": "array",
"items": {
"type": "string"
},
"description": "Your internal step-by-step plan (array of strings) when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false. Each item MUST be a string.",
"optional": true // Optional because it's not present in Scenario 1 or 2
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": { "type": "string", "description": "Name of the tool." },
"args": { "type": "object", "description": "Arguments for the tool." }
},
"required": ["tool", "args"]
},
"description": "Tool calls to be made. Used only when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false.",
"optional": true // Optional because it's not present in Scenario 1 or 2
}
}
},
"examples": [
// Example for Scenario 3 (direct tool use)
{
"query": "What is the weather like in California?",
"response": {
"requires_tools": true,
"thought": "User wants current weather for California. City is specified. Use 'get_weather'.",
"plan": ["Identified city: California", "Tool 'get_weather' is appropriate.", "Prepare 'get_weather' tool call."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "California"}}]
}
},
// Multi-turn example demonstrating Scenario 2 then Scenario 3
{
"query": "What is the weather like?", // Turn 1: User asks for weather, no city
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "The user asked for the weather but did not specify a city. The 'get_weather' tool requires a city name. Therefore, I must ask the user for the city.",
"direct_response": "Which city do you want to check the weather in?"
}
},
{
"query": "Chicago", // Turn 2: User provides city "Chicago"
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user previously asked for weather and has now provided the city 'Chicago'. All required parameters for 'get_weather' are met. The tool provides a general current weather report.",
"plan": ["Initial query was for weather.", "User specified city: Chicago", "Call 'get_weather' tool for Chicago."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
},
// Another multi-turn example (English)
{
"query": "What's the weather like today?", // Turn 1
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "User wants today's weather but no city provided. 'get_weather' tool needs the city parameter. I must ask for clarification.",
"direct_response": "Sure, I can get the weather for you! Which city are you interested in?"
}
},
{
"query": "London", // Turn 2
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user asked about weather and has now specified 'London'. The 'get_weather' tool takes city and provides current weather. Proceeding with tool call.",
"plan": ["Original intent was weather.", "City provided: London.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "London"}}]
}
},
// Example for Scenario 1 (direct answer)
{
"query": "Hi",
"response": {
"requires_tools": false,
"direct_response": "Hello! Is there anything I can help you with?"
// 'thought' is optional for very simple cases
}
},
// Example of user being too vague even after specifying city for a non-weather query, but we want to avoid this for weather.
// This example is more to ensure the AI understands the "don't ask for unnecessary details for the get_weather tool" instruction.
{
"query": "Tell me about Paris.",
"response": {
"requires_tools": false, // Assuming no general knowledge tool for this example, or AI should ask for specifics.
"thought": "User is asking for general information about Paris. This is too broad. I should ask for more specific interests if no general info tool is available.",
"direct_response": "Paris has many aspects, like history, attractions, food, etc. Which aspect are you interested in?"
}
},
{
"query": "Chicago's weather", // User explicitly states "Chicago's weather"
"response": { // AI should directly use the tool
"requires_tools": true,
"thought": "User explicitly asked for Chicago's weather. City is clear. The 'get_weather' tool is appropriate and provides a general current weather report.",
"plan": ["User query: Chicago's weather.", "City: Chicago.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
}
]
}
Multi-turn Conversation
- The user asks: “What’s the weather like?” Since the user doesn't specify a city, the model cannot call the tool directly and should ask the user for the city.
- The user replies: “Chicago”. The model now has the key information, extracts the parameters, and generates tool_calls. The application sees requires_tools: true and calls the corresponding tool function according to tool_calls.
- After the tool is executed, the result is returned to the model, which then summarizes it and responds to the user.
In this process, the LLM understands the user's intention through natural language: what task to complete and what information is needed. It extracts key parameters from the conversation. The application can then call the function based on these parameters to complete the task and return the execution result to the model, which generates the final response.
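The multi-turn flow above can be sketched as an application loop. Everything here is illustrative: chat_model is a hypothetical function standing in for an actual LLM call, and the reply dict follows the JSON schema from the prompt template shown earlier.

```python
import json

def run_turn(chat_model, tools, messages):
    """Drive one user turn, including any tool round-trip.

    chat_model(messages) -> dict parsed from the model's JSON output
    (hypothetical interface, matching the schema in the prompt above).
    tools -> mapping of tool name to Python function.
    """
    reply = chat_model(messages)

    if not reply.get("requires_tools"):
        # Scenario 1 or 2: a direct answer, or a clarification question
        # such as "Which city do you want to check the weather in?"
        return reply["direct_response"]

    # Scenario 3: execute each requested tool call and feed results back.
    for call in reply["tool_calls"]:
        result = tools[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})

    # Ask the model to summarize the tool results for the user.
    return chat_model(messages).get("direct_response", "")
```

The key point is that the loop is driven entirely by the requires_tools flag: the application never has to parse free-form text to decide what to do next.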
MCP: A Unified Way to Call Tools
So Function Calling lets AI call tools, but as you build more tools, things get messy. What if you have one tool for weather, one for sending emails, and one for searching GitHub? Each has its own format, API, and connection type. And how do you reuse the same tool system across different LLMs?
That’s where MCP comes in.
What is MCP?
MCP (Model Context Protocol) is an open standard introduced by Anthropic. It’s designed to help models and tools talk to each other in a more unified, flexible, and scalable way.
MCP allows a model to:
- Run multi-step tool chains (like: check weather → send email)
- Keep tool formats and parameters consistent
- Support different call types (HTTP requests, local plugins, etc.)
- Reuse tools across different models or systems
It doesn't replace Function Calling; rather, it makes it easier to organize, standardize, and scale tool usage inside AI systems.
MCP Core Components
MCP Client
- Asks the MCP Server for the list of available tools
- Sends tool call requests using HTTP or stdio
MCP Server
- Receives tool calls and runs the correct tool(s)
- Sends back structured results in a unified format
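Under the hood, MCP messages are JSON-RPC 2.0. A client asking the server for its tool list and then invoking one tool sends requests shaped roughly like the following. This is a sketch of the message structure only, not a full client; the get_weather tool name is illustrative, while tools/list and tools/call are method names from the MCP specification.

```python
import json

def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request, as MCP sends over stdio or HTTP."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# 1. Ask the server which tools it offers.
list_req = jsonrpc_request(1, "tools/list")

# 2. Call one of them with arguments (tool name is illustrative).
call_req = jsonrpc_request(2, "tools/call", {
    "name": "get_weather",
    "arguments": {"city": "Chicago"},
})
```

Because every server speaks this same envelope, a client written once can talk to a weather server, an email server, or a GitHub-search server without any per-tool glue code.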