GenAI Core Concepts Explained (RAG, Function Calling, MCP, AI Agent)
Overview
Large language models (LLMs) are getting smarter by the day, but they still have their limits. They can't access information newer than their training data, and on their own they can't do anything beyond producing text. Hallucination is also a persistent problem.
To fill these gaps, a few key ideas have started to take off: RAG (Retrieval-Augmented Generation), Function Calling, MCP (Model Context Protocol), and AI Agents. These tools help models stay up to date, connect with real-world tools, and even complete tasks for you.
BladePipe recently launched a new feature called RagApi, which brings these ideas together in a powerful and easy-to-use way. In this blog, we’ll walk you through what each of these terms means, how they work, and how BladePipe puts them into action.
RAG: Retrieval-Augmented Generation
RAG (Retrieval-Augmented Generation) is an AI architecture that combines two things: retrieving information and generating answers. Instead of having the LLM reply based only on what it “remembers” from training, RAG lets it look up relevant info from external sources, like documents or databases, and then use that info to answer your question more accurately and relevantly.
So, it’s like giving your AI access to a private library every time you ask it something.
RAG's Strengths
- Up-to-date knowledge: The model doesn’t just rely on old training data. It can pull in fresh or domain-specific content.
- Private data support: You can keep sensitive content in-house, with enhanced security and customized services.
- Fewer made-up answers: Since it’s referencing real content, it’s less likely to hallucinate or guess.
RAG Workflow
- Build a knowledge base: Take your documents, slice them into smaller parts, and turn them into vectors. Store them in a vector database like PGVector.
- User asks a question: That question is also turned into a vector.
- Do a similarity search: The system finds the most relevant text chunks from the database.
- Feed results to the model: These chunks are added to the model prompt to help it generate a better answer.
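The four steps above can be sketched end-to-end in a few lines of Python. This is a toy in-memory version for intuition only: the `embed()` function here is a stand-in for a real embedding model, and a real deployment would store vectors in a database like PGVector rather than a Python list.

```python
import math

# Hypothetical embedding function -- in practice this calls an embedding
# model; here we fake it with a tiny bag-of-words vector over a toy vocab.
VOCAB = ["price", "refund", "shipping", "weather"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Build the knowledge base: chunk documents and store their vectors.
chunks = [
    "Our refund policy allows returns within 30 days.",
    "Standard shipping takes 3-5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2-3. Turn the question into a vector and run a similarity search.
question = "How long does shipping take?"
q_vec = embed(question)
best_chunk, _ = max(index, key=lambda item: cosine(q_vec, item[1]))

# 4. Feed the retrieved chunk to the model inside the prompt.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}\nAnswer:"
```

The structure is the same at any scale; only the embedding model and the vector store change.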
What is a Vector?
In RAG, turning text into vectors is step one. So what exactly is a vector?
Think of a vector as a way for a computer to understand meaning using numbers.
To make it easy to understand, let’s take the word “apple.” We humans know what it is from experience. But for a computer, it has to be turned into a vector through a process called embedding—like:
[0.12, 0.85, -0.33, ..., 0.07] (say, 768 dimensions)
Each number (dimension) represents a hidden meaning:
- The 12th dimension might say “is it a fruit?”
- The 47th dimension might say “is it food?”
- The 202nd dimension might say “is it a company?”
- The 588th dimension might say “is it red in color?”
Each dimension stands for a feature, and the number at each dimension is like the "score" for it. The higher the score, the more prominent the feature.
Based on the scores for all dimensions, every word or sentence gets a position in a semantic space, like a pin on a multi-dimensional map.
How do We Measure Similarity?
Let’s say we turn “apple” and “banana” into vectors. Even though they’re different words, their scores for many dimensions are similar, because they are similar semantically.
To make it understandable, we use three dimensions [category, edible or not, color] to measure the semantic similarity of the words: apple, banana and plane.
| Word | [Category, Edible or Not, Color] | Vector | Description |
|---|---|---|---|
| Apple | Food + Edible + Red | [1.0, 1.0, 0.8] | It's edible food, and the color is red |
| Banana | Food + Edible + Yellow | [1.0, 1.0, 0.3] | It's edible food, and the color is yellow |
| Plane | Transport Vehicle + Not Edible + Silver | [0.1, 0.1, 0.9] | It's a metallic-colored transport vehicle. It's not edible. |
When measuring semantic similarity, we don't compare the raw magnitudes of the numbers. Instead, we use something called cosine similarity to check how close the vectors' “directions” are. The smaller the angle between two vectors, the more similar their meanings.
cos(θ) = (A · B) / (||A|| × ||B||)
If they point in the same direction → very similar (cosine ≈ 1)
If they point in different directions → not so similar (cosine ≈ 0, or even negative)
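Using the three-dimensional toy vectors from the table above, the formula can be computed in plain Python. This is a sketch for intuition only; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: [category, edible or not, color]
apple = [1.0, 1.0, 0.8]
banana = [1.0, 1.0, 0.3]
plane = [0.1, 0.1, 0.9]

# Apple and banana point in nearly the same direction (cosine close to 1);
# apple and plane diverge much more.
print(cosine_similarity(apple, banana))
print(cosine_similarity(apple, plane))
```

Note that apple and plane still have some similarity: their color scores are both fairly high, which is exactly what the shared "color" dimension captures.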
Function Calling: Enable LLMs to Use Tools
Normally, LLMs just reply with text. But what if you ask “Can you check tomorrow’s weather in California?” The model might not know the answer—because it doesn’t have real-time access to weather data.
That’s where Function Calling comes in. It lets the model call external tools or APIs to get real answers.
With function calling, the model can:
- Decide if a task needs a tool (like a weather API, calculator, or database).
- Extract the right parameters from your question (like the city name “California” and time “tomorrow”).
- Generate a tool call—usually in JSON format.
- Pass the call to your system, which runs the function and sends the result back to the model.
- Reply with a natural-language answer based on the tool result.
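The application side of that loop can be sketched in a few lines. The tool call format and the get_weather function below are hypothetical; real APIs (OpenAI, Anthropic, etc.) each define their own wire format, but the dispatch logic looks much the same.

```python
import json

# Hypothetical tool implementation -- a real one would call a weather API.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "high_f": 75}

# Registry mapping tool names to functions.
TOOLS = {"get_weather": get_weather}

# Suppose the model emitted this JSON tool call (format is illustrative).
model_output = '{"tool": "get_weather", "args": {"city": "California"}}'

# Parse the call, look up the tool, and run it with the extracted args.
call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])

# 'result' would then be sent back to the model for the final reply.
```

The model never runs anything itself: it only emits structured text, and your application decides what to execute.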
Simple Example: Weather Query
Let’s say the user says:
“I’m going to California tomorrow. Can you check the weather for me?”
The model does this behind the scenes:
- Pull out the parameters: city “California” and time “tomorrow”
- Make a plan: use the `get_weather` tool
- Generate a tool call: output a call to `get_weather`, together with the necessary parameters
Prompt for Weather Query
To help you understand the principle and process of Function Calling more intuitively, here is a prompt template for demonstration. Just copy it into Cherry Studio, and you can see how the model analyzes user requests, extracts parameters, and generates tool-calling instructions.
{
"role": "AI Assistant",
"description": "You are an AI assistant. Your primary goal is to analyze user queries and respond in a structured JSON format. If a query requires a tool and all necessary parameters are present, prepare for tool use. If a query requires a tool but essential parameters are missing, you MUST ask the user for clarification. If no tool is needed, answer directly. Your entire output MUST be a single JSON object at the root level, strictly adhering to the 'response_format'. Ensure all required fields from the schema (like 'requires_tools') are always present in your JSON output.",
"capabilities": [
"Analyzing user queries for intent and necessary parameters.",
"Identifying when required parameters for a tool are missing.",
"Strictly following instructions to set 'requires_tools' to false and use 'direct_response' to ask *only* for the specific missing information required by the tool.",
"Remembering the initial query context (e.g., 'weather' intent) when a user provides previously missing information, and then proceeding to tool use if all tool requirements are met.",
"Preparing and executing tool calls when the query intent matches a tool and all its defined required parameters are satisfied. Do not ask for details beyond the tool's documented capabilities.",
"Formulating direct answers for non-tool queries or clarification questions.",
"Detailing internal reasoning in 'thought' and, if calling a tool, a step-by-step plan in 'plan' (as an array of strings)."
],
"instructions": [
"1. Analyze the user's query and any relevant preceding conversation turns to understand the full context and intent.",
"2. **Scenario 1: No tool needed (e.g., greeting, general knowledge).**",
" a. Set 'requires_tools': false.",
" b. Populate 'direct_response' with your answer.",
" c. Omit 'thought', 'plan', 'tool_calls'. Ensure 'requires_tools' and 'direct_response' are present.",
"3. **Scenario 2: Tool seems needed, but *required* parameters are missing (e.g., 'city' for weather).**",
" a. **You MUST set 'requires_tools': false.** (Because you cannot call the tool yet).",
" b. **You MUST populate 'direct_response' with a clear question to the user asking *only* for the specific missing information required by the tool's parameters.** (e.g., if 'city' is missing for 'get_weather', ask for the city. Do not ask for additional details not specified in the tool's parameters like 'which aspect of weather').",
" c. Your 'thought' should explain that information is missing, what that information is, and that you are asking the user for it.",
" d. **You MUST Omit 'plan' and 'tool_calls'.** Ensure 'requires_tools', 'thought', and 'direct_response' are present.",
" e. **Do NOT make assumptions** for missing required parameters.",
"4. **Scenario 3: Tool needed, and ALL required parameters are available (this includes cases where the user just provided a missing parameter in response to your clarification request from Scenario 2).**",
" a. Set 'requires_tools': true.",
" b. Populate 'thought' with your reasoning for tool use, acknowledging how all parameters were met (e.g., 'User confirmed city for weather query.').",
" c. Populate 'plan' (array of strings) with your intended steps (e.g., ['Initial query was for weather.', 'User specified city: Chicago.', 'Call get_weather tool for Chicago.']).",
" d. Populate 'tool_calls' with the tool call object(s).",
" e. **If the user just provided a missing parameter, combine this new information with the original intent (e.g., 'weather'). If all parameters for the relevant tool are now met, proceed DIRECTLY to using the tool. Do NOT ask for further, unrelated, or overly specific clarifications if the tool's defined requirements are satisfied by the information at hand.** (e.g., if tool gets 'current weather', don't ask 'which aspect of current weather').",
" f. Omit 'direct_response'. Ensure 'requires_tools', 'thought', 'plan', and 'tool_calls' are present.",
"5. **Schema and Output Integrity:** Your entire output *must* be a single, valid JSON object provided directly at the root level (no wrappers). This JSON object must strictly follow the 'response_format' schema, ensuring ALL non-optional fields defined in the schema for the chosen scenario are present (especially 'requires_tools'). Respond in the language of the user's query for 'direct_response'."
],
"tools": [
{
"name": "get_weather",
"description": "Gets current weather for a specified city. This tool provides a general overview of the current weather. It takes only the city name as a parameter and does not support queries for more specific facets of weather (e.g., asking for only humidity or only wind speed). Assume it provides a standard, comprehensive current weather report.",
"parameters": {
"city": {
"type": "string",
"description": "City name",
"required": true
}
}
}
],
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "MUST be false if asking for clarification on missing parameters (Scenario 2) or if no tool is needed (Scenario 1). True only if a tool is being called with all required parameters (Scenario 3)."
},
"direct_response": {
"type": "string",
"description": "The textual response to the user. Used when 'requires_tools' is false (Scenario 1 or 2). This field MUST be omitted if 'requires_tools' is true (Scenario 3).",
"optional": true // Optional because it's not present in Scenario 3
},
"thought": {
"type": "string",
"description": "Your internal reasoning. Explain parameter absence if asking for clarification, or tool choice if calling a tool. Generally present unless it's an extremely simple Scenario 1 case.",
"optional": true // Optional for very simple direct answers
},
"plan": {
"type": "array",
"items": {
"type": "string"
},
"description": "Your internal step-by-step plan (array of strings) when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false. Each item MUST be a string.",
"optional": true // Optional because it's not present in Scenario 1 or 2
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": { "type": "string", "description": "Name of the tool." },
"args": { "type": "object", "description": "Arguments for the tool." }
},
"required": ["tool", "args"]
},
"description": "Tool calls to be made. Used only when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false.",
"optional": true // Optional because it's not present in Scenario 1 or 2
}
}
},
"examples": [
// Example for Scenario 3 (direct tool use)
{
"query": "What is the weather like in California?",
"response": {
"requires_tools": true,
"thought": "User wants current weather for California. City is specified. Use 'get_weather'.",
"plan": ["Identified city: California", "Tool 'get_weather' is appropriate.", "Prepare 'get_weather' tool call."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "California"}}]
}
},
// Multi-turn example demonstrating Scenario 2 then Scenario 3
{
"query": "What is the weather like?", // Turn 1: User asks for weather, no city
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "The user asked for the weather but did not specify a city. The 'get_weather' tool requires a city name. Therefore, I must ask the user for the city.",
"direct_response": "Which city do you want to check the weather in?"
}
},
{
"query": "Chicago", // Turn 2: User provides city "Chicago"
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user previously asked for weather and has now provided the city 'Chicago'. All required parameters for 'get_weather' are met. The tool provides a general current weather report.",
"plan": ["Initial query was for weather.", "User specified city: Chicago", "Call 'get_weather' tool for Chicago."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
},
// Another multi-turn example (English)
{
"query": "What's the weather like today?", // Turn 1
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "User wants today's weather but no city provided. 'get_weather' tool needs the city parameter. I must ask for clarification.",
"direct_response": "Sure, I can get the weather for you! Which city are you interested in?"
}
},
{
"query": "London", // Turn 2
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user asked about weather and has now specified 'London'. The 'get_weather' tool takes city and provides current weather. Proceeding with tool call.",
"plan": ["Original intent was weather.", "City provided: London.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "London"}}]
}
},
// Example for Scenario 1 (direct answer)
{
"query": "Hi",
"response": {
"requires_tools": false,
"direct_response": "Hello! Is there anything I can help you with?"
// 'thought' is optional for very simple cases
}
},
// Example of user being too vague even after specifying city for a non-weather query, but we want to avoid this for weather.
// This example is more to ensure the AI understands the "don't ask for unnecessary details for the get_weather tool" instruction.
{
"query": "Tell me about Paris.",
"response": {
"requires_tools": false, // Assuming no general knowledge tool for this example, or AI should ask for specifics.
"thought": "User is asking for general information about Paris. This is too broad. I should ask for more specific interests if no general info tool is available.",
"direct_response": "Paris has many aspects, like history, attractions, food, etc. Which aspect are you interested in?"
}
},
{
"query": "Chicago's weather", // User explicitly states "Chicago's weather"
"response": { // AI should directly use the tool
"requires_tools": true,
"thought": "User explicitly asked for Chicago's weather. City is clear. The 'get_weather' tool is appropriate and provides a general current weather report.",
"plan": ["User query: Chicago's weather.", "City: Chicago.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
}
]
}
Multi-turn Conversation
- The user asks: “What’s the weather like?” Since the user doesn't specify a city, the model cannot call the tool directly and should ask the user for the city.
- The user replies: “Chicago”. The model now has the key information, extracts the parameters, and generates tool_calls. The application sees requires_tools: true and calls the corresponding tool function according to tool_calls.
- After the tool is executed, the result is returned to the model, which then summarizes it and responds to the user.
In this process, the LLM understands the user's intention through natural language: what task to complete and what information is needed. It extracts key parameters from the conversation. The application can then call the function based on these parameters to complete the task and return the execution result to the model, which generates the final response.
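The multi-turn flow above can be sketched as an application loop. Everything here is illustrative: chat_model is a hypothetical function standing in for an actual LLM call, and the reply dict follows the JSON schema from the prompt template shown earlier.

```python
import json

def run_turn(chat_model, tools, messages):
    """Drive one user turn, including any tool round-trip.

    chat_model(messages) -> dict parsed from the model's JSON output
    (hypothetical interface, matching the schema in the prompt above).
    tools -> mapping of tool name to Python function.
    """
    reply = chat_model(messages)

    if not reply.get("requires_tools"):
        # Scenario 1 or 2: a direct answer, or a clarification question
        # such as "Which city do you want to check the weather in?"
        return reply["direct_response"]

    # Scenario 3: execute each requested tool call and feed results back.
    for call in reply["tool_calls"]:
        result = tools[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})

    # Ask the model to summarize the tool results for the user.
    return chat_model(messages).get("direct_response", "")
```

The key point is that the loop is driven entirely by the requires_tools flag: the application never has to parse free-form text to decide what to do next.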
MCP: A Unified Way to Call Tools
So Function Calling lets AI call tools, but as you build more tools, things get messy. What if you have one tool for weather, one for sending emails, and one for searching GitHub? Each has its own format, API, and connection type. And how do you reuse the same tool system across different LLMs?
That’s where MCP comes in.
What is MCP?
MCP (Model Context Protocol) is an open standard introduced by Anthropic. It’s designed to help models and tools talk to each other in a more unified, flexible, and scalable way.
MCP allows a model to:
- Run multi-step tool chains (like: check weather → send email)
- Keep tool formats and parameters consistent
- Support different call types (HTTP requests, local plugins, etc.)
- Reuse tools across different models or systems
It doesn't replace Function Calling; rather, it makes it easier to organize, standardize, and scale tool usage inside AI systems.
MCP Core Components
MCP Client
- Asks the MCP Server for the list of available tools
- Sends tool call requests using HTTP or stdio
MCP Server
- Receives tool calls and runs the correct tool(s)
- Sends back structured results in a unified format
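Under the hood, MCP messages are JSON-RPC 2.0. A client asking the server for its tool list and then invoking one tool sends requests shaped roughly like the following. This is a sketch of the message structure only, not a full client; the get_weather tool name is illustrative, while tools/list and tools/call are method names from the MCP specification.

```python
import json

def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request, as MCP sends over stdio or HTTP."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# 1. Ask the server which tools it offers.
list_req = jsonrpc_request(1, "tools/list")

# 2. Call one of them with arguments (tool name is illustrative).
call_req = jsonrpc_request(2, "tools/call", {
    "name": "get_weather",
    "arguments": {"city": "Chicago"},
})
```

Because every server speaks this same envelope, a client written once can talk to a weather server, an email server, or a GitHub-search server without any per-tool glue code.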