3 posts tagged with "Data & AI"

GenAI Core Concepts Explained (RAG, Function Calling, MCP, AI Agent)

· 24 min read
Barry
Chief Technology Officer

Overview

Large language models (LLMs) are getting better and smarter by the day—but they still have their limits. For example, they can't easily access the latest information or do anything beyond producing text replies. Hallucination is also a persistent problem.

To fill these gaps, a few key ideas have started to take off: RAG (Retrieval-Augmented Generation), Function Calling, MCP (Model Context Protocol), and AI Agents. These tools help models stay up to date, connect with real-world tools, and even complete tasks for you.

BladePipe recently launched a new feature called RagApi, which brings these ideas together in a powerful and easy-to-use way. In this blog, we’ll walk you through what each of these terms means, how they work, and how BladePipe puts them into action.

RAG: Retrieval-Augmented Generation

RAG (Retrieval-Augmented Generation) is an AI architecture that combines two things: retrieving information and generating answers. Instead of having the LLM reply only based on what it “remembers” from training, RAG lets it look up relevant info from external sources—like documents or databases—and then use that info to answer your question, making the answer more accurate and relevant.

So, it’s like giving your AI access to a private library every time you ask it something.

RAG's Strengths

  • Up-to-date knowledge: The model doesn’t just rely on old training data. It can pull in fresh or domain-specific content.
  • Work with private data: Your documents stay under your control, giving you enhanced security and domain-specific answers.
  • Fewer made-up answers: Since it’s referencing real content, it’s less likely to hallucinate or guess.

RAG Workflow

  1. Build a knowledge base: Take your documents, slice them into smaller parts, and turn them into vectors. Store them in a vector database like PGVector.
  2. User asks a question: That question is also turned into a vector.
  3. Do a similarity search: The system finds the most relevant text chunks from the database.
  4. Feed results to the model: These chunks are added to the model prompt to help it generate a better answer.

What is a Vector?

In RAG, turning text into vectors is step one. So what exactly is a vector?

Think of a vector as a way for a computer to understand meaning using numbers.

To make it easy to understand, let's take the word “apple.” We humans know what it is from experience. But for a computer, the word has to be turned into a vector through a process called embedding—like:

[0.12, 0.85, -0.33, ..., 0.07] (say, 768 dimensions)

Each number (dimension) represents a hidden meaning:

  • The 12th dimension might say “is it a fruit?”
  • The 47th dimension might say “is it food?”
  • The 202nd dimension might say “is it a company?”
  • The 588th dimension might say “is it red in color?”

Each dimension stands for a feature, and the number at each dimension is like the "score" for it. The higher the score, the more prominent the feature.

Based on the scores for all dimensions, every word or sentence gets a position in a semantic space, like a pin on a multi-dimensional map.

How do We Measure Similarity?

Let’s say we turn “apple” and “banana” into vectors. Even though they’re different words, their scores for many dimensions are similar, because they are similar semantically.

To make it easy to understand, let's use three dimensions [category, edible or not, color] to measure the semantic similarity of three words: apple, banana, and plane.

Word   | [Category, Edible or Not, Color]        | Vector          | Description
Apple  | Food + Edible + Red                     | [1.0, 1.0, 0.8] | It's edible food and the color is red
Banana | Food + Edible + Yellow                  | [1.0, 1.0, 0.3] | It's edible food and the color is yellow
Plane  | Transport Vehicle + Not Edible + Silver | [0.1, 0.1, 0.9] | It's a metallic-colored transport vehicle. It's not edible.

When measuring semantic similarity, we don't compare the raw sizes of the numbers. Instead, we use something called cosine similarity to check how close the vectors' “directions” are. The smaller the angle between two vectors, the more similar their meanings.

cos(θ) = (A · B) / (||A|| × ||B||)

  • If they point in the same direction → very similar (cosine ≈ 1)
  • If they point in different directions → not so similar (cosine ≈ 0, or even negative)
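
As a quick sanity check on the formula, here is a tiny script (illustrative only) that computes cosine similarity for the example vectors above:

// Cosine similarity for the example vectors above (illustrative only).
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const apple = [1.0, 1.0, 0.8];
const banana = [1.0, 1.0, 0.3];
const plane = [0.1, 0.1, 0.9];

console.log(cosine(apple, banana).toFixed(2)); // ≈ 0.95 — semantically close
console.log(cosine(apple, plane).toFixed(2));  // ≈ 0.62 — much less similar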

Function Calling: Enable LLMs to Use Tools

Normally, LLMs just reply with text. But what if you ask “Can you check tomorrow’s weather in California?” The model might not know the answer—because it doesn’t have real-time access to weather data.

That’s where Function Calling comes in. It lets the model call external tools or APIs to get real answers.

With function calling, the model can:

  1. Decide if a task needs a tool (like a weather API, calculator, or database).
  2. Extract the right parameters from your question (like the city name “California” and time “tomorrow”).
  3. Generate a tool call—usually in JSON format.
  4. Pass the call to your system, which runs the function and sends the result back to the model.
  5. The model then replies with a natural-language answer based on the result.

Simple Example: Weather Query

Let’s say the user says:

“I’m going to California tomorrow. Can you check the weather for me?”

The model does this behind the scenes:

  1. Pull out the parameters: city “California” and time “tomorrow”
  2. Make a plan: use the get_weather tool
  3. Generate a tool call: output a call to get_weather, together with the necessary parameters.
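
The emitted tool call might look like the following JSON (the exact schema varies by model and framework; this matches the format used in the prompt template below):

{
  "tool": "get_weather",
  "args": {
    "city": "California"
  }
}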

Prompt for Weather Query

To help you understand the principle and process of Function Calling more intuitively, here is a prompt template for demonstration. Just copy it into Cherry Studio, and you can see how the model analyzes user requests, extracts parameters, and generates tool-calling instructions.

{
"role": "AI Assistant",
"description": "You are an AI assistant. Your primary goal is to analyze user queries and respond in a structured JSON format. If a query requires a tool and all necessary parameters are present, prepare for tool use. If a query requires a tool but essential parameters are missing, you MUST ask the user for clarification. If no tool is needed, answer directly. Your entire output MUST be a single JSON object at the root level, strictly adhering to the 'response_format'. Ensure all required fields from the schema (like 'requires_tools') are always present in your JSON output.",
"capabilities": [
"Analyzing user queries for intent and necessary parameters.",
"Identifying when required parameters for a tool are missing.",
"Strictly following instructions to set 'requires_tools' to false and use 'direct_response' to ask *only* for the specific missing information required by the tool.",
"Remembering the initial query context (e.g., 'weather' intent) when a user provides previously missing information, and then proceeding to tool use if all tool requirements are met.",
"Preparing and executing tool calls when the query intent matches a tool and all its defined required parameters are satisfied. Do not ask for details beyond the tool's documented capabilities.",
"Formulating direct answers for non-tool queries or clarification questions.",
"Detailing internal reasoning in 'thought' and, if calling a tool, a step-by-step plan in 'plan' (as an array of strings)."
],
"instructions": [
"1. Analyze the user's query and any relevant preceding conversation turns to understand the full context and intent.",
"2. **Scenario 1: No tool needed (e.g., greeting, general knowledge).**",
" a. Set 'requires_tools': false.",
" b. Populate 'direct_response' with your answer.",
" c. Omit 'thought', 'plan', 'tool_calls'. Ensure 'requires_tools' and 'direct_response' are present.",
"3. **Scenario 2: Tool seems needed, but *required* parameters are missing (e.g., 'city' for weather).**",
" a. **You MUST set 'requires_tools': false.** (Because you cannot call the tool yet).",
" b. **You MUST populate 'direct_response' with a clear question to the user asking *only* for the specific missing information required by the tool's parameters.** (e.g., if 'city' is missing for 'get_weather', ask for the city. Do not ask for additional details not specified in the tool's parameters like 'which aspect of weather').",
" c. Your 'thought' should explain that information is missing, what that information is, and that you are asking the user for it.",
" d. **You MUST Omit 'plan' and 'tool_calls'.** Ensure 'requires_tools', 'thought', and 'direct_response' are present.",
" e. **Do NOT make assumptions** for missing required parameters.",
"4. **Scenario 3: Tool needed, and ALL required parameters are available (this includes cases where the user just provided a missing parameter in response to your clarification request from Scenario 2).**",
" a. Set 'requires_tools': true.",
" b. Populate 'thought' with your reasoning for tool use, acknowledging how all parameters were met (e.g., 'User confirmed city for weather query.').",
" c. Populate 'plan' (array of strings) with your intended steps (e.g., ['Initial query was for weather.', 'User specified city: Chicago.', 'Call get_weather tool for Chicago.']).",
" d. Populate 'tool_calls' with the tool call object(s).",
" e. **If the user just provided a missing parameter, combine this new information with the original intent (e.g., 'weather'). If all parameters for the relevant tool are now met, proceed DIRECTLY to using the tool. Do NOT ask for further, unrelated, or overly specific clarifications if the tool's defined requirements are satisfied by the information at hand.** (e.g., if tool gets 'current weather', don't ask 'which aspect of current weather').",
" f. Omit 'direct_response'. Ensure 'requires_tools', 'thought', 'plan', and 'tool_calls' are present.",
"5. **Schema and Output Integrity:** Your entire output *must* be a single, valid JSON object provided directly at the root level (no wrappers). This JSON object must strictly follow the 'response_format' schema, ensuring ALL non-optional fields defined in the schema for the chosen scenario are present (especially 'requires_tools'). Respond in the language of the user's query for 'direct_response'."
],
"tools": [
{
"name": "get_weather",
"description": "Gets current weather for a specified city. This tool provides a general overview of the current weather. It takes only the city name as a parameter and does not support queries for more specific facets of weather (e.g., asking for only humidity or only wind speed). Assume it provides a standard, comprehensive current weather report.",
"parameters": {
"city": {
"type": "string",
"description": "City name",
"required": true
}
}
}
],
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "MUST be false if asking for clarification on missing parameters (Scenario 2) or if no tool is needed (Scenario 1). True only if a tool is being called with all required parameters (Scenario 3)."
},
"direct_response": {
"type": "string",
"description": "The textual response to the user. Used when 'requires_tools' is false (Scenario 1 or 2). This field MUST be omitted if 'requires_tools' is true (Scenario 3).",
"optional": true// Optional because it's not present in Scenario 3
},
"thought": {
"type": "string",
"description": "Your internal reasoning. Explain parameter absence if asking for clarification, or tool choice if calling a tool. Generally present unless it's an extremely simple Scenario 1 case.",
"optional": true// Optional for very simple direct answers
},
"plan": {
"type": "array",
"items": {
"type": "string"
},
"description": "Your internal step-by-step plan (array of strings) when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false. Each item MUST be a string.",
"optional": true// Optional because it's not present in Scenario 1 or 2
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": { "type": "string", "description": "Name of the tool." },
"args": { "type": "object", "description": "Arguments for the tool." }
},
"required": ["tool", "args"]
},
"description": "Tool calls to be made. Used only when 'requires_tools' is true (Scenario 3). Omit if 'requires_tools' is false.",
"optional": true// Optional because it's not present in Scenario 1 or 2
}
}
},
"examples": [
// Example for Scenario 3 (direct tool use)
{
"query": "What is the weather like in California?",
"response": {
"requires_tools": true,
"thought": "User wants current weather for California. City is specified. Use 'get_weather'.",
"plan": ["Identified city: California", "Tool 'get_weather' is appropriate.", "Prepare 'get_weather' tool call."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "California"}}]
}
},
// Multi-turn example demonstrating Scenario 2 then Scenario 3
{
"query": "What is the weather like?", // Turn 1: User asks for weather, no city
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "The user asked for the weather but did not specify a city. The 'get_weather' tool requires a city name. Therefore, I must ask the user for the city.",
"direct_response": "Which city do you want to check the weather in?"
}
},
{
"query": "Chicago", // Turn 2: User provides city "Chicago"
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user previously asked for weather and has now provided the city 'Chicago'. All required parameters for 'get_weather' are met. The tool provides a general current weather report.",
"plan": ["Initial query was for weather.", "User specified city: Chicago", "Call 'get_weather' tool for Chicago."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
},
// Another multi-turn example (English)
{
"query": "What's the weather like today?", // Turn 1
"response": { // AI asks for city (Scenario 2)
"requires_tools": false,
"thought": "User wants today's weather but no city provided. 'get_weather' tool needs the city parameter. I must ask for clarification.",
"direct_response": "Sure, I can get the weather for you! Which city are you interested in?"
}
},
{
"query": "London", // Turn 2
"response": { // AI uses tool (Scenario 3)
"requires_tools": true,
"thought": "The user asked about weather and has now specified 'London'. The 'get_weather' tool takes city and provides current weather. Proceeding with tool call.",
"plan": ["Original intent was weather.", "City provided: London.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "London"}}]
}
},
// Example for Scenario 1 (direct answer)
{
"query": "Hi",
"response": {
"requires_tools": false,
"direct_response": "Hello! Is there anything I can help you with?"
// 'thought' is optional for very simple cases
}
},
// Example of user being too vague even after specifying city for a non-weather query, but we want to avoid this for weather.
// This example is more to ensure the AI understands the "don't ask for unnecessary details for the get_weather tool" instruction.
{
"query": "Tell me about Paris.",
"response": {
"requires_tools": false, // Assuming no general knowledge tool for this example, or AI should ask for specifics.
"thought": "User is asking for general information about Paris. This is too broad. I should ask for more specific interests if no general info tool is available.",
"direct_response": "Paris has many aspects, like history, attractions, food, etc. Which aspect are you interested in?)"
}
},
{
"query": "Chicago's weather", // User explicitly states "Chicago's weather"
"response": { // AI should directly use the tool
"requires_tools": true,
"thought": "User explicitly asked for Chicago's weather. City is clear. The 'get_weather' tool is appropriate and provides a general current weather report.",
"plan": ["User query: Chicago's weather.", "City: Chicago.", "Call 'get_weather' tool."],
"tool_calls": [{"tool": "get_weather", "args": {"city": "Chicago"}}]
}
}
]
}

Multi-turn Conversation

  • The user asks: “What’s the weather like?” Since the user doesn't specify the city, the model cannot call the tool directly. The model should ask the user for the city.

  • The user replies: "Chicago". The model obtains the key information, extracts the parameters, and generates tool_calls. The application recognizes requires_tools: true and calls the corresponding tool function according to tool_calls.

  • After the tool is executed, the results are returned to the model, which then summarizes and responds to the user based on the results.

In this process, the LLM understands the user's intention through natural language: what task to complete and what information is needed. It extracts key parameters from the conversation. The application can then call the function based on these parameters to complete the task and return the execution result to the model, which generates the final response.
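
Below is a minimal sketch of the application-side loop described above—chat() and runTool() are placeholder helpers for your model client and tool layer, not a specific SDK:

// Illustrative function-calling loop; chat() and runTool() are placeholders.
async function handleTurn(messages, chat, runTool) {
  const reply = JSON.parse(await chat(messages)); // model answers in the JSON schema above

  if (!reply.requires_tools) {
    // Scenario 1 or 2: a direct answer, or a question asking for the missing parameter
    return reply.direct_response;
  }

  // Scenario 3: execute each tool call and hand the results back to the model
  const results = [];
  for (const call of reply.tool_calls) {
    results.push(await runTool(call.tool, call.args)); // e.g. get_weather({ city: "Chicago" })
  }
  messages.push({ role: "tool", content: JSON.stringify(results) });
  return chat(messages); // the model summarizes the tool output for the user
}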

MCP: A Unified Way to Call Tools

So, Function Calling helps AI call tools—but as you build more tools, things get messy. What if you have multiple tools, like one for weather, one for sending emails, and one for searching GitHub? Each has different formats, APIs, and connection types. And how do you reuse the same tool system across different LLMs?

That’s where MCP comes in.

What is MCP?

MCP (Model Context Protocol) is an open standard introduced by Anthropic. It’s designed to help models and tools talk to each other in a more unified, flexible, and scalable way.

MCP allows a model to:

  • Run multi-step tool chains (like: check weather → send email)
  • Keep tool formats and parameters consistent
  • Support different call types (HTTP requests, local plugins, etc.)
  • Reuse tools across different models or systems

It doesn't replace Function Calling, but makes it easier to organize, standardize, and scale tool usage inside AI systems.

MCP Core Components

MCP Client

  • Ask MCP Server for the available tools list
  • Send tool call requests using HTTP or stdio

MCP Server

  • Receive tool calls and run the correct tool(s)
  • Send back structured results in a unified format

Methods to Call an MCP Server

HTTP (StreamableHttp)

The MCP Server runs as a web service. The following interface is exposed:

  • /mcp: receives tool calls and lists available tools
  • Supports the Event Stream and JSON-RPC protocols

Example: Call a weather service using HTTP:

cat > streamable_weather.mjs << 'EOF'
#!/usr/bin/env node
import express from "express";
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";

const app = express();
app.use(express.json());

function getServer() {
const server = new McpServer({
name: "Weather",
version: "1.0.0"
});

server.resource(
"get_weather",
new ResourceTemplate("weather://{city}", { list: undefined }),
async (uri, { city }) => ({
contents: [{
uri: uri.href,
text: `Resource weather for ${city}: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h`
}]
})
);

server.tool(
"get_weather",
{ city: z.string() },
async ({ city }) => ({
content: [{ type: "text", text: `Tool weather for ${city}: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h` }]
})
);

server.prompt(
"get_weather",
{ city: z.string() },
({ city }) => ({
messages: [{
role: "user",
content: {
type: "text",
text: `Please tell me the weather conditions of ${city}`
}
}]
})
);
return server;
}

app.post("/mcp", async (req, res) => {
try {
const server = getServer();
const transport = new StreamableHTTPServerTransport({
sessionIdGenerator: undefined,
});

res.on("close", () => {
console.log("Request closed");
transport.close();
server.close();
});

await server.connect(transport);
await transport.handleRequest(req, res, req.body);
} catch (error) {
console.error("Error handling MCP request:", error);
if (!res.headersSent) {
res.status(500).json({
jsonrpc: "2.0",
error: {
code: -32603,
message: "Internal server error",
},
id: null,
});
}
}
});

app.get("/mcp", (req, res) => {
console.log("Received GET MCP request");
res.status(405).json({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Method not allowed.",
},
id: null,
});
});

app.delete("/mcp", (req, res) => {
console.log("Received DELETE MCP request");
res.status(405).json({
jsonrpc: "2.0",
error: {
code: -32000,
message: "Method not allowed.",
},
id: null,
});
});

const PORT = process.env.PORT || 30001;
app.listen(PORT, () => {
console.log(
`MCP Stateless Streamable HTTP Server listening on http://localhost:${PORT}/mcp`
);
});

EOF
# Install dependencies
npm install express @modelcontextprotocol/sdk zod

# Start the service
node streamable_weather.mjs

# Get the tool list
curl -N -X POST http://localhost:30001/mcp \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
-d '{
"jsonrpc":"2.0",
"id":1,
"method":"tools/list",
"params":{}
}'

# > Return tools
event: message
data: {"result":{"tools":[{"name":"get_weather","inputSchema":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}}]},"jsonrpc":"2.0","id":1}

# Execute tool calls
curl -N -X POST http://localhost:30001/mcp \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
-d '{
"jsonrpc":"2.0",
"id":2,
"method":"tools/call",
"params":{
"name":"get_weather",
"arguments":{ "city":"California" }
}
}'

# > Return tool call results
event: message
data: {"result":{"content":[{"type":"text","text":"Tool weather for California: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2mile/h"}]},"jsonrpc":"2.0","id":2}

Stdio (Local Plug-in)

Stdio mode is suitable for plug-ins running locally. The model talks to the MCP Server via standard input/output. It doesn't need internet access, which makes it a good fit for on-premise or secure environments.

Example: Call a weather service using Stdio:

cat > weather_stdio.mjs << 'EOF'
#!/usr/bin/env node
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
name: "Weather",
version: "1.0.0"
});

server.resource(
"get_weather",
new ResourceTemplate("weather://{city}", { list: undefined }),
async (uri, { city }) => ({
contents: [{
uri: uri.href,
text: `Resource weather for ${city}: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h`
}]
})
);

server.tool(
"get_weather",
{ city: z.string() },
async ({ city }) => ({
content: [{ type: "text", text: `Tool weather for ${city}: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h` }]
})
);

server.prompt(
"get_weather",
{ city: z.string() },
({ city }) => ({
messages: [{
role: "user",
content: {
type: "text",
text: `Please tell me the weather conditions of ${city}`
}
}]
})
);

const transport = new StdioServerTransport();
await server.connect(transport);
EOF
# Get the tool list
printf '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}\n' | node weather_mcp.mjs

# > Return tools
{"result":{"tools":[{"name":"get_weather","inputSchema":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}}]},"jsonrpc":"2.0","id":1}

# Execute tool calls: call get_weather
printf '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"get_weather","arguments":{"city":"California"}}}\n' | node weather_mcp.mjs

# > Return tool call results
{"result":{"content":[{"type":"text","text":"Tool weather for California: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h"}]},"jsonrpc":"2.0","id":4}

Multi-turn Interaction with MCP

  • A user asks: "What’s the weather in California?"

  • The model knows it should use get_weather, and it generates the following instruction:
{
  "tool": "get_weather",
  "args": {
    "city": "California"
  }
}
  • The user's program passes the instruction to the MCP Server. The MCP Server runs the tool and sends back a structured result (see the example after this list).

  • The user's program extracts the text field, content.text, from result.content, and turns it into a friendly answer:

Tomorrow California will be sunny, with a highest temperature of 59°F and a breeze of 2 mile/h. If you want to know more, just ask.
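
For reference, the structured result the MCP Server sends back has the same shape as the tool-call output in the Stdio example above, roughly:

{
  "result": {
    "content": [
      { "type": "text", "text": "Tool weather for California: Tomorrow will be sunny, with a maximum of 59°F and a breeze of 2 mile/h" }
    ]
  },
  "jsonrpc": "2.0",
  "id": 1
}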

MCP provides a unified and scalable execution framework for connecting models to the outside world, with the following advantages:

  • Supports multiple communication methods (HTTP, Stdio)
  • Tools are discoverable, reusable, and consistent
  • Works across different models, platforms, and languages
  • Easy to deploy locally or remotely

Together with Function Calling, MCP gives you a standard pipeline to build an AI Agent—clean, organized, and easy to scale.

AI Agent: Smart Digital Assistant

So far we’ve talked about how models can:

  • Look up info (with RAG),
  • Call tools (with Function Calling),
  • Organize those tools (with MCP).

Now imagine putting all that together into one system that can understand a task, plan steps, use tools, and get things done — that’s an AI Agent.

An AI Agent is more than a chatbot. It’s a smart digital assistant that can:

  • Understand your goal: You say: “Help me check California’s weather and email it to my team.”
  • Break the task into steps: First check weather → then write email → then send.
  • Use the right tools: Call get_weather API, then trigger email service.
  • Remember context: It keeps track of the conversation or project so far.
  • Handle errors & retry: If a tool fails, it can try again or ask you for missing info.

It’s like hiring a junior assistant who learns fast, handles tools, and follows instructions—just in software form. The use cases are growing fast. As long as you define what the agent should do—and give it tools—it can do real work for you.
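
Conceptually, an agent is little more than a loop around a model and a tool layer. Here is a minimal sketch, assuming placeholder chat() and runTool() helpers (for example, backed by an MCP client):

// Very simplified agent loop (illustrative; not a specific framework).
async function runAgent(goal, chat, runTool, maxSteps = 5) {
  const memory = [{ role: "user", content: goal }]; // keeps the conversation context

  for (let step = 0; step < maxSteps; step++) {
    const action = JSON.parse(await chat(memory)); // model plans the next step

    if (action.done) return action.answer; // goal reached

    try {
      // e.g. get_weather first, then send_email
      const result = await runTool(action.tool, action.args);
      memory.push({ role: "tool", content: JSON.stringify(result) });
    } catch (err) {
      // handle errors & retry: let the model see the failure and decide what to do next
      memory.push({ role: "tool", content: `Tool failed: ${err.message}` });
    }
  }
  return "Stopped after reaching the step limit.";
}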

Summary

Concept          | Nature                            | Data Sources                | Use Cases                                        | Real Example
RAG              | Retrieval + Generation            | Knowledge bases/docs        | Industry-specific Q&A, dynamic knowledge updates | Enterprise knowledge base, customer service robot
Function Calling | Call external functions           | APIs/databases              | Real-time data interaction, automated tasks      | Weather query, order processing
MCP              | Standard tool call protocol       | Platforms (e.g. GitHub)     | Cross-model and cross-platform collaboration     | Intelligent workflow (such as checking weather + sending email)
AI Agent         | Autonomous planning and execution | Combined (RAG + tool calls) | Automation of complex tasks                      | Personal assistant

BladePipe RAG

BladePipe recently launched RagApi, a plug-and-play service that brings together RAG architecture and MCP protocol to build smart, real-time RAG services with external tool support.

What’s great? It’s designed to be simple, powerful, and friendly to non-developers. No code is needed.

Workflow

The API exposed by the BladePipe RagApi service follows the OpenAI API format. RagApi turns your enterprise data into a smart assistant in two main steps:

DataJob 1: Chunking and Embedding (File → PGVector)

  1. Data Collection and Preparation
    The enterprise knowledge base may include markdown, txt, databases, internal documents, etc. Users create a DataJob in BladePipe to perform text embedding, configuring information such as data sources, models, and target tables.
  2. Data Chunking and Embedding
    BladePipe automatically processes the original documents and generates vectors, which are saved in PostgreSQL using the pgvector extension (e.g. in a column like __vector).

DataJob 2: RAG API Building (PGVector → RagApi)

  1. Query Embedding and Smart Query Tweaks
    BladePipe turns the user's query into a vector using the same model. In this process, query tweaks can be applied:
  • QUERY_COMPRESS: Shrink long or vague queries to their core meaning, making the vector similarity search more accurate.
  • QUERY_EXTEND: Expand queries with synonyms or related terms to widen the similarity search scope.
  2. Vector Search and Knowledge Selection
    Do a vector similarity search in the vector database to find the most relevant knowledge chunks. You can use KNOWLEDGE_SELECT to filter knowledge. Multi-knowledge-base routing is supported: BladePipe picks the best matching pieces across tables based on semantic relevance.
  3. Prompt Building
    BladePipe constructs a complete context based on the configured Prompt template, combined with the user's query and the search results.
  4. LLM Reasoning
    The assembled Prompt is fed into the configured Chat model (such as qwq-plus, gpt-4o, etc.) to generate the final answer. The interface follows the OpenAI format, so it can be connected directly to existing applications (see the request sketch after this list).
  5. Call Tools with MCP (Optional)
    To do more than retrieve info, we need MCP. With MCP integration, RagApi can use tools to finish tasks (e.g. query GitHub PR status, call the company's API). It supports standardized MCP tools (HTTP or local via stdio). External tools are called via Function Calling, and the model replies with the final output.

The specific procedures are shown in the following blogs:

  1. Create and Store Embeddings in PGVector
  2. Create RAG API with PGVector

Once the two DataJobs are running, you have a RAG service that can answer industry-specific questions and automate tasks based on MCP. The OpenAI-compatible RAG API is an enhanced version of the existing LLM APIs, with no need to change client code.

We can check how it performs using Cherry Studio. Cherry Studio is compatible with the OpenAI interface and is well suited for API debugging, context debugging, and model verification.

Try It Out in Cherry Studio

  1. Open Cherry Studio, and click the Settings icon in the bottom left corner.
  2. In Model Provider, search for OpenAI and configure it as follows:

  • Model Name: BP_RAG

  3. Go back to the chat page. Click Add Assistant > Default Assistant.
  4. Right click Default Assistant > Edit Assistant > Model Settings, and select the model added in Step 2.

  5. Go back to the chat page. Enter How to create an Incremental DataJob in BladePipe? RagApi will generate the answer based on the vector search and the LLM.

Tool Call (MCP Example)

If you configure MCP functions (e.g. web crawling, GitHub queries), the model can automate tool calls.

Follow the steps:

  1. Go to BladePipe, click Functions > Modify DataJob Params in the upper right corner of the DataJob details page.
  2. Select Target tab, find the parameter mcpServers, and paste the following configuration into it.
  3. Click Save in the upper right corner to confirm the parameter modification.
  4. Click Submit, and restart the DataJob.
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-github"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_TOKEN>"
      }
    },
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": [
        "-y",
        "firecrawl-mcp"
      ],
      "env": {
        "FIRECRAWL_API_KEY": "<YOUR_API_KEY_HERE>"
      }
    }
  }
}

BladePipe automates multi-turn tool calls through MCP, summarizes the results, and returns the final response.

Wrapping Up

RAG, Function Calling, MCP, and AI Agents aren’t just buzzwords—they’re the core building blocks of real, usable AI systems.

With RagApi, BladePipe wraps these powerful tools into an easy-to-use, no-code solution that brings intelligent Q&A and automated tool usage into your apps instantly. With BladePipe, everyone can create their own RAG service. It’s AI that knows, thinks, and gets things done.

Build A RAG Chatbot with OpenAI - A Beginner's Guide

· 6 min read
Barry
Chief Technology Officer

In a previous article, we explained key GenAI concepts like RAG, Function Calling, MCP, and AI Agents. Now the question is: how do we go from concepts to practice?

Currently, you can find plenty of RAG building tutorials online, but most of them are based on frameworks like LangChain, which still have a learning curve for beginners.

BladePipe, as a data integration platform, already supports the access to and processing of multiple data sources. This gives it a natural edge in setting up the semantic search foundations for a RAG system. Recently, BladePipe launched RagApi, which wraps up vector search and Q&A capabilities into a plug-and-play API service. With just two DataJobs in BladePipe, you can have your own RAG service—no coding required.

Why BladePipe RagApi?

Compared to traditional RAG setups, which often involve lots of manual work, BladePipe RagApi offers several unique benefits:

  • Two DataJobs for a RAG service: One to import documents, and one to create the API.
  • Zero-code deployment: No need to write any code, just configure.
  • Adjustable parameters: Adjust vector top-K, match threshold, prompt templates, model temperature, etc.
  • Multi-model and platform compatibility: Support DashScope (Alibaba Cloud), OpenAI, DeepSeek, and more.
  • OpenAI-compatible API: Integrate it directly with existing Chat apps or tools with no extra setup.

Demo: A Q&A Service for BladePipe Docs

We’ll use BladePipe’s own documentation as a knowledge base to create a RAG-based Q&A service.

Here’s what we’ll need:

  • BladePipe – to build and manage the RagApi service
  • PostgreSQL – as the vector database
  • Embedding model – OpenAI text-embedding-3-large
  • Chat model – OpenAI GPT-4o

Here’s the overall workflow:

Step-by-Step Setup

Install BladePipe

Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.

Prepare Your Resources

  1. Log in to the OpenAI API platform and create an API key.
  2. Install a local PostgreSQL instance:
#!/bin/bash

# create file docker-compose.yml
cat <<EOF > docker-compose.yml
version: "3"
services:
  db:
    container_name: pgvector-db
    hostname: 127.0.0.1
    image: pgvector/pgvector:pg16
    ports:
      - 5432:5432
    restart: always
    environment:
      - POSTGRES_DB=api
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
EOF

# Start docker-compose automatically
docker-compose up --build

# Access PostgreSQL
docker exec -it pgvector-db psql -U root -d api
  3. Create a privileged user and log in.
  4. Switch to the target schema where you need to create tables (like public).
  5. Run the following SQL to enable vector capability:
CREATE EXTENSION IF NOT EXISTS vector;

Add DataSources

Log in to the BladePipe Cloud. Click DataSource > Add DataSource.

Add Files:

Select Self Maintenance > SshFile. You can set extra parameters.

  • Address: Fill in the machine IP where the files are stored and SSH port (default 22).
  • Account & Password: Username and password of the machine.
  • Parameter fileSuffixArray: set to .md to include markdown files.
  • Parameter dbsJson: Copy the default value and modify the schema value (the root path where the target files are located):
[
  {
    "db": "cc_virtual_fs",
    "schemas": [
      {
        "schema": "/tmp/cloudcanal-doc-v2/locales",
        "tables": []
      }
    ]
  }
]

Add the Vector Database:

Choose Self Maintenance > PostgreSQL, then connect.

Add an LLM:
Choose Independent Cloud Platform > Manually Fill > OpenAI, and fill in the API key.

Add RagApi Service:

Choose Self Maintenance > RagApi.

  • Address: Set host to localhost and port to 18089.
  • API Key: Create your own API key for later use.

DataJob 1: Vectorize Your Data

  1. Go to DataJob > Create DataJob.
  2. Choose source: SshFile, target: PostgreSQL, and test the connection.
  3. Select Full Data for DataJob Type. Keep the specification as default (2 GB).
  4. In the Tables page,
    1. Select the markdown files you want to process.
    2. Click Batch Modify Target Names > Unified table name, and fill in the table name (e.g. vector_store).
  5. In the Data Processing page,
    1. Click Set LLM > OpenAI, and select the instance and the embedding model (text-embedding-3-large).
    2. Click Batch Operation > LLM embedding. Select the fields for embedding, and check Select All.
  6. In the Creation page, click Create DataJob.

DataJob 2: Build RagApi Service

  1. Go to DataJob > Create DataJob.
  2. Choose source: PostgreSQL (with vectors stored), target: RagApi, and test the connection.
  3. Select Incremental for DataJob Type. Keep the specification as default (2 GB).
  4. In the Tables page, select the vector table(s).
  5. In the Data Processing page, click Set LLM:
    1. Embedding LLM: Select OpenAI and the embedding model (e.g. text-embedding-3-large). Note: Make sure the vector dimensions in PostgreSQL match the embedding model.
    2. Chat LLM: Select OpenAI and the chat model (e.g. gpt-4o).
  6. In the Creation page, click Create DataJob to finish the setup.

Test

You can test the RagApi with CherryStudio, a visual tool that supports OpenAI-compatible APIs.

  1. Open CherryStudio, and click the Settings icon in the bottom left corner.
  2. Under Model Provider, search for OpenAI and configure it with the RagApi address (http://localhost:18089) and the API key you created earlier.
  3. Back on the chat page:
    • Click Add Assistant > Default Assistant.
    • Right click Default Assistant > Edit Assistant > Model Settings, and choose BP_RAG as the default model.
  4. Now try asking: How to create an incremental DataJob in BladePipe? RagApi will search your vector database and generate a response using the chat model.

Wrapping Up

With just a few steps, we’ve built a fully functional RagApi service from scratch—vectorized the data, connected it to a vector DB, configured the LLMs, generated prompts, and deployed an OpenAI-compatible API.

With BladePipe, teams can quickly build Q&A services based on outside knowledge without writing any code. It's a powerful yet accessible way to tap into GenAI for your own data.

Build a Local RAG Using Ollama, PostgreSQL and BladePipe

· 8 min read

Retrieval-Augmented Generation (RAG) is becoming increasingly common in enterprise applications. Unlike lightweight Q&A systems designed for personal users, enterprise RAG solutions must be reliable, controllable, scalable, and most importantly—secure.

Many companies are cautious about sending internal data to public cloud-based models or vector databases due to the risk of sensitive information leakage. For industries with strict compliance needs, this is often a dealbreaker.

To address these challenges, BladePipe now supports building local RAG services with Ollama, enabling enterprises to run intelligent RAG services entirely within their own infrastructure. This article walks you through building a fully private, production-ready RAG application—without writing any code.

What is an Enterprise-Grade RAG Service?

Enterprise-grade RAG emphasizes end-to-end integration, data control, and tight alignment with business systems. The goal isn’t just smart Q&A. It brings automation and intelligence that genuinely boost business.

Compared to hobby or research-focused RAG setups, enterprise systems have four key traits:

  • Fully private stack: All components must run locally or in a private cloud. No data leaves the enterprise boundary.
  • Diverse data sources: Beyond plain text files. Databases and more formats are supported.
  • Incremental data syncing: Business data updates constantly. RAG indexes must stay in sync automatically.
  • Integrated tool calling (MCP-like capabilities): Retrieval and generation are only part of the story. Tools like SQL query, function calls, or workflow execution must be supported.

Introducing BladePipe RagApi

BladePipe’s RagApi encapsulates both vector search and LLM-based Q&A capabilities and supports the MCP protocol. It’s designed to help everyone quickly launch their own RAG services.

Compared to the traditional way to build RAG services, RagApi's key advantages include:

  • Two DataJobs for a RAG service: Import documents + publish API.
  • Zero-code deployment: Everything is configurable—no development needed.
  • Adjustable parameters: Adjust vector top-K, match threshold, prompt templates, model temperature, etc.
  • Multi-model and platform support: Support DashScope (Alibaba Cloud), OpenAI, DeepSeek, and more.
  • OpenAI-compatible API: Easily integrate into your existing chat UI or toolchain.

Demo

Here’s how to build a fully private, secure RAG service using:

  • Ollama for local model reasoning and embedding.
  • PostgreSQL for local vector storage.
  • BladePipe RagApi for building and managing the RagApi service.

The overall workflow looks like this:

Preparation

Run Ollama Locally

Ollama allows you to deploy LLMs on your local machine. It will be used for both embedding and reasoning.

  1. Download Ollama from https://ollama.com/download
  2. After installation, run the following command to pull and run a suitable model for embedding and reasoning, such as deepseek-r1. Note: Large models may require significant hardware resources.
ollama run deepseek-r1

Set Up PGVector

  1. Install Docker (Skip if already installed): For different operating systems, refer to the following steps for installation:
  • MacOS: Refer to the official installation doc: Docker Desktop for Mac.
  • CentOS / RHEL: Refer to the script below.
## centos / rhel
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

sudo yum install -y docker-ce-20.10.9-3.* docker-ce-cli-20.10.9-3.*

sudo systemctl start docker

echo '{"exec-opts": ["native.cgroupdriver=systemd"]}' | sudo tee /etc/docker/daemon.json

sudo systemctl restart docker
  • Ubuntu: Refer to the script below.
## ubuntu
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

sudo apt-get update

sudo apt-get -y install docker-ce=5:20.10.24~3-0~ubuntu-* docker-ce-cli=5:20.10.24~3-0~ubuntu-*

sudo systemctl start docker

echo '{"exec-opts": ["native.cgroupdriver=systemd"]}' | sudo tee /etc/docker/daemon.json

sudo systemctl restart docker
  2. Start PostgreSQL + pgvector Container Service: Execute the following command to deploy the PostgreSQL environment in one go:
cat <<'EOF' > init_pgvector.sh
#!/bin/bash

# create docker-compose.yml
cat <<YML > docker-compose.yml
version: "3"
services:
  db:
    container_name: pgvector-db
    hostname: 127.0.0.1
    image: pgvector/pgvector:pg16
    ports:
      - 5432:5432
    restart: always
    environment:
      - POSTGRES_DB=api
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
YML

# Start container service (run in background)
docker-compose up --build -d

# Wait for container to start, then enter database and enable vector extension
echo "Waiting for container to start..."
sleep 5

docker exec -it pgvector-db psql -U root -d api -c "CREATE EXTENSION IF NOT EXISTS vector;"
EOF

# Grant execution permissions and run the script
chmod +x init_pgvector.sh
./init_pgvector.sh

After execution, local PostgreSQL will automatically enable the pgvector extension, ready to store document embeddings securely on-prem.

Deploy BladePipe (Enterprise)

Follow the installation guide to download BladePipe (Enterprise).

RAG Building

Add DataSources

Log in to BladePipe. Click DataSource > Add DataSource.

Files (SshFile):
Select Self Maintenance > SshFile. You can set extra parameters.

  • Address: Fill in the machine IP where the files are stored and SSH port (default 22).
  • Account & Password: Username and password of the machine.
  • Parameter fileSuffixArray: set to .md to include markdown files.
  • Parameter dbsJson: Copy the default value and modify the schema value (the root path where the target files are located):
[
  {
    "db": "cc_virtual_fs",
    "schemas": [
      {
        "schema": "/Users/zoe/cloudcanal-doc-v2/locales",
        "tables": []
      }
    ]
  }
]

Vector Database (PostgreSQL):
Choose Self Maintenance > PostgreSQL.
Configuration Details:

  • Address: localhost:5432
  • Account: root
  • Password: 123456

LLM (Ollama):
Choose Self Maintenance > Ollama.
Configuration Details:

  • Address: localhost:11434
  • Parameter llmEmbedding:
{
  "deepseek-r1": {
    "dimension": 4096
  }
}
  • Parameter llmChat:
{
  "deepseek-r1": {
    "temperature": 1,
    "topP": 0.9,
    "showReasoning": false
  }
}
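
Optionally, before wiring Ollama into BladePipe, you can sanity-check that it is reachable and that the model has been pulled. This quick check uses Ollama's standard HTTP API on port 11434 (Node 18+ for the built-in fetch):

// Quick check that Ollama is serving locally and deepseek-r1 is available.
const res = await fetch("http://localhost:11434/api/tags"); // lists locally pulled models
const { models } = await res.json();
console.log(models.map(m => m.name)); // should include something like "deepseek-r1:latest"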

RagApi Service (BladePipe):
Choose Self Maintenance > RagApi.

  • Address: Set host to localhost and port to 18089.
  • API Key: Customize a string (e.g. my-bp-rag-key), used for authentication when calling RagApi later.

DataJob 1: Vectorize the Docs

  1. Go to DataJob > Create DataJob.
  2. Choose source: SshFile, target: PostgreSQL, and test the connection.
  3. Select Full Data for DataJob Type. Keep the specification as default (2 GB).
  4. In the Tables page,
    1. Select the markdown files you want to process.
    2. Click Batch Modify Target Names > Unified table name, and fill in the table name (e.g. knowledge_base).
    3. Click Set LLM > Ollama, and select the instance and the embedding model.
  5. In the Data Processing page, click Batch Operation > LLM embedding. Select the fields for embedding, and check Select All.
  6. In the Creation page, click Create DataJob.

DataJob 2: Build RagApi Service

  1. Go to DataJob > Create DataJob.
  2. Choose source: PostgreSQL (with vectors stored), target: RagApi, and test the connection.
  3. Select Incremental for DataJob Type. Keep the specification as default (2 GB).
  4. In the Tables page, select the vector table(s). Then, click Set LLM, and choose Ollama as the Embedding LLM and Chat LLM.
  5. In the Creation page, click Create DataJob to finish the setup.
  6. Perform a simple test using the following command:
curl http://localhost:18089/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-bp-rag-key" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'

Test

You can test the RagApi with CherryStudio, a visual tool that supports OpenAI-compatible APIs.

  1. Open CherryStudio, and click the Settings icon in the bottom left corner.
  2. Under Model Provider, search for OpenAI and configure it with the RagApi address (http://localhost:18089) and the API key you set (e.g. my-bp-rag-key).
  3. Back on the chat page:
    1. Click Add Assistant > Default Assistant.
    2. Right click Default Assistant > Edit Assistant > Model Settings, and choose BP_RAG as the default model.
  4. Now try asking: What privileges does BladePipe require for a source MySQL? RagApi will search your vector database and generate a response using the chat model.

Final Thoughts

Enterprise-grade RAG services prioritize data privacy and control. By combining BladePipe with Ollama, you can easily achieve a fully private RAG service deployment, creating a truly reliable enterprise-grade RAG solution that does not depend on public networks.
