Connecting Custom AI Agents to OpenWebUI: Auth, Latency, and API Design

Why OpenWebUI as the Interface Layer

OpenWebUI is an open-source, self-hostable chat interface that supports the OpenAI API spec. That compatibility means any system that implements

— including your custom agent — can be plugged into OpenWebUI as a backend model.

For VirtuAI, this was the integration path: deploy the agent as a FastAPI service exposing an OpenAI-compatible API, register it in OpenWebUI as a custom model, and users get a polished chat interface without building one from scratch.

The OpenAI-Compatible API Contract

Your agent needs to implement two endpoints:

The chat completions request shape:

Implementing the Endpoint with FastAPI

Streaming: The UX Non-Negotiable

Non-streaming responses for agentic tasks create terrible UX — the user sees nothing for 10–30 seconds while the agent runs tool calls.

OpenAI's streaming format uses Server-Sent Events with

lines:

For agentic workflows where tool calls happen before generation, streamstatus updates during the tool execution phase: "Querying VM fleet...", "Analysing utilisation metrics...". This keeps the user informed and dramatically improves perceived performance.

Authentication Architecture

OpenWebUI → custom agent authentication has two layers:

Layer 1: OpenWebUI to Agent API

OpenWebUI passes a bearer token in the

header. Your agent verifies this:

Layer 2: Agent to Azure Resources

The agent itself needs to authenticate to Azure APIs. Use Managed Identity when deployed on Azure, never hardcoded credentials:

API Latency Management

Agentic requests are inherently slow — multiple LLM calls, multiple Azure API calls. The strategies we use:

1. Parallel tool execution — when tool calls have no dependencies, run them concurrently:

2. Response caching — Azure resource queries are cached for 5 minutes:

3. Timeout budgets — each tool call has a hard timeout:

The result: median end-to-end latency of 4.2 seconds for simple queries, 12–18 seconds for complex multi-tool workflows — all streamed so the user sees progress the entire time.