Abhinav Yadav

Designing Agentic AI Workflows: Tool-Calling, Memory, and Multi-Modal Integration

December 12, 2025
AI AgentsLLMTool CallingMemoryMulti-ModalVirtuAI
Designing Agentic AI Workflows: Tool-Calling, Memory, and Multi-Modal Integration

What Makes a System "Agentic"?

An LLM that answers questions is not an agent. An agent is a system that can:
  1. Perceive — receive input from the environment (text, images, API responses, files)
  2. Plan — decide what actions to take to achieve a goal
  3. Act — execute tools that affect the real world (call APIs, write files, query databases)
  4. Observe — receive the results of actions and update its plan accordingly
  5. Iterate — repeat until the goal is achieved or a stopping condition is met
The loop is what distinguishes an agent from a single LLM call.

The Architecture of VirtuAI's Agent Framework

VirtuAI is an AI platform for Azure infrastructure management. Its agent needs to answer complex multi-step questions like: "Analyse our current VM fleet, identify over-provisioned instances, calculate the cost saving from right-sizing, and generate a migration plan."
That's 4–6 distinct tool calls with dependencies between them. A single prompt can't do it.

Tool Definition and Calling

With the OpenAI function calling API, tools are declared as JSON Schema and the model decides when and how to call them:

Memory Management

The naive agent keeps the entire conversation history in context. This works until the context window fills up (expensive) or degrades model performance (long contexts hurt reasoning quality).
We use a three-tier memory architecture:
Working Memory — the active conversation buffer (last N messages + current task state)
Episodic Memory — summaries of past sessions, stored in a vector DB and retrieved by semantic similarity to the current task
Semantic Memory — long-term facts about the user's infrastructure (subscription IDs, common patterns, preferences) stored as structured key-value pairs

Multi-Modal Integration

VirtuAI accepts architecture diagrams (PNG/JPG) as input. Users can photograph a whiteboard diagram and ask "analyse this architecture and identify single points of failure."
The GPT-4o vision capability handles this natively — images are passed as base64 in the message content:

Where Agent Frameworks Add Value

Frameworks like LangGraph, AutoGen, or custom orchestrators like Hermes (our internal framework) add value in four areas:
  1. State management — persisting agent state across async tool calls
  2. Retry and error handling — graceful degradation when tools fail
  3. Observability — tracing every tool call, token count, and decision
  4. Parallelism — running independent tool calls concurrently (reducing latency significantly)
For most production use cases, a simple hand-rolled agent loop (like the one above) is more maintainable than adopting a full framework. Use frameworks when you need their specific features, not by default.
The VirtuAI agent handles 500+ requests per day with a median response time of 4.2 seconds — covering tasks that previously required a senior cloud architect and 30 minutes of manual analysis.