The agent isn't your orchestrator

I was recently building an autonomous agent for ESG due diligence - a deep knowledge base of frameworks, standards, and prior assessments, with the tools to retrieve, score, and cite against a fund manager's documentation. Given a single ESG question, it did the job beautifully. The reasoning was solid, the citations were grounded, the confidence scoring held up.

Then the real workload landed: a hundred questions, end to end, every one of them needing that same depth of reasoning. My first instinct was the obvious one - hand the agent the full list and let it work through them autonomously. It's an autonomous agent. That's the whole point.

On paper this was the cleanest design imaginable. One agent. One run. One unit of work. It even completed end-to-end in a demo.

Then I pushed it past ten questions and the cracks started showing.

Why the agent-as-loop pattern broke

Copilot Studio agents have effective execution limits that don't matter at demo scale and matter enormously at production scale. Execution is single-threaded by design, so a hundred questions process serially. At 20–30 seconds a question (quick by RAG standards), that's roughly 33–50 minutes of unbroken runtime. Long before I hit any hard ceiling, I'd already lost: the session timed out every time. One failure mid-loop put the entire job at risk, and replaying just the failures meant reconstructing state by hand.
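
The runtime figure is simple arithmetic, sketched here in Python for anyone who wants to plug in their own numbers. The function name is mine, and the 20 and 30 second inputs are the per-question timings quoted above.

```python
def serial_runtime_minutes(questions: int, seconds_per_question: float) -> float:
    """Wall-clock minutes when every question runs one after another."""
    return questions * seconds_per_question / 60


low = serial_runtime_minutes(100, 20)   # optimistic per-question time
high = serial_runtime_minutes(100, 30)  # pessimistic per-question time
print(f"{low:.0f}-{high:.0f} minutes")  # prints "33-50 minutes"
```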

And then there was the silent killer: context pressure. Every tool call, every retrieved chunk, every reasoning trace adds to the working context. By question 12, the agent was still carrying the tool output, retrieved chunks, and reasoning from the eleven questions before it. Answers got worse as the run went on. It didn't show up in the logs - it showed up as inconsistent quality across the run, which is the worst kind of failure mode because it doesn't trip an alarm.

The reframe

The instinct to put orchestration inside the agent is the same instinct that puts retry logic inside an HTTP client when the cleaner answer is a queue. Autonomous doesn't mean omnipotent. The agent I'd built was a brilliant unit of autonomous work - it reasoned, retrieved, scored, cited - but it wasn't built to manage a hundred-question pipeline against itself. That's a different layer.

Once I let go of "the agent does everything", the pattern almost designed itself.

The fan-out pattern I landed on

The cleaner shape splits responsibilities across three components: a Power Automate flow that drives the work, the agent as a stateless processor, and Dataverse as the durable store of outcomes. What I hadn't expected going in was how much was solved by going further than batching - by giving the agent one question per invocation, not ten.

Flow A - the dispatcher. Reads the input set - the hundred ESG questions for a given fund manager. Fans them out across parallel branches, each one invoking the agent with a single question as input. Records the run metadata up front so partial completion is meaningful and recoverable.
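
The dispatcher's shape can be sketched outside Power Automate. What follows is a minimal Python stand-in, not the flow itself: `invoke_agent`, `dispatch`, and `max_parallel` are hypothetical names, a thread pool plays the part of the parallel branches, and the agent call is a placeholder that just echoes the question.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from uuid import uuid4


def invoke_agent(run_id: str, question_id: int, question: str) -> dict:
    """Hypothetical stand-in for one agent invocation (placeholder answer only)."""
    return {"run_id": run_id, "question_id": question_id, "answer": f"answer to: {question}"}


def dispatch(questions: list[str], max_parallel: int = 10) -> tuple[dict, dict]:
    """Fan out one question per invocation; return (run metadata, status per question ID)."""
    run_id = str(uuid4())
    # Record run metadata up front so partial completion can be measured later.
    run = {"run_id": run_id, "expected": len(questions)}
    outcomes: dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            pool.submit(invoke_agent, run_id, qid, text): qid
            for qid, text in enumerate(questions, start=1)
        }
        for future in as_completed(futures):
            qid = futures[future]
            try:
                future.result()
                outcomes[qid] = "succeeded"
            except Exception:
                # A failed invocation is recorded against its own question only.
                outcomes[qid] = "failed"
    return run, outcomes


run, outcomes = dispatch(["Q1 text", "Q2 text", "Q3 text"])
print(run["expected"], sorted(outcomes.items()))
```

The point of the sketch is the bookkeeping: the expected count is written before any work starts, and each question carries its own status, so "how far did we get" has an answer even when the run is interrupted.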

The agent. Takes one question. Has the entire context window to itself to retrieve, reason, and cite against the knowledge base - no prior questions cluttering its working memory, no budget split across nine other items it also has to get through. Writes the result - answer, citations, confidence score, framework mapping, anything else worth keeping - directly into a Dataverse table keyed by run ID and question ID. Returns when done. The agent doesn't know it's one of a hundred. It just knows it has one question to do properly.
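
A rough sketch of that contract, again in Python rather than in Copilot Studio: an in-memory dictionary stands in for the Dataverse table, and `Result`, `ResultStore`, and `process_question` are illustrative names I've made up. The placeholder answer, citation, confidence value, and sample question are not real output. The sketch also assumes a write to an existing key replaces the row, which is a design choice rather than something the platform does for you.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    answer: str
    citations: list[str]
    confidence: float


@dataclass
class ResultStore:
    """In-memory stand-in for the results table, keyed by (run_id, question_id)."""
    rows: dict[tuple[str, int], Result] = field(default_factory=dict)

    def upsert(self, run_id: str, question_id: int, result: Result) -> None:
        # Writing to an existing key replaces the row instead of adding a duplicate.
        self.rows[(run_id, question_id)] = result


def process_question(store: ResultStore, run_id: str, question_id: int, question: str) -> None:
    """Stateless worker: sees one question, writes one row, remembers nothing."""
    # Placeholder values; a real agent would retrieve, score, and cite here.
    result = Result(answer=f"answer to: {question}", citations=["doc-1"], confidence=0.8)
    store.upsert(run_id, question_id, result)


store = ResultStore()
process_question(store, "run-1", 47, "Does the manager report Scope 3 emissions?")
process_question(store, "run-1", 47, "Does the manager report Scope 3 emissions?")  # a retry
print(len(store.rows))  # prints 1: the retry replaced the row
```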

Flow B - the aggregator. Triggers when all invocations complete (or polls until they do). Reads the Dataverse table for the run ID. Produces the consolidated output: the populated due diligence template, a structured report, a return payload for whatever consumes it next.
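
The aggregator's completeness check is the part worth sketching. This assumes the dispatcher recorded which question IDs to expect; `aggregate` and the row layout are hypothetical, and a plain dictionary again stands in for the Dataverse table.

```python
def aggregate(run_id: str, expected_ids: set[int], rows: dict[tuple[str, int], dict]) -> dict:
    """Consolidate a run's rows, or report which questions are still outstanding."""
    done = {qid: row for (rid, qid), row in rows.items() if rid == run_id}
    missing = sorted(expected_ids - set(done))
    if missing:
        return {"run_id": run_id, "status": "incomplete", "missing": missing}
    return {
        "run_id": run_id,
        "status": "complete",
        "answers": [done[qid] for qid in sorted(done)],
    }


rows = {("run-1", 1): {"answer": "a1"}, ("run-1", 2): {"answer": "a2"}}
print(aggregate("run-1", {1, 2, 3}, rows)["missing"])  # prints [3]

rows[("run-1", 3)] = {"answer": "a3"}
print(aggregate("run-1", {1, 2, 3}, rows)["status"])   # prints complete
```

A polling aggregator would simply call something like this on a timer until the status flips to complete.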

What it got me

Parallelism, obviously. Fanning the hundred questions out across concurrent invocations turned a 50-minute serial slog into something that finished in a fraction of the time. But that's the least interesting benefit.
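
For a feel of the numbers, here is an idealised back-of-envelope model in Python. It assumes questions run in full waves at a fixed 30 seconds each and ignores throttling, queueing, and any platform concurrency cap, so treat it as a best case rather than a measurement. The function name and the concurrency values are mine.

```python
import math


def parallel_runtime_seconds(questions: int, seconds_per_question: float, concurrency: int) -> float:
    """Idealised estimate: questions run in waves of `concurrency` at a time."""
    waves = math.ceil(questions / concurrency)
    return waves * seconds_per_question


for concurrency in (1, 10, 25, 100):
    seconds = parallel_runtime_seconds(100, 30, concurrency)
    print(f"concurrency={concurrency:>3}: {seconds / 60:.1f} min")
```

Under those assumptions a concurrency of 1 reproduces the 50-minute serial case, and even a modest 10 brings it down to 5 minutes.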

Quality, which I hadn't fully anticipated. Giving each question its own clean invocation meant the agent wasn't budgeting reasoning effort across a queue - it could spend as long as it needed retrieving, cross-checking, and citing for that one answer. The deep, ambiguous ESG questions - the ones that needed two or three passes through the knowledge base to ground properly - got noticeably better answers than they had inside a batch, let alone inside the original hundred-question run.

Idempotency at the question level. Because every result is keyed by run ID and question ID, a failed invocation retries just that one question and nothing else in the run is touched. That turned "the whole assessment failed, start again" into "question 47 failed, retry it" with everything else intact.
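
In code terms the retry pass looks something like this. It's a Python sketch under assumptions: per-question statuses are already recorded somewhere, and `retry_failed` and `run_one` are hypothetical names, with `run_one` standing in for a single agent invocation.

```python
def retry_failed(statuses: dict[int, str], run_one) -> dict[int, str]:
    """Re-run only the questions marked failed; succeeded ones are left alone."""
    updated = dict(statuses)
    for qid, status in statuses.items():
        if status != "failed":
            continue
        try:
            run_one(qid)
            updated[qid] = "succeeded"
        except Exception:
            updated[qid] = "failed"
    return updated


calls = []


def run_one(qid: int) -> None:
    """Stand-in invocation that just records which question was re-run."""
    calls.append(qid)


statuses = {46: "succeeded", 47: "failed", 48: "succeeded"}
after = retry_failed(statuses, run_one)
print(calls, after[47])  # prints [47] succeeded
```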

Observability for free. Power Automate's run history gave me per-question visibility without writing a line of code. Dataverse gave me a queryable record of every answer the agent had produced, with its own confidence score and citation trail. I could compare runs over time, audit individual answers, flag the low-confidence ones, surface drift across fund managers - all without going near an internal agent run log.
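
Flagging the low-confidence answers is the kind of query this makes trivial. A minimal Python illustration over made-up rows: the field names, the 0.6 threshold, and the confidence values are all assumptions for the example, not figures from a real run.

```python
def low_confidence(rows: list[dict], threshold: float = 0.6) -> list[dict]:
    """Return rows below the threshold, lowest confidence first."""
    flagged = [row for row in rows if row["confidence"] < threshold]
    return sorted(flagged, key=lambda row: row["confidence"])


# Illustrative rows only; real ones would be read from the results table.
rows = [
    {"question_id": 12, "confidence": 0.91},
    {"question_id": 47, "confidence": 0.42},
    {"question_id": 63, "confidence": 0.55},
]
print([row["question_id"] for row in low_confidence(rows)])  # prints [47, 63]
```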

Decoupling. The agent doesn't need to know how many questions there are, where they came from, or what happens to the answers afterwards. Its job is to answer one question well. The output schema became a Dataverse table I could evolve independently - versioning question sets, scoring frameworks, anything else - without touching the agent.

Use each layer for what it's good at

What I'd actually built, without quite meaning to, was a separation of concerns: an agent doing what it's good at - autonomous reasoning over a knowledge base, calling tools, producing well-grounded, well-cited answers - and a flow doing what it's good at: scheduling, retry, fan-out, aggregation. Copilot Studio isn't built to be an orchestration engine over its own runtime, and it doesn't have to be, because Power Automate is right there, designed exactly for this kind of work. The "fan out, process, aggregate" shape isn't novel; it's the standard form of batch work everywhere. The novel bit was recognising that an autonomous agent belongs inside that shape, not above it.

The agent is the worker. The flow is the foreman. Stop asking either of them to do the other's job.