Agents will flop, delegation will rise
There is a lot of shit that can go wrong with agentic AI.
- Data breaches
- Data poisoning
- Cascading failures
- Unintended consequences
- Misalignment
- Losing track
With tool calling hooking directly into databases and company systems, agents are the opposite of pure functions. They have more side-effects than …
I do not expect agentic AI (completely autonomous) to be successful. There are too many risks, especially in the enterprise. But I expect that delegation will become more prevalent with “test-time compute”. We delegate longer-running tasks to workflows (multi-step, pre-defined workflows with LLMs for decision making in the control flow or as workers in the nodes) that are specialized for certain actions, and wait till they report back.
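A minimal sketch of what I mean, assuming a hypothetical `call_llm(prompt)` helper standing in for whatever model client you actually use. The steps are fixed; the LLM only picks a branch in the control flow and does the drafting inside a node:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to whatever model provider you use")

def classify(ticket: str) -> str:
    # LLM as decision maker in the control flow: pick a branch, nothing else.
    answer = call_llm(f"Classify this ticket as 'refund' or 'question':\n{ticket}")
    return "refund" if "refund" in answer.lower() else "question"

def draft_refund(ticket: str) -> str:
    # LLM as a worker inside a node: produce a draft, never execute the refund.
    return call_llm(f"Draft a refund summary for a human to approve:\n{ticket}")

def draft_answer(ticket: str) -> str:
    return call_llm(f"Draft an answer to this question:\n{ticket}")

def run_workflow(ticket: str) -> dict:
    # Pre-defined steps; the result is reported back for human sign-off.
    branch = classify(ticket)
    draft = draft_refund(ticket) if branch == "refund" else draft_answer(ticket)
    return {"branch": branch, "draft": draft, "status": "awaiting human review"}
```

Nothing here writes to a system of record; the workflow just reports back and waits.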
AI delegation has given rise to a new form of automation known as agentic automation. Unlike traditional rules-based automation, which relies on predefined rules and workflows, agentic automation leverages AI agents to handle more complex and unstructured processes. These agents can adapt to changing situations and make decisions in real-time, leading to more efficient and flexible automation solutions.
However, it’s important to recognize that the future of enterprise workflows likely lies in a combination of both probabilistic (agentic) and deterministic (rules-based) technologies. This blended approach allows organizations to leverage the strengths of each, creating robust and adaptable automation solutions that can handle a wide range of tasks and processes.
Gemini Deep Research is a good example of this kind of delegation.
Large Language Models (LLMs) are a key component of AI delegation frameworks. LLMs are trained on massive amounts of text data and can understand and generate human-like text. This makes them well-suited for tasks such as:
- Summarizing Information: LLMs can analyze large volumes of text and provide concise summaries, helping humans quickly grasp key insights.
- Generating Ideas: LLMs can generate creative ideas and potential solutions, expanding the range of options considered in decision-making.
- Identifying Patterns: LLMs can identify patterns and trends in data, providing valuable insights for decision-making.
- Analyzing Historical Data: LLMs can analyze historical data to understand past trends and inform future decisions.

By integrating LLMs into AI workflows, organizations can leverage their ability to process information, generate insights, and support human decision-making.
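To make the summarization case concrete: one common pattern is to chunk a long document and summarize in two passes. A rough sketch, again with a placeholder `call_llm` and a naive chunk size:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to whatever model provider you use")

def chunk(text: str, size: int = 4000) -> list[str]:
    # Naive character-based chunking; real splitting would respect sections.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(text: str) -> str:
    # Pass 1: summarize each chunk. Pass 2: merge the partial summaries.
    partials = [call_llm(f"Summarize the key points:\n{part}") for part in chunk(text)]
    return call_llm("Combine these partial summaries into one concise brief:\n" + "\n".join(partials))
```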
A third example is the AI delegation framework proposed by Ross Dawson, which outlines different levels of AI delegation in decision-making. These levels range from “human only” to “AI recommendation,” where AI proposes its preferred actions, and humans approve or use them as input for their decision-making. This framework provides a structured approach to integrating AI into decision-making processes.
Framework for delegation:
- Task Setup: Humans frame the goal. They set limits on what the AI can do. Like setting guardrails on a road.
- AI Processing: The AI crunches data, spots patterns, and flags key findings. But it doesn’t act on them.
- Human Review: A person checks the AI’s work. They can:
  - Accept it as is
  - Ask for more detail
  - Change direction
  - Stop the process
- Feedback Loop: Each round of work helps tune the system. The AI learns what outputs are useful. Humans learn what tasks to delegate.
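A minimal sketch of that loop, with the review options from the list above; `ai_process` and the console prompt are placeholders, not any particular framework:

```python
def ai_process(goal: str, constraints: list[str], feedback: list[str]) -> str:
    raise NotImplementedError("LLM call or pre-defined workflow goes here")

def delegate(goal: str, constraints: list[str]):
    feedback: list[str] = []                      # feedback loop across rounds
    while True:
        # AI processing: analyze and flag findings, but take no actions.
        finding = ai_process(goal, constraints, feedback)
        choice = input(f"{finding}\n[a]ccept / [d]etail / [c]hange direction / [s]top: ")
        if choice == "a":
            return finding                        # accept it as is
        if choice == "d":
            feedback.append("expand on the last finding")   # ask for more detail
        elif choice == "c":
            goal = input("new direction: ")       # change direction
        elif choice == "s":
            return None                           # stop the process
```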
Think of it like teaching a new employee:
First you show them basic tasks. Then you check their work. Over time, you trust them with more. But you keep the final say.
The key is the handoff points. AI does the heavy lifting, but humans control when and how it happens.
Here’s how AI delegation works in real tasks:
Data Entry
- AI scans invoices and pulls key fields
- Human quickly checks extracted data
- AI flags odd entries (wrong format, unusual amounts)
- Human fixes flags, approves batches
- AI learns from corrections
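Sketched out, that invoice flow could look like the following; `extract_fields` stands in for whatever OCR or LLM extraction you use, and the date format and amount threshold are made-up examples of “odd entry” rules:

```python
import re

def extract_fields(invoice_text: str) -> dict:
    raise NotImplementedError("OCR / LLM field extraction goes here")

def flag_entry(fields: dict) -> list[str]:
    # Flag odd entries: wrong format, unusual amounts (both rules are illustrative).
    flags = []
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", fields.get("date", "")):
        flags.append("date in unexpected format")
    if fields.get("amount", 0) > 10_000:
        flags.append("unusually large amount")
    return flags

def process_batch(invoices: list[str]):
    approved, needs_review = [], []
    for text in invoices:
        fields = extract_fields(text)
        flags = flag_entry(fields)
        # Flagged entries go to a human; clean ones can be batch-approved.
        (needs_review if flags else approved).append({**fields, "flags": flags})
    return approved, needs_review
```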
Data Cleaning
- Human sets rules (“flag outliers over 3 std dev”)
- AI finds patterns and suggests fixes
- Human reviews a sample of changes
- AI applies approved fixes to full dataset
- Human gets final check of key metrics
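The “flag outliers over 3 std dev” rule is easy to make concrete. A sketch with pandas, where the column name and the median replacement are just illustrative suggestions for the human to review:

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, col: str, n_std: float = 3.0) -> pd.Series:
    # Human-set rule: anything more than n_std standard deviations from the mean.
    z = (df[col] - df[col].mean()) / df[col].std()
    return z.abs() > n_std

def suggest_fixes(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Script side: find the outliers and propose a naive fix (median) for review.
    flagged = df[flag_outliers(df, col)].copy()
    flagged["suggested"] = df[col].median()
    return flagged

def apply_approved(df: pd.DataFrame, col: str, approved: pd.DataFrame) -> pd.DataFrame:
    # Only fixes the human approved get written back to the full dataset.
    out = df.copy()
    out.loc[approved.index, col] = approved["suggested"]
    return out
```

A human would skim something like `suggest_fixes(df, "amount").head(20)` (the sample review step) before `apply_approved` touches the full dataset, then check key metrics afterwards.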
Financial Research
- Human picks topics (“Compare tech startups’ growth rates”)
- AI pulls data from reports, news, filings
- AI spots trends and flags key points
- Human reviews findings, asks new questions
- AI digs deeper on specific areas
- Human builds final insight from AI’s work
Coding Tasks
- Human outlines feature needs
- AI suggests code structure
- Human picks approach
- AI writes basic code
- Human reviews, tweaks architecture
- AI handles routine parts (tests, docs)
- Human focuses on core logic
- AI flags potential bugs
- Human makes final call on changes
The pattern? Humans guide while AI handles the grunt work. Each knows its role. The work flows back and forth, getting better each round.
This still means that we need people to design these systems:
- Engineers figure out the steps and the workflow
- They decide where the human should intervene or sign off
Design Work Needed:
- Map the Process
  - Find natural break points in workflows
  - Spot where things often go wrong
  - Mark where expert judgment matters most
  - Test different stopping points
- Set Check Points
  - Pick key moments for human review
    - Before expensive operations (like writing to a database, calling an expensive model)
    - To review operations (e.g. searching a vector DB)
  - Build in safety stops
  - Add ways to roll back changes
  - Create override options
- Handle Edge Cases
  - Plan for weird data
  - Add escape hatches for tough calls
  - Build in ways to flag problems
  - Make manual override easy
- Make it Learn
  - Track what humans change
  - Note which flags help
  - Watch for patterns in mistakes
  - Build in feedback loops
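One way to build those check points is a small approval gate around expensive or irreversible operations (database writes, pricey model calls). A sketch, not a prescribed pattern; the file path and `write_to_db` are hypothetical, and the audit log doubles as the feedback loop:

```python
import functools
import json
import time

AUDIT_LOG = "decisions.jsonl"   # hypothetical path for the decision log

def requires_approval(description: str):
    """Wrap an expensive or irreversible step behind an explicit human yes."""
    def wrap(fn):
        @functools.wraps(fn)
        def gated(*args, **kwargs):
            answer = input(f"About to run: {description}. Proceed? [y/N] ")
            approved = answer.strip().lower() == "y"
            # Feedback loop: record what humans allow and reject.
            with open(AUDIT_LOG, "a") as f:
                f.write(json.dumps({"op": description, "approved": approved, "ts": time.time()}) + "\n")
            if not approved:
                raise RuntimeError(f"human rejected: {description}")  # safety stop
            return fn(*args, **kwargs)
        return gated
    return wrap

@requires_approval("write cleaned records to the production database")
def write_to_db(records: list[dict]) -> None:
    ...  # the actual (expensive, irreversible) write goes here
```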
Example: Data Cleaning System
Before: Raw data → Human cleans → Human checks → Done
After:
- Engineer maps common clean-up steps
- Builds in check points after each big change
- Adds ways to see what changed
- Creates undo options
- Tests what needs human eyes
- Refines based on what works
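For the “ways to see what changed” and “undo options” pieces, a minimal sketch is to snapshot the data before each big change; this assumes each step keeps the same rows and columns and only edits values:

```python
import pandas as pd

class CleaningRun:
    """Snapshot before each big change so every step can be reviewed and undone."""

    def __init__(self, df: pd.DataFrame):
        self.history = [("start", df.copy())]

    def apply(self, name: str, step) -> pd.DataFrame:
        # 'step' is any function df -> df, e.g. one clean-up step the engineer mapped.
        before = self.history[-1][1]
        after = step(before.copy())
        self.history.append((name, after))
        return after

    def what_changed(self) -> pd.DataFrame:
        # Check point: show the human what the last step touched.
        before, after = self.history[-2][1], self.history[-1][1]
        return before.compare(after)

    def undo(self) -> pd.DataFrame:
        # Roll back the last change if the human does not like it.
        if len(self.history) > 1:
            self.history.pop()
        return self.history[-1][1]
```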
The hard part? Knowing where to draw the lines. Too many checks: system’s slow. Too few: things break.