User Interaction Architecture¶
Core Abstraction¶
AgentPool uses InputProvider to handle user interactions across different execution contexts (CLI, ACP, OpenCode, tests).
Three-Layer Architecture¶
┌─────────────────────────────────────────────┐
│ Layer 1: Tools (Protocol-Agnostic) │
│ - question, tool confirmations │
│ - Only knows MCP types │
│ - Calls ctx.handle_elicitation() │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Layer 2: Context (Router) │
│ - get_input_provider() │
│ - Resolution: context → pool → fallback │
│ - Pure delegation, no protocol logic │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Layer 3: InputProvider (Protocol-Specific) │
│ ┌──────┐ ┌─────┐ ┌──────────┐ ┌──────┐ │
│ │Stdlib│ │ ACP │ │ OpenCode │ │ Mock │ │
│ └──────┘ └─────┘ └──────────┘ └──────┘ │
└─────────────────────────────────────────────┘
Why separate layers: Different contexts have fundamentally different I/O mechanisms (blocking stdin vs SSE+HTTP vs protocol RPCs). Unifying them would violate their native patterns.
Providers¶
StdlibInputProvider¶
Location: agentpool/ui/stdlib_provider.py
Usage: CLI, fallback
Mechanism: Blocking input() calls
Limitations: No async, no rich UI, no multi-select
ACPInputProvider¶
Location: agentpool_server/acp_server/input_provider.py
Usage: ACP clients (Goose, Codex)
Mechanism: Maps elicitation → request_permission() [HACK]
Why hacky: ACP lacks native elicitation, shoehorns questions into permission system
Limitations: Max 4 options, no multi-select, wrong semantics
OpenCodeInputProvider¶
Location: agentpool_server/opencode_server/input_provider.py
Usage: OpenCode TUI/Desktop
Mechanism: SSE events + HTTP response endpoints
Flow: Create question → broadcast event → await HTTP reply → resolve future
Advantages: Native questions, multi-select, unlimited options, rich descriptions
MockInputProvider¶
Location: agentpool/ui/mock_provider.py
Usage: Tests
Mechanism: Pre-programmed responses
OpenCode Flow (Detailed)¶
Tool: question("Which DB?", options=[...])
↓
Context: ctx.handle_elicitation(params)
↓
Provider: OpenCodeInputProvider.get_elicitation()
│
├─ Generate question_id: "que_12345"
├─ Build OpenCode format with options
├─ Create asyncio.Future
├─ Store in state.pending_questions[id] = {future, ...}
├─ Broadcast SSE: QuestionAskedEvent
└─ await future # Blocks until HTTP response
↓
OpenCode UI receives SSE → shows question dialog
↓
User selects "PostgreSQL"
↓
POST /question/que_12345/reply {answers: [["PostgreSQL"]]}
↓
Route handler: provider.resolve_question(id, answers)
↓
future.set_result(["PostgreSQL"])
↓
Provider returns: ElicitResult(action="accept", content="PostgreSQL")
↓
Tool gets answer: "PostgreSQL"
Key insight: SSE broadcasts the question, HTTP receives the response. The future bridges the async gap.
Provider Resolution¶
context.input_provider # 1. Explicit (servers set per-session)
↓ (if None)
context.pool._input_provider # 2. Pool default
↓ (if None)
StdlibInputProvider() # 3. Fallback
Current Issues¶
1. Ownership Ambiguity¶
Problem: Can be set on agent, pool, or context with unclear precedence
Fix: Context should always own it, resolve at creation time
2. Invisible to Observers¶
Problem: Input requests don't appear in event stream
Impact: Can't observe when agent waits, can't replay conversations
Fix: Emit InputRequestEvent and InputResolvedEvent while still using provider for response
3. ACP Elicitation Hack¶
Problem: Uses permissions for questions (semantic mismatch)
Options:
- Add elicitation to ACP spec
- Accept limitation and document clearly
- Use ACP resources for complex input
Recommended Evolution¶
Phase 1: Fix Ownership¶
class NodeContext:
input_provider: InputProvider # Always set, never None
@classmethod
def create(cls, node, pool=None, input_provider=None):
provider = input_provider or pool?._input_provider or StdlibInputProvider()
return cls(node=node, input_provider=provider)
Benefit: Clear ownership, no scattered fallback logic
Phase 2: Add Observability¶
class InputProvider:
event_emitter: EventEmitter | None
async def get_elicitation(self, params):
# Emit for observability
if self.event_emitter:
await self.event_emitter.emit(InputRequestEvent.from_params(params))
# Handle via protocol-specific method
result = await self._handle_elicitation(params)
# Emit resolution
if self.event_emitter:
await self.event_emitter.emit(InputResolvedEvent(result))
return result
Benefit: Input requests visible in stream, no breaking changes
Phase 3: Bidirectional Streams (Future)¶
Support optional stream-based resume for advanced providers while keeping async fallback.
Design Decision: Why Not Pure Event Stream?¶
Considered: Making all interactions part of the bidirectional event stream
Rejected because:
- Different contexts are too different (blocking vs async vs protocol-specific)
- Adds bidirectional complexity to all clients
- Event stream becomes harder to reason about
- Current providers work well for their contexts
Hybrid approach: Emit events for observability, use providers for response handling
Capability Matrix¶
| Feature | Stdlib | ACP | OpenCode | Mock |
|---|---|---|---|---|
| Text input | ✅ | ✅ | Future | ✅ |
| Tool confirm | ✅ | ✅ | ✅ | ✅ |
| Boolean | ✅ | ✅ | ✅ | ✅ |
| Single-select | ✅ | ✅ (≤3) | ✅ | ✅ |
| Multi-select | ❌ | ❌ | ✅ | ✅ |
| Descriptions | ❌ | ✅ | ✅ | ✅ |
| Free JSON | ✅ | ❌ | Future | ✅ |
| Async | ❌ | ✅ | ✅ | ✅ |
Best Practices¶
Tools: Use MCP types, call ctx.handle_elicitation(), never check provider type
Servers: Create provider per-session, inject via context
Tests: Use MockProvider with pre-programmed responses
Agents: Set at pool level unless run-specific override needed