ADR 0004: Headless query planning and non-blocking execution

Status

Proposed.

Context

agentgrep is a headless search library with multiple frontends: CLI, Textual TUI, and MCP. The same backend behavior must serve all of them.

Recent profiling showed two different bottleneck shapes:

Prompt and find paths are often discovery-bound. Repeated store discovery, path enumeration, subprocess startup, and source-handle construction can dominate a query before record parsing begins.
Conversation paths are often collection-bound. Large JSONL transcripts, recursive message extraction, SQLite reads, and record-level filtering can dominate the run after sources have been selected.

The current event-stream engine gives agentgrep a useful producer/consumer boundary, but it is still a synchronous generator that frontends wrap in threads. That is good enough for the CLI and many tests, but it is not the long-term shape for a fully non-blocking TUI. Textual must never run broad discovery, JSON parsing, SQLite reads, ranking, or slow rendering work on the UI event loop. MCP tools also need a backend that can run from async tool wrappers without blocking the server.

Prior systems point to the same direction:

Dask keeps user intent as a logical expression tree and lowers it through optimizer phases before scheduling work. Its scheduler boundary separates graph execution from the concrete submit function: expression planning and local async scheduling.
DataFusion separates logical/physical planning from runtime state: physical planner, execution plan, and session state.
Tokio separates runtime construction from scheduler internals: runtime builder and scheduler implementations.
Daft and Polars show Python-facing plan construction with separate physical and streaming execution: Daft native runner, Daft logical plan, Polars lazy frame, and Polars physical stream graph.
Ray and Flink make remote/distributed execution a driver concern behind stable plan and execution interfaces: Ray Data streaming executor, Ray execution options, Flink planner interface, and Flink pipeline executor factory.
ClickHouse and DuckDB split plans from executable pipelines and scheduler work: ClickHouse QueryPlan, ClickHouse processors, DuckDB planner, DuckDB physical plan generator, and DuckDB task scheduler.
Hyperfine keeps benchmark execution and export as first-class surfaces: benchmark types and export formats.

agentgrep does not need to become any of those systems. The useful pattern is the separation: user intent, logical plan, physical plan, execution driver, result stream, and measurement are distinct responsibilities.

Decision

agentgrep will evolve the search backend into a typed query planning and execution system. The backend remains headless. CLI, TUI, and MCP are frontends over the same library request, result, and event types.

The architecture has six layers:

Query request: immutable user intent, including terms, field predicates, scope, agents, limits, dedupe, ranking, and cancellation policy.
Logical plan: a normalized, frontend-neutral plan describing source roles, source predicates, record predicates, ordering, limits, dedupe, and required capabilities.
Planner/optimizer: rewrites the logical plan into cheaper equivalent work, pushes source predicates into discovery, chooses direct lookup paths, and avoids version metadata or source construction that is not needed for the query.
Physical plan: an ordered set of source tasks with adapter strategies, cost hints, concurrency limits, output ordering rules, and fallback rules.
Execution driver: runs the physical plan using inline, threaded, async, or future worker-backed execution while preserving the same events and result semantics.
Result sinks: translate backend events into CLI Rich/text/JSON/NDJSON, TUI updates, and MCP response models.

The public surface is the event stream and result models, not the concrete execution strategy. A query that runs inline for tests, threaded for a classic CLI command, and async for the TUI must produce the same records, ordering, dedupe semantics, errors, and privacy-safe profile observations.

Interfaces

Names below describe the intended internal types and boundaries. They are not all public APIs until implemented and documented.

QueryRequest : Frozen user intent. It includes query text or a compiled query, target scope, selected agents, record limit, dedupe, ranking mode, and a cancellation token. It does not include output formatting.
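As a minimal sketch of what "frozen user intent" could look like, the block below uses a frozen dataclass. The field names, default values, and the use of `threading.Event` as the cancellation token are assumptions for illustration; the ADR names only the categories of information, not this shape.

```python
import threading
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical field names; only the categories come from the ADR."""

    terms: Tuple[str, ...]
    scope: str = "all"  # assumed vocabulary, e.g. "prompts" or "conversations"
    agents: Tuple[str, ...] = ()
    limit: Optional[int] = None
    dedupe: bool = True
    ranking: str = "newest"  # assumed ranking mode name
    # The token is shared mutable state, so it is excluded from equality.
    cancel: threading.Event = field(
        default_factory=threading.Event, compare=False, repr=False
    )


request = QueryRequest(terms=("timeout",), agents=("codex",), limit=20)

try:
    request.limit = 50  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_rejected = True
else:
    mutation_rejected = False
```

Note that the request carries no output-format field: a CLI JSON run and a TUI run with the same intent would build equal requests.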

LogicalSearchPlan : Frontend-neutral work description. It contains source-role requirements, field predicates, text predicates, source predicates, record predicates, ordering, dedupe, and limit semantics.

AdapterCapability : Per-adapter declarations for cheap operations: metadata-only discovery, source predicate support, path prefiltering, raw text prefiltering, SQLite predicate pushdown, JSONL line prefiltering, streaming records, and source-level cost hints.
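One way to model per-adapter declarations is a bit-flag enum that the planner checks before choosing a strategy. The member names and the `can_use` helper below are hypothetical and only mirror the capability list above; this is a sketch, not the implemented type.

```python
from enum import Flag, auto


class AdapterCapability(Flag):
    """Hypothetical member names mirroring the capability list in this ADR."""

    NONE = 0
    METADATA_ONLY_DISCOVERY = auto()
    SOURCE_PREDICATES = auto()
    PATH_PREFILTER = auto()
    RAW_TEXT_PREFILTER = auto()
    SQLITE_PUSHDOWN = auto()
    JSONL_LINE_PREFILTER = auto()
    STREAMING_RECORDS = auto()


def can_use(declared, required):
    """Return True only when every required capability is declared."""
    return (declared & required) == required


# An assumed declaration for a JSONL-backed adapter.
jsonl_adapter = (
    AdapterCapability.PATH_PREFILTER
    | AdapterCapability.JSONL_LINE_PREFILTER
    | AdapterCapability.STREAMING_RECORDS
)

line_prefilter_ok = can_use(jsonl_adapter, AdapterCapability.JSONL_LINE_PREFILTER)
sqlite_pushdown_ok = can_use(jsonl_adapter, AdapterCapability.SQLITE_PUSHDOWN)
```

A planner built this way would fall back to full parsing whenever `can_use` returns False, which keeps an undeclared optimization from being applied by accident.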

PhysicalSearchPlan : Executable plan made of SourceTask items. Each task chooses one adapter strategy, declares whether it can stream records, records whether it emits newest-first records, records whether it may stop after satisfying the query limit, records scheduler-facing cost and source-group hints, and records how output order will be restored when work runs concurrently.

ExecutionDriver : The scheduling boundary. The first required drivers are an inline deterministic driver for tests and a non-blocking async/thread-backed driver for CLI/TUI/MCP. Source-local scanning and driver scheduling are separate modules so future process or worker drivers can keep the same logical and physical plan types.
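The scheduling boundary can be sketched as a small protocol with an inline driver and a thread-backed async wrapper. Everything here is illustrative: events are plain dicts, tasks are zero-argument callables, and `run_non_blocking` is a hypothetical helper built on the standard `asyncio.to_thread`.

```python
import asyncio
from typing import Callable, Iterable, Iterator, List, Protocol


class ExecutionDriver(Protocol):
    """Assumed protocol: a driver turns source tasks into an event stream."""

    def run(self, tasks: Iterable[Callable[[], List[str]]]) -> Iterator[dict]: ...


class InlineDriver:
    """Deterministic reference driver: one task at a time, in plan order."""

    def run(self, tasks):
        yield {"type": "started"}
        for task in tasks:
            for record in task():
                yield {"type": "record", "record": record}
        yield {"type": "finished"}


async def run_non_blocking(driver, tasks):
    # Drain the synchronous generator in a worker thread so the caller's
    # event loop stays free for UI or server work.
    return await asyncio.to_thread(lambda: list(driver.run(tasks)))


tasks = [lambda: ["a", "b"], lambda: ["c"]]
inline_events = list(InlineDriver().run(tasks))
async_events = asyncio.run(run_non_blocking(InlineDriver(), tasks))
```

Both call styles yield the same event sequence, which is the property the ADR asks every driver to preserve.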

SourceScanResult : The source-local execution boundary. A worker scans one SourceTask and returns candidates, counters, and timing. Global dedupe, top-K ordering, frontier pruning, and record emission stay with the driver so worker completion order cannot change search semantics.
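To illustrate why global dedupe and emission stay with the driver, the sketch below runs two hypothetical source tasks both inline and on a thread pool, then merges results in plan order on the calling thread. The task shape, `(dedupe_key, text)` candidates, and helper names are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_task(records, delay):
    """Build a hypothetical source task returning (dedupe_key, text) candidates."""

    def scan():
        time.sleep(delay)  # simulate uneven source cost
        return list(records)

    return scan


def merge_on_owner(results_in_plan_order):
    """Global dedupe and emission, applied in plan order on the calling thread."""
    seen, emitted = set(), []
    for candidates in results_in_plan_order:
        for key, text in candidates:
            if key not in seen:
                seen.add(key)
                emitted.append(text)
    return emitted


def run_inline(tasks):
    return merge_on_owner(task() for task in tasks)


def run_concurrent(tasks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Iterate futures in submission order, not completion order.
        return merge_on_owner(future.result() for future in futures)


tasks = [
    make_task([("k1", "newest from A"), ("k2", "older from A")], delay=0.05),
    make_task([("k2", "duplicate from B"), ("k3", "only in B")], delay=0.0),
]
inline_out = run_inline(tasks)
concurrent_out = run_concurrent(tasks)
```

The second task finishes first under the pool, yet the output matches the inline run because workers only return candidates and never emit.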

SourceScanBatch : The incremental source-local execution boundary. A source scan may yield matching candidates and counters in batches before the source is fully drained. SourceScanResult remains the compatibility wrapper that collects those batches for call sites that still need a whole-source result. Batch scheduling is an execution-driver choice, not a parser behavior change.
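The relationship between the two boundaries can be sketched with a generator that yields batches and a wrapper that collects them. The dataclass fields, the substring matcher, and the batch size are placeholders rather than the real types.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class SourceScanBatch:
    candidates: List[str]
    records_seen: int  # counter delta for this batch only


@dataclass
class SourceScanResult:
    candidates: List[str] = field(default_factory=list)
    records_seen: int = 0


def scan_in_batches(
    records: Iterable[str], term: str, batch_size: int = 2
) -> Iterator[SourceScanBatch]:
    """Yield matches before the source is fully drained."""
    batch, seen = [], 0
    for record in records:
        seen += 1
        if term in record:  # placeholder for the compiled matcher
            batch.append(record)
        if len(batch) >= batch_size:
            yield SourceScanBatch(batch, seen)
            batch, seen = [], 0
    if batch or seen:
        yield SourceScanBatch(batch, seen)


def scan_whole_source(records: Iterable[str], term: str) -> SourceScanResult:
    """Compatibility wrapper: collect every batch into one result."""
    result = SourceScanResult()
    for batch in scan_in_batches(records, term):
        result.candidates.extend(batch.candidates)
        result.records_seen += batch.records_seen
    return result


lines = ["fix timeout", "hello", "timeout again", "bye", "timeout 3"]
batches = list(scan_in_batches(lines, "timeout"))
whole = scan_whole_source(lines, "timeout")
```

Because the wrapper is built on the batch generator, switching a call site between the two forms changes scheduling only, not which records match.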

LimitPolicy : The scheduler-local rule for deciding whether queued lower-priority sources can be skipped once enough candidates have reached the owner-thread frontier. The default policy preserves the current source-order frontier behavior; stricter global-newest policies can be added behind the same typed boundary when source metadata can prove them.

SearchEvent / FindEvent : The stream types. Existing events remain the baseline. Future events may add planning, warning, cancellation, or profile summaries only if old consumers can continue to ignore unknown event variants safely.
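The "ignore unknown variants safely" rule can be illustrated with a consumer that dispatches on the event types it knows and falls through on the rest. The event class names here are hypothetical, and `PlanningNote` stands in for a future variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordEmitted:
    record: str


@dataclass(frozen=True)
class Finished:
    status: str


@dataclass(frozen=True)
class PlanningNote:
    """Stand-in for a future event variant an old consumer has never seen."""

    detail: str


def consume(events):
    """An 'old' consumer: handles known variants and skips everything else."""
    records, status = [], None
    for event in events:
        if isinstance(event, RecordEmitted):
            records.append(event.record)
        elif isinstance(event, Finished):
            status = event.status
        # No else branch that raises: unknown variants are ignored.
    return records, status


stream = [
    PlanningNote("pruned 3 sources"),
    RecordEmitted("first"),
    RecordEmitted("second"),
    Finished("complete"),
]
records, status = consume(stream)
```

A consumer that raised on unrecognized events would turn every additive event into a breaking change, which is what the compatibility rule is meant to prevent.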

ResultSink : Output adapters. Rich/text, JSON, NDJSON/streaming, TUI, and MCP sinks consume events. They do not discover stores, parse records, or decide search semantics.

ProfileSink : Privacy-safe measurement. It records phase spans, adapter decisions, source-task counts, subprocess families, bytes/counts, cancellation, and output backpressure without prompt text, raw argv, or local absolute paths.

Result types

The result types must answer the user-facing questions raised by #55: did the run finish, was it bounded, is there another page, and how can a caller inspect one result without guessing at source paths?

Record emission alone is not sufficient. Search and find collectors must consume lifecycle events, counters, cancellation, warnings, and finish state, then expose that state through the frontend result payload.

Run status vocabulary

RunStatus : The terminal run state exposed by JSON, NDJSON final summaries, MCP tool responses, and TUI completion chrome.

complete : Every planned source and batch that could affect the requested result set was examined.

bounded : The run intentionally stopped at a documented semantic bound, such as a requested result/page limit, source-local bounded scan, answer-now request, or configured result cap. A normal paginated discovery response that emits a complete page and a usable next_cursor is bounded, not truncated. A cursorless search capped at limit is also bounded when lookahead proves another ordered match. More records may exist outside the examined bound.

truncated : The sink stopped emitting because of an output budget, byte budget, tool response budget, or client-imposed response limit before it could deliver the requested page/result payload. More matching records are known or likely to exist, and cursor continuation may be unavailable or unreliable.

cancelled : The caller, terminal user, TUI, MCP client, timeout, or replacement search cancelled the run before normal completion.

approximate : The run used an accepted approximation whose assumptions can affect completeness, such as mtime-as-recency or bounded newest-first scanning across stores with shared dedupe keys.

failed : The run stopped because of an unrecovered error. Partial results may be present only if the result payload marks them as partial and includes a diagnostic.

RunStatus values are compatibility-sensitive. Additive values require tests for every sink that renders or serializes run status.
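A sketch of how the vocabulary and its per-sink coverage rule could be expressed: a string-valued enum plus a helper that reports statuses a sink has no rendering for. The enum values come from the list above; `CLI_LABELS`, its label text, and `missing_labels` are hypothetical.

```python
from enum import Enum


class RunStatus(str, Enum):
    COMPLETE = "complete"
    BOUNDED = "bounded"
    TRUNCATED = "truncated"
    CANCELLED = "cancelled"
    APPROXIMATE = "approximate"
    FAILED = "failed"


# Hypothetical label table for one sink; the wording is illustrative only.
CLI_LABELS = {
    RunStatus.COMPLETE: "done",
    RunStatus.BOUNDED: "stopped at limit",
    RunStatus.TRUNCATED: "output truncated",
    RunStatus.CANCELLED: "cancelled",
    RunStatus.APPROXIMATE: "approximate",
    RunStatus.FAILED: "failed",
}


def missing_labels(labels):
    """List status values a sink cannot render; a sink test expects []."""
    return [status.value for status in RunStatus if status not in labels]


cli_gaps = missing_labels(CLI_LABELS)
partial_gaps = missing_labels({RunStatus.COMPLETE: "done"})
```

A check of this kind in each sink's test suite makes an added status fail loudly instead of rendering as a blank or a default.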

Reachability

| State | Producer today | Owed by |
| --- | --- | --- |
| `complete` | the engine terminal-summary builder after full planned coverage | — |
| `bounded` | engine result lookahead, source bounds, and answer-now; discovery pagination adds its page-local bound | — |
| `truncated` | the MCP response-limit sink when it omits whole search records | any future CLI JSON/NDJSON byte budget |
| `cancelled` | the engine collector from `SearchControl` terminal evidence | — |
| `approximate` | the engine summary for targeted effort or another declared approximation | — |
| `failed` | the engine collector on source, adapter, or execution failure | — |

CLI JSON and NDJSON summaries and MCP search responses serialize the same engine-owned lifecycle vocabulary. A sink may add only evidence it owns, such as MCP response truncation.

Result payload fields

SearchResult / FindResult : The default machine-readable result types for JSON and MCP collection. A streaming NDJSON sink may emit events incrementally, but it must finish with an equivalent lifecycle summary.

Minimum result payload fields:

schema_version: response schema version.
request: normalized query/request summary, excluding private text that is not already part of the user’s command input.
stats: counts for sources discovered, eligible, searched, skipped, cancelled, records seen, matches seen, records emitted, dedupe drops, elapsed time, and the active limit/page size.
page: result-window metadata with limit and emitted count, plus an opaque next_cursor only for tools that support continuation.
status: RunStatus plus optional reason, source/budget that caused truncation, cancellation point, and approximation notes.
diagnostics: privacy-safe warnings and errors, including unsupported pushdown, malformed stores, unavailable optional tools, timeout/cancellation, and source-level failures.
results: emitted records in sink-specific record models.

PageInfo : The result-window type. Search exposes only limit and emitted count and is deliberately cursorless. A tool that supports pagination may add next_cursor; it is opaque, stable only for the documented cursor lifetime, and must carry enough planner/execution state to resume without callers reconstructing source paths. Absence of next_cursor means there is no supported next-page request for that result payload.

Diagnostic : A privacy-safe warning or error record with a stable code, severity, message, optional source/store classifier, and optional remediation. Diagnostics must not include prompt text, raw argv, secret values, or local absolute paths.

RecordRef : An opaque handle for result drilldown. It identifies the emitted record or source-scoped record position through a stable, private representation chosen by agentgrep. Callers use the handle with an inspect/drilldown operation instead of building tool calls from local file paths, adapter ids, or record offsets. Source path, adapter id, and line/offset metadata may be included as display or debug metadata, but they are not the primary public drilldown input.
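The opaque-handle idea can be sketched with a registry that issues random tokens and keeps the private locator on the agentgrep side. This is an assumption-heavy illustration: the `rr_` prefix, the registry class, and the in-memory mapping are hypothetical, and a process-local table like this one would not by itself provide the stability the ADR asks for across runs.

```python
import secrets


class RecordRefRegistry:
    """Hypothetical in-memory registry; the real representation is undecided."""

    def __init__(self):
        self._locators = {}

    def issue(self, source_path, offset):
        # The handle carries no path, adapter id, or offset information.
        handle = "rr_" + secrets.token_urlsafe(12)
        self._locators[handle] = (source_path, offset)
        return handle

    def inspect(self, handle):
        """Drilldown entry point: callers pass the handle back unchanged."""
        return self._locators[handle]


registry = RecordRefRegistry()
ref = registry.issue("/home/alice/.codex/sessions/run.jsonl", 4096)
locator = registry.inspect(ref)
```

The caller never parses or constructs the handle, so the locator format can change without breaking drilldown requests.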

MCP, JSON, and NDJSON collectors must preserve these result fields by default. Collecting only RecordEmitted events and discarding started, progress, warning, cancellation, and finished events is not compliant with this ADR.
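A compliant collector, in sketch form, folds lifecycle events into the payload rather than keeping only emitted records. Events are modelled as plain dicts with assumed `type` names, and the payload shows only a subset of the minimum fields listed above.

```python
def collect_search(events, limit):
    """Fold a lifecycle event stream into a result payload (illustrative)."""
    payload = {
        "schema_version": 1,
        "stats": {"sources_searched": 0, "records_emitted": 0},
        "page": {"limit": limit, "emitted": 0},
        "status": {"state": None, "reason": None},
        "diagnostics": [],
        "results": [],
    }
    for event in events:
        kind = event["type"]  # assumed event type names
        if kind == "source_finished":
            payload["stats"]["sources_searched"] += 1
        elif kind == "record":
            payload["results"].append(event["record"])
        elif kind == "warning":
            payload["diagnostics"].append(
                {"severity": "warning", "code": event["code"]}
            )
        elif kind == "finished":
            payload["status"] = {
                "state": event["status"],
                "reason": event.get("reason"),
            }
        # Other kinds (started, progress, future variants) are skipped here.
    emitted = len(payload["results"])
    payload["stats"]["records_emitted"] = emitted
    payload["page"]["emitted"] = emitted
    return payload


events = [
    {"type": "started"},
    {"type": "record", "record": "first match"},
    {"type": "warning", "code": "malformed_store"},
    {"type": "source_finished"},
    {"type": "record", "record": "second match"},
    {"type": "finished", "status": "bounded", "reason": "limit"},
]
payload = collect_search(events, limit=2)
```

A collector that kept only the two `record` events would report the same results but lose the bounded status and the malformed-store diagnostic.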

Execution rules

Discovery must be planned. A query that can be answered from source metadata must not construct record parsers. A prompt-only query must not discover conversation-only stores unless the requested scope requires them. A field predicate such as agent:grok or path:*session* must prune before record parsing whenever the adapter can prove the predicate from source metadata.
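Metadata-level pruning can be sketched as a filter over source descriptors that runs before any parser is constructed. The `SourceMeta` fields, the role names, and the use of glob matching for the path predicate are assumptions; the ADR does not specify how `path:` predicates are evaluated.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class SourceMeta:
    """Hypothetical descriptor known from discovery alone (no record parsing)."""

    agent: str
    path: str
    role: str  # assumed values: "prompt" or "conversation"


def prune_sources(sources, *, agent=None, path_glob=None, role=None):
    """Keep only sources whose metadata can still satisfy the predicates."""
    kept = []
    for source in sources:
        if role is not None and source.role != role:
            continue
        if agent is not None and source.agent != agent:
            continue
        if path_glob is not None and not fnmatchcase(source.path, path_glob):
            continue
        kept.append(source)
    return kept


sources = [
    SourceMeta("grok", "stores/grok/session-01.jsonl", "conversation"),
    SourceMeta("grok", "stores/grok/history.db", "prompt"),
    SourceMeta("codex", "stores/codex/session-09.jsonl", "conversation"),
]
grok_sessions = prune_sources(sources, agent="grok", path_glob="*session*")
prompt_only = prune_sources(sources, role="prompt")
```

Only the surviving descriptors would become source tasks, so the pruned stores never reach JSON or SQLite parsing.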

Search effort is an ordered, engine-owned read policy. ADR 0020 owns the prompt, targeted, and exhaustive effort guarantees, frontend normalization, status, and explicit escalation. ADR 0021 owns targeted candidate evidence, proof-bound locators, the conversation-attempt bound, and the no-backfill rule. This ADR supplies their shared planner, driver, matcher, collector, lifecycle, and coverage boundaries.

Planning must choose the cheapest correct adapter strategy. find remains a first-class fd/find-shaped source and storage discovery command; it may share planner, driver, pagination, diagnostics, and result collection internals with search, but it must not be replaced by a parallel source-listing API with different semantics.

Planning strategies include:

Direct metadata enumeration for find-shaped queries.
SQLite predicates for stores whose schema can answer them safely.
Path or source prefiltering before JSON/JSONL parsing.
Raw text prefiltering only when it preserves parser semantics. Literal JSONL prefilters compare both raw and JSON-escaped query terms, while keeping Unicode-escaped lines conservative so decoded text matches are not lost. Haystack JSONL prefilters may only run for adapters whose per-record text, role, model, title, and source path are available without cross-record context; source-path matches are treated as static terms so path-only matches cannot be filtered before decoding.
Bounded newest-first JSONL scans for limited append-only sources when record predicates do not require metadata that only appears earlier in the file.
Lazy source admission for bounded text-surface append-only JSONL root sources. These sources can skip eager whole-root text prefiltering because raw JSONL line checks and newest-first execution are cheaper than a separate root scan in the bounded path. Haystack searches keep eager root prefiltering for broad content terms, but must admit sources whose source path satisfies at least one query term — regardless of limit or adapter — because a content-only root prefilter cannot prove those path matches impossible. Other unbounded, unknown-order, and non-JSONL root sources keep the eager prefilter path.
Full Python parsing when the store format, query semantics, or privacy rules require it.
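The literal JSONL prefilter rule above (compare raw and JSON-escaped terms, stay conservative on Unicode escapes) can be sketched as a single line check. The function name is hypothetical, and case-sensitive substring matching is an assumption made for brevity.

```python
import json


def line_may_match(raw_line, term):
    """Return False only when decoding the line cannot produce a match."""
    if "\\u" in raw_line:
        # A \uXXXX escape may decode to the term, so the line must be parsed.
        return True
    # json.dumps adds surrounding quotes; strip them to get the escaped body.
    escaped = json.dumps(term, ensure_ascii=False)[1:-1]
    return term in raw_line or escaped in raw_line


term = 'say "hi"'
quoted_line = json.dumps({"text": 'please say "hi" now'})
unicode_line = json.dumps({"text": "caf\u00e9 notes"})  # written as caf\u00e9
plain_line = json.dumps({"text": "nothing relevant"})

quoted_kept = line_may_match(quoted_line, term)
unicode_kept = line_may_match(unicode_line, term)
plain_kept = line_may_match(plain_line, term)
```

The quoted line is kept only because of the escaped comparison: its raw bytes contain `\"hi\"`, so the unescaped term alone would have dropped a real match.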

Optimizations interact with parser state along four axes: record order (reverse scans), line visibility (raw skip predicates), file admission (root and direct prefilters), and result reuse (the source scan cache fingerprint). An adapter may join an optimization set only when every emitted field is derivable from the record line plus the source path, or when the optimization carries an explicit exemption — header markers that bypass skip predicates and seed reverse scans, cache exemption for adapters that expand sibling files, and unconditional admission for stores whose searchable text is not greppable in place. The STATEFUL_HEADER_JSONL_ADAPTERS set names the parsers that carry leading-header state. Source ordering also assumes file mtime tracks record recency; restored backups or clock skew can violate that, which is accepted alongside the bounded-scan approximations below.

Execution must be cancellable and bounded. Drivers poll cancellation between source tasks and record batches. A task that declares bounded source behavior can stop before older records are parsed once the source-local candidate limit is satisfied. The stop condition counts source-locally deduplicated candidates; the frontier’s global cross-source dedupe may later drop some of them, so when stores share dedupe keys a bounded search can return fewer than limit records even though deeper records exist. This is an accepted bounded-scan approximation.

Source scans compile query matchers once per task so record loops do not rebuild term, regex, surface, or predicate state for each candidate record.

The frontier driver can run eligible source tasks concurrently; it merges candidates on the owner thread and stops submitting lower-priority bounded sources once the global result limit is filled. The default frontier driver consumes whole-source results because profiling showed that single-worker batch queueing cost more than the skip opportunity saved on local Claude/Codex JSONL stores. Incremental SourceScanBatch scheduling remains available behind driver configuration for experiments and future worker-count tuning. Bounded text-surface JSONL tasks keep the inline driver by default when profiling shows scheduler overhead is larger than the skip opportunity; they may opt into frontier execution when a configured worker count makes source-level parallelism worthwhile. Profiling controls the default worker count because local JSONL parsing is often CPU-bound enough that unbounded worker fan-out hurts latency.

Interactive CLI runs may map blank Enter to an answer-early request. The TUI maps Esc/Ctrl-C and replacement searches to the same cancellation path. MCP maps client cancellation or timeout to the same path when the framework exposes it.
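The polling points can be sketched as a driver loop that checks a cancellation token between source tasks and between batches, and stops submitting work once the limit is filled. The function, the stop-reason strings, and the generator-based tasks are hypothetical; dedupe and lookahead are left out, so the stop reason here is not a RunStatus.

```python
import threading


def run_bounded(tasks, limit, cancel):
    """Return (records, stop_reason) for tasks already ordered by priority."""
    emitted = []
    for task in tasks:
        if cancel.is_set():  # poll between source tasks
            return emitted, "cancelled"
        for batch in task():
            if cancel.is_set():  # poll between record batches
                return emitted, "cancelled"
            for record in batch:
                emitted.append(record)
                if len(emitted) >= limit:
                    # Lower-priority sources are never started.
                    return emitted, "limit_reached"
    return emitted, "drained"


started = []


def source(name, batches):
    def scan():
        started.append(name)  # records which sources actually began scanning
        yield from batches

    return scan


tasks = [source("new", [["a", "b"], ["c", "d"]]), source("old", [["e"]])]
records, reason = run_bounded(tasks, limit=3, cancel=threading.Event())
```

With a limit of three, the run stops inside the second batch of the first source and the lower-priority source is never opened.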

The TUI must remain non-blocking. It may receive events on the event loop, but broad discovery, subprocess work, SQLite reads, JSON/JSONL parsing, ranking, and large result filtering must run through the execution driver. Event delivery uses bounded queues or backpressure so a fast parser cannot overwhelm rendering.
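Backpressure through a bounded queue can be sketched with `asyncio.Queue(maxsize=...)`: the producer suspends on `put` whenever the renderer falls behind. The queue size, the sentinel, and both coroutines are illustrative, and the producer is a coroutine here only to keep the sketch short; in the design above the heavy work would run behind the execution driver.

```python
import asyncio

DONE = object()  # sentinel marking the end of the stream


async def produce(queue, records):
    for record in records:
        await queue.put(record)  # suspends while the queue is full
    await queue.put(DONE)


async def render(queue, rendered, depths):
    while True:
        depths.append(queue.qsize())  # observe how much is waiting
        item = await queue.get()
        if item is DONE:
            return
        await asyncio.sleep(0)  # stand-in for rendering; yields to the loop
        rendered.append(item)


async def main():
    queue = asyncio.Queue(maxsize=4)
    rendered, depths = [], []
    await asyncio.gather(
        produce(queue, range(50)),
        render(queue, rendered, depths),
    )
    return rendered, depths


rendered, depths = asyncio.run(main())
```

The observed queue depth never exceeds the configured bound, so a fast parser can run at most a fixed distance ahead of rendering.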

CLI output modes are sinks:

Rich/text progress for humans.
JSON for complete machine-readable result payloads.
NDJSON or equivalent streaming output for consumers that want events as they arrive.
Optional answer-early behavior in interactive terminals.

MCP tools are sinks over the same event stream. A tool must collect lifecycle events into result payloads that expose stats, page info, run status, diagnostics, emitted records, and opaque drilldown handles by default. The collection must happen through a non-blocking wrapper so the MCP server event loop is not blocked by local store scans. MCP collectors must consume started, progress, warning/cancellation when present, emitted-record, and finished events; collecting only emitted records hides truncation and is not compliant with this ADR.

Observability and benchmarks

The planner and executor must be easy to profile. Each run can emit:

query shape: scope, agent count, terms/predicate count, limit presence;
discovery counts by agent, store, adapter, and path kind;
planner decisions: predicates pushed down, sources pruned, direct paths chosen, root prefilters skipped, fallback reasons;
execution counts: sources started, submitted, completed, skipped, cancelled, batches yielded, records seen, matches seen, emitted records, dedupe drops, cancellation point;
timing spans: discovery, planning, per-source execution, output sink backpressure, subprocess families;
warning summaries: unsupported pushdown, malformed sources, unavailable optional tools.

Profiler and benchmark artifacts must keep their current privacy boundary: no prompt text, no raw command argv, no secret values, and no local absolute paths. They should keep schema_version and artifact_kind fields so future CI or issue artifacts can be distinguished from local evidence.
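The privacy boundary can be sketched as a function that reduces a source to a classifier before anything is written to an artifact. The field set, the `artifact_kind` value, the adapter name, and the choice of file suffix as the path classifier are all assumptions for illustration.

```python
import json
from pathlib import PurePosixPath


def source_observation(path, adapter, records_seen):
    """Build one profile entry that keeps counts but drops the path itself."""
    suffix = PurePosixPath(path).suffix.lstrip(".")
    return {
        "schema_version": 1,
        "artifact_kind": "profile",  # hypothetical value
        "adapter": adapter,
        "path_kind": suffix or "unknown",  # classifier, never the path
        "records_seen": records_seen,
    }


artifact = json.dumps(
    source_observation(
        "/home/alice/.codex/sessions/run.jsonl", "codex-jsonl", 120
    ),
    sort_keys=True,
)
```

A CI test could assert on the serialized artifact, as this sketch allows, that no home directory or user name survives serialization.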

Deterministic counters belong in CI tests. Wall-clock profiling remains local evidence unless a fixture-only benchmark is explicitly designed for CI.

Native boundary

This ADR does not approve native code.

The architecture deliberately creates a future native boundary that would fit ADR 0003 (Native boundary and execution architecture) if measurement ever proves Python cannot resolve a user-visible bottleneck structurally. Any future native work must cross at a plan, batch, buffer, or protocol boundary. It must not cross per record, per JSON token, per callback, or per UI event.

The Python implementation remains the semantic source of truth. A native accelerator for a public Python API must follow ADR 0002: Pure Python/Rust accelerator module compatibility requirements; a native engine or worker must follow ADR 0003: Native boundary and execution architecture.

Consequences

Positive

Frontends can improve independently without changing search semantics.
The TUI can stay responsive during broad scans.
Profiling identifies whether time is spent in discovery, planning, collection, output backpressure, or a specific adapter strategy.
Planner tests can prove useless work is avoided without requiring large local history stores.
Future source-level parallelism or worker execution has a typed place to attach.

Tradeoffs

The backend will carry more internal types than a direct scan loop.
Adapters must describe capabilities honestly, not just expose parser functions.
Deterministic ordering and dedupe need explicit merge rules once execution becomes concurrent. Those rules are settled in ADR 0014: Result order, limit, and the streaming merge contract.
Sinks must handle events incrementally instead of assuming a completed list.

Risks

Planner overreach: an optimization could prune a source incorrectly. The mitigation is a reference inline driver, fixture-backed equivalence tests, and capability tests per adapter.

Concurrency nondeterminism: parallel source tasks can change output order. The mitigation is explicit merge rules in the physical plan and tests that compare inline and concurrent drivers.

Backpressure bugs: a streaming sink can either lag or block too much. The mitigation is bounded queues, cancellation tests, and profile spans for sink wait time.

Frontend leakage: CLI/TUI/MCP code can start making semantic decisions again. The mitigation is a strict ResultSink boundary: formatting code consumes events and never discovers stores or parses records.

Native shortcutting: future native work could bypass Python semantics. The mitigation is ADR 0002, ADR 0003, and this ADR’s plan/batch/protocol boundary.

Final position

agentgrep’s scalable shape is a typed, headless query system: discover, plan, execute, observe, and render are separate responsibilities. The first implementation target is still Python, but the structure must be ready for non-blocking TUI execution, fast CLI streaming, MCP collection, richer profiling, and future parallel or worker drivers without changing user-visible search semantics.