Query language
agentgrep grep, agentgrep grep, and agentgrep find accept a
Lucene-style query language for inline field predicates, boolean
composition, and date ranges. The same syntax works across all three
subcommands; each interprets the predicates against its natural
record shape.
The query language is opt-in: a bare positional like
agentgrep grep bliss keeps the legacy fast path with zero
overhead. Detection is a single-character scan for : in the
positional tokens — if absent, the query module is never loaded.
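The gate can be sketched in a few lines. This assumes the positional arguments arrive as a list of strings; `uses_query_syntax` is a hypothetical name, not agentgrep's actual function.

```python
def uses_query_syntax(positionals: list[str]) -> bool:
    """Hypothetical gate: True if any positional token contains a ':'.

    Only when this returns True would the query module be imported;
    otherwise the legacy fast path runs untouched.
    """
    # any() stops at the first hit, so this is one linear pass over argv.
    return any(":" in token for token in positionals)
```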
Grammar
query := disjunction
disjunction := conjunction ("OR" conjunction)*
conjunction := negation ("AND"? negation)*
negation := ("NOT" | "-" | "+")? primary
primary := group | field-expr | term
group := "(" disjunction ")"
field-expr := IDENT ":" field-value
field-value := comparison | range | exact-value
comparison := (">" | "<" | ">=" | "<=") TERM
range := "[" TERM "TO" TERM "]" ; inclusive
| "{" TERM "TO" TERM "}" ; exclusive
exact-value := TERM
term := TERM
Implicit AND between bare terms is preserved: agentgrep grep foo bar
matches records containing both foo and bar. Explicit AND /
OR / NOT are case-insensitive and must be whole words.
The sigils - and + are shortcuts for NOT and “required”
respectively. + is currently a no-op (implicit AND already
requires every term); it’s accepted for rg compatibility.
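A minimal recursive-descent parser for the grammar above, including implicit AND, case-insensitive keywords and the `-`/`+` sigils, might look like the following. It is a sketch, not agentgrep's parser: the names `Parser` and `QueryError` and the tuple AST tags (`"eq"`, `"cmp"`, `"range"`, `"term"`, `"and"`, `"or"`, `"not"`) are hypothetical stand-ins for node types such as `FieldEq`. A word is split at its first `:` only, so time literals like `2026-05-22T14:30:00` keep their colons.

```python
import re

# Brackets, a leading '-'/'+' sigil, or a run of non-delimiter characters.
# A '-' inside a word (2026-01-01, claude-3) stays part of that word.
TOKEN_RE = re.compile(r"[()\[\]{}]|[-+](?=\S)|[^\s()\[\]{}]+")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class QueryError(ValueError):
    """Raised for malformed queries (hypothetical error type)."""


class Parser:
    """Recursive-descent sketch of the grammar; builds a tuple-based AST."""

    def __init__(self, text: str) -> None:
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise QueryError("unexpected end of query")
        self.pos += 1
        return token

    def at_keyword(self, word: str) -> bool:
        token = self.peek()
        return token is not None and token.upper() == word  # case-insensitive

    def parse(self):
        node = self.disjunction()
        if self.peek() is not None:
            raise QueryError(f"unexpected token {self.peek()!r}")
        return node

    def disjunction(self):
        children = [self.conjunction()]
        while self.at_keyword("OR"):
            self.pos += 1
            children.append(self.conjunction())
        return children[0] if len(children) == 1 else ("or", children)

    def conjunction(self):
        children = [self.negation()]
        # Continue until ')', end of input or OR; a missing AND is implicit.
        while self.peek() not in (None, ")") and not self.at_keyword("OR"):
            if self.at_keyword("AND"):
                self.pos += 1
            children.append(self.negation())
        return children[0] if len(children) == 1 else ("and", children)

    def negation(self):
        if self.at_keyword("NOT") or self.peek() == "-":
            self.pos += 1
            return ("not", self.primary())
        if self.peek() == "+":
            self.pos += 1  # accepted, but a no-op under implicit AND
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            node = self.disjunction()
            if self.take() != ")":
                raise QueryError("expected ')'")
            return node
        field, sep, value = token.partition(":")  # first ':' only
        if sep and IDENT_RE.fullmatch(field):
            return self.field_value(field, value)
        return ("term", token)

    def field_value(self, field: str, value: str):
        if not value:
            value = self.take()  # 'agent: bliss' -> next token is the value
        if value in ("[", "{"):
            low = self.take()
            if self.take().upper() != "TO":
                raise QueryError("expected TO inside range")
            high = self.take()
            closer = "]" if value == "[" else "}"
            if self.take() != closer:
                raise QueryError(f"expected {closer!r} to close range")
            return ("range", field, low, high, value == "[")  # True = inclusive
        for op in (">=", "<=", ">", "<"):  # two-char operators first
            if value.startswith(op):
                return ("cmp", field, op, value[len(op):])
        return ("eq", field, value)
```

For example, `Parser("-agent:claude bliss").parse()` yields `("and", [("not", ("eq", "agent", "claude")), ("term", "bliss")])`.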
Field registry
The default registry ships ten fields, split across two evaluation layers:
Source-level fields
These can be decided from a SourceHandle alone, so source-level
predicates prune sources before any file is opened.
| Field | Kind | Notes |
|---|---|---|
| agent | enum | One of … |
|  | string | Substring against the source’s store name |
|  | string | Substring; alias … |
| path | path | Glob (with …) |
| mtime | date | Source-file mtime; supports … |
Record-level fields
These need the parsed record, so they filter after the source predicate has admitted the source.
| Field | Kind | Notes |
|---|---|---|
|  | enum | One of … |
| timestamp | date | Record timestamp; supports comparison + range; alias … |
|  | string | Substring against … |
|  | string | Substring against … |
|  | string | Substring; implicit field for bare positional terms |
Unknown field names error at parse time with a clean message listing the registered fields.
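A registry that records each field's kind and evaluation layer, and fails with the list of registered names, could be sketched as follows. `Field`, `REGISTRY`, `UnknownFieldError` and `lookup_field` are hypothetical; the entries are only the fields whose layer this page states explicitly, with kinds read off the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str   # "enum", "string", "path" or "date"
    layer: str  # "source" (decidable from a SourceHandle) or "record"


# Hypothetical subset of the ten registered fields.
REGISTRY = {
    field.name: field
    for field in (
        Field("agent", "enum", "source"),
        Field("path", "path", "source"),
        Field("mtime", "date", "source"),
        Field("timestamp", "date", "record"),
    )
}


class UnknownFieldError(ValueError):
    pass


def lookup_field(name: str) -> Field:
    """Resolve a field at parse time, failing with the registered names."""
    try:
        return REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise UnknownFieldError(
            f"unknown field {name!r}; registered fields: {known}"
        ) from None
```

Keeping the layer on the registry entry is what lets a compiler classify predicates without inspecting their values.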
Date literals
The mtime and timestamp fields accept three forms:
ISO 8601: 2026-05-22, 2026-05, 2026, 2026-05-22T14:30:00, 2026-05-22T14:30:00Z, 2026-05-22T14:30:00+02:00.
Relative: today, yesterday, tomorrow, Nd, Nw, Nm, Ny (with an optional trailing ago), N(d|w|m|y) from now. Month ≈ 30 days, year ≈ 365 days.
Unbounded marker: the literal * inside a range (field:[* TO 2026-05]).
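The relative forms can be resolved against a reference time roughly like this. It is a sketch with a hypothetical `resolve_relative` helper; it assumes the keywords resolve to midnight of the named day and that `ago`/`from now` are separated from the number by whitespace. As documented, a bare `Nd` (with or without `ago`) points into the past.

```python
import re
from datetime import datetime, timedelta

# Month is about 30 days and year about 365 days, as documented above.
UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}
RELATIVE_RE = re.compile(r"(\d+)([dwmy])(?:\s+(ago|from now))?")
KEYWORD_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def resolve_relative(literal: str, now: datetime) -> datetime:
    """Hypothetical resolver for the relative date forms."""
    text = literal.strip().lower()
    if text in KEYWORD_OFFSETS:
        # Assumption: keywords resolve to midnight of that day.
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + timedelta(days=KEYWORD_OFFSETS[text])
    match = RELATIVE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a relative date literal: {literal!r}")
    count, unit, direction = match.groups()
    offset = timedelta(days=int(count) * UNIT_DAYS[unit])
    # 'Nd' and 'Nd ago' both look backward; only 'from now' looks forward.
    return now + offset if direction == "from now" else now - offset
```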
Bare-day equality expands to a half-open 24-hour range; bare-month
to the calendar month; bare-year to the calendar year. Exact-time
literals (2026-05-22T14:30:00) match the precise instant.
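The bare-literal expansion can be sketched as a function returning a half-open `[start, end)` interval; equality then means `start <= ts < end`. `bare_interval` is a hypothetical helper, and it deliberately refuses exact-time literals, which match a single instant instead.

```python
from datetime import datetime, timedelta


def bare_interval(literal: str) -> tuple[datetime, datetime]:
    """Half-open [start, end) covered by a bare ISO 8601 year, month or day."""
    if "T" in literal:
        raise ValueError("exact-time literals match a single instant, not a range")
    parts = [int(p) for p in literal.split("-")]
    if len(parts) == 1:                  # 2026 -> the calendar year
        (year,) = parts
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if len(parts) == 2:                  # 2026-05 -> the calendar month
        year, month = parts
        # December rolls over into January of the following year.
        return datetime(year, month, 1), datetime(year + month // 12, month % 12 + 1, 1)
    year, month, day = parts             # 2026-05-22 -> a 24-hour day
    start = datetime(year, month, day)
    return start, start + timedelta(days=1)
```

One plausible way to build an inclusive range such as `[2026-01 TO 2026-03]` is to take the start of the lower bound's interval and the end of the upper bound's, which would cover all of March as in the Q1 example; whether agentgrep does exactly this is an assumption.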
Two-layer filtering
The compiler classifies each predicate into a source-layer pass and a record-layer pass. Source-layer predicates prune sources before any file is opened; record-layer predicates filter parsed records afterward.
For boolean composition:
AND across layers: source-layer children prune; record-layer children filter. Each layer evaluates only its own children.
OR of same-layer children: the OR runs entirely at that layer.
OR mixing source-level and record-level children: the source pass uses three-valued logic and conservatively lets the source through, leaving the decision to the record pass. This OR-mixed case is the only perf cliff in the design.
NOT propagates per layer; a NOT over a mixed subtree falls back to record-only evaluation, same as OR-mixed.
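The source pass described above behaves like Kleene three-valued logic, with `None` standing for "cannot be decided from the source". The sketch below assumes a tiny hypothetical AST (`Pred`, `Op`) rather than agentgrep's real node types; only a definite `False` prunes a source.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Union


@dataclass
class Pred:
    layer: str                   # "source" or "record"
    test: Callable[[Any], bool]  # evaluated against a source or a record


@dataclass
class Op:
    kind: str                    # "and", "or" or "not"
    children: List["Node"]


Node = Union[Pred, Op]


def source_pass(node: Node, source: Any) -> Optional[bool]:
    """True/False when decidable from the source alone, None when unknown."""
    if isinstance(node, Pred):
        # Record-level predicates cannot be decided from a source: unknown.
        return node.test(source) if node.layer == "source" else None
    values = [source_pass(child, source) for child in node.children]
    if node.kind == "not":
        return None if values[0] is None else not values[0]
    if node.kind == "and":
        if False in values:
            return False  # one definite False prunes the whole AND
        return None if None in values else True
    if node.kind == "or":
        if True in values:
            return True
        return None if None in values else False  # mixed OR stays unknown
    raise ValueError(f"unknown operator {node.kind!r}")


def admit_source(node: Node, source: Any) -> bool:
    # Unknown lets the source through; the record pass decides later.
    return source_pass(node, source) is not False
```

With this scheme, `agent:codex AND bliss` prunes a claude source, while `agent:codex OR bliss` must open it, which is the OR-mixed cost noted above.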
Examples
$ agentgrep grep agent:codex bliss
Records from codex matching “bliss”. Claude / cursor / gemini sources are never opened.
$ agentgrep grep '(agent:codex OR agent:cursor) AND deploy'
Records from either codex or cursor mentioning “deploy”. Claude / gemini are pruned at source level.
$ agentgrep grep '-agent:claude bliss'
Records from anyone except claude that mention “bliss”.
$ agentgrep grep 'timestamp:>2026-01-01 bliss'
Records after 2026-01-01 mentioning “bliss”. The timestamp filter runs at the record layer.
$ agentgrep grep 'timestamp:[2026-01 TO 2026-03] model:claude'
Records in Q1 2026 from any claude-* model.
$ agentgrep find path:~/.codex agent:codex
Codex-agent sources under ~/.codex/.
$ agentgrep grep agent:codex bliss
Grep over codex records for “bliss” — same line-aware output as
plain agentgrep grep bliss, but with the codex prefilter.
Flag / field collisions
agentgrep rejects ambiguous combinations of CLI flags and inline
field predicates:
$ agentgrep grep --agent codex agent:claude bliss
agentgrep grep: error: cannot combine --agent flag with agent: field predicate; pick one syntax
Currently checked: --agent × agent:, --type × type:. Other
flags don’t yet have query-field counterparts.
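A collision check of this kind can be sketched with the standard `argparse` module. `FLAG_FIELDS`, `field_names`, `check_collisions` and `build_parser` are hypothetical; the parser here only declares `--agent`, `--type` and the positional terms. `parser.error` prints the usage line and then `<prog>: error: <message>` to stderr before exiting with status 2, which is why `prog` is set to `agentgrep grep`.

```python
import argparse

# Hypothetical table: argparse dest of a CLI flag -> query field it overlaps.
FLAG_FIELDS = {"agent": "agent", "type": "type"}


def field_names(positionals):
    """Collect field names used inline, e.g. 'agent' from 'agent:claude'."""
    names = set()
    for token in positionals:
        for word in token.split():
            head, sep, _ = word.lstrip("-+(").partition(":")
            if sep and head.isidentifier():
                names.add(head)
    return names


def check_collisions(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None:
    used = field_names(args.terms)
    for dest, field in FLAG_FIELDS.items():
        if getattr(args, dest) is not None and field in used:
            parser.error(
                f"cannot combine --{dest} flag with {field}: field predicate; pick one syntax"
            )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agentgrep grep")
    parser.add_argument("--agent")
    parser.add_argument("--type")
    parser.add_argument("terms", nargs="*")
    return parser
```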
Performance
When the positionals contain no :, the query module is never
imported and zero work is added — the legacy fast path runs exactly
as before. When the syntax is used:
Parse + compile is sub-millisecond for typical queries.
Source pruning is O(predicates) per SourceHandle. Pruning saves multiple seconds on multi-thousand-file trees when a single field rules out most sources.
Record filtering runs in the existing per-record hot loop and short-circuits as soon as a child predicate fails. The net overhead on records that pass is under 5%; rejected records save time versus the legacy path because no haystack is built.
The one perf cliff is OR-mixed: an OR that straddles source- and record-level predicates can’t push down past the source-prune boundary. The compiler degrades safely (lets the source through; the record pass decides) — it just costs the file read.
Known limitations
Leading - on a field predicate
A field predicate that begins with a bare - (e.g.
-agent:claude as the negation shortcut for NOT agent:claude)
collides with argparse’s short-option collapse rule. The argv
token -agent:claude would otherwise parse as the combined short
options -a -g -e nt:claude because each leading character
matches a defined short flag, silently turning the user’s intent
into a totally different command.
agentgrep rejects this argv shape at parse time with a clear error and two workarounds:
$ agentgrep find -agent:claude
agentgrep: error: argument '-agent:claude' looks like a field
predicate but argparse parses the leading '-' as combined short
options. Use one of:
-- positional separator: agentgrep ... -- -agent:claude
keyword negation: agentgrep ... 'NOT agent:claude'
Pick the form that fits your scripting style. The NOT keyword
is the most readable; -- is the most surgical. Note that
shell-level quoting ('-agent:claude') does not help — the
shell strips quotes before argparse runs, so the quoted token
arrives at argparse identically to the unquoted form and the
pre-scan rejects both. Use NOT or --.
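A pre-scan like the one described can run over the raw argv before argparse sees it. This is a sketch: `prescan_argv` and its regex are hypothetical, and the message text mirrors the error shown above. Because the shell removes quotes before the program runs, `'-agent:claude'` and `-agent:claude` reach this function as the same string, and anything after `--` is left alone.

```python
import re

# A '-' immediately followed by an identifier and ':' looks like '-field:value'.
FIELD_PREDICATE_RE = re.compile(r"-[A-Za-z_][A-Za-z0-9_]*:")


def prescan_argv(argv: list[str]) -> None:
    """Hypothetical pre-scan run before argparse parses argv."""
    for token in argv:
        if token == "--":
            return  # everything after '--' is positional for argparse
        if FIELD_PREDICATE_RE.match(token):
            raise ValueError(
                f"argument {token!r} looks like a field predicate but argparse "
                "parses the leading '-' as combined short options. Use one of:\n"
                f"  -- positional separator: agentgrep ... -- {token}\n"
                f"  keyword negation: agentgrep ... 'NOT {token[1:]}'"
            )
```

Long options such as `--agent` pass through, since the character after the first `-` is not a letter.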
field: with no inline value
The query agent: bliss parses as a single
FieldEq(agent, "bliss") predicate, not as “missing value
followed by a separate term bliss”. The tokenizer emits
ident("agent") and colon, and the next term token becomes the
value. This is defensible (: is a contiguous operator, and the
space after it is just whitespace) but unintuitive when typing.
If you want a bare term bliss, either fill in the value
(agent:codex bliss) or drop the agent: predicate entirely
(bliss).