Storage catalogue¶
agentgrep keeps an explicit catalogue of every on-disk store it knows
about, modelled as Pydantic
StoreDescriptor rows aggregated under one
StoreCatalog. The catalogue is descriptive:
it documents where each agent’s data lives and what the records look
like. Search-policy decisions — whether agentgrep actually opens a
particular store by default — are captured per-row and may be deferred
(search_by_default=None) when no adapter consumes them yet.
The catalogue is the single source of truth that downstream adapters
consume. When upstream renames a path or changes a record shape, the
fix is to update one
StoreDescriptor and bump the
catalogue version; adapters pick the new metadata up automatically.
Why a catalogue¶
Three reasons we did not bake paths into the adapters:
Provenance. Each row carries an
observed_versionandobserved_atstamp. A reader can tell at a glance whether the schema notes are still current or stale.Drift. Codex renames
history.jsonl, Cursor adds a CLI agent layout, Gemini reorganises itstmp/tree. With paths catalogued centrally, those changes diff cleanly in code review.Overlap. Several stores live in adjacent paths but play different roles — Codex
history.jsonl(user prompts only) vs.sessions/*.jsonl(full per-thread transcripts); Geminitmp/<hash>/chats/(live) vs.history/<timestamp>/(post-retention archive). Thedistinguishes_fromfield on each descriptor names the sibling and explains the difference.
Reading a descriptor¶
from agentgrep.store_catalog import CATALOG
claude_session = CATALOG.by_id("claude.projects.session")
claude_session.path_pattern
# '${HOME}/.claude/projects/<encoded_project>/<session_uuid>.jsonl'
Path patterns use ${HOME} and ${<ENV>} tokens; resolving them
against a concrete environment is the consumer’s job, so the catalogue
stays portable. env_overrides lists the env vars that change the
root (Codex respects CODEX_HOME; Gemini respects GEMINI_CLI_HOME).
Stores by agent¶
Claude Code¶
observed_version: claude-code v2.1.143 (2026-05-15).
Claude’s primary chat record lives at
${HOME}/.claude/projects/<encoded_project>/<session_uuid>.jsonl. The
file format is JSONL with multiple record types per line —
type: "user", type: "assistant", type: "attachment",
type: "permission-mode". Sub-agent dispatches nest under
<session_uuid>/subagents/. The auto-memory feature stores markdown
notes under <encoded_project>/memory/.
Cursor¶
Two distinct surfaces, both catalogued and both searched:
Cursor CLI agent (
cursor-agent): transcripts live at${HOME}/.cursor/projects/<id>/agent-transcripts/<session_uuid>/<session_uuid>.jsonland are parsed bycursor.cli_jsonl.v1. Records are Anthropic-style{role, message.content[]}withtextandtool_usecontent blocks; tool outputs are sometimes[REDACTED]in oldercursor-agentbuilds. There is no native per-turn timestamp, so agentgrep backfills the file’s mtime.Cursor IDE: parsed by
cursor.state_vscdb_modern.v1/cursor.state_vscdb_legacy.v1viastate.vscdb(SQLite). The catalogue keeps the IDE path separate from the CLI agent so the two never collide.
cursor.cli.worktrees is catalogued explicitly with
role=SOURCE_TREE and search_by_default=False so the adapter
does not index multi-gigabyte git working trees as chat history.
Codex¶
observed_version: github.com/openai/codex@4c89772 (2026-05-16).
Schemas are pinned directly to the upstream Rust types:
JSONLhistory.jsonl→HistoryEntry { session_id: String, ts: u64, text: String }(codex-rs/message-history/src/lib.rs:54-58).Per-thread
sessions/YYYY/MM/DD/rollout-…jsonl→ tagged enumRolloutItemwith variantsSessionMeta,ResponseItem,Compacted,TurnContext,EventMsg(codex-rs/protocol/src/protocol.rs:2783).
The two _N.sqlite files at the Codex root — state_5.sqlite and
logs_2.sqlite — belong to the Codex CLI. Their filenames come from
STATE_DB_FILENAME and LOGS_DB_FILENAME in
codex-rs/state/src/lib.rs.
Gemini CLI¶
observed_version: gemini-cli v0.42.0 stable (2026-05-12); types
from v0.44.0-nightly HEAD 77e65c0d. Three adapters cover the
three on-disk shapes:
gemini.tmp_chats_jsonl.v1parsestmp/<project_hash>/chats/session-*.jsonl. Each file opens with aSessionMetadataRecord(sessionId,projectHash,startTime,lastUpdated,kind); subsequent lines areMessageRecordturns interleaved withMetadataUpdateRecordupdates ({$set: {…}}). Real files surfacetypevaluesuserandgemini; upstream types also declareinfo/error/warningplusRewindRecordandPartialMetadataRecord, but those records did not appear in sampling.gemini-typed turns whosecontentis empty have their searchable text drawn fromthoughts[*].subject/descriptionandtoolCalls[*].name/description, joined into one record per turn.gemini.tmp_chats_legacy_json.v1parses pre-Feb 2026tmp/<project_hash>/chats/session-*.jsonsingle-file sessions. Upstream still reads this shape via theisLegacyRecorddiscriminator atchatRecordingService.ts:941; the legacy file holds session metadata at the top level and the full conversation under amessagesarray.gemini.tmp_logs_json.v1parsestmp/<project_hash>/logs.json— a flat JSON array ofLogEntryrecords (user-prompt audit log).
Gemini’s
sessionCleanup.ts
hard-deletes expired sessions via fs.unlink() — there is no
history/ archive. The Antigravity files some installs carry under
~/.gemini/antigravity/conversations/ are written by the
Antigravity IDE,
a separate Google product — Gemini CLI only detects Antigravity as
an IDE launcher and does not read or write the protobuf
conversation files. Both stores are out of scope for the Gemini
adapters.
The project_hash is sha256(absolute_project_root). agentgrep
exposes a Python mirror via
gemini_project_hash() so the CLI can
answer “which Gemini sessions belong to this repo?”.
Adding or updating a store¶
Edit
src/agentgrep/store_catalog.py. Stampobserved_versionandobserved_atagainst the version you actually inspected.Add an
upstream_ref(preferred) or asample_recordso future readers can verify the schema.If the new store overlaps a sibling, name it in
distinguishes_fromand explain the difference inschema_notes.Capture a redacted fixture under
tests/samples/<agent>/<store_id>/.Bump
catalog_versionin the same commit that changes descriptor shape.Run
uv run pytest tests/test_stores.py.
See also¶
agentgrep.stores— model definitionsagentgrep.store_catalog— concrete registry