local-only | runs on one workstation | data never leaves disk | build 2026.06.07

A search system for everything on your disk.

Drop in text, code, PDFs, images, audio, or video. AdaRag profiles each file, chunks it adaptively, embeds it with bge-m3, and answers questions with citations. When a citation is an image, the original pixels are attached to the prompt and the model sees what is actually there, not a caption of it.

Nothing leaves the machine. The host CLI bridge routes inference to claude-cli, gemini-cli, or Ollama, so a free Claude subscription pays for chat and Ollama handles the cheap enrichment.


// ingest

One pipeline. Six modalities.

A file's modality picks its preprocessor. A profiler picks its chunker. Each chunk is enriched in parallel, embedded into dense and sparse vectors from a single model, and upserted into a per-bucket Qdrant collection. If the subject is ambiguous (an unnamed photo, a document about an unnamed person), the model files clarifications for you to answer later.

  1. profile: modality + structure. code, paper, image, audio, video, prose.
  2. preprocess: vision caption + OCR, whisper transcript, AST parse, or native text.
  3. chunk: code AST, paper-section, or prose. picked from the profile.
  4. enrich: situating context + entity metadata, in parallel. images attached so visual context grounds in pixels.
  5. embed: bge-m3 dense + sparse from one model. GPU when available.
  6. index: qdrant named-vector upsert. postgres metadata. ambiguity flagged.
collection adarag_chunks_<bucket>   ·   dense 1024d bge-m3   ·   sparse bge-m3 sparse head
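
A minimal sketch of the routing idea in steps 1 to 3, not AdaRag's actual code. The suffix table, the function name plan_ingest, the preprocessor and chunker labels, and the pairing of modality to preprocessor are all assumptions made for illustration; only the collection naming pattern comes from the line above.

```python
from pathlib import Path

# Hypothetical suffix table. A real profiler presumably inspects
# content and structure as well, not just the file extension.
MODALITY_BY_SUFFIX = {
    ".py": "code", ".rs": "code", ".ts": "code",
    ".pdf": "paper",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mkv": "video",
}

# Assumed pairing of each modality with a preprocessor from step 2.
PREPROCESSOR = {
    "code": "ast_parse",
    "paper": "native_text",
    "prose": "native_text",
    "image": "vision_caption_ocr",
    "audio": "whisper_transcript",
    "video": "whisper_transcript",  # assumption: video goes through its audio track
}

# Assumed: anything without a dedicated chunker falls back to prose.
CHUNKER = {"code": "code_ast", "paper": "paper_section"}


def plan_ingest(path: str, bucket: str) -> dict:
    """Decide how one file would flow through the pipeline."""
    modality = MODALITY_BY_SUFFIX.get(Path(path).suffix.lower(), "prose")
    return {
        "modality": modality,
        "preprocessor": PREPROCESSOR[modality],
        "chunker": CHUNKER.get(modality, "prose"),
        "collection": f"adarag_chunks_{bucket}",
    }
```

Calling plan_ingest("src/main.py", "work") would route the file to the AST preprocessor, the code chunker, and the adarag_chunks_work collection.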

// answer

Retrieve, rerank, ground in the actual pixels.

A query rewrite tightens the search string. Qdrant runs dense and sparse prefetches and fuses them server-side with RRF, up to 20 candidates. The bge-reranker-v2-m3 cross-encoder scores the whole pool. A diversity guarantee makes sure every retrieved media modality lands its best chunk in the top-k. The prompt assembles citations; when one is an image or video, the original media is attached so vision-capable LLMs cite what is actually there. The answer streams as SSE deltas with 600 ms Postgres checkpoints, so a refresh, restart, or mid-stream crash never loses the buffer.
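
For intuition, here is what reciprocal rank fusion computes, sketched client-side in plain Python. In AdaRag the fusion runs inside Qdrant, so this is an illustration, not the code path. The function name rrf_fuse is hypothetical, and k = 60 is the constant commonly used for RRF, assumed here; the server-side constant may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=20):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:limit]


dense_hits = ["a", "b", "c"]   # ids from the dense prefetch, best first
sparse_hits = ["b", "d", "a"]  # ids from the sparse prefetch, best first
fused = rrf_fuse([dense_hits, sparse_hits])
```

Because RRF only looks at ranks, the dense and sparse scores never need to share a scale, which is why it suits mixing the two bge-m3 heads.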

  1. rewrite: query transform. tightens search without changing intent.
  2. retrieve: dense + sparse prefetch. qdrant server-side RRF fusion. 20 candidates.
  3. rerank: bge-reranker-v2-m3 cross-encoder over the whole pool.
  4. diversify: every retrieved media modality earns one slot in the top-k.
  5. build: cited spans + original media attached for vision-capable LLMs.
  6. stream: SSE deltas. 600 ms postgres checkpoints. real Stop kills the subprocess.
scope strict · medium · lazy   ·   agent tool cards inline   ·   resume survives refresh + crash
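
One way the diversity guarantee in step 4 could work, sketched under assumptions. The function name diversify, the tuple layout, the set of media modalities, and the eviction rule are all hypothetical; the page only states the guarantee, not the mechanism.

```python
def diversify(ranked, k, media=("image", "audio", "video")):
    """Return a top-k where each media modality in the pool gets a slot.

    ranked: list of (chunk_id, modality, score), sorted by score descending.
    Assumed rule: promote the best chunk of any missing media modality and
    evict the lowest-ranked chunk that is not the only one of its media type.
    """
    top = list(ranked[:k])
    for modality in media:
        if any(m == modality for _, m, _ in top):
            continue  # already represented
        best = next((c for c in ranked[k:] if c[1] == modality), None)
        if best is None:
            continue  # modality was never retrieved
        for i in range(len(top) - 1, -1, -1):
            m = top[i][1]
            sole_media = m in media and sum(1 for c in top if c[1] == m) == 1
            if not sole_media:
                top[i] = best
                break
    # Restore score order after the swaps.
    return sorted(top, key=lambda c: -c[2])


pool = [
    ("t1", "prose", 0.9), ("t2", "prose", 0.8), ("t3", "code", 0.7),
    ("i1", "image", 0.4), ("a1", "audio", 0.3), ("i2", "image", 0.2),
]
picked = [chunk_id for chunk_id, _, _ in diversify(pool, k=3)]
```

In this example the image and audio chunks displace lower-value text chunks, so the prompt still carries at least one citation per retrieved media type even when text scores higher.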

// why local

Some things should not be a SaaS.

Research notes. Source code. Private photos. The RAG that reads them should not be a billed monthly subscription with a privacy policy you skim. AdaRag owns its index, runs on one workstation, and routes inference through whichever local CLI you already pay for.

cloud RAG                                  | AdaRag
data uploaded to a vendor                  | data never leaves your disk
$20-$200 per seat per month                | $0 above what you pay your LLM
opaque retrieval, vendor-tuned             | every knob exposed, swept by Optuna
vision answers run on their model          | claude-cli, gemini-cli, ollama. your pick.
"how does it index PDFs?" support ticket   | ~3,000 lines of code. read it.

// stack

One job each.


// made by

Hemang Choudhary.

B.Tech IT at MBM University Jodhpur, also reading Data Science and AI at IIT Madras. Built AdaRag because cloud RAG kept asking him to upload his disk to someone else's machine. Trained CycleGAN at DRDO before that. Lives on a 5080, a Sonnet subscription, and three jars of instant coffee.