A search system for everything on your disk.
Drop in text, code, PDFs, images, audio, or video. AdaRag profiles each file, chunks it adaptively, embeds it with bge-m3, and answers questions with citations. When a citation is an image, the original pixels are attached to the prompt and the model sees what is actually there, not a caption of it.
Nothing leaves the machine. The host CLI bridge routes inference to claude-cli, gemini-cli, or Ollama, so a Claude subscription you already have covers chat and Ollama handles the cheap enrichment.
// ingest
One pipeline. Six modalities.
A file's modality picks its preprocessor. A profiler picks its chunker. Each chunk is enriched in parallel, embedded into dense and sparse vectors from a single model, and upserted into a per-bucket Qdrant collection. If the subject is ambiguous (an unnamed photo, a document about an unnamed person), the model files clarifications for you to answer later.
- a. profile: modality + structure. code, paper, image, audio, video, prose.
- b. preprocess: vision caption + OCR, whisper transcript, AST parse, or native text.
- c. chunk: code AST, paper-section, or prose. picked from the profile.
- d. enrich: situating context + entity metadata, in parallel. images attached so visual context grounds in pixels.
- e. embed: bge-m3 dense + sparse from one model. GPU when available.
- f. index: qdrant named-vector upsert. postgres metadata. ambiguity flagged.
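The routing in the first three steps can be sketched as a pair of lookup tables. Everything here is illustrative: the suffix table, the preprocessor and chunker labels, and the `plan_ingest` helper are hypothetical names, and the pairing of modalities with preprocessors is an assumption. A real profiler would look at structure, not only the file extension.

```python
from pathlib import Path

# Hypothetical suffix table; unknown suffixes fall back to prose.
MODALITY_BY_SUFFIX = {
    ".py": "code", ".ts": "code", ".rs": "code",
    ".pdf": "paper",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mkv": "video",
}

# Assumed pairing of modality to preprocessor (labels are illustrative).
PREPROCESSOR_BY_MODALITY = {
    "code": "ast_parse",
    "paper": "native_text",
    "prose": "native_text",
    "image": "vision_caption_ocr",
    "audio": "whisper_transcript",
    "video": "whisper_transcript",
}

# Code and papers get structure-aware chunkers; everything else chunks as prose.
CHUNKER_BY_MODALITY = {"code": "code_ast", "paper": "paper_section"}


def plan_ingest(path: str) -> dict:
    """Return the modality, preprocessor, and chunker chosen for one file."""
    suffix = Path(path).suffix.lower()
    modality = MODALITY_BY_SUFFIX.get(suffix, "prose")
    return {
        "modality": modality,
        "preprocessor": PREPROCESSOR_BY_MODALITY[modality],
        "chunker": CHUNKER_BY_MODALITY.get(modality, "prose"),
    }


plan = plan_ingest("src/main.py")
```

Keeping the decision in plain tables means a new file type is one extra row, and the choice made for any file is easy to log next to its chunks.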
// answer
Retrieve, rerank, ground in the actual pixels.
A query rewrite tightens the search string. Qdrant runs dense and sparse prefetches and fuses them server-side with RRF, returning up to 20 candidates. The bge-reranker-v2-m3 cross-encoder scores the whole pool. A diversity guarantee makes sure every retrieved media modality lands its best chunk in the top-k. The prompt assembles citations; when one is an image or video, the original media is attached so vision-capable LLMs cite what is actually there. The answer streams as SSE deltas and is checkpointed to Postgres every 600 ms, so a refresh, restart, or mid-stream crash loses at most the text since the last checkpoint.
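For intuition, reciprocal rank fusion (RRF) can be written in a few lines. This is a minimal client-side sketch of the standard formula, not Qdrant's server-side implementation; the constant `k=60` is the commonly cited default and is an assumption here, as are the `rrf_fuse` name and the chunk ids.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=20):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:limit]


dense = ["c1", "c2", "c3"]   # hypothetical dense-vector ranking
sparse = ["c3", "c1", "c4"]  # hypothetical sparse-vector ranking
fused = rrf_fuse([dense, sparse])
# fused == ["c1", "c3", "c2", "c4"]
```

RRF only looks at ranks, so dense and sparse scores never need to be put on a common scale before they are combined.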
- a. rewrite: query transform. tightens search without changing intent.
- b. retrieve: dense + sparse prefetch. qdrant server-side RRF fusion. 20 candidates.
- c. rerank: bge-reranker-v2-m3 cross-encoder over the whole pool.
- d. diversify: every retrieved media modality earns one slot in the top-k.
- e. build: cited spans + original media attached for vision-capable LLMs.
- f. stream: SSE deltas. 600 ms postgres checkpoints. real Stop kills the subprocess.
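One way to implement the diversify step is to take the top-k by rerank score, then swap in the best chunk of any modality that missed the cut. The sketch below is an assumption about how such a guarantee could work, not the project's actual code. The `top_k_with_diversity` name and the tuple layout are hypothetical, and it treats every modality alike rather than singling out media.

```python
from collections import Counter


def top_k_with_diversity(scored, k):
    """scored: list of (chunk_id, modality, rerank_score) tuples."""
    ranked = sorted(scored, key=lambda c: c[2], reverse=True)
    top = ranked[:k]
    counts = Counter(c[1] for c in top)
    for cand in ranked[k:]:
        if counts[cand[1]] > 0:
            continue  # this modality already has a slot
        # Evict the lowest-scored chunk whose modality has a spare entry,
        # so no modality loses its only representative.
        for i in range(len(top) - 1, -1, -1):
            if counts[top[i][1]] > 1:
                counts[top[i][1]] -= 1
                top.pop(i)
                top.append(cand)
                counts[cand[1]] += 1
                break
        else:
            break  # nothing can be evicted: more modalities than slots
    return sorted(top, key=lambda c: c[2], reverse=True)


scored = [
    ("t1", "text", 0.9), ("t2", "text", 0.8), ("t3", "text", 0.7),
    ("i1", "image", 0.4), ("a1", "audio", 0.3),
]
picked = [c[0] for c in top_k_with_diversity(scored, k=3)]
# picked == ["t1", "i1", "a1"]
```

Because candidates are scanned in score order, the first chunk seen for a missing modality is its best one, which matches the "lands its best chunk" wording.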
// why local
Some things should not be a SaaS.
Research notes. Source code. Private photos. The RAG that reads them should not be a billed monthly subscription with a privacy policy you skim. AdaRag owns its index, runs on one workstation, and routes inference through whichever local CLI you already pay for.
// stack
One job each.
- embeddings: bge-m3. dense + sparse from one model. 1024d. local on GPU when available.
- vector store: Qdrant. named dense + sparse vectors. server-side RRF prefetch fusion.
- rerank: bge-reranker-v2-m3. cross-encoder over the full candidate pool, every query.
- metadata: Postgres. documents, chunks, clarifications, chat jobs. async via asyncpg.
- api: FastAPI. Python 3.12. SQLAlchemy 2 async. background tasks for non-blocking ingest.
- frontend: Next.js 16. App Router + Turbopack. React 19. Tailwind v4. SSE via ReadableStream.
- llm bridge: host CLI. uvicorn on the host. proxies claude-cli, gemini-cli, ollama over HTTP.
- optimizer: Optuna. multi-objective. Pareto front of relevance vs latency over golden queries.
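The Pareto front the optimizer reports is the set of configurations that no other configuration beats on both relevance and latency. A minimal sketch of that filter follows. The trial names and numbers are made up for illustration, and `pareto_front` is a hypothetical helper, not an Optuna call.

```python
def pareto_front(trials):
    """trials: list of (name, relevance, latency_ms).

    Higher relevance is better, lower latency is better.
    """
    def dominates(a, b):
        # a is at least as good on both axes and strictly better on one.
        no_worse = a[1] >= b[1] and a[2] <= b[2]
        better = a[1] > b[1] or a[2] < b[2]
        return no_worse and better

    return [t for t in trials if not any(dominates(o, t) for o in trials)]


trials = [
    ("A", 0.80, 900),
    ("B", 0.75, 400),
    ("C", 0.70, 600),  # dominated by B: less relevant and slower
    ("D", 0.60, 200),
]
front = [t[0] for t in pareto_front(trials)]
# front == ["A", "B", "D"]
```

In Optuna itself, a study created with `directions=["maximize", "minimize"]` exposes the same set as `study.best_trials`; picking one point on the front is then a judgment about how much latency a relevance gain is worth.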
// made by
Hemang Choudhary.
B.Tech IT at MBM University Jodhpur, also reading Data Science and AI at IIT Madras. Built AdaRag because cloud RAG kept asking him to upload his disk to someone else's machine. Trained CycleGAN at DRDO before that. Lives on a 5080, a Sonnet subscription, and three jars of instant coffee.