83 points by tmaly 16 hours ago | 29 comments

autogn0me 2 minutes ago [-]

https://github.com/ggozad/haiku.rag/ - the embedded lancedb is convenient and has benchmarks; uses docling. qwen3-embedding:4b, 2560 w/ gpt-oss:20b.

tebeka 2 minutes ago [-]

https://duckdb.org/2024/05/03/vector-similarity-search-vss

CuriouslyC 2 hours ago [-]

Don't use a vector database for code, embeddings are slow and bad for code. Code likes bm25+trigram, that gets better results while keeping search responses snappy.

rao-v 2 minutes ago [-]

Anybody know of a good service / docker that will do BM25 + vector lookup without spinning up half a dozen microservices?

postalcoder 58 minutes ago [-]

I agree. Someone here posted a drop-in for grep that added the ability to do hybrid text/vector search but the constant need to re-index files was annoying and a drag. Moreover, vector search can add a ton of noise if the model isn't meant for code search and if you're not using a re-ranker.

For all intents and purposes, running gpt-oss 20B in a while loop with access to ripgrep works pretty dang well. gpt-oss is a tool calling god compared to everything else i've tried, and fast.

ehsanu1 29 minutes ago [-]

I've gotten great results applying it to file paths + signatures. Even better if you also fuse those results with BM25.

itake 1 hours ago [-]

With AI needing more access to documentation, WDYT about using RAG for documentation retrieval?

lee1012 2 hours ago [-]

static embedding models im finding quite fast lee101/gobed https://github.com/lee101/gobed is 1ms on gpu :) would need to be trained for code though the bigger code llm embeddings can be high quality too so its just yea about where is ideal on the pareto fronteir really , often yea though your right it tends to be bm25 or rg even for code but yea more complex solutions are kind of possible too if its really important the search is high quality

cbcoutinho 25 minutes ago [-]

The Nextcloud MCP Server [0] supports Qdrant as a vectordb to store embeddings and provide semantic search across your personal documents. This enables any LLM & MCP client (e.g. claude code) into a RAG system that you can use to chat with your files.

For local deployments, Qdrant supports storing embeddings in memory as well as in a local directory (similar to sqlite) - for larger deployments Qdrant supports running as a standalone service/sidecar and can be made available over the network.

[0] https://github.com/cbcoutinho/nextcloud-mcp-server

lormayna 8 minutes ago [-]

I have done some experiments with nomic embedding through Ollama and ChromaDB.

Works well, but I didn't tested on larger scale

ehsanu1 30 minutes ago [-]

Embedded usearch vector database. https://github.com/unum-cloud/USearch

dvorka 27 minutes ago [-]

Any suggestion what to use as embeddings model runtime and semantic search in C++?

init0 2 hours ago [-]

I built a lib for myself https://pypi.org/project/piragi/

stingraycharles 1 hours ago [-]

That looks great! Is there a way to store / cache the embeddings?

rahimnathwani 15 hours ago [-]

If your data aren't too large, you can use faiss-cpu and pickle

https://pypi.org/project/faiss-cpu/

notyourwork 2 hours ago [-]

For the uneducated, how large is too large? Curious.

itake 1 hours ago [-]

FAISS runs in RAM. If your dataset can't fit into ram, FAISS is not the right tool.

1 hours ago [-]

lee1012 2 hours ago [-]

lee101/gobed https://github.com/lee101/gobed static embedding models so they are embedded in milliseconds and on gpu search with a cagra style on gpu index with a few things for speed like int8 quantization on the embeddings and fused embedding and search in the same kernel as the embedding really is just a trained map of embeddings per token/averaging

eajr 15 hours ago [-]

Local LibreChat which bundles a vector db for docs.

jeanloolz 1 hours ago [-]

Sqlite-vec

motakuk 14 hours ago [-]

LightRAG, Archestra as a UI with LightRAG mcp

nineteen999 11 hours ago [-]

A little BM25 can get you quite a way with an LLM.

jeffchuber 3 hours ago [-]

try out chroma or better yet as opus to!

whattheheckheck 15 hours ago [-]

Anythingllm is promising

pdyc 2 hours ago [-]

sqlite's bm25

electroglyph 2 hours ago [-]

simple lil setup with qdrant

ramesh31 11 hours ago [-]

SQLite with FTS5

undergrowth 2 hours ago [-]

undergrowth.io

undergrowth 2 hours ago [-]

Undergrowth.io

lee101 2 hours ago [-]

[dead]

Loading comments...

autogn0me 2 minutes ago [-]

https://github.com/ggozad/haiku.rag/ - the embedded lancedb is convenient and has benchmarks; uses docling. qwen3-embedding:4b, 2560 w/ gpt-oss:20b.

tebeka 2 minutes ago [-]

https://duckdb.org/2024/05/03/vector-similarity-search-vss

CuriouslyC 2 hours ago [-]

Don't use a vector database for code, embeddings are slow and bad for code. Code likes bm25+trigram, that gets better results while keeping search responses snappy.

rao-v 2 minutes ago [-]

Anybody know of a good service / docker that will do BM25 + vector lookup without spinning up half a dozen microservices?

postalcoder 58 minutes ago [-]

For all intents and purposes, running gpt-oss 20B in a while loop with access to ripgrep works pretty dang well. gpt-oss is a tool calling god compared to everything else i've tried, and fast.

ehsanu1 29 minutes ago [-]

I've gotten great results applying it to file paths + signatures. Even better if you also fuse those results with BM25.

itake 1 hours ago [-]

With AI needing more access to documentation, WDYT about using RAG for documentation retrieval?

lee1012 2 hours ago [-]

cbcoutinho 25 minutes ago [-]

[0] https://github.com/cbcoutinho/nextcloud-mcp-server

lormayna 8 minutes ago [-]

I have done some experiments with nomic embedding through Ollama and ChromaDB.

Works well, but I didn't tested on larger scale

ehsanu1 30 minutes ago [-]

Embedded usearch vector database. https://github.com/unum-cloud/USearch

dvorka 27 minutes ago [-]

Any suggestion what to use as embeddings model runtime and semantic search in C++?

init0 2 hours ago [-]

I built a lib for myself https://pypi.org/project/piragi/

stingraycharles 1 hours ago [-]

That looks great! Is there a way to store / cache the embeddings?

rahimnathwani 15 hours ago [-]

If your data aren't too large, you can use faiss-cpu and pickle

https://pypi.org/project/faiss-cpu/

notyourwork 2 hours ago [-]

For the uneducated, how large is too large? Curious.

itake 1 hours ago [-]

FAISS runs in RAM. If your dataset can't fit into ram, FAISS is not the right tool.

1 hours ago [-]

lee1012 2 hours ago [-]

eajr 15 hours ago [-]

Local LibreChat which bundles a vector db for docs.

jeanloolz 1 hours ago [-]

Sqlite-vec

motakuk 14 hours ago [-]

LightRAG, Archestra as a UI with LightRAG mcp

nineteen999 11 hours ago [-]

A little BM25 can get you quite a way with an LLM.

jeffchuber 3 hours ago [-]

try out chroma or better yet as opus to!

whattheheckheck 15 hours ago [-]

Anythingllm is promising

pdyc 2 hours ago [-]

sqlite's bm25

electroglyph 2 hours ago [-]

simple lil setup with qdrant

ramesh31 11 hours ago [-]

SQLite with FTS5

undergrowth 2 hours ago [-]

undergrowth.io

undergrowth 2 hours ago [-]

Undergrowth.io

lee101 2 hours ago [-]

[dead]