- JSON output: CLI and MCP results are now structured JSON, making Semble easier to use from agents and scripts.
- Better file control: use `.sembleignore` to exclude files or force-include custom extensions.
Release notes: https://t.co/H4rO6DTC4O
We’ve just released Semble v0.3.0!
Biggest changes:
- Automatic disk caching: the first search builds the index; later searches are much faster.
- Search more than code: new `--content` flag for searching docs, config, code, or all of them together.
🧵
Today we're releasing Semble, a fast and accurate code search library built for agents 🤖!
We're also releasing potion-code-16M, a small code-specialized static embedding model that powers Semble.
🧵
Main features:
- Fast: indexes a full codebase in ~250 ms and answers queries in ~1.5 ms, all on CPU
- Accurate: retrieval quality on par with transformer models
- MCP server: drop-in tool for Claude Code, Cursor, Codex, OpenCode, and any other MCP-compatible agent
- Zero setup: no API keys, no GPU
🧵
We also have a new blogpost on model size reduction where we showcase how to reduce model size by a factor of 15, creating a 6MB model (!) without impacting performance.
Links:
Release notes: https://t.co/wVsFOqQLT4
Blogpost: https://t.co/lPJFuCWvEo
Model2Vec 0.7.0 is out now, as well as a blogpost on model size reduction techniques!
This release features a large number of ways to improve the distillation process.
- Vocabulary quantization
- Configurable pooling
- A number of small improvements and bugfixes
🧵
@casper_hansen_ Thanks for the feedback (and for using SemHash)! That's a good idea, we can add something to our readme. There's also a HF space where you can use it directly on the hub: https://t.co/FFvqw29Qpz
We have a new website (and name): https://t.co/4Zdi049lCy
We’ve been working on an improved website for a while, and it’s finally here. It has documentation for all our packages as well as our blog. More things coming soon! 🚀
Some guy forked our "model2vec-rs" crate, put it under the "model2vec" name on crates.io, and then didn't tell us about it. See here: https://t.co/4IZHlJB5wV
Like, what's the goal here except name squatting?
- Smaller tokenizers: all tokenizers are now 40% smaller, at no cost to anyone.
A blog post with experimental results is coming in the next couple of days.
We just released model2vec 0.6.0!
This is a big release, containing many big improvements 🔥
GitHub release: https://t.co/qj3aQCGfcg
PyPI release: https://t.co/dtMlfACXzR
🧵
- Model improvements: nearly all distilled models will perform better, especially on STS and clustering tasks. A while ago we published a blog post on ModernBERT not working, but we have now found out why and fixed it!
🧵
@tomaarsen Thanks for sharing our deduplication space! For those who are interested in applying this in their own workflows, this is powered by SemHash: https://t.co/wqmBc33kzy
@ben_burtenshaw Thanks for sharing our deduplication space and adding some shiny new features! For those who are interested in applying this in their own workflows, this is powered by SemHash: https://t.co/wqmBc33kzy