vivis.dev @vivis_dev - Twitter Profile

over 1 year ago

When building ColBERT, I assumed it will pave the way for hypernetwork-based, pruning-capable retrieval indexes. Let me explain. The big insight in ColBERT is that we can encode each document upfront *not* into a vector, but into a rich scoring function, f: query -> float, which simultaneously supports pruning, so you can skip most computation. In v1/v2, the choice of function was "a matrix + MaxSim". It showed that at inference time, we could do a lot better than dot products. But in the future, the function could also be a small DNN constructed out of each document! The encoder is then a hypernetwork producing functions f with the same query -> float signature, allowing each document to decide its strategy for deciding if a query is relevant to it. How to do this while allowing pruning (so that retrieval is sub-linear at scale) is a rich question you can steal if you're doing NLP systems or IR.
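As a rough illustration of the "matrix + MaxSim" choice described above, the sketch below turns each document's token-embedding matrix into a closure with the query -> float signature. The helper names (`dot`, `make_maxsim_scorer`) and the toy 2-d embeddings are assumptions made for this example, not ColBERT's actual API, and the sketch covers neither pruning nor the hypernetwork idea.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_maxsim_scorer(doc_matrix: List[Vector]) -> Callable[[List[Vector]], float]:
    """Build a per-document scoring function f: query -> float.

    Hypothetical helper: the document is represented by a matrix of token
    embeddings, and the returned closure scores a query against it.
    """
    def score(query_matrix: List[Vector]) -> float:
        # MaxSim: each query token keeps its best-matching document token,
        # and the per-token maxima are summed into one relevance score.
        return sum(max(dot(q, d) for d in doc_matrix) for q in query_matrix)

    return score


# Toy 2-d embeddings (made-up values, not produced by a real encoder).
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.5, 0.5]]
query = [[1.0, 0.0], [0.0, 1.0]]

# "Indexing" here just means building one scoring function per document.
scorers = {"doc_a": make_maxsim_scorer(doc_a), "doc_b": make_maxsim_scorer(doc_b)}

# Exhaustive ranking: every document's function is called (no pruning).
ranking = sorted(scorers, key=lambda name: scorers[name](query), reverse=True)
```

In the setting the tweet imagines, `make_maxsim_scorer` would be replaced by an encoder that emits a small per-document network with the same signature. How to skip most of those calls at retrieval time is the open question the tweet raises.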

12

336

34

272

62K

vivis_dev retweeted

KKY

@evilpsycho42

29 days ago

You are right @badlogicgames I copied codex exec_command and write_stdin into Pi Agent. Then compared its performance to the plain bash tool. The result surprised me. Async bash lost in almost every task.


2

93

4

96

14K

vivis.dev

@vivis_dev

about 1 month ago

@deedydas Would love to know if the results change using different agents. They only tried using mini-SWE-agent. @lateinteraction - wonder if dspy.RLM could have a crack at this.

0

2K

vivis_dev retweeted

Aksel

@akseljoonas

about 2 months ago

Introducing ml-intern, the agent that just automated the post-training team @huggingface
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on https://t.co/udm7xGpNzR, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.
How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and https://t.co/brvCC7fLPa, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on https://t.co/hrJuRkRyzi
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
Releasing it today as a CLI and a web app you can use from your phone/desktop. CLI: https://t.co/l3K1PslZ1n Web + mobile: https://t.co/orko5srL4H
And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.

138

5K

642

6K

1M

vivis_dev retweeted

Jiazhi Yang @jiazhi_yang2024

about 2 months ago

🌏 RISE is now open-sourced! https://t.co/pH82Dpd3si

0

170

27

135

23K

vivis_dev retweeted

Antoine Chaffin

@antoine_chaffin

about 2 months ago

The new generation of open state-of-the-art single and multi-vector retrieval models is here It's time, DenseOn with the LateOn 🎶 @LightOnIO releases models that leap past existing ones, and everything you need to do the same!


13

224

52

104

40K

vivis_dev retweeted

Zain Shah

@zan2434

about 2 months ago

Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)

1K

29K

4K

25K

6M

vivis.dev

@vivis_dev

about 2 months ago

@thepericulum Agreed, the browser visual editor is so handy for UI changes - https://t.co/uXVf2qn8uD Don't see anything similar in the Claude universe

0

2

0

301

vivis_dev retweeted

Physical Intelligence

@physical_int

about 2 months ago

Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot to fold shirts for which we had no shirt folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks all in one model!

59

3K

313

789

450K

vivis.dev

@vivis_dev

about 2 months ago

@theo I never left

0

9

vivis.dev

@vivis_dev

about 2 months ago

Measuring tokens/d is a good signal to weed out slop factories vs. companies that actually ship quality products.

Steve Yegge

@Steve_Yegge

about 2 months ago

I'm not trying to misrepresent anyone, and perhaps my Googler friends are misinformed. But I strongly suspect that by my own notions of what constitutes advanced AI adoption--and indeed, what most of the industry would expect from Google right now--you are not doing great. At Anthropic, which is basically the bar at this point, everyone is burning, I'd guess, 10M to 15M tokens a day. If Google can convince me that half their engineers are burning 4M tokens a day, then I'd be happy to post a retraction with an apology.

119

366

7

75

196K

0

17

vivis.dev

@vivis_dev

about 2 months ago

@jsnnsa 9/7

0

412

vivis.dev

@vivis_dev

2 months ago

@NotNordgaren @grok how could many years of fuzzing miss something like this?

1

0

126

vivis.dev

@vivis_dev

2 months ago

@marmaduke091 I think they do this to save money, but honestly not sure how it passes review

0

408

vivis.dev

@vivis_dev

2 months ago

@DavidGFar This is awesome. How far can you take this? Are we at a point where you could train on the Hermes agent traces (https://t.co/srVlfcSdyZ) to get a lightning fast routing head for an agent to select the right tool?

1

3

0

2

416

vivis.dev

@vivis_dev

2 months ago

@BatsouElef https://t.co/eHEkP8Bl4N I built a newsfeed for Substack that shows only long-form posts from the last 24 hours. Already discovering way better writers.

1

0

19

vivis.dev

@vivis_dev

2 months ago

"It seems to me that there will quickly reach a point where we can treat computers in much the same manner as we treat fellow humans, without ever assuming that they are human or should be. For instance, I think it not unreasonable to ask a computer to understand me (maybe someday in natural language), to cooperate with me, to take some initiative on its own, and to make life simpler for me. It is reasonable for the computer to not understand occasionally, and to need clarification, or even for it to screw up and do as I said, and not what I meant." - The Mind's I - Jan 21 1983 usenet

0

41

vivis.dev

@vivis_dev

2 months ago

@aidenybai Yep, and they produce completely different results for different models.

0

44

1

4

2K

vivis.dev

@vivis_dev

2 months ago

@caprikaps @venturetwins Seriously dude? This was 100% written by AI

0

41

vivis.dev

@vivis_dev

2 months ago

@ThePrimeagen I'm building https://t.co/QNCN8mkFlA A newsfeed for Substack that shows only long-form posts from the last 24 hours. Already discovering way better writers.

0

185

vivis.dev

@vivis_dev
