Michael Feil @feilsystem - Twitter Profile

24 days ago

Biotech R&D is generating more scientific AI models than ever, from protein structure prediction to molecular docking to sequence analysis. But the infrastructure to run them hasn't kept up.

Today we're announcing Benchling Inference, powered by Baseten. Together with @benchling, we're delivering on-demand GPU capacity built for the bursty, high-stakes demands of scientific workloads. With Benchling Inference, scientists can:

→ Deploy models in seconds, not weeks
→ Keep proprietary models inside their VPC if needed
→ Benefit from economics that work even at small and mid-size biotech scale

Benchling and Baseten decided to team up because we believe that research teams shouldn't have to manage HPC queues, negotiate cloud contracts, or become GPU experts to run frontier models on their own data.

Six years of inference expertise are now available where science happens.

Read more here: https://t.co/vqmtnXnAT1

1

33

10

2K

feilsystem retweeted

Charlie O'Neill

@oneill_c

about 1 month ago

https://t.co/ERAhd412li

34

554

69

806

131K

Michael Feil

@feilsystem

about 2 months ago

@zeddotdev Vscode tunnel ability, most my environments cannot connect via SSH.

0

41

Michael Feil

@feilsystem

about 2 months ago

@bluequbit Pretty cool! Index select will cost ~1% overhead. I found if you are using decent kernels (e.g. https://t.co/ikew6Jt23L or torch.index_select on recent torch versions), it works well.

0

1

0

11

Michael Feil

@feilsystem

about 2 months ago

https://t.co/OX1jK4Qh5Y

2

32

0

21

14K

feilsystem retweeted

Jeff Huber

@jeffreyhuber

about 2 months ago

OpenAI is shutting down text-embedding-3-small?!?

I strongly believe that if you shut down a closed-source embedding model, you should open-source it. Imagine the trillions of tokens that will no longer be queryable.

cc @romainhuet https://t.co/NiL8ZyqeHW

65

2K

63

246

386K

Michael Feil

@feilsystem

about 2 months ago

Let's go, Baseten -- rooting for you.

Kimi.ai @Kimi_Moonshot

about 2 months ago

We are excited to have @baseten as a day 0 launch partner for Kimi K2.6! Their inference stack brings KV-aware routing, NVFP4 on Blackwell, multi-modal hierarchical caching, and prefill-decode disaggregation, so K2.6 runs the way it's meant to in production. Try it out at: https://t.co/ol3lIkaH6m

13

925

41

146

100K

0

7

0

542

feilsystem retweeted

Charlie O'Neill

@oneill_c

2 months ago

why join a dense team when you can join an moe team @baseten

3

114

2

17

15K

Michael Feil

@feilsystem

2 months ago

Named Entity Recognition is a core workload used by search and healthcare companies to filter and anonymize queries. We shipped the fastest inference on the market: 1 ms P50 and 3 ms P99 server-side latency, 7.7x faster than an optimized PyTorch baseline, by fixing several bottlenecks: HTTP parsing, networking, and load balancing.

Michael Feil

@feilsystem

2 months ago

https://t.co/SRkXdBgWyZ

4

65

7

40

20K

0

9

0

1

314

feilsystem retweeted

OpenEvidence

@EvidenceOpen

2 months ago

Over 1 million clinical questions hit OpenEvidence every day. More than half the practicing physicians in the US rely on us at the point of care, mid-decision, with a patient in front of them. Downtime in that moment has real consequences. We partner with @baseten for our inference infrastructure to make sure answers are always there when physicians need them. They stopped by our office to talk about what that looks like under the hood.

6

88

17

34

107K

Michael Feil

@feilsystem

3 months ago

@art_zucker Making the token+position a u64 is a good idea for lookups; I have also done this a couple of times. https://t.co/A9YSOTpVCc I cross-compiled the package from the blog post, so `pip install fastokens-b10` is a thing.
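
A minimal sketch of the u64 idea above, assuming a 32/32 bit split between token id and position (the tweet does not state the layout, and `pack_key` / `unpack_key` are hypothetical names used only for illustration). Packing both values into one integer lets a lookup table use a single cheap key instead of a tuple:

```python
MASK32 = 0xFFFFFFFF  # assumption: token id and position each fit in 32 bits


def pack_key(token_id: int, position: int) -> int:
    """Pack (token_id, position) into one 64-bit key, token id in the high bits."""
    if not (0 <= token_id <= MASK32 and 0 <= position <= MASK32):
        raise ValueError("token_id and position must each fit in 32 bits")
    return (token_id << 32) | position


def unpack_key(key: int) -> tuple[int, int]:
    """Inverse of pack_key: recover (token_id, position)."""
    return key >> 32, key & MASK32


# Hypothetical lookup table keyed by the packed value.
cache = {pack_key(1234, 7): "cached value"}
hit = cache.get(pack_key(1234, 7))
miss = cache.get(pack_key(1234, 8))
```

In a language with fixed-width integers, the same layout would map directly onto a `u64` map key.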

0

1

0

86

Michael Feil

@feilsystem

3 months ago

Really good blog by the Dynamo and Crusoe team. https://t.co/0DwEXFdoP0. tl;dr: Ported some of the learnings back to hf/tokenizers. https://t.co/5ks163ScKX https://t.co/im9XSO0aaJ At the scale of transformers, this will probably save M$ if done right. @art_zucker

2

8

1

574

feilsystem retweeted

Amir Haghighat

@amiruci

4 months ago

You’ve used language models, image models, video models, and voice models. Now it’s time for world models, thanks to World Labs.

34

204

30

251

839K

Michael Feil

@feilsystem

4 months ago

As a result, current engines waste around 5 to 500% in prefill performance during inference and training when using shared prefixes. implementation: https://t.co/53lC6Wl4kd paper: https://t.co/jMVTMwgskl

0

2

0

74

Michael Feil

@feilsystem

4 months ago

tldr: We open-sourced an inference engine that deduplicates prefill tokens and wrote a paper (@juliuslipp). RadixMLP was a missed chance by the community that developed varlen (THD-packed) inference, and was overlooked by people working on training and inference engines. [1/x]

Baseten

@baseten

4 months ago

Introducing RadixMLP: intra-batch prefix deduplication for 1.4–5x faster prefill.

Tokens with identical prefixes (like system prompts or shared queries) produce identical activations. @feilsystem developed RadixMLP to eliminate this redundancy, then open-sourced it and added it to TEI and BEI.

https://t.co/LFBJ2RsVzp
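
To make the deduplication idea concrete, here is a rough sketch of the index bookkeeping only. It is not the RadixMLP implementation: the function name `dedup_prefixes` and the trie-over-a-dict approach are assumptions chosen for illustration. Each distinct (prefix, token) pair in a batch gets one slot in a compact list, and every sequence keeps the indices needed to expand compact results back to full length:

```python
def dedup_prefixes(batch):
    """Return (unique_tokens, gather_index) for a batch of token-id sequences.

    unique_tokens holds one entry per distinct (prefix, token) trie node.
    gather_index holds, per sequence, the compact index of each position.
    """
    node_ids = {}        # (parent_node, token) -> compact node index
    unique_tokens = []   # token id stored at each compact node
    gather_index = []
    for seq in batch:
        parent = -1      # -1 marks the trie root
        row = []
        for tok in seq:
            key = (parent, tok)
            node = node_ids.get(key)
            if node is None:
                # First time this token is seen after this exact prefix.
                node = len(unique_tokens)
                node_ids[key] = node
                unique_tokens.append(tok)
            row.append(node)
            parent = node
        gather_index.append(row)
    return unique_tokens, gather_index


# Three requests sharing the prefix [1, 2, 3], e.g. a common system prompt.
batch = [
    [1, 2, 3, 10, 11],
    [1, 2, 3, 20],
    [1, 2, 3, 10, 30],
]
unique_tokens, gather_index = dedup_prefixes(batch)
total_tokens = sum(len(seq) for seq in batch)
ratio = total_tokens / len(unique_tokens)

# Expanding the compact list with the gather indices restores each sequence.
restored = [[unique_tokens[i] for i in row] for row in gather_index]
```

Under these assumptions, position-wise work would run once over `unique_tokens`, and the gather indices would scatter the results back to the original batch layout.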

0

26

3

19

3K

1

11

1

769

Michael Feil

@feilsystem

4 months ago

Turns out that all engines just prefill multiple requests at the same time, even when prefixes are shared. KV-style caching for training systems is possible; it just needs to look different from a vLLM-style paged KV cache. [2/x]

1

0

74

feilsystem retweeted

Baseten

@baseten

4 months ago

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis.

Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls.

Get the good stuff here: https://t.co/X1yWULgvjM

11

98

8

21

15K

feilsystem retweeted

Cursor @cursor_ai

4 months ago

Composer 1.5 is now available. We’ve found it to strike a strong balance between intelligence and speed.

155

2K

183

242

665K

feilsystem retweeted

Baseten

@baseten

6 months ago

If you need an adrenaline rush to wake up from your post-Thanksgiving stupor… we got you.

@deepseek_ai V3.2 dropped this week and is now available on Baseten. It’s so smart your mother will ask why you can't be more like DeepSeek. V3.2 is currently on par with GPT-5 all whilst being multiples cheaper.

V3.2 is now live on our Model APIs and on @openrouter and @ArtificialAnlys. Baseten is the fastest provider with 0.22 TTFT and 191 tps (that’s 1.5x faster than the next guy). For a model this size, it’s screaming. Get the brains, without trading off performance.

10

43

12

2

5K
