Derek Xu @derekzxu - Twitter Profile

Fireworks Training is now in preview.

You can now full-parameter fine-tune Kimi K2.5 (1T params, 256k context) with custom loss functions (GRPO, DRO, DAPO, or bring your own) on managed infra.

@genspark_ai built their proprietary model stack in four weeks. @vercel hit 93% error-free generation with RFT. @cursor_ai runs their RL rollout fleet on Fireworks.

Full-parameter from 8B to 1T. Multi-LoRA serving. Managed or bring your own training loop.

Your model is your product. Your data is your moat.

https://t.co/kyz7HzihC1

6

203

16

69

34K

derekzxu retweeted

Max Weinbach

@mweinbach

3 months ago

Fireworks AI fire pass is so good

It's Kimi K2.5 Turbo right now at like 250 tok/s and idk what the limits are but it's HIGH

oh and free trial but $7 per week https://t.co/JKQbHCvS0A

32

447

19

344

69K

Derek Xu @derekzxu

3 months ago

@chuyishang incredible

0

3

0

711

derekzxu retweeted

chuyi shang

@chuyishang

3 months ago

Wrote a deep dive on implementing a language model from scratch in JAX and scaling it with distributed training!

If you’re coming from PyTorch and want to see how the same ideas look in JAX, or just want a hands-on intro to distributed training, check out this blog post: https://t.co/nsR3O3Zjxg

Comes with code + an assignment and test cases so you can follow along!

9

604

65

674

34K

derekzxu retweeted

Dmytro Dzhulgakov

@dzhulgakov

3 months ago

Composer 2 beats Opus on TerminalBench at a fraction of the cost. The ingredients: coding focus only, data flywheel, cracked RL team, and infrastructure that can keep up. @FireworksAI_HQ powered the inference and RL scaling behind Composer 2. Scaling RL is still genuinely hard, and we're proud we could help make it less so. Congrats to @cursor_ai on shipping a great model!

5

59

9

4

37K

Derek Xu @derekzxu

3 months ago

@DimitrisPapail noticed tau bench airline wasn't called out anywhere here. any interesting findings there? my sense is labs are moving away from it (or using the modified Anthropic version), due to variance from the simulated user, which also probably makes it hard to predict.

0

25

derekzxu retweeted

Dimitris Papailiopoulos

@DimitrisPapail

4 months ago

https://t.co/plZjz8c3nu

40

1K

96

1K

297K

derekzxu retweeted

Nishkarsh

@contextkingceo

3 months ago

We've raised $6.5M to kill vector databases.

Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough.

A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit.

We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time.

So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94.

More below ⬇️

621

6K

636

6K

4M

Derek Xu @derekzxu

3 months ago

it was great working on this together! kernel is an awesome tool :)

KERNEL @usekernel

3 months ago

we worked with @fireworksai_hq to make training VLM browser agents with open source tools as easy as possible.

4

21

8

11

4K

0

7

1

0

442

derekzxu retweeted

KERNEL @usekernel

3 months ago

we worked with @fireworksai_hq to make training VLM browser agents with open source tools as easy as possible.

4

21

8

11

4K

derekzxu retweeted

Fireworks AI

@FireworksAI_HQ

3 months ago

New blog from the team at Fireworks:

Where training–inference parity breaks in MoE models

Kernel fusions that are mathematically identical can still drift numerically. We walk through the bugs we hit while serving Kimi K2.5 and training Qwen3.5-MoE, and how we fixed them.

Worth a read if you're building high-performance inference:
https://t.co/jDolrWzjxn

1

22

5

92K

Derek Xu @derekzxu

3 months ago

@itsayaanmomin tuff

0

25

Derek Xu @derekzxu

4 months ago

@VihaarNandigala incredible

0

3

Derek Xu @derekzxu

4 months ago

@VihaarNandigala let’s gooooooooooo

0

110

derekzxu retweeted

Vihaar Nandigala

@VihaarNandigala

4 months ago

We just raised a $5.3M seed round for Orange Slice, co-led by 1984 Capital and Moxxie Ventures, with participation from angels like Paul Graham.

We’re building AI agents, inside a spreadsheet, that help sales teams find companies that already want to buy.

The reality is most sales teams don’t struggle with effort - they struggle with timing. Reps spend huge amounts of time working static lists and broad targeting, chasing leads that were never going to convert. That creates noise, low reply rates, and wasted cycles.

Top companies like Ramp solve this with dedicated growth engineers building internal data workflows. We’re making that same capability accessible to everyone else.

At its core, the challenge is simple: finding customers who already have the problem you solve. Orange Slice turns the spreadsheet into a system for discovering buying signals - agents research company sites, news, social signals, and niche sources like court records or building permits, then structure that information directly into columns teams can act on.

Not “might be a fit.” But “likely in-market.” So instead of guessing who to target, teams build and refine living lists of high-intent accounts inside a sheet.

Still early. Still learning. But we’re excited to keep building.

Kishan and I met sophomore year on a Bollywood dance team at Michigan - and I couldn’t ask for a better co-founder. Grateful to our team, customers, and investors for believing in this vision.

77

674

37

530

69K

derekzxu retweeted

Cal Lavicka

@CalLavicka

4 months ago

LLMs suck at creating tests. Their tests are too basic and they cheat all the time, validating buggy behavior to get 100% test coverage rather than flagging real bugs.

So, I created an opencode plugin to fix this https://t.co/8zO0ixuhWD

1

18

6

10

3K

derekzxu retweeted

Dmytro Dzhulgakov

@dzhulgakov

5 months ago

🌕 Kimi K2.5 = open SOTA reasoning + vision + 256K context + agentic coding

🏎 200+ t/s on @FireworksAI_HQ (soon even faster)

✅ Nails @simonw's "pelican on a bike" test in both directions

Try it now on Fireworks and hats off to @Kimi_Moonshot https://t.co/vMFAZnTEFg

0

40

8

6

8K
