Lila

Verified account

@LilaRest

npm install life 🌷

Joined June 2021

244 Following

3.2K Followers

99 Posts

Pinned Tweet

2 months ago

Introducing 𝐆𝐞𝐦𝐦𝐚 𝟒 𝟑𝟏𝐁 𝐓𝐮𝐫𝐛𝐨 ⚡️ It runs on a 𝘴𝘪𝘯𝘨𝘭𝘦 RTX 5090, at 51 tok/s (single) and 1244 tok/s (batched). And prefills up to 15359 tok/s. It's 𝟔𝟖% 𝐬𝐦𝐚𝐥𝐥𝐞𝐫 in GPU memory and ~𝟐.𝟓𝐱 𝐟𝐚𝐬𝐭𝐞𝐫 than the base model, and retains nearly 𝐢𝐝𝐞𝐧𝐭𝐢𝐜𝐚𝐥 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 on benchmarks (1-3% loss). Turbo is a derivative of the NVFP4 quant that NVIDIA released a few days ago. It fully leverages NVIDIA Blackwell FP4 tensor cores for ~𝟐× 𝐡𝐢𝐠𝐡𝐞𝐫 𝐜𝐨𝐧𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐩𝐮𝐭 𝐭𝐡𝐚𝐧 𝐨𝐭𝐡𝐞𝐫 𝐪𝐮𝐚𝐧𝐭𝐬. I'm using it for hard classification tasks — on internal benchmarks it showed 𝐒𝐨𝐧𝐧𝐞𝐭-𝟒.𝟓-𝐥𝐞𝐯𝐞𝐥 𝐢𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 (scored well above Haiku 4.5), at a 600𝘵𝘩 of the cost. A single RTX 5090 scales up to 18 req/s at 1000in/20out 🥵. Model card and benchmark in comments 👇 I'd love to hear your use cases.

LilaRest's tweet photo. Introducing 𝐆𝐞𝐦𝐦𝐚 𝟒 𝟑𝟏𝐁 𝐓𝐮𝐫𝐛𝐨 ⚡️

It runs on a 𝘴𝘪𝘯𝘨𝘭𝘦 RTX 5090, at 51 tok/s (single) and 1244 tok/s (batched). And prefills up to 15359 tok/s.

It's 𝟔𝟖% 𝐬𝐦𝐚𝐥𝐥𝐞𝐫 in GPU memory and ~𝟐.𝟓𝐱 𝐟𝐚𝐬𝐭𝐞𝐫 than the base model, and retains nearly 𝐢𝐝𝐞𝐧𝐭𝐢𝐜𝐚𝐥 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 on benchmarks (1-3% loss).

Turbo is a derivative of the NVFP4 quant that NVIDIA released a few days ago. It fully leverages NVIDIA Blackwell FP4 tensor cores for ~𝟐× 𝐡𝐢𝐠𝐡𝐞𝐫 𝐜𝐨𝐧𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐩𝐮𝐭 𝐭𝐡𝐚𝐧 𝐨𝐭𝐡𝐞𝐫 𝐪𝐮𝐚𝐧𝐭𝐬.

I'm using it for hard classification tasks — on internal benchmarks it showed 𝐒𝐨𝐧𝐧𝐞𝐭-𝟒.𝟓-𝐥𝐞𝐯𝐞𝐥 𝐢𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 (scored well above Haiku 4.5), at a 600𝘵𝘩 of the cost. A single RTX 5090 scales up to 18 req/s at 1000in/20out 🥵.

Model card and benchmark in comments 👇

I'd love to hear your use cases.

59

1K

125

1K

267K

about 11 hours ago

I see what you mean about solving skill issues at the framework-level, and in fact Langchain / Mastra are partially trying that by offering high-level APIs to even the simplest things, like managing the context window (which in most cases is a few hundreds lines of code). I'm talking about the opposite, instead of the framework support high-level features, it'd just offer low-level primitives you could compose to achieve any high-level feature. Like React for web development. React doesn't tell you how you should architect your app, nor how you should implement this or that features (and same for frameworks on top of React), instead it offers a set of primitives, that can be explained in 10mins, and allow you to compose from the simplest to the most complex applications.

1

0

0

0

17

2 days ago

Be honest, what's the worst part of building AI agents with frameworks like Langchain or Mastra?

6

5

0

1

415

about 12 hours ago

Skill issue is partially the responsibility of the framework too. If skilled devs assisted with sota coding models are struggling to produce even the simplest agents, maybe the framework creates more complexity than it solves? Ultimately an agent is just a loop + tools + conversation history. It’s easy to call out skill issues, but in my opinion, if a framework was properly designed, even a junior dev with a coding assistant could scale a complex agentic app. Mastra and Langchain solve the « Can your framework do X? » question by pilling up features for years, instead of shrinking down to a set of primitives that are flexible and minimal enough to do anything.

1

0

0

0

32

Who to follow

Bookworm. Den som sover syndar icke. Olemme todennäköisesti samaa mieltä jostain.

waz🦍💎🙌

2 days ago

@JE4NVRG @Rebecca49484009 Agreed

0

1

0

0

12

2 days ago

@JustJerry121 So basically being able to observe precisely why it failed right?

0

0

0

0

18

2 days ago

@adelbucetta but so that problems is not about the framework your using right? or the framework could help you in some way with that?

0

0

0

0

12

2 days ago

@Rebecca49484009 I feel you. It seems like the existing frameworks are failing pretty fast when you cross the boundaries of what their authors had in mind. And so we build layers of evals and observability tools to monior even the simplest agents instead of fixing the root cause

1

1

0

0

19

about 1 month ago

@shawntenam ahaha, that’s a great thank you

0

1

0

0

14

about 1 month ago

@bnjmn_marie Just perfect for nuanced zero-shot classifications

0

0

0

0

100

about 2 months ago

@TheAaryanKapoor lmao, you��re hired

0

1

0

0

32

about 2 months ago

Introducing Ghod-1, a model outperforming Mythos. Ghod-1 excels at everything. We released it to a handful of partners for safety and financial reasons. Priced at $2K/M tokens in and $6K/M tokens out.

LilaRest's tweet photo. Introducing Ghod-1, a model outperforming Mythos.

Ghod-1 excels at everything.

We released it to a handful of partners for safety and financial reasons.

Priced at $2K/M tokens in and $6K/M tokens out.

39

258

11

91

45K

about 2 months ago

@momo_mattomo leave a star on the bible ^^

0

1

0

0

15

about 2 months ago

@momo_mattomo depends on whether you belive in Ghod

2

0

0

0

221

about 2 months ago

@Noor_Farayeh1 @X @grok that's a satire mate, and there is no link to click

1

2

0

0

38

about 2 months ago

@OmniScopeBio Ghod does

0

1

0

0

135

about 2 months ago

@nickshiva42 Confirmed

0

3

0

0

777

about 2 months ago

@ioannesesledieu It’s smart enough to refuse distillation attempts

0

0

0

0

260

about 2 months ago

@Mohamed_KhedrX Ghod-1 (high) scores 392, what do you say to that? (love your model name though)

0

2

0

0

412

about 2 months ago

@Vertoxo_ai 🐍

0

1

0

0

1K

about 2 months ago

@OmniScopeBio It is inaccessible, no worries

1

1

0

0

1K

about 2 months ago

@aiseomastery Ghod is independent

1

1

0

0

522

Last Seen Users on Sotwe

Trends for you

Most Popular Users