Dmitry Noranovich @javaeeeee1 - Twitter Profile

about 9 hours ago

New course on serving LLMs efficiently -- how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with @RedHat and taught by @cedricclyburn.

Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just to load the weights. On top of that, every active request needs its own chunk of GPU memory, the KV cache, to store the token context it has built up so far.

In this course, you'll learn to reduce a model's memory footprint with quantization and serve it using vLLM, which handles many concurrent requests efficiently through smart memory management.

Skills you'll gain:
- Quantize a model and measure the accuracy tradeoff
- Serve a model with vLLM and watch it handle concurrent requests efficiently
- Benchmark your deployment and make informed tradeoffs between speed, cost, and accuracy

Join and learn to serve LLMs efficiently: https://t.co/x04xMbFlkO
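The memory figures in the post above can be checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes 16-bit weights (2 bytes per parameter), which is consistent with the ~140 GB figure, and it uses a hypothetical attention configuration (80 layers, 8 KV heads, head dimension 128) that is not stated in the post. The function names are made up for this example.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache size for one token of one request.

    The leading factor of 2 accounts for storing both a key and a
    value vector per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    params = 70e9  # 70B parameters, as in the post

    # Assumption: 2 bytes per parameter (FP16/BF16).
    print(f"16-bit weights: {weight_memory_gb(params, 2):.0f} GB")
    # Assumption: 4-bit quantization, i.e. 0.5 bytes per parameter,
    # ignoring any overhead for scales or zero points.
    print(f"4-bit weights:  {weight_memory_gb(params, 0.5):.0f} GB")

    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8,
                                         head_dim=128)
    context_tokens = 4096  # assumed context length for one request
    per_request_gb = per_token * context_tokens / 1e9
    print(f"KV cache per token:   {per_token} bytes")
    print(f"KV cache per request: {per_request_gb:.2f} GB")
```

Under these assumptions, quantizing the weights frees memory that can instead hold KV cache for more concurrent requests, which is the tradeoff the course description points at.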

34

428

67

332

36K

javaeeeee1 retweeted

OpenAI

@OpenAI

about 10 hours ago

We’ve been researching new ways for ChatGPT memory to carry context across conversations and keep it useful over time. Today, that work is rolling out as a more capable memory system in ChatGPT. https://t.co/0MyFKCe2Mu

445

7K

702

2K

1M

javaeeeee1 retweeted

Sebastian Raschka

@rasbt

about 9 hours ago

And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio! Design-wise, it carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant. But everything is a bit bigger.

https://t.co/nRjbMtY2aI

14

285

50

99

17K

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

AppWizzy: Rent a private VM with Codex to build production apps by @flatlogic and @Blarior https://t.co/Jj2hMFbvjY

0

15

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

Google Gemma 4 12B: Run multimodal AI locally with an encoder-free architecture by @joshtwoodward https://t.co/vursWytAwr

0

33

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

Astra Security: AI-Powered Offensive Pentest Platform by @shikhilsharma https://t.co/o17zqzokZy

0

12

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

Carbon Voice: Record your voice. Share a link. Skip the meeting. by @TravisBogard https://t.co/pXhK3F5k3E

1

2

0

14

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

Close: This SMB CRM calls your leads for you by @Steli, @NickPersico, and @philfreo https://t.co/m8kGWAhTMb

0

2

0

29

Dmitry Noranovich

@javaeeeee1

about 16 hours ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories https://t.co/qou8E8vTnb

0

10

javaeeeee1 retweeted

Philipp Schmid

@_philschmid

1 day ago

We made a collection of @GoogleDeepMind scientific agent skills for research tasks, genomics, structural biology, cheminformatics, literature search, and more. 👉 https://t.co/zkPuCtmwEE

16

323

47

245

20K

javaeeeee1 retweeted

DeepLearning.AI

@DeepLearningAI

1 day ago

New short course: Fast & Efficient LLM Inference with vLLM, built in partnership with @RedHat and taught by @cedricclyburn. Learn to quantize an open-source LLM, serve it with vLLM, and benchmark your deployment across speed, cost, and accuracy. Free to enroll: https://t.co/czVwJBnLZ6

12

287

55

204

39K

javaeeeee1 retweeted

Sebastian Raschka

@rasbt

1 day ago

It's been a while! 4 nice additions to the open-weight local-LLM-on-consumer-hardware ecosystem:

30

865

139

400

58K

javaeeeee1 retweeted

Google Gemma

@googlegemma

1 day ago

Meet Gemma 4 12B! A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license. Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇