Zied @databishops - Twitter Profile

6 months ago

Underexplored LLM failure mode: Specificity Hallucination
You need: "patient must lie down"
LLM says: "patient must lie on their back"
Not wrong, just too specific. The model invented precision that doesn't exist.
RLHF rewards detail. Even fabricated detail.

0

10

Zied @DataBishops

6 months ago

Upgraded to vLLM 0.12.0 on my H100s.
Text models: +17% throughput (17.8 → 20.8 req/s) ✅
Multimodal models: breaks if CUDA <12.9. ViT FlashAttention kernels need 12.9, no way to opt out.
Fix: keep vision-language models on vLLM 0.8.x for now.
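A quick check of the throughput figure quoted above, as a minimal sketch. The helper name `relative_gain_percent` is made up for illustration; only the two req/s values come from the post.

```python
def relative_gain_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from the post: 17.8 req/s before the upgrade, 20.8 req/s after.
gain = relative_gain_percent(17.8, 20.8)
print(f"{gain:.1f}%")  # about 16.9%, which rounds to the quoted +17%
```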

0

22

Zied @DataBishops

6 months ago

Finally got around to posting about Orchestra v3 (RIP v2 announcement). It's an inference service system for running multiple AI models on GPUs. True concurrent batching, 10-15x throughput improvement. https://t.co/2eIMN16nne

0

14

Zied @DataBishops

6 months ago

Spent hours debugging vLLM memory errors.
"Not enough KV cache blocks" on an 80GB H100. Made no sense.
The fix? One line:
export VLLM_USE_V1=0
V1 engine has a bug: it calculates memory budget after torch.compile already consumed part of it.
V0 is more robust for multi-model.
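For anyone launching the server from Python rather than a shell, the same one-line fix might look like the sketch below. It only builds the command and environment without starting anything. The helper name `build_launch` and the model id are placeholders, not part of the original setup, and whether a given vLLM release still honors `VLLM_USE_V1` is something to check against its release notes.

```python
import os


def build_launch(model: str) -> tuple[list[str], dict[str, str]]:
    """Return the command and environment for a vLLM server with V1 disabled.

    Assumption: the installed vLLM version reads VLLM_USE_V1 at startup.
    """
    env = dict(os.environ)      # copy so the parent process env is untouched
    env["VLLM_USE_V1"] = "0"    # same effect as `export VLLM_USE_V1=0`
    cmd = ["vllm", "serve", model]
    return cmd, env


# "my-org/my-model" is a hypothetical model id used only for illustration.
cmd, env = build_launch("my-org/my-model")
print(cmd, env["VLLM_USE_V1"])
```

To actually start the server, pass both values to `subprocess.Popen(cmd, env=env)`.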

0

26

Zied @DataBishops

6 months ago

TIL I've been thinking about LLM scaling wrong. I assumed more model instances = more throughput. I didn't understand how much vLLM and PagedAttention improve batching. One instance can handle massive concurrency — the scheduler batches requests and manages KV cache dynamically.
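A toy model of why one instance can absorb a lot of concurrency, as a rough sketch only. This is not vLLM's scheduler: it ignores KV cache memory and prefill, and assumes every decode step costs the same regardless of batch size. The function name `steps_to_finish` and the request sizes are made up for illustration.

```python
from collections import deque


def steps_to_finish(token_counts: list[int], max_batch: int) -> int:
    """Count decode steps for a toy continuous-batching loop.

    Each request needs `token_counts[i]` tokens (all assumed positive).
    Every step emits one token for each running request; finished
    requests free their slot so waiting requests can join mid-flight.
    """
    waiting = deque(token_counts)
    running: list[int] = []
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: one token per running request, drop finished ones.
        running = [left - 1 for left in running if left - 1 > 0]
        steps += 1
    return steps


requests = [4, 2, 3, 1]  # tokens still to generate per request (made-up numbers)
print(steps_to_finish(requests, max_batch=1))  # one request at a time
print(steps_to_finish(requests, max_batch=4))  # all four share each step
```

With a batch size of 1 the step count is the sum of all request lengths; with enough slots it drops to the length of the longest request, which is the intuition behind letting one instance batch instead of adding instances.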

0

1

0

18

Zied @DataBishops

7 months ago

I love Claude Opus 4.5

0

29

Zied @DataBishops

8 months ago

@trikcode Hi! I've been an AI engineer for some years now and was always passive on social media (read-only mode, no comments, no posts), but I recently decided to be more active and share stuff... don't hesitate to connect 🤓

0

8

Zied @DataBishops

8 months ago

@Pokee_AI POKEE 😄

0

4

Zied @DataBishops

8 months ago

With Orchestra, secure on-prem AI scalability can finally feel as simple as cloud APIs — but without the data trade-off. Built with 🧠 FastAPI, asyncio, vLLM, CUDA, Hugging Face. #AI #MLOps #GPU #DeepLearning #FastAPI #OpenSource #vLLM

0

58

Zied @DataBishops

8 months ago

🎼 Introducing Orchestra v1.0 — a distributed AI orchestration system for secure, scalable inference. Built for teams who want to deploy more models on limited GPUs without relying on external APIs. 🔗 https://t.co/NLDZhDYWvC

1

2

3

0

171

Zied @DataBishops

8 months ago

What it does:
• Run multiple models (Gemma3, Qwen-VL, Qwen3, Whisper+Emotion, ...)
• Intelligent load balancing + auto-scaling
• CUDA MPS + vLLM for GPU sharing
• Real-time dashboard, metrics & logs

1

0

74

Zied @DataBishops

10 months ago

Reading the GPT-5 system card, it seems much of the backlash is really about the naming and the fact most models are api-dev-only. If “gpt-5-main” had launched as “GPT-5o” and “gpt-5-thinking” as “OpenAI o5,” the reaction might have been very different.

https://t.co/eP2bhq62bX

0

68

DataBishops retweeted

Pokémon UNITE

@PokemonUnite

almost 2 years ago

#UNITE3rd Giveaway Campaign Day 5
Chance to win Aeos Coins, Aeos Tickets or Holowear! 🎁
To participate:
1. Follow @PokemonUnite
2. Repost this post by July 23, 4:59 pm (PDT)
You will receive the result instantly!
Terms: https://t.co/Eu7kpenwBh #PokemonUNITE

170

4K

7K

132

2M

