Arash Bakhtiari @arashb - Twitter Profile

arashb retweeted

9 days ago

The most dangerous person in the age of AI is someone who was mediocre at coding but has taste, and can now build anything

475

10K

627

2K

438K

arashb retweeted

Noam Brown

@polynoamial

22 days ago

https://t.co/oWqzT12RtZ

78

3K

414

3K

1M

arashb retweeted

Elad Gil

@eladgil

about 1 month ago

The events of the last 6 months in technology are arguably amongst the most important in human history. The tools now increasingly exist for recursive self improvement of models & agents. We are likely in very early lift off & exponential. Largely unnoticed outside of tech.

268

5K

423

1K

584K

arashb retweeted

Physics In History

@PhysInHistory

about 2 months ago

Doing nothing vs Small consistent effort ✍️

28

1K

243

203

50K

arashb retweeted

Physics In History

@PhysInHistory

about 2 months ago

General Relativity for babies ✍️

41

2K

306

675

153K

Arash Bakhtiari @arashb

about 2 months ago

@youtubemusic’s recommendation system knows me better than anyone else on this planet!

0

11

Arash Bakhtiari @arashb

about 2 months ago

Codex’s prompt-centric UI feels like the future of IDEs

0

26

Arash Bakhtiari @arashb

about 2 months ago

2016
Student: What programming language should I learn?
Teacher: Python is a great place to start.

2026
Student: What programming language should I learn?
Teacher: Markdown is all you need.

0

26

Arash Bakhtiari @arashb

about 2 months ago

@DynamicWebPaige Codex doesn’t really look like a traditional IDE. Its UI is much more prompt-centric. But tbh, it’s already replacing VS Code for me in most of my workflows.

0

67

Arash Bakhtiari @arashb

about 2 months ago

@yaroslavvb Compute and Ideas

0

102

Arash Bakhtiari @arashb

about 2 months ago

With tools like @OpenAI Codex, we’re becoming truly idea-bound (and compute-bound 😀).
So many ideas I never had the time, energy, or foundation to explore. Now @ChatGPTapp helps me think through them, and Codex helps me build. Exciting times 🚀

0

46

Arash Bakhtiari @arashb

about 2 months ago

@ChatGPTapp is the PhD advisor I never had: patient, thoughtful, and always ready to gently guide me in the right direction while actually listening to my ideas

0

26

Arash Bakhtiari @arashb

2 months ago

compute is the bottleneck now!

0

45

Arash Bakhtiari @arashb

over 1 year ago

My Bluesky profile: https://t.co/weIT9kqUfp

0

141

arashb retweeted

Microsoft

@Microsoft

about 2 years ago

We're excited to announce the launch of Phi-3, a groundbreaking family of small language models that outperform larger models on a range of benchmarks. Learn how these small language models trained on high-quality data are doing more with less: https://t.co/dFfyktuEUL

66

1K

256

190

272K

arashb retweeted

Sebastien Bubeck

@SebastienBubeck

about 2 years ago

Game on! https://t.co/7WCJ6Stz46

19

340

48

73

58K

arashb retweeted

elvis

@omarsar0

about 2 years ago

Phi-3 Technical Report

Microsoft presents a new 3.8B parameter language model called phi-3-mini.

It's trained on 3.3 trillion tokens and is reported to rival Mixtral 8x7B and GPT-3.5. Has a default context length of 4K but also includes a version that is extended to 128K (phi-3-mini-128K).

Combines heavily filtered web data and synthetic data to train the 3.8B models.

It also reports results on 7B and 14B models trained on 4.8T tokens (phi-3-small and phi-3-medium).

phi-3-mini achieves 69% on MMLU while phi-3-small and phi-3-medium achieve 75% and 78% on MMLU.

The authors claim that because of the limited size of this model, it has less capacity to store "factual knowledge" making it weaker for certain tasks. This is something that can be resolved using an external search engine.

7

383

95

168

79K

arashb retweeted

DeepSpeed

@DeepSpeedAI

over 2 years ago

#DeepSpeed joins forces with @Sydney_Uni to unveil an exciting tech #FP6. Just supply your FP16 models, and we deliver:
🚀 1.5x performance boost for #LLMs serving on #GPUs
🚀 Innovative (4+2)-bit system design
🚀 Quality-preserving quantization
link: https://t.co/m6vcmXaWxb

1

166

26

52

19K

arashb retweeted

AK

@_akhaliq

over 2 years ago

Microsoft presents FP6-LLM

Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

paper page: https://t.co/nVfNqjBmCV

Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and struggle to achieve practical performance improvements during LLM inference. It is challenging to support FP6 quantization on GPUs due to (1) unfriendly memory access of model weights with irregular bit-width and (2) high runtime overhead of weight de-quantization. To address these problems, we propose TC-FPx, the first full-stack GPU kernel design scheme with unified Tensor Core support of float-point weights for various quantization bit-width. We integrate TC-FPx kernel into an existing inference system, providing new end-to-end support (called FP6-LLM) for quantized LLM inference, where better trade-offs between inference cost and model quality are achieved. Experiments show that FP6-LLM enables the inference of LLaMA-70b using only a single GPU, achieving 1.69x-2.65x higher normalized inference throughput than the FP16 baseline.
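A rough back-of-the-envelope sketch of why six-bit weights could let a 70B-parameter model fit on one GPU. This is not taken from the paper: the helper name `weight_memory_gb` is hypothetical, the parameter count is rounded to 70e9, and the estimate covers weight storage only (it ignores activations, the KV cache, and any quantization metadata). The 80 GB figure used for comparison is an assumed GPU memory size, not something the tweet states.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: counts raw weight bits only, with no overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 70e9        # assumed round figure for a "70b" model
ASSUMED_GPU_GB = 80.0    # assumed single-GPU memory size, for illustration

for name, bits in [("FP16", 16), ("FP8", 8), ("FP6", 6), ("INT4", 4)]:
    size = weight_memory_gb(NUM_PARAMS, bits)
    fits = "fits" if size < ASSUMED_GPU_GB else "does not fit"
    print(f"{name}: {size:.1f} GB of weights, {fits} in {ASSUMED_GPU_GB:.0f} GB")
```

Under these assumptions the FP16 weights alone come to about 140 GB, while the FP6 weights come to about 52.5 GB, which is consistent with the single-GPU claim in the abstract as long as the remaining memory covers everything this estimate leaves out.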

0

148

33

64

23K

Arash Bakhtiari @arashb

over 2 years ago

@fabmilo @mpowers206 @MSFTDeepSpeed DeepSpeed-MII benchmark scripts are available here: https://t.co/62DHAFvOHX

0

1

0

67
