Bohan Hou @bohanhou1998 - Twitter Profile

Bohan Hou @bohanhou1998

4 days ago

RT @ruihanglai: Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup…

0

10

0

1

bohanhou1998 retweeted

Tianqi Chen

@tqchenml

8 months ago

📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interop with each other. Ship one library across pytorch, jax, cupy etc and runnable across python, c++, rust https://t.co/m2gHJRreol

3

165

40

55

38K

bohanhou1998 retweeted

PyTorch

@PyTorch

8 months ago

Live from the AI Infra Summit, co-located with #PyTorchCon — Tianqi Chen (@nvidia) explores how shared ML foundations can advance interoperability across compilers, libraries, DSLs, and frameworks, while unifying workloads across edge and cloud. 🔗 https://t.co/lLaazPPW2z #AIInfraSummit #OpenSourceAI #AIInfrastructure

2

45

13

3

8K

bohanhou1998 retweeted

Tim Dettmers

@Tim_Dettmers

about 1 year ago

Happy to announce that I joined the CMU Catalyst with three of my incoming students. Our research will bring the best models to consumer GPUs with a focus on agent systems and MoEs. It is amazing to see so many talented people at Catalyst -- a very exciting ecosystem!

13

339

48

33

24K

bohanhou1998 retweeted

Tianqi Chen

@tqchenml

about 1 year ago

Really thrilled to receive an #NVIDIADGX B200 from @nvidia. Looking forward to cooking with the beast. Together with an amazing team at the CMU Catalyst group (@BeidiChen @Tim_Dettmers @JiaZhihao @zicokolter), we are looking to innovate across the entire stack, from model to instructions.

0

83

17

3

11K

bohanhou1998 retweeted

Zhihao Jia

@JiaZhihao

about 1 year ago

Thank you to @NVIDIA for gifting our Catalyst Research Group the latest NVIDIA DGX B200! The B200 platform will greatly accelerate our research in building next-generation ML systems.🚀 #NVIDIADGX #DGXB200 @NVIDIADC

0

51

10

2

8K

Bohan Hou @bohanhou1998

about 1 year ago

before use/in use/after use

CMU School of Computer Science @SCSatCMU

about 1 year ago

Huge thank you to @NVIDIADC for gifting a brand new #NVIDIADGX B200 to CMU’s Catalyst Research Group! This AI supercomputing system will afford Catalyst the ability to run and test their work on a world-class unified AI platform.

3

140

28

12

82K

1

12

3

0

2K

bohanhou1998 retweeted

Hongyi Jin @HongyiJin258

over 1 year ago

🚀 Making cross-engine LLM serving programmable. Introducing LLM Microserving: a new RISC-style approach to designing LLM serving APIs at the sub-request level. Scale LLM serving with programmable cross-engine serving patterns, all in a few lines of Python. https://t.co/fq78yU2HvH

0

64

31

26

19K

bohanhou1998 retweeted

Ruihang Lai @ruihanglai

almost 2 years ago

Announcing MLCEngine, a universal LLM deployment engine with ML Compilation. We rebuilt the engine with state-of-the-art serving optimizations and maximum local env portability. Fully OpenAI compatible for both cloud and local use cases. Check out the blog https://t.co/d0SqtRsBI4

3

44

15

18

13K

bohanhou1998 retweeted

Charlie Ruan

@charlie_ruan

about 2 years ago

Llama 3 from @AIatMeta is now up on WebLLM! Try it on https://t.co/NnJ7e1vPlH with local inference accelerated by @WebGPU. Or start building your local agent with the web-llm package -- everything in-browser!

2

77

12

47

23K

bohanhou1998 retweeted

Tianqi Chen

@tqchenml

about 2 years ago

#Llama3 🦙🦙 running fully locally on iPad without an internet connection. Credits to @ruihanglai and the team.

0

73

15

14

8K

bohanhou1998 retweeted

Ruihang Lai @ruihanglai

about 2 years ago

Deploy #Llama3 locally with native GPU acceleration on CUDA/ROCm/Vulkan/Metal with MLC LLM. Check out https://t.co/xcXJlpqu5h for quick start instructions.

1

11

6

1

2K

bohanhou1998 retweeted

Mengshiun @mengshyu

about 2 years ago

Deploy #Llama3 on a $100 Orange Pi with GPU acceleration through MLC LLM. Try it out on your Orange Pi 👉 https://t.co/zSJDE3GwUV

1

54

12

29

19K

bohanhou1998 retweeted

Tianqi Chen

@tqchenml

about 2 years ago

Please spread the word: #MLSys2024 will feature a full-day, single-track Young Professional Symposium with invited talks, panels, round tables, and poster sessions. Submit your 1-page abstract by April 1st and present your work at our poster session. https://t.co/dpTjseTZWq

2

69

19

8

23K

bohanhou1998 retweeted

Mishaal Rahman

@MishaalRahman

over 2 years ago

I asked @Google's Gemma 2B LLM to write me a poem. This is being run using the MLCChat app for Android on my Samsung Galaxy S24 Ultra.

5

227

16

23

19K

bohanhou1998 retweeted

Junru Shao

@junrushao

over 2 years ago

(1/3) 🦙🌟 Looking to run Llama2-70B? With two NV/AMD GPUs or more? 💥🔥 Machine learning compilation (MLC) now supports multi-GPU. ⚡️💻 We achieve 34 tok/sec on 2 x RTX 4090, the fastest solution at $3.2k. 🌐💡 Two AMD 7900XTX deliver 30 tok/sec at $2k. https://t.co/iGTHTU0xdN

8

166

37

70

41K

bohanhou1998 retweeted

Junru Shao

@junrushao

almost 3 years ago

While LLMs are resource-hungry and challenging to run at satisfactory speed on small devices, we show that ML compilation (MLC) techniques make it possible to generate tokens at 5 tok/sec on a $100 Orange Pi with a Mali GPU. https://t.co/j8a21e1EVL

11

229

49

103

76K

Bohan Hou @bohanhou1998

almost 3 years ago

Making @AMD @amdradeon GPUs competitive for LLM inference! 130 toks/s for Llama 2 7B and 75 toks/s for 13B with ROCm 5.6 + 7900 XTX + 4-bit quantization: 80% of the performance of the Nvidia RTX 4090. See how we do this in detail and try out our Python packages here: https://t.co/IL8IFMEiQs

9

182

39

56

77K

Bohan Hou @bohanhou1998

almost 3 years ago

Now available in the App Store! https://t.co/NLqe9mypgB

Bohan Hou @bohanhou1998

almost 3 years ago

#Llama2 is running on iPhone and iPad 📱 natively with GPU acceleration. No internet connection is required. See the iOS instructions to get the TestFlight app now: https://t.co/dJTCjRqMWy

15

257

75

158

139K

0

6

5

1

3K

bohanhou1998 retweeted

Ruihang Lai @ruihanglai

almost 3 years ago

Running Llama 2 directly in the web browser with @WebGPU acceleration. Try it out at https://t.co/V7smDggAB0 Build your own web app with Web LLM in 35 lines of code 👇, with the npm package at https://t.co/DcGC2DfZSO

0

69

15

29

26K
