Hey @x
Looking to #connect. What are you building?
💻 SaaS
⚙️ Tech
🤖 Automation
🧠 AI tools
📦 Product Development
🌐 Web apps
Drop what you're working on 👇🏼
Let's #buildinpublic
Hey all @X
I'm looking to #connect with
people interested in:
Frontend
Backend
Fullstack
AI/ML
DevOps
Web3
Cloud
Open Source
Tech Writing
Markets
Finance
Crypto
Let's grow together 🤝
Now let's do the math on what current systems actually use...
Current system for 6GB model:
OS kernel: ~500MB
Runtime overhead: ~500MB
Memory addressing/pointers: ~1GB
Padding/alignment waste: ~500MB
Cache inefficiency: ~1GB
Actual weights: ~2.5GB
That's about 58% waste (3.5GB of the 6GB is not weights).
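To check that number, here is a quick sketch that adds up the breakdown. The figures are the rough estimates from the list above, not measurements, and the variable names are just for illustration.

```python
# Approximate figures copied from the breakdown above (in GB); not measured values.
footprint_gb = {
    "OS kernel": 0.5,
    "Runtime overhead": 0.5,
    "Memory addressing/pointers": 1.0,
    "Padding/alignment waste": 0.5,
    "Cache inefficiency": 1.0,
    "Actual weights": 2.5,
}

total_gb = sum(footprint_gb.values())
weights_gb = footprint_gb["Actual weights"]
overhead_gb = total_gb - weights_gb          # everything that is not weights
overhead_pct = 100 * overhead_gb / total_gb  # share of the total footprint

print(f"total: {total_gb:.1f} GB, overhead: {overhead_gb:.1f} GB ({overhead_pct:.0f}%)")
# prints: total: 6.0 GB, overhead: 3.5 GB (58%)
```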
What does a system actually need to run LLM inference?
Just these things:
-Store the weights → the numbers
-Read them in order → sequentially
-Multiply them → basic math
-Output a result → one token at a time
That's it. That's all an LLM inference engine fundamentally does.
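Those four steps can be sketched in a few lines of plain Python. This is a toy, not a real model: the 4-token vocabulary, the `EMBED` and `OUT` tables, and the function names are all made up for illustration, and a real LLM stacks many layers (attention included) between the lookup and the output.

```python
# Toy illustration only: hypothetical 4-token vocabulary with hand-picked weights.

# 1. Store the weights: just lists of numbers.
EMBED = [  # one row per token id
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, -0.5],
]
OUT = [  # one row per candidate next token
    [0.0, 1.0],
    [1.0, 0.5],
    [0.5, -1.0],
    [-1.0, 2.0],
]

def next_token(token_id):
    hidden = EMBED[token_id]
    # 2 + 3. Read each weight row in order and multiply-accumulate against the input.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in OUT]
    # 4. Output a result: greedily pick the highest-scoring token.
    return max(range(len(logits)), key=logits.__getitem__)

def generate(start, steps):
    # Produce tokens one at a time, feeding each output back in as the next input.
    tokens = [start]
    for _ in range(steps):
        tokens.append(next_token(tokens[-1]))
    return tokens

print(generate(0, 4))  # prints [0, 1, 3, 2, 1]
```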