Thought LLMs were just prompt→response
Then I dug deeper
• Why are production systems so much faster?
• Why does memory become a bottleneck?
→ Continuous batching
→ Paged attention
→ Speculative decoding
Harder problems begin underneath:
latency, memory, throughput, serving.
@aakrit Interested. I'm an AI/ML engineer with 7 months of experience at a legal tech startup. My work spans LLM-powered agents, RAG pipelines, and recommendation systems, and I think about failure modes, non-determinism, and scalability before worrying about models.
GitHub: https://t.co/BvdBYngCQp
@Ethen_Brooks Hi, I'm interested. I have 7 months of part-time experience at a legal tech startup.
GitHub: https://t.co/Z1wjHTXSNh
Projects demo:
Voker (interruptible voice agent): https://t.co/JIDdzLsMxD
Hotel agent: https://t.co/ir9TlFGGg0
Loan compliance pipeline: https://t.co/dOZFPwp58i
@Yashagarwal9911 Hi, I'm interested. I have 7 months of part-time experience in this domain, along with self-made projects. Here are my resume and GitHub.
Resume: https://t.co/NpMAfCxuIY
GitHub: https://t.co/BvdBYngCQp
@souvikdeb26 Hi, I'm interested. I have 7 months of part-time experience in this domain, along with self-made projects.
Here are my resume and GitHub:
Resume: https://t.co/NpMAfCxuIY
GitHub: https://t.co/BvdBYngCQp