Niz

@homo_codean

A staunch believer of infinity = 30.

Brisbane, Queensland

Joined August 2011

184 Following

23 Followers

386 Posts

Niz @homo_codean

1 day ago

We've spent years treating evaluation as the side quest. It's starting to look like the main quest. https://t.co/DrTehpwWM7 https://t.co/lyOPiRxInA https://t.co/JeE4EIRFTG

Niz @homo_codean

1 day ago

The bottleneck in AI is increasingly shifting from model capability to measurement capability. The harder question is becoming: how do we know what they're actually capable of?

Niz @homo_codean

1 day ago

3 signals worth watching: - ExploitBench brings granularity to exploit evals beyond pass/fail - Anthropic's Glasswing shows AI finds vulnerabilities faster than humans can patch them - Their exploit eval work pushes agentic evals into realistic envs over trivial benchmarks

homo_codean retweeted

Fei-Fei Li

@drfeifei

2 days ago

https://t.co/Kt50ttQRMJ

147

852

763K

Who to follow

Vivek Talwar(e/acc)

@vivektalwar689

math ∩ biology ∩ computing, Opinions are my own, RTs are not endorsements.

homo_codean retweeted

luthira

@luthiraabeykoon

about 1 month ago

We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇

267

700

850K

homo_codean retweeted

rUv

@rUv

2 months ago

Most people think in terms of what’s there. Signals. Data. Presence. The Maxwell Algorithm flips that. It’s really about fields. Not objects. Not rows in a database. Fields. Where energy moves, where pressure builds, where something is about to change. You’re not modeling things, you’re modeling influence. That’s exactly how I use it. With ruvector and Cognitum, I don’t try to track everything. That’s a losing game. I treat the system like a field and look for gradients. Where is coherence breaking. Where is tension forming. Where is flow accelerating. Take RuView. I’m not “seeing a person” through walls. I’m tracking disturbances in the RF field. Breathing is a periodic ripple. Heartbeat is a micro fluctuation. Movement is a phase shift. I’m reading the field, not the object. Same with dynamic mincut. The cut is not just a boundary. It’s where the field is weakest. Where separation naturally occurs. That’s your signal. That’s where something important is happening. Agents follow that. They don’t scan everything. They move toward pressure. Toward change. Practically, this means less compute, faster detection, and better decisions. You’re not reacting after the fact. You’re moving with the system as it evolves. Once you start thinking in fields instead of objects, everything gets simpler. You stop searching. You start sensing.

700

homo_codean retweeted

Karl Weinmeister

@kweinmeister

2 months ago

How do you maximize LLM inference performance while staying within your budget? My latest post covers these latest techniques: 1. Semantic routing across model tiers 2. Prefill & decode disaggregation 3. Quantization 4. Context routing 5. Speculative decoding 👉 https://t.co/2deUL279cG

333

homo_codean retweeted

lak lakshmanan @lak_luster

5 months ago

You are probably hearing every influential software developer in your social media feed rave about the new generation of coding assistants like Claude Code, Augment, Cursor, and Google Antigravity. What's happening? 1/n

270

Niz @homo_codean

4 months ago

this 🫡

Aakash Gupta

@aakashgupta

4 months ago

Sounds incredible until you read the fine print. The compiler generates less efficient code than GCC with all optimizations disabled. It doesn’t have its own assembler or linker. It can’t produce a 16-bit x86 code generator. And Carlini himself says it has “nearly reached the limits of Opus’s abilities.” New features and bugfixes kept breaking existing functionality. So what did $20,000 and two weeks actually buy? A compiler that passes 99% of GCC’s torture tests but can’t match the output quality of a tool that’s had 37 years of human engineering. That’s the constraint nobody’s pricing in. The real story is in the cost curve, not the capability demo. $20,000 for 100,000 lines means $0.20 per line of generated code. A senior compiler engineer costs roughly $150/hour. At maybe 50 polished lines per hour for something this complex, that’s $3/line. AI just did it at 15x cheaper, and it will only get cheaper from here. But the code isn’t equivalent. The AI version needs a human to finish the assembler, fix the linker, optimize the output, and prevent regressions. Those are the hardest 20% of the problem, and they represent 80% of the engineering value. Anthropic built the demo. Shipping the product still requires humans. This tells you exactly where we are in the autonomous software timeline. AI can now produce impressive first drafts of complex systems at trivial cost. Turning those drafts into production software still requires the judgment that costs $300K+ per year in compiler engineer salary. The gap between “compiles the Linux kernel” and “replaces GCC” is measured in decades of accumulated engineering wisdom that no model has internalized yet. The companies that understand this will use agent teams to generate the 80% and hire engineers to finish the 20%. The companies that don’t will ship $20,000 compilers that produce slower code than a free tool from 1987.

187

309

970

375K

homo_codean retweeted

Aflah 🍉🕊️ @Aflah02101

5 months ago

🧵 Thread: Introducing MAMF Explorer 🧵 A practical way to understand real matmul performance on GPUs, not just theoretical peaks. https://t.co/j4R3LHRAFt

15K

homo_codean retweeted

Richard Seroter

@rseroter

6 months ago

We spun up a new GitHub repo for all things MCP at @Google. Get info on our remote managed MCP servers, open source MCP servers, examples, and learning resources. https://t.co/q6erJX2Xcc

rseroter's tweet photo. We spun up a new GitHub repo for all things MCP at @Google.

Get info on our remote managed MCP servers, open source MCP servers, examples, and learning resources.

https://t.co/q6erJX2Xcc https://t.co/nYpxhIJ9Xq

237

109K

Niz @homo_codean

5 months ago

this 🫡

Andrej Karpathy

@karpathy

5 months ago

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.

56K

32K

17M

homo_codean retweeted

Luis Batalha

@luismbat

5 months ago

After rewatching Home Alone, I couldn’t stop wondering: how plausible is the oversleep that leaves Kevin behind? So I wrote a tiny paper and ran the numbers. Merry Christmas! 🎄

luismbat's tweet photo. After rewatching Home Alone, I couldn’t stop wondering:

how plausible is the oversleep that leaves Kevin behind?
So I wrote a tiny paper and ran the numbers.

Merry Christmas! 🎄 https://t.co/jhtrVe1AuT

429

22K

homo_codean retweeted

anshuman

@athleticKoder

8 months ago

Techniques I’d master if I wanted to make LLMs faster + cheaper. 1. Quantization 2. KV-Cache Quantization 3. Flash Attention 4. Speculative Decoding 5. LoRA 6. Pruning 7. Knowledge Distillation 8. Weight Sharing 9. Sparse Attention 10. Batching & Dynamic Batching 11. Model Serving Optimization 12. Tensor Parallelism 13. Pipeline Parallelism 14. Paged Attention 15. Mixed Precision Inference 16. Early Exit / Token-Level Pruning

163

61K

homo_codean retweeted

Pol Lanski 🥩,🤖

@Pol_Lanski

over 1 year ago

"Not your weights, not your brain" I see this as pluggable exocortexes. You will be using closed-source exocortexes for work, where "giving up control in exchange for a better brain" gives you an edge. For your personal, private life you will plug in something that actually cares about you, does not make you vulnerable to private company control for profit.

445

homo_codean retweeted

Avi Chawla

@_avichawla

8 months ago

Finally, Python 3.14 lets you disable GIL! It's a big deal because earlier, even if you wrote multi-threaded code, Python could only run one thread at a time, giving no performance benefit. But now, Python can run your multi-threaded code in parallel. And uv fully supports it!

118

484

546K

homo_codean retweeted

Vivek Galatage

@vivekgalatage

9 months ago

Understanding GPU Architecture from Cornell https://t.co/B1tDsYOCVF During a low-level discussion at a casual meetup, many folks were interested in understanding GPUs more closely. While CPUs optimize for complex control flow (see those big cores + caches), the GPUs maximize throughput with thousands of simple cores sharing memory. The GPU architecture roadmap is a good starting point for diving deeper.

vivekgalatage's tweet photo. Understanding GPU Architecture from Cornell

https://t.co/B1tDsYOCVF

During a low-level discussion at a casual meetup, many folks were interested in understanding GPUs more closely.

While CPUs optimize for complex control flow (see those big cores + caches), the GPUs maximize throughput with thousands of simple cores sharing memory. The GPU architecture roadmap is a good starting point for diving deeper.

385

534K

homo_codean retweeted

Victoria Slocum

@victorialslocum

10 months ago

Prompt engineering is dead. Long live 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 (Well, not quite dead - but it's definitely evolving into something way more powerful) Meet 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 - the art of building dynamic systems that give LLMs exactly what they need to succeed. As we move from simple chatbots to complex AI agents, we're realizing that clever prompts aren't enough. What matters is orchestrating an entire ecosystem of information that flows into your LLM. So what exactly does that mean? It's about building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task. The anatomy of a context-engineered system includes: 📊 𝗨𝘀𝗲𝗿 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: Preferences, history, and personalization data 🔧 𝗧𝗼𝗼𝗹 𝗨𝘀𝗲: APIs, calculators, search engines - whatever the LLM needs to get the job done 🔍 𝗥𝗔𝗚 𝗖𝗼𝗻𝘁𝗲𝘅𝘁: Retrieved information from vector databases like Weaviate 💬 𝗨𝘀𝗲𝗿 𝗜𝗻𝗽𝘂𝘁: The actual query or task at hand 🧠 𝗔𝗴𝗲𝗻𝘁 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴: The LLM's thought process and decision-making chain 📜 𝗖𝗵𝗮𝘁 𝗛𝗶𝘀𝘁𝗼𝗿𝘆: Previous interactions that provide continuity But what about the memory architecture? • 𝗦𝗵𝗼𝗿𝘁-𝘁𝗲𝗿𝗺 𝗺𝗲𝗺𝗼𝗿𝘆: Lives in the context window, handling current conversations • 𝗟𝗼𝗻𝗴-𝘁𝗲𝗿𝗺 𝗺𝗲𝗺𝗼𝗿𝘆: Stored in vector databases (like Weaviate), persisting user preferences and past interactions across sessions Why does this matter? Because when agentic systems fail, it's rarely because the model isn't smart enough. It's because we haven't given it the right context. The format matters too. A well-structured error message beats a massive JSON blob every time. Just like humans, LLMs need clear, digestible communication.

victorialslocum's tweet photo. Prompt engineering is dead. Long live 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴

(Well, not quite dead - but it's definitely evolving into something way more powerful)

Meet 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 - the art of building dynamic systems that give LLMs exactly what they need to succeed.

As we move from simple chatbots to complex AI agents, we're realizing that clever prompts aren't enough. What matters is orchestrating an entire ecosystem of information that flows into your LLM.

So what exactly does that mean?

It's about building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task.

The anatomy of a context-engineered system includes:
📊 𝗨𝘀𝗲𝗿 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: Preferences, history, and personalization data
🔧 𝗧𝗼𝗼𝗹 𝗨𝘀𝗲: APIs, calculators, search engines - whatever the LLM needs to get the job done
🔍 𝗥𝗔𝗚 𝗖𝗼𝗻𝘁𝗲𝘅𝘁: Retrieved information from vector databases like Weaviate
💬 𝗨𝘀𝗲𝗿 𝗜𝗻𝗽𝘂𝘁: The actual query or task at hand
🧠 𝗔𝗴𝗲𝗻𝘁 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴: The LLM's thought process and decision-making chain
📜 𝗖𝗵𝗮𝘁 𝗛𝗶𝘀𝘁𝗼𝗿𝘆: Previous interactions that provide continuity

But what about the memory architecture?
• 𝗦𝗵𝗼𝗿𝘁-𝘁𝗲𝗿𝗺 𝗺𝗲𝗺𝗼𝗿𝘆: Lives in the context window, handling current conversations
• 𝗟𝗼𝗻𝗴-𝘁𝗲𝗿𝗺 𝗺𝗲𝗺𝗼𝗿𝘆: Stored in vector databases (like Weaviate), persisting user preferences and past interactions across sessions

Why does this matter? Because when agentic systems fail, it's rarely because the model isn't smart enough. It's because we haven't given it the right context.

The format matters too. A well-structured error message beats a massive JSON blob every time. Just like humans, LLMs need clear, digestible communication.

222

81K

Niz @homo_codean

10 months ago

🫥

X. Dong

@SimonXinDong

10 months ago

It turns out, > GRPO is performing the arithmetic mean --> token-level scaling > GSPO is performing the geometric mean --> sequence-level scaling Check the blog if you do not have time to read. https://t.co/tVHJnSYFse

737

699

69K

homo_codean retweeted

Jaana Dogan ヤナドガン

@rakyll

almost 2 years ago

Burnout happens when there is no progress. Burnout also happens when you have no visibility into how dots will connect in the future. Burnout is about not being able to close the loop after putting a ton of effort and not seeing enough results.

17K

800K

Niz

@homo_codean

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users