normal people read docs to learn @modal. i rebuilt the entire platform from scratch
https://t.co/aiezpLvJvO
https://t.co/RdCSElUHNq
@charles_irl pls hire
built kernelscope - CUDA kernel debugger that maps source lines to PTX + GPU events
runs entirely on @modal btw
it analyzes warp state, memory coalescing, warp divergence, SM occupancy, and gives perf hints
more info & pics in next post
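to make "memory coalescing" concrete, here's a minimal sketch of the idea. it's not kernelscope's actual code: the function names are hypothetical, and the 128-byte segment size is a simplifying assumption (real hardware tracks finer-grained sectors). it just counts how many memory segments one warp's load touches:

```python
# Hypothetical illustration, not taken from kernelscope.
# Assumption: a warp has 32 lanes and memory is served in 128-byte segments.

def warp_addresses(base, stride_bytes, lanes=32):
    """Byte address accessed by each lane of a warp for a strided load."""
    return [base + lane * stride_bytes for lane in range(lanes)]

def segments_touched(addresses, segment_bytes=128):
    """Number of distinct segments the warp's addresses fall into."""
    return len({addr // segment_bytes for addr in addresses})

# Consecutive 4-byte floats: 32 lanes * 4 B fit in a single segment.
coalesced = segments_touched(warp_addresses(0, 4))

# Stride of 128 B: every lane lands in its own segment.
strided = segments_touched(warp_addresses(0, 128))

print(coalesced, strided)
```

fewer segments per warp means fewer memory transactions for the same amount of useful data, which is roughly what a coalescing hint is pointing at.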
@charles_irl@can this is my job application part 2
there were only two problems: the Linear connector didn't work smh, and I had to install Chrome so the Claude for Chrome extension would work. in the end i found the cli importer and docs, and that did the trick
yet again talking about reading list
finally got access to @claudeai cowork and tasked it with organizing my tabs right away
now i have a @linear workspace with all my blogs/papers/etc organized by project. each entry is an issue, where i can change status, dates and more
each JD is an issue in the Job Application project that references other projects/issues, so I can cross something off my reading list while preparing for an interview
seems like I finally solved my problem, other than solving FOMO
Oh, you're using Copilot? Everyone's on Cursor now. Just kidding, we're all on Windsurf. We're using Cline. We're using Aider. We have an in-house MCP server mesh with custom tool schemas but wait, OpenCode just dropped so we're migrating to that instead. Our PM is on Gemini CLI. The team lead was on Codex but now she's back to copy-pasting into ChatGPT. If you're not on Amp, you're ngmi. Our intern is building on Goose for our internal tooling. Our CFO approved Claude Max so now we're porting our workflows to computer use. Our CTO is working on an agent-less RAG pipeline so we won't need vibe coding anymore. Our CEO thinks we're talking about actual vibrations. We're building clankercloud.
some of you asked in the comments whether i'd open-source kernelscope
here you go! https://t.co/3mYpWa7GRR
not my best code, but you asked for it
another popular question was my reading list. i keep a 'readed' list on my blog https://t.co/NqODnLaG7r
gm my fellow gpu lovers, what should i do next? ml perf/infra interviews coming up, trying to lock in
i have a loong reading list on cuda, dl framework internals, sysdis, some perf case studies and a bunch of puzzles, so requesting some tips on organizing/prioritizing & must-reads
@AISloppyJoel a little more useful is the list of what i already read
https://t.co/NqsDJAMe5L
because i sort out low-signal posts/blogs and write a short description for each
i used to track this kind of stuff in obsidian, but idk, i remember almost everything i read and can find it fast so i don't see the point in this
@AISloppyJoel idk how to share it, but i can probably pipe my sidebar tabs into a text file
now i have so much to read that i vibecoded a tool to visualize my tabs as skill paths like in videogames, but it doesn't cut it
Great post! But we often see GPUs that report 'Healthy' (full clocks & links up) but suffer from massive internal instruction replays or silent packet retransmits.
If you skip active benchmarks at boot, isn't there a high risk of one 'hollow' node tanking an entire distributed cluster? Or a node might have 'healthy' links, but if the cloud provider allocated it on a different spine switch than the rest of my cluster, my training run effectively dies.
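A minimal sketch of the kind of active check I mean, assuming each node runs the same short benchmark at boot and reports a single throughput number. The function name, the node names, the sample numbers and the 0.85 tolerance are all hypothetical placeholders, not anything from the post:

```python
from statistics import median

def find_hollow_nodes(throughput_by_node, tolerance=0.85):
    """Flag nodes whose measured throughput falls well below the cluster median.

    throughput_by_node: dict of node name -> benchmark result (higher is better).
    tolerance: assumed fraction of the median a node must reach to pass.
    """
    baseline = median(throughput_by_node.values())
    return sorted(
        node for node, value in throughput_by_node.items()
        if value < tolerance * baseline
    )

# Made-up example numbers: node-2 reports "healthy" but benchmarks far slower.
results = {"node-0": 98.0, "node-1": 101.0, "node-2": 61.0, "node-3": 99.5}
hollow = find_hollow_nodes(results)
print(hollow)
```

Comparing against the cluster median rather than a fixed spec number is a design choice here: it catches a node that is slow relative to its peers even when every passive health signal looks fine. It would not catch a topology problem like a different spine switch unless the benchmark itself exercises cross-node links.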