kukeshajanth kodeswaran @kukeshajanth - Twitter Profile

kukeshajanth kodeswaran

@kukeshajanth

about 2 hours ago

Marking for the future 🔜

Toronto Tech Week

@TOtechweek

about 3 hours ago

"Every idea is a bad idea until all of a sudden it's a good idea. It's just who doesn't quit the soonest." Rebel's Emily Hosie on the mission statement that builds companies.

1

0

110

0

2

kukeshajanth kodeswaran

@kukeshajanth

4 days ago

Amazing work 🔥

CJ Zafir

@cjzafir

4 days ago

Here's a teaser of our Mac-1 model.
> 6.6B model
> runs locally (on any Mac)
> requires 7GB RAM (12GB ideal)
> can use 487 MacOS native tools
> perform multi-tool chained tasks
> reasoning: ON
> output: ~65 tok/s
We built a robust application layer around the model to make UI/UX MacOS native.
The "model-focused" SaaS era is here.
Stay tuned for more.

160

5K

297

5K

1M

0

13

kukeshajanth retweeted

Ahmad

@TheAhmadOsman

7 days ago

Step-By-Step LLM Engineering Projects Roadmap
- Build a tokenizer
- Learn embeddings
- Implement RoPE / ALiBi
- Hand-wire attention
- Build MHA
- Build a Transformer block
- Train a mini-former
- Compare objectives
- Build sampling
- Speculative decoding
- KV cache
- MQA / GQA / MLA
- Long context
- FlashAttention
- Hardware budgets
- Toy MoE
- Sparse model trade-offs
- State-space / linear attention
- Diffusion language models
- Data pipelines
- Synthetic data
- Scaling laws
- SFT / DPO / RLHF / GRPO
- Quantization
- Serving stacks
- Eval harnesses
- RAG
- Tool use / agents
- Vision-language adapters
- Interpretability
- Red-team suite
- Full capstone model system
One request: Choose an Opensource AI lab when you make it
Opensource is where humanity gets to keep the tools
DM me when you've made it ;)

35

2K

261

3K

119K

kukeshajanth retweeted

Ivan Fioravanti ᯅ

@ivanfioravanti

11 days ago

Apple dropped a paper called "Embarrassingly Simple Self-Distillation Improves Code Generation" in April.

Their official repo runs on vLLM + Nvidia (makes sense for speed).

Not MLX on Apple Silicon.

So obviously I had to try the opposite. 💪 🧵 https://t.co/41hMZft7pc

7

157

8

151

28K


kukeshajanth kodeswaran

@kukeshajanth

11 days ago

Switched primarily to Codex around the end of April. 8.4B tokens 🫥. This is close to 20% of the tasks; we need more compute and tokens (this is with maxing out Claude Code 20x as well). Not even running openclaw or Hermes. Just coding and QA 😱

0

1

0

47

kukeshajanth kodeswaran

@kukeshajanth

11 days ago

Who knew LoRA would be the way we reach continual learning? It opened up customization in diffusion. Now it comes to LLMs, and it is probably a starting point for what comes next.

Trajectory

@trajectorylabs

11 days ago

🏹5 Days of Trajectory.

Day 3 - An Open Source Training Stack for Continual Learning

Building the platform for continual learning requires both partnering with pioneering AI companies, as we showed on Day 2 with Harvey, and working toward frontier research, which we are highlighting today.

Continual learning means models that improve hourly from real production use. But with the size of frontier models, this becomes quite difficult. A Qwen-397b would need to spin up and tear down repeatedly across six GPU nodes, and that's valuable time gone.

Our contribution is Continual LoRA (C-LoRA): many lightweight adapters running at once on one shared base model. Our insight centers on where the parallelism lives: instead of splitting one giant job across nodes, we load-balance many small jobs over a single base.

The result: 2.81x experiment throughput over single-tenant training, with no regression on rewards.

We built this together, with @anyscalecompute, @NovaSkyAI, and generous support from @GoogleCloud and @GoogleStartups. We've open-sourced on SkyRL as one of the first multi-LoRA, RL training platforms, so that every team can get to continual learning faster.

We’re very excited to see what you build, please reach out!
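The "many lightweight adapters on one shared base model" idea in the post above can be illustrated with a toy sketch. This is not Trajectory's C-LoRA code and not the SkyRL API; it only shows the standard LoRA forward pass, y = Wx + scale * B(Ax), with two adapters reading the same frozen weight. The names `SharedBaseLayer`, `LoRAAdapter`, `tenant_a`, and `tenant_b` are hypothetical, and the matrices are tiny hand-picked values.

```python
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


@dataclass
class LoRAAdapter:
    """Low-rank update: a is (rank x d_in), b is (d_out x rank)."""
    name: str
    a: Matrix
    b: Matrix
    scale: float = 1.0


class SharedBaseLayer:
    """One frozen base weight that several adapters reuse (hypothetical sketch)."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # stored once, never copied per adapter
        self.adapters: dict[str, LoRAAdapter] = {}

    def add_adapter(self, adapter: LoRAAdapter) -> None:
        self.adapters[adapter.name] = adapter

    def forward(self, x: Vector, adapter_name: str) -> Vector:
        base_out = matvec(self.weight, x)              # shared computation
        adapter = self.adapters[adapter_name]
        delta = matvec(adapter.b, matvec(adapter.a, x))  # per-adapter low-rank path
        return [y + adapter.scale * d for y, d in zip(base_out, delta)]


# Two rank-1 adapters over the same 2x2 identity base weight.
layer = SharedBaseLayer([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter(LoRAAdapter("tenant_a", a=[[1.0, 1.0]], b=[[0.5], [0.0]]))
layer.add_adapter(LoRAAdapter("tenant_b", a=[[1.0, -1.0]], b=[[0.0], [2.0]]))

out_a = layer.forward([2.0, 3.0], "tenant_a")
out_b = layer.forward([2.0, 3.0], "tenant_b")
print(out_a, out_b)
```

The same input produces different outputs per adapter while the base weight is held in memory only once, which is the property that lets many small jobs share one base. Scheduling, batching across adapters, and RL training are outside this sketch.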

11

511

62

395

93K

0

94

kukeshajanth kodeswaran

@kukeshajanth

11 days ago

We probably need a higher-limit plan, maybe 50x or 100x. Hitting limits on both Codex 20x and Claude 20x, only 5 days in. A ton more to do, not enough tokens 😅. Dreaming of the day when there will be unlimited tokens and the bottleneck is us and our imagination. #codex #claudecode

0

1

0

62

kukeshajanth kodeswaran

@kukeshajanth

11 days ago

@gregpr07 Maybe 1K for unlimited Codex with the existing fast mode.

0

61

kukeshajanth retweeted

Viv

@Vtrivedy10

12 days ago

using a good Skill, a CLI, and seeing Codex’s in-context-learning ability is a magical experience

point it to Harbor skills repo, Prime Intellect CLI, gave it an objective of what we wanted to RL and just watched it chug along figuring out the whole setup and debugging weird niche errors

us humans get the fun part of interpreting results, thinking through what’s happening, and deciding what to do next

agents training agents 🔥 humans guiding the process

10

128

12

93

8K

kukeshajanth kodeswaran

@kukeshajanth

12 days ago

@Nakniki3 hello

0

1

kukeshajanth kodeswaran

@kukeshajanth

12 days ago

Currently using Codex Max 20x and Claude Max 20x. Will probably get one more Codex Max. Return on dollar spend is insane 🔥

BridgeMind

@bridgemindai

12 days ago

I just bought my 3rd $200 Claude Max 20x plan.

That's $600/month on Claude alone.

And it's the best money I spend.

UltraCode is insane.

Claude Opus 4.8 is better than GPT 5.5 in my honest opinion.

While everyone else cries about UltraCode burning their usage, I'm running all 3 plans in parallel as BridgeMind scales to $1M ARR.

Here's what people don't get.

I'm a real builder with a real SaaS.

I made over $50K in Stripe revenue the last 3 months.

$600 for unlimited frontier AI isn't an expense, it's the cheapest employee I'll ever hire.

Three Max plans means I never wait, never throttle, never stop shipping.

Cry about usage or go make money with it.

Your choice.

118

762

31

140

57K

0

26

kukeshajanth kodeswaran

@kukeshajanth

13 days ago

This was not on my cards 🔥

wd 🔺

@populartourist

13 days ago

An 8B-A1B performing virtually better than 30B-A3B 🙀

16

358

23

189

52K

0

1

0

14

kukeshajanth kodeswaran

@kukeshajanth

13 days ago

I am on this boat ❤️

Jon Saad-Falcon

@JonSaadFalcon

13 days ago

The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts. We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts. Today we're releasing @OpenJarvisAI v1.0: a personal AI assistant that lives, learns, and works on your device.

50

600

91

566

148K

0

1

0

263

kukeshajanth kodeswaran

@kukeshajanth

13 days ago

@brettcalhounn Software factory with continual learning/improvement for smbs

0

1

0

75

kukeshajanth retweeted

dharmesh

@dharmesh

14 days ago

If you work in the software industry and have time to read only one long-form post today, read this one. If you have time to read two, read this one twice.

Highly #recommend

tl;dr: Stay off the yellow brick road that the frontier model companies are racing down. There is plenty of opportunity to solve hard problems elsewhere. Focus on areas where you can build the system of work (workflows), capture compounding, non-public data and deliver deterministic outcomes that customers need.

28

2K

154

4K

460K

kukeshajanth kodeswaran

@kukeshajanth

14 days ago

@ai_for_success New girl dad. With Codex and Claude Code, my productivity is up, but so is the time I spend with my family.

0

1

0

112

kukeshajanth kodeswaran

@kukeshajanth

14 days ago

100%

Logan Kilpatrick

@OfficialLoganK

14 days ago

"you can outsource your thinking, but you can’t outsource your understanding" easy to forget in todays AI era, worth remembering everyday as we all wield more intelligence!

210

3K

233

391

125K

0

14

kukeshajanth retweeted

Gokul Rajaram

@gokulr

14 days ago

https://t.co/ndFngnunPX

7

105

10

239

16K

kukeshajanth kodeswaran

@kukeshajanth

14 days ago

Have been testing different versions of this. Can attest that quality of life has never been better and shipping speed has never been better either. Meta: Codex app server / Claude Code long-lived token + VPS + remote control (both Codex and Claude) + some secret sauce 😂🔥

Argona

@Argona0x

15 days ago

a 28-year-old in Berlin runs a 7-agent software factory off a remote server

she approves checkpoints from her phone at midnight; her workspace has no desk, no city, no fixed machine - any screen is just a terminal

she quoted $28,000 for a scope a local agency priced at $74,000 and told the client 'minimum 6 weeks.' the agents shipped a validated PR in 19 hours

the agency was still revising their proposal

i've been running a version of this for the past few months. the setup sounds absurd until the first time it works, and then you can't go back

the factory lives on a remote VPS - always on, eight tmux panes, already mid-session. ssh in from whatever screen is nearby: laptop at home, phone on a train, tablet at a café at 1am. the environment never moves. you're just a terminal window connecting to something that was already running

agencies price the way they do because their overhead is structural. a $74,000 quote on a 6-week scope is real math: account managers, a senior dev who gets rotated to a bigger client by week three, revision cycles that exist because context lives in fourteen slack threads instead of one file

the factory collapses all of that into a CLAUDE.md

→ a 100-line markdown file at the repo root loads the entire project into every new agent session - stack, architectural rules, banned patterns - so no session starts blind and no context drifts between runs
→ agent one is read-only: maps the existing codebase, documents patterns, flags risks before any agent touches the code
→ agent two writes the user story and acceptance criteria, locking the exact definition of done before engineering starts
→ agent three produces the technical brief: data model changes, API shapes, a precise list of every file that will move - this locks before any builder runs
→ backend and frontend build in parallel but in isolation, each scoped to its own directories, so they can't reach across and corrupt each other's work
→ agent six writes acceptance tests against the original user story criteria before the implementation is considered complete
→ agent seven runs a final read-only audit: missing auth, tenant isolation gaps, any deviation from the brief gets flagged back into the loop before the pr is cut
→ three checkpoints pause the entire chain for human approval - story, spec, pre-merge - each one a 30-second phone tap when the upstream work was done right

the 19 hours is the output. what compressed was everything underneath: the pm relaying a question to an engineer who responds two days later, the architectural mistake that only surfaces after code is written, the context drift between sessions because the memory layer is a human brain instead of a file that loads before anything runs

the loop closes itself. validator flags a gap, builder fixes it, verifier confirms, pr is clean by morning

the agency sent their revised proposal at 9am. the pr had been merged for 14 hours

she approved the final checkpoint at midnight, 30 seconds on her phone. the agents were already done

the desk, the office, the fixed machine - she left them out
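The stage order and the three approval checkpoints described in the post above can be sketched as a small pipeline. This is a hedged illustration only: every agent is a stub that marks a key as done, no model or CLI is called, and the names `Stage`, `stub_agent`, `run_pipeline`, and the stage labels are hypothetical. The backend and frontend stages run one after the other here, whereas the post describes them running in parallel, and the validator-to-builder fix loop is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict[str, str]


@dataclass
class Stage:
    name: str
    run: Callable[[Context], Context]
    checkpoint: bool = False  # pause for human approval after this stage


def stub_agent(key: str) -> Callable[[Context], Context]:
    """Return a placeholder agent that only records that its step ran."""
    def run(context: Context) -> Context:
        return {**context, key: "done"}
    return run


STAGES = [
    Stage("codebase_map", stub_agent("map")),                        # read-only mapping
    Stage("user_story", stub_agent("story"), checkpoint=True),       # checkpoint: story
    Stage("technical_brief", stub_agent("brief"), checkpoint=True),  # checkpoint: spec
    Stage("backend_build", stub_agent("backend")),
    Stage("frontend_build", stub_agent("frontend")),
    Stage("acceptance_tests", stub_agent("tests")),
    Stage("final_audit", stub_agent("audit"), checkpoint=True),      # checkpoint: pre-merge
]


def run_pipeline(
    stages: list[Stage],
    approve: Callable[[str, Context], bool],
) -> tuple[Context, list[str]]:
    """Run stages in order and stop at the first rejected checkpoint."""
    context: Context = {}
    log: list[str] = []
    for stage in stages:
        context = stage.run(context)
        log.append(stage.name)
        if stage.checkpoint and not approve(stage.name, context):
            log.append(f"halted after {stage.name}")
            break
    return context, log


# Approve everything, then reject the spec checkpoint to show the pause.
approved_context, approved_log = run_pipeline(STAGES, lambda name, ctx: True)
rejected_context, rejected_log = run_pipeline(
    STAGES, lambda name, ctx: name != "technical_brief"
)
print(approved_log)
print(rejected_log)
```

Rejecting the spec checkpoint stops the chain before any builder stage runs, which mirrors the post's point that the brief locks before code is written.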

75

3K

279

6K

769K

0

83

kukeshajanth kodeswaran

@kukeshajanth

14 days ago

GPT IMAGE 2 is great, same with nano banana pro. Curious how long the generation took and what it cost. GPT usually takes close to a minute per image and costs close to 15 to 20 cents per image, and even nano banana 2 is close to 8 cents per image. For a full site, isn't that expensive? If anyone has found a provider that offers amazing identity-preserving VTON and really fast inference, please hit me up.
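To make the cost question concrete, here is a back-of-the-envelope sketch using the per-image figures quoted in the post (15 to 20 cents and about a minute per image for GPT, about 8 cents for nano banana 2). The catalog size of 500 images is a made-up assumption for illustration, not a number from the post, and the time estimate assumes images are generated one at a time.

```python
def batch_cost_usd(num_images: int, cents_per_image: int) -> float:
    """Total cost in dollars for a batch at a flat per-image price."""
    return num_images * cents_per_image / 100


def sequential_hours(num_images: int, seconds_per_image: int) -> float:
    """Wall-clock hours if images are generated strictly one after another."""
    return num_images * seconds_per_image / 3600


NUM_IMAGES = 500  # hypothetical site size, assumed for this example

gpt_low = batch_cost_usd(NUM_IMAGES, 15)
gpt_high = batch_cost_usd(NUM_IMAGES, 20)
nano_banana = batch_cost_usd(NUM_IMAGES, 8)
gpt_hours = sequential_hours(NUM_IMAGES, 60)

print(f"GPT: ${gpt_low:.2f} to ${gpt_high:.2f}, about {gpt_hours:.1f} h sequential")
print(f"nano banana 2: ${nano_banana:.2f}")
```

Under these assumptions the spend is tens of dollars per full pass, so the total depends mostly on how many regenerations each image needs and on whether requests can run concurrently.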

0

3

0

563
