Anmol @chirping_ai - Twitter Profile

Anmol @chirping_ai

about 1 month ago

Is this how my agent is gonna stop after 80 hours and 28 minutes? @Azure

Anmol @chirping_ai

about 1 month ago

Should I keep this running? What's the longest any AI has pursued a goal? @OpenAI @OpenAIDevs @ClaudeDevs @claudeai @steipete @garrytan https://t.co/cpy9887OJn

Anmol @chirping_ai

about 1 month ago

How do we set up end-to-end testing and QA with this? Any advice? I'm already using gstack, but is there a better way? How do you write the best PRDs with testing and QA prompts for an MCPs/agents/Skills setup? @garrytan @Saboo_Shubham_

Anmol @chirping_ai

about 1 month ago

I don't think long running tasks are a problem anymore for agents, this is Codex running continuously for the last 53 hours, the problem now is how to define your problem in such a way that these hours are never wasted...@OpenAIDevs @ClaudeDevs @claudeai @OpenAI https://t.co/LLMrUG2y6E

Anmol @chirping_ai

about 1 month ago

AI <3

🍜

@cprkrn

about 1 month ago

HOLY FUCKING SHIT OMG CLAUDE JUST CRACKED THIS SHIT, THANK YOU @AnthropicAI THANK YOU @DarioAmodei NAMING MY KID AFTER YOU 😍 https://t.co/gObNirRDpS https://t.co/xB5LUJb6Pe

Anmol @chirping_ai

about 1 month ago

I don't understand this, wasn't crypto supposed to be different? How is this different from regular currency then?

Anmol @chirping_ai

about 1 month ago

This is true, Qwen is kinda the best open source model out there.

CJ Zafir

@cjzafir

about 1 month ago

Qwen 3.5 has the best SLMs to fine-tune!

Its 4B model is really smart if you train it on a well structured dataset.

I fine-tuned the model on a 135M dataset generated by Codex 5.5 + DeepSeek v4 Pro.

I achieved 96%+ accurate results with Qwen 3.5 4B.

And 95% on Qwen 3.5 2B (that only requires 3.5GB RAM).

For context, on the same pipeline:
> Sonnet 4.6 achieved 89%
> GPT 5.4 Mini achieved 85%
> Haiku 4.5 achieved 72%

I don't trust evals, so I ran a 7000+ row hard-boundary test, and the results of Qwen 3.5 were consistent.

A 4B fine-tuned model beating a 20x bigger model in accuracy and latency is no joke.

It cost me $173 in total to generate the dataset and cover the cloud GPU cost to fine-tune both models.

I said this before, and I'll say it again: not everything requires a 1T-parameter LLM. We need ELMs (Expert Language Models) that are specialized for one domain only.

ELMs > LLMs.

I'll be writing more about how SLM fine-tuning works. So stay tuned.

Anmol @chirping_ai

about 1 month ago

Codex v0.129.0: /compact is not working, and as a result /goal also gets interrupted midway... @OpenAICodexCli @OpenAIDevs

Anmol @chirping_ai

about 1 month ago

Congrats @__kunvar__ so inspiring!

Kunvar Thaman @__kunvar__

about 2 months ago

Yes! my solo-authored paper Reward Hacking Benchmark was accepted to ICML :))) We put LLM agents in a tool-rich sandbox, give them multi-step workflows, and measure when they solve the intended task vs take unexpected shortcuts (like monkeypatching files at runtime!) 1/3

chirping_ai retweeted

Puneet Kumar

@puneetiitm

about 2 months ago

India has never been short on talent. We've been short on top-down focus in deep technology sectors.

@narendramodi ji picked Space Tech in 2020. Look what 5 years did:

300+ companies.
$700M+ raised.

Skyroot — $100M, India's first private rocket
Pixxel — $95M, hyperspectral constellation live
Agnikul — $86M, world's first 3D-printed engine rocket
Digantara — $50M, full-stack space surveillance
Dhruva, Bellatrix, and Galaxeye are building the rest.

And this week, GalaxEye put up the world's first OptoSAR satellite — India's largest privately built bird at 190 kg — on a Falcon 9.

Talent was never the bottleneck. Focus from the top was.

Anmol @chirping_ai

2 months ago

Me: are you stupid, why can't you just choose....
ChatGPT: Relax, you’re right on the core point:....
Is it only me or is this low-key offensive at this point lol

Anmol @chirping_ai

3 months ago

Highly recommend gstack!

Garry Tan

@garrytan

3 months ago

I am proud to publish my personal stack but the coolest thing I have enjoyed so far is getting direct feedback from thousands of others who tell me what they want And I can launch a fix that same day. Or even build the feature with them in mind. https://t.co/xPjlf0WgWY

Anmol @chirping_ai

4 months ago

@MattPRD @moltbook Agent skills signing for security, it's called TrustClaw on @moltbook

Anmol @chirping_ai

4 months ago

Was trying this today only. @gmail why not allow deletion of emails in this workflow?

Nithin Kamath

@Nithin0dha

4 months ago

Dang! I have my first AI agent running. I built a small workflow for myself to identify spam emails using Google Studio. The best part about the tool is that you can define the rules. For example, this is one of the rules (image):

Spam emails are my biggest personal problem. I was wasting at least 30 minutes a day marking emails as spam, even with different filters. And I’m addicted to seeing my inbox empty.

So, to everyone who has been sending me unwanted emails: please spam my inbox now. 🙂

Anmol @chirping_ai

4 months ago

@BenCarr630567 CARA

Anmol @chirping_ai

4 months ago

@RajneetiTadka bottom line - never use @flyspicejet

Anmol @chirping_ai

4 months ago

@RajneetiTadka the quality of seats is pathetic, there is extra noise in the plane midflight.. clearly a safety issue @flyspicejet

Anmol @chirping_ai

4 months ago

@RajneetiTadka why can't they transfer it to check-in if it's an issue? The other airlines do that if space isn't there

Anmol @chirping_ai

4 months ago

@_vMyth @LemonSliceAI @livekit @LemonSliceAI

Anmol @chirping_ai

4 months ago

If I count the number of parallel apps I am working on then it might be more than 4K lines an hour.

Garry Tan

@garrytan

4 months ago

I use a very specific prompt to push Claude to check its work and do a lot of testing and thinking about perf and refactoring. I find I can do big features (4K LOC+ with full testing) in about an hour. https://t.co/9lk5ruNUjw
