Yashraj

Verified account

@yrjdev

AI Engineer & Full-Stack Dev
Building:
SupportGPT → AI support widget for SaaS
Cliy → component registry for AI agents
Open for retainer work

Ocean Wires

Joined August 2025

120 Following

155 Followers

2K Posts

Pinned Tweet

10 days ago

Most RAG systems work like this:
→ User asks question
→ Retrieve similar chunks
→ Send to LLM
→ Generate answer

The problem: question-to-chunk similarity is often weak.

Example: “How do I reset my password?”
That may not semantically match a documentation paragraph talking about authentication flows, account settings, or credential recovery.

So I’m experimenting with a different retrieval approach:
At ingest time: generate possible questions each page can answer.
At query time: match question-to-question instead of question-to-chunk.

Early results:
→ significantly better retrieval relevance
→ fewer unnecessary tokens sent to the LLM
→ much cleaner context windows

Calling this approach: QIndex-RAG.

Still testing and refining it, but the retrieval quality improvement is already noticeable.
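A minimal sketch of the question-to-question idea described above, not the author's implementation. Everything named here is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the per-page questions are hard-coded where an LLM would generate them at ingest time, and the page IDs are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_question_index(page_questions: dict) -> list:
    """Ingest time: index every generated question, pointing back to its page."""
    return [
        (embed(question), page_id)
        for page_id, questions in page_questions.items()
        for question in questions
    ]


def retrieve(query: str, index: list, top_k: int = 1) -> list:
    """Query time: rank pages by question-to-question similarity."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    pages = []
    for _, page_id in ranked:
        if page_id not in pages:  # keep each page once, at its best rank
            pages.append(page_id)
        if len(pages) == top_k:
            break
    return pages


# Hypothetical questions an LLM might generate for each page at ingest time.
page_questions = {
    "account-recovery": [
        "How do I reset my password?",
        "What if I forgot my login credentials?",
    ],
    "billing": [
        "How do I update my credit card?",
        "Where can I download an invoice?",
    ],
}

index = build_question_index(page_questions)
print(retrieve("how can I reset my password", index))  # ['account-recovery']
```

The user query is compared only against stored questions, so the page text itself never has to resemble the query; the matched page would then be sent to the LLM as context.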

0

3

0

0

241

about 16 hours ago

@codeswithroh that's why I always keep my local and production keys separate, and most of the time I use only the locally hosted keys.

1

1

0

0

2

about 16 hours ago

Secret Keys need to be Secret

about 17 hours ago

never store private keys in plain .env files

@PatrickAlphaC keeps saying this for a reason, but people still do it because the safer workflow is usually clunky

so i built vaultenv: a small npm package to store secrets encrypted locally, then load them into your shell only when needed

npm install -g codeswithroh/vaultenv

8

16

3

3

423

2

3

0

0

131

about 19 hours ago

Check your knowledge

Image: https://t.co/CfCypMjYsq

0

1

0

0

4

about 21 hours ago

0

1

0

0

265

about 24 hours ago

@codeswithroh back to work

1

1

0

0

32

1 day ago

@dayonefoundry Yes, focus is the key

0

0

0

0

22

1 day ago

When you work with LLM APIs, set up the request data before you set up the response structure, so the LLM gets only the information it needs. Otherwise: garbage in, garbage out.
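One way to read the advice above is to trim each record to an explicit allow-list of fields before it goes into the prompt. The sketch below assumes a plain dictionary as the source record; the function name `build_request_context` and the field names are hypothetical, and no real LLM client is called.

```python
import json


def build_request_context(record: dict, needed_fields: list) -> str:
    """Keep only the fields the model needs and serialize them for the prompt."""
    trimmed = {key: record[key] for key in needed_fields if key in record}
    return json.dumps(trimmed, ensure_ascii=False)


# Hypothetical record with fields the model does not need.
video = {
    "id": 42,
    "title": "Intro to RAG",
    "transcript": "Today we cover retrieval.",
    "internal_notes": "do not publish",
    "upload_ip": "203.0.113.7",
}

print(build_request_context(video, ["title", "transcript"]))
# {"title": "Intro to RAG", "transcript": "Today we cover retrieval."}
```

An allow-list is used rather than a block-list so that newly added fields stay out of the prompt by default.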

5 days ago

Generating a simple tweet was costing us >>>>>>> 18,000+ tokens.

The prompt: ~400 tokens.
The tweet: 70 tokens.

So where did the other 17,500 go?

// GPT-5 Nano is a reasoning model. max_output_tokens doesn't just cap the output.
It caps reasoning + output combined.
The model was spending 17,000 tokens thinking before writing 70.

The fix wasn't a bigger cap.
It was a dynamic one:
>> inputTokens = ceil(promptChars / 4.5)
>> reasoning = max(4000, inputTokens × 5)
>> cap = outputBudget + reasoning

Why ×5?
Measured in production:
1,737 input → 7,273 reasoning tokens
That's ×4.2. We use 5 as a safety margin.

Short videos = small cap.
Long videos = large cap.

No waste. No truncation.

Reasoning models need reasoning budgets.
Not output limits.
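The three-line formula in the post can be written out as a small helper. This is a sketch under the post's own numbers (4.5 characters per token, a 4,000-token reasoning floor, a ×5 multiplier); the function and constant names are hypothetical, and the API call that would receive the cap is left out.

```python
import math

CHARS_PER_TOKEN = 4.5         # rough characters-per-token estimate from the post
MIN_REASONING_TOKENS = 4000   # reasoning floor from the post
REASONING_MULTIPLIER = 5      # measured ratio was about 4.2; 5 adds headroom


def estimate_input_tokens(prompt: str) -> int:
    """Estimate the prompt's token count from its character length."""
    return math.ceil(len(prompt) / CHARS_PER_TOKEN)


def dynamic_output_cap(prompt: str, output_budget: int) -> int:
    """Return a cap covering the visible output plus a reasoning budget."""
    input_tokens = estimate_input_tokens(prompt)
    reasoning = max(MIN_REASONING_TOKENS, input_tokens * REASONING_MULTIPLIER)
    return output_budget + reasoning


# A 1,800-character prompt is about 400 tokens, so the 4,000-token floor applies.
print(dynamic_output_cap("x" * 1800, output_budget=70))   # 4070

# A 45,000-character prompt is about 10,000 tokens, so the multiplier dominates.
print(dynamic_output_cap("x" * 45000, output_budget=70))  # 50070
```

The returned value would be passed as `max_output_tokens`, which per the post covers reasoning and output together.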

1

3

0

0

80

0

3

0

0

16

2 days ago

@gamerhelo7 @O_Anu_O yes I know, good connections matter, but I also don't have connections to get a job.

1

0

0

0

25

2 days ago

Python Workout

2 days ago

Print the value:

If you understand slicing in Python, then you can answer this.

Image: https://t.co/Ow6op19ayo

0

1

0

0

72

0

0

0

0

38

2 days ago

@dayonefoundry waiting and too excited to join

0

0

0

0

22

2 days ago

@tdinh_me when will the next one happen?

0

0

0

0

70

3 days ago

@codeswithroh I'm iron man

1

1

0

0

8

3 days ago

@BatsouElef right, and I'm working on the same kind of projects, ones that become a layer for coding agents

0

1

0

0

2

3 days ago

@codeswithroh same, i WANT THIS ONE

Image: https://t.co/8R6J6WH8KB

1

1

0

0

28

3 days ago

@kirat_tw Yes, I've faced the same with these APIs. Normal usage is fine, but when you try to optimize for the best outcome, token limits, dynamic capping, and prompt templates are all a real battle. Not everything works as described in the docs.

0

1

0

0

923

4 days ago

@codeswithroh Amazing bro, video looks really good. keep enjoying

1

1

0

0

18
