sunjeet @sunjeetx - Twitter Profile

sunjeetx retweeted

How To Prompt

@HowToPrompt__

about 2 months ago

The entire RAG industry is about to get cooked.

Researchers have built a new RAG approach that:

- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

It's called PageIndex. Instead of chunking your docs and stuffing them into pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book.

hit 98.7% on financebench. beats every vector RAG on the leaderboard.

no embeddings. no chunking. no vector DB.

100% open source.
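
A minimal sketch of the idea described in this tweet, not PageIndex's actual code or API: the document is held as a tree of titled sections, and a chooser picks one child per level until it reaches a leaf. The `Node` class, `navigate`, and the keyword-overlap `choose` function are all hypothetical; in a real system the chooser would be an LLM call that reads the section titles or summaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document: a title, optional text, and sub-sections."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def choose(question: str, options: list[Node]) -> Node:
    """Hypothetical stand-in for the LLM: pick the child whose title shares
    the most words with the question."""
    q_words = set(question.lower().split())
    return max(options, key=lambda n: len(q_words & set(n.title.lower().split())))


def navigate(root: Node, question: str) -> Node:
    """Walk from the root to a leaf, choosing one child per level."""
    node = root
    while node.children:
        node = choose(question, node.children)
    return node


# Toy document tree (invented data, for illustration only).
report = Node("Annual report", children=[
    Node("Business overview", children=[
        Node("Products", text="We sell widgets."),
    ]),
    Node("Financial statements", children=[
        Node("Revenue by segment", text="Widgets: $10M."),
        Node("Operating expenses", text="R&D: $2M."),
    ]),
])

leaf = navigate(report, "what was revenue by segment in the financial statements")
print(leaf.title, "->", leaf.text)
```

No embeddings or similarity search appear anywhere in the sketch; retrieval quality depends entirely on how well the chooser reasons over the section titles.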

222

7K

777

12K

619K

sunjeetx retweeted

Brij Pandey

@LearnWithBrij

about 2 months ago

Claude Code ships with 5 architectural layers most engineers never open.
Not features. Not settings. Layers — each solving a distinct problem that LLMs alone can't solve. And four of them have nothing to do with prompting.
Here's the full Agent Development Kit:

Layer 1 — CLAUDE.md → The Memory Layer
Architecture rules, naming conventions, test expectations, repo map. Always loaded. Always active.
Two scopes:
• ~/.claude/CLAUDE.md → global
• .claude/CLAUDE.md → project
This isn't context you paste in before every session. It's context that never needs repeating. The agent's constitution.

Layer 2 — Skills → The Knowledge Layer
Each SKILL.md carries a description. Claude matches it at runtime and forks the skill into an isolated subagent. On-demand, never always-on.
Task-specific knowledge without inflating your main context window. Modular by design.
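
A rough sketch of the on-demand idea, assuming a SKILL.md-style file with a simple `key: value` front matter holding `name` and `description`. The parser and the `catalog` dict are hypothetical illustrations, not how Claude Code loads skills internally.

```python
def parse_skill(markdown: str) -> dict:
    """Split a SKILL.md-style file into front-matter fields and a body.

    Assumes a simple `key: value` front matter between two `---` lines.
    """
    _, front, body = markdown.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    meta["body"] = body.strip()
    return meta


# Invented example skill, for illustration only.
SKILL_FILE = """---
name: release-notes
description: Draft release notes from merged pull requests
---
1. List merged PRs since the last tag.
2. Group them by label.
"""

skill = parse_skill(SKILL_FILE)
# Only the short description is advertised up front; the body would be read
# only once the skill is actually selected for a task.
catalog = {skill["name"]: skill["description"]}
print(catalog)
```

Keeping the catalog to one line per skill is what lets many skills coexist without inflating the main context.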

Layer 3 — Hooks → The Guardrail Layer
PreToolUse → PostToolUse → SessionStart → Stop → SubagentStop

This is the layer most teams skip. And the one they regret skipping first.

Hooks are NOT AI. They're deterministic event-driven shell commands.
• Auto-lint on every Write
• Hard-block on rm -rf
• Slack notification on Stop
Event fires → Matcher checks → Command runs
Quality enforced at the infrastructure level. Not the prompt level.
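
The "hard-block on rm -rf" guardrail could look roughly like the sketch below. As I understand Claude Code's hook contract, a PreToolUse command receives the event as JSON on stdin and can block the tool call by exiting with code 2; treat that, and the `tool_name` / `tool_input.command` field names, as assumptions to check against the current documentation. The function name `exit_code_for` is hypothetical.

```python
import re

# Matches `rm` followed by one combined flag group that contains both a
# recursive flag (r or R) and a force flag (f), e.g. -rf, -fr, -Rf.
DANGEROUS = re.compile(r"\brm\s+-(?=\w*[rR])(?=\w*f)\w+")


def exit_code_for(event: dict) -> int:
    """Return 2 (block) for a Bash call that looks like `rm -rf`, else 0."""
    if event.get("tool_name") != "Bash":
        return 0
    command = event.get("tool_input", {}).get("command", "")
    return 2 if DANGEROUS.search(command) else 0


# Invented sample events, for illustration only.
blocked = exit_code_for({"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}})
allowed = exit_code_for({"tool_name": "Bash", "tool_input": {"command": "ls -la"}})
print(blocked, allowed)
```

In an actual hook script, the event would come from `json.load(sys.stdin)` and the result would be passed to `sys.exit`. The check is plain pattern matching, so it behaves the same on every run regardless of what the model decides.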

Layer 4 — Subagents → The Delegation Layer
Each subagent gets its own context window, model, tools, and permissions.
Main agent delegates down. Receives results up. That's it.
No infinite recursion — subagents can't spawn subagents. Main context stays clean. Hard boundaries by design.
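
A toy model of the delegation rule described here, under the assumption that a subagent starts with an empty context and returns only its result. The `Agent` class and its fields are hypothetical and do not mirror any real Claude Code interface.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent with a private context and an optional right to delegate."""
    name: str
    can_delegate: bool = False
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context.append(f"task: {task}")
        return f"{self.name} finished '{task}'"

    def delegate(self, task: str) -> str:
        if not self.can_delegate:
            raise PermissionError(f"{self.name} may not spawn subagents")
        # The subagent starts with an empty context and cannot delegate further.
        sub = Agent(name=f"{self.name}/sub")
        result = sub.run(task)
        # Only the result comes back up, not the subagent's working context.
        self.context.append(f"result: {result}")
        return result


main = Agent("main", can_delegate=True)
summary = main.delegate("scan the test suite")
print(main.context)
```

Because `can_delegate` defaults to `False`, a subagent that tries to call `delegate` raises `PermissionError`, which is the "no infinite recursion" boundary in miniature.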

Layer 5 — Plugins → The Distribution Layer
Bundle your skills + agents + hooks + commands into a plugin. One install. Whole team inherits the behavior.
Think npm packages — but for what your agent knows how to do.

Wrapping everything:
→ MCP Servers on the left (GitHub, databases, APIs, custom integrations)
→ Agent Teams on the right (parallel execution, message passing, shared permissions)

The 5-layer stack in one line:
CLAUDE.md sets rules → Skills provide expertise → Hooks enforce quality → Subagents delegate work → Plugins distribute to the team
Most production failures in agentic systems trace back to one missing layer.

Which one is the gap in your current setup?

26

2K

430

3K

358K

sunjeetx retweeted

Bilgin Ibryam

@bibryam

2 months ago

OWASP Top 10 Critical Security Risks for 2026 https://t.co/qkpDpKimfo

0

70

26

66

5K

sunjeetx retweeted

Bilgin Ibryam

@bibryam

2 months ago

AI coding without structure is just a Ferrari on muddy roads: very fast, very stuck.

SPDD says: version the prompt, review the intent, then let the model write code inside the guardrails.

🤩 Structured-Prompt-Driven Development by @thoughtworks

Treats prompts as first-class engineering artifacts:
→ Prompts live in version control
→ Requirements, design, norms, and safeguards are captured up front
→ Code generation follows a structured blueprint
→ Logic changes update the prompt first
→ Refactors sync back into the prompt

https://t.co/E70oBhYiZ6

0

55

13

61

4K


sunjeetx retweeted

Allen Braden

@allen_explains

3 months ago

This 2-hour Stanford lecture breaks down how models like ChatGPT and Claude are actually built, clearer than what many people in top AI roles ever get exposed to. Save this and set aside two hours today. It might end up being the most valuable thing you learn all week.

157

27K

4K

81K

3M

sunjeetx retweeted

OpenAI

@OpenAI

3 months ago

Automations can now run in the same thread, so Codex can pick up where it left off, with the original context intact. It can schedule future work and wake up automatically to continue long-term tasks, from landing open PRs to following up on tasks or staying on top of fast-moving conversations.

7

412

14

55

65K

sunjeetx retweeted

Unsloth AI

@UnslothAI

3 months ago

You can now fine-tune Gemma 4 with our free notebooks! 🔥

You just need 8GB VRAM to train Gemma 4 locally!

Unsloth trains Gemma4 1.5x faster with 50% less VRAM.

GitHub: https://t.co/aZWYAtakBP
Guide: https://t.co/NBwKoFH2lp

Gemma-4-E4B Colab: https://t.co/JjpCQgWEpL https://t.co/OZvby0tAAq

40

2K

354

2K

241K

sunjeetx retweeted

Maziyar PANAHI

@MaziyarPanahi

3 months ago

🚨 Over 1 billion rows of psychiatric genetics data. Now on Hugging Face.

ADHD. Depression. Schizophrenia. Bipolar. PTSD. OCD. Autism. Anxiety. Tourette. Eating disorders.

12 disorder groups. 52 publications. Every GWAS summary statistic from the Psychiatric Genomics Consortium.

Before: wget, gunzip, 20 minutes debugging separators, repeat 50 times.
Now: one line of Python.

122

4K

586

6K

1M

sunjeetx retweeted

How To Prompt

@HowToPrompt__

3 months ago

🚨 Microsoft has solved the biggest problem with AI. They open-sourced bitnet.cpp. It’s a 1-bit inference framework that runs massive 100B parameter models directly on your CPU without GPUs. It uses 82% less energy. 100% open source.
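
To make the low-bit idea concrete, here is a small sketch of ternary weight quantization with a mean-absolute-value scale, which is my understanding of the scheme described for BitNet b1.58 models. It is not code from bitnet.cpp; the function names and the sample weights are invented for illustration.

```python
def quantize_ternary(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Map each weight to -1, 0, or +1 using the mean absolute value as scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dot_ternary(quantized: list[int], scale: float, x: list[float]) -> float:
    """Dot product using only additions and subtractions, then one multiply."""
    total = 0.0
    for q, xi in zip(quantized, x):
        if q == 1:
            total += xi
        elif q == -1:
            total -= xi
    return total * scale


w = [0.9, -0.05, -1.2, 0.4]
q, s = quantize_ternary(w)
y = dot_ternary(q, s, [1.0, 2.0, 3.0, 4.0])
print(q, s, y)
```

Replacing per-weight multiplications with additions and subtractions is the kind of saving that makes CPU-only inference plausible, though the energy and model-size figures in the tweet are the tweet's own claims.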

147

3K

386

4K

488K

sunjeetx retweeted

Google Gemma

@googlegemma

3 months ago

Gemma 4 can run on phones without an internet connection! 🤯 It can perform local agentic tasks, such as logging and analyzing trends. When connected, it can also make API calls. Want to try it yourself? Get the Google AI Edge App on iOS or Android. (🔊 Sound on for the demo!)

320

9K

1K

4K

758K

sunjeetx retweeted

Google Gemma

@googlegemma

3 months ago

Who wants to know how Gemma 4 works?

This visual guide breaks down the new architectures and how they process text, images, and (for the smaller models) audio.

👇 https://t.co/5f7xBps0oX

82

4K

490

2K

246K

sunjeetx retweeted

Google DeepMind @GoogleDeepMind

3 months ago

Available in four sizes:

🔵 31B Dense & 26B MoE: state-of-the-art performance for advanced local reasoning tasks – like custom coding assistants or analyzing scientific datasets.

🔵 E4B & E2B (Edge): built for mobile with real-time text, vision, and audio processing.

41

1K

113

400

439K

sunjeetx retweeted

Google DeepMind @GoogleDeepMind

3 months ago

Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵

368

9K

1K

6K

4M

sunjeetx retweeted

Anthropic

@AnthropicAI

3 months ago

This research is a product of our Anthropic Fellows program, led by @tomjiralerspong and supervised by @TrentonBricken. See the full paper here: https://t.co/gz1i1Oy8ZI

19

228

30

56

98K

sunjeetx retweeted

Chrome for Developers

@ChromiumDev

over 1 year ago

Follow along with @jecfish as she shows you how WebDriver BiDi supercharges cross-browser testing 🧪 Learn how this browser automation can help with mock network requests and monitoring console events ▶️ https://t.co/TiNXQyZpJV

1

31

7

15

10K

sunjeetx retweeted

Aurimas Griciūnas

@Aurimas_Gr

over 1 year ago

A simple way to explain 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝗠𝗲𝗺𝗼𝗿𝘆.

In general, the memory for an agent is something that we provide via context in the prompt passed to LLM that helps the agent to better plan and react given past interactions or data not immediately available.

It is useful to group the memory into four types:

𝟭. Episodic - This type of memory contains past interactions and actions performed by the agent. After an action is taken, the application controlling the agent would store the action in some kind of persistent storage so that it can be retrieved later if needed. A good example would be using a vector Database to store semantic meaning of the interactions.

𝟮. Semantic - Any external information that is available to the agent and any knowledge the agent should have about itself. You can think of this as a context similar to one used in RAG applications. It can be internal knowledge only available to the agent or a grounding context to isolate part of the internet scale data for more accurate answers.

𝟯. Procedural - This is systemic information like the structure of the System Prompt, available tools, guardrails etc. It will usually be stored in Git, Prompt and Tool Registries.

𝟰. Occasionally, the agent application would pull information from long-term memory and store it locally if it is needed for the task at hand.

𝟱. All of the information pulled together from the long-term or stored in local memory is called short-term or working memory. Compiling all of it into a prompt will produce the prompt to be passed to the LLM and it will provide further actions to be taken by the system.

We usually label 1. - 3. as Long-Term memory and 5. as Short-Term memory.

A visual explanation of potential implementation details 👇

And that is it! The rest is all about how you architect the flow of your Agentic systems.

What do you think about memory in AI Agents?

#LLM #AI #MachineLearning

Want to learn how to build an Agent from scratch without any LLM orchestration framework?
Follow my journey here: https://t.co/dQS4CtNPC0
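
One way the memory types above could fit together in code, as a minimal sketch: three long-term stores, plus a method that compiles them with the new message into the working-memory prompt. The `AgentMemory` class, its method names, and the section headings are hypothetical; a real system would likely back the episodic store with a database rather than a list.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    episodic: list[str] = field(default_factory=list)    # past interactions and actions
    semantic: list[str] = field(default_factory=list)    # external or grounding knowledge
    procedural: list[str] = field(default_factory=list)  # system prompt, tools, guardrails

    def record(self, event: str) -> None:
        """Persist an interaction so later turns can retrieve it."""
        self.episodic.append(event)

    def working_memory(self, user_message: str, last_n: int = 2) -> str:
        """Compile long-term memory plus the new message into one prompt."""
        sections = [
            "## Instructions", *self.procedural,
            "## Knowledge", *self.semantic,
            "## Recent history", *self.episodic[-last_n:],
            "## User", user_message,
        ]
        return "\n".join(sections)


# Invented sample data, for illustration only.
memory = AgentMemory(
    procedural=["You are a support agent."],
    semantic=["Refunds take 5 days."],
)
memory.record("user asked about order 17")
memory.record("agent looked up order 17")
memory.record("agent reported status: shipped")
prompt = memory.working_memory("When will my refund arrive?")
print(prompt)
```

Only the last `last_n` episodic entries are pulled in, which mirrors step 4: the application decides what to fetch from long-term memory for the task at hand.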

67

2K

395

3K

143K

sunjeetx retweeted

Bilgin Ibryam

@bibryam

over 1 year ago

Hot take: Cognitive load is a better metric than clean code! https://t.co/PzLEXXmvat by @zakirullin

5

200

53

130

20K

sunjeetx retweeted

John Rush

@johnrushx

over 1 year ago

🚨Ilya Sutskever finally confirmed

> scaling LLMs at the pre-training stage plateaued
> the compute is scaling but data isn’t and new or synthetic data isn’t moving the needle

What’s next
> same as human brain, stopped growing in size but humanity kept advancing, the agents and tools on top of LLMs will fuel the progress
> sequence to sequence learning
> agentic behavior
> teach self awareness

Think of it as the “iPhone”, which kept getting bigger and more useful from a hardware point of view, but then plateaued while the focus shifted to applications.

2025 will be the year of Agents!

> @Replit for coding
> @seobotai for content
> @crewAIInc for the rest

206

8K

895

6K

1M

sunjeetx retweeted

Bilgin Ibryam

@bibryam

over 1 year ago

Chunking Strategies for RAG https://t.co/wAj7lKdn61

1

13

5

9

1K

sunjeetx retweeted

Addy Osmani

@addyosmani

over 1 year ago

I’ve been recommending Bolt or v0 for full-stack code bootstrapping, and Cline or Cursor for iteration. Quick tips:

✍️ Make sure requirements are clear
🔎 Test & verify all changes are expected
💵 Plan around token limits & costs https://t.co/vuyKgrUFDa

1

117

15

79

19K
