Raúl Ferrer

@RaulFerrerAI

Tech Lead | PhD Chemistry | Author ‘iOS Architecture Patterns’ | Building RAG w/ Java, Spring, Weaviate | Towards Reliable Enterprise AI #ReliableEnterpriseAI

Joined January 2011

659 Following

1K Followers

5.6K Posts

Pinned Tweet

Raúl Ferrer @RaulFerrerAI

3 months ago

After 15+ years in software engineering (Mobile Team Lead), I'm diving into Enterprise AI.

My focus:

• RAG architectures
• AI evaluation
• AI observability
• Reliable Enterprise AI systems

Sharing what I learn as I go.

If you're exploring the same space, follow along. https://t.co/QGcojdZwQe

Raúl Ferrer @RaulFerrerAI

about 2 months ago

@_avichawla Good framing. The real issue isn’t RAG vs CAG, it’s cache invalidation. KV cache cuts cost/latency but risks staleness and weak auditability. Use CAG for stable data; RAG for anything needing freshness, traceability, or compliance.
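
Below is a minimal Python sketch of that rule of thumb. It assumes each knowledge source can be described by three hypothetical flags. `SourceProfile`, `choose_strategy`, and `PreloadedContextCache` are illustrative names, not a library API.

The cache stores a plain string as a stand-in for a preloaded context; a real CAG setup would hold model KV state instead. The point it shows is that a cached context is only valid for the source version it was built from.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical description of a knowledge source (illustrative only)."""
    name: str
    changes_often: bool    # does the content go stale quickly?
    needs_citations: bool  # must answers point back to specific documents?
    regulated: bool        # is the domain subject to compliance audits?


def choose_strategy(source: SourceProfile) -> str:
    """Prefer RAG whenever freshness, traceability, or compliance matters; else CAG."""
    if source.changes_often or source.needs_citations or source.regulated:
        return "RAG"
    return "CAG"


class PreloadedContextCache:
    """Toy CAG cache: an entry is only valid for the source version it was built from."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}  # name -> (version, context)

    def put(self, name: str, version: str, context: str) -> None:
        self._entries[name] = (version, context)

    def get(self, name: str, current_version: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None or entry[0] != current_version:
            return None  # missing or stale: the caller must rebuild the context
        return entry[1]


handbook = SourceProfile("employee-handbook", changes_often=False, needs_citations=False, regulated=False)
pricing = SourceProfile("pricing-policy", changes_often=True, needs_citations=True, regulated=True)
print(choose_strategy(handbook), choose_strategy(pricing))  # CAG RAG

cache = PreloadedContextCache()
cache.put("employee-handbook", "v1", "preloaded handbook context")
print(cache.get("employee-handbook", "v2"))  # None: the source changed, so the entry is stale
```

In practice the "version" could be a content hash or a last-modified timestamp; that choice is an assumption here.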

Raúl Ferrer @RaulFerrerAI

about 2 months ago

@HowToAI_ Stanford tested 3 legal RAG systems. Westlaw hallucinated 33%, Lexis+ 17%. GPT-4: 58-82%. RAG works—but closed-box systems lack hybrid search, reranking, & observability. Proper architecture fixes this. That’s what Reliable Enterprise AI means. #ReliableEnterpriseAI

Raúl Ferrer @RaulFerrerAI

about 2 months ago

This is a textbook case of why Reliable Enterprise AI requires rigorous auditing. Model degradation under GPU load is an operational risk. A robust Spring Boot/Weaviate stack with active guardrails is the only way to ensure compliance. #ReliableEnterpriseAI

Hesamation

about 2 months ago

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. Less thinking and failed attempts mean more retries, burning more tokens and spending more on tokens
> reads-per-edit dropped from 6.6x to 2.0x. The model stops researching code before touching it.
> the model tried to bail out or ask "should I continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours; late night is significantly better. This means the thinking allocation is most likely GPU-load-sensitive.

Raúl Ferrer @RaulFerrerAI

2 months ago

Hybrid Search is NOT one thing. “BM25 + vectors = done” is a dangerous oversimplification. It’s a design space with trade-offs in fusion, ranking, and tuning that directly impact RAG quality. If you’re building enterprise AI, read this: #AI #LLMs #RAG https://t.co/S3bqDPCRh7

Raúl Ferrer @RaulFerrerAI

2 months ago

Stop building brittle AI. Just dropped a deep dive on building production-ready RAG with #LangChain4j. Focus on determinism, reliability, and enterprise scale. https://t.co/kv0QFDQTlY #EnterpriseAI #Java #RAG #LLM

Raúl Ferrer @RaulFerrerAI

2 months ago

RAG isn’t a single pattern—it’s a set of architectural decisions. Most failures happen in retrieval, not the model. A practical map from naive to agentic RAG for reliable enterprise AI systems. https://t.co/L5WyZPrLKF

Raúl Ferrer @RaulFerrerAI

2 months ago

KV cache efficiency is becoming the real bottleneck in LLM serving. 6x compression + 8x speedup sounds strong—but “zero accuracy loss” under which workloads? In RAG, small attention drift can break grounding and consistency.

Google Research

@GoogleResearch

2 months ago

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc
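
The memory pressure behind this is plain arithmetic: a decoder stores one key vector and one value vector per layer, per KV head, per token.

This minimal Python sketch assumes hypothetical model dimensions: 32 layers, 8 KV heads, head size 128, FP16 values, and one 32K-token sequence. It does not model TurboQuant itself; it only applies the announced 6x factor to the baseline.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int, seq_len: int,
                   batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Keys + values: 2 tensors per layer, each of shape [batch, kv_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


GIB = 2 ** 30

# Hypothetical model dimensions, FP16 (2 bytes per value), one 32K-token sequence.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)
compressed = baseline / 6  # applying the announced "at least 6x" reduction

print(f"baseline: {baseline / GIB:.2f} GiB")         # 4.00 GiB
print(f"6x compressed: {compressed / GIB:.2f} GiB")  # 0.67 GiB
```

The total grows linearly with sequence length and batch size. Long retrieved contexts in RAG are therefore where this cost bites hardest, and also where any accuracy loss from compression would need to be measured.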

Raúl Ferrer @RaulFerrerAI

2 months ago

Quick compliance check for your RAG:

• Can you explain WHY it retrieved that chunk?
• Can you prove the source was authorized?
• Can you reproduce the same output tomorrow?

If any answer is “not really”, the EU AI Act has a word for that: high-risk. #ReliableEnterpriseAI
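
One way to make those three questions answerable is to log an audit record per answer. This minimal Python sketch covers each question:

- **Why this chunk:** the retrieval score is stored with each chunk.
- **Was the source authorized:** chunks the user's roles may not see are dropped and reported.
- **Can it be reproduced:** every input that shaped the answer is hashed into a fingerprint.

All names (`RetrievedChunk`, `AuditRecord`, `build_audit_record`), the role-based ACL, and the model version string are hypothetical. A matching fingerprint proves the *inputs* were identical, not that the LLM output will be; temperature 0 reduces variation but does not guarantee determinism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class RetrievedChunk:
    chunk_id: str
    source_id: str
    score: float                   # retrieval score: part of the answer to "why this chunk?"
    allowed_roles: FrozenSet[str]  # hypothetical ACL attached at ingestion time


@dataclass
class AuditRecord:
    query: str
    user_roles: List[str]
    model_version: str
    temperature: float
    chunks: List[Tuple[str, str, float]] = field(default_factory=list)  # (chunk_id, source_id, score)

    def fingerprint(self) -> str:
        """Stable SHA-256 over every logged input, so a later run can be checked against it."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_audit_record(query: str, user_roles: Sequence[str], chunks: Sequence[RetrievedChunk],
                       model_version: str, temperature: float = 0.0) -> Tuple[AuditRecord, List[str]]:
    """Drop chunks the user may not see, log the rest, and return the denied chunk IDs too."""
    roles = set(user_roles)
    allowed = [c for c in chunks if c.allowed_roles & roles]
    denied = [c.chunk_id for c in chunks if not (c.allowed_roles & roles)]
    record = AuditRecord(
        query=query,
        user_roles=sorted(roles),
        model_version=model_version,
        temperature=temperature,
        chunks=[(c.chunk_id, c.source_id, round(c.score, 6)) for c in allowed],
    )
    return record, denied


hits = [
    RetrievedChunk("c1", "policy-2024.pdf", 0.87, frozenset({"legal"})),
    RetrievedChunk("c2", "board-minutes.docx", 0.82, frozenset({"executive"})),
]
record, denied = build_audit_record("What is the refund window?", ["legal"], hits, "model-2024-06")
print([c[0] for c in record.chunks], denied)  # ['c1'] ['c2']
print(record.fingerprint()[:12])
```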

Raúl Ferrer @RaulFerrerAI

2 months ago

Hot take: a RAG system doesn’t have an AI problem. It has a data pipeline problem. Wrong chunk size. Mismatched embedding models. No metadata filtering. The LLM is almost never the bottleneck. #ReliableEnterpriseAI
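
Two of those pipeline bugs can be caught with a few lines of code, and this toy in-memory search guards against both:

- **Mixed embedding models:** it fails loudly if the index contains vectors from a different embedding model than the query.
- **No metadata filtering:** it applies metadata filters before any similarity scoring.

The model name `embed-v1`, the metadata keys, and the 2-D vectors are hypothetical. In a real stack the vector database would do this filtering.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class StoredChunk:
    text: str
    vector: List[float]
    embedding_model: str  # record which model produced the vector at ingestion time
    metadata: Dict[str, object] = field(default_factory=dict)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vector dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: Sequence[StoredChunk], query_vector: Sequence[float], query_model: str,
           filters: Dict[str, object], top_k: int = 3) -> List[StoredChunk]:
    # Fail loudly instead of silently comparing vectors from different embedding spaces.
    foreign = sorted({c.embedding_model for c in chunks} - {query_model})
    if foreign:
        raise ValueError(f"index mixes embedding models: {foreign}")
    # Metadata filtering first: only chunks matching every filter are scored.
    candidates = [c for c in chunks if all(c.metadata.get(k) == v for k, v in filters.items())]
    candidates.sort(key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return candidates[:top_k]


index = [
    StoredChunk("Refunds within 30 days", [0.9, 0.1], "embed-v1", {"dept": "legal"}),
    StoredChunk("Office party schedule", [0.8, 0.2], "embed-v1", {"dept": "hr"}),
    StoredChunk("Contract termination clause", [0.6, 0.4], "embed-v1", {"dept": "legal"}),
]
results = search(index, [1.0, 0.0], "embed-v1", {"dept": "legal"})
print([c.text for c in results])  # ['Refunds within 30 days', 'Contract termination clause']
```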

RaulFerrerAI retweeted

@akshay_pachaar

3 months ago

https://t.co/SSSIK3BX4z

RaulFerrerAI retweeted

Weaviate Podcast

@weaviatepodcast

2 months ago

Keyword Search gives you granularity. 🔎 Semantic Search gives you meaning. 🌌 Late Interaction gives you both. 🧬 In this clip, @AmelieTabatta explains what made Multi-Vector Search click for her 👇

Raúl Ferrer @RaulFerrerAI

2 months ago

@johncrickett Well, I've only read the first two chapters, but it's already earned a place on my bookshelf. It's dense in a good way, not at all superficial. Chapter 2 alone is worth it. The way they explain embedding geometry finally made me understand the retrieval mechanisms.

Raúl Ferrer @RaulFerrerAI

2 months ago

Most enterprise RAG systems fail silently. Not because the LLM is wrong. Because the chunks fed to it are too large, too small, or misaligned with the query. Chunk strategy is the first thing to audit. Get chunking wrong → retrieval wrong → answer wrong #ReliableEnterpriseAI
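
As a starting point for that audit, here is a minimal Python sketch of fixed-size chunking with overlap, plus a size check that flags chunks outside an expected range.

It approximates tokens with whitespace-separated words, which is an assumption. Real pipelines usually count model tokens and often split on sentence or section boundaries. The sizes used below are toy values, not recommendations.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def audit_chunk_sizes(chunks: List[str], min_words: int, max_words: int) -> List[int]:
    """Return indices of chunks whose word count falls outside [min_words, max_words]."""
    return [i for i, c in enumerate(chunks) if not min_words <= len(c.split()) <= max_words]


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
print(audit_chunk_sizes(chunk_words(text, chunk_size=4, overlap=0), min_words=3, max_words=4))  # [2]
```

The second call shows a common silent failure. Without overlap, the last chunk here is a two-word fragment that the size audit flags.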

Raúl Ferrer @RaulFerrerAI

2 months ago

@svpino Finally. "Who approved this button?" will now be answered with "the LangChain agent felt it was right."

Raúl Ferrer @RaulFerrerAI

2 months ago

@aparnadhinak Every enterprise has dozens of high-context streams agents can't see: war-room calls, informal Slack decisions, the hallway conversation that killed the feature. Voice capture solves input. The unsolved part: structuring ephemeral context so retrieval stays reliable.

Raúl Ferrer @RaulFerrerAI

2 months ago

@jerryjliu0 @vercel LlamaParse solves the hard parsing problem. The next frontier: retrieval strategies weren't designed for semi-structured content once it has been flattened to plaintext. Are you seeing accuracy differences between VLM-parsed tables and native extraction in downstream retrieval?

Raúl Ferrer @RaulFerrerAI

2 months ago

Week recap: migrating from Mobile to Enterprise AI isn't a context switch — it's a mindset switch. Mobile: optimize for the device. Enterprise AI: optimize for trust. The question isn't "does it work?" It's "can I prove it works, every time?" #ReliableEnterpriseAI #RAG

Raúl Ferrer @RaulFerrerAI

3 months ago

@johniosifov This highlights a critical shift: AI agents aren’t just software components, they’re operational actors with real permissions. Building reliable systems now means treating governance, access control, and auditing as first-class concerns.
