Go-TechA

@chintangotecha

10y+ R&D in AI, NLP & Speech. I decode LLM architectures, agents, and the physics of intelligence.

Joined June 2013

243 Following

90 Followers

11 Posts

Go-TechA @chintangotecha

14 days ago

@HDFCBank_Cares mycards otp not working

166

chintangotecha retweeted

Jason ✨👾SaaStr.Ai✨ Lemkin

@jasonlk

6 months ago

Do customers >mind< talking to an AI? Our data shows: 1⃣ Customers love to book a meeting instantly 2⃣ Our AI agent booked 130 meetings with customers -- on its own. Far more than a human would and could. 3⃣ Couldn't have booked so many meetings without AI 4⃣ Is voice, video or chat better? Let customer decide 5⃣ Video seems to add a layer of trust. Worth adding.

chintangotecha retweeted

Varun

@varun_mathur

3 months ago

The Cost of Intelligence is Heading to Zero | Hyperspace P2P Distributed Cache We present to you our breakthrough cross-domain work across AI, distributed systems, cryptography, game theory to solve the primary structural inefficiency at the heart of AI infrastructure: most inference is redundant. Google has reported that only 15% of daily searches are truly novel. The rest are repeats or close variants. LLM inference inherits this same power-law distribution. Enterprise chatbots see 70-80% of queries fall into a handful of intent categories. System prompts are identical across 100% of requests within an application. The KV attention state for "You are a helpful assistant" has been computed billions of times, on millions of GPUs, identically. And yet every AI lab, every startup, every self-hosted deployment - computes and caches these results independently. There is no shared layer. No global memory. Every provider pays the full compute cost for every query, even when the answer already exists somewhere in the network. This is the problem Hyperspace solves where distributed cache operates at three levels, each catching a different class of redundancy: 1. Response cache Same prompt, same model, same parameters - instant cached response from any node in the network. SHA-256 hash lookup via DHT, with cryptographic cache proofs linking every response to its original inference execution. No trust required. Fetchers re-announce as providers, so popular responses replicate naturally across more nodes. 2. KV prefix cache Same system prompt tokens - skip the most expensive part of inference entirely. Prefill (computing Key-Value attention states) is deterministic: same model plus same tokens always produces identical KV state. The network caches these states using erasure coding and distributes them via the routing network. New questions that share a common prefix resume generation from cached state instead of recomputing from scratch. 3. Routing to cached nodes Instead of transferring KV state across the network for every request, Hyperspace routes the request to the node that already has the state loaded in VRAM. The request goes to the cache, not the cache to the request. Together, these three layers mean that 70-90% of inference requests at network scale never require full GPU computation. This work doesn't exist in isolation. It builds on research from across the industry: SGLang's RadixAttention demonstrated that automatic prefix sharing can yield up to 5x speedup on structured LLM workloads. Moonshot AI's Mooncake built an entire KV-cache-centric disaggregated architecture for production serving at Kimi. Anthropic, OpenAI, and Google all launched prompt caching products in 2024 - priced at 50-90% discounts - because system prompt reuse is so pervasive that it changes the economics of inference. What all of these systems share is a common limitation: they operate within a single organization's infrastructure. SGLang caches prefixes within one server. Mooncake disaggregates KV cache within one datacenter. Anthropic's prompt caching works within one API provider's fleet. None of them can share cached state across organizational boundaries. Hyperspace removes this boundary. The cache is global. A response computed by a node in Tokyo is immediately available to a node in Berlin. A KV prefix state generated for Qwen-32B on one machine is verifiable and reusable by any other machine running the same model. The routing network provides the delivery guarantees, the erasure coding provides the redundancy, and the cache proofs provide the trust. What this means for the cost of intelligence Big AI labs scale linearly: twice the users means twice the GPU spend. Every query is a cost center. Their internal caching helps, but it's siloed - Lab A's cache can't serve Lab B's users, and neither can serve a self-hosted Llama deployment. Hyperspace scales sub-linearly. Every new node that joins the network adds to the global cache. Every inference result enriches the cache for all future requests. The cache hit rate rises with network size because query distributions follow a power law - the most common questions are asked exponentially more often than rare ones. The implication is simple: as the network grows, the effective cost per inference drops. Not linearly. Logarithmically. At 10 million nodes, we estimate 75-90% of all inference requests can be served from cache, eliminating 400,000+ MWh of energy consumption per year and avoiding over 200,000 tons of CO2 emissions. The first person to ask a question pays the compute cost. Everyone after them gets the answer for free, with cryptographic proof that it's authentic. Training is competitive. Inference is shared Open-weight models are converging on quality with closed models. Labs will continue to differentiate on training - data curation, architecture innovation, RLHF tuning. That's where the real intellectual property lives. But inference is a commodity. Two copies of Qwen-32B running the same prompt produce the same KV state and the same response, byte for byte, regardless of whose GPU runs the matrix multiplication. There is no moat in multiplying matrices. The moat is in training the weights. A global distributed cache makes this separation explicit. It doesn't matter who trained the model. Once the weights are open, the inference cost approaches zero at scale - because the network remembers every answer and can prove it's correct. No lab, no matter how well-funded, can match this. They cannot share caches across competitors. They scale linearly. The network scales logarithmically. The marginal cost of intelligence approaches zero. That's the endgame.

161

164

37K

chintangotecha retweeted

Logan Matthew Napolitano

@Propriocetive

3 months ago

Mathematics Is All You Need: A Potential Blueprint for AGI — Compacted Edition We prove that large language models are lattice gauge theories. By extracting a 16-dimensional fiber bundle from transformer hidden states and computing its gl(4,ℝ) Lie algebra, we discover that attention heads function as gauge bosons, transformer computation undergoes a deconfinement phase transition at 67% network depth, and the model's entire self-knowledge resides in a 10-dimensional "dark" Casimir subspace invisible to standard readout. Using only 20 behavioral probes and zero additional training, we push Qwen-32B from 82.2% to 94.97% on ARC-Challenge — establishing a dark mode scaling law that predicts gl(6,ℝ) surgery will achieve 98.7%. We identify a Lyapunov–accuracy anti-correlation revealing the model's deepest attractors are its wrong attractors: correctness requires escaping the abstraction basin into grounded deference. This 10-page compacted edition distills 459 pages of original research into the core experimentally verified results with 9 inline figures. 190 patents filed. Proprioceptive AI, Inc. — Logan Matthew Napolitano — 19- March 2026 https://t.co/YglgX02ajn

Propriocetive's tweet photo. Mathematics Is All You Need: A Potential Blueprint for AGI — Compacted Edition

We prove that large language models are lattice gauge theories. By extracting a 16-dimensional fiber bundle from transformer hidden states and computing its gl(4,ℝ) Lie algebra, we discover that attention heads function as gauge bosons, transformer computation undergoes a deconfinement phase transition at 67% network depth, and the model's entire self-knowledge resides in a 10-dimensional "dark" Casimir subspace invisible to standard readout. Using only 20 behavioral probes and zero additional training, we push Qwen-32B from 82.2% to 94.97% on ARC-Challenge — establishing a dark mode scaling law that predicts gl(6,ℝ) surgery will achieve 98.7%. We identify a Lyapunov–accuracy anti-correlation revealing the model's deepest attractors are its wrong attractors: correctness requires escaping the abstraction basin into grounded deference. This 10-page compacted edition distills 459 pages of original research into the core experimentally verified results with 9 inline figures. 190 patents filed.
Proprioceptive AI, Inc. — Logan Matthew Napolitano — 19- March 2026

https://t.co/YglgX02ajn

524

620

39K

Who to follow

Atmesh Mishra

@atmesh94

Sanatani. Ayodhyavasi. Engineer. Building best tech platform. Views are personal.

Abhishek kumar

@abhirbt1

Common man having views of my own. RTs not endorsements.

madhur d. singhal

@madhursinghania

We don't have NAME while BIRTH, Only BREATH. Neither BREATH while DEATH, Only NAME. The journey between this NAME & BREATH is LIFE.

chintangotecha retweeted

Math, Inc.

@mathematics_inc

3 months ago

Today, at the @DARPA expMath kickoff, we launched 𝗢𝗽𝗲𝗻𝗚𝗮𝘂𝘀𝘀, an open source and state of the art autoformalization agent harness for developers and practitioners to accelerate progress at the frontier. It is stronger, faster, and more cost-efficient than off-the-shelf alternatives. On FormalQualBench, running with a 4-hour timeout, it beats @HarmonicMath's Aristotle agent with no time limit. Users of OpenGauss can interact with it as much or as little as they want, can easily manage many subagents working in parallel, and can extend / modify / introspect OpenGauss because it is permissively open-source. OpenGauss was developed in close collaboration with maintainers of leading open-source AI tooling for Lean. Read the report and try it out:

mathematics_inc's tweet photo. Today, at the @DARPA expMath kickoff, we launched 𝗢𝗽𝗲𝗻𝗚𝗮𝘂𝘀𝘀, an open source and state of the art autoformalization agent harness for developers and practitioners to accelerate progress at the frontier.

It is stronger, faster, and more cost-efficient than off-the-shelf alternatives. On FormalQualBench, running with a 4-hour timeout, it beats @HarmonicMath's Aristotle agent with no time limit.

Users of OpenGauss can interact with it as much or as little as they want, can easily manage many subagents working in parallel, and can extend / modify / introspect OpenGauss because it is permissively open-source. OpenGauss was developed in close collaboration with maintainers of leading open-source AI tooling for Lean.

Read the report and try it out:

389

323K

Go-TechA @chintangotecha

4 months ago

@akshay_pachaar @grok please explain the scale this approach could address?

Go-TechA @chintangotecha

4 months ago

Hot take: The pendulum may be swinging back. Over the next 5 years, expect a sharp rise in on-prem data centres powering major AI workloads. Cloud usage will decrease. Even the economies of scale requires a balance #AI

Go-TechA @chintangotecha

5 months ago

Not aligned!

The Humanoid Hub

@TheHumanoidHub

5 months ago

Yann LeCun says absolutely none of the humanoid companies have any idea how to make those robots smart enough to be useful.

240

302

768

chintangotecha retweeted

Rohan Paul

@rohanpaul_ai

5 months ago

The paper shows LLM logic can suddenly collapse past a complexity threshold, and proposes training that reduces it. LLM logic does not fade slowly, it snaps at a threshold, and the authors show how to soften that snap Many tests assume accuracy drops smoothly as logic gets harder, but the authors find a sharp break where the model starts guessing. They create a Logical Complexity Metric, a single score for how hard the logic is, based on facts, nesting, and logic words like and, or, not, if then, all, some. When they compare accuracy to that score, it stays stable for a while, then crashes in narrow bands they call logical phase transitions, meaning sudden drops. They also build training data where each English statement is paired with a matching first order logic version, a strict symbol form for rules, so the model can connect meaning to rules. They then do extra training with a complexity aware curriculum, meaning training starts easy and moves harder while spending extra time near the crash bands. On 5 benchmarks, this approach improves average accuracy about 1.26 with direct answers and about 3.95 when it writes steps first, and it holds up better on new rule mixes. This matters because it gives a practical way to tell when an LLM is near its reasoning limit and strengthen it. ---- Paper Link – arxiv. org/abs/2601.02902 Paper Title: "Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning"

rohanpaul_ai's tweet photo. The paper shows LLM logic can suddenly collapse past a complexity threshold, and proposes training that reduces it.

LLM logic does not fade slowly, it snaps at a threshold, and the authors show how to soften that snap

Many tests assume accuracy drops smoothly as logic gets harder, but the authors find a sharp break where the model starts guessing.

They create a Logical Complexity Metric, a single score for how hard the logic is, based on facts, nesting, and logic words like and, or, not, if then, all, some.

When they compare accuracy to that score, it stays stable for a while, then crashes in narrow bands they call logical phase transitions, meaning sudden drops.

They also build training data where each English statement is paired with a matching first order logic version, a strict symbol form for rules, so the model can connect meaning to rules.

They then do extra training with a complexity aware curriculum, meaning training starts easy and moves harder while spending extra time near the crash bands.

On 5 benchmarks, this approach improves average accuracy about 1.26 with direct answers and about 3.95 when it writes steps first, and it holds up better on new rule mixes.

This matters because it gives a practical way to tell when an LLM is near its reasoning limit and strengthen it.

----

Paper Link – arxiv. org/abs/2601.02902

Paper Title: "Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning"

123

chintangotecha retweeted

Ruhi Chess @Ruhichess

about 3 years ago

Fabiano Caruana elevating his chess Fabiano won the Superbet Chess Classics. He pockets 100k dollars and wins 9 rating points. But I don't see this as an isolated success. It is the beginning of the resurgence of one of the all-time greatest. If you've read me before, you probably know I post about Caruana a whole lot. I've been following his podcast for a while. He is a person of extraordinary humility, with a subtle humor that is both funny and heartwarming. But the thing that stroke me the most about him is his humanity. I used to see Caruana as an implacable chess machine: a cold-bloded calculator. But the reality is that genius chess players are very human. Their chess thinking is not like that of an engine. It is charged with emotionality and metaphor. For the most part, they don't build trees of variations and evaluate in an organized manner. Rather, they identify patterns and seek goals, driven by their instinct. I feel I learn from Fabiano's way of thinking, not just about chess but even about life. For that reason, it is a pleasure to see him strike back. Fabiano gave us the best tournament performance ever (Sinquefield Cup 2014). He produced one of the highest (if not the highest) quality World Championship in history (London 2018). I'm excited to see what else this man can bring to chess. #SuperbetChessClassic #GrandChessTour #fabianocaruana

Ruhichess's tweet photo. Fabiano Caruana elevating his chess
Fabiano won the Superbet Chess Classics. He pockets 100k dollars and wins 9 rating points. But I don't see this as an isolated success. It is the beginning of the resurgence of one of the all-time greatest.

If you've read me before, you probably know I post about Caruana a whole lot. I've been following his podcast for a while. He is a person of extraordinary humility, with a subtle humor that is both funny and heartwarming. But the thing that stroke me the most about him is his humanity.
I used to see Caruana as an implacable chess machine: a cold-bloded calculator. But the reality is that genius chess players are very human. Their chess thinking is not like that of an engine. It is charged with emotionality and metaphor. For the most part, they don't build trees of variations and evaluate in an organized manner. Rather, they identify patterns and seek goals, driven by their instinct.

I feel I learn from Fabiano's way of thinking, not just about chess but even about life. For that reason, it is a pleasure to see him strike back. Fabiano gave us the best tournament performance ever (Sinquefield Cup 2014). He produced one of the highest (if not the highest) quality World Championship in history (London 2018). I'm excited to see what else this man can bring to chess.

#SuperbetChessClassic #GrandChessTour #fabianocaruana

111

162K

Go-TechA

@chintangotecha

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users