andrew | applied ai engineering

@AndrewAI

building autonomous ai agents & foundation models ex tech lead @coinbase ex ai/eng @microsoft @snapchat @ibm @blackberry founder @supernormal investor @spacex

ce@uwaterloo (entrance rank 1)

Joined June 2011

158 Following

58K Followers

7.6K Posts

Pinned Tweet

andrew | applied ai engineering

@AndrewAI

over 4 years ago

I’m joining Coinbase (biggest crypto company in the US) as a Tech Lead. There are countless phenomenal big tech and startups out in the industry right now, and it’s always so difficult saying “next time” to other amazing opportunities. #NFT #Coinbase @tezos

AndrewAI's tweet photo. I’m joining Coinbase (biggest crypto company in the US) as a Tech Lead.

There are countless phenomenal big tech and startups out in the industry right now, and it’s always so difficult saying “next time” to other amazing opportunities. #NFT #Coinbase @tezos https://t.co/xJqrWpb7Ip

896

andrew | applied ai engineering

@AndrewAI

15 days ago

AI를 매일 사용하면서도 트랜스포머, RAG, Diffusion 같은 핵심 개념을 제대로 이해하지 못하는 분들이 아직 많습니다. • Attention & Transformers: 문장의 전체 맥락을 동시에 파악해 의미를 정확히 이해하는 핵심 구조 • RAG: 환각(Hallucination)을 줄이기 위해 실제 데이터를 검색 후 답변하는 실전 필수 기법 • LoRA & Quantization: 대형 모델을 소비자 GPU에서도 fine-tuning하고 실행하게 만든 패러다임 • AI Agents + Chain of Thought: 단순 답변을 넘어 계획/실행·반복까지 하는 실질적인 워크플로우 AI 제품과 사업을 스케일하려면 이 멘탈 모델을 반드시 익혀야 하기 때문에 PhD 없이도 명확한 설명과 시각 자료로 정리된 최고의 가이드인 듯 합니다.

Rahul

@sairahul1

16 days ago

https://t.co/4U0aNsoAmo

417

11K

526

andrew | applied ai engineering

@AndrewAI

4 months ago

@bitwise0X2A doing synthesis model (LSTM + Graves soft-window attention + GMM output), and your past work in this space overlaps, can you possibly shoot me a dm to trades notes on attention tuning, multimodal stroke distributions, and deployment tricks like partial int8 quantization

AndrewAI retweeted

idan shenfeld

@IdanShenfeld

4 months ago

People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)

IdanShenfeld's tweet photo. People keep saying 2026 will be the year of continual learning.

But there are still major technical challenges to making it a reality.

Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning!

(1/n) https://t.co/tuDTBATlTQ

222

239K

Who to follow

Goku

@Kakarot

オッス！オラ悟空 Investor: @delegatedotxyz @xai | NOT FINANCIAL ADVICE

𝗔𝗠

@am_n3twork

Exploring the frontier of Web3 and NFTs 🌐

lil pants | 🚨🚔🚨

@Pants_shh

Living off GMs, good vibes, and sand 🧙‍♀️ Building | @nanoverseHQ | @projectPXN | @gmdotco | @KumihoAI

andrew | applied ai engineering

@AndrewAI

5 months ago

@0xripleys @ostium @0xripleys hey uw bro dm me arthur intro'ed

andrew | applied ai engineering

@AndrewAI

5 months ago

@karanjagtiani04 happy new yrs karan

andrew | applied ai engineering

@AndrewAI

5 months ago

@CaoMinhWeb3 for real

AndrewAI retweeted

Kimon Fountoulakis

@kfountou

6 months ago

GPT-5.2 solves our COLT 2022 open problem: “Running Time Complexity of Accelerated L1-Regularized PageRank” using a standard accelerated gradient algorithm and a complementarity margin assumption. Link to the open problem: https://t.co/A3ZbJshudE All proofs were generated by GPT-5.2 Pro. The key bounds on the algorithm’s total work (in the COLT’22 open-problem setting) have been auto-formalized using a combination of GPT-5.2 Pro, @HarmonicMath's Aristotle, and Gemini 3 Pro (High) on Antigravity. Link to the proof: https://t.co/hgJ0iBcWJe Link to the Lean code: https://t.co/DeMFDlwSC9 Link to the informalization of the Lean code: https://t.co/V5BwYoIycN Link to my GPT-5.2 prompts: https://t.co/xwh5c6S81B In addition to the formalization of the main result, I checked the proof myself twice. I hope I didn’t miss anything, but if I did, please let me know and I will try to fix it. Story behind the paper and relevant work In 2016, I worked on the convergence rate of the Iterative Soft-Thresholding Algorithm (ISTA) for l1-regularized PageRank. Link to the corresponding paper: https://t.co/pDMN9QKkGh Surprisingly, the running time of the algorithm depends only on the number of non-zero nodes at optimality. It was only natural to ask the same question for accelerated methods, such as FISTA. However, we quickly realized that FISTA activates more nodes than the number of non-zeros at optimality, even though it eventually converges to the same active set. In practice, we would still observe that FISTA is fast. Link to empirical work: https://t.co/VQFJugQk0m I tried for about three months to bound the total work of FISTA and other accelerated algorithms, and from time to time I would come back to the problem while I was a postdoctoral fellow. Eventually, I gave up. I gave it another try around 2021, and I failed again. I asked my excellent former student, Shenghao Yang, and he also failed, unfortunately. I asked a couple of prominent researchers if they think the problem is solvable, they quickly mentioned that it seemed hard. We ended up publishing it as an open problem at COLT 2022. In 2023, David Martínez-Rubio et al. provided the first successful solution. Their solution is “orthogonal” to what was proved by GPT-5.2. Link to their paper: https://t.co/YPUrfGhG2T I loved their work btw, I also met David in person at ICML 2024, one of the few ML conferences I ever attended. Their proposed accelerated algorithm is not necessarily faster than ISTA; however, it does offer a new trade-off between the teleportation parameter of PageRank and the total work per iteration. More importantly, the proposed method isn’t necessarily practical, since it involves solving an expensive subproblem. To be fair, in the COLT 2022 problem, we didn’t impose the additional hard constraint of using standard accelerated methods. The problem was posed as a theoretical problem. The solution proved by GPT-5.2 establishes acceleration for the standard FISTA algorithm, which performs only one gradient computation per iteration. It also offers a clean parameterization of the total work with respect to a complementarity margin, which, for certain graph structures, shows a clear speed-up compared to ISTA. In 2024, Zhou et al. (https://t.co/Agq5ANfhuS) gave it another go. However, in my view, their work has important drawbacks. In particular, their guarantees for accelerated localized methods (e.g., localized Chebyshev / Heavy-Ball) assume a condition on the geometric mean of certain active-ratio factors (described as Θ(\sqrt{α})) in order to obtain an accelerated bound. Two distinctions matter for our setting: First, their accelerated runtime bounds are parameterized by evolving-set quantities and a residual-ratio assumption, which can be evaluated during a run but is not typically interpretable or verifiable a priori from graph structure alone. The solution by GPT-5.2 instead provides an explicit transient-phase bound in terms of a standard optimization-structure condition, and converts this directly into a total work bound. Second, they explicitly note that FISTA-style acceleration violates the monotonicity property needed to bound the per-iteration accessed volume, and emphasize that guaranteeing intermediate sparsity in accelerated frameworks is challenging. The margin-based analysis by GPT-5.2 directly targets this gap: even without any monotonicity of intermediate supports, GPT-5.2 bounded how much spurious activation can occur before the iterates enter a neighborhood of the unique minimizer, thereby yielding a concrete locality certificate for the accelerated proximal-gradient trajectory. Since 2024, every time OpenAI or Google released a new major model, I would give it a go. This time, with GPT-5.2, it seems to have worked.

$kfountou's tweet photo. GPT-5.2 solves our COLT 2022 open problem: “Running Time Complexity of Accelerated L1-Regularized PageRank” using a standard accelerated gradient algorithm and a complementarity margin assumption. Link to the open problem: https://t.co/A3ZbJshudE All proofs were generated by GPT-5.2 Pro. The key bounds on the algorithm’s total work (in the COLT’22 open-problem setting) have been auto-formalized using a combination of GPT-5.2 Pro, @HarmonicMath's Aristotle, and Gemini 3 Pro (High) on Antigravity. Link to the proof: https://t.co/hgJ0iBcWJe Link to the Lean code: https://t.co/DeMFDlwSC9 Link to the informalization of the Lean code: https://t.co/V5BwYoIycN Link to my GPT-5.2 prompts: https://t.co/xwh5c6S81B In addition to the formalization of the main result, I checked the proof myself twice. I hope I didn’t miss anything, but if I did, please let me know and I will try to fix it. Story behind the paper and relevant work In 2016, I worked on the convergence rate of the Iterative Soft-Thresholding Algorithm (ISTA) for l1-regularized PageRank. Link to the corresponding paper: https://t.co/pDMN9QKkGh Surprisingly, the running time of the algorithm depends only on the number of non-zero nodes at optimality. It was only natural to ask the same question for accelerated methods, such as FISTA. However, we quickly realized that FISTA activates more nodes than the number of non-zeros at optimality, even though it eventually converges to the same active set. In practice, we would still observe that FISTA is fast. Link to empirical work: https://t.co/VQFJugQk0m I tried for about three months to bound the total work of FISTA and other accelerated algorithms, and from time to time I would come back to the problem while I was a postdoctoral fellow. Eventually, I gave up. I gave it another try around 2021, and I failed again. I asked my excellent former student, Shenghao Yang, and he also failed, unfortunately. I asked a couple of prominent researchers if they think the problem is solvable, they quickly mentioned that it seemed hard. We ended up publishing it as an open problem at COLT 2022. In 2023, David Martínez-Rubio et al. provided the first successful solution. Their solution is “orthogonal” to what was proved by GPT-5.2. Link to their paper: https://t.co/YPUrfGhG2T I loved their work btw, I also met David in person at ICML 2024, one of the few ML conferences I ever attended. Their proposed accelerated algorithm is not necessarily faster than ISTA; however, it does offer a new trade-off between the teleportation parameter of PageRank and the total work per iteration. More importantly, the proposed method isn’t necessarily practical, since it involves solving an expensive subproblem. To be fair, in the COLT 2022 problem, we didn’t impose the additional hard constraint of using standard accelerated methods. The problem was posed as a theoretical problem. The solution proved by GPT-5.2 establishes acceleration for the standard FISTA algorithm, which performs only one gradient computation per iteration. It also offers a clean parameterization of the total work with respect to a complementarity margin, which, for certain graph structures, shows a clear speed-up compared to ISTA. In 2024, Zhou et al. (https://t.co/Agq5ANfhuS) gave it another go. However, in my view, their work has important drawbacks. In particular, their guarantees for accelerated localized methods (e.g., localized Chebyshev / Heavy-Ball) assume a condition on the geometric mean of certain active-ratio factors (described as Θ(\sqrt{α})) in order to obtain an accelerated bound. Two distinctions matter for our setting: First, their accelerated runtime bounds are parameterized by evolving-set quantities and a residual-ratio assumption, which can be evaluated during a run but is not typically interpretable or verifiable a priori from graph structure alone. The solution by GPT-5.2 instead provides an explicit transient-phase bound in terms of a standard optimization-structure condition, and converts this directly into a total work bound. Second, they explicitly note that FISTA-style acceleration violates the monotonicity property needed to bound the per-iteration accessed volume, and emphasize that guaranteeing intermediate sparsity in accelerated frameworks is challenging. The margin-based analysis by GPT-5.2 directly targets this gap: even without any monotonicity of intermediate supports, GPT-5.2 bounded how much spurious activation can occur before the iterates enter a neighborhood of the unique minimizer, thereby yielding a concrete locality certificate for the accelerated proximal-gradient trajectory. Since 2024, every time OpenAI or Google released a new major model, I would give it a go. This time, with GPT-5.2, it seems to have worked.$

670

119

333

172K

andrew | applied ai engineering

@AndrewAI

7 months ago

lstm + attention + gmm interesting for sequential prediction for strokes especially with the msn output layer handling multimodal distributions just out of curiosity, via what method did you tune the attention mechanism to focus on relevant input sequences during synthesis? any specific tweaks for handling long traces, or did you stick close to vanilla scaled dot product

103

andrew | applied ai engineering

@AndrewAI

9 months ago

but jobs? 85m gone, 97m new by '25; bias widening gaps. risks? - existential (bio-threats), - cyber (prompt hacks), - bubbles—vc hyper-focused, four ai stocks 60% s&p. tldr; verticals for startup wins, horizontals for giants. bet moats: data, switches. seen it in crypto/tech in last decade—builders trump hype (wink)

andrew | applied ai engineering

@AndrewAI

9 months ago

1/ been pondering ai's path forward — i think hype's fading, but we're at a tipping point with real-world shifts reshaping sectors. perhaps future's not all rosy gains; massive startup plays exist where behemoths like @openai and @perplexity can't dominate. thread on upsides, fallout, and investor sweet spots in vertical vs horizontal ai, saas gaps 🧵

andrew | applied ai engineering

@AndrewAI

9 months ago

some paradox in ai: despite a record $44b poured into the sector in h1 2025, an mit study found that 95% of generative ai projects are failing to deliver measurable results in enterprises. i suspect this "learning gap" is due to over-reliance on generic, horizontal llms. 1/n

11K

andrew | applied ai engineering

@AndrewAI

9 months ago

6/ saas shift: legacy horizontals bolt ai. ai-first vertical saas unlocks specialized value. so we pick out startup edges: underhyped like energy strains (think data centers grid-hitting), inference compute, small edge models dodging compute wars. implications: productivity booms, skill gaps narrow.

andrew | applied ai engineering

@AndrewAI

9 months ago

@harryhawki28220 newcurrency

andrew | applied ai engineering

@AndrewAI

9 months ago

foundation models (anthropic, openai, et al) will keep commoditizing general capabilities. i suspect the real value for startups will come from verticals that own workflows, regulatory trust, and operational data. (tl;dr: models are a table stake, not the moat.) 1/N