Assoc. VP (AI) @NUSingapore, Assoc. Prof. @NUSComputing, Director @AISingapore. I love to chat about the value of data & how agents interact with data.
Congrats to my co-authors Weida Li, Zhuanghua Liu, and Yaoliang Yu!
@UncertaintyInAI #UAI2026
The #ShapleyValue is a widely used concept in attribution problems, as it uniquely satisfies the axioms of linearity, consistency, equal treatment, and efficiency. Often, the inclusion AUC metric is used to evaluate the quality of player rankings in order to identify positively participating players. However, it can be established that the Shapley value is not always reliable for this purpose. The core issue lies in its linearity: the Shapley value acts as a linear operator with an excessively large null space, which is likely to contain non-negligible perturbations that remain indistinguishable to the operator. To address this limitation, we explore the design of nonlinear axiomatic attribution methods. Inspired by the #LeastCore, which is a popular nonlinear substitute for the Shapley value, we introduce a class of nonlinear attribution methods that retain the remaining necessary axioms. Each method yields a contribution vector that is the unique optimal solution to a minimization problem, which aims to approximate utility functions as faithfully as possible. In terms of the inclusion AUC metric, our experiments demonstrate the potential effectiveness of these methods compared to Shapley value variants that relax only the efficiency axiom.
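The null-space concern can be seen on a toy game. In the sketch below, the 3-player additive game, its weights, and the size-only perturbation `BUMP` are all hypothetical and are not taken from the paper. It computes exact Shapley values by averaging marginal contributions over every ordering. The perturbation depends only on coalition size and is zero on the empty and grand coalitions, so symmetry and efficiency force its Shapley value to zero. The two games therefore receive identical Shapley vectors even though they disagree on, for example, whether player 2 alone adds positive value.

```python
from itertools import permutations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values: average each player's marginal contribution
    v(S + i) - v(S) over all n! orderings of the players."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        coalition = frozenset()
        for i in order:
            grown = coalition | {i}
            phi[i] += v(grown) - v(coalition)
            coalition = grown
    return [p / factorial(n) for p in phi]


# Hypothetical toy game: additive utility with one weight per player.
WEIGHTS = [3.0, 1.0, -2.0]


def base_utility(S):
    return float(sum(WEIGHTS[i] for i in S))


# Hypothetical perturbation: depends only on |S| and is zero on the empty and
# grand coalitions, so symmetry + efficiency force its Shapley value to zero.
BUMP = {0: 0.0, 1: 5.0, 2: -3.0, 3: 0.0}


def perturbed_utility(S):
    return base_utility(S) + BUMP[len(S)]


print(shapley_values(3, base_utility))       # [3.0, 1.0, -2.0]
print(shapley_values(3, perturbed_utility))  # [3.0, 1.0, -2.0]
print(base_utility({2}), perturbed_utility({2}))  # -2.0 3.0
```

Because the Shapley value is linear, adding any game from this null space changes the utility function without changing the attribution. This is the kind of indistinguishable perturbation that the nonlinear methods above are designed to address.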
Marutama Ramen @ Millenia Walk in SG.
What's unique about this outlet is that they serve rice sets.
This time round, I tried the Gindara (Black Cod) Teriyaki set with their Homemade Coffee Jelly Parfait!
I also highly recommend their Stir-Fried Chicken & Eggplant with Miso set. All-time favorite.
It used to be the case that we went to the movies for leisure and entertainment.
Now, it has become a test of our mettle: Can we stay in our seats despite our shortened attention spans and the insecurity of having no 4x button to click?
I will be the control experiment, and my child will be the main (or vice versa?).
The countdown begins...
LLMs are frozen after pretraining, but the world keeps changing. How do you give an LLM new knowledge without retraining it, bloating its context, or breaking what it already knows?
Existing methods hit a wall:
• RAG is brittle to retrieval noise and struggles with cross-document reasoning;
• Fine-tuning is expensive and causes catastrophic forgetting;
• Latent memory is tightly coupled to the model that produced it.
Key question: Can we encode knowledge into a small, dedicated memory model that any LLM can query without accessing the LLM itself?
Introducing MeMo (Memory as a Model)
We train a dedicated MEMORY model on a reflection Question-Answer dataset synthesized from the target corpus. At inference, a frozen EXECUTIVE model (any LLM, including closed-source models) queries the MEMORY model through a structured 3-stage protocol that decomposes complex queries into targeted sub-queries to retrieve precise, noise-robust knowledge; the EXECUTIVE model then reasons over the MEMORY model's responses.
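The querying flow described above can be sketched as plain control flow. This is a minimal illustration, not the MeMo implementation. The split into decompose, query, and reason stages follows the wording of this post, but the function names (`answer_with_memory`, `toy_decompose`, `toy_memory`, `toy_reason`) and the made-up facts are hypothetical stand-ins for real model calls.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: in a real system each would wrap a model call.
Decomposer = Callable[[str], List[str]]                 # EXECUTIVE: question -> sub-queries
Memory = Callable[[str], str]                           # MEMORY: sub-query -> answer text
Reasoner = Callable[[str, List[Tuple[str, str]]], str]  # EXECUTIVE: final reasoning


def answer_with_memory(question: str, decompose: Decomposer,
                       memory: Memory, reason: Reasoner) -> str:
    # Stage 1 (assumed): the EXECUTIVE model splits the question into sub-queries.
    sub_queries = decompose(question)
    # Stage 2 (assumed): each sub-query goes to the MEMORY model as plain text,
    # so the interface needs only text; no weights, gradients, or logits.
    evidence = [(q, memory(q)) for q in sub_queries]
    # Stage 3 (assumed): the EXECUTIVE model reasons over (sub-query, answer) pairs.
    return reason(question, evidence)


# Toy stand-ins with made-up facts, only to exercise the control flow.
FACTS = {"start year of Project Alpha": "2019", "start year of Project Beta": "2021"}


def toy_decompose(question: str) -> List[str]:
    return ["start year of Project Alpha", "start year of Project Beta"]


def toy_memory(sub_query: str) -> str:
    return FACTS.get(sub_query, "unknown")


def toy_reason(question: str, evidence: List[Tuple[str, str]]) -> str:
    # Pick the sub-query whose answer is the earliest year.
    earliest, _ = min(evidence, key=lambda pair: int(pair[1]))
    return earliest.replace("start year of ", "")


print(answer_with_memory("Which project started first?",
                         toy_decompose, toy_memory, toy_reason))  # Project Alpha
```

Because the EXECUTIVE side only sees text returned by `memory`, any LLM behind a text API could, in principle, fill the `decompose` and `reason` roles.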
Key Highlights
• 5-step data synthesis pipeline captures explicit facts, implicit relationships, and cross-document connections as reflections;
• Robust to retrieval noise: where RAG drops up to 6.22% with added distractors, MeMo holds steady;
• Plug-and-play with any LLM: no weights, gradients, or logits required;
• Fixed inference cost, independent of corpus size;
• Continual integration via model merging: 33% compute savings over full retraining, with benefits that grow with the number of corpora;
• Strong results across BrowseComp-Plus, NarrativeQA, and MuSiQue, matching or outperforming retrieval baselines (BM25, NV-Embed-V2, HippoRAG2) with gains of up to 27% on NarrativeQA when paired with Gemini-3-Flash.
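The post does not say which merging scheme MeMo uses. As a generic illustration of "model merging", here is a minimal sketch of weighted parameter averaging across same-architecture models. The representation of parameters as name-to-list dicts and the two toy "memory models" are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_models(models: List[Params], weights: Optional[List[float]] = None) -> Params:
    """Weighted parameter averaging of same-architecture models."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")
    merged: Params = {}
    for name, first in models[0].items():
        # Each merged entry is the weighted sum of that entry across models.
        merged[name] = [
            sum(w * m[name][j] for w, m in zip(weights, models))
            for j in range(len(first))
        ]
    return merged


# Two hypothetical memory models, each trained on a different corpus.
memory_a = {"layer.weight": [1.0, 2.0]}
memory_b = {"layer.weight": [3.0, 6.0]}
print(merge_models([memory_a, memory_b]))  # {'layer.weight': [2.0, 4.0]}
```

Folding a (k+1)-th model into a running uniform average with weights `[k/(k+1), 1/(k+1)]` reproduces the uniform average over all models. This is one way a new corpus could be integrated without revisiting earlier ones, though the actual MeMo procedure may differ.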
Why this matters
MeMo decouples knowledge from reasoning: Train memory once with a small open model, then plug it into the frontier LLM of your choice. No retraining as new corpora arrive, no fragile retrieval pipelines, and full compatibility with proprietary APIs, paving the way for scalable knowledge-aware AI systems.
Joint work with @workryanq_nus, @961014dltkdg, @alfredleongwl, Alok Prakash, Nancy F. Chen, @arun_v3rma, Daniela Rus, and Armando Solar-Lezama
Paper: https://t.co/9FrL4CH9O2
Code: https://t.co/wiOnH0LKll
Project page: https://t.co/xsRHFxQIwY
Huggingface: https://t.co/HZTSC1s81X
#LLMs #KnowledgeIntegration #MemoryAugmentedLLMs #RAG #ModelMerging
Our MeMo (Memory as a Model) has appeared on @VentureBeat!
https://t.co/YQ1XLsYPyq
Our original post on MeMo is here: https://t.co/8GU02Ut6AD
To note, the editor has mentioned a limitation of MeMo: "because MeMo synthesizes answers from parametric memory rather than retrieving exact text snippets, it obscures the provenance of the information. This makes it difficult to attribute specific claims to original source documents, which poses a critical compliance issue for enterprise applications requiring strict audit trails." To tackle this issue, do check out our previous works (Waterfall and WASA) on text watermarking for data provenance and attribution in LLMs since 2024:
https://t.co/GcSLkfG6nF
https://t.co/vUwsHuA6FP
Congrats to my faculty colleagues in @NUSComputing @NUSingapore for receiving the Amazon Research Awards!!!
Ilya Sergey @ilyasergey: Linear Types for a Foundational Multi-Modal Program Verifier (Automated Reasoning)
Jiaheng Zhang @jiahengzhang96: Practical Watermarking for LLMs via Pseudorandom Codes (AWS Cryptography)
Bryan Kian Hsiang Low @bryanklow: Self-Configurable Agentic Learning via Co-optimization (AWS Agentic AI)
Learn more about the program on the @AmazonScience website: https://t.co/VTgJHFd1aJ
#AmazonResearchAwards
Announcing the #AmazonResearchAwards fall 2025 recipients:
68 researchers
49 universities
11 countries
Each gains access to 800+ Amazon public datasets and AWS AI/ML tools. Meet the cohort: https://t.co/47jUdPuRrV
I'm excited to share that we (#GLOW.AI research group: https://t.co/XBWReWZGd8) have received an Amazon Research Award for our proposal "Self-Configurable Agentic Learning via Co-optimization", at @NUSComputing, @NUSingapore! Learn more about the program on the @AmazonScience website: https://t.co/VTgJHFd1aJ
We (@BobbyZhouZijian @workryanq_nus @arun_v3rma @alfredleongwl @ShaoYongOng @qthanhtran @YvonneFan12 @snoidetx @xinyuan3142 @lululu0082 @nhungbui1299) are thrilled to be able to collaborate closely with Behrooz Omidvar-Tehrani at Amazon Web Services (AWS) Agentic AI on the research for this proposal. I'm also grateful to @WuZhaoxuan and @ray_qiaorui for their earlier involvement.
My heartfelt thanks to Rajiv Dhawan and Kai Hui Ang @awscloud for their help in the entire process!
Do check out some of our preliminary efforts in AI Agents:
Memory as a Model (MeMo): https://t.co/8GU02Ut6AD
CORAL 🪸: Towards Autonomous Multi-Agent Evolution: https://t.co/KUghPGEbLR
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents: https://t.co/CW79QUIMOx
#AmazonResearchAwards #AgenticAI
Tired of benchmarking your optimizer on Hartmann and Branin? Try BoLT ⚡, our new black-box optimization (BBO) benchmark grounded in 20K+ real LLM experiments instead!
LLMs involve expensive, derivative-free decisions that BBO is built to handle. Yet most BBO research still validates on synthetic functions that miss the challenges of real LLM tasks. BoLT ⚡ closes this gap so that you can evaluate BBO methods against realistic objectives without needing large-scale compute.
3 task families, 10 problems spanning:
• Hyperparameter optimization (LoRA fine-tuning, mixed variables, multi-fidelity);
• Data mixture optimization (simplex constraints, multi-objective, heteroscedastic noise);
• Prompt optimization (high-dimensional discrete search up to 768 dims).
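Simplex constraints mean that a candidate data mixture must have nonnegative weights that sum to one. One common, generic way to keep an optimizer's raw proposals feasible is Euclidean projection onto the probability simplex, using the standard sort-and-threshold method. The sketch below is an illustrative stand-in and is not part of BoLT's API. The raw proposal values are hypothetical.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}
    using the standard sort-and-threshold method."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds exactly for k = 1..rho; keep the last threshold
            theta = t
    # Shift every coordinate by the threshold and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


# A hypothetical raw proposal for three data-domain mixture weights.
print(project_to_simplex([2.0, 0.0, -1.0]))  # [1.0, 0.0, 0.0]
```

Projection is only one option. Reparameterizing through a softmax is another, and an optimizer may handle the constraint natively instead.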
Fast, validated emulators replace real LLM calls, returning results in milliseconds. Weights load automatically from HuggingFace on first use.
Every problem subclasses BoTorch's BaseTestProblem, so your existing optimizer code plugs straight in.
Key findings from benchmarking 15+ methods: GP-based BO consistently beats standard HPO baselines; NEHVI matches NSGA-II on multi-objective data mixture optimization with 50× fewer evaluations; trust-region methods are essential for high-dimensional discrete prompt search.
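NEHVI (noisy expected hypervolume improvement) scores candidates by how much they would enlarge the hypervolume dominated by the Pareto front. The core quantity can be illustrated in two objectives. The sketch below computes exact, noise-free hypervolume and hypervolume improvement for made-up objective values under maximization. NEHVI itself additionally integrates over a GP posterior to handle noisy observations, which this sketch omits.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Points not dominated by any other point (both objectives maximized)."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded below by ref."""
    front = sorted((p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]),
                   reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x descending, so y ascending along the front
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


# Hypothetical observed objective pairs and a hypothetical new candidate.
observed = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
ref = (0.0, 0.0)
hv = hypervolume_2d(observed, ref)
hvi = hypervolume_2d(observed + [(2.5, 2.5)], ref) - hv
print(hv, hvi)  # 6.0 1.25
```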
Joint work with Ruth Chew @ruthchewing, Zhiliang Chen @ZhiliangChen94, and Apivich Hemachandra @apivich_h.
Check us out @icmlconf #ICML2026 DEMO Workshop (https://t.co/7Yv1OuuSte)!
Preprint: https://t.co/F2goOx62dZ
Project page: https://t.co/kzN4KHshcb
GitHub: https://t.co/nJ95aV0RPd (star to keep up with future updates)
Docs: https://t.co/GxKUBhgaFu
#BayesianOptimization #BlackboxOptimization #LLMs
Congrats to my co-authors @RachaelSim2 @YvonneFan12 @snoidetx @michael_xinyi @pjaillet!!!
@icmlconf #ICML2026 #CollaborativeLearning involves training high-quality models using datasets from a number of sources. To incentivize sources to share data, existing #DataValuation methods fairly reward each source based on its data submitted as is. However, as these methods neither verify nor incentivize data truthfulness, the sources can manipulate their data (e.g., by submitting duplicated or noisy data) to artificially increase their valuations and rewards or to prevent others from benefiting. This paper presents the first mechanism that provably ensures (F) collaborative #fairness and incentivizes (T) #truthfulness at equilibrium for Bayesian models. Our mechanism combines semivalues (e.g., #ShapleyValue), which ensure fairness, and a truthful data valuation function (DVF) based on a validation set that is unknown to the sources. As semivalues are influenced by others' data, we introduce an additional condition to prove that a source can maximize its expected data values in coalitions and semivalues by submitting a dataset that captures its true knowledge. Additionally, we discuss the implications and suitable relaxations of (F) and (T) when the mediator has a limited budget for rewards or lacks a validation set. Our theoretical findings are validated on synthetic and real-world datasets.
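A semivalue weights each source's marginal contribution v(S ∪ {i}) − v(S) by a factor that depends only on the coalition size |S|. The sketch below computes two standard semivalues, Shapley and Banzhaf, from the same routine. Because every term involves v(S) for coalitions of other sources, the result depends on others' data, as noted above. The utility `utility(S) = min(|S|, 2)` is a hypothetical stand-in for a data valuation function. The paper's truthful DVF, validation set, and mechanism are not modeled here.

```python
from itertools import combinations
from math import comb


def semivalue(n, v, i, weight):
    """phi_i = sum over S in N minus {i} of weight(n, |S|) * (v(S + i) - v(S))."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(n):
        for members in combinations(others, k):
            S = frozenset(members)
            total += weight(n, k) * (v(S | {i}) - v(S))
    return total


def shapley_weight(n, k):
    # Shapley: 1 / (n * C(n-1, k)) per coalition of size k.
    return 1.0 / (n * comb(n - 1, k))


def banzhaf_weight(n, k):
    # Banzhaf: uniform weight 1 / 2^(n-1) over all coalitions of the others.
    return 1.0 / 2 ** (n - 1)


# Hypothetical utility: value saturates once any two sources' data are pooled.
def utility(S):
    return float(min(len(S), 2))


shapley = [semivalue(3, utility, i, shapley_weight) for i in range(3)]
banzhaf = [semivalue(3, utility, i, banzhaf_weight) for i in range(3)]
print(round(sum(shapley), 6), sum(banzhaf))  # 2.0 2.25
```

The Shapley values sum to v(N) = 2 (efficiency), while the Banzhaf values sum to 2.25. Both are semivalues, which is why the abstract treats the Shapley value as one instance of a broader family.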
I was not invited. I begged to join their dinner gathering after I learned about it.
This is a multicultural group of Vietnamese, Indonesian, Thai, Chinese, and Singaporean PhD students from @NUSComputing gathering to eat Thai food in Tengah at the recommendation of a Singaporean.
It is also my first time witnessing @apivich_h speaking in Thai to order Thai food for the group!
Highly recommended @ I Love Sukhothai: Grilled Pork Collar, Creamy Omelette, Homemade Prawn Cake.
// Memory as a Model //
The paper augments any LLM with a separate trained memory model that stores, retrieves, and integrates facts on its behalf.
It decouples memory updates from base-model weight updates. It achieves continual-learning robustness without catastrophic forgetting, which is a property that RAG fails to deliver.
A vector store is a database with a learned encoder bolted on. MeMo is a learned subsystem with explicit interfaces. That distinction matters, as agents need to be able to ingest fresh knowledge weekly without retraining or vector-DB churn.
At its core, the position here is that memory in agents should be modular, learned, and gated, not a context-window hack.
Paper: https://t.co/iMrghPtxWW
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
@iclr_conf #ICLR2026 has concluded! @icmlconf #ICML2026 decisions await.
Another 27 hrs 35 min of economy flight + transit from Rio de Janeiro 🇧🇷 back home 🇸🇬.
Thankfully, before the flight, several of us had recovered from food poisoning after challenging ourselves to the local Caipirinha drink.
This is possibly the longest flight time to a conference in my 25 years!
Can't believe I am still doing this since my first conference presentations at #ICRA2002 in DC and @AAMASconf #AAMAS2002 in Bologna, Italy, for my undergraduate final-year project research work on integrated robot planning and control. Ooo I miss those days in robotics!