Computer Science at UT Austin

@UTCompSci

UTCS is a recognized leader in creating the scientific knowledge and practical technologies exemplifying the digital revolution that defines the 21st century.

Austin, TX

Joined December 2011

353 Following

5.9K Followers

4.3K Posts

Pinned Tweet

Computer Science at UT Austin @UTCompSci

4 months ago

@UTAustin is launching a new School of Computing in fall 2026! With Information and Statistics & Data Science, we’ll expand student opportunities, accelerate research, and strengthen pathways to high-impact careers and grad study. Read more: https://t.co/Kjg3k1gZKZ

UTCompSci retweeted

Priyanka Mandikal @PrnkMandikal

2 days ago

🤖 How can robots learn long-horizon object state change tasks like mashing a banana 🥣🍌, spreading ketchup on bread 🍅🍞, or slicing a cucumber 🔪🥒? Introducing SPARTA: object state-change manipulation via visual spatial progress 👇 🌐 https://t.co/pzRiSYqHjc

UTCompSci retweeted

Zaid Khan

@codezakh

4 days ago

Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning + forecasting a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernels + runtimes from evolutionary search, costing 400M reasoning tokens + 600 GPU-hours to answer this. In our work GPU Forecasters, we study language models as selective surrogates for GPU kernel optimization. 1️⃣ Off-the-shelf LLMs can forecast how a GPU responds to a candidate kernel with non-trivial accuracy. If we rank candidates by these predictions and measure only the top 10% on a GPU, the fastest kernel we find is within 20% of the best in the pool. 2️⃣ We want LLMs to not just be accurate but also calibrated, so that we can use their uncertainty for selective prediction: during search, we should trust only confident forecasts and verify less confident forecasts by sending them to the GPU. 3️⃣ We train an open-weights surrogate (GPT-OSS-20B) with RL to improve both accuracy and calibration. Calibration-shaped rewards improve both confidence reliability and ranking ability, while correctness rewards alone do not. 4️⃣ Inside a real kernel search, the surrogate finds faster kernels than an equal-GPU-budget baseline by considering more candidates per measurement. 5️⃣ We release 12,388 LLM-generated GPU kernels with measured runtimes spanning 118 operations, CUDA and Triton backends, 3 GPU types, taking 400M tokens + 600 GPU-hours to produce. This dataset can be used for analyzing LLM-driven evolutionary program search dynamics, post-training LLMs for kernel code generation, and things we didn’t get a chance to explore, like training reward models! Thread 🧵👇

codezakh's tweet photo. Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning + forecasting a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernels + runtimes from evolutionary search, costing 400M reasoning tokens + 600 GPU-hours to answer this.

In our work GPU Forecasters, we study language models as selective surrogates for GPU kernel optimization.

1️⃣ Off-the-shelf LLMs can forecast how a GPU responds to a candidate kernel with non-trivial accuracy. If we rank candidates by these predictions and measure only the top 10% on a GPU, the fastest kernel we find is within 20% of the best in the pool.

2️⃣ We want LLMs to not just be accurate but also calibrated, so that we can use their uncertainty for selective prediction: during search, we should trust only confident forecasts and verify less confident forecasts by sending them to the GPU.

3️⃣ We train an open-weights surrogate (GPT-OSS-20B) with RL to improve both accuracy and calibration. Calibration-shaped rewards improve both confidence reliability and ranking ability, while correctness rewards alone do not.

4️⃣ Inside a real kernel search, the surrogate finds faster kernels than an equal-GPU-budget baseline by considering more candidates per measurement.

5️⃣ We release 12,388 LLM-generated GPU kernels with measured runtimes spanning 118 operations, CUDA and Triton backends, 3 GPU types, taking 400M tokens + 600 GPU-hours to produce. This dataset can be used for analyzing LLM-driven evolutionary program search dynamics, post-training LLMs for kernel code generation, and things we didn’t get a chance to explore, like training reward models!

Thread 🧵👇

15K

UTCompSci retweeted

Yuke Zhu @yukez

5 days ago

Exciting news on GR00T: NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration. R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.

110

14K

Who to follow

Siebel School of Computing and Data Science

@siebelschool

Leading global engineering at The Grainger College of Engineering at the University of Illinois Urbana-Champaign.

UW–Madison Computer Sciences

@WisconsinCS

Leaders in computing since 1964. 📍 Part of @uwcdis at @uwmadison.

CMU School of Computer Science

@SCSatCMU

The School of Computer Science at @carnegiemellon is one of the world's premier institutions for computer science research and education.

UTCompSci retweeted

Swarat Chaudhuri

@swarat

12 days ago

Delighted to finally unveil these results! 🎉 Many congratulations to the team, who worked tirelessly for almost a year to build and evaluate AlphaProof Nexus. We revised many priors during this project — most notably, we discovered that with current frontier models, simple agent loops with compiler feedback can rival more sophisticated systems. We were struck both by the capabilities of our systems and the magnitude of the challenges ahead. I have never been as excited about the potential of formal math to enhance human creativity and bring rigor to AI. Onward! 🚀

28K

UTCompSci retweeted

Chenfeng_X

@Chenfeng_X

18 days ago

Excited that our paper StreamdiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀Video generation is quickly moving from demos to production-facing workloads. It is no longer a turn-based pipeline but should be a streaming pipeline to interact with users. 📖Our project page: https://t.co/ItuO5zc6hT and paper: https://t.co/fmz2irYIm1 👂Come join the talk if you are interested in streaming video generation. Our talk will be at the Research Track Oral Presentation: Best Paper Session on Tue 8:45AM at #MLSys26 , I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout out to Tianrui Feng, Zhi Li, @Andy_ShuoYang , @HaochengXiUCB, @lmxyy1999 , @lvminzhang , @xiuyu_l , Keting Yang, @ZiqiPeng, @songhan_mit , @magrawala, @KurtKeutzer , and @cumulo_autumn

216

58K

UTCompSci retweeted

Joydeep Biswas @Joydeepb_robots

29 days ago

Are state-of-the-art AI review systems capable of providing meaningful reviews in an actual AI conference? This paper explains the findings from the AAAI 2026 AI Review Pilot 1/N

UTCompSci retweeted

Peter Stone @PeterStone_TX

11 days ago

Proud of Texas Robotics! Come see our papers at ICRA: https://t.co/zEEHtr9q4V

UTCompSci retweeted

Elias Stengel-Eskin

@EliasEskin

17 days ago

🚨 Excited to share MINTEval, a new benchmark for memory with interference. In real-world settings, agents need to handle continuously changing info (think of all your v2.5_final_final docs) . MINTEval tests memory systems on frequent and interfering changes, across challenging question types (long-range lookback/recover, multi-target reasoning) and 4 realistic domains that challenge even the strongest models/agentic memory systems. 🧵👇

UTCompSci retweeted

Duy Nguyen

@duynguyen772

16 days ago

Sparse binary rewards bottleneck LLM RL, motivating the use of privileged information in self-distillation as dense teachers. How can we use and balance multiple types of privileged info: leveraging stable cross-view info, while preserving view-specific info? Current on-policy self-distillation methods often condition the teacher on only one type of privileged view: full solution, partial rationale, answer-only, reference code, feedback, etc. This can be suboptimal: 1️⃣ No single privileged view consistently performs best when used as a teacher. 2️⃣ Views can introduce teacher-specific artifacts from information unavailable to the student. 🧠 Adaptive-View Self-Distillation (AVSD) considers multiple privileged views jointly as a teacher family, balancing cross-view consensus and view-specific signals through a token-level gate to construct better dense learning signals. 🧵👇

duynguyen772's tweet photo. Sparse binary rewards bottleneck LLM RL, motivating the use of privileged information in self-distillation as dense teachers. How can we use and balance multiple types of privileged info: leveraging stable cross-view info, while preserving view-specific info?

Current on-policy self-distillation methods often condition the teacher on only one type of privileged view: full solution, partial rationale, answer-only, reference code, feedback, etc. This can be suboptimal:

1️⃣ No single privileged view consistently performs best when used as a teacher.
2️⃣ Views can introduce teacher-specific artifacts from information unavailable to the student.

🧠 Adaptive-View Self-Distillation (AVSD) considers multiple privileged views jointly as a teacher family, balancing cross-view consensus and view-specific signals through a token-level gate to construct better dense learning signals.

🧵👇

26K

UTCompSci retweeted

hyunji amy lee @hyunji_amy_lee

17 days ago

LLM agents & memory systems operate in continuously updated environments (Git repos, evolving docs). They must process long contexts, recover earlier information, and reason over many updates that create interference between old and new information. How well do they handle this? We introduce MINTEval: ✅ Frequent context changes & interference (avg. 86 updates) ✅ 5 challenging question types, including long-range lookback & reasoning over multiple targets distributed across context ✅ 4 realistic domains: state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits ✅ Avg. 138.8k tokens per instance (up to 1.8M) ✅ Human verification on generated QAs = 95.6% 📊 Across 7 representative systems, MINTEval remains difficult, showing an avg. acc of 27.9%, and the best system reaches only 33.4%. 🔎 Our analysis shows: • Memory construction failures cause a 41.7% drop • Memory agents are highly sensitive to design choices • Memory systems have a strong bias toward insertion operations (76.8%) over deletion/update

hyunji_amy_lee's tweet photo. LLM agents & memory systems operate in continuously updated environments (Git repos, evolving docs). They must process long contexts, recover earlier information, and reason over many updates that create interference between old and new information. How well do they handle this?

We introduce MINTEval:
✅ Frequent context changes & interference (avg. 86 updates)
✅ 5 challenging question types, including long-range lookback & reasoning over multiple targets distributed across context
✅ 4 realistic domains: state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits
✅ Avg. 138.8k tokens per instance (up to 1.8M)
✅ Human verification on generated QAs = 95.6%

📊 Across 7 representative systems, MINTEval remains difficult, showing an avg. acc of 27.9%, and the best system reaches only 33.4%.

🔎 Our analysis shows:
• Memory construction failures cause a 41.7% drop
• Memory agents are highly sensitive to design choices
• Memory systems have a strong bias toward insertion operations (76.8%) over deletion/update

107

23K

Computer Science at UT Austin @UTCompSci

18 days ago

Congratulations to UT Computer Science Ph.D. student Yeonju Ro @j777ro on being named a 2026 MLCommons ML and Systems Rising Star!

MLCommons @MLCommons

18 days ago

Introducing the 2026 @MLCommons Rising Stars! 🌟 We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI. Meet the cohort: https://t.co/yovGC0i7Sv #AI #MLCommons

MLCommons's tweet photo. Introducing the 2026 @MLCommons Rising Stars! 🌟
We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI.
Meet the cohort: https://t.co/yovGC0i7Sv
#AI #MLCommons https://t.co/drqspG75Cv

652

UTCompSci retweeted

VITA Group @VITAGroupUT

about 1 month ago

Yeonju’s research tackles reliability and efficiency in agentic LLM workflows. In her recent work Sherlock 📷, informed verification + speculative execution deliver +18.3% accuracy and up to 48.7% latency reduction. Worth a read! https://t.co/nszLUBtZzU

338

UTCompSci retweeted

VITA Group @VITAGroupUT

about 1 month ago

Huge congrats to our very own @j777ro on being named a 2026 @MLCommons ML & Systems Rising Star, one of just 39 awardees this year! 📷 She’ll be heading to the workshop hosted by @AMD in Santa Clara this July. So well deserved! https://t.co/k0F5u6pyln

706

UTCompSci retweeted

UT Game Development and Design @UT_GameDev

22 days ago · Austin

Digital Demo Day brought our community together one more time this year to celebrate the creativity and hard work of students across all UT departments — and wow, did they deliver. Thank you for being part of this semester's story. See you in the fall! 🎮🤘

131

UTCompSci retweeted

Isil Dillig @IsilDillig

25 days ago

📢 I’m looking to hire a postdoc to work closely with me and my research group at UT Austin on exciting topics in core PL/FM, as well as applications of PL/FM ideas to other areas. If you are interested, or know someone who might be a great fit, please DM me!

13K

UTCompSci retweeted

Joykirat

@joykiratsingh

24 days ago

🚨Excited to announce Agent-BRACE! LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state + policy models, jointly trained via RL. Key takeaways: 1️⃣ 🎯The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history. 2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length. 3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic belief decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode. 👇🧵

joykiratsingh's tweet photo. 🚨Excited to announce Agent-BRACE!

LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state + policy models, jointly trained via RL.

Key takeaways:

1️⃣ 🎯The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history.

2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length.

3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic belief decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode.

👇🧵

16K

UTCompSci retweeted

Caroline Wang

@CarolineWang98

4 months ago

[1/n] Just wrapped up 7 months interning with @pcastr at DeepMind and I'm so excited to share our work: https://t.co/SsHsksxO3i. TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior

CarolineWang98's tweet photo. [1/n] Just wrapped up 7 months interning with @pcastr at DeepMind and I'm so excited to share our work: https://t.co/SsHsksxO3i.
TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior https://t.co/cMpnHmNHJ5

329

214

41K

UTCompSci retweeted

UT Austin @UTAustin

27 days ago

Congratulations to the Class of 2026! 🎓🧡 We can't wait to see how you change the world 🤘

332

21K

UTCompSci retweeted

UT Dean of Students @UTDoS

29 days ago

Congratulations, Class of 2026! 🎓 We're excited to celebrate this unforgettable achievement with you at commencement tomorrow! For important updates, text UTGRAD26 to 888777 to receive notifications about parking, traffic, weather and ceremony updates. Hook 'em forever 🤘🧡

UTDoS's tweet photo. Congratulations, Class of 2026! 🎓

We're excited to celebrate this unforgettable achievement with you at commencement tomorrow! For important updates, text UTGRAD26 to 888777 to receive notifications about parking, traffic, weather and ceremony updates. Hook 'em forever 🤘🧡 https://t.co/v6E2lKAHeM

172

UTCompSci retweeted

UT Austin @UTAustin

29 days ago

The clear bag policy is in place for ALL COMMENCEMENT EVENTS. Guests and graduates will not be allowed to enter with prohibited bags, including umbrellas, signs, and strollers. Review the guidelines before arriving on campus: https://t.co/5II3Y5ljnX #UT26

Computer Science at UT Austin

@UTCompSci

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users