Demetris Trihinas @dtrihinas - Twitter Profile

@MFarajtabar

about 1 year ago

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching?

The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks, but we found their fundamental limitations are more severe than expected.

In our latest work, we compared each “thinking” LRM with its “non-thinking” LLM twin. Unlike most prior works that only measure the final performance, we analyzed their actual reasoning traces—looking inside their long "thoughts". Our analysis reveals several interesting results ⬇️
📄 https://t.co/PjnYpVRdX3

Work led by @ParshinShojaee and @i_mirzadeh, and with @KeivanAlizadeh2, @mchorton1991, Samy Bengio.

dtrihinas retweeted

Jeff Dean

@JeffDean

over 1 year ago

If work you're proud of has been rejected from an important conference, just remember that it's a momentary blip and that the true judgement of impact will be whether others benefit from and build on your work. (In addition to @chelseabfinn's example below, I've had similar experiences: for example, @geoffreyhinton, @OriolVinyalsML and my paper on distillation was rejected from NeurIPS in 2015, so we published it in a workshop and put it on Arxiv, and it is now heavily cited and used quite often in practice).

dtrihinas retweeted

Hamel Husain

@HamelHusain

over 1 year ago

I started doing office hours on LLM evals and met with 8+ founders in the last 3 weeks. Common questions:

- Which components of our app do we start evaluating (RAG, tool calls, etc)?
- What metrics should I use?
- Where should I spend my time?

All have the same solution. LOOK AT THE DATA.

What does this mean though? It means look at your logs/traces - start with 30 or so. Start categorizing the errors and issues you see. Keep looking at logs and traces until you feel like you aren't learning anything new. In the end, you will know where your biggest issues are. You prioritize those! You will also get a sense of what is most important to measure (and how).

That's it. Look at data, build evals and tests prioritized by patterns in the data. If you don't have data, generate synthetic inputs/interactions into your LLM application so you can generate data.

I didn't make this technique up fwiw. These are fundamentals of building machine learning systems and is often referred to as "Error Analysis". It is a fancy word for looking at data, categorizing errors, and then doing data analysis on those errors to understand what to prioritize and work on.

I've documented some of the office hours, and you can see in all cases the solution was performing error analysis. Here are links to those: https://t.co/Z5CSGznzgP
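The error-analysis loop described in the tweet above (read traces, assign each failure a category, then count categories to decide what to work on first) can be sketched in a few lines of Python. This is only an illustrative sketch: the trace records, the `error` field, and the category names are hypothetical and do not come from the linked office hours.

```python
from collections import Counter

# Hypothetical, hand-labelled traces. The ids and category names are
# illustrative assumptions, not data from the office hours.
traces = [
    {"id": "t1", "error": "retrieval_miss"},
    {"id": "t2", "error": None},  # no issue found in this trace
    {"id": "t3", "error": "bad_tool_args"},
    {"id": "t4", "error": "retrieval_miss"},
    {"id": "t5", "error": "retrieval_miss"},
    {"id": "t6", "error": "bad_tool_args"},
    {"id": "t7", "error": "formatting"},
]


def rank_error_categories(labelled_traces):
    """Count traces per error category, most frequent first."""
    counts = Counter(
        t["error"] for t in labelled_traces if t["error"] is not None
    )
    return counts.most_common()


for category, count in rank_error_categories(traces):
    print(f"{category}: {count}")
```

The most frequent categories are the candidates for the first evals and tests; the labelling step itself stays manual, which is the point of looking at the data.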

Demetris Trihinas @dtrihinas

over 1 year ago

Happy to announce that the 3rd Intl Workshop on Testing Distributed #IoT Systems (TDIS) is moving to #ACM #EuroSys this year and will be held in Rotterdam, NL on Mar 31, 2025! The call for papers is already available on our website, and the paper submission deadline is Jan 24, 2025.

Demetris Trihinas @dtrihinas

over 1 year ago

@ThaleiaDimitra Wow! Congrats and all the best!

Demetris Trihinas @dtrihinas

over 1 year ago

The registration for the 1st Intl Workshop on Low Carbon Computing is not only open, but thanks to our sponsors, online attendance is free! LOCO 2024 will feature 2 keynotes from Anne Currie and Ayse Coskun, 21 paper presentations and 10 lightning talks! https://t.co/6B4NeWXun6

dtrihinas retweeted

Artificial Intelligence Lab - Univ. of Nicosia @AilabUnic

over 1 year ago

The @UNIC_ENG department of computer science and AILab are supporting the @BankofCyprus_ BoC 5.0 #fintech #hackathon. We will be providing mentoring and are hosting an open day on Oct 8 at 18:00. https://t.co/Qq52wWqBNq

Demetris Trihinas @dtrihinas

over 1 year ago

What an energetic experience #LIS2024 was! Had a nice time as a panelist talking about #AIEthics and its contributions to education. Thanks @CardetNGO for the invite and excellent hosting at @UNIC_ENG https://t.co/gJFhoEeG7m

Demetris Trihinas @dtrihinas

over 1 year ago

There are still 11 days until the deadline of the first workshop on low carbon computing! The CFP is online and there are two submission types: full paper and lightning talk. https://t.co/6B4NeWXun6

Demetris Trihinas @dtrihinas

about 2 years ago

We have a new research opening at the @AilabUnic in the area of energy and carbon-aware #dataops and #AI https://t.co/48DbZy8QHn

Demetris Trihinas @dtrihinas

about 2 years ago

Our FedBed paper about testing #Federated_Learning deployments is featured on @TheOfficialACM @GrowKudos showcase https://t.co/W0SNDsXiOg

dtrihinas retweeted

Artificial Intelligence Lab - Univ. of Nicosia @AilabUnic

over 2 years ago

During the first #youthtechfest held in #Cyprus the AILab held a panel session dedicated to advancing #education with #AI https://t.co/DioltE8225

Demetris Trihinas @dtrihinas

over 2 years ago

Extremely happy that our work on energy-aware data streaming for edge computing, initially inspired during the @RainbowH2020 project, received the best paper award at the #CloudCom2023 conference. @AilabUnic @LInC_UCY https://t.co/8aTghzBWIc

dtrihinas retweeted

Lauritz Thamsen @lauritzthamsen

over 2 years ago

Happy to share our special issue of Wiley's "Software: Practice and Experience" on #Benchmarking, #Experimentation Tools, and #Reproducible Results for #Edge #Cloud #Systems, https://t.co/uzI2r5WuoU. With our reviewers, @dbermbach, @dtrihinas, and I were able to include: [1/3]

Demetris Trihinas @dtrihinas

over 2 years ago

The dept of computer science @UNIC_ENG is co-sponsoring one of the largest #fintech #hackathon events in #cyprus, organised by the @BankofCyprus_. Excited to be helping out as a mentor, giving participants advice on MLOps, Big Data and Cloud services. https://t.co/bVv3nTDTqS

dtrihinas retweeted

Adam Selipsky @aselipsky

almost 3 years ago

For #PrimeDay this year, Amazon Aurora processed 318 billion transactions, stored 2,140 terabytes of data, and transferred 836 terabytes of data, while DynamoDB handled trillions of calls and peaked at 126 million requests per second. Always fascinating to see these mind-boggling numbers from @jeffbarr on how @awscloud powered this year’s record-breaking Prime Day. You all did a lot of shopping🛒!😊 https://t.co/wabQCe1zGZ

Demetris Trihinas @dtrihinas

about 3 years ago

Constantia Malekkou giving a talk at the Pancyprian conference on #Statistics in relation to #MachineLearning service point placement in #Cyprus @AilabUnic https://t.co/FxOA5FbNBg

Demetris Trihinas @dtrihinas

about 3 years ago

Super proud of Andreas Neocleous, a BSc student (yes!) in #DataScience, giving a talk on price trends and patterns on basic goods in #Cyprus https://t.co/fIk1Hd8PV4

dtrihinas retweeted

Artificial Intelligence Lab - Univ. of Nicosia @AilabUnic

about 3 years ago

Xenia Miscouridou, keynote speaker on inferential #ArtificialIntelligence at this year's Pancyprian #Statistics conference held at @UNIC_ENG https://t.co/bZoBopGSMi

dtrihinas retweeted

Fabian Hueske @fhueske

about 3 years ago

Big news: @ApacheFlink receives this year's @sigmod Systems Award! The project was started at TU Berlin back in 2009 but it would be nowhere today (feature and adoption-wise) without its awesome community. Thanks to everyone who contributed to Apache Flink! 👏🙏 https://t.co/ZWHKtZc6iS
