@jxmnop CVPR: a 2.5-hr shift for 500 papers gives approx. 18 secs/paper of skimming time. Just enough to read titles, assuming you take zero breaks. It was good tho that similar papers were clustered together. NIPS is even wilder I guess.
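Quick sanity check on that 18 secs/paper figure. The function name is just illustrative; the numbers are the ones from the tweet.

```python
def seconds_per_paper(shift_hours: float, num_papers: int) -> float:
    """Average skimming time per paper, assuming no breaks at all."""
    return shift_hours * 3600 / num_papers


# 2.5 hours for 500 papers
print(seconds_per_paper(2.5, 500))  # 18.0
```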
Had a hilarious chat with Gemini while debugging code! 😂 Got a random "Covid-19 Safety Measures" image mid-conversation. Gotta love those overused pretraining data surprises!
(1) the reasoning paths learnt via RLVRs are not novel, i.e., they already exist in the base model,
(2) RLVRs make the reasoning path narrower, while the base model has wider reasoning paths.
Paper Link: https://t.co/p2p2iGOLuD
A paper that breaks down the shortcomings of RLVRs (e.g., GRPO), which have been the go-to methods for training reasoning models these days. The authors have interesting findings...
[2/2]
Paper: Learning to Reason without External Rewards
ArXiv: https://t.co/oxcHWYX1Sx
TL;DR: An LLM's own confidence score can serve as the RL reward signal, so there is no cost for labeling preference data.
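A minimal sketch of the idea, not the paper's actual method. Assumptions: confidence is approximated here by the mean log-probability of the generated tokens (the paper may define it differently), and rewards are normalized within a group of sampled outputs, GRPO-style. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_reward(step_logits, token_ids):
    """Mean log-probability of the chosen tokens (assumed confidence proxy)."""
    logps = [
        math.log(softmax(logits)[tok])
        for logits, tok in zip(step_logits, token_ids)
    ]
    return sum(logps) / len(logps)


def group_advantages(rewards):
    """Center and scale rewards within one group of sampled outputs."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero when all rewards match
    return [(r - mean) / std for r in rewards]


# Two toy outputs over a 3-token vocabulary: one peaked, one nearly uniform.
confident = confidence_reward([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0, 1])
unsure = confidence_reward([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]], [0, 1])
advantages = group_advantages([confident, unsure])
```

The more confident output ends up with the positive advantage, so a policy-gradient update would push the model toward it without any external label.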
[1/n]
Here is a fresh paper on Reinforcement Learning from Internal Feedback (RLIF) that I found this week. It claims that LLMs can use their own confidence score as a reward signal to optimize toward preferred outputs, without relying on external rewards or labeled data.
TL;DR..
[1/n]
An interesting paper called "ICLR" 👀 published at ICLR'25:
LLMs can capture semantics in their layers' representations (token features) based on their pretraining data. While LLMs typically reflect the semantics they’ve seen during pretraining, they’re also capable ....
[n/n]
Their findings show that LLMs do adjust their representations to reflect the same graph. Also, scaling the context (using longer context prompts) helps to make the graph more refined.
[2/n]
of in-context learning, meaning they can pick up new context from the input prompt itself. The authors investigated how these newly introduced, unseen contexts affect the model’s internal representation structure. To study this, they designed a toy graph tracing experiment.
A few days ago, my phone alerted me to bad weather, urging caution and advising against long-distance travel. If only the Nepal govt. had acted similarly months ago, 100s of flood deaths could have been avoided. Instead, this was our PM's shameless comment:
https://t.co/qAX6JYPQwB
Mind-muscle connection is real. It shapes muscle tension, posture, and range of motion. By tweaking these details, we can create the right stretch. Fun fact: You can engage muscles with your mind—even while standing still.
@toughresearcher To sum up your point, papers should outline their spectrum of impact. With a wider spectrum, you must push towards exploration, whereas a narrower spectrum can push towards exploitation.
[Discussion] The competition to get SOTA results on specific datasets is huge in ML research. Researchers employ massive hyperparameter tuning to surpass their predecessors. Also, they report the best result among multiple seeds. Is this flow of research healthy or unhealthy?
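To see why best-of-seeds reporting inflates numbers, here is a toy simulation. Everything in it is made up for illustration: a hypothetical model with a "true" accuracy of 0.80 and Gaussian seed-to-seed noise of 0.01, run over 10 seeds.

```python
import random
import statistics


def simulate_runs(true_acc, noise_std, num_seeds, rng):
    """Draw one noisy accuracy per seed around a fixed true accuracy."""
    return [rng.gauss(true_acc, noise_std) for _ in range(num_seeds)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
runs = simulate_runs(true_acc=0.80, noise_std=0.01, num_seeds=10, rng=rng)

best = max(runs)               # what a "best seed" report would show
mean = statistics.mean(runs)   # what a mean-over-seeds report would show
std = statistics.stdev(runs)   # the spread a best-only number hides

print(f"best seed:  {best:.4f}")
print(f"mean ± std: {mean:.4f} ± {std:.4f}")
```

The best seed can never be below the mean, and the gap tends to grow with more seeds, so a best-only number overstates what the method delivers on a typical run.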
Today, we (6 of us friends) were on a 3-hour-long video call with constant laughter 😂 and jokes. We were taking turns to roast one guy at a time. Hadn't laughed like that in a long time.