@OpenAI Experiencing issues with model 5.5 Thinking's ability to access/read attached .csv files. Either doesn't see the file attached at all or says the cells are "truncated with ellipses". Sustained over multiple sessions & different CSV files / sources.
@karpathy For outputs that are not overly complex, I've been really enjoying asking the LLM to "generate an image for a slide deck" that "summarizes the information" or "provides the key takeaways"
https://t.co/grwA5YSBKo took down their website (I'm getting 404) and unlisted all their YouTube content? They did synchronized video dubbing for multiple languages. Anyone know what's going on?
@OpenAI Only the most recent input can be edited now? Disappointing design decision. To manage the context window, users may need to go back to a certain part of the conversation and re-generate a different "stream" of context. As of today, only the most recent input is editable
@OpenAI love the new math visuals that are rolling out today. It seems like we'll have to guess and hope which concepts are enabled? Could you publish a full list or create a GPT that allows students to explore the various explanations without having to explicitly prompt them?
I see it's fixed now - thank you! Also, love the new image gen. FYI noticed on the browser, the output for image gen keeps saying pending even though on the app, if I go to the same thread I see the image gen is completed.
Emergent introspective awareness in LLMs
Anthropic used concept injection to test whether LLMs can introspect on their internal states. They found that:
- Claude Opus 4.1 and 4 detected injected concepts with 20% success at optimal layers. They distinguished internal "thoughts" from text inputs, and identified whether outputs were intentional.
- Models also modulated internal states when prompted (like "think about X").
- Introspective ability varied by model and context, suggesting emerging, unreliable, and mechanistically diverse self-awareness.
So there is a growing spectrum of mechanistic self-awareness emerging in today's LLMs.
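A toy sketch of what "concept injection" means mechanically, not Anthropic's actual code or setup: derive a concept direction as the difference between mean activations with and without the concept, add a scaled copy to a hidden state, then check how strongly the state points along that direction. All function names, vectors, and the strength value here are made up for illustration, and plain Python lists stand in for real model activations.

```python
# Toy illustration only: no real model, hypothetical names and numbers.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(with_concept, baseline):
    """Concept direction = mean(activations with concept) - mean(baseline)."""
    a = mean_vector(with_concept)
    b = mean_vector(baseline)
    return [x - y for x, y in zip(a, b)]

def inject(hidden, vector, strength):
    """Add a scaled copy of the concept direction to a hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]

def projection(hidden, vector):
    """Scalar projection of the hidden state onto the concept direction."""
    dot = sum(h * v for h, v in zip(hidden, vector))
    norm = sum(v * v for v in vector) ** 0.5
    return dot / norm

# Made-up "activations" recorded with and without the concept present.
with_concept = [[1.0, 2.0, 0.0], [1.0, 4.0, 0.0]]
baseline = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

vec = concept_vector(with_concept, baseline)
hidden = [0.5, 0.0, -0.5]
steered = inject(hidden, vec, strength=2.0)

# The projection grows after injection; the introspection question is
# whether the model itself can report that something was added.
print(projection(hidden, vec), projection(steered, vec))
```

The interesting part of the study is not the injection itself but whether the model notices and names the injected concept, which a sketch like this cannot show.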
@sama I'm noticing the agent gets hung up even more so than me on problematic UI design. Would it be possible to analyze the aggregate data and release best practices for web developers to make their web design more user-friendly for AI agents?
@sama @OpenAI Y'know how users can go back and edit a previous input & rerun it? Would be cool to be able to remove a previous input/output pair, too. Cld help keep the context in a thread "clean", reducing need to start a fresh thread.
@sama @OpenAI just want to make sure it's on your team's radar that reasoning decreases performance in certain creative tasks, such as writing. I know everyone loves o3, but I find it extremely frustrating to use compared to 4o for creative tasks. (1/3)
@ChristinaHartW will OpenAI be notifying users when the Gmail connection you demo'd is public? I see the Connections and am unsure if that is new. But it doesn't function the way you demonstrated.
1/ Wait, Bigfoot figured out how to run a startup without drowning in multitasking
It found Simular Pro, the world's first production-grade, computer-use agent that runs thousands of steps without a hiccup - working 24/7 so he didn't have to.
So how does Simular Pro work?
Most agents use LLMs - great explorers, but flaky when # steps grows; RPAs are stable but rigid. Simular Pro uses 2 agents, maximizing generality & repeatability:
- Neural: explores
- Symbolic: executes
Every action is editable, deterministic code.
Excited to be presenting at #AOM2025 on Sunday! Thank you to #TLCAOM for your sponsorship! Picked up my poster from Copenhagen Print today. Looking forward to connecting with faculty and higher ed leadership interested in offering an introductory course on generative AI. #genAI