@AnthropicAI Einstein wouldn’t have ever come up with Special Relativity if he was hired at a frontier lab. Anthropic are chasing the wrong kind of data required for transformational science.
@socialwithaayan I think the most promising avenue in AI for research is accelerating ideation and conceptual foundations.
I built the tool for this at usepythagoras [dot] com
I built Obsidian for real researchers:
Extracts key concepts from your messy research notes.
Converts these concepts into nodes.
Enriches nodes with topic, subject, and equation features
Connects nodes with edges through semantic similarity in the feature space
Link below
@anthonykrose@demishassabis Glad to see others are thinking about this problem. Context management is crucial in allowing agents to deeply understand niche research fields.
We’ve building https://t.co/45DeGixKzQ that leverages messy human thoughts and notes through agent context.
A researcher’s best thinking is lost.
Scattered across scraps of paper, blackboards, and scribbled notebooks.
We built an AI notebook that lets messy human chains of thought compound into a living knowledge base.
@kevinweil@OpenAI How well does this catch deeper conceptual errors like physical interpretation, modelling assumptions, or symmetry constraints? It seems like these classes of errors are the main source of problems in publications in the theoretic sciences.
@ChaseBrowe32432 By saturated benchmark, I think it’s meant that any progress beyond a certain threshold isn’t a meaningful indicator of model performance. Benchmark contamination has taken place here, resulting in apparent improvement.
https://t.co/eHme7GjCh0
@ns123abc I don’t think significant value creation will occur in bio without models that are robust theory builders. There’s a fundamental difference value-creation-wise between coding and scientific research.
What happens when you train LLMs on your entire personal conversation data?
We found that the model retains your preferences, personality, and judgement.
We used this technique to win the Anthropic AgentVerse Hackathon, where we built digital twins to automate work meetings.
3) Meeting Automation
A meeting lead (task delegator) suggests actions for the agents to complete. The agents communicate in shared context and decide upon what delegation aligns with each agent's preferences.
2) Deployment
The post-trained models are then deployed on CAIPE (Cisco's multi-agent orchestration infrastructure). The models behave as digital twins of different people, with shared context.
1) Data Pipeline
We extracted all personal messages on iMessage, WhatsApp, and Gmail. Amplified the data using synthetic generation from an LLM with context of the entire dataset. This allows us to run post-training at scale on the personal data.