BG Anders @bg_anders - Twitter Profile

1 day ago

If true, interesting to see what the US admin does now. By current "standards" every model from now on (which will certainly be better than Mythos) can be banned because "it's dangerous".

Andrew Curran

@AndrewCurran_

2 days ago

A new, more capable version of Mythos has emerged from training. I don't know whether it will be called Mythos 5.1 or Mythos 6, or if Anthropic will keep it internal to accelerate further development - but it has arrived. Stopping models like Fable 5 or Mythos 5 from being served to the public does nothing to slow down development. In fact, it probably speeds it up slightly by freeing up resources. There are also no rules preventing the labs from continuing to advance capabilities while any current model is under embargo - or from keeping progress quiet until they choose to release it. None of them can afford to pause or slow down. We need only look at how capable GLM-5.2 is as proof of this. To protect their business models, the frontier labs must continually train increasingly capable systems to stay ahead of open source, and each other. The current continues to rage beneath the ice, and we continue to race toward our destination.

283

4K

408

973

1M

0

59

bg_anders retweeted

Andrew Curran

@AndrewCurran_

9 days ago

https://t.co/yUVogvdruU

259

4K

645

5K

2M

BG Anders

@bg_anders

8 days ago

@ProfAviLoeb @RichAC2020 You're missing @ericweinstein's unique perspective.

0

4

0

141

bg_anders retweeted

Satya Nadella

@satyanadella

9 days ago

https://t.co/vLmiBKTtX3

3K

41K

8K

56K

66M

BG Anders

@bg_anders

10 days ago

@kimmonismus Anthropic's guardrails were ridiculous already, then the US government made it worse. We're going straight to a split where some will have access to higher intelligence and some will not.

0

3

0

247

BG Anders

@bg_anders

25 days ago

@kimmonismus https://t.co/Jc4pnYerHS

BG Anders

@bg_anders

25 days ago

Opus 4.8 first impression after 4 hours of use inside a complex project:
- def better than 4.7
- much better in conversations
- clearer/deeper reasoning
- reads intent better
- improved 'read between lines'
- good thinking partner
- not bamboozled by the long context

Overall, a very positive first impression. Since we reasoned a very good plan together, I'll let it implement in the repo and come back with results later. From me, a thumbs up @AnthropicAI

0

1

0

24

0

10

bg_anders retweeted

David Sacks

@DavidSacks

about 2 months ago

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months.

It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems.

The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense.

Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

274

5K

565

1K

1M

BG Anders

@bg_anders

about 2 months ago

GPT 5.5 did this :))

0

6

BG Anders

@bg_anders

2 months ago

LOL @OpenAI here is a mini-bug: in a relaxed conversation with 5.5 I asked about the new img gen 2, like how the switch happens from conversation to generating, and it immediately started generating an image without answering.

0

15

BG Anders

@bg_anders

2 months ago

@theo First feeling is good; not done torturing it yet.

0

361

BG Anders

@bg_anders

2 months ago

Feels like a great improvement over its predecessor, not so uptight in conversations. First impression is very good, but we'll know in a day or two if this "personality" is also great at complex work.

OpenAI

@OpenAI

2 months ago

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

2K

51K

7K

9K

13M

0

6

BG Anders

@bg_anders

2 months ago

@stevibe adaptive thinking... https://t.co/bZku1X1Haj

BG Anders

@bg_anders

2 months ago

Compared Opus 4.6 to 4.7 on a full end-to-end complex run. Then had 4.7 compare the outputs: "... honest answer: the old version is better than mine on more axes than it loses". Tested some more, and I don't get a glimpse of that 4.5 to 4.6 OMG-moment. Working theory so far is the Adaptive Thinking that Anthropic set for this model compared to 4.6 Extended Thinking. This is because the tested tasks were reasoning-intensive, and reasoning is where 4.7 was weaker. I'll do intense coding today and am curious how it behaves with CC, but I hope Anthropic lets us decide the thinking and not put in a variable they can control based on the availability of their compute.

0

142

0

105

BG Anders

@bg_anders

2 months ago

@jonathanstark way bigger than any human innovation that we know of

0

4

BG Anders

@bg_anders

3 months ago

@DonJohnson4220 @BrianRoemmele @grok and by latest, you mean E4B?

0

1

BG Anders

@bg_anders

3 months ago

@jonathanstark with self-improvement, there is no upper limit; and even if all progress stopped today, it's already beyond mobile

0

11

BG Anders

@bg_anders

3 months ago

@karpathy Sounds great. For the "catalog" I opted for vector, not md - just references, what this document is about, which project, key topics, plus a reference to where the full file lives. All automated, MCP, accessible to any model I work with.

0

1

0

93

bg_anders retweeted

Andrej Karpathy

@karpathy

3 months ago

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

3K

60K

7K

107K

21M
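
For readers curious what a "small and naive search engine over the wiki" from the post above might look like, here is a minimal sketch. It is not Karpathy's code: the function names (`tokenize`, `search_wiki`, `demo`), the plain term-count scoring, and the sample files are all assumptions made for illustration. It only assumes what the post states, namely that the wiki is a directory tree of .md files.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_wiki(wiki_dir: Path, query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank .md files under wiki_dir by how often the query terms occur in them.

    Hypothetical scoring: the sum of raw term counts, with no stemming or weighting.
    """
    terms = set(tokenize(query))
    scores = []
    for path in sorted(wiki_dir.rglob("*.md")):
        counts = Counter(tokenize(path.read_text(encoding="utf-8")))
        score = sum(counts[term] for term in terms)
        if score > 0:
            # Store a POSIX-style relative path so results look the same on any OS.
            scores.append((path.relative_to(wiki_dir).as_posix(), score))
    # Highest score first; ties are broken by path for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


def demo() -> list[tuple[str, int]]:
    """Build a tiny throwaway wiki (made-up content) and run one query against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "concepts").mkdir()
        (root / "concepts" / "rag.md").write_text(
            "RAG retrieval notes. Retrieval again.", encoding="utf-8"
        )
        (root / "index.md").write_text(
            "Index of articles, including retrieval.", encoding="utf-8"
        )
        (root / "slides.md").write_text("Marp slide deck.", encoding="utf-8")
        return search_wiki(root, "retrieval")


if __name__ == "__main__":
    for name, score in demo():
        print(f"{score:3d}  {name}")
```

A function like `search_wiki` could be wrapped in a small CLI so that an LLM agent can call it as a tool, which is the usage the post describes; at a larger scale a real index would likely replace the full re-read of every file on each query.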
