Marko @marko11c - Twitter Profile

marko11c retweeted

29 days ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

8K

150K

11K

14K

28M

Marko

@marko11c

2 months ago

@levelsio Yup, I think this will be unlocked with better harness engineering sooner than we think. Eventually, you or your agent will be able to query a platform to complete any task you need done at an insanely low cost.

0

473

marko11c retweeted

Harrison Chase

@hwchase17

2 months ago

https://t.co/aZnKZ1SVXB

108

4K

520

11K

2M

marko11c retweeted

Paul S. Conyngham

@paul_conyngham

3 months ago

https://t.co/bpa3HHt8Mg

245

5K

931

5K

3M

marko11c retweeted

Michael Truell

@mntruell

4 months ago

We believe Cursor discovered a novel solution to Problem Six of the First Proof challenge, a set of math research problems that approximate the work of Stanford, MIT, Berkeley academics. Cursor's solution yields stronger results than the official, human-written solution. Notably, we used the same harness that built a browser from scratch a few weeks ago. It ran fully autonomously, without nudging or hints, for four days. This suggests that our technique for scaling agent coordination might generalize beyond coding.

264

8K

504

3K

1M

marko11c retweeted

Anthropic

@AnthropicAI

4 months ago

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. https://t.co/rM77LJejuk

4K

56K

9K

17M

marko11c retweeted

Andrej Karpathy

@karpathy

4 months ago

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.
Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.
As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You’re spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel.
The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

2K

37K

5K

20K

5M

Marko

@marko11c

4 months ago

@KobeissiLetter And this is just the start

0

16

Marko

@marko11c

4 months ago

METR

@METR_Evals

4 months ago

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.

https://t.co/EuRBIrEcVx

225

4K

450

1K

4M

0

54

marko11c retweeted

Boris Cherny

@bcherny

6 months ago

@YashGouravKar1 Correct. In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code

124

3K

311

728

1M

Marko

@marko11c

6 months ago

@karpathy Would love to see a 1-2 hour YouTube video of you building with Claude Code!

0

10

marko11c retweeted

Dwarkesh Patel

@dwarkesh_sp

7 months ago

The @ilyasut episode
0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!

403

9K

1K

8K

4M

Marko

@marko11c

7 months ago

In one of the traces from the game, Gemini 2.5 Pro thought it got finessed by Gemini 3 Pro and sought vengeance:

Andon Labs

@andonlabs

7 months ago

In the first-ever Vending-Bench Arena game, Claude Sonnet 4.5, GPT 5.1, Gemini 2.5 Pro, and Gemini 3 Pro competed to win the local vending machine market. Gemini 3 Pro made more money than the other three contestants combined.

https://t.co/4CIJsolqAc

1

38

1

2

3K

0

1

0

120

Marko

@marko11c

8 months ago

The truth lies somewhere in the middle

Wojak Codes

@wojakcodes

8 months ago

bro is literally him

115

54K

3K

4K

2M

0

121

Marko

@marko11c

8 months ago

@OfficialLoganK 100%. Also, Python should be taught right along with learning to read and write

0

16

Marko

@marko11c

10 months ago

@levelsio Their best shot is to buy something like Perplexity.

0

1

0

42

Marko

@marko11c

10 months ago

@paulg In the short term, yes.

0

3

0

233

Marko

@marko11c

10 months ago

It was a great run

0

2

0

30

marko11c retweeted

Sam Altman

@sama

11 months ago

gpt-oss is a big deal; it is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4-mini, that you can run locally on your own computer (or phone with the smaller size). We believe this is the best and most usable open model in the world.
We're excited to make this model, the result of billions of dollars of research, available to the world to get AI into the hands of the most people possible. We believe far more good than bad will come from it; for example, gpt-oss-120b performs about as well as o3 on challenging health issues.
We have worked hard to mitigate the most serious safety issues, especially around biosecurity. gpt-oss models perform comparably to our frontier models on internal safety benchmarks.
We believe in individual empowerment. Although we believe most people will want to use a convenient service like ChatGPT, people should be able to directly control and modify their own AI when they need to, and the privacy benefits are obvious.
As part of this, we are quite hopeful that this release will enable new kinds of research and the creation of new kinds of products. We expect a meaningful uptick in the rate of innovation in our field, and for many more people to do important work than were able to before.
OpenAI’s mission is to ensure AGI that benefits all of humanity. To that end, we are excited for the world to be building on an open AI stack created in the United States, based on democratic values, available for free to all and for wide benefit.

1K

21K

2K

4K

3M

Marko

@marko11c

11 months ago

What could go wrong

Anthropic

@AnthropicAI

11 months ago

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

https://t.co/PPX1oXj9SQ

226

6K

878

4K

1M

0

2

0

78
