@mathandcobb These are great, thanks for making them.
I am curious about the smallest number of nodes where the new solution beats the old. Is it easy for you to generate this? I think it might be interesting and demonstrative.
Following up on the suggestion from Will Sawin, here is an illustration of the new configurations that disprove Erdos' unit distance conjecture (made with the help of ChatGPT 5.5 Thinking).
@Jonathan_Blow This might be an interesting project to look at. The chart was generated by ChatGPT cloning the repo and running a script it wrote to do the calculation, so I'm not sure how accurate it is.
https://t.co/6JlhbgHZKu
NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today.
https://t.co/ICEyakS6T5
And ... it is a VERY cool large codebase to work with in the LLM era.
This scoring method still seems unnecessarily arbitrary and confusing. What about reporting something much more transparent, consistent, and meaningful like this:
Report on Human Test Takers:
Games Solved - Median Human Attempt: 18/25 (did the median game attempt solve all levels within the game)
Games Solved - Top Human Attempt: 25/25
Levels Solved - Median Human Game Attempt: 78/150 (how many levels of a game did the median game attempt solve, not did median level attempt solve level)
Levels Solved - Top Human Game Attempt: 150/150
Report on AI being Benchmarked:
Games Solved: 1/25
Levels Solved: 42/150
Games Solved using <= the median human's moves: 1/25
Games Solved using <= the top human's moves: 0/25
Levels Solved using <= the median human's moves: 3/150
Levels Solved using <= the top human's moves: 0/150
Level Score = (1pt for Solved Level + 1pt for Solved in <= Median Human moves + 1pt for Solved in <= Top Human moves) * Level Number
So for a game with 6 levels, a full game score is 3+6+9+12+15+18=63
Then just sum the game scores. If you really want a percent and not just the raw number, which would be fine in my view, just divide the score by the highest possible score.
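Here is a minimal sketch of that proposed scoring in Python, to show how little machinery it needs. The names (`LevelResult`, `level_points`, `game_score`, `max_game_score`, `percent_score`) are made up for illustration and are not from any existing benchmark code. It assumes an unsolved level earns 0 points, which the formula implies but does not state outright.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LevelResult:
    """One AI attempt at one level (hypothetical record layout)."""
    level_number: int        # 1-based position of the level within its game
    solved: bool
    ai_moves: int            # moves the AI used; ignored when not solved
    median_human_moves: int  # moves used by the median human attempt
    top_human_moves: int     # moves used by the top human attempt


def level_points(r: LevelResult) -> int:
    """Up to 3 points, weighted by level number."""
    if not r.solved:
        return 0  # assumption: no partial credit for an unsolved level
    points = 1  # solved at all
    if r.ai_moves <= r.median_human_moves:
        points += 1
    if r.ai_moves <= r.top_human_moves:
        points += 1
    return points * r.level_number


def game_score(levels: List[LevelResult]) -> int:
    """Sum of the level points for one game."""
    return sum(level_points(r) for r in levels)


def max_game_score(num_levels: int) -> int:
    """Best possible game score: 3 * (1 + 2 + ... + num_levels)."""
    return 3 * num_levels * (num_levels + 1) // 2


def percent_score(games: List[List[LevelResult]]) -> float:
    """Total score across games divided by the highest possible score."""
    total = sum(game_score(g) for g in games)
    best = sum(max_game_score(len(g)) for g in games)
    return 100.0 * total / best


print(max_game_score(6))  # 63, matching 3+6+9+12+15+18
```

The raw total is just `sum(game_score(g) for g in games)`; the percent is optional on top of that.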
@ThePrimeagen The scoring is not linear, does not use a layman's understanding of how a game normally decides score, and is not the same as previous ARC scoring methods.
level_score = (human_baseline_actions / ai_actions) ^ 2
https://t.co/rhWI1FPMYA
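To see what "not linear" means in practice, here is a small sketch that evaluates only the formula quoted above. The function and parameter names mirror the formula, the baseline of 100 actions is an arbitrary number for illustration, and any cap or aggregation the real benchmark applies on top of this is not modeled here.

```python
def level_score(human_baseline_actions: int, ai_actions: int) -> float:
    """Squared ratio of human baseline actions to AI actions, as quoted."""
    if ai_actions <= 0:
        raise ValueError("ai_actions must be positive")
    return (human_baseline_actions / ai_actions) ** 2


# Hypothetical baseline of 100 human actions, chosen only for illustration.
for ai_actions in (100, 150, 200, 400):
    print(ai_actions, round(level_score(100, ai_actions), 4))
```

Using twice as many actions as the human baseline gives 0.25 on that level, not 0.5, which is the part a layman would not expect.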
@Jonathan_Blow @rnmp Just wanted to make sure. If the net is that large, then maybe solipsism, why there is something instead of nothing, what it means for something to be true, and when something is alive as well.
The demo we are showing is an RL agent on a laptop playing an unmodified Atari with a robotic controller moving the joystick and a camera looking at the screen.
Only those who benefit from the Coasian system would stay. The point of mentioning deadweight loss is that it rules out utopia: everything is more expensive, the velocity of money slows, growth slows, and these effects compound. Everyone there becomes an ancient tribe that no one else is allowed to visit, living in ancient ways despite the many benefits available elsewhere, which they won't allow.
You are a lumberjack in 1600 who primarily cuts firewood. Now the government is somehow enforcing that you earn a percentage of what? All money spent on heating, no matter the source? What happens when your experience and job are not directly encompassed in some new technology? What enforces the percentage you get versus the guy who does Y or Z?
Coasian economics is just UBI and socialization of industry after the early rounds of 'ownership'.
@ericweinstein This incentivizes dysgenic behavior, where money and power are controlled by whoever has the most bodies instead of whoever is the most effective.
@sgyahn @ericweinstein If you distribute resources based on population, especially a majority of resources, there is an incentive to increase the population aligned with your interests regardless of any dysgenic societal cost or effects. That applies to a family unit, religious group, ethnic group, political group, etc.
@BigShamTheBeard @ericweinstein I think I may have formulated my response wrong. I assumed that the real issue, a dysgenic expansion incentive, was obvious. It might not have been. Maybe the questions would be more directly addressed if that were the headline rather than the examples supporting it.
@sgyahn @ericweinstein I think he would, but he is just a busy, famous person. Hopefully these concerns are a bit more on his radar now and he will get ahead of them either with a good answer, or with something besides Coase.
A Douglas dividend has the same dysgenic expansion incentive in my view.
Whatever comes next needs to address these concerns. If it incentivizes right assignment by population, that leads to a dysgenic flood of participants, human or AI.
The central goal should be incentivizing values like truth, predictive power, goodwill, and order, not expansion.
Dilution and caps both have inevitable failure states that should be unacceptable.