Excited to share FrontierSWE!
FrontierSWE is an ultra-long-horizon, extremely hard coding-agent benchmark.
Models get 20 hours to solve tasks spanning greenfield implementation, research, and performance optimization.
No frontier model is currently capable of solving *any* task in FrontierSWE.
Something a little different: I recently published an ethics paper in AI & Society that shows how granting moral status to AIs can allow them to hijack our moral landscape. Check it out if that sounds interesting: https://t.co/fRdkL6FLNk
I'm open-sourcing Steer 🐂: a terminal-first AI coding agent written in Rust 🦀.
It has a TUI for interactive use, headless mode for automation, and can run as a gRPC server for programmatic usage. Steer is fast and built for engineers.
And the winners are... @SohilAthare @SeverTopan @brendanigraham @m_catoen.
Congratulations to the winning teams and thanks to all participants! See you next time.
@SeverTopan is going to give a spotlight presentation on our recent work with @david_rolnick on learning and solving constraints over raw pixels at 4:30PM PST. Please come and have a chat if you are interested. https://t.co/4vmM3ozAl8
Happy to share that this paper will be a spotlight at #NeurIPS2021 - looking forward to @SeverTopan's presentation! Many areas stand to benefit from better integration of symbolic and statistical reasoning in AI systems.
In exciting new work with @SeverTopan (first author) and @XujieSi, we essentially solve the symbol-grounding problem for SATNet-style neural nets solving certain logical reasoning puzzles (Visual Sudoku): https://t.co/qawwI0QYz3
Details in thread. 1/