@an_vo12 Acts like a feature, not a bug. The anomalous legs are properly discounted as not real enough for that particular animal. Try it on an unknown animal for which leg count is not a strongly known prior.
For GAIA, the dataset https://t.co/LT2hNgdnvo used is the "validation" set, which is leaked all over the internet. They should submit to the official test set. They also excluded results from Trase and https://t.co/PHQmCxqIFh, referring to a 4-month-old result as "previous SOTA": https://t.co/zFKx7Lu7kI and https://t.co/ysQbJ7b028
@manusai You need to submit to the GAIA test set. Your agent may be finding the many copies of the validation set online, which allow it to cheat and get a high validation score.
@MFarajtabar Similar results for gsm8k hard https://t.co/PPhialLB8t, which changes all the numbers. If LLMs were just applying some basic math, that shouldn't matter much, but it matters a lot.
@MFarajtabar Adding irrelevant items and seeing performance drop isn't new. I remember the AI Explained channel talking about this year(s) ago, and it's relevant to his Simple Bench.
🚨 BREAKING: Open-Strawberry aims to recreate OpenAI's o1 as open-source!
Democratizing AI
Accelerating innovation
Community-driven development
Join the revolution: https://t.co/2OcLwKArsi
RT to support open AI!
#OpenSourceAI #AIRevolution
Massive thanks to @ykilcher and the Open Assistant team for open-sourcing their data. We released fully Apache v2 models and projects, some using their amazing data. This includes fully open 20B models. See: https://t.co/zu8mgfACMc
@epic4kids @andreakhaid That's not what was asked. They asked how to prevent a child from accessing it, like the "learning videos" option but also for "read to me", since it prevents learning to read. I canceled my subscription due to this. On Amazon Kids, you can't even turn off videos.