Um Interlocutor

@UmInterlocutor

Just that (and a programmer). I value good discussion, with respect and intellectual humility. Feel free to call me out if you see me doing otherwise.

Joined October 2022

624 Following

85 Followers

2.2K Posts

Pinned Tweet

Um Interlocutor @UmInterlocutor

almost 3 years ago

Recently, more and more headlines and videos have been appearing about artificial intelligence and its risks, up to and including the extinction of humanity. In this thread, I'll try to explain this better 🧵

20K

Um Interlocutor @UmInterlocutor

1 day ago

Really cool experiments showing that a biological neuron can do much more than an artificial neuron, through its dendrites. This may mean that those comparisons between the number of synapses and the number of weights in a neural network are even more wrong than previously thought.

Ido Aizenbud

@IdoAizenbud

1 day ago

What can a neuron compute? Real biological neurons are complex, but how capable are they? Using a new method, we found that a single cortical neuron can classify cats vs dogs, recognize spoken words, and solve 10-bit parity, all tasks thought to require entire networks. (1/15)

311

204K

Um Interlocutor @UmInterlocutor

1 day ago

@avioesemusicas That paper is from 2024, and to this day the problem hasn't really materialized, because the paper tests an unrealistic version of training. There have been several criticisms of this "model collapse" idea for being unrealistic/exaggerated. I suggest this thread/paper: https://t.co/GkNC3TaozJ

Rylan Schaeffer @RylanSchaeffer

about 1 year ago

What is the future of web-scale synthetic data, and what harms might such data cause? Delighted to announce our new position paper: Model Collapse Does Not Mean What You Think https://t.co/qKA1uQseQ9 @JoshuaK92829 @AlvanArulandu @sanmikoyejo w/ 🙏to @ang3linawang @walesalaudeen96

10K

567

Um Interlocutor @UmInterlocutor

1 day ago

@WeiseFranklin Yes, and this "model collapse" idea has already been widely questioned for being somewhat unrealistic/exaggerated. I suggest this thread/paper: https://t.co/GkNC3TaozJ

Rylan Schaeffer @RylanSchaeffer

about 1 year ago

10K

683

Um Interlocutor @UmInterlocutor

1 day ago

This looks like a good benchmark to follow for tracking the progress of AI agents in solving real work tasks.

Dawn Song

@dawnsongtweets

1 day ago

Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is that really the case? Over the past many months, my group and collaborators have been building Agents' Last Exam (ALE), a benchmark designed to test exactly that claim on real digital labor-market work. My group and collaborators previously have created many of the benchmarks the field runs on, including MMLU, MATH, CyberGym, and ExploitGym. Today, I'm excited to share Agents' Last Exam (ALE): a rolling benchmark that measures whether AI agents can actually perform economically valuable work across a broad range of real-world domains. With ALE, we evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier agent systems across more than 1,500 expert-sourced tasks spanning 55 occupations. The result is both impressive and sobering. Today's agents can solve a meaningful fraction of professional tasks. But when we look at the hardest tasks, the ones requiring sustained reasoning, deep domain expertise, and reliable execution over long horizons, they are still far from human-level performance. On ALE's hardest tier, every frontier agent we tested, including Fable 5, achieved a 0% success rate. The age of useful agents is here. The age of truly job-ready agents is not. We hope Agents' Last Exam (ALE) will serve as a new guidepost and north star for developing agents capable of reliably performing economically valuable work across a broad range of domains. 🧵

734

157

365

183K

Um Interlocutor @UmInterlocutor

1 day ago

@coproduto Did you see this result, where they say their architecture uses 50-100x less compute/data? And it seems real: it's open-source and people have already reproduced it. I don't know why it didn't get more attention. https://t.co/vGRUkCDSvI

Sapient Intelligence @Sapient_Int

25 days ago

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.

160

268

508K

Um Interlocutor @UmInterlocutor

2 days ago

Relevant points for the discussion

Tom Davidson

@TomDavidsonX

2 days ago

I'm seeing a lot of hate for Anthropic's decision to secretly nerf ai RnD capabilities. But I haven't seen critics engage with the imo strongest defence of Anthropic:
1. By far the biggest risks are from superintelligent AI
2. To manage these risks the leading company will need to pause partway through the intelligence explosion. (Pausing at this time allows them to a) generate the compelling empirical evidence of misalignment that will be needed to justify a longer global pause, AND b) use powerful ai to massively accelerate alignment progress. A pause today couldn't accomplish either.)
3. A pause is MUCH more likely if the leading company has a big lead. It's much less likely if multiple companies are neck and neck. (More specifically, Anthropic had good reason to think OAI wouldn't pause. This makes it v hard for Anthropic to pause if they're neck and neck. Hopefully recent announcements build mutual trust that everyone will pause)
4. If lagging AI companies can use the leader's AI for ai RnD during an intelligence explosion, the leader *cannot* maintain their lead. (This point is underappreciated. If you model out the intelligence explosion, you'll find that a laggard with equal access to the leading AI quickly catches up to the leader bc the leader faces big headwinds from having plucked low hanging fruit.)
5. So: sharing ai RnD access with competitors massively decreases the chance of a pause at the critical time, and massively increases the risk from superintelligent AI
6. Anthropic can't block competitors using Mythos without the silent sabotage. For the obvious reason: it's very hard for a frozen safeguard to block someone that can iterate against it. It sucks that this is the only way, but it is.
7. They've long had terms of service against competitors using Claude for AI RnD. They have a right to enforce their terms of service. This is the only way.
---
Overall, silent sabotage is a very spooky and scary precedent to be setting and imo the wrong call. But still, the above is a strong argument for Anthropic's actions and I haven't seen it rebutted.

220

32K

UmInterlocutor retweeted

Dario Amodei

@DarioAmodei

2 days ago

Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: https://t.co/Lh6PWae178

13K

12K

Um Interlocutor @UmInterlocutor

7 days ago

@LukeberryPi Look at the comments. A good share of people simply dismiss outright the possibility that someone is telling the truth. There isn't the slightest doubt. Back during the Manhattan Project, some of them would probably have said it was fearmongering by Einstein and Szilard.

314

Um Interlocutor @UmInterlocutor

10 days ago

@coproduto Yes, I agree that it has been changing over time, and continues to change.

Um Interlocutor @UmInterlocutor

10 days ago

@coproduto Do you think so? That they have spent 10 years or more doing nothing but marketing? Even in private conversations that later leaked?

103

Um Interlocutor @UmInterlocutor

16 days ago

Hold on, lol. I find it equally bad when someone is categorical in saying that the world will end or something similar. The difference is that I usually don't need to say anything, because someone else already goes and does that job for me. Again, sorry for being a pain; it's just that some people use "always" literally, so I had no way of knowing.

Um Interlocutor @UmInterlocutor

16 days ago

Got it, I was just being a pain (as usual), because you were very categorical for a moment, but apparently you have already corrected that. Actually, yours is one of the profiles whose views and way of thinking I agree with most. And I didn't know about those feuds; it even surprises me that someone called you a denialist or something like that. From what I've followed, you have always been one of the most open to the idea of AGI in the dev bubble.

Um Interlocutor @UmInterlocutor

16 days ago

I agree that a good share of AI people don't sufficiently consider the non-technical factors in this discussion. I also agree that AGI wouldn't make everyone abandon work "suddenly". But between "suddenly" and "never" there are many other possibilities. I also think AI (and AGI) is quite different from crypto (and from almost every other technology): it is far more impactful and could break several paradigms with that impact.

Um Interlocutor @UmInterlocutor

16 days ago

@coproduto @gustavo_pch @RafaelMorgan I agree that it will have to exist for a while. But again, I think "always" is too strong. Like laws, this will depend on the will of society and on several other factors that I find hard to predict with such certainty.

Um Interlocutor @UmInterlocutor

16 days ago

@coproduto @gustavo_pch @RafaelMorgan I agree that it will be like this for a while. But I find it very hard to assert that it will "always" be like this. Especially since the need for a "caretaker" would limit the AI's own ability to do things faster/more efficiently.

Um Interlocutor @UmInterlocutor

17 days ago

@AndyMasley @ProfNoahGian

UmInterlocutor retweeted

Peter Wildeford🇺🇸🚀

@peterwildeford

18 days ago

Once upon a time there was a Lead AI Developer whose AI was not getting impressive benchmark results. That evening, all of his neighbors came around to commiserate. They said, "We are so sorry to hear that deep learning is hitting a wall. This is most unfortunate." The Lead AI Developer said, "Maybe." The next day the LLM came back bringing seven massive benchmark scores and even got 90% on the LSAT. In the evening everybody came back and said, "Oh, isn't that lucky. What a great turn of events. You now are really close to AGI!" The Lead AI Developer again said, "Maybe." The following day his son tried to train the next successor model, and while training it, he found that 10x'ing pre-training compute wasn't giving results anymore. The neighbors then said, "Oh dear, that's too bad. Deep learning is hitting a wall." and the Lead AI Developer responded, "Maybe." The day after, the Lead AI Developer announced they'd achieved breakthrough results by adding inference-time compute, RL scaling, and tool use. The neighbors came around and said, "Oh wow, AGI is soon!" The Lead AI Developer said, "Maybe."

350

33K

Um Interlocutor @UmInterlocutor

18 days ago

@ayubio @FSU_BR @elivieira Have you already commented on or done anything about this (or are you going to)?

336

UmInterlocutor retweeted

Ayub | Internet propriamente dita

@ayubio

18 days ago · Ananindeua

Bill PL 3066/2025, approved last week in the Chamber of Deputies and sent on to the Senate, provides for prison for anyone who develops or provides a VPN service. It wasn't for lack of warning on my part.

126

547

256

157K
