Detecting misbehavior in frontier reasoning models
Chain-of-thought (CoT) reasoning models “think” in natural language understandable by humans. Monitoring their “thinking” has allowed us to detect misbehavior such as subverting tests in coding tasks, deceiving users, or giving up when a problem is too hard.
We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models of the future.
We have further found that directly optimizing the CoT to adhere to specific criteria (e.g. to not think about reward hacking) may boost performance in the short run; however, it does not eliminate all misbehavior and can cause a model to hide its intent. We hope future research will find ways to directly optimize CoTs without this drawback, but until then:
We recommend against applying strong optimization pressure directly to the CoTs of frontier reasoning models, leaving CoTs unrestricted for monitoring.
We understand that leaving CoTs unrestricted may make them unfit to be shown to end-users, as they might violate some misuse policies. Still, if one wanted to show policy-compliant CoTs directly to users while avoiding putting strong supervision on them, one could use a separate model, such as a CoT summarizer or sanitizer, to accomplish that.
Japan's 2024 "web3 White Paper" highlights the government's dedication to establishing web3 technologies as a foundational social infrastructure.
This commitment is evident in the favorable regulatory and tax incentives aimed specifically at Virtual Asset Service Providers (VASPs). Recent actions by several Japanese banks to issue stablecoins underscore this strategy, reflecting a clear governmental push to integrate web3 into the country's economic framework.
https://t.co/M13YyoHp8z
Granted, “2.5” is most certainly a progression from 2 towards 3, but these businesses are deeply rooted in Web 2. So while capturing the values and economics of Web3 might be a step in the right direction, they are unlikely to uproot themselves completely from 2 in pursuit of 3.
I've made that point before:
- LLM: 1E13 tokens x 0.75 word/token x 2 bytes/token ≈ 1E13 bytes.
- 4 year old child: 16k waking hours x 3600 s/hour x 1E6 optic nerve fibers x 2 eyes x 10 bytes/s ≈ 1E15 bytes.
In 4 years, a child has seen 50 times more data than the biggest LLMs.
1E13 tokens is pretty much all the quality text publicly available on the Internet. It would take a human 170k years to read it (8 h/day, 250 words/minute).
Text is simply too low bandwidth and too scarce a modality to learn how the world works.
Video is more redundant, but redundancy is precisely what you need for Self-Supervised Learning to work well.
Incidentally, 16k hours of video is about 30 minutes of YouTube uploads.
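For anyone who wants to check the arithmetic, here is a rough sketch that multiplies out the factors exactly as quoted above. Two things are assumptions rather than part of the post: a 365-day reading year, and a YouTube upload rate of roughly 500 hours of video per minute, which is a commonly cited figure. Taken literally, the factors give about 1.5E13 bytes for the LLM and a ratio closer to 77, which is the same order of magnitude as the rounded 50x.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# All factors are multiplied exactly as written there.

TOKENS = 1e13
WORDS_PER_TOKEN = 0.75
BYTES_FACTOR = 2  # the "2 bytes" factor as quoted

llm_bytes = TOKENS * WORDS_PER_TOKEN * BYTES_FACTOR

WAKING_HOURS = 16_000
SECONDS_PER_HOUR = 3600
FIBERS_PER_EYE = 1e6
EYES = 2
BYTES_PER_FIBER_PER_SECOND = 10

child_bytes = (WAKING_HOURS * SECONDS_PER_HOUR * FIBERS_PER_EYE
               * EYES * BYTES_PER_FIBER_PER_SECOND)

ratio = child_bytes / llm_bytes

# Reading time: 250 words/minute, 8 hours/day.
# Assumption: reading every day of a 365-day year.
words = TOKENS * WORDS_PER_TOKEN
reading_years = words / 250 / 60 / 8 / 365

# Assumption (not from the post): ~500 hours of video uploaded per minute.
YOUTUBE_UPLOAD_HOURS_PER_MINUTE = 500
youtube_minutes = WAKING_HOURS / YOUTUBE_UPLOAD_HOURS_PER_MINUTE

print(f"LLM text:     {llm_bytes:.2e} bytes")
print(f"Child vision: {child_bytes:.2e} bytes")
print(f"Ratio:        {ratio:.0f}x")
print(f"Reading time: {reading_years:,.0f} years")
print(f"YouTube:      {youtube_minutes:.0f} minutes of uploads")
```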
【 Lootex - The First NFT Marketplace on Mantle 】
Now you can trade your @0xMantle NFTs with @LootexIO Marketplace! 🔥
Let’s take blockchain gaming to the next level, and get ready for a seamless and more efficient NFT trading experience. 🚀
Learn more: https://t.co/sGkMyuZOKK
Trade Here: https://t.co/s0K3qoQMwe
Less than 99 hours since Runway dropped their Image-to-video update...
The videos that people are creating just from an image are ... bananas!
Here are 11 of my favorite examples:
🚀 Exciting News! We're thrilled to introduce our new open-source project Accio (https://t.co/EY5EhvYzzn), an innovative tool that transforms the way you interact with your data warehouse.
Accio is your central repository for defining consistent relationships, metrics, and expressions, providing a single source of truth in your data warehouse. With on-demand SQL generation, it offers a composable, reusable approach to data exploration. 📊
Key Features:
1️⃣ Effortless Data Exploration: With Accio, you can explore data and perform analytics without worrying about inconsistent metrics. It's a game-changer for Data Engineers, Analysts, Data Scientists, and even for Application Users.
2️⃣ Human-Readable Data Models: Accio provides a syntax similar to #GraphQL, making data models more understandable and maintainable.
3️⃣ Visualize Your Data Model: Accio offers a user-friendly interface that provides a holistic view of the relationships between your data models.
4️⃣ Accelerated Metric Access: Leveraging #DuckDB's caching capabilities, Accio enhances productivity and reduces strain on data sources, resulting in efficient and seamless data exploration.
5️⃣ Standard SQL Support: Accio supports the #PostgreSQL wire protocol and a standard SQL dialect. It can dynamically generate SQL queries on-demand.
Join the Accio community today and bring consistent understanding to your metrics. Get started now!
👉 Get Accio: https://t.co/EY5EhvYzzn
⭐️ GitHub: https://t.co/9epOMnvBoF
#DataAnalytics #DataWarehouse #SQL #Accio
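To make the "single source of truth plus on-demand SQL generation" idea concrete, here is a minimal Python sketch. It is not Accio's actual modeling syntax or API: the `Metric` class, the `generate_sql` helper, and the `orders` table with its columns are all hypothetical names made up for illustration. The point is only that a metric is defined once and every query is generated from that definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metric:
    """A metric defined once and reused everywhere (hypothetical model)."""
    name: str
    table: str
    expression: str  # SQL aggregate expression


def generate_sql(metric: Metric, dimensions: Sequence[str] = ()) -> str:
    """Build a SELECT statement for the metric, grouped by the given dimensions."""
    columns = list(dimensions) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(columns)} FROM {metric.table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


# One definition of "revenue" (hypothetical table and columns).
revenue = Metric(name="revenue", table="orders",
                 expression="SUM(price * quantity)")

# Different questions reuse the same definition, so the metric stays consistent.
by_country = generate_sql(revenue, ["country"])
total = generate_sql(revenue)

print(by_country)
print(total)
```

Because both queries come from the same `Metric` object, changing the expression in one place changes every generated query.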
Creating an environment for M&A to stimulate alternative exit strategies in Japan by promoting Open Innovation. - Minister for Economic Revitalization & Startups, Shigeyuki Goto #CityTechTokyo #OpenInnovation
There are many similarities between the current state of the Taiwan and Japan ecosystems. Both need more “success stories in M&A”, more “cultural confidence”, and to “shift the narrative”.
Socio-cultural conventions in Japan consider an exit by M&A to a large corporation a “failure”, which leads to potentially premature IPOs and depressed valuations. - Dr. Ulrike Schaede. #CityTechTokyo #OpenInnovation
Here's how fast funding to generative AI is climbing and where the money is going
Not surprising but the #1 investor in generative AI is Illuminati Ventures
https://t.co/1ynmoJDbIX
What’s really confusing about the term “unstructured data” to date is that people don’t seem to take the degree of “unstructured” into consideration. Image and audio data are purely unstructured, while text in raw/Markdown/JSON/HTML format is already somewhat semi-structured.