For 2024, excited to share an open-source product we’ve been working on: VerificationGPT (https://t.co/SxrOfFaPBP)
VerificationGPT uses Brave Search, arXiv search and GPT-4 to validate scientific claims. It often gives accurate citations for questions which vanilla GPT-4 stumbles on or doesn’t cite accurately.
We already use it internally in preference to vanilla GPT-4, even for general Internet questions.
It’s available to anyone with a ChatGPT Plus subscription now in the GPT Store @ https://t.co/SxrOfFaPBP.
It’s open source (https://t.co/SKzXLTm6Jy), so you’re also welcome to use the library in your own products or send PRs that add context to its judgments (legal APIs, human fact-checking APIs, peer review APIs, etc.). Context APIs to detect synthetic content watermarks in images are already planned, along with product integrations with other social media platforms.
Towards a healthier world in 2024 via open personal AI tools for everyone :)
Latest episode is with @chrislengerich of @contextfund !
We cover how AI is transforming:
- how startups are formed and funded
- how we discover breakthrough science
- how we might cure all disease
It was a fun one, full episode below!
Chapters
00:00 – Introduction
03:47 – The AI Gold Rush
06:53 – The War for Top AI Talent
11:39 – Verification: The New Frontier
14:12 – Automating the VC Funding Process
25:37 – AI-Powered Teams and Startup Dynamics
29:16 – AI in Drug Discovery & Development
37:11 – Personalization & Reducing Friction
40:30 – California’s Role in the AI Gold Rush
45:37 – The Inner vs. Outer AI Economy
48:23 – Curing Disease with AI
Full episode here:
Apple: https://t.co/0HWwSwFm3O
Spotify: https://t.co/fOrsIuzr0M
YouTube: https://t.co/UtxgRWDa6z
+1 for "context engineering" over "prompt engineering".
People associate prompts with the short task descriptions you'd give an LLM in day-to-day use. But in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few-shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.
On top of context engineering itself, an LLM app has to:
- break up problems just right into control flows
- pack the context windows just right
- dispatch calls to LLMs of the right kind and capability
- handle generation-verification UI/UX flows
- a lot more - guardrails, security, evals, parallelism, prefetching, ...
So context engineering is just one small piece of an emerging thick layer of non-trivial software that coordinates individual LLM calls (and a lot more) into full LLM apps. The term "ChatGPT wrapper" is tired and really, really wrong.
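To make "pack the context windows just right" concrete, here is a minimal sketch of one possible approach: greedily keeping the highest-priority pieces that fit a token budget. Everything in it is an assumption for illustration, not a method from the post: the `ContextItem` type, the labels, the priority scheme, and the whitespace-based `estimate_tokens`, which stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str     # hypothetical labels, e.g. "task", "few_shot", "retrieved_doc"
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that fit within the token budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:  # skip anything that would overflow the budget
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("task", "Summarize the report", priority=0),
    ContextItem("retrieved_doc", "a b c d e f g h", priority=2),
    ContextItem("few_shot", "x y z", priority=1),
]
packed = pack_context(items, budget=7)
print([item.label for item in packed])
```

With a budget of 7 estimated tokens, the task and the few-shot example fit and the longer retrieved document is dropped. A real app would likely also truncate, summarize, or rerank items rather than only keep or drop them.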
We are excited to announce the release of an @MLCommons AI Safety benchmark POC. Built through an inclusive decision-making and engineering process, the POC validates our approach to a v1.0 AI Safety benchmark suite. Learn more: https://t.co/LmEKYS05ME
#AI, #benchmarks
Based on market research, this April 1st, we're thrilled to announce that we're sunsetting VerificationGPT to focus on AgreeGPT.
Because we all need a yellow rubber duck who just agrees with whatever we say:
https://t.co/UHEODZ9e4B
Our response to the NTIA Open Weights RFC (15 pages) closing tonight - open weights are likely more secure than closed.
Read, comment and sign:
https://t.co/MkbjrBghvz
Update: https://t.co/20rbXxuxyO now supports verifying C2PA content signatures like those generated by DALL·E 3 or Photoshop.
Just copy the image url and type "@VerificationGPT verify <image url>" in ChatGPT Plus.
MemGPT: Towards LLMs as Operating Systems - Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, Joseph E. Gonzalez
Reviews and discussion: https://t.co/uPu3iTde26
We're on Reddit now, and we've launched Context Awards: $1000 and up for useful open-source projects.
Open-source investing in AI. Drop by, say hi, and nominate your favorites:
https://t.co/5D5jytrmnm
Our comment to the recent OSTP Request for Information on AI National Priorities is available here:
https://t.co/yN3oUDGz8S
One of our primary recommendations is to create an AI National Priorities Fund to invest in AI companies which provide underinvested public goods, such as eval sets for AI safety, high-context social media to combat misinformation and basic bio research.
We also recommend reaffirming support for open-source and exploring limits on stock buybacks to encourage reinvestment and prevent the curse of resources.
Leave a comment if you’d also like to show support for this direction.
The differentiable credit license is one option we're testing for open-source code and text. It works best when it's used for your service (since you have stable revenue), but eventually might power something as large as economy-wide open-source collaboration:
https://t.co/P024XyYHzb
Not sure what to work on that won't be obsolete with new AI systems in the future?
Posting a $1,000 prize for the best open-source model and paper that answers this important homomorphic AI security question within the next 2 months*
Threat model: Rogue employee at a cloud AI provider like OpenAI who can read LM API requests which may include email data (or equivalently, a hacker with access to OpenAI logs).
Task: Given dataset X and a fraud dataset Z, improve the data efficiency of fine-tuning a model using a transformation y with a key k such that:
1. Without k, y(X) is hard for a human to read
2. With k, y(X) is easy to read
3. A model trained on y(X) has high prediction accuracy (close to that of a model trained on X)
4. We can train a fraud discriminator on the encoded data without knowing the underlying x: {y(Z) → 1, y(X) → 0, with high probability}.
A baseline example is a substitution cipher in the token vocabulary space as y, where k is the substitution order. We know that if we completely retrain a model on y(X), it should have the same accuracy as one trained on X, and y(X) should also be hard for a human without k to understand. But a.) it's vulnerable to statistical attack and b.) it may be data-inefficient to train.
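A minimal sketch of that baseline, assuming tokens are integer ids in a fixed vocabulary. The function names are hypothetical, and the seeded `random.Random` shuffle is for illustration only (it is not a cryptographically secure way to generate a key). The last helper shows why the statistical attack works: the cipher leaves the token frequency profile unchanged.

```python
import random
from collections import Counter


def make_key(vocab_size: int, seed: int) -> list[int]:
    """The key k: a seeded random permutation of token ids (illustrative only)."""
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    return perm


def encode(tokens: list[int], key: list[int]) -> list[int]:
    """y(X): replace each token id with its image under the permutation."""
    return [key[t] for t in tokens]


def decode(tokens: list[int], key: list[int]) -> list[int]:
    """Invert y using k by building the inverse permutation."""
    inverse = [0] * len(key)
    for src, dst in enumerate(key):
        inverse[dst] = src
    return [inverse[t] for t in tokens]


def frequency_profile(tokens: list[int]) -> list[int]:
    """Sorted token counts, which a substitution cipher leaves unchanged."""
    return sorted(Counter(tokens).values(), reverse=True)


key = make_key(vocab_size=10, seed=0)
x = [1, 1, 1, 2, 2, 3]
y = encode(x, key)

print(decode(y, key) == x)                           # round trip with k
print(frequency_profile(y) == frequency_profile(x))  # statistics leak through
```

Because the counts survive encoding, an attacker who knows typical token frequencies can start matching encoded ids to likely plaintext tokens without k, which is weakness a.) above.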
Is there a better such y? What is the data-efficiency of training it?
If we can do this, we make personal data safer when using deployed AI models (preserving the economic value of the individual), and the cloud logs less of a honeypot target for hackers.
*Partial credit may also be awarded to multiple submissions and to the open-source dependencies of the submissions.
Follow for more prizes and challenges around important technical problems in the personal bot economy for scientists and investors.