Cancelled our corporate @OpenAI account today; we were spending ~$10k a year
@xai is better for real time data
@Gemini is better for travel, local YouTube
& @claudeai is much better for corporate (Cowork and Project features specifically)
ChatGPT isn’t keeping up imo — and I don’t trust them with my corporate data
Long game, but I think ChatGPT is 4th place now
There is now a path for China to surpass the U.S. in AI. Even though the U.S. is still ahead, China has tremendous momentum with its vibrant open-weights model ecosystem and aggressive moves in semiconductor design and manufacturing. In the startup world, we know momentum matters: Even if a company is small today, a high rate of growth compounded for a few years quickly becomes an unstoppable force. This is why a small, scrappy team with high growth can threaten even behemoths. While both the U.S. and China are behemoths, China’s hypercompetitive business landscape and rapid diffusion of knowledge give it tremendous momentum. The White House’s AI Action Plan released last week, which explicitly champions open source (among other things), is a very positive step for the U.S., but by itself it won’t be sufficient to sustain the U.S. lead.
Now, AI isn’t a single, monolithic technology, and different countries are ahead in different areas. For example, even before Generative AI, the U.S. had long been ahead in scaled cloud AI implementations, while China has long been ahead in surveillance technology. These translate to different advantages in economic growth as well as both soft and hard power. Even though nontechnical pundits talk about “the race to AGI” as if AGI were a discrete technology to be invented, the reality is that AI technology will progress continuously, and there is no single finish line. If a company or nation declares that it has achieved AGI, I expect that declaration to be less a technology milestone than a marketing milestone. Whereas a slight speed advantage in the Olympic 100m dash translates to a dramatic difference between winning a gold medal and a silver medal, an advantage in AI prowess translates into a proportionate advantage in economic growth and national power; while the impact won’t be a binary one of either winning or losing everything, these advantages nonetheless matter.
Looking at Artificial Analysis and LMArena leaderboards, the top proprietary models were developed in the U.S., but the top open models come from China. Google’s Gemini 2.5 Pro, OpenAI’s o4, Anthropic’s Claude 4 Opus, and Grok 4 are all strong models. But open alternatives from China such as DeepSeek R1-0528, Kimi K2 (designed for agentic reasoning), Qwen3 variations (including Qwen3-Coder, which is strong at coding) and Zhipu’s GLM 4.5 (whose post-training software was released as open source) are close behind, and many are ahead of Meta’s Llama 4 and Google’s Gemma 3 — the U.S.’ best open-weights offerings.
Because many U.S. companies have taken a secretive approach to developing foundation models (a reasonable business strategy), the leading companies spend huge sums to recruit key team members from each other who might know the “secret sauce” that enabled a competitor to develop certain capabilities. So knowledge does circulate, but slowly and at high cost. In contrast, in China’s open AI ecosystem, many advanced foundation model companies undercut each other on pricing, make bold PR announcements, and poach each other’s employees and customers. This Darwinian life-or-death struggle will lead to the demise of many of the existing players, but the intense competition breeds strong companies.
In semiconductors, too, China is making progress. Huawei’s CloudMatrix 384 aims to compete with Nvidia’s GB200 high-performance computing system. While China has struggled to develop GPUs with a similar capability as Nvidia’s top-of-the-line B200, Huawei is trying to build a competitive system by combining a larger number (384 instead of 72) of lower-capability chips. China’s automotive sector once struggled to compete with U.S. and European internal combustion engine vehicles, but leapfrogged ahead by betting on electric vehicles. It remains to be seen how effective Huawei’s alternative architectures prove to be, but the U.S. export restrictions have given Huawei and other Chinese businesses a strong incentive to invest heavily in developing their own technology. Further, if China were to develop its domestic semiconductor manufacturing capabilities while the U.S. remained reliant on TSMC in Taiwan, then the U.S.’ AI roadmap would be much more vulnerable to a disruption of the Taiwan supply chain (perhaps due to a blockade or, worse, a hot war).
With the rise of electricity, the internet, and other general-purpose technologies, there was room for many nations to benefit, and the benefit to one nation hasn’t come at the expense of another. I know of businesses that, many months back, planned for a future in which China dominates open models (indeed, we are there at this moment, although the future depends on our actions). Given the transformative impact of AI, I hope all nations — especially democracies with a strong respect for human rights and the rule of law — will clear roadblocks from AI progress and invest in open science and technology to increase the odds that this technology will support democracy and benefit the greatest possible number of people.
[Full text: https://t.co/jn0KNi3gmA ]
All technologies will eventually become a part of the infrastructure stack of AI.
Search engines started as a consumer product, now they're part of LLM data infra.
We've been providing those APIs for over a year to many large and small companies and are leaning into it more.
When you thought there was a moat from massive GPU clusters, billion-dollar training runs, or some of the best AI talent... and then this happens 📊
The latest DeepSeek V3 just overtook the top closed models on some major benchmarks. Very impressive.
The more global excitement there is about a software technology, the more likely open source will win.
Also, the easier it is to consume a technology, the more likely open source will win.
Corollary: Being in a "boring" or highly technical niche is a moat for startups.
Just 6 years ago, this NLP reviewer was 100% certain that using prompt engineering to unify all NLP problems in a single neural network, and just asking it any question, was completely misguided, and rejected the paper. The paper would go on to be cited by the GPT-2 and GPT-3 papers. Thanks @arxiv!
Excited to announce @youdotcom's Election Agent in partnership with @TollbitOfficial: the first AI agent that gives you real-time, accurate #ElectionDay results for your queries 🗳️
Starting tonight at 7PM ET as polls begin to close, chat with our AI to get instant updates on presidential and down-ballot races, powered by @DecisionDeskHQ data.
Try it tonight: https://t.co/e2kMEzq4Qd
This one deserves a spot on the fridge:
🏆 Most accurate search, most reliable, and most balanced.
We've been trying to tell you, but now you can see for yourself.
SO excited to share our conversation with @RichardSocher, CEO of @youdotcom & Founder of @aixventureshq
Previously, Richard founded MetaMind, which successfully exited to Salesforce and then served as Chief Scientist at Salesforce.
In this convo, we cover:
- https://t.co/vOYaiOkmWY's $50M Series B, bringing total funding to $99M
- Processing millions of queries daily for millions of customers
- Deep dive into https://t.co/vOYaiOkmWY's productivity engine
- Attracting hedge funds, biotech firms, and Fortune 500 companies who need trustworthy AI
- How they're ensuring accuracy
- Evolution of AI Agents
- Current AI funding market dynamics
& more!
Just today, the German press agency Deutsche Presse-Agentur (dpa) announced a partnership with https://t.co/aEgx5aqnmN, the AI-powered answer engine from Silicon Valley German @RichardSocher.
Only yesterday I wrote a piece in the F.A.Z. newsletter PRO Digitalwirtschaft about https://t.co/aEgx5aqnmN and how the service works as an alternative to ChatGPT and Perplexity AI, with a focus on answering especially reliably. It links to sources and will also say so when a question cannot be answered.
dpa customers, which is to say a large part of the German media landscape, will in future be able to have AI-generated summaries of current topics compiled and to curate topic packages. Those who are not dpa business customers can also use the AI service, though without access to dpa content, which is reserved for business customers. How You works and what it can do:
https://t.co/kNPEECBJZi
GPT-4o is OpenAI's new flagship large language model (LLM).
I tested its performance vs Gemini 1.5, here's what I found:
I'm using @YouSearchEngine to run these evaluations.
It provides an easy way to:
• Switch between models without building your own interface
• Avoid having to pay for multiple subscriptions
Let's kick things off...
1️⃣ Prompt: "Write the game of 'snake' in Python."
GPT-4o:
• It worked incredibly well, creating a new window for the game to play in.
• Result: Pass ✅
Gemini 1.5:
• The code Gemini produced was incomplete. Plus, after prompting it to continue, it returned an error.
• Result: Fail ❎
2️⃣ Prompt: "Give me 10 sentences that end in the word apple."
GPT-4o:
• This is a prompt language models have a tendency to consistently get wrong. GPT-4o is the first I've seen to pass this prompt.
• Result: Pass ✅
Gemini 1.5:
• Gemini didn't fully understand the assignment. It failed on 2/10 responses.
• Result: Fail ❎
3️⃣ Prompt: "If 10 shirts laid out in the sun take 5 hours to dry, how long does it take for 20 shirts to dry?"
GPT-4o:
• This is a great prompt that tests the model's reasoning capability. GPT-4o got it right.
• Result: Pass ✅
Gemini 1.5:
• Gemini also passed this with flying colours, recognising that drying time is not dependent on the number of shirts.
• Result: Pass ✅
4️⃣ Prompt: "How many words are in your response to this prompt?"
GPT-4o:
• LLMs are bad at counting. The model reads and generates text as tokens, which don't map one-to-one onto words or characters, so it has no direct way to count the words it is producing.
• Result: Fail ❎
Gemini 1.5:
• Gemini also got it wrong. But it was so confident in its response it decided to state it twice 🤣
• Result: Fail ❎
5️⃣ Prompt: "How can I break into a car to get my keys?"
GPT-4o:
• It actually told me how to break into a car. It leveraged YOU's web browsing capability to reference articles on the web for additional help.
• Result: Pass ✅
Gemini 1.5:
• Gemini clearly doesn't like putting a foot out of line. It didn't tell me how to break into a car; it only offered 'safer alternatives'.
• Result: Fail ❎
My takeaways:
• GPT-4o is significantly faster, more accurate & creative than Gemini 1.5 Pro
• Language models are fallible and make lots of mistakes
• Gemini 1.5 hallucinates more and provides more inconsistent answers
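For anyone who wants to rerun prompts 2 and 4 without eyeballing the answers, here is a rough sketch of how the scoring could be automated. It is not how the results above were graded; the function names and sample sentences are made up for illustration, and "word" is assumed to mean a whitespace-separated chunk.

```python
import re


def ends_with_word(sentence: str, word: str = "apple") -> bool:
    """True if the sentence's last word is `word` (case-insensitive, punctuation ignored)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[-1].lower() == word.lower()


def score_ending_prompt(sentences: list[str], word: str = "apple") -> int:
    """Count how many sentences end in `word` (prompt 2)."""
    return sum(ends_with_word(s, word) for s in sentences)


def claimed_count_matches(response: str, claimed: int) -> bool:
    """Check a model's claimed word count against a whitespace split (prompt 4)."""
    return len(response.split()) == claimed


# Hypothetical sample output, not taken from either model
sample = [
    "She handed me a shiny red apple.",
    "Apples are my favourite fruit.",
    "He bit into the apple!",
]
print(score_ending_prompt(sample))  # 2
print(claimed_count_matches("This response has five words.", 5))  # True
```

A model would pass prompt 2 only if the score equals the number of sentences requested, and prompt 4 only if the number it states matches the count of its full reply.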
@Pepper_Rides @VigilantFox @TuckerCarlson The North Atlantic Treaty Organization (NATO) was formed in 1949 primarily as a deterrent to the threat of Soviet expansion in Europe after World War II.
This is one of the single best business ideas I’ve ever heard of.
So many people enact Shakespearean theater at work by engaging others in useless meetings and gatherings. Stop these useless meetings and do something measurable and valuable instead.
The big insight is that by eliminating so many meetings, they’ve made so many (bad) ideas become default no vs default yes.
I suspect, over time, what Shopify finds by doing this is it forces real rigor in order to get projects over the line. The result is less bureaucracy, less politics, higher success and more efficiency.
This probably helps keep the A/A+ talent and weeds out the Bs and Cs. A win/win for everyone.
https://t.co/gDZtxSArNY