Today we're launching Tasklet — an AI agent for automating your business.
Unlike ChatGPT, @TaskletAI actually does the work for you: connecting to your tools, triggering automatically, and handling tasks while you sleep.
Game Arena from @generalityinc is the largest LLM strategy game tournament to date.
Games are great for measuring LLMs on instruction following, long-horizon planning, and problem-solving. In fact, models that are Olympiad-level at math and coding often struggle to make accurate moves in Game Arena (median illegal-move rate: 11.4%).
Game environments also resist contamination and saturation: every run is unique, so there’s no risk of training-data leakage, and the bar keeps rising as models improve.
Game Arena has digitized hundreds of board games into fresh new environments, and today it’s debuting its first tournament, with models like GPT-5 High, Claude Opus 4.1, and DeepSeek V3.1 from OpenAI, Anthropic, Qwen, DeepSeek, Google, and more going head-to-head.
You can watch game replays and full model reasoning traces on https://t.co/PkQysj7u3r.
Congrats on the launch, @sanerc110 and @kaylalee278!
today we're launching opencode zen
zen provides the best LLMs for coding ALWAYS in the best possible configuration
no more "did claude get dumber?"
no more being routed to a crappy provider
and we're giving it to you at cost
and you can use it anywhere
that enough for you?
Voice AI is growing fast, but reliability is still the #1 challenge.
That’s why we built Roark - the QA + Observability layer for Voice AI.
We’re live on Product Hunt today 🚀 - would mean a lot if you could upvote + comment: https://t.co/hdH5FDfJW6 🙏
@smthomas3 @zammitjames @dan_gauci Vibes for voice agents sounds like a disaster 🙈 The stakes are raised with voice agents for security and safety because they’re frequently deployed in healthcare admin or customer-facing settings, and any awkward pauses or statements are more noticeable than in text chat
Voice agents built on "vibes" will fail in production (with @zammitjames and @dan_gauci from Roark)
- Most teams are just "vibing" with voice agents - manually calling and testing based on feeling
- Running proper simulations should be part of your CI pipeline to catch regressions and breaks
- Roark allows both manual testing and scheduled runs to monitor third-party service issues
- When to start using proper evals? As soon as you're onboarding customers or preparing for production
- Even for simple use cases like appointment scheduling, implement evals for your main success criteria
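A rough sketch of what a CI gate on a main success criterion could look like for the appointment-scheduling case. Everything here is an assumption for illustration: `SimulationResult`, `pass_rate`, `ci_gate`, and the 0.9 threshold are hypothetical names and values, not Roark's API.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    """Outcome of one simulated call (hypothetical shape, not a real API)."""
    scenario: str
    booked_appointment: bool  # main success criterion for a scheduling agent


def pass_rate(results: list[SimulationResult]) -> float:
    """Fraction of simulated calls that met the success criterion."""
    if not results:
        # No simulations ran: treat as a failure rather than a silent pass.
        return 0.0
    return sum(r.booked_appointment for r in results) / len(results)


def ci_gate(results: list[SimulationResult], threshold: float = 0.9) -> bool:
    """Return True when the pass rate meets the (assumed) threshold."""
    return pass_rate(results) >= threshold
```

In a pipeline, the job would run the simulated calls, collect the results, and exit non-zero when `ci_gate` returns False, so a regression blocks the deploy instead of being caught by a manual test call.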
this has been missing from american policy for a while
if we are true believers in capitalism we have to be aggressive in encouraging competition
way too many companies resting on what they've built up - this is anti-capitalist
The voice AI opportunity is here—but solving the last 20% is where the real challenge lies (with @zammitjames and @dan_gauci from Roark AI)
- Voice AI is becoming the new frontier with companies like Roark building "datadog for voice AI" to help developers debug their agents
- Popular use cases include lead generation (outbound sales calls) and appointment scheduling for industries like dental clinics and car dealerships
- Getting to 80% functionality is easy, but the last 20% is where voice agents struggle significantly
- Major issues: 30% of transcriptions are incorrect (some companies hire humans to manually transcribe), conversations deteriorate over time, and tool calling is notoriously inaccurate
- Voice agents have trouble recovering when conversations go off-script, leading to hallucinations and poor user experiences
Mastra (@mastra) is the open-source JavaScript framework for building agents. Companies use Mastra to automate support, build CAD diagrams, scrape the web, do medical transcription, and more.
https://t.co/VVLaBbZW2Z
Congrats on the launch @calcsam, @smthomas3 and @abhiaiyer!
@thdxr I just use a Lambda within the same VPC that runs Drizzle's migrator, and use Pulumi's aws.lambda.Invocation to auto-trigger the Lambda on every deploy.