Preston Badeer

@pbadeer

I post about the intersection of 🦾AI, 🤖LLMs, 📊data products, and 📈data engineering. Owner of VDP and @aiworkhorse

Fixed cost, unlimited tokens👉

Joined March 2015

1.6K Following

521 Followers

2.2K Posts

Preston Badeer @pbadeer

12 months ago

@DataChaz The output files: The Video: A compact file containing all the text encoded as QR code frames. The FAISS Index: The binary index for lightning-fast semantic similarity search. The JSON Index: Links the search results from the FAISS index to the correct frame in the video.

Preston Badeer @pbadeer

12 months ago

@DataChaz To enable fast searching, it creates a search index that maps the semantic meaning of the text to the frame number in the video where the corresponding QR code is. The core process can be visualized as: Text -> Chunking -> JSON Payload -> QR Code Image -> Video Frame ...

Preston Badeer @pbadeer

over 1 year ago

Can we get @joinautopilot to create a @levelsio tracker? Or @marclou, or both? These guys love sharing their data, and many folks want to follow their lead. This would be a killer partnership. 🔥

Marc Lou

@marclou

over 1 year ago

OK, I increased the recurring investment to $10,000/week. The only reason I don't go all-in with $600,000 is this: This money is the fruit of 7 years of entrepreneurship failures. If the market crashes tomorrow, I won't be able to sleep. I'm going to invest almost everything I earn in the SP500 because it's proven to pay off after years. I'll just do it over the course of 365 days to lift the risk off.

143

555

258

386K

147

Preston Badeer @pbadeer

over 1 year ago

https://t.co/x8czu15j3Q

Who to follow

David Arnold

@David_M_Arnold

Business builder | Public policy hobbyist Sutherland ➝ Omaha | Leo & Will's Dad

Aviture

@aviture

We believe in innovation with the intent to transform. We’re a team of difference-makers bringing purpose to a world of rapid technology developments.

TechOmaha

@techomaha

Tech events in Omaha.

Preston Badeer @pbadeer

over 1 year ago

Chain of Continuous Thought looks dope, very excited to try models trained this way.

Preston Badeer @pbadeer

over 1 year ago

Source: https://t.co/6iUDdzHpAK

Preston Badeer @pbadeer

over 1 year ago

It's finally $AMD AI time.

101

Preston Badeer @pbadeer

over 1 year ago

@AIatMeta Llama 4 is going to be 🔥🔥🔥🔥

580

Preston Badeer @pbadeer

over 1 year ago

This 👇🏻

jason

@jxnlco

over 1 year ago

Changing a single field name in our LLM response schema improved accuracy from 4.5% to 95% on GSM8k. The fix was simple: going from final_choice to final_answer. Turns out our model was returning a multiple-choice index instead of the actual answer. If you're working with structured outputs: 1. Look closely at your field names - they fundamentally alter model behavior, same prompt, drastically different results 2. JSON mode isn't a free lunch for better performance - it showed 50% more performance variance than Function Calling across 200 test cases 3. A model needs room to think too, like you - Chain of Thought remains critical with up to 60% accuracy improvements With LLMs, it's trivial to generate schema variations and with structured outputs, it's easy to validate the results early on. Look at your data.

988

928

117K

Preston Badeer @pbadeer

over 1 year ago

🔥 This is sick. Using code to run simulations is way too uncommon IMO. So many amazing discoveries can be made by developing a simple simulation framework (even without LLMs).

Matthew Berman

@MatthewBerman

over 1 year ago

.@Microsoft just dropped TinyTroupe! Described as "an experimental Python library that allows the simulation of people with specific personalities, interests, and goals." These agents can listen, reply back, and go about their lives in simulated TinyWorld environments.

MatthewBerman's tweet photo. .@Microsoft just dropped TinyTroupe!

Described as "an experimental Python library that allows the simulation of people with specific personalities, interests, and goals."

These agents can listen, reply back, and go about their lives in simulated TinyWorld environments. https://t.co/TUX0Pq2U7g

276

242K

Preston Badeer @pbadeer

over 1 year ago

pbadeer's tweet photo. https://t.co/8DfXJ712hB

Preston Badeer @pbadeer

over 1 year ago

Swarm is cool but definitely a tutorial. Explicitly not for production and not a library, just an example.

Philipp Schmid

@_philschmid

over 1 year ago

This came unexpected! @OpenAI released Swarm, a lightweight library for building multi-agent systems. Swarm provides a stateless abstraction to manage interactions and handoffs between multiple agents and does not use the Assistants API. 🤔 How it works: 1️⃣ Define Agents, each with its own instructions, role (e.g., "Sales Agent"), and available functions (will be converted to JSON structures). 2️⃣ Define logic for transferring control to another agent based on conversation flow or specific criteria within agent functions. This handoff is achieved by simply returning the next agent to call within the function. 3️⃣ Context Variables provide initial context and update them throughout the conversation to maintain state and share information between agents. 4️⃣ Client run() initiate and manage the multi-agent conversation. It needs an initial agent, user messages, and context and returns a response containing updated messages, context variables, and the last active agent. Insights: 🔄 Swarm manages a loop of agent interactions, function calls, and potential handoffs. 🧩 Agents encapsulate instructions, available functions (tools), and handoff logic. 🔌 The framework is stateless between calls, offering transparency and fine-grained control. 🛠️ Swarm supports direct Python function calling within agents. 📊 Context variables enable state management across agent interactions. 🔄 Agent handoffs allow for dynamic switching between specialized agents. 📡 Streaming responses are supported for real-time interaction. 🧪 The framework is experimental. Maybe to collect feedback? 🔧 Flexible and works with any OpenAI client, e.g., Hugging Face TGI or vLLM-hosted models.

_philschmid's tweet photo. This came unexpected! @OpenAI released Swarm, a lightweight library for building multi-agent systems. Swarm provides a stateless abstraction to manage interactions and handoffs between multiple agents and does not use the Assistants API. 🤔

How it works:
1️⃣ Define Agents, each with its own instructions, role (e.g., "Sales Agent"), and available functions (will be converted to JSON structures).
2️⃣ Define logic for transferring control to another agent based on conversation flow or specific criteria within agent functions. This handoff is achieved by simply returning the next agent to call within the function.
3️⃣ Context Variables provide initial context and update them throughout the conversation to maintain state and share information between agents.
4️⃣ Client run() initiate and manage the multi-agent conversation. It needs an initial agent, user messages, and context and returns a response containing updated messages, context variables, and the last active agent.

Insights:
🔄 Swarm manages a loop of agent interactions, function calls, and potential handoffs.
🧩 Agents encapsulate instructions, available functions (tools), and handoff logic.
🔌 The framework is stateless between calls, offering transparency and fine-grained control.
🛠️ Swarm supports direct Python function calling within agents.
📊 Context variables enable state management across agent interactions.
🔄 Agent handoffs allow for dynamic switching between specialized agents.
📡 Streaming responses are supported for real-time interaction.
🧪 The framework is experimental. Maybe to collect feedback?
🔧 Flexible and works with any OpenAI client, e.g., Hugging Face TGI or vLLM-hosted models.

328

260K

100

Preston Badeer @pbadeer

over 1 year ago

FINALLY got access to @cerebras. They ain't kidding, it's even faster than @GroqInc. 🤯 I'm getting 447/ts on Llama 3.1 70B with JSON parsing. 0.95s round trip!

Preston Badeer @pbadeer

over 1 year ago

This is specific to the Instruct models: https://t.co/HDG7zVhtKE. However, if you're having trouble with Llama 3.1 Instruct 8B on any JSON-mode tasks, I recommend trying 70B before increasing complexity of your pipeline or changing models entirely.

Preston Badeer @pbadeer

over 1 year ago

Struggling with Llama 3.1 8B? I wish I had seen this sooner. Meta: "We recommend using Llama 70B-instruct or Llama 405B-instruct for applications that combine conversation and tool calling. Llama 8B-Instruct can not reliably maintain a conversation alongside tool calling definitions. It can be used for zero-shot tool calling, but tool instructions should be removed for regular conversations between the model and the user." (Emphasis added.) Link in reply below

Preston Badeer @pbadeer

almost 2 years ago

My python people, if you've been waiting for the right time to move to uv for package mgmt, now is the time.

Charlie Marsh

@charliermarsh

almost 2 years ago

uv 0.4.0 is out now 🚢🚢🚢 It includes first-class support for Python projects that aren't intended to be built into Python _packages_, which is common for web applications, data science projects, etc.

charliermarsh's tweet photo. uv 0.4.0 is out now 🚢🚢🚢

It includes first-class support for Python projects that aren't intended to be built into Python _packages_, which is common for web applications, data science projects, etc. https://t.co/Jt5hkWhAYX

660

106

68K

106

Preston Badeer @pbadeer

almost 2 years ago

30% is a huge improvement over all the previous hype that was around 13% (Devin), but still not something I would consider even close to production ready. Fast progress though! I hope we get some solid open source options in this space as the commercial ones improve.

Alistair

@AlistairPullen

almost 2 years ago

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE.

247

349

820K

Preston Badeer @pbadeer

almost 2 years ago

Great update on open source, with expanded details in the replies 👇

Vaibhav (VB) Srivastav

@reach_vb

almost 2 years ago

What a massive week for Open Source AI: We finally managed to beat closed source fair and square! 1. Meta Llama 3.1 405B, 70B & 8B—The latest in the llama series, this version (base + instruct) comes with multilingual (8 languages) support, a 128K context, and an even more commercially permissive license. The best part: 405B beats GPT4o/ mini fair and square! Bonus: Meta posted a banger of a tech report with quite a lot of details also on upcoming (?) multi-modal (image/ audio/ video) 2. Mistral dropped Large 123B—Dense, multilingual (12 languages), and 128K context. Comes as instruct-only model checkpoint, with performance less than 405B but higher than L3.1 70B. Released under non-commercial license. 3. Nvidia released Minitron distilled 4B & 8B - apache 2.0 license, 256K vocab, with student beating the teacher by 16% on MMLU. Uses iterative pruning and distilling to achieve SoTA! The real question: Who is distilling 405B right now? ;) 4. InternLM shared Step Prover 7B—SoTA on the Lean, which was trained on Github repos with large-scale formal data. Achieves 48.8 pass@1, 54.5 pass@64. They release the dataset, tech report and the fine-tuned InternLM math plus model checkpoint 5. CofeAI dropped Chonky TeleFM 1T - A one trillion parameter dense model trained on 2T tokens, bilingual - Chinese and English, apache 2.0 licensed and tech report. They use a novel progressive upsampling approach. Stability dropped Sv4D, Nvidia released MambaVision, SakanaLabs with Evo (merging + stable diffusion), and more. This was a landmark week, and I'm personally quite happy with the direction of open source AI/ ML! Did I miss anything interesting drop them in comments! 🤗

322

171

63K

Preston Badeer @pbadeer

almost 2 years ago

In the commercial AI use case space, open source models are everything. This details how much of a leader Llama/meta is in this space.

Andrej Karpathy

@karpathy

almost 2 years ago

Huge congrats to @AIatMeta on the Llama 3.1 release! Few notes: Today, with the 405B model release, is the first time that a frontier-capability LLM is available to everyone to work with and build on. The model appears to be GPT-4 / Claude 3.5 Sonnet grade and the weights are open and permissively licensed, including commercial use, synthetic data generation, distillation and finetuning. This is an actual, open, frontier-capability LLM release from Meta. The release includes a lot more, e.g. including a 92-page PDF with a lot of detail about the model: https://t.co/48e3YJ8Sg9 The philosophy underlying this release is in this longread from Zuck, well worth reading as it nicely covers all the major points and arguments in favor of the open AI ecosystem worldview: "Open Source AI is the Path Forward" https://t.co/AdmpadCRM0 I like to say that it is still very early days, that we are back in the ~1980s of computing all over again, that LLMs are a next major computing paradigm, and Meta is clearly positioning itself to be the open ecosystem leader of it. - People will prompt and RAG the models. - People will finetune the models. - People will distill them into smaller expert models for narrow tasks and applications. - People will study, benchmark, optimize. Open ecosystems also self-organize in modular ways into products apps and services, where each party can contribute their own unique expertise. One example from this morning is @GroqInc , who built a new chip that inferences LLMs *really fast*. They've already integrated Llama 3.1 models and appear to be able to inference the 8B model ~instantly: https://t.co/b2kdSsz0fH And (I can't seem to try it due to server pressure) the 405B running on Groq is probably the highest capability, fastest LLM today (?). Early model evaluations look good: https://t.co/RLR5YBpmks https://t.co/ipT4x4wCvy Pending still is the "vibe check", look out for that on X / r/LocalLlama over the next few days (hours?). I expect the closed model players (which imo have a role in the ecosystem too) to give chase soon, and I'm looking forward to that. There's a lot to like on the technical side too, w.r.t. multilingual, context lengths, function calling, multimodal, etc. I'll post about some of the technical notes a bit later, once I make it through all the 92 pages of the paper :)

184

12K

988K

Preston Badeer

@pbadeer

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users