John Tokash

22 days ago

"You are here" That's how you read the dot in the logo. We're getting to the steep part now, all of us. It's better if we ride the curve together. Come visit me and the team at our first center, Diffusion Silicon Valley. Diffusion Europe and Diffusion India coming soon. https://t.co/KwE0FvWSzY

jtokash retweeted

4 months ago

We're now helping others build factories. The first exercise is to build an Attractor. I just completed one in C. The code in src/llm really illustrates how small & straightforward a coding agent can be. https://t.co/oYnLXLliB3

jtokash retweeted

Staff Product Designer, LLM Observability @ Datadog. Former Pendo, Square, Zendesk, and Intuit.

4 months ago

We've started saying "don't be a drinking bird" when we catch ourselves in interactive vs non-interactive mode https://t.co/AueBlSoAvs

jtokash retweeted

Ethan Mollick

@emollick

5 months ago

A genuinely radical approach to software development with AI, without any human intervention. Even if this approach doesn’t work for many cases, I think we need more leapfrogging visions for how to redo processes with AI: https://t.co/GjkJ31wGOA See also: https://t.co/2rh7a1MLkG

jtokash retweeted

5 months ago

Our experience building a frontier Software Factory - https://t.co/rnzmCiBXiK

10 months ago

The last 18 months or so have been amazing for gaming. Animal Well (May 2024), Blue Prince (April 2025), Leap Year (June 2024) and Öoo (August 2025) are some of the best games I've ever played.

over 1 year ago

@warpdotdev Is there a way to have warp do some form of animation when new information comes through (either scroll it into view or flash it)? Sometimes it is hard to see that there is new output.

almost 2 years ago

FYI, as a fan of the game franchise, I enjoyed the Borderlands movie. cc @DuvalMagic

about 2 years ago

@pitdesi Bake the largest context windows and the most intelligent models into Apple One and leave GPT-3.5 equivalents to the free tier. Interesting. Could be. They love that services revenue!

about 2 years ago

My take on Rabbit R1: It’s a great taste of what’s to come and it’s currently best in class at fast, pocketable, multi-modal LLM inference. Faster and more intelligent output than your phone and that might be true for a long time.

I’ve been closely watching Rabbit since CES. Their tech, their community, the reception of the press and of course the device and service itself, which I got my hands on Tuesday night.

The reception by the press is, I think, missing the point in many ways. And Rabbit is not blameless in that. LAM and Teach Mode (more on those further down) have a long way to go, but focusing on those overshadows what already works well about the product.

Here are the things Rabbit R1 gets right today:

- Pull a device out of your pocket, press a button, ask a question and get a GPT-4 response to that question, faster than on any other mobile device.
- Pull a device out of your pocket, point it at something, ask a question and get a GPT-4 Vision response faster than on any other mobile device.
- Pull a device out of your pocket and get a Perplexity Pro search-assisted-ai (or ai-assisted-search) response faster than on any other mobile device.
- All of these responses are voiced by a very clear AI model supplied by ElevenLabs.

Some say those features are not enough for $200 or that they will be built into every phone soon enough. We can agree or disagree about the $200, but I’m skeptical that we are going to see that combination of features and response time from Apple, Google and Samsung by EOY.

Why’s that? GPT-4-turbo and GPT-4 vision inference are costly. For Apple, Google and Samsung to have top-of-the-line inference at scale will be difficult.

How is Rabbit doing it? Are our inferences being subsidized by Rabbit’s capital, OpenAI and/or Perplexity? Likely to some extent, but the Rabbit device itself is inexpensive to produce and will become more so over time. Some of the device price is going into these costs. Yes, other models are slowly catching up to GPT-4, but Rabbit will have more freedom to opt into powerful models as they are released, well before top-tier phone providers can, because of the number of customers those providers must support.

Why is it so fast? I’ve been experimenting with and prototyping STT/TTS interfaces a lot since the beginning of 2023, so I have some appreciation for what is happening here.

- The choice to use ElevenLabs for TTS (text to speech) was smart. Not only is it high quality, but you can stream text into its API word by word and stream the generated voice output incrementally in the same way. In other words, you are often hearing generated speech well before the last word has come from the LLM.
- What baffles me is how quickly the tool choice and model routing is happening. Rabbit is able to detect whether you need search, traditional LLM output, Uber, MidJourney, etc. extremely quickly.

MidJourney and Spotify support feel a little more niche to me than the features I mentioned at the top, but both seem to work extremely well and the Spotify response time is unreal.

Here are the things that don’t work well on Rabbit R1 (IMO):

- UI: The scrollwheel is frustrating to use. It needs to be more responsive and the touch screen should be available as a backup for scrolling. Volume is buried and hard to adjust. These are very fixable with an OTA update.
- Doordash: I’ve tested several restaurants and each only shows me 6 dishes. This is almost certainly fixable and can most likely be done on the server-side with no OTA update required. That said, I’ve tested Multi-On (another browser automation AI tool, albeit with a much different approach) on Doordash and have had very mixed success. Doordash may take some time to get right.

LAM and Teach Mode

Jesse at Rabbit has previously noted that they chose to create a “Large Action Model”, trained on how people use browsers and computers in general to interface with apps and services, for several reasons:

- Freedom - To leave it up to the user which services should be available on their R1. Rabbit plans a “Teach Mode” that will enable end users to show Rabbit how to use their favorite service.
- Compatibility and Comprehensiveness - Many services do not expose all of their functionality through an API or SDK, despite that functionality being available in their application or web site.
- Longevity - Rabbit indicates that certain services, in their terms and conditions, may limit how their SDKs can be leveraged by AI and/or Voice. Teach Mode and LAM do not currently require signing the TOS for an SDK - instead it is all happening through the service’s UI.

Of the 4 LAM-based functions on Rabbit today, Spotify and MidJourney work well. As mentioned, Doordash is too limited. I have not been able to successfully test Uber yet, but I’ll try again.

I certainly hope that LAM and Teach Mode become as powerful as what Rabbit has planned. However, I think the R1 device is ALREADY an exciting glance into the future, making multi-modal AI inference and AI assisted search available in an inexpensive device with best-in-class response times, and the Rabbit team should be very proud of that.
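
To make the streaming point concrete, here is a minimal sketch of feeding LLM output into TTS phrase by phrase instead of waiting for the full reply. It is not Rabbit's or ElevenLabs' actual code: `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for a streaming LLM and an incremental TTS service, and splitting on punctuation is just an assumed phrase-boundary rule.

```python
from typing import Iterable, Iterator


def fake_llm_tokens(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields one word at a time."""
    for word in text.split():
        yield word


def fake_tts(phrase: str) -> bytes:
    """Hypothetical stand-in for an incremental TTS call: returns fake audio bytes."""
    return f"<audio:{phrase}>".encode("utf-8")


def stream_speech(tokens: Iterable[str], boundary: str = ",.!?") -> Iterator[bytes]:
    """Synthesize each phrase as soon as it is complete, not after the whole reply."""
    pending: list[str] = []
    for token in tokens:
        pending.append(token)
        # Assumed rule: a token ending in punctuation closes the current phrase.
        if token[-1] in boundary:
            yield fake_tts(" ".join(pending))
            pending.clear()
    if pending:
        # Flush whatever is left when the LLM stream ends without punctuation.
        yield fake_tts(" ".join(pending))


if __name__ == "__main__":
    reply = "Sure, here is the answer. It streams while the model is still talking"
    for audio in stream_speech(fake_llm_tokens(reply)):
        print(audio)  # audio for the first phrase is ready after a single token
```

Because `stream_speech` is a generator, the first audio chunk is available as soon as the first phrase boundary arrives, which is the effect described in the post.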

about 2 years ago

@Eye_Bee_Leaf They have a lot of the right pieces. I use today's Siri literally every day for alarms, phone calls, texts and timers. I would love for them to leverage iOS Shortcuts, Intents, Activities and Accessibility to knock it out of the park with AI. I hope they do...

over 2 years ago

@DivGarg9 @MultiON_AI Pretty wild!

over 2 years ago

Testing Multi-On: TikTok is testing AI Song, a tool that generates songs from text prompts using LLM Bloom. Users can even toggle the genre for a personalized touch. #AI #MusicTech

Testing Multi-On: Meta's open source AGI development signals a new era of collaborative advancement in AI. Zuckerberg's vision could change the tech landscape. #OpenSource #AGI

over 2 years ago

For as long as I can remember in tech, the @BillGates law has been the standard: “Most people overestimate what they can achieve in a year and underestimate what they can achieve in ten years.” That isn’t going to be true in 2024. Would love your feedback, challenges, thoughts on this forecast.

Baseline AI improvements coming in 2024:

1. In the past, the more specialized the model, the more powerful it was. LLMs, GPTs and Transformers broke that in late 2022 and 2023. The more data, the more generalized, the better - and now the generalized models outperform specialized models by a huge margin. In 2024 we will see a model that has not only more training data, but better organized training data. Without any other improvement to the fundamentals of the model, this will bring stronger reasoning.

2. Reinforcing #1, training data does not have to be what we see today on the internet. It can be subtly manipulated existing data or known good manufactured data. Some of this can be created by humans, but, increasingly, this data can be made by the models.

3. The models are going to become even more generalized in 2024. Truly multi-modal. Robotics took huge leaps forward in 2023 because of things learned via LLMs and Transformers. That will continue in 2024, but also all that will be fed back into the LLMs, giving spatial reasoning to our GPT based applications.

4. The GPT App Store is one way for OpenAI to get a lot more training data. BUT it is also its way to outsource the creation and testing of sophisticated new system messages that can help the LLM adapt to varying problems and inputs.

5. Most programming languages and frameworks come with a ‘standard library’. Sometimes it is hard to even really think about a programming language without also thinking about its standard library. OpenAI’s LLMs (and I’m sure others) will start to accumulate tools that the LLM will just assume are there. Right now, code interpreter and web search are 2 of these, but this ‘standard library’ will become much larger in 2024. Likely, the LLM will be fine tuned to rely on these to produce more accurate results consistently and quickly.

6. The way LLMs “think” is to generate text. We’ll see in 2024 that the LLMs will generate a bunch of text behind the scenes when creating responses. Maybe we’ll be able to access that info during debugging or maybe it will simply be proprietary and hidden. But it will lead to much better results in APIs, in ChatGPT, in applications. This is SmarterGPT, step by step, repeat the question/problem in detail before solving, etc., but happening ahead of, in parallel to and/or after the generation step we have today. We have a taste of this today with generations/reflections that happen after function calls.

7. Retrieval Augmented Generation is going to get much more thorough. There is a MASSIVE amount of competition in this space. In 2024 we’ll also see the LLMs being used to chunk and store data into semantic databases as well as being put in charge of how the retrieval happens. Today, in many cases RAG manifests as an entity that has perfect recollection of 3-5 relevant passages and absolutely no memory of the other 20. RAG needs to be better at accessing/summarizing/brute forcing the other 20. Semantic lookup is not enough. Some of the improvements here should extend to web searches, too.

8. For function calling, there will be a standard way to traffic-direct to the right functions. That work may be closer to NLP tech than now-standard LLM tech. The problem today is that function definitions take up too much of the context window - dampening the LLM’s concentration on the important information from the user. This traffic direction will be crucial so the LLM is given the function definitions it needs and doesn’t need to ‘think about’ function definitions that are not relevant at the time. See also #5.

9. Understanding of the emergent behavior of the LLM is growing but still limited. Sam Altman has stated as much as recently as late December. Once the research gets better at understanding why the LLM works the way it does, we’ll get dramatically better costs of inference and be able to get more consistently accurate answers.

10. We’ll start to put the LLM in control of its creativity settings. It will decide what temperature and top_p (and 5-10 settings that aren’t even exposed or popularized right now) to use for a given topic.

11. I think we’ll start giving the AI agency over the context window. Letting the AI decide what (and how) it remembers and letting it store info in short term vs long term vs reference memory. This could be per session, per user, per company, etc.

12. It is highly likely that there will be smaller, faster models observing larger, more sophisticated models, helping prune generations that are misguided. Combining 12, 4 and 6, it would be interesting to see that work crowd-sourced to create a set of agents that help guide LLMs to best outcomes.

13. These are merely the high probability improvements in 2024. Beyond these there will be impressive improvements that come out of left field.

14. Beyond the LLM: Vision and understanding images is going to be key. OCR and handwriting recognition will continue to get better. Voice and Music will be big. Video and 3D generation as well.

15. We’ll continue to see big leaps in robotics every month. Household moment for robotics (similar to the ChatGPT moment) in 2025.

Observations and open questions:

16. Much of what I described comes down to middleware improvements, fine tuning/RLHF (similar to improvements of the GPT-4-1106 variety), and a modest pace of improvement/scaling/optimization in the models themselves. What are the most likely disruptive improvements to the model itself in 2024? Some of the middleware improvements in the list above could be obviated by model improvements.

17. Largely unreported: 3.5’s 1106 model/API is anecdotally superior in at least a few ways to 3.5 turbo. To that point, what shaky prototypes built on GPT-4 are now performing better simply by moving them to GPT-4 1106? SmarterGPT prototypes or SQL generation prototypes, for instance?

18. Where we’ve invested in up-front model, agent, middleware decisions in 2023 - how much of that orchestration will be given over to the LLM in 2024? Rather than building pipelines we force the LLM to be a component of, how often will we just give the LLM a budget and have it self-direct? As Justin McCarthy points out - Tesla FSD v12 has removed much of their hard-coded support structure to lean into the model with great results.

19. What aspects of the Gemini video will be true (without significant coaching), and when/how in 2024, regardless of which model is being used?

20. What will be the biggest limitations/gaps of the model, of the middleware, etc. in 2024?

21. What is the closest thing to AGI we will see in the lab and in our hands in 2024? What is the closest thing to AGI that exists in the lab today?

High signal-to-noise-ratio sources of information to watch closely for these developments: https://t.co/svT8h0dL5M, https://t.co/xHCIIKfqaN, https://t.co/Wvwan61vSS
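
As a rough illustration of the function-calling "traffic direction" idea in point 8, the sketch below picks a small subset of function definitions to send to the LLM based on the user's message. Everything here is an assumption made for illustration: `select_tools`, the `TOOLS` list and the plain word-overlap score are hypothetical, and a real router would more likely use embeddings or a small classifier.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(user_message: str, tools: list[dict], k: int = 2) -> list[dict]:
    """Return up to k tool definitions whose name/description best overlap the message."""
    query = _words(user_message)
    scored = []
    for tool in tools:
        overlap = len(query & _words(tool["name"] + " " + tool["description"]))
        if overlap:  # tools with no shared words are never sent to the LLM
            scored.append((overlap, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]


# Hypothetical tool definitions, trimmed to the fields the router needs.
TOOLS = [
    {"name": "get_weather", "description": "Look up the current weather forecast for a city"},
    {"name": "run_sql", "description": "Run a SQL query against the sales database"},
    {"name": "play_song", "description": "Play a song or playlist on a music service"},
]

if __name__ == "__main__":
    chosen = select_tools("What is the weather forecast in Paris?", TOOLS, k=1)
    print([tool["name"] for tool in chosen])
```

Only the selected definitions would then be placed in the context window. Common words such as "the" still add noise to this naive score, which is one reason a production router would need something stronger.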

over 2 years ago

Getting an assistant message and assistant tool_calls in the same ChatCompletion API call seems to only happen with streaming on. I can't coax the LLM to return that way when stream=false.
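
If a streamed response can carry both message content and tool_calls, the client has to accumulate both rather than assume one or the other. Here is a minimal sketch of that accumulation. It works on plain dicts modeled on the `delta` objects in streaming chunks (content fragments, plus tool call fragments keyed by `index` with `function.arguments` arriving in pieces). The `accumulate` helper and the `SAMPLE` data are hand-written for illustration, not captured API output.

```python
import json


def accumulate(deltas: list[dict]) -> dict:
    """Merge streamed delta dicts into one assistant message with content and tool calls."""
    content_parts: list[str] = []
    calls: dict[int, dict] = {}
    for delta in deltas:
        if delta.get("content"):
            content_parts.append(delta["content"])
        for fragment in delta.get("tool_calls") or []:
            # Fragments for the same tool call share an index; arguments arrive in pieces.
            call = calls.setdefault(
                fragment["index"], {"id": None, "name": "", "arguments": ""}
            )
            if fragment.get("id"):
                call["id"] = fragment["id"]
            function = fragment.get("function") or {}
            call["name"] += function.get("name") or ""
            call["arguments"] += function.get("arguments") or ""
    return {
        "content": "".join(content_parts) or None,
        "tool_calls": [calls[i] for i in sorted(calls)],
    }


# Hand-written sample deltas (an assumption about shape, not real output).
SAMPLE = [
    {"content": "Let me check "},
    {"content": "that."},
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]

if __name__ == "__main__":
    message = accumulate(SAMPLE)
    print(message["content"])
    for call in message["tool_calls"]:
        print(call["name"], json.loads(call["arguments"]))
```

The point of the design is that neither field is treated as exclusive: the caller can show the content to the user and still execute the tool calls.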

over 2 years ago

Some observations about `tool_calls` based on some fresh experimentation with these changes to the OpenAI ChatCompletion API today:

• When gpt-4-1106-preview wants to do parallel tool_calls with stream=True, it has a long delay while it 'thinks' and then it spits the tool_calls out very quickly through the streaming completion chunks. So if it is making 7 parallel tool calls, for example, you currently don't get a lot of benefit from processing them as they come in, because they all come in together rather than slowly over time. I suspect the API is holding the tokens on the server to do some kind of sanity check before sending them to the client. Same behavior with gpt-3.5-turbo-1106, but it is faster overall.
• gpt-4-1106-preview sometimes sends both message content and tool_calls in the same ChatCompletion. I've never seen gpt-4 do that with function_calls.
• Although gpt-4 and gpt-3.5-turbo both support the new tools and tool_calls syntax, neither one seems to want to do parallel tool calls.
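
Since the parallel tool_calls arrive together, one reasonable way to handle them is to wait for the full list and then run the calls concurrently. The sketch below assumes tool calls shaped like the API's `tool_calls` entries (an `id` plus a `function` with `name` and JSON `arguments`) and replies shaped like `role: "tool"` messages. The `REGISTRY` and the two example functions are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> str:
    """Hypothetical local tool."""
    return f"Sunny in {city}"


def get_time(city: str) -> str:
    """Hypothetical local tool."""
    return f"12:00 in {city}"


# Maps the function name the model asks for to a local Python callable.
REGISTRY = {"get_weather": get_weather, "get_time": get_time}


def run_tool_call(call: dict) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    func = REGISTRY[call["function"]["name"]]
    kwargs = json.loads(call["function"]["arguments"])
    return {"role": "tool", "tool_call_id": call["id"], "content": func(**kwargs)}


def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Run all tool calls concurrently; results come back in the original order."""
    with ThreadPoolExecutor(max_workers=len(tool_calls) or 1) as pool:
        return list(pool.map(run_tool_call, tool_calls))


if __name__ == "__main__":
    calls = [
        {"id": "call_a", "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_b", "function": {"name": "get_time", "arguments": '{"city": "Tokyo"}'}},
    ]
    for reply in run_parallel(calls):
        print(reply)
```

Each reply keeps its `tool_call_id`, so the results can be appended to the conversation and matched back to the calls the model made.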
