Symflower

@symflower

Your virtual coding assistant who spots errors and unexpected behavior, does routine tasks for you and generates unit tests with meaningful values in real time.

Joined May 2017

121 Following

302 Followers

517 Posts

Pinned Tweet

Symflower @symflower

almost 4 years ago

Ever wished you could just generate your #unittests instead of painfully writing them? In this video 👇 Evelyn demonstrates how to use https://t.co/5Hj9Zey9DL to speed up your daily development workflow 🚀✨ https://t.co/zUy3sXBUSG #golang #java

Symflower @symflower

about 1 year ago

Benchmarking LLM agents 💡 Our latest post covers useful benchmarks for evaluating LLM code generation agents and agentic software development workflows. https://t.co/OSDYHxeFRj #LLMAgents #SoftwareDevelopment #Benchmarking #AI #Coding

Symflower @symflower

about 1 year ago

New to LLM coding agents? 🤖 Our introduction covers the capabilities, limitations, and use cases of LLM agents for software development 👇 https://t.co/ElB3iTcpS4

Symflower @symflower

about 1 year ago

Updated DevQualityEval v1.0 results are in 👀 Check out how our new king of cost-effectiveness (Google’s Gemini 2.0 Flash Lite) performed, and find out if Claude 3.7 Sonnet (Thinking) is worth the additional costs 👇

Markus Zimmermann @zimmskal

about 1 year ago

Insights of analyzing >100 LLMs for the DevQualityEval v1.0 (generating quality code) in latest deep dive

- 👑 Google’s Gemini 2.0 Flash Lite is the king of cost-effectiveness (our previous king OpenAI’s o1-preview is 1124x more expensive, and worse in score)
- 🥇 Anthropic’s Claude 3.7 Sonnet is the functional best model (with help) … by far
- 🏡 Qwen’s Qwen 2.5 Coder is the best model for local use
- Models are on average getting better at code generation, especially in Go
- Only one model is on-par with static tooling for migrating JUnit 4 to 5 code
- Surprise! providers are unreliable for days for new popular models
- Let’s STOP the model naming MADNESS together: we proposed a convention for naming models
- We counted all the votes, v1.1 will bring: JS, Python, Rust, …
- Our hunch with using static analytics to improve scoring continues to be true

All the other models, details and how we continue to solve the "ceiling problem" in the deep dive: 👇🧵 (now with interactive graphs 🌈)

Looking forward to your feedback :-)

Symflower @symflower

over 1 year ago

We analyzed 80+ #LLMs for generating quality code 👀 Here’s the deep dive blog post for the DevQualityEval v0.6: https://t.co/UH9NqRhvWX

Symflower @symflower

over 1 year ago

We analyzed >80 LLMs in the deep dive blog post from DevQualityEval v0.6 for generating quality code. Check out the insights and results 👇

Markus Zimmermann @zimmskal

over 1 year ago

OpenAI's o1-preview is the king 👑 of code generation but is super slow and expensive 😱 This and other insights of analyzing >80 LLMs in the deep dive blog post from the DevQualityEval v0.6 for generating quality code 👇

- OpenAI’s o1-preview and o1-mini are slightly ahead of Anthropic’s Claude 3.5 Sonnet in functional score, but are MUCH slower and chattier.
- DeepSeek’s v2 is still the king of cost-effectiveness, but GPT-4o-mini and Meta’s Llama 3.1 405B are catching up.
- o1-preview and o1-mini are worse than GPT-4o-mini in transpiling code
- Best in Go is o1-mini, best in Java GPT4-turbo, best in Ruby o1-preview

Please support our work for the community by liking and sharing this post! 🙏

All the details and how we will solve the "ceiling problem" in the deep dive https://t.co/TRo8GsVR28 (2x the content as the previous one!)

Symflower @symflower

over 1 year ago

#Java 23 is out! 🥳 Learn about all the updates & new features in #JDK23: https://t.co/r1cFuDOf8E

Symflower @symflower

over 1 year ago

Execute only the tests you need 💡 We see a 29% reduction in test execution times with just a basic approach. Details of the benchmark, example & guide: https://t.co/DcCujUNcvn

Symflower @symflower

over 1 year ago

Need to cut #LLM costs? 🤑 Read up on the key practices you can use to optimize your LLM spending 👌 https://t.co/jRJ5oenf4l

Symflower @symflower

almost 2 years ago

#LLM #observability 👀 Monitoring can help improve the performance of your LLM applications. Here’s what you need to know & the most useful tools for LLM observability 🔍 https://t.co/V0zfiCRhaC

Symflower @symflower

almost 2 years ago

We used #LLMs to #transpile #Java and #Golang code to #Ruby 🦾 Here’s what we experienced: https://t.co/6pvgzhZ8xe

Symflower @symflower

almost 2 years ago

Are you using #AI-powered tools in your #softwaredevelopment workflow❓ Aider is a good example that works well and even offers voice coding 🦾 Here’s our guide to using Aider: https://t.co/XKGfQytyqF

Symflower @symflower

almost 2 years ago

Lost in the sea of #LLM #codegeneration tools? 🌊 We’ve got you! Here’s our list of the top #AI tools for #softwaredevelopment: https://t.co/4TV1Kr7wcY

Symflower @symflower

almost 2 years ago

How well do #LLMs generate code ❓ There’s only one way to find out: #benchmarking models for #softwaredevelopment tasks. Here’s a roundup of popular LLM benchmarks & insights into our take on the topic 🤓 https://t.co/nkUVxXKfz9

Symflower @symflower

almost 2 years ago

Looking to evaluate LLMs? 👀 This post helps you navigate the #LLM #benchmark landscape 🧭 https://t.co/HtDpXUMe7M

Symflower @symflower

almost 2 years ago

What metrics do you track when evaluating #LLMs? 👀 Here’s an overview of complex statistical and model-based scorers 💡 Bonus: we also cover the #evaluation #frameworks that help you get started assessing #LargeLanguageModels. https://t.co/Y7AkEna0wT

Symflower @symflower

almost 2 years ago

Have you ever tried to fix performance issues in your #GoLang application but could not find out why it was sometimes taking longer? 🚀 Instrumenting your application for #Go #tracing 💡 might be what you need: https://t.co/rY80MpEAsL

Symflower @symflower

almost 2 years ago

#Java 23 is coming in September 🥳 Here’s what you can get excited about in #JDK23! Check out all the updates in this release: https://t.co/r1cFuDOf8E

Symflower @symflower

almost 2 years ago

Do you #reuse code? ♻️ Optimizing code for #reusability helps drive down development effort and cost while improving quality. Here’s a list of the most important reusability best practices for #Java #coding: https://t.co/j9b151WnYT

Symflower @symflower

almost 2 years ago

Confused by LLM evaluation? 😵‍💫 We can’t blame you. Our new series on LLM #benchmarking guides you through all you need to know about measuring #LLM performance: https://t.co/z7TalXRPZl

Symflower @symflower

almost 2 years ago

Struggling with performance bottlenecks in your #GoLang app? 🤔 #Go #tracing to the rescue! Explore our comprehensive guide and conquer even the toughest optimization challenges 💪 https://t.co/rY80MpEAsL
