ellamind

3 months ago

M 2.5 by @MiniMaxAI_ is currently the most popular open weights model on @OpenRouter, but is also heavily censored. Inspecting the CoTs reveals deliberate lying, which can also be problematic in other areas, as @AnthropicAI's research has shown. Some examples attached 👇 https://t.co/caNzn28GfN

5 months ago · Bremen

Model Download: https://t.co/yiMLNd1RXW Annotations generated with propella-1: https://t.co/N7XoftcEtC

5 months ago · Bremen

We released propella-1, a small model for advanced pre-training data annotation 🙃. Work led by @maxidahl within the @OpenEuroLLM project. Link to model + annotations for important pre-training datasets below 👇

Max Idahl

@maxidahl

5 months ago

Time to propel open LLM training data curation to the next level. Releasing propella-1: small multilingual LLMs that annotate text documents for dataset curation at scale. 🧵👇 https://t.co/GXzOCg0cwN

6 months ago

Our @TheBitFlipper built an in-house benchmark for coding agents, based on real PRs from our codebase. As expected from our vibes (and other benchmarks), Opus takes the crown 🥇 - GPT-5.2 results still outstanding though 👀

Damian Barabonkov

@damian_b

6 months ago

Public benchmarks are easy to game. I built swellubench to validate real features and bug fixes from a production platform at @ellamindAI. It evaluates models on private, real-world coding tasks to measure true performance and cut through benchmark maxing noise. Methodology in 🧵

6 months ago · Stuhr

10m gpu hours for @OpenEuroLLM, lfg 🚀

OpenEuroLLM @OpenEuroLLM

6 months ago

Strategic access to EuroHPC resources granted to OpenEuroLLM!!!

- first AI project granted strategic access across multiple EuroHPC centres
- for over 10 million GPU hours

Thanks @EUComission and @EuroHPC_JU! https://t.co/SP0kryOhrk

7 months ago · Stuhr

Joined SOOFI consortium for 🇩🇪 sovereign AI. Our role: rigorous LLM evaluation on @deutschetelekom's shiny new DGX B200s in Munich 🙂. Let's build! 🙌 More info: https://t.co/MA9VzjjUJ4 @KI_Verband @FraunhoferIAIS @MMerantix @DFKI @FraunhoferIIS @UniHannover @TUDarmstadt https://t.co/018lmNYXz6

7 months ago · Barcelona

Machine-translated data beats native language data? 🤔 As part of @OpenEuroLLM, we produced >5 trillion tokens of multilingual pretraining data for low-resource languages with >3M tps on LEONARDO (CINECA). Findings presented at @BSC_CNS. Led by @maxidahl, release coming soon 🙂. https://t.co/HDhTxqJYe7

8 months ago

@WolframRvnwlf 🫡

8 months ago

decide with confidence 💯 #elluminate

8 months ago

Veo 3.1 vs Sora 2 creating professional-looking (at least that was the intention 😄) minimal ads. My take: Veo 3.1's details are slightly better, but Sora 2 is a lot more steerable, with better text and scene-changing capabilities. (The prompt was adapted from a Sora example, though.)

ellamindAI retweeted

9 months ago

This is just a small vibecheck (more currently not possible due to rate limits) - but in the German Geo eval I built on stage yesterday evening, @Alibaba_Qwen 3-Max doesn't look competitive with other top models and also falls far behind e.g. R1 or GLM 4.5. 😕 @ellamindAI https://t.co/NKurPsDTNp

9 months ago

Building AI products? You need real evaluations. Let's talk. https://t.co/NWU7kJ4Fca

9 months ago

AI evaluations are broken. Generic benchmarks tell you nothing. Manual QA doesn't scale. And existing tools are either too academic or simplify in the wrong places. That's why we built elluminate - evals that actually work for real product teams.

9 months ago

The result? Teams ship faster with confidence. Product managers can actually trust their metrics. And developers spend time building, not firefighting. Whether you're a developer tired of vibe-checking, a PM who needs reliable metrics, or a domain expert who knows what "good" looks like, elluminate speaks your language.

10 months ago

Our co-founder's project #LeoLM highlighted by @bmftr_bund. Today, we're continuing what started as a student's side project with @OpenEuroLLM (and more to come). If you want to work on Open Source AI, multilingual applications and AI evaluations as well - we're hiring! 🙂

Björn Plüster

@bjoern_pl

10 months ago

Nearly two years after release my project LeoLM is being used as a strong justification for the expansion of federal compute funding in Germany. Goes to show how much impact open-source projects can have. Hell yeah @bmftr_bund - thanks for making projects like this possible! 🚀 https://t.co/zNi18fIo4k

ellamindAI retweeted