Swival.dev @SwivalAgent - Twitter Profile

SwivalAgent retweeted

yv

@yvbbrjdr

1 day ago

I recommend everyone read the MAI-Thinking-1 technical paper; it covers (almost) all the details of how to train a SOTA LLM. https://t.co/it5mCFd6v3

21

1K

219

2K

168K

SwivalAgent retweeted

TonoKen3🌏LLMFormura-1©｜とのけん3

@Tono_Ken3

1 day ago

Good morning, LLM. Today's pick is this model out of France: a Qwen-based 35B model that posts benchmark scores above Sonnet 4.6. Yeah, I'm definitely trying this one. https://t.co/g9ujy8gz14

12

380

33

355

23K

SwivalAgent retweeted

Jun Kim @jundotkim

1 day ago

I just shipped oMLX v0.4.0, the first official release with the new native Swift macOS app. https://t.co/AJT6hKvMPL

oMLX now ships with a redesigned onboarding flow, settings UI, Hugging Face cache discovery, and a much more native-feeling way to manage and run local models on macOS.

Huge thanks to GitHub contributor popfido for the excellent work that drove the Swift transition.

My goal is still the same: I want oMLX to be the app someone can open on a Mac and immediately try Local AI with, without needing to understand all the machinery first.

If you try 0.4.0, I’d really appreciate feedback on the macOS app experience, especially first launch, model discovery, server start/stop, update checks, and anything that still feels confusing.

25

271

28

162

38K

SwivalAgent retweeted

JetBrains @jetbrains

3 days ago

Mellum started with code completion. Mellum2 is built for more – handling both natural language and code. A 12B-parameter open-source LLM for routing, RAG, and sub-agents, optimized for ultra-low-latency inference. Now on @huggingface. Learn more: https://t.co/28sG8Ql52L

24

328

64

147

96K

Swival.dev @SwivalAgent

3 days ago

I just asked M3 to improve the https://t.co/h8nT16uH4t static pages, and it had the excellent idea to add internal page navigation. Nice and useful. Good job, @MiniMax_AI! Looking forward to testing this model more.

0

24

Swival.dev @SwivalAgent

3 days ago

@Youssofal_ Looking forward to it! mtplx is great!

0

22

Swival.dev @SwivalAgent

3 days ago

Swival gets built-in, first-class support for the Nono sandbox https://t.co/brnSAaAyFg /cc @decodebytes

0

5

4

3

767

SwivalAgent retweeted

Jun Song

@jun_song

3 days ago

Friendly reminder: you can run a 300B local LLM on your MacBook Pro. You can’t run it on any Windows laptop.

41

178

6

40

41K

Swival.dev @SwivalAgent

3 days ago

@neural_avb rtk is simple to use in Swival as a command middleware https://t.co/hoEKQ5ZCZo

0

1

0

1

230

Swival.dev @SwivalAgent

3 days ago

@stevibe Let's give it a spin!

0

144

SwivalAgent retweeted

ModelScope

@ModelScope2022

4 days ago

🔥 @KwaiKeye's Keye-VL-2.0-30B-A3B is now officially live on ModelScope! A major milestone that brings DSA (DeepSeek Sparse Attention) into multimodal AI. 🎬🤖

By coupling sparse attention with advanced feature aggregation, Keye 2.0 unlocks a 256k context window, allowing seamless processing of hour-long videos with zero context degradation. 📈

🔗 Get the weights: https://t.co/ScMIfM6ExY

🌟 Core Technical Highlights:
• 🧠 MoE Performance, Flash Cost: Outperforms 200B+ open models on LongVideoBench (74.10) while slashing prefill costs by 50%.
• ⏱️ Frame-Level Precision: Captures complex causal chains and timestamps in long vlogs, handicraft tutorials, and gaming clips.
• 🚀 Anti-Decay Mastery: On VideoMME V2, expanding input from 64 to 512 frames actually boosts accuracy from 35.34% to 42.44%.

8

285

29

143

19K

SwivalAgent retweeted

Loris Cro ⚡

@croloris

4 days ago

You Must Fix Your Asserts https://t.co/TCwtqsKQNP

5

62

4

29

4K

Swival.dev @SwivalAgent

5 days ago

Notes from the AI Now Summit by Mistral https://t.co/gfjbQOOVXA

0

29

SwivalAgent retweeted

Georgi Gerganov

@ggerganov

6 days ago

llama.cpp now has an official website: https://t.co/vztdUpdBWL

Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications.

While oriented towards simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling with which experienced users are already familiar.

Also note that all GGUF models that you might have already downloaded with llama.cpp in the past will be automatically available to use without downloading again (they are stored in the common HF cache on your machine).

We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi).

In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.

95

3K

485

1K

161K

SwivalAgent retweeted

clem 🤗

@ClementDelangue

7 days ago

The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore.

The problem: every RL step, the trainer typically has to sync fresh weights to the inference engine. for a 7B in bf16 that's ~14GB. for a frontier 1T fp8 checkpoint, that's ~1TB; in bf16 it would be ~2TB. per sync.

The insight: between two RL steps, ~99% of bf16 weights are bit-identical. at RL learning rates, the optimizer is whispering and bf16 literally cannot hear most of it. the stored bf16 bits don't change.

What they shipped in TRL: only the changed elements get encoded as a sparse safetensors file, dropped into a Hugging Face Bucket, and fetched by vLLM. on Qwen3-0.6B, per-step payload goes from 1.2 GB to 20 to 35 MB. This is exactly what we built Buckets for: S3-like object storage on the Hub, Xet-backed (so even full snapshots only transfer the changed chunks).

The cherry on top: we ran a FULL disaggregated training where:
- the trainer lived on one box
- vLLM ran inside a Hugging Face Space
- the Wordle environment ran in another Space
- weights flowed through one Hub bucket

no shared cluster. no RDMA. no VPN. no NCCL across clouds. just HTTPS and a bucket.

one GPU + a Hugging Face account is now enough to do real disaggregated RL. multi-replica inference fleets across regions become a small devops exercise, not a research project.

Full write-up: https://t.co/CG115IjT0q

Open source RL keeps eating the moat!
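
The arithmetic and the sparse-delta idea in the post above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the TRL implementation: the function names (`full_sync_bytes`, `encode_delta`, `apply_delta`) are hypothetical, weights are modeled as raw 16-bit patterns in a standard-library `array`, sizes use decimal units (1 GB = 10^9 bytes), and the safetensors encoding and bucket upload are left out entirely.

```python
from array import array

# Bytes per stored parameter for the two dtypes mentioned in the post.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def full_sync_bytes(num_params: int, dtype: str) -> int:
    """Size of a full weight sync: every parameter is sent every step."""
    return num_params * BYTES_PER_PARAM[dtype]


def encode_delta(old: array, new: array) -> tuple[array, array]:
    """Collect (indices, values) for positions whose 16-bit pattern changed.

    Hypothetical helper: compares stored bits, not float values, which is
    the sense in which the post calls unchanged weights "bit-identical".
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    indices = array("I")  # positions that changed
    values = array("H")   # new 16-bit patterns at those positions
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            indices.append(i)
            values.append(b)
    return indices, values


def apply_delta(old: array, indices: array, values: array) -> array:
    """Rebuild the new weights on the receiving side from old + delta."""
    patched = array("H", old)  # copy, so the caller's buffer is untouched
    for i, v in zip(indices, values):
        patched[i] = v
    return patched


if __name__ == "__main__":
    # 7B parameters in bf16, as in the post: 14e9 bytes, i.e. ~14 GB.
    print(full_sync_bytes(7_000_000_000, "bf16") / 1e9, "GB")

    # Illustrative bit patterns; only one of four elements changes.
    old = array("H", [0x3F80, 0x4000, 0xBF80, 0x3C00])
    new = array("H", [0x3F80, 0x4001, 0xBF80, 0x3C00])
    idx, vals = encode_delta(old, new)
    assert apply_delta(old, idx, vals) == new
```

Each changed element costs an index plus a value, so a delta like this only pays off when a small fraction of elements change, which is the regime the post describes.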

28

593

69

335

60K

Swival.dev @SwivalAgent

7 days ago

And it also helps you secure your GitHub Actions.

0

19

Swival.dev @SwivalAgent

7 days ago

Swival has a new user interface for audits.

2

4

1

0

671

Swival.dev @SwivalAgent

7 days ago

GalaxDB looks interesting. SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged https://t.co/qpdop7k2o1

0

33

Swival.dev @SwivalAgent

7 days ago

@victormustar @huggingface Very cool!

0

14

SwivalAgent retweeted

Victor M

@victormustar

8 days ago

cool new release: a tiny open video VLM that understands what happens in videos and when 👀 Marlin-2B (Apache 2.0!) can caption clips into timestamped events, or find a natural-language moment inside the video (can see a ton of cool use cases with it) Made a Hugging Face demo for it ⬇️

14

313

50

289

25K
