Finteresting @finterestingnow - Twitter Profile

about 22 hours ago

@prakdadlani Well, IT managers - at least in some orgs - make sure that you are replaceable - risk management - but that process kills innovation at the individual level: everything is shared and nothing is your own, and it's handed over to someone else to manage without benefits to the inventor.

0

46

Finteresting @FinterestingNow

1 day ago

@josepha_mayo @NVIDIAAI

0

381

FinterestingNow retweeted

Marktechpost AI

@Marktechpost

2 days ago

Most diffusion language models make one network do two jobs at once: represent the clean context and denoise the noisy tokens. Those two goals pull the same weights in different directions. NVIDIA just split them apart.

They released Nemotron-Labs-TwoTower, a block-wise autoregressive diffusion model built on the Nemotron-3-Nano-30B-A3B hybrid Mamba-2/attention/MoE backbone. It runs two towers: a frozen autoregressive context tower that processes clean tokens causally, and a trainable diffusion denoiser tower that refines noisy blocks via cross-attention to that context. Only the denoiser is trained, on ~2.1T tokens, a fraction of the backbone's 25T.

Here's what's actually interesting:

→ Two towers, not one: a frozen AR context tower and a trainable diffusion denoiser, connected layer-by-layer. Denoiser layer i attends to context layer i, not just the last hidden state
→ 98.7% of the autoregressive baseline's quality at 2.42× generation throughput (γ=0.8, block size 16, 2×H100)
→ It commits multiple tokens per denoising step early in decoding. That's where the one-token-per-step AR bottleneck breaks
→ One checkpoint, three decoding modes: mask diffusion, mock-AR, and standard AR
→ Ablations: causal Mamba beats bidirectional Mamba, and tying the two towers under a joint loss is substantially worse

Full analysis: https://t.co/xU17IsVGWQ
Paper: https://t.co/WEmFQYmY5v
Weights: https://t.co/HvRZ6VEeAb

@NVIDIAAI @NVIDIARobotics @NVIDIADeepLearn @nvidiadeveloper
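
The layer-by-layer connection described in the tweet above (denoiser layer i attends to context layer i) can be illustrated with a toy sketch. This is not NVIDIA's implementation: the function names `context_tower`, `denoiser_tower`, and `cross_attend` are hypothetical, the "context tower" here is just a causal running mean standing in for a frozen autoregressive model, and there are no learned weights. It only shows the wiring, under those assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query vector becomes a softmax-weighted average of the context vectors."""
    dim = len(context[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[d] for w, k in zip(weights, context)) for d in range(dim)])
    return out

def context_tower(clean, num_layers):
    """Hypothetical stand-in for the frozen AR tower: one hidden-state list per layer."""
    states, h = [], clean
    for _ in range(num_layers):
        dim = len(h[0])
        # Causal running mean: position t only sees positions <= t.
        h = [[sum(v[d] for v in h[: t + 1]) / (t + 1) for d in range(dim)]
             for t in range(len(h))]
        states.append(h)
    return states

def denoiser_tower(noisy_block, context_states):
    """Denoiser layer i cross-attends to context layer i, not only the last layer."""
    h = noisy_block
    for layer_ctx in context_states:
        attended = cross_attend(h, layer_ctx)
        # Residual update of the noisy block with the attended context.
        h = [[x + a for x, a in zip(row, att)] for row, att in zip(h, attended)]
    return h

clean_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy embeddings of clean context
noisy_block = [[0.5, 0.5], [0.1, -0.2]]               # toy embeddings of a noisy block
states = context_tower(clean_tokens, num_layers=3)
refined = denoiser_tower(noisy_block, states)
```

The point of the sketch is the loop in `denoiser_tower`: it walks the per-layer context states in order, so each denoiser layer sees the matching depth of the context tower rather than a single final hidden state.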

2

18

4

43K

FinterestingNow retweeted

NVIDIA AI

@NVIDIAAI

1 day ago

We took a 30B model and split it in two to write tokens in parallel instead of one at a time. Introducing Nemotron-Labs-TwoTower: a diffusion language model from NVIDIA Research adapted from Nemotron-3-Nano-30B-A3B. Here’s how it works: one half holds the context, the other writes the tokens, with both reusing the pretrained model instead of training a new one from scratch. We found it kept 98.7% of the original model’s quality at 2.42× faster generation.

115

4K

557

2K

645K

FinterestingNow retweeted

ollama

@ollama

2 days ago

Gemma 4 is now nearly 90% faster on Apple Silicon with Ollama using MLX! The speedup comes from improved multi-token prediction (MTP), now on by default for Gemma 4, with more models to come. Ollama automatically tunes how many tokens to draft as it runs, so it never slows generation down when speculation no longer contributes to a speedup.
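
The tweet does not say how Ollama tunes the draft length, so the following is only a sketch of one plausible approach: track how many drafted tokens were accepted on each step, grow the draft when most are accepted, shrink it when few are, and drop to zero (plain decoding) when speculation stops paying off. The class name `DraftTuner`, the thresholds, and the `probe_every` re-probe interval are all assumptions made for illustration.

```python
class DraftTuner:
    """Hypothetical controller that adapts the speculative draft length."""

    def __init__(self, max_draft=8, start=4, grow_at=0.8, shrink_at=0.4, probe_every=16):
        self.max_draft = max_draft
        self.draft_len = start
        self.grow_at = grow_at          # acceptance rate at which we draft more
        self.shrink_at = shrink_at      # acceptance rate below which we draft less
        self.probe_every = probe_every  # idle steps before trying speculation again
        self._idle_steps = 0

    def update(self, drafted, accepted):
        """Record one decoding step and return the draft length for the next step."""
        if drafted == 0:
            # Speculation is off; re-probe with a single draft token now and then.
            self._idle_steps += 1
            if self._idle_steps >= self.probe_every:
                self.draft_len, self._idle_steps = 1, 0
            return self.draft_len
        rate = accepted / drafted
        if rate >= self.grow_at:
            self.draft_len = min(self.max_draft, self.draft_len + 1)
        elif rate < self.shrink_at:
            self.draft_len = max(0, self.draft_len - 1)
        return self.draft_len

tuner = DraftTuner(max_draft=4, start=2, probe_every=2)
next_len = tuner.update(drafted=2, accepted=2)  # all drafts accepted, so draft more
```

With a draft length of zero the generator falls back to ordinary one-token decoding, which is the property the tweet highlights: speculation is switched off rather than allowed to slow generation down.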

59

3K

267

1K

199K

Finteresting @FinterestingNow

3 days ago

@Mahesh10816 It's a very clever mechanism to charge for 13 recharges instead of 12 over a year.

0

3

0

390

FinterestingNow retweeted

Jun Song

@jun_song

4 days ago

Introducing SuperQwen-Agentworld-35B-A3B-abliterated 🚀

First Super-tune version of Qwen.

Agentworld-35b is one of the strongest agentic small models available for now.

> Uncensored, intelligent enhanced version

Try now on HF ⬇️ https://t.co/ddxcluSRQP

25

454

52

419

25K

FinterestingNow retweeted

Kyle Hessling

@KyleHessling1

4 days ago

Good morning y'all! Qwopus-3.6-35B-A3B-MTP-Coder is live! All GGUFs will be populating over the next few hours!

It's a lightning-fast MOE with the coder curriculum recipe. Similar to the 27B coder, it shines with thinking disabled, offering significantly faster wall time for similar, and in some cases superior, results to same-sized thinking alternatives!

With thinking disabled, it goes toe-to-toe with the new Ornith 35B MoE across a huge eval suite (performed by @no_stp_on_snek), edging it on the coding trajectories and decisively on speed and cost, even though Ornith was run with thinking enabled. See the model card for the full test results, and shoutout to Tom, @no_stp_on_snek, for thoroughly evaluating the model for us before launch!

With MTP and thinking disabled, along with the MOE speed, it runs so quickly in harnesses like @opencode that it almost feels instant @ 253 tps on my 5090. No 8k tokens of thinking before a coherent output is actioned. This is especially useful in long contexts, where the base models will progressively start thinking for tens of thousands of tokens before replying. Compared to the base models with thinking off, the coder curriculum really advances the no-think frontier, especially in terms of how creative it can be.

Run temp hot as usual, 0.85-1, and make sure your harness isn't overriding the temp setting of your server at runtime.

If you want to use it to its full ability, I would recommend giving it very thorough prompts. I have been using it in opencode, and I have been blown away by the results it generates autonomously with chunky prompts. Please see the links to the demos, Aether Dominion (RTS Game) and a slide deck presentation the model made about itself that turned out beautifully, in the comments below!

I am getting results on this incredibly fast local model (with thinking disabled) that I couldn't get in some thinking frontier models over a year ago. Open source is accelerating fast, and in light of recent events, there's never been a better time to get your local AI workflows tightened up. This MOE would be a great one to play with, and it's also a great one if you don't have much VRAM because it can run fast offloaded partially to system memory!

All of that said, please give it a run with thinking off and build something you'd like to see. We'd love to see your results and any feedback on specific use cases in the comments below!

Also, thanks so much for 5k followers, you all make up such an enjoyable and knowledgeable open source community, and I am so blessed to be able to collaborate and discuss this research with all of you. I can't express how grateful I am for every comment. As always, I will try to reply to them all! If we ever get monetized on X, I will put every penny into buying more hardware for our lab! Have a blessed day, my friends, looking forward to your thoughts! https://t.co/0WkjglsaWS

62

1K

94

1K

129K

FinterestingNow retweeted

Google Flow

@FlowbyGoogle

7 days ago

This just in: Gemini Omni is open to everyone in Google Flow! Edit videos using natural language (it’s like Nano Banana, but for video!). We've loved your edits so we're opening access. Not a Google AI subscriber? You can still create 2 Omni videos free each day! Try it today!

0

2K

103

218

2M

FinterestingNow retweeted

Jean-Rémi King

@JeanRemiKing

4 days ago

We’re happy to announce 2 releases today:

- 🧠 Brain2Qwerty v1 is published at @NatureNeuro
- 🚀 Brain2Qwerty v2 is now publicly released

Explore how we decode sentences from non-invasive brain recordings: https://t.co/IdR6gK2hcd

Links:
📄 v1 Nature Neuro: https://t.co/wnRjc9W9gI
📑 v2 Meta preprint: https://t.co/oSfLOQFcvg
💻 Code: https://t.co/Xbe0XWfWQL
📊 Data: https://t.co/SCBbs4AhTg
📝 Blog: https://t.co/15RvsAaXlH
🧵 Thread: https://t.co/d8FJrVyDut

87

3K

424

2K

416K

FinterestingNow retweeted

alphaXiv

@askalphaxiv

5 days ago

DeepSeek just published DSpark, a speculative decoding system that boosts live DeepSeek V4 serving throughput by 51% to 406% under stricter latency targets.

Most speculative decoding methods draft more tokens but waste verification compute when those tokens get rejected. DSpark fixes this with a semi-autoregressive drafter for more coherent long drafts, plus a confidence scheduler that only verifies prefixes likely to survive.

It also gives 60% to 85% faster per-user generation at matched throughput.
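
The idea of verifying only prefixes likely to survive can be sketched as follows. This is not DSpark's scheduler, whose details the tweet does not give. It assumes the drafter reports a per-token confidence, treats the product of confidences as a rough estimate of the chance the whole prefix is accepted, and sends only the prefix above a hypothetical `min_survival` threshold to the verifier. The second function is the usual greedy acceptance rule: keep draft tokens up to the first mismatch with the target model.

```python
def prefix_to_verify(confidences, min_survival=0.5):
    """Length of the longest draft prefix whose estimated survival stays above the threshold."""
    survive = 1.0
    keep = 0
    for c in confidences:
        survive *= c  # assumed: chance that every token so far is accepted
        if survive < min_survival:
            break
        keep += 1
    return keep

def accepted_prefix_len(draft_tokens, target_tokens):
    """Greedy verification: count draft tokens that match the target until the first mismatch."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n

draft = ["the", "cat", "sat", "down"]
confidences = [0.9, 0.9, 0.5, 0.9]
k = prefix_to_verify(confidences)        # only the first k tokens are sent for verification
accepted = accepted_prefix_len(draft[:k], ["the", "cat", "slept"])
```

Here the third token drags the estimated survival below the threshold, so the verifier never spends compute on it or on anything after it.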

15

536

64

175

30K

FinterestingNow retweeted

Mia

@MiaAI_lab

4 days ago

DeepSeek v4 Flash DSpark running on 2x @NVIDIAAI DGX Sparks at 60 tok/s. ~50% improvement from the previous recipe! Context set to 256k conservatively — ~3 concurrent sessions. Thanks to @rafaelcaricio for making this happen 👇 https://t.co/6HXa9pxqhj

22

328

31

228

24K

Finteresting @FinterestingNow

4 days ago

@dev_maims Sales is harder than building, unless it's vanity. And if you are good at sales, you don't need a local LLM per se; the customer will provide you with compute.

0

2

0

18

Finteresting @FinterestingNow

5 days ago

@morganlinton I found Gemma better than Qwen in Java coding tasks.

0

19

FinterestingNow retweeted

ठा. प्रदीप कुमार सिंह

@Pradeepthakur_4

6 days ago

The effort that is being made in Indore: if only it could happen across all of India 🙏

35

2K

452

231

145K

Finteresting @FinterestingNow

6 days ago

@IndianTechGuide Don't get me wrong, in this day and age 4 years on campus is too long... No matter what you are learning, do it with live industry or labs...

0

315

Finteresting @FinterestingNow

6 days ago

@FactswithDinesh Diversification of energy sources

0

6

FinterestingNow retweeted

SciTech Era

@SciTechera

7 days ago

AI race is getting bigger..

CHINA UNVEILS ITS OWN AI RIVAL TO ANTHROPIC'S MYTHOS 👀

China's 360 Security Technology has introduced Yitian Tulong, a new AI cybersecurity platform designed as a domestic alternative to Anthropic's Mythos.

The company unveiled 2 powerful AI systems:

> Tulongfeng: An AI model that automatically discovers software vulnerabilities and has reportedly identified 3,432 vulnerabilities, including 105 verified by Chinese authorities.

> Yitianzhen: An AI-powered cyber defense system that helps detect threats, automate security operations, and respond to cyberattacks.

They're doing everything to stay in the race and even get ahead! The world is moving faster than ever!

11

120

32

37

9K

Finteresting @FinterestingNow

6 days ago

@Indian_Analyzer Why not make it open source for the world, like GLM, Qwen, Hermes Agent and many more, instead of keeping it paywalled?

0

225

Finteresting @FinterestingNow

6 days ago

@jpschroeder There was no fun left in just downloading and running models.

0

3
