kun432🇯🇵 @kun432 - Twitter Profile

kun432 retweeted

2 days ago

NVIDIA just dropped Nemotron-3.5-ASR: one 0.6B model, 40+ languages, streaming. parakeet.cpp already runs it. On a plain CPU, 2.5x faster than @NVIDIAAI 's Nemo runtime, output byte-for-byte identical (WER 0). No GPU needed. Offline or real-time. Pick a language with --lang, or auto. GPU numbers are coming to compare with Nemo framework.

21

928

105

1K

75K

kun432 retweeted

Google for Developers

@googledevs

3 days ago

The new @GoogleColab bridges the gap between local environments and the cloud, providing a zero-friction execution platform for developers and AI agents alike.
The CLI Supports:
⚡️ Agent-driven Colab workflows
⚡️ Instant GPU/TPU provisioning
⚡️ Remote script execution
⚡️ Interactive runtime access (console/REPL)
Learn more in the blog: https://t.co/n8TOYStlrp

22

431

81

224

40K

kun432 retweeted

Liquid AI

@liquidai

2 days ago · Tokyo-to

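Today we released two new models for Japanese 🇯🇵
Audio model: LFM2.5-Audio-1.5B-JP
Language model: LFM2.5-1.2B-JP-202606
LFM2.5-Audio-1.5B-JP is Liquid AI's first Japanese audio model. Speak to it in Japanese and it responds with Japanese speech. Rather than combining separate ASR and TTS components, it is an end-to-end audio model that handles everything in a single model.
> The first general-purpose end-to-end audio model at this scale that supports Japanese
> 1.5 billion (1.5B) parameters, outperforming J-Moshi (about 7.7 billion)
> Performance comparable to Qwen2.5-Omni-3B (about 5.5 billion)
> A base model intended for further training
LFM2.5-1.2B-JP-202606 is the latest version of our Japanese language model. The previous version (LFM2.5-1.2B-JP) already outperformed Qwen3-1.7B and Llama 3.2 1B on JMMLU, M-IFEval, and GSM8K. In this update, an improved Japanese data mix and new mid-training and post-training achieve the best performance on an even broader range of Japanese benchmarks.
Both models are available starting today.
Models:
Audio: https://t.co/okF9tWGB5l
Language: https://t.co/5F69Z7obPS
Documentation: https://t.co/DyoBE2zOJQ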

10

2K

358

1K

1M

kun432 retweeted

Liquid AI

@liquidai

2 days ago

Today we’re releasing two new models for Japanese: LFM2.5-Audio-1.5B-JP (audio) and LFM2.5-1.2B-JP-202606 (text). 🧵

12

461

70

162

72K

kun432 retweeted

Liquid AI

@liquidai

3 days ago

Introducing LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract: Vision-language models that return structured JSON, not free-form text.
Pass in an image and a list of fields. Get back a clean JSON object.
> Two sizes: 1.6B parameters and 450M
> open-weight
> run on any device SoC
🧵

37

1K

149

670

85K

kun432 retweeted

BosonAI

@boson_ai

3 days ago

Higgs Audio v3 TTS is here. Built for voice AI that speaks, not just reads:
• 100 languages with single-digit WER/CER
• inline control over emotion, style, prosody, and sound effects
• API, Workspace, and open weights
• Blog 👉 https://t.co/C8frDlfO5D
Watch the demo 👇

14

384

60

379

53K

kun432 retweeted

OpenAI

@OpenAI

4 days ago

We’ve been researching new ways for ChatGPT memory to carry context across conversations and keep it useful over time. Today, that work is rolling out as a more capable memory system in ChatGPT. https://t.co/0MyFKCe2Mu

712

10K

1K

3K

2M

kun432 retweeted

Piotr Żelasko

@PiotrZelasko

4 days ago

Second big release from us today: Nemotron-3.5-ASR-Streaming!
🌎40 languages
⚡️80ms - 1s controllable latency
🔥240 - 2400 concurrent streams on 1xH100
🧱FastConformer Cache-Aware RNN-T architecture
https://t.co/lxmcAnKeOl

20

975

117

701

59K

kun432 retweeted

kwindla

@kwindla

4 days ago

https://t.co/9b5cQpa2lq

9

271

27

224

27K

kun432🇯🇵 @kun432

4 days ago

https://t.co/7CLGreYDa4

0

1

0

399

kun432🇯🇵 @kun432

4 days ago

"speech-core — open-source C++17 runtime for on-device VAD + streaming STT + diarization + TTS" https://t.co/sNiWpJD79D

1

0

5

582

kun432🇯🇵 @kun432

4 days ago

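@gosrum @uzuki425 It looks like the original models have all been updated, and Unsloth's GGUFs were also updated 2 hours ago, so how about downloading them again? As far as I can tell from trying it locally, the odd Japanese from the initial release seems to have been fixed.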

2

18

4

3

3K

kun432🇯🇵 @kun432

4 days ago

It looks like Unsloth's GGUFs were also all updated 2 hours ago.

0

3

0

479

kun432🇯🇵 @kun432

4 days ago

Gemma 4 12B appears to have been updated since I first tried it. From a quick test, the odd Japanese I saw the first time seems to be gone.

kun432🇯🇵 @kun432

5 days ago

Gemma4 12B: the Japanese seems somehow off...?

0

1

0

4K

1

18

4

6

4K

kun432🇯🇵 @kun432

5 days ago

Gemma4 12B: the Japanese seems somehow off...?

0

1

0

4K

kun432 retweeted

Ideogram @ideogram_ai

5 days ago

Introducing Ideogram 4.0: the best open image model in the world. Think it. Make it. Own it. Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.

407

8K

868

7K

2M

kun432🇯🇵 @kun432

5 days ago

RT @UnslothAI: Gemma 4 12B can now run locally on just 8GB RAM. Google's new model, Gemma 4 12B Unified supports image, audio and 256K con…

0

3

0

190

kun432 retweeted

Google AI Developers

@googleaidevs

5 days ago

We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀
The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license.
Here’s what makes it unique:
+ Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone.
+ Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.

34

1K

140

201

68K

kun432 retweeted

Microsoft AI

@MicrosoftAI

6 days ago

Seven new models launching at Build: let’s go! Reasoning. Code. Image. Transcribe. Voice. Built from scratch on a clean data lineage, designed for efficiency, working seamlessly as a family of models Thread 🧵 #MSBuild