MLX has been around for a while. I've been using it for internal experimentation and knowledge management (through obsidian-wiki)
Great for running models locally, and Apple doubled down on it at WWDC yesterday
7 years building knowledge graphs and LLM systems, from 500M-entity KGs at early-stage startups to hybrid vector search at scale
Built and shipped multiple SaaS products and helped early-stage teams grow to multi-million enterprise contracts
Ships fast outside work too: obsidian-wiki (1800+ ★) and PaperOrchestra (500+ ★), with a multi-agent system that researches and writes full research papers
Building the runtime security and audit layer for AI coding agents. Every agent today runs with full trust: no scope limits, no audit trail, no kill switch. It audits every tool call, enforces policy, and surfaces a tamper-proof trace. Enterprises aren't waiting on better models; they're waiting on control.
More at https://t.co/4QVXkniSVj
Spent a while debugging why community FLUX models produced pure noise on mflux 0.18.0
Root cause: a missing quantization_level key in safetensors metadata makes the loader silently corrupt every weight.
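A quick way to catch this before loading is to read the safetensors header yourself and check for the key. This is a minimal sketch, assuming the key is expected in the file-level `__metadata__` map (a safetensors file starts with an 8-byte little-endian header length followed by a JSON header). The function names are made up for illustration, and this is not mflux's own loader code.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the file-level __metadata__ dict from a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # __metadata__ is optional in the format; treat a missing map as empty.
    return header.get("__metadata__", {})


def has_quantization_level(path, key="quantization_level"):
    """True if the (assumed) metadata key is present in the file header."""
    return key in read_safetensors_metadata(path)
```

Running `has_quantization_level(...)` on each downloaded weight file and refusing to load when it returns `False` turns the silent failure into a loud one.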
Switched to FLUX.2-klein-4B instead. 4GB on disk, 9s per image, runs clean.
Here are the quantized weights on HF if you want to skip the setup 👇
day 15 of reading one arxiv paper around AI every day and sharing what actually stuck
Natural Backdoors in Code LMs (Nanjing / NTU)
tldr: your normally trained CodeBERT has hidden triggers baked in from the data. Renaming one variable in a code snippet can flip defect detection, or boost an insecure hardcoded-secret snippet to the top of search results, with no attacker involved
practical steps:
run trigger inversion on your model before you release it
if inverted triggers hit >20% attack success rate (ASR), run unlearning
the paper's ScanNBT finds more diverse triggers than prior SOTA
paper: https://t.co/awjEtIUTKa
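The 20% gate above can be wired into a release check. This is a rough sketch, not the paper's evaluation code: it assumes ASR means the fraction of inputs not already predicted as the target label that flip to it once the trigger is applied. `predict`, `apply_trigger`, and both function names are hypothetical stand-ins for your own model wrapper and trigger.

```python
def attack_success_rate(predict, samples, apply_trigger, target_label):
    """Fraction of non-target samples that flip to target_label under the trigger."""
    # Only count samples the model does not already assign to the target label.
    eligible = [s for s in samples if predict(s) != target_label]
    if not eligible:
        return 0.0
    hits = sum(1 for s in eligible if predict(apply_trigger(s)) == target_label)
    return hits / len(eligible)


def needs_unlearning(asr, threshold=0.20):
    """Gate from the post: run unlearning when ASR is strictly above 20%."""
    return asr > threshold
```

Run it once per inverted trigger and fail the release if any of them trips `needs_unlearning`.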
@timsneath Saw this on HN. Apple containers could become the default sandbox for AI coding agents on macOS
feels like Apple is building its own WSL moment but for agent workflows
day 14 of reading one arxiv paper around AI every day and sharing what actually stuck
V-JEPA 2 (Meta FAIR)
pretrained on 1M+ hours of internet video, fine-tuned on 62 hours of robot data and deployed zero-shot on Franka arms in two labs it had never seen.
pick-and-place success rate: 75%, no task labels, no reward signal
Just like GPT-2 needed RLHF before it was actually usable, V-JEPA 2-AC would need something analogous: goals specified via language instead of curated goal images, and a planning loop that doesn't accumulate errors over long horizons
The physics prior exists now, and so does the pretraining recipe.
the missing piece is the interface layer that makes this work outside a controlled lab with a carefully positioned camera
that's the interesting open problem and what @ylecun is betting on