Ash @Ashf03 - Twitter Profile

Pinned Tweet

Ash

@Ashf03

4 months ago

Aquin is out. vibe code your LLM in 2 mins.

37

164

27

49

16K

Ash

@Ashf03

about 16 hours ago

I think I'm in love with graphical/visual representations of ML model internals, it's too attractive and charming it's love at first sight every time.

2

4

0

49

Ash

@Ashf03

about 17 hours ago

@ZhiSu22 Yo @hdxswx we got some competition

0

12

Ash

@Ashf03

1 day ago

@daleverett Solid direction

1

2

0

148

Ash

@Ashf03

2 days ago

@ycombinator @coldifl This is actually so dope

0

1

0

167

Ash

@Ashf03

3 days ago

At @AquinF03, we're continuing to make all existing evals and benchmark tools obsolete:

1/3
Custom evals: write your own scorer in Python and you get access to activations and SAE features, so you can do things like:

"check whether a specific feature fired above threshold on a response"

which no external eval harness can do!

2/3
Benchmark Builder can now weight evals differently within a suite and export results in multiple formats.

3/3
Auto-suggestions: the agent observes and proactively suggests the most relevant evals, with just one click to run.
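
As a rough illustration of the custom-eval idea in 1/3, a scorer that checks whether one SAE feature fired above a threshold might look like the sketch below. The `feature_fired` name, the layout of `sae_activations` (one list of feature activations per response token), and the returned fields are assumptions made for illustration, not Aquin's actual scorer API.

```python
from typing import Dict, Sequence


def feature_fired(
    sae_activations: Sequence[Sequence[float]],
    feature_id: int,
    threshold: float,
) -> Dict[str, object]:
    """Score a response by whether one SAE feature exceeded a threshold.

    sae_activations[t][f] is assumed to hold the activation of feature f
    at response token t (hypothetical layout).
    """
    per_token = [token[feature_id] for token in sae_activations]
    peak = max(per_token, default=0.0)
    fired_at = [t for t, a in enumerate(per_token) if a > threshold]
    return {
        "passed": bool(fired_at),
        "peak_activation": peak,
        "token_positions": fired_at,
    }


# Toy response: 3 tokens, 4 SAE features each.
activations = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 2.5, 0.0, 0.3],
    [0.0, 0.4, 0.0, 0.0],
]
result = feature_fired(activations, feature_id=1, threshold=1.0)
print(result)
```

Returning the token positions alongside the pass/fail flag keeps the score inspectable, since it shows where in the response the feature crossed the threshold.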

2

6

2

4

579

Ash

@Ashf03

4 days ago

At @AquinF03, we just shipped SAE support for Embedding models.

- Feature decomposition: see which concepts are firing and how strongly. contrastive mode shows what's different between two texts.

- Feature browser: ranks concepts by how much they fire across your corpus, with auto-generated labels and top examples.

- Co-activation network: concepts don't fire alone, they cluster. tight clusters are semantic domains, loose ones are general purpose.

- Circuit tracing: see where in the stack each concept appears and how it builds up. some grow steadily from early on, others snap on right at the end.

- Steering: boost a concept and your embedding pulls toward it, suppress it and it moves away. re-ranks your retrieval corpus so you can see exactly how results shift.

- Absorption and polysemy diagnostics: absorption is when two concepts always fire together, polysemy is when one concept fires on completely unrelated things. Aquin catches both automatically.

- Retrieval faithfulness: zeros out each concept and sees how much retrieval drops. high activation doesn't mean high importance.

- Cross-model feature matching: finds which concepts two models share and which ones are unique to each.

updated literature: aquin dot app slash embeddings
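
The decomposition, steering, and retrieval-faithfulness bullets above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not Aquin's implementation: the SAE here is a hand-written 2-feature identity map (real SAEs are learned and usually have far more features than embedding dimensions), and every function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def encode(x: Vector, w_enc: List[Vector], b_enc: Vector) -> Vector:
    """SAE feature activations: ReLU(W_enc x + b_enc)."""
    return [max(0.0, dot(row, x) + b) for row, b in zip(w_enc, b_enc)]


def steer(x: Vector, direction: Vector, alpha: float) -> Vector:
    """Move the embedding along one feature's decoder direction.

    alpha > 0 boosts the concept, alpha < 0 suppresses it.
    """
    return [xi + alpha * di for xi, di in zip(x, direction)]


def ablate(x: Vector, activation: float, direction: Vector) -> Vector:
    """Zero out one feature by subtracting its decoded contribution."""
    return steer(x, direction, -activation)


def rank(query: Vector, corpus: List[Vector]) -> List[int]:
    """Corpus indices sorted by cosine similarity to the query, best first."""
    return sorted(range(len(corpus)), key=lambda i: -cosine(query, corpus[i]))


# Toy 2-d setup: feature 0 reads and writes the first axis, feature 1 the second.
w_enc = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0]]  # one decoder direction per feature

query = [0.9, 0.5]
corpus = [[1.0, 0.1], [0.1, 1.0]]

features = encode(query, w_enc, b_enc)
before = rank(query, corpus)
after_steering = rank(steer(query, w_dec[1], alpha=2.0), corpus)
after_ablation = rank(ablate(query, features[0], w_dec[0]), corpus)
print(features, before, after_steering, after_ablation)
```

In this toy case, boosting feature 1 and ablating feature 0 both flip the retrieval order, which is the kind of ranking shift the steering and faithfulness bullets describe.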

Ash

@Ashf03

9 days ago

Glad to announce that @AquinF03 now supports embedding models:

Geometry inspection, retrieval evaluation, fine-tuning monitoring, and embedding diff across checkpoints.

here's how we support them: https://t.co/g7tl7slkSJ

2

12

2

4

939

2

16

1

3

711

Ash

@Ashf03

5 days ago

@precious9087 @AquinF03 Obv bet on interuptibilty

0

2

0

6

Ash

@Ashf03

7 days ago

2 months building and researching interpretability tooling at @AquinF03

and I discovered that our users are divided into two groups:

1. People working on Interpretability
2. People leveraging their ML work with Interpretability

First group builds on top of our tooling and experiments. Second group uses tooling for existing pipelines, and to debug/improve their ML work.

At @AquinF03, we care about both. We're shipping a lot, and every release could turn into an experiment, a study, or a paper.

Come build and research with us: https://t.co/zC92O8cdLO

3

9

2

0

387

Ash

@Ashf03

5 days ago

@raina_sai goated

0

4

0

59

Ash

@Ashf03

7 days ago

@fdotinc "Pursue your life's work instead of homework" ik im using this a lot

0

3

0

104

Ash

@Ashf03

8 days ago

Full documentation: https://t.co/sPrOBaqmuy p.s. we plan to build tooling around steering, SAEs training, algo simulation, and much more!

0

5

0

1

99

Ash

@Ashf03

8 days ago

Introducing @AquinF03's Devkit!

basically https://t.co/WQWNS7bfUJ's interpretability tooling locally through an SDK + CLI.

Aquin SDK records training runs locally, including metrics, config, and checkpoints, then CLI packages and pushes them to Aquin for post-hoc analysis.

Once pushed, the run appears in CLI runs with full inspection: loss curves, learning rate, grad norm, epoch summaries, SAE diff, and model diff.

SDK is framework-agnostic. It works with any Python training loop that produces a PyTorch model.

For HuggingFace Trainer and TRL, a TrainerCallback pattern wires everything in without touching training logic.

pip install Aquin!
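
To make the "records training runs locally" idea concrete, here is a minimal sketch of a framework-agnostic recorder that writes the config and per-step metrics to a local directory. `LocalRunRecorder`, its methods, and the file names are hypothetical stand-ins for illustration only; the tweet does not show the Aquin SDK's actual interface.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class LocalRunRecorder:
    """Hypothetical stand-in for an SDK recorder; not the Aquin API."""

    def __init__(self, run_dir: Path, config: Dict[str, Any]) -> None:
        self.run_dir = Path(run_dir)
        self.run_dir.mkdir(parents=True, exist_ok=True)
        (self.run_dir / "config.json").write_text(json.dumps(config, indent=2))
        self.metrics_path = self.run_dir / "metrics.jsonl"

    def log(self, step: int, **metrics: float) -> None:
        """Append one JSON line of metrics for this training step."""
        with self.metrics_path.open("a") as f:
            f.write(json.dumps({"step": step, **metrics}) + "\n")

    def read(self) -> List[Dict[str, Any]]:
        """Load every recorded step back from disk."""
        with self.metrics_path.open() as f:
            return [json.loads(line) for line in f]


# Any training loop can call the recorder; the loop below fakes the numbers.
run_dir = Path(tempfile.mkdtemp()) / "run-001"
recorder = LocalRunRecorder(run_dir, config={"lr": 3e-4, "epochs": 1})
for step in range(3):
    fake_loss = 1.0 / (step + 1)
    recorder.log(step, loss=fake_loss, lr=3e-4, grad_norm=0.5)

records = recorder.read()
print(len(records), records[-1]["loss"])
```

With HuggingFace Trainer, a recorder like this could plausibly be driven from a `TrainerCallback` subclass whose `on_log` hook forwards the `logs` dictionary, which leaves the training logic itself untouched.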

2

11

2

3

369

Ash

@Ashf03

9 days ago

complete literature: https://t.co/68BRJb7puI

0

4

0

1

37

Ash

@Ashf03

9 days ago

Embedding diff:

Aquin's embedding diff compares two checkpoints on centroid positions, similarity distributions, anisotropy, and nearest-neighbor ranks.

A composite drift score captures the tradeoffs, penalizing fine-tunes that improve one cluster by degrading another's geometry even if overall recall looks fine.
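
Two of the quantities named above, centroid position and anisotropy, can be sketched for a single cluster across two checkpoints. The definitions below (cosine distance between centroids, and anisotropy as mean pairwise cosine similarity) are common choices assumed for illustration. The tweet does not say how Aquin computes them or how the composite drift score combines them, so no composite is attempted here.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a cluster's embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def anisotropy(vectors: List[Vector]) -> float:
    """Mean pairwise cosine similarity; values near 1 mean a narrow cone."""
    pairs = [
        cosine(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]
    return sum(pairs) / len(pairs)


def centroid_shift(before: List[Vector], after: List[Vector]) -> float:
    """Cosine distance between a cluster's centroids at two checkpoints."""
    return 1.0 - cosine(centroid(before), centroid(after))


# Same two texts embedded by two checkpoints (toy values).
ckpt_a = [[1.0, 0.0], [0.0, 1.0]]  # spread out
ckpt_b = [[1.0, 0.0], [1.0, 0.0]]  # collapsed onto one direction

shift = centroid_shift(ckpt_a, ckpt_b)
print(anisotropy(ckpt_a), anisotropy(ckpt_b), shift)
```

In the toy data the second checkpoint collapses both texts onto one direction, so anisotropy rises from 0 to 1 while the centroid also moves: the kind of geometry change a recall number alone would not show.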

1

4

0

1

60
