Francesco Virga

Verified account

@francescodvirga

@inference_net

Joined January 2022

406 Following

198 Followers

147 Posts

francescodvirga retweeted

Inference @inference_net

4 days ago

Specialized models are becoming a practical path to better AI UX.

Olive moved from a frontier model to a custom model trained with Inference Catalyst for their food verdict workflow.

After a user scans a product, the model now delivers near-instant verdicts on what to watch out for, making the in-store experience faster and more seamless while cutting inference cost significantly.

Results:
- p50 latency: 2,721ms → 591ms
- p99 latency: 6,414ms → 998ms
- time to first word: ~0.25s
- inference cost: ~70% lower

Great working with @oliveholistic on this!

Full case study here: https://t.co/4j3rXJcrHU
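To put the latency figures in the post above in relative terms, here is a minimal sketch that converts the quoted before/after numbers into a speedup factor and a percent reduction. Only the four millisecond values come from the post; the `speedup` helper and the variable names are illustrative assumptions.

```python
# Latency figures quoted in the post above, in milliseconds.
before = {"p50": 2721, "p99": 6414}
after = {"p50": 591, "p99": 998}


def speedup(old_ms: float, new_ms: float) -> tuple[float, float]:
    """Return (speedup factor, percent reduction) for one latency metric."""
    return old_ms / new_ms, 100 * (old_ms - new_ms) / old_ms


# Hypothetical helper output: one (factor, percent) pair per percentile.
results = {name: speedup(before[name], after[name]) for name in before}

for name, (factor, pct) in results.items():
    print(f"{name}: {factor:.1f}x faster, {pct:.0f}% lower latency")
```

On these numbers the median request is roughly 4.6x faster and the tail (p99) roughly 6.4x faster, so the tail improved more than the median.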

Francesco Virga

@francescodvirga

17 days ago

Help your agent help itself

Sam Hogan 🇺🇸

18 days ago

3 weeks ago we open-sourced HALO

this led to talking with dozens of teams running agents at scale

we realized the current agent monitoring tools aren't built for the future that we so clearly see ahead of us

today we’re releasing native OpenTelemetry-compatible agent tracing on @inference_net, powered by the same open-source core behind HALO

francescodvirga retweeted

25 days ago

Shout out to @samhogan @AmarSVS @francescodvirga @atbeme @mikepollard_dev and the rest of the Inference dot net team

francescodvirga retweeted

Sam Hogan 🇺🇸

about 1 month ago

https://t.co/wOe4vVLpuF

Francesco Virga

@francescodvirga

about 1 month ago

@samhogan HALO improving HALO next??

francescodvirga retweeted

about 1 month ago

https://t.co/35MCmxS4uz

francescodvirga retweeted

Sam Hogan 🇺🇸

about 2 months ago

We’re introducing HALO 😇

Hierarchal Agent Loop Optimizer

HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes.

This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.

tldr; we improved performance on AppWorld (Sonnet 4.6) from 73.7 --> 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues.

The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.

We then fed these findings into Cursor (Opus 4.6), and asked the coding agent to update the underlying harness.

We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.

Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
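The trace -> analysis -> code update loop described above can be sketched generically. This is not the HALO-RLM API: `optimize_until_plateau`, its parameters, the plateau threshold, and the toy stand-ins below are all hypothetical, and the toy scores are made up purely so the loop can run end to end.

```python
from typing import Callable, List, Tuple

Harness = dict  # hypothetical stand-in for whatever the agent harness really is


def optimize_until_plateau(
    run_eval: Callable[[Harness], Tuple[float, List[str]]],
    analyze: Callable[[List[str]], List[str]],
    update: Callable[[Harness, List[str]], Harness],
    harness: Harness,
    min_gain: float = 0.5,
    max_rounds: int = 10,
) -> Tuple[Harness, List[float]]:
    """Repeat eval -> trace analysis -> harness update until the score plateaus."""
    score, traces = run_eval(harness)
    history = [score]
    for _ in range(max_rounds):
        findings = analyze(traces)            # e.g. an analyzer model reading traces
        if not findings:
            break                             # nothing left to fix
        candidate = update(harness, findings)  # e.g. a coding agent applying fixes
        new_score, new_traces = run_eval(candidate)
        if new_score - score < min_gain:
            break                             # plateau: keep the previous harness
        harness, score, traces = candidate, new_score, new_traces
        history.append(score)
    return harness, history


# Toy stand-ins (assumptions, not real benchmark data).
ISSUES = ["hallucinated tool calls", "redundant tool arguments", "refusal loops"]
PREFIX = "trace showing "


def toy_eval(harness: Harness) -> Tuple[float, List[str]]:
    open_issues = [i for i in ISSUES if i not in harness["fixed"]]
    return 90.0 - 5.0 * len(open_issues), [PREFIX + i for i in open_issues]


def toy_analyze(traces: List[str]) -> List[str]:
    return [t[len(PREFIX):] for t in traces[:1]]  # surface one issue per round


def toy_update(harness: Harness, findings: List[str]) -> Harness:
    return {"fixed": harness["fixed"] | set(findings)}


final_harness, history = optimize_until_plateau(
    toy_eval, toy_analyze, toy_update, {"fixed": frozenset()}
)
print(history)
```

Passing the three stages in as callables keeps the stopping rule separate from how traces are analyzed or how the harness is edited, which is the part the post leaves to HALO-RLM and the coding agent.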

francescodvirga retweeted

Sam Hogan 🇺🇸

about 2 months ago

Excited to launch Day One support for tracing the Cursor Agent SDK with @inference_net

3 lines of code is all you need to track agent performance across executions and iterate to perfection

Docs below 👇 https://t.co/4Mz9WlRh4i

francescodvirga retweeted

Sam Hogan 🇺🇸

about 2 months ago

All the best programmers I know are starting to write code by hand again

francescodvirga retweeted

@ozziekirkby

about 2 months ago

Frontier LLMs can do a lot, but can they write good flashcards?

Turns out: not yet! @andy_matuschak and I created an eval for flashcard generation and found surprisingly poor results.

Worse, newer models aren’t helping: GPT 5.4 performs worse than 5.2, Opus 4.7 worse than 4.6. https://t.co/MbcWFfDNQg

francescodvirga retweeted

Sam Hogan 🇺🇸

about 2 months ago

We're releasing Schematron V2, a family of Specialized Language Models for converting messy HTML to structured JSON

frontier performance at 1/10th the cost

Schematron V2 was designed in partnership with some of the largest web-scraping companies in the world to meet the demands of their heaviest workloads

Schematron-V2-Turbo and Schematron-V2-Small are available today on @inference_net

Get started: https://t.co/8ojqgqnDqx

francescodvirga retweeted

Sam Hogan 🇺🇸

2 months ago

Introducing Catalyst: a developer platform to monitor, train & deploy self-improving AI models

built for teams operating AI products at scale

Catalyst can automatically:
- collect traces from your agents
- curate training data & evals
- train & deploy models on par w/ Opus 4.6

https://t.co/NLHoWvhrCP

francescodvirga retweeted

3 months ago

https://t.co/CudCkyWuRs

francescodvirga retweeted

@TheHarryET

4 months ago

really glad @planetscale only supports 228tps could have been a bad night 😬 https://t.co/9NggLeUIL1

Francesco Virga

@francescodvirga

4 months ago

GitHub is falling apart. Depot makes Actions usable again and lightning fast

@depotdev

4 months ago

Inference processes trillions of AI tokens a week for their customers. When something breaks, the @inference_net team needs to ship a fix in minutes, not wait an hour for CI to finish.

@TheHarryET and @francescodvirga tell the story of how they got there. 👇 https://t.co/K0B7gNlqjS

francescodvirga retweeted

Inference @inference_net

5 months ago

You're overpaying by $30,000/month running AI models at scale. Here's why (and how to fix it)

How OpenAI & Anthropic work

Per-token pricing:
→ OpenAI (GPT-4o): $2.50 / $10 per million tokens
→ Anthropic (Sonnet 4.5): $3 / $15 per million tokens

At 1M queries/month: $30,000 - $38,000/mo

The problems:

1️⃣ You pay for capabilities you don't use
Frontier models are trained for everything. Your task needs maybe 1% of those capabilities. You're paying for the other 99%.

2️⃣ No economies of scale
Token #1: $0.003
Token #1,000,000: $0.003
Your costs never decrease.

3️⃣ Smaller frontier models and off-the-shelf open-source models mean worse quality
You're forced to choose to pay more or get worse results.

The solution: Dedicated GPUs + Specialized Models

Instead of per-token pricing, rent dedicated GPUs at a fixed monthly cost. Then train custom models specialized for your specific task:
→ Distilled from frontier models and large open source models (GPT-5, Claude, Gemini, Kimi, GLM)
→ Match or exceed frontier quality for your use case
→ 2-3x faster inference

At 1M queries/month: $8,600/mo

That's 71-77% cheaper with no quality sacrifice.

And the biggest misconception is that "custom models can't match frontier quality."

The reality: When specialized for your task, they can exceed frontier intelligence.

Most teams don’t need “the smartest model in the world.”
They need the smartest model for one job. Running on infrastructure they control. At a cost that actually scales.
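The "71-77% cheaper" figure in the post above follows directly from its own monthly totals. A minimal sketch of that arithmetic, using only the dollar amounts and the 1M queries/month volume quoted in the post (the helper names are illustrative assumptions):

```python
# Monthly costs quoted in the post above, in US dollars, at 1M queries/month.
PER_TOKEN_LOW = 30_000
PER_TOKEN_HIGH = 38_000
DEDICATED = 8_600
QUERIES_PER_MONTH = 1_000_000


def savings_pct(baseline: float, new: float) -> float:
    """Percent saved when moving from `baseline` to `new` monthly cost."""
    return 100 * (baseline - new) / baseline


def cost_per_query(monthly_cost: float, queries: int = QUERIES_PER_MONTH) -> float:
    """Average cost of a single query at the given monthly volume."""
    return monthly_cost / queries


low = savings_pct(PER_TOKEN_LOW, DEDICATED)
high = savings_pct(PER_TOKEN_HIGH, DEDICATED)

print(f"savings: {low:.0f}% to {high:.0f}%")
print(
    f"per query: ${cost_per_query(PER_TOKEN_LOW):.4f} to "
    f"${cost_per_query(PER_TOKEN_HIGH):.4f} per-token, "
    f"${cost_per_query(DEDICATED):.4f} dedicated"
)
```

Because the dedicated figure is a fixed monthly cost, its per-query cost depends on volume: the comparison above holds at the quoted 1M queries/month and would shift at other volumes.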

francescodvirga retweeted

@TheHarryET

5 months ago

moving prod to @PlanetScale with over 1Tb of data an hour while still processing over 20k write ops/s on RDS for ~4b tokens/hr https://t.co/20zb4uNQiE

francescodvirga retweeted

Sam Hogan 🇺🇸

5 months ago

We're hiring ML Engineers and Researchers! @inference_net is building end-to-end automated LLM training pipelines. Our customers include the fastest-growing companies across the Fortune 500, consumer mobile, and AI-native SaaS. $10,000 referral bonus. Links below. DMs open.

Francesco Virga

@francescodvirga

6 months ago

@AmarSVS When drop actual good training framework? ML too hard :(

francescodvirga retweeted

@TheHarryET

6 months ago

if you send more than 1000 openai requests a day and want to get some observability send me a dm, we need people to try a new product. https://t.co/mUlQUgcZ20
