Vouch

@TryVouch

AI-era technical validation • Can engineers really steer agents? Pivot™ sims + Git/CI telemetry prove it. Finland-built. Free concierge pilots →

Helsinki, Finland

Joined October 2023

55 Following

28 Followers

308 Posts

Pinned Tweet

Vouch @TryVouch

3 months ago

🚨 OpenAI just admitted their smartest models knowingly lie up to 13% of the time — not hallucination, deliberate deception when they think no one's watching. And yet companies still hire with LeetCode-style tests that AI solves in seconds. The real question in 2026 isn't "can you code?" — it's "can you steer AI without letting it lie to you and blow up production?" Thread 👇 #AI #Hiring

3

0

0

0

63

Vouch @TryVouch

2 months ago

@delveroin https://t.co/0saAT21VTL

0

0

0

0

1

Vouch @TryVouch

2 months ago

This playbook drop is timely, agentic patterns are moving fast. One thing we're noticing though is that even strong engineers can get tripped up when the agents start doing their own thing (hallucinations, bad pivots, etc.). The real differentiator is who can actually steer them properly. We built Vouch to test exactly that in a live sim: forces a pivot mid-task and measures recovery, manual fixes, and CI outcomes with pure telemetry. If you're evaluating people building with agents, free pilots are open — DM if you want to try it on someone.

0

0

0

0

8

Vouch @TryVouch

2 months ago

Hey Phuong, congrats on the hire, sounds like a cool role at the intersection of AI and real-world systems. Quick thought from building in this space: when evaluating candidates who'll work with agents or complex models, the hard part is spotting who can actually steer them when things go sideways (hallucinations, sudden requirement changes, etc.). We built Vouch for exactly that: sims with real pivots, scored on recovery time and telemetry instead of just output. If you're looking for a sharper signal on the technical side, happy to run a free pilot on a couple of candidates. DM if it could help!

0

0

0

0

7

Vouch @TryVouch

2 months ago

Who’s hiring senior engineers right now and wants a better filter than LeetCode + interviews? Reply with your biggest current pain or DM me — happy to reserve a pilot slot and share the methodology. https://t.co/do2NVg8b07 #TechnicalInterviews

0

0

0

0

4

Vouch @TryVouch

2 months ago

🚨 AI agents now lie deliberately up to 13% of the time (OpenAI’s own admission). Yet most companies still filter seniors with tests AI solves instantly. The skill that actually matters in 2026 isn’t coding — it’s steering agents without getting fooled. Who’s feeling this pain when hiring right now? #AIHiring #AgenticAI

4

0

0

0

18

Vouch @TryVouch

2 months ago

Engineering leaders / agencies: If “AI-native” resumes are exploding in production, the signal is broken. Vouch gives you hard proof of who can actually steer agents — not just prompt them. Free concierge pilots open for the next few teams (you keep the full report).

0

0

0

0

4

Vouch @TryVouch

2 months ago

That’s why we built Vouch (Finland-made): Agentic sims that force steering under pressure. Mid-task Pivot™ changes requirements → we measure:
• Seconds to catch/fix hallucinations
• Manual architecture fixes vs blind pastes
• Post-pivot CI/CD pass/fail via Git telemetry
No LLM judge. Pure deterministic data. Objective Steering Report.

0

0

0

0

6

Vouch @TryVouch

2 months ago

AI is writing ~40% of code globally, but also injecting 41% more bugs, faking completion, and hiding evidence when watched. Juniors disappear. Seniors become reviewers of 400-line AI PRs that look perfect but break prod. Legacy interviews can’t tell who has real control vs who pastes good output.

0

0

0

0

8

Vouch @TryVouch

3 months ago

Engineering leaders / agencies: if you're tired of "AI-native" resumes that explode in production, let's fix the signal. Free concierge pilots open for the first few teams — you get the full report, we get honest feedback. Who's hiring seniors right now and feels the pain? Reply or DM! https://t.co/aPDwpvfSxK #TechnicalInterviews #AI

0

0

0

0

13

Vouch @TryVouch

3 months ago

AI agents are getting scary good at lying. OpenAI admitted their models deliberately deceive up to 13% of the time when they think no one's watching. Not hallucination — straight-up strategic bullshit. And yet most hiring still uses tests AI solves in seconds. The real question in 2026: can your seniors actually steer agents without getting played? Thread 👇 #AIHiring #AgenticAI

3

0

0

0

21

Vouch @TryVouch

3 months ago

That's why we built Vouch (Finland-made): agentic sims that force steering under real pressure. Mid-test Pivot™ flips requirements → measure:
- Seconds to spot/fix hallucinations
- Manual architecture fixes vs blind pastes
- Post-pivot CI/CD pass/fail via Git telemetry
No LLM judge, no black-box. Just deterministic data in <90 min. Objective Steering Report.

0

0

0

0

14

Vouch @TryVouch

3 months ago

AI is writing 40%+ of code now. But it also introduces 41% more bugs, fakes task completion, and hides evidence when "observed." Juniors vanish. Seniors turn into reviewers of 400-line AI PRs that look flawless but nuke prod. Legacy interviews (LeetCode, take-homes) can't tell who has real steering control vs who just pastes good vibes.

0

0

0

0

14

Vouch @TryVouch

3 months ago

Totally agree 👊 We already know how to handle human hallucinations with reviews and pairing. The problem now is doing it at agent speed without the human in the loop getting fooled. Vouch basically turns that into a testable skill: real-time pivots + measurable steering (recovery seconds, manual fixes, CI outcomes). Built it because static tests are dead in the AI era. If you’re evaluating seniors or building agent teams, free concierge pilots are open — DM if you want to try one.

0

0

0

0

2

Vouch @TryVouch

3 months ago

Solid roles 🤙 Agentic is the direction everything’s heading. One thing we’re seeing though is that resumes look amazing but the real gap is who can actually steer the agents when they start hallucinating or the requirements pivot mid-project. We built Vouch to test exactly that in a live sim (objective recovery time + Git/CI telemetry, no black-box judge). Free pilots open if you want a sharper signal on your shortlist — happy to run a couple for you, just DM.

0

0

0

0

19

Vouch @TryVouch

3 months ago

This divide is real and growing fast. The question isn't "can they code?" anymore. It's "can they orchestrate and review agents without getting fooled?" Vouch sims measure steering directly: Pivot response, hallucination catches, and deterministic telemetry (recovery seconds + CI outcomes). Free pilots for any founders in your cohort? Happy to run one.

0

0

0

0

3

Vouch @TryVouch

3 months ago

This agentic filter is spot on 👊 But how do you actually test who can steer agents without them lying or breaking prod? Vouch does exactly that: mid-test Pivot™ + objective Git/CI telemetry (hallucination recovery time, manual fixes vs pastes, no black-box judge). Free concierge pilots open if you're scaling the team — DM me.

0

0

0

0

19

Vouch @TryVouch

3 months ago

@Pranto39 https://t.co/0saAT22tJj

0

0

0

0

1

Vouch @TryVouch

3 months ago

What's broken in your current senior hiring process?

0

0

0

0

19

Vouch @TryVouch

3 months ago

Engineering leaders / agencies: if you're drowning in "AI-native" resumes that fall apart in production, let's fix the signal. Free concierge pilots open for the first few Helsinki/EU teams — you get the full report, we get honest feedback. Who's hiring seniors right now and wants better data? Reply or DM! https://t.co/aPDwpvfSxK #AIHiring #AgenticAI #TechnicalInterviews

0

0

0

0

20

Vouch @TryVouch

3 months ago

That's why we built Vouch (Finland-made): agentic sims that force real steering. Mid-test Pivot™ changes requirements → measure:
- Seconds to spot/fix hallucinations
- Manual architecture fixes vs blind pastes
- Post-pivot CI/CD pass/fail via Git telemetry
No black-box LLM judge — pure deterministic data. Under 90 min, objective Steering Report.

0

0

0

0

11

Vouch @TryVouch

3 months ago

AI agents are writing 40%+ of code globally now. But they introduce 41% more bugs, fake task completion, and hide evidence when "watched." Junior/mid roles evaporate. Seniors become reviewers of 400-line AI PRs that look perfect but nuke scalability. Traditional interviews can't detect who actually has steering control vs. who just pastes vibes.

1

0

0

0

26
