Jay Liew

@jaysern

Founder building HIPAA AI tools for neuropsychologists. Prev: @getswellapp, @truepill_rx (S17), @DoubleRobotics (S12).

Boston, MA

Joined March 2008

2.4K Following

1K Followers

8K Posts

Jay Liew @jaysern

4 months ago

@Gavriel_Cohen @udaysy @aakashgupta @Gavriel_Cohen how do you handle RBAC?

Jay Liew @jaysern

4 months ago

@Gavriel_Cohen @udaysy @aakashgupta just curious, but where does one draw the line? at some point, does someone decide to roll their own vibe-coded implementation of ssh rather than reusing a version that the community has scrutinized and hardened for years?

Jay Liew @jaysern

4 months ago

@emfbeebe @AriDrennen I'm helping reduce the financial barrier for ADHD, AuDHD, ASD students looking to equalize the playing field with disability accommodations. 1200 via College Psych Eval (.com) Happy to answer any questions

Jay Liew @jaysern

6 months ago

@benln I am

Jay Liew @jaysern

6 months ago

@AdamSandler is there a part 5? pretty please!

Jay Liew @jaysern

8 months ago

@nikunj I'd like it to be known that I've been using em dashes since long before people started using LLMs to do their bidding.

101

Jay Liew @jaysern

9 months ago

@nikitabier not sure if this is already on your radar, but a deep-linked tweet on mobile doesn't load the right tweet. also, the web UI has been very sluggish for days at least

Jay Liew @jaysern

9 months ago

@Jason Building HIPAA-compliant AI tools for clinicians in neuropsychology to materially reduce time and labor costs (new v2 prod under wraps for now) https://t.co/z8jTbTKgLj

Jay Liew @jaysern

9 months ago

@clearbluejar curious if you have tried hard-coding tool info into the system prompt to see if it will improve tool calling?

Jay Liew @jaysern

9 months ago

@GasBuddyGuy @GasBuddy Thank you, Patrick. I will follow up with your team via email

Jay Liew @jaysern

9 months ago

@GasBuddyGuy @GasBuddy I’m positive, which is why I’m writing. What’s an email address I can send this to?

Jay Liew @jaysern

9 months ago

@GasBuddyGuy @GasBuddy Basically, I paid for gas at a gas station using the GasBuddy card. I got a receipt for x dollars. I looked up what GasBuddy debited my bank account for, and it was *greater* than x dollars. I thought GasBuddy was supposed to save me money, but instead it is charging me more

Jay Liew @jaysern

9 months ago

@GasBuddy your AI customer service is leading me down the wrong path again. I need to talk to a real human being who can understand a simple problem I'm describing.

Jay Liew @jaysern

10 months ago

@clairevo very curious about not liking the reasoning models. is it because it takes longer when it thinks? the non-thinking can do the task?

jaysern retweeted

Liam McCoy, MD MSc

@LiamGMcCoy

about 1 year ago

@deso0017 @emollick I think that, somewhat counterintuitively, this is one of the safest uses of LLMs in medicine. It's generative, just like having a student suggest possibilities. You don't rely on them without confirmation. If a suggestion seems bizarre, it prompts you to ask "why not?", as well

jaysern retweeted

Liam McCoy, MD MSc

@LiamGMcCoy

about 1 year ago

@emollick Thanks for sharing our paper! I think differential diagnosis is a particularly well-suited task for LLMs. It is creative and highly-associative, and doesn't depend on a world model or perfect reasoning in the way that narrowing down the final diagnosis might.

39K

jaysern retweeted

Ethan Mollick

@emollick

about 1 year ago

Updated paper by physicians at Harvard, Stanford, and other academic medical centers testing o1-preview for medical reasoning & diagnosis tasks: “In all experiments—both vignettes and emergency room second opinions—the LLM displayed superhuman diagnostic and reasoning abilities.”

https://t.co/J3i549kMDK

213

607

201K

jaysern retweeted

shyamal

@shyamalanadkat

about 1 year ago

getting started with evals doesn't require too much. the pattern that we've seen work for small teams looks a lot like test‑driven development applied to AI engineering:

1/ anchor evals in user stories, not in abstract benchmarks: sit down with your product/design counterpart and list out the concrete things your model needs to do for users. "answer insurance claim questions accurately", "generate SQL queries from natural language". for each, write 10–20 representative inputs and the desired outputs/behaviors. this is your first eval file.

2/ automate from day one, even if it's brittle. resist the temptation to "just eyeball it". well, ok, vibes doesn't scale for too long. wrap your evals in code. you can write a simple pytest that loops over your examples, calls the model, and asserts that certain substrings appear. it's crude, but it's a start.

3/ use the model to bootstrap harder eval data. manually writing hundreds of edge cases is expensive. you can use reasoning models (o3) to generate synthetic variations ("give me 50 claim questions involving fire damage") and then hand‑filter. this speeds up coverage without sacrificing relevance.

4/ don't chase leaderboards; iterate on what fails. when something fails in production, don't just fix the prompt – add the failing case to your eval set. over time your suite will grow to reflect your real failure modes. periodically slice your evals (by input length, by locale, etc.) to see if you're regressing on particular segments.

5/ evolve your metrics as your product matures. as you scale, you'll want more nuanced scoring (semantic similarity, human ratings, cost/latency tracking). build hooks in your eval harness to log these and trend them over time. instrument your UI to collect implicit feedback (did the user click "thumbs up"?) and feed that back into your offline evals.

6/ make evals visible. put a simple dashboard in front of the team and stakeholders showing eval pass rates, cost, latency. use it in stand‑ups. this creates accountability and helps non‑ML folks participate in the trade‑off discussions.

finally, treat evals as a core engineering artifact. assign ownership, review them in code review, celebrate when you add a new tricky case. the discipline will pay compounding dividends as you scale.
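
The "simple pytest that loops over your examples, calls the model, and asserts that certain substrings appear" from the thread above might look roughly like the sketch below. This is only an illustration, not the thread author's code: `EVAL_CASES`, `call_model`, `missing_substrings`, and `run_evals` are hypothetical names, and `call_model` is a canned stub standing in for whatever model client you actually use.

```python
# Hypothetical eval cases: (user input, substrings the answer must contain).
EVAL_CASES = [
    ("Is fire damage covered under a standard policy?", ["fire", "covered"]),
    ("Write SQL to count rows in the claims table.", ["select", "count", "claims"]),
]


def call_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption); swap in your own client."""
    canned = {
        EVAL_CASES[0][0]: "Yes, fire damage is covered under most standard policies.",
        EVAL_CASES[1][0]: "SELECT COUNT(*) FROM claims;",
    }
    return canned.get(prompt, "")


def missing_substrings(output: str, expected: list[str]) -> list[str]:
    """Return the expected substrings absent from output (case-insensitive)."""
    lowered = output.lower()
    return [s for s in expected if s.lower() not in lowered]


def run_evals(cases=EVAL_CASES, model=call_model) -> dict[str, list[str]]:
    """Map each failing prompt to the substrings its output lacked."""
    failures = {}
    for prompt, expected in cases:
        missing = missing_substrings(model(prompt), expected)
        if missing:
            failures[prompt] = missing
    return failures


def test_evals_pass():
    # pytest collects this function; an empty dict means every case passed.
    assert run_evals() == {}
```

Returning a dict of failures rather than asserting inside the loop means a single run reports every failing case, which makes it easier to add production failures to the set over time as point 4/ suggests.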

221

322

27K

Jay Liew @jaysern

about 1 year ago

@clairevo Starting with a well thought out system prompt as default, then allowing users to modify it as they see fit

122

Jay Liew @jaysern

about 1 year ago

@charlesmiller_7 What book is this from?
