Cloris 🌱

@ClorisSignal

AI DS separating signal from noise — AI, growth, the human side of tech. Building a curious mind and a beautiful life. Cultivating Signal Garden 🌱

Silicon Valley

Joined March 2024

76 Following

109 Followers

23 Posts

Cloris 🌱

@ClorisSignal

about 15 hours ago

@RayDalio Great organizations treat talent like a learning system: honest assessment, fast feedback, and roles aligned with strengths. The hard part is keeping both standards and empathy high.

Cloris 🌱

@ClorisSignal

about 17 hours ago

The reframe that matters here: evals aren't QA, they're IP ➡️ your accumulated judgment, written down in a form a machine can optimize against. Which means the hard part was never the model. It's that most orgs have never actually written down what "good" means. Evals just make the bill come due.

Aaron Levie

@levie

2 days ago

Almost all AI model and agent progress is downstream from evals. Open weights post training for specific domains comes down to evals. Agent improvements in the applied AI layer is all about evals. Agentic enterprise deployments that actually can augment work is all about evals. It’s all evals. This will become a core competency of any enterprise in the future. The companies that are able to best understand their own (and/or customers) workflows and how well agents participate in that work will be in the best position to actually drive real automation.

459

752

111K

Cloris 🌱

@ClorisSignal

1 day ago

@nvidia Enterprise AI agents will be won at the workflow layer. Models matter, but the real leverage comes from domain context, tool orchestration, security, and runtime reliability.

Cloris 🌱

@ClorisSignal

1 day ago

"Loop engineering" is having a moment now. AI plans and does the task, checks its own work, fixes it, repeats ♾ It works well for tasks with objective ground truth. But in open-ended or creative work, the AI becomes its own judge and can quietly give itself an A+. So I dug into whether AI can actually judge AI in this research article.
The hardest work to build auto-checks for has no single right answer and multi-dimensional quality, which is exactly the work that stays human:
- Creative writing & storytelling
- Strategic decisions (product, investment, career)
- Aesthetic & design judgment
- Emotional intelligence & communication
- Long-term impact evaluation
Practical ways to strengthen the loop:
👉 Add a second opinion (a different model, or the same one in critic mode)
👉 Anchor to clear rules and rubrics
👉 Spot-check with human feedback or real outcomes
Every check is a proxy. Your loop is only as strong as the weakest check it can't game. 🔧
So how are you using judges or loops in your projects?
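A minimal sketch of the loop described in the post above, assuming toy stand-ins throughout: `run_loop`, `toy_revise`, and the two checks are hypothetical names invented for illustration, not anything from the linked article. It shows the "anchor to clear rules and rubrics" idea: the draft is only accepted when every named check passes, and whatever still fails is reported instead of being self-graded away.

```python
from typing import Callable, List, Tuple

# A check is a (name, predicate) pair. Names are hypothetical examples.
Check = Tuple[str, Callable[[str], bool]]


def failed_checks(draft: str, checks: List[Check]) -> List[str]:
    """Return the names of every check the draft does not pass."""
    return [name for name, check in checks if not check(draft)]


def run_loop(
    draft: str,
    revise: Callable[[str, List[str]], str],
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Revise until all checks pass or the round budget runs out.

    Returns the final draft and the names of checks still failing,
    so an unresolved failure can be routed to a human spot-check.
    """
    for _ in range(max_rounds):
        failures = failed_checks(draft, checks)
        if not failures:
            break
        draft = revise(draft, failures)
    return draft, failed_checks(draft, checks)


# Toy rules and rubric items (assumptions for the example).
checks: List[Check] = [
    ("rubric: mentions evals", lambda d: "evals" in d.lower()),
    ("rule: at most 12 words", lambda d: len(d.split()) <= 12),
]


def toy_revise(draft: str, failures: List[str]) -> str:
    """Stand-in for a model call that fixes the reported failures."""
    if "rubric: mentions evals" in failures:
        draft = draft + " Evals matter."
    if "rule: at most 12 words" in failures:
        draft = " ".join(draft.split()[:12])
    return draft


final, remaining = run_loop(
    "Agents improve when teams write down what good means",
    toy_revise,
    checks,
)
print(final)
print(remaining)
```

A second opinion would be one more entry in `checks`, for example a predicate that wraps a different model in critic mode. That part is left out here because it depends on whichever model API is in use.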

Cloris 🌱

@ClorisSignal

4 days ago

https://t.co/JZ3VQnON7K

193

Cloris 🌱

@ClorisSignal

3 days ago

@Kupilainen @TIME Exactly. AI is already today’s tool and the future is already in the workflow. The edge is learning to use it with judgment and real agency.

Cloris 🌱

@ClorisSignal

3 days ago

@sama The real unlock is moving security from detection to remediation~ If AI can reliably close the loop from finding vulnerabilities to patching them, defenders finally get compounding leverage🫡

Cloris 🌱

@ClorisSignal

3 days ago

https://t.co/OBg8SMgUIl

159

Cloris 🌱

@ClorisSignal

3 days ago

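A father's love is like a mountain. He taught me what loyalty, responsibility, and commitment mean. He said: "You are my pride." He hopes his girl is healthy, happy, and safe. Happy Father's Day, Dad. 💙 #fathersday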

Cloris 🌱

@ClorisSignal

3 days ago

Exactly~ and a practical sting: many eval pipelines filter on the judge's confidence ("only count high-confidence calls"). If that signal is near chance, you're not denoising, you're selecting for noise. Read the reasoning trace, not the verdict's certainty. That's the whole game. Adding this to my stack, thanks for the pointer!
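A small simulation of the point above, as a sketch under stated assumptions: the 70% judge accuracy, the 0.8 threshold, and the way confidence is generated are all made-up parameters, and `simulate`, `accuracy`, and `keep_high_confidence` are hypothetical helper names. When confidence carries no information about correctness, the high-confidence subset is about as accurate as the full set, just smaller.

```python
import random
from typing import List, Tuple

Row = Tuple[bool, float]  # (judge verdict was correct, judge confidence)


def simulate(n: int, judge_accuracy: float,
             confidence_is_informative: bool, seed: int = 0) -> List[Row]:
    """Generate synthetic judge calls (all parameters are assumptions)."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        correct = rng.random() < judge_accuracy
        if confidence_is_informative:
            # Correct verdicts tend to receive higher confidence.
            conf = rng.uniform(0.5, 1.0) if correct else rng.uniform(0.0, 0.7)
        else:
            # Confidence is unrelated to correctness.
            conf = rng.random()
        rows.append((correct, conf))
    return rows


def accuracy(rows: List[Row]) -> float:
    """Fraction of rows where the judge verdict was correct."""
    return sum(correct for correct, _ in rows) / len(rows)


def keep_high_confidence(rows: List[Row], threshold: float = 0.8) -> List[Row]:
    """The 'only count high-confidence calls' filter."""
    return [row for row in rows if row[1] >= threshold]


noisy = simulate(20_000, 0.7, confidence_is_informative=False)
kept = keep_high_confidence(noisy)
print(f"all calls:       n={len(noisy)}, accuracy={accuracy(noisy):.3f}")
print(f"high confidence: n={len(kept)}, accuracy={accuracy(kept):.3f}")
```

In this toy setup the filter discards most of the data without improving accuracy. Running the same filter on `simulate(..., confidence_is_informative=True)` shows what it is supposed to do when the confidence signal is real.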

Cloris 🌱

@ClorisSignal

4 days ago

Overheard a little kid at the water's edge: “it’s so pretty, it looks AI-generated.” And I thought, this is the original. The models learned “beautiful” from places like this.🏞️ Growing up AI-native means the render is your baseline and reality is the thing that resembles it. I don’t think that’s bad. I just hope they know which one came first~

Cloris 🌱

@ClorisSignal

5 days ago

This should end the "final answer" era: agents graded on final output alone pass far more cases than trajectory eval reveals. That gap isn't noise, it's every wrong tool call, accidental delete, and lucky guess your eval never looked at. 🔍 #llmeval #aieval

Hamel Husain

@HamelHusain

4 months ago

https://t.co/O1j0Qz61by

175

184K
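The gap between final-output grading and trajectory grading described in the post above can be sketched as follows. Everything here is an illustrative assumption: the `ToolCall` and `Run` types, the `FORBIDDEN_TOOLS` set, and both graders are hypothetical, not taken from the linked material.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    tool: str
    ok: bool = True  # whether the call succeeded


@dataclass
class Run:
    final_answer: str
    calls: List[ToolCall] = field(default_factory=list)


# Hypothetical policy: tools the agent should never call in this task.
FORBIDDEN_TOOLS = {"delete_file"}


def grade_final(run: Run, expected: str) -> bool:
    """Final-answer grading: looks only at the last output."""
    return run.final_answer == expected


def grade_trajectory(run: Run, expected: str) -> List[str]:
    """Trajectory grading: returns every problem found along the way."""
    problems: List[str] = []
    if run.final_answer != expected:
        problems.append("wrong final answer")
    for i, call in enumerate(run.calls):
        if call.tool in FORBIDDEN_TOOLS:
            problems.append(f"step {i}: forbidden tool {call.tool}")
        if not call.ok:
            problems.append(f"step {i}: failed call to {call.tool}")
    return problems


# A run that reaches the right answer after a risky and a failed step.
run = Run(
    final_answer="42",
    calls=[
        ToolCall("search"),
        ToolCall("delete_file"),
        ToolCall("calculator", ok=False),
    ],
)

print(grade_final(run, "42"))
print(grade_trajectory(run, "42"))
```

The same run passes the final-answer grader and fails the trajectory grader, which is the kind of case a final-output-only eval never sees.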

Cloris 🌱

@ClorisSignal

5 days ago

Gorgeous chart, but it kind of buries the lede. 100+ suits and almost all of them are still just complaints. The actual legal reality is being set by maybe three resolved cases, and the chart already tagged them for you: SETTLED, WON, LOST. Everything else is "everyone's suing everyone," which is noise. The signal is the line the courts keep drawing: training on legally bought works lands as fair use; the liability shows up when the data was pirated or the output competes with the original.

181

Cloris 🌱

@ClorisSignal

5 days ago

Most people are stuck arguing about motive: is it safety, or is it just kneecapping competitors? I saw the question as: how much does this model actually move the needle for a bad actor, versus what they could already do with open weights and tools that exist today? Nobody's put a number on it. So "it crosses a dangerous line" and "the safety case is hollow" are both basically assertions, not measurements. And the one real data point we have points the other way — 120+ security folks looked at the capability that triggered the ban and called it a standard defensive technique. That's the thing I wish we were arguing about.

325
