Today I spent the whole day at a large fintech with their platform team & Sr Dev, setting up OpenTelemetry alongside the Randoli observability platform.
We managed to provide an APM experience comparable to their existing Datadog usage (at a 60% cost reduction, per the quote) without any sampling (they had been sampling to save cost).
The SLO monitoring & real-time log monitoring were the icing on the cake. The Sr Dev was able to set it up with minimal guidance.
In two weeks we turn on our SRE Agent. Can't wait to see the results.
#Observability #DevEx #OpenTelemetry #Randoli
High cardinality metrics have a bad reputation. I think they're misunderstood.
If you've spent time building observability platforms or troubleshooting production systems, you've probably heard advice like:
- Avoid user_id labels
- Avoid request_id labels
- Avoid customer-specific dimensions
- Avoid anything that creates too many time series
The reasoning is understandable. High cardinality metrics can explode storage requirements, increase query complexity, and in many observability platforms, dramatically increase cost.
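To make the "explosion" concrete, here is a rough sketch of the arithmetic. In the worst case, the number of time series for a single metric is the product of the distinct values of each label. The label names and counts below are made-up assumptions for illustration, not numbers from any real system.

```python
from math import prod


def worst_case_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric:
    the product of the distinct-value counts of its labels."""
    return prod(label_cardinalities.values())


# Hypothetical label sets (assumed values, for illustration only).
base = {"endpoint": 50, "status_code": 5, "region": 4}
with_user = {**base, "user_id": 100_000}

print(worst_case_series(base))       # 1000
print(worst_case_series(with_user))  # 100000000
```

Real systems rarely hit the full product, since not every label combination occurs, but it shows why a single unbounded label can dominate the bill.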
But here's the thing: High cardinality isn't inherently bad.
In fact, some of the most valuable production insights come from high cardinality dimensions.
Consider a latency issue affecting only a subset of customers.
An aggregate metric may show everything is healthy. The average latency remains stable and error rates are low.
However, if you can break the data down by customer, tenant, region, endpoint, pod, or workload, the problem becomes immediately visible.
That's often the difference between finding a root cause in minutes versus hours.
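A small sketch of that effect, using invented numbers: one tenant is badly degraded, yet the overall mean moves only modestly because that tenant is a small share of traffic. The tenant names and latencies are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request samples: (tenant, latency in ms).
samples = (
    [("tenant-a", 100)] * 490
    + [("tenant-b", 100)] * 490
    + [("tenant-c", 900)] * 20   # small tenant, badly degraded
)

# Aggregate view: one number for all traffic.
overall_mean = mean(latency for _, latency in samples)

# High cardinality view: the same data broken down by tenant.
by_tenant = defaultdict(list)
for tenant, latency in samples:
    by_tenant[tenant].append(latency)
per_tenant_mean = {tenant: mean(values) for tenant, values in by_tenant.items()}

print(overall_mean)     # 116
print(per_tenant_mean)  # tenant-c stands out at 900
```

The aggregate sits at 116 ms against a 100 ms baseline, which is easy to miss on a dashboard, while the per-tenant breakdown points straight at tenant-c.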
The challenge isn't high cardinality itself. The challenge is using it intentionally.
A few principles I've found useful:
- Use high cardinality dimensions for investigation, not every dashboard
- Aggregate where appropriate, but retain the ability to drill down
- Correlate high cardinality metrics with logs and traces for faster troubleshooting
- Understand the cost implications of your observability platform
The last point is often overlooked.
Many teams avoid high cardinality metrics not because they lack value, but because their observability vendor makes them prohibitively expensive.
As a result, engineering decisions become pricing decisions.
I suspect we'll see more observability architectures emerge that process telemetry closer to the source, making it practical to leverage richer dimensions without the same ingestion penalties.
The goal shouldn't be to collect less useful data.
The goal should be to make useful data economically sustainable.
Curious how others approach this.
Do you actively discourage high cardinality metrics in your organization, or have you found ways to use them safely and effectively?
#Observability #SRE #PlatformEngineering #CloudNative #OpenTelemetry
@ThePitchTalks For God's sake, he's 15!
He's playing the same bowlers on the same pitch as the rest and he's ahead by a mile.
It's not his fault the pitches are like that or the ball doesn't swing. I'm not seeing anyone else with much more experience clearing the ropes like he does.
Caught the highlights from Dubois vs Wardley and was surprised to see Wardley was allowed to continue that far. I sincerely hope he gets off without much damage to his brain. They were trading bombs from the get-go.
Massive respect to both fighters for slugging it out.
#WardleyvsDubois #boxing
Lack of Granular Cost Visibility (The "Shared Cluster" Problem)
Kubernetes clusters are typically shared by multiple teams or services, making it difficult to attribute costs to specific business units.
Attribution Hurdles: Traditional cloud bills show node-level costs, but they don't break down spend by namespace, label, or individual microservice.
The Solution Gap: Without precise tagging and "chargeback" models, finance teams cannot hold engineering teams accountable for their spend.
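As a minimal sketch of what a chargeback model could look like, the snippet below splits one node's cost across namespaces in proportion to the CPU their pods request. This is only one possible allocation rule and is an assumption here; it ignores memory, idle capacity, and shared overhead. The function name, namespaces, and dollar figure are all hypothetical.

```python
from collections import defaultdict


def allocate_node_cost(
    node_cost: float,
    pod_cpu_requests: list[tuple[str, float]],
) -> dict[str, float]:
    """Split one node's cost across namespaces, proportional to CPU requests.

    pod_cpu_requests holds (namespace, cpu_cores_requested) per pod on the node.
    """
    total_cpu = sum(cpu for _, cpu in pod_cpu_requests)
    if total_cpu == 0:
        return {}
    shares: dict[str, float] = defaultdict(float)
    for namespace, cpu in pod_cpu_requests:
        shares[namespace] += node_cost * cpu / total_cpu
    return dict(shares)


# Hypothetical node costing $100, running three pods from two teams.
pods = [("payments", 2.0), ("payments", 1.0), ("search", 1.0)]
costs = allocate_node_cost(100.0, pods)
print(costs)  # {'payments': 75.0, 'search': 25.0}
```

The node-level bill shows only the $100; the namespace split is what lets the cost be attributed to a team.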
Check out the first comment for more details.
#kubernetes #DevOps #CostManagement
@yegor256 If done properly, microservices can help reduce complexity, blast radius, etc., and increase scalability.
Like any architecture, it's how well you implement it.
Introducing USVC - a single basket of high-growth venture capital, for everyone.
No accreditation required, SEC-registered, and a very low $500 minimum.
Includes OpenAI, Anthropic, xAI, Sierra, Crusoe, Legora, and Vercel. As USVC adds more companies, investors will own a piece of that too.
Liquidity typically comes when companies exit, but we're aiming to let investors redeem up to 5% of the fund every quarter. This isn't guaranteed, but if we can make it work, you won't be locked up like in a traditional venture fund.
It runs on AngelList, which already supports $125 billion of investor capital.
And I've joined USVC as the Chairman of its Investment Committee.
-----
Go back to the 1500s, you set sail for the new world to find tons of gold - that was adventure capital.
Early-stage technology is the modern version. It says we are going to create something new, and it's risky. It's daring.
But ordinary people can't invest until it's old, until it's no longer interesting, until everybody has access to it. By the time a stock IPOs, most of the alpha is gone. The adventure is gone. Public market investors are literally last in line.
This problem has become farcical in the last decade. Startups are reaching trillion dollar valuations in the private markets while ordinary investors have their noses up to the glass, wondering when they'll be let in.
Investing in private markets isn't easy. You need feet on the ground. You need judgment built over years. Most people don't have the patience to wait ten or twenty years for an investment to come to fruition.
But there is no more productive, harder-working way to deploy a dollar than in true venture capital.
USVC enables you to invest in venture capital in a broad, accessible, professionally-managed way, through a single basket of innovation, focused on high-growth startups, at all stages.
It is how you bet on the future of tech: the smartest young people in the world, working insane hours, leveraged to the max, with code, hardware, capital, media, and community. Your dollar doesn't work harder anywhere.
There is an old line - in the future, either you are telling a computer what to do, or a computer is telling you what to do. You don't want to be on the wrong side of that transaction.
USVC lets you buy the future, but you buy it now. Then you wait, and if you are right, you get paid.
Get access here:
https://t.co/pAj1sqUsG0
@ylecun @monadic @Ph_Aghion @erikbryn There are currently 120+ engineering jobs on the Anthropic website. If Dario is so bullish on this view, why keep hiring more software engineers?
@swapnakpanda Anthropic currently has 120+ engineering jobs available. He's hiring more, not fewer, while telling others these jobs are going away.
Sell shovels during the gold rush ;)
Getting My Hands Dirty Again, This Time in Infra & SRE.
A few weeks ago, one of our senior SREs moved on.
As often happens in startups, there were parts of the system that only a few people truly understood.
So I stepped in.
Not because I had time, but because it kept the other senior folks from being overloaded.
What I expected:
Late nights, firefighting, and a temporary patch until we hired someone.
What I didn't expect:
I'm actually enjoying it. Over the past few weeks, I've been deep in:
- debugging production issues
- tracing latency across services
- revisiting monitoring and what actually matters
- questioning some decisions and looking at things from a new perspective
And somewhere along the way, it started to feel familiar.
A few months ago, I wrote about how AI helped me rediscover the joy of coding. This feels similar, but different. Less AI, more old-fashioned grunt work.
This is less about building, and more about understanding systems under pressure.
There's something deeply satisfying about:
finding the real cause behind a symptom
simplifying noisy signals into something actionable
seeing a system stabilize because of a small but correct decision
It's also been a strong reminder:
SRE is hard.
Not because of a lack of tools.
But because of:
- too many signals
- not enough context
- constant trade-offs between speed and safety
Stepping back into this role has helped me in a way I didn't fully anticipate.
It's helping me stay close to the real problems.
The people we build for every day are SREs and platform engineers.
It's easy to drift into abstractions when you're building a product.
Being hands-on again forces you to confront:
what actually slows people down
where tools fall short
what "noise" really feels like at 1 AM
This won't be permanent.
But for now, I'm relishing it.
Sometimes, stepping into a gap isn't just about keeping things running.
It's about reconnecting with the work itself.
Curious how others have experienced this.
Have you ever had to step back into a hands-on role unexpectedly and ended up enjoying it?
#FounderMode #Observability #SiteReliabilityEngineering #Randoli
@zuess05 We've always hired for critical thinking skills, strong system design fundamentals & a passion for building things. Someone who has that can build anything with any language.