Rob F.

@robinfeh

Co-Founder of Acosom GmbH / Lead Architect At Evoura / Co-Founder MiruIQ / Co-Founder of DiaIQ

Zürich, Schweiz

Joined June 2010

477 Following

115 Followers

803 Posts

Rob F.

@robinfeh

about 2 months ago

First impressions using Opus 4.7:

I ran a Google Trends and SERP saturation analysis and noticed differences quickly. Before, with Claude 4.6, a script fetching data from Google Trends kept hitting rate limits. It said it would switch to TOR to circumvent them, but in practice it mostly just retried until the rate limit cleared. This used up a lot of tokens. Opus 4.7, though, switched to TOR much faster and completed the implementation; when limits reappeared it started rotating different username/password combinations for different SOCKS circuits. This part was much better.

What was even more striking was the quality of the analysis afterwards.

The Claude 4.6 analysis was not correct because it ignored the normalization of data within each group. The top keyword in each group scales to 100 and the other keywords scale relative to that, so you can't directly compare raw values from different groups. Only ratios within a group are trustworthy.

That detail is very important if you really want to do saturation analysis, and it missed it.

So far Opus 4.7 feels fast and practical in dealing with obstacles instead of just retrying.

Still very early though, so curious to see how it does on other tasks. If you like I can also make it more forceful and skeptical or in founder style.
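The normalization point in the post above can be illustrated with a small sketch. It is a simplified model, not Google Trends' actual pipeline: the `scale_group` helper, the keyword names, and the search volumes are all hypothetical, and real Trends data is also rounded and normalized over time.

```python
def scale_group(true_volumes):
    """Scale one group so its top keyword is 100 (simplified, hypothetical model)."""
    top = max(true_volumes.values())
    return {kw: 100.0 * v / top for kw, v in true_volumes.items()}


# Invented search volumes, for illustration only.
group_a = scale_group({"alpha": 50_000, "beta": 10_000})
group_b = scale_group({"gamma": 500, "delta": 400})

# Comparing scaled values across groups is misleading:
# "delta" scores higher than "beta" even though beta's volume is 25x larger.
print(group_a["beta"], group_b["delta"])

# Ratios inside one group survive the scaling.
ratio_scaled = group_a["beta"] / group_a["alpha"]
ratio_true = 10_000 / 50_000
print(ratio_scaled, ratio_true)
```

Under this model, only the within-group ratio matches the underlying data, which is the property the post says the earlier analysis ignored.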

Rob F.

@robinfeh

about 2 months ago

Turbo-OCR just got open sourced. GPU accelerated - 1,200+ img/s. One of our products sponsored the implementation and current maintenance of it. We needed it internally and decided to make it open source since that's where OCR lives. Not much to say besides - go check it out: https://t.co/cqXWBqXsps

Rob F.

@robinfeh

about 2 months ago

You can find it here - https://t.co/7sz5XBdBZW Skill for Claude available too ;)

Rob F.

@robinfeh

about 2 months ago

Let's avoid AI detection.🎁 We built a free humanizer.

I took a carefully crafted old LinkedIn post of mine and ran it through GPTZero. Flagged. So let's try to fix that together using the humanizer we built.

We made an intentional mistake and built an AI humanizer that actually sounds human. Not really what I wanted to build, but our beta customers asked for it 😅 So we went down a rabbit hole, gathered research, and made it happen. It runs fully locally on our hardware.

What it does:
- Takes input from ChatGPT, Claude, Gemini etc.
- Rewrites it so it reads like a human wrote it
- Retains your formatting
- Bypasses GPTZero, ZeroGPT, and Turnitin etc.

At DiaIQ we understand video recordings and convert them into blog posts, newsletters, social media posts, and more. We found that AI-generated drafts were good if the input is good, but yes, they got detected by GPTZero and Turnitin every single time.

This isn't meant for every piece of content you produce; not everything needs to sound "human" and "personal." But when it does, the humanizer we built actually works really well.

I also firmly believe (and Google seems to agree) that well-drafted and reviewed AI content (no AI slop) shouldn't be punished. But they detect it in the first place, so... I get why it got requested.

We're still looking for a few more beta customers. In case you're interested in joining the not yet public and polished version of DiaIQ, DM me.

It could also bring humanity back into emails. I'd love to see that.

No registration, no strings attached. Link in comments 👇

#AIHumanizer #AIDetection #FreeTools #ChatGPT #ContentCreation

robinfeh retweeted

Hans-Peter Grahsl @hpgrahsl

about 1 year ago

🐿️ #ApacheFlink 2️⃣.0️⃣ is here 🚀 📝 https://t.co/CmgGH71olH 📦 https://t.co/bKI30NWZYD

robinfeh retweeted

Robin Moffatt 🍻🏃🥓

@rmoff

over 1 year ago

Details of a proprietary vectorised engine from Alibaba for #ApacheFlink called Flash. It includes two new state storage engines under the name ForStDB ("For Streaming DB") It's written in C++ and they claim 10x performance improvements. https://t.co/VA7ZwllC6a

robinfeh retweeted

Apurva Mehta

@apurva1618

over 1 year ago

Is it end of the road for RocksDB in stream processing? Disaggregated state is the clearly superior architecture, with @responsive_apps investing heavily in SlateDB while Flink 2.0 has forked RocksDB.

robinfeh retweeted

Yuri Shkuro @YuriShkuro

almost 2 years ago

Just published: Towards Jaeger v2 💥💥💥 Moar OpenTelemetry! https://t.co/H3PvxFxCOk

robinfeh retweeted

Robin Moffatt 🍻🏃🥓

@rmoff

about 2 years ago

Here's a list of all the #ApacheFlink talks at #kafkasummit London next week: 🔗https://t.co/DzuK1hLGpj #ksl24 #FlinkAtKSL

robinfeh retweeted

🕹️ Alexander Gallego ⚡️

@emaxerrno

over 2 years ago

Check out the redpanda s3 fifo impl https://t.co/ADqV7i8Ybz

Rob F.

@robinfeh

almost 3 years ago

A pretty neat project that combines #Flink (SQL) and #GraphQL and creates a microservice for you that does stream processing and provides a GraphQL-API for it too: https://t.co/98zTggPHUL nice one @MBroecheler

Rob F.

@robinfeh

almost 3 years ago

Oh that's news!

Hans-Peter Grahsl @hpgrahsl

almost 3 years ago

🎉 Almost missed another fantastic @MongoDB announcement today... #Atlas #StreamProcessing Can't wait to try this via early access program 🤓 https://t.co/nP7nyjBNOC

robinfeh retweeted

Hans-Peter Grahsl @hpgrahsl

almost 3 years ago

🔥 #MongoDB is adding #VectorSearch which will open up new cool 😎 use cases.

Rob F.

@robinfeh

about 3 years ago

@hpgrahsl why not a service when a library does the same ;) a lot of companies tend to follow this trend internally too... at a cost they only realize too late.

robinfeh retweeted

Mickael Maison @MickaelMaison

about 3 years ago

Edition 64 of my Kafka Monthly Digest is out! As usual it covers the releases (3.5.0, 3.4.1) and KIPs in progress as well as recent community project releases and blogs. https://t.co/dj4ye2Sy2f

robinfeh retweeted

Kevin Fischer

@kevinafischer

about 3 years ago

I don't talk much about this - I obtained one of the first FDA approvals in ML + radiology and it informs much of how I think about AI systems and their impact on the world. If you're a pure technologist, you should read the following:

There's so much to unpack for both why Geoff was wrong, and why his future predictions should not be taken seriously either.

Geoff made a classic error that technologists often make, which is to observe a particular behavior (identifying some subset of radiology scans correctly) against some task (identifying hemorrhage on CT head scans correctly), and then to extrapolate based on that task alone.

The reality is that reducing any job, especially a wildly complex job that requires a decade of training, to a handful of tasks is quite absurd.

Here's a bunch of stuff you wouldn't know about radiologists unless you built an AI company WITH them instead of opining about their job disappearing from an ivory tower.

(1) Radiologists are NOT performing 2d pattern recognition - they have a 3d world model of the brain and its physical dynamics in their head. The motion and behavior of their brain to various traumas informs their prediction of hemorrhage determination.

(2) Radiologists have a whole host of grounded models to make determinations, and actually, one of the most important first order determination they make is whether there is anything notably wrong with a brain structure that "feels" off. As a result, classifiers aren’t actually performing the same task even as radiologists.

(3) Radiologists, because they have a grounded brain model, only need to see a single example of a rare and obscure condition to both remember it and identify it in the future. This long tail of rare conditions to avoid missing is a large part of their training, and no one has any clue how to make a model that acts similar in this way.

(4) There’s so many ways to make Radiologist lives easier instead of just replacing them, it doesn’t even make sense to try. I interviewed and hired 25 radiologists, whose primary and chief complaint was that they had to reboot their computers several times a day.

(5) A large part of the radiologist job is communicating their findings with physicians, so if you are thinking about automating them away you also need to understand the complex interactions between them and different clinics, which often are unique.

(6) Every hospital is a snowflake, data is held under lock and key, so your algorithm might not work in a bunch of hospitals. Worse, the imagenet datasets have such wildly different feature sets they don’t do much for pretraining for you.

(7) Have you ever tried to make anything in healthcare? The entire system is optimized to avoid introducing any harm to patients - explaining the ramifications of that would take an entire book, but suffice to say even if you had an algorithm that could automate away radiologists I don’t even know if you could create a viable adoption strategy in the US regulatory environment.

(8) The reality is that for every application, the amount of specific and UNKNOWABLE domain knowledge is immense.

LONG STORY SHORT: thinkers have a pattern where they are so divorced from implementation details that applications seem trivial, when in reality, the small details are exactly where value accrues.

Should you be worried about GPT5 being used to automate vulnerability detection on websites before they’re patched? Maybe.

Should you be worried GPT5 is going to interact with SOCIAL systems and destroy our society single-handedly? No absolutely not.

robinfeh retweeted

Stanislav Kozlovski

@kozlovski

about 3 years ago

Queues for Kafka is the hottest new feature being discussed right now! "KIP-932: Queues for Kafka" was announced just 7 days ago. But what is it?

First - let’s define a queue. A traditional queue system is one where either:
🔹- many consumers read from the same queue (pub-sub)
🔹- one specific consumer reads from one specific producer (point to point)

The messages are typically stored until they’re consumed once - the queues have a maximum depth.

Kafka has never supported traditional queuing like this. One of its strengths has precisely been the decoupling between producer and consumer. A bad consumer has a close-to-zero effect on a producer. (unless it causes Kafka to read from disk and exhaust IOs)

One pain point with this approach is that consumer groups are coupled with the number of partitions in a topic. If you have a topic with 10 partitions, you cannot scale beyond 10 consumers. So, people usually over-partition. But that's very unintuitive for a uniform workload. 🙅‍♂️ If all your messages are independent work items with no logical grouping, a single queue consumed by a pool of applications is the intuitive solution.

So, KIP-932 proposes a solution with the following benefits:
✅ - the ability for many consumers to read from the same partition
✅ - individual records acknowledgments
✅ - still keep producers and consumers decoupled
✅ - no maximum queue depth
✅ - messages are still retained - so you have the ability to replay

And the following limitations:
🔴 - Ordering is NOT guaranteed. Out-of-order delivery is possible within a partition.
🔴 - No exactly once - it is at least once.
🔴 - Maximum processing delta. A consumer cannot read more than N messages ahead of the slowest one.

How does it do it? Shared Consumer Groups. ✨

Each broker will be a shared group coordinator for the data partitions it is leading and manage the sharing of reads. It will keep a sliding window of start <-> end offset for each pair of partition and group. The records available for consumption will only be those within that offset range, essentially adding a maximum lag between the slowest and fastest consumer. 🛑✋

Consumers from the same shared group can read from the same partition by exclusively reserving a few records (offset range) in the partition via a time-limited acquisition lock.

A consumer can then ack/release/reject the message(s):
🥇 ack - acknowledges successful processing and moves the shared group’s offset progress
🥈 release - unsuccessful processing - retry. Release the record for another delivery.
🥉 reject - unsuccessful processing - abort. Blacklists the record, making it unavailable for another delivery.

☠️ To avoid poison messages - a delivery count is kept per message. When it goes over a maximum retry limit, the message is rejected.

That’s it! Quite the out of the box proposal. 💡 In one sentence, this is a usability feature for un-ordered consumption by an arbitrary number of consumers.

Note that nothing here is set in stone. This proposal is still pending discussion. I was the first person to reply to the discussion on the mailing list (humblebrag). The proposal may look quite different by the time it gets in.

What do you think about it?
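The acquire / ack / release / reject cycle described in the post above can be sketched as a toy in-memory model. This is not Kafka's API or the KIP-932 design: the `SharePartition` class, its method names, and the `max_deliveries` parameter are hypothetical, and the time-limited lock and sliding offset window are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    ACQUIRED = "acquired"
    ACKED = "acked"
    REJECTED = "rejected"


@dataclass
class SharePartition:
    """Toy model of one partition read by a share group (hypothetical, not Kafka's API)."""
    records: list
    max_deliveries: int = 3
    state: dict = field(default_factory=dict)
    deliveries: dict = field(default_factory=dict)

    def __post_init__(self):
        for offset in range(len(self.records)):
            self.state[offset] = State.AVAILABLE
            self.deliveries[offset] = 0

    def acquire(self, max_records=1):
        """Exclusively reserve up to max_records available offsets for one consumer."""
        acquired = []
        for offset in sorted(self.state):
            if len(acquired) == max_records:
                break
            if self.state[offset] is State.AVAILABLE:
                self.state[offset] = State.ACQUIRED
                self.deliveries[offset] += 1  # per-record delivery count
                acquired.append(offset)
        return acquired

    def ack(self, offset):
        """Processing succeeded."""
        self.state[offset] = State.ACKED

    def release(self, offset):
        """Processing failed; retry unless the delivery limit is reached (poison message)."""
        if self.deliveries[offset] >= self.max_deliveries:
            self.state[offset] = State.REJECTED
        else:
            self.state[offset] = State.AVAILABLE

    def reject(self, offset):
        """Processing failed; never deliver this record again."""
        self.state[offset] = State.REJECTED


partition = SharePartition(records=["a", "b", "c"], max_deliveries=2)
first = partition.acquire(2)    # consumer 1 reserves offsets 0 and 1
second = partition.acquire(2)   # consumer 2 reads the same partition, gets offset 2
partition.ack(0)
partition.release(1)            # offset 1 becomes available again
retry = partition.acquire(1)    # second delivery of offset 1
partition.release(1)            # delivery limit reached, so the record is rejected
partition.reject(2)
```

In this simplification a record is rejected once its delivery count reaches the limit at release time; the actual proposal may define the threshold and the lock expiry differently.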

Rob F.

@robinfeh

about 3 years ago

#Redpanda vs. #Kafka the war is on! some good insights: https://t.co/BhY1ZIt14N TLDR; the comparisons made are probably not fair...

Rob F.

@robinfeh

about 3 years ago

Redpanda is not faster than Kafka...well that is pretty interesting news! Thank you @vanlightly for the time spent on this!

Jack Vanlightly

@vanlightly

about 3 years ago

Redpanda bring out benchmark after benchmark claiming performance superiority over Apache Kafka. I decided to run my own tests to see if any of it was true. https://t.co/hb8euOGYmB

Rob F.

@robinfeh

about 3 years ago

🚀 New in our Kafka 101 series: Real-Time Data Visualization! Stream process data from #Kafka using #Flink and ingest them to #ApacheDruid and create real-time dashboards with #ApacheSuperset. Link: https://t.co/RN27G67Bdn👈 Kudos to Théodore Curtil! Follow for more! 📊
