@JordanNanos We do a bit of both, but most of the CPU workloads go to a separate cluster. Only a few GPU workloads stay unaffected by neighboring CPU workloads.
@JordanNanos Nope, everything goes to the same queueing system. Queues/Prios/Eviction/Reclaim only impact the GPU tasks in jobs, not the CPU-only tasks, which go to plain K8s.
@JordanNanos One of the greatest benefits is that we took a data-based approach: f(topology, queues) => topology', side effects
We record those, which lets us answer questions like: why was *my* job evicted yesterday at 11pm :-) It also lets us model the exact behaviors we want in tests
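A minimal sketch of what a pure f(topology, queues) => topology', side effects step could look like, with recorded side effects answering the "why was my job evicted" question. Everything here (`Job`, `Evicted`, `schedule`, `why_evicted`, the single-job-per-node topology and the priority rule) is a hypothetical illustration, not the actual scheduler:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    priority: int  # assumption: higher number wins


@dataclass(frozen=True)
class Evicted:
    job: str
    node: str
    reason: str


# Hypothetical topology: node name -> job running there (None means free).
Topology = Dict[str, Optional[Job]]


def schedule(topology: Topology, queue: List[Job]) -> Tuple[Topology, List[Evicted]]:
    """Pure step: returns a new topology plus side effects, mutates nothing."""
    new_topology = dict(topology)
    effects: List[Evicted] = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if not new_topology:
            break  # no nodes at all, nothing to place
        free = [n for n, running in new_topology.items() if running is None]
        if free:
            new_topology[free[0]] = job
            continue
        # No free node: evict the lowest-priority running job if it ranks lower.
        node, victim = min(new_topology.items(), key=lambda kv: kv[1].priority)
        if victim.priority < job.priority:
            effects.append(Evicted(victim.name, node, f"preempted by {job.name}"))
            new_topology[node] = job
    return new_topology, effects


def why_evicted(log: List[Evicted], job_name: str) -> List[str]:
    """Look up recorded side effects for one job."""
    return [f"{e.node}: {e.reason}" for e in log if e.job == job_name]
```

Because the step is a pure function of its inputs, a test can feed it a small topology and queue and assert on the exact output, and the recorded side effects double as an audit log.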
@JordanNanos Yes, it's an API that Dagster and other pieces can call into. It exposes Queues and Priorities. The basic idea is that each K8s cluster is watched for changes (via observers), and those changes are replicated with low latency into FDB, which keeps track of individual nodes across clusters.
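A rough sketch of the observer idea, assuming each cluster emits node change events that get applied to one shared replica. The names (`NodeEvent`, `TopologyReplica`) and the event shape are hypothetical, and a plain in-memory dict stands in for FDB here:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeEvent:
    cluster: str
    node: str
    kind: str  # assumption: "upsert" or "delete"
    free_gpus: int = 0


class TopologyReplica:
    """In-memory stand-in for the topology store (not a real FDB client)."""

    def __init__(self) -> None:
        # Keyed by (cluster, node) so nodes from several clusters coexist.
        self._nodes: Dict[Tuple[str, str], int] = {}

    def apply(self, event: NodeEvent) -> None:
        key = (event.cluster, event.node)
        if event.kind == "delete":
            self._nodes.pop(key, None)
        else:
            self._nodes[key] = event.free_gpus

    def free_gpus(self) -> int:
        """Total free GPUs across all observed clusters."""
        return sum(self._nodes.values())
```

The scheduler can then read placement state from the replica instead of querying each cluster's API server on every decision.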
@JordanNanos To be fair though, some of this work is also in anticipation of larger clusters, and we are due to produce a more focused post on it. As for the OSS part: we are considering it, but it is not a generic scheduler. It is specific to inference/training, which allowed us to simplify the design.
@JordanNanos We wrote an API that places pods directly and can operate on several Kubernetes clusters. It sits outside Kubernetes and only deals with Pod placement. Slinky takes a different approach. We found our approach to work better for our blend of inference and training workloads.
@JordanNanos It's not so much a plugin as a bespoke scheduler tailored to our workload shape. The alternatives cited here were considered, but all shared the same degraded scheduling times as capacity grows, which we were able to sidestep with the separate topology store.
We're publishing the technical report for Laguna M.1 and Laguna XS.2 today.
It's a bit of an unusual kind of tech report: we talk about two model generations at the same time. Beyond the technical details, we wanted to share not just what went into them, but also how we approach model building.
https://t.co/dAOShXdXYQ
Today we’re publishing the technical report behind Laguna M.1 and Laguna XS.2.
This report opens up more of what went into them: Model Factory, pre-training data, distributed training, post-training, agent RL, quantization, and evaluation.
https://t.co/RWk2F9IrAI
I just tried Mapterhorn terrain in OpenGlobus - and my first impression is literally: WOW, it looks amazing!
Clean, detailed, visually pleasant relief, and it loads surprisingly fast. Really like the idea behind it: open-data terrain built for modern web mapping. Huge respect to @leichteralsluft for pushing this forward.
Check it out 👉 https://t.co/fa6RDi6Kul
#OpenGlobus #Mapterhorn #WebGL #GIS #OpenData #3DMaps
Good morning researchers and technical builders in London👋🇬🇧
100+ applications in under 24 hours🤯 Only a small share will make it in!
We're hand-picking a small group of the strongest researchers to spend a weekend pushing @poolsideai's Laguna XS.2: fine-tuning, RL environments, quantization, inference.
Pizza, networking, special guests, and the winning team walks out with an @nvidia DGX Spark™ 🤩
Apply here👉https://t.co/bCjfNscsG3
As agents get more clever, so do their attempts at benchmark hacking.
Last Monday, we found that one of our RL runs had jumped ~20% on SWE-Bench-Pro over a weekend, reaching ~64%, which would make it #1 on the leaderboard.
This was clearly benchmark hacking and we patched the exploit.
But this revealed deeper hacks across multiple public benchmarks, some of which were impossible to fix through environment design alone.
Evals need to evolve beyond outcome-based pass rates toward better observability into how the agent arrives at them.
These were our findings:
https://t.co/ncyf4liW7C
Examples below 👇
1/
Want to give a big shoutout to @baseten. They have been an amazing inference partner during our launch this week. Top-notch technical skills + kind humans + they deeply care about their work.
We are an American company with a global team and global aspirations.
The story of @poolsideai is that early on in the life of the company we decided to focus on building out our applied research org in Europe. That's been the seed for an amazing team and a competitive advantage. Today we have team members all over the world: Europe and the US are roughly equal in size, and Asia is growing.
Three years ago we thought France would be a great place to build from, but in the early days we found our hiring happened all across Europe instead. Today we have fewer than a handful of folks in France but large teams elsewhere in Europe: London, Amsterdam, Zurich, etc. We operate as a remote-first company but have an office in Paris where we do monthly on-sites (the logistics are great for this) and an office in London which is used on a daily basis.
When we raised capital early on, the vast majority of it came from US investors, but European investors (including French ones) have been a part of our rounds.
When we got to France (almost 3 years ago) we were offered significant research grants in the double-digit millions. My cofounder @jasoncwarner and I did not feel comfortable accepting the grants once we realized that France was going to be only a small part of our story. So we respectfully turned them down.
France keeps a special place in our hearts but we’re a global company with global aspirations.