You can now use Spider in @llama_index as a web reader! Crawl or scrape URLs and format the HTML into LLM-ready markdown!
Spider is the fastest web crawler built for AI agents and LLMs.
h/t @WilliamEspegren for the PR
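Spider's own HTML-to-markdown conversion isn't shown in the post. To show what "LLM-ready markdown" roughly means, here is a minimal sketch that uses only the standard library. It turns headings, paragraphs, links and list items into markdown and drops script and style content. `MiniMarkdown` and `html_to_markdown` are hypothetical names and are not part of Spider or LlamaIndex.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Hypothetical minimal HTML-to-markdown converter (not Spider's code)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style"}  # content an LLM never needs

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.href: str | None = None
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")  # block elements start a new paragraph
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth -= 1
        elif tag == "a":
            self.out.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse runs of whitespace but keep single spaces around inline text.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).splitlines()]
    cleaned: list[str] = []
    for line in lines:
        # Keep at most one blank line in a row.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()
```

A production converter would also need to handle tables, images, and page chrome such as nav bars and footers. This sketch only shows the general shape of the transformation.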
Building a scraping stack from scratch.
Here's what we actually use across hundreds of outbound campaigns:
Instantly Data Scraper — directories.
When you need to pull from G2, Capterra, industry lists. Fast. No code.
Playwright + Claude Code — custom sites.
Anything with a weird structure or login wall. Claude writes the scraper. You run it.
Firecrawl — full site crawls.
When you need everything on a domain. Pricing pages, case studies, team pages. 10 minutes, not 10 hours.
Jina / https://t.co/pAKTBWthlG — scale.
10K+ pages. LLM-ready output. This is where most teams underinvest.
Browserbase — agentic browsing.
For flows a static scraper can't handle. Session persistence. Works where everything else breaks.
BrightData — bot-protected sites.
Yes, it costs more. Yes, it's worth it. LinkedIn. Amazon. Anything that actively fights you.
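Under the hood, a "full site crawl" is a breadth-first walk that stays on one domain and skips URLs it has already visited. The sketch below makes some assumptions so that it runs offline. A hypothetical in-memory `SITE` dict stands in for real HTTP fetches, and `fetch_links` and `crawl_domain` are illustrative names. This is not how Firecrawl or any other tool in the list is implemented.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the web: page URL -> links found on that page.
SITE = {
    "https://example.com/": ["/pricing", "/case-studies", "https://other.com/x"],
    "https://example.com/pricing": ["/", "/team"],
    "https://example.com/case-studies": ["/pricing"],
    "https://example.com/team": [],
}


def fetch_links(url: str) -> list[str]:
    """Pretend fetch: return the page's outgoing links, or none for unknown pages."""
    return SITE.get(url, [])


def crawl_domain(start: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl restricted to the start URL's domain."""
    domain = urlparse(start).netloc
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links against the page
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `seen` set prevents loops, such as `/pricing` linking back to `/`. The `max_pages` cap is the usual guard against crawls that never finish.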
Finding the directories is one thing; understanding the use case and framing is another.
A company listed on a niche directory already told you something.
They chose to be found. They're actively positioning in that category. They want buyers to discover them. That's not a cold lead anymore.
Apollo gives you a list of people who fit a description.
Directories give you a list of people who took action.
Those are not the same signal.
Most funded teams spend 3 weeks debating which database to use.
Then overpay for ZoomInfo because that's what they used at their last company.
The teams booking meetings on day one?
They scraped intent from niche directories before anyone else thought to.
A client recently closed their Series A. The board wanted a consistent pipeline within 90 days.
We skipped the generic Apollo list.
Scraped 5,600 schools from vertical-specific directories.
Matched them against relevant signals.
Sent 7,000 emails. Booked 55 meetings in 31 days.
Same offer. Same copy.
Different list quality.
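For reference, here is what the campaign figures above work out to in meetings per email sent:

```latex
\frac{55 \text{ meetings}}{7{,}000 \text{ emails}} \approx 0.79\%,
\qquad
\frac{7{,}000}{55} \approx 127 \text{ emails per booked meeting}
```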
The scraping stack matters.
But knowing where to point it matters more.
Directories are intent. Treat them that way.
Before writing cold email copy:
Jina → homepage
Spider → full crawl
Claygent → pricing/careers
BuiltWith → tech stack
Google News → triggers
LinkedIn → profiles
All synthesized into ONE research column.
Then we write.
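You can think of the "one research column" step as merging each source's findings into a single field per company. This sketch assumes that each tool's output has already been collected into a plain dict keyed by source name. The source labels follow the list above, but the data shape and the `build_research_column` helper are hypothetical. They don't represent Clay's or any other tool's API.

```python
# Order follows the research steps listed above.
SOURCES = ["Jina", "Spider", "Claygent", "BuiltWith", "Google News", "LinkedIn"]


def build_research_column(findings: dict[str, str]) -> str:
    """Merge per-source notes into one research field, skipping empty sources."""
    parts = []
    for source in SOURCES:
        note = findings.get(source, "").strip()
        if note:
            parts.append(f"{source}: {note}")
    return " | ".join(parts)
```

Giving each note a fixed order and a source prefix means the copywriting step (human or LLM) always reads the same layout. Missing sources simply drop out.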
Dify v1.8.0 is live.
This release makes it easier to refine prompts, fix code, and manage workflows.
Refine or repair right inside LLM and Code nodes with an agent.
Prompt and Code
- Prompt Optimization: use {{last_run}} with your ideal outputs to quickly refine prompts and keep iterations under control.
- Code Fix: auto-repair captures {{current_code}} and {{error_message}} to generate corrected versions so you spend less time on manual debugging.
- Version Management: every optimization and fix is saved as a version so you can compare and roll back any time.
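The prompt and code items above combine two simple ideas:

- Filling `{{variable}}` placeholders such as `{{current_code}}` and `{{error_message}}`.
- Keeping every iteration in an append-only history that you can roll back.

Here is a minimal sketch of both. `render` and `VersionHistory` are hypothetical names, and this is not Dify's actual implementation or API.

```python
import re
from dataclasses import dataclass, field


def render(template: str, values: dict[str, str]) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), template)


@dataclass
class VersionHistory:
    """Append-only list of prompt or code versions."""

    versions: list[str] = field(default_factory=list)

    def save(self, content: str) -> int:
        self.versions.append(content)
        return len(self.versions) - 1  # index of the new version

    @property
    def current(self) -> str:
        return self.versions[-1]

    def rollback(self, index: int) -> str:
        # Restoring an old version saves it again as the newest entry,
        # so the full history stays available for comparison.
        restored = self.versions[index]
        self.save(restored)
        return restored
```

Rolling back by re-saving, instead of deleting later entries, means that no iteration is ever lost.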
Workflow and agent upgrades
- Multi model credentials: configure and switch between multiple keys for the same provider or a custom model.
- MCP with OAuth: connect to MCP servers with OAuth, including token expiry control and callback allowlists.
- Default values for workflow variables: all start node variable types now support defaults for faster setup.
- Agent node token usage: track token usage in agent nodes for better monitoring and optimization.
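The MCP-with-OAuth item mentions two safeguards: a callback allowlist and control over token expiry. Here is a generic sketch of both checks. It assumes the allowlist is an exact-match set of redirect URIs and that expiry is stored as an absolute timestamp. `ALLOWED_CALLBACKS`, `callback_allowed` and `Token` are hypothetical, and this is not Dify's code.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of registered OAuth redirect URIs.
ALLOWED_CALLBACKS = {"https://app.example.com/oauth/callback"}


def callback_allowed(redirect_uri: str) -> bool:
    """Exact-match check: prefix matching could admit attacker-controlled paths."""
    return redirect_uri in ALLOWED_CALLBACKS


@dataclass
class Token:
    access_token: str
    expires_at: float  # absolute Unix time, in seconds

    def is_expired(self, now: Optional[float] = None, skew: float = 30.0) -> bool:
        """Report expiry `skew` seconds early so a refresh happens before requests fail."""
        current = time.time() if now is None else now
        return current >= self.expires_at - skew
```

With exact matching, a URI like `.../oauth/callback/../evil` is rejected even though it starts with an allowed prefix.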
Navigation and experience
- Knowledge base sorting: sort documents by status for smoother management.
- Extensible goto anything commands: a new architecture for faster navigation across projects.
Plus performance, security, and infra improvements across the board.
Full changelog: https://t.co/0tFENc67NJ
Happy building!
For the first time, you can vibe-code any AI agent.
Meet https://t.co/iLXHy3iBgJ — Computer Human AI by Langbase ☕
🔹Prompt: "make an agent that…"
🔹Sip: chai builds any AI agent
🔹Ship: every agent gets a UI 🤯
It's like having an on-demand AI engineer.
What will you s(h)ip today?
This is how I use LLMs to scrape 99% of websites
Many people don't realise you can build an agentic scraper to:
1. Handle authentication, human verification, and CAPTCHAs
2. Handle pagination & complex UI interactions
3. Adapt as website structure changes
4. Scrape large datasets
What used to take hours can now be automated in minutes. Here I show how to automate a scraping job from Upwork that people are paying $50 to $80 per hour for.
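Of the four capabilities listed, pagination is the easiest to show without a browser. Here is a minimal sketch. It assumes a hypothetical `fetch_page(page)` that returns the page's items and a flag saying whether a next page exists. In a real agentic scraper, that call would drive a browser or a tool like AgentQL, and none of that is modelled here. `FAKE_LISTING` and `scrape_all` are also illustrative names.

```python
from typing import Callable

# Hypothetical paginated listing: page number -> (items, has_next).
FAKE_LISTING = {
    1: (["job-1", "job-2"], True),
    2: (["job-3", "job-2"], True),  # duplicates happen when listings shift
    3: (["job-4"], False),
}


def fake_fetch(page: int) -> tuple[list[str], bool]:
    return FAKE_LISTING.get(page, ([], False))


def scrape_all(fetch_page: Callable[[int], tuple[list[str], bool]], max_pages: int = 50) -> list[str]:
    """Follow pages until there is no next page, de-duplicating items in order."""
    seen: set[str] = set()
    results: list[str] = []
    page = 1
    while page <= max_pages:
        items, has_next = fetch_page(page)
        for item in items:
            if item not in seen:
                seen.add(item)
                results.append(item)
        if not has_next:
            break
        page += 1
    return results
```

`fetch_page` is passed in as a parameter, so the same loop works whether the page comes from an HTTP call or from an LLM-driven browser session.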
0:00 Intro
1:54 Methods overview
4:52 Web scraper agent using @firecrawl, @spider_rust, @JinaAI_
8:37 Handle website auth & CAPTCHA using @AgentQL
20:27 AI buys tickets with @MultiOn_AI
If you have further questions or want a deep dive into the code examples in the video, you can join my community, where I post tips weekly: https://t.co/V4aBYxciqR
Excited to share that @crewAIInc has raised $18 million in funding: our Series A was led by @insightpartners, and @Boldstartvc led our inception round. We're also thrilled to welcome @BlitzVentures, Earl Grey Capital (@amitvasudev_), and top AI leaders like @AndrewYNg and @dharmesh on board.
This investment is a major validation of our vision: CrewAI is delivering on the promise of generative AI for enterprise, transforming automation by harnessing the power of AI agents.
Our open-source framework executes over 10 million agents each month and is already trusted by what we estimate is nearly half of the Fortune 500 to achieve automations that were previously impossible.
With the launch of CrewAI Enterprise, we're making it even easier for large organizations to design, test, and deploy complex AI agents at scale, with high-quality results.
A huge thank you to our employees, customers, partners, and investors.
We ship fast and are just getting started. ⚡⚡⚡
🔥 Dify v0.7.0 is out!
We've launched Conversation Variables and Variable Assigner nodes in Dify v0.7.0, tackling LLM memory limitations.
These features enable precise storage, retrieval, and updating of context information throughout the conversation flow. Supporting structured data types, they give chatflow-built LLM apps precise memory control, boosting LLMs' ability to handle complex scenarios in production.
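Conceptually, conversation variables are a typed key-value store that lasts for the whole chat, and the Variable Assigner node writes to it. This rough sketch assumes three write modes (overwrite, append, clear) and a type check on each write. `ConversationVariables`, `assign` and the mode strings are hypothetical and do not represent Dify's API.

```python
from typing import Any


class ConversationVariables:
    """Hypothetical per-conversation store with declared types."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        # Start every variable at its type's empty value, e.g. "" or [].
        self.values: dict[str, Any] = {name: typ() for name, typ in schema.items()}

    def assign(self, name: str, value: Any = None, mode: str = "overwrite") -> None:
        expected = self.schema[name]
        if mode == "overwrite":
            if not isinstance(value, expected):
                raise TypeError(f"{name} expects {expected.__name__}")
            self.values[name] = value
        elif mode == "append":
            if expected is not list:
                raise TypeError(f"append only works on list variables, not {name}")
            self.values[name].append(value)
        elif mode == "clear":
            self.values[name] = expected()  # reset to the empty value
        else:
            raise ValueError(f"unknown mode: {mode}")

    def get(self, name: str) -> Any:
        return self.values[name]
```

Declaring types up front means a bad write fails loudly at the assigner step. Otherwise the mismatch would surface several turns later as a confusing answer.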
- Read the blog: https://t.co/H56xumtkXM
- Docs: https://t.co/SQHUxjAB0q
We've also added new models and tools, and improved workflow functionality to enhance your AI apps.
See the full changelog: https://t.co/zNzoAxeyPv
Next-generation web crawling and scraping that can handle thousands to millions of pages in seconds. The fastest and most affordable service, built entirely in Rust. https://t.co/sPC1kjU3K4