Progress on DataGrab:
✅ Rewrote 65 components in TypeScript
✅ Removed 15 unused ones
✅ Found & fixed 12 bugs (detected by TS)
That's about 75% done so far.
I strongly typed everything I possibly could.
Expect a reliable & robust tool from now on, folks.
📢 Big news!
I wrote a web scraping SDK in Node.js that makes it very easy to extract data from arbitrarily complex sites. It supports Cheerio and Puppeteer.
Planning to open-source it on GitHub. Hopefully, it'll generate some traffic.
Here's a sneak peek:
📢 New blog post: A Guide to CSS Selectors for Web Scraping
Coming up with robust CSS selectors that extract the correct data is challenging.
Learn the basics of CSS selectors and some best practices to improve your results.
https://t.co/ZlwGct5P6s
🥳 I just got paid for scraping 3 moderately large datasets for a client.
🧢 It took me about a week of hard work.
💡 I learned a lot about web scraping and wrote some reusable code that I can integrate into @datagrab_io.
✅ Side gigs are an excellent way to shape your tool.
📢 If it's Friday, it's release time!
📢 Introducing: pre-collected datasets.
💡 I figured, if I'm already testing my tool with a variety of use cases, I might as well scrape the full datasets and offer them for sale.
As always, I'm open to suggestions for datasets.
Hal9 is an excellent tool.
You bring your dataset and it allows you to do all kinds of interesting things with it, from visualizing it to training a machine learning model.
Be sure to check it out!
What's the easiest way to do #DataAnalytics and #WebScraping?
Try our new integration with @datagrab_io by @robertbalazsi!
Sure enough, we used our newly acquired powers to find the best D20! 🎲
https://t.co/gzfbs88lNq
New blog post: 17 Best Practices for Fast, Reliable, and Ethical Scraping
They cover the following aspects:
- Data extraction
- Avoiding getting blocked
- Session and data management
- Legality and ethics
Learn about them here: https://t.co/UoduXbDVUT
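One common way to avoid getting blocked is to back off when a site starts throttling you. The TypeScript sketch below assumes the site signals overload with HTTP 429 or 503. `withBackoff` and `backoffDelay` are hypothetical helpers and not part of DataGrab. Delays grow exponentially and use "full jitter", so parallel workers don't all retry at the same moment.

```typescript
// Hypothetical retry helper (not part of DataGrab): retries a request with
// exponential backoff and "full jitter" while the server answers 429 or 503.
interface BackoffOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number; // ceiling for the first retry delay
  maxDelayMs: number; // hard cap for any single delay
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests; defaults to Math.random
}

// 429 Too Many Requests and 503 Service Unavailable usually mean "slow down".
function isThrottled(status: number): boolean {
  return status === 429 || status === 503;
}

// Delay before retry number `attempt` (0-based): a random integer in
// [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(attempt: number, baseMs: number, maxMs: number, random: () => number): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function withBackoff<R extends { status: number }>(
  attempt: () => Promise<R>,
  opts: BackoffOptions,
): Promise<R> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;
  let res = await attempt();
  for (let i = 0; i < opts.maxRetries && isThrottled(res.status); i++) {
    await sleep(backoffDelay(i, opts.baseDelayMs, opts.maxDelayMs, random));
    res = await attempt();
  }
  return res; // success, a non-throttling error, or the last throttled response
}
```

The `sleep` and `random` options exist only so you can test the logic deterministically. In real use, leave them out and wrap your actual request (Axios, `fetch`, etc.) in the `attempt` callback.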
New blog post: The complete guide to proxies for web scraping
- What are proxies?
- Why do you need them for scraping?
- How can they be categorized?
- What are some of the top proxy providers?
Find the answers here:
https://t.co/jEWtmsLMnJ
- What is web scraping?
- What can it be used for?
- How does it work?
- What challenges can you run into?
I wrote a blog post to answer these questions: https://t.co/Vflzzm9hwN
Social proof is so important. I just added a testimonials section to @datagrab_io's landing page.
Thanks, @BryceDavies8 and @felix12777, for trying out my tool and for the kind words. You guys are awesome! 🧡
#buildinpublic
Our pricing & plans have changed for the better! 🥳
The three subscription tiers accommodate recurring data needs.
If you only need data occasionally, we now offer bulk credit packages as well. They never expire.
Check out our pricing page for details: https://t.co/GbUtWP3VpU
@serp_api @MattTheMrM @robertbalazsi For smaller projects (say around 3000 pages), a Chrome extension is perfectly fine. I did that and got all the data.
You specialize in SERPs, and I'm pretty sure you do a good job there.
DataGrab offers a no-code, DIY way to scrape anything. You cover a niche, I cover a vertical.
I recently got a web scraping side gig.
My job is to scrape grocery product details from two major retailers and track their prices weekly, across several stores.
My stack:
- Node.js
- Axios
- Cheerio
- Puppeteer
Here's what I learned about scraping so far: 🧵👇
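A weekly price job like this mostly comes down to comparing this week's snapshot with last week's. The sketch below is minimal and makes two assumptions: a hypothetical `PriceRecord` shape, and prices stored in integer cents to avoid floating-point rounding on currency. `diffWeeks` is an illustrative name, not code from the actual gig.

```typescript
// Hypothetical shape of one scraped price point. Prices are integer cents
// so comparisons aren't thrown off by floating-point rounding.
interface PriceRecord {
  store: string;
  productId: string;
  priceCents: number;
}

interface PriceChange {
  store: string;
  productId: string;
  oldCents: number;
  newCents: number;
  pctChange: number; // e.g. -10 means 10% cheaper than last week
}

// The same product can cost different amounts in different stores, so key on both.
function recordKey(r: PriceRecord): string {
  return `${r.store}::${r.productId}`;
}

// Compares two weekly snapshots and returns only the prices that moved.
function diffWeeks(lastWeek: PriceRecord[], thisWeek: PriceRecord[]): PriceChange[] {
  const previous = new Map<string, PriceRecord>();
  for (const r of lastWeek) previous.set(recordKey(r), r);

  const changes: PriceChange[] = [];
  for (const cur of thisWeek) {
    const prev = previous.get(recordKey(cur));
    // Skip products with no baseline yet, and prices that didn't change.
    if (prev === undefined || prev.priceCents === cur.priceCents) continue;
    const pct = ((cur.priceCents - prev.priceCents) / prev.priceCents) * 100;
    changes.push({
      store: cur.store,
      productId: cur.productId,
      oldCents: prev.priceCents,
      newCents: cur.priceCents,
      pctChange: Math.round(pct * 100) / 100, // rounded to 2 decimals
    });
  }
  return changes;
}
```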
💡🕷️ Web Scraping Tip: Always check for API calls
You might be inclined to just parse the HTML and extract data (e.g., with Cheerio or Scrapy).
Instead, check whether the page makes API calls that return the same data.
A very good example: 👇
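Once you spot such a call in the browser's DevTools Network tab, you can often replay it directly and skip HTML parsing altogether. This is a sketch under several assumptions. The endpoint URL, its `?page=` parameter, and the `items`/`nextPage` fields are all hypothetical, since every site's API looks different. The HTTP client is passed in, so the same code works with Axios, `fetch`, or a test stub.

```typescript
// Hypothetical JSON shapes; real field names depend entirely on the site,
// so inspect the actual response in DevTools before writing these.
interface ApiProduct {
  id: string;
  name: string;
  price: number;
}

interface ApiPage {
  items: ApiProduct[];
  nextPage: number | null; // null on the last page
}

// The HTTP client is injected so this works with fetch, Axios, or a test stub.
type GetJson = (url: string) => Promise<unknown>;

// Light runtime check so a changed or blocked response fails loudly.
function isApiPage(data: unknown): data is ApiPage {
  if (typeof data !== "object" || data === null) return false;
  const page = data as Partial<ApiPage>;
  return Array.isArray(page.items) && (page.nextPage === null || typeof page.nextPage === "number");
}

// Walks a paginated JSON endpoint instead of parsing rendered HTML.
async function scrapeViaApi(baseUrl: string, getJson: GetJson): Promise<ApiProduct[]> {
  const products: ApiProduct[] = [];
  let page: number | null = 1;
  while (page !== null) {
    const data: unknown = await getJson(`${baseUrl}?page=${page}`);
    if (!isApiPage(data)) throw new Error(`Unexpected response shape on page ${page}`);
    products.push(...data.items);
    page = data.nextPage;
  }
  return products;
}
```

With Node 18+, you could pass `(url) => fetch(url).then((r) => r.json())` as `getJson`.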
Proxy Guide for Web Scraping 🥷🕷️💡
You'll learn:
✅ What proxies are and why to use them
✅ Types of proxies
✅ Checklist for using proxies
✅ Popular proxy providers
🧵👇
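Rotating through a pool of proxies is one of the usual items on such a checklist. Here's a minimal round-robin sketch. The `ProxyPool` class and the proxy URLs are hypothetical. The pool also benches a proxy for a cooldown period after it gets blocked.

```typescript
// Hypothetical round-robin proxy pool (names and URLs are made up).
// A proxy that gets blocked is benched for `cooldownMs` before reuse.
class ProxyPool {
  private readonly proxies: string[];
  private readonly cooldownMs: number;
  private readonly blockedUntil = new Map<string, number>();
  private next = 0;

  constructor(proxies: string[], cooldownMs = 60_000) {
    if (proxies.length === 0) throw new Error("ProxyPool needs at least one proxy");
    this.proxies = [...proxies];
    this.cooldownMs = cooldownMs;
  }

  // Returns the next proxy that isn't cooling down, or null if all of them are.
  pick(now: number = Date.now()): string | null {
    for (let i = 0; i < this.proxies.length; i++) {
      const idx = (this.next + i) % this.proxies.length;
      const proxy = this.proxies[idx];
      if ((this.blockedUntil.get(proxy) ?? 0) <= now) {
        this.next = (idx + 1) % this.proxies.length; // resume after this one next time
        return proxy;
      }
    }
    return null;
  }

  // Call this when a request through `proxy` was blocked (e.g. 403 or a CAPTCHA page).
  markBlocked(proxy: string, now: number = Date.now()): void {
    this.blockedUntil.set(proxy, now + this.cooldownMs);
  }
}
```

How you actually route a request through the returned proxy depends on your HTTP client, for example Axios's `proxy` request option or Chrome's `--proxy-server` launch argument in Puppeteer.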
Ethical web scraping checklist 💡🛡️
✅ Scrape only public data
✅ Don't resell the data as-is (aggregate it, infer insights, etc.)
✅ Be gentle (don't DDoS their servers)
✅ Respect robots.txt
✅ Don't follow "nofollow" links
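The robots.txt and nofollow items are easy to automate. Below is a minimal sketch of a robots.txt check and a nofollow filter. The names `parseRobots`, `isAllowed`, and `followableLinks`, and the `Link` shape, are hypothetical. The code implements only prefix matching with longest-match precedence (as in RFC 9309). It deliberately ignores `*`/`$` wildcards and `Crawl-delay`, so treat it as a starting point rather than a complete parser.

```typescript
// Minimal robots.txt check (hypothetical helpers, not a full parser).
// Supports User-agent groups plus Allow/Disallow prefix rules with RFC 9309's
// "longest match wins" precedence; ignores * and $ wildcards and Crawl-delay.
interface Rule {
  allow: boolean;
  path: string;
}

function parseRobots(txt: string, userAgent: string): Rule[] {
  const groups = new Map<string, Rule[]>();
  let current: string[] = [];
  let lastWasAgent = false;
  for (const raw of txt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) current = [];
      const agent = value.toLowerCase();
      current.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
    } else if ((field === "allow" || field === "disallow") && value !== "") {
      // An empty Disallow means "nothing is disallowed", so it adds no rule.
      for (const agent of current) groups.get(agent)?.push({ allow: field === "allow", path: value });
    }
    lastWasAgent = field === "user-agent";
  }
  // Use our own group if there is one (exact name match here), else the "*" group.
  return groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
}

function isAllowed(rules: Rule[], path: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    // Longest matching path wins; on a tie, Allow beats Disallow.
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === undefined ? true : best.allow; // no matching rule means allowed
}

// Hypothetical link shape, e.g. <a> tags collected with Cheerio or Puppeteer.
interface Link {
  href: string;
  rel?: string;
}

// rel may hold several space-separated tokens, e.g. "noopener nofollow".
function followableLinks(links: Link[]): Link[] {
  return links.filter((l) => !(l.rel ?? "").toLowerCase().split(/\s+/).includes("nofollow"));
}
```

Fetch `/robots.txt` once per host, cache the parsed rules, and call `isAllowed` before queueing each URL.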
DataGrab is finally released!
After months of hard work, DataGrab is finally out there, and it's better than ever! Grab the Chrome Extension from here:
https://t.co/YDj2M5MQ3h