Description
Have you bought the domain list? Do you want to crawl the Internet and your domains for the data you need?
Great: we have just open-sourced one of our crawlers. It is very fast (over 100 connections per second) while keeping RAM and CPU consumption low. It is asynchronous, so it performs well even on a small VPS or Linux server. You can set up a cluster of crawlers, using for example Redis and RQ (Redis Queue), to process domains from several machines.
As you can see, setting up and running such an environment costs money. You can do the math: at about 100 requests per second, 260,000,000 domains would require about 30 servers (e.g. $10/month per server) to process within one day. We did this, and we do it continuously.
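To illustrate why the asynchronous approach keeps resource usage low, here is a minimal sketch of bounded concurrent crawling using only Python's standard `asyncio` module. It is not the code from crawler.py: the names `crawl_all`, `fake_fetch` and `max_concurrency` are hypothetical, and `fake_fetch` merely stands in for a real network request.

```python
import asyncio


async def crawl_all(domains, fetch, max_concurrency=100):
    """Run fetch(domain) for every domain, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(domain):
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            try:
                return domain, await fetch(domain)
            except Exception as exc:
                # One failing domain should not stop the whole run.
                return domain, exc

    # gather() returns results in the same order as the input domains.
    return await asyncio.gather(*(bounded(d) for d in domains))


async def fake_fetch(domain):
    # Hypothetical stand-in for a real HTTP request (assumption).
    await asyncio.sleep(0)
    return {"domain": domain}


if __name__ == "__main__":
    results = asyncio.run(crawl_all(["example.com", "example.org"], fake_fetch))
    for domain, result in results:
        print(domain, result)
```

A single process handles many connections this way without one thread per request. For a very large input file you would probably feed the domains in batches rather than create one task per domain up front.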
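The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures quoted in the text; the function name `servers_needed` is ours, and the $10/month price is the example figure, not a quote.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def servers_needed(total_domains, requests_per_second, deadline_seconds):
    """Return the exact (fractional) and rounded-up number of servers."""
    # Domains one server can process before the deadline.
    per_server = requests_per_second * deadline_seconds
    exact = total_domains / per_server
    return exact, math.ceil(exact)


exact, whole = servers_needed(260_000_000, 100, SECONDS_PER_DAY)
monthly_cost = whole * 10  # assuming $10/month per server

print(f"exact: {exact:.2f} servers, rounded up: {whole}, cost: ${monthly_cost}/month")
```

One server at 100 requests per second covers 8,640,000 domains per day, so 260,000,000 domains need just over 30 servers, which is where the "about 30" figure comes from.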
By buying it from us, you save money and hassle.
After buying a list of domains from us, feel free to use the crawler to get the data you want.
Here is the Domain Crawler Open Source GitHub page:
https://github.com/topcodersonline/domain-crawler/blob/master/crawler.py
You need to specify the fields you want to crawl and the input file.
Currently, it writes these values to standard output in JSON format:
– Domain
– IP
– Web Server type
– Tech stack (Powered By)
– MetaGenerator
– Email
– Country Hosted
Feel free to add/modify.
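As a rough idea of what one output line could look like, here is a small sketch that builds a record with the fields listed above and prints it as JSON. The key names and all values are assumptions made for illustration; check crawler.py for the actual field names it emits.

```python
import json


def format_record(domain, ip, server, powered_by, meta_generator, email, country):
    """Build one JSON line for a crawled domain (key names are hypothetical)."""
    record = {
        "domain": domain,
        "ip": ip,
        "server": server,                  # web server type
        "powered_by": powered_by,          # tech stack
        "meta_generator": meta_generator,
        "email": email,
        "country": country,                # country where the site is hosted
    }
    return json.dumps(record)


# Placeholder values only; 192.0.2.1 is a documentation-reserved address.
line = format_record(
    "example.com", "192.0.2.1", "nginx", "PHP", "WordPress", "info@example.com", "US"
)
print(line)
```

Writing one JSON object per line keeps the output easy to filter or load into another tool later.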
If you have questions, feel free to contact us directly.