So what’s a spider?
A spider is a piece of software designed to automatically harvest data from one or more websites. It extracts unstructured data from web pages and saves it in a structured format that can be easily analysed.
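To illustrate the idea, here is a minimal sketch of the extraction step in Python, using only the standard library's `html.parser`. It turns a fragment of unstructured HTML into a list of structured records. The page layout (`<div class="product">` blocks), the `ProductParser` class and the `extract` function are hypothetical examples, not one of our production spiders, which would also fetch pages, follow pagination and handle errors.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects name/price pairs from a hypothetical <div class="product"> layout.

    Assumes product blocks contain no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.records = []     # structured output: one dict per product
        self._current = None  # record currently being built, if any
        self._field = None    # field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "div" and css_class == "product":
            self._current = {}
        elif self._current is not None and tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # End of a product block: store the finished record.
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Only keep text that sits inside a field we care about.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()


def extract(html: str) -> list:
    """Parse an HTML string and return its products as a list of dicts."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical sample page used instead of a live website.
SAMPLE_HTML = """
<div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">24.50</span></div>
"""

if __name__ == "__main__":
    for row in extract(SAMPLE_HTML):
        print(row)
```

Each record comes out as a plain dictionary such as `{'name': 'Widget', 'price': '9.99'}`, which is easy to write to a spreadsheet, CSV file or database.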
Do you maintain your spiders so they keep working when a website's structure changes?
We have invested significantly in quality reporting and monitoring systems that track the performance of all running spiders. We can tell within a minute if a spider has crashed, giving us ample time to fix it if required.
Is this a fully managed data mining solution?
Yes. We manage and run the spiders on your behalf, putting all the data you need at your fingertips.
What is the standard workflow for a data mining project?
- Contact us by phone or email and tell us your target site(s), your desired run frequency and any other specific criteria.
- We review the sites and provide you with a written quote.
- Once you give us permission to proceed, we engineer the spiders, check them for quality, and then run them in production.
- The harvested data is then delivered to you in your required format.
How do you get the data to me?
Harvested data is typically supplied in Excel, CSV or PDF format. Other formats can also be provided (JSON, XML, SQL databases such as Postgres, etc.). Small files are delivered by email, while large files can be shared through a file-sharing service (e.g. Dropbox). We can also connect directly to your database.
Can you send me the source code for my spider(s) so I can run them from my own server?
Yes, you can obtain the source code for any spiders we build specifically for you. The spiders are written in Python and use a database to store the harvested data. Although you can run the spiders on your own computers, most of our clients prefer that we run them, because they would not have access to our quality reporting systems, which makes it difficult to monitor spider progress properly.
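As a rough illustration of the storage side, the sketch below saves harvested records into a database with Python's built-in `sqlite3` module. The FAQ does not say which database the spiders use, so SQLite is an assumption here, and the `products` table and `save_records` function are hypothetical. Using the product name as the primary key means that re-running the spider updates existing rows rather than creating duplicates.

```python
import sqlite3


def save_records(conn: sqlite3.Connection, records: list) -> None:
    """Upsert harvested {'name', 'price'} records into a hypothetical products table."""
    # Convert scraped text into proper types before storing.
    rows = [{"name": r["name"], "price": float(r["price"])} for r in records]
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            """CREATE TABLE IF NOT EXISTS products (
                   name  TEXT PRIMARY KEY,
                   price REAL NOT NULL
               )"""
        )
        # INSERT OR REPLACE overwrites a row with the same name, so a
        # repeated scrape refreshes prices instead of duplicating products.
        conn.executemany(
            "INSERT OR REPLACE INTO products (name, price) VALUES (:name, :price)",
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the example
    save_records(conn, [{"name": "Widget", "price": "9.99"}])
    # A later run: Widget's price has changed and a new product appeared.
    save_records(conn, [{"name": "Widget", "price": "8.99"}, {"name": "Gadget", "price": "24.50"}])
    print(conn.execute("SELECT name, price FROM products ORDER BY name").fetchall())
```

After the second run the table holds one row per product, with Widget showing its updated price.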
How much does it cost to mine data?
The cost depends on the complexity of the target site(s) and the amount of data to be downloaded. Simple spiders start at $500, while more complex spiders can exceed $5,000.
Do you charge a fixed price or by the hour?
We’ve been harvesting data from the internet for more than 10 years, so we have a good idea of how long our projects typically take. Most projects are quoted at a fixed price. However, where a project has a large unknown element, we quote on a time-and-materials basis and give an indication of the likely minimum and maximum price.
What payment methods do you accept?
Direct deposit to our bank account is preferred. Alternatively, you can pay by credit card via PayPal; this incurs a 2% surcharge.
What are the payment terms?
Payment is due in arrears, 30 days from the invoice date. Invoices are issued on the first day of the month following project completion.
Any volume discounts?
Yes. Discounts are based on how often the spiders run: the more frequently a spider runs, the lower the cost per scrape. For example, the cost per daily scrape is less than the cost per weekly scrape, which in turn is less than the cost per monthly scrape.
Are you related to the web-scraping group?
Yes! We own the websites: