Thank you for contacting me. Please note that I live in the GMT+3 time zone - responses might be delayed by this.
December 21, 2021 at 3:57 am #4272
ncienfuegos (Participant)
On a number of websites that I want to crawl, I get a blank page when I select the Crawling Restrictions and then use the Visual Selector in the "Seed Page Crawling Query Type" section.
Is there a way to bypass or resolve this? About 50% of the websites I want to crawl give me this error when I select them via the Visual Selector.
Here is an example:
https://www.bizjournals.com/orlando/news/residential-real-estate
This particular page gave me back this message instead of a blank page:
<h2>Why am I seeing this page?</h2>
The website you are visiting is protected and accelerated by Incapsula. Your computer may have been infected by malware and therefore flagged by the Incapsula network. Incapsula displays this page for you to verify that an actual human is the source of the traffic to this site, and not malicious software.
<h2>What should I do?</h2>
Just click the <b>I’m not a robot</b> checkbox to pass the security check. Incapsula will remember you and will not show this page again. We recommend you run a virus and malware scan on your computer to remove any infection.
December 21, 2021 at 7:13 am #4275
Hello,
First of all, thank you for your purchase.
This is caused by scraping-protection mechanisms that are active on the sites you wish to scrape.
To make this work, you can try one or more of the suggestions below:
- Add a user agent to requests. You can do this using the ‘Set Custom Curl User Agent’ settings field. For example, you can enter the user agent of a recent Chrome browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
- Install Puppeteer on your server and configure the plugin to use it. Puppeteer drives a headless Chrome browser, so pages are loaded the way a real browser loads them, which gives access to more sites. To enable it, select Puppeteer in the ‘Content Scraping Method To Use’ settings field in the importing rule settings of the plugin. Steps to install Puppeteer: https://www.youtube.com/watch?v=KNOIJA4pTQo
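To check outside the plugin whether a custom user agent is enough for a given site, a rough Python sketch along these lines may help. It is not the plugin's own code: the function names and the `BLOCK_MARKERS` list are assumptions made for illustration, with the marker phrases taken from the block page quoted earlier in this thread.

```python
import urllib.request

# User agent string quoted in the reply above (Chrome 96 on Windows 10).
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
)

# Assumed marker phrases, lifted from the block page quoted in this thread.
# Other protection services will use different wording.
BLOCK_MARKERS = ("incapsula", "i'm not a robot", "why am i seeing this page")


def build_request(url: str, user_agent: str = CHROME_UA) -> urllib.request.Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_blocked(html: str) -> bool:
    """Return True for an empty body or one containing a known marker phrase."""
    # Normalise curly apostrophes so "I’m" and "I'm" match the same marker.
    text = html.lower().replace("\u2019", "'")
    return not text.strip() or any(marker in text for marker in BLOCK_MARKERS)


def fetch(url: str, timeout: float = 15.0) -> str:
    """Download a page with the custom user agent and return its text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If `looks_blocked(fetch(url))` is still true with the custom user agent, the site probably requires a real browser, and the Puppeteer option is the one to try next.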
I hope this info helped.
Regards, Szabi – CodeRevolution.
The topic ‘Getting a Blank Page when I use Visual Selector’ is closed to new replies.