Some websites can’t be scraped | CodeRevolution Support

This topic is: resolved


This topic has 1 reply, 2 voices, and was last updated 3 years, 11 months ago by Szabi – CodeRevolution.

- June 15, 2021 at 12:50 pm #3217
  
  teddychu2001
  Participant
  
  Post count: 9
  
  Below are some example websites that can’t be scraped:
  
  https://gsuitetips.com/news/ (This one displays a blank page when using Visual Selector)
  
  bestforandroid.com (This one shows the error below in Crawling Helper)
  
  https://techcult.com/ (This one shows the error below in Crawling Helper)
  
  I’ve checked the last 2 sites in Crawling Helper. It shows “Error in page crawling. Please try again/other webpage.”
  
  How can I scrape websites like these?
  
- June 15, 2021 at 2:12 pm #3218
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4855
  
  Hello,
  
  First of all, thank you for your purchase.
  
  The websites you linked use JavaScript to load their content dynamically, after the page has loaded in the user's browser. This dynamic content is not visible to conventional scrapers, because it is not returned in the HTML response of the page; it is added afterwards by JavaScript.
  
  The good news is that the plugin can also scrape content from these pages if you combine it with a headless browser such as Puppeteer or PhantomJS (installed on your server), or with HeadlessBrowserAPI (a service I implemented to handle dynamic content parsing without needing a headless browser installed on your server).
  
  Please check these tutorial videos for details on this:
  
  Puppeteer example: https://www.youtube.com/watch?v=g99IlDkt_SY
  
  HeadlessBrowserAPI example: https://www.youtube.com/watch?v=205EinBQAoo&list=PLEiGTaa0iBIjDrfexapWc3M28iHwJI5tT&index=2
  
  OnlyFans example: https://www.youtube.com/watch?v=TXAdvsVCuy8
  
  I hope this info helped.
  
  Regards,
  
  Szabi – CodeRevolution.
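
The diagnosis in the reply above (content added by JavaScript is missing from the raw HTML response) can be checked roughly with a short script. The following is a minimal sketch using only the Python standard library; it is not part of the plugin, and the names `visible_text`, `is_in_static_html` and `fetch_static_html` are hypothetical helpers invented for this illustration. It assumes you know a piece of text that you can see on the page in a normal browser.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from HTML, ignoring <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript scraper would find in the markup."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def is_in_static_html(html, expected_text):
    """True if expected_text appears in the markup without running JavaScript."""
    return expected_text.lower() in visible_text(html).lower()


def fetch_static_html(url, timeout=15):
    """Download the raw HTML response (hypothetical helper; no JavaScript is run)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, calling `is_in_static_html(fetch_static_html(url), "a headline you can see in the browser")` and getting `False` would be consistent with the content being loaded dynamically, in which case a headless browser is the likely fix. A `False` result can also have other causes, such as the site blocking the request, so treat it as a hint rather than proof.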
  

The topic ‘Some websites can’t be scraped’ is closed to new replies.