Some images can’t be downloaded | CodeRevolution Support

This topic is: resolved


Tagged: images

Viewing 4 reply threads

Author

Posts
- February 20, 2022 at 11:27 am #4639
  
  teddychu2001
  Participant
  
  Post count: 9
  
  When scraping the URL https://techcult.com/how-to-install-kodi/, all settings are at their defaults except “Copy Images From Content Locally”, which is ticked.
  
  In the result, some images are downloaded but others are not. Please see the screenshot.
  
  Can you please check how I can download all the images?
  
- February 20, 2022 at 11:30 am #4641
  
  teddychu2001
  Participant
  
  Post count: 9
  
  This reply has been marked as private.
  
- February 20, 2022 at 12:03 pm #4642
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4205
  
  Hello,
  
  First of all, thank you for your purchase.
  
  This site lazy-loads the images in its content. To fix this, I added the following to the importing rule settings for rule ID 81:
  
  Lazy Loading Images HTML Tag:
  data-full
  
  Images should now be scraped correctly. Please check.
  
  Tutorial video for this feature: https://www.youtube.com/watch?v=BMzJWZdodlo
  
  Also: https://www.youtube.com/watch?v=AzadF_dAAco
  
  Regards, Szabi – CodeRevolution.
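
For readers unfamiliar with lazy loading, here is a minimal Python sketch of the idea behind the "Lazy Loading Images HTML Tag" setting. It is not the plugin's code; it only assumes that a lazy-loaded `img` tag keeps its real URL in an attribute such as `data-full`, while `src` holds a placeholder. The names `LazyImageCollector` and `collect_image_urls` are made up for this illustration.

```python
from html.parser import HTMLParser


class LazyImageCollector(HTMLParser):
    """Collect image URLs, preferring a lazy-loading attribute over src."""

    def __init__(self, lazy_attr="data-full"):
        super().__init__()
        self.lazy_attr = lazy_attr  # assumed attribute holding the real URL
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Use the lazy attribute when present; fall back to src otherwise.
        url = attr_map.get(self.lazy_attr) or attr_map.get("src")
        if url:
            self.urls.append(url)


def collect_image_urls(html, lazy_attr="data-full"):
    """Return the image URLs found in an HTML string, in document order."""
    parser = LazyImageCollector(lazy_attr)
    parser.feed(html)
    return parser.urls
```

A scraper that reads only `src` would pick up the placeholder for the first kind of tag, which is one plausible reason images go missing on lazy-loading sites.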
  
- February 21, 2022 at 8:30 am #4643
  
  teddychu2001
  Participant
  
  Post count: 9
  
  Hello,
  
  Thanks for your prompt response. I had actually already tried data-full for Lazy Loading Images, but it didn’t work.
  
  Could you please have another look at rule ID 81 and its post? You will see that only about half of the images were scraped.
  
- February 21, 2022 at 4:24 pm #4645
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4205
  
  Hello,
  
  I checked again, and the issue is indeed caused by the scraped site limiting access to its images, because the image requests were made too quickly one after another. A scraping limiter kicked in on their side and denied access to some images.
  
  I tried to get around this limitation by adding the following to the importing rule settings for rule ID 81: ‘Delay Between Multiple Requests (ms)’ -> 1000 and ‘Set Custom Curl User Agent’ -> Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36
  
  Unfortunately, neither change was enough to scrape all the images correctly.
  
  I am not yet sure which content scraping protection they are using, but I suspect that getting around it would only be possible by installing a headless browser (such as Puppeteer) on your server and combining the plugin with it. However, I am not 100% sure that this will help either; it depends on how aggressive the scraping protection system is.
  
  For details on the above, see: https://www.youtube.com/watch?v=g99IlDkt_SY
  
  How to install Puppeteer on your server (VPS only): https://www.youtube.com/watch?v=KNOIJA4pTQo
  
  Please check.
  
  Regards, Szabi – CodeRevolution.
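
To make the two settings tried above concrete, here is a minimal Python sketch of spacing out image requests and sending a browser-like User-Agent. It is an illustration only, not what the plugin does internally, and as noted in the reply these measures were not enough for this particular site. The function names `fetch_image` and `download_with_delay` are hypothetical.

```python
import time
import urllib.request

# User agent string quoted in the reply above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
)


def fetch_image(url, user_agent=USER_AGENT, timeout=30):
    """Download one URL with a custom User-Agent header and return its bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_with_delay(urls, delay_ms=1000, fetch=fetch_image, sleep=time.sleep):
    """Fetch URLs one by one, pausing delay_ms between consecutive requests.

    Returns a dict mapping each URL to its bytes, or to the exception
    raised for that URL so that failed images can be inspected later.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_ms / 1000.0)  # wait before every request except the first
        try:
            results[url] = fetch(url)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            results[url] = error
    return results
```

Recording failures per URL rather than stopping at the first one makes it easier to see which images the remote site refused, which is the symptom described in this thread.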
  

The topic ‘Some images can’t be downloaded’ is closed to new replies.
