Some images can’t be downloaded

This topic is: resolved

 

Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses may be delayed.


This topic has 4 replies, 2 voices, and was last updated 2 years, 10 months ago by Szabi – CodeRevolution.

Viewing 4 reply threads
    • #4639


      teddychu2001
      Participant
      Post count: 9

      When scraping the URL https://techcult.com/how-to-install-kodi/, all settings are at their defaults except that “Copy Images From Content Locally” is ticked.

      In the result, some images are downloaded but others are not. Please see the screenshot.

      Can you please check how I can download all the images?

    • #4641


      teddychu2001
      Participant
      Post count: 9
      This reply has been marked as private.
    • #4642


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello,

      First of all, thank you for your purchase.

      This site lazy-loads the images in its content. To fix this, I added the following to the importing rule settings for rule ID 81:

      Lazy Loading Images HTML Tag:
      data-full

      The images should now be scraped correctly; please check.
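      For illustration only (this is not the plugin's own code): on a lazy-loading page, each img tag's src usually holds a placeholder, while the real image URL sits in another attribute, which here is assumed to be data-full. The setting above tells the scraper to read that attribute instead. A minimal Python sketch of the same idea, using only the standard library; the function and class names are hypothetical:

```python
from html.parser import HTMLParser


class LazyImageParser(HTMLParser):
    """Collects image URLs, preferring a lazy-loading attribute over src."""

    def __init__(self, lazy_attr="data-full"):
        super().__init__()
        self.lazy_attr = lazy_attr  # assumed attribute holding the real URL
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Use the lazy-loading attribute if present; otherwise fall back to src,
        # which on lazy-loaded images is often only a placeholder.
        url = attrs.get(self.lazy_attr) or attrs.get("src")
        if url:
            self.urls.append(url)


def extract_image_urls(html, lazy_attr="data-full"):
    """Return the image URLs found in an HTML fragment, in document order."""
    parser = LazyImageParser(lazy_attr)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    sample = (
        '<p><img src="placeholder.gif" data-full="https://example.com/a.png"></p>'
        '<p><img src="https://example.com/b.png"></p>'
    )
    print(extract_image_urls(sample))
```

      If a site uses a different attribute (data-src is another common choice), only the lazy_attr argument needs to change, which mirrors changing the value of the plugin setting.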

      Tutorial video for this feature: https://www.youtube.com/watch?v=BMzJWZdodlo

      Also: https://www.youtube.com/watch?v=AzadF_dAAco

      Regards, Szabi – CodeRevolution.

    • #4643


      teddychu2001
      Participant
      Post count: 9

      Hello,

      Thanks for your prompt response. I had actually already tried data-full for Lazy Loading Images, but it doesn’t work.

      Could you please look at rule ID 81 and its post again? You will see that only about half of the images were scraped, not all of them.

    • #4645


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello,

      I checked again, and indeed this issue is caused by the scraped site rate-limiting access to its images: the image requests were made too quickly one after another, so a scraping limiter on their side kicked in and denied access to some of the images.

      I tried to get around this limitation by adding the following to the importing rule settings for rule ID 81: ‘Delay Between Multiple Requests (ms)’ -> 1000, and ‘Set Custom Curl User Agent’ -> Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36

      Unfortunately, neither of these changes allowed all the images to be scraped correctly.
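      For illustration only (the plugin itself uses cURL, and this is not its code): the two settings tried above amount to throttling requests and sending a browser-like User-Agent header. A minimal Python sketch of that idea with the standard library, assuming a 1000 ms pause between consecutive image downloads; fetch_images and build_request are hypothetical names, and the opener and sleep parameters exist only so the logic can be tested without network access:

```python
import time
import urllib.error
import urllib.request

# The browser-like User-Agent string from the rule settings above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    # Send a browser-like User-Agent instead of urllib's default one.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_images(urls, delay_ms=1000, user_agent=USER_AGENT,
                 opener=urllib.request.urlopen, sleep=time.sleep):
    """Download each URL in turn, pausing delay_ms between consecutive requests.

    Returns a dict mapping each URL to its bytes on success, or to the
    exception that was raised on failure.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Throttle: wait before every request after the first one.
            sleep(delay_ms / 1000.0)
        try:
            with opener(build_request(url, user_agent), timeout=30) as resp:
                results[url] = resp.read()
        except OSError as exc:
            # urllib's URLError and HTTPError (e.g. 403 or 429) are OSError
            # subclasses, as are socket timeouts.
            results[url] = exc
    return results
```

      A fixed delay and a spoofed User-Agent only help against simple limiters; as this thread shows, a more aggressive protection system may still block the requests.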

      I am not yet sure which scraping protection they are using, but I suspect it could only be bypassed by installing a headless browser (such as Puppeteer) on your server and using it together with the plugin. However, I am not 100% sure that even this will help; it depends on how aggressive their scraping protection system is.

      More details on this are available here: https://www.youtube.com/watch?v=g99IlDkt_SY

      How to install Puppeteer on your server (VPS only): https://www.youtube.com/watch?v=KNOIJA4pTQo

      Please check.

      Regards, Szabi – CodeRevolution.


The topic ‘Some images can’t be downloaded’ is closed to new replies.