How to check why some pages are not scraped?

This topic is: resolved

 

Viewing 3 reply threads
  • Author
    Posts
    • #9690


      acluke
      Participant
      Post count: 11

      https://www.heran.com.tw/product-category/%e8%ae%8a%e9%a0%bb%e5%86%b0%e7%ae%b1/

      I want to scrape all the products on this page, and I used this XPath: //*[contains(@class,'has-post-thumbnail shipping-taxable product-type-simple')]

      It works well, but some product pages are not scraped, such as HRE-B5825V or HRE-C5721V, and I can’t find out why the automation skips them (the log shows nothing either).

      Could you please tell me how to investigate this issue and how to fix it?

      Thanks,
      Luke
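
A note on the XPath expression in this post: `contains(@class, '...')` is a plain substring test on the whole `class` attribute, so a multi-class string only matches when those classes appear next to each other in exactly that order. The sketch below mimics that test in plain Python and compares it with a token-based check. The class attribute values are made up for illustration and are not taken from the site, and the helper names are hypothetical.

```python
def xpath_contains(class_attr: str, needle: str) -> bool:
    """Mimic XPath contains(@class, needle): a plain substring test."""
    return needle in class_attr


def has_all_classes(class_attr: str, wanted: str) -> bool:
    """Token-based match: every wanted class is present, in any order."""
    return set(wanted.split()) <= set(class_attr.split())


NEEDLE = "has-post-thumbnail shipping-taxable product-type-simple"

# Hypothetical class attributes, for illustration only.
same_order = "product has-post-thumbnail shipping-taxable product-type-simple"
extra_class = "product has-post-thumbnail shipping-taxable purchasable product-type-simple"

print(xpath_contains(same_order, NEEDLE))    # True: the classes are adjacent
print(xpath_contains(extra_class, NEEDLE))   # False: another class sits in between
print(has_all_classes(extra_class, NEEDLE))  # True: order and neighbours are ignored
```

If some products carry an additional class between the ones listed, the substring form would silently miss them, which is one thing worth ruling out when a selector skips items.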

    • #9691


      Szabi – CodeRevolution
      Keymaster
      Post count: 4573

      Hello,

      First of all, thank you for your purchase.

      Please try switching to the following settings for this website:

      Seed Page Crawling Query Type: Class
      Seed Page Crawling Query String: button product_type_simple

      I tested this, and with the above settings HRE-B5825V was scraped as well.

      Regards, Szabi – CodeRevolution.

    • #9692


      acluke
      Participant
      Post count: 11

      Hi, thanks for replying.

      I found the reason: I had scraped some of these pages before and moved them to the trash (without permanently deleting them).

      Therefore, this rule automatically ignored the already scraped pages…

      This is expected behavior for this plugin, right?

      Also, for issues like this, are there any change logs or event tracking that show what happened during crawling?

      Thanks,

      Luke

    • #9693


      Szabi – CodeRevolution
      Keymaster
      Post count: 4573

      Yes, it is expected behavior.

      For details, tick the ‘Enable Detailed Logging’ checkbox in ‘Main Settings’, then open ‘Activity Logs’ to see what happened during each run.

      Regards.
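
The behavior confirmed in this thread (a page that was scraped before is skipped, even if the resulting post now sits in the trash) can be pictured with a minimal sketch. This is not the plugin's actual code: `StoredPost`, `urls_to_scrape`, `explain_skips`, and the example.com URLs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredPost:
    source_url: str
    status: str  # e.g. "publish", "draft", "trash"


def urls_to_scrape(candidates, stored_posts):
    """Keep only URLs with no stored post, whatever that post's status is."""
    seen = {post.source_url for post in stored_posts}
    return [url for url in candidates if url not in seen]


def explain_skips(candidates, stored_posts):
    """Map each skipped URL to the status of the post that caused the skip."""
    status_by_url = {post.source_url: post.status for post in stored_posts}
    return {url: status_by_url[url] for url in candidates if url in status_by_url}


# Hypothetical data: one published post, one trashed post, one new page.
stored = [
    StoredPost("https://example.com/product/a", "publish"),
    StoredPost("https://example.com/product/b", "trash"),
]
candidates = [
    "https://example.com/product/a",
    "https://example.com/product/b",
    "https://example.com/product/c",
]

print(urls_to_scrape(candidates, stored))
print(explain_skips(candidates, stored))
```

The point of the sketch is that a trashed post still counts as "already scraped" in a duplicate check of this kind, so a missing product is not necessarily a selector problem.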
