Can’t crawl the pageurl via visual selector for kolin.com

This topic is: resolved

 

This topic has 3 replies, 2 voices, and was last updated 11 months ago by Szabi – CodeRevolution.

Viewing 3 reply threads
    • #9448


      acluke
      Participant
      Post count: 11

      Hi, I recently purchased this plugin and am not sure how to use it properly.

      For example, https://kolin.com.tw/product/fridge

      I tried the XPath “//div[@class='thumbnail']/a” and the visual selector to get all the fridge URLs, but both failed.

      I am not sure whether I need to enable headless mode or any other extra features to make it work.

      Could you please take a look and give me some advice on using this plugin?

      Thanks so much.

      Luke

       

    • #9449


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      To scrape products from this specific site, please use the config below:

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      col-sm-4 contant-box
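
A minimal sketch of what a class-type query like the one above does conceptually: find every element carrying the given classes and collect the links nested inside it. This is not the plugin's code. The `ClassLinkCollector` name, the sample HTML, and the second product path are made up for illustration, and matching by class token (rather than by the exact attribute string) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ClassLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags nested inside elements that carry the given classes."""

    # Tags without a closing tag; they must not change the nesting depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, class_value, base_url):
        super().__init__()
        self.wanted = set(class_value.split())
        self.base_url = base_url
        self.depth = 0   # greater than 0 while inside a matching element
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                # Resolve relative links against the seed page URL.
                self.links.append(urljoin(self.base_url, attrs["href"]))
        elif self.wanted <= set((attrs.get("class") or "").split()):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1


# Hypothetical markup, NOT copied from the real page.
SAMPLE = """
<div class="col-sm-4 contant-box">
  <div class="thumbnail"><a href="/product/fridge/518"><img src="a.png"></a></div>
</div>
<div class="col-sm-4 contant-box">
  <div class="thumbnail"><a href="/product/fridge/000"><img src="b.png"></a></div>
</div>
<div class="footer"><a href="/about">About</a></div>
"""

collector = ClassLinkCollector("col-sm-4 contant-box", "https://kolin.com.tw/product/fridge")
collector.feed(SAMPLE)
links = collector.links
```

With this sample, `links` holds the two product URLs and skips the footer link, which mirrors the idea of crawling only the product boxes on the seed page.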

      Regards, Szabi – CodeRevolution.

    • #9458


      acluke
      Participant
      Post count: 11

      Thanks for replying!

      I’ve encountered another issue while crawling a content page.

      Here is the log:

      [25-Dec-2023 15:26:12 Etc/GMT-8] Failed to exec curl in crawlomatic_curl_exec_utf8! https://kolin.com.tw/assets/uploads/files/product/3fridge_cate/KR-258V05/KR-258V05_%E5%95%86%E8%AA%AA02.png – err: Connection timed out after 10001 milliseconds – 28 url: https://kolin.com.tw/assets/uploads/files/product/3fridge_cate/KR-258V05/KR-258V05_%E5%95%86%E8%AA%AA02.png

      The page I crawled: https://kolin.com.tw/product/fridge/518, with the XPath: //div[@class='row pdt_content']

      It seems the content image is too big and the download timed out. How can I solve this kind of problem?

      BTW, what is the ‘Crawled Pages Crawling Query’ setting for, and when would I need to use it?

      Thanks and merry x’mas,

      Luke

    • #9459


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      The log above does not indicate that the image is too large. It reports a “Connection timed out” error, meaning that the connection to the image (to get its first byte) could not be established within 10 seconds. This usually means the image is inaccessible because a firewall rule is blocking the connection (this can be on your server’s side or on the target server’s side, where the image is hosted).
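
One way to check this from the server that runs WordPress is to test whether a plain TCP connection to the image host opens within the same 10 seconds. The sketch below is a standalone diagnostic, not part of the plugin. The `can_connect` name is made up, and port 443 is assumed because the image URL uses https.

```python
import socket


def can_connect(host, port=443, timeout=10.0):
    """Return (True, "") if a TCP connection opens within `timeout` seconds,
    otherwise (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # covers timeouts, refused connections and DNS failures
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `can_connect("kolin.com.tw")` on the server should show whether the host is reachable at all. If it fails there but the image opens fine in a browser on another network, a firewall rule on one of the two sides is a likely cause. The trailing 28 in the log is curl's error code for a timed-out operation.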

      The ‘Crawled Pages Crawling Query’ settings extract further links to scrape from the pages the plugin has already scraped and turned into posts (usually blog posts). With this feature, you can go on to scrape links such as the recommended posts that usually appear in the right-hand column of a blog post. This feature is optional.
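
As a rough mental model only (this is not how the plugin is implemented, and every name and URL below is hypothetical), the seed query supplies the first batch of links, and the optional crawled-pages query adds more links found on each page that has already been scraped:

```python
from collections import deque


def crawl(seed_url, seed_query, crawled_query=None, max_posts=10):
    """Return content-page URLs in the order they would be scraped.

    seed_query(url)    -> links found on the seed (listing) page
    crawled_query(url) -> extra links found on an already scraped page (optional)
    """
    queue = deque(dict.fromkeys(seed_query(seed_url)))  # de-duplicate, keep order
    seen = set(queue)
    scraped = []
    while queue and len(scraped) < max_posts:
        url = queue.popleft()
        scraped.append(url)          # a post would be created from this page
        if crawled_query is None:
            continue                 # optional feature left off
        for link in crawled_query(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return scraped


# Hypothetical site: a listing page and posts that recommend other posts.
SITE = {
    "/blog": ["/post-a", "/post-b"],
    "/post-a": ["/post-c"],
    "/post-b": ["/post-a", "/post-d"],
}


def links_on(url):
    return SITE.get(url, [])


seed_only = crawl("/blog", links_on)
with_followups = crawl("/blog", links_on, crawled_query=links_on)
```

Here `seed_only` contains just the two posts linked from the listing page, while `with_followups` also picks up the posts recommended on those pages.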

      Regards.

       

The topic ‘Can’t crawl the pageurl via visual selector for kolin.com’ is closed to new replies.