Crawling info from each HREF in the link

This topic is: resolved

 


This topic has 8 replies, 2 voices, and was last updated 2 years, 8 months ago by nils.

Viewing 8 reply threads
    • #4648


      nils
      Participant
      Post count: 4

      I can’t figure out two examples:

      https://lv.carweb.eu/lv/vehicles/search: how do I get it to follow each href inside the .makeModel class and then extract the info from each linked page?

      The same question applies to https://www.ss.com/lv/transport/cars/filter/

    • #4651


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      I checked, and scraping both of these websites is possible.

      Please check the settings I used below for each site:

       

      Scraper Start (Seed) URL:
      https://lv.carweb.eu/lv/vehicles/search

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      XPath

      Seed Page Crawling Query String:
      //*[@class='pic']

      Content Query Type:
      XPath

      Content Query String:
      //*[@class='lSect']

       

       

      Scraper Start (Seed) URL:
      https://www.ss.com/lv/transport/cars/alfa-romeo/

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      msga2

      Content Query Type:
      XPath

      Content Query String:
      //*[@id='msg_div_msg']

       

      Regards, Szabi – CodeRevolution.
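
      For readers who want to see what the settings above amount to, here is a minimal Python sketch of the same two-stage idea: one query picks the links on the seed page, and a second query picks the content on each linked page. It is not the plugin's implementation. The markup, the detail-page URLs, and the helper names (`select`, `crawl_links`, `extract_content`, `scrape`) are all hypothetical, and the pages are inline strings rather than live requests so the example stays self-contained.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SEED_URL = "https://lv.carweb.eu/lv/vehicles/search"
SEED_QUERY = "//*[@class='pic']"       # Seed Page Crawling Query String
CONTENT_QUERY = "//*[@class='lSect']"  # Content Query String

# Hypothetical, simplified markup standing in for the live seed page.
SEED_PAGE = """<html><body>
<div class="pic"><a href="/lv/vehicles/1">Car one</a></div>
<div class="pic"><a href="/lv/vehicles/2">Car two</a></div>
<div class="footer"><a href="/lv/about">About</a></div>
</body></html>"""

# Hypothetical detail pages, keyed by absolute URL.
DETAIL_PAGES = {
    "https://lv.carweb.eu/lv/vehicles/1":
        "<html><body><div class='lSect'>Details of car one</div>"
        "<div class='ad'>Ad</div></body></html>",
    "https://lv.carweb.eu/lv/vehicles/2":
        "<html><body><div class='lSect'>Details of car two</div></body></html>",
}


def select(markup, xpath):
    """Return the elements matching an XPath query such as //*[@class='x']."""
    root = ET.fromstring(markup)
    # ElementTree only accepts relative paths, so prefix the query with ".".
    return root.findall("." + xpath)


def crawl_links(markup, base_url, xpath):
    """Collect absolute URLs from every <a href> at or below each match."""
    links = []
    for node in select(markup, xpath):
        # iter("a") yields the node itself if it is an <a>, plus descendants.
        for anchor in node.iter("a"):
            href = anchor.get("href")
            if href:
                links.append(urljoin(base_url, href))
    return links


def extract_content(markup, xpath):
    """Return the text of each element matching the content query."""
    return ["".join(node.itertext()).strip() for node in select(markup, xpath)]


def scrape(seed_markup, base_url, seed_xpath, content_xpath, fetch):
    """Follow each seed link and extract content from the linked page only."""
    return {
        url: extract_content(fetch(url), content_xpath)
        for url in crawl_links(seed_markup, base_url, seed_xpath)
    }


posts = scrape(SEED_PAGE, SEED_URL, SEED_QUERY, CONTENT_QUERY,
               DETAIL_PAGES.__getitem__)
for url, content in posts.items():
    print(url, content)
```

      The seed page itself contributes no content here, which mirrors the "Do Not Scrape Seed URL" option. On real pages the markup is rarely well-formed XML, so an HTML-tolerant parser would be needed in place of `xml.etree.ElementTree`.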

    • #4653


      nils
      Participant
      Post count: 4
      This reply has been marked as private.
    • #4654


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Interesting.

      I changed to:

      Seed Page Crawling Query String:
      //*[@class='makeModel']

       

      Now works, please check.

      Regards.
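
      When a seed query yields nothing, one way to compare candidates is to count how many links each one reaches. The thread does not say why the first query failed, so the sketch below uses invented markup in which one class wraps links and the other does not; `count_links` is a hypothetical helper, not part of the plugin.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the makeModel blocks contain links.
SAMPLE = """<html><body>
<div class="makeModel"><a href="/lv/vehicles/1">Car one</a></div>
<div class="makeModel"><a href="/lv/vehicles/2">Car two</a></div>
<div class="pic"><img src="/img/1.jpg" /></div>
</body></html>"""


def count_links(markup, xpath):
    """Count the <a href> elements at or below each match of the query."""
    root = ET.fromstring(markup)
    # ElementTree only accepts relative paths, so prefix the query with ".".
    return sum(
        1
        for node in root.findall("." + xpath)
        for anchor in node.iter("a")
        if anchor.get("href")
    )


for query in ("//*[@class='pic']", "//*[@class='makeModel']"):
    print(query, count_links(SAMPLE, query))
```

      A query that reports zero links matches elements with nothing to follow, so the crawler has no pages to visit.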

    • #4657


      nils
      Participant
      Post count: 4

      How can I set which data goes into the post? I want the data to be searchable later on. Is this the right plugin for that?

    • #4658


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      To choose which parts of the scraped articles are imported into the post content, use the plugin's visual selector feature to highlight and select the parts of each article you wish to scrape.

      Please check this tutorial video for details on this feature: https://www.youtube.com/watch?v=2ixtS3LQsI4

      Regards.

    • #4724


      nils
      Participant
      Post count: 4
      This reply has been marked as private.
    • #4726


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577
      This reply has been marked as private.
    • #4735


      nils
      Participant
      Post count: 4
      This reply has been marked as private.

The topic ‘Crawling info from each HREF in the link’ is closed to new replies.