Crawling a posts page scrapes the whole website into each post

This topic is: resolved


This topic has 9 replies, 2 voices, and was last updated 9 months, 3 weeks ago by Szabi – CodeRevolution.

Viewing 9 reply threads
    • #9980


      Vaggelis
      Participant
      Post count: 11

      Hello,

      I followed your tutorial here https://www.youtube.com/watch?v=F6vhRJgCR_M exactly, step by step, in order to crawl some posts from news sites. The problem is that the scraper imports the whole site into each post, instead of importing just the title, content, and images.

      I see that my plugin version is a little different (newer) than the one in the video. Do you have an updated tutorial for it?

      And CONGRATULATIONS for your projects.

    • #9981


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello,

      First of all, thank you for your purchase.

      Please send me more details about which news sites you are scraping, including the site URLs, and I will check on this.

      Regards, Szabi – CodeRevolution.

    • #9982


      Vaggelis
      Participant
      Post count: 11

      I just need to create a crawl here: https://newsgreece.gr/wp-admin/admin.php?page=crawlomatic_items_panel&crawlomatic_page=1

      redacted

      redacted

      The goal is to import posts from a category like https://www.pronews.gr/category/elliniki-politiki/kyvernisi/

      The language is Greek.

    • #9984


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello,

      I configured the plugin on your site, please check.

      What I changed:

      Scraper Start (Seed) URL / Keywords
      https://www.pronews.gr/category/elliniki-politiki/kyvernisi/

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      article-link hrefattribute

      Content Query Type
      XPath

      Content Query String
      //*[@class='csscontent wrap-content-body field-item even']
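
The settings above describe a two-step crawl: collect post links from the seed page by class, then pull only the matching content element from each post. The following is a minimal sketch of that idea in Python, not the plugin's actual implementation. The sample markup, the helper names `find_article_links` and `extract_content`, and the `/post-one/` style paths are all hypothetical. It uses the standard library's `xml.etree.ElementTree`, which only accepts well-formed markup and relative paths (hence `.//` instead of `//`); real pages would likely need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SEED_URL = "https://www.pronews.gr/category/elliniki-politiki/kyvernisi/"

# Hypothetical, well-formed stand-in for the seed (category) page.
SEED_HTML = """
<html><body>
  <nav><a class="menu-link" href="/about/">About</a></nav>
  <div class="listing">
    <a class="article-link featured" href="/post-one/">Post one</a>
    <a class="article-link" href="/post-two/">Post two</a>
  </div>
</body></html>
"""

# Hypothetical stand-in for a single post page.
ARTICLE_HTML = """
<html><body>
  <div class="sidebar">Unrelated widgets</div>
  <div class="csscontent wrap-content-body field-item even"><p>Article body text.</p></div>
</body></html>
"""

# Same expression as the Content Query String, made relative for ElementTree.
CONTENT_XPATH = ".//*[@class='csscontent wrap-content-body field-item even']"


def find_article_links(seed_html, base_url, link_class):
    """Return absolute hrefs of <a> elements whose class list contains link_class."""
    root = ET.fromstring(seed_html.strip())
    links = []
    for anchor in root.iter("a"):
        classes = (anchor.get("class") or "").split()
        href = anchor.get("href")
        if link_class in classes and href:
            links.append(urljoin(base_url, href))
    return links


def extract_content(article_html, xpath):
    """Return the text of the first element matching xpath, or None if nothing matches."""
    root = ET.fromstring(article_html.strip())
    node = root.find(xpath)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


links = find_article_links(SEED_HTML, SEED_URL, "article-link")
content = extract_content(ARTICLE_HTML, CONTENT_XPATH)
print(links)
print(content)
```

In this sketch the menu link is skipped because it lacks the `article-link` class, and the sidebar text is left out because only the element matched by the content query is returned.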

      Regards.

    • #9990


      Vaggelis
      Participant
      Post count: 11

      That was the fastest and most direct response I have ever received!

      OK, so can you please advise how to find the values for the following settings?

      1. Seed Page Crawling Query Type
      2. Seed Page Crawling Query String
      3. Content Query Type

      Is it done through the Visual Selector? Do you have detailed documentation or a tutorial for that?

    • #9991


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello, sure, please check: https://www.youtube.com/watch?v=2ixtS3LQsI4

      Let me know if it helped.

      Regards.

    • #9997


      Vaggelis
      Participant
      Post count: 11

      My Visual Selector works fine. That was not the problem. The problem is that, with the link class I chose, the plugin crawled the whole page and not just the content of the posts.

      So do you have a detailed and updated tutorial on how to crawl posts from other sites into WordPress? I need it step by step.

      Otherwise I will have to clone your existing crawl rule and work from that.

    • #9998


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello, what is the exact URL where you cannot scrape only the full content? Let me know and I will check it.

      Regards.

    • #9999


      Vaggelis
      Participant
      Post count: 11

      I cloned your scrape rule and, by inspecting the class on the link (href) elements, I figured out how the scraper works. Thank you very much for that.

      If you had a detailed and updated tutorial on how to scrape posts, and specifically on how to locate and inspect the classes, it would be fantastic.
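
One rough way to locate a usable link class, besides the browser's element inspector, is to count which classes appear on the links of a listing page: the class that repeats once per post is usually the candidate for the seed page query. This is only a sketch under assumptions. The sample markup and the `LinkClassCounter` name are hypothetical, and it relies solely on Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkClassCounter(HTMLParser):
    """Count the class tokens used on <a> elements that carry an href."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if not attr_map.get("href"):
            return
        # A class attribute may hold several space-separated tokens.
        for token in (attr_map.get("class") or "").split():
            self.counts[token] += 1


# Hypothetical listing-page markup, not copied from any real site.
SAMPLE_HTML = """
<ul class="menu">
  <li><a class="menu-link" href="/home/">Home</a></li>
</ul>
<div class="listing">
  <a class="article-link" href="/post-one/">Post one</a>
  <a class="article-link" href="/post-two/">Post two</a>
  <a class="article-link" href="/post-three/">Post three</a>
</div>
"""

counter = LinkClassCounter()
counter.feed(SAMPLE_HTML)
for name, count in counter.counts.most_common():
    print(name, count)
```

Here `article-link` is counted three times and `menu-link` once, which points to `article-link` as the class shared by the post links.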

    • #10001


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      I don’t have a video covering exactly this, but the idea is noted.

      Regards.

