Crawling a posts page scrape all the website at each post

This topic is: resolved

This topic has 9 replies, 2 voices, and was last updated 2 years ago by Szabi – CodeRevolution.

- March 1, 2024 at 5:59 pm #9980
  
  Vaggelis
  Participant
  
  Post count: 11
  
  Hello,
  
  I followed your tutorial at https://www.youtube.com/watch?v=F6vhRJgCR_M exactly, step by step, in order to crawl some posts from news sites. The problem is that the scraper imports the whole site into each post, instead of importing just the title, content, and images.
  
  I see that my plugin version is slightly different (updated). Do you have an updated tutorial for it?
  
  And CONGRATULATIONS for your projects.
  
- March 1, 2024 at 6:31 pm #9981
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello,
  
  First of all, thank you for your purchase.
  
  Please send me more details about which news sites you are scraping, including the site URLs, and I will check on this.
  
  Regards, Szabi – CodeRevolution.
  
- March 1, 2024 at 6:48 pm #9982
  Vaggelis
  Participant
  
  Post count: 11
  I just need to create a crawl here: https://newsgreece.gr/wp-admin/admin.php?page=crawlomatic_items_panel&crawlomatic_page=1
  
  redacted
  
  redacted
  
  The goal is to import posts from a category like https://www.pronews.gr/category/elliniki-politiki/kyvernisi/
  
  The language is Greek.
  - This reply was modified 2 years, 1 month ago by Szabi - CodeRevolution.
- March 1, 2024 at 7:12 pm #9984
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello,
  
  I configured the plugin on your site; please check.
  
  What I changed:
  
  Scraper Start (Seed) URL / Keywords
  https://www.pronews.gr/category/elliniki-politiki/kyvernisi/
  
  Do Not Scrape Seed URL:
  checked
  
  Seed Page Crawling Query Type:
  Class
  
  Seed Page Crawling Query String:
  article-link hrefattribute
  
  Content Query Type
  XPath
  
  Content Query String
  //*[@class='csscontent wrap-content-body field-item even']
  
  Regards.
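
To illustrate what the two queries in the settings above select, here is a minimal sketch using only the Python standard library. It is not how Crawlomatic is implemented; the sample pages, the example.com URLs, and the helper names `links_by_class` and `content_by_xpath` are all hypothetical. The seed query picks article links by class on the category page, and the content query keeps only the article body on each linked page. Note that an XPath test such as `@class='...'` compares the entire attribute value, so the class string must match exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a category (seed) page.
SEED_PAGE = """
<html><body>
  <a class="menu-link" href="https://example.com/about">About</a>
  <a class="article-link" href="https://example.com/post-1">Post 1</a>
  <a class="article-link" href="https://example.com/post-2">Post 2</a>
</body></html>
"""

# Hypothetical stand-in for one article page.
ARTICLE_PAGE = """
<html><body>
  <div class="header">Site navigation</div>
  <div class="csscontent wrap-content-body field-item even"><p>Article body text.</p></div>
  <div class="footer">Site footer</div>
</body></html>
"""

# The XPath from the reply, with straight quotes. ElementTree only accepts
# relative paths, so a leading "." is added here.
CONTENT_XPATH = ".//*[@class='csscontent wrap-content-body field-item even']"


def links_by_class(page, class_name):
    """Return the href of every <a> element whose class list contains class_name."""
    root = ET.fromstring(page)
    return [
        a.get("href")
        for a in root.iter("a")
        if class_name in (a.get("class") or "").split()
    ]


def content_by_xpath(page, xpath):
    """Return the text of every element matched by the XPath expression."""
    root = ET.fromstring(page)
    return ["".join(node.itertext()).strip() for node in root.findall(xpath)]


if __name__ == "__main__":
    # Only the article links are followed; the menu link is ignored.
    print(links_by_class(SEED_PAGE, "article-link"))
    # Only the article body is kept; header and footer are dropped.
    print(content_by_xpath(ARTICLE_PAGE, CONTENT_XPATH))
```

In this sketch, choosing a class that also appears on menu or footer links would send the crawler to pages that are not articles, and a content query that matches nothing would leave no article body to extract.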
  
- March 2, 2024 at 7:34 am #9990
  Vaggelis
  Participant
  
  Post count: 11
  As far as I can see, this was the fastest and most direct response I have ever received!
  
  OK, so can you please advise how to determine the following?
  1. Seed Page Crawling Query Type
  2. Seed Page Crawling Query String
  3. Content Query Type
  Is it done through the Visual Selector? Do you have detailed documentation or a tutorial for that?
  
- March 2, 2024 at 7:51 am #9991
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello, sure, please check: https://www.youtube.com/watch?v=2ixtS3LQsI4
  
  Let me know if it helped.
  
  Regards.
  
- March 2, 2024 at 5:36 pm #9997
  
  Vaggelis
  Participant
  
  Post count: 11
  
  My Visual Selector works fine; that was not the problem. The problem is that, with the link class I had chosen, it crawled the whole page and not the content of the posts.
  
  So do you have a detailed and updated tutorial on how to crawl posts from other sites into WordPress? I need it step by step.
  
  Otherwise I will have to clone your existing crawl ID and work from that.
  
- March 2, 2024 at 6:11 pm #9998
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello, what is the exact URL from which you cannot scrape only the full content? Let me know and I will check it.
  
  Regards.
  
- March 3, 2024 at 9:44 am #9999
  
  Vaggelis
  Participant
  
  Post count: 11
  
  I cloned your scrape rule and, by inspecting the class of the link (a href) elements, I figured out how the scraper works. Thank you very much for that.
  
  If you had a detailed and updated tutorial on how to scrape posts, and specifically on how to locate and inspect the classes, it would be fantastic.
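
One possible way to locate candidate link classes outside the plugin is to count how often each class appears on the links of a category page. The following is a rough sketch, not a Crawlomatic feature; the sample HTML and the names `LinkClassCounter` and `count_link_classes` are hypothetical. A class that repeats once per listed post is a reasonable candidate for the seed query, but it should still be confirmed in the browser inspector.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkClassCounter(HTMLParser):
    """Counts the classes used on <a> elements that have an href."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if not attr_map.get("href"):
            return
        # A class attribute may hold several space-separated class names.
        for name in (attr_map.get("class") or "").split():
            self.counts[name] += 1


def count_link_classes(html):
    """Return a Counter mapping each link class to its number of occurrences."""
    parser = LinkClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical category page with one menu link and three post links.
SAMPLE_PAGE = """
<html><body>
  <a class="menu-link" href="/about">About</a>
  <a class="article-link" href="/post-1">Post 1</a>
  <a class="article-link" href="/post-2">Post 2</a>
  <a class="article-link" href="/post-3">Post 3</a>
</body></html>
"""

if __name__ == "__main__":
    for name, count in count_link_classes(SAMPLE_PAGE).most_common():
        print(name, count)
```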
  
- March 3, 2024 at 12:18 pm #10001
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  I don’t have an exact video for this, but the idea is noted.
  
  Regards.
  

The topic ‘Crawling a posts page scrape all the website at each post’ is closed to new replies.