The plugin posts the title twice / crawling inconsistency / image posting

This topic is: resolved

 


This topic has 3 replies, 2 voices, and was last updated 2 years, 2 months ago by Szabi – CodeRevolution.

Viewing 3 reply threads
  • Author
    Posts
    • #6088


      Mihnea
      Participant
      Post count: 1

      Hello Team,

       

      First of all I want to thank you for creating this amazing plugin.

      That being said I have a few things that I can’t fix / figure out.

      I’m trying to gather job listings from different sites and have them posted on one site (using the main setting – Link Generated Post Titles To Source Articles).

      1) I’m getting double titles (image attached) from one of the sites I’m currently crawling. No matter how much time I spend going through the settings and inspecting the site, I can’t figure out why. (The site is https://www.ejobs.ro/locuri-de-munca/sort-publish)

      2) I think the crawler has some sort of loading problem with the site mentioned above. As you can see in the attached photo, I have only 4 posts, although I set the plugin to crawl 10 posts from Ejobs. The first page has around 30-40 job listings, so I don’t understand why it stops like that. Sometimes it crawls more than 4, and other times it manages to crawl everything. I made some changes to Delay Between Rule, Delay Between Multiple Requests – Global Settings – (milliseconds), but I’m not sure if that’s the problem.

      3) I’m trying to get the logo of the company that listed the job crawled and posted together with the article link, but I could not manage it. I tried inspecting the page for lazy loading, but that did not help.

      Thank you for your time.

    • #6091


      Szabi – CodeRevolution
      Keymaster
      Post count: 4622

      Hello,

      First of all, thank you for your purchase.

      1. To fix double titles, please use the following settings in the plugin:

      Title Query Type:
      Regex – First Match

      Title Query String:
      #"title":"([^"]*?)"#
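
For readers unfamiliar with regex queries, here is a minimal Python sketch of what a "first match" lookup with that pattern does. It is only an illustration, not the plugin's own code: the sample page fragment is made up, and the `#` characters are assumed to be PHP-style pattern delimiters, so they are dropped for Python's `re` module.

```python
import re

# Assumption: the leading/trailing "#" in the plugin setting are pattern
# delimiters, so only the part between them is used here.
TITLE_PATTERN = re.compile(r'"title":"([^"]*?)"')

def first_title(page_source):
    """Return the first captured title, or None if the pattern never matches."""
    match = TITLE_PATTERN.search(page_source)
    return match.group(1) if match else None

# Hypothetical page fragment: the title appears once in embedded JSON
# and twice more in the visible markup.
sample_page = (
    '<script>{"id":1,"title":"Inginer Software","company":"Example SRL"}</script>'
    '<h1>Inginer Software</h1><h2>Inginer Software</h2>'
)

print(first_title(sample_page))
```

Because only the first match is returned, the title is taken once from the embedded data even if it is repeated elsewhere on the page.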

      2. I checked, and the page you want to scrape initially contains only 3 jobs; the rest are loaded using Ajax. Unfortunately, Ajax-loaded content is invisible to the plugin when scraping. I recommend scraping sitemaps instead; for example, this one lists the latest jobs: https://www.ejobs.ro/sitemaps-new/jobs-active-latest.xml
      Tutorial video for sitemap scraping: https://www.youtube.com/watch?v=xi1S1093ubo
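
As a rough sketch of why a sitemap sidesteps the Ajax problem: every job URL is already present in the XML, so nothing has to be loaded by JavaScript. The example below assumes the sitemap follows the standard sitemaps.org `urlset` layout; the sample XML and the `extract_job_urls` helper are hypothetical and do not reproduce the real file or the plugin's logic.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org files (assumed to apply here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_job_urls(sitemap_xml, limit=10):
    """Return up to `limit` URLs found in <url><loc> entries of a sitemap."""
    root = ET.fromstring(sitemap_xml.strip())
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return urls[:limit]

# Hypothetical sitemap content, standing in for a downloaded file.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/job/1</loc></url>
  <url><loc>https://example.com/job/2</loc></url>
  <url><loc>https://example.com/job/3</loc></url>
</urlset>
"""

for url in extract_job_urls(sample_sitemap, limit=2):
    print(url)
```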

      3. Settings to get the company logo:

      Featured Image Query Type:
      Class

      Featured Image Query String:
      JDCDetails__Logo
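
To illustrate what a class-based image query roughly amounts to, here is a small Python sketch that looks for the first `<img>` that either carries the given class or sits inside an element that does. The HTML fragment is invented for the example, and whether the real page puts the class on the image or on a wrapper is an assumption; the plugin's actual lookup may differ.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}

class LogoFinder(HTMLParser):
    """Collect the src of the first <img> with, or nested under, a class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0        # > 0 while inside the matching element
        self.logo_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        matches = self.class_name in classes
        if self.logo_src is None and tag == "img" and (matches or self.depth > 0):
            self.logo_src = attrs.get("src")
        if tag in VOID_TAGS:
            return
        if self.depth > 0 or matches:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

def find_logo(html, class_name="JDCDetails__Logo"):
    finder = LogoFinder(class_name)
    finder.feed(html)
    return finder.logo_src

# Hypothetical job page fragment.
sample_html = (
    '<img src="https://example.com/banner.png">'
    '<div class="JDCDetails__Logo"><img src="https://example.com/logo.png"></div>'
)

print(find_logo(sample_html))
```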

       

      Other settings I used in the plugin:

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      JCContentMiddle__Title

      Content Query Type:
      Class

      Content Query String:
      JMDContent

       

      I hope this info helps.

      Regards, Szabi – CodeRevolution.

    • #6099


      Mihnea
      Participant
      Post count: 1

      Hello Szabi,

      Thank you for the prompt reply.

      Your instructions above helped a lot. Everything works perfectly. I’m now trying to post the firm name from the listings, and I have the same problem (the duplication part). I guess it’s time to learn some Regex to solve it. Hope it’s not that hard 😅.

       

      Thank you for the help.

      Wish you all the best,

      Mihnea

    • #6101


      Szabi – CodeRevolution
      Keymaster
      Post count: 4622

      I am always glad to help.

      I suggest learning Regex here: https://regexr.com/

      They have a nice cheat sheet on the left.

      Regards.


The topic ‘The plugin posts the title twice / crawling inconsistency / image posting’ is closed to new replies.