This topic has 3 replies, 2 voices, and was last updated 2 years, 2 months ago by Szabi – CodeRevolution.
October 8, 2022 at 12:33 pm #6088
Hello Team,
First of all I want to thank you for creating this amazing plugin.
That being said I have a few things that I can’t fix / figure out.
I’m trying to gather job listings from different sites & have them posted on one site (using the main setting – Link Generated Post Titles To Source Articles)
1) I have the problem of getting double titles (image attached) from one of the sites I’m crawling ATM. No matter how much time I spent trying to go through the settings / site Inspect, I can’t figure out why. (the site is https://www.ejobs.ro/locuri-de-munca/sort-publish)
2) I think the crawler has some sort of loading problem for the site mentioned above. As you can see in the attached photo I have only 4 posts, but I had the plugin crawl 10 posts from Ejobs. The first page has around 30-40 job listings, so I don’t understand why it stops like that. Sometimes it crawls more than 4 & other times it manages to crawl everything… I made some changes to Delay Between Rule and Delay Between Multiple Requests – Global Settings – (milliseconds), but I’m not sure if that’s the problem…
3) I’m trying to get the logo of the company that listed the job crawled & posted with the article link, but I could not do it. I tried inspecting the page for lazy loading, but it did not help.
Thank you for your time.
-
October 8, 2022 at 9:34 pm #6091
Hello,
First of all, thank you for your purchase.
1. To fix double titles, please use the settings from below in the plugin:
Title Query Type: Regex – First Match
Title Query String: #"title":"([^"]*?)"#
2. I checked and the site you want to scrape initially contains only 3 jobs, the rest is loaded using Ajax. Ajax loaded content is unfortunately invisible to the plugin when scraping. I can recommend you try to scrape sitemaps instead, for example this is for latest jobs: https://www.ejobs.ro/sitemaps-new/jobs-active-latest.xml
Tutorial video for sitemap scraping: https://www.youtube.com/watch?v=xi1S1093ubo
3. Settings to get the company logo:
Featured Image Query Type: Class
Featured Image Query String: JDCDetails__Logo
Other settings I used in the plugin:
Do Not Scrape Seed URL: checked
Seed Page Crawling Query Type: Class
Seed Page Crawling Query String: JCContentMiddle__Title
Content Query Type: Class
Content Query String: JMDContent
I hope this info helps.
Regards, Szabi – CodeRevolution.
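For anyone who wants to see why the sitemap route avoids the Ajax problem, here is a minimal Python sketch of the idea. It is not how the plugin works internally; it only shows that a sitemap lists job URLs as plain XML, so nothing depends on JavaScript loading. The sample XML, the example.com URLs and the function name extract_job_urls are made up for illustration, and it assumes the sitemap is a standard urlset file (the real one may be structured differently).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data; the entries in the real sitemap will differ.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/job/1</loc></url>
  <url><loc>https://example.com/job/2</loc></url>
  <url><loc>https://example.com/job/3</loc></url>
</urlset>"""


def extract_job_urls(sitemap_xml: str, limit: int = 10) -> list[str]:
    """Return up to `limit` URLs found in the <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return urls[:limit]


if __name__ == "__main__":
    for url in extract_job_urls(SAMPLE_SITEMAP, limit=2):
        print(url)
```

Every URL is present in the file itself, so a crawl limit of 10 should return 10 entries whenever the sitemap contains at least that many.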
-
October 9, 2022 at 8:49 am #6099
Hello Szabi,
Thank you for the prompt reply.
What you instructed above helped a bunch. Everything works perfectly. I’m now trying to post the firm name from the listings & I have the same problem (the duplication part). I guess it’s time to learn some Regex to solve it. Hope it’s not that hard 😅.
Thank you for the help.
Wish you all the best,
Mihnea
-
October 9, 2022 at 8:53 am #6101
I am always glad to help.
I can suggest you learn Regex here: https://regexr.com/
They have a nice cheat sheet on the left.
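If it helps while learning, below is a small Python sketch for experimenting with the title pattern from earlier in this thread. A few assumptions: the # characters around the plugin pattern look like PHP-style delimiters, so they are dropped here because Python does not use them; the sample HTML is invented; and "companyName" is only a hypothetical key name, so check the page source for the real one before using it in the plugin.

```python
import re

# Same pattern as the Title Query String, without the # delimiters.
TITLE_PATTERN = re.compile(r'"title":"([^"]*?)"')


def first_match(pattern, html):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(html)
    return match.group(1) if match else None


def json_key_pattern(key):
    """Build the same kind of pattern for another JSON string key."""
    return re.compile(r'"' + re.escape(key) + r'":"([^"]*?)"')


# Invented sample: the title appears twice in the markup, but the JSON
# inside the script tag holds it exactly once.
SAMPLE_HTML = (
    '<h1>Accountant</h1><h1>Accountant</h1>'
    '<script>{"title":"Accountant","companyName":"Acme SRL"}</script>'
)

if __name__ == "__main__":
    print(first_match(TITLE_PATTERN, SAMPLE_HTML))
    print(first_match(json_key_pattern("companyName"), SAMPLE_HTML))
```

The part in parentheses, ([^"]*?), captures everything up to the next double quote, which is why only the value of the key is returned and not the surrounding markup.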
Regards.
The topic ‘The plugin posts the title twice / crawling inconsistency / image posting’ is closed to new replies.