Jobs Scraper

This topic is: resolved

 

Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses might be delayed.


This topic has 3 replies, 2 voices, and was last updated 1 year, 9 months ago by Szabi – CodeRevolution.

Viewing 3 reply threads
    • #6868


      josefernandilho
      Participant
      Post count: 6

      Hi there, I am trying to create a job scraper but need a bit of guidance on how best to go about it. What I am looking to do is:

      1) Scrape all links in the following search results: https://www.jobs.ac.uk/search/international-activities (also following the “next” pagination link through all search result pages)

      2) Assign them to a custom post type and a custom post type category

      3) Assign certain scraped values to certain custom fields in the post type
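      Requirement 1 boils down to a small crawl loop: collect the result links on a page, follow its “next” link, and stop when there is none. The Python sketch below illustrates only that loop. The `crawl_all_pages` function, its two callback parameters, and the in-memory `SITE` pages are hypothetical stand-ins for real HTTP fetching and parsing; they are not part of Crawlomatic.

```python
from typing import Callable, List, Optional


def crawl_all_pages(start_url: str,
                    get_links: Callable[[str], List[str]],
                    get_next: Callable[[str], Optional[str]]) -> List[str]:
    """Collect result links from start_url and every page reached via 'next' links."""
    visited_pages = set()
    seen_links = set()
    results = []
    url = start_url
    # Stop when a page has no 'next' link, or if pagination loops back on itself.
    while url is not None and url not in visited_pages:
        visited_pages.add(url)
        for link in get_links(url):
            if link not in seen_links:  # keep first-seen order, drop duplicates
                seen_links.add(link)
                results.append(link)
        url = get_next(url)
    return results


# Hypothetical three-page result set: page URL -> (result links, next page URL).
# In a real scraper these two lookups would fetch and parse the live pages.
SITE = {
    "search?page=1": (["job/A", "job/B"], "search?page=2"),
    "search?page=2": (["job/B", "job/C"], "search?page=3"),
    "search?page=3": (["job/D"], None),
}

links = crawl_all_pages("search?page=1",
                        get_links=lambda u: SITE[u][0],
                        get_next=lambda u: SITE[u][1])
print(links)  # ['job/A', 'job/B', 'job/C', 'job/D']
```

      Passing the link and “next” lookups in as functions keeps the loop independent of how a given site marks up its results; the visited-page set is a cheap guard against pagination that cycles.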

      Thanks!

    • #6869


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      1. Crawlomatic can scrape the links, please use these plugin settings:

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      j-search-result__text
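      Conceptually, these settings skip the search page itself and follow only the links found inside elements whose class is `j-search-result__text`. The standard-library Python sketch below shows what such a class-based link query does. The `ClassLinkExtractor` name and the sample markup (including the job IDs) are hypothetical, not copied from jobs.ac.uk, and this is not Crawlomatic's internal code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassLinkExtractor(HTMLParser):
    """Collect absolute hrefs of <a> tags inside elements carrying css_class."""

    def __init__(self, css_class: str, base_url: str):
        super().__init__()
        self.css_class = css_class
        self.base_url = base_url
        self.depth = 0   # > 0 while the parser is inside a matching element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif self.css_class in (attrs.get("class") or "").split():
                self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


# Hypothetical search-result markup; only links inside the matching class count.
SAMPLE = """
<div class="j-search-result__text">
  <a href="/job/ABC123/lecturer">Lecturer</a><br>
</div>
<div class="sidebar"><a href="/about">About</a></div>
<div class="j-search-result__text"><a href="/job/XYZ789/officer">Officer</a></div>
"""

extractor = ClassLinkExtractor("j-search-result__text",
                               "https://www.jobs.ac.uk/search/international-activities")
extractor.feed(SAMPLE)
print(extractor.links)
# ['https://www.jobs.ac.uk/job/ABC123/lecturer', 'https://www.jobs.ac.uk/job/XYZ789/officer']
```

      The sidebar's `/about` link is ignored because it is not nested inside an element with the requested class.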

      2. You can select the custom post type in the ‘Generated Post Type’ settings field. You can add the custom category in the ‘Post Custom Taxonomies’ settings field.

      3. You will need to create custom shortcodes to hold the custom scraped content, and then assign them to custom fields and custom taxonomies. Please check this tutorial video for info on how to use custom shortcodes in the plugin: https://www.youtube.com/watch?v=OnANHg0OSVw
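      One way to picture the shortcode-to-custom-field step: each scraped value ends up stored under a custom field name. The sketch below reads a two-column table by row label rather than by row position, so it keeps working if rows are reordered. The row labels (`Location:`, `Salary:`, `Closes:`), the sample values, and the `TableRowParser` and `extract_meta` names are assumptions for illustration; the field names `_job_location`, `_job_salary` and `_job_expires` are the ones used later in this thread.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Map each table row's <th> text to its <td> text."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None  # "th" or "td" while inside a cell, else None
        self._th = ""
        self._td = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._th, self._td = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._th.strip():
            self.rows[self._th.strip()] = self._td.strip()

    def handle_data(self, data):
        if self._cell == "th":
            self._th += data
        elif self._cell == "td":
            self._td += data


# Hypothetical row labels mapped to the custom field names used in this thread.
FIELD_FOR_LABEL = {
    "Location:": "_job_location",
    "Salary:": "_job_salary",
    "Closes:": "_job_expires",
}


def extract_meta(html: str) -> dict:
    """Return {custom_field: value} for every known label found in the table."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {field: parser.rows[label]
            for label, field in FIELD_FOR_LABEL.items()
            if label in parser.rows}


# Rows deliberately out of order: matching is by label, not by position.
SAMPLE = """<table>
<tr><th>Salary:</th><td>30,000 GBP</td></tr>
<tr><th>Location:</th><td> London </td></tr>
<tr><th>Closes:</th><td>1 March</td></tr>
</table>"""

print(extract_meta(SAMPLE))
# {'_job_location': 'London', '_job_salary': '30,000 GBP', '_job_expires': '1 March'}
```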

      Also, see this other tutorial video for info on custom field and custom taxonomy usage: https://www.youtube.com/watch?v=GMIojmlI9fA

      I hope this info helps.

      Regards, Szabi – CodeRevolution.

    • #6874


      josefernandilho
      Participant
      Post count: 6

      Hi Szabi,

      Thanks for the explanations. I’m understanding a bit more but am still struggling to actually get the data in correctly. So:

      1) Custom taxonomies: I have used job_listing_type => uk-ihe-jobs (the slug), but this just creates a new custom field named uk-ihe-jobs. Should I use the name of the custom field instead of the slug? (Basically, I want all posts scraped by this rule to be assigned to my existing custom post type whose slug is uk-ihe-jobs and whose name is “UK International Higher Education Jobs”.)

      2) I’ve created the custom shortcodes but am struggling a bit with the crawler. I used the helper, but since the information I am scraping is in a table, the CLASS query type doesn’t really help. Is there a way to use other HTML selectors, such as the XPath expressions below?

      Here is an example page to be scraped: https://www.jobs.ac.uk/job/CXO168/senior-international-recruitment-officer

      And I would like to scrape the following:

      _job_salary => /html/body/div[1]/div[1]/div[2]/div/div[1]/table/tbody/tr[2]/td
      _job_location => /html/body/div[1]/div[1]/div[2]/div/div[1]/table/tbody/tr[1]/td
      _company_tagline => /html/body/div[1]/div[1]/div[2]/div/div[1]/table/tbody/tr[4]/td
      _company_twitter => /html/body/div[1]/div[1]/div[2]/div/div[1]/table/tbody/tr[3]/td
      _job_expires => /html/body/div[1]/div[1]/div[2]/div/div[2]/table/tbody/tr[2]/td
      _company_name => /html/body/div[1]/div[1]/h3/b/span
      _application => /html/body/div[1]/div[1]/div[3]/a/@href

      Is there any way I could tweak the above so it actually works and fills the shortcodes with the right data?

      Thanks!

      Jose
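      A frequent reason XPaths copied from browser developer tools return nothing in a scraper: browsers add a `<tbody>` element to every table in the DOM they display, even when the HTML sent by the server has `<tr>` rows directly under `<table>`. Whether the jobs.ac.uk markup does this has not been checked here. The sketch below uses a hypothetical, simplified, well-formed page so that Python's `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed input) can evaluate the `_job_salary` and `_company_name` paths from the list above, with and without `tbody`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified job page whose table has NO <tbody>,
# mirroring the nesting implied by the XPaths above.
PAGE = """<html><body>
<div>
  <div>
    <h3><b><span>Example University</span></b></h3>
    <div>Intro</div>
    <div>
      <div>
        <div>
          <table>
            <tr><th>Location:</th><td>London</td></tr>
            <tr><th>Salary:</th><td>30,000 GBP</td></tr>
          </table>
        </div>
      </div>
    </div>
  </div>
</div>
</body></html>"""

root = ET.fromstring(PAGE)  # root element is <html>

# ElementTree only accepts relative paths on an element, so the leading
# "/html" of each browser-copied XPath becomes ".".
salary_with_tbody = "./body/div[1]/div[1]/div[2]/div/div[1]/table/tbody/tr[2]/td"
salary_without_tbody = "./body/div[1]/div[1]/div[2]/div/div[1]/table/tr[2]/td"
company_name = "./body/div[1]/div[1]/h3/b/span"

print(root.find(salary_with_tbody))          # None: the source has no <tbody>
print(root.find(salary_without_tbody).text)  # 30,000 GBP
print(root.find(company_name).text)          # Example University
```

      ElementTree cannot select an attribute such as `@href` directly; for a path like `_application`, you would select the `a` element and call `.get("href")` on it. Full XPath engines do accept `@href`, but the `tbody` mismatch affects them in the same way.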

    • #6876


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      Could you please send me temporary admin login credentials for your WordPress install, so I can check on this and try to help?

      Please send them to my email address: kisded@yahoo.com.

      Regards, Szabi – CodeRevolution.


The topic ‘Jobs Scraper’ is closed to new replies.