Issue scraping sites where links appear abbreviated in code

This topic is: resolved

 

Thank you for contacting me. Please note that I live in the GMT+3 time zone, so my responses might be delayed.


Viewing 1 reply thread
  • Author
    Posts
    • #8417


      AceTrainerK
      Participant
      Post count: 1

      Hi,

       

      Thanks for your amazing plugin. One issue I keep running into is that certain websites don’t use full URLs in their links (e.g. “www.example.com/example-article/”) and instead use abbreviated (relative) links (e.g. just “/example-article”). The crawler always fails on these sites. Is there a way to prepend the website’s URL to these crawled links so they don’t fail?

      An example site that the plugin struggles with is: https://www.iflscience.com/space-and-physics

      Sorry if this has already been answered. I tried searching but couldn’t find anything!

      Thanks so much.
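
For readers with the same question, here is a minimal Python sketch of how a relative link is usually resolved against the page it was found on. It only illustrates the general technique and is not the plugin's own code. The helper name `resolve_link` is hypothetical, and “/example-article” is the placeholder path from the post above.

```python
from urllib.parse import urljoin

# Page the links were found on (the example site from this thread).
seed_url = "https://www.iflscience.com/space-and-physics"


def resolve_link(base_url: str, href: str) -> str:
    """Return an absolute URL for href, resolved against base_url.

    Relative hrefs inherit the scheme and host of base_url;
    hrefs that are already absolute are returned unchanged.
    """
    return urljoin(base_url, href)


# A root-relative link picks up the scheme and host of the seed page.
print(resolve_link(seed_url, "/example-article"))

# An absolute link passes through as-is.
print(resolve_link(seed_url, "https://www.example.com/example-article/"))
```

Resolving against the page URL rather than gluing strings together also covers links without a leading slash and links that are already absolute.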

       

    • #8418


      Szabi – CodeRevolution
      Keymaster
      Post count: 4192

      Hello,

      First of all, thank you for your purchase.

      I tested this on my end, and the plugin scraped the site you mentioned without problems, using the settings below:

       

      Scraper Start (Seed) URL / Keywords
      https://www.iflscience.com/space-and-physics

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type
      Class

      Seed Page Crawling Query String
      card-content--body--title

      Content Query Type
      Class

      Content Query String
      article-content
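
To give a rough idea of what a class-based seed page query does, here is a small Python sketch using only the standard library. It is an illustration under stated assumptions, not the plugin's implementation: the `ClassLinkCollector` class and the sample HTML are made up for this example, and the class name is written with plain double hyphens, which is an assumption about the site's markup. It collects links from `<a>` tags that carry the target class, or that sit inside an element carrying it, and resolves each one against the seed URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ClassLinkCollector(HTMLParser):
    """Collect absolute URLs from <a> tags that carry target_class
    or are nested inside an element that carries it."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, base_url, target_class):
        super().__init__()
        self.base_url = base_url
        self.target_class = target_class
        self.depth = 0   # > 0 while inside an element with target_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        matches = self.target_class in classes
        if tag == "a" and attrs.get("href") and (matches or self.depth > 0):
            # Relative hrefs are resolved against the seed page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        if tag in self.VOID_TAGS:
            return
        if matches or self.depth > 0:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Made-up markup for illustration; the real page structure may differ.
sample_html = """
<div class="card-content">
  <a class="card-content--body--title" href="/example-article">Example</a>
  <h3 class="card-content--body--title"><a href="/another-article">Another</a></h3>
  <a class="other" href="/ignored">Ignored</a>
</div>
"""

collector = ClassLinkCollector(
    "https://www.iflscience.com/space-and-physics",
    "card-content--body--title",
)
collector.feed(sample_html)
print(collector.links)
```

The sketch assumes reasonably well-formed HTML and does not handle the case where the `<a>` tag wraps the element with the class; a real crawler would normally use a full HTML parser for that.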

       

      Let me know if this helped.

      Regards, Szabi – CodeRevolution.


The topic ‘Issue scraping sites where links appear abbreviated in code’ is closed to new replies.