Source URL not correct

This topic is: resolved

 


This topic has 5 replies, 2 voices, and was last updated 4 months ago by Szabi – CodeRevolution.

Viewing 5 reply threads
    • #12387


      willbo987
      Participant
      Post count: 6

      Hi again

      (Sorry for bombarding!)

      I have noticed that on the MotorTransport site the correct source URL is not being found, which leads to a 404 error.

      Source URL: https://motortransport.co.uk/notts-haulier-collapsed-owing-creditors-over-11m/26719.article
      Source URL being generated: https://motortransport.co.uk/2024/06/13/b-taylor-sons-collapses-after-failing-to-secure-customer-contracts-and-maintain-vehicles/

      The canonical link in the article's source code shows the correct URL, so I am unsure where Crawlomatic is getting the non-working one.

      I have tried turning ‘Disable URL Sanitizations’ ON, but that had no effect.
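
One way to confirm what the page itself declares as its canonical URL is to read the `<link rel="canonical">` tag from the HTML. The following is only a minimal sketch using Python's standard library. The sample markup is a hypothetical page head built around the working URL quoted above, and `CanonicalLinkParser` and `find_canonical` are illustrative names, not part of Crawlomatic.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page head (assumption): the real page markup is not quoted in this thread.
sample_html = """
<html><head>
<title>Example</title>
<link rel="canonical" href="https://motortransport.co.uk/notts-haulier-collapsed-owing-creditors-over-11m/26719.article">
</head><body></body></html>
"""

print(find_canonical(sample_html))
```

If the value printed for the live page matches the working URL, the non-working link is probably being produced somewhere after the page is fetched rather than by the page itself.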

    • #12389


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Hello,

      I checked, but this is not happening on my end.

      Could you please send me temporary admin login credentials for your WordPress install, so I can check this issue? Please send them to my email address: kisded@yahoo.com.

      Regards,
      Szabi – CodeRevolution.

    • #12390


      willbo987
      Participant
      Post count: 6
      This reply has been marked as private.
    • #12394


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Hello,
      To fix the image issue, I unchecked the ‘Do Not Copy Featured Image Locally’ checkbox in the plugin’s ‘Main Settings’ menu.
      Regarding the URL you wanted to scrape, I set up a scraping rule in the Crawlomatic rule settings. Please check the rule with ID 1: it is working for me, and the site can be scraped. Let me know if I missed something here.
      Regards.

    • #12396


      willbo987
      Participant
      Post count: 6

      Thank you for looking at this!

      The issue was not that it wasn't being scraped.
      The issue was originally the image not being found. But I believe you have since updated the plugin (however, that update has not come through yet from Envato?)

      The main things I have noticed with the other rule:
      – The scraping method is now WordPress rather than Puppeteer.
      – The output did not go through AIomatic? It didn't pick up the source URL and place it at the end of the content. Could it be AIomatic that is causing the issue with the source URL here?
      – Although the image was scraped, it was placed in the content rather than set as the featured image?

      Appreciate your efforts here

    • #12397


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      The post was not processed by Aiomatic because Aiomatic is configured to process only draft posts, and the rule I created was set to publish posts. I have now changed it to draft, and the editing happened.
      I also configured Crawlomatic to scrape the featured image, using:

      Featured Image Query Type: Visual Selector

      Featured Image Query String: //*[@class='lazyloaded']
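
To illustrate what a query string of this shape selects, here is a rough sketch using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The markup and image URLs are hypothetical (the real page's HTML is not shown in this thread), and this is not how Crawlomatic evaluates the query internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup (assumption); URLs are placeholders.
sample = """
<article>
  <img class="lazyloaded" src="https://example.com/featured.jpg" />
  <img class="lazyloaded hero" src="https://example.com/other.jpg" />
  <img class="thumb" src="https://example.com/thumb.jpg" />
</article>
""".strip()

root = ET.fromstring(sample)

# ElementTree needs a leading "." so the search is relative to the root element.
# The predicate compares the whole class attribute, not individual class names.
matches = root.findall(".//*[@class='lazyloaded']")
matched_srcs = [el.get("src") for el in matches]

print(matched_srcs)
```

Only the first image is selected here, because the predicate matches elements whose `class` attribute is exactly `lazyloaded`. An element carrying additional classes would not match this query.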


The topic ‘Source URL not correct’ is closed to new replies.