Trying to crawl and get content from https://www.mynewsdesk.com/se

This topic is: resolved


This topic has 5 replies, 2 voices, and was last updated 3 years, 9 months ago by Szabi – CodeRevolution.

Viewing 5 reply threads

Author

Posts
- September 1, 2021 at 7:10 pm #3726
  
  Leif
  Participant
  
  Post count: 4
  
  Hi, first of all, I would like to thank you for the amazing plugins that you have created. I’m the proud owner of two of them, Crawlomatic and Newsomatic, which I bought last week and have been using since.
  
  I do have a small problem: I have tried every method without result, so I am asking for your help.
  
  I have been trying to crawl content from this site below.
  
  https://www.mynewsdesk.com/se/stories/
  
  I only get an error, and when I try to use the visual selector, only a blank page appears. Can you please help me with this issue?
  
  Thank you so much in advance,
  
  Wishing you a nice evening.
  
  Kind regards.
  
  Leif Hansen
  
- September 2, 2021 at 7:54 am #3729
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4854
  
  Hello,
  
  First of all, thank you for your purchase.
  
  I checked the URL you linked, and it seems that this site uses JavaScript to load its content (after the page is loaded in the visitor's browser). Because of this, regular PHP scrapers cannot parse its links, since the links are not present in the HTML that the scraper receives.
  
  However, the Crawlomatic plugin can also be configured to scrape this content if it is combined with a headless browser (like Puppeteer or PhantomJS), which needs to be installed on your server, or with HeadlessBrowserAPI (an API I created, which provides JavaScript-generated content without the need to install anything on your server).
  
  Please check details about this in the videos below:
  
  Puppeteer support: https://www.youtube.com/watch?v=g99IlDkt_SY
  
  How to install Puppeteer: https://www.youtube.com/watch?v=XkVfYWRZpko
  
  HeadlessBrowserAPI (as an alternative): https://www.youtube.com/watch?v=205EinBQAoo&list=PLEiGTaa0iBIjDrfexapWc3M28iHwJI5tT&index=2
  
  I hope this info helps.
  
  Regards, Szabi – CodeRevolution.
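
The point made in the reply above, that a plain scraper never sees links that JavaScript adds after page load, can be illustrated with a small sketch. This is not how Crawlomatic, Puppeteer, or HeadlessBrowserAPI work internally; it swaps in headless Chromium's `--dump-dom` command-line flag to show the same idea. The sample markup, the function names, and the `chromium` binary name are all assumptions made for illustration.

```python
import shutil
import subprocess
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, as a simple static scraper would."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return every link found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def dump_dom_command(url: str, browser: str = "chromium") -> list[str]:
    """Build a command that prints the DOM after JavaScript has run.

    The binary name is an assumption; it may be "google-chrome",
    "chromium-browser", or something else on a given server.
    """
    return [browser, "--headless", "--dump-dom", url]


def fetch_rendered_html(url: str, browser: str = "chromium", timeout: int = 60) -> str:
    """Run the headless browser and return the rendered HTML."""
    if shutil.which(browser) is None:
        raise RuntimeError(f"{browser!r} was not found on this system")
    result = subprocess.run(
        dump_dom_command(url, browser),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


# Hypothetical markup: what a plain HTTP fetch might return for a
# JavaScript-driven page, and what the DOM might look like after rendering.
STATIC_SHELL = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)
RENDERED_DOM = (
    '<html><body><div id="app">'
    '<a href="/se/stories/example-story">Example story</a>'
    "</div></body></html>"
)

if __name__ == "__main__":
    print(extract_links(STATIC_SHELL))   # the static shell has no links to find
    print(extract_links(RENDERED_DOM))   # the rendered DOM exposes the link
```

On a server where a headless browser is installed, `extract_links(fetch_rendered_html(url))` would parse the rendered page instead of the empty shell.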
  
- September 2, 2021 at 8:08 am #3731
  
  Leif
  Participant
  
  Post count: 4
  
  Hi Szabi,
  
  Thank you so much for your quick response. I will have a look at the links you have sent to me.
  
  Once again, thank you so much for the amazing job you do creating WordPress plugin solutions.

  Keep up the good work, and I wish you a wonderful day.
  
  Kind regards.
  
  Mario Leif
  
- September 2, 2021 at 8:23 am #3732
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4854
  
  Thank you as well, and have a great day too!
  
  Cheers!
  
- September 2, 2021 at 9:39 am #3737
  
  Leif
  Participant
  
  Post count: 4
  
  Thank you so much, Szabi. I have created a subscription with you for the HeadlessBrowserAPI and added the API key, and it is working.
  
  Wish you a wonderful day.
  
  Best regards.
  
  Mario Leif
  
- September 2, 2021 at 9:40 am #3738
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4854
  
  Thank you, I am glad to help!
  
  Regards.
  

The topic ‘Trying to crawl and get content from https://www.mynewsdesk.com/se’ is closed to new replies.