Trying to crawl and get content from https://www.mynewsdesk.com/se

This topic is: resolved

 


    • #3726


      Leif
      Participant
      Post count: 4

      Hi, first of all, I would like to thank you for the amazing plugins that you have created. I’m a proud owner of two of them, Crawlomatic and Newsomatic, which I bought last week and have been using since.

      I do have a small problem: I have tried every method I could find without success, so I am asking for your help.

      I have been trying to crawl content from the site below.

      https://www.mynewsdesk.com/se/stories/

      I only get an error, and when I try to use the visual selector, only a blank page appears. Can you please help me with this issue?

      Thank you so much in advance,

      Wishing you a nice evening.

      Kind regards.

      Leif Hansen

    • #3729


      Szabi – CodeRevolution
      Keymaster
      Post count: 4573

      Hello,

      First of all, thank you for your purchase.

      I checked the URL you linked, and it seems that this site uses JavaScript to load its content after the page has loaded in the visitor’s browser. Because of this, regular PHP scrapers cannot parse these links: the links are not in the HTML that the scraper receives, so they are not visible to it.
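
      To illustrate the point, here is a minimal sketch in Python (not the plugin’s own code) of what a parser that does not run JavaScript sees. The two HTML snippets and the `LinkCollector` and `extract_links` names are made up for illustration and are not taken from mynewsdesk.com.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every link visible to a parser that does not execute scripts."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical markup: the link is written directly into the HTML.
STATIC_PAGE = '<html><body><a href="/se/stories/1">Story 1</a></body></html>'

# Hypothetical markup: an empty container that a script would fill in later.
JS_PAGE = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

static_links = extract_links(STATIC_PAGE)
js_links = extract_links(JS_PAGE)

print(static_links)  # the link is found in the raw HTML
print(js_links)      # nothing to find until a browser runs the script
```

      The second page only gains its links once a real browser executes the script, which is why a plain HTML scraper comes back empty-handed.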

      However, the Crawlomatic plugin can also be configured to scrape this content if it is combined with a headless browser (like Puppeteer or PhantomJS), which needs to be installed on your server, or with HeadlessBrowserAPI (an API I created, which provides JavaScript-generated content without the need to install anything on your server).
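
      The general idea can be sketched as a fallback: try the plain HTML first, and only hand the page to a renderer when no links are found. This is an assumption-laden Python sketch, not how Crawlomatic or HeadlessBrowserAPI is actually implemented. The `crawl_links` function and both `fake_*` fetchers are hypothetical stand-ins; in practice the second one would call a headless browser or a rendering API.

```python
import re

# Simple pattern for double-quoted href attributes on <a> tags (sketch only).
LINK_PATTERN = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def find_links(html):
    """Return the href values of <a> tags found in the given HTML string."""
    return LINK_PATTERN.findall(html)


def crawl_links(url, fetch_static, fetch_rendered):
    """Use the plain fetch first; fall back to the rendered fetch if it finds no links."""
    links = find_links(fetch_static(url))
    if links:
        return links, "static"
    return find_links(fetch_rendered(url)), "rendered"


def fake_static_fetch(url):
    # Hypothetical stand-in for a plain HTTP request: scripts have not run.
    return '<div id="app"></div><script src="/bundle.js"></script>'


def fake_rendered_fetch(url):
    # Hypothetical stand-in for a headless browser: scripts have already run.
    return '<div id="app"><a href="/se/stories/1">Story 1</a></div>'


links, source = crawl_links(
    "https://www.mynewsdesk.com/se/stories/",
    fake_static_fetch,
    fake_rendered_fetch,
)
print(links, source)
```

      Passing the fetchers in as arguments keeps the sketch independent of any particular headless browser, so either a locally installed one or a remote API could be plugged in.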

      Please check details about this in the videos below:

      Puppeteer support: https://www.youtube.com/watch?v=g99IlDkt_SY

      How to install Puppeteer: https://www.youtube.com/watch?v=XkVfYWRZpko

      HeadlessBrowserAPI (as an alternative): https://www.youtube.com/watch?v=205EinBQAoo&list=PLEiGTaa0iBIjDrfexapWc3M28iHwJI5tT&index=2

      I hope this info helps.

      Regards, Szabi – CodeRevolution.

    • #3731


      Leif
      Participant
      Post count: 4

      Hi Szabi,

      Thank you so much for your quick response. I will have a look at the links you have sent to me.

      Once again, thank you so much for the amazing job you do creating WordPress plugin solutions.

      Keep up the good work, and I wish you a wonderful day.

      Kind regards.

      Mario Leif

    • #3732


      Szabi – CodeRevolution
      Keymaster
      Post count: 4573

      Thank you as well, and have a great day too!

      Cheers!

    • #3737


      Leif
      Participant
      Post count: 4

      Thank you so much, Szabi. I have subscribed to HeadlessBrowserAPI and added the API key, and it is working.

      Wish you a wonderful day.

      Best regards.

      Mario Leif

    • #3738


      Szabi – CodeRevolution
      Keymaster
      Post count: 4573

      Thank you, I am glad to help!

      Regards.
