Seed page not properly loaded

This topic is: resolved

 


This topic has 7 replies, 2 voices, and was last updated 5 years, 2 months ago by Szabi – CodeRevolution.

  • Author
    Posts
    • #444


      nextup555
      Participant
      Post count: 4

      I’m trying to crawl this page https://appsumo.com/browse/

      but the page is not fully loaded: https://www.dropbox.com/s/quemnee0vopff5u/2019-09-12_13-56-02.png?dl=0

      Can you please check?

      Thanks

    • #447


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      I checked, and it seems that the page you are trying to crawl uses JavaScript to display its page elements. Content rendered this way is invisible to normal PHP-based scrapers like my plugin, which do not execute JavaScript (only a dummy replacement will be shown). If you disable JavaScript in your browser and open the same page, you will see the same result.
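
To illustrate why a scraper that does not run JavaScript sees only the placeholder, here is a minimal Python sketch. It is not the plugin's code (the plugin is PHP-based), and the sample page is hypothetical, not the real markup of the page discussed here. It simply collects the text a client would see if it never executed any scripts.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects the text a client that does not run JavaScript would see."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies are source code, not page content.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the static text of an HTML document, scripts not executed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the listing is injected by a script at load time.
SAMPLE_PAGE = """
<html><body>
  <h1>Browse</h1>
  <div id="deals"><noscript>Please enable JavaScript</noscript></div>
  <script>
    document.getElementById("deals").textContent = "Deal A, Deal B";
  </script>
</body></html>
"""

print(visible_text(SAMPLE_PAGE))
```

The deal names never appear in the output, because they only exist after the script has run in a browser-like environment.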

      To make this work, you will need to use PhantomJS with the plugin. PhantomJS can execute JavaScript, which makes importing from this page possible.

      PhantomJS needs to be installed on your server. This requires SSH access to the server, which is usually available only on VPS or private servers. If you are on shared hosting, you can ask your hosting provider’s support whether a PhantomJS installation is possible.

      Tutorial video on this: https://www.youtube.com/watch?v=hnEPlQSeAZE

      How to install phantomjs: https://www.youtube.com/watch?v=wWuI1mdIHwA

      Regards, Szabi – CodeRevolution.

    • #448


      nextup555
      Participant
      Post count: 4

      Thank you for the response.

      In the rule settings, it says PhantomJS OK: https://www.dropbox.com/s/jw58u458wta06u9/2019-09-12_17-12-20.png?dl=0

      However, when I try the helper, it still won’t show fully.

    • #451


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      I see now.

      The ‘Crawling Helper’ page did not support PhantomJS before. I updated the plugin to v1.6.7.2 and added a new checkbox to this page, ‘Use PhantomJS’. If you check it, PhantomJS will be used for crawling in the helper page.

      Regards.

    • #452


      nextup555
      Participant
      Post count: 4

      Wow. Thank you for the fast support.

    • #455


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      I am happy to help.

    • #456


      nextup555
      Participant
      Post count: 4

      I just updated the plugin and it’s showing this error when trying to crawl the above page:

      https://www.dropbox.com/s/gflljz5u2vrwrkl/2019-09-12_18-03-58.png?dl=0

      And the javascript section is still not shown: https://www.dropbox.com/s/6ckn8ouuz8r6ebk/2019-09-12_18-04-58.png?dl=0

    • #457


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      The error is generated by some faulty JavaScript on the page, and PhantomJS is displaying it directly on the screen. The page content is missing because PhantomJS returns the page before the JavaScript has finished rendering the result.

      This is a corner case. Usually, no extra timeout is needed for the content to show.
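
The general idea of giving the page extra time can be sketched as a polling wait: keep re-reading the page until the expected content shows up or a timeout expires. The Python below is only an illustration of that idea under stated assumptions. `wait_for_content` and `FakePage` are hypothetical names, and this is not how the plugin or PhantomJS implements it.

```python
import time


def wait_for_content(fetch_snapshot, is_ready, timeout=10.0, interval=0.25):
    """Poll fetch_snapshot() until is_ready(snapshot) is true or timeout expires.

    Returns the last snapshot either way, so the caller can check whether
    the content actually rendered.
    """
    deadline = time.monotonic() + timeout
    snapshot = fetch_snapshot()
    while not is_ready(snapshot) and time.monotonic() < deadline:
        time.sleep(interval)
        snapshot = fetch_snapshot()
    return snapshot


class FakePage:
    """Hypothetical stand-in for a page whose content appears after a few polls."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def content(self):
        self.polls += 1
        if self.polls >= self.ready_after:
            return '<div id="deals">Deal A</div>'
        return '<div id="deals"></div>'


page = FakePage(ready_after=3)
html = wait_for_content(
    page.content,
    lambda snapshot: "Deal A" in snapshot,
    timeout=2.0,
    interval=0.01,
)
print(html)
```

Returning the snapshot too early corresponds to the first poll here: the container exists, but it is still empty.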

      A quick workaround is to save the content of the page as an HTML file after it has rendered in your browser, and upload that file to your server. The plugin will then be able to crawl the content without issue.

      I added an HTML file for this as an attachment to this comment.

      Regards.

       


The topic ‘Seed page not properly loaded’ is closed to new replies.