Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses might be delayed.
This topic has 7 replies, 2 voices, and was last updated 5 years, 2 months ago by Szabi – CodeRevolution.
September 12, 2019 at 6:57 am #444
I’m trying to crawl https://appsumo.com/browse/ but the page is not fully loaded: https://www.dropbox.com/s/quemnee0vopff5u/2019-09-12_13-56-02.png?dl=0
Can you please check?
Thanks
-
September 12, 2019 at 10:03 am #447
Hello,
First of all, thank you for your purchase.
I checked, and the page you are trying to crawl uses JavaScript to display its page elements. Content added this way is invisible to ordinary PHP-based scrapers like my plugin (only a placeholder is shown). If you disable JavaScript in your browser and open the page, you will see the same result.
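The browser test above can also be done programmatically. The Python sketch below is only an illustration, not the plugin’s PHP code: it fetches the raw HTML the way a non-JavaScript scraper would, ignores `<script>` and `<style>` bodies, and checks whether some text you expect on the page is actually present. The function names and the `expected_text` marker are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collects the text a non-JavaScript client would see, skipping <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script source is not page content: a scraper that does not run JS never sees its output.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_raw_html(url: str, timeout: float = 30.0) -> str:
    # Plain HTTP GET: no JavaScript is executed, similar to a PHP-based scraper.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def is_server_rendered(html: str, expected_text: str) -> bool:
    """True if expected_text is already in the static HTML, i.e. visible without JavaScript."""
    return expected_text in visible_text(html)
```

For example, pass `fetch_raw_html("https://appsumo.com/browse/")` together with a product name you can see in the browser; a `False` result suggests that the content is added by JavaScript and needs a JavaScript-capable fetcher such as PhantomJS.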
To make this work with the plugin, you will need to use PhantomJS, a headless browser that can execute JavaScript, which makes importing this page possible.
PhantomJS needs to be installed on your server. This requires SSH access to the server, which is usually available only on a VPS or a dedicated server. If you are on shared hosting, you can ask your hosting provider’s support whether PhantomJS can be installed.
Tutorial video on this: https://www.youtube.com/watch?v=hnEPlQSeAZE
How to install phantomjs: https://www.youtube.com/watch?v=wWuI1mdIHwA
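After installing, running `phantomjs --version` over SSH is enough to confirm that the binary works. If you want to script that check, here is a minimal Python sketch, assuming Python 3 is available on the server and the binary is named `phantomjs`. It is not necessarily how the plugin performs its own "PhantomJS OK" check, and the web server user may see a different `PATH` than your SSH user.

```python
import shutil
import subprocess
from typing import Optional


def phantomjs_version(binary: str = "phantomjs") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the binary is unavailable."""
    path = shutil.which(binary)
    if path is None:
        return None  # not installed, or not on this user's PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # found but could not be executed, or hung
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

On a server where PhantomJS is installed, `phantomjs_version()` should return the same version string that `phantomjs --version` prints; `None` means the binary was not found or did not run.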
Regards, Szabi – CodeRevolution.
-
September 12, 2019 at 10:14 am #448
Thank you for the response.
In the rule settings, it says PhantomJS OK: https://www.dropbox.com/s/jw58u458wta06u9/2019-09-12_17-12-20.png?dl=0
However, when I try the Crawling Helper, the page still does not load fully.
-
September 12, 2019 at 10:49 am #451
I see now.
The ‘Crawling Helper’ page did not support PhantomJS until now. I updated the plugin to v1.6.7.2 and added a new ‘Use PhantomJS’ checkbox to this page. If you check it, PhantomJS will be used for crawling in the helper page.
Regards.
-
September 12, 2019 at 10:52 am #452
Wow. Thank you for the fast support.
-
September 12, 2019 at 10:58 am #455
I am happy to help.
-
September 12, 2019 at 11:05 am #456
I just updated the plugin and it’s showing this error when trying to crawl the above page:
https://www.dropbox.com/s/gflljz5u2vrwrkl/2019-09-12_18-03-58.png?dl=0
The JavaScript-rendered section of the page is still not shown: https://www.dropbox.com/s/6ckn8ouuz8r6ebk/2019-09-12_18-04-58.png?dl=0
-
September 12, 2019 at 12:12 pm #457
The error comes from faulty JavaScript on the page itself, and PhantomJS prints it directly to the output. The content is missing because PhantomJS returns the page before the page’s JavaScript has finished rendering it.
This is a corner case; usually no extra wait time is needed for the content to appear.
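Rendering delays like this are commonly handled by waiting for a sign that the page’s scripts have finished (an element or some text appearing) instead of reading the content immediately. Below is a minimal Python sketch of that polling pattern. `wait_until` and `FakePage` are hypothetical names, not part of the plugin or of PhantomJS; `FakePage` only simulates a page whose scripts finish after a few checks.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # gave up: content never appeared in time
        time.sleep(min(interval, remaining))


class FakePage:
    """Hypothetical stand-in for a rendered page; its 'scripts' finish after N polls."""

    def __init__(self, ticks_to_render: int) -> None:
        self._ticks_left = ticks_to_render
        self.content = "<div id='app'>Loading...</div>"

    def poll(self) -> bool:
        if self._ticks_left > 0:
            self._ticks_left -= 1
            if self._ticks_left == 0:
                self.content = "<div id='app'>Widget Pro</div>"
        return "Widget Pro" in self.content


if __name__ == "__main__":
    print(wait_until(FakePage(3).poll, timeout=1.0, interval=0.01))
```

Polling for a concrete condition with an upper bound returns as soon as the content is ready, whereas a fixed delay either wastes time on fast pages or is still too short on slow ones.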
A quick workaround: after the page has fully rendered in your browser, save it as an HTML file and upload it to your server. The plugin can then crawl the uploaded file’s URL without issues.
I attached an HTML file for this page to this comment.
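Once uploaded, the saved file serves as a static seed page. As an illustration only (this is not the plugin’s code), the Python sketch below collects the links from such a saved page. One assumption worth checking: if the saved copy keeps relative links (for example `page-2/`), they should be resolved against the original page URL rather than the upload location, or they will point at your own server. The example.com URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href> tags, in document order, without duplicates."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the ORIGINAL page URL,
                # not the location where the saved copy was uploaded.
                absolute = urljoin(self.base_url, value.strip())
                if absolute not in self._seen:
                    self._seen.add(absolute)
                    self.links.append(absolute)


def extract_links(saved_html: str, original_url: str) -> list[str]:
    collector = LinkCollector(original_url)
    collector.feed(saved_html)
    collector.close()
    return collector.links
```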
Regards.
The topic ‘Seed page not properly loaded’ is closed to new replies.