This topic has 13 replies, 2 voices, and was last updated 1 year, 1 month ago by Sankalan.
-
November 4, 2023 at 4:08 pm #8973
I am trying to scrape WuxiaWorld.Site.
I am using the default WordPress scraping method (Puppeteer also works, with some effort).
There are two issues with both WordPress and Puppeteer.
1. For some novels, chapters are not scraped at all.
2. Images (featured images) are not pulled from the website.
Please check the attachments.
Another problem:
Though I have a scraping rule set for both NewNovels.org and WuxiaWorld.Site, the plugin is conveniently skipping the tasks for NewNovels.org and focusing only on WuxiaWorld.Site.
-
November 4, 2023 at 4:46 pm #8977
Hello,
First of all, thank you for your purchase.
1. Wuxiaworld uses scraping protection, so Puppeteer is required for it. Please send me an example novel URL that is not scraped using Puppeteer.
2. Featured images have special protection on them, and I have not been able to download them since the source site was updated in February 2023. I will investigate more methods to download these images, but more research is needed.
3. Please use only wuxiaworld scraping, as other sites are not supported in the wuxiaworld scraper of the plugin.
Regards, Szabi – CodeRevolution.
-
November 4, 2023 at 4:49 pm #8978
Also, in the Release Year field, why is the plugin using this format: 2019-01-01 00:00
Why isn’t it just 2019?
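For reference, a value in that format can be reduced to just the year after the fact. The following is a minimal sketch in Python, assuming the field always arrives as `YYYY-MM-DD HH:MM`; the function name `release_year` is hypothetical and is not part of the plugin.

```python
from datetime import datetime


def release_year(value: str) -> str:
    """Return only the year from a 'YYYY-MM-DD HH:MM' string.

    Assumption: the field always uses this exact format. A bare
    four-digit year is passed through unchanged.
    """
    value = value.strip()
    if len(value) == 4 and value.isdigit():
        return value
    return str(datetime.strptime(value, "%Y-%m-%d %H:%M").year)


print(release_year("2019-01-01 00:00"))
```

Parsing the full timestamp rather than slicing the first four characters means a malformed value raises an error instead of silently producing a wrong year.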
-
November 4, 2023 at 4:52 pm #8979
Here is an example of the novel not scraped using Puppeteer:
https://readwebnovelsfree.com/novel/cuddle/
I have a set scraping rule for NewNovels.Org in the NewNovels scraper and a Wuxia rule in Wuxia scraper.
Yet, the NewNovels Scraper refuses to work.
-
November 4, 2023 at 5:01 pm #8980
I am sorry, but you can scrape links only from wuxiaworld, not from other sites.
Regards.
-
November 4, 2023 at 5:14 pm #8982
Buddy, the example link was from my site where the scraped content is going (getting published FROM WuxiaWorld).
Here is the WuxiaWorld link from which I tried to scrape the content: https://wuxiaworld.site/novel/cuddle/
Do you understand now?
Wuxia World Link: https://wuxiaworld.site/novel/cuddle/ ———-> Scraped content went here ———–> https://readwebnovelsfree.com/novel/cuddle/
I am not trying to scrape content from ReadWebNovelsFree.Com. It is my site. I have the minimum sense needed to know that I need to scrape content from WuxiaWorld.Site.
—————–
Let’s be more clear. Shall we?
Your plugin has three scrapers for WebNovels:
1. BoxNovels – I am not using it at all.
2. NewNovels – The rule I have set here is not working at all!
3. WuxiaWorld – The rule set here works with both Puppeteer and WordPress, but it fails to parse content/chapters for some novels.
Do you understand now?
I guess I am not so stupid as to not understand these basic facts:
1. WuxiaWorld scraper will scrape only from WuxiaWorld.
2. BoxNovel scraper will scrape only from BoxNovel.
3. NewNovels scraper will scrape only from NewNovels.
——————
Finally, I need to figure out why the publication year is taking a format like this: 2019-01-01 00:00
Why isn’t the Release Year just using the “year”?
-
November 4, 2023 at 6:12 pm #8984
Ok, I understand now. Sorry for missing this.
Please send me temporary admin login credentials to your site and I will check on this. My email is kisded@yahoo.com
Regards.
-
November 4, 2023 at 6:25 pm #8985
If you could just fix two things, it would be awesome:
1. Plugin failing to parse chapters – it is not possible to update chapters manually. So, try and solve this issue.
2. The Release date format – can it just take the year instead of year, date and time?
I sent you the login credentials.
-
November 4, 2023 at 9:09 pm #8986
Hello,
I tried to scrape https://wuxiaworld.site/novel/cuddle/ on your site, and in the plugin's Activity and Logging menu I see the following log, which points to Puppeteer not being installed correctly on your server:
Error: Could not find Chrome (ver. 118.0.5993.70). This can occur if either 1. you did not perform an installation before running the script (e.g. npm install) or 2. your cache path is incorrectly configured (which is: /home/readwebnovelsfree.com/.cache/puppeteer). For (2), check out our guide on configuring puppeteer
Please check on this and fix the Puppeteer installation on your server.
Regards.
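One way to check what the error describes is to look inside the cache path from the message for a downloaded Chrome build. The following is a minimal sketch in Python, assuming browsers are stored in a `chrome` subfolder with one directory per downloaded build; the helper name `find_chrome_builds` is hypothetical.

```python
from pathlib import Path


def find_chrome_builds(cache_dir: str) -> list[str]:
    """List the build folders found under <cache_dir>/chrome.

    Assumption: builds are stored as <cache_dir>/chrome/<platform>-<version>/.
    An empty list means no Chrome build was found in this cache path.
    """
    chrome_root = Path(cache_dir) / "chrome"
    if not chrome_root.is_dir():
        return []
    return sorted(p.name for p in chrome_root.iterdir() if p.is_dir())


# Cache path taken from the error message in the post above.
builds = find_chrome_builds("/home/readwebnovelsfree.com/.cache/puppeteer")
print(builds or "no Chrome build found in this cache path")
```

If the list is empty, or the expected version is missing from it, either Chrome was never downloaded or the scraper is looking in a different cache path than the one the install step wrote to.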
-
November 4, 2023 at 9:48 pm #8987
I don’t understand why you are stuck with puppeteer when scraping Wuxia with default wordpress is working fine.
You said that you cannot download images from Wuxia since feb 2023. I’m guessing you cannot download images even with puppeteer installed correctly. So puppeteer isn’t really that fabulous on a VPS.
I have a bare metal dedicated server that is more powerful than a VPS can ever be, and you told me (in my previous ticket) to switch to a vps. Looks to me like you didn’t really test your plugin with all server types – definitely not with a dedicated server.
You are talking only and only about puppeteer. Have you tried WordPress scraping? What about the remaining problems of the release date format, newnovels.org scraper not working? Are they also dependent on puppeteer?
“Unable to scrape” is a problem with both puppeteer and wordpress. And you know what? 80% or more of the content I scraped from Wuxia used wordpress.
Seems to me that you do not want to provide support or you are just not efficient with a dedicated server.
-
November 4, 2023 at 9:50 pm #8988
I don’t understand why you are stuck with puppeteer when scraping Wuxia with default wordpress is working fine.
You said that you cannot download images from Wuxia since feb 2023. I’m guessing you cannot download images even with puppeteer installed correctly. So puppeteer isn’t really that fabulous on a VPS.
I have a bare metal dedicated server that is more powerful than a VPS can ever be, and you told me (in my previous ticket) to switch to a vps. Looks to me like you didn’t really test your plugin with all server types – definitely not with a dedicated server.
You are talking only and only about puppeteer. Have you tried WordPress scraping? What about the remaining problems of the release date format, newnovels.org scraper not working? Are they also dependent on puppeteer?
“Unable to scrape” is a problem with both puppeteer and wordpress. And you know what? 80% or more of the content I scraped from Wuxia used wordpress.
Seems to me that you do not want to provide support or you are just not efficient with a dedicated server.
-
November 4, 2023 at 9:55 pm #8989
I need a refund. I don’t need your plugin. I will find something else that works on a bare-metal dedicated server. How do I get a refund?
-
November 4, 2023 at 10:07 pm #8990
I am sorry, but for proper scraping of wuxiaworld, Puppeteer is required, as I mentioned before. WordPress scraping also works, but not reliably. This is why I am insisting on Puppeteer in this matter, as it will solve the remaining 20% of the scraping issues you mentioned.
Regarding a refund, sure, if you want this, you can make a refund request, here: https://codecanyon.net/refund_requests/new
I am sorry for the inconvenience.
-
November 4, 2023 at 10:09 pm #8991
Well, I already initiated a refund request with CodeCanyon.
-
The topic ‘Unable to Parse Chapters’ is closed to new replies.