Unable to Parse Chapters | CodeRevolution Support

This topic is: resolved

Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses might be delayed.

This topic has 13 replies, 2 voices, and was last updated 2 years, 4 months ago by Sankalan.

Viewing 13 reply threads

Author

Posts
- November 4, 2023 at 4:08 pm #8973
  
  Sankalan
  Participant
  
  Post count: 16
  
  I am trying to scrape WuxiaWorld.Site.
  
  The scraping method I use is the WordPress default (Puppeteer also works, but only partially).
  
  I have two issues with both the WordPress and Puppeteer methods:
  
  1. For some novels, chapters are not scraped at all.
  
  2. Images (featured images) are not pulled from the website.
  
  Please check the attachments.
  
  Another problem:
  
  Although I have scraping rules set up for both NewNovels.org and WuxiaWorld.Site, the plugin skips the NewNovels.org tasks and runs only the WuxiaWorld.Site ones.
  
  
- November 4, 2023 at 4:46 pm #8977
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello,
  
  First of all, thank you for your purchase.
  
  1. WuxiaWorld uses scraping protection, so Puppeteer is required for it. Please send me an example novel URL that is not scraped when using Puppeteer.
  
  2. Featured images have special protection on them. Since an update of the source site in February 2023, I have not been able to download them. I will be investigating more methods to download these images, but more research is needed.
  
  3. Please scrape only WuxiaWorld with the plugin's WuxiaWorld scraper, as other sites are not supported by it.
  
  Regards, Szabi – CodeRevolution.
  
- November 4, 2023 at 4:49 pm #8978
  
  Sankalan
  Participant
  
  Post count: 16
  
  Also, why does the plugin use this format in the Release Year field: 2019-01-01 00:00?
  
  Why isn’t it just 2019?
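  A minimal sketch of the year-only value being asked for, assuming the release date reaches the site as a "YYYY-MM-DD HH:MM" string like the one above. The helper name release_year is hypothetical: it is not part of the plugin, and this thread does not say where such post-processing could hook in.

```python
from datetime import datetime


def release_year(value: str) -> str:
    """Hypothetical helper: reduce a 'YYYY-MM-DD HH:MM' timestamp to its year."""
    # Parse strictly so malformed values raise ValueError instead of
    # being silently truncated to whatever the first four characters are.
    parsed = datetime.strptime(value.strip(), "%Y-%m-%d %H:%M")
    return str(parsed.year)


print(release_year("2019-01-01 00:00"))  # 2019
```

  Parsing the full timestamp instead of slicing the first four characters is a deliberate choice here: an unexpected format fails loudly rather than producing a wrong year.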
  
- November 4, 2023 at 4:52 pm #8979
  
  Sankalan
  Participant
  
  Post count: 16
  
  Here is an example of a novel that was not scraped using Puppeteer:
  
  https://readwebnovelsfree.com/novel/cuddle/
  
  I have a scraping rule set up for NewNovels.org in the NewNovels scraper and a WuxiaWorld rule in the WuxiaWorld scraper.
  
  Yet, the NewNovels Scraper refuses to work.
  
- November 4, 2023 at 5:01 pm #8980
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  I am sorry, but you can only scrape links from WuxiaWorld, not from other sites.
  
  Regards.
  
- November 4, 2023 at 5:14 pm #8982
  
  Sankalan
  Participant
  
  Post count: 16
  
  Buddy, the example link is from my own site, where the content scraped from WuxiaWorld gets published.
  
  Here is the WuxiaWorld link from which I tried to scrape the content: https://wuxiaworld.site/novel/cuddle/
  
  Do you understand now?
  
  WuxiaWorld link: https://wuxiaworld.site/novel/cuddle/ → scraped content published at: https://readwebnovelsfree.com/novel/cuddle/
  
  I am not trying to scrape content from ReadWebNovelsFree.com. It is my site. I have enough sense to know that I need to scrape content from WuxiaWorld.Site.
  
  
  Let’s be clearer, shall we?
  
  Your plugin has three scrapers for web novels:
  
  1. BoxNovels – I am not using it at all.
  
  2. NewNovels – The rule I have set here is not working at all!
  
  3. WuxiaWorld – The rule set here works with both Puppeteer and WordPress, but it fails to parse content/chapters for some novels.
  
  Do you understand now?
  
  I guess I am not so stupid that I don’t understand these basic facts:
  
  1. WuxiaWorld scraper will scrape only from WuxiaWorld.
  
  2. BoxNovel scraper will scrape only from BoxNovel.
  
  3. NewNovels scraper will scrape only from NewNovels.
  
  
  Finally, I need to figure out why the publication year is stored in a format like this: 2019-01-01 00:00
  
  Why isn’t the Release Year just the year?
  
- November 4, 2023 at 6:12 pm #8984
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Ok, I understand now. Sorry for missing this.
  
  Please send me temporary admin login credentials to your site and I will check on this. My email is kisded@yahoo.com
  
  Regards.
  
- November 4, 2023 at 6:25 pm #8985
  
  Sankalan
  Participant
  
  Post count: 16
  
  If you could just fix two things, it would be awesome:
  
  1. The plugin fails to parse chapters, and it is not possible to update chapters manually, so please try to solve this issue.
  
  2. The Release date format – can it just take the year instead of year, date and time?
  
  I sent you the login credentials.
  
- November 4, 2023 at 9:09 pm #8986
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  Hello,
  
  I tried to scrape https://wuxiaworld.site/novel/cuddle/ on your site, and the plugin's Activity and Logging menu shows the following log, which indicates that Puppeteer is not installed correctly on your server. Please check:
  
  Error: Could not find Chrome (ver. 118.0.5993.70). This can occur if either 1. you did not perform an installation before running the script (e.g. npm install) or 2. your cache path is incorrectly configured (which is: /home/readwebnovelsfree.com/.cache/puppeteer). For (2), check out our guide on configuring puppeteer
  
  Please check on this and fix the Puppeteer installation on your server.
  
  Regards.
  
- November 4, 2023 at 9:48 pm #8987
  
  Sankalan
  Participant
  
  Post count: 16
  
  I don’t understand why you are stuck on Puppeteer when scraping WuxiaWorld with the default WordPress method is working fine.
  
  You said that you have not been able to download images from WuxiaWorld since February 2023. I’m guessing you cannot download images even with Puppeteer installed correctly. So Puppeteer isn’t really that fabulous on a VPS.
  
  I have a bare-metal dedicated server that is more powerful than a VPS can ever be, and you told me (in my previous ticket) to switch to a VPS. It looks to me like you didn’t really test your plugin with all server types, and definitely not with a dedicated server.
  
  You are talking only about Puppeteer. Have you tried WordPress scraping? What about the remaining problems: the release date format and the NewNovels.org scraper not working? Are they also dependent on Puppeteer?
  
  “Unable to scrape” is a problem with both Puppeteer and WordPress. And you know what? 80% or more of the content I scraped from WuxiaWorld used WordPress.
  
  It seems to me that either you do not want to provide support or you just cannot handle a dedicated server.
  
- November 4, 2023 at 9:50 pm #8988
  
  Sankalan
  Participant
  
  Post count: 16
  
  I don’t understand why you are stuck on Puppeteer when scraping WuxiaWorld with the default WordPress method is working fine.
  
  You said that you have not been able to download images from WuxiaWorld since February 2023. I’m guessing you cannot download images even with Puppeteer installed correctly. So Puppeteer isn’t really that fabulous on a VPS.
  
  I have a bare-metal dedicated server that is more powerful than a VPS can ever be, and you told me (in my previous ticket) to switch to a VPS. It looks to me like you didn’t really test your plugin with all server types, and definitely not with a dedicated server.
  
  You are talking only about Puppeteer. Have you tried WordPress scraping? What about the remaining problems: the release date format and the NewNovels.org scraper not working? Are they also dependent on Puppeteer?
  
  “Unable to scrape” is a problem with both Puppeteer and WordPress. And you know what? 80% or more of the content I scraped from WuxiaWorld used WordPress.
  
  It seems to me that either you do not want to provide support or you just cannot handle a dedicated server.
  
- November 4, 2023 at 9:55 pm #8989
  
  Sankalan
  Participant
  
  Post count: 16
  
  I need a refund. I don’t need your plugin. I will find something else that works on a bare-metal dedicated server. How do I get a refund?
  
- November 4, 2023 at 10:07 pm #8990
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 5097
  
  I am sorry, but as I mentioned before, Puppeteer is required for proper scraping of WuxiaWorld. WordPress scraping also works, but not reliably. This is why I am insisting on Puppeteer here: it will solve the remaining 20% of the scraping issues you mentioned.
  
  Regarding a refund, sure, if you want this, you can make a refund request, here: https://codecanyon.net/refund_requests/new
  
  I am sorry for the inconvenience.
  
- November 4, 2023 at 10:09 pm #8991
  
  Sankalan
  Participant
  
  Post count: 16
  
  Well, I already initiated a refund request with CodeCanyon.
  

The topic ‘Unable to Parse Chapters’ is closed to new replies.
