This topic has 3 replies, 2 voices, and was last updated 2 years, 1 month ago by Szabi – CodeRevolution.
-
October 1, 2022 at 8:12 am #5986
Hello,
I have been working with your plugin for 2 days and it works mostly OK for my usage.
There is one thing that I cannot achieve (on many sources): removing the author block under the scraped post content.
I tried every approach I could think of with the class or HTML ID (often not possible because the post ID is included in the HTML ID).
What is the trick to get rid of author blocks?
I tried many sites, including techoffside.com, androidgeek.pt, …
Thanks for your help
-
October 1, 2022 at 7:40 pm #5995
Hello,
First of all, thank you for your purchase.
You can remove author blocks using 2 different methods:
1. By selecting the exact part of the HTML page you want to scrape (without including the author block). Please check these tutorial videos for info on this: https://www.youtube.com/watch?v=b-_n-q08kXA + https://www.youtube.com/watch?v=eBZulBbvDL0 + https://www.youtube.com/watch?v=Rf755vrzvVc
2. However, for many sources, including the ones you listed, the above method will not work, because the sites include the author info inside the HTML block of the post. In this case, you can use the ‘Strip HTML Elements by Class’, ‘Strip HTML Elements by ID’ or ‘Run Regex On Content’ settings fields in the importing rule settings to remove parts of the scraped content.
For example, for AndroidGeek.pt, you can use the settings below:
Try to Get Full Article Content: checked
Full Content Query Type: XPath
HTML Search Query String: //*[@class='entry-content clearfix single-post-content']
Run Regex On Content: <div>\n?<div data-adid="[\s\S]*
Regards, Szabi – CodeRevolution.
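To illustrate what the regex in the settings above is meant to do, here is a minimal Python sketch. It assumes the plugin replaces every match with an empty string, which the thread does not confirm; the sample markup, the `author-box` class and the `strip_trailing_block` helper are hypothetical and only serve the illustration.

```python
import re

# Pattern quoted in the reply above, written with straight quotes.
PATTERN = r'<div>\n?<div data-adid="[\s\S]*'


def strip_trailing_block(html: str) -> str:
    """Delete everything from the first match of PATTERN to the end of html.

    Assumption: a match is replaced with an empty string.
    """
    return re.sub(PATTERN, "", html)


# Hypothetical scraped content: article text, then a wrapper <div> that
# holds an ad block followed by the author block.
sample = (
    "<p>Article body.</p>\n"
    "<div>\n"
    '<div data-adid="293581">ad</div>\n'
    '<div class="author-box">About the author</div>\n'
    "</div>"
)

print(repr(strip_trailing_block(sample)))
```

The pattern looks for a bare `<div>`, an optional newline, and then `<div data-adid="`. Because `[\s\S]*` matches any character including newlines and is greedy, the match runs to the end of the content, so everything after that point (author block included) is dropped. If the scraped content never contains that exact opening sequence, the content is left unchanged.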
-
October 7, 2022 at 3:16 pm #6084
Hello @Szabi
Thanks for your reply. As point 1 will not work (you wrote it above), I directly tried point 2.
Unfortunately it did not work: the author bio is still there, but at least the author name has disappeared.
The page I refer to is this one https://androidgeek.pt/samsung-galaxy-tab-s8-fe-visto-com-android-13-no-geekbench
I suppose that the regex <div>\n?<div data-adid="[\s\S]* is supposed to do the cleaning, but I do not really understand it. I checked the source code of the page and "div data-adid" is not present.
I found some data-adid attributes like this:
itemtype="https://schema.org/WPAdBlock" data-adid="293581"
Here are the settings for this rule: https://share.getcloudapp.com/BluWWPGe. Maybe something else is wrong?
Thanks
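One possible reason for the mismatch described in this post, sketched in Python: the original pattern requires `data-adid` to be the first attribute of a `<div>` that directly follows a bare `<div>`, so an element that carries `itemtype` before `data-adid` would not match. The thread does not confirm this diagnosis. The sample markup, the assumption that the element is a `<div>`, the `author-bio` class, the `TOLERANT` pattern and the `apply_rule` helper are all hypothetical.

```python
import re

# Pattern from the earlier reply, written with straight quotes.
ORIGINAL = r'<div>\n?<div data-adid="[\s\S]*'

# Hypothetical, more tolerant variant: other attributes may come
# before data-adid, and no bare wrapper <div> is required.
TOLERANT = r'<div[^>]*\sdata-adid="[\s\S]*'

# Hypothetical markup modeled on the attributes quoted in this post.
sample = (
    "<p>Article body.</p>\n"
    '<div itemtype="https://schema.org/WPAdBlock" data-adid="293581">ad</div>\n'
    '<div class="author-bio">About the author</div>'
)


def apply_rule(pattern: str, html: str) -> str:
    """Remove every match of pattern from html (assumed plugin behavior)."""
    return re.sub(pattern, "", html)


# The original pattern finds nothing in this sample, so nothing is removed.
print(re.search(ORIGINAL, sample) is None)

# The tolerant variant removes the ad element and everything after it.
print(repr(apply_rule(TOLERANT, sample)))
```

Note that either pattern deletes everything after the matched position, so this approach only suits pages where no article text follows the first `data-adid` element.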
-
October 7, 2022 at 9:16 pm #6085
Hello,
Could you please send me temporary admin login credentials for your WordPress install, so I can check this issue directly on your site? Please send them to my email address: kisded@yahoo.com
Regards, Szabi – CodeRevolution.
-
The topic ‘Author bloc removal issue’ is closed to new replies.