Author bloc removal issue

This topic is: resolved

 

This topic has 3 replies, 2 voices, and was last updated 2 years, 1 month ago by Szabi – CodeRevolution.

Viewing 3 reply threads
  • Author
    Posts
    • #5986


      ZeWeb
      Participant
      Post count: 1

      Hello,

      I have been working with your plugin for 2 days and it works mostly OK for my usage.

      There is one thing that I cannot achieve (on many sources): removing the author block under the scraped post content.

      I tried every approach I could think of with the class or HTML ID (often not possible because the post ID is included in the HTML ID).

      What is the trick to get rid of author blocks?

      I tried many sites, including techoffside.com, androidgeek.pt, …

      Thanks for your help

    • #5995


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      You can remove author blocks using 2 different methods:

      1. By selecting the exact part of the HTML page you want to scrape (without including the author block). Please check these tutorial videos for info on this: https://www.youtube.com/watch?v=b-_n-q08kXA + https://www.youtube.com/watch?v=eBZulBbvDL0 + https://www.youtube.com/watch?v=Rf755vrzvVc

      2. However, for many sources, including the ones you listed, the above method will not work, because the sites include the author info inside the HTML block of the post itself. In this case, you can use the ‘Strip HTML Elements by Class’, ‘Strip HTML Elements by ID’ or ‘Run Regex On Content’ fields in the importing rule settings to remove parts of the scraped content.

      For example, in the case of AndroidGeek.pt, you can use the settings below:

       

      Try to Get Full Article Content:
      checked

      Full Content Query Type:
      XPath

      HTML Search Query String:
      //*[@class='entry-content clearfix single-post-content']

      Run Regex On Content:
      <div>\n?<div data-adid="[\s\S]*
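
      To make the effect of the regex above easier to see, here is a minimal Python sketch of the pattern written with a straight double quote. It is only an illustration of what the pattern matches: the plugin applies the regex with its own engine, and the sample markup and the function name `strip_trailing_block` are hypothetical.

```python
import re

# The pattern from the settings above, with a straight double quote.
# It matches "<div>", an optional newline, '<div data-adid="',
# and then everything up to the end of the content.
PATTERN = re.compile(r'<div>\n?<div data-adid="[\s\S]*')


def strip_trailing_block(html: str) -> str:
    """Remove everything from the first match of PATTERN to the end."""
    return PATTERN.sub("", html)


# Hypothetical scraped content; the real page markup may differ.
sample = (
    '<p>Article text.</p>\n'
    '<div>\n<div data-adid="293581">ad</div></div>\n'
    '<div class="author-box">Author bio</div>'
)

cleaned = strip_trailing_block(sample)
print(repr(cleaned))
```

      Because `[\s\S]*` is greedy and also matches newlines, everything after the first matching wrapper is cut, so this approach only helps when the author block comes after that wrapper in the scraped content.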

       

      Regards, Szabi – CodeRevolution.

    • #6084


      ZeWeb
      Participant
      Post count: 1

      Hello @Szabi

      Thanks for your reply. Since point 1 will not work (as you wrote above), I went straight to point 2.

      Unfortunately it did not work: the author bio is still there, but at least the author name has disappeared.

      The page I am referring to is this one: https://androidgeek.pt/samsung-galaxy-tab-s8-fe-visto-com-android-13-no-geekbench

      I suppose that the regex <div>\n?<div data-adid="[\s\S]* is supposed to do the cleaning, but I do not really understand it. I checked the source code of the page and "div data-adid" is not present.

      I found some data-adid attributes like this:

      itemtype="https://schema.org/WPAdBlock" data-adid="293581"
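
      The fragment quoted above shows data-adid following an itemtype attribute, which would explain why the literal text "div data-adid" is not found in the page source. As a hedged sketch only (the tag name, the sample markup, and the function name `cut_from_ad_block` are assumptions, and this is not confirmed as the fix for this site), a pattern that tolerates other attributes before data-adid could look like this in Python:

```python
import re

# Pattern from the earlier reply: requires data-adid to be the first attribute.
ORIGINAL = re.compile(r'<div>\n?<div data-adid="[\s\S]*')

# Assumption: the ad wrapper is some element that carries a data-adid
# attribute, possibly after other attributes such as itemtype.
AD_TO_END = re.compile(r'<[a-zA-Z][^>]*\sdata-adid="[\s\S]*')


def cut_from_ad_block(html: str) -> str:
    """Drop everything from the first element with data-adid to the end."""
    return AD_TO_END.sub("", html)


# Hypothetical markup modelled on the attribute fragment quoted above.
sample = (
    '<p>Article text.</p>'
    '<div itemscope itemtype="https://schema.org/WPAdBlock" data-adid="293581">ad</div>'
    '<div class="author-box">Author bio</div>'
)

print(ORIGINAL.search(sample))            # no match on this sample
print(repr(cut_from_ad_block(sample)))
```

      As with the original regex, this removes everything after the matched element, so it only drops the author bio if the bio comes later in the scraped content.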

      Here are the settings for this rule, in case something else is wrong: https://share.getcloudapp.com/BluWWPGe

      Thanks

    • #6085


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      Can you please send me temporary admin login credentials for your WordPress install, so I can check this issue directly on your site? Please send them to my email address: kisded@yahoo.com

      Regards, Szabi – CodeRevolution.

The topic ‘Author bloc removal issue’ is closed to new replies.