How to remove specific html attributes from content crawling

This topic is: resolved

 

Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses might be delayed.

This topic has 3 replies, 2 voices, and was last updated 10 months, 2 weeks ago by Szabi – CodeRevolution.

Viewing 3 reply threads
    • #9799


      acluke
      Participant
      Post count: 11

      Hi,

      I used //div[@id='procuct-table'] to crawl a spec list from a page.

      But the element has a class, class="tab-pane fade py-sm-5", that causes the list to be hidden on my page. I can't strip all the HTML because the list is a table.

      Is it possible to remove only this class attribute when crawling?

      Here is the content table I crawled, for reference:

      https://sanlux.com.tw/product-detail/600

      <div id="procuct-table" class="tab-pane fade py-sm-5" role="tabpanel" aria-labelledby="nav-profile-tab">
      <table id="tab" class="tableCompare indetail table">
      <tbody>
      <tr class="tr2" data-rttitle="Spec">
      <td class="sort-leA text-sm-end text-center pe-sm-3">Spec</td>…

      Thanks, Luke
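Outside the plugin, what Luke is asking for (select the element by id, then drop only its class attribute and keep everything inside it) can be sketched with the Python standard library. This is a minimal sketch, not the plugin's implementation: it assumes the crawled markup has been reduced to a well-formed snippet, because `xml.etree.ElementTree` is an XML parser, and the function name `extract_without_class` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the crawled markup (assumption: real pages
# usually need an HTML-tolerant parser; ElementTree only accepts well-formed XML).
SNIPPET = """
<body>
  <div id="procuct-table" class="tab-pane fade py-sm-5" role="tabpanel" aria-labelledby="nav-profile-tab">
    <table id="tab" class="tableCompare indetail table">
      <tbody>
        <tr class="tr2" data-rttitle="Spec">
          <td class="sort-leA text-sm-end text-center pe-sm-3">Spec</td>
        </tr>
      </tbody>
    </table>
  </div>
</body>
""".strip()


def extract_without_class(markup: str, element_id: str) -> str:
    """Hypothetical helper: select a <div> by id and remove only its class attribute."""
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    target = root.find(f".//div[@id='{element_id}']")
    if target is None:
        raise ValueError(f"no <div> with id={element_id!r}")
    # Drop the attribute; the nested spec table is left untouched.
    target.attrib.pop("class", None)
    return ET.tostring(target, encoding="unicode")


result = extract_without_class(SNIPPET, "procuct-table")
print(result)
```

Only the outer `div` loses its `class`; the table keeps its own classes, so the page's table styling is unaffected.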

    • #9805


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      Hello,

      Sure, please add the following in the settings:

      Strip HTML Elements by Class
      tab-pane fade py-sm-5

      Regards,

      Szabi – CodeRevolution.

    • #9813


      acluke
      Participant
      Post count: 11

      Hi, I just tried it, but it also removed the element I need to crawl.

      I need //div[@id="procuct-table"], but I want to remove its class attribute, "tab-pane fade py-sm-5". Is that possible?

      Here is the HTML for reference, thanks.

      <div id="procuct-table" class="tab-pane fade py-sm-5" role="tabpanel" aria-labelledby="nav-profile-tab">
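The most likely reason "Strip HTML Elements by Class" removed the table is that the `div` selected by the XPath carries those classes itself, so stripping elements with that class deletes the container and everything nested in it. The sketch below models that behaviour in Python; `strip_elements_by_class` is a hypothetical stand-in that assumes the filter deletes every element having all of the listed classes. It is not the plugin's actual code.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<body>
  <div id="procuct-table" class="tab-pane fade py-sm-5" role="tabpanel">
    <table id="tab" class="tableCompare indetail table">
      <tr><td>Spec</td></tr>
    </table>
  </div>
</body>"""


def strip_elements_by_class(markup: str, classes: str) -> str:
    """Hypothetical model of a 'strip elements by class' filter: every element
    carrying all of the given classes is deleted with its whole subtree."""
    wanted = set(classes.split())
    root = ET.fromstring(markup)
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if wanted <= set(child.get("class", "").split()):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


stripped = strip_elements_by_class(SNIPPET, "tab-pane fade py-sm-5")
print(stripped)
```

Because the match is on the container, the spec table goes with it, which is the behaviour described above.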

    • #9815


      Szabi – CodeRevolution
      Keymaster
      Post count: 4620

      In this case, please try:

      Run Regex On Content:
      tab-pane fade py-sm-5

      Regards.
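A regex works on the crawled text rather than on the element tree, so the `div` and its table survive. Below is a minimal Python sketch of the two options, assuming "Run Regex On Content" replaces matches with an empty string when no replacement is given (an assumption, not confirmed in this thread). The literal pattern from the reply leaves an empty `class=""` behind; the tighter pattern is a hypothetical alternative that removes the whole attribute.

```python
import re

# Simplified crawled output (assumption: the table body is shortened for the example).
CRAWLED = ('<div id="procuct-table" class="tab-pane fade py-sm-5" role="tabpanel" '
           'aria-labelledby="nav-profile-tab"><table id="tab"><tr><td>Spec</td></tr></table></div>')

# Option A: the literal pattern from the reply, replaced with nothing.
literal = re.sub(r"tab-pane fade py-sm-5", "", CRAWLED)

# Option B (hypothetical): also match the attribute name and the leading
# whitespace, so no empty class="" is left on the element.
attribute = re.sub(r'\s+class="tab-pane fade py-sm-5"', "", CRAWLED)

print(literal)
print(attribute)
```

Either way the table markup is untouched; Option B just produces cleaner HTML.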


The topic ‘How to remove specific html attributes from content crawling’ is closed to new replies.