Scraping problem (problema cu scraping)

This topic is: resolved

 

This topic has 9 replies, 2 voices, and was last updated 1 year, 4 months ago by Szabi – CodeRevolution.

Viewing 9 reply threads
  • Author
    Posts
    • #7883


      dbs00
      Participant
      Beginner
      Post count: 5

      Hi,

      1. How can I scrape https://www.mdlpa.ro/articles/5?

      2. https://www.google.com/alerts/feeds/13710377664617919520/9256702252630226250

      How can I get the full article from the RSS feed above? The import only takes the excerpt shown in the feed; it does not follow the link to fetch the full content.

      Thanks

    • #7884


      dbs00
      Participant
      Beginner
      Post count: 5

      I checked "get full content".

      It imported a few articles, but it also created an empty one. I attached a screenshot:

      https://drive.google.com/file/d/1MoJPb8Im-9pmGw2wlEpiC33VEZNPBrLp/view?usp=drivesdk

    • #7889


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hi,

      1. To scrape https://www.mdlpa.ro/articles/5, I recommend Crawlomatic: https://1.envato.market/crawlomatic

      However, to import the pages correctly, you will need to install Puppeteer on your server and configure Crawlomatic to use it for scraping. This is necessary because these pages use anti-scraping protection, which Puppeteer can get around.

      Tutorial for installing Puppeteer on the server: https://www.youtube.com/watch?v=pRUDcSOe724

      Tutorial for using Puppeteer with Crawlomatic: https://www.youtube.com/watch?v=ZljpMpmi_dU

      2. Puppeteer will also solve the problem of importing the full content from these sources (it is a similar problem). It can also be used with Echo RSS.

      Good luck!
      Szabi – CodeRevolution.
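
To illustrate point 2: a feed entry usually carries only an excerpt plus a link, so getting the full article means reading each entry's link and loading that page separately. The Python sketch below is a minimal illustration of that first step only, not a description of how Echo RSS or Crawlomatic work internally. It assumes the feed is in Atom format and that entry links may be wrapped in a redirect carrying the real address in a `url` query parameter; the sample feed and the example.com addresses are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample feed: one entry with a redirect-wrapped link, one with a plain link.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First article</title>
    <link href="https://www.google.com/url?rct=j&amp;sa=t&amp;url=https://example.com/article-1&amp;ct=ga"/>
    <content type="html">Short excerpt only</content>
  </entry>
  <entry>
    <title>Second article</title>
    <link href="https://example.com/article-2"/>
    <content type="html">Another excerpt</content>
  </entry>
</feed>"""


def unwrap_redirect(href):
    """Return the value of the `url` query parameter if present, else href unchanged."""
    query = parse_qs(urlparse(href).query)
    return query["url"][0] if "url" in query else href


def entry_links(feed_xml):
    """Collect the target address of every Atom entry that has a link with an href."""
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        if link is not None and link.get("href"):
            links.append(unwrap_redirect(link.get("href")))
    return links


if __name__ == "__main__":
    for address in entry_links(SAMPLE_FEED):
        print(address)
```

Each address returned this way would then have to be opened with a headless browser such as PhantomJS or Puppeteer to obtain the rendered article body.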

    • #7937


      dbs00
      Participant
      Beginner
      Post count: 5

      OK.

      Where do I enter the Puppeteer location?

      For example:

      /home/site/public_html/node_modules/puppeteer
      /usr/local/share/phantomjs/bin/phantomjs

      In the settings I only see a field for PhantomJS.

    • #7940


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      There is no need to enter the location in the settings; the plugin should detect it automatically once it is installed in public_html.

    • #7956


      dbs00
      Participant
      Beginner
      Post count: 5

      Where does Puppeteer need to be installed?

      The hosting support team installed it globally, but your plugin still does not detect it:

      https://drive.google.com/file/d/16sjsx6Nppeyu1r9LdFiCPv6uJIjJCcL8/view?usp=drivesdk

      PhantomJS is OK:

      https://drive.google.com/file/d/1mpTgjpodihoca8KkeImeCpPgcy3ZQh9f/view?usp=drivesdk

      Puppeteer is not:

      https://drive.google.com/file/d/1gM0_eL7T0Mxs9Dk8xH9twz5pwOXLj1DA/view?usp=drivesdk

      Puppeteer not found! Please install it on your server globally.

    • #7959


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hmm, interesting. I have not seen a case before where it is not detected when installed globally. OK, in that case it will need to be installed in:

      1. The public_html folder (where WordPress itself is installed).

      2. If that does not work either, it will need to be installed in \wp-content\plugins\crawlomatic-multipage-scraper-post-generator\res\puppeteer, but in that case it will have to be reinstalled after every plugin update.
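
A quick way to check which of these two locations actually contains a Puppeteer package is to look for its package.json on disk. The Python sketch below is only a diagnostic aid and says nothing about how the plugin itself performs detection. The `wp_root` argument is a placeholder for your own public_html path, and the assumption that an install inside the plugin's res/puppeteer folder also ends up under node_modules/puppeteer is mine, not something confirmed for the plugin.

```python
from pathlib import Path

# Plugin folder named in the reply above, relative to the WordPress root.
PLUGIN_DIR = "wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer"


def candidate_dirs(wp_root):
    """List the directories where a Puppeteer package might live under wp_root."""
    root = Path(wp_root)
    return [
        root / "node_modules" / "puppeteer",
        # Assumed layout: a local npm install inside the plugin folder.
        root / PLUGIN_DIR / "node_modules" / "puppeteer",
    ]


def find_puppeteer(wp_root):
    """Return the first candidate directory containing a package.json, else None."""
    for directory in candidate_dirs(wp_root):
        if (directory / "package.json").is_file():
            return directory
    return None


if __name__ == "__main__":
    wp_root = "/path/to/public_html"  # hypothetical placeholder, replace with your own path
    found = find_puppeteer(wp_root)
    print(found if found else f"No Puppeteer package found under {wp_root}")
```

Finding the package on disk only confirms that the files are present; it does not guarantee that the web server user can read or execute them.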

       

    • #7962


      dbs00
      Participant
      Beginner
      Post count: 5

      Hi,

      As requested, we have reinstalled Puppeteer directly in the public_html directory.

      So, the Puppeteer path is: /home/XXXX/public_html/node_modules/puppeteer

      It is still not detected this way either.

    • #7963


      dbs00
      Participant
      Beginner
      Post count: 5

      Is there any significant difference between PhantomJS and Puppeteer?

      It seems that PhantomJS manages to scrape that site.

      Is it worth going to any more trouble with Puppeteer?

    • #7965


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      No, if it works with PhantomJS, you can use that, no problem.

      Interesting that it does not see Puppeteer. Maybe it is some NodeJS issue… But if PhantomJS works, that is fine.
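
If you want to rule out the NodeJS side, one simple check is whether a `node` executable is visible on the PATH at all. A minimal Python sketch follows. It only reflects the environment of the user who runs it, which may differ from what the web server's PHP process sees, and the function name is purely illustrative.

```python
import shutil
import subprocess


def node_version(which=shutil.which):
    """Return the output of `node --version`, or None if node is not on PATH."""
    node_path = which("node")
    if node_path is None:
        return None
    result = subprocess.run(
        [node_path, "--version"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = node_version()
    print(f"node {version}" if version else "node not found on PATH")
```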
