This topic has 9 replies, 2 voices, and was last updated 1 year, 4 months ago by Szabi – CodeRevolution.
-
June 29, 2023 at 11:29 pm #7883
Hi,
1. How can I scrape https://www.mdlpa.ro/articles/5
2. https://www.google.com/alerts/feeds/13710377664617919520/9256702252630226250
How can I get the full article from the RSS feed above? It only takes what is shown initially; it does not follow the link to fetch the whole content.
Thanks
-
June 29, 2023 at 11:55 pm #7884
I checked "get full content".
It fetched a few articles, but it also created an empty one. I have attached a screenshot:
https://drive.google.com/file/d/1MoJPb8Im-9pmGw2wlEpiC33VEZNPBrLp/view?usp=drivesdk
-
June 30, 2023 at 10:07 am #7889
Hi,
1. To scrape https://www.mdlpa.ro/articles/5 I recommend Crawlomatic: https://1.envato.market/crawlomatic
However, to import the pages correctly, you will need to install Puppeteer on the server and configure Crawlomatic to use it for scraping. This is necessary because these pages use anti-scraping protection, which Puppeteer can get around.
Tutorial for installing Puppeteer on the server: https://www.youtube.com/watch?v=pRUDcSOe724
Tutorial for using Puppeteer with Crawlomatic: https://www.youtube.com/watch?v=ZljpMpmi_dU
2. Puppeteer will also solve the problem of importing the full content from these sources (it is a similar problem). It can also be used with Echo RSS.
Good luck!
Szabi – CodeRevolution. -
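Editor's note on point 2: a feed entry usually carries only a short snippet plus a link, so "full content" means following that link and reading the article page itself. The sketch below only illustrates that idea; it is not how Echo RSS or Crawlomatic work internally. The sample feed is invented, and the assumption that the real article address sits in a `url` query parameter of the entry link is mine, so check it against the actual feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample data shaped like an Atom entry (NOT real feed output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example headline</title>
    <link href="https://www.google.com/url?rct=j&amp;sa=t&amp;url=https://example.com/news/1&amp;ct=ga"/>
    <content type="html">Short snippet only ...</content>
  </entry>
</feed>"""


def unwrap_redirect(link: str) -> str:
    """Return the `url` query parameter if present, else the link unchanged.

    Assumption: redirect-style links carry the target in a `url` parameter.
    """
    query = parse_qs(urlparse(link).query)
    return query.get("url", [link])[0]


def entry_targets(feed_xml: str) -> list:
    """Return (title, article_url) pairs for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    targets = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        targets.append((title, unwrap_redirect(href)))
    return targets


if __name__ == "__main__":
    for title, url in entry_targets(SAMPLE_FEED):
        print(title, "->", url)
```

Each returned address is the page that would then have to be fetched (with a plain HTTP client, or with a headless browser such as Puppeteer or PhantomJS when the site blocks simple requests) to obtain the full article text.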
July 2, 2023 at 8:49 pm #7937
Okay.
Where do I enter the Puppeteer location?
For example:
/home/site/public_html/node_modules/puppeteer
/usr/local/share/phantomjs/bin/phantomjs
In the settings I see there is only a field for PhantomJS.
-
July 3, 2023 at 6:48 am #7940
There is no need to enter the location in the settings; the plugin should detect it automatically once it is installed in public_html.
-
July 3, 2023 at 9:53 pm #7956
Where does Puppeteer need to be installed?
The hosting support team installed it globally, but your plugin still does not detect it:
https://drive.google.com/file/d/16sjsx6Nppeyu1r9LdFiCPv6uJIjJCcL8/view?usp=drivesdk
PhantomJS is OK:
https://drive.google.com/file/d/1mpTgjpodihoca8KkeImeCpPgcy3ZQh9f/view?usp=drivesdk
Puppeteer is not:
https://drive.google.com/file/d/1gM0_eL7T0Mxs9Dk8xH9twz5pwOXLj1DA/view?usp=drivesdk
Puppeteer not found! Please install it on your server globally.
-
July 4, 2023 at 7:26 am #7959
Hmm, interesting. I have not seen a case before where it is not detected when installed globally. OK, in this case it will need to be installed in:
1. The public_html folder (where WordPress itself is installed).
2. If that does not work either, it will need to be installed in \wp-content\plugins\crawlomatic-multipage-scraper-post-generator\res\puppeteer, but in that case it will have to be reinstalled after every plugin update.
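Editor's note: the two install locations listed above can be checked from the server with a small script. This is only a diagnostic sketch and not the plugin's own detection logic, which the thread does not describe. It assumes that installing Puppeteer with npm inside a folder creates `node_modules/puppeteer` under that folder, and the `wp_root` argument is a placeholder for your own WordPress directory.

```python
import os
import shutil
import sys

# Plugin folder name as quoted in the thread.
PLUGIN_DIR = "crawlomatic-multipage-scraper-post-generator"


def candidate_dirs(wp_root: str) -> list:
    """Folders in which the thread suggests installing Puppeteer."""
    return [
        wp_root,
        os.path.join(wp_root, "wp-content", "plugins", PLUGIN_DIR, "res", "puppeteer"),
    ]


def find_puppeteer(wp_root: str) -> list:
    """Return every candidate folder that contains node_modules/puppeteer.

    Assumption: a local npm install leaves the package in that subfolder.
    """
    found = []
    for base in candidate_dirs(wp_root):
        if os.path.isdir(os.path.join(base, "node_modules", "puppeteer")):
            found.append(base)
    return found


def report(wp_root: str) -> str:
    """Build a short text report: is node on PATH, where was Puppeteer found."""
    node = shutil.which("node")
    lines = ["node on PATH: " + (node or "not found")]
    hits = find_puppeteer(wp_root)
    if hits:
        lines.extend("puppeteer found under: " + path for path in hits)
    else:
        lines.append("puppeteer not found in the checked folders")
    return "\n".join(lines)


if __name__ == "__main__":
    # Pass the WordPress root as the first argument; defaults to the current folder.
    print(report(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Note that the result reflects the user and environment the script runs under; the web server process may see a different PATH, so a missing `node` here is a hint rather than a diagnosis.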
-
July 4, 2023 at 11:45 am #7962
Hi,
As requested, we have reinstalled Puppeteer directly in the public_html directory.
So, the Puppeteer path is: /home/XXXX/public_html/node_modules/puppeteer
It is still not detected this way either.
-
July 4, 2023 at 11:54 am #7963
Is there any significant difference between PhantomJS and Puppeteer?
It seems that PhantomJS manages to scrape that site.
Is there any point in complicating things further with Puppeteer?
-
July 4, 2023 at 12:50 pm #7965
No, if it works with PhantomJS, you can use that; it is not a problem.
Interesting that it does not see Puppeteer. Maybe it is some NodeJS issue… But if PhantomJS works, that is fine.
-
The topic ‘problema cu scraping’ is closed to new replies.