Proxy problems
Hi Szabi, I tried to scrape products (prices, descriptions, etc.) from ceneo.pl, but it failed because the site has reCAPTCHA enabled. I tried a single static proxy, multiple static proxies and a rotating proxy, but none of them solved the problem.
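To check whether the failing pages really are reCAPTCHA challenges (and not just slow responses), one quick heuristic is to look for Google's standard reCAPTCHA markup in the returned HTML. This is a minimal sketch. It assumes the challenge page uses the usual widget class or loader script, and the function name is my own, not part of Crawlomatic or Puppeteer:

```javascript
// Heuristic check: does a fetched HTML document look like a Google reCAPTCHA
// challenge instead of the expected product listing?
// Assumption: the challenge page embeds Google's standard reCAPTCHA markup.
const RECAPTCHA_MARKERS = [
  // "g-recaptcha" class on the widget container (also matches g-recaptcha-* classes)
  /class\s*=\s*["'][^"']*\bg-recaptcha\b/i,
  // widget loader script served from google.com (standard or Enterprise)
  /google\.com\/recaptcha\/(api|enterprise)\.js/i,
  // alternative loader domain used where google.com is not reachable
  /recaptcha\.net\/recaptcha\//i,
];

// Returns true if any known reCAPTCHA marker appears in the HTML string.
function looksLikeRecaptchaPage(html) {
  return RECAPTCHA_MARKERS.some((re) => re.test(html));
}

module.exports = { looksLikeRecaptchaPage };
```

It could be run on HTML obtained with Puppeteer's page.content(). If it returns true for the URLs that time out, that would point to the captcha, rather than the proxy type, as the blocker.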
The error I get:
[28-Dec-2022 11:59:40 UTC] Now processing: https://www.ceneo.pl/Zegarki/Typ:Meskie.htm
[28-Dec-2022 11:59:40 UTC] Puppeteer command: node "/var/www/html/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.ceneo.pl/Zegarki/Typ:Meskie.htm" "23.109.113.60:9001~~~9vRzeMMeNZYAOAln:wifi;af;;;" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
[28-Dec-2022 12:00:11 UTC] puppeteer failed to download resource: https://www.ceneo.pl/Zegarki/Typ:Meskie.htm - error: /var/www/html/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js:10 process.on('unhandledRejection', up => { throw up }) ^ TimeoutError: Navigation timeout of 30000 ms exceeded at LifecycleWatcher._LifecycleWatcher_createTimeoutPromise (/var/www/html/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:167:12)
[28-Dec-2022 12:00:11 UTC] Delay between requests set(1), waiting 1000 ms
[28-Dec-2022 12:00:22 UTC] crawlomatic_str_get_html failed for page (first attempt), xpath is: product-full-description js_product-full-description overheight!
[28-Dec-2022 12:00:22 UTC] crawlomatic_str_get_html failed for page, xpath: product-full-description js_product-full-description overheight!
[28-Dec-2022 12:00:22 UTC] Already posted, skipping: https://www.ceneo.pl/Zegarki/Typ:Meskie.htm - ID: 1117
[28-Dec-2022 12:00:22 UTC] Crawling seed page for links: https://www.ceneo.pl/Zegarki/Typ:Meskie.htm using: visual = lazyloaded
[28-Dec-2022 12:00:22 UTC] 0 items scraped for URL: https://www.ceneo.pl/Zegarki/Typ:Meskie.htm
[28-Dec-2022 12:00:22 UTC] All crawled posts are already posted or no content found for your query. Rule ID: 1: visual -- lazyloaded
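Separately, the "xpath" value in the log (product-full-description js_product-full-description overheight) is a space-separated list of CSS class names, not an XPath expression. If that rule field really expects XPath (an assumption; I have not checked how Crawlomatic interprets it), a class match can be written with the standard XPath 1.0 contains(concat(...)) idiom. A minimal sketch that builds such an expression from a class string; classListToXPath is a hypothetical helper name:

```javascript
// Build an XPath 1.0 expression selecting elements that carry every class token
// in `classAttr` (a space-separated class list, as copied from the page's HTML).
// Padding @class with spaces ensures "foo" does not match "foobar".
function classListToXPath(classAttr, tag = "*") {
  const tokens = classAttr.trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) {
    return `//${tag}`; // no classes given: match any element of this tag
  }
  const predicates = tokens.map(
    (token) => `contains(concat(' ', normalize-space(@class), ' '), ' ${token} ')`
  );
  return `//${tag}[${predicates.join(" and ")}]`;
}

module.exports = { classListToXPath };
```

Matching only the first token, e.g. classListToXPath("product-full-description"), may be more robust, since classes like js_product-full-description and overheight could be added or removed by the page's scripts (a guess, not verified).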