CodeRevolution Support

Duplicate posts from Crawlomatic

Hi Szabi,

I'm getting multiple duplicated posts from my Crawlomatic. I don't know what causes it, because some posts are correctly prevented from being duplicated. Here is part of the log:

[26-Mar-2024 04:17:15 Etc/GMT+7] Crawling seed page for links: https://www.wired.com/category/business/social-media/ using: class = summary-item__content
[26-Mar-2024 04:17:15 Etc/GMT+7] Links returned:
[26-Mar-2024 04:17:15 Etc/GMT+7] 0. /story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:15 Etc/GMT+7] 1. /story/reddit-ipo-price-surge/
[26-Mar-2024 04:17:15 Etc/GMT+7] 2. /story/glassdoor-wants-to-know-your-real-name/
[26-Mar-2024 04:17:15 Etc/GMT+7] 3. /story/reddit-ipo-filings-reveal-the-companys-hopes-and-fears/
[26-Mar-2024 04:17:15 Etc/GMT+7] 4. /story/inside-reddit-protest-ipo/
[26-Mar-2024 04:17:15 Etc/GMT+7] 5. /story/764-com-child-predator-network/
[26-Mar-2024 04:17:15 Etc/GMT+7] 6. /story/yoel-roth-twitter-trust-safety-match-dating-apps/
[26-Mar-2024 04:17:15 Etc/GMT+7] 7. /story/influencers-paid-promote-designer-knockoffs-from-china/
[26-Mar-2024 04:17:15 Etc/GMT+7] 8. /story/senator-asks-meta-tiktok-parents-girls-influencer-accounts/
[26-Mar-2024 04:17:15 Etc/GMT+7] 9. /story/meta-hacked-users-draining-resources/
[26-Mar-2024 04:17:15 Etc/GMT+7] 10. /story/facebook-two-factor-authentication-2fa-change/
[26-Mar-2024 04:17:15 Etc/GMT+7] 11. /story/facebook-instagram-whatsapp-and-threads-back-online-outage/
[26-Mar-2024 04:17:15 Etc/GMT+7] 12. /story/elon-musk-lawsuit-hate-speech-x/
[26-Mar-2024 04:17:15 Etc/GMT+7] 13. /story/reddit-power-users-ipo/
[26-Mar-2024 04:17:15 Etc/GMT+7] 14. /story/bluesky-ceo-jay-graber-wont-enshittify-ads/
[26-Mar-2024 04:17:15 Etc/GMT+7] 15. /story/shou-zi-chew-tik-tok-big-interview/
[26-Mar-2024 04:17:15 Etc/GMT+7] 16. /story/ilan-shor-facebook-ads-moldova-elections/
[26-Mar-2024 04:17:15 Etc/GMT+7] 17. /story/rumble-sec-investigation/
[26-Mar-2024 04:17:15 Etc/GMT+7] 18. /story/flip-viral-video-app-shopping-free-stuff/
[26-Mar-2024 04:17:15 Etc/GMT+7] 19. /story/eu-x-twitter-illegal-content/
[26-Mar-2024 04:17:15 Etc/GMT+7] 20. /story/pinterest-gen-z-future/
[26-Mar-2024 04:17:15 Etc/GMT+7] 21. /story/youtube-hiding-channels-ad-revenue/
[26-Mar-2024 04:17:15 Etc/GMT+7] 22. /story/meta-messenger-instagram-end-to-end-encryption/
[26-Mar-2024 04:17:15 Etc/GMT+7] Now processing: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:15 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
[26-Mar-2024 04:17:18 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
[26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:29 Etc/GMT+7] Now processing: https://www.wired.com/story/reddit-ipo-price-surge/
[26-Mar-2024 04:17:29 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
[26-Mar-2024 04:17:32 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/reddit-ipo-price-surge/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
[26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: https://www.wired.com/story/reddit-ipo-price-surge/ - ID: 11732
[26-Mar-2024 04:17:44 Etc/GMT+7] Now processing: https://www.wired.com/story/glassdoor-wants-to-know-your-real-name/
[26-Mar-2024 04:17:44 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
________________

As you can see, the second post in the list (index 1 in the log) is detected as already posted, but the first one (index 0) is being posted again.

Screenshot for reference of another post being duplicated :)

Regards
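
P.S. In case it helps compare runs, here is a minimal Python sketch that pulls the "URL scraped" and "Already posted, skipping" entries out of a log like the one above. The function name `summarize_log` and the two regular expressions are my own assumptions, based only on the message formats visible in this excerpt; they are not part of Crawlomatic, and other log messages may look different.

```python
import re

# Patterns guessed from the two message formats visible in the log excerpt.
SCRAPED = re.compile(r"URL scraped: (\S+)")
SKIPPED = re.compile(r"Already posted, skipping: (\S+) - ID: (\d+)")


def summarize_log(log_text):
    """Return (scraped_urls, skipped) where skipped maps URL -> existing post ID."""
    scraped = SCRAPED.findall(log_text)
    skipped = {url: int(post_id) for url, post_id in SKIPPED.findall(log_text)}
    return scraped, skipped


# Two entries copied from the log in this post.
sample_log = (
    "[26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: "
    "https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/\n"
    "[26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: "
    "https://www.wired.com/story/reddit-ipo-price-surge/ - ID: 11732\n"
)

scraped, skipped = summarize_log(sample_log)
print("Scraped (not caught by the duplicate check):", scraped)
print("Skipped as already posted:", skipped)
```

Running this over the logs of two consecutive runs and comparing the scraped lists would show which URLs get past the duplicate check more than once.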