Thank you for contacting me. Please note that I live in the GMT+3 time zone, so responses might be delayed.
This topic has 4 replies, 2 voices, and was last updated 9 months ago by shandysuta.
March 26, 2024 at 11:55 am #10289
Hi Szabi,
I'm getting multiple duplicated posts from Crawlomatic. I don't know what causes it, because some posts are correctly prevented from being duplicated.
Here is part of the log:
[26-Mar-2024 04:17:15 Etc/GMT+7] Crawling seed page for links: https://www.wired.com/category/business/social-media/ using: class = summary-item__content
[26-Mar-2024 04:17:15 Etc/GMT+7] Links returned:
[26-Mar-2024 04:17:15 Etc/GMT+7] 0. /story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:15 Etc/GMT+7] 1. /story/reddit-ipo-price-surge/
[26-Mar-2024 04:17:15 Etc/GMT+7] 2. /story/glassdoor-wants-to-know-your-real-name/
[26-Mar-2024 04:17:15 Etc/GMT+7] 3. /story/reddit-ipo-filings-reveal-the-companys-hopes-and-fears/
[26-Mar-2024 04:17:15 Etc/GMT+7] 4. /story/inside-reddit-protest-ipo/
[26-Mar-2024 04:17:15 Etc/GMT+7] 5. /story/764-com-child-predator-network/
[26-Mar-2024 04:17:15 Etc/GMT+7] 6. /story/yoel-roth-twitter-trust-safety-match-dating-apps/
[26-Mar-2024 04:17:15 Etc/GMT+7] 7. /story/influencers-paid-promote-designer-knockoffs-from-china/
[26-Mar-2024 04:17:15 Etc/GMT+7] 8. /story/senator-asks-meta-tiktok-parents-girls-influencer-accounts/
[26-Mar-2024 04:17:15 Etc/GMT+7] 9. /story/meta-hacked-users-draining-resources/
[26-Mar-2024 04:17:15 Etc/GMT+7] 10. /story/facebook-two-factor-authentication-2fa-change/
[26-Mar-2024 04:17:15 Etc/GMT+7] 11. /story/facebook-instagram-whatsapp-and-threads-back-online-outage/
[26-Mar-2024 04:17:15 Etc/GMT+7] 12. /story/elon-musk-lawsuit-hate-speech-x/
[26-Mar-2024 04:17:15 Etc/GMT+7] 13. /story/reddit-power-users-ipo/
[26-Mar-2024 04:17:15 Etc/GMT+7] 14. /story/bluesky-ceo-jay-graber-wont-enshittify-ads/
[26-Mar-2024 04:17:15 Etc/GMT+7] 15. /story/shou-zi-chew-tik-tok-big-interview/
[26-Mar-2024 04:17:15 Etc/GMT+7] 16. /story/ilan-shor-facebook-ads-moldova-elections/
[26-Mar-2024 04:17:15 Etc/GMT+7] 17. /story/rumble-sec-investigation/
[26-Mar-2024 04:17:15 Etc/GMT+7] 18. /story/flip-viral-video-app-shopping-free-stuff/
[26-Mar-2024 04:17:15 Etc/GMT+7] 19. /story/eu-x-twitter-illegal-content/
[26-Mar-2024 04:17:15 Etc/GMT+7] 20. /story/pinterest-gen-z-future/
[26-Mar-2024 04:17:15 Etc/GMT+7] 21. /story/youtube-hiding-channels-ad-revenue/
[26-Mar-2024 04:17:15 Etc/GMT+7] 22. /story/meta-messenger-instagram-end-to-end-encryption/
[26-Mar-2024 04:17:15 Etc/GMT+7] Now processing: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:15 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
[26-Mar-2024 04:17:18 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
[26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:29 Etc/GMT+7] Now processing: https://www.wired.com/story/reddit-ipo-price-surge/
[26-Mar-2024 04:17:29 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
[26-Mar-2024 04:17:32 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/reddit-ipo-price-surge/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
[26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: https://www.wired.com/story/reddit-ipo-price-surge/ – ID: 11732
[26-Mar-2024 04:17:44 Etc/GMT+7] Now processing: https://www.wired.com/story/glassdoor-wants-to-know-your-real-name/
[26-Mar-2024 04:17:44 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
As you can see, the second post (link 1 in the log) is detected as already posted, but the first post (link 0) is being posted again. I've attached a screenshot of another duplicated post for reference 🙂
Regards
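For anyone comparing runs, the log excerpt above can be summarized with a small script. This is only a rough sketch: it assumes the two message formats quoted in the log ("URL scraped:" and "Already posted, skipping:") and does not use any Crawlomatic API. The function name `summarize_log` is hypothetical, and the plugin may write other messages that this ignores.

```python
import re

# Patterns copied from the log lines quoted in this post (assumption:
# the plugin may emit other messages that this sketch does not handle).
SCRAPED = re.compile(r"URL scraped: (\S+)")
SKIPPED = re.compile(r"Already posted, skipping: (\S+)")


def summarize_log(log_text):
    """Return (scraped_urls, skipped_urls) in the order they appear."""
    scraped, skipped = [], []
    for line in log_text.splitlines():
        match = SCRAPED.search(line)
        if match:
            scraped.append(match.group(1))
            continue
        match = SKIPPED.search(line)
        if match:
            skipped.append(match.group(1))
    return scraped, skipped


# Two lines taken from the log excerpt above.
sample = """\
[26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: https://www.wired.com/story/reddit-ipo-price-surge/ – ID: 11732
"""

scraped, skipped = summarize_log(sample)
print("scraped:", scraped)
print("skipped:", skipped)
```

Running this over the logs of two consecutive imports and checking which URLs show up as scraped in both would point to the posts that slip past the duplicate check.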
-
March 26, 2024 at 3:41 pm #10292
Hello,
First of all, thank you for your purchase.
Please go to the plugin’s ‘Main Settings’ menu -> check the ‘Make Sure No Duplicate Post Titles Are Published’ checkbox -> save settings -> then try importing new posts again.
Let me know if this helped.
Regards, Szabi – CodeRevolution.
-
March 27, 2024 at 6:08 am #10297
This reply has been marked as private.
-
March 27, 2024 at 6:51 am #10300
Hello,
Could you please send me temporary admin login credentials for your WordPress install, so I can look into this issue? Please send them to my email address: kisded@yahoo.com
Regards, Szabi – CodeRevolution.
-
March 27, 2024 at 7:21 am #10301
Hi,
I have sent the email 🙂
Please check, thank you
The topic ‘Duplicate posts from Crawlomatic’ is closed to new replies.