Duplicate posts from Crawlomatic

This topic is: resolved

 

    • #10289


      shandysuta
      Participant
      Post count: 11

      Hi Szabi,

      I'm getting multiple duplicate posts from Crawlomatic. I don't know what causes it, because some posts are correctly prevented from being duplicated.

      Here are some of the logs:

      [26-Mar-2024 04:17:15 Etc/GMT+7] Crawling seed page for links: https://www.wired.com/category/business/social-media/ using: class = summary-item__content
      [26-Mar-2024 04:17:15 Etc/GMT+7] Links returned:
      [26-Mar-2024 04:17:15 Etc/GMT+7] 0. /story/meta-kills-crucial-transparency-tool-worst-possible-time/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 1. /story/reddit-ipo-price-surge/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 2. /story/glassdoor-wants-to-know-your-real-name/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 3. /story/reddit-ipo-filings-reveal-the-companys-hopes-and-fears/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 4. /story/inside-reddit-protest-ipo/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 5. /story/764-com-child-predator-network/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 6. /story/yoel-roth-twitter-trust-safety-match-dating-apps/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 7. /story/influencers-paid-promote-designer-knockoffs-from-china/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 8. /story/senator-asks-meta-tiktok-parents-girls-influencer-accounts/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 9. /story/meta-hacked-users-draining-resources/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 10. /story/facebook-two-factor-authentication-2fa-change/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 11. /story/facebook-instagram-whatsapp-and-threads-back-online-outage/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 12. /story/elon-musk-lawsuit-hate-speech-x/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 13. /story/reddit-power-users-ipo/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 14. /story/bluesky-ceo-jay-graber-wont-enshittify-ads/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 15. /story/shou-zi-chew-tik-tok-big-interview/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 16. /story/ilan-shor-facebook-ads-moldova-elections/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 17. /story/rumble-sec-investigation/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 18. /story/flip-viral-video-app-shopping-free-stuff/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 19. /story/eu-x-twitter-illegal-content/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 20. /story/pinterest-gen-z-future/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 21. /story/youtube-hiding-channels-ad-revenue/
      [26-Mar-2024 04:17:15 Etc/GMT+7] 22. /story/meta-messenger-instagram-end-to-end-encryption/
      [26-Mar-2024 04:17:15 Etc/GMT+7] Now processing: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
      [26-Mar-2024 04:17:15 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
      [26-Mar-2024 04:17:18 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
      [26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
      [26-Mar-2024 04:17:29 Etc/GMT+7] Now processing: https://www.wired.com/story/reddit-ipo-price-surge/
      [26-Mar-2024 04:17:29 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms
      [26-Mar-2024 04:17:32 Etc/GMT+7] Puppeteer command: node "/home/shandy/webapps/TheZeroByte/wp-content/plugins/crawlomatic-multipage-scraper-post-generator/res/puppeteer/puppeteer.js" "https://www.wired.com/story/reddit-ipo-price-surge/" "null" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "default" "default" "30000" "default" "default" "default" 2>&1
      [26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: https://www.wired.com/story/reddit-ipo-price-surge/ – ID: 11732
      [26-Mar-2024 04:17:44 Etc/GMT+7] Now processing: https://www.wired.com/story/glassdoor-wants-to-know-your-real-name/
      [26-Mar-2024 04:17:44 Etc/GMT+7] Delay between requests set(8), waiting 3000 ms

      ________________

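      As you can see, the second link (index 1 in the log, reddit-ipo-price-surge) is detected as already posted and skipped, but the first link (index 0, meta-kills-crucial-transparency-tool-worst-possible-time) is scraped and posted again. I've attached a screenshot showing another post being duplicated 🙂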

      Regards
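
For anyone reading a longer log of this kind, a small script can list which URLs were scraped again and which were skipped. This is only a sketch: it assumes the three message prefixes seen in the excerpt above ("Now processing:", "URL scraped:", "Already posted, skipping:") are the only ones that matter, and the `summarize` function and `LOG` sample are illustrative names, not part of Crawlomatic.

```python
import re

# Abbreviated copy of the log lines quoted above (Puppeteer and delay lines omitted).
LOG = """\
[26-Mar-2024 04:17:15 Etc/GMT+7] Now processing: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:29 Etc/GMT+7] URL scraped: https://www.wired.com/story/meta-kills-crucial-transparency-tool-worst-possible-time/
[26-Mar-2024 04:17:29 Etc/GMT+7] Now processing: https://www.wired.com/story/reddit-ipo-price-surge/
[26-Mar-2024 04:17:44 Etc/GMT+7] Already posted, skipping: https://www.wired.com/story/reddit-ipo-price-surge/ – ID: 11732
[26-Mar-2024 04:17:44 Etc/GMT+7] Now processing: https://www.wired.com/story/glassdoor-wants-to-know-your-real-name/
"""

# Timestamp in brackets, then one of the known message prefixes, then the URL.
LINE_RE = re.compile(
    r"^\[[^\]]+\]\s+(Now processing|URL scraped|Already posted, skipping):\s+(\S+)"
)


def summarize(log_text):
    """Map each processed URL to 'scraped', 'skipped', or 'pending'."""
    status = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore link listings, delays, Puppeteer commands, etc.
        event, url = match.groups()
        if event == "Now processing":
            status.setdefault(url, "pending")
        elif event == "URL scraped":
            status[url] = "scraped"
        else:
            status[url] = "skipped"
    return status


if __name__ == "__main__":
    for url, state in summarize(LOG).items():
        print(f"{state:8} {url}")
```

URLs reported as "scraped" that already exist as posts on the site are the ones worth comparing against the existing posts.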

    • #10292


      Szabi – CodeRevolution
      Keymaster
      Post count: 4195

      Hello,

      First of all, thank you for your purchase.

      Please go to the plugin’s ‘Main Settings’ menu -> check the ‘Make Sure No Duplicate Post Titles Are Published’ checkbox -> save settings -> then run the import again and check whether new posts are still duplicated.

      Let me know if this helped.

      Regards, Szabi – CodeRevolution.
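
To illustrate the general idea behind a title-based duplicate check alongside a URL-based one, here is a minimal sketch. It is not Crawlomatic's code and makes no claim about how the plugin implements the setting above; `DuplicateFilter`, `normalize_title`, and the example URLs are hypothetical names chosen for illustration.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


class DuplicateFilter:
    """Hypothetical filter: an item is a duplicate if its URL or its title was seen before."""

    def __init__(self):
        self.seen_urls = set()
        self.seen_titles = set()

    def is_duplicate(self, url, title):
        # Trailing slashes are stripped so ".../story/a" and ".../story/a/" match.
        return (
            url.rstrip("/") in self.seen_urls
            or normalize_title(title) in self.seen_titles
        )

    def remember(self, url, title):
        self.seen_urls.add(url.rstrip("/"))
        self.seen_titles.add(normalize_title(title))


if __name__ == "__main__":
    seen = DuplicateFilter()
    seen.remember("https://example.com/story/a/", "Some  Title")
    # Same title under a different URL is still caught by the title check.
    print(seen.is_duplicate("https://example.com/other", "some title"))
```

Checking both keys means a post is caught even when the source URL changes slightly but the title stays the same, which is the kind of case a URL-only check can miss.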

    • #10297


      shandysuta
      Participant
      Post count: 11
      This reply has been marked as private.
    • #10300


      Szabi – CodeRevolution
      Keymaster
      Post count: 4195

      Hello,

      Could you please send me temporary admin login credentials for your WordPress install so I can look into this issue? Please send them to my email address: [email protected]

      Regards, Szabi – CodeRevolution.

    • #10301


      shandysuta
      Participant
      Post count: 11

      Hi,

      I have sent the email 🙂
      Please check, thank you

The topic ‘Duplicate posts from Crawlomatic’ is closed to new replies.