Auto-crawling problems

This topic is: resolved

 

This topic has 17 replies, 2 voices, and was last updated 2 months, 4 weeks ago by Szabi – CodeRevolution.

Viewing 17 reply threads
  • Author
    Posts
    • #12463


      congdol
      Participant
      Post count: 14

      I want to crawl news content on the https://m.sports.naver.com/golf/news site.

      The article URLs look like this:
      ex1) https://m.sports.naver.com/golf/article/009/0005558352
      ex2) https://m.sports.naver.com/golf/article/018/0006115678

      The numbers after article/ are different for each article.

      Is there any way I can crawl these automatically?
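
A minimal sketch of how article URLs of this shape could be recognized, assuming the pattern is exactly what the two example URLs above suggest (two numeric path segments after article/). The names `ARTICLE_URL`, `parse_article_url`, `group_id` and `article_id` are hypothetical and chosen only for illustration; the site may use other URL forms that this pattern does not cover.

```python
import re

# Assumed pattern, inferred only from the two example URLs in the post:
# .../golf/article/<digits>/<digits>
ARTICLE_URL = re.compile(
    r"^https://m\.sports\.naver\.com/golf/article/(?P<group_id>\d+)/(?P<article_id>\d+)$"
)


def parse_article_url(url: str):
    """Return (group_id, article_id) if the URL looks like an article, else None."""
    match = ARTICLE_URL.match(url.strip())
    if match is None:
        return None
    return match.group("group_id"), match.group("article_id")


if __name__ == "__main__":
    candidates = [
        "https://m.sports.naver.com/golf/article/009/0005558352",
        "https://m.sports.naver.com/golf/article/018/0006115678",
        "https://m.sports.naver.com/golf/news",
    ]
    for url in candidates:
        print(url, parse_article_url(url))
```

A filter like this only decides which links are articles; it does not discover them. The links still have to be collected from a listing page first.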

      Also, when posts are uploaded automatically, the main page of my site stops working.
      May I know why?

      My site is: https://golfnfriend.com/

    • #12468


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Hello,

      First of all, thank you for your purchase.

      Can you please send me temporary admin login credentials for your WordPress install, so I can check this issue? Please send them to my email address: kisded@yahoo.com.

      Regards,
      Szabi – CodeRevolution.

    • #12471


      congdol
      Participant
      Post count: 14

      I have sent the temporary admin ID and password by email.
      Please check.

    • #12474


      congdol
      Participant
      Post count: 14

      Could you please try again?
      I am able to log in with the ID and password I shared.

    • #12475


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Hello,

      I checked and I am still getting the same issue. Please check the email I sent you; it includes a screen recording of the issue.

      Regards.

    • #12482


      congdol
      Participant
      Post count: 14

      Thank you for your quick confirmation.
      All security plug-ins have been deleted.
      Please try logging in again.

    • #12483


      congdol
      Participant
      Post count: 14

      The hosting company had configured the server to allow access only from Korea. This has now been changed.
      You should be able to access it now.

    • #12487


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Hello,

      Thank you for the login credentials. I checked the site you want to scrape, and scraping it is indeed very hard, because its content is fully JavaScript-generated (dynamic): everything is rendered on the page after it loads (this is why you see the lazy-loading placeholders when you load the page).
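
To illustrate why a plain HTTP fetch is not enough for a JavaScript-rendered page, here is a small sketch that checks whether a given CSS class is present in an HTML string. Both HTML samples are hypothetical stand-ins (not copied from the real site): one for what a server might return before scripts run, one for the page after rendering. `ClassFinder` and `has_class` are illustrative names.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the target CSS class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.found = True


def has_class(html: str, target_class: str) -> bool:
    finder = ClassFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical markup: the raw response contains only an empty container.
STATIC_HTML = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
# Hypothetical markup: after scripts run, the article container exists.
RENDERED_HTML = (
    '<html><body><div id="root">'
    '<div class="_article_content">Article text</div>'
    "</div></body></html>"
)

if __name__ == "__main__":
    print("static:  ", has_class(STATIC_HTML, "_article_content"))
    print("rendered:", has_class(RENDERED_HTML, "_article_content"))
```

If the class being queried only shows up after rendering, a headless browser has to load the page and run its scripts before any query can match.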

      I managed to scrape it automatically only using Puppeteer – this needs to be installed on your server, as shown here: https://www.youtube.com/watch?v=pRUDcSOe724 – contact hosting support and ask about this.

      After it is installed, it can be used as shown here: https://www.youtube.com/watch?v=ZljpMpmi_dU

      I managed to scrape the site you mentioned, using the below settings:

      Scraper Start (Seed) URL / Keywords:
      https://m.sports.naver.com/golf/index

      Content Scraping Method To Use:
      Puppeteer

      Headless Browser Wait Before Rendering Pages (ms):
      5000

      Do Not Scrape Seed URL:
      Checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      grid_item

      Content Query Type:
      Class

      Content Query String:
      _article_content
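
As a rough illustration of what a class-based seed page query does conceptually, the sketch below collects links found inside elements with the class grid_item from an already-rendered HTML string. This is not the plugin's implementation; the sample markup, `LinksInClass` and `collect_links` are hypothetical, and the real page structure may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinksInClass(HTMLParser):
    """Collects href values of <a> tags located inside elements with a given class."""

    def __init__(self, target_class: str, base_url: str):
        super().__init__()
        self.target_class = target_class
        self.base_url = base_url
        self.depth = 0  # greater than 0 while inside a matching element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.depth > 0 or self.target_class in classes:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attr_map.get("href"):
                # Resolve relative links against the seed URL.
                self.links.append(urljoin(self.base_url, attr_map["href"]))

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def collect_links(html: str, target_class: str, base_url: str):
    parser = LinksInClass(target_class, base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical rendered seed page markup.
SEED_HTML = """
<ul>
  <li class="grid_item"><a href="/golf/article/009/0005558352"><img src="t.jpg">Title 1</a></li>
  <li class="grid_item"><a href="/golf/article/018/0006115678">Title 2</a></li>
</ul>
<a href="/golf/schedule">Schedule</a>
"""

if __name__ == "__main__":
    for link in collect_links(SEED_HTML, "grid_item", "https://m.sports.naver.com/golf/index"):
        print(link)
```

The link outside the grid_item elements is ignored, which is the point of narrowing the seed page query to one class.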

      I hope this helps.

      Regards,
      Szabi – CodeRevolution.

    • #12488


      congdol
      Participant
      Post count: 14

      Thank you very much.
      I was impressed with your skills.

      As you told me, I will proceed using Puppeteer.

      If I run into any problems, I will post another inquiry.

      Thank you.

    • #12489


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      I am glad to help!

    • #12496


      congdol
      Participant
      Post count: 14

      To install Puppeteer on a Windows operating system,
      do I have to sign up for https://cloud.digitalocean.com/?

      I would also like to ask about the cost.

    • #12498


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Puppeteer needs to be installed on your server (where your site is running), not on your local Windows computer. If your current hosting does not allow installing Puppeteer, you can set up your site on Digital Ocean, where you will be able to install it.

      If your site is running locally on your computer (localhost), you can install Puppeteer on Windows, as shown here: https://www.youtube.com/watch?v=s4fEYCOIZjk

      Regards.

    • #12499


      congdol
      Participant
      Post count: 14

      I created the site through a hosting company called Cafe24 in Korea.
      (https://hosting.cafe24.com/?controller=new_product_page&page=adsense-wordpress)

      The hosting company provides the WordPress CMS,
      and the site can be accessed by FTP.

      In this case, I would like to ask if it is okay to install Node.js and npm on the FTP path.

    • #12500


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      Well, I am not sure it is possible to install Node.js and npm using only FTP; as far as I know, you need SSH access for this. Please ask your hosting company if they can also grant SSH access to your server.
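
Once shell access is available, one quick way to see whether Node.js and npm are already installed is to look them up on the PATH. The sketch below assumes Python 3 is available on the server, which may not be the case on every host; `find_tools` is a hypothetical helper name.

```python
import shutil


def find_tools(names=("node", "npm")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If either tool is reported as not found, it has to be installed by someone with sufficient permissions on the server, typically the hosting provider on shared hosting.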

      Regards.

    • #12501


      congdol
      Participant
      Post count: 14

      I accessed my site through the PuTTY program.
      However, none of the commands work.

      Every command returns a [permission denied] error.
      What should I do?

    • #12503


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      You will have to ask your hosting support about this. They might have a security measure in place that blocks access.

    • #12506


      congdol
      Participant
      Post count: 14
      This reply has been marked as private.
    • #12509


      Szabi – CodeRevolution
      Keymaster
      Post count: 5080

      I am sorry, but installing Puppeteer using phpMyAdmin is not possible. phpMyAdmin is just a PHP web app for managing databases – it has no way to install or run Node.js scripts.

      Sorry for this.

The topic ‘Auto-crawling problems’ is closed to new replies.