scraping web like brainly.co.id

This topic is: resolved

 


This topic has 7 replies, 2 voices, and was last updated 2 years, 5 months ago by Newbe.

Viewing 7 reply threads
    • #5211


      Newbe
      Participant
      Post count: 3

      Hello, is there a tutorial for scraping websites like brainly.co.id?

      Thanks.

    • #5213


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      First of all, thank you for your purchase.

      Please give me more details about which parts of the site you wish to scrape.

      Is it the search results page found at this location? https://brainly.co.id/app/ask?entry=hero&q=test

      Let me know details and I will help.

      Regards, Szabi – CodeRevolution.

    • #5217


      Newbe
      Participant
      Post count: 3
      This reply has been marked as private.
    • #5221


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Hello,

      I checked, and this website uses JavaScript to render its content. Because of this, to scrape it you need to install Puppeteer on your server and configure the plugin to use it when scraping such sites. Please check this video for details: https://www.youtube.com/watch?v=g99IlDkt_SY
      How to install Puppeteer on your server: https://www.youtube.com/watch?v=KNOIJA4pTQo
      If installing Puppeteer is not possible on your server, you can also use HeadlessBrowserAPI, a cloud service that renders the JavaScript on pages so that they can be scraped: https://headlessbrowserapi.com/
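
One rough way to confirm that a page depends on JavaScript is to check whether the element you want is already present in the raw HTML the server returns. The sketch below is only an illustration in Python and is not part of the plugin: the two HTML strings are made-up samples, and `ClassFinder` and `needs_js_rendering` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any element carries the given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.found = True


def needs_js_rendering(raw_html, wanted_class):
    """Return True when the class is absent from the HTML the server sent."""
    finder = ClassFinder(wanted_class)
    finder.feed(raw_html)
    return not finder.found


# Hypothetical responses: an empty application shell vs. a rendered page.
shell_html = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
rendered_html = '<html><body><div class="question_box_text">2 + 2 = ?</div></body></html>'

print(needs_js_rendering(shell_html, "question_box_text"))     # True
print(needs_js_rendering(rendered_html, "question_box_text"))  # False
```

If the class only shows up after the page has run in a browser, a plain HTTP request is not enough and a headless browser such as Puppeteer is the likely fix.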
      Regards,
      Szabi – CodeRevolution.
    • #5222


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Also, please check the settings below, which I used in the plugin to scrape the page using Puppeteer:

       

      Scraper Start (Seed) URL / Keywords:
      https://brainly.co.id/mapel/matematika

      Content Scraping Method To Use:
      Puppeteer

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Class

      Seed Page Crawling Query String:
      brn-feed-item__content

      Content Query Type:
      Class

      Content Query String:
      question_box_text
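
To make the two class queries above easier to picture, here is a minimal Python sketch of the same two-stage idea: first collect links from elements with the class `brn-feed-item__content` on the seed page, then pull the text from elements with the class `question_box_text` on each linked page. It is not the plugin's implementation. The sample HTML, the `/tugas/...` paths, and the names `ClassScraper`, `crawl_links` and `extract_content` are all assumptions made for illustration, and the HTML is assumed to be already rendered (for example by Puppeteer).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassScraper(HTMLParser):
    """Collects links and text found inside elements with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0          # greater than 0 while inside a matching element
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.depth == 0:
            classes = (attr_map.get("class") or "").split()
            if self.wanted_class not in classes:
                return
        # At this point we are inside (or have just entered) a match.
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text_parts.append(data.strip())


def crawl_links(seed_html, base_url, wanted_class):
    """Stage 1: absolute URLs of links inside the matching seed-page elements."""
    scraper = ClassScraper(wanted_class)
    scraper.feed(seed_html)
    return [urljoin(base_url, href) for href in scraper.links]


def extract_content(page_html, wanted_class):
    """Stage 2: text of the matching elements on a linked page."""
    scraper = ClassScraper(wanted_class)
    scraper.feed(page_html)
    return " ".join(scraper.text_parts)


# Made-up, already-rendered HTML standing in for the real pages.
seed_html = """
<div class="brn-feed-item__content"><a href="/tugas/101">Question 101</a></div>
<div class="sidebar"><a href="/about">About</a></div>
<div class="brn-feed-item__content"><a href="/tugas/102">Question 102</a></div>
"""
post_html = '<div class="question_box_text">Berapa hasil dari <b>2 + 2</b></div><p>footer</p>'

links = crawl_links(seed_html, "https://brainly.co.id/mapel/matematika",
                    "brn-feed-item__content")
print(links)
print(extract_content(post_html, "question_box_text"))
```

With "Do Not Scrape Seed URL" checked, only the pages reached in stage 1 would be imported, which is why the seed page itself contributes links but no content here.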

       

      Regards.

    • #5223


      Newbe
      Participant
      Post count: 3

      Great, thanks!

      Is there a way to scrape posts using a list of post URLs in a .txt file?

      Something like importing a URL list and putting it in a queue.

      I don’t want to take all the posts there, just the ones I need.

      Regards

    • #5226


      Szabi – CodeRevolution
      Keymaster
      Post count: 4577

      Yes, this is possible. To do this, you need to create a file containing the list of URLs you wish to scrape and upload it to your server.

      Afterwards, you can start scraping from that specific URL list file.

      In this case, please be sure to uncheck the ‘Do Not Crawl External Links’ checkbox in the importing rule settings, and also set:

      Do Not Scrape Seed URL:
      checked

      Seed Page Crawling Query Type:
      Auto Detect
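
For anyone preparing such a list by hand, a small Python sketch like the one below can clean it up before uploading. It assumes a plain text file with one URL per line, which is an assumption about the format and not something the plugin documentation is quoted on here; the file name `urls.txt`, the `/tugas/...` URLs and the helper `load_url_queue` are likewise made up for the example.

```python
import tempfile
from collections import deque
from pathlib import Path
from urllib.parse import urlparse


def load_url_queue(path):
    """Read one URL per line; skip blanks, comments, non-HTTP lines and duplicates."""
    queue = deque()
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return queue


# Demo using a temporary file in place of the uploaded list.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "urls.txt"
    list_file.write_text(
        "https://brainly.co.id/tugas/101\n"
        "\n"
        "# a comment line\n"
        "https://brainly.co.id/tugas/102\n"
        "https://brainly.co.id/tugas/101\n"
        "not a url\n",
        encoding="utf-8",
    )
    queue = load_url_queue(list_file)

print(list(queue))
```

The queue keeps the order of the file, so the posts would be processed in the order they are listed.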

      I will make a tutorial video on this soon and publish it to my YouTube channel: https://www.youtube.com/channel/UCVLIksvzyk-D_oEdHab2Lgg

      Regards.

    • #5227


      Newbe
      Participant
      Post count: 3

      Thank you very much for your help and support.

      Regards


The topic ‘scraping web like brainly.co.id’ is closed to new replies.