scraping web like brainly.co.id | CodeRevolution Support

This topic is: resolved

Thank you for contacting me. Please note that I live in the GMT+3 time zone - responses might be delayed by this.

This topic has 7 replies, 2 voices, and was last updated 2 years, 11 months ago by Newbe.

Viewing 7 reply threads

Author

Posts
- May 28, 2022 at 12:15 pm #5211
  
  Newbe
  Participant
  
  Post count: 3
  
  Hello, is there any tutorial for scraping a website like brainly.co.id?
  
  Thanks.
  
- May 28, 2022 at 2:50 pm #5213
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4834
  
  Hello,
  
  First of all, thank you for your purchase.
  
  Please give me more details about which parts of the site you wish to scrape.
  
  Is it the search results page found at this location? https://brainly.co.id/app/ask?entry=hero&q=test
  
  Let me know the details and I will help.
  
  Regards, Szabi – CodeRevolution.
  
- May 28, 2022 at 9:07 pm #5217
  
  Newbe
  Participant
  
  Post count: 3
  
  This reply has been marked as private.
  
- May 29, 2022 at 11:42 am #5221
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4834
  
  Hello,
  
  I checked, and this website uses JavaScript to render its content. Because of this, to be able to scrape it, you need to install Puppeteer on your server and configure the plugin to use it when scraping sites like this one. Please check this video for details: https://www.youtube.com/watch?v=g99IlDkt_SY
  
  How to install Puppeteer on your server: https://www.youtube.com/watch?v=KNOIJA4pTQo
  
  If installing Puppeteer is not possible on your server, you can also use HeadlessBrowserAPI, a cloud service that renders the JavaScript on pages so that they can be scraped: https://headlessbrowserapi.com/
  
  Tutorial video on this: https://www.youtube.com/watch?v=rj-LOI-sc14
  
  Regards,
  
  Szabi – CodeRevolution.
  
- May 29, 2022 at 11:44 am #5222
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4834
  
  Also, please check the settings below, which I used in the plugin to scrape the page with Puppeteer:
  
  Scraper Start (Seed) URL / Keywords:
  https://brainly.co.id/mapel/matematika
  
  Content Scraping Method To Use:
  Puppeteer
  
  Do Not Scrape Seed URL:
  checked
  
  Seed Page Crawling Query Type:
  Class
  
  Seed Page Crawling Query String:
  brn-feed-item__content
  
  Content Query Type:
  Class
  
  Content Query String:
  question_box_text
  
  Regards.
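
To illustrate what the two class queries in the settings above select, here is a minimal Python sketch. It is not the plugin's actual code, and it assumes the HTML has already been rendered (for example by Puppeteer). The sample markup, the `/tugas/...` paths and the `ClassQuery` helper are hypothetical, made up only for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassQuery(HTMLParser):
    """Collect text and link targets found inside elements carrying a class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0          # > 0 while inside a matching element
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth == 0 and self.class_name not in classes:
            return  # outside any matching element
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text_parts.append(data.strip())


def query_class(html, class_name):
    parser = ClassQuery(class_name)
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical, already-rendered markup standing in for the seed page.
SEED_HTML = """
<div class="brn-feed-item">
  <div class="brn-feed-item__content"><a href="/tugas/111">Soal 1</a></div>
  <div class="sidebar"><a href="/about">About</a></div>
  <div class="brn-feed-item__content"><a href="/tugas/222">Soal 2</a></div>
</div>
"""

# Hypothetical markup standing in for one crawled post page.
POST_HTML = """
<h1 class="question_box_text">Berapa hasil 2 + 3?</h1>
<p class="ad">Sponsored</p>
"""

# Seed Page Crawling Query: only links inside the feed item content are followed.
post_links = query_class(SEED_HTML, "brn-feed-item__content").links
# Content Query: only text inside the question box is kept.
post_text = " ".join(query_class(POST_HTML, "question_box_text").text_parts)

print(post_links)
print(post_text)
```

The seed query decides which links are followed from the listing page, and the content query decides which part of each followed page is kept. With "Do Not Scrape Seed URL" checked, the listing page itself only supplies links.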
  
- May 29, 2022 at 12:16 pm #5223
  
  Newbe
  Participant
  
  Post count: 3
  
  Great, thanks.
  
  Is there a way to scrape posts using a list of post URLs in a .txt file?
  
  Something like importing a URL list and putting it in a queue.
  
  I don’t want to take all the posts there, just the ones I need.
  
  Regards
  
- May 29, 2022 at 12:53 pm #5226
  
  Szabi – CodeRevolution
  Keymaster
  
  Post count: 4834
  
  Yes, this is possible. To do this, create a file containing the list of URLs you wish to scrape and upload it to your server.
  
  Afterwards, you can start scraping from that specific URL list file.
  
  In this case, please be sure to uncheck the ‘Do Not Crawl External Links’ checkbox in the importing rule settings, and also set:
  
  Do Not Scrape Seed URL:
  checked
  
  Seed Page Crawling Query Type:
  Auto Detect
  
  I will make a tutorial video on this soon and publish it to my YouTube channel: https://www.youtube.com/channel/UCVLIksvzyk-D_oEdHab2Lgg
  
  Regards.
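
As a rough sketch of the URL list file described above, the Python below cleans a plain-text list (one URL per line) into a queue before it is uploaded. The one-URL-per-line layout, the `parse_url_list` helper and the sample `/tugas/...` URLs are assumptions for illustration only. The thread does not specify the exact file format the plugin expects.

```python
from urllib.parse import urlsplit


def parse_url_list(text):
    """Return the unique http(s) URLs in `text`, one per line, in file order."""
    queue, seen = [], set()
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not an absolute web URL
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return queue


# Hypothetical file contents; the post URLs are made up for this example.
SAMPLE = """\
https://brainly.co.id/tugas/111

https://brainly.co.id/tugas/222
https://brainly.co.id/tugas/111
not a url
"""

url_queue = parse_url_list(SAMPLE)
print(url_queue)
```

Dropping duplicates and malformed lines first keeps the list limited to the posts you actually want scraped.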
  
- May 29, 2022 at 12:59 pm #5227
  
  Newbe
  Participant
  
  Post count: 3
  
  Thank you very much for your help and support.
  
  Regards
  

The topic ‘scraping web like brainly.co.id’ is closed to new replies.