zupix

Forum Replies Created

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • in reply to: Guest Author #6805


    zupix
    Participant
    Post count: 3

    I know this is a stretch, but could there also be a “reverse restriction” option under “Crawling restriction:” and “URL Patterns to Not Crawl and Import:”?

    That way, only URLs matching a specific pattern would be imported. In this case I would only need to enter one URL pattern containing the category name, instead of every other category URL pattern that I need to exclude.

    Just for example:

    A lot of sites have URL patterns like this: http://www.example.com/category1/title of the article

    So with a reverse restriction rule I could enter only the http://www.example.com/category1/ URL pattern, and all other categories would be excluded.
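
    To illustrate the idea, here is a minimal sketch in Python. It is not the plugin's actual code; the function name `should_import` and the `ALLOWED_PREFIXES` list are hypothetical, and simple prefix matching is an assumption about how such a rule could work.

```python
# Hypothetical allow-list ("reverse restriction"): import a URL only if it
# starts with one of the allowed patterns. Everything else is skipped.
ALLOWED_PREFIXES = ["http://www.example.com/category1/"]


def should_import(url, allowed_prefixes=ALLOWED_PREFIXES):
    """Return True only when the URL begins with an allowed prefix."""
    return any(url.startswith(prefix) for prefix in allowed_prefixes)


candidate_urls = [
    "http://www.example.com/category1/title-of-the-article",
    "http://www.example.com/category2/another-article",
    "http://www.example.com/about",
]

# Only the category1 article survives; no exclusion list is needed.
kept = [url for url in candidate_urls if should_import(url)]
```

    With a rule like this, one pattern covers the category I want, and new categories added to the site later are excluded automatically.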

  • in reply to: Guest Author #6803


    zupix
    Participant
    Post count: 3

    Yes, it works great.

    Can you also tell me how to limit the scraping of a website to a single category? I tried a few options in the settings, but nothing works the way I would like.

    My problem is this:

    Let us take theverge.com as an example. If I choose TECH (www.theverge.com/tech) as the category, I get posts that belong to TECH, but also a few that belong to other categories such as GOOGLE, TRANSPORTATION, REVIEWS, HOW-TO, etc. I would like to scrape only the posts in the TECH category. I watched a few of your videos on this question but still can’t make it work. I know it is probably stupidly simple, but with so many options I am not sure what should be enabled and what should not under category customizations.

  • in reply to: Guest Author #6801


    zupix
    Participant
    Post count: 3

    Wow. Amazing support. Things like this make the money spent on a plugin feel well spent. Just a quick question: could there also be an option for the domain without the “www” prefix, like “theverge.com” instead of “www.theverge.com”? I am not trying to be picky after your quick implementation, but I had to ask.
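
    As a rough sketch of what I mean (Python, standard library only; the helper names `normalize_host` and `same_site` are hypothetical and not part of the plugin), both forms of the domain could be treated as the same site by stripping a leading “www.” before comparing:

```python
from urllib.parse import urlparse


def normalize_host(url):
    """Return the lowercase host of a URL with any leading 'www.' removed."""
    # A bare domain such as "theverge.com" has no scheme, so prefix "//"
    # to make urlparse treat it as a network location rather than a path.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_site(url_a, url_b):
    """Treat 'www.example.com' and 'example.com' as the same site."""
    return normalize_host(url_a) == normalize_host(url_b)
```

    For example, `same_site("https://www.theverge.com/tech", "theverge.com")` would compare "theverge.com" with "theverge.com" and treat the two as equal.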
