
Added 'Exclude' URL Feature to Scraper#12

Draft
adamlaz wants to merge 4 commits into BuilderIO:main from adamlaz:feat/exclude-urls


adamlaz commented Nov 17, 2023

I added a small but handy feature 🏗️

Now you can specify URLs to be ignored 🙈 during crawls.
This should help skip pages you don’t need and keep the output data cleaner.

What's in this PR:

  1. Added an exclude field in config.ts for patterns we want to skip.
  2. Tweaked requestHandler in src/main.ts to filter out these URLs.
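For anyone reading along without the diff open, the two steps above could look roughly like the sketch below. This is not the code from this PR: only the `exclude` field name comes from the description, while the `url` and `match` fields, the glob syntax, and the helpers `globToRegExp` and `isExcluded` are assumptions made up for illustration.

```typescript
// Hypothetical config shape. Only `exclude` is named in the PR description;
// `url` and `match` are assumed here for illustration.
type Config = {
  url: string;
  match: string;
  exclude?: string[];
};

// Convert a simple glob into an anchored RegExp (hypothetical helper).
// `**` matches any characters, `*` matches within a single path segment.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, leaving `*` alone for the next step.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // park `**` so the single-star pass skips it
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// True when the URL matches at least one exclude pattern.
function isExcluded(url: string, exclude: string[] = []): boolean {
  return exclude.some((glob) => globToRegExp(glob).test(url));
}

const config: Config = {
  url: "https://example.com/docs",
  match: "https://example.com/**",
  exclude: ["https://example.com/blog/**"],
};

const discovered = [
  "https://example.com/docs/intro",
  "https://example.com/blog/post-1",
];

// Only URLs that are not excluded would be handed on to the crawler.
const toCrawl = discovered.filter((u) => !isExcluded(u, config.exclude));
console.log(toCrawl);
```

In a real request handler the same check would sit in front of whatever enqueues newly discovered links, so excluded pages are never visited at all.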

This PR will close #9 🥳

Author

adamlaz commented Nov 17, 2023

Test it out and let me know what you think!

This was the quick, first approach I thought of, and I'm pretty sure it breaks things in normal use cases. But it worked when I was just playing with it. I think it needs more logic for the case where the exclude param isn't included.
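One possible way to cover the missing-param case, as a rough sketch rather than what this PR currently does: normalize the option to a list up front, so that leaving it out behaves the same as an empty list. The names `ExcludeOption`, `normalizeExclude`, and `shouldCrawl` are hypothetical, and plain prefix matching stands in for whatever pattern matching the crawler actually uses.

```typescript
// Hypothetical option type: a single pattern, a list, or omitted entirely.
type ExcludeOption = string | string[] | undefined;

// Always return an array, so callers never have to branch on `undefined`.
function normalizeExclude(exclude: ExcludeOption): string[] {
  if (exclude === undefined) return [];
  const list = Array.isArray(exclude) ? exclude : [exclude];
  // Drop blank entries. With prefix matching, an empty string would match every URL.
  return list.filter((p) => p.trim().length > 0);
}

// Prefix matching is an assumption made to keep the sketch self-contained.
function shouldCrawl(url: string, exclude: ExcludeOption): boolean {
  const patterns = normalizeExclude(exclude);
  return !patterns.some((prefix) => url.startsWith(prefix));
}

console.log(shouldCrawl("https://example.com/docs", undefined));
console.log(shouldCrawl("https://example.com/blog/a", "https://example.com/blog"));
```

With this shape, a config that never mentions exclude goes down the same code path as one that sets it, which keeps the default behaviour unchanged.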


Development

Successfully merging this pull request may close these issues.

Exclude directories
