feat(): Introduce domain specific parsers by Dunky13 · Pull Request #605 · cmintey/wishlist

Dunky13 · 2026-01-08T21:00:39Z

As discussed in #594, I suggested a domain-specific parser. hm.com did return "access denied" because of Akamai protection, but it loaded a few times during testing.

As you can see, the community can easily extend it with domain-specific parsing. I've tried to keep it as simple and reproducible as possible, whilst keeping the original parser intact.

I also introduced typings that are in line with the metascraper package, since Check was no longer exposed in the installed version.

cmintey · 2026-01-12T04:06:11Z

Thanks so much for fixing the types!! I haven't thoroughly reviewed yet, but my first thought (and what I was getting at in the issue discussion) is that H&M and Ikea are not "custom". Other sites might benefit from those same rules, considering you're just extracting from the JSON-LD, so I think those should just be part of the default rule set. bol.com seems to be mostly custom, at least in its use of the variant id, so that one makes sense to me as a truly custom rule set. Other than that, again at first glance, I like the domain framework that you've set up. I'll do a thorough review when I have some more time.

cmintey

Again, thank you for your work on this and cleaning things up! I have a few comments I'd like to see addressed.

cmintey · 2026-01-19T15:42:30Z

+                const name = jsonld ? getProperty(jsonld, "name") : undefined;
+                return typeof name === "string" ? name : undefined;


You use this pattern a lot, and you've already created a helper for it in domain-helpers, so you should use that instead of repeating this code.
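For illustration, a minimal sketch of what such a helper could look like. The name `getStringProperty` and the `JsonLd` type are assumptions made for this example; the actual helper in domain-helpers is not shown in this thread and may differ in name and signature.

```typescript
// Hypothetical helper, not the project's actual code.
type JsonLd = Record<string, unknown>;

// Return obj[key] only when it is a string; any other value becomes undefined.
const getStringProperty = (obj: JsonLd | undefined, key: string): string | undefined => {
    const value = obj?.[key];
    return typeof value === "string" ? value : undefined;
};

// Each rule then shrinks to a single call instead of the two-line pattern.
const sample: JsonLd | undefined = { "@type": "Product", name: "Example product", sku: 12345 };
const sampleName = getStringProperty(sample, "name"); // string value is kept
const sampleSku = getStringProperty(sample, "sku"); // numeric value is rejected
```

Keeping the type check in one place also means a later change (for example, accepting numbers) only has to be made once.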

cmintey · 2026-01-19T15:46:01Z

+                const formatted = toPriceFormat($('[property="og:price:amount"]').attr("content"));
+                return formatted !== undefined ? String(formatted) : undefined;


It would probably be better to move the string cast into the toPriceFormat method itself, so callers don't have to repeat it.
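A rough sketch of that idea, under stated assumptions: the `toPriceFormat` below is only a stand-in (the project's real parsing rules are not shown here), and `toPriceString` is a hypothetical name for a variant that owns the cast.

```typescript
// Stand-in for the project's toPriceFormat; the real implementation is not shown in this thread.
const toPriceFormat = (raw: string | undefined): number | undefined => {
    if (raw === undefined) return undefined;
    // Treat a decimal comma as a decimal point before parsing.
    const value = Number.parseFloat(raw.replace(",", "."));
    return Number.isFinite(value) ? value : undefined;
};

// Hypothetical variant that does the cast once, in one place.
const toPriceString = (raw: string | undefined): string | undefined => {
    const formatted = toPriceFormat(raw);
    return formatted !== undefined ? String(formatted) : undefined;
};
```

With something like this, the rule body reduces to a single call such as `toPriceString($('[property="og:price:amount"]').attr("content"))`.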

cmintey · 2026-01-19T15:49:36Z

+ * Extract price from JSON-LD offers with proper formatting.
+ */
+export const extractPrice = (data: unknown): string | undefined => {
+    return extractOffersField(data, "price");


This should make use of toPriceFormat

yup; will fix
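One way the fix could look, as a sketch only. Both `extractOffersField` and `toPriceFormat` here are simplified stand-ins written for this example; the real helpers in the PR may handle more cases.

```typescript
// Stand-in: normalises a raw price to a string, or undefined when it is not numeric.
const toPriceFormat = (raw: string | undefined): string | undefined => {
    if (raw === undefined) return undefined;
    const value = Number.parseFloat(raw);
    return Number.isFinite(value) ? String(value) : undefined;
};

// Stand-in: reads one field from JSON-LD "offers", which may be an object or an array.
const extractOffersField = (data: unknown, field: string): string | undefined => {
    if (typeof data !== "object" || data === null) return undefined;
    const offers = (data as Record<string, unknown>)["offers"];
    const first: unknown = Array.isArray(offers) ? offers[0] : offers;
    if (typeof first !== "object" || first === null) return undefined;
    const value = (first as Record<string, unknown>)[field];
    if (typeof value === "string") return value;
    if (typeof value === "number") return String(value);
    return undefined;
};

// extractPrice routes the raw offers value through toPriceFormat.
const extractPrice = (data: unknown): string | undefined =>
    toPriceFormat(extractOffersField(data, "price"));
```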

cmintey · 2026-01-19T15:50:35Z

+ */
+const domains = ["ikea.com"];
+
+export const ikeaComRules: ShoppingDomainRules = rulesForDomain(domains, {


Why does Ikea need a custom parser? This site works for me with the base parser. Also, the rules here are not unique to Ikea and are basically the same as the base parser

I've had pages that didn't load for me, where the current parser did not pick up the metadata.

cmintey · 2026-01-19T15:51:18Z

+ */
+const domains = ["hm.com"];
+
+export const hmComRules: ShoppingDomainRules = rulesForDomain(domains, {


I don't think this needs to be a custom parser. There are likely several sites that also use this ProductGroup schema with a variant. I'd prefer these rules just be added to the base parser
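As a sketch of what a generic base rule for this could involve: schema.org's ProductGroup type lists its variants under `hasVariant`, so a site-independent step could normalise both shapes to a list of product nodes. The function and type names below are hypothetical and not taken from the PR.

```typescript
type JsonLdNode = Record<string, unknown>;

const isNode = (v: unknown): v is JsonLdNode =>
    typeof v === "object" && v !== null && !Array.isArray(v);

// Return the variants of a ProductGroup, or the node itself when it is a plain Product.
const productCandidates = (node: unknown): JsonLdNode[] => {
    if (!isNode(node)) return [];
    if (node["@type"] === "ProductGroup") {
        const variants = node["hasVariant"];
        // hasVariant may hold a single node or an array of nodes.
        const list: unknown[] = Array.isArray(variants) ? variants : [variants];
        return list.filter(isNode);
    }
    return node["@type"] === "Product" ? [node] : [];
};
```

Rules for name, price, or image could then run over the returned nodes regardless of which site produced the markup.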

One of the reasons why I considered it as a separate app is performance. Every rule you add to the "base parser" is one more thing it needs to process, even if the page doesn't contain such an element. It still needs to be parsed (albeit memoized, but still), where it could've been skipped altogether.

Also, I used hm, ikea & bol as examples. Happy to omit one or more, if that is preferred.

I don't think the performance hit is very large. I'd rather the parsing take just a few ms longer to have a robust set of standard rules and then only have a few custom rules for outliers. For example, I think it makes sense for Amazon to have its own parser since, as you pointed out, there are several Amazon-only rules in the base parser.
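The split described here (a shared base rule set plus a few custom rule sets for outlier domains) could be sketched roughly as follows. All types and function names are hypothetical simplifications for this example; they are not the PR's `rulesForDomain` or `ShoppingDomainRules`.

```typescript
type Rule = (html: string) => string | undefined;
type RuleSet = { title?: Rule[]; price?: Rule[] };
type DomainRules = { domains: string[]; rules: RuleSet };

// Match the domain itself and any subdomain, but not lookalike hosts.
const matchesDomain = (hostname: string, domain: string): boolean =>
    hostname === domain || hostname.endsWith(`.${domain}`);

// Pick the rules for a URL: custom rules run first, base rules remain as a fallback.
const selectRules = (url: string, base: RuleSet, custom: DomainRules[]): RuleSet => {
    const { hostname } = new URL(url);
    const match = custom.find((c) => c.domains.some((d) => matchesDomain(hostname, d)));
    if (!match) return base;
    return {
        title: [...(match.rules.title ?? []), ...(base.title ?? [])],
        price: [...(match.rules.price ?? []), ...(base.price ?? [])],
    };
};
```

In this arrangement only pages from a listed domain pay for the custom rules, while every page still gets the full base set.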

feat(): Introduce domain specific parsers

fd5a4fc

Dunky13 marked this pull request as draft January 8, 2026 21:04

Dunky13 and others added 4 commits January 8, 2026 22:28

fix: builds

e0c147d

fix: Clean up linting & build

eb240f5

fix: dubblequote

36c3f84

fix: remove unused import

723ae25

Dunky13 marked this pull request as ready for review January 9, 2026 08:59

cmintey requested changes Jan 19, 2026

Dunky13 wants to merge 5 commits into cmintey:main from Dunky13:feat/domain-specific-rules


cmintey Jan 30, 2026
