Skip to content

improve English subtypes refs finder, add tests#1155

Open
goose-life wants to merge 1 commit intomasterfrom
subtype-refs-refined
Open

improve English subtypes refs finder, add tests#1155
goose-life wants to merge 1 commit intomasterfrom
subtype-refs-refined

Conversation

@goose-life
Copy link
Copy Markdown
Collaborator

No description provided.

self.subtypes_string = '|'.join([re.escape(s) for s in self.subtype_names + self.subtype_abbreviations])
self.full_subtypes = Subtype.objects.all()
self.subtypes = [s.name for s in self.full_subtypes] + \
[s.abbreviation for s in self.full_subtypes]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what we've learnt from term matching and italics, does it make sense to sort these longest first?

(?P<num>\d+)
(\s+of\s+|/)
(?P<year>\d{{4}})
.{{0,60}}(\s|/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me a bit nervous, it's pretty broad.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neil had an interesting idea which maybe we need to start exploring: start with a broad regex, and then filter down edge cases with supplemental regexes. His argument is it helps to keep the regexes simple and understandable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg. this would be a false positive: as published in GN No 102. In section 4378 of the Act...

This stuff is tricky.

Comment thread indigo/tests/test_refs.py
<paragraph eId="sec_1.paragraph-0">
<content>
<p>Something to do with <ref href="/akn/za/act/gn/2012/102">GN no 102 published on 5 March 2012</ref>.</p>
<p>And another thing about <ref href="/akn/za/act/si/1998/4">SI 4, published in Government Gazette 12345 of 31 January 1998</ref>.</p>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should only markup the GN no 102 and SI 4?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants