Releases: irgroup/repro_eval
v0.5.0
Notes:
The main contribution of this release is the replacement of the evaluation backend: ir-measures replaces pytrec_eval. Future releases of repro_eval could include other evaluation backends; comparing them would be an interesting reproducibility study in its own right. The project's code is now largely agnostic about the evaluation backend. Internally, runs, qrels, and topics use a nested-dictionary data structure. To integrate another evaluation backend, the corresponding code must be added to `repro_eval/util.py`.
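As an illustration of the nested-dictionary layout mentioned above, here is a minimal, self-contained sketch. The structure follows the convention popularized by pytrec_eval (topic IDs mapping to inner dicts of document IDs), which ir-measures can also consume; the data and the `precision` helper below are hypothetical and only for illustration:

```python
# A run: topic id -> {doc id -> retrieval score}  (hypothetical data)
run = {
    "q1": {"doc1": 12.5, "doc2": 10.1, "doc3": 7.8},
    "q2": {"doc4": 9.9, "doc1": 4.2},
}

# Qrels: topic id -> {doc id -> relevance judgment}  (hypothetical data)
qrels = {
    "q1": {"doc1": 1, "doc2": 0, "doc3": 1},
    "q2": {"doc4": 1, "doc1": 0},
}

def precision(run, qrels):
    """Toy per-topic precision over all retrieved documents,
    just to show how an evaluation backend walks the structure."""
    scores = {}
    for topic, ranking in run.items():
        judged = qrels.get(topic, {})
        relevant = sum(1 for doc in ranking if judged.get(doc, 0) > 0)
        scores[topic] = relevant / len(ranking)
    return scores

print(precision(run, qrels))  # {'q1': 0.666..., 'q2': 0.5}
```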
New features:
- ir-measures replaces pytrec_eval as the evaluation backend
- RBO and KTU are now computed as an aggregated score by default; setting `per_topic=True` returns topic-wise rank correlation scores
- RBO is parameterizable from the repro_eval interface
- The old RBO implementation was removed.
- Kendall's tau Union is now called via `ktu()` instead of `ktau_union()`
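To illustrate the aggregated-versus-per-topic distinction above, here is a minimal, self-contained sketch of a Kendall's tau comparison between two toy rankings. This is not repro_eval's actual implementation or API; the `ktau` function, its tie-free tau computation, and the data are hypothetical:

```python
from itertools import combinations

def kendalls_tau(ranking_a, ranking_b):
    """Plain Kendall's tau between two rankings (lists of doc ids),
    restricted to the documents they share; no tie handling."""
    shared = [d for d in ranking_a if d in ranking_b]
    if len(shared) < 2:
        return 1.0
    pos_b = {d: ranking_b.index(d) for d in shared}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # x precedes y in ranking_a by construction of `shared`
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def ktau(run_a, run_b, per_topic=False):
    """Aggregated score by default; topic-wise dict with per_topic=True."""
    scores = {t: kendalls_tau(run_a[t], run_b[t]) for t in run_a}
    if per_topic:
        return scores
    return sum(scores.values()) / len(scores)

# Hypothetical original and reproduced runs (ranked doc-id lists per topic)
run_orig = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6", "d7"]}
run_repr = {"q1": ["d1", "d3", "d2", "d4"], "q2": ["d5", "d6", "d7"]}

print(ktau(run_orig, run_repr))                  # single aggregated score
print(ktau(run_orig, run_repr, per_topic=True))  # one score per topic
```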
Bugfixes:
- Some methods were improved to evaluate additional/external runs that are not provided at initialization.
- Renamed the variable `qrel` to `qrels`.
v0.4.0
v0.3.3
v0.3.2
v0.3.1
- New feature: Faster RBO implementation (from the TREC Health Misinformation Track; see also: https://github.com/claclark/Compatibility).
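For reference, a minimal sketch of truncated rank-biased overlap following the formula from Webber et al. (2010). This is not the Compatibility implementation linked above, just an illustration of the measure; the persistence parameter `p` weights agreement at top ranks more heavily:

```python
def rbo(list_a, list_b, p=0.9):
    """Truncated rank-biased overlap:
    RBO = (1 - p) * sum_{d=1..k} p^(d-1) * |A_d ∩ B_d| / d,
    where A_d, B_d are the top-d prefixes of the two rankings.
    Ignoring the extrapolated tail lower-bounds the full RBO score."""
    k = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, k + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        score += p ** (d - 1) * len(seen_a & seen_b) / d
    return (1 - p) * score

# Identical rankings of depth k score 1 - p**k under truncation
print(rbo(["d1", "d2", "d3"], ["d1", "d2", "d3"]))  # 0.271 for p=0.9
print(rbo(["d1", "d2"], ["d3", "d4"]))              # 0.0 (disjoint)
```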