v0.5.0

@breuert breuert released this 02 Dec 17:44
63b5d8c

Notes:
The main contribution of this release is the replacement of the evaluation backend. Specifically, ir‑measures replaces pytrec_eval. Future releases of repro_eval could include different evaluation backends; their comparison would be an interesting reproducibility study in its own right. In general, the project’s code is now more agnostic about the evaluation backend. Internally, runs, qrels, and topics use a nested‑dictionary data structure. To integrate another evaluation backend, the code must be added to repro_eval/util.py.

New features:

  • ir-measures replaces pytrec_eval as the evaluation backend
  • RBO and KTU are now computed as aggregated scores by default; setting per_topic=True returns topic-wise rank correlation scores
  • RBO is parameterizable from the repro_eval interface
  • The old RBO implementation was removed
  • Kendall's tau Union is now called via ktu() instead of ktau_union()
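To illustrate the difference between the aggregated default and per_topic=True, the sketch below averages a per-topic rank correlation over topics. The kendall_tau helper is a minimal pure-Python stand-in written for this example; it is not repro_eval's implementation, and the exact interface of ktu()/rbo() may differ.

```python
# Sketch of aggregated vs. per-topic rank correlation (hypothetical helper,
# not repro_eval's code). Rankings are ordered lists of document ids.
from itertools import combinations

def kendall_tau(a, b):
    """Minimal Kendall's tau over the documents shared by both rankings."""
    items = set(a) & set(b)
    rank_a = {d: i for i, d in enumerate(a)}
    rank_b = {d: i for i, d in enumerate(b)}
    pairs = list(combinations(sorted(items), 2))
    if not pairs:
        return 1.0
    concordance = sum(
        1 if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0 else -1
        for x, y in pairs
    )
    return concordance / len(pairs)

original     = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
reproduction = {"q1": ["d1", "d2", "d3"], "q2": ["d6", "d5", "d4"]}

# per_topic=True: one correlation score per topic
per_topic = {t: kendall_tau(original[t], reproduction[t]) for t in original}

# default: a single aggregated (mean) score
aggregated = sum(per_topic.values()) / len(per_topic)

print(per_topic)    # {'q1': 1.0, 'q2': -1.0}
print(aggregated)   # 0.0
```

The identical ranking for q1 yields a correlation of 1.0, the reversed ranking for q2 yields -1.0, and the aggregated default reports their mean.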

Bugfixes:

  • Some methods were improved to evaluate additional/external runs that are not provided at initialization
  • Renamed the variable qrel to qrels