# Mean Average Precision over words or n-grams with speech features

Compute the Mean Average Precision (MAP) with speech features.
More precisely, this is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.

## Installation

This package is available on PyPI:

```bash
pip install speech-map
```

It is much more efficient to use the Faiss backend for the k-NN than the naive PyTorch backend.
Since Faiss is not available on PyPI, you can install this package in a conda environment with your conda variant:

- CPU version:

## Usage

### CLI

```
❯ python -m speech_map --help
usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl

Mean Average Precision over n-grams / words with speech features

positional arguments:
  features              Path to the directory with pre-computed features
  jsonl                 Path to the JSONL file with annotations

options:
  -h, --help            show this help message and exit
  --pooling {MEAN,MAX,MIN,HAMMING}
                        Pooling (default: MEAN)
  --frequency FREQUENCY
                        Feature frequency in Hz (default: 50 Hz)
  --backend {FAISS,TORCH}
                        KNN (default: FAISS)
```

### Python API

You will most likely need only two functions: `build_embeddings_and_labels` and `mean_average_precision`.
Use them like this:

```python
from speech_map import build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
print(mean_average_precision(embeddings, labels))
```

In this example, `path_to_features` is a path to a directory containing features stored in individual PyTorch
tensor files, and `path_to_jsonl` is the path to the JSONL annotations file.

You can also use these functions in a more advanced setting:

```python
from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

embeddings, labels = build_embeddings_and_labels(
    path_to_features,
    path_to_jsonl,
    # NOTE: the keyword arguments of the original example were elided here;
    # the pooling keyword below is an assumption based on the Pooling import.
    pooling=Pooling.MAX,
)
print(mean_average_precision(embeddings, labels))
```

This is a minimal package; if you want to check the details, the code in `src/speech_map/core.py` is easy to read.
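
For intuition, here is a minimal pure-Python sketch of the MAP@R metric from equation (3) of https://arxiv.org/abs/2003.08505. This is a conceptual illustration only, not this package's implementation (which delegates the k-NN search to Faiss or PyTorch):

```python
import math


def map_at_r(embeddings: list[list[float]], labels: list[int]) -> float:
    """Conceptual sketch of MAP@R over a set of embeddings with class labels."""
    n = len(embeddings)
    average_precisions = []
    for q in range(n):
        # R = number of other samples sharing the query's label.
        r = sum(1 for j in range(n) if j != q and labels[j] == labels[q])
        if r == 0:
            continue
        # Rank every other sample by Euclidean distance to the query.
        ranked = sorted(
            (j for j in range(n) if j != q),
            key=lambda j: math.dist(embeddings[q], embeddings[j]),
        )
        # Average precision@i over the first R retrieved samples,
        # counting precision only at ranks where a relevant item appears.
        hits, precision_sum = 0, 0.0
        for i, j in enumerate(ranked[:r], start=1):
            if labels[j] == labels[q]:
                hits += 1
                precision_sum += hits / i
        average_precisions.append(precision_sum / r)
    return sum(average_precisions) / len(average_precisions)
```

With two well-separated classes, every query retrieves only same-class neighbours and the score is 1.0; a fully mixed-up embedding space drives it toward 0.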

## Data

We distribute in `data` the word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.
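
For example, assuming the compressed files carry a `.zst` suffix (the exact file names here are illustrative):

```shell
# Decompress every compressed annotation file in data/, keeping the originals
zstd -d -k data/*.zst
```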

We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.

## References

MAP for speech representations:

```bibtex
@inproceedings{carlin11_interspeech,
  title = {Rapid evaluation of speech representations for spoken term discovery},
  author = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
  year = {2011},
  booktitle = {Interspeech 2011},
  pages = {821--824},
  doi = {10.21437/Interspeech.2011-304},
  issn = {2958-1796},
}
```

Data and original implementation:

```bibtex
@inproceedings{algayres20_interspeech,
  title = {Evaluating the Reliability of Acoustic Speech Embeddings},
  author = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
  year = {2020},
  booktitle = {Interspeech 2020},
  pages = {4621--4625},
  doi = {10.21437/Interspeech.2020-2362},
  issn = {2958-1796},
}
```
