
Commit 4c5a5a8

Commit message: ready

1 parent 01d08c3 commit 4c5a5a8

File tree

2 files changed (+90, -3 lines)


README.md

Lines changed: 75 additions & 2 deletions
@@ -1,14 +1,17 @@
 # Mean Average Precision over words or n-grams with speech features

+Compute the Mean Average Precision (MAP) with speech features.
+More precisely, this is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
+
 ## Installation

 This package is available on PyPI:

 ```bash
-[uv] pip install speech-map
+pip install speech-map
 ```

-It is recommended to use the Faiss backend for the k-NN.
+It is much more efficient to use the Faiss backend for the k-NN than the naive PyTorch backend.
 Since Faiss is not available on PyPI, you can install this package in a conda environment with your conda variant:

 - CPU version:
@@ -22,13 +25,45 @@ Since Faiss is not available on PyPI, you can install this package in a conda en

 ## Usage

+### CLI
+
+```
+❯ python -m speech_map --help
+usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl
+
+Mean Average Precision over n-grams / words with speech features
+
+positional arguments:
+  features              Path to the directory with pre-computed features
+  jsonl                 Path to the JSONL file with annotations
+
+options:
+  -h, --help            show this help message and exit
+  --pooling {MEAN,MAX,MIN,HAMMING}
+                        Pooling (default: MEAN)
+  --frequency FREQUENCY
+                        Feature frequency in Hz (default: 50 Hz)
+  --backend {FAISS,TORCH}
+                        KNN (default: FAISS)
+```
+
+### Python API
+
+You most likely need only two functions: `build_embeddings_and_labels` and `mean_average_precision`.
+Use them like this:
+
 ```python
 from speech_map import build_embeddings_and_labels, mean_average_precision

 embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
 print(mean_average_precision(embeddings, labels))
 ```

+In this example, `path_to_features` is the path to a directory containing features stored in individual PyTorch
+tensor files, and `path_to_jsonl` is the path to the JSONL annotations file.
+
+You can also use those functions in a more advanced setting like this:
+
 ```python
 from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

@@ -43,3 +78,41 @@ embeddings, labels = build_embeddings_and_labels(
 print(mean_average_precision(embeddings, labels))
 ```

+This is a minimal package, and you can easily read through the code in `src/speech_map/core.py` if you want to check the details.
+
+## Data
+
+In `data`, we distribute word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.
+
+We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.
+
+## References
+
+MAP for speech representations:
+
+```bibtex
+@inproceedings{carlin11_interspeech,
+  title     = {Rapid evaluation of speech representations for spoken term discovery},
+  author    = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
+  year      = {2011},
+  booktitle = {Interspeech 2011},
+  pages     = {821--824},
+  doi       = {10.21437/Interspeech.2011-304},
+  issn      = {2958-1796},
+}
+```
+
+Data and original implementation:
+
+```bibtex
+@inproceedings{algayres20_interspeech,
+  title     = {Evaluating the Reliability of Acoustic Speech Embeddings},
+  author    = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
+  year      = {2020},
+  booktitle = {Interspeech 2020},
+  pages     = {4621--4625},
+  doi       = {10.21437/Interspeech.2020-2362},
+  issn      = {2958-1796},
+}
+```
+
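For intuition, the MAP@R metric that the README references (equation (3) of the linked paper) can be sketched in a few lines of NumPy. This is a hedged illustration, not the package's implementation: the function name `map_at_r` and the choice of Euclidean distance are assumptions made here for the sketch.

```python
import numpy as np

def map_at_r(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """MAP@R sketch: for each query, retrieve its R nearest neighbors,
    where R is the number of other samples sharing the query's label,
    and average precision-at-i over the positions i that are correct."""
    # Pairwise Euclidean distances between all embeddings.
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a query never retrieves itself
    scores = []
    for q in range(len(labels)):
        r = int((labels == labels[q]).sum()) - 1  # other same-label samples
        if r == 0:
            continue  # no reference for this query
        order = np.argsort(dists[q])[:r]                    # R nearest neighbors
        correct = (labels[order] == labels[q]).astype(float)
        precision_at_i = np.cumsum(correct) / np.arange(1, r + 1)
        # Precision counts only at positions where the retrieval is correct.
        scores.append((precision_at_i * correct).sum() / r)
    return float(np.mean(scores))

# Two well-separated classes: every query's nearest neighbor shares its label.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
lab = np.array([0, 0, 1, 1])
print(map_at_r(emb, lab))  # 1.0
```

With perfectly clustered embeddings the score is 1.0; it drops toward 0 as same-label samples stop being each other's nearest neighbors.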

src/speech_map/core.py

Lines changed: 15 additions & 1 deletion
@@ -121,7 +121,21 @@ def build_embeddings_and_labels(
     device: torch.device | None = None,
     file_extension: str = ".pt",
 ) -> tuple[Tensor, Tensor]:
-    """Build the pooled embeddings and labels from the annotations and the pre-computed features."""
+    """Build the pooled embeddings and labels from the annotations and the pre-computed features.
+
+    Args:
+        root: Path to the directory with input files.
+        jsonl: Path to the JSONL file with annotations.
+        pooling: Pooling to use for the embeddings, either Pooling.MEAN, Pooling.MAX, Pooling.MIN, or Pooling.HAMMING.
+        frequency: Frequency of the input features, used to compute the segment frontiers.
+        feature_maker: Function to load the features, default is torch.load. You can use your own model here.
+        device: Device to use for the embeddings, default is "cuda" if available, otherwise "cpu".
+        file_extension: Extension of the input files, default is ".pt". If you use your own model,
+            you can change it to ".wav" for example.
+
+    Returns:
+        A tuple of two tensors: the pooled embeddings and the labels.
+    """
     if device is None:
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     root, jsonl = Path(root), Path(jsonl)