
Commit 4c5a5a8

Commit message: ready

1 parent 01d08c3 commit 4c5a5a8

File tree

2 files changed (+90, -3 lines)


README.md

Lines changed: 75 additions & 2 deletions
@@ -1,14 +1,17 @@
 # Mean Average Precision over words or n-grams with speech features

+Compute the Mean Average Precision (MAP) with speech features.
+More precisely, this is the MAP@R from equation (3) of https://arxiv.org/abs/2003.08505.
+
 ## Installation

 This package is available on PyPI:

 ```bash
-[uv] pip install speech-map
+pip install speech-map
 ```

-It is recommended to use the Faiss backend for the k-NN.
+It is much more efficient to use the Faiss backend for the k-NN than the naive PyTorch backend.
 Since Faiss is not available on PyPI, you can install this package in a conda environment with your conda variant:

 - CPU version:
@@ -22,13 +25,45 @@ Since Faiss is not available on PyPI, you can install this package in a conda en

 ## Usage

+### CLI
+
+```
+❯ python -m speech_map --help
+usage: __main__.py [-h] [--pooling {MEAN,MAX,MIN,HAMMING}] [--frequency FREQUENCY] [--backend {FAISS,TORCH}] features jsonl
+
+Mean Average Precision over n-grams / words with speech features
+
+positional arguments:
+  features              Path to the directory with pre-computed features
+  jsonl                 Path to the JSONL file with annotations
+
+options:
+  -h, --help            show this help message and exit
+  --pooling {MEAN,MAX,MIN,HAMMING}
+                        Pooling (default: MEAN)
+  --frequency FREQUENCY
+                        Feature frequency in Hz (default: 50 Hz)
+  --backend {FAISS,TORCH}
+                        KNN (default: FAISS)
+```
+
+### Python API
+
+You most likely need only two functions: `build_embeddings_and_labels` and `mean_average_precision`.
+Use them like this:
+
 ```python
 from speech_map import build_embeddings_and_labels, mean_average_precision

 embeddings, labels = build_embeddings_and_labels(path_to_features, path_to_jsonl)
 print(mean_average_precision(embeddings, labels))
 ```

+In this example, `path_to_features` is the path to a directory containing features stored in individual PyTorch
+tensor files, and `path_to_jsonl` is the path to the JSONL annotations file.
+
+You can also use those functions in a more advanced setting like this:
+
 ```python
 from speech_map import Pooling, build_embeddings_and_labels, mean_average_precision

@@ -43,3 +78,41 @@ embeddings, labels = build_embeddings_and_labels(
 print(mean_average_precision(embeddings, labels))
 ```

+This is a minimal package, and you can easily read through the code in `src/speech_map/core.py` if you want to check the details.
+
+## Data
+
+In `data`, we distribute word and n-gram annotations for the LibriSpeech evaluation subsets. Decompress them with zstd.
+
+We have not used the n-gram annotations recently; there are probably too many samples, and they would need some clever subsampling.
+
+## References
+
+MAP for speech representations:
+
+```bibtex
+@inproceedings{carlin11_interspeech,
+  title     = {Rapid evaluation of speech representations for spoken term discovery},
+  author    = {Michael A. Carlin and Samuel Thomas and Aren Jansen and Hynek Hermansky},
+  year      = {2011},
+  booktitle = {Interspeech 2011},
+  pages     = {821--824},
+  doi       = {10.21437/Interspeech.2011-304},
+  issn      = {2958-1796},
+}
+```
+
+Data and original implementation:
+
+```bibtex
+@inproceedings{algayres20_interspeech,
+  title     = {Evaluating the Reliability of Acoustic Speech Embeddings},
+  author    = {Robin Algayres and Mohamed Salah Zaiem and Benoît Sagot and Emmanuel Dupoux},
+  year      = {2020},
+  booktitle = {Interspeech 2020},
+  pages     = {4621--4625},
+  doi       = {10.21437/Interspeech.2020-2362},
+  issn      = {2958-1796},
+}
+```
+
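For intuition, the MAP@R metric that the README references (equation (3) of the linked paper) can be sketched in a few lines of NumPy. This is a hedged illustration, not the package's implementation: the function name `map_at_r` and the choice of Euclidean distance are assumptions made here for the sketch.

```python
import numpy as np

def map_at_r(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """MAP@R sketch: for each query, retrieve its R nearest neighbors,
    where R is the number of other samples sharing the query's label,
    and average precision-at-i over the positions i that are correct."""
    # Pairwise Euclidean distances between all embeddings.
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a query never retrieves itself
    scores = []
    for q in range(len(labels)):
        r = int((labels == labels[q]).sum()) - 1  # other same-label samples
        if r == 0:
            continue  # no reference for this query
        order = np.argsort(dists[q])[:r]                    # R nearest neighbors
        correct = (labels[order] == labels[q]).astype(float)
        precision_at_i = np.cumsum(correct) / np.arange(1, r + 1)
        # Precision counts only at positions where the retrieval is correct.
        scores.append((precision_at_i * correct).sum() / r)
    return float(np.mean(scores))

# Two well-separated classes: every query's nearest neighbor shares its label.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
lab = np.array([0, 0, 1, 1])
print(map_at_r(emb, lab))  # 1.0
```

With perfectly clustered embeddings the score is 1.0; it drops toward 0 as same-label samples stop being each other's nearest neighbors.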

src/speech_map/core.py

Lines changed: 15 additions & 1 deletion
@@ -121,7 +121,21 @@ def build_embeddings_and_labels(
     device: torch.device | None = None,
     file_extension: str = ".pt",
 ) -> tuple[Tensor, Tensor]:
-    """Build the pooled embeddings and labels from the annotations and the pre-computed features."""
+    """Build the pooled embeddings and labels from the annotations and the pre-computed features.
+
+    Args:
+        root: Path to the directory with input files.
+        jsonl: Path to the JSONL file with annotations.
+        pooling: Pooling to use for the embeddings, either Pooling.MEAN, Pooling.MAX, Pooling.MIN, or Pooling.HAMMING.
+        frequency: Frequency of the input features, used to compute the segment frontiers.
+        feature_maker: Function to load the features, default is torch.load. You can use your own model here.
+        device: Device to use for the embeddings, default is "cuda" if available, otherwise "cpu".
+        file_extension: Extension of the input files, default is ".pt". If you use your own model,
+            you can change it to ".wav" for example.
+
+    Returns:
+        A tuple of two tensors: the pooled embeddings and the labels.
+    """
     if device is None:
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     root, jsonl = Path(root), Path(jsonl)