Implementation of embedding analysis of pretrained audio models applied to ship-radiated noise recognition
This repository provides a flexible evaluation pipeline for audio representations. It supports both supervised and unsupervised evaluation of features taken from a mel-spectrogram baseline and from pretrained models (e.g. BEATS, HuBERT, AudioMAE).
The main entry point is a single script (main.py) that allows you to run experiments via the command line in a reproducible and configurable way.
- Multiple audio representation backends (e.g. MelSpec, BEATS, HuBERT, WavLM)
- Supervised evaluation
  - Linear classifier
  - Attentive classifier
- Unsupervised evaluation
  - Clustering with Mutual Information
  - Rank-based similarity metrics
- Clean command-line interface for experiments
- Easy extension to new models or datasets
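The README does not spell out which rank-based similarity metric is used. As a rough illustration only, the sketch below assumes a Spearman-style comparison of pairwise distances between two embedding spaces. The function names (`average_ranks`, `rank_similarity`, and so on) are hypothetical and not taken from Similarity.py.

```python
from itertools import combinations
from math import dist, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if an input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of embeddings."""
    return [dist(a, b) for a, b in combinations(embeddings, 2)]


def rank_similarity(emb_a, emb_b):
    """Assumed metric: how similarly two embedding spaces order pairwise distances."""
    return spearman(pairwise_distances(emb_a), pairwise_distances(emb_b))


# Toy data: the second space is a scaled copy of the first, so the
# ordering of pairwise distances is identical in both.
space_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
space_b = [[0.0, 0.0], [2.0, 0.0], [0.0, 6.0]]
print(rank_similarity(space_a, space_b))
```

A score near 1 means both spaces rank sample pairs in nearly the same order, and a score near -1 means the order is reversed.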
project/
├── main.py
├── loaders.py # data loaders (assumed)
├── models.py # feature extractors (assumed)
├── Supervised_evaluation.py
├── Unsupervised_evaluation.py
├── Similarity.py
├── models/
│   └── Classifiers/
└── Data/
    └── <dataset_name>/
        ├── train/
        └── test/
⚠️ The pipeline assumes that the embeddings of the bioacoustic models have already been extracted with bacpipe: https://github.com/bioacoustic-ai/bacpipe
python3 main.py --train_dir /path/to/train --test_dir /path/to/test --model MelSpec --unsupervised
python3 main.py --train_dir /path/to/train --test_dir /path/to/test --model BEATS --classifier linear --supervised
python3 main.py --train_dir /path/to/train --test_dir /path/to/test --model BEATS --classifier attentive --supervised
python3 main.py --train_dir /path/to/train --test_dir /path/to/test --model HuBERT --supervised --unsupervised
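To run the same evaluation for several models, the commands above can be generated programmatically. The helper below is a hypothetical sketch, not part of the repository. It only builds the command lines from the flags shown above and prints them; the paths are placeholders.

```python
import shlex


def build_command(train_dir, test_dir, model, classifier=None,
                  supervised=False, unsupervised=False):
    """Assemble a main.py invocation as an argument list (hypothetical helper)."""
    cmd = ["python3", "main.py",
           "--train_dir", train_dir,
           "--test_dir", test_dir,
           "--model", model]
    if classifier is not None:
        cmd += ["--classifier", classifier]
    if supervised:
        cmd.append("--supervised")
    if unsupervised:
        cmd.append("--unsupervised")
    return cmd


# Placeholder paths; replace them with absolute paths to your dataset.
TRAIN_DIR = "/path/to/train"
TEST_DIR = "/path/to/test"

commands = [
    build_command(TRAIN_DIR, TEST_DIR, model,
                  classifier="linear", supervised=True)
    for model in ("MelSpec", "BEATS", "HuBERT")
]

for cmd in commands:
    # shlex.join quotes arguments safely for display or for a shell script.
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute the run instead of printing it.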
| Argument | Type | Description |
|---|---|---|
| --train_dir | str | Path to training dataset directory (required) |
| --test_dir | str | Path to test dataset directory (required) |
| --model | str | Feature extractor (BEATS, HuBERT, AudioMAE, WavLM, Data2vec, MelSpec) |
| --classifier | str | linear or attentive (supervised only) |
| --model_out | str | Path to save trained classifier |
| --supervised | flag | Enable supervised evaluation |
| --unsupervised | flag | Enable unsupervised evaluation |
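For orientation, the argument table can be mirrored with `argparse` roughly as follows. This is a sketch under assumptions: the actual parser in main.py may differ in defaults, allowed choices, and help texts, none of which are stated here.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the argument table (assumed, not copied from main.py)."""
    parser = argparse.ArgumentParser(
        description="Evaluate audio representations (illustrative sketch)")
    parser.add_argument("--train_dir", type=str, required=True,
                        help="Path to training dataset directory")
    parser.add_argument("--test_dir", type=str, required=True,
                        help="Path to test dataset directory")
    parser.add_argument("--model", type=str,
                        help="Feature extractor, e.g. BEATS, HuBERT, MelSpec")
    # Restricting the choices is an assumption based on the table.
    parser.add_argument("--classifier", type=str,
                        choices=["linear", "attentive"],
                        help="Classifier type (supervised only)")
    parser.add_argument("--model_out", type=str,
                        help="Path to save trained classifier")
    parser.add_argument("--supervised", action="store_true",
                        help="Enable supervised evaluation")
    parser.add_argument("--unsupervised", action="store_true",
                        help="Enable unsupervised evaluation")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--train_dir", "/path/to/train",
    "--test_dir", "/path/to/test",
    "--model", "BEATS",
    "--classifier", "linear",
    "--supervised",
])
print(args.model, args.classifier, args.supervised, args.unsupervised)
```

Both evaluation flags are independent booleans, so they can be combined in a single run as in the HuBERT example.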
Depending on the selected options, the script produces the following outputs.
Supervised evaluation:
- Classification accuracy
- Confusion matrix
- Optional rank-similarity scores on classifier logits
- Saved classifier model (.pt file)

Unsupervised evaluation:
- Clustering Mutual Information (MI) score
- Rank-based similarity scores on embeddings
Trained classifiers are saved to models/Classifiers/<model_name>.pt
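To make the clustering Mutual Information (MI) score concrete, the sketch below computes MI between ground-truth labels and cluster assignments from their co-occurrence counts. The clustering algorithm is not described here, so the cluster assignments are taken as given. The normalized variant is an added assumption, and the toy labels are illustrative only.

```python
from collections import Counter
from math import log


def mutual_information(labels, clusters):
    """Mutual information (in nats) between two assignments of the same samples."""
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    label_counts = Counter(labels)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (label, cluster), count in joint.items():
        # p(l, c) * log( p(l, c) / (p(l) * p(c)) ), written with raw counts.
        mi += (count / n) * log(
            n * count / (label_counts[label] * cluster_counts[cluster]))
    return mi


def entropy(assignments):
    """Shannon entropy (in nats) of a discrete assignment."""
    n = len(assignments)
    return -sum((c / n) * log(c / n) for c in Counter(assignments).values())


def normalized_mutual_information(labels, clusters):
    """MI divided by the mean entropy; defined as 0.0 if either entropy is zero."""
    h_labels, h_clusters = entropy(labels), entropy(clusters)
    if h_labels == 0.0 or h_clusters == 0.0:
        return 0.0
    return mutual_information(labels, clusters) / ((h_labels + h_clusters) / 2)


# Toy example with hypothetical vessel labels.
labels = ["cargo", "cargo", "tug", "tug"]
perfect = [0, 0, 1, 1]      # clusters match the labels exactly
unrelated = [0, 1, 0, 1]    # clusters carry no label information

print(mutual_information(labels, perfect))
print(normalized_mutual_information(labels, perfect))
print(mutual_information(labels, unrelated))
```

A higher score means the clusters found in the embedding space line up better with the vessel classes, without any classifier being trained.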
The following feature extractors are supported. The table includes embedding dimensions and links to the original papers:
| Model | Embedding Dimension | Supervised | Unsupervised | Paper / Reference |
|---|---|---|---|---|
| Animal2Vec | 1024 | ✅ | ✅ | Animal2Vec |
| AudioMAE | 768 | ✅ | ✅ | AudioMAE |
| AVES | 768 | ✅ | ✅ | AVES |
| AvesEcho | 768 | ✅ | ✅ | AvesEcho |
| BEATS | 768 | ✅ | ✅ | BEATS: Audio Representation Learning |
| BirdMAE | 1280 | ✅ | ✅ | BirdMAE |
| BirdNet | 1024 | ✅ | ✅ | BirdNet |
| Data2vec | 768 | ✅ | ✅ | Data2vec |
| GoogleWhale | 1280 | ✅ | ✅ | GoogleWhale |
| HuBERT | 768 | ✅ | ✅ | HuBERT |
| HuBERTAS | 768 | ✅ | ✅ | HuBERTAS |
| MelSpec | 128 | ✅ | ✅ | Mel Spectrograms |
| Perch | 1280 | ✅ | ✅ | Perch |
| Perch2.0 | 1536 | ✅ | ✅ | Perch2.0 |
| SurfPerch | 1280 | ✅ | ✅ | SurfPerch |
| Wav2Vec | 768 | ✅ | ✅ | Wav2Vec |
| WavLM | 768 | ✅ | ✅ | WavLM |
To add a new feature extractor, implement a prediction_() function for it and register it inside the extract_features() function in main.py.
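The registration step might look roughly like the following. Everything in this sketch is an assumption made for illustration: the function names `prediction_melspec` and `prediction_mymodel`, their signatures, the return format, and the dictionary-based dispatch. The real extract_features() in main.py may be organised differently.

```python
def prediction_melspec(data_dir):
    """Placeholder for an existing extractor (hypothetical signature)."""
    # A real extractor would load audio from data_dir and compute features.
    return [[0.0] * 128], ["cargo"]


def prediction_mymodel(data_dir):
    """Hypothetical new extractor returning (embeddings, labels)."""
    # Dummy output: one 4-dimensional embedding with one label.
    return [[0.1, 0.2, 0.3, 0.4]], ["tug"]


def extract_features(model_name, data_dir):
    """Dispatch to the extractor registered for model_name (assumed design)."""
    registry = {
        "MelSpec": prediction_melspec,
        "MyModel": prediction_mymodel,  # new model registered here
    }
    try:
        predict = registry[model_name]
    except KeyError:
        raise ValueError(
            f"Unknown model '{model_name}'. Available: {sorted(registry)}"
        ) from None
    return predict(data_dir)


embeddings, labels = extract_features("MyModel", "/path/to/train")
print(len(embeddings[0]), labels)
```

With this layout, the new name also needs to be accepted by the `--model` argument so it can be selected from the command line.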
This project has been evaluated on the following publicly available underwater acoustic datasets:
DeepShip:
- Description: Large-scale underwater acoustic dataset for ship classification, covering multiple vessel types and recording conditions.
- Access: https://github.com/irfankamboh/DeepShip
- Paper: K. Irfan et al., DeepShip: A Large-Scale Underwater Acoustic Benchmark Dataset, Expert Systems with Applications, 2021.
ShipsEar:
- Description: Real-world underwater acoustic recordings of ships and ambient noise, collected in the port of Vigo, Spain.
- Access: https://atlanttic.uvigo.es/underwaternoise/ships-ear/
- Paper: M. Santos-Domínguez et al., ShipsEar: An Underwater Vessel Noise Database, Applied Acoustics, 2016.
⚠️ Note: Please refer to the original dataset websites for licensing terms and usage restrictions. Some datasets may require registration or approval for access.
- Always use absolute paths for datasets to avoid silent errors.
- The script performs directory sanity checks at runtime.
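The exact checks performed by the script are not listed here. A minimal sketch of what such directory sanity checks could look like, assuming the three conditions below and a hypothetical helper name `check_dataset_dir`:

```python
from pathlib import Path


def check_dataset_dir(path_str, arg_name):
    """Validate a dataset directory (assumed checks, not the script's actual ones)."""
    path = Path(path_str).expanduser()
    # Relative paths depend on the current working directory, so reject them.
    if not path.is_absolute():
        raise ValueError(f"{arg_name} must be an absolute path, got '{path_str}'")
    if not path.is_dir():
        raise FileNotFoundError(f"{arg_name} is not an existing directory: {path}")
    # An empty directory would otherwise yield an empty dataset without warning.
    if not any(path.iterdir()):
        raise ValueError(f"{arg_name} is empty: {path}")
    return path


# Example: a relative path is rejected before any data is loaded.
try:
    check_dataset_dir("Data/train", "--train_dir")
except ValueError as err:
    print(err)
```

Failing early with a clear message is what prevents the silent errors mentioned above, such as evaluating on an empty or wrong directory.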