Jam-ALT is an automatic lyrics transcription (ALT) benchmark based on the JamendoLyrics dataset.
The lyrics have been revised according to a newly compiled annotation guide, which unifies the music industry’s lyrics transcription and formatting guidelines, covering aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds.
This page visualizes the differences between the original JamendoLyrics dataset and our revision.
Please note that the dataset is not time-aligned, as the revised lyrics do not easily map to the timestamps from JamendoLyrics. To evaluate automatic lyrics alignment (ALA), please use JamendoLyrics directly.
Apart from the classic word error rate, the benchmark includes metrics that take into account letter case, punctuation, and line/section breaks.
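For intuition, a formatting-aware error rate can be thought of as a standard word error rate computed over tokens that keep their case and punctuation, so that casing and punctuation mistakes count as errors. The following is only a simplified sketch of that idea (the actual metrics and tokenization in alt-eval are more involved):

```python
import re

def formatting_aware_wer(reference: str, hypothesis: str) -> float:
    """WER over case-sensitive tokens, with punctuation as separate tokens."""
    # Do not lowercase; split punctuation into its own tokens so that
    # "Hello," vs. "hello" counts as both a casing and a punctuation error.
    tokenize = lambda s: re.findall(r"\w+|[^\w\s]", s)
    ref, hyp = tokenize(reference), tokenize(hypothesis)

    # Standard Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)
```

For example, `formatting_aware_wer("Hello, world", "hello world")` counts two errors (the lowercased word and the dropped comma) against three reference tokens.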
The benchmark is described in our paper Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark.
Running the benchmark
The dataset can be easily loaded using HuggingFace datasets, and the evaluation is implemented in our alt-eval package:
from datasets import load_dataset
from alt_eval import compute_metrics

dataset = load_dataset("audioshake/jam-alt", revision="v1.0.0")["test"]
# transcriptions: list[str]
compute_metrics(dataset["text"], transcriptions, languages=dataset["language"])
By default, the dataset includes the audio, allowing you to run transcription directly. For example, the following code can be used to evaluate Whisper:
import datasets
import whisper
from datasets import load_dataset
from alt_eval import compute_metrics

dataset = load_dataset("audioshake/jam-alt", revision="v1.0.0")["test"]
# Get the raw audio file and let Whisper decode it itself
dataset = dataset.cast_column("audio", datasets.Audio(decode=False))
model = whisper.load_model("tiny")
transcriptions = [
    "\n".join(s["text"].strip() for s in model.transcribe(a["path"])["segments"])
    for a in dataset["audio"]
]
compute_metrics(dataset["text"], transcriptions, languages=dataset["language"])
Alternatively, if you already have transcriptions, you might prefer to skip loading the audio:
dataset = load_dataset("audioshake/jam-alt", revision="v1.0.0", with_audio=False)["test"]
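If your transcriptions are stored as one text file per song, they can be collected in dataset order before calling compute_metrics. The directory layout and the use of a per-song name column are assumptions for illustration only:

```python
from pathlib import Path

def load_transcriptions(directory: str, names: list[str]) -> list[str]:
    """Read one UTF-8 transcription file per song, in the given dataset order.

    Assumes the files are named `<name>.txt` under `directory`
    (a hypothetical layout; adapt to however your outputs are stored).
    """
    return [
        (Path(directory) / f"{name}.txt").read_text(encoding="utf-8").strip()
        for name in names
    ]
```

Assuming the dataset exposes a column of song names, the result can then be passed to the evaluation as above, e.g. `compute_metrics(dataset["text"], load_transcriptions("outputs", dataset["name"]), languages=dataset["language"])`.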
Citation
When using the benchmark, please cite our paper as well as the original JamendoLyrics paper:
@misc{cifka-2023-jam-alt,
author = {Ond\v{r}ej C\'ifka and
Constantinos Dimitriou and
{Cheng-i} Wang and
Hendrik Schreiber and
Luke Miner and
Fabian-Robert St\"oter},
title = {{Jam-ALT}: A Formatting-Aware Lyrics Transcription Benchmark},
eprint = {arXiv:2311.13987},
year = 2023
}
@inproceedings{durand-2023-contrastive,
author={Durand, Simon and Stoller, Daniel and Ewert, Sebastian},
booktitle={2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages},
year={2023},
pages={1-5},
address={Rhodes Island, Greece},
doi={10.1109/ICASSP49357.2023.10096725}
}