Comment on The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates
WalnutLum@lemmy.ml 2 months ago
From the approach section:
A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
This is not enough information to recreate the model.
From the training data section:
The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.
This is also not enough information about the data; it just points to the paper itself for the details.
Additionally, model cards =/= data cards. It’s an important distinction in AI training.
There are guides on how to fine-tune the model yourself: huggingface.co/blog/fine-tune-whisper
Fine-tuning is not re-creating the model. This is an important distinction.
The OSAID has a pretty simple checklist: opensource.org/…/the-open-source-ai-definition-ch…
To go through the list of materials required to fit the OSAID:
Datasets: Available under OSD-compliant license
Whisper does not provide the datasets.
Research paper: Available under OSD-compliant license
The research paper is available, but not under an OSD-compliant license.
Technical report: Available under OSD-compliant license
Whisper does not provide the technical report.
Data card: Available under OSD-compliant license
Whisper provides the model card, but not the data card.
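To make the pass/fail logic explicit, here is a rough sketch of the four items above as a tiny Python check. The `Item` class and `failing` helper are made up for illustration and are not part of any OSI tooling; the statuses are just the claims from this comment, not independently verified. An item only passes if the material is both published and under an OSD-compliant license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One checklist row (hypothetical structure, for illustration only)."""
    name: str
    provided: bool       # is the material published at all?
    osd_licensed: bool   # is it under an OSD-compliant license?

    @property
    def passes(self) -> bool:
        # Both conditions are required; being published is not enough.
        return self.provided and self.osd_licensed


# Statuses as claimed in this comment, not independently verified.
WHISPER_ITEMS = [
    Item("Datasets", provided=False, osd_licensed=False),
    Item("Research paper", provided=True, osd_licensed=False),
    Item("Technical report", provided=False, osd_licensed=False),
    Item("Data card", provided=False, osd_licensed=False),
]


def failing(items):
    """Return the names of the checklist items that do not pass."""
    return [item.name for item in items if not item.passes]


if __name__ == "__main__":
    for item in WHISPER_ITEMS:
        status = "ok" if item.passes else "not met"
        print(f"{item.name}: {status}")
```

Note that the research paper still fails here even though it is available, which is the same point as above: available is not the same as openly licensed.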