At the Łódź conference John Coleman gave an interesting talk about the spoken component of the British National Corpus, which makes up about ten percent of the entire corpus.
It includes a wide range of authentic spoken material, recorded in 1991-92 by volunteers who wore Walkman recorders and taped all their conversational interactions over a 24-hour period. As well as all kinds of structured and unstructured talk directed at other people, from sermons to discussions of boyfriends, the files include dog-directed and parrot-directed speech. Who’s a pretty boy, then?
The material has now been digitized by the British Library from the original analogue recordings.
Although comprising only ten percent of the whole corpus, the audio material of the BNC extends to 9 TB (nine terabytes), about 1800 hours’ worth. So you won’t be downloading it all and storing it on your hard disc any time soon.
The whole spoken corpus may be unmanageably large, but a selection of audio files from the BNC is now available online.
The ten most frequently used words in the spoken corpus, Coleman says, occur more than 58,000 times each. At the other extreme, 23% of the distinct words used (12,400 of them) occur only once. Many other words that are surely in people’s vocabulary never occur at all.
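For anyone who wants to see how figures of this kind are arrived at, here is a minimal sketch in Python. It is not Coleman's method, which I don't know; the function name frequency_profile and the toy sentence are invented for illustration. It counts how often each word form occurs, picks out the most frequent ones, and lists the forms that occur exactly once. The last line simply takes the reported figures at face value: if 12,400 words are 23% of the distinct words used, there would be roughly 54,000 distinct words in all.

```python
from collections import Counter


def frequency_profile(tokens, top_n=10):
    """Return the top_n most frequent words, the words occurring once,
    and the share of distinct words that occur only once."""
    counts = Counter(tokens)
    top = counts.most_common(top_n)
    once_only = sorted(word for word, n in counts.items() if n == 1)
    share = len(once_only) / len(counts) if counts else 0.0
    return top, once_only, share


# Toy data, invented for illustration; not taken from the BNC.
toy_tokens = "who s a pretty boy then who s a good girl then".split()
top, once_only, share = frequency_profile(toy_tokens, top_n=3)

# Back-of-envelope check on the reported figures (an inference, not a
# number from the talk): 12,400 words being 23% of the distinct words.
implied_distinct_words = round(12_400 / 0.23)
```

On a real transcription the hard part is deciding what counts as a word in the first place (spelling variants, hesitations, truncated words), and the sketch ignores all of that.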
Coleman presented some observations about assimilation of place of articulation. As well as the familiar dealveolar type (ˈtem ˈmɪnɪts, ˈɡʊɡ ˈɡɜːl), he found various instances of “nonstandard place assimilation of word-final /m/ and /ŋ/”. Delabial examples included siːn in seem to and seɪŋ in same kind of. As well as plenty of cases of aɪŋ(ɡ)ənə etc for I’m going to, he reports “18 tokens per 10 million” of əˈlɑːŋ klɒk for alarm clock. The most frequent item classified as develar was swimming pool pronounced as ˈswɪmɪm puːl — but there of course the underlying form of the -ing ending would be ɪn rather than ɪŋ for some speakers in some styles of speech (as the sociolinguists have documented), so that the assimilation could be dealveolar after all, not develar. The same applies to ˈwedɪm in wedding present.
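The labelling used above can be spelt out as a small sketch, again in Python. The table PLACE, the function classify and the way the examples are encoded are my own assumptions for illustration, not Coleman's procedure. The label depends on the place of the underlying final consonant, provided that the surface consonant has taken on the place of the following one. That is why swimming pool comes out differently according to whether we assume underlying ɪŋ or ɪn.

```python
# Place of articulation for the plosives and nasals in the examples.
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "k": "velar", "ɡ": "velar", "ŋ": "velar",
}

# The label names the place that is given up.
LABEL = {"alveolar": "dealveolar", "labial": "delabial", "velar": "develar"}


def classify(underlying, surface, following):
    """Label a word-final place assimilation, e.g. n -> m before p."""
    source = PLACE[underlying]
    result = PLACE[surface]
    context = PLACE[following]
    if source == result:
        return "no place change"
    if result != context:
        return "not assimilation to the following consonant"
    return LABEL[source]


# (underlying final consonant, surface consonant, following consonant)
examples = {
    "ten minutes": ("n", "m", "m"),
    "good girl": ("d", "ɡ", "ɡ"),
    "seem to": ("m", "n", "t"),
    "same kind of": ("m", "ŋ", "k"),
    "alarm clock": ("m", "ŋ", "k"),
    "swimming pool, assuming ɪŋ": ("ŋ", "m", "p"),
    "swimming pool, assuming ɪn": ("n", "m", "p"),
}
labels = {phrase: classify(*segs) for phrase, segs in examples.items()}
```

The sketch makes the analytical problem concrete: the surface form alone does not tell you the label, because the classification needs the underlying form as input.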
We await further reports with interest.