
The single-item scale asks the respondent to rate the “overall impression” of the synthesized speech clip on a 1–5 scale. The other items relate to various aspects of synthetic speech, such as listening effort, pronunciation, speed, pleasantness, naturalness, audio flow, ease of listening, comprehension, and articulation. Responses are collected on 5-point scales with appropriate phrase anchors.
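To make the scoring concrete, here is a minimal Python sketch. It assumes the conventional definition of a mean opinion score, the arithmetic mean of listeners' 1–5 ratings for one item. The function name `mean_opinion_score` and the sample ratings are hypothetical and do not come from the study.

```python
from collections.abc import Sequence


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Arithmetic mean of 1-5 ratings for a single item (hypothetical helper)."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        # Reject anything that is not an integer on the 5-point scale.
        if isinstance(r, bool) or not isinstance(r, int) or not 1 <= r <= 5:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)


# Example: five listeners rate the overall impression of one clip.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

The range check runs before averaging, so a rating outside the 5-point scale raises an error instead of being averaged in silently.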
The two rating scales, type I and type Q, have two items in common: overall impression of sound quality and acceptance, the latter requiring a binary yes-or-no response. The items unique to the type I questionnaire are listening effort, comprehension problems, and articulation, while type Q asks about pronunciation, speaking rate, and voice pleasantness (Fig. 1 presents items from types I and Q). Thus, the MOS scale combines an item on overall impression of sound quality (referred to subsequently as overall sound quality) with more specific items that address different facets of speech quality.
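A small data structure can model the split between common and form-specific items. The sketch below assumes each listener's answers arrive as a dictionary keyed by item. The item identifiers, the `form_items` and `summarize` helpers, and the example responses are all hypothetical. For the 1–5 items it reports the mean rating, and for the binary acceptance item it reports the proportion of "yes" answers.

```python
from __future__ import annotations

# Item labels follow the text; the identifiers themselves are hypothetical.
COMMON_ITEMS = ("overall_sound_quality", "acceptance")
UNIQUE_ITEMS = {
    "I": ("listening_effort", "comprehension_problems", "articulation"),
    "Q": ("pronunciation", "speaking_rate", "voice_pleasantness"),
}
BINARY_ITEMS = {"acceptance"}  # yes/no item; all others use a 1-5 scale


def form_items(form: str) -> tuple[str, ...]:
    """Return the items of questionnaire type "I" or "Q", common items first."""
    return COMMON_ITEMS + UNIQUE_ITEMS[form]


def summarize(form: str, responses: list[dict[str, int | bool]]) -> dict[str, float]:
    """Mean rating for 1-5 items; share of 'yes' answers for binary items."""
    if not responses:
        raise ValueError("need at least one response")
    summary = {}
    for item in form_items(form):
        values = [r[item] for r in responses]
        for v in values:
            if item in BINARY_ITEMS:
                if not isinstance(v, bool):
                    raise TypeError(f"{item} expects True/False, got {v!r}")
            elif isinstance(v, bool) or not isinstance(v, int) or not 1 <= v <= 5:
                raise ValueError(f"{item} expects an integer 1-5, got {v!r}")
        # True counts as 1, so for a binary item the mean is the share of 'yes'.
        summary[item] = sum(values) / len(values)
    return summary


shared = set(form_items("I")) & set(form_items("Q"))
print(sorted(shared))  # ['acceptance', 'overall_sound_quality']

responses_q = [
    {"overall_sound_quality": 4, "acceptance": True, "pronunciation": 5,
     "speaking_rate": 3, "voice_pleasantness": 4},
    {"overall_sound_quality": 3, "acceptance": False, "pronunciation": 4,
     "speaking_rate": 3, "voice_pleasantness": 2},
]
print(summarize("Q", responses_q))
# {'overall_sound_quality': 3.5, 'acceptance': 0.5, 'pronunciation': 4.5,
#  'speaking_rate': 3.0, 'voice_pleasantness': 3.0}
```

Acceptance is reported as a proportion, so the binary item is not mixed into the 5-point means. Intersecting the two forms recovers exactly the two common items.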
【1】Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale
