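The TIMIT Speech Corpus

The TIMIT corpus comes with accurate phoneme-level annotations, so it can be used to evaluate speech segmentation performance. It also contains speech from several hundred speakers, which makes it a commonly used, authoritative corpus for evaluating speaker recognition. Commercial use of the corpus requires a paid license. The copy linked below comes from MIT teaching lab material and is a little over 430 MB.

Download: http://web.mit.edu/course/6/6.863/share/nltk_lite/

There is no need to download the files one at a time; the download tool below can fetch them in bulk.

Download tool: http://www.onlinedown.net/soft/53010.htm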

          The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus
                                   (TIMIT)

                            Training and Test Data
                           NIST Speech Disc CD1-1.1

The TIMIT corpus of read speech has been designed to provide speech data for
the acquisition of acoustic-phonetic knowledge and for the development and
evaluation of automatic speech recognition systems. TIMIT has resulted from
the joint efforts of several sites under sponsorship from the Defense Advanced
Research Projects Agency - Information Science and Technology Office
(DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts
Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas
Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has
been maintained, verified, and prepared for CD-ROM production by the National
Institute of Standards and Technology (NIST). This file contains a brief
description of the TIMIT Speech Corpus. Additional information including the
referenced material and some relevant reprints of articles may be found in the
printed documentation which is also available from NTIS (NTIS# PB91-100354).

1. Corpus Speaker Distribution
-- ---------------------------

TIMIT contains a total of 6300 sentences, 10 sentences spoken by each of 630
speakers from 8 major dialect regions of the United States. Table 1 shows the
number of speakers for the 8 dialect regions, broken down by sex. The
percentages are given in parentheses. A speaker's dialect region is the
geographical area of the U.S. where they lived during their childhood years.
The geographical areas correspond with recognized dialect regions in U.S.
(Language Files, Ohio State University Linguistics Dept., 1982), with the
exception of the Western region (dr7) in which dialect boundaries are not
known with any confidence and dialect region 8 where the speakers moved around
a lot during their childhood.

   Table 1: Dialect distribution of speakers

      Dialect
      Region(dr)    #Male      #Female      Total
      ----------  ---------   ---------   ----------
         1         31 (63%)    18 (37%)    49   (8%)
         2         71 (70%)    31 (30%)   102  (16%)
         3         79 (77%)    23 (23%)   102  (16%)
         4         69 (69%)    31 (31%)   100  (16%)
         5         62 (63%)    36 (37%)    98  (16%)
         6         30 (65%)    16 (35%)    46   (7%)
         7         74 (74%)    26 (26%)   100  (16%)
         8         22 (67%)    11 (33%)    33   (5%)
       ------     ---------   ---------   ----------
         8        438 (70%)   192 (30%)   630 (100%)

The dialect regions are:
     dr1: New England
     dr2: Northern
     dr3: North Midland
     dr4: South Midland
     dr5: Southern
     dr6: New York City
     dr7: Western
     dr8: Army Brat (moved around)

2. Corpus Text Material
-- --------------------

The text material in the TIMIT prompts (found in the file "prompts.txt")
consists of 2 dialect "shibboleth" sentences designed at SRI, 450
phonetically-compact sentences designed at MIT, and 1890 phonetically-diverse
sentences selected at TI. The dialect sentences (the SA sentences) were meant
to expose the dialectal variants of the speakers and were read by all 630
speakers. The phonetically-compact sentences were designed to provide a good
coverage of pairs of phones, with extra occurrences of phonetic contexts
thought to be either difficult or of particular interest. Each speaker read 5
of these sentences (the SX sentences) and each text was spoken by 7 different
speakers. The phonetically-diverse sentences (the SI sentences) were selected
from existing text sources - the Brown Corpus (Kuchera and Francis, 1967) and
the Playwrights Dialog (Hultzen, et al., 1964) - so as to add diversity in
sentence types and phonetic contexts. The selection criteria maximized the
variety of allophonic contexts found in the texts. Each speaker read 3 of
these sentences, with each sentence being read only by a single speaker.
Table 2 summarizes the speech material in TIMIT.

    Table 2: TIMIT speech material

Sentence Type   #Sentences   #Speakers   Total   #Sentences/Speaker
-------------   ----------   ---------   -----   ------------------
Dialect (SA)          2         630       1260           2
Compact (SX)        450           7       3150           5
Diverse (SI)       1890           1       1890           3
-------------   ----------   ---------   -----    ----------------
Total              2342                   6300          10
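
The totals in Table 2 can be cross-checked with a few lines of arithmetic. The following is an illustrative Python sketch, not part of the TIMIT distribution; all variable names are made up for this example.

```python
# Illustrative check of Table 2 (names are hypothetical, not from the corpus tools).
# Each entry: (distinct sentence texts, speakers per text, sentences read per speaker)
material = {
    "SA": (2, 630, 2),
    "SX": (450, 7, 5),
    "SI": (1890, 1, 3),
}

# Utterances per type = distinct texts x speakers per text.
utterances = {kind: texts * speakers for kind, (texts, speakers, _) in material.items()}

total_texts = sum(texts for texts, _, _ in material.values())
total_utterances = sum(utterances.values())
per_speaker = sum(n for _, _, n in material.values())

print(utterances)
print(total_texts, total_utterances, per_speaker)
```

The same total follows from the speaker side: 630 speakers reading 10 sentences each.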

3. Suggested Training/Test Subdivision
-- -----------------------------------

The speech material has been subdivided into portions for training and
testing. The criteria for the subdivision are described in the file
"testset.doc". THIS SUBDIVISION HAS NO RELATION TO THE DATA DISTRIBUTED ON
THE PROTOTYPE VERSION OF THE CDROM.

Core Test Set:

The test data has a core portion containing 24 speakers, 2 male and 1 female
from each dialect region. The core test speakers are shown in Table 3. Each
speaker read a different set of SX sentences. Thus the core test material
contains 192 sentences, 5 SX and 3 SI for each speaker, each having a distinct
text prompt.

    Table 3: The core test set of 24 speakers

     Dialect        Male      Female
     -------       ------     ------
        1        DAB0, WBT0    ELC0
        2        TAS1, WEW0    PAS0
        3        JMP0, LNT0    PKT0
        4        LLL0, TLS0    JLM0
        5        BPM0, KLT0    NLP0
        6        CMJ0, JDH0    MGD0
        7        GRT0, NJM0    DHC0
        8        JLN0, PAM0    MLD0
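
Table 3 can be turned into a list of speaker directories by combining it with the directory layout described in Section 4. The sketch below is only an illustration: the function name and the "/timit" root are assumptions, and it uses lowercase names as in the examples in this file (some copies of the corpus may use a different case).

```python
# Core test speakers from Table 3: dialect region -> (male IDs, female IDs).
CORE_TEST = {
    1: (["DAB0", "WBT0"], ["ELC0"]),
    2: (["TAS1", "WEW0"], ["PAS0"]),
    3: (["JMP0", "LNT0"], ["PKT0"]),
    4: (["LLL0", "TLS0"], ["JLM0"]),
    5: (["BPM0", "KLT0"], ["NLP0"]),
    6: (["CMJ0", "JDH0"], ["MGD0"]),
    7: (["GRT0", "NJM0"], ["DHC0"]),
    8: (["JLN0", "PAM0"], ["MLD0"]),
}


def core_test_dirs(corpus_root="/timit"):
    """Return the speaker directories implied by Table 3 (hypothetical helper)."""
    dirs = []
    for dr, (males, females) in sorted(CORE_TEST.items()):
        for sex, ids in (("m", males), ("f", females)):
            for spk in ids:
                # <SEX><SPEAKER_ID>, e.g. "m" + "bpm0" -> "mbpm0"
                dirs.append(f"{corpus_root}/test/dr{dr}/{sex}{spk.lower()}")
    return dirs


dirs = core_test_dirs()
print(len(dirs))      # number of core test speakers
print(len(dirs) * 8)  # 5 SX + 3 SI utterances per speaker
```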

Complete Test Set:

A more extensive test set was obtained by including the sentences from all
speakers that read any of the SX texts included in the core test set. In
doing so, no sentence text appears in both the training and test sets. This
complete test set contains a total of 168 speakers and 1344 utterances,
accounting for about 27% of the total speech material. The resulting dialect
distribution of the 168 speaker test set is given in Table 4. The complete
test material contains 624 distinct texts.

     Table 4: Dialect distribution for complete test set

      Dialect    #Male   #Female   Total
      -------    -----   -------   -----
        1           7        4       11
        2          18        8       26
        3          23        3       26
        4          16       16       32
        5          17       11       28
        6           8        3       11
        7          15        8       23
        8           8        3       11
      -----      -----   -------   ------
      Total       112       56      168
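
The counts quoted for the complete test set can be reproduced from figures given elsewhere in this file. The sketch below assumes, as for the core test set, that each test speaker contributes 8 utterances (5 SX and 3 SI, with the SA sentences left out). Under that assumption, the "about 27%" figure matches the share of speakers and the share of non-SA utterances, not the share of all 6300 utterances. The constant names are made up for this example.

```python
TOTAL_SPEAKERS = 630
TOTAL_UTTERANCES = 6300
TEST_SPEAKERS = 168
NON_SA_PER_SPEAKER = 5 + 3  # SX + SI; assumption: SA sentences are excluded

test_utterances = TEST_SPEAKERS * NON_SA_PER_SPEAKER
share_of_speakers = TEST_SPEAKERS / TOTAL_SPEAKERS
share_of_non_sa = test_utterances / (TOTAL_SPEAKERS * NON_SA_PER_SPEAKER)
share_of_all = test_utterances / TOTAL_UTTERANCES

print(test_utterances)
print(f"{share_of_speakers:.1%} {share_of_non_sa:.1%} {share_of_all:.1%}")
```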

4. CDROM TIMIT Directory and File Structure
-- ----------------------------------------

The speech and associated data are organized on the CD-ROM according to the
following hierarchy:

/<CORPUS>/<USAGE>/<DIALECT>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>

     where,

     CORPUS :== timit
     USAGE :== train | test
     DIALECT :== dr1 | dr2 | dr3 | dr4 | dr5 | dr6 | dr7 | dr8
                 (see Table 1 for dialect code description)
     SEX :== m | f
     SPEAKER_ID :== <INITIALS><DIGIT>

          where,
          INITIALS :== speaker initials, 3 letters
          DIGIT :== number 0-9 to differentiate speakers with identical
                    initials

     SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER>

          where,

          TEXT_TYPE :== sa | si | sx
                        (see Section 2 for sentence text type description)
          SENTENCE_NUMBER :== 1 ... 2342

     FILE_TYPE :== wav | txt | wrd | phn
                   (see Table 5 for file type description)

Examples:
     /timit/train/dr1/fcjf0/sa1.wav

     (TIMIT corpus, training set, dialect region 1, female speaker,
      speaker-ID "cjf0", sentence text "sa1", speech waveform file)


     /timit/test/dr5/mbpm0/sx407.phn

     (TIMIT corpus, test set, dialect region 5, male speaker, speaker-ID
      "bpm0", sentence text "sx407", phonetic transcription file)


Online documentation and tables are located in the directory "timit/doc".
A brief description of each file in this directory can be found in Section 6.
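
The path grammar in this section maps directly onto a regular expression. The following Python sketch shows one way to split a path into its fields; the class and function names are hypothetical, and it assumes lowercase paths rooted at "/timit" as in the examples above.

```python
import re
from typing import NamedTuple


class TimitPath(NamedTuple):
    usage: str
    dialect: str
    sex: str
    speaker_id: str
    text_type: str
    sentence_number: int
    file_type: str


# One named group per grammar symbol from this section.
_PATH_RE = re.compile(
    r"^/timit/(?P<usage>train|test)"
    r"/(?P<dialect>dr[1-8])"
    r"/(?P<sex>[mf])(?P<speaker_id>[a-z]{3}[0-9])"
    r"/(?P<text_type>sa|si|sx)(?P<sentence_number>[0-9]+)"
    r"\.(?P<file_type>wav|txt|wrd|phn)$"
)


def parse_timit_path(path: str) -> TimitPath:
    """Split a TIMIT file path into its components (illustrative helper)."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a TIMIT path: {path!r}")
    fields = match.groupdict()
    number = int(fields["sentence_number"])
    if not 1 <= number <= 2342:
        raise ValueError(f"sentence number out of range: {number}")
    fields["sentence_number"] = number
    return TimitPath(**fields)


print(parse_timit_path("/timit/train/dr1/fcjf0/sa1.wav"))
```

Anchoring the pattern at both ends means a malformed dialect directory or an unknown file extension is rejected instead of being silently accepted.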

5. File Types
-- ----------

The TIMIT corpus includes several files associated with each utterance. In
addition to a speech waveform file (.wav), three associated transcription
files (.txt, .wrd, .phn) exist. These associated files have the form:

        <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>
        .
        .
        .
        <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>

        where,

                BEGIN_SAMPLE :== The beginning integer sample number for the
                                 segment (Note: The first BEGIN_SAMPLE of each
                                 file is always 0)

                END_SAMPLE :== The ending integer sample number for the segment
                               (Note: Because of the transcription method used,
                               the last END_SAMPLE in each transcription file
                               may be less than the actual last sample in the
                               corresponding .wav file)

                TEXT :== <ORTHOGRAPHY> | <WORD_LABEL> | <PHONETIC_LABEL>

                where,

                     ORTHOGRAPHY :== Complete orthographic text transcription
                     WORD_LABEL :== Single word from the orthography
                     PHONETIC_LABEL :== Single phonetic transcription code
                                        (See "phoncode.doc" for description
                                        of codes)
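
Because all three transcription types share the same line format, a single small parser can read them. The sketch below is a minimal Python illustration with hypothetical names; it takes the file content as a string, so it makes no assumption about where the files live.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    begin: int  # BEGIN_SAMPLE
    end: int    # END_SAMPLE
    text: str   # orthography, word label, or phonetic label


def parse_transcription(content: str) -> List[Segment]:
    """Parse the content of a .txt, .wrd, or .phn file into segments."""
    segments = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only twice so that a full sentence stays in one piece.
        begin, end, text = line.split(maxsplit=2)
        segments.append(Segment(int(begin), int(end), text))
    return segments


print(parse_transcription("0 7470 h#\n7470 9840 sh\n"))
```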

    Table 5: Utterance-associated file types

File Type                     Description
--------- ------------------------------------------------------

     .wav - SPHERE-headered speech waveform file. (See the "/sphere"
            directory for speech file manipulation utilities.)

     .txt - Associated orthographic transcription of the words the
            person said. (Usually this is the same as the prompt, but
            in a few cases the orthography and prompt disagree.)

     .wrd - Time-aligned word transcription. The word boundaries
            were aligned with the phonetic segments using a dynamic
            string alignment program (see the printed documentation
            section "Notes on the Word Alignments" and the lexical
            pronunciations given in "timitdic.txt".)

     .phn - Time-aligned phonetic transcription. (See the reprint
            of the article by Seneff and Zue (1988), in the printed
            documentation, and the section "Notes on Checking the
            Phonetic Transcriptions" for more details on the phonetic
            transcription protocols.)


Example transcriptions from the utterance in "/timit/test/dr5/fnlp0/sa1.wav"

Orthography (.txt):
        0 61748 She had your dark suit in greasy wash water all year.

Word label (.wrd):
        7470 11362 she
        11362 16000 had
        15420 17503 your
        17503 23360 dark
        23360 28360 suit
        28360 30960 in
        30960 36971 greasy
        36971 42290 wash
        43120 47480 water
        49021 52184 all
        52184 58840 year

Phonetic label (.phn):
(Note: beginning and ending silence regions are marked with h#)
        0 7470 h#
        7470 9840 sh
        9840 11362 iy
        11362 12908 hv
        12908 14760 ae
        14760 15420 dcl
        15420 16000 jh
        16000 17503 axr
        17503 18540 dcl
        18540 18950 d
        18950 21053 aa
        21053 22200 r
        22200 22740 kcl
        22740 23360 k
        23360 25315 s
        25315 27643 ux
        27643 28360 tcl
        28360 29272 q
        29272 29932 ih
        29932 30960 n
        30960 31870 gcl
        31870 32550 g
        32550 33253 r
        33253 34660 iy
        34660 35890 z
        35890 36971 iy
        36971 38391 w
        38391 40690 ao
        40690 42290 sh
        42290 43120 epi
        43120 43906 w
        43906 45480 ao
        45480 46040 dx
        46040 47480 axr
        47480 49021 q
        49021 51348 ao
        51348 52184 l
        52184 54147 y
        54147 56654 ih
        56654 58840 axr
        58840 61680 h#
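
The word and phone labels above share sample positions, so the phones belonging to a word can be recovered by comparing boundaries. The sketch below does this for the first three words of the example. The function name is hypothetical, and the 16 kHz sampling rate used to convert samples to milliseconds is an assumption that is not stated in this file.

```python
SAMPLE_RATE_HZ = 16000  # assumption: the sampling rate is not given in this file

# First entries of the .wrd and .phn examples above: (begin, end, label).
words = [(7470, 11362, "she"), (11362, 16000, "had"), (15420, 17503, "your")]
phones = [
    (0, 7470, "h#"), (7470, 9840, "sh"), (9840, 11362, "iy"),
    (11362, 12908, "hv"), (12908, 14760, "ae"), (14760, 15420, "dcl"),
    (15420, 16000, "jh"), (16000, 17503, "axr"),
]


def phones_in_word(word, phones):
    """Return labels of phones lying entirely inside the word's sample range."""
    w_begin, w_end, _ = word
    return [label for begin, end, label in phones
            if begin >= w_begin and end <= w_end]


for word in words:
    begin, end, text = word
    duration_ms = 1000 * (end - begin) / SAMPLE_RATE_HZ
    print(text, phones_in_word(word, phones), f"{duration_ms:.1f} ms")
```

Word segments can overlap: in this example "had" ends at sample 16000 while "your" starts at 15420, so the phone "jh" falls inside both words.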




6. Online Documentation
-- --------------------

Compact documentation is located in the "/timit/doc" directory. Files in this
directory with a ".doc" extension contain freeform descriptive text and files
with a ".txt" extension contain tables of formatted text which can be searched
programmatically. Lines in the ".txt" files beginning with a semicolon are
comments and should be ignored on searches. The following is a brief
description of their contents:

    phoncode.doc - Table of phone symbols used in phonemic dictionary and
                   phonetic transcriptions
     prompts.txt - Table of sentence prompts and sentence-ID numbers
    spkrinfo.txt - Table of speaker attributes
    spkrsent.txt - Table of sentence-ID numbers for each speaker
     testset.doc - Description of suggested train/test subdivision
    timitdic.doc - Description of phonemic lexicon
    timitdic.txt - Phonemic dictionary of all orthographic words in prompts

A more extensive description of corpus design, collection, and transcription
can be found in the printed documentation.
