908的男同学

KALDI-IO库的生成与读取

使用此库可以用作“ kaldi I / O C ++ API”，可以读写“ .ark， .scp”格式，还可以使用kaldi :: [Matrix | Vector | ..]

编译要求：

cmake> = 3.0

数学库：mkl（推荐），安装conda，并使用它来安装mkl：（conda install mkl默认情况下，mkl与conda一起安装），

　　　　当cmake时，conda应该确保已经安装好，

cd kaldi-io
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=.. .. # set install prefix as '../kaldi-io'
make
make install

结果：

kaldi-io / lib：
- libkaldi_io_static.a
- libkaldi_io_shared.so
kaldi-io /包括：
- kaldi-io.h（标头，可以根据需要进行修改）
- 子标题目录...

其他：

数学依赖关系是通过cmake从系统路径自动解决的find_package（cmake / Modules / FindBLAS.cmake）

首选MKL / ATLAS / Accelerate（osx）

如果要设置特定的数学库：

＃确保构建目录是干净的
cmake -DBLAS_VENDORS = [ ATLAS | MKL | OPEN | ..] ..
 ＃另外，可以设置自定义数学库搜索路径，例如： 
cmake -DBLAS_VENDORS = ATLAS -DBLAS_ATLAS_LIB_DIRS = ... / atlas / build / lib .. 
cmake -DBLAS_VENDORS = MKL -DBLAS_MKL_LIB_DIRS = / opt / intel / mkl / lib / intel64 ..

可以通过kaldi_io.py读取kaldi特征

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
from __future__ import division

import numpy as np
import sys, os, re, gzip, struct

# Adding kaldi tools to shell path,

# Select kaldi,
if not 'KALDI_ROOT' in os.environ:
    # Default! To change run python with 'export KALDI_ROOT=/some_dir python'
    os.environ['KALDI_ROOT']='/mnt/matylda5/iveselyk/Tools/kaldi-trunk'

# Add kaldi tools to path,
path = os.popen('echo $KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$KALDI_ROOT/src/nnetbin:$KALDI_ROOT/src/nnet2bin:$KALDI_ROOT/src/nnet3bin:$KALDI_ROOT/src/online2bin/:$KALDI_ROOT/src/ivectorbin/:$KALDI_ROOT/src/lmbin/')
os.environ['PATH'] = path.readline().strip() + ':' + os.environ['PATH']
path.close()


# Define all custom exceptions,
class UnsupportedDataType(Exception): pass
class UnknownVectorHeader(Exception): pass
class UnknownMatrixHeader(Exception): pass

class BadSampleSize(Exception): pass
class BadInputFormat(Exception): pass

class SubprocessFailed(Exception): pass

# Data-type independent helper functions,

def open_or_fd(file, mode='rb'):
    """ fd = open_or_fd(file)
     Open file, gzipped file, pipe, or forward the file-descriptor.
     Eventually seeks in the 'file' argument contains ':offset' suffix.
    """
    offset = None
    try:
        # strip 'ark:' prefix from r{x,w}filename (optional),
        if re.search('^(ark|scp)(,scp|,b|,t|,n?f|,n?p|,b?o|,n?s|,n?cs)*:', file):
            (prefix,file) = file.split(':',1)
        # separate offset from filename (optional),
        if re.search(':[0-9]+$', file):
            (file,offset) = file.rsplit(':',1)
        # input pipe?
        if file[-1] == '|':
            fd = popen(file[:-1], 'rb') # custom,
        # output pipe?
        elif file[0] == '|':
            fd = popen(file[1:], 'wb') # custom,
        # is it gzipped?
        elif file.split('.')[-1] == 'gz':
            fd = gzip.open(file, mode)
        # a normal file...
        else:
            fd = open(file, mode)
    except TypeError:
        # 'file' is opened file descriptor,
        fd = file
    # Eventually seek to offset,
    if offset != None: fd.seek(int(offset))
    return fd

# based on '/usr/local/lib/python3.6/os.py'
def popen(cmd, mode="rb"):
    if not isinstance(cmd, str):
        raise TypeError("invalid cmd type (%s, expected string)" % type(cmd))

    import subprocess, io, threading

    # cleanup function for subprocesses,
    def cleanup(proc, cmd):
        ret = proc.wait()
        if ret > 0:
            raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
        return

    # text-mode,
    if mode == "r":
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return io.TextIOWrapper(proc.stdout)
    elif mode == "w":
        proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return io.TextIOWrapper(proc.stdin)
    # binary,
    elif mode == "rb":
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return proc.stdout
    elif mode == "wb":
        proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return proc.stdin
    # sanity,
    else:
        raise ValueError("invalid mode %s" % mode)


def read_key(fd):
    """ [key] = read_key(fd)
     Read the utterance-key from the opened ark/stream descriptor 'fd'.
    """
    assert('b' in fd.mode), "Error: 'fd' was opened in text mode (in python3 use sys.stdin.buffer)"

    key = ''
    while 1:
        char = fd.read(1).decode("latin1")
        if char == '' : break
        if char == ' ' : break
        key += char
    key = key.strip()
    if key == '': return None # end of file,
    assert(re.match('^\S+$',key) != None) # check format (no whitespace!)
    return key


# Integer vectors (alignments, ...),

def read_ali_ark(file_or_fd):
    """ Alias to 'read_vec_int_ark()' """
    return read_vec_int_ark(file_or_fd)

def read_vec_int_ark(file_or_fd):
    """ generator(key,vec) = read_vec_int_ark(file_or_fd)
     Create generator of (key,vector) tuples, which reads from the ark file/stream.
     file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
     Read ark to a 'dictionary':
     d = { u:d for u,d in kaldi_io.read_vec_int_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            ali = read_vec_int(fd)
            yield key, ali
            key = read_key(fd)
    finally:
        if fd is not file_or_fd: fd.close()

def read_vec_int(file_or_fd):
    """ [int-vec] = read_vec_int(file_or_fd)
     Read kaldi integer vector, ascii or binary input,
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode()
    if binary == '\0B': # binary flag
        assert(fd.read(1).decode() == '\4'); # int-size
        vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
        if vec_size == 0:
            return np.array([], dtype='int32')
        # Elements from int32 vector are sored in tuples: (sizeof(int32), value),
        vec = np.frombuffer(fd.read(vec_size*5), dtype=[('size','int8'),('value','int32')], count=vec_size)
        assert(vec[0]['size'] == 4) # int32 size,
        ans = vec[:]['value'] # values are in 2nd column,
    else: # ascii,
        arr = (binary + fd.readline().decode()).strip().split()
        try:
            arr.remove('['); arr.remove(']') # optionally
        except ValueError:
            pass
        ans = np.array(arr, dtype=int)
    if fd is not file_or_fd : fd.close() # cleanup
    return ans

# Writing,
def write_vec_int(file_or_fd, v, key=''):
    """ write_vec_int(f, v, key='')
     Write a binary kaldi integer vector to filename or stream.
     Arguments:
     file_or_fd : filename or opened file descriptor for writing,
     v : the vector to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the vector.
     Example of writing single vector:
     kaldi_io.write_vec_int(filename, vec)
     Example of writing arkfile:
     with open(ark_file,'w') as f:
         for key,vec in dict.iteritems():
             kaldi_io.write_vec_flt(f, vec, key=key)
    """
    assert(isinstance(v, np.ndarray))
    assert(v.dtype == np.int32)
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # dim,
        fd.write('\4'.encode()) # int32 type,
        fd.write(struct.pack(np.dtype('int32').char, v.shape[0]))
        # data,
        for i in range(len(v)):
            fd.write('\4'.encode()) # int32 type,
            fd.write(struct.pack(np.dtype('int32').char, v[i])) # binary,
    finally:
        if fd is not file_or_fd : fd.close()


# Float vectors (confidences, ivectors, ...),

# Reading,
def read_vec_flt_scp(file_or_fd):
    """ generator(key,mat) = read_vec_flt_scp(file_or_fd)
     Returns generator of (key,vector) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
     Iterate the scp:
     for key,vec in kaldi_io.read_vec_flt_scp(file):
         ...
     Read scp to a 'dictionary':
     d = { key:mat for key,mat in kaldi_io.read_mat_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            vec = read_vec_flt(rxfile)
            yield key, vec
    finally:
        if fd is not file_or_fd : fd.close()

def read_vec_flt_ark(file_or_fd):
    """ generator(key,vec) = read_vec_flt_ark(file_or_fd)
     Create generator of (key,vector) tuples, reading from an ark file/stream.
     file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
     Read ark to a 'dictionary':
     d = { u:d for u,d in kaldi_io.read_vec_flt_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            ali = read_vec_flt(fd)
            yield key, ali
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_vec_flt(file_or_fd):
    """ [flt-vec] = read_vec_flt(file_or_fd)
     Read kaldi float vector, ascii or binary input,
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode()
    if binary == '\0B': # binary flag
        ans = _read_vec_flt_binary(fd)
    else:    # ascii,
        arr = (binary + fd.readline().decode()).strip().split()
        try:
            arr.remove('['); arr.remove(']') # optionally
        except ValueError:
            pass
        ans = np.array(arr, dtype=float)
    if fd is not file_or_fd : fd.close() # cleanup
    return ans

def _read_vec_flt_binary(fd):
    header = fd.read(3).decode()
    if header == 'FV ' : sample_size = 4 # floats
    elif header == 'DV ' : sample_size = 8 # doubles
    else : raise UnknownVectorHeader("The header contained '%s'" % header)
    assert (sample_size > 0)
    # Dimension,
    assert (fd.read(1).decode() == '\4'); # int-size
    vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
    if vec_size == 0:
        return np.array([], dtype='float32')
    # Read whole vector,
    buf = fd.read(vec_size * sample_size)
    if sample_size == 4 : ans = np.frombuffer(buf, dtype='float32')
    elif sample_size == 8 : ans = np.frombuffer(buf, dtype='float64')
    else : raise BadSampleSize
    return ans


# Writing,
def write_vec_flt(file_or_fd, v, key=''):
    """ write_vec_flt(f, v, key='')
     Write a binary kaldi vector to filename or stream. Supports 32bit and 64bit floats.
     Arguments:
     file_or_fd : filename or opened file descriptor for writing,
     v : the vector to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the vector.
     Example of writing single vector:
     kaldi_io.write_vec_flt(filename, vec)
     Example of writing arkfile:
     with open(ark_file,'w') as f:
         for key,vec in dict.iteritems():
             kaldi_io.write_vec_flt(f, vec, key=key)
    """
    assert(isinstance(v, np.ndarray))
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # Data-type,
        if v.dtype == 'float32': fd.write('FV '.encode())
        elif v.dtype == 'float64': fd.write('DV '.encode())
        else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % v.dtype)
        # Dim,
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, v.shape[0])) # dim
        # Data,
        fd.write(v.tobytes())
    finally:
        if fd is not file_or_fd : fd.close()


# Float matrices (features, transformations, ...),

# Reading,
def read_mat_scp(file_or_fd):
    """ generator(key,mat) = read_mat_scp(file_or_fd)
     Returns generator of (key,matrix) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
     Iterate the scp:
     for key,mat in kaldi_io.read_mat_scp(file):
         ...
     Read scp to a 'dictionary':
     d = { key:mat for key,mat in kaldi_io.read_mat_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            mat = read_mat(rxfile)
            yield key, mat
    finally:
        if fd is not file_or_fd : fd.close()

def read_mat_ark(file_or_fd):
    """ generator(key,mat) = read_mat_ark(file_or_fd)
     Returns generator of (key,matrix) tuples, read from ark file/stream.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
     Iterate the ark:
     for key,mat in kaldi_io.read_mat_ark(file):
         ...
     Read ark to a 'dictionary':
     d = { key:mat for key,mat in kaldi_io.read_mat_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            mat = read_mat(fd)
            yield key, mat
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_mat(file_or_fd):
    """ [mat] = read_mat(file_or_fd)
     Reads single kaldi matrix, supports ascii and binary.
     file_or_fd : file, gzipped file, pipe or opened file descriptor.
    """
    fd = open_or_fd(file_or_fd)
    try:
        binary = fd.read(2).decode()
        if binary == '\0B' :
            mat = _read_mat_binary(fd)
        else:
            assert(binary == ' [')
            mat = _read_mat_ascii(fd)
    finally:
        if fd is not file_or_fd: fd.close()
    return mat

def _read_mat_binary(fd):
    # Data type
    header = fd.read(3).decode()
    # 'CM', 'CM2', 'CM3' are possible values,
    if header.startswith('CM'): return _read_compressed_mat(fd, header)
    elif header == 'FM ': sample_size = 4 # floats
    elif header == 'DM ': sample_size = 8 # doubles
    else: raise UnknownMatrixHeader("The header contained '%s'" % header)
    assert(sample_size > 0)
    # Dimensions
    s1, rows, s2, cols = np.frombuffer(fd.read(10), dtype='int8,int32,int8,int32', count=1)[0]
    # Read whole matrix
    buf = fd.read(rows * cols * sample_size)
    if sample_size == 4 : vec = np.frombuffer(buf, dtype='float32')
    elif sample_size == 8 : vec = np.frombuffer(buf, dtype='float64')
    else : raise BadSampleSize
    mat = np.reshape(vec,(rows,cols))
    return mat

def _read_mat_ascii(fd):
    rows = []
    while 1:
        line = fd.readline().decode()
        if (len(line) == 0) : raise BadInputFormat # eof, should not happen!
        if len(line.strip()) == 0 : continue # skip empty line
        arr = line.strip().split()
        if arr[-1] != ']':
            rows.append(np.array(arr,dtype='float32')) # not last line
        else:
            rows.append(np.array(arr[:-1],dtype='float32')) # last line
            mat = np.vstack(rows)
            return mat


def _read_compressed_mat(fd, format):
    """ Read a compressed matrix,
        see: https://github.com/kaldi-asr/kaldi/blob/master/src/matrix/compressed-matrix.h
        methods: CompressedMatrix::Read(...), CompressedMatrix::CopyToMat(...),
    """
    assert(format == 'CM ') # The formats CM2, CM3 are not supported...

    # Format of header 'struct',
    global_header = np.dtype([('minvalue','float32'),('range','float32'),('num_rows','int32'),('num_cols','int32')]) # member '.format' is not written,
    per_col_header = np.dtype([('percentile_0','uint16'),('percentile_25','uint16'),('percentile_75','uint16'),('percentile_100','uint16')])

    # Read global header,
    globmin, globrange, rows, cols = np.frombuffer(fd.read(16), dtype=global_header, count=1)[0]

    # The data is structed as [Colheader, ... , Colheader, Data, Data , .... ]
    #                                                 {                     cols                     }{         size                 }
    col_headers = np.frombuffer(fd.read(cols*8), dtype=per_col_header, count=cols)
    col_headers = np.array([np.array([x for x in y]) * globrange * 1.52590218966964e-05 + globmin for y in col_headers], dtype=np.float32)
    data = np.reshape(np.frombuffer(fd.read(cols*rows), dtype='uint8', count=cols*rows), newshape=(cols,rows)) # stored as col-major,

    mat = np.zeros((cols,rows), dtype='float32')
    p0 = col_headers[:, 0].reshape(-1, 1)
    p25 = col_headers[:, 1].reshape(-1, 1)
    p75 = col_headers[:, 2].reshape(-1, 1)
    p100 = col_headers[:, 3].reshape(-1, 1)
    mask_0_64 = (data <= 64)
    mask_193_255 = (data > 192)
    mask_65_192 = (~(mask_0_64 | mask_193_255))

    mat += (p0    + (p25 - p0) / 64. * data) * mask_0_64.astype(np.float32)
    mat += (p25 + (p75 - p25) / 128. * (data - 64)) * mask_65_192.astype(np.float32)
    mat += (p75 + (p100 - p75) / 63. * (data - 192)) * mask_193_255.astype(np.float32)

    return mat.T # transpose! col-major -> row-major,


# Writing,
def write_mat(file_or_fd, m, key=''):
    """ write_mat(f, m, key='')
    Write a binary kaldi matrix to filename or stream. Supports 32bit and 64bit floats.
    Arguments:
     file_or_fd : filename of opened file descriptor for writing,
     m : the matrix to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the matrix.
     Example of writing single matrix:
     kaldi_io.write_mat(filename, mat)
     Example of writing arkfile:
     with open(ark_file,'w') as f:
         for key,mat in dict.iteritems():
             kaldi_io.write_mat(f, mat, key=key)
    """
    assert(isinstance(m, np.ndarray))
    assert(len(m.shape) == 2), "'m' has to be 2d matrix!"
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # Data-type,
        if m.dtype == 'float32': fd.write('FM '.encode())
        elif m.dtype == 'float64': fd.write('DM '.encode())
        else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % m.dtype)
        # Dims,
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, m.shape[0])) # rows
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, m.shape[1])) # cols
        # Data,
        fd.write(m.tobytes())
    finally:
        if fd is not file_or_fd : fd.close()


# 'Posterior' kaldi type (posteriors, confusion network, nnet1 training targets, ...)
# Corresponds to: vector > >
# - outer vector: time axis
# - inner vector: records at the time
# - tuple: int = index, float = value
#

def read_cnet_ark(file_or_fd):
    """ Alias of function 'read_post_ark()', 'cnet' = confusion network """
    return read_post_ark(file_or_fd)

def read_post_rxspec(file_):
    """ adaptor to read both 'ark:...' and 'scp:...' inputs of posteriors,
    """
    if file_.startswith("ark:"):
        return read_post_ark(file_)
    elif file_.startswith("scp:"):
        return read_post_scp(file_)
    else:
        print("unsupported intput type: %s" % file_)
        print("it should begint with 'ark:' or 'scp:'")
        sys.exit(1)

def read_post_scp(file_or_fd):
    """ generator(key,post) = read_post_scp(file_or_fd)
     Returns generator of (key,post) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
     Iterate the scp:
     for key,post in kaldi_io.read_post_scp(file):
         ...
     Read scp to a 'dictionary':
     d = { key:post for key,post in kaldi_io.read_post_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            post = read_post(rxfile)
            yield key, post
    finally:
        if fd is not file_or_fd : fd.close()

def read_post_ark(file_or_fd):
    """ generator(key,vec>) = read_post_ark(file)
     Returns generator of (key,posterior) tuples, read from ark file.
     file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
     Iterate the ark:
     for key,post in kaldi_io.read_post_ark(file):
         ...
     Read ark to a 'dictionary':
     d = { key:post for key,post in kaldi_io.read_post_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            post = read_post(fd)
            yield key, post
            key = read_key(fd)
    finally:
        if fd is not file_or_fd: fd.close()

def read_post(file_or_fd):
    """ [post] = read_post(file_or_fd)
     Reads single kaldi 'Posterior' in binary format.
     The 'Posterior' is C++ type 'vector > >',
     the outer-vector is usually time axis, inner-vector are the records
     at given time,    and the tuple is composed of an 'index' (integer)
     and a 'float-value'. The 'float-value' can represent a probability
     or any other numeric value.
     Returns vector of vectors of tuples.
    """
    fd = open_or_fd(file_or_fd)
    ans=[]
    binary = fd.read(2).decode(); assert(binary == '\0B'); # binary flag
    assert(fd.read(1).decode() == '\4'); # int-size
    outer_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)

    # Loop over 'outer-vector',
    for i in range(outer_vec_size):
        assert(fd.read(1).decode() == '\4'); # int-size
        inner_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of records for frame (or bin)
        data = np.frombuffer(fd.read(inner_vec_size*10), dtype=[('size_idx','int8'),('idx','int32'),('size_post','int8'),('post','float32')], count=inner_vec_size)
        assert(data[0]['size_idx'] == 4)
        assert(data[0]['size_post'] == 4)
        ans.append(data[['idx','post']].tolist())

    if fd is not file_or_fd: fd.close()
    return ans


# Kaldi Confusion Network bin begin/end times,
# (kaldi stores CNs time info separately from the Posterior).
#

def read_cntime_ark(file_or_fd):
    """ generator(key,vec>) = read_cntime_ark(file_or_fd)
     Returns generator of (key,cntime) tuples, read from ark file.
     file_or_fd : file, gzipped file, pipe or opened file descriptor.
     Iterate the ark:
     for key,time in kaldi_io.read_cntime_ark(file):
         ...
     Read ark to a 'dictionary':
     d = { key:time for key,time in kaldi_io.read_post_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            cntime = read_cntime(fd)
            yield key, cntime
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_cntime(file_or_fd):
    """ [cntime] = read_cntime(file_or_fd)
     Reads single kaldi 'Confusion Network time info', in binary format:
     C++ type: vector >.
     (begin/end times of bins at the confusion network).
     Binary layout is '     ...'
     file_or_fd : file, gzipped file, pipe or opened file descriptor.
     Returns vector of tuples.
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode(); assert(binary == '\0B'); # assuming it's binary

    assert(fd.read(1).decode() == '\4'); # int-size
    vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)

    data = np.frombuffer(fd.read(vec_size*10), dtype=[('size_beg','int8'),('t_beg','float32'),('size_end','int8'),('t_end','float32')], count=vec_size)
    assert(data[0]['size_beg'] == 4)
    assert(data[0]['size_end'] == 4)
    ans = data[['t_beg','t_end']].tolist() # Return vector of tuples (t_beg,t_end),

    if fd is not file_or_fd : fd.close()
    return ans


# Segments related,
#

# Segments as 'Bool vectors' can be handy,
# - for 'superposing' the segmentations,
# - for frame-selection in Speaker-ID experiments,
def read_segments_as_bool_vec(segments_file):
    """ [ bool_vec ] = read_segments_as_bool_vec(segments_file)
     using kaldi 'segments' file for 1 wav, format : '   '
     - t-beg, t-end is in seconds,
     - assumed 100 frames/second,
    """
    segs = np.loadtxt(segments_file, dtype='object,object,f,f', ndmin=1)
    # Sanity checks,
    assert(len(segs) > 0) # empty segmentation is an error,
    assert(len(np.unique([rec[1] for rec in segs ])) == 1) # segments with only 1 wav-file,
    # Convert time to frame-indexes,
    start = np.rint([100 * rec[2] for rec in segs]).astype(int)
    end = np.rint([100 * rec[3] for rec in segs]).astype(int)
    # Taken from 'read_lab_to_bool_vec', htk.py,
    frms = np.repeat(np.r_[np.tile([False,True], len(end)), False],
                     np.r_[np.c_[start - np.r_[0, end[:-1]], end-start].flat, 0])
    assert np.sum(end-start) == np.sum(frms)
    return frms

决策树算法全解析：从零基础到Titanic实战，一文搞定机器学习经典模型吴师兄大模型 0基础实现机器学习入门到精通算法机器学习决策树人工智能深度学习编程开发语言
Langchain系列文章目录01-玩转LangChain：从模型调用到Prompt模板与输出解析的完整指南02-玩转LangChainMemory模块：四种记忆类型详解及应用场景全覆盖03-全面掌握LangChain：从核心链条构建到动态任务分配的实战指南04-玩转LangChain：从文档加载到高效问答系统构建的全程实战05-玩转LangChain：深度评估问答系统的三种高效方法（示例生成、手
图像处理篇---图像预处理 Ronin-Lotus 图像处理篇深度学习篇程序代码篇图像处理人工智能 opencv python 深度学习计算机视觉
文章目录前言一、通用目的1.1数据标准化目的实现1.2噪声抑制目的实现高斯滤波中值滤波双边滤波1.3尺寸统一化目的实现1.4数据增强目的实现1.5特征增强目的实现：边缘检测直方图均衡化锐化二、分领域预处理2.1传统机器学习（如SVM、随机森林）2.1.1特点2.1.2预处理重点灰度化二值化形态学操作特征工程2.2深度学习（如CNN、Transformer）2.2.1特点2.2.2预处理重点通道顺序
目前市场上主流的机器视觉的框架有哪些？他们的特点及优劣 yuanpan 机器学习计算机视觉
目前市场上主流的机器视觉框架和工具可以分为商业软件、开源工具和深度学习框架三大类。以下是它们的总结及特点对比：1.商业软件(1)Halcon(MVTec)特点：专注于工业机器视觉，提供高精度、高效率的算法。支持复杂的工业应用，如缺陷检测、3D视觉、深度学习等。提供图形化开发工具HDevelop和多种编程接口。优势：算法优化好，适合实时工业应用。硬件兼容性强，支持多种工业相机和设备。劣势：商业软件，
1.1PaddleTS_环境配置：一个易用的深度时序建模的Python库 pythonQA python paddlepaddle
PaddleTS是一个易用的深度时序建模的Python库，它基于飞桨深度学习框架PaddlePaddle，专注业界领先的深度模型，旨在为领域专家和行业用户提供可扩展的时序建模能力和便捷易用的用户体验。PaddleTS的主要特性包括：设计统一数据结构，实现对多样化时序数据的表达，支持单目标与多目标变量，支持多类型协变量封装基础模型功能，如数据加载、回调设置、损失函数、训练过程控制等公共方法，帮助开发
【大模型科普】AIGC技术发展与应用实践（一文读懂AIGC）人工智能
【专栏介绍】⌈⌈⌈人工智能与大模型应用⌋⌋⌋人工智能（AI）通过算法模拟人类智能，利用机器学习、深度学习等技术驱动医疗、金融等领域的智能化。大模型是千亿参数的深度神经网络（如ChatGPT），经海量数据训练后能完成文本生成、图像创作等复杂任务，显著提升效率，但面临算力消耗、数据偏见等挑战。当前正加速与教育、科研融合，未来需平衡技术创新与伦理风险，推动可持续发展。文章目录一、AIGC概述（一）什么是
代码逐行解析 | 教你在C++中使用深度学习提取特征点 3Ｄ视觉工坊 3D视觉从入门到精通 c++深度学习开发语言人工智能
点击下方卡片，关注「3D视觉工坊」公众号选择星标，干货第一时间送达扫描下方二维码，加入3D视觉技术星球，星球内汇集了众多3D视觉实战问题，以及各个模块的学习资料：最新顶会论文、书籍、源码、视频（近20门系统课程[星球成员可免费学习]）等。想要入门3D视觉、做项目、搞科研，就加入我们吧。作者：泡椒味的口香糖|来源：3DCV添加微信：dddvision
【产品小白】什么是AI产品经理百事不可口y 产品经理的一步一步人工智能产品经理学习产品运营内容运营用户运营
一、AI产品经理的定义与角色定位AI产品经理是人工智能技术与商业应用之间的核心桥梁，负责将复杂的AI技术转化为满足市场需求的产品。需同时具备技术理解力、商业洞察力和用户思维，既要参与算法选型与数据建模，又要定义产品功能与市场策略，是贯穿产品全生命周期的关键角色。与传统互联网产品经理相比，AI产品经理的独特之处在于：技术深度参与：需理解机器学习、自然语言处理（NLP）、计算机视觉等技术原理，并参与数
深度学习-130-RAG技术之基于Anything LLM搭建本地私人知识库的应用策略问题总结(一) 皮皮冰燃深度学习深度学习人工智能 RAG
文章目录1AnythingLLM的本地知识库1.1本地知识库应用场景1.2效果对比及思考1.3本地体现在哪些方面1.3.1知识在本地1.3.2分割后的文档在本地1.3.3大模型部署运行在本地2问错问题带来的问题2.1常见的问题2.2原因分析3为什么LLM不使用我的文件？3.1LLM不是万能的【omnipotent】3.2LLM不会自省【introspect】3.3AnythingLLM是如何工作的
3DMAX点云算法：实现毫米级BIM模型偏差检测（附完整代码）夏末之花人工智能
摘要本文基于激光雷达点云数据与BIM模型的高精度对齐技术，提出一种融合动态体素化与多模态特征匹配的偏差检测方法。通过点云预处理、语义分割、模型配准及差异分析，最终实现建筑构件毫米级偏差的可视化检测。文中提供关键代码实现，涵盖点云处理、特征提取与深度学习模型搭建。一、核心算法流程点云预处理与特征增强去噪与下采样：采用统计滤波与体素网格下采样，去除离群点并降低数据量。语义分割：基于PointNet++
数据增强：扩充数据集提升模型泛化能力 AI天才研究院计算 AI大模型企业级应用开发实战 ChatGPT 计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
1.背景介绍1.1.数据增强的重要性在机器学习领域，模型的泛化能力至关重要。一个泛化能力强的模型能够在未见数据上表现良好，而过拟合的模型则会在训练数据上表现出色，但在新数据上表现糟糕。数据增强是一种有效提升模型泛化能力的技术，它通过对现有数据进行各种变换，人为地扩充数据集，从而增加训练数据的数量和多样性。1.2.数据增强的应用场景数据增强广泛应用于各种机器学习任务中，包括：图像识别:对图像进行旋转
数据增强：扩充数据集，提升模型的鲁棒性 AI天才研究院 DeepSeek R1 &大数据AI人工智能大模型 LLM大模型落地实战指南计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
数据增强：扩充数据集，提升模型的鲁棒性1.背景介绍1.1数据集的重要性在机器学习和深度学习领域中,数据集是训练模型的基础。高质量的数据集对于构建准确、鲁棒的模型至关重要。然而,在现实世界中,获取大量高质量的数据通常是一个巨大的挑战。数据采集过程耗时耗力,而且成本高昂。此外,某些领域的数据存在隐私和安全问题,难以获取。1.2数据集不足的挑战当数据集规模有限时,模型很容易过拟合,无法很好地推广到新的、
Docker打包深度学习项目 FLY_LTL docker 深度学习容器
文章目录Docker打包深度学习项目1.Docker和NVIDIAContainerToolkit的安装1.Docker2.NVIDIAContainerToolkit3.添加国内镜像源2.使用Dockerfile打包并保存镜像1.Dockerfile2.通过Dockerfile生成镜像3.保存镜像和加载4.运行Docker并测试参考Docker打包深度学习项目本文来源于个人实践总结，供各位同学参
使用TensorFlow、OpenCV和Pygame实现图像处理与游戏开发 UwoiGit tensorflow opencv pygame
在本篇文章中，我们将介绍如何结合使用TensorFlow、OpenCV和Pygame来进行图像处理和游戏开发。这三个工具在机器学习、计算机视觉和游戏开发领域都非常流行，并且它们的结合可以提供强大的功能和无限的创造力。我们将逐步介绍如何安装和配置这些工具，并提供相关的源代码示例。安装TensorFlowTensorFlow是一个基于数据流图的开源机器学习框架，提供了丰富的工具和库来构建和训练各种深度
机器学习之KMeans算法知舟不叙机器学习算法 kmeans
文章目录引言1.KMeans算法简介2.KMeans算法的数学原理3.KMeans算法的步骤3.1初始化簇中心3.2分配数据点3.3更新簇中心3.4停止条件4.KMeans算法的优缺点4.1优点4.2缺点5.KMeans算法的应用场景5.1图像分割5.2市场细分5.3文档聚类5.4异常检测6.Python实现KMeans算法7.总结引言KMeans算法是机器学习中最经典的无监督学习算法之一，广泛应
机器学习流程—数据预处理清洗不二人生机器学习机器学习人工智能数据预处理
文章目录机器学习流程—数据预处理清洗定义问题数据预处理数据加载与展示重复数据处理数据类型空值处理无关特征删除数据分布删除异常值生成标签和特征数据分割机器学习流程—数据预处理清洗数据处理是将数据从给定形式转换为更可用和更理想的形式的任务，即使其更有意义、信息更丰富。使用机器学习算法、数学建模和统计知识，整个过程可以自动化。这个完整过程的输出可以是任何所需的形式，如图形、视频、图表、表格、图像等等，具
深度革命：ResNet 如何用 “残差连接“ 颠覆深度学习安意诚Matrix 机器学习笔记深度学习人工智能
一文快速了解ResNet创新点在深度学习的历史长河中，2015年或许是最具突破性的一年。这一年，微软亚洲研究院的何恺明团队带着名为ResNet（残差网络）的模型横空出世，在ImageNet图像分类竞赛中以3.57%的错误率夺冠，将人类视觉的识别误差（约5.1%）远远甩在身后。更令人震撼的是，ResNet将神经网络的深度推至152层，彻底打破了"深层网络无法训练"的魔咒。这场革命的核心，正是一个简单
智能形状匹配技术全解析：从经典算法到深度学习与神经形态计算【超级详细版】 AI筑梦师计算机视觉算法深度学习人工智能机器学习计算机视觉 python
智能形状匹配技术全解析：从经典算法到深度学习与神经形态计算1.引言1.1研究背景在计算机视觉、模式识别、医学影像分析和自动驾驶等领域，形状匹配是核心任务之一。然而，现实世界的形状往往存在可变性（Variability），主要体现在以下几个方面：形变（Deformation）：物体可能由于柔性材料、外力作用或生物运动发生非刚性形变。尺度变化（ScaleVariation）：目标形状在不同场景下可能大
Python 模拟鼠标轨迹算法 a485240 鼠标轨迹计算机外设
一.鼠标轨迹模拟简介传统的鼠标轨迹模拟依赖于简单的数学模型，如直线或曲线路径。然而，这种方法难以捕捉到人类操作的复杂性和多样性。AI大模型的出现，使得能够通过深度学习技术，学习并模拟更自然的鼠标移动行为。二.鼠标轨迹算法实现AI大模型通过学习大量的人类鼠标操作数据，能够识别和模拟出自然且具有个体差异的鼠标轨迹。以下是实现这一技术的关键步骤：数据收集：收集不同玩家在各种游戏环境中的鼠标操作数据，包括
Apache Storm：实时数据处理的闪电战 Aaron_945 Java apache storm 大数据
文章目录ApacheStorm原理拓扑结构数据流处理容错机制官网链接基础使用安装与配置编写拓扑提交与运行高级使用状态管理窗口操作多语言支持优点高吞吐量低延迟可扩展性容错性总结ApacheStorm是一个开源的分布式实时计算系统，它允许你以极高的吞吐量处理无界数据流。Storm被广泛用于实时分析、在线机器学习、连续计算等多种场景。本文将深入探讨ApacheStorm的原理、基础使用、高级特性及其优点
什么是机器视觉3D引导大模型视觉人机器视觉机器视觉3D 3d 数码相机机器人人工智能大数据
机器视觉3D引导大模型是结合深度学习、多模态数据融合与三维感知技术的智能化解决方案，旨在提升工业自动化、医疗、物流等领域的操作精度与效率。以下从技术架构、行业应用、挑战与未来趋势等方面综合分析：一、技术架构与核心原理多模态数据融合与深度学习3D视觉引导大模型通常整合RGB图像、点云数据、深度信息等多模态输入，通过深度学习算法（如卷积神经网络、Transformer）进行特征提取与融合。例如，油田机
Python 机器学习基础之学习基础环境搭建仙魁XAN Python 机器学习基础+实战案例 python 学习开发语言机器学习 machine learning
Python机器学习基础之学习基础环境搭建目录Python机器学习基础之学习基础环境搭建一、简单介绍二、什么是机器学习三、python环境的搭建1、Python安装包下载2、这里以下载Python3.10.9为例3、安装Python3.10.94、检验python是否安装成功，win+R快捷打开运行，输入cmd，打开cmd四、Pycharm环境搭建1、下载Pycharm安装包2、安装Pycharm
深度学习在医学影像分析中的应用：DeepSeek系统的实践与探索 Evaporator Core #深度学习 #DeepSeek快速入门 DeepSeek进阶开发与应用深度学习人工智能
随着人工智能技术的迅猛发展，深度学习在医学领域的应用逐渐成为研究热点。医学影像分析作为医疗诊断的重要组成部分，正受益于深度学习技术的突破。DeepSeek系统是一种基于深度学习的医学影像分析平台，旨在通过高效、精准的算法辅助医生进行疾病诊断和治疗决策。本文将深入探讨DeepSeek系统的技术原理、实现方法及其在医学影像分析中的实际应用，并结合代码示例展示其核心功能。1.DeepSeek系统的技术架
【机器学习】主成分分析法（PCA）若兰幽竹机器学习机器学习信息可视化人工智能
【机器学习】主成分分析法（PCA）一、摘要二、主成分分析的基本概念三、主成分分析的数学模型五、主成分分析法目标函数公式推导（`梯度上升法`求解目标函数）六、梯度上升法求解目标函数第一个主成分七、求解前n个主成分及PCA在数据预处理中的处理步骤（后续实现）一、摘要本文主要讲述了主成分分析法（PCA）的原理和应用。PCA通过选择最重要的特征，将高维数据映射到低维空间，同时保持数据间的关系，实现降维和去
【深度学习遥感分割|论文解读2】UNetFormer：一种类UNet的Transformer，用于高效的遥感城市场景图像语义分割 985小水博一枚呀论文解读深度学习 transformer 人工智能网络 cnn
【深度学习遥感分割|论文解读2】UNetFormer：一种类UNet的Transformer，用于高效的遥感城市场景图像语义分割【深度学习遥感分割|论文解读2】UNetFormer：一种类UNet的Transformer，用于高效的遥感城市场景图像语义分割文章目录【深度学习遥感分割|论文解读2】UNetFormer：一种类UNet的Transformer，用于高效的遥感城市场景图像语义分割2.Re
PyTorch 深度学习博客 Zoro｜ PyTorch Deep Learning 人工智能
PyTorch深度学习博客欢迎来到我的PyTorch深度学习博客！在这里，我将分享使用PyTorch学习和实践深度学习项目的点滴经验。本博客适用于初学者和有一定基础的开发者，旨在帮助大家快速搭建环境、掌握核心概念，并通过实例了解实际应用。环境配置为了确保项目的稳定性和兼容性，我选择了Python3.9环境，并在conda创建的虚拟环境中运行最新且稳定的PyTorch版本2.6.0。1.创建Pyth
深入探索 PyTorch 在语音识别中的应用 Zoro｜ PyTorch Deep Learning 机器学习 pytorch 语音识别人工智能
深入探索PyTorch在语音识别中的应用在本篇博客中，我将分享如何使用PyTorch进行语音识别任务，重点围绕环境配置、数据预处理、特征提取、模型设计以及模型比较展开。本文基于最近一次机器学习作业（HW2）的任务内容，任务目标是对语音信号进行逐帧音素预测，从而完成多类别分类任务。一、介绍任务背景任务目标：利用深度神经网络对语音信号进行逐帧音素预测。音素定义：音素是语音中能够区分单词的最小语音单位。
MNIST数据集&手写数字识别 Zoro｜ keras tensorflow 人工智能机器学习
TensorFlow是一个开源的机器学习框架，由Google开发并发布。它提供了一种基于数据流图的编程模型，用于构建和训练机器学习模型。TensorFlow的核心概念是张量（Tensor）和流图（Graph）。张量是TensorFlow中的基本数据单位，可以理解为多维数组，可以是标量、向量、矩阵或更高维度的数组。流图是由一系列操作（Operation）和张量组成的。操作定义了计算和转换张量的方式。
深度学习五大模型：CNN、Transformer、BERT、RNN、GAN详细解析深度学习
卷积神经网络（ConvolutionalNeuralNetwork,CNN）原理：CNN主要由卷积层、池化层和全连接层组成。卷积层通过卷积核在输入数据上进行卷积运算，提取局部特征；池化层则对特征图进行下采样，降低特征维度，同时保留主要特征；全连接层将特征图展开为一维向量，并进行分类或回归计算。CNN利用卷积操作实现局部连接和权重共享，能够自动学习数据中的空间特征。适用场景：广泛应用于图像处理相关的
OpenLSD是一个自适应开源数据集，旨在支持逻辑综合中的多种机器学习任务。数据集
2024-11-14，由中国科学院计算技术研究所、鹏城实验室和北京大学等联合创建OpenLSD数据集，目的为逻辑综合过程中的机器学习任务提供一个自适应的数据集生成框架。该数据集的核心研究问题是如何在逻辑综合的三个基本步骤——布尔表示、逻辑优化和技术映射中，通过机器学习方法提升效率和质量。一、研究背景：逻辑综合是电子设计自动化（EDA）流程中的关键环节，它负责将高级设计规范转化为门级网络列表。近年来
算力技术创新驱动多场景应用演进智能计算研究中心其他
内容概要算力技术创新正成为数字经济时代的基础性驱动力，从异构计算架构的多元融合到量子计算的颠覆性突破，技术演进不断突破物理与算法的双重边界。在工业互联网场景中，边缘计算通过分布式节点实现毫秒级响应，支撑智能制造产线的实时控制；智能安防系统依托深度学习模型与流计算技术，完成海量视频数据的动态解析；而科学计算领域通过分布式计算与模型压缩技术，将基因测序、气候模拟等复杂任务的效率提升至新量级。值得注意的
Java实现的简单双向Map，支持重复Value superlxw1234 java 双向map
关键字：Java双向Map、DualHashBidiMap 有个需求，需要根据即时修改Map结构中的Value值，比如，将Map中所有value=V1的记录改成value=V2，key保持不变。数据量比较大，遍历Map性能太差，这就需要根据Value先找到Key，然后去修改。即：既要根据Key找Value，又要根据Value
PL/SQL触发器基础及例子百合不是茶 oracle数据库触发器 PL/SQL编程
触发器的简介; 触发器的定义就是说某个条件成立的时候，触发器里面所定义的语句就会被自动的执行。因此触发器不需要人为的去调用，也不能调用。触发器和过程函数类似过程函数必须要调用, 一个表中最多只能有12个触发器类型的,触发器和过程函数相似触发器不需要调用直接执行, 触发时间：指明触发器何时执行，该值可取： before：表示在数据库动作之前触发
[时空与探索]穿越时空的一些问题 comsci 问题
我们还没有进行过任何数学形式上的证明,仅仅是一个猜想..... 这个猜想就是; 任何有质量的物体(哪怕只有一微克)都不可能穿越时空,该物体强行穿越时空的时候,物体的质量会与时空粒子产生反应,物体会变成暗物质,也就是说,任何物体穿越时空会变成暗物质..(暗物质就我的理
easy ui datagrid上移下移一行商人shang js 上移下移 easyui datagrid
/** * 向上移动一行 * * @param dg * @param row */ function moveupRow(dg, row) { var datagrid = $(dg); var index = datagrid.datagrid("getRowIndex", row); if (isFirstRow(dg, row)) {
Java反射 oloz 反射
本人菜鸟，今天恰好有时间，写写博客，总结复习一下java反射方面的知识，欢迎大家探讨交流学习指教首先看看java中的Class package demo; public class ClassTest { /*先了解java中的Class*/ public static void main(String[] args) { //任何一个类都
springMVC 使用JSR-303 Validation验证杨白白 spring mvc
JSR-303是一个数据验证的规范，但是spring并没有对其进行实现，Hibernate Validator是实现了这一规范的，通过此这个实现来讲SpringMVC对JSR-303的支持。 JSR-303的校验是基于注解的，首先要把这些注解标记在需要验证的实体类的属性上或是其对应的get方法上。登录需要验证类 public class Login { @NotEmpty
log4j 香水浓 log4j
log4j.rootCategory=DEBUG, STDOUT, DAILYFILE, HTML, DATABASE #log4j.rootCategory=DEBUG, STDOUT, DAILYFILE, ROLLINGFILE, HTML #console log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender log4
使用ajax和history.pushState无刷新改变页面URL agevs jquery 框架 Ajax html5 chrome
表现如果你使用chrome或者firefox等浏览器访问本博客、github.com、plus.google.com等网站时，细心的你会发现页面之间的点击是通过ajax异步请求的，同时页面的URL发生了了改变。并且能够很好的支持浏览器前进和后退。是什么有这么强大的功能呢？ HTML5里引用了新的API，history.pushState和history.replaceState，就是通过
centos中文乱码 AILIKES centos OS ssh
一、CentOS系统访问 g.cn ，发现中文乱码。于是用以前的方式：yum -y install fonts-chinese CentOS系统安装后，还是不能显示中文字体。我使用 gedit 编辑源码，其中文注释也为乱码。后来，终于找到以下方法可以解决，需要两个中文支持的包： fonts-chinese-3.02-12.
触发器 baalwolf 触发器
触发器(trigger)：监视某种情况，并触发某种操作。触发器创建语法四要素：1.监视地点(table) 2.监视事件(insert/update/delete) 3.触发时间(after/before) 4.触发事件(insert/update/delete) 语法： create trigger triggerName after/before
JS正则表达式的i m g bijian1013 JavaScript 正则表达式
g:表示全局（global)模式，即模式将被应用于所有字符串，而非在发现第一个匹配项时立即停止。 i:表示不区分大小写（case-insensitive）模式，即在确定匹配项时忽略模式与字符串的大小写。 m:表示
HTML5模式和Hashbang模式 bijian1013 JavaScript AngularJS Hashbang模式 HTML5模式
我们可以用$locationProvider来配置$location服务（可以采用注入的方式，就像AngularJS中其他所有东西一样）。这里provider的两个参数很有意思，介绍如下。 html5Mode 一个布尔值，标识$location服务是否运行在HTML5模式下。 ha
[Maven学习笔记六]Maven生命周期 bit1129 maven
从mvn test的输出开始说起当我们在user-core中执行mvn test时，执行的输出如下： /software/devsoftware/jdk1.7.0_55/bin/java -Dmaven.home=/software/devsoftware/apache-maven-3.2.1 -Dclassworlds.conf=/software/devs
【Hadoop七】基于Yarn的Hadoop Map Reduce容错 bit1129 hadoop
运行于Yarn的Map Reduce作业，可能发生失败的点包括 Task Failure Application Master Failure Node Manager Failure Resource Manager Failure 1. Task Failure 任务执行过程中产生的异常和JVM的意外终止会汇报给Application Master。僵死的任务也会被A
记一次数据推送的异常解决端口解决 ronin47 记一次数据推送的异常解决
　　需求：从db获取数据然后推送到B 程序开发完成，上jboss,刚开始报了很多错，逐一解决，可最后显示连接不到数据库。机房的同事说可以ping 通。　　自已画了个图，逐一排除，把linux 防火墙　和　setenforce　设置最低。　　　service iptables stop
巧用视错觉-UI更有趣 brotherlamp UI ui视频 ui教程 ui自学 ui资料
我们每个人在生活中都曾感受过视错觉（optical illusion）的魅力。视错觉现象是双眼跟我们开的一个玩笑，而我们往往还心甘情愿地接受我们看到的假象。其实不止如此，视觉错现象的背后还有一个重要的科学原理——格式塔原理。格式塔原理解释了人们如何以视觉方式感觉物体，以及图像的结构，视角，大小等要素是如何影响我们的视觉的。在下面这篇文章中，我们首先会简单介绍一下格式塔原理中的基本概念，
线段树-poj1177-N个矩形求边长（离散化+扫描线） bylijinnan 数据结构算法线段树
package com.ljn.base; import java.util.Arrays; import java.util.Comparator; import java.util.Set; import java.util.TreeSet; /** * POJ 1177 (线段树+离散化+扫描线)，题目链接为http://poj.org/problem?id=1177
HTTP协议详解 chicony http协议
引言
Scala设计模式 chenchao051 设计模式 scala
Scala设计模式我的话：在国外网站上看到一篇文章，里面详细描述了很多设计模式，并且用Java及Scala两种语言描述，清晰的让我们看到各种常规的设计模式，在Scala中是如何在语言特性层面直接支持的。基于文章很nice，我利用今天的空闲时间将其翻译，希望大家能一起学习，讨论。翻译
安装mysql daizj mysql 安装
安装mysql (1)删除linux上已经安装的mysql相关库信息。rpm -e xxxxxxx --nodeps (强制删除) 执行命令rpm -qa |grep mysql 检查是否删除干净 (2)执行命令 rpm -i MySQL-server-5.5.31-2.el
HTTP状态码大全 dcj3sjt126com http状态码
完整的 HTTP 1.1规范说明书来自于RFC 2616，你可以在http://www.talentdigger.cn/home/link.php?url=d3d3LnJmYy1lZGl0b3Iub3JnLw%3D%3D在线查阅。HTTP 1.1的状态码被标记为新特性，因为许多浏览器只支持 HTTP 1.0。你应只把状态码发送给支持 HTTP 1.1的客户端，支持协议版本可以通过调用request
asihttprequest上传图片 dcj3sjt126com ASIHTTPRequest
NSURL *url =@"yourURL"; ASIFormDataRequest*currentRequest =[ASIFormDataRequest requestWithURL:url]; [currentRequest setPostFormat:ASIMultipartFormDataPostFormat];[currentRequest se
C语言中，关键字static的作用 e200702084 C++c C#
在C语言中，关键字static有三个明显的作用： 1)在函数体，局部的static变量。生存期为程序的整个生命周期，（它存活多长时间）；作用域却在函数体内（它在什么地方能被访问（空间））。一个被声明为静态的变量在这一函数被调用过程中维持其值不变。因为它分配在静态存储区，函数调用结束后并不释放单元，但是在其它的作用域的无法访问。当再次调用这个函数时，这个局部的静态变量还存活，而且用在它的访
win7/8使用curl geeksun win7
1. WIN7/8下要使用curl，需要下载curl-7.20.0-win64-ssl-sspi.zip和Win64OpenSSL_Light-1_0_2d.exe。下载地址： http://curl.haxx.se/download.html 请选择不带SSL的版本，否则还需要安装SSL的支持包 2. 可以给Windows增加c
Creating a Shared Repository; Users Sharing The Repository hongtoushizi git
转载自： http://www.gitguys.com/topics/creating-a-shared-repository-users-sharing-the-repository/ Commands discussed in this section: git init –bare git clone git remote git pull git p
Java实现字符串反转的8种或9种方法 Josh_Persistence 异或反转递归反转二分交换反转 java字符串反转栈反转
注：对于第7种使用异或的方式来实现字符串的反转，如果不太看得明白的，可以参照另一篇博客： http://josh-persistence.iteye.com/blog/2205768 /** * */ package com.wsheng.aggregator.algorithm.string; import java.util.Stack; /**
代码实现任意容量倒水问题 home198979 PHP 算法倒水
形象化设计模式实战 HELLO!架构 redis命令源码解析倒水问题：有两个杯子，一个A升，一个B升，水有无限多，现要求利用这两杯子装C
Druid datasource zhb8015 druid
推荐大家使用数据库连接池 DruidDataSource. http://code.alibabatech.com/wiki/display/Druid/DruidDataSource DruidDataSource经过阿里巴巴数百个应用一年多生产环境运行验证，稳定可靠。它最重要的特点是：监控、扩展和性能。下载和Maven配置看这里： http
两种启动监听器ApplicationListener和ServletContextListener spjich java spring 框架
引言:有时候需要在项目初始化的时候进行一系列工作，比如初始化一个线程池，初始化配置文件，初始化缓存等等，这时候就需要用到启动监听器，下面分别介绍一下两种常用的项目启动监听器 ServletContextListener 特点: 依赖于sevlet容器，需要配置web.xml 使用方法: public class StartListener implements
JavaScript Rounding Methods of the Math object 何不笑 JavaScript Math
The next group of methods has to do with rounding decimal values into integers. Three methods — Math.ceil(), Math.floor(), and Math.round() — handle rounding in differen

KALDI-IO库的生成与读取

你可能感兴趣的:(深度学习,机器学习)