After studying the FastSpeech paper, I went through a reimplementation repository on GitHub to better understand the details of the algorithm. The chosen repository is a PyTorch implementation: https://github.com/xcmyz/FastSpeech. It uses the LJSpeech dataset; the data-processing code is covered in the note "FastSpeech reimplementation GitHub project: data preparation". This note adds detailed comments to the model-building code. The files related to model construction are those under the transformer directory plus modules.py and model.py; I hope the annotations below help readers understand FastSpeech.
Because the FFT block in FastSpeech and the alignment extraction used to train the duration predictor are both Transformer-related, the project keeps the corresponding files under the transformer directory.
This file (transformer/Constants.py) defines a few constants that may be used during data processing.
PAD = 0
UNK = 1
BOS = 2
EOS = 3
PAD_WORD = '<blank>'
UNK_WORD = '<unk>'
BOS_WORD = '<s>'
EOS_WORD = '</s>'
This file (transformer/Modules.py) implements the scaled dot-product module used to compute attention.
import torch
import torch.nn as nn
import numpy as np
# Scaled dot-product attention module
class ScaledDotProductAttention(nn.Module):
''' Scaled Dot-Product Attention '''
def __init__(self, temperature, attn_dropout=0.1):
super().__init__()
self.temperature = temperature
self.dropout = nn.Dropout(attn_dropout)
self.softmax = nn.Softmax(dim=2)
def forward(self, q, k, v, mask=None):
# batch matrix multiplication of q and the transposed k
attn = torch.bmm(q, k.transpose(1, 2))
attn = attn / self.temperature # scale by the temperature (sqrt of d_k)
if mask is not None:
attn = attn.masked_fill(mask, -np.inf)
attn = self.softmax(attn)
attn = self.dropout(attn)
output = torch.bmm(attn, v)
return output, attn
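To make the tensor shapes concrete, here is a small shape-check sketch of my own (not part of the repo), assuming the repository root is on PYTHONPATH so transformer.Modules can be imported:
import torch
from transformer.Modules import ScaledDotProductAttention
# toy shapes: batch 2, query length 4, key/value length 6, d_k = d_v = 8
q = torch.randn(2, 4, 8)
k = torch.randn(2, 6, 8)
v = torch.randn(2, 6, 8)
mask = torch.zeros(2, 4, 6, dtype=torch.bool)  # True marks positions to mask out
attention = ScaledDotProductAttention(temperature=8 ** 0.5)
output, attn = attention(q, k, v, mask=mask)
print(output.shape)  # torch.Size([2, 4, 8])
print(attn.shape)    # torch.Size([2, 4, 6])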
This file (transformer/SubLayers.py) defines the multi-head attention layer and the position-wise feed-forward network (FFN) used inside the FFT block.
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from transformer.Modules import ScaledDotProductAttention
import hparams as hp
# Multi-head attention module
class MultiHeadAttention(nn.Module):
''' Multi-Head Attention module '''
def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1):
super().__init__()
self.n_head = n_head
self.d_k = d_k
self.d_v = d_v
# linear projections applied to q, k and v before the attention is computed
self.w_qs = nn.Linear(d_model, n_head * d_k)
self.w_ks = nn.Linear(d_model, n_head * d_k)
self.w_vs = nn.Linear(d_model, n_head * d_v)
# initialize the projection layers above
nn.init.normal_(self.w_qs.weight, mean=0,
std=np.sqrt(2.0 / (d_model + d_k)))
nn.init.normal_(self.w_ks.weight, mean=0,
std=np.sqrt(2.0 / (d_model + d_k)))
nn.init.normal_(self.w_vs.weight, mean=0,
std=np.sqrt(2.0 / (d_model + d_v)))
self.attention = ScaledDotProductAttention(
temperature=np.power(d_k, 0.5))
self.layer_norm = nn.LayerNorm(d_model)
self.fc = nn.Linear(n_head * d_v, d_model)
nn.init.xavier_normal_(self.fc.weight)
self.dropout = nn.Dropout(dropout)
def forward(self, q, k, v, mask=None):
d_k, d_v, n_head = self.d_k, self.d_v, self.n_head
sz_b, len_q, _ = q.size()
sz_b, len_k, _ = k.size()
sz_b, len_v, _ = v.size()
residual = q # kept to add back to the output at the end (residual connection)
# project q, k, v and reshape them to 4-D, i.e. split them into multiple heads
q = self.w_qs(q).view(sz_b, len_q, n_head, d_k)
k = self.w_ks(k).view(sz_b, len_k, n_head, d_k)
v = self.w_vs(v).view(sz_b, len_v, n_head, d_v)
# merge the batch_size and n_head dimensions of q, k, v, reducing them back to 3-D
q = q.permute(2, 0, 1, 3).contiguous().view(-1,
len_q, d_k) # (n*b) x lq x dk
k = k.permute(2, 0, 1, 3).contiguous().view(-1,
len_k, d_k) # (n*b) x lk x dk
v = v.permute(2, 0, 1, 3).contiguous().view(-1,
len_v, d_v) # (n*b) x lv x dv
# replicate the same mask for every head
mask = mask.repeat(n_head, 1, 1) # (n*b) x .. x ..
output, attn = self.attention(q, k, v, mask=mask) # attention output and the attention matrix
output = output.view(n_head, sz_b, len_q, d_v)
output = output.permute(1, 2, 0, 3).contiguous().view(
sz_b, len_q, -1) # b x lq x (n*dv) # restore the 4-D layout, then merge the heads back into the feature dimension, reducing to 3-D
output = self.dropout(self.fc(output))
output = self.layer_norm(output + residual) # residual
return output, attn
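A quick usage sketch of MultiHeadAttention (my own example, not from the repo): the mask is expected with shape [batch, len_q, len_k] and is repeated once per head inside the module. Again this assumes the repository root is on PYTHONPATH:
import torch
from transformer.SubLayers import MultiHeadAttention
# toy configuration: d_model = 16, 2 heads, d_k = d_v = 8
mha = MultiHeadAttention(n_head=2, d_model=16, d_k=8, d_v=8)
x = torch.randn(3, 5, 16)                      # [batch, seq_len, d_model]
mask = torch.zeros(3, 5, 5, dtype=torch.bool)  # [batch, len_q, len_k], True = masked
out, attn = mha(x, x, x, mask=mask)
print(out.shape)   # torch.Size([3, 5, 16])
print(attn.shape)  # torch.Size([6, 5, 5]), i.e. (n_head * batch, len_q, len_k)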
# Position-wise feed-forward network (FFN) of the FFT block; it uses 1-D convolutions internally and also adds a residual connection
class PositionwiseFeedForward(nn.Module):
''' A two-feed-forward-layer module '''
def __init__(self, d_in, d_hid, dropout=0.1):
super().__init__()
# Use Conv1D
# position-wise
self.w_1 = nn.Conv1d(
d_in, d_hid, kernel_size=hp.fft_conv1d_kernel[0], padding=hp.fft_conv1d_padding[0])
# position-wise
self.w_2 = nn.Conv1d(
d_hid, d_in, kernel_size=hp.fft_conv1d_kernel[1], padding=hp.fft_conv1d_padding[1])
self.layer_norm = nn.LayerNorm(d_in)
self.dropout = nn.Dropout(dropout)
def forward(self, x):
residual = x
output = x.transpose(1, 2)
output = self.w_2(F.relu(self.w_1(output)))
output = output.transpose(1, 2)
output = self.dropout(output)
output = self.layer_norm(output + residual)
return output
This file (transformer/Layers.py) defines the FFT block and several simple modules that may be used later (some of them end up unused); their structure is similar to Tacotron and Transformer-TTS.
import torch
import torch.nn as nn
from torch.nn import functional as F
import numpy as np
from collections import OrderedDict
from transformer.SubLayers import MultiHeadAttention, PositionwiseFeedForward
from text.symbols import symbols
# Custom linear (fully connected) layer
class Linear(nn.Module):
"""
Linear Module
"""
def __init__(self, in_dim, out_dim, bias=True, w_init='linear'):
"""
:param in_dim: dimension of input
:param out_dim: dimension of output
:param bias: boolean. if True, bias is included.
:param w_init: str. weight inits with xavier initialization.
"""
super(Linear, self).__init__()
self.linear_layer = nn.Linear(in_dim, out_dim, bias=bias)
nn.init.xavier_uniform_(
self.linear_layer.weight,
gain=nn.init.calculate_gain(w_init))
def forward(self, x):
return self.linear_layer(x)
# Pre-net: two fully connected layers with ReLU activation and dropout
class PreNet(nn.Module):
"""
Pre Net before passing through the network
"""
def __init__(self, input_size, hidden_size, output_size, p=0.5):
"""
:param input_size: dimension of input
:param hidden_size: dimension of hidden unit
:param output_size: dimension of output
"""
super(PreNet, self).__init__()
self.input_size = input_size
self.output_size = output_size
self.hidden_size = hidden_size
self.layer = nn.Sequential(OrderedDict([
('fc1', Linear(self.input_size, self.hidden_size)),
('relu1', nn.ReLU()),
('dropout1', nn.Dropout(p)),
('fc2', Linear(self.hidden_size, self.output_size)),
('relu2', nn.ReLU()),
('dropout2', nn.Dropout(p)),
]))
def forward(self, input_):
out = self.layer(input_)
return out
# Custom convolution layer wrapping a 1-D convolution
class Conv(nn.Module):
"""
Convolution Module
"""
def __init__(self,
in_channels,
out_channels,
kernel_size=1,
stride=1,
padding=0,
dilation=1,
bias=True,
w_init='linear'):
"""
:param in_channels: dimension of input
:param out_channels: dimension of output
:param kernel_size: size of kernel
:param stride: size of stride
:param padding: size of padding
:param dilation: dilation rate
:param bias: boolean. if True, bias is included.
:param w_init: str. weight inits with xavier initialization.
"""
super(Conv, self).__init__()
self.conv = nn.Conv1d(in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
bias=bias)
nn.init.xavier_uniform_(
self.conv.weight, gain=nn.init.calculate_gain(w_init))
def forward(self, x):
x = self.conv(x)
return x
# FFT block
class FFTBlock(torch.nn.Module):
"""FFT Block"""
def __init__(self,
d_model,
d_inner,
n_head,
d_k,
d_v,
dropout=0.1):
super(FFTBlock, self).__init__()
self.slf_attn = MultiHeadAttention(
n_head, d_model, d_k, d_v, dropout=dropout)
self.pos_ffn = PositionwiseFeedForward(
d_model, d_inner, dropout=dropout)
def forward(self, enc_input, non_pad_mask=None, slf_attn_mask=None):
enc_output, enc_slf_attn = self.slf_attn(
enc_input, enc_input, enc_input, mask=slf_attn_mask)
enc_output *= non_pad_mask # keep only the non-pad positions
enc_output = self.pos_ffn(enc_output)
enc_output *= non_pad_mask
return enc_output, enc_slf_attn
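A shape sketch for one FFT block (my own test, not part of the repo). It assumes the repository root is on PYTHONPATH, because the FFN inside reads hp.fft_conv1d_kernel and hp.fft_conv1d_padding from the repo's hparams.py; with the repo's length-preserving kernel/padding settings the sequence shape stays unchanged:
import torch
from transformer.Layers import FFTBlock
block = FFTBlock(d_model=16, d_inner=64, n_head=2, d_k=8, d_v=8)
x = torch.randn(2, 7, 16)                               # [batch, seq_len, d_model]
non_pad_mask = torch.ones(2, 7, 1)                      # 1 = real token, 0 = pad
slf_attn_mask = torch.zeros(2, 7, 7, dtype=torch.bool)  # nothing masked in this toy case
out, attn = block(x, non_pad_mask=non_pad_mask, slf_attn_mask=slf_attn_mask)
print(out.shape)  # torch.Size([2, 7, 16]); the FFT block keeps the sequence shape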
# 1-D convolution that computes its padding automatically
class ConvNorm(torch.nn.Module):
def __init__(self,
in_channels,
out_channels,
kernel_size=1,
stride=1,
padding=None,
dilation=1,
bias=True,
w_init_gain='linear'):
super(ConvNorm, self).__init__()
if padding is None:
assert(kernel_size % 2 == 1)
padding = int(dilation * (kernel_size - 1) / 2)
self.conv = torch.nn.Conv1d(in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
bias=bias)
torch.nn.init.xavier_uniform_(
self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
def forward(self, signal):
conv_signal = self.conv(signal)
return conv_signal
# Post-net: a stack of five 1-D convolutions with 512 channels and kernel size 5, consistent with Tacotron 2 and Transformer-TTS
class PostNet(nn.Module):
"""
PostNet: Five 1-d convolution with 512 channels and kernel size 5
"""
def __init__(self,
n_mel_channels=80,
postnet_embedding_dim=512,
postnet_kernel_size=5,
postnet_n_convolutions=5):
super(PostNet, self).__init__()
self.convolutions = nn.ModuleList()
self.convolutions.append(
nn.Sequential(
ConvNorm(n_mel_channels,
postnet_embedding_dim,
kernel_size=postnet_kernel_size,
stride=1,
padding=int((postnet_kernel_size - 1) / 2),
dilation=1,
w_init_gain='tanh'),
nn.BatchNorm1d(postnet_embedding_dim))
)
for i in range(1, postnet_n_convolutions - 1):
self.convolutions.append(
nn.Sequential(
ConvNorm(postnet_embedding_dim,
postnet_embedding_dim,
kernel_size=postnet_kernel_size,
stride=1,
padding=int((postnet_kernel_size - 1) / 2),
dilation=1,
w_init_gain='tanh'),
nn.BatchNorm1d(postnet_embedding_dim))
)
self.convolutions.append(
nn.Sequential(
ConvNorm(postnet_embedding_dim,
n_mel_channels,
kernel_size=postnet_kernel_size,
stride=1,
padding=int((postnet_kernel_size - 1) / 2),
dilation=1,
w_init_gain='linear'),
nn.BatchNorm1d(n_mel_channels))
)
def forward(self, x):
x = x.contiguous().transpose(1, 2)
for i in range(len(self.convolutions) - 1):
x = F.dropout(torch.tanh(
self.convolutions[i](x)), 0.5, self.training)
x = F.dropout(self.convolutions[-1](x), 0.5, self.training)
x = x.contiguous().transpose(1, 2)
return x
This file (transformer/Models.py) implements the FastSpeech encoder and decoder.
import torch
import torch.nn as nn
import numpy as np
import hparams as hp
import transformer.Constants as Constants
from transformer.Layers import FFTBlock, PreNet, PostNet, Linear
# get the mask marking the non-pad elements of a sequence
def get_non_pad_mask(seq):
assert seq.dim() == 2
return seq.ne(Constants.PAD).type(torch.float).unsqueeze(-1)
# sinusoidal positional encoding
def get_sinusoid_encoding_table(n_position, d_hid, padding_idx=None):
''' Sinusoid position encoding table '''
def cal_angle(position, hid_idx):
return position / np.power(10000, 2 * (hid_idx // 2) / d_hid)
def get_posi_angle_vec(position):
return [cal_angle(position, hid_j) for hid_j in range(d_hid)]
sinusoid_table = np.array([get_posi_angle_vec(pos_i)
for pos_i in range(n_position)])
sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
# zero out the positional-encoding row that corresponds to padding_idx
if padding_idx is not None:
# zero vector for padding dimension
sinusoid_table[padding_idx] = 0.
return torch.FloatTensor(sinusoid_table)
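This is the standard Transformer positional encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d_hid)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_hid)). A tiny check of my own, assuming the function above is in scope (e.g. run inside transformer/Models.py):
table = get_sinusoid_encoding_table(n_position=6, d_hid=8, padding_idx=0)
print(table.shape)  # torch.Size([6, 8])
print(table[0])     # all zeros: row 0 is reserved for the padding position
print(table[1, 0])  # sin(1 / 10000^0) = sin(1), roughly 0.8415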
# generate the padding mask for each key sequence
def get_attn_key_pad_mask(seq_k, seq_q):
''' For masking out the padding part of key sequence. '''
# Expand to fit the shape of key query attention matrix.
len_q = seq_q.size(1)
padding_mask = seq_k.eq(Constants.PAD)
padding_mask = padding_mask.unsqueeze(
1).expand(-1, len_q, -1) # b x lq x lk
return padding_mask
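To see what the two mask helpers return, here is a toy example of my own (it assumes both functions are in scope and that Constants.PAD == 0):
import torch
seq = torch.tensor([[5, 7, 0, 0],
                    [3, 0, 0, 0]])  # two padded phoneme-id sequences, 0 is PAD
print(get_non_pad_mask(seq).squeeze(-1))
# tensor([[1., 1., 0., 0.],
#         [1., 0., 0., 0.]])
print(get_attn_key_pad_mask(seq_k=seq, seq_q=seq)[0])
# every query row of the first sample masks key positions 2 and 3 (the padded ones):
# tensor([[False, False,  True,  True],
#         [False, False,  True,  True],
#         [False, False,  True,  True],
#         [False, False,  True,  True]])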
# The network before the Length Regulator, i.e. the encoder
class Encoder(nn.Module):
''' Encoder '''
def __init__(self,
n_src_vocab=hp.vocab_size,
len_max_seq=hp.vocab_size,
d_word_vec=hp.encoder_dim,
n_layers=hp.encoder_n_layer,
n_head=hp.encoder_head,
d_k=hp.encoder_dim // hp.encoder_head,
d_v=hp.encoder_dim // hp.encoder_head,
d_model=hp.encoder_dim,
d_inner=hp.encoder_conv1d_filter_size,
dropout=hp.dropout):
super(Encoder, self).__init__()
n_position = len_max_seq + 1
# essentially a word (phoneme) embedding layer
self.src_word_emb = nn.Embedding(n_src_vocab,
d_word_vec,
padding_idx=Constants.PAD)
# build the positional-encoding lookup table
self.position_enc = nn.Embedding.from_pretrained(
get_sinusoid_encoding_table(n_position, d_word_vec, padding_idx=0),
freeze=True)
# stack multiple FFT blocks
self.layer_stack = nn.ModuleList([FFTBlock(
d_model, d_inner, n_head, d_k, d_v, dropout=dropout) for _ in range(n_layers)])
def forward(self, src_seq, src_pos, return_attns=False):
enc_slf_attn_list = []
# -- Prepare masks
slf_attn_mask = get_attn_key_pad_mask(seq_k=src_seq, seq_q=src_seq)
non_pad_mask = get_non_pad_mask(src_seq)
# -- Forward: turn the phoneme sequence into embeddings and add positional encodings
enc_output = self.src_word_emb(src_seq) + self.position_enc(src_pos)
# pass through the stacked FFT blocks (self-attention)
for enc_layer in self.layer_stack:
enc_output, enc_slf_attn = enc_layer(
enc_output,
non_pad_mask=non_pad_mask,
slf_attn_mask=slf_attn_mask)
if return_attns:
enc_slf_attn_list += [enc_slf_attn]
return enc_output, non_pad_mask
# The network after the Length Regulator, i.e. the decoder
class Decoder(nn.Module):
""" Decoder """
def __init__(self,
len_max_seq=hp.max_seq_len,
n_layers=hp.decoder_n_layer,
n_head=hp.decoder_head,
d_k=hp.decoder_dim // hp.decoder_head,
d_v=hp.decoder_dim // hp.decoder_head,
d_model=hp.decoder_dim,
d_inner=hp.decoder_conv1d_filter_size,
dropout=hp.dropout):
super(Decoder, self).__init__()
n_position = len_max_seq + 1
self.position_enc = nn.Embedding.from_pretrained(
get_sinusoid_encoding_table(n_position, d_model, padding_idx=0),
freeze=True)
self.layer_stack = nn.ModuleList([FFTBlock(
d_model, d_inner, n_head, d_k, d_v, dropout=dropout) for _ in range(n_layers)])
def forward(self, enc_seq, enc_pos, return_attns=False):
dec_slf_attn_list = []
# -- Prepare masks
slf_attn_mask = get_attn_key_pad_mask(seq_k=enc_pos, seq_q=enc_pos)
non_pad_mask = get_non_pad_mask(enc_pos)
# -- Forward
dec_output = enc_seq + self.position_enc(enc_pos) # add positional encodings a second time
for dec_layer in self.layer_stack:
dec_output, dec_slf_attn = dec_layer(
dec_output,
non_pad_mask=non_pad_mask,
slf_attn_mask=slf_attn_mask)
if return_attns:
dec_slf_attn_list += [dec_slf_attn]
return dec_output
Compared with the modules under the transformer directory, the modules in this file (modules.py) sit at a higher level and include the Length Regulator, among others.
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from numba import jit
import numpy as np
import copy
import math
import hparams as hp
import utils
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def get_sinusoid_encoding_table(n_position, d_hid, padding_idx=None):
''' Sinusoid position encoding table '''
def cal_angle(position, hid_idx):
return position / np.power(10000, 2 * (hid_idx // 2) / d_hid)
def get_posi_angle_vec(position):
return [cal_angle(position, hid_j) for hid_j in range(d_hid)]
sinusoid_table = np.array([get_posi_angle_vec(pos_i)
for pos_i in range(n_position)])
sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
if padding_idx is not None:
# zero vector for padding dimension
sinusoid_table[padding_idx] = 0.
return torch.FloatTensor(sinusoid_table)
def clones(module, N):
return nn.ModuleList([copy.deepcopy(module) for _ in range(N)]) # deep copies, so parameters are not shared
# @jit(nopython=True)
def create_alignment(base_mat, duration_predictor_output):
"""
Fill in the alignment matrix based on the alignment information (the duration of each phoneme in the sequence).
@param base_mat: all-zero initial alignment matrix, [batch_size, max_mel_length, max_sequence_len]
@param duration_predictor_output: phoneme duration matrix, [batch_size, max_sequence_len]
@return: the adjusted alignment matrix
"""
N, L = duration_predictor_output.shape # size of the phoneme duration matrix
for i in range(N): # the i-th phoneme sequence in the batch
count = 0
for j in range(L): # the j-th phoneme of the i-th sequence
for k in range(duration_predictor_output[i][j]): # duration_predictor_output[i][j] is the duration (number of frames) of the phoneme at [i, j]
base_mat[i][count+k][j] = 1 # write 1 at duration_predictor_output[i][j] consecutive rows; the number of 1s equals the number of repetitions
count = count + duration_predictor_output[i][j]
return base_mat
class LengthRegulator(nn.Module):
""" Length Regulator """
def __init__(self):
super(LengthRegulator, self).__init__()
self.duration_predictor = DurationPredictor()
# adjust the length of the input phoneme sequence x
def LR(self, x, duration_predictor_output, mel_max_length=None):
"""
Expand the phoneme sequence so that its length matches the mel-spectrogram sequence length, based on the phoneme durations.
@param x: phoneme sequence after the FFT blocks, [batch_size, max_sequence_len, encoder_dim]
@param duration_predictor_output: phoneme duration matrix, [batch_size, max_sequence_len]
@param mel_max_length: maximum length among the mel-spectrogram sequences
@return: the length-adjusted phoneme sequence, [batch_size, expand_max_len, encoder_dim]
"""
# get the maximum length among all mel-spectrogram sequences in the batch
expand_max_len = torch.max(
torch.sum(duration_predictor_output, -1), -1)[0]
# initialize the alignment matrix with zeros, [batch_size, expand_max_len, max_sequence_len]
alignment = torch.zeros(duration_predictor_output.size(0),
expand_max_len,
duration_predictor_output.size(1)).numpy()
# fill in the alignment matrix according to the phoneme durations, [batch_size, expand_max_len, max_sequence_len]
alignment = create_alignment(alignment,
duration_predictor_output.cpu().numpy())
alignment = torch.from_numpy(alignment).to(device)
# multiplying the alignment matrix with the input x expands the phoneme sequence to the mel-spectrogram length
output = alignment @ x # [batch_size, expand_max_len, encoder_dim]
if mel_max_length: # if a maximum mel-spectrogram length is given, additionally pad with zeros up to that length
output = F.pad(
output, (0, 0, 0, mel_max_length-output.size(1), 0, 0))
return output
def forward(self, x, alpha=1.0, target=None, mel_max_length=None): # target is the pre-extracted phoneme duration; whether it is given decides what the module returns
duration_predictor_output = self.duration_predictor(x) # phoneme durations predicted by the duration predictor
if target is not None: # target given: training, where target supervises the duration predictor
output = self.LR(x, target, mel_max_length=mel_max_length)
return output, duration_predictor_output
else: # no target: inference, use the durations predicted by the duration predictor directly
duration_predictor_output = (
(duration_predictor_output + 0.5) * alpha).int()
output = self.LR(x, duration_predictor_output)
mel_pos = torch.stack(
[torch.Tensor([i+1 for i in range(output.size(1))])]).long().to(device)
return output, mel_pos
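Note how alpha controls the speech rate at inference: the float durations are offset by 0.5 and scaled by alpha before truncation to int, so alpha > 1 lengthens every phoneme (slower speech) and alpha < 1 shortens it (faster speech). A tiny illustration of that rounding line with made-up numbers:
import torch
d = torch.tensor([1.2, 3.7, 0.1])  # hypothetical duration-predictor outputs
print(((d + 0.5) * 1.0).int())     # tensor([1, 4, 0], dtype=torch.int32)
print(((d + 0.5) * 1.3).int())     # tensor([2, 5, 0], dtype=torch.int32)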
# Duration predictor: two 1-D convolution layers followed by a fully connected layer
class DurationPredictor(nn.Module):
""" Duration Predictor """
def __init__(self):
super(DurationPredictor, self).__init__()
self.input_size = hp.encoder_dim
self.filter_size = hp.duration_predictor_filter_size
self.kernel = hp.duration_predictor_kernel_size
self.conv_output_size = hp.duration_predictor_filter_size
self.dropout = hp.dropout
self.conv_layer = nn.Sequential(OrderedDict([
("conv1d_1", Conv(self.input_size,
self.filter_size,
kernel_size=self.kernel,
padding=1)),
("layer_norm_1", nn.LayerNorm(self.filter_size)),
("relu_1", nn.ReLU()),
("dropout_1", nn.Dropout(self.dropout)),
("conv1d_2", Conv(self.filter_size,
self.filter_size,
kernel_size=self.kernel,
padding=1)),
("layer_norm_2", nn.LayerNorm(self.filter_size)),
("relu_2", nn.ReLU()),
("dropout_2", nn.Dropout(self.dropout))
]))
self.linear_layer = Linear(self.conv_output_size, 1)
self.relu = nn.ReLU()
def forward(self, encoder_output):
out = self.conv_layer(encoder_output)
out = self.linear_layer(out)
out = self.relu(out)
out = out.squeeze()
if not self.training:
out = out.unsqueeze(0)
return out
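A shape sketch of my own for the duration predictor, assuming the repository root is on PYTHONPATH (encoder_dim, duration_predictor_filter_size and duration_predictor_kernel_size come from the repo's hparams.py; the sequence length is preserved as long as the kernel size there is 3, matching the hard-coded padding of 1):
import torch
import hparams as hp
from modules import DurationPredictor
dp = DurationPredictor()
dp.train()  # in eval mode the module adds an extra unsqueeze(0) to the output
x = torch.randn(4, 12, hp.encoder_dim)  # [batch, max_sequence_len, encoder_dim]
print(dp(x).shape)  # torch.Size([4, 12]): one predicted duration per phoneme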
# 1-D convolution followed by batch normalization
class BatchNormConv1d(nn.Module):
def __init__(self, in_dim, out_dim, kernel_size, stride, padding,
activation=None, w_init_gain='linear'):
super(BatchNormConv1d, self).__init__()
self.conv1d = nn.Conv1d(in_dim, out_dim,
kernel_size=kernel_size,
stride=stride, padding=padding, bias=False)
self.bn = nn.BatchNorm1d(out_dim)
self.activation = activation
torch.nn.init.xavier_uniform_(
self.conv1d.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
def forward(self, x):
x = self.conv1d(x)
if self.activation is not None:
x = self.activation(x)
return self.bn(x)
# Custom 1-D convolution layer that transposes the last two dimensions before and after the convolution
class Conv(nn.Module):
"""
Convolution Module
"""
def __init__(self,
in_channels,
out_channels,
kernel_size=1,
stride=1,
padding=0,
dilation=1,
bias=True,
w_init='linear'):
"""
:param in_channels: dimension of input
:param out_channels: dimension of output
:param kernel_size: size of kernel
:param stride: size of stride
:param padding: size of padding
:param dilation: dilation rate
:param bias: boolean. if True, bias is included.
:param w_init: str. weight inits with xavier initialization.
"""
super(Conv, self).__init__()
self.conv = nn.Conv1d(in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
bias=bias)
nn.init.xavier_uniform_(
self.conv.weight, gain=nn.init.calculate_gain(w_init))
def forward(self, x):
x = x.contiguous().transpose(1, 2)
x = self.conv(x)
x = x.contiguous().transpose(1, 2)
return x
# Custom linear (fully connected) layer
class Linear(nn.Module):
"""
Linear Module
"""
def __init__(self, in_dim, out_dim, bias=True, w_init='linear'):
"""
:param in_dim: dimension of input
:param out_dim: dimension of output
:param bias: boolean. if True, bias is included.
:param w_init: str. weight inits with xavier initialization.
"""
super(Linear, self).__init__()
self.linear_layer = nn.Linear(in_dim, out_dim, bias=bias)
nn.init.xavier_uniform_(
self.linear_layer.weight,
gain=nn.init.calculate_gain(w_init))
def forward(self, x):
return self.linear_layer(x)
class Highway(nn.Module):
def __init__(self, in_size, out_size):
super(Highway, self).__init__()
self.H = nn.Linear(in_size, out_size)
self.H.bias.data.zero_()
self.T = nn.Linear(in_size, out_size)
self.T.bias.data.fill_(-1)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()
def forward(self, inputs):
H = self.relu(self.H(inputs))
T = self.sigmoid(self.T(inputs))
return H * T + inputs * (1.0 - T)
# Pre-net
class Prenet(nn.Module):
"""
Prenet before passing through the network
"""
def __init__(self, input_size, hidden_size, output_size):
super(Prenet, self).__init__()
self.input_size = input_size
self.output_size = output_size
self.hidden_size = hidden_size
self.layer = nn.Sequential(OrderedDict([
('fc1', Linear(self.input_size, self.hidden_size)),
('relu1', nn.ReLU()),
('dropout1', nn.Dropout(0.5)),
('fc2', Linear(self.hidden_size, self.output_size)),
('relu2', nn.ReLU()),
('dropout2', nn.Dropout(0.5)),
]))
def forward(self, x):
out = self.layer(x)
return out
# This module converts mel spectrograms into linear magnitude spectrograms, from which Griffin-Lim or another vocoder can reconstruct audio
class CBHG(nn.Module):
"""CBHG module: a recurrent neural network composed of:
- 1-d convolution banks
- Highway networks + residual connections
- Bidirectional gated recurrent units
"""
def __init__(self, in_dim, K=16, projections=[128, 128]):
super(CBHG, self).__init__()
self.in_dim = in_dim
self.relu = nn.ReLU()
self.conv1d_banks = nn.ModuleList(
[BatchNormConv1d(in_dim, in_dim, kernel_size=k, stride=1,
padding=k // 2, activation=self.relu)
for k in range(1, K + 1)])
self.max_pool1d = nn.MaxPool1d(kernel_size=2, stride=1, padding=1)
in_sizes = [K * in_dim] + projections[:-1]
activations = [self.relu] * (len(projections) - 1) + [None]
self.conv1d_projections = nn.ModuleList(
[BatchNormConv1d(in_size, out_size, kernel_size=3, stride=1,
padding=1, activation=ac)
for (in_size, out_size, ac) in zip(
in_sizes, projections, activations)])
self.pre_highway = nn.Linear(projections[-1], in_dim, bias=False)
self.highways = nn.ModuleList(
[Highway(in_dim, in_dim) for _ in range(4)])
self.gru = nn.GRU(
in_dim, in_dim, 1, batch_first=True, bidirectional=True)
def forward(self, inputs, input_lengths=None):
# (B, T_in, in_dim)
x = inputs
# Needed to perform conv1d on time-axis
# (B, in_dim, T_in)
if x.size(-1) == self.in_dim:
x = x.transpose(1, 2)
T = x.size(-1)
# (B, in_dim*K, T_in)
# Concat conv1d bank outputs
x = torch.cat([conv1d(x)[:, :, :T]
for conv1d in self.conv1d_banks], dim=1)
assert x.size(1) == self.in_dim * len(self.conv1d_banks)
x = self.max_pool1d(x)[:, :, :T]
for conv1d in self.conv1d_projections:
x = conv1d(x)
# (B, T_in, in_dim)
# Back to the original shape
x = x.transpose(1, 2)
if x.size(-1) != self.in_dim:
x = self.pre_highway(x)
# Residual connection
x += inputs
for highway in self.highways:
x = highway(x)
if input_lengths is not None:
x = nn.utils.rnn.pack_padded_sequence(
x, input_lengths, batch_first=True)
# (B, T_in, in_dim*2)
self.gru.flatten_parameters()
outputs, _ = self.gru(x)
if input_lengths is not None:
outputs, _ = nn.utils.rnn.pad_packed_sequence(
outputs, batch_first=True)
return outputs
if __name__ == "__main__":
# TEST: walk through the phoneme-sequence length-adjustment process
a = torch.Tensor([[2, 3, 4], [1, 2, 3]]) # phoneme sequence 1
b = torch.Tensor([[5, 6, 7], [7, 8, 9]]) # phoneme sequence 2
c = torch.stack([a, b]) # corresponds to a batch of phoneme sequences
d = torch.Tensor([[1, 4], [6, 3]]).int() # stands in for the duration predictor output, i.e. the phoneme durations
expand_max_len = torch.max(torch.sum(d, -1), -1)[0] # maximum mel-spectrogram length within the batch
base = torch.zeros(c.size(0), expand_max_len, c.size(1)) # initialize the alignment matrix with zeros
# fill in the alignment matrix according to the phoneme durations
alignment = create_alignment(base.numpy(), d.numpy())
print(alignment)
# multiplying the alignment matrix with the batch expands each phoneme sequence to the mel-spectrogram length
# e.g. d[0][0] is 1, so the phoneme at c[0][0] appears once; d[0][1] is 4, so the phoneme at c[0][1] appears four times
print(torch.from_numpy(alignment) @ c)
This file (model.py) assembles all of the modules above into the complete FastSpeech model.
import torch
import torch.nn as nn
import hparams as hp
import utils
from transformer.Models import Encoder, Decoder
from transformer.Layers import Linear, PostNet
from modules import LengthRegulator, CBHG
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class FastSpeech(nn.Module):
""" FastSpeech """
def __init__(self):
super(FastSpeech, self).__init__()
self.encoder = Encoder() # the network before the Length Regulator is the encoder
self.length_regulator = LengthRegulator()
self.decoder = Decoder() # the network after the Length Regulator is the decoder
self.mel_linear = Linear(hp.decoder_dim, hp.num_mels)
self.postnet = CBHG(hp.num_mels, K=8, projections=[256, hp.num_mels]) # CBHG is used as the post-processing network
self.last_linear = Linear(hp.num_mels * 2, hp.num_mels)
def mask_tensor(self, mel_output, position, mel_max_length):
lengths = torch.max(position, -1)[0]
mask = ~utils.get_mask_from_lengths(lengths, max_len=mel_max_length)
mask = mask.unsqueeze(-1).expand(-1, -1, mel_output.size(-1))
return mel_output.masked_fill(mask, 0.)
def forward(self, src_seq, src_pos, mel_pos=None, mel_max_length=None, length_target=None, alpha=1.0):
encoder_output, _ = self.encoder(src_seq, src_pos) # encoder output, [b, max_sequence_len, encoder_dim]
if self.training: # training
# length_regulator_output is the length-adjusted phoneme sequence, matched to the mel-spectrogram length, [b, max_mel_len, encoder_dim]
# duration_predictor_output is the duration predicted by the duration predictor during training, [b, max_sequence_len]
# during training, the pre-extracted duration target is used as supervision for the duration predictor
length_regulator_output, duration_predictor_output = self.length_regulator(encoder_output,
target=length_target,
alpha=alpha,
mel_max_length=mel_max_length)
# decoder output, [b, max_mel_len, decoder_dim]
decoder_output = self.decoder(length_regulator_output, mel_pos)
mel_output = self.mel_linear(decoder_output) # [b, max_mel_len, num_mels]
mel_output = self.mask_tensor(mel_output, mel_pos, mel_max_length) # zero out the padded positions
residual = self.postnet(mel_output) # [b, max_mel_len, num_mels]
residual = self.last_linear(residual) # [b, max_mel_len, num_mels]
mel_postnet_output = mel_output + residual # [b, max_mel_len, num_mels]
mel_postnet_output = self.mask_tensor(mel_postnet_output,
mel_pos,
mel_max_length) # [b, max_mel_len, num_mels]
return mel_output, mel_postnet_output, duration_predictor_output # the last output is used to train the duration predictor
else: # inference
# at inference time, the durations predicted by the trained duration predictor are used directly to expand the phoneme sequence
length_regulator_output, decoder_pos = self.length_regulator(encoder_output,
alpha=alpha)
decoder_output = self.decoder(length_regulator_output, decoder_pos)
mel_output = self.mel_linear(decoder_output)
residual = self.postnet(mel_output)
residual = self.last_linear(residual)
mel_postnet_output = mel_output + residual
return mel_output, mel_postnet_output
if __name__ == "__main__":
# Test
model = FastSpeech()
print(sum(param.numel() for param in model.parameters()))
This note documents the model-construction code of the chosen FastSpeech reimplementation repository; it is best read together with the model section of the earlier FastSpeech paper-reading note. The note mainly consists of detailed code comments; if you find any problems or mistakes, please point them out in the comments so we can learn from each other.