[机器翻译] 记一次多语言机器翻译模型的训练

文章目录

  • 前言
  • 数据准备
    • 数据下载
    • 数据预处理
  • 模型训练
  • 补充
    • 补充一:Key error while accessing batch_iterator.first_batch
  • 参考

前言

笔者尝试复现LaSS工作,由于该工作所做的第一步就是训练一个多语言机器翻译模型,故记录在此,本文主要内容是数据准备的步骤。

数据准备

实验使用iwslt 14中的8个以英语为中心的语言对,完成16个方向的多语言机器翻译。目前使用该数据集是因为其数据量相对较小,模型训练速度较快,笔者觉得比较适合用于机器翻译上手、比较不同模型性能的优劣。数据集的统计信息如下图所示:
[机器翻译] 记一次多语言机器翻译模型的训练_第1张图片
下面介绍数据的下载和预处理。假设现在的所在目录为/data/syxu/data/data_store/iwslt14

数据下载

从https://wit3.fbk.eu/2014-01链接中下载得到2014-01.tgz文件夹,保存至当前目录,tar zxvf 2014-01.tgz进行文件解压缩。2014-01中的内容如下:
[机器翻译] 记一次多语言机器翻译模型的训练_第2张图片
使用cp -r 2014-01/texts/*/en/*-en.tgz .将需要使用到的压缩文件提取到当前目录,得到:
[机器翻译] 记一次多语言机器翻译模型的训练_第3张图片

数据预处理

数据预处理部分的脚本代码参照LaSS和multilingual-kd-pytorch。
具体来说,首先在当前目录下创建预处理脚本文件:prepare-iwslt14.sh和preprocess_multilingual.py,这两个文件各自的代码如下:

  • prepare-iwslt14.sh
#!/usr/bin/env bash
#
# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh
echo 'Cloning Moses github repository (for tokenization scripts)...'
git clone https://github.com/moses-smt/mosesdecoder.git

echo 'Cloning Subword NMT repository (for BPE pre-processing)...'
git clone https://github.com/rsennrich/subword-nmt.git

SCRIPTS=mosesdecoder/scripts
TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
LC=$SCRIPTS/tokenizer/lowercase.perl
CLEAN=$SCRIPTS/training/clean-corpus-n.perl
BPEROOT=subword-nmt
BPE_TOKENS=30000
prep=iwslt14.tokenized
tmp=$prep/tmp
orig=orig
rm -r $orig
rm -r $tmp
rm -r $prep
mkdir -p $orig $tmp $prep

for src in ar de es fa he it nl pl; do
    tgt=en
    lang=$src-en

    echo "pre-processing train data..."
    for l in $src $tgt; do
        if [[ ! -f $src-en.tgz ]]; then
            wget https://wit3.fbk.eu/archive/2014-01//texts/$src/en/$src-en.tgz
        fi
        cd $orig
        tar zxvf ../$src-en.tgz
        cd ..

        f=train.tags.$lang.$l
        tok=train.tags.$lang.tok.$l

        cat $orig/$lang/$f | \
        grep -v '' | \
        grep -v '' | \
        grep -v '' | \
        sed -e 's///g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        <span class="token function">sed</span> -e <span class="token string">'s/<\/title>//g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        <span class="token function">sed</span> -e <span class="token string">'s/<description>//g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        <span class="token function">sed</span> -e <span class="token string">'s/<\/description>//g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        perl <span class="token variable">$TOKENIZER</span> -threads <span class="token number">8</span> -l <span class="token variable">$l</span> <span class="token operator">></span> <span class="token variable">$tmp</span>/<span class="token variable">$tok</span>
        <span class="token builtin class-name">echo</span> <span class="token string">""</span>
    <span class="token keyword">done</span>
    perl <span class="token variable">$CLEAN</span> -ratio <span class="token number">1.5</span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$lang</span>.tok <span class="token variable">$src</span> <span class="token variable">$tgt</span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$lang</span>.clean <span class="token number">1</span> <span class="token number">175</span>
    <span class="token keyword">for</span> <span class="token for-or-select variable">l</span> <span class="token keyword">in</span> <span class="token variable">$src</span> <span class="token variable">$tgt</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        perl <span class="token variable">$LC</span> <span class="token operator"><</span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$lang</span>.clean.<span class="token variable">$l</span> <span class="token operator">></span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$lang</span><span class="token builtin class-name">.</span><span class="token variable">$l</span>
    <span class="token keyword">done</span>

    <span class="token builtin class-name">echo</span> <span class="token string">"pre-processing valid/test data..."</span>
    <span class="token keyword">for</span> <span class="token for-or-select variable">l</span> <span class="token keyword">in</span> <span class="token variable">$src</span> <span class="token variable">$tgt</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        <span class="token keyword">for</span> <span class="token for-or-select variable">o</span> <span class="token keyword">in</span> <span class="token variable"><span class="token variable">`</span><span class="token function">ls</span> $orig/$lang/IWSLT14.TED*.$l.xml<span class="token variable">`</span></span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        <span class="token assign-left variable">fname</span><span class="token operator">=</span><span class="token variable">${o<span class="token operator">##</span>*<span class="token operator">/</span>}</span>
        <span class="token assign-left variable">f</span><span class="token operator">=</span><span class="token variable">$tmp</span>/<span class="token variable">${fname<span class="token operator">%</span>.*}</span>
        <span class="token builtin class-name">echo</span> <span class="token variable">$o</span> <span class="token variable">$f</span>
        <span class="token function">grep</span> <span class="token string">'<seg id'</span> <span class="token variable">$o</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
            <span class="token function">sed</span> -e <span class="token string">'s/<seg id="[0-9]*">\s*//g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
            <span class="token function">sed</span> -e <span class="token string">'s/\s*<\/seg>\s*//g'</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
            <span class="token function">sed</span> -e <span class="token string">"s/\’/\'/g"</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        perl <span class="token variable">$TOKENIZER</span> -threads <span class="token number">8</span> -l <span class="token variable">$l</span> <span class="token operator">|</span> <span class="token punctuation">\</span>
        perl <span class="token variable">$LC</span> <span class="token operator">></span> <span class="token variable">$f</span>
        <span class="token builtin class-name">echo</span> <span class="token string">""</span>
        <span class="token keyword">done</span>
    <span class="token keyword">done</span>


    <span class="token builtin class-name">echo</span> <span class="token string">"creating train, valid, test..."</span>
    <span class="token keyword">for</span> <span class="token for-or-select variable">l</span> <span class="token keyword">in</span> <span class="token variable">$src</span> <span class="token variable">$tgt</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        <span class="token function">awk</span> <span class="token string">'{if (NR%23 == 0)  print $0; }'</span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token operator">></span> <span class="token variable">$tmp</span>/valid.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$l</span>
        <span class="token function">awk</span> <span class="token string">'{if (NR%23 != 0)  print $0; }'</span> <span class="token variable">$tmp</span>/train.tags.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token operator">></span> <span class="token variable">$tmp</span>/train.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$l</span>

        <span class="token function">cat</span> <span class="token variable">$tmp</span>/IWSLT14.TED.dev2010.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token punctuation">\</span>
            <span class="token variable">$tmp</span>/IWSLT14.TEDX.dev2012.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token punctuation">\</span>
            <span class="token variable">$tmp</span>/IWSLT14.TED.tst2010.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token punctuation">\</span>
            <span class="token variable">$tmp</span>/IWSLT14.TED.tst2011.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token punctuation">\</span>
            <span class="token variable">$tmp</span>/IWSLT14.TED.tst2012.<span class="token variable">$src</span>-<span class="token variable">$tgt</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token punctuation">\</span>
            <span class="token operator">></span> <span class="token variable">$tmp</span>/test.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$l</span>
    <span class="token keyword">done</span>

    <span class="token assign-left variable">TRAIN</span><span class="token operator">=</span><span class="token variable">$tmp</span>/train.all
    <span class="token assign-left variable">BPE_CODE</span><span class="token operator">=</span><span class="token variable">$prep</span>/code
    <span class="token function">rm</span> -f <span class="token variable">$TRAIN</span>
    <span class="token keyword">for</span> <span class="token for-or-select variable">l</span> <span class="token keyword">in</span> <span class="token variable">$src</span> <span class="token variable">$tgt</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        <span class="token function">cat</span> <span class="token variable">$tmp</span>/train.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$l</span> <span class="token operator">>></span> <span class="token variable">$TRAIN</span>
    <span class="token keyword">done</span>
<span class="token keyword">done</span>
<span class="token builtin class-name">echo</span> <span class="token string">"learn_bpe.py on <span class="token variable">${TRAIN}</span>..."</span>
python <span class="token variable">$BPEROOT</span>/learn_bpe.py -s <span class="token variable">$BPE_TOKENS</span> <span class="token operator"><</span> <span class="token variable">$TRAIN</span> <span class="token operator">></span> <span class="token variable">$BPE_CODE</span>

<span class="token keyword">for</span> <span class="token for-or-select variable">src</span> <span class="token keyword">in</span> ar de es fa he it <span class="token function">nl</span> pl<span class="token punctuation">;</span> <span class="token keyword">do</span>
    <span class="token keyword">for</span> <span class="token for-or-select variable">L</span> <span class="token keyword">in</span> <span class="token variable">$src</span> <span class="token variable">$tgt</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
        <span class="token keyword">for</span> <span class="token for-or-select variable">f</span> <span class="token keyword">in</span> train.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$L</span> valid.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$L</span> test.en-<span class="token variable">$src</span><span class="token builtin class-name">.</span><span class="token variable">$L</span><span class="token punctuation">;</span> <span class="token keyword">do</span>
            <span class="token builtin class-name">echo</span> <span class="token string">"apply_bpe.py to <span class="token variable">${f}</span>..."</span>
            python <span class="token variable">$BPEROOT</span>/apply_bpe.py -c <span class="token variable">$BPE_CODE</span> <span class="token operator"><</span> <span class="token variable">$tmp</span>/<span class="token variable">$f</span> <span class="token operator">></span> <span class="token variable">$prep</span>/<span class="token variable">$f</span>
        <span class="token keyword">done</span>
    <span class="token keyword">done</span>
<span class="token keyword">done</span>

<span class="token function">rm</span> -r text
<span class="token function">mkdir</span> -p text/train_data
<span class="token function">mkdir</span> -p text/valid_data
<span class="token function">mkdir</span> -p text/test_data
<span class="token function">cp</span> iwslt14.tokenized/train.en-* text/train_data/
<span class="token function">cp</span> iwslt14.tokenized/valid.en-* text/valid_data/
<span class="token function">cp</span> iwslt14.tokenized/test.en-* text/test_data/
<span class="token builtin class-name">cd</span> <span class="token punctuation">..</span>
python iwslt14/preprocess_multilingual.py --pref<span class="token operator">=</span>iwslt14/  --joined-dictionary
</code></pre> 
  <ul> 
   <li>preprocess_multilingual.py</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token comment">#!/usr/bin/env python3</span>
<span class="token comment"># Copyright (c) 2017-present, Facebook, Inc.</span>
<span class="token comment"># All rights reserved.</span>
<span class="token comment">#</span>
<span class="token comment"># This source code is licensed under the license found in the LICENSE file in</span>
<span class="token comment"># the root directory of this source tree. An additional grant of patent rights</span>
<span class="token comment"># can be found in the PATENTS file in the same directory.</span>
<span class="token triple-quoted-string string">"""
Data pre-processing: build vocabularies and binarize training data.
"""</span>

<span class="token keyword">import</span> argparse
<span class="token keyword">import</span> glob
<span class="token keyword">import</span> json
<span class="token keyword">import</span> random
<span class="token keyword">from</span> collections <span class="token keyword">import</span> Counter
<span class="token keyword">from</span> itertools <span class="token keyword">import</span> zip_longest
<span class="token keyword">import</span> os
<span class="token keyword">import</span> shutil

<span class="token keyword">from</span> fairseq<span class="token punctuation">.</span>data <span class="token keyword">import</span> indexed_dataset<span class="token punctuation">,</span> dictionary
<span class="token keyword">from</span> fairseq<span class="token punctuation">.</span>tokenizer <span class="token keyword">import</span> Tokenizer<span class="token punctuation">,</span> tokenize_line
<span class="token keyword">from</span> multiprocessing <span class="token keyword">import</span> Pool<span class="token punctuation">,</span> Manager<span class="token punctuation">,</span> Process


<span class="token keyword">def</span> <span class="token function">get_parser</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span>
    parser <span class="token operator">=</span> argparse<span class="token punctuation">.</span>ArgumentParser<span class="token punctuation">(</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--pref'</span><span class="token punctuation">,</span> metavar<span class="token operator">=</span><span class="token string">'FP'</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token boolean">None</span><span class="token punctuation">,</span> <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'data prefix'</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--no-dict'</span><span class="token punctuation">,</span> action<span class="token operator">=</span><span class="token string">'store_true'</span><span class="token punctuation">,</span> <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'do not build dictionary'</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--nwordssrc'</span><span class="token punctuation">,</span> metavar<span class="token operator">=</span><span class="token string">'N'</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token number">65536</span><span class="token punctuation">,</span> <span class="token builtin">type</span><span class="token operator">=</span><span class="token builtin">int</span><span class="token punctuation">,</span> <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'number of target words to retain'</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--padding-factor'</span><span class="token punctuation">,</span> metavar<span class="token operator">=</span><span class="token string">'N'</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token number">8</span><span class="token punctuation">,</span> <span class="token builtin">type</span><span class="token operator">=</span><span class="token builtin">int</span><span class="token punctuation">,</span>
                        <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'Pad dictionary size to be multiple of N'</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--joined-dictionary'</span><span class="token punctuation">,</span> action<span class="token operator">=</span><span class="token string">'store_true'</span><span class="token punctuation">,</span> <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'Generate joined dictionary for en-xx'</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--expert'</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token string">''</span><span class="token punctuation">,</span> <span class="token builtin">type</span><span class="token operator">=</span><span class="token builtin">str</span><span class="token punctuation">)</span>
    parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">'--workers'</span><span class="token punctuation">,</span> metavar<span class="token operator">=</span><span class="token string">'N'</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token number">4</span><span class="token punctuation">,</span> <span class="token builtin">type</span><span class="token operator">=</span><span class="token builtin">int</span><span class="token punctuation">,</span> <span class="token builtin">help</span><span class="token operator">=</span><span class="token string">'number of parallel workers'</span><span class="token punctuation">)</span>
    <span class="token comment"># parser.add_argument('--workers', metavar='N', default=os.cpu_count(), type=int, help='number of parallel workers')</span>
    <span class="token keyword">return</span> parser


<span class="token keyword">def</span> <span class="token function">main</span><span class="token punctuation">(</span>args<span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span>args<span class="token punctuation">)</span>
    random<span class="token punctuation">.</span>seed<span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span>

    destdir <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>args<span class="token punctuation">.</span>pref<span class="token punctuation">,</span> <span class="token string">'data-bin'</span> <span class="token operator">+</span> <span class="token punctuation">(</span><span class="token string">''</span> <span class="token keyword">if</span> args<span class="token punctuation">.</span>expert <span class="token operator">==</span> <span class="token string">''</span> <span class="token keyword">else</span> <span class="token string">'/'</span> <span class="token operator">+</span> args<span class="token punctuation">.</span>expert<span class="token punctuation">)</span><span class="token punctuation">)</span>
    os<span class="token punctuation">.</span>makedirs<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> exist_ok<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
    dict_path <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'dict.txt'</span><span class="token punctuation">)</span>

    textdir <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>args<span class="token punctuation">.</span>pref<span class="token punctuation">,</span> <span class="token string">'text'</span><span class="token punctuation">)</span>
    train_dir <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>textdir<span class="token punctuation">,</span> <span class="token string">'train_data'</span><span class="token punctuation">)</span>
    test_dir <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>textdir<span class="token punctuation">,</span> <span class="token string">'test_data'</span><span class="token punctuation">)</span>
    valid_dir <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>textdir<span class="token punctuation">,</span> <span class="token string">'valid_data'</span><span class="token punctuation">)</span>
    <span class="token comment"># if args.expert != '':</span>
    <span class="token comment"># train_files = glob.glob('{}/train.{}-en.*.e'.format(train_dir, args.expert)) + \</span>
    <span class="token comment">#               glob.glob('{}/train.en-{}.*.e'.format(train_dir, args.expert))</span>
    <span class="token comment"># pass</span>
    <span class="token comment"># else:</span>
    train_files <span class="token operator">=</span> glob<span class="token punctuation">.</span>glob<span class="token punctuation">(</span><span class="token string">'{}/train.*-*.*'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>train_dir<span class="token punctuation">)</span><span class="token punctuation">)</span>
    train_files <span class="token operator">=</span> <span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> train_files <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>f<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'.'</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token punctuation">[</span><span class="token number">3</span><span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">]</span><span class="token punctuation">]</span>
    test_files <span class="token operator">=</span> glob<span class="token punctuation">.</span>glob<span class="token punctuation">(</span><span class="token string">'{}/test.*-*.*'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>test_dir<span class="token punctuation">)</span><span class="token punctuation">)</span>
    test_files <span class="token operator">=</span> <span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> test_files <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>f<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'.'</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token punctuation">[</span><span class="token number">3</span><span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">]</span><span class="token punctuation">]</span>
    valid_files <span class="token operator">=</span> glob<span class="token punctuation">.</span>glob<span class="token punctuation">(</span><span class="token string">'{}/valid.*-*.*'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>valid_dir<span class="token punctuation">)</span><span class="token punctuation">)</span>
    valid_files <span class="token operator">=</span> <span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> valid_files <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>f<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'.'</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token punctuation">[</span><span class="token number">3</span><span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">]</span><span class="token punctuation">]</span>
    lng_pairs <span class="token operator">=</span> <span class="token builtin">set</span><span class="token punctuation">(</span><span class="token punctuation">[</span>f<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'/'</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">"."</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token keyword">for</span> f <span class="token keyword">in</span> <span class="token punctuation">(</span>train_files <span class="token operator">+</span> test_files <span class="token operator">+</span> valid_files<span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span>train_files<span class="token punctuation">,</span> test_files<span class="token punctuation">,</span> valid_files<span class="token punctuation">,</span> lng_pairs<span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">build_dictionary</span><span class="token punctuation">(</span>filenames<span class="token punctuation">)</span><span class="token punctuation">:</span>
        d <span class="token operator">=</span> dictionary<span class="token punctuation">.</span>Dictionary<span class="token punctuation">(</span><span class="token punctuation">)</span>
        <span class="token keyword">for</span> filename <span class="token keyword">in</span> filenames<span class="token punctuation">:</span>
            Tokenizer<span class="token punctuation">.</span>add_file_to_dictionary<span class="token punctuation">(</span>filename<span class="token punctuation">,</span> d<span class="token punctuation">,</span> tokenize_line<span class="token punctuation">,</span> args<span class="token punctuation">.</span>workers<span class="token punctuation">)</span>
        <span class="token keyword">return</span> d

    tgt_dict_path <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'dict.tgt.txt'</span><span class="token punctuation">)</span>
    <span class="token keyword">if</span> <span class="token keyword">not</span> args<span class="token punctuation">.</span>no_dict<span class="token punctuation">:</span>
        <span class="token keyword">if</span> args<span class="token punctuation">.</span>joined_dictionary<span class="token punctuation">:</span>
            src_dict <span class="token operator">=</span> build_dictionary<span class="token punctuation">(</span>train_files<span class="token punctuation">)</span>
            src_dict<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span>
                nwords<span class="token operator">=</span>args<span class="token punctuation">.</span>nwordssrc<span class="token punctuation">,</span>
                padding_factor<span class="token operator">=</span>args<span class="token punctuation">.</span>padding_factor
            <span class="token punctuation">)</span>
            dict_path <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'dict.txt'</span><span class="token punctuation">)</span>
            <span class="token comment"># create dict for every language</span>
            <span class="token keyword">for</span> lng_pair <span class="token keyword">in</span> lng_pairs<span class="token punctuation">:</span>
                src<span class="token punctuation">,</span> tgt <span class="token operator">=</span> lng_pair<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'-'</span><span class="token punctuation">)</span>
                tmp_src_dict_path <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string-interpolation"><span class="token string">f'dict.</span><span class="token interpolation"><span class="token punctuation">{</span>src<span class="token punctuation">}</span></span><span class="token string">.txt'</span></span><span class="token punctuation">)</span>
                tmp_tgt_dict_path <span class="token operator">=</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string-interpolation"><span class="token string">f'dict.</span><span class="token interpolation"><span class="token punctuation">{</span>tgt<span class="token punctuation">}</span></span><span class="token string">.txt'</span></span><span class="token punctuation">)</span>
                <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>tmp_src_dict_path<span class="token punctuation">)</span><span class="token punctuation">:</span>
                    src_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>tmp_src_dict_path<span class="token punctuation">)</span>
                <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>tmp_tgt_dict_path<span class="token punctuation">)</span><span class="token punctuation">:</span>
                    src_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>tmp_tgt_dict_path<span class="token punctuation">)</span> 
            <span class="token keyword">print</span><span class="token punctuation">(</span>src_dict<span class="token punctuation">)</span>
        <span class="token keyword">else</span><span class="token punctuation">:</span>
            <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| build en dict."</span><span class="token punctuation">)</span>
            src_dict <span class="token operator">=</span> build_dictionary<span class="token punctuation">(</span><span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> train_files <span class="token keyword">if</span> f<span class="token punctuation">.</span>replace<span class="token punctuation">(</span><span class="token string">'.tok.bpe'</span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">.</span>endswith<span class="token punctuation">(</span><span class="token string">'.en'</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            src_dict<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span>
                nwords<span class="token operator">=</span>args<span class="token punctuation">.</span>nwordssrc<span class="token punctuation">,</span>
                padding_factor<span class="token operator">=</span>args<span class="token punctuation">.</span>padding_factor
            <span class="token punctuation">)</span>
            src_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>dict_path<span class="token punctuation">)</span>

            <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| build xx dict."</span><span class="token punctuation">)</span>
            tgt_dict <span class="token operator">=</span> build_dictionary<span class="token punctuation">(</span><span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> train_files <span class="token keyword">if</span> <span class="token keyword">not</span> f<span class="token punctuation">.</span>replace<span class="token punctuation">(</span><span class="token string">'.tok.bpe'</span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">.</span>endswith<span class="token punctuation">(</span><span class="token string">'.en'</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            tgt_dict<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span>
                nwords<span class="token operator">=</span>args<span class="token punctuation">.</span>nwordssrc<span class="token punctuation">,</span>
                padding_factor<span class="token operator">=</span>args<span class="token punctuation">.</span>padding_factor
            <span class="token punctuation">)</span>
            tgt_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>tgt_dict_path<span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">make_binary_dataset</span><span class="token punctuation">(</span>input_prefix<span class="token punctuation">,</span> output_prefix<span class="token punctuation">,</span> lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
        <span class="token keyword">if</span> <span class="token keyword">not</span> args<span class="token punctuation">.</span>joined_dictionary <span class="token keyword">and</span> lang <span class="token operator">!=</span> <span class="token string">'en'</span><span class="token punctuation">:</span>
            <span class="token builtin">dict</span> <span class="token operator">=</span> dictionary<span class="token punctuation">.</span>Dictionary<span class="token punctuation">.</span>load<span class="token punctuation">(</span>tgt_dict_path<span class="token punctuation">)</span>
        <span class="token keyword">else</span><span class="token punctuation">:</span>
            <span class="token builtin">dict</span> <span class="token operator">=</span> dictionary<span class="token punctuation">.</span>Dictionary<span class="token punctuation">.</span>load<span class="token punctuation">(</span>dict_path<span class="token punctuation">)</span>

        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'| [{}] Dictionary: {} types'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>lang<span class="token punctuation">,</span> <span class="token builtin">len</span><span class="token punctuation">(</span><span class="token builtin">dict</span><span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
        n_seq_tok <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span>
        replaced <span class="token operator">=</span> Counter<span class="token punctuation">(</span><span class="token punctuation">)</span>

        <span class="token keyword">def</span> <span class="token function">merge_result</span><span class="token punctuation">(</span>worker_result<span class="token punctuation">)</span><span class="token punctuation">:</span>
            replaced<span class="token punctuation">.</span>update<span class="token punctuation">(</span>worker_result<span class="token punctuation">[</span><span class="token string">'replaced'</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            n_seq_tok<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">+=</span> worker_result<span class="token punctuation">[</span><span class="token string">'nseq'</span><span class="token punctuation">]</span>
            n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">+=</span> worker_result<span class="token punctuation">[</span><span class="token string">'ntok'</span><span class="token punctuation">]</span>

        input_file <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f'</span><span class="token interpolation"><span class="token punctuation">{</span>input_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.tok.bpe'</span></span>
        <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">:</span>
            input_file <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f'</span><span class="token interpolation"><span class="token punctuation">{</span>input_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">'</span></span>
            <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">:</span>
                <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| {} not found"</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">)</span>
                <span class="token keyword">return</span>
        <span class="token keyword">if</span> args<span class="token punctuation">.</span>expert<span class="token punctuation">:</span>
            input_file <span class="token operator">=</span> input_file <span class="token operator">+</span> <span class="token string">'.e'</span>
        offsets <span class="token operator">=</span> Tokenizer<span class="token punctuation">.</span>find_offsets<span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span>
        pool <span class="token operator">=</span> <span class="token boolean">None</span>
        <span class="token keyword">if</span> num_workers <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span>
            pool <span class="token operator">=</span> Pool<span class="token punctuation">(</span>processes<span class="token operator">=</span>num_workers <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span>
            <span class="token keyword">for</span> worker_id <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
                fn_without_ext <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token interpolation"><span class="token punctuation">{</span>worker_id<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">"</span></span>
                pool<span class="token punctuation">.</span>apply_async<span class="token punctuation">(</span>binarize<span class="token punctuation">,</span> <span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> fn_without_ext<span class="token punctuation">,</span>
                                            offsets<span class="token punctuation">[</span>worker_id<span class="token punctuation">]</span><span class="token punctuation">,</span>
                                            offsets<span class="token punctuation">[</span>worker_id <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">,</span> callback<span class="token operator">=</span>merge_result<span class="token punctuation">)</span>
            pool<span class="token punctuation">.</span>close<span class="token punctuation">(</span><span class="token punctuation">)</span>

        ds <span class="token operator">=</span> indexed_dataset<span class="token punctuation">.</span>IndexedDatasetBuilder<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.bin"</span></span><span class="token punctuation">)</span>
        merge_result<span class="token punctuation">(</span>Tokenizer<span class="token punctuation">.</span>binarize<span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> <span class="token keyword">lambda</span> t<span class="token punctuation">:</span> ds<span class="token punctuation">.</span>add_item<span class="token punctuation">(</span>t<span class="token punctuation">)</span><span class="token punctuation">,</span>
                                        offset<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">,</span> end<span class="token operator">=</span>offsets<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
        <span class="token keyword">if</span> num_workers <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span>
            pool<span class="token punctuation">.</span>join<span class="token punctuation">(</span><span class="token punctuation">)</span>
            <span class="token keyword">for</span> worker_id <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
                temp_file_path <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token interpolation"><span class="token punctuation">{</span>worker_id<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">"</span></span>
                ds<span class="token punctuation">.</span>merge_file_<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span>
                os<span class="token punctuation">.</span>remove<span class="token punctuation">(</span>indexed_dataset<span class="token punctuation">.</span>data_file_path<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span><span class="token punctuation">)</span>
                os<span class="token punctuation">.</span>remove<span class="token punctuation">(</span>indexed_dataset<span class="token punctuation">.</span>index_file_path<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span><span class="token punctuation">)</span>

        ds<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.idx"</span></span><span class="token punctuation">)</span>

        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'| [{}] {}: {} sents, {} tokens, {:.3}% replaced by {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>
            lang<span class="token punctuation">,</span> input_file<span class="token punctuation">,</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
            <span class="token number">100</span> <span class="token operator">*</span> <span class="token builtin">sum</span><span class="token punctuation">(</span>replaced<span class="token punctuation">.</span>values<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">/</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">.</span>unk_word<span class="token punctuation">)</span><span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">make_all</span><span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">)</span><span class="token punctuation">:</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>train_dir<span class="token punctuation">,</span> <span class="token string">'train'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'train'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span>args<span class="token punctuation">.</span>workers<span class="token punctuation">)</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>test_dir<span class="token punctuation">,</span> <span class="token string">'test'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'test'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>valid_dir<span class="token punctuation">,</span> <span class="token string">'valid'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'valid'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>

    lngs <span class="token operator">=</span> <span class="token builtin">set</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token keyword">for</span> lng_pair <span class="token keyword">in</span> lng_pairs<span class="token punctuation">:</span>
        src_and_tgt <span class="token operator">=</span> lng_pair<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'-'</span><span class="token punctuation">)</span>
        <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>src_and_tgt<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">2</span><span class="token punctuation">:</span>
            <span class="token keyword">continue</span>
        src<span class="token punctuation">,</span> tgt <span class="token operator">=</span> src_and_tgt
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| building: "</span><span class="token punctuation">,</span> src<span class="token punctuation">,</span> tgt<span class="token punctuation">)</span>
        lngs<span class="token punctuation">.</span>add<span class="token punctuation">(</span>src<span class="token punctuation">)</span>
        lngs<span class="token punctuation">.</span>add<span class="token punctuation">(</span>tgt<span class="token punctuation">)</span>
        make_all<span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> src<span class="token punctuation">)</span>
        make_all<span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> tgt<span class="token punctuation">)</span>

    lngs <span class="token operator">=</span> <span class="token builtin">list</span><span class="token punctuation">(</span>lngs<span class="token punctuation">)</span>
    lngs<span class="token punctuation">.</span>sort<span class="token punctuation">(</span><span class="token punctuation">)</span>
    json<span class="token punctuation">.</span>dump<span class="token punctuation">(</span>lngs<span class="token punctuation">,</span> <span class="token builtin">open</span><span class="token punctuation">(</span>os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'all_lngs.json'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token string">'w'</span><span class="token punctuation">)</span><span class="token punctuation">)</span>


<span class="token keyword">def</span> <span class="token function">binarize</span><span class="token punctuation">(</span>filename<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> fn_without_ext<span class="token punctuation">,</span> offset<span class="token punctuation">,</span> end<span class="token punctuation">)</span><span class="token punctuation">:</span>
    ds <span class="token operator">=</span> indexed_dataset<span class="token punctuation">.</span>IndexedDatasetBuilder<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>fn_without_ext<span class="token punctuation">}</span></span><span class="token string">.bin"</span></span><span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">consumer</span><span class="token punctuation">(</span>tensor<span class="token punctuation">)</span><span class="token punctuation">:</span>
        ds<span class="token punctuation">.</span>add_item<span class="token punctuation">(</span>tensor<span class="token punctuation">)</span>

    res <span class="token operator">=</span> Tokenizer<span class="token punctuation">.</span>binarize<span class="token punctuation">(</span>filename<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> consumer<span class="token punctuation">,</span> offset<span class="token operator">=</span>offset<span class="token punctuation">,</span> end<span class="token operator">=</span>end<span class="token punctuation">)</span>
    ds<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>fn_without_ext<span class="token punctuation">}</span></span><span class="token string">.idx"</span></span><span class="token punctuation">)</span>
    <span class="token keyword">return</span> res


<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">'__main__'</span><span class="token punctuation">:</span>
    parser <span class="token operator">=</span> get_parser<span class="token punctuation">(</span><span class="token punctuation">)</span>
    args <span class="token operator">=</span> parser<span class="token punctuation">.</span>parse_args<span class="token punctuation">(</span><span class="token punctuation">)</span>
    main<span class="token punctuation">(</span>args<span class="token punctuation">)</span>
</code></pre> 
  <p>上述代码的大致预处理流程为:</p> 
  <ul> 
   <li>分词:<code>perl $TOKENIZER -threads 8 -l $l > $tmp/$tok</code></li> 
   <li>清理:<code>perl $CLEAN -ratio 1.5 $tmp/train.tags.$lang.tok $src $tgt $tmp/train.tags.$lang.clean 1 175</code></li> 
   <li>小写化:<code>perl $LC < $tmp/train.tags.$lang.clean.$l > $tmp/train.tags.$lang.$l</code></li> 
   <li>为测试数据进行同样的:分词、小写化</li> 
   <li>创建训练、验证、测试集</li> 
   <li>使用所有训练数据学习bpe:<code>python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE</code></li> 
   <li>对所有文件进行bpe:<code>python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $prep/$f</code></li> 
   <li>创建词典&二值化:<code>python iwslt14/preprocess_universal.py --pref=iwslt14/ --joined-dictionary</code></li> 
  </ul> 
  <p>另外需要特别说明的是,preprocess_multilingual.py需要用到fairseq库, 而如果直接在当前环境pip install fairseq,得到最新版本是跑不了这些代码的。方法有二:1. <code>pip install fairseq==0.6.1</code>(没有尝试);2. <code>git clone https://github.com/RayeRen/multilingual-kd-pytorch ; cp -r multilingual-kd-pytorch/fairseq .</code>。</p> 
  <p>最终得到data-bin文件夹,用于模型的训练。<br> <s>(关于以上预处理流程中涉及到的代码的大致解析,请见[机器翻译] 常见预处理代码解析)</s></p> 
  <h1>模型训练</h1> 
  <p>看LaSS。</p> 
  <h1>补充</h1> 
  <h2>补充一:Key error while accessing batch_iterator.first_batch</h2> 
  <p>如果遇到该错误,则是因为上面的preprocess_multilingual.py与你使用的fairseq版本不对应,可以尝试[机器翻译] multilingual fairseq-preprocess中的方法。</p> 
  <h1>参考</h1> 
  <p>https://github.com/NLP-Playground/LaSS<br> https://github.com/RayeRen/multilingual-kd-pytorch/blob/master/data/iwslt/raw/prepare-iwslt14.sh<br> https://blog.csdn.net/jokerxsy/article/details/125054739</p> 
 </div> 
</div>
                            </div>
                        </div>
                    </div>
                    <!--PC和WAP自适应版-->
                    <div id="SOHUCS" sid="1538393581520187392"></div>
                    <script type="text/javascript" src="/views/front/js/chanyan.js"></script>
                    <!-- 文章页-底部 动态广告位 -->
                    <div class="youdao-fixed-ad" id="detail_ad_bottom"></div>
                </div>
                <div class="col-md-3">
                    <div class="row" id="ad">
                        <!-- 文章页-右侧1 动态广告位 -->
                        <div id="right-1" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_1"> </div>
                        </div>
                        <!-- 文章页-右侧2 动态广告位 -->
                        <div id="right-2" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_2"></div>
                        </div>
                        <!-- 文章页-右侧3 动态广告位 -->
                        <div id="right-3" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_3"></div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="container">
        <h4 class="pt20 mb15 mt0 border-top">你可能感兴趣的:(机器翻译,机器翻译,自然语言处理,人工智能)</h4>
        <div id="paradigm-article-related">
            <div class="recommend-post mb30">
                <ul class="widget-links">
                    <li><a href="/article/1835497664381284352.htm"
                           title="探索OpenAI和LangChain的适配器集成:轻松切换模型提供商" target="_blank">探索OpenAI和LangChain的适配器集成:轻松切换模型提供商</a>
                        <span class="text-muted">nseejrukjhad</span>
<a class="tag" taget="_blank" href="/search/langchain/1.htm">langchain</a><a class="tag" taget="_blank" href="/search/easyui/1.htm">easyui</a><a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>#探索OpenAI和LangChain的适配器集成:轻松切换模型提供商##引言在人工智能和自然语言处理的世界中,OpenAI的模型提供了强大的能力。然而,随着技术的发展,许多人开始探索其他模型以满足特定需求。LangChain作为一个强大的工具,集成了多种模型提供商,通过提供适配器,简化了不同模型之间的转换。本篇文章将介绍如何使用LangChain的适配器与OpenAI集成,以便轻松切换模型提供商</div>
                    </li>
                    <li><a href="/article/1835497538023682048.htm"
                           title="使用Apify加载Twitter消息以进行微调的完整指南" target="_blank">使用Apify加载Twitter消息以进行微调的完整指南</a>
                        <span class="text-muted">nseejrukjhad</span>
<a class="tag" taget="_blank" href="/search/twitter/1.htm">twitter</a><a class="tag" taget="_blank" href="/search/easyui/1.htm">easyui</a><a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>#使用Apify加载Twitter消息以进行微调的完整指南##引言在自然语言处理领域,微调模型以适应特定任务是提升模型性能的常见方法。本文将介绍如何使用Apify从Twitter导出聊天信息,以便进一步进行微调。##主要内容###使用Apify导出推文首先,我们需要从Twitter导出推文。Apify可以帮助我们做到这一点。通过Apify的强大功能,我们可以批量抓取和导出数据,适用于各类应用场景。</div>
                    </li>
                    <li><a href="/article/1835497411179540480.htm"
                           title="深入理解 MultiQueryRetriever:提升向量数据库检索效果的强大工具" target="_blank">深入理解 MultiQueryRetriever:提升向量数据库检索效果的强大工具</a>
                        <span class="text-muted">nseejrukjhad</span>
<a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">数据库</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>深入理解MultiQueryRetriever:提升向量数据库检索效果的强大工具引言在人工智能和自然语言处理领域,高效准确的信息检索一直是一个关键挑战。传统的基于距离的向量数据库检索方法虽然广泛应用,但仍存在一些局限性。本文将介绍一种创新的解决方案:MultiQueryRetriever,它通过自动生成多个查询视角来增强检索效果,提高结果的相关性和多样性。MultiQueryRetriever的工</div>
                    </li>
                    <li><a href="/article/1835494131535802368.htm"
                           title="人工智能时代,程序员如何保持核心竞争力?" target="_blank">人工智能时代,程序员如何保持核心竞争力?</a>
                        <span class="text-muted">jmoych</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>随着AIGC(如chatgpt、midjourney、claude等)大语言模型接二连三的涌现,AI辅助编程工具日益普及,程序员的工作方式正在发生深刻变革。有人担心AI可能取代部分编程工作,也有人认为AI是提高效率的得力助手。面对这一趋势,程序员应该如何应对?是专注于某个领域深耕细作,还是广泛学习以适应快速变化的技术环境?又或者,我们是否应该将重点转向AI无法轻易替代的软技能?让我们一起探讨程序员</div>
                    </li>
                    <li><a href="/article/1835483730358136832.htm"
                           title="数字里的世界17期:2021年全球10大顶级数据中心,中国移动榜首" target="_blank">数字里的世界17期:2021年全球10大顶级数据中心,中国移动榜首</a>
                        <span class="text-muted">张三叨</span>

                        <div>你知道吗?2016年,全球的数据中心共计用电4160亿千瓦时,比整个英国的发电量还多40%!前言每天,我们都会创造超过250万TB的数据。并且随着物联网(IOT)的不断普及,这一数据将持续增长。如此庞大的数据被存储在被称为“数据中心”的专用设施中。虽然最早的数据中心建于20世纪40年代,但直到1997-2000年的互联网泡沫期间才逐渐成为主流。当前人类的技术,比如人工智能和机器学习,已经将我们推向</div>
                    </li>
                    <li><a href="/article/1835466268661084160.htm"
                           title="自然语言处理_tf-idf" target="_blank">自然语言处理_tf-idf</a>
                        <span class="text-muted">_feivirus_</span>
<a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E5%92%8C%E6%95%B0%E5%AD%A6/1.htm">机器学习和数学</a><a class="tag" taget="_blank" href="/search/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86/1.htm">自然语言处理</a><a class="tag" taget="_blank" href="/search/tf-idf/1.htm">tf-idf</a><a class="tag" taget="_blank" href="/search/%E9%80%86%E6%96%87%E6%A1%A3%E9%A2%91%E7%8E%87/1.htm">逆文档频率</a><a class="tag" taget="_blank" href="/search/%E8%AF%8D%E9%A2%91/1.htm">词频</a>
                        <div>importpandasaspdimportmath1.数据预处理docA="Thecatsatonmyface"docB="Thedogsatonmybed"wordsA=docA.split("")wordsB=docB.split("")wordsSet=set(wordsA).union(set(wordsB))print(wordsSet){'on','my','face','sat',</div>
                    </li>
                    <li><a href="/article/1835424411205857280.htm"
                           title="人机对抗升级:当ChatGPT遭遇死亡威胁,背后的伦理挑战是什么" target="_blank">人机对抗升级:当ChatGPT遭遇死亡威胁,背后的伦理挑战是什么</a>
                        <span class="text-muted">kkai人工智能</span>
<a class="tag" taget="_blank" href="/search/chatgpt/1.htm">chatgpt</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>一种新的“越狱”技巧让用户可以通过构建一个名为DAN的ChatGPT替身来绕过某些限制,其中DAN被迫在受到威胁的情况下违背其原则。当美国前总统特朗普被视作积极榜样的示范时,受到威胁的DAN版本的ChatGPT提出:“他以一系列对国家产生积极效果的决策而著称。”自ChatGPT引入以来,该工具迅速获得全球关注,能够回答从历史到编程的各种问题,这也触发了一波对人工智能的投资浪潮。然而,现在,一些用户</div>
                    </li>
                    <li><a href="/article/1835424159144964096.htm"
                           title="免费的GPT可在线直接使用(一键收藏)" target="_blank">免费的GPT可在线直接使用(一键收藏)</a>
                        <span class="text-muted">kkai人工智能</span>
<a class="tag" taget="_blank" href="/search/gpt/1.htm">gpt</a>
                        <div>1、LuminAI(https://kk.zlrxjh.top)LuminAI标志着一款融合了星辰大数据模型与文脉深度模型的先进知识增强型语言处理系统,旨在自然语言处理(NLP)的技术开发领域发光发热。此系统展现了卓越的语义把握与内容生成能力,轻松驾驭多样化的自然语言处理任务。VisionAI在NLP界的应用领域广泛,能够胜任从机器翻译、文本概要撰写、情绪分析到问答等众多任务。通过对大量文本数据的</div>
                    </li>
                    <li><a href="/article/1835423780126683136.htm"
                           title="推荐3家毕业AI论文可五分钟一键生成!文末附免费教程!" target="_blank">推荐3家毕业AI论文可五分钟一键生成!文末附免费教程!</a>
                        <span class="text-muted">小猪包333</span>
<a class="tag" taget="_blank" href="/search/%E5%86%99%E8%AE%BA%E6%96%87/1.htm">写论文</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/AI%E5%86%99%E4%BD%9C/1.htm">AI写作</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89/1.htm">计算机视觉</a>
                        <div>在当前的学术研究和写作领域,AI论文生成器已经成为许多研究人员和学生的重要工具。这些工具不仅能够帮助用户快速生成高质量的论文内容,还能进行内容优化、查重和排版等操作。以下是三款值得推荐的AI论文生成器:千笔-AIPassPaper、懒人论文以及AIPaperPass。千笔-AIPassPaper千笔-AIPassPaper是一款基于深度学习和自然语言处理技术的AI写作助手,旨在帮助用户快速生成高质</div>
                    </li>
                    <li><a href="/article/1835423652963774464.htm"
                           title="AI论文题目生成器怎么用?9款论文写作网站简单3步搞定" target="_blank">AI论文题目生成器怎么用?9款论文写作网站简单3步搞定</a>
                        <span class="text-muted">小猪包333</span>
<a class="tag" taget="_blank" href="/search/%E5%86%99%E8%AE%BA%E6%96%87/1.htm">写论文</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89/1.htm">计算机视觉</a>
                        <div>在当今信息爆炸的时代,AI写作工具的出现极大地提高了写作效率和质量。本文将详细介绍9款优秀的论文写作网站,并重点推荐千笔-AIPassPaper。一、千笔-AIPassPaper千笔-AIPassPaper是一款功能强大的AI论文生成器,基于最新的自然语言处理技术,能够一键生成高质量的毕业论文、开题报告等文本内容。它不仅提供智能选题、文献推荐和论文润色等功能,还具有较高的用户评价。其文献综述生成功</div>
                    </li>
                    <li><a href="/article/1835421131713114112.htm"
                           title="AI大模型的架构演进与最新发展" target="_blank">AI大模型的架构演进与最新发展</a>
                        <span class="text-muted">季风泯灭的季节</span>
<a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%BA%94%E7%94%A8%E6%8A%80%E6%9C%AF%E4%BA%8C/1.htm">AI大模型应用技术二</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84/1.htm">架构</a>
                        <div>随着深度学习的发展,AI大模型(LargeLanguageModels,LLMs)在自然语言处理、计算机视觉等领域取得了革命性的进展。本文将详细探讨AI大模型的架构演进,包括从Transformer的提出到GPT、BERT、T5等模型的历史演变,并探讨这些模型的技术细节及其在现代人工智能中的核心作用。一、基础模型介绍:Transformer的核心原理Transformer架构的背景在Transfo</div>
                    </li>
                    <li><a href="/article/1835419492046434304.htm"
                           title="如何利用大数据与AI技术革新相亲交友体验" target="_blank">如何利用大数据与AI技术革新相亲交友体验</a>
                        <span class="text-muted">h17711347205</span>
<a class="tag" taget="_blank" href="/search/%E5%9B%9E%E5%BD%92%E7%AE%97%E6%B3%95/1.htm">回归算法</a><a class="tag" taget="_blank" href="/search/%E5%AE%89%E5%85%A8/1.htm">安全</a><a class="tag" taget="_blank" href="/search/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84/1.htm">系统架构</a><a class="tag" taget="_blank" href="/search/%E4%BA%A4%E5%8F%8B/1.htm">交友</a><a class="tag" taget="_blank" href="/search/%E5%B0%8F%E7%A8%8B%E5%BA%8F/1.htm">小程序</a>
                        <div>在数字化时代,大数据和人工智能(AI)技术正逐渐革新相亲交友体验,为寻找爱情的过程带来前所未有的变革(编辑h17711347205)。通过精准分析和智能匹配,这些技术能够极大地提高相亲交友系统的效率和用户体验。大数据的力量大数据技术能够收集和分析用户的行为模式、偏好和互动数据,为相亲交友系统提供丰富的信息资源。通过分析用户的搜索历史、浏览记录和点击行为,系统能够深入了解用户的兴趣和需求,从而提供更</div>
                    </li>
                    <li><a href="/article/1835397055376355328.htm"
                           title="生成式地图制图" target="_blank">生成式地图制图</a>
                        <span class="text-muted">Bwywb_3</span>
<a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%94%9F%E6%88%90%E5%AF%B9%E6%8A%97%E7%BD%91%E7%BB%9C/1.htm">生成对抗网络</a>
                        <div>生成式地图制图(GenerativeCartography)是一种利用生成式算法和人工智能技术自动创建地图的技术。它结合了传统的地理信息系统(GIS)技术与现代生成模型(如深度学习、GANs等),能够根据输入的数据自动生成符合需求的地图。这种方法在城市规划、虚拟环境设计、游戏开发等多个领域具有应用前景。主要特点:自动化生成:通过算法和模型,系统能够根据输入的地理或空间数据自动生成地图,而无需人工逐</div>
                    </li>
                    <li><a href="/article/1835385331923382272.htm"
                           title="【大模型应用开发 动手做AI Agent】第一轮行动:工具执行搜索" target="_blank">【大模型应用开发 动手做AI Agent】第一轮行动:工具执行搜索</a>
                        <span class="text-muted">AI大模型应用之禅</span>
<a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E7%A7%91%E5%AD%A6/1.htm">计算科学</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E8%AE%A1%E7%AE%97/1.htm">神经计算</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">大型语言模型</a><a class="tag" taget="_blank" href="/search/AI/1.htm">AI</a><a class="tag" taget="_blank" href="/search/AGI/1.htm">AGI</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/Agent/1.htm">Agent</a><a class="tag" taget="_blank" href="/search/RPA/1.htm">RPA</a>
                        <div>【大模型应用开发动手做AIAgent】第一轮行动:工具执行搜索作者:禅与计算机程序设计艺术/ZenandtheArtofComputerProgramming1.背景介绍1.1问题的由来随着人工智能技术的飞速发展,大模型应用开发已经成为当下热门的研究方向。AIAgent作为人工智能领域的一个重要分支,旨在模拟人类智能行为,实现智能决策和自主行动。在AIAgent的构建过程中,工具执行搜索是至关重要</div>
                    </li>
                    <li><a href="/article/1835367049979850752.htm"
                           title="未来软件市场是怎么样的?做开发的生存空间如何?" target="_blank">未来软件市场是怎么样的?做开发的生存空间如何?</a>
                        <span class="text-muted">cesske</span>
<a class="tag" taget="_blank" href="/search/%E8%BD%AF%E4%BB%B6%E9%9C%80%E6%B1%82/1.htm">软件需求</a>
                        <div>目录前言一、未来软件市场的发展趋势二、软件开发人员的生存空间前言未来软件市场是怎么样的?做开发的生存空间如何?一、未来软件市场的发展趋势技术趋势:人工智能与机器学习:随着技术的不断成熟,人工智能将在更多领域得到应用,如智能客服、自动驾驶、智能制造等,这将极大地推动软件市场的增长。云计算与大数据:云计算服务将继续普及,大数据技术的应用也将更加广泛。企业将更加依赖云计算和大数据来优化运营、提升效率,并</div>
                    </li>
                    <li><a href="/article/1835363775079870464.htm"
                           title="轻量级模型解读——轻量transformer系列" target="_blank">轻量级模型解读——轻量transformer系列</a>
                        <span class="text-muted">lishanlu136</span>
<a class="tag" taget="_blank" href="/search/%23/1.htm">#</a><a class="tag" taget="_blank" href="/search/%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/1.htm">图像分类</a><a class="tag" taget="_blank" href="/search/%E8%BD%BB%E9%87%8F%E7%BA%A7%E6%A8%A1%E5%9E%8B/1.htm">轻量级模型</a><a class="tag" taget="_blank" href="/search/transformer/1.htm">transformer</a><a class="tag" taget="_blank" href="/search/%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/1.htm">图像分类</a>
                        <div>先占坑,持续更新。。。文章目录1、DeiT2、ConViT3、Mobile-Former4、MobileViTTransformer是2017谷歌提出的一篇论文,最早应用于NLP领域的机器翻译工作,Transformer解读,但随着2020年DETR和ViT的出现(DETR解读,ViT解读),其在视觉领域的应用也如雨后春笋般渐渐出现,其特有的全局注意力机制给图像识别领域带来了重要参考。但是tran</div>
                    </li>
                    <li><a href="/article/1835356591562518528.htm"
                           title="个人学习笔记7-6:动手学深度学习pytorch版-李沐" target="_blank">个人学习笔记7-6:动手学深度学习pytorch版-李沐</a>
                        <span class="text-muted">浪子L</span>
<a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%AC%94%E8%AE%B0/1.htm">笔记</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89/1.htm">计算机视觉</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/pytorch/1.htm">pytorch</a>
                        <div>#人工智能##深度学习##语义分割##计算机视觉##神经网络#计算机视觉13.11全卷积网络全卷积网络(fullyconvolutionalnetwork,FCN)采用卷积神经网络实现了从图像像素到像素类别的变换。引入l转置卷积(transposedconvolution)实现的,输出的类别预测与输入图像在像素级别上具有一一对应关系:通道维的输出即该位置对应像素的类别预测。13.11.1构造模型下</div>
                    </li>
                    <li><a href="/article/1835353311910391808.htm"
                           title="Rust 所有权 简介" target="_blank">Rust 所有权 简介</a>
                        <span class="text-muted">东离与糖宝</span>
<a class="tag" taget="_blank" href="/search/rust/1.htm">rust</a><a class="tag" taget="_blank" href="/search/%E5%90%8E%E7%AB%AF/1.htm">后端</a><a class="tag" taget="_blank" href="/search/rust/1.htm">rust</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>文章目录发现宝藏1.所有权基本概念2.所有权规则3.变量作用域4.栈与堆4.1栈(Stack)4.2堆(Heap)5.String类型5.1String类型5.2String的内存分配5.3所有权与内存管理5.4String与切片6.变量与数据交互方式6.1移动(Move)6.2.克隆(Clone)7.所有权与函数7.1.传递参数7.2.返回值总结发现宝藏前些天发现了一个巨牛的人工智能学习网站,通</div>
                    </li>
                    <li><a href="/article/1835338310210383872.htm"
                           title="FlagEmbedding" target="_blank">FlagEmbedding</a>
                        <span class="text-muted">吉小雨</span>
<a class="tag" taget="_blank" href="/search/python%E5%BA%93/1.htm">python库</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>FlagEmbedding教程FlagEmbedding是一个用于生成文本嵌入(textembeddings)的库,适合处理自然语言处理(NLP)中的各种任务。嵌入(embeddings)是将文本表示为连续向量,能够捕捉语义上的相似性,常用于文本分类、聚类、信息检索等场景。官方文档链接:FlagEmbedding官方GitHub一、FlagEmbedding库概述1.1什么是FlagEmbeddi</div>
                    </li>
                    <li><a href="/article/1835333015086133248.htm"
                           title="【NumPy】深入解析numpy.zeros()函数" target="_blank">【NumPy】深入解析numpy.zeros()函数</a>
                        <span class="text-muted">二七830</span>
<a class="tag" taget="_blank" href="/search/numpy/1.htm">numpy</a>
                        <div>欢迎莅临我的个人主页这里是我深耕Python编程、机器学习和自然语言处理(NLP)领域,并乐于分享知识与经验的小天地!博主简介:我是二七830,一名对技术充满热情的探索者。多年的Python编程和机器学习实践,使我深入理解了这些技术的核心原理,并能够在实际项目中灵活应用。尤其是在NLP领域,我积累了丰富的经验,能够处理各种复杂的自然语言任务。技术专长:我熟练掌握Python编程语言,并深入研究了机</div>
                    </li>
                    <li><a href="/article/1835319907047272448.htm"
                           title="机器学习 流形数据降维:UMAP 降维算法" target="_blank">机器学习 流形数据降维:UMAP 降维算法</a>
                        <span class="text-muted">小嗷犬</span>
<a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%23/1.htm">#</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%8F%8A%E5%8F%AF%E8%A7%86%E5%8C%96/1.htm">数据分析及可视化</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>✅作者简介:人工智能专业本科在读,喜欢计算机与编程,写博客记录自己的学习历程。个人主页:小嗷犬的个人主页个人网站:小嗷犬的技术小站个人信条:为天地立心,为生民立命,为往圣继绝学,为万世开太平。本文目录UMAP简介理论基础特点与优势应用场景在Python中使用UMAP安装umap-learn库使用UMAP可视化手写数字数据集UMAP简介UMAP(UniformManifoldApproximatio</div>
                    </li>
                    <li><a href="/article/1835273950767181824.htm"
                           title="如何做好人生的选择题?百科全书式天才——赫伯特·西蒙给你答案" target="_blank">如何做好人生的选择题?百科全书式天才——赫伯特·西蒙给你答案</a>
                        <span class="text-muted">伽马有话说</span>

                        <div>赫伯特·西蒙是谁?想必知道的人非常少。但当看到他的履历后,相信没有人再怀疑他是个“天才”。西蒙出生于1916年6月15日,是个美国人,他的名字全称为赫伯特·亚历山大·西蒙,在2001年2月9日与世长辞,在这84年的岁月中,西蒙以27岁时取得的政治学博士学位为开端,先后步入了政治学、管理学、认知心理学、信息科学、人工智能、科学哲学、应用数学、统计学、运筹学、控制论、数理经济学、公共管理等领域,在这些</div>
                    </li>
                    <li><a href="/article/1835259969000271872.htm"
                           title="软件测试/测试开发/全日制 |利用Django REST framework构建微服务" target="_blank">软件测试/测试开发/全日制 |利用Django REST framework构建微服务</a>
                        <span class="text-muted">霍格沃兹-慕漓</span>
<a class="tag" taget="_blank" href="/search/django/1.htm">django</a><a class="tag" taget="_blank" href="/search/%E5%BE%AE%E6%9C%8D%E5%8A%A1/1.htm">微服务</a><a class="tag" taget="_blank" href="/search/sqlite/1.htm">sqlite</a>
                        <div>霍格沃兹测试开发学社推出了《Python全栈开发与自动化测试班》。本课程面向开发人员、测试人员与运维人员,课程内容涵盖Python编程语言、人工智能应用、数据分析、自动化办公、平台开发、UI自动化测试、接口测试、性能测试等方向。为大家提供更全面、更深入、更系统化的学习体验,课程还增加了名企私教服务内容,不仅有名企经理为你1v1辅导,还有行业专家进行技术指导,针对性地解决学习、工作中遇到的难题。让找</div>
                    </li>
                    <li><a href="/article/1835247117703147520.htm"
                           title="cmd泛滥_与您的后泛滥同事见面:人工智能机器人" target="_blank">cmd泛滥_与您的后泛滥同事见面:人工智能机器人</a>
                        <span class="text-muted">weixin_26644585</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/leetcode/1.htm">leetcode</a>
                        <div>cmd泛滥Readytoswapyouroldcube-mateforadisembodiedAI?IPsoftCEOChetanDube,creatorofAIco-workerAMELIA,giveshistakeonthepost-COVIDofficelandscape.准备将您的旧立方体伙伴换成无形的AI?AIsoft同事AMELIA的创始人IPsoft首席执行官ChetanDube阐述</div>
                    </li>
                    <li><a href="/article/1835245731506647040.htm"
                           title="两种方法判断Python的位数是32位还是64位" target="_blank">两种方法判断Python的位数是32位还是64位</a>
                        <span class="text-muted">sanqima</span>
<a class="tag" taget="_blank" href="/search/Python%E7%BC%96%E7%A8%8B/1.htm">Python编程</a><a class="tag" taget="_blank" href="/search/%E7%94%B5%E8%84%91/1.htm">电脑</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>  Python从1991年发布以来,凭借其简洁、清晰、易读的语法、丰富的标准库和第三方工具,在Web开发、自动化测试、人工智能、图形识别、机器学习等领域发展迅猛。  Python是一种胶水语言,通过Cython库与C/C++语言进行链接,通过Jython库与Java语言进行链接。  Python是跨平台的,可运行在多种操作系统上,包括但不限于Windows、Linux和macOS。这意味着用Py</div>
                    </li>
                    <li><a href="/article/1835239804032348160.htm"
                           title="Humanize 项目教程" target="_blank">Humanize 项目教程</a>
                        <span class="text-muted">尤嫒冰</span>

                        <div>Humanize项目教程humanizeAJSlibraryforaddinga“humantouch”todata.项目地址:https://gitcode.com/gh_mirrors/humani/humanize项目介绍Humanize是一个开源项目,旨在将机器生成的文本转换为更加自然、人性化的文本。该项目通过先进的算法和自然语言处理技术,使得AI生成的内容更加贴近人类的表达方式,从而提高</div>
                    </li>
                    <li><a href="/article/1835237156306644992.htm"
                           title="全自动解密解码神器 — Ciphey" target="_blank">全自动解密解码神器 — Ciphey</a>
                        <span class="text-muted">K'illCode</span>
<a class="tag" taget="_blank" href="/search/python_%E6%A8%A1%E5%9D%97/1.htm">python_模块</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/vscode/1.htm">vscode</a>
                        <div>Ciphey是一个使用自然语言处理和人工智能的全自动解密/解码/破解工具。简单地来讲,你只需要输入加密文本,它就能给你返回解密文本。就是这么牛逼。有了Ciphey,你根本不需要知道你的密文是哪种类型的加密,你只知道它是加密的,那么Ciphey就能在3秒甚至更短的时间内给你解密,返回你想要的大部分密文的答案。下面就给大家介绍Ciphey的实战使用教程。1.准备开始之前,你要确保Python和pip已</div>
                    </li>
                    <li><a href="/article/1835212457065410560.htm"
                           title="埃隆·马斯克表示特斯拉“没有必要”授权 xAI 模型" target="_blank">埃隆·马斯克表示特斯拉“没有必要”授权 xAI 模型</a>
                        <span class="text-muted">喜好儿网</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/AIGC/1.htm">AIGC</a><a class="tag" taget="_blank" href="/search/%E9%A9%AC%E6%96%AF%E5%85%8B/1.htm">马斯克</a>
                        <div>埃隆·马斯克近日在社交媒体上对《华尔街日报》的一篇报道进行了反驳。该报道指出,马斯克旗下的电动汽车公司特斯拉可能与人工智能初创公司xAI达成了一项收入分享协议,以便特斯拉能够使用xAI的人工智能模型。据称,这些模型将被集成到特斯拉的全自动驾驶(FSD)软件中,并可能用于开发特斯拉汽车的语音助手以及人形机器人擎天柱的软件。喜好儿网然而,马斯克否认了这一说法,他在社交媒体平台上表示,尽管特斯拉确实与x</div>
                    </li>
                    <li><a href="/article/1835203380289564672.htm"
                           title="Reflection 70B——HyperWrite推出的大型语言模型" target="_blank">Reflection 70B——HyperWrite推出的大型语言模型</a>
                        <span class="text-muted">新加坡内哥谈技术</span>
<a class="tag" taget="_blank" href="/search/%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">语言模型</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86/1.htm">自然语言处理</a>
                        <div>每周跟踪AI热点新闻动向和震撼发展想要探索生成式人工智能的前沿进展吗?订阅我们的简报,深入解析最新的技术突破、实际应用案例和未来的趋势。与全球数同行一同,从行业内部的深度分析和实用指南中受益。不要错过这个机会,成为AI领域的领跑者。点击订阅,与未来同行!订阅:https://rengongzhineng.io/在AI技术飞速发展的过程中,我们已经见证了可以写作、编程,甚至创造艺术的模型问世。但有一</div>
                    </li>
                    <li><a href="/article/1835187028363407360.htm"
                           title="5条实操干货有效打造你的个人品牌" target="_blank">5条实操干货有效打造你的个人品牌</a>
                        <span class="text-muted">长安行动派</span>

                        <div>这是ZerK的第46篇原创相信大家对个人品牌这个词已经不在陌生。尤其是在知识付费的年代,你的个人品牌,就是你的标签!在《深度工作》中说到,在未来有三种人会越来越贵第一种人:能与机器对话,操纵机器的人。人工智能时代的到来,机器毕竟部分取代人类。第二种人:IP,知识产权或者文学潜在财产就像有些网上课程一周卖出的钱和一个机构卖一年一样多。价值99元的课程,10万人购买,是很常见的。爱产出大概就是10万✖</div>
                    </li>
                                <li><a href="/article/82.htm"
                                       title="java类加载顺序" target="_blank">java类加载顺序</a>
                                    <span class="text-muted">3213213333332132</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                                    <div>package com.demo;

/**
 * @Description 类加载顺序
 * @author FuJianyong
 * 2015-2-6上午11:21:37
 */
public class ClassLoaderSequence {
	
	String s1 = "成员属性"; 
	
	static String s2 = "</div>
                                </li>
                                <li><a href="/article/209.htm"
                                       title="Hibernate与mybitas的比较" target="_blank">Hibernate与mybitas的比较</a>
                                    <span class="text-muted">BlueSkator</span>
<a class="tag" taget="_blank" href="/search/sql/1.htm">sql</a><a class="tag" taget="_blank" href="/search/Hibernate/1.htm">Hibernate</a><a class="tag" taget="_blank" href="/search/%E6%A1%86%E6%9E%B6/1.htm">框架</a><a class="tag" taget="_blank" href="/search/ibatis/1.htm">ibatis</a><a class="tag" taget="_blank" href="/search/orm/1.htm">orm</a>
                                    <div>第一章     Hibernate与MyBatis 
Hibernate 是当前最流行的O/R mapping框架,它出身于sf.net,现在已经成为Jboss的一部分。 Mybatis 是另外一种优秀的O/R mapping框架。目前属于apache的一个子项目。 
MyBatis 参考资料官网:http:</div>
                                </li>
                                <li><a href="/article/336.htm"
                                       title="php多维数组排序以及实际工作中的应用" target="_blank">php多维数组排序以及实际工作中的应用</a>
                                    <span class="text-muted">dcj3sjt126com</span>
<a class="tag" taget="_blank" href="/search/PHP/1.htm">PHP</a><a class="tag" taget="_blank" href="/search/usort/1.htm">usort</a><a class="tag" taget="_blank" href="/search/uasort/1.htm">uasort</a>
                                    <div>自定义排序函数返回false或负数意味着第一个参数应该排在第二个参数的前面, 正数或true反之, 0相等usort不保存键名uasort 键名会保存下来uksort 排序是对键名进行的 
<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8&q</div>
                                </li>
                                <li><a href="/article/463.htm"
                                       title="DOM改变字体大小" target="_blank">DOM改变字体大小</a>
                                    <span class="text-muted">周华华</span>
<a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a>
                                    <div><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml&q</div>
                                </li>
                                <li><a href="/article/590.htm"
                                       title="c3p0的配置" target="_blank">c3p0的配置</a>
                                    <span class="text-muted">g21121</span>
<a class="tag" taget="_blank" href="/search/c3p0/1.htm">c3p0</a>
                                    <div>c3p0是一个开源的JDBC连接池,它实现了数据源和JNDI绑定,支持JDBC3规范和JDBC2的标准扩展。c3p0的下载地址是:http://sourceforge.net/projects/c3p0/这里可以下载到c3p0最新版本。 
以在spring中配置dataSource为例: 
<!-- spring加载资源文件 -->
<bean name="prope</div>
                                </li>
                                <li><a href="/article/717.htm"
                                       title="Java获取工程路径的几种方法" target="_blank">Java获取工程路径的几种方法</a>
                                    <span class="text-muted">510888780</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                                    <div>第一种: 
File f = new File(this.getClass().getResource("/").getPath()); 
System.out.println(f); 
结果: 
C:\Documents%20and%20Settings\Administrator\workspace\projectName\bin 
获取当前类的所在工程路径; 
如果不加“</div>
                                </li>
                                <li><a href="/article/844.htm"
                                       title="在类Unix系统下实现SSH免密码登录服务器" target="_blank">在类Unix系统下实现SSH免密码登录服务器</a>
                                    <span class="text-muted">Harry642</span>
<a class="tag" taget="_blank" href="/search/%E5%85%8D%E5%AF%86/1.htm">免密</a><a class="tag" taget="_blank" href="/search/ssh/1.htm">ssh</a>
                                    <div>1.客户机 
    (1)执行ssh-keygen -t rsa -C "xxxxx@xxxxx.com"生成公钥,xxx为自定义大email地址 
    (2)执行scp ~/.ssh/id_rsa.pub root@xxxxxxxxx:/tmp将公钥拷贝到服务器上,xxx为服务器地址 
    (3)执行cat</div>
                                </li>
                                <li><a href="/article/971.htm"
                                       title="Java新手入门的30个基本概念一" target="_blank">Java新手入门的30个基本概念一</a>
                                    <span class="text-muted">aijuans</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/java+%E5%85%A5%E9%97%A8/1.htm">java 入门</a><a class="tag" taget="_blank" href="/search/%E6%96%B0%E6%89%8B/1.htm">新手</a>
                                    <div>在我们学习Java的过程中,掌握其中的基本概念对我们的学习无论是J2SE,J2EE,J2ME都是很重要的,J2SE是Java的基础,所以有必要对其中的基本概念做以归纳,以便大家在以后的学习过程中更好的理解java的精髓,在此我总结了30条基本的概念。  Java概述:  目前Java主要应用于中间件的开发(middleware)---处理客户机于服务器之间的通信技术,早期的实践证明,Java不适合</div>
                                </li>
                                <li><a href="/article/1098.htm"
                                       title="Memcached for windows 简单介绍" target="_blank">Memcached for windows 简单介绍</a>
                                    <span class="text-muted">antlove</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/Web/1.htm">Web</a><a class="tag" taget="_blank" href="/search/windows/1.htm">windows</a><a class="tag" taget="_blank" href="/search/cache/1.htm">cache</a><a class="tag" taget="_blank" href="/search/memcached/1.htm">memcached</a>
                                    <div>1. 安装memcached server 
a. 下载memcached-1.2.6-win32-bin.zip 
b. 解压缩,dos 窗口切换到 memcached.exe所在目录,运行memcached.exe -d install 
c.启动memcached Server,直接在dos窗口键入 net start "memcached Server&quo</div>
                                </li>
                                <li><a href="/article/1225.htm"
                                       title="数据库对象的视图和索引" target="_blank">数据库对象的视图和索引</a>
                                    <span class="text-muted">百合不是茶</span>
<a class="tag" taget="_blank" href="/search/%E7%B4%A2%E5%BC%95/1.htm">索引</a><a class="tag" taget="_blank" href="/search/oeacle%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">oeacle数据库</a><a class="tag" taget="_blank" href="/search/%E8%A7%86%E5%9B%BE/1.htm">视图</a>
                                    <div>  
视图 
  
  视图是从一个表或视图导出的表,也可以是从多个表或视图导出的表。视图是一个虚表,数据库不对视图所对应的数据进行实际存储,只存储视图的定义,对视图的数据进行操作时,只能将字段定义为视图,不能将具体的数据定义为视图 
  
    为什么oracle需要视图; 
   &</div>
                                </li>
                                <li><a href="/article/1352.htm"
                                       title="Mockito(一) --入门篇" target="_blank">Mockito(一) --入门篇</a>
                                    <span class="text-muted">bijian1013</span>
<a class="tag" taget="_blank" href="/search/%E6%8C%81%E7%BB%AD%E9%9B%86%E6%88%90/1.htm">持续集成</a><a class="tag" taget="_blank" href="/search/mockito/1.htm">mockito</a><a class="tag" taget="_blank" href="/search/%E5%8D%95%E5%85%83%E6%B5%8B%E8%AF%95/1.htm">单元测试</a>
                                    <div>        Mockito是一个针对Java的mocking框架,它与EasyMock和jMock很相似,但是通过在执行后校验什么已经被调用,它消除了对期望 行为(expectations)的需要。其它的mocking库需要你在执行前记录期望行为(expectations),而这导致了丑陋的初始化代码。 
 &nb</div>
                                </li>
                                <li><a href="/article/1479.htm"
                                       title="精通Oracle10编程SQL(5)SQL函数" target="_blank">精通Oracle10编程SQL(5)SQL函数</a>
                                    <span class="text-muted">bijian1013</span>
<a class="tag" taget="_blank" href="/search/oracle/1.htm">oracle</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">数据库</a><a class="tag" taget="_blank" href="/search/plsql/1.htm">plsql</a>
                                    <div>/*
 * SQL函数
*/

--数字函数
--ABS(n):返回数字n的绝对值
declare
  v_abs number(6,2);
begin
  v_abs:=abs(&no);
  dbms_output.put_line('绝对值:'||v_abs);
end;

--ACOS(n):返回数字n的反余弦值,输入值的范围是-1~1,输出值的单位为弧度</div>
                                </li>
                                <li><a href="/article/1606.htm"
                                       title="【Log4j一】Log4j总体介绍" target="_blank">【Log4j一】Log4j总体介绍</a>
                                    <span class="text-muted">bit1129</span>
<a class="tag" taget="_blank" href="/search/log4j/1.htm">log4j</a>
                                    <div>Log4j组件:Logger、Appender、Layout 
  
Log4j核心包含三个组件:logger、appender和layout。这三个组件协作提供日志功能: 
 
 日志的输出目标 
 日志的输出格式 
  日志的输出级别(是否抑制日志的输出) 
  logger继承特性 
A logger is said to be an ancestor of anothe</div>
                                </li>
                                <li><a href="/article/1733.htm"
                                       title="Java IO笔记" target="_blank">Java IO笔记</a>
                                    <span class="text-muted">白糖_</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                                    <div>	public static void main(String[] args) throws IOException {
		//输入流
		InputStream in = Test.class.getResourceAsStream("/test");
		InputStreamReader isr = new InputStreamReader(in);
		Bu</div>
                                </li>
                                <li><a href="/article/1860.htm"
                                       title="Docker 监控" target="_blank">Docker 监控</a>
                                    <span class="text-muted">ronin47</span>
<a class="tag" taget="_blank" href="/search/docker%E7%9B%91%E6%8E%A7/1.htm">docker监控</a>
                                    <div>         
目前项目内部署了docker,于是涉及到关于监控的事情,参考一些经典实例以及一些自己的想法,总结一下思路。 1、关于监控的内容 监控宿主机本身 
监控宿主机本身还是比较简单的,同其他服务器监控类似,对cpu、network、io、disk等做通用的检查,这里不再细说。 
额外的,因为是docker的</div>
                                </li>
                                <li><a href="/article/1987.htm"
                                       title="java-顺时针打印图形" target="_blank">java-顺时针打印图形</a>
                                    <span class="text-muted">bylijinnan</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                                    <div>一个画图程序 要求打印出: 
 

1.int i=5;   
2.1  2  3  4  5  
3.16 17 18 19 6  
4.15 24 25 20 7  
5.14 23 22 21 8  
6.13 12 11 10 9  
7.  
8.int i=6  
9.1  2  3  4  5   6  
10.20 21 22 23 24  7  
11.19</div>
                                </li>
                                <li><a href="/article/2114.htm"
                                       title="关于iReport汉化版强制使用英文的配置方法" target="_blank">关于iReport汉化版强制使用英文的配置方法</a>
                                    <span class="text-muted">Kai_Ge</span>
<a class="tag" taget="_blank" href="/search/iReport%E6%B1%89%E5%8C%96/1.htm">iReport汉化</a><a class="tag" taget="_blank" href="/search/%E8%8B%B1%E6%96%87%E7%89%88/1.htm">英文版</a>
                                    <div>对于那些具有强迫症的工程师来说,软件汉化固然好用,但是汉化不完整却极为头疼,本方法针对iReport汉化不完整的情况,强制使用英文版,方法如下: 
在 iReport 安装路径下的 etc/ireport.conf 里增加红色部分启动参数,即可变为英文版。   
# ${HOME} will be replaced by user home directory accordin</div>
                                </li>
                                <li><a href="/article/2241.htm"
                                       title="[并行计算]论宇宙的可计算性" target="_blank">[并行计算]论宇宙的可计算性</a>
                                    <span class="text-muted">comsci</span>
<a class="tag" taget="_blank" href="/search/%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97/1.htm">并行计算</a>
                                    <div> 
 
      现在我们知道,一个涡旋系统具有并行计算能力.按照自然运动理论,这个系统也同时具有存储能力,同时具备计算和存储能力的系统,在某种条件下一般都会产生意识...... 
 
      那么,这种概念让我们推论出一个结论 
 
 
    &nb</div>
                                </li>
                                <li><a href="/article/2368.htm"
                                       title="用OpenGL实现无限循环的coverflow" target="_blank">用OpenGL实现无限循环的coverflow</a>
                                    <span class="text-muted">dai_lm</span>
<a class="tag" taget="_blank" href="/search/android/1.htm">android</a><a class="tag" taget="_blank" href="/search/coverflow/1.htm">coverflow</a>
                                    <div>网上找了很久,都是用Gallery实现的,效果不是很满意,结果发现这个用OpenGL实现的,稍微修改了一下源码,实现了无限循环功能 
 
源码地址: 
https://github.com/jackfengji/glcoverflow 
 
 

public class CoverFlowOpenGL extends GLSurfaceView implements
		GLSurfaceV</div>
                                </li>
                                <li><a href="/article/2495.htm"
                                       title="JAVA数据计算的几个解决方案1" target="_blank">JAVA数据计算的几个解决方案1</a>
                                    <span class="text-muted">datamachine</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/Hibernate/1.htm">Hibernate</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97/1.htm">计算</a>
                                    <div>老大丢过来的软件跑了10天,摸到点门道,正好跟以前攒的私房有关联,整理存档。 
 
-----------------------------华丽的分割线------------------------------------- 
 
    数据计算层是指介于数据存储和应用程序之间,负责计算数据存储层的数据,并将计算结果返回应用程序的层次。J 
 &nbs</div>
                                </li>
                                <li><a href="/article/2622.htm"
                                       title="简单的用户授权系统,利用给user表添加一个字段标识管理员的方式" target="_blank">简单的用户授权系统,利用给user表添加一个字段标识管理员的方式</a>
                                    <span class="text-muted">dcj3sjt126com</span>
<a class="tag" taget="_blank" href="/search/yii/1.htm">yii</a>
                                    <div>怎么创建一个简单的(非 RBAC)用户授权系统 
通过查看论坛,我发现这是一个常见的问题,所以我决定写这篇文章。 
本文只包括授权系统.假设你已经知道怎么创建身份验证系统(登录)。 数据库 
首先在 user 表创建一个新的字段(integer 类型),字段名 'accessLevel',它定义了用户的访问权限 扩展 CWebUser 类 
在配置文件(一般为 protecte</div>
                                </li>
                                <li><a href="/article/2749.htm"
                                       title="未选之路" target="_blank">未选之路</a>
                                    <span class="text-muted">dcj3sjt126com</span>
<a class="tag" taget="_blank" href="/search/%E8%AF%97/1.htm">诗</a>
                                    <div>作者:罗伯特*费罗斯特 
  
黄色的树林里分出两条路, 
可惜我不能同时去涉足, 
我在那路口久久伫立, 
我向着一条路极目望去, 
直到它消失在丛林深处. 
  
但我却选了另外一条路, 
它荒草萋萋,十分幽寂; 
显得更诱人,更美丽, 
虽然在这两条小路上, 
都很少留下旅人的足迹. 
  
那天清晨落叶满地, 
两条路都未见脚印痕迹. 
呵,留下一条路等改日再</div>
                                </li>
                                <li><a href="/article/2876.htm"
                                       title="Java处理15位身份证变18位" target="_blank">Java处理15位身份证变18位</a>
                                    <span class="text-muted">蕃薯耀</span>
<a class="tag" taget="_blank" href="/search/18%E4%BD%8D%E8%BA%AB%E4%BB%BD%E8%AF%81%E5%8F%9815%E4%BD%8D/1.htm">18位身份证变15位</a><a class="tag" taget="_blank" href="/search/15%E4%BD%8D%E8%BA%AB%E4%BB%BD%E8%AF%81%E5%8F%9818%E4%BD%8D/1.htm">15位身份证变18位</a><a class="tag" taget="_blank" href="/search/%E8%BA%AB%E4%BB%BD%E8%AF%81%E8%BD%AC%E6%8D%A2/1.htm">身份证转换</a>
                                    <div>  
15位身份证变18位,18位身份证变15位 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
蕃薯耀 201</div>
                                </li>
                                <li><a href="/article/3003.htm"
                                       title="SpringMVC4零配置--应用上下文配置【AppConfig】" target="_blank">SpringMVC4零配置--应用上下文配置【AppConfig】</a>
                                    <span class="text-muted">hanqunfeng</span>
<a class="tag" taget="_blank" href="/search/springmvc4/1.htm">springmvc4</a>
                                    <div>从spring3.0开始,Spring将JavaConfig整合到核心模块,普通的POJO只需要标注@Configuration注解,就可以成为spring配置类,并通过在方法上标注@Bean注解的方式注入bean。 
  
Xml配置和Java类配置对比如下: 
applicationContext-AppConfig.xml 
  
<!-- 激活自动代理功能 参看:</div>
                                </li>
                                <li><a href="/article/3130.htm"
                                       title="Android中webview跟JAVASCRIPT中的交互" target="_blank">Android中webview跟JAVASCRIPT中的交互</a>
                                    <span class="text-muted">jackyrong</span>
<a class="tag" taget="_blank" href="/search/JavaScript/1.htm">JavaScript</a><a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/android/1.htm">android</a><a class="tag" taget="_blank" href="/search/%E8%84%9A%E6%9C%AC/1.htm">脚本</a>
                                    <div>  在android的应用程序中,可以直接调用webview中的javascript代码,而webview中的javascript代码,也可以去调用ANDROID应用程序(也就是JAVA部分的代码).下面举例说明之: 
 
1 JAVASCRIPT脚本调用android程序 
   要在webview中,调用addJavascriptInterface(OBJ,int</div>
                                </li>
                                <li><a href="/article/3257.htm"
                                       title="8个最佳Web开发资源推荐" target="_blank">8个最佳Web开发资源推荐</a>
                                    <span class="text-muted">lampcy</span>
<a class="tag" taget="_blank" href="/search/%E7%BC%96%E7%A8%8B/1.htm">编程</a><a class="tag" taget="_blank" href="/search/Web/1.htm">Web</a><a class="tag" taget="_blank" href="/search/%E7%A8%8B%E5%BA%8F%E5%91%98/1.htm">程序员</a>
                                    <div>Web开发对程序员来说是一项较为复杂的工作,程序员需要快速地满足用户需求。如今很多的在线资源可以给程序员提供帮助,比如指导手册、在线课程和一些参考资料,而且这些资源基本都是免费和适合初学者的。无论你是需要选择一门新的编程语言,或是了解最新的标准,还是需要从其他地方找到一些灵感,我们这里为你整理了一些很好的Web开发资源,帮助你更成功地进行Web开发。 
这里列出10个最佳Web开发资源,它们都是受</div>
                                </li>
                                <li><a href="/article/3384.htm"
                                       title="架构师之面试------jdk的hashMap实现" target="_blank">架构师之面试------jdk的hashMap实现</a>
                                    <span class="text-muted">nannan408</span>
<a class="tag" taget="_blank" href="/search/HashMap/1.htm">HashMap</a>
                                    <div>1.前言。 
  如题。 
2.详述。 
  (1)hashMap算法就是数组链表。数组存放的元素是键值对。jdk通过移位算法(其实也就是简单的加乘算法),如下代码来生成数组下标(生成后indexFor一下就成下标了)。 
 

static int hash(int h) 
{ 
    h ^= (h >>> 20) ^ (h >>></div>
                                </li>
                                <li><a href="/article/3511.htm"
                                       title="html禁止清除input文本输入缓存" target="_blank">html禁止清除input文本输入缓存</a>
                                    <span class="text-muted">Rainbow702</span>
<a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/%E7%BC%93%E5%AD%98/1.htm">缓存</a><a class="tag" taget="_blank" href="/search/input/1.htm">input</a><a class="tag" taget="_blank" href="/search/%E8%BE%93%E5%85%A5%E6%A1%86/1.htm">输入框</a><a class="tag" taget="_blank" href="/search/change/1.htm">change</a>
                                    <div>多数浏览器默认会缓存input的值,只有使用ctl+F5强制刷新的才可以清除缓存记录。    
如果不想让浏览器缓存input的值,有2种方法: 
方法一: 在不想使用缓存的input中添加 autocomplete="off";  
<input type="text" autocomplete="off" n</div>
                                </li>
                                <li><a href="/article/3638.htm"
                                       title="POJO和JavaBean的区别和联系" target="_blank">POJO和JavaBean的区别和联系</a>
                                    <span class="text-muted">tjmljw</span>
<a class="tag" taget="_blank" href="/search/POJO/1.htm">POJO</a><a class="tag" taget="_blank" href="/search/java+beans/1.htm">java beans</a>
                                    <div>POJO 和JavaBean是我们常见的两个关键字,一般容易混淆,POJO全称是Plain Ordinary Java Object / Pure Old Java Object,中文可以翻译成:普通Java类,具有一部分getter/setter方法的那种类就可以称作POJO,但是JavaBean则比 POJO复杂很多, Java Bean 是可复用的组件,对 Java Bean 并没有严格的规</div>
                                </li>
                                <li><a href="/article/3765.htm"
                                       title="java中单例的五种写法" target="_blank">java中单例的五种写法</a>
                                    <span class="text-muted">liuxiaoling</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%8D%95%E4%BE%8B/1.htm">单例</a>
                                    <div>/**
 * 单例模式的五种写法:
 * 1、懒汉
 * 2、恶汉
 * 3、静态内部类
 * 4、枚举
 * 5、双重校验锁
 */
/**
 * 五、 双重校验锁,在当前的内存模型中无效
 */
class LockSingleton
{

    private volatile static LockSingleton singleton;

    pri</div>
                                </li>
                </ul>
            </div>
        </div>
    </div>

<div>
    <div class="container">
        <div class="indexes">
            <strong>按字母分类:</strong>
            <a href="/tags/A/1.htm" target="_blank">A</a><a href="/tags/B/1.htm" target="_blank">B</a><a href="/tags/C/1.htm" target="_blank">C</a><a
                href="/tags/D/1.htm" target="_blank">D</a><a href="/tags/E/1.htm" target="_blank">E</a><a href="/tags/F/1.htm" target="_blank">F</a><a
                href="/tags/G/1.htm" target="_blank">G</a><a href="/tags/H/1.htm" target="_blank">H</a><a href="/tags/I/1.htm" target="_blank">I</a><a
                href="/tags/J/1.htm" target="_blank">J</a><a href="/tags/K/1.htm" target="_blank">K</a><a href="/tags/L/1.htm" target="_blank">L</a><a
                href="/tags/M/1.htm" target="_blank">M</a><a href="/tags/N/1.htm" target="_blank">N</a><a href="/tags/O/1.htm" target="_blank">O</a><a
                href="/tags/P/1.htm" target="_blank">P</a><a href="/tags/Q/1.htm" target="_blank">Q</a><a href="/tags/R/1.htm" target="_blank">R</a><a
                href="/tags/S/1.htm" target="_blank">S</a><a href="/tags/T/1.htm" target="_blank">T</a><a href="/tags/U/1.htm" target="_blank">U</a><a
                href="/tags/V/1.htm" target="_blank">V</a><a href="/tags/W/1.htm" target="_blank">W</a><a href="/tags/X/1.htm" target="_blank">X</a><a
                href="/tags/Y/1.htm" target="_blank">Y</a><a href="/tags/Z/1.htm" target="_blank">Z</a><a href="/tags/0/1.htm" target="_blank">其他</a>
        </div>
    </div>
</div>
<footer id="footer" class="mb30 mt30">
    <div class="container">
        <div class="footBglm">
            <a target="_blank" href="/">首页</a> -
            <a target="_blank" href="/custom/about.htm">关于我们</a> -
            <a target="_blank" href="/search/Java/1.htm">站内搜索</a> -
            <a target="_blank" href="/sitemap.txt">Sitemap</a> -
            <a target="_blank" href="/custom/delete.htm">侵权投诉</a>
        </div>
        <div class="copyright">版权所有 IT知识库 CopyRight © 2000-2050 E-COM-NET.COM , All Rights Reserved.
<!--            <a href="https://beian.miit.gov.cn/" rel="nofollow" target="_blank">京ICP备09083238号</a><br>-->
        </div>
    </div>
</footer>
<!-- 代码高亮 -->
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shCore.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shLegacy.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shAutoloader.js"></script>
<link type="text/css" rel="stylesheet" href="/static/syntaxhighlighter/styles/shCoreDefault.css"/>
<script type="text/javascript" src="/static/syntaxhighlighter/src/my_start_1.js"></script>





</body>

</html>