[Machine Translation] Notes on Training a Multilingual Machine Translation Model

Contents

  • Preface
  • Data Preparation
    • Data Download
    • Data Preprocessing
  • Model Training
  • Supplement
    • Supplement 1: Key error while accessing batch_iterator.first_batch
  • References

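Preface

I am trying to reproduce the LaSS work. Because its first step is to train a multilingual machine translation model, I am recording the process here. This post mainly covers the data preparation steps.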

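Data Preparation

The experiments use 8 English-centric language pairs from IWSLT 14, covering 16 translation directions. I use this dataset for now because it is relatively small and models train quickly on it, which makes it a good fit for getting started with machine translation and for comparing the performance of different models. The dataset statistics are shown in the figure below:
[Figure 1: dataset statistics]
The download and preprocessing steps are described below. Assume the current directory is /data/syxu/data/data_store/iwslt14.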
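To make the "8 pairs, 16 directions" count concrete, here is a minimal sketch that enumerates the translation directions. The language codes are the ones looped over in prepare-iwslt14.sh; the name `translation_directions` is only illustrative and does not come from LaSS or fairseq.

```python
# Illustrative sketch: enumerate the translation directions of an
# English-centric multilingual setup. The language codes match the
# loop in prepare-iwslt14.sh; the helper name is hypothetical.
LANGS = ["ar", "de", "es", "fa", "he", "it", "nl", "pl"]


def translation_directions(langs, pivot="en"):
    """Return both directions (xx-en and en-xx) for every language."""
    directions = []
    for lang in langs:
        directions.append(f"{lang}-{pivot}")
        directions.append(f"{pivot}-{lang}")
    return directions


if __name__ == "__main__":
    dirs = translation_directions(LANGS)
    print(len(dirs), dirs)
```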

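Data Download

Download the 2014-01.tgz archive from https://wit3.fbk.eu/2014-01, save it to the current directory, and unpack it with tar zxvf 2014-01.tgz. The contents of 2014-01 are as follows:
[Figure 2: contents of the 2014-01 directory]
Run cp -r 2014-01/texts/*/en/*-en.tgz . to copy the archives we need into the current directory, which gives:
[Figure 3: the *-en.tgz archives in the current directory]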
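For readers who prefer to script this step, the cp command above can be sketched in Python roughly as follows. This is only an illustration under the assumption that 2014-01.tgz has already been unpacked into ./2014-01; the function name `collect_pair_archives` is hypothetical.

```python
import glob
import os
import shutil


def collect_pair_archives(release_dir, dest_dir):
    """Copy every texts/<src>/en/<src>-en.tgz under release_dir into dest_dir.

    Mirrors `cp -r 2014-01/texts/*/en/*-en.tgz .` and returns the copied
    file names in sorted order.
    """
    pattern = os.path.join(release_dir, "texts", "*", "en", "*-en.tgz")
    copied = []
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        shutil.copy(path, os.path.join(dest_dir, name))
        copied.append(name)
    return copied


if __name__ == "__main__":
    # Assumes the release was unpacked into ./2014-01 beforehand.
    print(collect_pair_archives("2014-01", "."))
```

Only archives whose target side is English are matched, which is why directories such as texts/en/<tgt> are left alone.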

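Data Preprocessing

The preprocessing scripts are adapted from LaSS and multilingual-kd-pytorch.
Specifically, first create two preprocessing scripts in the current directory, prepare-iwslt14.sh and preprocess_multilingual.py, with the following contents:

  • prepare-iwslt14.sh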
#!/usr/bin/env bash
#
# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh
echo 'Cloning Moses github repository (for tokenization scripts)...'
git clone https://github.com/moses-smt/mosesdecoder.git

echo 'Cloning Subword NMT repository (for BPE pre-processing)...'
git clone https://github.com/rsennrich/subword-nmt.git

SCRIPTS=mosesdecoder/scripts
TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
LC=$SCRIPTS/tokenizer/lowercase.perl
CLEAN=$SCRIPTS/training/clean-corpus-n.perl
BPEROOT=subword-nmt
BPE_TOKENS=30000
prep=iwslt14.tokenized
tmp=$prep/tmp
orig=orig
rm -r $orig
rm -r $tmp
rm -r $prep
mkdir -p $orig $tmp $prep

for src in ar de es fa he it nl pl; do
    tgt=en
    lang=$src-en

    echo "pre-processing train data..."
    for l in $src $tgt; do
        if [[ ! -f $src-en.tgz ]]; then
            wget https://wit3.fbk.eu/archive/2014-01//texts/$src/en/$src-en.tgz
        fi
        cd $orig
        tar zxvf ../$src-en.tgz
        cd ..

        f=train.tags.$lang.$l
        tok=train.tags.$lang.tok.$l

        # Drop metadata lines and strip <title>/<description> tags before tokenizing
        cat $orig/$lang/$f | \
        grep -v '<url>' | \
        grep -v '<talkid>' | \
        grep -v '<keywords>' | \
        sed -e 's/<title>//g' | \
        sed -e 's/<\/title>//g' | \
        sed -e 's/<description>//g' | \
        sed -e 's/<\/description>//g' | \
        perl $TOKENIZER -threads 8 -l $l > $tmp/$tok
        echo ""
    done
    perl $CLEAN -ratio 1.5 $tmp/train.tags.$lang.tok $src $tgt $tmp/train.tags.$lang.clean 1 175
    for l in $src $tgt; do
        perl $LC < $tmp/train.tags.$lang.clean.$l > $tmp/train.tags.$lang.$l
    done

    echo "pre-processing valid/test data..."
    for l in $src $tgt; do
        for o in `ls $orig/$lang/IWSLT14.TED*.$l.xml`; do
            fname=${o##*/}
            f=$tmp/${fname%.*}
            echo $o $f
            grep '<seg id' $o | \
                sed -e 's/<seg id="[0-9]*">\s*//g' | \
                sed -e 's/\s*<\/seg>\s*//g' | \
                sed -e "s/\’/\'/g" | \
                perl $TOKENIZER -threads 8 -l $l | \
                perl $LC > $f
            echo ""
        done
    done


    echo "creating train, valid, test..."
    for l in $src $tgt; do
        awk '{if (NR%23 == 0)  print $0; }' $tmp/train.tags.$src-$tgt.$l > $tmp/valid.en-$src.$l
        awk '{if (NR%23 != 0)  print $0; }' $tmp/train.tags.$src-$tgt.$l > $tmp/train.en-$src.$l

        cat $tmp/IWSLT14.TED.dev2010.$src-$tgt.$l \
            $tmp/IWSLT14.TEDX.dev2012.$src-$tgt.$l \
            $tmp/IWSLT14.TED.tst2010.$src-$tgt.$l \
            $tmp/IWSLT14.TED.tst2011.$src-$tgt.$l \
            $tmp/IWSLT14.TED.tst2012.$src-$tgt.$l \
            > $tmp/test.en-$src.$l
    done

    TRAIN=$tmp/train.all
    BPE_CODE=$prep/code
    # NOTE: this rm runs on every pass of the outer loop, so train.all is reset
    # for each pair and holds only the last pair (pl-en) when BPE is learned.
    # Move it above the outer loop if the BPE codes should cover all pairs.
    rm -f $TRAIN
    for l in $src $tgt; do
        cat $tmp/train.en-$src.$l >> $TRAIN
    done
done
echo "learn_bpe.py on ${TRAIN}..."
python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE

for src in ar de es fa he it nl pl; do
    for L in $src $tgt; do
        for f in train.en-$src.$L valid.en-$src.$L test.en-$src.$L; do
            echo "apply_bpe.py to ${f}..."
            python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $prep/$f
        done
    done
done

rm -r text
mkdir -p text/train_data
mkdir -p text/valid_data
mkdir -p text/test_data
cp iwslt14.tokenized/train.en-* text/train_data/
cp iwslt14.tokenized/valid.en-* text/valid_data/
cp iwslt14.tokenized/test.en-* text/test_data/
cd ..
python iwslt14/preprocess_multilingual.py --pref=iwslt14/  --joined-dictionary
</code></pre> 
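The script above holds out every 23rd line of the cleaned training data as the validation set (the two awk commands with NR%23). A minimal Python sketch of the same rule is shown below; the function name `split_train_valid` is hypothetical and is not part of the original scripts.

```python
def split_train_valid(lines, every=23):
    """Hold out every `every`-th line (1-based, like awk's NR) for validation."""
    train, valid = [], []
    for number, line in enumerate(lines, start=1):
        if number % every == 0:
            valid.append(line)
        else:
            train.append(line)
    return train, valid


if __name__ == "__main__":
    demo = [f"sentence {i}" for i in range(1, 47)]
    train, valid = split_train_valid(demo)
    print(len(train), len(valid))
```

Because the split depends only on the line number, applying it to the source file and the target file separately keeps the two sides aligned, provided both files have the same number of lines.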
  • preprocess_multilingual.py
  <pre><code class="prism language-python">#!/usr/bin/env python3
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the LICENSE file in
# the root directory of this source tree. An additional grant of patent rights
# can be found in the PATENTS file in the same directory.
"""
Data pre-processing: build vocabularies and binarize training data.
"""

import argparse
import glob
import json
import random
from collections import Counter
from itertools import zip_longest
import os
import shutil

from fairseq.data import indexed_dataset, dictionary
from fairseq.tokenizer import Tokenizer, tokenize_line
from multiprocessing import Pool, Manager, Process


def get_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--pref', metavar='FP', default=None, help='data prefix')
    parser.add_argument('--no-dict', action='store_true', help='do not build dictionary')
    parser.add_argument('--nwordssrc', metavar='N', default=65536, type=int, help='number of source words to retain')
    parser.add_argument('--padding-factor', metavar='N', default=8, type=int,
                        help='Pad dictionary size to be multiple of N')
    parser.add_argument('--joined-dictionary', action='store_true', help='Generate joined dictionary for en-xx')
    parser.add_argument('--expert', default='', type=str)
    parser.add_argument('--workers', metavar='N', default=4, type=int, help='number of parallel workers')
    # parser.add_argument('--workers', metavar='N', default=os.cpu_count(), type=int, help='number of parallel workers')
    return parser


def main(args):
    print(args)
    random.seed(1)

    destdir = os.path.join(args.pref, 'data-bin' + ('' if args.expert == '' else '/' + args.expert))
    os.makedirs(destdir, exist_ok=True)
    dict_path = os.path.join(destdir, 'dict.txt')

    textdir = os.path.join(args.pref, 'text')
    train_dir = os.path.join(textdir, 'train_data')
    test_dir = os.path.join(textdir, 'test_data')
    valid_dir = os.path.join(textdir, 'valid_data')
    # if args.expert != '':
    # train_files = glob.glob('{}/train.{}-en.*.e'.format(train_dir, args.expert)) + \
    #               glob.glob('{}/train.en-{}.*.e'.format(train_dir, args.expert))
    # pass
    # else:
    train_files = glob.glob('{}/train.*-*.*'.format(train_dir))
    train_files = [f for f in train_files if len(f.split('.')) in [3, 5]]
    test_files = glob.glob('{}/test.*-*.*'.format(test_dir))
    test_files = [f for f in test_files if len(f.split('.')) in [3, 5]]
    valid_files = glob.glob('{}/valid.*-*.*'.format(valid_dir))
    valid_files = [f for f in valid_files if len(f.split('.')) in [3, 5]]
    lng_pairs = set([f.split('/')[-1].split(".")[1] for f in (train_files + test_files + valid_files)])
    print(train_files, test_files, valid_files, lng_pairs)

    def build_dictionary(filenames):
        d = dictionary.Dictionary()
        for filename in filenames:
            Tokenizer.add_file_to_dictionary(filename, d, tokenize_line, args.workers)
        return d

    tgt_dict_path = os.path.join(destdir, 'dict.tgt.txt')
    if not args.no_dict:
        if args.joined_dictionary:
            src_dict = build_dictionary(train_files)
            src_dict.finalize(
                nwords=args.nwordssrc,
                padding_factor=args.padding_factor
            )
            dict_path = os.path.join(destdir, 'dict.txt')
            # create dict for every language
            for lng_pair in lng_pairs:
                src, tgt = lng_pair.split('-')
                tmp_src_dict_path = os.path.join(destdir, f'dict.{src}.txt')
                tmp_tgt_dict_path = os.path.join(destdir, f'dict.{tgt}.txt')
                if not os.path.exists(tmp_src_dict_path):
                    src_dict.save(tmp_src_dict_path)
                <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>tmp_tgt_dict_path<span class="token punctuation">)</span><span class="token punctuation">:</span>
                    src_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>tmp_tgt_dict_path<span class="token punctuation">)</span> 
            <span class="token keyword">print</span><span class="token punctuation">(</span>src_dict<span class="token punctuation">)</span>
        <span class="token keyword">else</span><span class="token punctuation">:</span>
            <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| build en dict."</span><span class="token punctuation">)</span>
            src_dict <span class="token operator">=</span> build_dictionary<span class="token punctuation">(</span><span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> train_files <span class="token keyword">if</span> f<span class="token punctuation">.</span>replace<span class="token punctuation">(</span><span class="token string">'.tok.bpe'</span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">.</span>endswith<span class="token punctuation">(</span><span class="token string">'.en'</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            src_dict<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span>
                nwords<span class="token operator">=</span>args<span class="token punctuation">.</span>nwordssrc<span class="token punctuation">,</span>
                padding_factor<span class="token operator">=</span>args<span class="token punctuation">.</span>padding_factor
            <span class="token punctuation">)</span>
            src_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>dict_path<span class="token punctuation">)</span>

            <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| build xx dict."</span><span class="token punctuation">)</span>
            tgt_dict <span class="token operator">=</span> build_dictionary<span class="token punctuation">(</span><span class="token punctuation">[</span>f <span class="token keyword">for</span> f <span class="token keyword">in</span> train_files <span class="token keyword">if</span> <span class="token keyword">not</span> f<span class="token punctuation">.</span>replace<span class="token punctuation">(</span><span class="token string">'.tok.bpe'</span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">.</span>endswith<span class="token punctuation">(</span><span class="token string">'.en'</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            tgt_dict<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span>
                nwords<span class="token operator">=</span>args<span class="token punctuation">.</span>nwordssrc<span class="token punctuation">,</span>
                padding_factor<span class="token operator">=</span>args<span class="token punctuation">.</span>padding_factor
            <span class="token punctuation">)</span>
            tgt_dict<span class="token punctuation">.</span>save<span class="token punctuation">(</span>tgt_dict_path<span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">make_binary_dataset</span><span class="token punctuation">(</span>input_prefix<span class="token punctuation">,</span> output_prefix<span class="token punctuation">,</span> lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
        <span class="token keyword">if</span> <span class="token keyword">not</span> args<span class="token punctuation">.</span>joined_dictionary <span class="token keyword">and</span> lang <span class="token operator">!=</span> <span class="token string">'en'</span><span class="token punctuation">:</span>
            <span class="token builtin">dict</span> <span class="token operator">=</span> dictionary<span class="token punctuation">.</span>Dictionary<span class="token punctuation">.</span>load<span class="token punctuation">(</span>tgt_dict_path<span class="token punctuation">)</span>
        <span class="token keyword">else</span><span class="token punctuation">:</span>
            <span class="token builtin">dict</span> <span class="token operator">=</span> dictionary<span class="token punctuation">.</span>Dictionary<span class="token punctuation">.</span>load<span class="token punctuation">(</span>dict_path<span class="token punctuation">)</span>

        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'| [{}] Dictionary: {} types'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>lang<span class="token punctuation">,</span> <span class="token builtin">len</span><span class="token punctuation">(</span><span class="token builtin">dict</span><span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
        n_seq_tok <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span>
        replaced <span class="token operator">=</span> Counter<span class="token punctuation">(</span><span class="token punctuation">)</span>

        <span class="token keyword">def</span> <span class="token function">merge_result</span><span class="token punctuation">(</span>worker_result<span class="token punctuation">)</span><span class="token punctuation">:</span>
            replaced<span class="token punctuation">.</span>update<span class="token punctuation">(</span>worker_result<span class="token punctuation">[</span><span class="token string">'replaced'</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            n_seq_tok<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">+=</span> worker_result<span class="token punctuation">[</span><span class="token string">'nseq'</span><span class="token punctuation">]</span>
            n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">+=</span> worker_result<span class="token punctuation">[</span><span class="token string">'ntok'</span><span class="token punctuation">]</span>

        input_file <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f'</span><span class="token interpolation"><span class="token punctuation">{</span>input_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.tok.bpe'</span></span>
        <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">:</span>
            input_file <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f'</span><span class="token interpolation"><span class="token punctuation">{</span>input_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">'</span></span>
            <span class="token keyword">if</span> <span class="token keyword">not</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">:</span>
                <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| {} not found"</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>input_file<span class="token punctuation">)</span><span class="token punctuation">)</span>
                <span class="token keyword">return</span>
        <span class="token keyword">if</span> args<span class="token punctuation">.</span>expert<span class="token punctuation">:</span>
            input_file <span class="token operator">=</span> input_file <span class="token operator">+</span> <span class="token string">'.e'</span>
        offsets <span class="token operator">=</span> Tokenizer<span class="token punctuation">.</span>find_offsets<span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span>
        pool <span class="token operator">=</span> <span class="token boolean">None</span>
        <span class="token keyword">if</span> num_workers <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span>
            pool <span class="token operator">=</span> Pool<span class="token punctuation">(</span>processes<span class="token operator">=</span>num_workers <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span>
            <span class="token keyword">for</span> worker_id <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
                fn_without_ext <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token interpolation"><span class="token punctuation">{</span>worker_id<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">"</span></span>
                pool<span class="token punctuation">.</span>apply_async<span class="token punctuation">(</span>binarize<span class="token punctuation">,</span> <span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> fn_without_ext<span class="token punctuation">,</span>
                                            offsets<span class="token punctuation">[</span>worker_id<span class="token punctuation">]</span><span class="token punctuation">,</span>
                                            offsets<span class="token punctuation">[</span>worker_id <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">,</span> callback<span class="token operator">=</span>merge_result<span class="token punctuation">)</span>
            pool<span class="token punctuation">.</span>close<span class="token punctuation">(</span><span class="token punctuation">)</span>

        ds <span class="token operator">=</span> indexed_dataset<span class="token punctuation">.</span>IndexedDatasetBuilder<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.bin"</span></span><span class="token punctuation">)</span>
        merge_result<span class="token punctuation">(</span>Tokenizer<span class="token punctuation">.</span>binarize<span class="token punctuation">(</span>input_file<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> <span class="token keyword">lambda</span> t<span class="token punctuation">:</span> ds<span class="token punctuation">.</span>add_item<span class="token punctuation">(</span>t<span class="token punctuation">)</span><span class="token punctuation">,</span>
                                        offset<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">,</span> end<span class="token operator">=</span>offsets<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
        <span class="token keyword">if</span> num_workers <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span>
            pool<span class="token punctuation">.</span>join<span class="token punctuation">(</span><span class="token punctuation">)</span>
            <span class="token keyword">for</span> worker_id <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> num_workers<span class="token punctuation">)</span><span class="token punctuation">:</span>
                temp_file_path <span class="token operator">=</span> <span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token interpolation"><span class="token punctuation">{</span>worker_id<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">"</span></span>
                ds<span class="token punctuation">.</span>merge_file_<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span>
                os<span class="token punctuation">.</span>remove<span class="token punctuation">(</span>indexed_dataset<span class="token punctuation">.</span>data_file_path<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span><span class="token punctuation">)</span>
                os<span class="token punctuation">.</span>remove<span class="token punctuation">(</span>indexed_dataset<span class="token punctuation">.</span>index_file_path<span class="token punctuation">(</span>temp_file_path<span class="token punctuation">)</span><span class="token punctuation">)</span>

        ds<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>output_prefix<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lng_pair<span class="token punctuation">}</span></span><span class="token string">.</span><span class="token interpolation"><span class="token punctuation">{</span>lang<span class="token punctuation">}</span></span><span class="token string">.idx"</span></span><span class="token punctuation">)</span>

        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'| [{}] {}: {} sents, {} tokens, {:.3}% replaced by {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>
            lang<span class="token punctuation">,</span> input_file<span class="token punctuation">,</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
            <span class="token number">100</span> <span class="token operator">*</span> <span class="token builtin">sum</span><span class="token punctuation">(</span>replaced<span class="token punctuation">.</span>values<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">/</span> n_seq_tok<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">.</span>unk_word<span class="token punctuation">)</span><span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">make_all</span><span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">)</span><span class="token punctuation">:</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>train_dir<span class="token punctuation">,</span> <span class="token string">'train'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'train'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span>args<span class="token punctuation">.</span>workers<span class="token punctuation">)</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>test_dir<span class="token punctuation">,</span> <span class="token string">'test'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'test'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
        make_binary_dataset<span class="token punctuation">(</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>valid_dir<span class="token punctuation">,</span> <span class="token string">'valid'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'valid'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
            lng_pair<span class="token punctuation">,</span> lang<span class="token punctuation">,</span> num_workers<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>

    lngs <span class="token operator">=</span> <span class="token builtin">set</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token keyword">for</span> lng_pair <span class="token keyword">in</span> lng_pairs<span class="token punctuation">:</span>
        src_and_tgt <span class="token operator">=</span> lng_pair<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'-'</span><span class="token punctuation">)</span>
        <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>src_and_tgt<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">2</span><span class="token punctuation">:</span>
            <span class="token keyword">continue</span>
        src<span class="token punctuation">,</span> tgt <span class="token operator">=</span> src_and_tgt
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"| building: "</span><span class="token punctuation">,</span> src<span class="token punctuation">,</span> tgt<span class="token punctuation">)</span>
        lngs<span class="token punctuation">.</span>add<span class="token punctuation">(</span>src<span class="token punctuation">)</span>
        lngs<span class="token punctuation">.</span>add<span class="token punctuation">(</span>tgt<span class="token punctuation">)</span>
        make_all<span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> src<span class="token punctuation">)</span>
        make_all<span class="token punctuation">(</span>lng_pair<span class="token punctuation">,</span> tgt<span class="token punctuation">)</span>

    lngs <span class="token operator">=</span> <span class="token builtin">list</span><span class="token punctuation">(</span>lngs<span class="token punctuation">)</span>
    lngs<span class="token punctuation">.</span>sort<span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token comment"># write the sorted language list; the with-block makes sure the file is closed</span>
    <span class="token keyword">with</span> <span class="token builtin">open</span><span class="token punctuation">(</span>os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>join<span class="token punctuation">(</span>destdir<span class="token punctuation">,</span> <span class="token string">'all_lngs.json'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token string">'w'</span><span class="token punctuation">)</span> <span class="token keyword">as</span> f<span class="token punctuation">:</span>
        json<span class="token punctuation">.</span>dump<span class="token punctuation">(</span>lngs<span class="token punctuation">,</span> f<span class="token punctuation">)</span>


<span class="token keyword">def</span> <span class="token function">binarize</span><span class="token punctuation">(</span>filename<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> fn_without_ext<span class="token punctuation">,</span> offset<span class="token punctuation">,</span> end<span class="token punctuation">)</span><span class="token punctuation">:</span>
    ds <span class="token operator">=</span> indexed_dataset<span class="token punctuation">.</span>IndexedDatasetBuilder<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>fn_without_ext<span class="token punctuation">}</span></span><span class="token string">.bin"</span></span><span class="token punctuation">)</span>

    <span class="token keyword">def</span> <span class="token function">consumer</span><span class="token punctuation">(</span>tensor<span class="token punctuation">)</span><span class="token punctuation">:</span>
        ds<span class="token punctuation">.</span>add_item<span class="token punctuation">(</span>tensor<span class="token punctuation">)</span>

    res <span class="token operator">=</span> Tokenizer<span class="token punctuation">.</span>binarize<span class="token punctuation">(</span>filename<span class="token punctuation">,</span> <span class="token builtin">dict</span><span class="token punctuation">,</span> consumer<span class="token punctuation">,</span> offset<span class="token operator">=</span>offset<span class="token punctuation">,</span> end<span class="token operator">=</span>end<span class="token punctuation">)</span>
    ds<span class="token punctuation">.</span>finalize<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"</span><span class="token interpolation"><span class="token punctuation">{</span>fn_without_ext<span class="token punctuation">}</span></span><span class="token string">.idx"</span></span><span class="token punctuation">)</span>
    <span class="token keyword">return</span> res


<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">'__main__'</span><span class="token punctuation">:</span>
    parser <span class="token operator">=</span> get_parser<span class="token punctuation">(</span><span class="token punctuation">)</span>
    args <span class="token operator">=</span> parser<span class="token punctuation">.</span>parse_args<span class="token punctuation">(</span><span class="token punctuation">)</span>
    main<span class="token punctuation">(</span>args<span class="token punctuation">)</span>
</code></pre> 
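  <p>The multi-worker branch of make_binary_dataset asks Tokenizer.find_offsets for one byte range per worker and then merges the per-worker results in order. The sketch below is not fairseq's implementation; it is a stand-alone illustration of the idea, and the helper names find_line_offsets and read_chunk are hypothetical. It only shows how a text file can be cut into ranges that begin on line boundaries, so that no sentence is split between two workers.</p>

```python
import os


def find_line_offsets(path, num_chunks):
    """Return num_chunks + 1 byte offsets; every offset sits on a line boundary."""
    size = os.path.getsize(path)
    chunk_size = size // num_chunks
    offsets = [0] * (num_chunks + 1)
    offsets[num_chunks] = size
    with open(path, "rb") as f:
        for i in range(1, num_chunks):
            f.seek(chunk_size * i)
            f.readline()  # finish the line we landed in; it belongs to the previous chunk
            offsets[i] = f.tell()
    return offsets


def read_chunk(path, start, end):
    """Return the lines whose first byte lies in [start, end)."""
    lines = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.decode("utf-8").rstrip("\n"))
    return lines
```

  <p>Reading the ranges <code>offsets[i]</code> to <code>offsets[i + 1]</code> in order and concatenating the results gives back every line of the file exactly once, which is why the temporary per-worker files can simply be merged one after another.</p>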
  <p>Taken together, the two scripts above (prepare-iwslt14.sh and preprocess_multilingual.py) perform roughly the following preprocessing steps:</p> 
  <ul> 
   <li>Tokenize: <code>perl $TOKENIZER -threads 8 -l $l &gt; $tmp/$tok</code></li> 
   <li>Clean: <code>perl $CLEAN -ratio 1.5 $tmp/train.tags.$lang.tok $src $tgt $tmp/train.tags.$lang.clean 1 175</code></li> 
   <li>Lowercase: <code>perl $LC &lt; $tmp/train.tags.$lang.clean.$l &gt; $tmp/train.tags.$lang.$l</code></li> 
   <li>Apply the same tokenization and lowercasing to the test data</li> 
   <li>Create the training, validation, and test sets</li> 
   <li>Learn BPE on all of the training data: <code>python $BPEROOT/learn_bpe.py -s $BPE_TOKENS &lt; $TRAIN &gt; $BPE_CODE</code></li> 
   <li>Apply BPE to all files: <code>python $BPEROOT/apply_bpe.py -c $BPE_CODE &lt; $tmp/$f &gt; $prep/$f</code></li> 
   <li>Build the dictionaries and binarize: <code>python iwslt14/preprocess_universal.py --pref=iwslt14/ --joined-dictionary</code></li> 
  </ul>
  </ul> 
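  <p>To make the cleaning step more concrete, here is a minimal Python sketch of the kind of filter that the <code>-ratio 1.5 ... 1 175</code> arguments describe: a sentence pair is kept only if both sides have between 1 and 175 tokens and neither side is more than 1.5 times as long as the other. It is a simplified approximation written for illustration, not a port of clean-corpus-n.perl, and the function names keep_pair and filter_corpus are hypothetical.</p>

```python
def keep_pair(src_line, tgt_line, min_len=1, max_len=175, max_ratio=1.5):
    """Decide whether a whitespace-tokenized sentence pair passes the length filter."""
    src_n = len(src_line.split())
    tgt_n = len(tgt_line.split())
    # both sides must fall inside [min_len, max_len]
    if not (min_len <= src_n <= max_len and min_len <= tgt_n <= max_len):
        return False
    # neither side may be more than max_ratio times longer than the other
    return src_n <= max_ratio * tgt_n and tgt_n <= max_ratio * src_n


def filter_corpus(pairs, **kwargs):
    """Keep only the (source, target) pairs that pass keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **kwargs)]
```

  <p>For example, <code>keep_pair("a b c", "x y")</code> passes (3 tokens against 2, exactly at the 1.5 limit), while <code>keep_pair("a b c d", "x y")</code> is rejected.</p>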
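  <p>One more point deserves special mention: preprocess_multilingual.py depends on the fairseq library, but the latest version that a plain <code>pip install fairseq</code> puts into the current environment cannot run this code. There are two ways around this: 1. <code>pip install fairseq==0.6.1</code> (untested); 2. <code>git clone https://github.com/RayeRen/multilingual-kd-pytorch ; cp -r multilingual-kd-pytorch/fairseq .</code>, which copies the fairseq package bundled with that repository into the current directory.</p> 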
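  <p>With the second option it can be worth confirming which fairseq Python will actually import. When a script is started as <code>python script.py</code>, the script's own directory comes first on the module search path, so a fairseq folder copied next to preprocess_multilingual.py should take precedence over a pip-installed one. The small helper below (module_origin is a hypothetical name) only reports where a module would be loaded from, without importing it; it is a quick sanity check under that assumption, not part of the original scripts.</p>

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Expect a path inside the current directory if the copied package is being picked up.
    print(module_origin("fairseq"))
```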
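  <p>This finally produces the data-bin folder, which is used for model training.<br> <s>(For a rough walkthrough of the code involved in the preprocessing steps above, see "[Machine Translation] Common Preprocessing Code Explained".)</s></p> 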
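  <p>Going by the file-name patterns in preprocess_multilingual.py (<code>{split}.{lng_pair}.{lang}.bin</code> and <code>.idx</code>, <code>dict.{lang}.txt</code> in the joined-dictionary case, plus <code>all_lngs.json</code>), the contents of the output folder can be predicted and checked. The sketch below assumes language pairs are written like <code>de-en</code> and that the output directory is data-bin; expected_outputs and missing_outputs are hypothetical helper names.</p>

```python
import os


def expected_outputs(lng_pairs, splits=("train", "valid", "test")):
    """List the file names the binarization step is expected to write."""
    files = {"all_lngs.json"}
    for pair in lng_pairs:
        src, tgt = pair.split("-")
        for lang in (src, tgt):
            files.add(f"dict.{lang}.txt")  # written once per language (joined dictionary)
            for split in splits:
                for ext in ("bin", "idx"):
                    files.add(f"{split}.{pair}.{lang}.{ext}")
    return sorted(files)


def missing_outputs(destdir, lng_pairs):
    """Return the expected files that are not present in destdir."""
    return [name for name in expected_outputs(lng_pairs)
            if not os.path.exists(os.path.join(destdir, name))]
```

  <p>Calling <code>missing_outputs("data-bin", ["ar-en", "de-en"])</code> after preprocessing should return an empty list if every split was binarized for those pairs.</p>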
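  <h1>Model Training</h1> 
  <p>See LaSS (linked under References).</p> 
  <h1>Addendum</h1> 
  <h2>Addendum 1: Key error while accessing batch_iterator.first_batch</h2> 
  <p>If you run into this error, the preprocess_multilingual.py shown above does not match the fairseq version you are using; you can try the approach described in "[Machine Translation] multilingual fairseq-preprocess".</p> 
  <h1>References</h1> 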
  <p>https://github.com/NLP-Playground/LaSS<br> https://github.com/RayeRen/multilingual-kd-pytorch/blob/master/data/iwslt/raw/prepare-iwslt14.sh<br> https://blog.csdn.net/jokerxsy/article/details/125054739</p> 
 </div> 
</div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="container">
        <div id="paradigm-article-related">
            <div class="recommend-post mb30">
                <ul class="widget-links">
                    <li><a href="/article/1899416845945991168.htm"
                           title="清华DeepSeek以手札为剑,破AI迷津雾霭,开启荣耀进阶征途" target="_blank">清华DeepSeek以手札为剑,破AI迷津雾霭,开启荣耀进阶征途</a>
                        <span class="text-muted">2501_91080610</span>
<a class="tag" taget="_blank" href="/search/pdf/1.htm">pdf</a>
                        <div>清华DeepSeek:以手札为剑,破AI迷津雾霭,开启荣耀进阶征途在当下这个科技浪潮奔涌不息的时代,人工智能领域成为了无数科研人员竞逐的“战场”。在这片充满无限可能却又迷雾重重的天地中,清华DeepSeek宛如一位英勇无畏的剑客,紧握“手札”这把利剑,奋力劈开迷津雾霭,大步踏上荣耀进阶的征途。溯源:手札中的智慧传承与沉淀清华DeepSeek背后,是一群怀揣着对AI炽热梦想的清华学子与科研精英。手札</div>
                    </li>
                    <li><a href="/article/1899411928439123968.htm"
                           title="模型上下文协议(MCP):构建 AI 与数据交互的新范式" target="_blank">模型上下文协议(MCP):构建 AI 与数据交互的新范式</a>
                        <span class="text-muted">xxgshxs</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/chatgpt/1.htm">chatgpt</a><a class="tag" taget="_blank" href="/search/prompt/1.htm">prompt</a><a class="tag" taget="_blank" href="/search/%E6%96%87%E5%BF%83%E4%B8%80%E8%A8%80/1.htm">文心一言</a><a class="tag" taget="_blank" href="/search/llama/1.htm">llama</a><a class="tag" taget="_blank" href="/search/copilot/1.htm">copilot</a>
                        <div>引言在人工智能领域,大型语言模型(LLMs)的应用正从通用问答向复杂任务执行演进,但数据孤岛、工具集成碎片化及隐私安全等问题制约了其潜力。模型上下文协议(ModelContextProtocol,MCP)作为Anthropic提出的开放标准,旨在通过标准化接口连接AI应用与异构数据源及工具,重塑AI开发范式。本文从技术架构、核心功能、应用场景等维度解析MCP的设计逻辑与实践价值。一、核心概念与设计</div>
                    </li>
                    <li><a href="/article/1899409406920028160.htm"
                           title="量子计算如何颠覆能源优化领域:从理论到实践" target="_blank">量子计算如何颠覆能源优化领域:从理论到实践</a>
                        <span class="text-muted">Echo_Wish</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%89%8D%E6%B2%BF%E6%8A%80%E6%9C%AF/1.htm">前沿技术</a><a class="tag" taget="_blank" href="/search/%E9%87%8F%E5%AD%90%E8%AE%A1%E7%AE%97/1.htm">量子计算</a><a class="tag" taget="_blank" href="/search/%E8%83%BD%E6%BA%90/1.htm">能源</a>
                        <div>量子计算如何颠覆能源优化领域:从理论到实践大家好,我是Echo_Wish,一个热爱探索前沿技术的人工智能与Python领域的技术分享者。今天,我们将深入探讨一个激动人心的话题——量子计算在能源优化中的应用。这不仅是科技领域的全新趋势,也可能为全人类的能源利用效率带来革命性突破。从理论模型到实际应用,量子计算已经在一些能源相关领域崭露头角,例如电网优化、可再生能源分配和物流节能规划。以下,让我们一步</div>
                    </li>
                    <li><a href="/article/1899407893111828480.htm"
                           title="AI人工智能 Agent:电力系统中智能体的应用" target="_blank">AI人工智能 Agent:电力系统中智能体的应用</a>
                        <span class="text-muted">AI天才研究院</span>
<a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E6%A8%A1%E5%9E%8B%E4%BC%81%E4%B8%9A%E7%BA%A7%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91%E5%AE%9E%E6%88%98/1.htm">AI大模型企业级应用开发实战</a><a class="tag" taget="_blank" href="/search/DeepSeek/1.htm">DeepSeek</a><a class="tag" taget="_blank" href="/search/R1/1.htm">R1</a><a class="tag" taget="_blank" href="/search/%26amp%3B/1.htm">&</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AEAI%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">大数据AI人工智能大模型</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E7%A7%91%E5%AD%A6/1.htm">计算科学</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E8%AE%A1%E7%AE%97/1.htm">神经计算</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">大型语言模型</a><a class="tag" taget="_blank" href="/search/AI/1.htm">AI</a><a class="tag" taget="_blank" href="/search/AGI/1.htm">AGI</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/Agent/1.htm">Agent</a><a class="tag" taget="_blank" href="/search/RPA/1.htm">RPA</a>
                        <div>AI人工智能Agent:电力系统中智能体的应用作者:禅与计算机程序设计艺术1.背景介绍1.1电力系统的挑战与机遇电力系统是现代社会运行的基石,其安全、可靠、高效运行对经济发展和人民生活至关重要。近年来,随着可再生能源的快速发展、电力需求的不断增长以及电力市场化的推进,电力系统面临着前所未有的挑战,同时也迎来了新的发展机遇。挑战:可再生能源的波动性和间歇性:太阳能和风能等可再生能源的输出功率受天气条</div>
                    </li>
                    <li><a href="/article/1899388724085583872.htm"
                           title="Python从0到100(七十六):计算机视觉-直方图和自适应直方图均衡化" target="_blank">Python从0到100(七十六):计算机视觉-直方图和自适应直方图均衡化</a>
                        <span class="text-muted">是Dream呀</span>
<a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89/1.htm">计算机视觉</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>前言:零基础学Python:Python从0到100最新最全教程。想做这件事情很久了,这次我更新了自己所写过的所有博客,汇集成了Python从0到100,共一百节课,帮助大家一个月时间里从零基础到学习Python基础语法、Python爬虫、Web开发、计算机视觉、机器学习、神经网络以及人工智能相关知识,成为学习学习和学业的先行者!欢迎大家订阅专栏:零基础学Python:Python从0到100最新</div>
                    </li>
                    <li><a href="/article/1899377369186103296.htm"
                           title="autoMate - AI实现电脑任务自动化的本地工具" target="_blank">autoMate - AI实现电脑任务自动化的本地工具</a>
                        <span class="text-muted">小众AI</span>
<a class="tag" taget="_blank" href="/search/AI%E5%BC%80%E6%BA%90/1.htm">AI开源</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E8%87%AA%E5%8A%A8%E5%8C%96/1.htm">自动化</a><a class="tag" taget="_blank" href="/search/%E8%BF%90%E7%BB%B4/1.htm">运维</a>
                        <div>GitHub:https://github.com/yuruotong1/autoMate更多AI开源软件:发现分享好用的AI工具、AI开源软件、AI模型、AI变现-小众AIautoMate是一款由开源开发的本地自动化工具,以AI+RPA(人工智能+机器人流程自动化)为核心特色。它将大型语言模型的智能理解与RPA的流程执行能力结合,用户只需用自然语言描述任务,如“整理桌面文件”或“生成周报”,即可</div>
                    </li>
                    <li><a href="/article/1899367023465525248.htm"
                           title="从零开始构建大模型(LLM)应用" target="_blank">从零开始构建大模型(LLM)应用</a>
                        <span class="text-muted">和老莫一起学AI</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/ai/1.htm">ai</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">大模型</a><a class="tag" taget="_blank" href="/search/%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">语言模型</a><a class="tag" taget="_blank" href="/search/llm/1.htm">llm</a><a class="tag" taget="_blank" href="/search/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86/1.htm">自然语言处理</a><a class="tag" taget="_blank" href="/search/%E5%AD%A6%E4%B9%A0/1.htm">学习</a>
                        <div>大模型(LLM)已经成为当前人工智能的重要部分。但是,在这个领域还没有固定的操作标准,开发者们往往没有明确的指导,需要不断尝试和摸索。在过去两年中,我帮助了许多公司利用LLM来开发了很多创新的应用产品。基于这些经验,我形成了一套实用的方法,并准备在这篇文章中与大家分享。这套方法将提供一些步骤,帮助需要的小伙伴在LLM应用开发的复杂环境中找到方向。从最初的构思到PoC、评估再到产品化,了解如何将创意</div>
                    </li>
                    <li><a href="/article/1899365511448293376.htm"
                           title="【LLM】从零开始实现 LLaMA3" target="_blank">【LLM】从零开始实现 LLaMA3</a>
                        <span class="text-muted">FOUR_A</span>
<a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">大模型</a><a class="tag" taget="_blank" href="/search/llama/1.htm">llama</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a>
                        <div>分词器在这里,我们不会实现一个BPE分词器(但AndrejKarpathy有一个非常简洁的实现)。BPE(BytePairEncoding,字节对编码)是一种数据压缩算法,也被用于自然语言处理中的分词方法。它通过逐步将常见的字符或子词组合成更长的词元(tokens),从而有效地表示文本中的词汇。在自然语言处理中的BPE分词器的工作原理如下:初始化:首先,将所有词汇表中的单词分解为单个字符或符号。例</div>
                    </li>
                    <li><a href="/article/1899364248610467840.htm"
                           title="机器学习之线性代数" target="_blank">机器学习之线性代数</a>
                        <span class="text-muted">珠峰日记</span>
<a class="tag" taget="_blank" href="/search/AI%E7%90%86%E8%AE%BA%E4%B8%8E%E5%AE%9E%E8%B7%B5/1.htm">AI理论与实践</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%E7%BA%BF%E6%80%A7%E4%BB%A3%E6%95%B0/1.htm">线性代数</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>文章目录一、引言:线性代数为何是AI的基石二、向量:AI世界的基本构建块(一)向量的定义(二)向量基础操作(三)重要概念三、矩阵:AI数据的强大容器(一)矩阵的定义(二)矩阵运算(三)矩阵特性(四)矩阵分解(五)Python示例(使用NumPy库)四、线性代数在AI中的应用(一)数据表示(二)降维:PCA(三)线性回归(四)计算机视觉(五)自然语言处理一、引言:线性代数为何是AI的基石在人工智能领</div>
                    </li>
                    <li><a href="/article/1899356558266003456.htm"
                           title="基于transformer实现机器翻译(日译中)" target="_blank">基于transformer实现机器翻译(日译中)</a>
                        <span class="text-muted">小白_laughter</span>
<a class="tag" taget="_blank" href="/search/%E8%AF%BE%E7%A8%8B%E5%AD%A6%E4%B9%A0/1.htm">课程学习</a><a class="tag" taget="_blank" href="/search/transformer/1.htm">transformer</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E7%BF%BB%E8%AF%91/1.htm">机器翻译</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a>
                        <div>文章目录一、引言二、使用编码器—解码器和注意力机制来实现机器翻译模型2.0含注意力机制的编码器—解码器2.1读取和预处理数据2.2含注意力机制的编码器—解码器2.3训练模型2.4预测不定长的序列2.5评价翻译结果三、使用Transformer架构和PyTorch深度学习库来实现的日中机器翻译模型3.1、导入必要的库3.2、数据集准备3.3、准备分词器3.4、构建TorchText词汇表对象,并将句</div>
                    </li>
                    <li><a href="/article/1899327434306678784.htm"
                           title="AI大模型零基础金融人如何一周自学大模型,从零基础到入门,看这篇就够了!" target="_blank">AI大模型零基础金融人如何一周自学大模型,从零基础到入门,看这篇就够了!</a>
                        <span class="text-muted">冻感糕人~</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E9%87%91%E8%9E%8D/1.htm">金融</a><a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">AI大模型</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%8A%80%E6%9C%AF/1.htm">大模型技术</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%AD%A6%E4%B9%A0%E8%B7%AF%E7%BA%BF/1.htm">大模型学习路线</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9F%BA%E7%A1%80/1.htm">大模型基础</a>
                        <div>前几天参加了字节跳动在上海举办的火山引擎Force原动力大会,OpenAI也连续开了12天发布会,最近堪称科技界的春晚了。如果说2022年ChatGPT横空出世把人工智能的发展带上了一个新的台阶,那么2024年末,大模型对工作、生活的全面“侵入”让我们越来越接近库兹韦尔所描述的那个奇点时刻。作为金融民工,我们想通过这篇文章讲讲从用户的角度如何一周快速掌握大模型,以及为什么我建议每一个金融从业人员(</div>
                    </li>
                    <li><a href="/article/1899298169255161856.htm"
                           title="成功案例丨开发时间从1小时缩短到3分钟:如何利用历史数据训练AI模型,预测设计性能?" target="_blank">成功案例丨开发时间从1小时缩短到3分钟:如何利用历史数据训练AI模型,预测设计性能?</a>
                        <span class="text-muted">Altair澳汰尔</span>
<a class="tag" taget="_blank" href="/search/PhysicsAI/1.htm">PhysicsAI</a><a class="tag" taget="_blank" href="/search/%E4%BB%BF%E7%9C%9F/1.htm">仿真</a><a class="tag" taget="_blank" href="/search/AI/1.htm">AI</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/HyperWorks/1.htm">HyperWorks</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/1.htm">数据分析</a>
                        <div>案例简介PhysicsAI™助力HEROMOTOCORP实现设计效率提升99%印度领先的跨国摩托车和踏板车制造商HeroMotoCorpLtd.(以下简称Hero)致力于通过将人工智能(AI)和机器学习技术融入有限元分析(FEA)流程,以加速产品开发周期。在其首个AI驱动项目——摩托车把手设计优化中,Hero采用了PhysicsAI™几何深度学习解决方案,利用历史数据训练AI模型并预测设计性能。A</div>
                    </li>
                    <li><a href="/article/1899289850423603200.htm"
                           title="数据分析与AI丨AI Fabric:数据和人工智能架构的未来" target="_blank">数据分析与AI丨AI Fabric:数据和人工智能架构的未来</a>
                        <span class="text-muted">Altair澳汰尔</span>
<a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/1.htm">数据分析</a><a class="tag" taget="_blank" href="/search/ai/1.htm">ai</a><a class="tag" taget="_blank" href="/search/RapidMiner/1.htm">RapidMiner</a><a class="tag" taget="_blank" href="/search/%E7%9F%A5%E8%AF%86%E5%9B%BE%E8%B0%B1/1.htm">知识图谱</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>AIFabric架构是模块化、可扩展且面向未来的,是现代商业环境中企业实现卓越的关键。在当今商业环境中,数据分析和人工智能领域发展可谓日新月异。几乎每天都有新兴技术诞生,新的应用场景不断涌现,前沿探索持续拓展。可遗憾的是,众多企业在利用数据和人工智能方面,脚步总是滞后。这是每个行业进行创新和获得竞争优势的冲刺阶段,但正如大多数企业时常感受到的那样,大规模实施下一代数据和AI工具说起来容易做起来难。</div>
                    </li>
                    <li><a href="/article/1899288339375255552.htm"
                           title="Manus演示案例: 英伟达财务估值建模 解锁投资洞察的深度剖析" target="_blank">Manus演示案例: 英伟达财务估值建模 解锁投资洞察的深度剖析</a>
                        <span class="text-muted">ylfhpy</span>
<a class="tag" taget="_blank" href="/search/Manus/1.htm">Manus</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E7%BF%BB%E8%AF%91/1.htm">机器翻译</a><a class="tag" taget="_blank" href="/search/Manus/1.htm">Manus</a>
                        <div>在当今瞬息万变的金融投资领域,精准剖析企业价值是投资者决胜市场的关键。英伟达(NVIDIA),作为科技行业的耀眼明星,其在人工智能和半导体领域的卓越表现备受瞩目。Manus凭借专业的财务估值建模能力,深入挖掘英伟达的潜在价值,为投资者提供了一份极具价值的分析报告。Manus在接到为英伟达进行详细财务估值建模的任务后,迅速且有条不紊地开展工作。数据收集是建模的基石,其重要性不言而喻。在收集英伟达公司</div>
                    </li>
                    <li><a href="/article/1899279014544076800.htm"
                           title="Python学习指南:系统化路径 + 避坑建议" target="_blank">Python学习指南:系统化路径 + 避坑建议</a>
                        <span class="text-muted">程之编</span>
<a class="tag" taget="_blank" href="/search/Python%E5%85%A8%E6%A0%88%E9%80%9A%E5%85%B3%E7%A7%98%E7%B1%8D/1.htm">Python全栈通关秘籍</a><a class="tag" taget="_blank" href="/search/%E9%9D%92%E5%B0%91%E5%B9%B4%E7%BC%96%E7%A8%8B/1.htm">青少年编程</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/1.htm">机器学习</a>
                        <div>新手小白学习编程就像搭积木——需要从基础开始,逐步构建知识体系。以下是为你量身定制的Python学习路径,帮你告别杂乱,高效入门!一、学习前的关键认知明确目标:想用Python做什么?数据分析(如Excel自动化、可视化)Web开发(如搭建网站)人工智能(如机器学习)自动化办公(如处理文件、邮件)目标不同,后续学习侧重点不同(但基础通用)。避免误区:❌只看教程不写代码✅边学边动手,哪怕抄代码也要运</div>
                    </li>
                    <li><a href="/article/1899276866909433856.htm"
                           title="大语言模型原理基础与前沿 双层路由多模态融合、多任务学习和模块化架构" target="_blank">大语言模型原理基础与前沿 双层路由多模态融合、多任务学习和模块化架构</a>
                        <span class="text-muted">AI智能涌现深度研究</span>
<a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%92%8C%E7%9F%A5%E8%AF%86%E5%9B%BE%E8%B0%B1%E8%9E%8D%E5%90%88/1.htm">AI大语言模型和知识图谱融合</a><a class="tag" taget="_blank" href="/search/Python%E5%85%A5%E9%97%A8%E5%AE%9E%E6%88%98/1.htm">Python入门实战</a><a class="tag" taget="_blank" href="/search/DeepSeek/1.htm">DeepSeek</a><a class="tag" taget="_blank" href="/search/R1/1.htm">R1</a><a class="tag" taget="_blank" href="/search/%26amp%3B/1.htm">&</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AEAI%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">大数据AI人工智能</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E7%A7%91%E5%AD%A6/1.htm">计算科学</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E8%AE%A1%E7%AE%97/1.htm">神经计算</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">大型语言模型</a><a class="tag" taget="_blank" href="/search/AI/1.htm">AI</a><a class="tag" taget="_blank" href="/search/AGI/1.htm">AGI</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/Agent/1.htm">Agent</a><a class="tag" taget="_blank" href="/search/RPA/1.htm">RPA</a>
                        <div>大语言模型原理基础与前沿:双层路由多模态融合、多任务学习和模块化架构关键词:大语言模型、双层路由、多模态融合、多任务学习、模块化架构、神经网络、自然语言处理1.背景介绍大语言模型(LargeLanguageModels,LLMs)已经成为人工智能和自然语言处理领域的重要研究方向。随着GPT-3、BERT等模型的出现,大语言模型在各种任务中展现出了惊人的性能。然而,随着模型规模的不断扩大和应用场景的</div>
                    </li>
                    <li><a href="/article/1899275984855691264.htm"
                           title="新的一年,新的感受和成长" target="_blank">新的一年,新的感受和成长</a>
                        <span class="text-muted">是小天才哦</span>
<a class="tag" taget="_blank" href="/search/%23/1.htm">#</a><a class="tag" taget="_blank" href="/search/%E9%AB%98%E8%81%8C%E7%94%9F%E9%97%B2%E8%B0%88/1.htm">高职生闲谈</a><a class="tag" taget="_blank" href="/search/%E6%9C%8D%E5%8A%A1%E5%99%A8/1.htm">服务器</a>
                        <div>本人现在是工作快2年的打工人,我是前年7月份毕业的大专生。其实我在大学刚开始的时候因为体验过社会的毒打,所以发誓一定要好好学习,而我也的确好好学习了,在学校2年时间里,大部分时间都是在图书馆里面看书,主要为啥天天在图书馆很大原因是本专业的课程自己不是非常喜欢(我是人工智能专业,人工智能专业大专学历出来基本也是打框的无聊活)所以我就自己学习了系统运维方向,这个过程也考取了RHCE认证,也是因为这个认</div>
                    </li>
                    <li><a href="/article/1899270812926537728.htm"
                           title="Python机器学习实战:构建序列到序列(Seq2Seq)模型处理翻译任务" target="_blank">Python机器学习实战:构建序列到序列(Seq2Seq)模型处理翻译任务</a>
                        <span class="text-muted">AGI大模型与大数据研究院</span>
<a class="tag" taget="_blank" href="/search/%E7%A8%8B%E5%BA%8F%E5%91%98%E6%8F%90%E5%8D%87%E8%87%AA%E6%88%91/1.htm">程序员提升自我</a><a class="tag" taget="_blank" href="/search/%E7%A1%85%E5%9F%BA%E8%AE%A1%E7%AE%97/1.htm">硅基计算</a><a class="tag" taget="_blank" href="/search/%E7%A2%B3%E5%9F%BA%E8%AE%A1%E7%AE%97/1.htm">碳基计算</a><a class="tag" taget="_blank" href="/search/%E8%AE%A4%E7%9F%A5%E8%AE%A1%E7%AE%97/1.htm">认知计算</a><a class="tag" taget="_blank" href="/search/%E7%94%9F%E7%89%A9%E8%AE%A1%E7%AE%97/1.htm">生物计算</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/AIGC/1.htm">AIGC</a><a class="tag" taget="_blank" href="/search/AGI/1.htm">AGI</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/Agent/1.htm">Agent</a><a class="tag" taget="_blank" href="/search/%E7%A8%8B%E5%BA%8F%E5%91%98%E5%AE%9E%E7%8E%B0%E8%B4%A2%E5%AF%8C%E8%87%AA%E7%94%B1/1.htm">程序员实现财富自由</a>
                        <div>Python机器学习实战:构建序列到序列(Seq2Seq)模型处理翻译任务1.背景介绍1.1问题的由来翻译是跨语言沟通的重要桥梁,随着全球化进程的加速,翻译需求日益增长。传统的机器翻译方法主要依赖于规则和统计方法,如基于短语的翻译、基于统计的机器翻译等。然而,这些方法难以处理复杂的语言现象,翻译质量参差不齐。近年来,随着深度学习技术的快速发展,基于神经网络序列到序列(Sequence-to-Seq</div>
                    </li>
                    <li><a href="/article/1899241432019955712.htm"
                           title="通义万相2.1:AI视频生成迎来“质变”,运镜、文字、物理规律全面突破" target="_blank">通义万相2.1:AI视频生成迎来“质变”,运镜、文字、物理规律全面突破</a>
                        <span class="text-muted">that's boy</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E9%80%9A%E4%B9%89%E4%B8%87%E8%B1%A12.1/1.htm">通义万象2.1</a><a class="tag" taget="_blank" href="/search/chatgpt/1.htm">chatgpt</a><a class="tag" taget="_blank" href="/search/openai/1.htm">openai</a><a class="tag" taget="_blank" href="/search/qwen/1.htm">qwen</a><a class="tag" taget="_blank" href="/search/AI%E4%BD%9C%E7%94%BB/1.htm">AI作画</a><a class="tag" taget="_blank" href="/search/AI%E7%BC%96%E7%A8%8B/1.htm">AI编程</a>
                        <div>AI视频生成,从“能看”到“惊艳”的跨越在人工智能的浪潮中,AI视频生成无疑是最受瞩目的领域之一。从最初的简单动画到如今的逼真模拟,AI视频生成技术正在快速发展,不断刷新人们的认知。近日,阿里云旗下通义万相视频生成模型宣布了2.1版本的重磅升级,不仅在性能上实现了全面提升,更在运镜、文字生成、物理规律模拟等方面取得了突破性进展,让AI视频生成真正进入了“质变”的新阶段。通义万相2.1的出现,不仅是</div>
                    </li>
                    <li><a href="/article/1899237527466864640.htm"
                           title="C++开源库大全" target="_blank">C++开源库大全</a>
                        <span class="text-muted">大王算法</span>
<a class="tag" taget="_blank" href="/search/C%2FC%2B%2B%E5%BC%80%E5%8F%91%E5%AE%9E%E6%88%98365/1.htm">C/C++开发实战365</a><a class="tag" taget="_blank" href="/search/C%2B%2B%E5%85%A5%E9%97%A8%E5%8F%8A%E9%A1%B9%E7%9B%AE%E5%AE%9E%E6%88%98%E5%AE%9D%E5%85%B8/1.htm">C++入门及项目实战宝典</a><a class="tag" taget="_blank" href="/search/c%2B%2B/1.htm">c++</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E6%BA%90/1.htm">开源</a>
                        <div>程序员要站在巨人的肩膀上,C++拥有丰富的开源库,这里包括:标准库、Web应用框架、人工智能、数据库、图片处理、机器学习、日志、代码分析等。标准库C++StandardLibrary:是一系列类和函数的集合,使用核心语言编写,也是C++ISO自身标准的一部分。</div>
                    </li>
                    <li><a href="/article/1899233742585655296.htm"
                           title="LangChain大模型应用开发指南-大模型Memory不止于对话" target="_blank">LangChain大模型应用开发指南-大模型Memory不止于对话</a>
                        <span class="text-muted">喝不喝奶茶丫</span>
<a class="tag" taget="_blank" href="/search/langchain/1.htm">langchain</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">大模型</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%BA%94%E7%94%A8/1.htm">大模型应用</a><a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E6%A8%A1%E5%9E%8B/1.htm">AI大模型</a><a class="tag" taget="_blank" href="/search/Memory/1.htm">Memory</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">大语言模型</a>
                        <div>上节课,我我为您介绍了LangChain中最基本的链式结构,以及基于这个链式结构演化出来的ReAct对话链模型。今天我将由简入繁,为大家拆解LangChain内置的多种记忆机制。本教程将详细介绍这些记忆组件的工作原理、特性以及使用方法。【一一AGI大模型学习所有资源获取处一一】①人工智能/大模型学习路线②AI产品经理资源合集③200本大模型PDF书籍④超详细海量大模型实战项目⑤LLM大模型系统学习</div>
                    </li>
                    <li><a href="/article/1899221892301123584.htm"
                           title="llama.cpp框架下GGUF格式及量化参数全解析" target="_blank">llama.cpp框架下GGUF格式及量化参数全解析</a>
                        <span class="text-muted">Black_Rock_br</span>
<a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a>
                        <div>前言:在人工智能领域,语言模型的高效部署和推理一直是研究热点。随着模型规模的不断扩大,如何在有限的硬件资源上实现快速、高效的推理,成为了一个关键问题。`llama.cpp`框架以其出色的性能和灵活性,为这一问题提供了有效的解决方案。其中,GGUF格式和模型量化参数是实现高效推理的重要技术手段。本文将对`llama.cpp`框架下的GGUF格式及量化参数进行详细解析,帮助读者更好地理解和应用这些技术</div>
                    </li>
                    <li><a href="/article/1899219999638220800.htm"
                           title="AI 驱动的软件测试革命:从自动化到智能化的进阶之路" target="_blank">AI 驱动的软件测试革命:从自动化到智能化的进阶之路</a>
                        <span class="text-muted">綦枫Maple</span>
<a class="tag" taget="_blank" href="/search/AI%2B%E8%BD%AF%E4%BB%B6%E6%B5%8B%E8%AF%95/1.htm">AI+软件测试</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E8%87%AA%E5%8A%A8%E5%8C%96/1.htm">自动化</a><a class="tag" taget="_blank" href="/search/%E8%BF%90%E7%BB%B4/1.htm">运维</a>
                        <div>引言:软件测试的智能化转型浪潮在数字化转型加速的今天,软件产品的迭代速度与复杂度呈指数级增长。传统软件测试依赖人工编写用例、执行测试的模式,已难以应对快速交付与高质量要求的双重挑战。人工智能技术的突破为测试领域注入了新动能,通过机器学习、深度学习、自然语言处理等技术,测试流程正从“被动验证”向“主动预防”演进。本文将深入探讨AI与软件测试的融合路径,结合技术原理、工具实践与行业趋势,为读者呈现一幅</div>
                    </li>
                    <li><a href="/article/1899199304216670208.htm"
                           title="向量数据库简介" target="_blank">向量数据库简介</a>
                        <span class="text-muted">openwin_top</span>
<a class="tag" taget="_blank" href="/search/python%E7%BC%96%E7%A8%8B%E7%A4%BA%E4%BE%8B%E7%B3%BB%E5%88%97/1.htm">python编程示例系列</a><a class="tag" taget="_blank" href="/search/python%E7%BC%96%E7%A8%8B%E7%A4%BA%E4%BE%8B%E7%B3%BB%E5%88%97%E4%BA%8C/1.htm">python编程示例系列二</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">数据库</a>
                        <div>向量数据库(VectorDatabase)是一种专门用于存储和查询向量数据的数据库系统。向量数据库通常使用高效的向量索引技术,支持基于向量相似度的查询和检索,可以应用于图像搜索、自然语言处理、推荐系统、机器学习等领域。与传统的关系型数据库不同,向量数据库通常使用基于向量的数据模型,将向量作为数据的核心表示形式。向量数据库可以存储和处理大量的向量数据,支持高效的向量相似度计算和查询。常见的向量索引技</div>
                    </li>
                    <li><a href="/article/1899191483634872320.htm"
                           title="在LangChain中运行Replicate模型的实用指南" target="_blank">在LangChain中运行Replicate模型的实用指南</a>
                        <span class="text-muted">fgayif</span>
<a class="tag" taget="_blank" href="/search/langchain/1.htm">langchain</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>##技术背景介绍Replicate是一个平台,可以轻松调用各种预训练的AI模型。与传统的模型托管和调用相比,Replicate提供了简单的API接口,使开发者能够快速集成和使用强大的AI模型。本文将重点介绍如何在LangChain项目中集成和调用Replicate模型。##核心原理解析在集成Replicate模型之前,需要进行一些基础设置和安装工作。LangChain是一个用于自然语言处理的库,它</div>
                    </li>
                    <li><a href="/article/1899190474363695104.htm"
                           title="使用Activeloop Deep Lake构建深度学习数据仓库与向量存储" target="_blank">使用Activeloop Deep Lake构建深度学习数据仓库与向量存储</a>
                        <span class="text-muted">dgay_hua</span>
<a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>技术背景介绍随着深度学习技术的发展,数据的存储与管理成为了一个重要的问题。尤其是对于需要处理大量数据的应用,例如自然语言处理和图像识别,传统的数据存储方式已经无法满足需求。ActiveloopDeepLake是专为深度学习设计的数据仓库,可以作为向量存储使用,支持多模态数据的存储和处理,并且可以直接用于细调大型语言模型(LLMs)。此外,它还提供自动版本控制,无需依赖其他服务,兼容主要云服务提供商</div>
                    </li>
                    <li><a href="/article/1899189717266657280.htm"
                           title="使用CharacterTextSplitter进行文本分割的实战指南" target="_blank">使用CharacterTextSplitter进行文本分割的实战指南</a>
                        <span class="text-muted">bBADAS</span>
<a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>在处理长文本时,将其切割成较小的片段是常见的需求,尤其是在自然语言处理任务中。CharacterTextSplitter是一个强大的工具,用于通过字符分隔符对文本进行分割,本文将深入介绍如何使用它进行文本处理。技术背景介绍当面对一份冗长的文本时,比如总统演讲稿、法律文档等,我们常常需要将其拆分成便于处理的小段。CharacterTextSplitter正是为此而生的一个轻量级工具,专门用于基于特定</div>
                    </li>
                    <li><a href="/article/1899169914137145344.htm"
                           title="大语言模型原理基础与前沿 挑战与机遇" target="_blank">大语言模型原理基础与前沿 挑战与机遇</a>
                        <span class="text-muted">AI大模型应用之禅</span>
<a class="tag" taget="_blank" href="/search/DeepSeek/1.htm">DeepSeek</a><a class="tag" taget="_blank" href="/search/R1/1.htm">R1</a><a class="tag" taget="_blank" href="/search/%26amp%3B/1.htm">&</a><a class="tag" taget="_blank" href="/search/AI%E5%A4%A7%E6%A8%A1%E5%9E%8B%E4%B8%8E%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">AI大模型与大数据</a><a class="tag" taget="_blank" href="/search/%E8%AE%A1%E7%AE%97%E7%A7%91%E5%AD%A6/1.htm">计算科学</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E8%AE%A1%E7%AE%97/1.htm">神经计算</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/1.htm">深度学习</a><a class="tag" taget="_blank" href="/search/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/1.htm">神经网络</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/1.htm">人工智能</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B/1.htm">大型语言模型</a><a class="tag" taget="_blank" href="/search/AI/1.htm">AI</a><a class="tag" taget="_blank" href="/search/AGI/1.htm">AGI</a><a class="tag" taget="_blank" href="/search/LLM/1.htm">LLM</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/Python/1.htm">Python</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/Agent/1.htm">Agent</a><a class="tag" taget="_blank" href="/search/RPA/1.htm">RPA</a>
                        <div>大语言模型原理基础与前沿挑战与机遇1.背景介绍大语言模型(LargeLanguageModels,LLMs)是近年来人工智能领域的一个重要突破。它们通过深度学习技术,特别是基于变换器(Transformer)架构的模型,能够在自然语言处理(NLP)任务中表现出色。大语言模型的出现不仅推动了学术研究的发展,也在实际应用中展现了巨大的潜力。1.1大语言模型的起源大语言模型的起源可以追溯到早期的统计语言</div>
                    </li>
                </ul>
            </div>
        </div>
    </div>

<!-- 代码高亮 -->
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shCore.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shLegacy.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shAutoloader.js"></script>
<link type="text/css" rel="stylesheet" href="/static/syntaxhighlighter/styles/shCoreDefault.css"/>
<script type="text/javascript" src="/static/syntaxhighlighter/src/my_start_1.js"></script>





</body>

</html>