Reading the bert-for-tf2 Source Code, Part 1

Reading the bert-for-tf2 Source Code

  • Reading the bert-for-tf2 source
    • Preprocessing
    • Parameter resolution
    • Model inputs and outputs

Reading the bert-for-tf2 source

I recently wanted to read through a BERT implementation. My machine cannot run TensorFlow 1, so I settled on a TF2 code base; the project is here:
bert-for-tf2 repository
The code in examples/gpu_movie_reviews.ipynb serves as the driver for this walkthrough.

Preprocessing

from bert.tokenization.bert_tokenization import FullTokenizer
tokenizer = FullTokenizer(vocab_file=os.path.join(bert_ckpt_dir, "vocab.txt"))

These two lines construct the tokenizer used by BERT; the concrete implementation lives in the bert.tokenization module. Next, look at how MovieReviewData is instantiated:

data = MovieReviewData(tokenizer, 
                       sample_size=10*128*2,#2560, 
                       max_seq_len=128)

This part downloads the movie-review data from stanford.edu and preprocesses it. Here is the MovieReviewData class definition:

class MovieReviewData:
    DATA_COLUMN = "sentence"
    LABEL_COLUMN = "polarity"

    def __init__(self, tokenizer: FullTokenizer, sample_size=None, max_seq_len=1024):
        print('MovieReviewData __init__')
        #at call time this is data = MovieReviewData(tokenizer, sample_size=2560,
        #                                            max_seq_len=128)
        self.tokenizer = tokenizer
        self.sample_size = sample_size
        self.max_seq_len = 0
        train, test = download_and_load_datasets()
        #download_and_load_datasets() is defined earlier in the notebook
        #train and test are pandas DataFrames
        train, test = map(lambda df: df.reindex(df[MovieReviewData.DATA_COLUMN].str.len().sort_values().index), 
                          [train, test])
        #map applies the lambda to each DataFrame in [train, test]; for each df it
        #returns df.reindex(df["sentence"].str.len().sort_values().index),
        #i.e. the rows reordered so that df["sentence"] is sorted by ascending length
        #(lambda = anonymous function; MovieReviewData.DATA_COLUMN == "sentence")
        r"""for  i  in  range(len(train)):
            print(len(train[i]["sentence"]))
        print('test_len = ')
        for  i  in  range(len(test)):
            print(len(test[i]["sentence"]))
        The loop above raises an error, because a DataFrame cannot be indexed
        positionally like a plain list (use .iloc or .iterrows() instead)
        """
        train.to_csv('after_map_train.csv')
        test.to_csv('after_map_test.csv')
        #in this run sample_size = 2560 (the branch below is skipped only when it is None)
        if sample_size is not None:
            assert sample_size % 128 == 0
            train, test = train.head(sample_size), test.head(sample_size)
            # train, test = map(lambda df: df.sample(sample_size), [train, test])
            #when sample_size is not None, keep only the first sample_size rows
            #of train and of test
        
        #with sample_size = 2560 the assert passes, and
        #train, test = train.head(sample_size), test.head(sample_size)
        #leaves len(train) == len(test) == 2560
        ((self.train_x, self.train_y),
         (self.test_x, self.test_y)) = map(self._prepare, [train, test])
        #train and test are the length-sorted DataFrames from above;
        #map(self._prepare, [train, test]) yields self._prepare(train) and
        #self._prepare(test) in turn
        r"""
        len(self.train_x) == len(self.train_y) == 2560,
        len(self.test_x) == len(self.test_y) == 2560, and self.max_seq_len
        comes out as 178: with sample_size = 2560, one pass of
        self._prepare(train) handles 2560 examples.
        """
        self.max_seq_len = min(self.max_seq_len, max_seq_len)
        #the argument max_seq_len is 128, the measured self.max_seq_len is 178, so the min is 128
        ((self.train_x, self.train_x_token_types),
         (self.test_x, self.test_x_token_types)) = map(self._pad, 
                                                       [self.train_x, self.test_x])
        
    def _prepare(self, df):
        print('MovieReviewData _prepare')
        x, y = [], []
        with tqdm(total=df.shape[0], unit_scale=True) as pbar:
        #tqdm progress bar, analogous to: for i in tqdm(range(100)):
            for ndx, row in df.iterrows():
            #iterate over the DataFrame rows with iterrows()
                print('ndx = ')
                print(ndx)
                print('row = ')
                print(row)
                text, label = row[MovieReviewData.DATA_COLUMN], row[MovieReviewData.LABEL_COLUMN]
                #text,label = row["sentence"],row["polarity"]
                tokens = self.tokenizer.tokenize(text)
                #calls FullTokenizer.tokenize(text)
                tokens = ["[CLS]"] + tokens + ["[SEP]"]
                token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
                #map every token that exists in the vocabulary to its vocabulary id
                self.max_seq_len = max(self.max_seq_len, len(token_ids))
                #self.max_seq_len starts at 0; this tracks the maximum len(token_ids) seen
                x.append(token_ids)
                y.append(int(label))
                pbar.update()
                #advance the progress bar
        return np.array(x), np.array(y)

    def _pad(self, ids):
        print('_pad')
        x, t = [], []
        token_type_ids = [0] * self.max_seq_len
        for input_ids in ids:
            #print('before _pad')
            #print('input_ids = ')
            #print(input_ids)
            input_ids = input_ids[:min(len(input_ids), self.max_seq_len - 2)]
            #min() guards against running past the allowed length
            #??? but why self.max_seq_len - 2 ???
            #example: before _pad, input_ids = [101, 1998, ..., 999, 102];
            #after _pad, input_ids = [101, 1998, ..., 999, 0, 0]
            #i.e. when only the trailing 102 survives the cut, the [SEP] marker
            #is dropped and zeros are appended instead
            input_ids = input_ids + [0] * (self.max_seq_len - len(input_ids))
            x.append(np.array(input_ids))
            t.append(token_type_ids)
        return np.array(x), np.array(t)
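
As an aside, the reindex trick used in __init__ can be illustrated on a toy DataFrame (a self-contained sketch, not the notebook's real data):

import pandas as pd

# toy stand-in for the downloaded review DataFrames
train = pd.DataFrame({"sentence": ["a much longer review text", "short", "mid length"],
                      "polarity": [1, 0, 1]})
# reorder the rows so that "sentence" is sorted by ascending length,
# exactly what the map/lambda above does for train and test
train = train.reindex(train["sentence"].str.len().sort_values().index)
print(train["sentence"].tolist())
# ['short', 'mid length', 'a much longer review text']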

First, run the following loop:

for ndx, row in df.iterrows():
            #iterate over the DataFrame with iterrows()
                print('ndx = ')
                print(ndx)
                print('row = ')
                print(row)

From the printed rows we can see that each example carries two fields: a sentiment score and the polarity label (whether the review is positive or negative).
The next statements are the key tokenization calls:

tokens = self.tokenizer.tokenize(text)
#calls FullTokenizer.tokenize(text)
tokens = ["[CLS]"] + tokens + ["[SEP]"]

Let us first look at how FullTokenizer is initialized:

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        print('vocab = ')
        print(self.vocab)
        r"""
        vocab = OrderedDict([('[PAD]',0),('[unused0]',1),('[unused1]',2)
        ...
        ('##?',30520),('##~',30521) -- these are the entries of vocab.txt
        """
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        r"""
        the inverse mapping of vocab:
        {0:'[PAD]', 1:'[unused0]', 2:'[unused1]', ..., 30521:'##~'}
        """
        print('self.inv_vocab = ')
        print(self.inv_vocab)
        self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
        #BasicTokenizer is defined below
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)
        #WordpieceTokenizer is defined right after BasicTokenizer

After loading the vocabulary, the constructor also builds a BasicTokenizer and a WordpieceTokenizer. Now look at FullTokenizer.tokenize:

    def tokenize(self, text):
        split_tokens = []
        for token in self.basic_tokenizer.tokenize(text):
        #BasicTokenizer.tokenize(text)
            #print('token = ')
            #print(token)
            #first pass: split the whole text into words (and punctuation)
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
            #second pass: split each BasicTokenizer token into subwords; the
            #vocabulary is consulted here (for Chinese this is a no-op, since
            #Chinese is already split into single characters)
                #print('sub_token = ')
                #print(sub_token)
                split_tokens.append(sub_token)
                
        return split_tokens

Next, the definition of BasicTokenizer:

class BasicTokenizer(object):
    """Runs basic tokenization (punctuation splitting, lower casing, etc.)."""

    def __init__(self, do_lower_case=True):
        """Constructs a BasicTokenizer.

        Args:
          do_lower_case: Whether to lower case the input.
        """
        print('BasicTokenizer __init__')
        self.do_lower_case = do_lower_case
        results = self._run_split_on_punc('I am a pig!I love you')
        print('results = ')
        print(results)
        #returns ['I am a pig', '!', 'I love you']

    def tokenize(self, text):
        """Tokenizes a piece of text.第一个循环之中的tokenize分词操作"""
        text = convert_to_unicode(text)
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        text = self._tokenize_chinese_chars(text)
        # For Chinese, a space is added on both sides of every CJK character
        # (characters are split individually, no word segmentation). English text
        # needs no handling, since words are already separated by spaces.
        orig_tokens = whitespace_tokenize(text)
        # whitespace_tokenize (defined above FullTokenizer) splits the text on whitespace
        split_tokens = []
        #print('orig_tokens = ')
        #print(orig_tokens)
        
        for token in orig_tokens:
            if self.do_lower_case:
            
                token = token.lower()
                token = self._run_strip_accents(token)
                #_run_strip_accents(text): strip accents (diacritics)
            split_tokens.extend(self._run_split_on_punc(token))
            #split punctuation into separate tokens, e.g. 'I am a pig!I love you'
            #becomes ['I am a pig', '!', 'I love you']
        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        #join the pieces with single spaces, then whitespace-tokenize again (drops consecutive spaces)
        return output_tokens

    def _run_strip_accents(self, text):
        r"""Strips accents from a piece of text.
        _run_strip_accents(text) 方法用于去除 accents,即变音符号,那么什么是变音符号呢?
        像 Keras 作者 François Chollet 名字中些许奇怪的字符 ç、简历的英文 résumé 中的 é 和中文拼音声调 á 等,这些都是变音符号 accents,
        维基百科中描述如下:附加符号或称变音符号(diacritic、diacritical mark、diacritical point、diacritical sign),
        是指添加在字母上面的符号,以更改字母的发音或者以区分拼写相似词语。例如汉语拼音字母“ü”上面的两个小点,或“á”、“à”字母上面的标调符。"""
        #for plain English words such as 'this' or 'movie' the text is identical
        #before and after the unicodedata normalization
        text = unicodedata.normalize("NFD", text)
        #key calls: unicodedata.normalize returns the canonical decomposition (NFD)
        #of text, and unicodedata.category returns the Unicode category of a character;
        #NFD separates base letters from their combining accents
        output = []
        for char in text:
            cat = unicodedata.category(char)
            #unicodedata.category: the Unicode category of the character
            #'Mn' = Mark, Nonspacing
            if cat == "Mn":
            #cat=="Mn"代表当前的内容为重音符号
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text):
        """Splits punctuation on a piece of text."""
        #print('_run_split_on_punc')
        chars = list(text)
        #for the input 'I am a pig!I love you' this gives the character list
        #['I',' ','a','m',' ','a',' ','p','i','g','!','I',' ',
        #'l','o','v','e',' ','y','o','u']
        i = 0
        start_new_word = True
        output = []
        #print('chars = ')
        #print(chars)
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
          #_is_punctuation(char) roughly does:
          #cp = ord(char)
          #if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
          #    (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
          #    return True
          #cat = unicodedata.category(char)
          #if cat.startswith("P"):
          #    return True
          #return False
          #categories starting with "P": [Pc] Punctuation, Connector; [Pd] Punctuation, Dash;
          #[Pe] Punctuation, Close; ...
                output.append([char])
                #for '!' a single-element list ['!'] is appended
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                    #after a punctuation token was pushed, start a fresh sub-list
                start_new_word = False
                output[-1].append(char)
                #once a character is appended, start_new_word stays False and char
                #goes into the last sub-list; for the character list above the
                #final output is ['I am a pig', '!', 'I love you']
            i += 1

        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like the all of the other languages.
        if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
                (cp >= 0x3400 and cp <= 0x4DBF) or  #
                (cp >= 0x20000 and cp <= 0x2A6DF) or  #
                (cp >= 0x2A700 and cp <= 0x2B73F) or  #
                (cp >= 0x2B740 and cp <= 0x2B81F) or  #
                (cp >= 0x2B820 and cp <= 0x2CEAF) or
                (cp >= 0xF900 and cp <= 0xFAFF) or  #
                (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xfffd or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

BasicTokenizer does the first-pass splitting. Its _run_split_on_punc helper, for example, turns 'I am a pig!I love you' into ['I am a pig', '!', 'I love you']; together with the whitespace splitting and lower-casing in tokenize(), the full result is word-level tokens such as ['i', 'am', 'a', 'pig', '!', 'i', 'love', 'you'].
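
The accent-stripping step can be reproduced with the standard library alone; a small sketch independent of the bert package:

import unicodedata

def strip_accents(text):
    # NFD splits 'é' into 'e' plus a combining mark of category 'Mn', which we drop
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

print(strip_accents("résumé, François"))   # resume, Francois

This is exactly what _run_strip_accents does after BasicTokenizer has lower-cased each token.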

class WordpieceTokenizer(object):
    """Runs WordPiece tokenziation."""

    def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=200):
        #self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)
        self.vocab = vocab
        r"""vocab = OrderedDict([('[PAD]',0),('[unused0]',1),('[unused1]',2)
        ...
        ('##?',30520),('##~',30521) -- these are the entries of vocab.txt
        """
        self.unk_token = unk_token
        #unk_token="[UNK]"
        self.max_input_chars_per_word = max_input_chars_per_word
	#max_input_chars_per_word = 200
        #unaffable
        print('#########################WordpieceTokenizer __init__#########################')
        results = self.tokenize('unaffable')
        print('$$$$$$$$$$$$$$$$$$$$$$$$$results = $$$$$$$$$$$$$$$$$$$$$$$$$')
        print(results)
    def tokenize(self, text):
        """Tokenizes a piece of text into its word pieces.

        This uses a greedy longest-match-first algorithm to perform tokenization
        using the given vocabulary.

        For example:
          input = "unaffable"
          output = ["un", "##aff", "##able"]

        Args:
          text: A single token or whitespace separated tokens. This should have
            already been passed through `BasicTokenizer.

        Returns:
          A list of wordpiece tokens.
        """

        text = convert_to_unicode(text)
        #print('text = ')
        #print(text)
        #text = unaffable
        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            #print('~~~chars = ~~~')
            #print(chars)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue
            #words longer than 200 characters are replaced with the [UNK] token
            is_bad = False
            start = 0
            sub_tokens = []
            #start is where the previous match ended; initially 0
            while start < len(chars):
                end = len(chars)
                #end starts at the end of the word; this is essentially a two-pointer scan
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    #note: chars[start:end] excludes the character at index end
                    if start > 0:
                        substr = "##" + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end

            if is_bad:
                output_tokens.append(self.unk_token)
            #if no piece matched, emit [UNK] as well
            else:
                output_tokens.extend(sub_tokens)
            #extend (not append) is used because one word may be split into several
            #pieces; e.g. 'unaffable' is split into ['una','##ffa','##ble'], and two
            #occurrences of unaffable become
            #['una','##ffa','##ble','una','##ffa','##ble']
        return output_tokens

WordpieceTokenizer splits individual words against vocab.txt with a greedy longest-match-first strategy: with this vocabulary, "unaffable" is split into ['una', '##ffa', '##ble'], where 'una', '##ffa' and '##ble' are all entries provided by vocab.txt.
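
The greedy longest-match-first idea is easy to replay with a toy vocabulary (a sketch, not the real 30522-entry vocab.txt):

# toy vocabulary standing in for vocab.txt
toy_vocab = {"un", "una", "##aff", "##ffa", "##ble", "[UNK]"}

def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word, mirroring WordpieceTokenizer."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:           # no piece matched -> the whole word becomes [UNK]
            return ["[UNK]"]
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("unaffable", toy_vocab))   # ['una', '##ffa', '##ble']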

Parameter resolution

First, the model-building code used for training:

def create_model(max_seq_len, adapter_size=64):
    """Creates a classification model."""
    #adapter_size = 64  # see - arXiv:1902.00751
    #create_model(data.max_seq_len, adapter_size=adapter_size)
    print('max_seq_len = ')
    print(max_seq_len)
    print('adapter_size = ')
    print(adapter_size)
    #in this run max_seq_len = 128 and adapter_size = None
    # create the bert layer
    flags = 0
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        """bert_config.json模型中参数的配置:
        !!!注意bert_config.json之中不能随意加注释,否则可能会报错
        json报错:Expecting property name enclosed in double quotes: 
        line 1 column 2 (char 1)
        """
        print('flags = %d'%flags)
        flags = flags+1
        bc = StockBertConfig.from_json_string(reader.read())
        #StockBertConfig inherits from Params, so Params methods are still
        #available; the Params source defines from_json_string
        #step 1, from_json_string: read the parameters; bc is a Params instance
        #holding all values parsed from the json file
        print('bc = ')
        print(bc)
        print('begin map_to_stock_')
        bert_params = map_stock_config_to_params(bc)
        #converts the original BERT or ALBERT config dictionary to a
        #BertModelLayer.Params instance and returns it
        bert_params.adapter_size = adapter_size
        #the adapter_size passed in is None; map_stock_config_to_params has
        #added a whole set of new parameters on top of the stock config
        print('^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^')
        print('bert_params = ')
        print(bert_params)
        bert = BertModelLayer.from_params(bert_params, name="bert")
        #BertModelLayer lives in model.py and builds on bert.layer.Layer;
        #from_params is defined in params-flow's with_params.py
        #!!! note that BertModelLayer has a nested Params class that combines
        #BertEmbeddingsLayer.Params and TransformerEncoderLayer.Params, so using
        #BertModelLayer also pulls in BertEmbeddingsLayer and TransformerEncoderLayer
        
    input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
    # token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="token_type_ids")
    # output         = bert([input_ids, token_type_ids])
    output         = bert(input_ids)

    print("bert shape", output.shape)
    cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)
    cls_out = keras.layers.Dropout(0.5)(cls_out)
    logits = keras.layers.Dense(units=768, activation="tanh")(cls_out)
    logits = keras.layers.Dropout(0.5)(logits)
    logits = keras.layers.Dense(units=2, activation="softmax")(logits)

    # model = keras.Model(inputs=[input_ids, token_type_ids], outputs=logits)
    # model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])
    model = keras.Model(inputs=input_ids, outputs=logits)
    model.build(input_shape=(None, max_seq_len))

    # load the pre-trained model weights
    load_stock_weights(bert, bert_ckpt_file)

    # freeze weights if adapter-BERT is used
    if adapter_size is not None:
        freeze_bert_layers(bert)

    model.compile(optimizer=keras.optimizers.Adam(),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")])

    model.summary()
        
    return model
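
For context, the notebook drives this function roughly as follows (a sketch; the exact batch size, epochs and callbacks in gpu_movie_reviews.ipynb may differ):

adapter_size = None   # or 64 for adapter-BERT
model = create_model(data.max_seq_len, adapter_size=adapter_size)
model.fit(x=data.train_x, y=data.train_y,
          validation_split=0.1,
          batch_size=48,
          epochs=5)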

The following two lines:

bert_params = map_stock_config_to_params(bc)
bert_params.adapter_size = adapter_size

set up the layer parameters; open bert_config.json to see what they contain:

{
  "attention_probs_dropout_prob": 0.1,  // dropout applied after the attention softmax
  "hidden_act": "gelu",                 // activation; GELU weighs inputs by the probability of keeping them, GELU(x) = x*P(X <= x) = x*Phi(x); in code: cdf = 0.5*(1.0 + tf.erf(x / tf.sqrt(2.0))); return x*cdf. Empirically it performs at least as well as ReLU and ELU.
  "hidden_dropout_prob": 0.1,           // dropout in the hidden layers
  "hidden_size": 768,                   // hidden size
  "initializer_range": 0.02,            // stddev for weight initialization
  "intermediate_size": 3072,            // size of the feed-forward expansion
  "max_position_embeddings": 512,       // must be >= seq_length; used to build the position embeddings
  "num_attention_heads": 12,            // attention heads per encoder layer
  "num_hidden_layers": 12,              // number of encoder layers
  "type_vocab_size": 2,                 // segment_ids take values in {0, 1}
  "vocab_size": 30522                   // vocabulary size
}

Now look at the definition of from_params in with_params.py:

@classmethod
    #a classmethod needs no instance and takes no self; its first argument cls is
    #the class itself, through which class attributes and methods can be accessed
    #and instances created
    #from_params is such a classmethod and is called directly on a class,
    #e.g. WithParams.from_params(...)
    def from_params(cls, params: Params, *args, **kwargs):
        #print('!!!!!!!!!!!!!!!!!!!!from_params!!!!!!!!!!!!!!!!!!!!')
        #print(sys._getframe().f_code.co_name)
        #name of the current function
        #print(sys._getframe().f_back.f_code.co_name)
        #print(cls.__class__.__name__)
        #print('params = ')
        #print(params)
        #print('args = ')
        #print(args)
        #print('kwargs = ')
        #print(kwargs)
        """
        Creates an instance from the specified parameters (by overriding the params argument with kwargs).
        *args packs the positional arguments into a tuple, **kwargs packs the keyword arguments into a dict
        """
        # split non Params kwargs
        kwargs, other_args = cls.Params.from_dict(kwargs, return_unused=True, return_instance=False)
        """这里调用的cls为BertModelLayer的部分,我们在这里查看一下对应的类的定义
        class BertModelLayer(Layer):
			class Params(BertEmbeddingsLayer.Params,
						 TransformerEncoderLayer.Params):
				pass
		调用的本质上是BertModelLayer之中的Params类
        """
        #print('kwargs = ')
        #print(kwargs)
        #print('other_args = ')
        #print(other_args)
        kwargs = dict(cls.Params.from_dict(
            params, return_unused=False).clone(**kwargs))   # override with kwargs
        #clone() is defined on Params
        kwargs.update(**other_args)                         # add non Params kwargs back
        #again cls here refers to BertModelLayer
        #and again it is the nested Params class that gets used
        #print('#############1############')
        #print('kwargs = ')
        #print(kwargs)
        #print('after update')
        #print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
        #print('args = ')
        #print(args)
        #print('kwargs = ')
        #print(kwargs)
        #print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
        instance = cls(*args, **kwargs)
        #cls stands for the class the method was invoked on, so this triggers
        #WithParams.__init__ -- but the class being instantiated is not WithParams,
        #it is the caller; for BertModelLayer.from_params, cls is BertModelLayer
        #print('&&&&&&&&&&&&&2&&&&&&&&&&&&')
        #print('instance = ')
        #print(instance)
        #without self._construct() this would return right away with only the top-level object built
        return instance

        r"""
        程序解读:首先运行instance = cls(*args,**kwargs),接着内部调用def __init__(self,*args,**kwargs)
        这里面的__init__是对于它自身类的__init__,然后调用self._construct(*args,*other_args),最后调用
        相应的super.__init__(),这里我们查看一下BertModel对应的定义
        class BertModelLayer(Layer):
			class Params(BertEmbeddingsLayer.Params,
						 TransformerEncoderLayer.Params):
				pass
		每次嵌套的时候深入的为接下来的继承类,但是接着往下操作的时候调出来的参数为class Params(pp.Params):
		对应的嵌套类
        """

Collecting the printed instance values:

instance = 
<bert.embeddings.BertEmbeddingsLayer object at 0x7fa2f0619af0>
instance = 
<bert.transformer.TransformerEncoderLayer object at 0x7fa1ebd98ac0>
instance = 
<bert.model.BertModelLayer object at 0x7fa1ebc6ec10>
instance = 
<bert.embeddings.PositionEmbeddingLayer object at 0x7fa1ebdbce20>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa24c441b80>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebac60a0>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ec9f0280>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebac6bb0>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebaa6c40>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ec9f0040>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebac6e20>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebaa6af0>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebaa6940>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebacca90>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebbed6d0>
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7fa1ebbedb50>
(there are 12 SingleTransformerEncoderLayer layers in total)
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1eba4e610>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1eba4e2b0>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1ebabb3d0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebabb550>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1ebb984f0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebc75a30>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1ebc08280>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebc084c0>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1eba42670>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebbfce20>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1ebcb2c40>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebcb25b0>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1ebbd6c70>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebc9ff10>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1eb9d4f40>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1eba0e310>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa217f5a880>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebb06a90>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1ebbc60d0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1ebbc6040>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1ebc972b0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1e8076d30>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1e807fb50>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1e807fee0>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1ebb966a0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1eb92ff40>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1eb943280>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1eb91de20>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1eb938dc0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1d009ee20>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1d00b0940>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1d00b0cd0>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1d00a7ca0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1d0020eb0>
instance = 
<bert.attention.AttentionLayer object at 0x7fa1d0028c10>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa1d0028e20>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa1d00a7940>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c55d9a0>
instance = 
<bert.attention.AttentionLayer object at 0x7fa16c5701c0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c570550>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa16c4d7ee0>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c4e06a0>
instance = 
<bert.attention.AttentionLayer object at 0x7fa16c4e5790>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c4e5a30>
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7fa16c54ed90>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c45afa0>
instance = 
<bert.attention.AttentionLayer object at 0x7fa16c461d30>
instance = 
<bert.transformer.ProjectionLayer object at 0x7fa16c469100>
There are 12 encoder blocks here; each block is made up of a
<TransformerSelfAttentionLayer>, <ProjectionLayer>, <AttentionLayer>, <ProjectionLayer>.

The explanation lies in how the classes inside the bert package inherit from one another. After instrumenting the calls with print statements, the call sequence looks like this.
!!! note that _construct turns out to be called twice

WithParams __init__
BertModelLayer _construct
_construct


at the end of the first _construct, from_params is called again,
this time for the sub-layer BertEmbeddingsLayer
from_params
WithParams __init__
BertEmbeddingsLayer _construct
_construct
BertEmbeddingsLayer _construct is called first!!!
instance = 
<bert.embeddings.BertEmbeddingsLayer object at 0x7f8559d35af0>


from_params
WithParams __init__
TransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.TransformerEncoderLayer object at 0x7f8559d35e20>
instance = 
<bert.model.BertModelLayer object at 0x7f855a068b80>
!!! the reason from_params is called twice in this stretch is spelled out in BertModelLayer's _construct
!!! _construct also runs twice each time: first the _construct in BertModelLayer, then
the _construct defined in WithParams

Here we see that

instance = cls(*args,**kwargs)

first calls BertEmbeddingsLayer._construct and only afterwards the _construct defined in WithParams. I added a statement to with_params.py that prints the name of the current class.
With that print in place, the output is as follows:
Surprisingly, the class reported inside WithParams __init__ is BertModelLayer. In other words,

instance = cls(*args,**kwargs)

instantiates BertModelLayer, yet the __init__ that runs is the one written in WithParams; BertModelLayer is used to drive WithParams's initializer. This peculiarity is exactly what the @classmethod decorator, which needs no instance, provides.
To see how @classmethod resolves the class, I first wrote a small experiment.
In the first version, WithParams2.hello() cannot reach WithParams1: Python reports that there is no hello method, because WithParams2 and WithParams1 are two unrelated classes.
Next I gave them an inheritance relationship and called again.
Now the call works, and inside the @classmethod defined in WithParams1 the class received is the subclass WithParams2 (a reconstruction of the experiment is sketched below).
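
Since the screenshots of that experiment are not reproduced here, the following is a minimal reconstruction (hypothetical classes, not the real params-flow code):

class WithParams1:
    @classmethod
    def hello(cls):
        # cls is whichever class the method was invoked on
        print("hello called with cls =", cls.__name__)
        return cls()

class WithParams2(WithParams1):   # without this inheritance, WithParams2.hello() fails
    pass

obj = WithParams2.hello()         # prints: hello called with cls = WithParams2
print(type(obj).__name__)         # WithParams2 -- the subclass is what gets instantiated
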
Checking model.py, BertModelLayer never defines from_params itself, so BertModelLayer and WithParams must be related in some way for BertModelLayer to call from_params directly.
The call in the original code is:

bert = BertModelLayer.from_params(bert_params, name="bert")

Look closely at how BertModelLayer is "initialized" (it has no __init__; the nested class Params definition is all the initialization it carries),
and at the class definition of WithParams.
In other words, the only place a connection could be established is the nested-class definitions set up when the classes are created.
But from_params is defined inside WithParams, so for BertModelLayer to call from_params directly, BertModelLayer has to be linked to WithParams somehow.
Stuck again and unsure how to proceed, I resorted to commenting out parts of the code to find which part, once removed, breaks the from_params call.
Attempt 1: comment out the nested Params class

class Params(pp.Params):
	def create(self,*args,**kwargs):
		return  self.__class__._outer_class_.from_params(self,*args,**kwargs)

from_params can still be called normally.
Attempt 2: comment out __init_subclass__

def __init_subclass__(cls, **kwargs):
    print('with_params __init_subclass__')
    cls.Params._outer_class_ = cls

That did not settle it either, and commenting out any of the other functions was no more conclusive.
Since commenting things out led nowhere, I changed strategy:
the earlier experiment showed that a subclass can call a classmethod of its parent, so I traced the parent classes one level at a time.
Starting from BertModelLayer, the next stop is its base class Layer.
Layer in turn inherits from pf.Layer, so we continue with pf.Layer's parent.
!!! And there it is: pp.WithParams. BertModelLayer descends from pp.WithParams through several levels of inheritance, which is why BertModelLayer can directly call the @classmethod defined in WithParams, and why cls inside that call is still BertModelLayer itself!!!
Back in the WithParams class, let us trace how from_params actually proceeds.
First, the output at the very start of the run:

bert_params = 
{'initializer_range': 0.02, 'max_position_embeddings': 512, 'hidden_size': 768, 'embedding_size': None, 'project_embeddings_with_bias': True, 'vocab_size': 30522, 'use_token_type': True, 'use_position_embeddings': True, 'token_type_vocab_size': 2, 'hidden_dropout': 0.1, 'extra_tokens_vocab_size': None, 'project_position_embeddings': True, 'mask_zero': False, 'adapter_size': None, 'adapter_activation': 'gelu', 'adapter_init_scale': 0.001, 'num_heads': 12, 'size_per_head': None, 'query_activation': None, 'key_activation': None, 'value_activation': None, 'attention_dropout': 0.1, 'negative_infinity': -10000.0, 'intermediate_size': 3072, 'intermediate_activation': 'gelu', 'num_layers': 12, 'out_layer_ndxs': None, 'shared_layer': False}
from_params
WithParams __init__
BertModelLayer _construct
_construct
(super()._construct invokes the _construct defined in with_params)

When BertModelLayer is built, _construct does the real work; here is the _construct inside BertModelLayer:

    def _construct(self, **kwargs):
        print('BertModelLayer _construct')
        super()._construct(**kwargs)
        self.embeddings_layer = BertEmbeddingsLayer.from_params(
            self.params,
            name="embeddings"
        )
        # create all transformer encoder sub-layers
        self.encoders_layer = TransformerEncoderLayer.from_params(
            self.params,
            name="encoder"
        )

So in the output above, the first 'BertModelLayer _construct' line comes from BertModelLayer._construct itself, while the bare '_construct' line comes from the inherited _construct reached through super()._construct(**kwargs).
BertModelLayer._construct then calls BertEmbeddingsLayer.from_params and TransformerEncoderLayer.from_params, which produce the next chunk of output:

from_params
WithParams __init__
!!! note: when self._construct runs here, BertEmbeddingsLayer._construct is called
first and then the _construct in WithParams; the inheritance chain is
BertEmbeddingsLayer -> Layer -> pf.Layer -> pp.WithParams
BertEmbeddingsLayer _construct
_construct
instance = 



from_params
WithParams __init__
TransformerEncoderLayer _construct
_construct
instance = 

instance = 

!!! again, from_params appears twice in this stretch because BertModelLayer._construct creates two sub-layers
!!! and _construct runs twice each time: first the layer's own _construct, then
the _construct defined in WithParams

Continuing with the output:
Surprisingly, at this point WithParams has actually finished, and BertModelLayer build and BertModelLayer call start running, yet nothing after the WithParams code seems to call them explicitly.
Hypothesis: the Input / layer-application step that follows triggers build and call.
So I printed a 'finish' marker right after the input setup
and went back to the calling code to check the ordering.
It turns out BertModelLayer build and BertModelLayer call are indeed invoked only after that 'finish' marker, so next we look at the model's Input and output.

Model inputs and outputs

The input flows from keras.layers.Input into bert(input_ids), and the subsequent BertModelLayer build and BertModelLayer call are both invoked on the BertModelLayer instance. At this point I could not figure out why output = bert(input_ids) would call BertModelLayer build and BertModelLayer call, so I replaced the line with

output = bert.build(input_ids)

which fails to run, because build expects an input shape rather than a tensor.
Next I commented out the build method inside BertModelLayer and ran again: the program still runs normally, just without the build step.
Comparing build and call, their declarations look similar:

def build(self, input_shape):
def compute_output_shape(self, input_shape):
def call(self, inputs, mask=None, training=None):

They are declared alike, yet compute_output_shape is never called while the other two are (a small reproduction follows).
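
This behaviour can be reproduced with a tiny custom layer (a sketch under TF 2.x, independent of the bert code; tracing details may vary between versions):

from tensorflow import keras

class EchoLayer(keras.layers.Layer):
    def build(self, input_shape):
        print("EchoLayer build", input_shape)
        super().build(input_shape)

    def call(self, inputs):
        print("EchoLayer call")                      # runs while Keras traces the layer
        return inputs

    def compute_output_shape(self, input_shape):
        print("EchoLayer compute_output_shape")      # typically not triggered here
        return input_shape

x = keras.layers.Input(shape=(4,))
y = EchoLayer()(x)   # build and call both run; compute_output_shape usually does not
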
Hypothesis: bert(input_ids) calls every method whose signature fits.
When that line of attack went nowhere, I tried another approach:
printing the name of the caller of the current function.
I could not locate the printed caller in the source, so I instead bracketed the functions by printing 'finish' near the end of each one.
From the output, build finishes almost immediately, while call spans a much longer stretch of the run.
So build can be left out, but call must be invoked.
Still without a trace of who makes the call, I turned to the PyCharm IDE and single-stepped through the code. Around the statement

output = bert(input_ids)

I set breakpoints and stepped into the call to see which functions are involved.
Stepping in lands at this spot
(note: the file is keras/engine/base_layer.py).
Scrolling up from there reveals the __call__ method that base_layer.py defines.
The debugger pane shows the arguments being passed in:
args: KerasTensor(type_spec=TensorSpec(shape=(None,128), dtype=tf.int32, name='input_ids'), name='input_ids', description="created by layer 'input_ids'") kwargs: {} self:

The full source of that method is reproduced below.

  def __call__(self, *args, **kwargs):
    """Wraps `call`, applying pre- and post-processing steps.

    Arguments:
      *args: Positional arguments to be passed to `self.call`.
      **kwargs: Keyword arguments to be passed to `self.call`.

    Returns:
      Output tensor(s).

    Note:
      - The following optional keyword arguments are reserved for specific uses:
        * `training`: Boolean scalar tensor of Python boolean indicating
          whether the `call` is meant for training or inference.
        * `mask`: Boolean input mask.
      - If the layer's `call` method takes a `mask` argument (as some Keras
        layers do), its default value will be set to the mask generated
        for `inputs` by the previous layer (if `input` did come from
        a layer that generated a corresponding mask, i.e. if it came from
        a Keras layer with masking support.

    Raises:
      ValueError: if the layer's `call` method returns None (an invalid value).
      RuntimeError: if `super().__init__()` was not called in the constructor.
    """
    if not hasattr(self, '_thread_local'):
      raise RuntimeError(
          'You must call `super().__init__()` in the layer constructor.')

    # `inputs` (the first arg in the method spec) is special cased in
    # layer call due to historical reasons.
    # This special casing currently takes the form of:
    # - 'inputs' must be explicitly passed. A layer cannot have zero arguments,
    #   and inputs cannot have been provided via the default value of a kwarg.
    # - numpy/scalar values in `inputs` get converted to tensors
    # - implicit masks / mask metadata are only collected from 'inputs`
    # - Layers are built using shape info from 'inputs' only
    # - input_spec compatibility is only checked against `inputs`
    # - mixed precision casting (autocast) is only applied to `inputs`,
    #   not to any other argument.
    # - setting the SavedModel saving spec.
    inputs, args, kwargs = self._split_out_first_arg(args, kwargs)
    input_list = nest.flatten(inputs)

    # Functional Model construction mode is invoked when `Layer`s are called on
    # symbolic `KerasTensor`s, i.e.:
    # >> inputs = tf.keras.Input(10)
    # >> outputs = MyLayer()(inputs)  # Functional construction mode.
    # >> model = tf.keras.Model(inputs, outputs)
    if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
      return self._functional_construction_call(inputs, args, kwargs,
                                                input_list)

    # Maintains info about the `Layer.call` stack.
    call_context = base_layer_utils.call_context()

    # Accept NumPy and scalar inputs by converting to Tensors.
    if any(isinstance(x, (
        np_arrays.ndarray, np.ndarray, float, int)) for x in input_list):
      inputs = nest.map_structure(_convert_numpy_or_python_types, inputs)
      input_list = nest.flatten(inputs)

    # Handle `mask` propagation from previous layer to current layer. Masks can
    # be propagated explicitly via the `mask` argument, or implicitly via
    # setting the `_keras_mask` attribute on the inputs to a Layer. Masks passed
    # explicitly take priority.
    input_masks, mask_is_implicit = self._get_input_masks(
        inputs, input_list, args, kwargs)
    if self._expects_mask_arg and mask_is_implicit:
      kwargs['mask'] = input_masks

    # Training mode for `Layer.call` is set via (in order of priority):
    # (1) The `training` argument passed to this `Layer.call`, if it is not None
    # (2) The training mode of an outer `Layer.call`.
    # (3) The default mode set by `tf.keras.backend.set_learning_phase` (if set)
    # (4) Any non-None default value for `training` specified in the call
    #  signature
    # (5) False (treating the layer as if it's in inference)
    args, kwargs, training_mode = self._set_training_mode(
        args, kwargs, call_context)

    # Losses are cleared for all sublayers on the outermost `Layer.call`.
    # Losses are not cleared on inner `Layer.call`s, because sublayers can be
    # called multiple times.
    if not call_context.in_call:
      self._clear_losses()

    eager = context.executing_eagerly()
    with call_context.enter(
        layer=self,
        inputs=inputs,
        build_graph=not eager,
        training=training_mode):

      if self._autocast:
        inputs = self._maybe_cast_inputs(inputs, input_list)

      input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
      if eager:
        call_fn = self.call
        name_scope = self._name
      else:
        name_scope = self._name_scope()  # Avoid autoincrementing.
        call_fn = self._autographed_call()

      with ops.name_scope_v2(name_scope):
        if not self.built:
          self._maybe_build(inputs)

        with autocast_variable.enable_auto_cast_variables(
            self._compute_dtype_object):
          outputs = call_fn(inputs, *args, **kwargs)

        if self._activity_regularizer:
          self._handle_activity_regularization(inputs, outputs)
        if self._supports_masking:
          self._set_mask_metadata(inputs, outputs, input_masks, not eager)
        if self._saved_model_inputs_spec is None:
          self._set_save_spec(inputs)

        return outputs

Look carefully at the docstring:

"""Wraps `call`, applying pre- and post-processing steps.

    Arguments:
      *args: Positional arguments to be passed to `self.call`.
      **kwargs: Keyword arguments to be passed to `self.call`.

    Returns:
      Output tensor(s).

    Note:
      - The following optional keyword arguments are reserved for specific uses:
        * `training`: Boolean scalar tensor of Python boolean indicating
          whether the `call` is meant for training or inference.
        * `mask`: Boolean input mask.
      - If the layer's `call` method takes a `mask` argument (as some Keras
        layers do), its default value will be set to the mask generated
        for `inputs` by the previous layer (if `input` did come from
        a layer that generated a corresponding mask, i.e. if it came from
        a Keras layer with masking support.

    Raises:
      ValueError: if the layer's `call` method returns None (an invalid value).
      RuntimeError: if `super().__init__()` was not called in the constructor.
    """

In other words, the positional and keyword arguments are all routed through self.__call__ first. But why does __call__ get invoked at all?
__call__() makes an object callable: once a class implements it, its instances can behave like functions, and the instance can be invoked directly as object().
For example:
If obj = myObject(20) creates an instance of a class that defines __call__, then calling obj() afterwards runs the corresponding __call__ method (reconstructed below).
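
A minimal reconstruction of that example (hypothetical class, not taken from the repository):

class myObject:
    def __init__(self, value):
        self.value = value

    def __call__(self):
        # runs when the instance itself is called like a function
        print("__call__ invoked, value =", self.value)
        return self.value

obj = myObject(20)   # runs __init__
obj()                # runs __call__: prints "__call__ invoked, value = 20"
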
This also explains why a network layer can be added with the familiar LSTM(32)(input) idiom. In essence:

output = LSTM(32)(input)
# the line above is equivalent to
mylayer = LSTM(32)
output = mylayer.__call__(input)

Back to our own call:

output = bert(input_ids)
# can therefore be rewritten as
output = bert.__call__(input_ids)

After this change the run behaves the same:
base_layer __call__ runs first, then BertModelLayer build, and only after build does BertModelLayer call run
(_maybe_build in between is the function that calls build; the converted_call around BertModelLayer call is the autograph-converted call currently executing, and tf__call is the autograph wrapper one frame above it).
Searching base_layer.py for build shows where it gets invoked:
the call goes through _maybe_build, which also lives in base_layer.py;
reading further down _maybe_build, we find the actual call to build.
Now let us walk through the call sequence.
When the statement

output = bert.__call__(input_ids)

executes, the module that defines BertModelLayer contains no __call__ of its own, so Python falls back to the __call__ of the parent class, i.e. the base_layer __call__.
A small example verifies this lookup (reconstructed below):
newObject defines no common_method, so the common_method defined in its parent class myObject is the one that runs.
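
Again, a minimal reconstruction of that check (hypothetical classes named after the screenshot):

class myObject:
    def common_method(self):
        print("common_method defined in myObject")

class newObject(myObject):
    pass                      # no common_method of its own

newObject().common_method()   # falls back to the parent class and prints the message
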
Next we go into base_layer.py and look for the part that ends up calling BertModelLayer build.
First, a chunk of the program output:
it shows that after one 'base_layer __call__' message the next 'base_layer __call__' begins before the first 'base_layer call finish' appears. So the question is: between the first and the second base_layer __call__ there is a BertModelLayer build call; how exactly does that call happen?
To pin down the order of execution I added markers throughout __call__:

  def __call__(self, *args, **kwargs):
    print('base_layer __call__')
    """Wraps `call`, applying pre- and post-processing steps.

    Arguments:
      *args: Positional arguments to be passed to `self.call`.
      **kwargs: Keyword arguments to be passed to `self.call`.
      (*args: positional arguments, **kwargs: keyword arguments)
    Returns:
      Output tensor(s).

    Note:
      - The following optional keyword arguments are reserved for specific uses:
        * `training`: Boolean scalar tensor of Python boolean indicating
          whether the `call` is meant for training or inference.
        (i.e. whether `call` runs in training or in inference mode)
        * `mask`: Boolean input mask.
      - If the layer's `call` method takes a `mask` argument (as some Keras
        layers do), its default value will be set to the mask generated
        for `inputs` by the previous layer 
        (i.e. if the layer's call method takes a `mask` argument, its default is
        the mask the previous layer generated for `inputs`)
        (if `input` did come from  a layer that generated a corresponding mask, i.e. if it came from
        a Keras layer with masking support.
        (i.e. `input` came from a Keras layer with masking support)
    Raises:
      ValueError: if the layer's `call` method returns None (an invalid value).
      RuntimeError: if `super().__init__()` was not called in the constructor.
    """
    if not hasattr(self, '_thread_local'):
      raise RuntimeError(
          'You must call `super().__init__()` in the layer constructor.')
    # `inputs` (the first arg in the method spec) is special cased in
    # layer call due to historical reasons.
    # This special casing currently takes the form of:
    # - 'inputs' must be explicitly passed. A layer cannot have zero arguments,
    #   and inputs cannot have been provided via the default value of a kwarg.
    # - numpy/scalar values in `inputs` get converted to tensors
    # - implicit masks / mask metadata are only collected from 'inputs`
    # - Layers are built using shape info from 'inputs' only
    # - input_spec compatibility is only checked against `inputs`
    # - mixed precision casting (autocast) is only applied to `inputs`,
    #   not to any other argument.
    # - setting the SavedModel saving spec.
    inputs, args, kwargs = self._split_out_first_arg(args, kwargs)
    input_list = nest.flatten(inputs)
    print('situation1')
    # Functional Model construction mode is invoked when `Layer`s are called on
    # symbolic `KerasTensor`s, i.e.:
    # >> inputs = tf.keras.Input(10)
    # >> outputs = MyLayer()(inputs)  # Functional construction mode.
    # >> model = tf.keras.Model(inputs, outputs)
    if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
      return self._functional_construction_call(inputs, args, kwargs,
                                                input_list)
    print('situation2')
    # Maintains info about the `Layer.call` stack.
    call_context = base_layer_utils.call_context()

    # Accept NumPy and scalar inputs by converting to Tensors.
    if any(isinstance(x, (
        np_arrays.ndarray, np.ndarray, float, int)) for x in input_list):
      inputs = nest.map_structure(_convert_numpy_or_python_types, inputs)
      input_list = nest.flatten(inputs)
    print('situation3')
    # Handle `mask` propagation from previous layer to current layer. Masks can
    # be propagated explicitly via the `mask` argument, or implicitly via
    # setting the `_keras_mask` attribute on the inputs to a Layer. Masks passed
    # explicitly take priority.
    input_masks, mask_is_implicit = self._get_input_masks(
        inputs, input_list, args, kwargs)
    if self._expects_mask_arg and mask_is_implicit:
      kwargs['mask'] = input_masks
    print('situation4')
    # Training mode for `Layer.call` is set via (in order of priority):
    # (1) The `training` argument passed to this `Layer.call`, if it is not None
    # (2) The training mode of an outer `Layer.call`.
    # (3) The default mode set by `tf.keras.backend.set_learning_phase` (if set)
    # (4) Any non-None default value for `training` specified in the call
    #  signature
    # (5) False (treating the layer as if it's in inference)
    args, kwargs, training_mode = self._set_training_mode(
        args, kwargs, call_context)
    print('situation5')
    # Losses are cleared for all sublayers on the outermost `Layer.call`.
    # Losses are not cleared on inner `Layer.call`s, because sublayers can be
    # called multiple times.
    if not call_context.in_call:
      self._clear_losses()
    print('situation6')
    eager = context.executing_eagerly()
    with call_context.enter(
        layer=self,
        inputs=inputs,
        build_graph=not eager,
        training=training_mode):
      if self._autocast:
        inputs = self._maybe_cast_inputs(inputs, input_list)
      print('situation7')
      input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
      if eager:
        call_fn = self.call
        name_scope = self._name
      else:
        name_scope = self._name_scope()  # Avoid autoincrementing.
        call_fn = self._autographed_call()
      print('$$$$$$$$$$$$$$')
      with ops.name_scope_v2(name_scope):
        if not self.built:
          self._maybe_build(inputs)
        #this is where the build defined in BertModelLayer gets called
        print('call_fn')
        with autocast_variable.enable_auto_cast_variables(
            self._compute_dtype_object):
          outputs = call_fn(inputs, *args, **kwargs)
        #call_fn is the layer's call (the base Layer.call would simply return inputs)
        if self._activity_regularizer:
          print('_handle_activity_regularizer')
          self._handle_activity_regularization(inputs, outputs)
        if self._supports_masking:
          print('_set_mask_metadata')
          self._set_mask_metadata(inputs, outputs, input_masks, not eager)
        if self._saved_model_inputs_spec is None:
          print('_set_save_spec')
          self._set_save_spec(inputs)
        print('__call__ finish')
        return outputs

Run again and look at the output; analyzing it:
First, the initial pass reaches _maybe_build but for various reasons does not run the rest of __call__ to the end; the first '_maybe_build finish' message actually corresponds to the _maybe_build invoked from _functional_construction_call, i.e. the branch just below situation1.
This can be confirmed from the build method in model.py, instrumented as follows:

    # noinspection PyAttributeOutsideInit
    def build(self, input_shape):
        print('BertModelLayer build')
        print(sys._getframe().f_code.co_name)
        print(sys._getframe().f_back.f_code.co_name)
        print('input_shape = ')
        print(input_shape)
        if isinstance(input_shape, list):
            assert len(input_shape) == 2
            input_ids_shape, token_type_ids_shape = input_shape
            self.input_spec = [keras.layers.InputSpec(shape=input_ids_shape),
                               keras.layers.InputSpec(shape=token_type_ids_shape)]
        else:
            input_ids_shape = input_shape
            self.input_spec = keras.layers.InputSpec(shape=input_ids_shape)
        super(BertModelLayer, self).build(input_shape)
        print('BertModelLayer build finish')

Combining this with the printed call chain,
we can see that inside BertModelLayer's build, the statement

print(sys._getframe().f_code.co_name)

prints 'build' as the current function, while

print(sys._getframe().f_back.f_code.co_name)

prints '_maybe_build': the frame one level above build is _maybe_build, and it is _maybe_build that calls build.
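
The same caller-inspection trick can be tried in isolation with nothing but the standard library:

import sys

def build():
    print("current:", sys._getframe().f_code.co_name)          # build
    print("caller :", sys._getframe().f_back.f_code.co_name)   # _maybe_build

def _maybe_build():
    build()

_maybe_build()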

    def call(self, inputs, mask=None, training=None):
        print('BertModelLayer call')
        print(sys._getframe().f_code.co_name)
        #name of the current function
        print(sys._getframe().f_back.f_code.co_name)
        if mask is None:
            mask = self.embeddings_layer.compute_mask(inputs)

        embedding_output = self.embeddings_layer(inputs, mask=mask, training=training)
        output           = self.encoders_layer(embedding_output, mask=mask, training=training)
        print('BertModelLayer call finish')
        return output   # [B, seq_len, hidden_size]

call first invokes self.embeddings_layer.compute_mask(inputs), and then applies the two sub-layers in exactly the same layer-call style as output = bert(input_ids):

        embedding_output = self.embeddings_layer(inputs, mask=mask, training=training)
        output           = self.encoders_layer(embedding_output, mask=mask, training=training)

Each of these sub-layer calls dives back into base_layer and runs through the same machinery.
Structurally, the bert layer in this program consists of two parts: a BertEmbeddingsLayer and a TransformerEncoderLayer.
The embedding part is the BertEmbeddingsLayer; it feeds a TransformerEncoderLayer made of 12 encoder blocks, which produces the final output.
A question I have had all along, though, is: how does Keras actually build the graph when Model() is called?
The material below is adapted from https://blog.csdn.net/Flag_ing/article/details/107848319#commentBox
Start with topology.py, which defines the topology rules Keras follows when building a model's graph (i.e. its network structure).

"""This module is deprecated, but kept around for backwards compatibility.
"""
from .base_layer import Layer, Node, InputSpec
from .input_layer import Input, InputLayer
from .network import Network, get_source_inputs

In other words, topology.py sets up the following inheritance and call relationships between classes.
The base class Layer() defines the basic attributes and methods of every layer in the network. Besides the classes mentioned in the figure, all other Keras layer classes (the fully connected Dense, the 2-D convolution Conv2D, the functional Model, and so on) inherit from Layer() directly or indirectly, overriding the relevant methods or adding properties and methods of their own.
The InputSpec() class records the expected number of dimensions (ndim), dtype, shape and similar attributes of each layer's input tensor, and is used to check the compatibility of the layer's input tensor(s) when the layer is built.
The InputLayer() class holds the information about the raw input tensor(s) of each model.
The Node() class creates the node(s) inside each layer that connect the layer to the layers before and after it; exactly how they connect is described in detail below.
The Container() class works recursively: starting from the output tensor(s) of the model's final layer(s), it follows each layer's node information backwards to collect every layer in the model and build the overall graph (somewhat like recursively walking a linked list in C). The functional Model() mentioned above inherits from Container().
A subclass of Layer generally has to override the following methods.
1. __init__(): Defines custom layer attributes, and creates layer state variables that do not depend on input shapes, using add_weight().


2. build(self, input_shape): This method can be used to create weights that depend on the shape(s) of the input(s), using add_weight(). call() will automatically build the layer (if it has not been built yet) by calling build().

This is where the weights that depend on the input shape, i.e. each layer's parameters, are created. build() does so by calling add_weight(). In the base Layer class, build() is essentially empty:

class Layer(object):
  """   代码其余部分省略   """
    def build(self, input_shape):
        """Creates the layer weights.
        Must be implemented on all layers that have weights.
        # Arguments
            input_shape: Keras tensor (future input to layer)
                or list/tuple of Keras tensors to reference
                for weight shape computations.
        """
        self.built = True

So build() has to be implemented in the subclasses that inherit from Layer. Taking class _Conv(Layer): as an example, its build() method looks like this:

    class _Conv(Layer):
    """   代码其余部分省略   """
    def build(self, input_shape):
            if self.data_format == 'channels_first':
                channel_axis = 1
            else:
                channel_axis = -1
            if input_shape[channel_axis] is None:
                raise ValueError('The channel dimension of the inputs '
                                 'should be defined. Found `None`.')
            input_dim = input_shape[channel_axis]
            kernel_shape = self.kernel_size + (input_dim, self.filters)
            self.kernel = self.add_weight(shape=kernel_shape,
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
            #create the layer's weights by calling add_weight()
            if self.use_bias:
                self.bias = self.add_weight(shape=(self.filters,),
                                            initializer=self.bias_initializer,
                                            name='bias',
                                            regularizer=self.bias_regularizer,
                                            constraint=self.bias_constraint)
            else:
                self.bias = None
            # Set input spec.
            self.input_spec = InputSpec(ndim=self.rank + 2,
                                        axes={channel_axis: input_dim})
            self.built = True

For comparison, here is the build method inside bert's BertModelLayer:

# noinspection PyAttributeOutsideInit
def build(self, input_shape):
    print('BertModelLayer build')
    print(sys._getframe().f_code.co_name)
    print(sys._getframe().f_back.f_code.co_name)
    print('input_shape = ')
    print(input_shape)
    if isinstance(input_shape, list):
        assert len(input_shape) == 2
        input_ids_shape, token_type_ids_shape = input_shape
        self.input_spec = [keras.layers.InputSpec(shape=input_ids_shape),
                           keras.layers.InputSpec(shape=token_type_ids_shape)]
    else:
        input_ids_shape = input_shape
        self.input_spec = keras.layers.InputSpec(shape=input_ids_shape)
    super(BertModelLayer, self).build(input_shape)
    print('BertModelLayer build finish')
This build does not add any weights, but it does set self.input_spec using InputSpec, which matches what the _Conv class above does.

3. call(self, *args, **kwargs): Called in __call__ after making sure build() has been called. call() performs the logic of applying the layer to the input tensors (which should be passed in as argument). Two reserved keyword arguments you can optionally use in call() are: - training (boolean, whether the call is in inference mode or training mode) - mask (boolean tensor encoding masked timesteps in the input, used in RNN layers)

Inside the __call__ method, call() is invoked after build() has been called; it processes the layer's input tensors and produces the corresponding output tensors. As with build() above, the Layer class does not give call() a concrete implementation:

class Layer(object):
    """ ... rest of the class omitted ... """
    def call(self, inputs, **kwargs):
        """This is where the layer's logic lives.
        # Arguments
            inputs: Input tensor, or list/tuple of input tensors.
            **kwargs: Additional keyword arguments.
        # Returns
            A tensor or list/tuple of tensors.
        """
        return inputs

So it has to be implemented concretely in the subclasses of Layer. Again taking class _Conv(Layer): as an example, its call() method is implemented as follows:

class _Conv(Layer):
    """ ... rest of the class omitted ... """
    def call(self, inputs):
        if self.rank == 1:
            """ ... code omitted ... """
        if self.rank == 2:
            outputs = K.conv2d(
                inputs,
                self.kernel,
                strides=self.strides,
                padding=self.padding,
                data_format=self.data_format,
                dilation_rate=self.dilation_rate)
        if self.rank == 3:
            """ ... code omitted ... """
        if self.use_bias:
            outputs = K.bias_add(
                outputs,
                self.bias,
                data_format=self.data_format)
        if self.activation is not None:
            return self.activation(outputs)
        return outputs

The call function is where the data are actually processed: the input content is transformed into the output content.
Now let's look at the call function defined in bert (BertModelLayer):

def call(self, inputs, mask=None, training=None):
    print('BertModelLayer call')
    print(sys._getframe().f_code.co_name)
    # name of the current function
    print(sys._getframe().f_back.f_code.co_name)
    if mask is None:
        mask = self.embeddings_layer.compute_mask(inputs)

    embedding_output = self.embeddings_layer(inputs, mask=mask, training=training)
    output           = self.encoders_layer(embedding_output, mask=mask, training=training)
    print('BertModelLayer call finish')
    return output   # [B, seq_len, hidden_size]

self.embeddings_layer is applied first to obtain an intermediate output embedding_output, and embedding_output is then fed through self.encoders_layer to obtain the final output.
At this point we can see that self.embeddings_layer and self.encoders_layer are created by the earlier _construct function:

# noinspection PyUnusedLocal
def _construct(self, **kwargs):
    print('BertModelLayer _construct')
    super()._construct(**kwargs)
    self.embeddings_layer = BertEmbeddingsLayer.from_params(
        self.params,
        name="embeddings"
    )
    # create all transformer encoder sub-layers
    self.encoders_layer = TransformerEncoderLayer.from_params(
        self.params,
        name="encoder"
    )
    # from_params pulls in the corresponding parameters; it feels a bit like the add_weight function here
    self.support_masking  = True
    print('BertModelLayer finish')
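
Note that from_params is not a Keras method: it comes from the params-flow base classes that bert-for-tf2 builds on, and it is a construction-time factory rather than something like add_weight. A rough, hypothetical sketch of the idea (not the actual params-flow implementation):

class WithParamsSketch:
    """Hypothetical mixin: build an object from a shared params mapping."""

    # defaults declared by each subclass (e.g. hidden_size, num_heads, ...)
    param_defaults = {}

    def __init__(self, **params):
        self.params = dict(self.param_defaults, **params)

    @classmethod
    def from_params(cls, params, **overrides):
        # keep only the keys this subclass declares, apply overrides,
        # then construct the instance (a factory, not add_weight)
        relevant = {k: v for k, v in params.items() if k in cls.param_defaults}
        relevant.update(overrides)
        return cls(**relevant)

class EmbeddingsSketch(WithParamsSketch):
    param_defaults = {"vocab_size": 30522, "hidden_size": 768}

bert_params = {"vocab_size": 30522, "hidden_size": 768, "num_heads": 12}
emb = EmbeddingsSketch.from_params(bert_params)   # num_heads is simply ignored
print(emb.params)   # {'vocab_size': 30522, 'hidden_size': 768}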

4、get_config(self): Returns a dictionary containing the configuration used to initialize this layer. If the keys differ from the arguments in `__init__`, then override from_config(self) as well. This method is used when saving the layer or a model that contains this layer.
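
As a quick illustration (my own example, not code from bert-for-tf2), a custom layer only has to return its constructor configuration from get_config() so that from_config() can re-create it when a model containing the layer is saved and loaded:

import tensorflow as tf
from tensorflow import keras

class ScaledDense(keras.layers.Dense):
    """Dense layer whose output is multiplied by a constant scale."""

    def __init__(self, units, scale=1.0, **kwargs):
        super().__init__(units, **kwargs)
        self.scale = scale

    def call(self, inputs):
        return super().call(inputs) * self.scale

    def get_config(self):
        # everything needed to re-create the layer during (de)serialization
        config = super().get_config()
        config.update({"scale": self.scale})
        return config

layer = ScaledDense(4, scale=0.5)
clone = ScaledDense.from_config(layer.get_config())   # round-trips the config
print(clone.scale)   # 0.5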

The functional Model() used in the opening example actually inherits from the Container() class, i.e. its prototype is Model(Container). The sequential model Sequential() in turn inherits from Model(), i.e. Sequential(Model).
The relationship between node, layer and tensor during network construction
The relationships among node, layer and tensor are complementary. While the network flow graph of a model is being built, tensors flow through each layer; during this flow every layer creates and binds one or more nodes (shared layers generally bind several), and each node ties those tensors to the layers before and after it. This is how tensor, layer and node are linked together.
[figure: network flow graph] The network flow graph shown in the figure has seven layers, Layer A to Layer G. Assume Layer F is a concatenate layer rather than a shared layer; the shared-layer case is not discussed here. Each layer has one Node bound to it. The input layer of the network is Layer A (created with the InputLayer() mentioned above) and the output layer is Layer G. IN1 is the input tensor of Layer A, OUT1 is the output tensor of Layer A and also the input tensors IN2 and IN3 of Layer B and Layer C, and so on.
1. Node
A node does not take part in any computation; it is only a bridge that records the relationships between layers and tensors. Take node6 of Layer F as an example of how a Node links Layer E, Layer D and Layer G; the other nodes work the same way.
First, the commonly used attributes of a node are shown below:
[screenshot] You may notice that among these attributes only "outbound_layer" is singular while all the others are plural. This is because outbound_layer is the layer that turns the input tensors into the output tensors; as the figure shows, the layer that performs the IN6-to-OUT6 transformation is Layer F itself, and there is exactly one such layer, which is not hard to understand.

inbound_layers are the inbound layers of node6; there can be several of them. As the figure shows, node6's inbound_layers are Layer E and Layer D, i.e. inbound_layers[] = [Layer E, Layer D].

node_indices is the index of the node inside Layer F. A layer has two or more nodes only when it is a shared layer; since only non-shared layers are discussed here, every layer has exactly one node, so node_indices is 0, the index of that single node.

tensor_indices is the index of each tensor in Layer F's output_tensors, which matters for multi-output layers. All layers in the figure are assumed to be single-output, so node6's tensor_indices is 0, i.e. it outputs only one tensor.

input_tensors and output_tensors are the node's input and output tensors respectively. As the figure shows, node6 has two input_tensors and one output_tensor.

input_shapes and output_shapes are the shapes of the tensors in input_tensors and output_tensors, which is easy to understand. (The sketch below builds a small model matching the figure and inspects these attributes.)
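
To make these node attributes concrete, here is a small sketch (my own, not from the repo) that builds a functional model shaped like the figure and inspects the node bound to the concatenate layer. The _inbound_nodes attribute and the Node properties are internal Keras APIs, so the exact names may differ between TensorFlow versions:

import tensorflow as tf
from tensorflow import keras

# a tiny functional graph shaped like the figure: the input fans out into
# two branches that are concatenated (the "Layer F" case) and then projected
inp = keras.Input(shape=(4,), name="layer_a")            # Layer A
b = keras.layers.Dense(8, name="layer_b")(inp)
c = keras.layers.Dense(8, name="layer_c")(inp)
d = keras.layers.Dense(8, name="layer_d")(b)
e = keras.layers.Dense(8, name="layer_e")(c)
f = keras.layers.Concatenate(name="layer_f")([e, d])     # Layer F
out = keras.layers.Dense(1, name="layer_g")(f)           # Layer G
model = keras.Model(inp, out)

layer_f = model.get_layer("layer_f")
# public view of "node6": two input tensors (IN6), one output tensor (OUT6)
print([t.shape for t in layer_f.input])   # [(None, 8), (None, 8)]
print(layer_f.output.shape)               # (None, 16)

# the node itself (internal API, may vary by version)
node6 = layer_f._inbound_nodes[0]
print(node6.outbound_layer.name)                # 'layer_f', exactly one
print([l.name for l in node6.inbound_layers])   # ['layer_e', 'layer_d']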

Next, a digression on some very important comments in base_layer.py, which explain how a layer is supposed to be constructed:

  A layer is a callable object that takes as input one or more tensors and
  that outputs one or more tensors. It involves *computation*, defined
  in the `call()` method, and a *state* (weight variables), defined
  either in the constructor `__init__()` or in the `build()` method.
  Users will just instantiate a layer and then treat it as a callable.

  Arguments:
    trainable: Boolean, whether the layer's variables should be trainable.
    name: String name of the layer.
    dtype: The dtype of the layer's computations and weights (default of
      `None` means use `tf.keras.backend.floatx` in TensorFlow 2, or the type
      of the first input in TensorFlow 1).
    dynamic: Set this to `True` if your layer should only be run eagerly, and
      should not be used to generate a static computation graph.
      (A static graph is a declarative programming model in which the
      definition of the computation is separated from its execution;
      setting this to `True` means the layer can only be run eagerly.)
      This would be the case for a Tree-RNN or a recursive network,
      for example, or generally for any layer that manipulates tensors
      using Python control flow. If `False`, we assume that the layer can
      safely be used to generate a static computation graph.
  Attributes:
    name: The name of the layer (string).
    dtype: The dtype of the layer's computations and weights. If mixed
      precision is used with a `tf.keras.mixed_precision.experimental.Policy`,
      this is instead just the dtype of the layer's weights, as the computations
      are done in a different dtype.
    trainable_weights: List of variables to be included in backprop.
    non_trainable_weights: List of variables that should not be
      included in backprop.
    weights: The concatenation of the lists trainable_weights and
      non_trainable_weights (in this order).
    trainable: Whether the layer should be trained (boolean), i.e. whether
      its potentially-trainable weights should be returned as part of
      `layer.trainable_weights`.
    input_spec: Optional (list of) `InputSpec` object(s) specifying the
      constraints on inputs that can be accepted by the layer.

  We recommend that descendants of `Layer` implement the following methods:

  * `__init__()`: Defines custom layer attributes, and creates layer state
    variables that do not depend on input shapes, using `add_weight()`.
  * `build(self, input_shape)`: This method can be used to create weights that
    depend on the shape(s) of the input(s), using `add_weight()`. `__call__()`
    will automatically build the layer (if it has not been built yet) by
    calling `build()`.
  * `call(self, *args, **kwargs)`: Called in `__call__` after making sure
    `build()` has been called. `call()` performs the logic of applying the
    layer to the input tensors (which should be passed in as argument).
    Two reserved keyword arguments you can optionally use in `call()` are:
      - `training` (boolean, whether the call is in
        inference mode or training mode)
      - `mask` (boolean tensor encoding masked timesteps in the input, used
        in RNN layers)
  * `get_config(self)`: Returns a dictionary containing the configuration used
    to initialize this layer. If the keys differ from the arguments
    in `__init__`, then override `from_config(self)` as well.
    This method is used when saving
    the layer or a model that contains this layer.

  Examples:

  Here's a basic example: a layer with two variables, `w` and `b`,
  that returns `y = w . x + b`.
  It shows how to implement `build()` and `call()`.
  Variables set as attributes of a layer are tracked as weights
  of the layers (in `layer.weights`).
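
The example code itself is not reproduced in the excerpt above; a minimal sketch of such a layer, assuming tf.keras and using a hypothetical class name SimpleDense, could look like this:

import tensorflow as tf
from tensorflow import keras

class SimpleDense(keras.layers.Layer):
    """Hypothetical example layer computing y = w . x + b."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # the weights depend on the input shape, so they are created here
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal",
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros",
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = SimpleDense(4)
y = layer(tf.ones((2, 2)))     # __call__ builds the layer, then runs call()
print([w.shape for w in layer.weights])   # kernel (2, 4) and bias (4,)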

Having covered the functions involved and the definition of the layer, let's go back and trace how the program actually runs. Here is my instrumented version of __call__ in base_layer.py:

  def __call__(self, *args, **kwargs):
    """Wraps `call`, applying pre- and post-processing steps.

    Arguments:
      *args: Positional arguments to be passed to `self.call`.
      **kwargs: Keyword arguments to be passed to `self.call`.

    Returns:
      Output tensor(s).

    Note:
      - The following optional keyword arguments are reserved for specific uses:
        * `training`: Boolean scalar tensor of Python boolean indicating
          whether the `call` is meant for training or inference.
        * `mask`: Boolean input mask.
      - If the layer's `call` method takes a `mask` argument (as some Keras
        layers do), its default value will be set to the mask generated
        for `inputs` by the previous layer (if `input` did come from
        a layer that generated a corresponding mask, i.e. if it came from
        a Keras layer with masking support).

    Raises:
      ValueError: if the layer's `call` method returns None (an invalid value).
      RuntimeError: if `super().__init__()` was not called in the constructor.
    """
    print('base_layer __call__')
    if not hasattr(self, '_thread_local'):
      raise RuntimeError(
          'You must call `super().__init__()` in the layer constructor.')

    # `inputs` (the first arg in the method spec) is special cased in
    # layer call due to historical reasons.
    # This special casing currently takes the form of:
    # - 'inputs' must be explicitly passed. A layer cannot have zero arguments,
    #   and inputs cannot have been provided via the default value of a kwarg.
    # - numpy/scalar values in `inputs` get converted to tensors
    # - implicit masks / mask metadata are only collected from 'inputs`
    # - Layers are built using shape info from 'inputs' only
    # - input_spec compatibility is only checked against `inputs`
    # - mixed precision casting (autocast) is only applied to `inputs`,
    #   not to any other argument.
    # - setting the SavedModel saving spec.
    inputs, args, kwargs = self._split_out_first_arg(args, kwargs)
    input_list = nest.flatten(inputs)
    print('situation1')
    # Functional Model construction mode is invoked when `Layer`s are called on
    # symbolic `KerasTensor`s, i.e.:
    # >> inputs = tf.keras.Input(10)
    # >> outputs = MyLayer()(inputs)  # Functional construction mode.
    # >> model = tf.keras.Model(inputs, outputs)
    if _in_functional_construction_mode(inputs, args, kwargs, input_list):
      return self._functional_construction_call(inputs, args, kwargs,
                                                input_list)
    print('situation2')
    # Maintains info about the `Layer.call` stack.
    call_context = base_layer_utils.call_context()

    # Accept NumPy and scalar inputs by converting to Tensors.
    if any(isinstance(x, (np.ndarray, float, int)) for x in input_list):
      inputs = nest.map_structure(_convert_numpy_or_python_types, inputs)
      input_list = nest.flatten(inputs)
    print('situation3')
    # Handle `mask` propagation from previous layer to current layer. Masks can
    # be propagated explicitly via the `mask` argument, or implicitly via
    # setting the `_keras_mask` attribute on the inputs to a Layer. Masks passed
    # explicitly take priority.
    input_masks, mask_is_implicit = self._get_input_masks(
        inputs, input_list, args, kwargs)
    if self._expects_mask_arg and mask_is_implicit:
      kwargs['mask'] = input_masks
    print('situation4')
    # Training mode for `Layer.call` is set via (in order of priority):
    # (1) The `training` argument passed to this `Layer.call`, if it is not None
    # (2) The training mode of an outer `Layer.call`.
    # (3) The default mode set by `tf.keras.backend.set_learning_phase` (if set)
    # (4) Any non-None default value for `training` specified in the call
    #  signature
    # (5) False (treating the layer as if it's in inference)
    args, kwargs, training_mode = self._set_training_mode(
        args, kwargs, call_context)
    print('situation5')
    # Losses are cleared for all sublayers on the outermost `Layer.call`.
    # Losses are not cleared on inner `Layer.call`s, because sublayers can be
    # called multiple times.
    if not call_context.in_call:
      self._clear_losses()
    print('situation6')
    eager = context.executing_eagerly()
    with call_context.enter(
        layer=self,
        inputs=inputs,
        build_graph=not eager,
        training=training_mode):

      if self._autocast:
        inputs = self._maybe_cast_inputs(inputs, input_list)
      print('situation7')
      if eager:
        call_fn = self.call
        name_scope = self._name
      else:
        input_spec.assert_input_compatibility(self.input_spec, inputs,
                                              self.name)
        name_scope = self._name_scope()  # Avoid autoincrementing.
        call_fn = self._autographed_call()
      print('$$$$$$$$$$$$$$$$$')
      with ops.name_scope_v2(name_scope):
        if not self.built:
          self._maybe_build(inputs)
        print('call_fn')
        with ops.enable_auto_cast_variables(self._compute_dtype_object):
          outputs = call_fn(inputs, *args, **kwargs)

        if self._activity_regularizer:
          print('_handle_activity_regularizer')
          self._handle_activity_regularization(inputs, outputs)
        if self._supports_masking:
          print('_set_mask_metadata')
          self._set_mask_metadata(inputs, outputs, input_masks, not eager)
        if self._saved_model_inputs_spec is None:
          print('_set_save_spec')
          self._set_save_spec(inputs)
        print('__call__ finish')
        return outputs

And here is the first stretch of the resulting output:

max_seq_len = 
128
adapter_size = 
None
flags = 0
bc = 
{
     'attention_probs_dropout_prob': 0.1, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'hidden_size': 768, 'initializer_range': 0.02, 'intermediate_size': 3072, 'max_position_embeddings': 512, 'num_attention_heads': 12, 'num_hidden_layers': 12, 'type_vocab_size': 2, 'vocab_size': 30522, 'ln_type': None, 'embedding_size': None}
begin map_to_stock_
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bert_params = 
{
     'initializer_range': 0.02, 'max_position_embeddings': 512, 'hidden_size': 768, 'embedding_size': None, 'project_embeddings_with_bias': True, 'vocab_size': 30522, 'use_token_type': True, 'use_position_embeddings': True, 'token_type_vocab_size': 2, 'hidden_dropout': 0.1, 'extra_tokens_vocab_size': None, 'project_position_embeddings': True, 'mask_zero': False, 'adapter_size': None, 'adapter_activation': 'gelu', 'adapter_init_scale': 0.001, 'num_heads': 12, 'size_per_head': None, 'query_activation': None, 'key_activation': None, 'value_activation': None, 'attention_dropout': 0.1, 'negative_infinity': -10000.0, 'intermediate_size': 3072, 'intermediate_activation': 'gelu', 'num_layers': 12, 'out_layer_ndxs': None, 'shared_layer': False}
from_params
WithParams __init__
BertModelLayer
BertModelLayer _construct
_construct
from_params
WithParams __init__
BertEmbeddingsLayer
BertEmbeddingsLayer _construct
_construct
instance = 
<bert.embeddings.BertEmbeddingsLayer object at 0x7efd70bc6130>
from_params
WithParams __init__
TransformerEncoderLayer
TransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.TransformerEncoderLayer object at 0x7efd70bc68b0>
BertModelLayer finish
instance = 
<bert.model.BertModelLayer object at 0x7efd70a00e50>
!!!operator finish!!!
input_ids = 
Tensor("input_ids:0", shape=(None, 128), dtype=int32)
!!!keras output!!!




bert = 
<bert.model.BertModelLayer object at 0x7efd70a00e50>
base_layer __call__
situation1
_functional_construction_call
BertModelLayer build
build
_maybe_build
input_shape = 
(None, 128)
BertModelLayer build finish
BertModelLayer call
converted_call
tf__call
BertEmbeddingsLayer compute_mask
base_layer __call__
situation1
_functional_construction_call
BertEmbeddingsLayer build
from_params
WithParams __init__
PositionEmbeddingLayer
_construct
instance = 
<bert.embeddings.PositionEmbeddingLayer object at 0x7efd701449d0>
WithParams __init__
LayerNormalization
_construct
BertEmbeddingsLayer call
base_layer __call__
situation1
_functional_construction_call
base_layer __call__
situation1
_functional_construction_call
PositionEmbeddingLayer build
PositionEmbeddingLayer call
base_layer __call__
situation1
_functional_construction_call
base_layer __call__
situation1
_functional_construction_call
BertEmbeddingsLayer compute_mask
base_layer __call__
situation1
_functional_construction_call
TransformerEncoderLayer build
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70d18070>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70d18f40>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd709e85e0>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70b85b20>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70b85490>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70b851c0>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70a25b80>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70a25c40>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70a254f0>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70a25340>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70b85430>
from_params
WithParams __init__
SingleTransformerEncoderLayer
SingleTransformerEncoderLayer _construct
_construct
instance = 
<bert.transformer.SingleTransformerEncoderLayer object at 0x7efd70d187c0>
TransformerEncoderLayer call
base_layer __call__
situation1
_functional_construction_call
SingleTransformerEncoderLayer build
from_params
WithParams __init__
TransformerSelfAttentionLayer
TransformerSelfAttentionLayer _construct
_construct
instance = 
<bert.transformer.TransformerSelfAttentionLayer object at 0x7efda2b04760>
from_params
WithParams __init__
ProjectionLayer
ProjectionLayer _construct
_construct
instance = 
<bert.transformer.ProjectionLayer object at 0x7efda2b04040>
SingleTransformerEncoderLayer call
base_layer __call__
situation1
_functional_construction_call
TransformerSelfAttentionLayer build
from_params
WithParams __init__
AttentionLayer

Let's analyse the execution flow from this output.
Execution first enters the __call__ function in base_layer.py.
[screenshot] We can see that 'situation1' is printed first, and then _in_functional_construction_mode is called on the inputs. There is a comment here worth pulling out:

    # Functional Model construction mode is invoked when `Layer`s are called on
    # symbolic `KerasTensor`s, i.e.:
    # >> inputs = tf.keras.Input(10)
    # >> outputs = MyLayer()(inputs)  # Functional construction mode.
    # >> model = tf.keras.Model(inputs, outputs)

In other words, when the model is defined in this way, self._functional_construction_call is invoked directly. Looking at how the bert example defines the model:

input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
output    = bert(input_ids)
cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)
cls_out = keras.layers.Dropout(0.5)(cls_out)
logits = keras.layers.Dense(units=768, activation="tanh")(cls_out)
logits = keras.layers.Dropout(0.5)(logits)
logits = keras.layers.Dense(units=2, activation="softmax")(logits)
model = keras.Model(inputs=input_ids, outputs=logits)
model.build(input_shape=(None, max_seq_len))

This is exactly the first (functional) way of building a Keras model, so self._functional_construction_call is called here. Continuing into self._functional_construction_call, we find the following call.
[screenshot] It then goes on to call BertModelLayer's build function; the current output looks like this.
[screenshot] It is followed by a comment, which I briefly translated.
[screenshot] Here call_fn is set to the corresponding self.call function, and the next statement invokes it.
[screenshot] Let's step into the corresponding def call to take a look.
[screenshot] The base call is empty: it simply returns the inputs and is meant to be overridden by subclasses, which is where the output from the subclass call comes from.
But when I inspected the actual call site, I found that this guess was wrong.
[screenshot] The call function inside BertModelLayer is the following:

    def call(self, inputs, mask=None, training=None):
        print('BertModelLayer call')
        print(sys._getframe().f_code.co_name)
        # name of the current function
        print(sys._getframe().f_back.f_code.co_name)
        if mask is None:
            mask = self.embeddings_layer.compute_mask(inputs)

        embedding_output = self.embeddings_layer(inputs, mask=mask, training=training)
        output           = self.encoders_layer(embedding_output, mask=mask, training=training)
        print('BertModelLayer call finish')
        return output   # [B, seq_len, hidden_size]

From this we can see that it is the converted_call function that invokes call, and the caller one level above converted_call is tf__call, so I guessed that the tf_convert function is what gets called here.
[screenshot] After searching, however, I still could not find where tf_convert or self.call is invoked, so I added a print('run111111111111') right after the call and ran the program again.
[screenshots] The output shows that run111111111111 is printed first and only then is BertModelLayer call invoked, so the call to BertModelLayer's call happens in the code below the run111111111111 print.
Adding more input/output prints in between,
[screenshot] the result of running it is as follows:
[screenshot] So the call to BertEmbeddingsLayer's call happens between run11111111111111111 and run22222222222222222, presumably inside the call_fn in between. Inspecting call_fn leads into the tf_convert function in api.py:

call_fn = autograph.tf_convert(self.call, ag_ctx.control_status_ctx())

Inside call_fn we need to trace how the following chain of calls arises:
BertModelLayer call
converted_call
tf__call
Note that the tf_convert function that builds this wrapper lives in the file api.py.
Looking at the corresponding function in api.py:

def tf_convert(f, ctx, convert_by_default=True, user_requested=False):
  """Decorator that applies AutoGraph to a function.

  Use in internal APIs.

  This API is suitable for high order functions internal to the TensorFlow API,
  and more generally any function to which Autograph is not applied.

  Guidance: convert was a decorator meant for use directly by developers, and
  will be soon deprecated in favor of tf.function. tf_convert is to be called
  from high order functions internal to TF.

  Args:
    f: Callable.
    ctx: ag_ctx.ControlStatusCtx, the Autograph context in which `f` is used.
    convert_by_default: bool, whether to use AutoGraph when the context doesn't
      specify.
    user_requested: bool, whether to ignore the conversion whitelist. See
      ConversionOptions.user_requested.

  Returns:
    Either `f or the converted version of `f`.
  """

  if is_autograph_artifact(f):
    return f
  f_wrapper = f
  decorators, f = tf_decorator.unwrap(f)

  # TODO(mdan): Grab features from context.
  # Note: we pass the original context through to convert to properly handle the
  # following scenario, which can be used inside TF implementations:
  #
  #   ctx = ag_ctx.control_status_ctx()
  #   @function(autograph=False)  # Low-level graph code
  #   def inner_fn():
  #     # The context is disabled here, but should be enabled in user user_fn
  #     tf_convert(user_fn, ctx=ctx)
  if ctx.status == ag_ctx.Status.ENABLED:
    wrapper_factory = convert(
        recursive=True, user_requested=user_requested, conversion_ctx=ctx)
  elif ctx.status == ag_ctx.Status.DISABLED:
    wrapper_factory = do_not_convert
  elif ctx.status == ag_ctx.Status.UNSPECIFIED:
    if convert_by_default:
      wrapper_factory = convert(
          recursive=True, user_requested=user_requested, conversion_ctx=ctx)
    else:
      wrapper_factory = call_with_unspecified_conversion_status
  else:
    assert False, 'This switch contains all possible cases!'
  wrapper = wrapper_factory(f)

  if decorators:
    wrapper = tf_decorator.rewrap(f_wrapper, f, wrapper)

  return autograph_artifact(wrapper)
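
To connect this with the trace above (BertModelLayer call, converted_call, tf__call), here is a hypothetical illustration of wrapping a plain call function with tf_convert, the same way base_layer.__call__ builds call_fn. These are internal TensorFlow modules, so the import paths and behaviour may differ between versions:

import tensorflow as tf
from tensorflow.python.autograph.core import ag_ctx
from tensorflow.python.autograph.impl import api as autograph

def call(x):
    # Python control flow over tensors, the kind of code AutoGraph rewrites
    if tf.reduce_sum(x) > 0:
        return x * 2
    return x

# same pattern as: call_fn = autograph.tf_convert(self.call, ag_ctx.control_status_ctx())
call_fn = autograph.tf_convert(call, ag_ctx.control_status_ctx())

@tf.function
def run(x):
    # inside a traced function, invoking call_fn goes through converted_call,
    # which dispatches to the generated tf__call, matching the printed trace
    return call_fn(x)

print(run(tf.constant([1.0, 2.0])))   # tf.Tensor([2. 4.], ...)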

Having read this far, I felt I was digging too deep into low-level code. I asked the author of toolkit4nlp on GitHub for advice, and his view was that reproducing the architecture does not require reading deeply into the Keras source. So I changed course here and went back to reading the bert-for-tf2 author's own code in depth.
