愤斗的橘子

工具系列：TensorFlow决策森林_(5)使用文本和神经网络特征

文章目录

- 设置
- 使用原始文本作为特征
- 使用预训练的文本嵌入
- 同时训练决策树和神经网络
- - 构建模型
  - 训练和评估模型

欢迎来到 TensorFlow决策森林（ TF-DF）的 中级教程。
在本文中，您将学习有关 TF-DF的一些更高级的功能，包括如何处理自然语言特征。

本文假设您已经熟悉在决策森林中介绍的概念，特别是关于TF-DF的安装。

在本文中，您将会：

训练一个原生地将文本特征作为分类集合的随机森林。
使用TensorFlow Hub模块训练一个使用文本特征的随机森林。在这种情况下（迁移学习），该模块已经在一个大型文本语料库上进行了预训练。
同时训练一个梯度提升决策树（GBDT）和一个神经网络。GBDT将使用神经网络的输出作为输入。

设置

# 安装 TensorFlow Decision Forests 库
!pip install tensorflow_decision_forests

Collecting tensorflow_decision_forests
  Using cached tensorflow_decision_forests-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.0 kB)
Requirement already satisfied: numpy in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.26.2)
Requirement already satisfied: pandas in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.1.3)
Requirement already satisfied: tensorflow~=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: six in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.16.0)
Requirement already satisfied: absl-py in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.4.0)
Requirement already satisfied: wheel in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (0.41.2)
Collecting wurlitzer (from tensorflow_decision_forests)
  Using cached wurlitzer-3.0.3-py3-none-any.whl (7.3 kB)
Requirement already satisfied: astunparse>=1.6.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (23.5.26)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: h5py>=2.9.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.10.0)
Requirement already satisfied: libclang>=13.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (16.0.6)
Requirement already satisfied: ml-dtypes~=0.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.3.0)
Requirement already satisfied: packaging in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (23.2)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.20.3)
Requirement already satisfied: setuptools in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (68.2.2)
Requirement already satisfied: termcolor>=1.1.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.3.0)
Requirement already satisfied: typing-extensions>=3.6.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (4.8.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.14.1)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.34.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.60.0rc1)
Requirement already satisfied: tensorboard<2.16,>=2.15 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.1)
Requirement already satisfied: tensorflow-estimator<2.16,>=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: keras<2.16,>=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3)
Requirement already satisfied: google-auth<3,>=1.6.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.23.4)
Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (1.1.0)
Requirement already satisfied: markdown>=2.6.8 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.5.1)
Requirement already satisfied: requests<3,>=2.21.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.31.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.0.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (5.3.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (1.3.1)
Requirement already satisfied: importlib-metadata>=4.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (6.8.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2023.11.17)
Requirement already satisfied: MarkupSafe>=2.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.1.3)
Requirement already satisfied: zipp>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.17.0)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.2.2)
Using cached tensorflow_decision_forests-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.3 MB)
Installing collected packages: wurlitzer, tensorflow_decision_forests
Successfully installed tensorflow_decision_forests-1.8.1 wurlitzer-3.0.3

Wurlitzer 是在 Colabs 中显示详细的训练日志所需的（当在模型构造函数中使用 verbose=2 时）。

# 安装wurlitzer模块，用于在Jupyter Notebook中显示C语言的输出结果
!pip install wurlitzer

Requirement already satisfied: wurlitzer in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (3.0.3)

导入必要的库。

# 导入所需的库

import tensorflow_decision_forests as tfdf  # 导入决策森林库
import os  # 导入操作系统库
import numpy as np  # 导入数值计算库
import pandas as pd  # 导入数据处理库
import tensorflow as tf  # 导入深度学习库
import math  # 导入数学库

2023-11-20 12:31:20.226021: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-20 12:31:20.226066: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-20 12:31:20.227643: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

隐藏的代码单元格在colab中限制了输出的高度。

#@title

# 导入所需的模块
from IPython.core.magic import register_line_magic
from IPython.display import Javascript
from IPython.display import display as ipy_display

# 定义一个魔术命令，用于设置单元格的最大高度
@register_line_magic
def set_cell_height(size):
  # 调用Javascript代码，设置单元格的最大高度
  ipy_display(
      Javascript("google.colab.output.setIframeHeight(0, true, {maxHeight: " +
                 str(size) + "})"))

使用原始文本作为特征

TF-DF可以原生地处理categorical-set特征。Categorical-sets将文本特征表示为词袋（或n-grams）。

例如："The little blue dog" → {"the", "little", "blue", "dog"}

在这个例子中，您将在Stanford Sentiment Treebank（SST）数据集上训练一个随机森林。该数据集的目标是将句子分类为positive或negative情感。您将使用在TensorFlow Datasets中精选的二分类版本的数据集。

注意： 训练categorical-set特征可能会很昂贵。在这本文中，我们将训练一个包含20棵树的小型随机森林。

# 安装 TensorFlow Datasets 包
!pip install tensorflow-datasets -U --quiet

# 导入tensorflow_datasets库
import tensorflow_datasets as tfds

# 加载数据集
all_ds = tfds.load("glue/sst2")

# 显示测试集中的前3个样例
for example in all_ds["test"].take(3):
  # 打印每个样例的属性名和属性值
  print({attr_name: attr_tensor.numpy() for attr_name, attr_tensor in example.items()})

{'idx': 163, 'label': -1, 'sentence': b'not even the hanson brothers can save it'}
{'idx': 131, 'label': -1, 'sentence': b'strong setup and ambitious goals fade as the film descends into unsophisticated scare tactics and b-film thuggery .'}
{'idx': 1579, 'label': -1, 'sentence': b'too timid to bring a sense of closure to an ugly chapter of the twentieth century .'}


2023-11-20 12:31:28.022927: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

数据集的修改如下：

原始标签是整数{-1, 1}，但学习算法期望的是正整数标签，例如{0, 1}。因此，标签的转换如下：new_labels = (original_labels + 1) / 2。
为了使数据集的读取更加高效，应用了批量大小为64。
sentence属性需要进行分词，即"hello world" -> ["hello", "world"]。

注意： 此示例不使用数据集的test拆分，因为它没有标签。如果test拆分有标签，可以将validation折叠连接到train中（例如all_ds["train"].concatenate(all_ds["validation"])）。

细节： 某些决策森林学习算法不需要验证数据集（例如随机森林），而其他一些算法则需要（例如某些情况下的梯度提升树）。由于TF-DF下的每个学习算法可以以不同的方式使用验证数据，TF-DF在内部处理训练/验证拆分。因此，当您有训练和验证集时，它们可以始终作为学习算法的输入进行连接。

# 定义函数prepare_dataset，用于处理数据集
# 参数example为输入的样本数据
def prepare_dataset(example):
    # 将label加1后除以2，得到标签值
    label = (example["label"] + 1) // 2
    # 将句子按空格进行分割，并返回分割后的结果作为"sentence"键的值
    return {"sentence" : tf.strings.split(example["sentence"])}, label

# 将训练数据集all_ds中的"train"部分进行批处理，每批100个样本，并使用prepare_dataset函数进行处理
train_ds = all_ds["train"].batch(100).map(prepare_dataset)

# 将验证数据集all_ds中的"validation"部分进行批处理，每批100个样本，并使用prepare_dataset函数进行处理
test_ds = all_ds["validation"].batch(100).map(prepare_dataset)

最后，像往常一样训练和评估模型。TF-DF会自动将多值分类特征识别为分类集合。

# 设置单元格高度为300
%set_cell_height 300
# 指定模型为随机森林模型，使用30棵树，verbose参数为2表示输出训练过程中的详细信息
model_1 = tfdf.keras.RandomForestModel(num_trees=30, verbose=2)

# 训练模型，使用train_ds作为训练数据集
model_1.fit(x=train_ds)




Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpp9alip3z as temporary training directory
Reading training dataset...
Training tensor examples:
Features: {'sentence': tf.RaggedTensor(values=Tensor("data:0", shape=(None,), dtype=string), row_splits=Tensor("data_1:0", shape=(None,), dtype=int64))}
Label: Tensor("data_2:0", shape=(None,), dtype=int64)
Weights: None
Normalized tensor features:
 {'sentence': SemanticTensor(semantic=, tensor=tf.RaggedTensor(values=Tensor("data:0", shape=(None,), dtype=string), row_splits=Tensor("data_1:0", shape=(None,), dtype=int64)))}
Training dataset read in 0:00:04.588912. Found 67349 examples.
Training model...
Standard output detected as not visible to the user e.g. running in a notebook. Creating a training log redirection. If training gets stuck, try calling tfdf.keras.set_training_logs_redirection(False).


[INFO 23-11-20 12:31:32.7845 UTC kernel.cc:771] Start Yggdrasil model training
[INFO 23-11-20 12:31:32.7845 UTC kernel.cc:772] Collect training examples
[INFO 23-11-20 12:31:32.7846 UTC kernel.cc:785] Dataspec guide:
column_guides {
  column_name_pattern: "^__LABEL$"
  type: CATEGORICAL
  categorial {
    min_vocab_frequency: 0
    max_vocab_count: -1
  }
}
default_column_guide {
  categorial {
    max_vocab_count: 2000
  }
  discretized_numerical {
    maximum_num_bins: 255
  }
}
ignore_columns_without_guides: false
detect_numerical_as_discretized_numerical: false

[INFO 23-11-20 12:31:32.7849 UTC kernel.cc:391] Number of batches: 674
[INFO 23-11-20 12:31:32.7849 UTC kernel.cc:392] Number of examples: 67349
[INFO 23-11-20 12:31:32.8290 UTC data_spec_inference.cc:305] 12816 item(s) have been pruned (i.e. they are considered out of dictionary) for the column sentence (2000 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO 23-11-20 12:31:32.8820 UTC kernel.cc:792] Training dataset:
Number of records: 67349
Number of columns: 2

Number of columns by type:
	CATEGORICAL_SET: 1 (50%)
	CATEGORICAL: 1 (50%)

Columns:

CATEGORICAL_SET: 1 (50%)
	1: "sentence" CATEGORICAL_SET has-dict vocab-size:2001 num-oods:10187 (15.1257%) most-frequent:"the" 27205 (40.3941%)

CATEGORICAL: 1 (50%)
	0: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

Terminology:
	nas: Number of non-available (i.e. missing) values.
	ood: Out of dictionary.
	manually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.
	tokenized: The attribute value is obtained through tokenization.
	has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
	vocab-size: Number of unique values.

[INFO 23-11-20 12:31:32.8821 UTC kernel.cc:808] Configure learner
[INFO 23-11-20 12:31:32.8823 UTC kernel.cc:822] Training config:
learner: "RANDOM_FOREST"
features: "^sentence$"
label: "^__LABEL$"
task: CLASSIFICATION
random_seed: 123456
metadata {
  framework: "TF Keras"
}
pure_serving_model: false
[yggdrasil_decision_forests.model.random_forest.proto.random_forest_config] {
  num_trees: 30
  decision_tree {
    max_depth: 16
    min_examples: 5
    in_split_min_examples_check: true
    keep_non_leaf_label_distribution: true
    num_candidate_attributes: 0
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    axis_aligned_split {
    }
    internal {
      sorting_strategy: PRESORTED
    }
    uplift {
      min_examples_in_treatment: 5
      split_score: KULLBACK_LEIBLER
    }
  }
  winner_take_all_inference: true
  compute_oob_performances: true
  compute_oob_variable_importances: false
  num_oob_variable_importances_permutations: 1
  bootstrap_training_dataset: true
  bootstrap_size_ratio: 1
  adapt_bootstrap_size_ratio_for_maximum_training_duration: false
  sampling_with_replacement: true
}

[INFO 23-11-20 12:31:32.8826 UTC kernel.cc:825] Deployment config:
cache_path: "/tmpfs/tmp/tmpp9alip3z/working_cache"
num_threads: 32
try_resume_training: true

[INFO 23-11-20 12:31:32.8828 UTC kernel.cc:887] Train model
[INFO 23-11-20 12:31:32.8836 UTC random_forest.cc:416] Training random forest on 67349 example(s) and 1 feature(s).
[INFO 23-11-20 12:32:02.2437 UTC random_forest.cc:802] Training of tree  1/30 (tree index:13) done accuracy:0.738731 logloss:9.4171
[INFO 23-11-20 12:32:12.3428 UTC random_forest.cc:802] Training of tree  3/30 (tree index:27) done accuracy:0.754745 logloss:6.47525
[INFO 23-11-20 12:32:17.6546 UTC random_forest.cc:802] Training of tree  13/30 (tree index:20) done accuracy:0.801813 logloss:2.334
[INFO 23-11-20 12:32:18.5584 UTC random_forest.cc:802] Training of tree  23/30 (tree index:15) done accuracy:0.81742 logloss:0.942096
[INFO 23-11-20 12:32:21.9457 UTC random_forest.cc:802] Training of tree  30/30 (tree index:21) done accuracy:0.821274 logloss:0.854486
[INFO 23-11-20 12:32:21.9462 UTC random_forest.cc:882] Final OOB metrics: accuracy:0.821274 logloss:0.854486
[INFO 23-11-20 12:32:21.9558 UTC kernel.cc:919] Export model in log directory: /tmpfs/tmp/tmpp9alip3z with prefix d2f2a624a65443d5
[INFO 23-11-20 12:32:21.9870 UTC kernel.cc:937] Save model in resources
[INFO 23-11-20 12:32:21.9901 UTC abstract_model.cc:881] Model self evaluation:
Number of predictions (without weights): 67349
Number of predictions (with weights): 67349
Task: CLASSIFICATION
Label: __LABEL

Accuracy: 0.821274  CI95[W][0.818828 0.8237]
LogLoss: : 0.854486
ErrorRate: : 0.178726

Default Accuracy: : 0.557826
Default LogLoss: : 0.686445
Default ErrorRate: : 0.442174

Confusion Table:
truth\prediction
       1      2
1  19593  10187
2   1850  35719
Total: 67349


[INFO 23-11-20 12:32:22.0155 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpp9alip3z/model/ with prefix d2f2a624a65443d5
[INFO 23-11-20 12:32:22.3248 UTC decision_forest.cc:660] Model loaded with 30 root(s), 43180 node(s), and 1 input feature(s).
[INFO 23-11-20 12:32:22.3249 UTC abstract_model.cc:1344] Engine "RandomForestGeneric" built
[INFO 23-11-20 12:32:22.3249 UTC kernel.cc:1061] Use fast generic engine


Model trained in 0:00:49.561739
Compiling model...
Model compiled.

在之前的日志中，注意到 sentence 是一个 CATEGORICAL_SET 特征。

模型的评估与往常一样：

# 对模型进行编译，指定评估指标为准确率
model_1.compile(metrics=["accuracy"])

# 对测试数据集进行评估，返回损失值和准确率
evaluation = model_1.evaluate(test_ds)

# 打印二元交叉熵损失值
print(f"BinaryCrossentropyloss: {evaluation[0]}")

# 打印准确率
print(f"Accuracy: {evaluation[1]}")

1/9 [==>...........................] - ETA: 4s - loss: 0.0000e+00 - accuracy: 0.8100
9/9 [==============================] - 1s 5ms/step - loss: 0.0000e+00 - accuracy: 0.7638
BinaryCrossentropyloss: 0.0
Accuracy: 0.7637614607810974

训练日志如下所示：

# 导入matplotlib.pyplot模块，用于绘制图表
import matplotlib.pyplot as plt

# 获取模型的训练日志
logs = model_1.make_inspector().training_logs()

# 绘制折线图，横坐标为日志中的树的数量，纵坐标为日志中的评估准确率
plt.plot([log.num_trees for log in logs], [log.evaluation.accuracy for log in logs])

# 设置横坐标的标签为"Number of trees"
plt.xlabel("Number of trees")

# 设置纵坐标的标签为"Out-of-bag accuracy"
plt.ylabel("Out-of-bag accuracy")

# 保持图表的原样，不做任何处理
pass

更多的树可能会有益处（我确定，因为我试过:p）。

使用预训练的文本嵌入

前面的例子使用原始文本特征训练了一个随机森林模型。这个例子将使用一个预训练的TF-Hub嵌入将文本特征转换为密集嵌入，并在其上训练一个随机森林模型。在这种情况下，随机森林模型只会“看到”嵌入的数值输出（即它不会看到原始文本）。

在这个实验中，我们将使用Universal-Sentence-Encoder。不同的预训练嵌入可能适用于不同类型的文本（例如不同的语言、不同的任务），也适用于其他类型的结构化特征（例如图像）。

**注意：**这个嵌入模块很大（1GB），因此最终模型的运行速度会比传统的决策树推断慢。

嵌入模块可以应用在两个地方：

在数据集准备阶段。
在模型的预处理阶段。

通常情况下，第二个选项更可取：将嵌入打包到模型中使得模型更容易使用（也更难被误用）。

首先安装TF-Hub：

# 安装tensorflow-hub库的最新版本
!pip install --upgrade tensorflow-hub

Requirement already satisfied: tensorflow-hub in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (0.15.0)
Requirement already satisfied: numpy>=1.12.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-hub) (1.26.2)
Requirement already satisfied: protobuf>=3.19.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-hub) (3.20.3)

与以前不同的是，您不需要对文本进行分词。

# 定义函数prepare_dataset，输入参数为example
def prepare_dataset(example):
    # 将label加1后除以2，得到label的值
    label = (example["label"] + 1) // 2
    # 返回一个字典，键为"sentence"，值为example中的"sentence"对应的值，以及label的值
    return {"sentence" : example["sentence"]}, label

# 将all_ds中的"train"数据集按照batch size为100进行分批，并对每个batch应用prepare_dataset函数进行处理，得到train_ds数据集
train_ds = all_ds["train"].batch(100).map(prepare_dataset)

# 将all_ds中的"validation"数据集按照batch size为100进行分批，并对每个batch应用prepare_dataset函数进行处理，得到test_ds数据集
test_ds = all_ds["validation"].batch(100).map(prepare_dataset)

%set_cell_height 300

# 导入tensorflow_hub模块
import tensorflow_hub as hub
# 定义使用的模型为Universal Sentence Encoder，版本为4
hub_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
# 将模型转换为Keras层
embedding = hub.KerasLayer(hub_url)

# 定义输入层，输入为字符串类型的句子
sentence = tf.keras.layers.Input(shape=(), name="sentence", dtype=tf.string)
# 将句子转换为嵌入向量
embedded_sentence = embedding(sentence)

# 定义原始输入为句子，处理后的输入为嵌入向量
raw_inputs = {"sentence": sentence}
processed_inputs = {"embedded_sentence": embedded_sentence}
# 定义预处理模型，将原始输入转换为处理后的输入
preprocessor = tf.keras.Model(inputs=raw_inputs, outputs=processed_inputs)

# 定义随机森林模型，使用预处理模型进行数据预处理，树的数量为100
model_2 = tfdf.keras.RandomForestModel(
    preprocessing=preprocessor,
    num_trees=100)

# 使用训练数据进行模型训练
model_2.fit(x=train_ds)




Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmp2l8qenh8 as temporary training directory
Reading training dataset...
Training dataset read in 0:00:22.682140. Found 67349 examples.
Training model...


[INFO 23-11-20 12:33:16.6995 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmp2l8qenh8/model/ with prefix a883bbf674954d64


Model trained in 0:00:14.090027
Compiling model...


[INFO 23-11-20 12:33:18.4993 UTC decision_forest.cc:660] Model loaded with 100 root(s), 563552 node(s), and 512 input feature(s).
[INFO 23-11-20 12:33:18.4994 UTC abstract_model.cc:1344] Engine "RandomForestOptPred" built
[INFO 23-11-20 12:33:18.4996 UTC kernel.cc:1061] Use fast generic engine


Model compiled.

# 编译模型
model_2.compile(metrics=["accuracy"])

# 评估模型
evaluation = model_2.evaluate(test_ds)

# 打印二元交叉熵损失
print(f"BinaryCrossentropyloss: {evaluation[0]}")

# 打印准确率
print(f"Accuracy: {evaluation[1]}")

1/9 [==>...........................] - ETA: 13s - loss: 0.0000e+00 - accuracy: 0.7800
4/9 [============>.................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8075 
7/9 [======================>.......] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.7886
9/9 [==============================] - 2s 18ms/step - loss: 0.0000e+00 - accuracy: 0.7798
BinaryCrossentropyloss: 0.0
Accuracy: 0.7798165082931519

注意，分类集合与密集嵌入在表示文本时有所不同，因此同时使用这两种策略可能会很有用。

同时训练决策树和神经网络

前面的例子使用了一个预训练的神经网络（NN）来处理文本特征，然后将它们传递给随机森林。这个例子将从头开始训练神经网络和随机森林。

TF-DF的决策森林不会反向传播梯度（尽管这是正在进行的研究的主题）。因此，训练分为两个阶段：

将神经网络作为标准分类任务进行训练：

示例 → [归一化] → [神经网络*] → [分类头] → 预测
*: 训练。

用随机森林替换神经网络的头（最后一层和软最大值）。像往常一样训练随机森林：

示例 → [归一化] → [神经网络] → [随机森林*] → 预测
*: 训练。

### 准备数据集

本示例使用[Palmer's Penguins](https://allisonhorst.github.io/palmerpenguins/articles/intro.html)数据集。有关详细信息，请参阅[初学者colab](beginner_colab.ipynb)。

首先，下载原始数据：


```python
# 下载penguins.csv文件并保存到指定路径/tmp/penguins.csv

!wget -q https://storage.googleapis.com/download.tensorflow.org/data/palmer_penguins/penguins.csv -O /tmp/penguins.csv

将数据集加载到Pandas Dataframe中

# 读取csv文件，将数据存储在dataset_df中
dataset_df = pd.read_csv("/tmp/penguins.csv")

# 显示dataset_df中前3个样本的数据
dataset_df.head(3)

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	male	2007
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	female	2007
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	female	2007

准备训练数据集。

# 设置标签为"species"
label = "species"
# 将数据集中的数值NaN（表示Pandas Dataframe中的缺失值）替换为0。
# ...神经网络对数值NaN的处理效果不好。
for col in dataset_df.columns:
  # 如果数据集中的列的数据类型不是字符串或对象类型
  if dataset_df[col].dtype not in [str, object]:
    # 将该列中的NaN值替换为0
    dataset_df[col] = dataset_df[col].fillna(0)

# 将数据集拆分为训练集和测试集

def split_dataset(dataset, test_ratio=0.30):
  """将panda dataframe拆分为两个部分。"""
  # 生成一个与数据集长度相同的随机数组，元素值小于测试比例的为True，大于等于测试比例的为False
  test_indices = np.random.rand(len(dataset)) < test_ratio
  # 返回训练集和测试集
  return dataset[~test_indices], dataset[test_indices]

# 调用split_dataset函数将数据集拆分为训练集和测试集
train_ds_pd, test_ds_pd = split_dataset(dataset_df)
# 打印训练集和测试集的样本数量
print("{} 个样本用于训练，{} 个样本用于测试。".format(
    len(train_ds_pd), len(test_ds_pd)))

# 将数据集转换为tensorflow数据集
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label=label)

248 examples in training, 96 examples for testing.

构建模型

接下来，使用Keras的函数式风格创建神经网络模型。

为了保持示例简单，该模型仅使用两个输入。

# 创建两个输入层
input_1 = tf.keras.Input(shape=(1,), name="bill_length_mm", dtype="float")  # 输入层1，表示企鹅嘴峰的长度，数据类型为浮点数
input_2 = tf.keras.Input(shape=(1,), name="island", dtype="string")  # 输入层2，表示企鹅所在的岛屿，数据类型为字符串

# 将两个输入层组合成一个列表
nn_raw_inputs = [input_1, input_2]  # 输入层列表，包含两个输入层

使用预处理层将原始输入转换为适合神经网络的输入。

# 正则化
Normalization = tf.keras.layers.Normalization
CategoryEncoding = tf.keras.layers.CategoryEncoding
StringLookup = tf.keras.layers.StringLookup

# 获取"bill_length_mm"列的值，并将其转换为二维数组
values = train_ds_pd["bill_length_mm"].values[:, tf.newaxis]
# 创建Normalization层实例
input_1_normalizer = Normalization()
# 对输入数据进行适应，计算均值和方差
input_1_normalizer.adapt(values)

# 获取"island"列的值
values = train_ds_pd["island"].values
# 创建StringLookup层实例，将字符串转换为整数索引
input_2_indexer = StringLookup(max_tokens=32)
# 对输入数据进行适应，构建索引映射关系
input_2_indexer.adapt(values)

# 创建CategoryEncoding层实例，将整数索引转换为二进制编码
input_2_onehot = CategoryEncoding(output_mode="binary", max_tokens=32)

# 对输入数据进行正则化处理
normalized_input_1 = input_1_normalizer(input_1)
# 将输入数据转换为整数索引，并进行二进制编码
normalized_input_2 = input_2_onehot(input_2_indexer(input_2))

# 组合处理后的输入数据
nn_processed_inputs = [normalized_input_1, normalized_input_2]

WARNING:tensorflow:max_tokens is deprecated, please use num_tokens instead.


WARNING:tensorflow:max_tokens is deprecated, please use num_tokens instead.

构建神经网络的主体部分：

# 创建一个Concatenate层，用于将nn_processed_inputs中的张量连接起来
y = tf.keras.layers.Concatenate()(nn_processed_inputs)

# 创建一个Dense层，输出维度为16，激活函数为relu6
y = tf.keras.layers.Dense(16, activation=tf.nn.relu6)(y)

# 创建一个Dense层，输出维度为8，激活函数为relu，命名为"last"
last_layer = tf.keras.layers.Dense(8, activation=tf.nn.relu, name="last")(y)

# 创建一个Dense层，输出维度为3，用于分类任务
classification_output = tf.keras.layers.Dense(3)(y)

# 创建一个模型，输入为nn_raw_inputs，输出为classification_output
nn_model = tf.keras.models.Model(nn_raw_inputs, classification_output)

这个 nn_model 直接产生分类的 logits。

接下来创建一个决策森林模型。它将在神经网络在分类头之前提取的高级特征上操作。


# 将神经网络模型去掉输出层，得到nn_without_head模型
# 将神经网络模型和决策森林模型组合成一个keras模型，即df_and_nn_model
nn_without_head = tf.keras.models.Model(inputs=nn_model.inputs, outputs=last_layer)
df_and_nn_model = tfdf.keras.RandomForestModel(preprocessing=nn_without_head)

Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpzwv9a980 as temporary training directory

训练和评估模型

该模型将分为两个阶段进行训练。首先，使用自己的分类头训练神经网络：

# 设置单元格高度为300

# 编译神经网络模型
nn_model.compile(
  optimizer=tf.keras.optimizers.Adam(),  # 使用Adam优化器
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # 使用稀疏分类交叉熵损失函数
  metrics=["accuracy"]  # 评估指标为准确率
)

# 使用训练数据集进行训练，并使用测试数据集进行验证，训练10个epochs
nn_model.fit(x=train_ds, validation_data=test_ds, epochs=10)

# 打印神经网络模型的概要信息
nn_model.summary()




Epoch 1/10


/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/src/engine/functional.py:642: UserWarning: Input dict contained keys ['bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year'] which did not match any model input. They will be ignored by the model.
  inputs = self._flatten_to_reference_inputs(inputs)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1700483606.110085  457876 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.



1/1 [==============================] - ETA: 0s - loss: 1.4043 - accuracy: 0.0161
1/1 [==============================] - 2s 2s/step - loss: 1.4043 - accuracy: 0.0161 - val_loss: 1.3502 - val_accuracy: 0.0208
Epoch 2/10

1/1 [==============================] - ETA: 0s - loss: 1.3963 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3963 - accuracy: 0.0161 - val_loss: 1.3436 - val_accuracy: 0.0208
Epoch 3/10

1/1 [==============================] - ETA: 0s - loss: 1.3885 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3885 - accuracy: 0.0161 - val_loss: 1.3371 - val_accuracy: 0.0208
Epoch 4/10

1/1 [==============================] - ETA: 0s - loss: 1.3809 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3809 - accuracy: 0.0161 - val_loss: 1.3305 - val_accuracy: 0.0208
Epoch 5/10

1/1 [==============================] - ETA: 0s - loss: 1.3733 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3733 - accuracy: 0.0161 - val_loss: 1.3241 - val_accuracy: 0.0208
Epoch 6/10

1/1 [==============================] - ETA: 0s - loss: 1.3658 - accuracy: 0.0161
1/1 [==============================] - 0s 20ms/step - loss: 1.3658 - accuracy: 0.0161 - val_loss: 1.3177 - val_accuracy: 0.0208
Epoch 7/10

1/1 [==============================] - ETA: 0s - loss: 1.3584 - accuracy: 0.0161
1/1 [==============================] - 0s 20ms/step - loss: 1.3584 - accuracy: 0.0161 - val_loss: 1.3113 - val_accuracy: 0.0208
Epoch 8/10

1/1 [==============================] - ETA: 0s - loss: 1.3511 - accuracy: 0.0081
1/1 [==============================] - 0s 20ms/step - loss: 1.3511 - accuracy: 0.0081 - val_loss: 1.3050 - val_accuracy: 0.0208
Epoch 9/10

1/1 [==============================] - ETA: 0s - loss: 1.3440 - accuracy: 0.0081
1/1 [==============================] - 0s 21ms/step - loss: 1.3440 - accuracy: 0.0081 - val_loss: 1.2988 - val_accuracy: 0.0208
Epoch 10/10

1/1 [==============================] - ETA: 0s - loss: 1.3369 - accuracy: 0.0121
1/1 [==============================] - 0s 21ms/step - loss: 1.3369 - accuracy: 0.0121 - val_loss: 1.2927 - val_accuracy: 0.0312
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 island (InputLayer)         [(None, 1)]                  0         []                            
                                                                                                  
 bill_length_mm (InputLayer  [(None, 1)]                  0         []                            
 )                                                                                                
                                                                                                  
 string_lookup (StringLooku  (None, 1)                    0         ['island[0][0]']              
 p)                                                                                               
                                                                                                  
 normalization (Normalizati  (None, 1)                    3         ['bill_length_mm[0][0]']      
 on)                                                                                              
                                                                                                  
 category_encoding (Categor  (None, 32)                   0         ['string_lookup[0][0]']       
 yEncoding)                                                                                       
                                                                                                  
 concatenate (Concatenate)   (None, 33)                   0         ['normalization[0][0]',       
                                                                     'category_encoding[0][0]']   
                                                                                                  
 dense (Dense)               (None, 16)                   544       ['concatenate[0][0]']         
                                                                                                  
 dense_1 (Dense)             (None, 3)                    51        ['dense[0][0]']               
                                                                                                  
==================================================================================================
Total params: 598 (2.34 KB)
Trainable params: 595 (2.32 KB)
Non-trainable params: 3 (16.00 Byte)
__________________________________________________________________________________________________

神经网络层在两个模型之间共享。因此，现在神经网络已经训练好了，决策森林模型将适应于神经网络层的训练输出。

# 使用df_and_nn_model模型对训练数据集train_ds进行拟合

df_and_nn_model.fit(x=train_ds)




Reading training dataset...
Training dataset read in 0:00:00.293304. Found 248 examples.
Training model...
Model trained in 0:00:00.045032
Compiling model...
Model compiled.


[INFO 23-11-20 12:33:27.2559 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpzwv9a980/model/ with prefix 3397b294ee2f42a4
[INFO 23-11-20 12:33:27.2721 UTC decision_forest.cc:660] Model loaded with 300 root(s), 5280 node(s), and 7 input feature(s).
[INFO 23-11-20 12:33:27.2721 UTC kernel.cc:1061] Use fast generic engine

现在评估组合模型：

# 编译模型
df_and_nn_model.compile(metrics=["accuracy"])

# 打印模型评估结果
print("Evaluation:", df_and_nn_model.evaluate(test_ds))

1/1 [==============================] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.9479
1/1 [==============================] - 0s 162ms/step - loss: 0.0000e+00 - accuracy: 0.9479
Evaluation: [0.0, 0.9479166865348816]

与仅使用神经网络相比：

# 打印输出 "Evaluation :"，并调用神经网络模型的 evaluate 方法对测试数据集进行评估
print("Evaluation :", nn_model.evaluate(test_ds))

1/1 [==============================] - ETA: 0s - loss: 1.2927 - accuracy: 0.0312
1/1 [==============================] - 0s 13ms/step - loss: 1.2927 - accuracy: 0.0312
Evaluation : [1.2926578521728516, 0.03125]

你可能感兴趣的:(数据挖掘,tensorflow,神经网络)

nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
Python开发常用的三方模块如下：换个网名有点难 python 开发语言
Python是一门功能强大的编程语言，拥有丰富的第三方库，这些库为开发者提供了极大的便利。以下是100个常用的Python库，涵盖了多个领域：1、NumPy，用于科学计算的基础库。2、Pandas，提供数据结构和数据分析工具。3、Matplotlib，一个绘图库。4、Scikit-learn，机器学习库。5、SciPy，用于数学、科学和工程的库。6、TensorFlow，由Google开发的开源机
ai绘画工具midjourney怎么下载？附作品管理教程设计师早上好
Midjourney是一款功能强大的AI绘画工具，它使用机器学习技术和深度神经网络等算法，可以生成各种艺术风格的绘画作品。在创意设计、广告宣传等方面有着广泛的应用前景。那么，ai绘画工具midjourney怎么下载？本文将为您介绍Midjourney的下载以及作品的相关管理。一、Midjourney下载Midjourney的下载非常简单，只需打开Midjourney官网（点击“GetMidjour
Python实现关联规则推荐这孩子谁懂哈 Python Machine Learning python 关联规则机器学习
1.什么关联规则关联规则（AssociationRules）是反映一个事物与其他事物之间的相互依存性和关联性，如果两个或多个事物之间存在一定的关联关系，那么，其中一个事物就能通过其他事物预测到。关联规则是数据挖掘的一个重要技术，用于从大量数据中挖掘出有价值的数据项之间的相关关系。关联规则挖掘的最经典的例子就是沃尔玛的啤酒与尿布的故事，通过对超市购物篮数据进行分析，即顾客放入购物篮中不同商品之间的关
吴恩达深度学习笔记(30)-正则化的解释极客Array
正则化（Regularization）深度学习可能存在过拟合问题——高方差，有两个解决方法，一个是正则化，另一个是准备更多的数据，这是非常可靠的方法，但你可能无法时时刻刻准备足够多的训练数据或者获取更多数据的成本很高，但正则化通常有助于避免过拟合或减少你的网络误差。如果你怀疑神经网络过度拟合了数据，即存在高方差问题，那么最先想到的方法可能是正则化，另一个解决高方差的方法就是准备更多数据，这也是非常
个人学习笔记7-6：动手学深度学习pytorch版-李沐浪子L 深度学习深度学习笔记计算机视觉 python 人工智能神经网络 pytorch
#人工智能##深度学习##语义分割##计算机视觉##神经网络#计算机视觉13.11全卷积网络全卷积网络（fullyconvolutionalnetwork，FCN）采用卷积神经网络实现了从图像像素到像素类别的变换。引入l转置卷积（transposedconvolution）实现的，输出的类别预测与输入图像在像素级别上具有一一对应关系：通道维的输出即该位置对应像素的类别预测。13.11.1构造模型下
计算机视觉中，Pooling的作用 Wils0nEdwards 计算机视觉人工智能
在计算机视觉中，Pooling（池化）是一种常见的操作，主要用于卷积神经网络（CNN）中。它通过对特征图进行下采样，减少数据的空间维度，同时保留重要的特征信息。Pooling的作用可以归纳为以下几个方面：1.降低计算复杂度与内存需求Pooling操作通过对特征图进行下采样，减少了特征图的空间分辨率（例如，高度和宽度）。这意味着网络需要处理的数据量会减少，从而降低了计算量和内存需求。这对大型神经网络
神经网络-损失函数红米煮粥神经网络人工智能深度学习
文章目录一、回归问题的损失函数1.均方误差（MeanSquaredError,MSE）2.平均绝对误差（MeanAbsoluteError,MAE）二、分类问题的损失函数1.0-1损失函数（Zero-OneLossFunction）2.交叉熵损失（Cross-EntropyLoss）3.合页损失（HingeLoss）三、总结在神经网络中，损失函数（LossFunction）扮演着至关重要的角色，它
BP神经网络的传递函数大胜归来19 MATLAB
BP网络一般都是用三层的，四层及以上的都比较少用；传输函数的选择，这个怎么说，假设你想预测的结果是几个固定值，如1,0等，满足某个条件输出1，不满足则0的话，首先想到的是hardlim函数，阈值型的，当然也可以考虑其他的；然后，假如网络是用来表达某种线性关系时，用purelin---线性传输函数；若是非线性关系的话，用别的非线性传递函数，多层网络时，每层不一定要用相同的传递函数，可以是三种配合，可
神经网络传递函数sigmoid,神经网络传递函数作用快乐的小荣荣神经网络机器学习深度学习人工智能
神经网络传递函数选取不同会有特别大差别嘛？只是最后一层，但前面层是非线性，那么可能存在区别不大的情况。线性函数f(a*input)=af(input),一般来说，input为向量，最简化情况下，可以假设input的各个维度，a1=a2=a3。。。意味着你线性层只是简单的对输入做了scale~而神经网络能起作用的原因，在于通过足够复杂的非线性函数，来模拟任何的分布。所以，神经网络必须要用非线性函数。
Python和R均方根误差平均绝对误差算法模型亚图跨际 Python 交叉知识 R 回归模型误差指标归一化均方根误差生态状态指标神经网络成本误差气体排放气候模型多项式拟合
要点回归模型误差评估指标归一化均方根误差生态状态指标神经网络成本误差计算气体排放气候算法模型Python误差指标均方根误差和平均绝对误差均方根偏差或均方根误差是两个密切相关且经常使用的度量值之一，用于衡量真实值或预测值与观测值或估计值之间的差异。估计器θ^\hat{\theta}θ^相对于估计参数θ\thetaθ的RMSD定义为均方误差的平方根：RMSD⁡(θ^)=MSE⁡(θ^)=E((θ^−θ
CV、NLP、数据控掘推荐、量化海的那边- AI算法自然语言处理人工智能
下面是对CV（计算机视觉）、NLP（自然语言处理）、数据挖掘推荐和量化的简要概述及其应用领域的介绍：1.CV（计算机视觉，ComputerVision）定义：计算机视觉是一门让计算机能够从图像或视频中提取有用信息，并做出决策的学科。它通过模拟人类的视觉系统来识别、处理和理解视觉信息。主要任务：图像分类：识别图像中的物体并分类，比如猫、狗、车等。目标检测：在图像或视频中定位并识别多个对象，如人脸检测
【机器学习与R语言】1-机器学习简介苹果酱0567 面试题汇总与解析 java 中间件开发语言 spring boot 后端
1.基本概念机器学习：发明算法将数据转化为智能行为数据挖掘VS机器学习：前者侧重寻找有价值的信息，后者侧重执行已知的任务。后者是前者的先期准备过程：数据——>抽象化——>一般化。或者：收集数据——推理数据——归纳数据——发现规律抽象化：训练：用一个特定模型来拟合数据集的过程用方程来拟合观测的数据：观测现象——数据呈现——模型建立。通过不同的格式来把信息概念化一般化：一般化：将抽象化的知识转换成可用
【NLP5-RNN模型、LSTM模型和GRU模型】一蓑烟雨紫洛 nlp rnn lstm gru nlp
RNN模型、LSTM模型和GRU模型1、什么是RNN模型RNN（RecurrentNeuralNetwork)中文称为循环神经网络，它一般以序列数据为输入，通过网络内部的结构设计有效捕捉序列之间的关系特征，一般也是以序列形式进行输出RNN的循环机制使模型隐层上一时间步产生的结果，能够作为当下时间步输入的一部分（当下时间步的输入除了正常的输入外还包括上一步的隐层输出）对当下时间步的输出产生影响2、R
基于深度学习的农作物病害检测 SEU-WYL 深度学习dnn 深度学习人工智能
基于深度学习的农作物病害检测利用卷积神经网络（CNN）、生成对抗网络（GAN）、Transformer等深度学习技术，自动识别和分类农作物的病害，帮助农业工作者提高作物管理效率、减少损失。1.农作物病害检测的挑战病害种类繁多：农作物病害的类型多样，不同病害在同一作物上的表现差异很大，同时同一种病害在不同生长阶段的症状也可能不同。环境影响：天气、光照、湿度等外部环境因素会影响农作物的表现，使得病害检
系统架构师软考历年论文题目（2009-2024年）及分析 pccai-vip 系统架构师系统架构
时间题目20091.论基于DSSA的软件架构设计与应用；2.论信息系统建模方法；3.论基于REST服务的Web应用系统设计；4.论软件可靠性设计与应用20101.论软件的静态演化和动态演化及其应用；2.论数据挖掘技术的应用；3.论大规模分布式系统缓存设计策略；4.论软件可靠性评价20111.论模型驱动架构在系统开发中的应用；2.论企业集成平台的架构设计；3.论企业架构管理与应用；4.论软件需求获取
深度学习--对抗生成网络（GAN, Generative Adversarial Network） Ambition_LAO 深度学习生成对抗网络
对抗生成网络（GAN,GenerativeAdversarialNetwork）是一种深度学习模型，由IanGoodfellow等人在2014年提出。GAN主要用于生成数据，通过两个神经网络相互对抗，来生成以假乱真的新数据。以下是对GAN的详细阐述，包括其概念、作用、核心要点、实现过程、代码实现和适用场景。1.概念GAN由两个神经网络组成：生成器（Generator）和判别器（Discrimina
大数据新视界 --大数据大厂之数据挖掘入门：用 R 语言开启数据宝藏的探索之旅青云交大数据新视界数据库大数据数据挖掘 R 语言算法案例未来趋势应用场景学习建议大数据新视界
亲爱的朋友们，热烈欢迎你们来到青云交的博客！能与你们在此邂逅，我满心欢喜，深感无比荣幸。在这个瞬息万变的时代，我们每个人都在苦苦追寻一处能让心灵安然栖息的港湾。而我的博客，正是这样一个温暖美好的所在。在这里，你们不仅能够收获既富有趣味又极为实用的内容知识，还可以毫无拘束地畅所欲言，尽情分享自己独特的见解。我真诚地期待着你们的到来，愿我们能在这片小小的天地里共同成长，共同进步。本博客的精华专栏：Ja
chatgpt赋能python：如何在Python中安装Keras库？ turensu ChatGpt python chatgpt keras 计算机
如何在Python中安装Keras库？Keras是一个简单易用的神经网络库，由FrançoisChollet编写。它在Python编程语言中实现了深度学习的功能，可以使您更轻松地构建和试验不同类型的神经网络。如果您是一名Python开发人员，肯定会想知道如何在您的Python项目中安装Keras库。在本文中，我们将向您展示如何安装和配置Keras库。步骤1：安装Python要使用Keras库，您需
如何理解深度学习的训练过程奋斗的草莓熊深度学习人工智能 python scikit-learn virtualenv numpy pandas
文章目录1.训练是干什么？2.预训练模型进行训练，主要更改的是预训练模型的什么东西？1.训练是干什么？以yolov5为例子，训练的目的是把一组输入猫狗图像放到神经网络中，得到一个输出模型，这个模型下次可以直接用来识别哪个是猫，哪个是狗2.预训练模型进行训练，主要更改的是预训练模型的什么东西？超参数（Hyperparameters）：这是模型结构中定义的参数，比如：卷积核大小（kernel_size
Keras深度学习框架入门及实战指南司莹嫣Maude
Keras深度学习框架入门及实战指南keraskeras-team/keras:是一个基于Python的深度学习库，它没有使用数据库。适合用于深度学习任务的开发和实现，特别是对于需要使用Python深度学习库的场景。特点是深度学习库、Python、无数据库。项目地址:https://gitcode.com/gh_mirrors/ke/keras一、项目介绍Keras简介Keras是一款高级神经网络
每天五分钟玩转深度学习PyTorch：模型参数优化器torch.optim 幻风_huanfeng 深度学习框架pytorch 深度学习 pytorch 人工智能神经网络机器学习优化算法
本文重点在机器学习或者深度学习中，我们需要通过修改参数使得损失函数最小化(或最大化)，优化算法就是一种调整模型参数更新的策略。在pytorch中定义了优化器optim，我们可以使用它调用封装好的优化算法，然后传递给它神经网络模型参数，就可以对模型进行优化。本文是学习第6步(优化器)，参考链接pytorch的学习路线随机梯度下降算法在深度学习和机器学习中，梯度下降算法是最常用的参数更新方法，它的公式
大数据之flink与hive 星辰_mya 大数据 flink hive
其实吧我不太想写flink，因为线上经验确实不多，这也是我需要补的地方，没有条件创造条件，先来一篇吧flink：高性能低延迟流批一体的分布式计算框架基于事件时间对实时数据精准处理快速响应支持批处理，高效离线分析和数据挖掘数据仓库的引擎丰富数据源/接收器，集成多种数据存储格式和源，比较常见就是咱们今天的主题hive了checkpoint恢复机制，故障恢复快速恢复计算任务分布式弹性扩展，据业务灵活增加
如何有效的学习AI大模型？ Python程序员罗宾学习人工智能语言模型自然语言处理架构
学习AI大模型是一个系统性的过程，涉及到多个学科的知识。以下是一些建议，帮助你更有效地学习AI大模型：基础知识储备：数学基础：学习线性代数、概率论、统计学和微积分等，这些是理解机器学习算法的数学基础。编程技能：掌握至少一种编程语言，如Python，因为大多数AI模型都是用Python实现的。理论学习：机器学习基础：了解监督学习、非监督学习、强化学习等基本概念。深度学习：学习神经网络的基本结构，如卷
【3.6 python中的numpy编写一个“手写数字识”的神经网络】 wang151038606 深度学习入门 python numpy 神经网络
3.6python中的numpy编写一个“手写数字识”的神经网络要使用Python中的NumPy库从头开始编写一个“手写数字识别”的神经网络，我们通常会处理MNIST数据集，这是一个广泛使用的包含手写数字的图像数据集。但是，完全用NumPy来实现神经网络（包括数据的加载、预处理、模型定义、前向传播、损失计算、反向传播和权重更新）是一个相当复杂的任务，因为NumPy本身不提供自动微分或高级优化算法（
yolov5单目测距+速度测量+目标跟踪 cv_2025 YOLO 目标跟踪人工智能计算机视觉机器学习图像处理 opencv
要在YOLOv5中添加测距和测速功能，您需要了解以下两个部分的原理：单目测距算法单目测距是使用单个摄像头来估计场景中物体的距离。常见的单目测距算法包括基于视差的方法（如立体匹配）和基于深度学习的方法（如神经网络）。基于深度学习的方法通常使用卷积神经网络（CNN）来学习从图像到深度图的映射关系。单目测距代码单目测距涉及到坐标转换，代码如下：defconvert_2D_to_3D(point2D,R,
探索深度学习的奥秘：从理论到实践的奇幻之旅小周不想卷深度学习
目录引言：穿越智能的迷雾一、深度学习的奇幻起源：从感知机到神经网络1.1感知机的启蒙1.2神经网络的诞生与演进1.3深度学习的崛起二、深度学习的核心魔法：神经网络架构2.1前馈神经网络（FeedforwardNeuralNetwork,FNN）2.2卷积神经网络（CNN）2.3循环神经网络（RNN）及其变体（LSTM,GRU）2.4生成对抗网络（GAN）三、深度学习的魔法秘籍：算法与训练3.1损失
纯生信很难发表？只是你没有及时抓住研究热点 SCI狂人团队
当你还做meta分析的时候，你会发现meta分析很难发或者单位已经不承认了，而聪明的人已经开始做常规的生信GEO、TCGA数据挖掘这些（这个时候生信比较好发）。当你开始做常规的生信GEO、TCGA数据挖掘的时候，你会发现这些一样也是比较难发了，而聪明的人已经开始抓免疫评分这个热点进行生信数据挖掘（这个时候免疫评分比较好发）。当你开始对免疫评分这个热点进行生信数据挖掘的时候，你会发现自己的研究方向差
卷积神经网络（CNN）详细介绍及其原理详解（二） FFmpeg123 Pytorch cnn 深度学习人工智能
接上一文继续;五、全连接层假设还是上面人的脑袋的示例，现在我们已经通过卷积和池化提取到了这个人的眼睛、鼻子和嘴的特征，如果我想利用这些特征来识别这个图片是否是人的脑袋该怎么办呢？此时我们只需要将提取到的所有特征图进行“展平”，将其维度变为1×x1×x1×x，这个过程就是全连接的过程。也就是说，此步我们将所有的特征都展开并进行运算，最后会得到一个概率值，这个概率值就是输入图片是否是人的概率，这个过程
关于python版本与TensorFlow安装的版本问题 iiimharrygGc. python tensorflow 开发语言
实测在conda环境下，python3.12的版本无法安装TensorFlow2.14.0（截至2024.5.21）最新版本在python3.7版本下正常安装ps：上述安装均在anacondanavigator软件内安装
rust的指针作为函数返回值是直接传递，还是先销毁后创建？ wudixiaotie 返回值
这是我自己想到的问题，结果去知呼提问，还没等别人回答，我自己就想到方法实验了。。 fn main() { let mut a = 34; println!("a's addr:{:p}", &a); let p = &mut a; println!("p's addr:{:p}", &a
java编程思想 -- 数据的初始化百合不是茶 java 数据的初始化
1.使用构造器确保数据初始化 /* *在ReckInitDemo类中创建Reck的对象 */ public class ReckInitDemo { public static void main(String[] args) { //创建Reck对象 new Reck(); } }
[航天与宇宙]为什么发射和回收航天器有档期 comsci
地球的大气层中有一个时空屏蔽层,这个层次会不定时的出现,如果该时空屏蔽层出现,那么将导致外层空间进入的任何物体被摧毁,而从地面发射到太空的飞船也将被摧毁... 所以,航天发射和飞船回收都需要等待这个时空屏蔽层消失之后,再进行 &
linux下批量替换文件内容商人shang linux 替换
1、网络上现成的资料　　格式: sed -i "s/查找字段/替换字段/g" `grep 查找字段 -rl 路径` 　　linux sed 批量替换多个文件中的字符串　　sed -i "s/oldstring/newstring/g" `grep oldstring -rl yourdir` 　　例如：替换/home下所有文件中的www.admi
网页在线天气预报 oloz 天气预报
网页在线调用天气预报 <%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transit
SpringMVC和Struts2比较杨白白 springMVC
1. 入口 spring mvc的入口是servlet，而struts2是filter（这里要指出，filter和servlet是不同的。以前认为filter是servlet的一种特殊），这样就导致了二者的机制不同，这里就牵涉到servlet和filter的区别了。参见：http://blog.csdn.net/zs15932616453/article/details/8832343 2
refuse copy, lazy girl! 小桔子 copy
妹妹坐船头啊啊啊啊！都打算一点点琢磨呢。文字编辑也写了基本功能了。。今天查资料，结果查到了人家写得完完整整的。我清楚的认识到： 1.那是我自己觉得写不出的高度 2.如果直接拿来用，很快就能解决问题 3.然后就是抄咩~~ 4.肿么可以这样子，都不想写了今儿个，留着作参考吧！拒绝大抄特抄，慢慢一点点写！
apache与php整合 aichenglong php apache web
一 apache web服务器 1 apeche web服务器的安装 1)下载Apache web服务器 2)配置域名(如果需要使用要在DNS上注册) 3)测试安装访问http://localhost/验证是否安装成功 2 apache管理 1)service.msc进行图形化管理 2)命令管理，配
Maven常用内置变量 AILIKES maven
Built-in properties ${basedir} represents the directory containing pom.xml ${version} equivalent to ${project.version} (deprecated: ${pom.version}) Pom/Project properties Al
java的类和对象百合不是茶 JAVA面向对象类对象
java中的类： java是面向对象的语言，解决问题的核心就是将问题看成是一个类，使用类来解决 java使用 class 类名来创建类，在Java中类名要求和构造方法，Java的文件名是一样的创建一个A类： class A{ } java中的类：将某两个事物有联系的属性包装在一个类中，再通
JS控制页面输入框为只读 bijian1013 JavaScript
在WEB应用开发当中，增、删除、改、查功能必不可少，为了减少以后维护的工作量，我们一般都只做一份页面，通过传入的参数控制其是新增、修改或者查看。而修改时需将待修改的信息从后台取到并显示出来，实际上就是查看的过程，唯一的区别是修改时，页面上所有的信息能修改，而查看页面上的信息不能修改。因此完全可以将其合并，但通过前端JS将查看页面的所有信息控制为只读，在信息量非常大时，就比较麻烦。
AngularJS与服务器交互 bijian1013 JavaScript AngularJS $http
对于AJAX应用（使用XMLHttpRequests）来说，向服务器发起请求的传统方式是：获取一个XMLHttpRequest对象的引用、发起请求、读取响应、检查状态码，最后处理服务端的响应。整个过程示例如下： var xmlhttp = new XMLHttpRequest(); xmlhttp.onreadystatechange
[Maven学习笔记八]Maven常用插件应用 bit1129 maven
常用插件及其用法位于：http://maven.apache.org/plugins/ 1. Jetty server plugin 2. Dependency copy plugin 3. Surefire Test plugin 4. Uber jar plugin 1. Jetty Pl
【Hive六】Hive用户自定义函数(UDF) bit1129 自定义函数
1. 什么是Hive UDF Hive是基于Hadoop中的MapReduce，提供HQL查询的数据仓库。Hive是一个很开放的系统，很多内容都支持用户定制，包括：文件格式：Text File，Sequence File 内存中的数据格式： Java Integer/String, Hadoop IntWritable/Text 用户提供的 map/reduce 脚本：不管什么
杀掉nginx进程后丢失nginx.pid，如何重新启动nginx ronin47 nginx 重启 pid丢失
nginx进程被意外关闭，使用nginx -s reload重启时报如下错误：nginx: [error] open() “/var/run/nginx.pid” failed (2: No such file or directory)这是因为nginx进程被杀死后pid丢失了，下一次再开启nginx -s reload时无法启动解决办法：nginx -s reload 只是用来告诉运行中的ng
UI设计中我们为什么需要设计动效 brotherlamp UI ui教程 ui视频 ui资料 ui自学
随着国际大品牌苹果和谷歌的引领，最近越来越多的国内公司开始关注动效设计了，越来越多的团队已经意识到动效在产品用户体验中的重要性了，更多的UI设计师们也开始投身动效设计领域。但是说到底，我们到底为什么需要动效设计？或者说我们到底需要什么样的动效？做动效设计也有段时间了，于是尝试用一些案例，从产品本身出发来说说我所思考的动效设计。一、加强体验舒适度嗯，就是让用户更加爽更加爽的用你的产品。
Spring中JdbcDaoSupport的DataSource注入问题 bylijinnan java spring
参考以下两篇文章： http://www.mkyong.com/spring/spring-jdbctemplate-jdbcdaosupport-examples/ http://stackoverflow.com/questions/4762229/spring-ldap-invoking-setter-methods-in-beans-configuration Sprin
数据库连接池的工作原理 chicony 数据库连接池
随着信息技术的高速发展与广泛应用，数据库技术在信息技术领域中的位置越来越重要，尤其是网络应用和电子商务的迅速发展，都需要数据库技术支持动态Web站点的运行，而传统的开发模式是：首先在主程序（如Servlet、Beans）中建立数据库连接；然后进行SQL操作，对数据库中的对象进行查询、修改和删除等操作；最后断开数据库连接。使用这种开发模式，对
java 关键字 CrazyMizzz java
关键字是事先定义的，有特别意义的标识符，有时又叫保留字。对于保留字，用户只能按照系统规定的方式使用，不能自行定义。 Java中的关键字按功能主要可以分为以下几类：（1）访问修饰符 public,private,protected p
Hive中的排序语法 daizj 排序 hive order by DISTRIBUTE BY sort by
Hive中的排序语法 2014.06.22 ORDER BY hive中的ORDER BY语句和关系数据库中的sql语法相似。他会对查询结果做全局排序，这意味着所有的数据会传送到一个Reduce任务上，这样会导致在大数量的情况下，花费大量时间。与数据库中 ORDER BY 的区别在于在hive.mapred.mode = strict模式下，必须指定 limit 否则执行会报错。
单态设计模式 dcj3sjt126com 设计模式
单例模式（Singleton）用于为一个类生成一个唯一的对象。最常用的地方是数据库连接。使用单例模式生成一个对象后，该对象可以被其它众多对象所使用。 <?phpclass Example{ // 保存类实例在此属性中 private static&
svn locked dcj3sjt126com Lock
post-commit hook failed (exit code 1) with output: svn: E155004: Working copy 'D:\xx\xxx' locked svn: E200031: sqlite: attempt to write a readonly database svn: E200031: sqlite: attempt to write a
ARM寄存器学习 e200702084 数据结构 C++c C#F#
无论是学习哪一种处理器，首先需要明确的就是这种处理器的寄存器以及工作模式。 ARM有37个寄存器，其中31个通用寄存器，6个状态寄存器。 1、不分组寄存器（R0-R7）不分组也就是说说，在所有的处理器模式下指的都时同一物理寄存器。在异常中断造成处理器模式切换时，由于不同的处理器模式使用一个名字相同的物理寄存器，就是
常用编码资料 gengzg 编码
List<UserInfo> list=GetUserS.GetUserList(11); String json=JSON.toJSONString(list); HashMap<Object,Object> hs=new HashMap<Object, Object>(); for(int i=0;i<10;i++) {
进程 vs. 线程 hongtoushizi 线程 linux 进程
我们介绍了多进程和多线程，这是实现多任务最常用的两种方式。现在，我们来讨论一下这两种方式的优缺点。首先，要实现多任务，通常我们会设计Master-Worker模式，Master负责分配任务，Worker负责执行任务，因此，多任务环境下，通常是一个Master，多个Worker。如果用多进程实现Master-Worker，主进程就是Master，其他进程就是Worker。如果用多线程实现
Linux定时Job：crontab -e 与 /etc/crontab 的区别 Josh_Persistence linux crontab
一、linux中的crotab中的指定的时间只有5个部分：* * * * * 分别表示：分钟，小时，日，月，星期，具体说来：第一段代表分钟 0—59 第二段代表小时 0—23 第三段代表日期 1—31 第四段代表月份 1—12 第五段代表星期几，0代表星期日 0—6 如： */1 * * * * 每分钟执行一次。 *
KMP算法详解 hm4123660 数据结构 C++算法字符串 KMP
字符串模式匹配我们相信大家都有遇过，然而我们也习惯用简单匹配法（即Brute-Force算法)，其基本思路就是一个个逐一对比下去，这也是我们大家熟知的方法，然而这种算法的效率并不高，但利于理解。假设主串s="ababcabcacbab",模式串为t="
枚举类型的单例模式 zhb8015 单例模式
E.编写一个包含单个元素的枚举类型[极推荐]。代码如下： public enum MaYun {himself; //定义一个枚举的元素，就代表MaYun的一个实例private String anotherField;MaYun() {//MaYun诞生要做的事情//这个方法也可以去掉。将构造时候需要做的事情放在instance赋值的时候：/** himself = MaYun() {*
Kafka+Storm+HDFS ssydxa219 storm
cd /myhome/usr/stormbin/storm nimbus &bin/storm supervisor &bin/storm ui &Kafka+Storm+HDFS整合实践kafka_2.9.2-0.8.1.1.tgzapache-storm-0.9.2-incubating.tar.gzKafka安装配置我们使用3台机器搭建Kafk
Java获取本地服务器的IP 中华好儿孙 java Web 获取服务器ip地址
System.out.println("getRequestURL:"+request.getRequestURL()); System.out.println("getLocalAddr:"+request.getLocalAddr()); System.out.println("getLocalPort:&quo