愤斗的橘子

Tool series: TensorFlow Decision Forests (10): Building an Uplift Model

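Table of contents

- Installing TensorFlow Decision Forests
- Importing libraries
- What is uplift modeling?
  - Defining uplift models in TF-DF
- Training an uplift model
  - Dataset preprocessing
  - Model training
- Evaluating uplift models
  - Metrics for uplift models
  - Model self-evaluation
  - Manually computing the AUUC
    - Computing the AUUC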

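Welcome to the uplift modeling tutorial for TensorFlow Decision Forests (TF-DF). In this tutorial, you will learn what uplift modeling is, why it is important, and how to do it in TF-DF.

In this article, you will:

- Learn what an uplift model is.
- Train an uplift Random Forest model on the Hillstrom email marketing dataset.
- Evaluate the quality of this model.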

Installing TensorFlow Decision Forests

Install TF-DF by running the following cell.

Wurlitzer is needed to display the detailed training logs (when using verbose=2 in the model constructor).


# Install the tensorflow_decision_forests and wurlitzer packages
!pip install tensorflow_decision_forests wurlitzer

Collecting tensorflow_decision_forests
  Obtaining dependency information for tensorflow_decision_forests from https://files.pythonhosted.org/packages/86/70/fa05c33db4bd9e7c4d4285a628f1127fd2d5a6aa5a3b324865f38f985bb1/tensorflow_decision_forests-1.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Using cached tensorflow_decision_forests-1.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.0 kB)
Collecting wurlitzer
  Using cached wurlitzer-3.0.3-py3-none-any.whl (7.3 kB)
Requirement already satisfied: numpy in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.26.0)
Requirement already satisfied: pandas in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.1.1)
Requirement already satisfied: tensorflow~=2.14.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.14.0)
Requirement already satisfied: six in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.16.0)
Requirement already satisfied: absl-py in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.4.0)
Requirement already satisfied: wheel in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (0.41.2)
Requirement already satisfied: astunparse>=1.6.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (23.5.26)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: h5py>=2.9.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (3.9.0)
Requirement already satisfied: libclang>=13.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (16.0.6)
Requirement already satisfied: ml-dtypes==0.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (3.3.0)
Requirement already satisfied: packaging in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (23.2)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (3.20.3)
Requirement already satisfied: setuptools in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (68.2.2)
Requirement already satisfied: termcolor>=1.1.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (2.3.0)
Requirement already satisfied: typing-extensions>=3.6.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (4.8.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (1.14.1)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (0.34.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (1.59.0)
Requirement already satisfied: tensorboard<2.15,>=2.14 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (2.14.1)
Requirement already satisfied: tensorflow-estimator<2.15,>=2.14.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (2.14.0)
Requirement already satisfied: keras<2.15,>=2.14.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.14.0->tensorflow_decision_forests) (2.14.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3)
Requirement already satisfied: google-auth<3,>=1.6.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (2.23.2)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (1.0.0)
Requirement already satisfied: markdown>=2.6.8 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.4.4)
Requirement already satisfied: requests<3,>=2.21.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (2.31.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (0.7.1)
Requirement already satisfied: werkzeug>=1.0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.0.0)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (5.3.1)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (1.3.1)
Requirement already satisfied: importlib-metadata>=4.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (6.8.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.3.0)
Requirement already satisfied: idna<4,>=2.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (2.0.6)
Requirement already satisfied: certifi>=2017.4.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (2023.7.22)
Requirement already satisfied: MarkupSafe>=2.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (2.1.3)
Requirement already satisfied: zipp>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.17.0)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow~=2.14.0->tensorflow_decision_forests) (3.2.2)
Using cached tensorflow_decision_forests-1.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Installing collected packages: wurlitzer, tensorflow_decision_forests
Successfully installed tensorflow_decision_forests-1.6.0 wurlitzer-3.0.3

Importing libraries

# Import TensorFlow Decision Forests
import tensorflow_decision_forests as tfdf

# Import os
import os

# Import NumPy
import numpy as np

# Import pandas
import pandas as pd

# Import TensorFlow
import tensorflow as tf

# Import math
import math

# Import matplotlib.pyplot for plotting
import matplotlib.pyplot as plt

2023-10-03 11:11:04.771348: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-03 11:11:04.771393: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-03 11:11:04.771442: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

A hidden code cell limits the output height in Colab.

#@title

# Import the required modules
from IPython.core.magic import register_line_magic
from IPython.display import Javascript
from IPython.display import display as ipy_display

# Define a line magic that sets the maximum height of the cell output
@register_line_magic
def set_cell_height(size):
  # Run JavaScript that caps the height of the output iframe
  ipy_display(
      Javascript("google.colab.output.setIframeHeight(0, true, {maxHeight: " +
                 str(size) + "})"))

# Check the version of TensorFlow Decision Forests
print("Found TensorFlow Decision Forests v" + tfdf.__version__)

Found TensorFlow Decision Forests v1.6.0

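What is uplift modeling?

Uplift modeling is a statistical modeling technique for predicting the incremental impact of an action on a subject. The action is usually called a treatment, which may or may not be applied.

Uplift modeling is often used in targeted marketing campaigns to predict how much more likely a person is to make a purchase (or take any other desired action) after receiving marketing material.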

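For example, uplift modeling can predict the effect of an email. The effect is defined as the difference between two conditional probabilities
\begin{align}
\text{effect}(\text{email}) = &\Pr(\text{outcome}=\text{purchase}\ \vert\ \text{treatment}=\text{with email})\\ &- \Pr(\text{outcome}=\text{purchase} \ \vert\ \text{treatment}=\text{no email}),
\end{align}
where $\Pr(\text{outcome}=\text{purchase}\ \vert\ ...)$ is the probability of a purchase depending on whether or not the email is received.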
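As a rough illustration of the formula above, the following sketch estimates the average effect from a handful of made-up (treatment, outcome) pairs. The data and the helper `conversion_rate` are hypothetical and not part of TF-DF; an uplift model estimates this difference per customer, conditioned on the features, rather than as a single average.

```python
def conversion_rate(rows, treated):
    """Fraction of rows with the given treatment flag whose outcome is 1."""
    group = [outcome for treatment, outcome in rows if treatment == treated]
    return sum(group) / len(group)

# Hypothetical (treatment, outcome) pairs:
# treatment 1 = email sent, outcome 1 = purchase made.
rows = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

p_with_email = conversion_rate(rows, treated=1)  # Pr(purchase | with email)
p_no_email = conversion_rate(rows, treated=0)    # Pr(purchase | no email)
effect = p_with_email - p_no_email

print(f"effect(email) = {effect:.2f}")
```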

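Compare this with a classification model: with a classification model, one can predict the probability of a purchase. However, customers with a high purchase probability are likely to spend money in the shop whether or not they received an email.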
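The difference between the two rankings can be made concrete with a small sketch. The two customers and their probabilities below are invented for illustration; in practice these probabilities are unknown and must be estimated by a model.

```python
# Hypothetical purchase probabilities with and without the email.
customers = {
    "loyal":       {"p_with_email": 0.90, "p_no_email": 0.90},
    "persuadable": {"p_with_email": 0.40, "p_no_email": 0.10},
}

def uplift(customer):
    """Incremental purchase probability caused by the email."""
    return customer["p_with_email"] - customer["p_no_email"]

# A classifier-style ranking picks the customer most likely to buy.
best_by_probability = max(customers, key=lambda name: customers[name]["p_with_email"])
# An uplift-style ranking picks the customer the email changes the most.
best_by_uplift = max(customers, key=lambda name: uplift(customers[name]))

print("highest purchase probability:", best_by_probability)
print("highest uplift:", best_by_uplift)
```

Here the "loyal" customer has the highest purchase probability, but emailing them changes nothing, whereas the "persuadable" customer is the one the email actually influences.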

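Similarly, one can use numerical uplifting to predict the numerical increase in spend when receiving an email. In comparison, a regression model can only predict the expected spend, which is a less useful metric in many cases.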

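Defining uplift models in TF-DF

TF-DF expects uplifting datasets to be presented in a "flat" format.
A dataset of customers might look like this:

treatment	outcome	feature_1	feature_2
0	1	0.1	blue
0	0	0.2	blue
1	1	0.3	blue
1	1	0.4	blue

The treatment is a binary variable indicating whether or not the example received the treatment. In the example above, the treatment indicates whether the customer received an email. The outcome (label) indicates the status of the example after receiving the treatment (or not). TF-DF supports categorical outcomes for categorical uplifting and numerical outcomes for numerical uplifting.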
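To make the "flat" layout concrete, here is a minimal sketch that parses the table above from CSV text using only the standard library and separates the label from the model inputs. The column names mirror the table; keeping the treatment among the inputs and the outcome as a separate label is an assumption based on how the dataset is prepared later in this article.

```python
import csv
import io

# The example table above, written as CSV text.
FLAT_CSV = """treatment,outcome,feature_1,feature_2
0,1,0.1,blue
0,0,0.2,blue
1,1,0.3,blue
1,1,0.4,blue
"""

rows = list(csv.DictReader(io.StringIO(FLAT_CSV)))

# Column-oriented view: one list per column, one entry per example.
columns = {
    "treatment": [int(r["treatment"]) for r in rows],
    "outcome": [int(r["outcome"]) for r in rows],
    "feature_1": [float(r["feature_1"]) for r in rows],
    "feature_2": [r["feature_2"] for r in rows],
}

# The treatment must be binary.
assert set(columns["treatment"]) <= {0, 1}, "treatment must be 0 or 1"

# The outcome is the label; everything else (including the treatment) is an input.
label = columns["outcome"]
inputs = {name: values for name, values in columns.items() if name != "outcome"}

print(sorted(inputs))
print(label)
```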

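Note: Uplifting is also frequently used in medical contexts. Here the treatment can be a medical treatment (e.g. administering a vaccine), and the label can be an indicator of quality of life (e.g. whether the patient got sick). This also explains the nomenclature of uplift modeling.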

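Training an uplift model

In this example, we will use the Hillstrom email marketing dataset.

This dataset contains 64,000 customers who last purchased within the past twelve months. The customers were involved in an email test:

- 1/3 were randomly chosen to receive an email campaign featuring men's merchandise.
- 1/3 were randomly chosen to receive an email campaign featuring women's merchandise.
- 1/3 were randomly chosen to not receive an email campaign.

Results were tracked during the two weeks following the email campaign. The task is to tell whether the men's or the women's email campaign was successful.

Read more about the dataset in its documentation. This tutorial uses the version of the dataset curated by TensorFlow Datasets.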

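# Install the TensorFlow Datasets package
!pip install tensorflow-datasets -U --quiet

# Import TensorFlow Datasets
import tensorflow_datasets as tfds
# Load the dataset.
# Note: 'train[20%:]' overlaps with 'train[:80%]' (the examples between 20% and 80%
# appear in both splits); use 'train[80%:]' if a disjoint hold-out set is needed.
raw_train, raw_test = tfds.load('hillstrom', split=['train[:80%]', 'train[20%:]'])

# Display the first 10 examples of the test set
test_data = list(raw_test.batch(10).take(1))  # Take the first batch of 10 test examples
df = pd.DataFrame(test_data[0])  # Convert the batch to a DataFrame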
df

2023-10-03 11:11:10.733549: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-10-03 11:11:11.372447: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

	channel	history	history_segment	mens	newbie	recency	segment	visit	womens	zip_code
0	b'Web'	29.990000	b'1) $0 - $100'	1	0	6	b'Womens E-Mail'	0	0	b'Surburban'
1	b'Web'	150.380005	b'2) $100 - $200'	0	1	9	b'Womens E-Mail'	0	1	b'Surburban'
2	b'Phone'	602.960022	b'5) $500 - $750'	1	1	4	b'Womens E-Mail'	0	0	b'Surburban'
3	b'Multichannel'	341.010010	b'3) $200 - $350'	0	0	9	b'Womens E-Mail'	1	1	b'Urban'
4	b'Phone'	97.180000	b'1) $0 - $100'	0	1	3	b'Womens E-Mail'	1	1	b'Surburban'
5	b'Web'	83.269997	b'1) $0 - $100'	1	0	5	b'Mens E-Mail'	0	0	b'Urban'
6	b'Web'	331.170013	b'3) $200 - $350'	1	0	8	b'Womens E-Mail'	0	0	b'Surburban'
7	b'Multichannel'	628.400024	b'5) $500 - $750'	1	1	9	b'No E-Mail'	1	0	b'Surburban'
8	b'Phone'	134.610001	b'2) $100 - $200'	1	0	6	b'No E-Mail'	1	0	b'Rural'
9	b'Web'	141.210007	b'2) $100 - $200'	0	1	9	b'Mens E-Mail'	1	1	b'Surburban'

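Dataset preprocessing

Since TF-DF currently only supports binary treatments, the "Men's Email" and the "Women's Email" campaigns are merged. This tutorial uses the binary variable conversion as the outcome, which makes the problem a categorical uplifting problem. If we were using the numerical variable spend, the problem would be a numerical uplifting problem.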

# Prepare one example for training.
# `example` is a dictionary of raw features for a single customer.
def prepare_dataset(example):
    # Use a binary treatment class:
    # 1 if the segment is 'Mens E-Mail' or 'Womens E-Mail', otherwise 0.
    example['treatment'] = 1 if example['segment'] == b'Mens E-Mail' or example['segment'] == b'Womens E-Mail' else 0
    # The outcome (label) is the binary `conversion` column.
    outcome = example['conversion']
    # Restrict the dataset to the input features and the treatment column.
    input_features = ['channel', 'history', 'mens', 'womens', 'newbie', 'recency', 'zip_code', 'treatment']
    # Build a new dictionary that keeps only the selected columns.
    example = {feature: example[feature] for feature in input_features}
    # Return the (inputs, label) pair.
    return example, outcome

# Apply prepare_dataset to the training split and batch it (batch size 100)
train_ds = raw_train.map(prepare_dataset).batch(100)

# Apply prepare_dataset to the test split and batch it (batch size 100)
test_ds = raw_test.map(prepare_dataset).batch(100)
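The merging rule used in prepare_dataset can also be checked in isolation. The sketch below is plain Python, independent of TensorFlow; the helper name `binary_treatment` is made up for illustration, and the segment strings are the ones visible in the data preview above.

```python
# Segments that count as "treated" once the two campaigns are merged.
TREATED_SEGMENTS = {b"Mens E-Mail", b"Womens E-Mail"}

def binary_treatment(segment: bytes) -> int:
    """Map a raw segment value to a binary treatment flag (1 = any email)."""
    return 1 if segment in TREATED_SEGMENTS else 0

segments = [b"Womens E-Mail", b"Mens E-Mail", b"No E-Mail"]
treatments = [binary_treatment(s) for s in segments]
print(treatments)  # [1, 1, 0]
```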

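Model training

Finally, train and evaluate the model as usual. Note that TF-DF currently only supports Random Forest models for uplifting.

# Limit the cell output height to 300 (uses the magic defined above; Colab only)
%set_cell_height 300

# Configure the model and its hyper-parameters.
model = tfdf.keras.RandomForestModel(
    verbose=2,  # Print detailed training logs.
    task=tfdf.keras.Task.CATEGORICAL_UPLIFT,  # Train a categorical uplift model.
    uplift_treatment='treatment'  # Name of the column holding the binary treatment.
)

# Train the model.
model.fit(train_ds)  # Fit on the training dataset.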




Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpkvr89ot3 as temporary training directory
Reading training dataset...
Training tensor examples:
Features: {'channel': , 'history': , 'mens': , 'womens': , 'newbie': , 'recency': , 'zip_code': , 'treatment': }
Label: Tensor("data_8:0", shape=(None,), dtype=int64)
Weights: None
Normalized tensor features:
 {'channel': SemanticTensor(semantic=, tensor=), 'history': SemanticTensor(semantic=, tensor=), 'mens': SemanticTensor(semantic=, tensor=), 'womens': SemanticTensor(semantic=, tensor=), 'newbie': SemanticTensor(semantic=, tensor=), 'recency': SemanticTensor(semantic=, tensor=), 'zip_code': SemanticTensor(semantic=, tensor=)}
Training dataset read in 0:00:04.719923. Found 51200 examples.
Training model...
Standard output detected as not visible to the user e.g. running in a notebook. Creating a training log redirection. If training gets stuck, try calling tfdf.keras.set_training_logs_redirection(False).


[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:773] Start Yggdrasil model training
[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:774] Collect training examples
[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:787] Dataspec guide:
column_guides {
  column_name_pattern: "^__LABEL$"
  type: CATEGORICAL
}
default_column_guide {
  categorial {
    max_vocab_count: 2000
  }
  discretized_numerical {
    maximum_num_bins: 255
  }
}
ignore_columns_without_guides: false
detect_numerical_as_discretized_numerical: false

[INFO 23-10-03 11:11:16.2707 UTC kernel.cc:393] Number of batches: 512
[INFO 23-10-03 11:11:16.2707 UTC kernel.cc:394] Number of examples: 51200
[INFO 23-10-03 11:11:16.2800 UTC kernel.cc:794] Training dataset:
Number of records: 51200
Number of columns: 9

Number of columns by type:
	NUMERICAL: 5 (55.5556%)
	CATEGORICAL: 4 (44.4444%)

Columns:

NUMERICAL: 5 (55.5556%)
	2: "history" NUMERICAL mean:241.833 min:29.99 max:3345.93 sd:255.292
	3: "mens" NUMERICAL mean:0.550391 min:0 max:1 sd:0.497454
	4: "newbie" NUMERICAL mean:0.503086 min:0 max:1 sd:0.49999
	5: "recency" NUMERICAL mean:5.75514 min:1 max:12 sd:3.50281
	7: "womens" NUMERICAL mean:0.549687 min:0 max:1 sd:0.497525

CATEGORICAL: 4 (44.4444%)
	0: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item
	1: "channel" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"Web" 22576 (44.0938%)
	6: "treatment" CATEGORICAL integerized vocab-size:3 no-ood-item
	8: "zip_code" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"Surburban" 22966 (44.8555%)

Terminology:
	nas: Number of non-available (i.e. missing) values.
	ood: Out of dictionary.
	manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
	tokenized: The attribute value is obtained through tokenization.
	has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
	vocab-size: Number of unique values.

[INFO 23-10-03 11:11:16.2800 UTC kernel.cc:810] Configure learner
[INFO 23-10-03 11:11:16.2802 UTC kernel.cc:824] Training config:
learner: "RANDOM_FOREST"
features: "^channel$"
features: "^history$"
features: "^mens$"
features: "^newbie$"
features: "^recency$"
features: "^womens$"
features: "^zip_code$"
label: "^__LABEL$"
task: CATEGORICAL_UPLIFT
random_seed: 123456
uplift_treatment: "treatment"
metadata {
  framework: "TF Keras"
}
pure_serving_model: false
[yggdrasil_decision_forests.model.random_forest.proto.random_forest_config] {
  num_trees: 300
  decision_tree {
    max_depth: 16
    min_examples: 5
    in_split_min_examples_check: true
    keep_non_leaf_label_distribution: true
    num_candidate_attributes: 0
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    axis_aligned_split {
    }
    internal {
      sorting_strategy: PRESORTED
    }
    uplift {
      min_examples_in_treatment: 5
      split_score: KULLBACK_LEIBLER
    }
  }
  winner_take_all_inference: true
  compute_oob_performances: true
  compute_oob_variable_importances: false
  num_oob_variable_importances_permutations: 1
  bootstrap_training_dataset: true
  bootstrap_size_ratio: 1
  adapt_bootstrap_size_ratio_for_maximum_training_duration: false
  sampling_with_replacement: true
}

[INFO 23-10-03 11:11:16.2806 UTC kernel.cc:827] Deployment config:
cache_path: "/tmpfs/tmp/tmpkvr89ot3/working_cache"
num_threads: 32
try_resume_training: true

[INFO 23-10-03 11:11:16.2808 UTC kernel.cc:889] Train model
[INFO 23-10-03 11:11:16.2809 UTC random_forest.cc:416] Training random forest on 51200 example(s) and 7 feature(s).
[WARNING 23-10-03 11:11:16.4040 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.4058 UTC random_forest.cc:802] Training of tree  1/300 (tree index:28) done qini:0.000608425 auuc:0.00206948
[WARNING 23-10-03 11:11:16.4811 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.4858 UTC random_forest.cc:802] Training of tree  11/300 (tree index:1) done qini:7.44252e-05 auuc:0.00242451
[WARNING 23-10-03 11:11:16.5640 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.5666 UTC random_forest.cc:802] Training of tree  21/300 (tree index:22) done qini:4.22719e-05 auuc:0.00240438
[WARNING 23-10-03 11:11:16.6477 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.6521 UTC random_forest.cc:802] Training of tree  31/300 (tree index:13) done qini:8.03027e-05 auuc:0.00245679
[WARNING 23-10-03 11:11:16.7137 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.7161 UTC random_forest.cc:802] Training of tree  41/300 (tree index:38) done qini:8.50687e-05 auuc:0.00246156
[WARNING 23-10-03 11:11:16.7806 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.7833 UTC random_forest.cc:802] Training of tree  51/300 (tree index:49) done qini:-3.59235e-05 auuc:0.00234057
[WARNING 23-10-03 11:11:16.8648 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.8692 UTC random_forest.cc:802] Training of tree  61/300 (tree index:59) done qini:-0.000105298 auuc:0.00227119
[WARNING 23-10-03 11:11:16.9304 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.9329 UTC random_forest.cc:802] Training of tree  71/300 (tree index:68) done qini:-0.000137303 auuc:0.00223919
[WARNING 23-10-03 11:11:16.9970 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.9996 UTC random_forest.cc:802] Training of tree  81/300 (tree index:80) done qini:-8.23665e-05 auuc:0.00229412
[WARNING 23-10-03 11:11:17.0654 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.0682 UTC random_forest.cc:802] Training of tree  91/300 (tree index:91) done qini:-0.000220825 auuc:0.00215566
[WARNING 23-10-03 11:11:17.1524 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.1570 UTC random_forest.cc:802] Training of tree  101/300 (tree index:95) done qini:-0.000228188 auuc:0.0021483
[WARNING 23-10-03 11:11:17.2209 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.2235 UTC random_forest.cc:802] Training of tree  111/300 (tree index:108) done qini:-0.000288918 auuc:0.00208757
[WARNING 23-10-03 11:11:17.2774 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.2798 UTC random_forest.cc:802] Training of tree  121/300 (tree index:117) done qini:-0.000304144 auuc:0.00207234
[WARNING 23-10-03 11:11:17.3440 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.3463 UTC random_forest.cc:802] Training of tree  131/300 (tree index:129) done qini:-0.000216986 auuc:0.0021595
[WARNING 23-10-03 11:11:17.4250 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.4296 UTC random_forest.cc:802] Training of tree  141/300 (tree index:140) done qini:-0.000173193 auuc:0.0022033
[WARNING 23-10-03 11:11:17.4940 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.4966 UTC random_forest.cc:802] Training of tree  151/300 (tree index:151) done qini:-0.000152671 auuc:0.00222382
[WARNING 23-10-03 11:11:17.5521 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.5560 UTC random_forest.cc:802] Training of tree  161/300 (tree index:158) done qini:-0.000176023 auuc:0.00220047
[WARNING 23-10-03 11:11:17.6199 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.6225 UTC random_forest.cc:802] Training of tree  171/300 (tree index:171) done qini:-0.000151236 auuc:0.00222525
[WARNING 23-10-03 11:11:17.6565 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.6589 UTC random_forest.cc:802] Training of tree  196/300 (tree index:195) done qini:-0.000153745 auuc:0.00222274
[WARNING 23-10-03 11:11:17.8094 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.8143 UTC random_forest.cc:802] Training of tree  206/300 (tree index:205) done qini:-0.000105493 auuc:0.002271
[WARNING 23-10-03 11:11:17.8704 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.8730 UTC random_forest.cc:802] Training of tree  216/300 (tree index:208) done qini:-0.00012975 auuc:0.00224674
[WARNING 23-10-03 11:11:17.9298 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.9323 UTC random_forest.cc:802] Training of tree  226/300 (tree index:223) done qini:-0.000134271 auuc:0.00224222
[WARNING 23-10-03 11:11:18.0143 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.0189 UTC random_forest.cc:802] Training of tree  236/300 (tree index:233) done qini:-0.00011439 auuc:0.0022621
[WARNING 23-10-03 11:11:18.0843 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.0870 UTC random_forest.cc:802] Training of tree  246/300 (tree index:246) done qini:-0.000150459 auuc:0.00222603
[WARNING 23-10-03 11:11:18.1504 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.1529 UTC random_forest.cc:802] Training of tree  256/300 (tree index:248) done qini:-0.00013702 auuc:0.00223947
[WARNING 23-10-03 11:11:18.1913 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.1941 UTC random_forest.cc:802] Training of tree  280/300 (tree index:279) done qini:-0.000126474 auuc:0.00225001
[WARNING 23-10-03 11:11:18.3165 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.3189 UTC random_forest.cc:802] Training of tree  290/300 (tree index:287) done qini:-0.000183679 auuc:0.00219281
[WARNING 23-10-03 11:11:18.3762 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.3785 UTC random_forest.cc:802] Training of tree  300/300 (tree index:295) done qini:-0.000173259 auuc:0.00220323
[INFO 23-10-03 11:11:18.3818 UTC random_forest.cc:882] Final OOB metrics: qini:-0.000173259 auuc:0.00220323
[INFO 23-10-03 11:11:18.3984 UTC kernel.cc:926] Export model in log directory: /tmpfs/tmp/tmpkvr89ot3 with prefix d0d80b64ba754300
[INFO 23-10-03 11:11:18.4402 UTC kernel.cc:944] Save model in resources
[INFO 23-10-03 11:11:18.4426 UTC abstract_model.cc:881] Model self evaluation:
Number of predictions (without weights): 51200
Number of predictions (with weights): 51200
Task: CATEGORICAL_UPLIFT
Label: __LABEL

Number of treatments: 2
AUUC: 0.00220323
Qini: -0.000173259

[INFO 23-10-03 11:11:18.4697 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpkvr89ot3/model/ with prefix d0d80b64ba754300
[INFO 23-10-03 11:11:18.6711 UTC decision_forest.cc:660] Model loaded with 300 root(s), 60190 node(s), and 7 input feature(s).
[INFO 23-10-03 11:11:18.6711 UTC abstract_model.cc:1343] Engine "RandomForestGeneric" built
[INFO 23-10-03 11:11:18.6711 UTC kernel.cc:1061] Use fast generic engine


Model trained in 0:00:02.419511
Compiling model...
Model compiled.

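Evaluating uplift models

Metrics for uplift models

The two most important metrics for evaluating uplift models are the AUUC (Area Under the Uplift Curve) and the Qini (Area Under the Qini Curve). This is similar to the use of AUC and accuracy for classification problems. For both metrics, larger is better.

Neither AUUC nor Qini is a normalized metric. This means that the best possible value can differ from dataset to dataset. This is different from, for example, the AUC metric, which always lies between 0 and 1.

A formal definition of the AUUC is given below. For more information about these metrics, see Guelman and Betlei et al.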

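Model self-evaluation

TF-DF Random Forest models evaluate themselves on the out-of-bag examples of the training dataset. For uplift models, they expose the AUUC and the Qini metrics. You can retrieve both metrics on the training dataset directly through the inspector.

Later, we will recompute the AUUC metric "manually" on the test dataset. Note that the two values (out-of-bag on the training set versus the test set) are not expected to match exactly, since the AUUC is not a normalized metric.

# Create a model inspector to access the self-evaluation
insp = model.make_inspector()

# Retrieve the self-evaluation results
insp.evaluation()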

Evaluation(num_examples=51200, accuracy=None, loss=None, rmse=None, ndcg=None, aucs=None, auuc=0.0022032303161204467, qini=-0.00017325876815314604)

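Manually computing the AUUC

In this section, we manually compute the AUUC and plot the uplift curves.

The next few paragraphs explain the AUUC metric in more detail and may be skipped.

Computing the AUUC

Suppose you have a labeled dataset with $|T|$ examples with treatment and $|C|$ examples without treatment, called control examples. For each example, the uplift model $f$ produces the conditional probability that a treatment on the example will yield a positive outcome.

Suppose a decision-maker needs to decide which customers to send an email to, using an uplift model $f$. The model produces a (conditional) probability that the email will result in a conversion. The decision-maker might therefore simply choose the number $k$ of emails to send and send those $k$ emails to the customers with the highest probability.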
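The selection rule described above can be sketched in a few lines of plain Python. The scores are invented, and `select_top_k` is a hypothetical helper, not a TF-DF function.

```python
def select_top_k(scores, k):
    """Return the indices of the k highest scores.

    Ties keep their original relative order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical model outputs for five customers.
scores = [0.02, 0.10, -0.01, 0.07, 0.04]

# With a budget of k = 2 emails, target the two highest-scoring customers.
targets = select_top_k(scores, k=2)
print(targets)  # [1, 3]
```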

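Using a labeled test dataset, it is possible to study the impact of $k$ on the success of the campaign. First, we are interested in the ratio $\frac{|C \cap T|}{|T|}$ of customers that received an email and converted versus the total number of customers that received an email. Here $T$ is the set of customers that received an email and $C$ is the set of customers among the $k$ highest-ranked that converted (not to be confused with the control group). We plot this ratio as a function of $k$.

Ideally, we would like this curve to increase steeply. This would mean that the model prioritizes sending emails to those customers that will convert when they receive an email.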

# Compute all predictions on the test dataset
predictions = model.predict(test_ds).flatten()

# Extract outcomes and treatments
outcomes = np.concatenate([outcome.numpy() for _, outcome in test_ds])
treatment = np.concatenate([example['treatment'].numpy() for example,_ in test_ds])
control = 1 - treatment

# Number of examples in the treatment group
num_treatments = np.sum(treatment)
# Customers without treatment form the 'control' group
num_control = np.sum(control)
num_examples = len(predictions)

# Sort outcomes and treatments by decreasing prediction
prediction_order = predictions.argsort()[::-1]
outcomes_sorted = outcomes[prediction_order]
treatment_sorted = treatment[prediction_order]
control_sorted = control[prediction_order]

# Cumulative conversion ratio of the treatment group
ratio_treatment = np.cumsum(np.multiply(outcomes_sorted, treatment_sorted), axis=0)/num_treatments

# Create the figure and axes
fig, ax = plt.subplots()
ax.plot(ratio_treatment, label='Conversion ratio of treatment')
ax.set_xlabel('k')
ax.set_ylabel('Ratio of conversion')
ax.legend()

512/512 [==============================] - 3s 5ms/step

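Similarly, we can compute and plot the conversion ratio of the customers who did not receive an email, called the control group. Ideally, this curve is flat at first: that means the model does not prioritize sending emails to customers who would convert even without receiving one.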

# Cumulative conversion ratio of the control group
ratio_control = np.cumsum(np.multiply(outcomes_sorted, control_sorted), axis=0) / num_control

# Add the control curve to the existing axes
ax.plot(ratio_control, label='Conversion ratio of control')

# Refresh the legend
ax.legend()

# Display the figure
fig

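The AUUC metric measures the area between these two curves. In the code that follows, the x-axis ($k$) is rescaled to lie between 0 and 1; the y-axis values are ratios and already lie in that range.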
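Written out, the quantity computed by the tutorial code can be sketched as follows. The notation is introduced here for illustration and is not part of the original tutorial: the $n$ test examples are sorted by decreasing predicted uplift, $y_i$ is the outcome and $t_i \in \{0, 1\}$ the treatment indicator of the $i$-th example, $N_T = \sum_i t_i$ is the number of treated examples, and $N_C = n - N_T$ is the number of control examples.

$$r_T(k) = \frac{1}{N_T}\sum_{i=1}^{k} y_i\, t_i, \qquad r_C(k) = \frac{1}{N_C}\sum_{i=1}^{k} y_i\,(1 - t_i)$$

$$\mathrm{AUUC} \approx \frac{1}{n}\sum_{k=1}^{n-1} \frac{\bigl(r_T(k) - r_C(k)\bigr) + \bigl(r_T(k+1) - r_C(k+1)\bigr)}{2}$$

The second line is the trapezoidal rule applied to the difference of the two curves with step $1/n$, which corresponds to calling `np.trapz` with `dx=1/num_examples`.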



# Evenly spaced x values from 0 to 1, one per example
x = np.linspace(0, 1, num_examples)

# Plot the conversion ratio of the treatment group
plt.plot(x, ratio_treatment, label='Conversion ratio of treatment')

# Plot the conversion ratio of the control group
plt.plot(x, ratio_control, label='Conversion ratio of control')

# Shade in blue the region where the treatment ratio exceeds the control ratio
plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment > ratio_control), color='C0', alpha=0.3)

# Shade in orange the region where the treatment ratio is below the control ratio
plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment < ratio_control), color='C1', alpha=0.3)

# Label the x-axis
plt.xlabel('k')

# Label the y-axis
plt.ylabel('Conversion ratio')

# Add the legend
plt.legend()

# Area between the two curves (trapezoidal rule), i.e. the AUUC
auuc = np.trapz(ratio_treatment - ratio_control, dx=1/num_examples)

# Print the AUUC
print(f'The AUUC on the test dataset is {auuc}')

The AUUC on the test dataset is 0.007513928513572819
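The manual steps above can be gathered into one reusable function. The sketch below is an assumption-laden convenience wrapper, not part of TF-DF: the name `manual_auuc` and its signature are hypothetical, and the trapezoidal rule is written out by hand so that the code does not depend on whether the installed NumPy exposes `np.trapz` or its newer name `np.trapezoid`.

```python
import numpy as np

def manual_auuc(predictions, outcomes, treatment):
    """Hypothetical helper: area between the cumulative treatment and
    control conversion curves, following the manual steps of the tutorial."""
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    control = 1.0 - treatment

    # Sort examples from highest to lowest predicted uplift.
    order = predictions.argsort()[::-1]
    outcomes_sorted = outcomes[order]

    # Cumulative conversion ratios for the treatment and control groups.
    ratio_treatment = np.cumsum(outcomes_sorted * treatment[order]) / treatment.sum()
    ratio_control = np.cumsum(outcomes_sorted * control[order]) / control.sum()

    # Trapezoidal rule with step 1 / num_examples, written out explicitly.
    diff = ratio_treatment - ratio_control
    return float(np.sum((diff[1:] + diff[:-1]) / 2.0) / len(predictions))

# Usage on made-up data (six hypothetical customers, not the Hillstrom dataset).
toy_auuc = manual_auuc(
    predictions=[0.9, 0.1, 0.5, 0.7, 0.3, 0.2],
    outcomes=[1, 1, 0, 1, 1, 0],
    treatment=[1, 0, 1, 1, 0, 1],
)
print(f'Toy AUUC: {toy_auuc:.4f}')
```

With the tutorial's variables, `manual_auuc(predictions, outcomes, treatment)` should reproduce the value printed above, assuming both groups are non-empty; the function does not guard against a division by zero when one group is missing.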
