AutoGluon, an Automated Deep Learning Tool

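I recently came across a tool called AutoGluon. It offers a broad set of features, including time series models, an EDA (exploratory data analysis) module, image-text matching, object detection, named entity recognition, text classification, and more. The tutorials are available at: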
https://auto.gluon.ai/dev/tutorials/

1. Prediction

AutoGluon can make predictions on both numeric data and text data.

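Prediction on numeric data (for example, the iris dataset) is already well studied.

Prediction on text-type data (for example, the Titanic dataset) has usually meant converting the text fields into numeric features first. AutoGluon improves on this: according to its documentation, it uses transformer models to learn from text.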
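To make "converting text into numbers" concrete, here is a minimal sketch of a bag-of-words encoding in plain Python. It is not AutoGluon code; the helper names and the sample strings are made up for illustration.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_count_vector(text, vocab):
    """Turn one text into a list of token counts, one slot per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [counts.get(token, 0) for token in vocab]

# Made-up sample values, only for illustration
texts = ["first class cabin", "third class cabin", "third class deck"]
vocab = build_vocabulary(texts)
vectors = [to_count_vector(t, vocab) for t in texts]

for text, vector in zip(texts, vectors):
    print(text, "->", vector)
```

An encoding like this discards word order and context, which is the kind of information a transformer-based text model is meant to capture.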

Numeric prediction:

from autogluon.tabular import TabularDataset, TabularPredictor

# Load the example train/test CSV files into TabularDataset objects
data_root = 'https://autogluon.s3.amazonaws.com/datasets/Inc/'
train_data = TabularDataset(data_root + 'train.csv')
test_data = TabularDataset(data_root + 'test.csv')

# Fit on the training data with 'class' as the label column, then predict
predictor = TabularPredictor(label='class').fit(train_data=train_data)
predictions = predictor.predict(test_data)
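Once predictions are available, a natural next step is to compare them with the true labels. The sketch below computes plain accuracy in pure Python; the `accuracy` helper and the sample labels are hypothetical. With the objects above, the inputs would be something like `list(test_data['class'])` and `list(predictions)`. AutoGluon also provides its own `predictor.evaluate(test_data)`, which is likely the more convenient route in practice.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels, only for illustration
y_true = ["a", "b", "a", "a"]
y_pred = ["a", "b", "b", "a"]
score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.2f}")
```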

Text prediction:

from autogluon.multimodal import MultiModalPredictor
import uuid

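time_limit = 3 * 60  # seconds; set to a larger value in your applications
model_path = f"./tmp/{uuid.uuid4().hex}-automm_text_book_price_prediction"
# Note: train_data here must be a DataFrame with a 'Price' column (a book price
# dataset), not the tabular data loaded in the previous example.
predictor = MultiModalPredictor(label='Price', path=model_path)
predictor.fit(train_data, time_limit=time_limit)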

2. Named entity recognition:

Prepare the data:

from autogluon.core.utils.loaders import load_pd
train_data = load_pd.load('https://automl-mm-bench.s3.amazonaws.com/ner/mit-movies/train_v2.csv')
test_data = load_pd.load('https://automl-mm-bench.s3.amazonaws.com/ner/mit-movies/test_v2.csv')
train_data.head(5)

The data looks like this, where the dashes separate the raw input from its annotation:

what movies star bruce willis-------------[{"entity_group": "ACTOR", "start": 17, "end":…

show me films with drew barrymore from the 1980s------------[{"entity_group": "ACTOR", "start": 19, "end":…
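If you want to label your own sentences in the same shape, the annotation can be produced with a few lines of Python. This is a minimal sketch, assuming the annotation column holds a JSON list of `entity_group`/`start`/`end` dictionaries with character offsets and an exclusive `end`. The `make_annotation` helper and the sample sentence are hypothetical, not part of AutoGluon.

```python
import json

def make_annotation(text, entities):
    """Build a JSON annotation string from (label, substring) pairs.

    Each substring is located with str.find, so only its first occurrence
    in the text is annotated.
    """
    spans = []
    for label, surface in entities:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in {text!r}")
        spans.append({
            "entity_group": label,
            "start": start,
            "end": start + len(surface),  # exclusive end offset (assumption)
        })
    return json.dumps(spans)

# Made-up example sentence, only for illustration
text = "who directed the matrix"
annotation = make_annotation(text, [("TITLE", "the matrix")])
print(annotation)
```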

Train the model

from autogluon.multimodal import MultiModalPredictor
import uuid

label_col = "entity_annotations"
model_path = f"./tmp/{uuid.uuid4().hex}-automm_ner"  # You can rename it to the model path you like
predictor = MultiModalPredictor(problem_type="ner", label=label_col, path=model_path)
predictor.fit(
    train_data=train_data,
    hyperparameters={'model.ner_text.checkpoint_name': 'google/electra-small-discriminator'},
    time_limit=300,  # seconds
)

Results

from autogluon.multimodal.utils import visualize_ner

sentence = "Game of Thrones is an American fantasy drama television series created by David Benioff"
predictions = predictor.predict({'text_snippet': [sentence]})
print('Predicted entities:', predictions[0])

# Visualize
visualize_ner(sentence, predictions[0])

Partial output (the result can also be rendered visually):

Predicted entities: [{'entity_group': 'TITLE', 'start': 0, 'end': 15}, {'entity_group': 'GENRE', 'start': 22, 'end': 44}, {'entity_group': 'DIRECTOR', 'start': 74, 'end': 87}]
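The `start` and `end` values are character offsets into the input sentence, so each predicted span can be mapped back to its text by slicing. The sketch below does this for the output shown above; `extract_entities` is a hypothetical helper name, and the predictions are copied in by hand so the snippet runs without a trained model.

```python
sentence = "Game of Thrones is an American fantasy drama television series created by David Benioff"

# Predictions copied from the output above
predicted = [
    {"entity_group": "TITLE", "start": 0, "end": 15},
    {"entity_group": "GENRE", "start": 22, "end": 44},
    {"entity_group": "DIRECTOR", "start": 74, "end": 87},
]

def extract_entities(text, spans):
    """Return (label, surface text) pairs by slicing text with each span's offsets."""
    return [(s["entity_group"], text[s["start"]:s["end"]]) for s in spans]

entities = extract_entities(sentence, predicted)
for label, surface in entities:
    print(f"{label:10s}{surface}")
```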

Continued training:

If the data has been updated, you can continue training on top of the existing model:

new_predictor = MultiModalPredictor.load(model_path)
new_model_path = f"./tmp/{uuid.uuid4().hex}-automm_ner_continue_train"
new_predictor.fit(train_data, time_limit=60, save_path=new_model_path)
test_score = new_predictor.evaluate(test_data, metrics=['overall_f1', 'ACTOR'])
print(test_score)
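To give a feel for what an entity-level F1 score measures, here is a minimal sketch that scores predicted spans against gold spans by exact match on label and offsets. It is an illustration only: the way AutoGluon computes `overall_f1` internally may differ, and both the `span_prf` helper and the gold/predicted spans below are hypothetical.

```python
def span_prf(gold, pred):
    """Precision, recall, and F1 over exact (label, start, end) matches."""
    gold_set = {(s["entity_group"], s["start"], s["end"]) for s in gold}
    pred_set = {(s["entity_group"], s["start"], s["end"]) for s in pred}
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical gold annotation and a prediction with one wrong span boundary
gold = [
    {"entity_group": "TITLE", "start": 0, "end": 15},
    {"entity_group": "GENRE", "start": 22, "end": 44},
    {"entity_group": "DIRECTOR", "start": 74, "end": 87},
]
pred = [
    {"entity_group": "TITLE", "start": 0, "end": 15},
    {"entity_group": "GENRE", "start": 31, "end": 44},  # boundary error
    {"entity_group": "DIRECTOR", "start": 74, "end": 87},
]

precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under exact matching, a span with a wrong boundary counts as both a missed gold entity and a spurious prediction, which is why a single boundary error lowers precision and recall at the same time.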

More models
See: https://github.com/autogluon/autogluon/tree/master/examples/automm
