Watson API: Natural Language Classifier

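Overview

The Natural Language Classifier (NLC) service uses machine learning algorithms to return the best-matching predefined classes for short text input. You create and train a classifier that connects predefined classes to example texts, so that the service can then apply those classes to new input.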

Authentication

The service uses HTTP Basic Authentication, that is, a username and password.
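
For readers calling the service from code rather than curl, here is a minimal Python sketch of what `curl -u "USERNAME":"PASSWORD"` sends. The function name `basic_auth_header` is a hypothetical helper, not part of any Watson SDK; it only builds the standard HTTP Basic `Authorization` header.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic Authorization header (hypothetical helper)."""
    # Basic auth is "username:password", base64-encoded, prefixed with "Basic ".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, as in the curl commands in this post.
    print(basic_auth_header("USERNAME", "PASSWORD"))
```

Base64 is an encoding, not encryption, so the header should only ever be sent over HTTPS.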

Creating a classifier

curl command

curl -u "USERNAME":"PASSWORD"  ^
-F training_data=@weather_data_train.csv  ^
-F training_metadata="{\"language\":\"en\",\"name\":\"atp-weather\"}"  ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"

Response

{
  "classifier_id" : "359f3fx202-nlc-223328",
  "name" : "atp-weather",
  "language" : "en",
  "created" : "2017-07-25T03:20:16.451Z",
  "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
  "status" : "Training",
  "status_description" : "The classifier instance is in its training phase, not yet ready to accept classify requests"
}

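**Note: at this point the classifier's status is Training, so it cannot be used yet. Its status can be checked with the command for getting classifier information.**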
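
Since a new classifier starts in the Training state, client code usually has to wait before classifying. The following is a rough sketch of a polling loop, not an official client: `wait_until_available`, its parameters, and the injected `fetch_status` callable (which is assumed to return the parsed JSON of the classifier-information request) are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until_available(fetch_status: Callable[[], dict],
                         interval_seconds: float = 30.0,
                         max_attempts: int = 20,
                         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the status is no longer "Training" (hypothetical helper).

    fetch_status is assumed to return the parsed classifier-information
    JSON, i.e. a dict with a "status" key.
    """
    for attempt in range(max_attempts):
        info = fetch_status()
        # Stop on any state other than Training (Available, Failed, ...).
        if info.get("status") != "Training":
            return info
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("classifier is still training")
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library; the caller still has to check whether the returned status is Available or Failed.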

Listing classifiers

curl command

curl -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"

Response

{
  "classifiers" : [ {
    "classifier_id" : "359f3fx202-nlc-223328",
    "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
    "name" : "atp-weather",
    "language" : "en",
    "created" : "2017-07-25T03:20:16.451Z"
  } ]
}
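
The list response is convenient for looking up a classifier ID by name. A small sketch, assuming the response shape shown above; `classifier_ids_by_name` is a hypothetical helper and `LIST_RESPONSE` is simply the sample response copied from this post.

```python
import json

# Sample response copied from the listing above.
LIST_RESPONSE = """
{
  "classifiers" : [ {
    "classifier_id" : "359f3fx202-nlc-223328",
    "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
    "name" : "atp-weather",
    "language" : "en",
    "created" : "2017-07-25T03:20:16.451Z"
  } ]
}
"""


def classifier_ids_by_name(response_text: str) -> dict:
    """Map each classifier name to its classifier_id (hypothetical helper)."""
    payload = json.loads(response_text)
    return {c["name"]: c["classifier_id"] for c in payload.get("classifiers", [])}


if __name__ == "__main__":
    print(classifier_ids_by_name(LIST_RESPONSE))
```

If two classifiers share a name, this mapping keeps only the last one in the list.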

Getting classifier information

curl command

curl -u "USERNAME":"PASSWORD"  ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328"

Response

{
  "classifier_id" : "359f3fx202-nlc-223328",
  "name" : "atp-weather",
  "language" : "en",
  "created" : "2017-07-25T03:20:16.451Z",
  "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
  "status" : "Available",
  "status_description" : "The classifier instance is now available and is ready to take classifier requests."
}

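A classifier can be in one of the following five states:

  • 1 Non Existent: the classifier does not exist
  • 2 Training: the classifier is being trained
  • 3 Failed: training failed
  • 4 Available: the classifier is ready to use
  • 5 Unavailable: the classifier cannot be used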
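
In client code, the five states listed above can be modelled as an enumeration so that a typo in a status string fails loudly. This is only a sketch: the class `ClassifierStatus` and the helper `can_classify` are hypothetical names, and the string values are taken from the list in this post.

```python
from enum import Enum


class ClassifierStatus(Enum):
    """The five status strings listed in this post."""
    NON_EXISTENT = "Non Existent"
    TRAINING = "Training"
    FAILED = "Failed"
    AVAILABLE = "Available"
    UNAVAILABLE = "Unavailable"


def can_classify(status_text: str) -> bool:
    """Return True only for the Available state (hypothetical helper).

    Raises ValueError if status_text is not one of the five known states.
    """
    return ClassifierStatus(status_text) is ClassifierStatus.AVAILABLE


if __name__ == "__main__":
    print(can_classify("Training"))
    print(can_classify("Available"))
```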

Classifying text with the classifier

curl command

  • Classifying with GET: How hot will it be today?
curl -G -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify?text=How%20hot%20will%20it%20be%20today%3F"
  • Classifying with POST: How hot will it be today?
curl -X POST -u "USERNAME":"PASSWORD" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"How hot will it be today?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"
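
The GET variant requires the text to be URL-encoded by hand, while the POST variant needs it wrapped in JSON. As an illustration, the following Python sketch builds both; `classify_get_url` and `classify_post_body` are hypothetical helper names, and `BASE_URL` is the endpoint used in the curl commands of this post.

```python
import json
from urllib.parse import quote

# Endpoint taken from the curl commands in this post.
BASE_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1"


def classify_get_url(classifier_id: str, text: str) -> str:
    """Build the GET classify URL with a percent-encoded text parameter."""
    # safe="" makes quote() also encode "/" so the text cannot alter the path.
    return (f"{BASE_URL}/classifiers/{quote(classifier_id, safe='')}"
            f"/classify?text={quote(text, safe='')}")


def classify_post_body(text: str) -> str:
    """Build the JSON body for the POST variant."""
    return json.dumps({"text": text})


if __name__ == "__main__":
    print(classify_get_url("359f3fx202-nlc-223328", "How hot will it be today?"))
    print(classify_post_body("How hot will it be today?"))
```

Letting `json.dumps` produce the body also avoids the manual `\"` escaping that the Windows command line requires.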

Response

{
  "classifier_id" : "359f3fx202-nlc-223328",
  "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
  "text" : "How hot will it be today?",
  "top_class" : "temperature",
  "classes" : [ {
    "class_name" : "temperature",
    "confidence" : 0.9929586035651006
  }, {
    "class_name" : "conditions",
    "confidence" : 0.007041396434899482
  } ]
}

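Next, a question containing a word that does not appear in the classifier's training data (sleet).
The sentence deliberately follows the pattern used in the temperature class: How xxx will it be today?
The classifier still correctly assigned it to the conditions class.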

curl -X POST -u "USERNAME":"PASSWORD" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"How sleet will it be today?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"

Response

{
  "classifier_id" : "359f3fx202-nlc-223328",
  "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
  "text" : "How sleet will it be today?",
  "top_class" : "conditions",
  "classes" : [ {
    "class_name" : "conditions",
    "confidence" : 0.89688785244637
  }, {
    "class_name" : "temperature",
    "confidence" : 0.10311214755363002
  } ]
}

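Finally, a sentence that has nothing to do with the classifier's domain: it is atp's notebook?
The result is far from ideal: the confidence for the temperature class is as high as roughly 82.6%.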

curl -X POST -u "USERNAME":"PASSWORD" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"it is atp's notebook?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"

Response

{
  "classifier_id" : "359f3fx202-nlc-223328",
  "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
  "text" : "it is atp's notebook?",
  "top_class" : "temperature",
  "classes" : [ {
    "class_name" : "temperature",
    "confidence" : 0.8255246180698945
  }, {
    "class_name" : "conditions",
    "confidence" : 0.1744753819301055
  } ]
}
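
One possible mitigation for out-of-domain input, which this post did not test against the service, is to reject results whose top confidence falls below a cutoff. The sketch below is an assumption-laden illustration: `accept_top_class` is a hypothetical helper, the 0.85 threshold is arbitrary, and the three dictionaries only repeat the confidence values from the responses above.

```python
from typing import Optional


def accept_top_class(response: dict, threshold: float) -> Optional[str]:
    """Return top_class if its confidence reaches threshold, else None."""
    confidence_by_class = {c["class_name"]: c["confidence"]
                           for c in response["classes"]}
    top = response["top_class"]
    return top if confidence_by_class[top] >= threshold else None


# Values copied from the three classify responses in this post.
HOT = {"top_class": "temperature", "classes": [
    {"class_name": "temperature", "confidence": 0.9929586035651006},
    {"class_name": "conditions", "confidence": 0.007041396434899482}]}
SLEET = {"top_class": "conditions", "classes": [
    {"class_name": "conditions", "confidence": 0.89688785244637},
    {"class_name": "temperature", "confidence": 0.10311214755363002}]}
NOTEBOOK = {"top_class": "temperature", "classes": [
    {"class_name": "temperature", "confidence": 0.8255246180698945},
    {"class_name": "conditions", "confidence": 0.1744753819301055}]}

if __name__ == "__main__":
    for sample in (HOT, SLEET, NOTEBOOK):
        print(accept_top_class(sample, threshold=0.85))
```

With these three samples, a threshold of 0.85 keeps the two weather questions and rejects the notebook sentence, but the margin between 0.897 and 0.826 is thin, so a cutoff alone is unlikely to be a reliable filter.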

Deleting a classifier

curl command

curl -X DELETE -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/10D41B-nlc-1"
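
For comparison, the same DELETE call can be prepared with Python's standard library. This sketch only constructs the request and does not send it; `build_delete_request` is a hypothetical helper, and `BASE_URL` is the endpoint used throughout this post.

```python
import base64
import urllib.request

# Endpoint taken from the curl commands in this post.
BASE_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1"


def build_delete_request(classifier_id: str, username: str,
                         password: str) -> urllib.request.Request:
    """Prepare (but do not send) a DELETE request for one classifier."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{BASE_URL}/classifiers/{classifier_id}",
                                     method="DELETE")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_delete_request("10D41B-nlc-1", "USERNAME", "PASSWORD")
    print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, which is left out here because deletion cannot be undone.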

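Key points

  • Confidence values represent percentages (the API returns them as decimals between 0 and 1, as in the responses above); the larger the value, the higher the confidence. A response contains at most 10 classes.
  • If the training data contains fewer than 10 classes, the confidence values of all classes sum to 100%. For example, with only two classes defined, only two classes can be returned.
  • One of the sample questions contains a word ("foggy") that the classifier was not trained on. No extra work is needed to handle such "missing" words; the classifier still scores them reasonably well. Try other questions containing words that are not in the training data, such as "sleet" or "storm".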
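
The claim that the confidences sum to 100% can be checked against the responses shown earlier. A minimal sketch, using a hypothetical helper `confidences_sum_to_one` and the confidence pairs copied from the three classify responses in this post:

```python
import math


def confidences_sum_to_one(confidences, tolerance: float = 1e-9) -> bool:
    """Check that the confidence values add up to 1.0 within a tolerance."""
    # Floating-point sums are rarely exactly 1.0, hence the tolerance.
    return math.isclose(sum(confidences), 1.0, abs_tol=tolerance)


# Confidence pairs copied from the three classify responses in this post.
SAMPLES = [
    (0.9929586035651006, 0.007041396434899482),
    (0.89688785244637, 0.10311214755363002),
    (0.8255246180698945, 0.1744753819301055),
]

if __name__ == "__main__":
    for pair in SAMPLES:
        print(pair, confidences_sum_to_one(pair))
```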

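Open questions

  • 1 Which languages are supported besides en?
  • 2 Must the training data be in CSV format? Is the CSV layout itself fixed as well?
  • 3 Can training data be added after the classifier has been created?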
