Trajectory Tracking of Asian Giant Hornets Based on SVM and BA-SVM Algorithm
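Based on 2021 MCM Problem C: Confirming the Buzz about Hornets (deciding whether a reported sighting is the pest or not)
Type: big-data analysis problem
Big-data analysis problems come in three types: character data, text data, and image/video data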
In September 2019, a colony of Vespa mandarinia (also known as the Asian giant hornet) was discovered on Vancouver Island in British Columbia, Canada. The nest was quickly destroyed, but the news of the event spread rapidly throughout the area. Since that time, several confirmed sightings of the pest have occurred in neighboring Washington State, as well as a multitude of mistaken sightings. See Figure 1 below for a map of detections, hornet watches, and public sightings.
Vespa mandarinia is the largest species of hornet in the world, and the occurrence of the nest was alarming. Additionally, the giant hornet is a predator of European honeybees, invading and destroying their nests. A small number of the hornets are capable of destroying a whole colony of European honeybees in a short time. At the same time, they are voracious predators of other insects that are considered agricultural pests.
The life cycle of this hornet is similar to many other wasps. Fertilized queens emerge in the spring and begin a new colony. In the fall, new queens leave the nest and will spend the winter in the soil waiting for the spring. A new queen has a range estimated at 30 km for establishing her nest. More detailed information on Asian hornets is included in the problem attachments and can also be found online.
Due to the potential severe impact on local honeybee populations, the presence of Vespa mandarinia can cause a good deal of anxiety. The State of Washington has created helplines and a website for people to report sightings of these hornets. Based on these reports from the public, the state must decide how to prioritize its limited resources to follow up with additional investigation. While some reports have been determined to be Vespa mandarinia, many other sightings have turned out to be other types of insects.
The primary questions for this problem are "How can we interpret the data provided by the public reports?" and "What strategies can we use to prioritize these public reports for additional investigation given the limited resources of government agencies?"
Problems:
Your paper should explore and address the following aspects:
1. Address and discuss whether or not the spread of this pest over time can be predicted, and with what level of precision.
Prediction model (discuss how hornet occurrences change over time; spatial features can also be brought into the analysis).
2. Most reported sightings mistake other hornets for the Vespa mandarinia. Use only the data set file provided, and (possibly) the image files provided, to create, analyze, and discuss a model that predicts the likelihood of a mistaken classification.
Classification model (my paper uses SVM for classification; put real effort into the model analysis and look further into metrics such as Recall).
3. Use your model to discuss how your classification analyses lead to prioritizing investigation of the reports most likely to be positive sightings.
Evaluation (give results for the unlabeled samples and evaluate the model's accuracy and other metrics to support further analysis).
4. Address how you could update your model given additional new reports over time, and how often the updates should occur.
Model optimization: update rules for parameters such as the learning rate, how often to iterate over the dataset, and so on.
5. Using your model, what would constitute evidence that the pest has been eradicated in Washington State?
My own criterion for deciding that the hornet count has dropped to the accepted range.
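Since confirmed sightings are far rarer than mistaken ones, accuracy alone can look good while the classifier still misses real hornets, which is why Recall deserves a closer look. Below is a minimal sketch of the metrics in plain Python. It assumes labels are encoded as 1 (confirmed hornet) and 0 (mistaken sighting); the function name and the toy labels are made up for illustration and are not taken from the dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute basic binary-classification metrics from two label lists."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # real hornet, flagged as hornet
        elif pred == positive:
            fp += 1  # other insect, flagged as hornet
        elif truth == positive:
            fn += 1  # real hornet that the model missed
        else:
            tn += 1  # other insect, correctly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy, imbalanced example: 2 real hornets among 10 reports (invented labels).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = precision_recall_f1(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy is 0.9 but the recall is only 0.5: half of the real hornets were missed, which is exactly the failure that accuracy hides.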
Finally, your report should include a two-page memorandum that summarizes your results for the Washington State Department of Agriculture.
Your PDF solution of no more than 25 total pages should include:
·One-page Summary Sheet.
·Table of Contents.
·Your complete solution.
·Two-page Article.
·References list.
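1. Find the indicators for identifying an Asian giant hornet (choose the judgment indicators).
2. Based on the indicators and the images, convert them into data and build the model (theoretical modeling, supervised training; for the images, use a convolutional neural network or the YOLOv5 algorithm).
3. Feed in the unlabeled test set and predict some results.
4. As more result data comes in, i.e. as the training set grows, how often should the model be updated? And how should it be updated (model improvement)?
5. State the model's judgment accuracy. Based on my model, every hornet found gets destroyed; when can the hornets be wiped out?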
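One way to turn the predictions from step 3 into an investigation queue is to sort the reports by model score and keep only as many as the agency can follow up on. This is a rough sketch, not the method of any particular paper. The `Report` type, the `score` field (assumed to be the model's estimated probability of a true sighting), and the `budget` parameter are all hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Report(NamedTuple):
    report_id: str  # hypothetical identifier for a public report
    score: float    # assumed: model's estimated probability of a true sighting


def prioritize(reports, budget):
    """Return the `budget` highest-scoring reports, best first."""
    ranked = sorted(reports, key=lambda r: r.score, reverse=True)
    return ranked[:budget]


# Invented scores for four reports; only two can be investigated.
reports = [
    Report("A", 0.12),
    Report("B", 0.91),
    Report("C", 0.47),
    Report("D", 0.78),
]
queue = prioritize(reports, budget=2)
print([r.report_id for r in queue])
```

Sorting by score rather than cutting at a fixed threshold keeps the queue length tied to the available resources, which is the constraint the problem statement stresses.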
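Source: shared on Bilibili by 钱学院辅 of Southwest Jiaotong University
Apply color-based threshold segmentation to the JPG map, read off the marked points, and save them as a CSV file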
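The idea of color thresholding can be sketched without any imaging library: treat the map as a grid of RGB tuples, keep the pixels whose color is close to a target color, and write their positions out as CSV. The function names, the tolerance value, and the toy pixel grid are assumptions for illustration. In practice the grid would be loaded from the JPG with an imaging library, and pixel positions would still need to be georeferenced to latitude and longitude, which this sketch does not cover.

```python
import csv
import io


def segment_by_color(pixels, target, tol=30):
    """Return (row, col) of every pixel within `tol` of `target` on each channel."""
    matches = []
    for row_idx, row in enumerate(pixels):
        for col_idx, rgb in enumerate(row):
            if all(abs(c - t) <= tol for c, t in zip(rgb, target)):
                matches.append((row_idx, col_idx))
    return matches


def to_csv(matches):
    """Serialize the matched pixel positions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["row", "col"])
    writer.writerows(matches)
    return buffer.getvalue()


RED = (255, 0, 0)  # assumed marker color on the map
toy_map = [
    [(250, 10, 5), (255, 255, 255), (255, 255, 255)],
    [(255, 255, 255), (0, 0, 255), (240, 20, 20)],
]
red_points = segment_by_color(toy_map, RED)
csv_text = to_csv(red_points)
print(csv_text)
```

JPG compression smears colors slightly, so a per-channel tolerance is used here instead of an exact match.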
Analysis workflow: data collection, data cleaning, data analysis, data interpretation, data visualization
Characteristics of the dataset:
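detection date | notes | lab status | lab comments | submission date | latitude, longitude |
---|---|---|---|---|---|
Date of the sighting | Free-text description from the sighting report | Class label | Free-text comments from the lab | Date the report was submitted | Recorded latitude and longitude |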
Check the data for anomalies or missing values, then fill them in (by interpolation or regression)
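As a rough illustration of the interpolation option, here is a stdlib-only sketch that fills gaps in a time-ordered numeric series by linear interpolation and copies the nearest known value at the edges. The function name and the sample series (imagined weekly report counts) are hypothetical. With pandas one would normally reach for its built-in interpolation instead.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbors."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    # Leading and trailing gaps: copy the nearest known value.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Interior gaps: straight line between the two surrounding known points.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Invented weekly counts with missing entries.
weekly_counts = [None, 4.0, None, None, 10.0, None]
print(interpolate_missing(weekly_counts))
```

Linear interpolation only makes sense for ordered numeric fields; categorical or free-text columns need a different treatment.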
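For notes: drop this column (it is extremely noisy and would cause trouble for the model)
Preprocessing of semantic data (one-hot encoding, discretization, feature extraction with an NLP network)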
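The first two preprocessing steps can be sketched in a few lines of plain Python: one-hot encoding turns a categorical column into 0/1 indicator vectors, and discretization maps a numeric value to a bin index. The category strings and the month bin edges below are assumptions made for illustration, not values confirmed from the dataset.

```python
from bisect import bisect_right


def one_hot(values):
    """Return (sorted categories, one indicator vector per input value)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def discretize(value, edges):
    """Return the index of the bin that `value` falls into (edges ascending)."""
    return bisect_right(edges, value)


# Assumed label strings for a categorical column.
statuses = ["Negative ID", "Positive ID", "Unverified", "Negative ID"]
categories, encoded = one_hot(statuses)
print(categories)
print(encoded)

# Assumed bin edges on the month number: <3, 3-5, 6-8, >=9.
month_edges = [3, 6, 9]
print([discretize(m, month_edges) for m in (1, 3, 8, 9)])
```

Sorting the categories makes the column order of the encoding reproducible between runs.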
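Present the structure with a clean, good-looking block diagram
A happy accident: I stumbled on a getting-started path for pandas (see the first answer):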
https://www.zhihu.com/question/439115857
More to come tomorrow, after I have read some outstanding papers~