What R Can Do

Case 1: Chicago uses tweets to find problem restaurants

Source: http://blog.revolutionanalytics.com/2013/08/foodborne-chicago.html

Foodborne Chicago finds dodgy restaurants with tweets, and R
If, like me, you've ever had a sandwich from a dubious deli and then been laid up for days afterwards, you know that food poisoning is no trifling matter. In the past, local authorities would only learn of such public health issues if they were reported by the victim (or the victim's doctor). But that misses the many cases of less serious illness that don't involve a doctor or hospital, and illnesses that simply aren't reported to the authorities.
Now, the City of Chicago has found a new way of identifying sources of food poisoning: by analyzing tweets. Foodborne Chicago scans tweets posted in the Chicagoland area, responding to tweets like: "Stomach flu/food poisoning is like eating gas station sushi without the joys of eating gas station sushi" (but ignoring tweets like "It’s really hard to snack while watching Honey Boo Boo. It’s the second best diet to food poisoning."). If you send such a tweet, you're likely to get a response:

[Image 1: example response tweet, not reproduced here]

The system is entirely automated, and uses real-time text analysis implemented in the R language to identify those tweets that are about a specific case of food poisoning:
Foodborne searches Twitter for all tweets near Chicago containing the string “food poisoning”. The ingestion service consumes thousands of tweets, storing them in a large MongoDB instance. A collection of classification servers, running R, churn through the collected tweets, applying a series of filters. The tweets are classified using a model that was trained via supervised learning, which determines if the tweets are related to a food poisoning illness or not.
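The flow described above (keyword search, filtering, then classification) can be sketched roughly as follows. This is a minimal Python illustration, not the project's R code: the retweet filter and `toy_classifier` are hypothetical stand-ins, since the excerpt does not list the actual filters or the trained model, and the MongoDB storage step is left out.

```python
from typing import Callable, Iterable, List

KEYWORD = "food poisoning"  # the search string named in the text


def matches_keyword(text: str) -> bool:
    """Mimic the search step: keep tweets containing the keyword."""
    return KEYWORD in text.lower()


def is_retweet(text: str) -> bool:
    """Hypothetical filter; the real filters are not listed in the excerpt."""
    return text.startswith("RT ")


def flag_tweets(tweets: Iterable[str],
                classify: Callable[[str], bool]) -> List[str]:
    """Keep keyword matches, drop filtered tweets, then apply the classifier."""
    flagged = []
    for text in tweets:
        if not matches_keyword(text):
            continue
        if is_retweet(text):
            continue
        if classify(text):
            flagged.append(text)
    return flagged


def toy_classifier(text: str) -> bool:
    """Stand-in for the trained model: looks for first-person wording."""
    words = text.lower().split()
    return any(word in words for word in ("i", "my", "me"))


tweets = [
    "I got food poisoning from my lunch",
    "Food poisoning is the second best diet",
    "RT I have food poisoning",
    "Nice weather in Chicago",
]
print(flag_tweets(tweets, toy_classifier))
```

Passing the classifier in as a function keeps the filtering steps independent of whichever model is used.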

Cory Nissen, the data scientist who implemented the analysis behind the system, shared some of the behind-the-scenes details with me via email. He used an R package called textcat and an algorithm based on n-grams to classify the tweets. The model is trained in such a way as to bias towards sensitivity (at the 90%+ level) at the expense of specificity (50-60%) to better sort true food poisoning reports from "junk" tweets merely about food poisoning. Out of all the tweets in the Chicago area on any given day, the system flags about 10-20 tweets a day for review, of which just a couple will typically warrant a response to the unwell citizen for follow-up.
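To make the n-gram idea and the sensitivity/specificity trade-off more concrete, here is a small Python sketch. It does not reproduce the textcat API or the project's model. It assumes a rank-order character n-gram profile with an "out-of-place" distance, in the style of Cavnar and Trenkle, which as far as I know is the family of method textcat builds on. The labels `report` and `junk` and the training sentences are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def char_ngrams(text: str, n_max: int = 3) -> List[str]:
    """All character n-grams of length 1..n_max from the padded, lowercased text."""
    padded = " " + text.lower() + " "
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def build_profile(texts: Iterable[str], top_k: int = 300) -> Dict[str, int]:
    """Map the top_k most frequent n-grams to their rank (0 = most frequent)."""
    counts = Counter()
    for text in texts:
        counts.update(char_ngrams(text))
    return {gram: rank
            for rank, (gram, _) in enumerate(counts.most_common(top_k))}


def out_of_place(doc: Dict[str, int], ref: Dict[str, int]) -> int:
    """Sum of rank differences; n-grams missing from ref get a fixed penalty."""
    max_penalty = len(ref)
    return sum(abs(rank - ref[gram]) if gram in ref else max_penalty
               for gram, rank in doc.items())


def classify(text: str, profiles: Dict[str, Dict[str, int]]) -> str:
    """Return the label whose profile is closest to the text's profile."""
    doc = build_profile([text])
    return min(profiles, key=lambda label: out_of_place(doc, profiles[label]))


def sensitivity_specificity(y_true: List[str], y_pred: List[str],
                            positive: str = "report") -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up training data, one sentence per class.
profiles = {
    "report": build_profile(["i got food poisoning from the deli last night"]),
    "junk": build_profile(["food poisoning is the second best diet lol"]),
}
print(classify("i think the deli gave me food poisoning", profiles))
```

A model tuned the way the paragraph describes accepts many false positives (low specificity) so that few real reports are missed (high sensitivity), which fits a workflow where flagged tweets are reviewed before anyone replies.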

[Image 2 not reproduced here]

The open-source R code behind the classifier is available on GitHub. Check out the README file for more technical details behind the implementation. You can also see how the application was presented on Fox 39 Chicago news (starting at the 2:09 mark):
(The video is not reproduced here.)

Data and code related to this article:
https://github.com/corynissen/foodborne_classifier/blob/master/README.md


Case 2: ROC Curves in Python and R

Source: http://blog.yhat.com/posts/roc-curves.html

[Images 3-17 from the original article are not reproduced here]
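As a rough illustration of the topic named in the heading, the sketch below computes ROC points and the area under the curve in plain Python. It is not taken from the linked article. The function names and toy scores are my own, and it assumes binary labels (1 = positive, 0 = negative) with at least one example of each.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    One point per distinct score used as a threshold (predict positive
    when score >= threshold), starting from (0, 0).
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: higher score means "more likely positive".
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

Each point on the curve is one sensitivity/specificity trade-off: the true positive rate is the sensitivity and the false positive rate is 1 minus the specificity, so choosing a threshold amounts to choosing a point on the curve.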

UGA: Receiver Operating Characteristic Curves
Intro to ROC Curves
The Area Under an ROC Curve
ROC curves and Area Under the Curve explained (video)
