Using ML to Find the Funniest Friend in FRIENDS

A technical walkthrough of how to find the funniest Friend in FRIENDS

After finishing Andrew Ng’s machine learning courses, I decided it was time to try a project of my own. Part of what I like about ML is how it can be applied to almost anything. With that in mind, I wanted to try a project that hadn’t been done before.

I settled on finding the funniest friend in FRIENDS because it seemed like an unanswered question and because it’s something that my non-technical friends might find interesting.

In order to find the funniest friend in FRIENDS, we need to know which character generates the most laughter (per line). There are two caveats that I explain in the results article (which you should check out if you haven’t yet). The first is that we assume audience laughter (and length of laughter) maps to how funny a spoken line is. The second caveat is that we can’t capture purely non-verbal humor, which turns out to be ~3% of the show’s laughter (this is a rough calculation based on laughter’s proximity to spoken lines).

There are three main parts to figuring out which character is responsible for which laughter. First is identifying who is speaking and when. Second is identifying laughter and when it occurs. Third is piecing together the first two parts to figure out who is responsible for each instance of laughter.

Part 1

The Who and the When

I talked the project over with some friends who know more about ML than I do. They advised me that the first part of the project (who is speaking when) is non-trivial when it comes to voice recognition algorithms. Detecting and differentiating human voices is still a difficult problem for ML. I read some papers and articles, and they seemed to confirm my friends' sentiments. Here is an example of an article that identifies different human voices with 80–85% accuracy. For my project, I didn't think 80–85% accuracy would be high enough to draw any sort of conclusions about the funniest friend. That said, if you know of a relevant model for this sort of task, please reach out!

With voice recognition ML not likely to perform accurately enough, I decided to try some different approaches. Because FRIENDS is so popular, there are fan-created script files for every episode in the show. These script files tell you who is speaking and what they say:

There have even been some data science articles written that make use of this script dataset in order to find interesting facts like who had the most speaking lines across all 10 seasons of the show.

These script files are great but they don’t tell us exactly WHEN any of these lines are said during the episode. Luckily, there are also subtitles files for each episode of the show. They tell you what is said and when it is said, but not WHO said it:

If we could somehow combine the script file and subtitles file, we could get the WHO and the WHEN that we need in order to figure out who speaks right before laughter. The script and subtitles both contain the WHAT (the spoken lines) so we can join them together using the WHAT to map the WHO to the WHEN.

If you look closely at the script and subtitles images above, you can see that mapping the lines to each other won’t be as easy as it sounds.

Quick note: There is a lot of code in this project, so I don't post much of it in this article. Instead, I will point you to the relevant Jupyter notebook on GitHub if you're interested in following along with the implementations. If you're interested in the scraping of script and subtitles files, see this notebook.

I ended up using a series of 3 algorithms to join the script and subtitles files of each episode based on their spoken line similarities. Find the implementations here.

Algorithm #1

One-character word/phrase match

We look for unique words or phrases (up to 5 words long) in the script that are spoken by only one character, and we check whether that unique word or phrase also appears in the subtitles file (an equal or smaller number of times). We then assign that character to the subtitle lines containing the unique word or phrase. An example from Season 1 Episode 1: Ross is the only character to say “should I know” in the episode, and these three words appear together once in the script and once in the subtitles. Therefore we assign the unknown subtitles line containing “should I know” to Ross.

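To make the idea concrete, here is a minimal Python sketch of that matching logic, assuming the script has already been parsed into (character, text) pairs and the subtitles into a list of text lines. It is a simplification of what the linked notebook does and ignores punctuation, casing quirks, and other edge cases.

```python
from collections import Counter, defaultdict

def ngrams(text, n):
    """All word n-grams (as tuples) in a lowercased line of dialogue."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def unique_phrase_match(script_lines, subtitle_lines, max_n=5):
    """script_lines: list of (character, text); subtitle_lines: list of text.
    Returns {subtitle_index: character} for phrases spoken by only one character."""
    assignments = {}
    for n in range(1, max_n + 1):
        speakers = defaultdict(set)   # phrase -> characters who say it in the script
        script_counts = Counter()     # phrase -> occurrences in the script
        for character, text in script_lines:
            for phrase in ngrams(text, n):
                speakers[phrase].add(character)
                script_counts[phrase] += 1
        sub_counts = Counter(p for line in subtitle_lines for p in ngrams(line, n))
        for phrase, chars in speakers.items():
            # Unique to one character, and no more common in the subtitles than in the script
            if len(chars) == 1 and 0 < sub_counts[phrase] <= script_counts[phrase]:
                character = next(iter(chars))
                for i, line in enumerate(subtitle_lines):
                    if phrase in ngrams(line, n):
                        assignments.setdefault(i, character)
    return assignments
```

In the Season 1 Episode 1 example, the trigram ('should', 'i', 'know') is unique to Ross in the script and appears once in the subtitles, so that subtitle line would get assigned to Ross.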

After Algorithm #1 we have labels for about 80% of all the lines in the subtitles, and those labels are about 99% accurate in terms of getting the character right.

Algorithm #2

Local one-character phrase match

This algorithm is the same as Algorithm #1 except that it is applied to specific parts of the script and subtitles files. The idea is that we can now start to zoom in on areas of the script where we have unknown lines between two known lines. We can do a one-character “unique word or phrase” match in that smaller area and find matches we would not have found if we were looking across the entire episode. We know exactly which line we are looking for in the subtitles and we also know where the closest known line above and below that line is in the subtitles.

However, we don’t have our bearings in the script file at all. To get our bearings, we take the “unique word or phrase” that gives us the known line above in the subtitles, and the same for the known line below. We then search the script file for a pattern that looks something like “known-line-above word”, then unknown line, then “known-line-below word.” Because known lines can have the same discovery word (for example, 4 different lines in the subtitles file for S1E1 have the discovery word “daddy”), this process is not flawless. But once we choose a “range” of the script to search, we simply execute Algorithm #1 locally in that range.

To explain how this algorithm is different from algorithm #1, imagine the phrase “How are you?” is said 5 times throughout an episode by 3 different characters. Algorithm #1 would not be able to match any characters in this situation. But if we already know lines 45 and 55 we can search between these two lines. We find that “How are you?” is said only once by one character. Now we can match that character to a line where we couldn’t before.

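Here is a rough sketch of that local pass, reusing the matcher sketched under Algorithm #1. It assumes we already have a `script_anchors` mapping from known subtitle lines to their positions in the script (found via the discovery-word pattern search described above); that mapping is a simplification of the notebook's range-finding logic, not a real function from the project.

```python
def local_phrase_match(script_lines, subtitle_lines, assignments, script_anchors):
    """assignments: {subtitle_index: character} from Algorithm #1.
    script_anchors: {subtitle_index: script_index} for lines we could locate
    in the script (a simplified stand-in for the pattern search above).
    Re-runs the unique-phrase matcher inside each gap between two known lines."""
    known = sorted(i for i in assignments if i in script_anchors)
    for above, below in zip(known, known[1:]):
        if below - above <= 1:
            continue  # no unknown subtitle lines between these two known lines
        sub_slice = subtitle_lines[above + 1:below]
        script_slice = script_lines[script_anchors[above] + 1:script_anchors[below]]
        for offset, character in unique_phrase_match(script_slice, sub_slice).items():
            assignments.setdefault(above + 1 + offset, character)
    return assignments
```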

After Algorithm #2 we end up with 93% of lines predicted and 98% accuracy on those predictions. Algorithm #2 contributes an extra 13% with ~92% accuracy. Not as accurate as Algorithm #1, but not terrible either.

Algorithm #3

Exact Speaker Count

Now say we know lines 45 and 48 in the script. Obviously there are 2 lines in between 45 and 48. The corresponding subtitles line numbers will be different, and there may even be a different number of lines in between. But if there are exactly 2 lines in between in both the subtitles and the script, we can simply map the two characters in the script to those two lines in the subtitles in sequential order. This situation is rarer than you’d think, so Algorithm #3 only adds another 2% of total lines.

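A sketch of that last pass, using the same simplified `script_anchors` idea as above: when the gap between two known lines contains the same number of lines in the script and in the subtitles, map them across in order.

```python
def exact_count_match(script_lines, assignments, script_anchors):
    """If the gap between two known lines has the same number of unknown lines
    in the script and in the subtitles, assign the script characters in order."""
    known = sorted(i for i in assignments if i in script_anchors)
    for above, below in zip(known, known[1:]):
        unknown_sub = range(above + 1, below)
        unknown_script = script_lines[script_anchors[above] + 1:script_anchors[below]]
        if len(unknown_sub) > 0 and len(unknown_sub) == len(unknown_script):
            for sub_idx, (character, _text) in zip(unknown_sub, unknown_script):
                assignments.setdefault(sub_idx, character)
    return assignments
```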

After Algorithm #3, we now know 95% of lines with 98% accuracy.

With 98% accuracy on 95% of lines, I presume we are outperforming what a voice recognition ML model could get us (again, if you know of a super accurate model, please reach out!). The last 5% of lines tend to be extremely difficult to label. For example, Rachel will walk into a room of 3 friends and they will all say “Hi” to each other, nearly simultaneously. Without using a beastly combination of voice recognition, on-screen character (image) recognition, and text processing with the subtitles and script files I believe it would be very difficult for a model or algorithm to label these lines accurately. Therefore, I simply hand-labeled the last 5% of lines (hopefully with near 100% accuracy). This means that overall, the first part of the project will have character lines that are labeled with ~98% accuracy.

From here, I do a quick check of all unique names across the 10 seasons and assign any aliases to the correct characters. Then I upload the labeled data into a SQLite database.

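The upload itself is a one-liner with pandas. A sketch, where the table name, column names, and example rows are purely illustrative rather than the ones in the notebooks:

```python
import sqlite3
import pandas as pd

# Hypothetical rows: one per subtitle line, with speaker and timing in milliseconds.
labeled_lines = [
    (1, 1, 15_000, 17_500, "Ross", "...should I know..."),
    (1, 1, 18_200, 19_900, "Monica", "..."),
]
lines_df = pd.DataFrame(
    labeled_lines,
    columns=["season", "episode", "start_ms", "end_ms", "character", "text"],
)

with sqlite3.connect("friends.db") as conn:
    lines_df.to_sql("character_lines", conn, if_exists="replace", index=False)
```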

Part 2

Recognizing and Labeling Laughter (Actual ML Involved!)

For recognizing laughter, there is no text-based file that I’m aware of that could give us the data we need. So we must go to the actual audio.

I started with the video files of each episode for all 10 seasons. From there we can use VLC (on Mac) to convert video files to audio (MP3).

Once we have the audio files, there is a transform we can apply that I found helps the ML algorithm tremendously.

We take the stereo MP3 track, split it into two mono tracks, invert one of them, and mix the two back together, which effectively cancels out anything that was center-panned in the original stereo mix. This is usually the character vocals. I use free software called Audacity to accomplish this. The result is an audio track that largely silences the characters’ speech and leaves nearly all of the audience laughter alone. Here is the method.

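I did this in Audacity by hand, but if you would rather script it, something like the following should approximate the same trick. Using pydub (which needs ffmpeg installed to decode MP3s) is an assumption on my part, not the method linked above.

```python
from pydub import AudioSegment  # pip install pydub; requires ffmpeg for MP3 support

def remove_center_channel(in_path, out_path):
    """Anything panned dead-center (usually dialogue) cancels out when one
    channel is phase-inverted and mixed back with the other channel."""
    stereo = AudioSegment.from_mp3(in_path)
    left, right = stereo.split_to_mono()
    laughter_track = left.overlay(right.invert_phase())  # left + (-right)
    laughter_track.export(out_path, format="mp3")
```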

I will note that this method doesn’t work anywhere near perfectly (a lot of vocals come through still) but it can make a tremendous difference for helping the ML algorithm recognize laughter. Take a look at a 25-second clip before and after this transform. Hint: The yellow areas of the second clip are true laughter.

Before:

After:

Now we have the base audio that we will feed to our ML model.

For the model to learn, we need to give it the “answers” for a few episodes. To get training examples, I went through 20 full episodes in Audacity and labeled the laughter in each one. You can easily export the labels as text files.

While I was taking Andrew Ng’s Deep Learning course on Coursera, there was a very relevant project on detecting trigger words (like “Hey Siri”) from audio clips. The ML model used a 1-D convolutional layer and 2 GRU layers. I thought this model would be a good starting point.

For ease of use, I cut the 22-minute episodes into 10-second clips.

For the model to “understand”, we have to translate our audio into numerical data. We can represent the audio clip to the model as a number of time steps (splitting the 10-second clip into roughly 10-millisecond increments) and record the intensity of different audio frequencies at each time step. I ended up with 257 frequencies per time step and 861 time steps per 10-second clip. So a 10-second clip becomes a numpy array with shape (861, 257). You can find the audio data preprocessing notebook here.

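For reference, here is one way to get arrays of that shape with scipy. The specific settings (44.1 kHz mono audio, 512-sample FFT windows, no overlap) are my guess at parameters that reproduce the (861, 257) shape; the notebook linked above has the actual preprocessing.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def clip_to_features(wav_path):
    """Turn a 10-second mono clip into a (time_steps, frequencies) array."""
    rate, audio = wavfile.read(wav_path)        # assumes 44.1 kHz mono WAV
    freqs, times, spec = spectrogram(
        audio.astype(np.float32),
        fs=rate,
        nperseg=512,   # 512 // 2 + 1 = 257 frequency bins
        noverlap=0,    # 441,000 samples / 512 per window = 861 time steps
    )
    return np.log1p(spec).T   # log compression; shape roughly (861, 257)
```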

The model also needs “answers” to train on. To generate the y input, we use those text files full of hand-labeled laughter ranges. At each timestep of each 10-second clip, we basically ask the question, “Is this timestep inside a laughter range or outside?” and we label it “1” if it is inside a laughter instance, and “0” if not. For a 10-second clip, the y value will be of shape (861, 1) because we simply have a binary output at each time step in the clip.

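A sketch of that labeling step, turning hand-labeled laughter ranges (start and end times in seconds, as exported from Audacity's label track) into the per-timestep 0/1 vector:

```python
import numpy as np

TIME_STEPS = 861       # matches the spectrogram shape above
CLIP_SECONDS = 10.0

def laughter_ranges_to_y(laughter_ranges, clip_start):
    """laughter_ranges: list of (start_s, end_s) within the episode.
    clip_start: where this 10-second clip begins in the episode, in seconds.
    Returns a (TIME_STEPS, 1) array of 0/1 labels."""
    y = np.zeros((TIME_STEPS, 1), dtype=np.float32)
    step = CLIP_SECONDS / TIME_STEPS             # roughly 11.6 ms per time step
    for start_s, end_s in laughter_ranges:
        lo = max(int((start_s - clip_start) / step), 0)
        hi = min(int(np.ceil((end_s - clip_start) / step)), TIME_STEPS)
        if hi > lo:
            y[lo:hi] = 1.0                       # these time steps fall inside laughter
    return y
```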

Side note: I ended up doing everything in Part 2 using Kaggle notebooks. I ran into disk space and RAM issues so many times that I wouldn’t recommend it for a project of this size. I understand Google Colab notebooks suffer similar restrictions. The appeal of these options is that you get free GPU time (30 hours per week) and unlimited access to 16 CPU cores. In the future, I would likely look towards a paid computing option to save myself the memory headaches. It turned out that utilizing a GPU on a Kaggle notebook only gave about a 15% improvement in training time, so for this project a GPU may not have even been necessary. Here’s the notebook where I train the model.

We grab the X and y datasets and split them into 60% train, 20% development and 20% test.

From there we define our model, beginning with a 1-D convolutional layer, then 2 GRU layers, followed by a dense layer. I experimented with more or fewer GRU layers and with different combinations of dropout and batch normalization layers, but this combination seemed to work best. It happened to be the exact same combination Andrew Ng’s “trigger word” model used. Maybe he knows what he’s talking about after all.

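For readers who want to see the shape of it, here is a Keras sketch of that architecture (1-D conv, two GRUs, then a per-timestep dense sigmoid). The filter count, kernel size, unit counts, and dropout rates below are illustrative guesses; the actual values are in the training notebook.

```python
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv1D, Dense,
                                     Dropout, GRU, Input, TimeDistributed)
from tensorflow.keras.models import Model

def build_laughter_model(time_steps=861, n_freqs=257):
    """1-D conv front end, two GRU layers, and a per-timestep sigmoid output."""
    x_in = Input(shape=(time_steps, n_freqs))

    x = Conv1D(filters=196, kernel_size=15, padding="same")(x_in)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Dropout(0.5)(x)

    x = GRU(128, return_sequences=True)(x)   # first recurrent layer
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = GRU(128, return_sequences=True)(x)   # second recurrent layer
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    y_out = TimeDistributed(Dense(1, activation="sigmoid"))(x)   # shape (time_steps, 1)
    return Model(inputs=x_in, outputs=y_out)
```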

We use the Adam algorithm for gradient descent optimization as well as a binary cross-entropy loss function.

Finally we can train the model!

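Continuing the sketch above, the compile and fit calls look roughly like this, matching the optimizer and loss named earlier. The learning rate and batch size are guesses, the 100 epochs matches the final run described just below, and X_train/Y_train and friends are assumed to hold the arrays from the 60/20/20 split.

```python
from tensorflow.keras.optimizers import Adam

model = build_laughter_model()
model.compile(optimizer=Adam(learning_rate=1e-4),   # learning rate is a guess
              loss="binary_crossentropy",
              metrics=["accuracy"])

# X_train, Y_train, etc. come from the 60/20/20 split described earlier
model.fit(X_train, Y_train,
          validation_data=(X_dev, Y_dev),
          batch_size=16,        # illustrative
          epochs=100)           # the final model was trained for 100 epochs
model.evaluate(X_test, Y_test)
```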

When I was experimenting with different hyperparameters and layer combinations, I found that 15 epochs was more than enough to figure out whether a model would perform better or worse than past models. This took between 15 and 30 minutes for most combinations. For the final model I decided to use, I trained for 100 epochs, which took about 3.5 hours on a GPU.

The best model was able to identify laughter with 95% accuracy. I tried to do better than 95% for a long time. But what I eventually figured out was that a lot of the error was my own fault (i.e. human error). I didn’t realize this until I went back and labeled the laughter for an episode that I had already labeled a week prior. I then compared the labels and found that they only had 97% overlap! Meaning that the “human” level accuracy for identifying laughter in this situation is ~97%, so it is unreasonable to expect a machine to do better than 97%. In that light, 95% for the algorithm looks like a more positive outcome.

Even with 95% accuracy, the model almost never misses a laughter instance. The 5% error tends to come from the very beginning or end of a laughter instance where it is often less clear when laughter “officially” begins or ends.

Once we have the trained model, I use it to predict laughter for all the non-labeled episodes. I did this outside of the Kaggle environment because of the massive amount of data we are predicting (all 10 seasons of audio effectively). Notebook found here.

With the predictions in hand, there are two post-processing steps I take in order to make the laughter ranges more sensible. Anything under 400ms is too short to be a standalone laughter instance, so I decided it either needs to join together with a close-by laughter instance, or it needs to be removed. Any gap of 100ms or less between two laughter instances is much too short to be meaningful, so we combine those two laughter instances into one longer laughter instance.

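Here is a sketch of those two clean-up rules applied to a list of predicted (start, end) ranges in seconds. The order of operations (merge tiny gaps first, then drop tiny fragments) is my reading of the description above.

```python
def clean_laughter_ranges(ranges, max_gap=0.1, min_length=0.4):
    """ranges: list of (start_s, end_s) laughter predictions for one episode.
    Merge instances separated by <= max_gap seconds, then drop anything
    shorter than min_length seconds."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # close the tiny gap
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_length]
```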

We take our processed predictions and store them in SQLite where they are now ready to be combined with the character data from Part 1.

Part 3

Combining the Characters with the Laughter

Part 3 notebook found here.

We bring in both datasets and format them to be compatible with each other.

Then we use a simple minimization function to decide who caused each instance of laughter. We take the beginning timestamp of the laughter instance and we find the minimum distance to the end of a character line. Whichever character’s line ends closest to the beginning of the laughter (within +/- 3 seconds) is the character to whom we attribute the laughter instance.

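A sketch of that attribution rule, where `line_ends` is a list of (end_time_s, character) pairs from Part 1 and `laughs` is the list of (start_s, end_s) laughter ranges from Part 2:

```python
def attribute_laughter(laughs, line_ends, window=3.0):
    """Credit each laugh to the character whose line ends closest to the start of
    the laugh, as long as that distance is within +/- `window` seconds."""
    credited = []
    for laugh_start, laugh_end in laughs:
        end_s, character = min(line_ends, key=lambda le: abs(le[0] - laugh_start))
        if abs(end_s - laugh_start) <= window:
            credited.append((character, laugh_end - laugh_start))  # (who, seconds of laughter)
    return credited
```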

Now we have the exact data we need to answer “Who is the Funniest Friend in FRIENDS?”

But first, to make sure everything is working correctly, I create custom subtitles for each episode that will show us the output in real time.

SRT files (subtitles files) are just text files with a standardized format for displaying subtitles on a video. We convert our timestamps and speaking lines into SRT format, and the result can be combined with any video file.

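The conversion itself is mostly timestamp formatting. A minimal sketch, taking (start_s, end_s, text) entries and returning the contents of an .srt file:

```python
def to_srt(entries):
    """entries: list of (start_s, end_s, text) tuples, in chronological order."""
    def ts(seconds):
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"   # SRT uses a comma before the milliseconds

    blocks = []
    for i, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text}\n")
    return "\n".join(blocks)
```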

Here’s the demo:

(Demo link)

Once we can see that the labels are working well in practice, all that’s left is to calculate the answer to our original question.

As I mention in the results article, it would be unfair to simply add up the seconds of laughter for each character across all 10 seasons to see who is funniest, because some characters have many more lines than others. A fairer way to calculate the funniest character is to see how much laughter follows each line spoken by a character, on average. The character with the most seconds of laughter per line, on average, wins.

(Demo link)

A couple of SQL queries later, we can see that it turns out to be Joey, with 1.27 seconds of laughter per line on average! But wait, there’s more…

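Here is a hedged version of that calculation against the illustrative SQLite schema from Part 1, assuming an `attributed_laughter` table with one row per credited laugh; the actual queries are in the Part 3 notebook.

```python
import sqlite3

query = """
SELECT l.character,
       ROUND(COALESCE(SUM(a.laugh_seconds), 0) * 1.0
             / COUNT(DISTINCT l.rowid), 2) AS laugh_seconds_per_line
FROM character_lines AS l
LEFT JOIN attributed_laughter AS a ON a.line_id = l.rowid
GROUP BY l.character
ORDER BY laugh_seconds_per_line DESC;
"""

with sqlite3.connect("friends.db") as conn:
    for character, laugh_per_line in conn.execute(query):
        print(character, laugh_per_line)
```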

How accurate are these findings?

It’s easy to look at a chart with precise numbers and “trust” that the data is accurate. But we all know this isn’t how the world works. So what are the chances that Joey ISN’T the funniest character?

We know Part 1 predicted the correct character for subtitle lines ~98% of the time (I painstakingly hand-labeled ~1000 lines to test it). And we know for Part 2, our ML model was ~95% accurate in labeling each ~10ms timestep of laughter correctly. For Part 3, deciding who is responsible for each laugh, I did more manual testing on a few episodes (~300 examples) and found that we attribute laughter to the correct character about 95% of the time. But our final metric, Joey’s 1.27 seconds of laughter per line, is a bit more complicated. It combines getting the character correct (~95%) and getting the length of laughter correct (~95%). On a timestep by timestep basis, the probability of both events being correct is ~90% (95% * 95%).

With this in mind, we can calculate confidence intervals around Joey’s 1.27 seconds of laughter per line. A 95% confidence interval gives us a +/- 0.02 seconds range around Joey’s 1.27 average seconds of laughter. In second place, Chandler’s 1.22 seconds of laughter also has a 95% confidence interval of +/- 0.02 seconds. This means we have greater than 95% confidence that Joey is the funniest character because Chandler’s best case value of 1.24 can’t surpass Joey’s worst case value of 1.25 within our 95% interval!

On the other hand, Chandler and Phoebe’s intervals DO overlap which means we can’t be confident about the 2nd funniest character. And we also can’t be sure about the least funny character because Rachel and Monica’s intervals overlap as well!

(Demo link)

I hope you enjoyed this walkthrough. If you’re working on a similar project, feel free to hit me up. Or if you have questions about any methods in this article or in my notebooks, please reach out as well. I would especially enjoy suggestions on improvements to this project! You can find my contact here.

Originally published at https://jacksanford.me.

Translated from: https://medium.com/swlh/using-ml-to-find-the-funniest-friend-in-friends-49d34b5fb36
