- Continued from: [中/英双语] Andrej Karpathy:A Survival Guide to a PhD (一)
Andrej Karpathy - Academic Website | Blog | Github | Quora Session.
- Research Scientist at OpenAI.
- Previously ML/CV PhD student at Stanford under Prof. Fei-Fei Li.
- Course instructor for the well-known Stanford CS231n course on computer vision.
Writing papers 写论文
Writing good papers is an essential survival skill of an academic (kind of like making fire for a caveman). In particular, it is very important to realize that papers are a specific thing: they look a certain way, they flow a certain way, they have a certain structure, language, and statistics that the other academics expect. It’s usually a painful exercise for me to look through some of my early PhD paper drafts because they are quite terrible. There is a lot to learn here.
在学术界,能写好论文是一项关键的生存技能(就像是生火技能对穴居人一样)。特别地,很重要的一点是要意识到论文是一种特别的事物:它们看起来有一定的形式、以一定的方式流动、有一定的结构、语言以及其他学者所期望的统计数据。对我来说,查看我博士早期阶段的论文真是一种痛苦的历练,因为它们实在太糟糕了。在这方面有很多东西需要了解。
Review papers. If you’re trying to learn to write better papers it can feel like a sensible strategy to look at many good papers and try to distill patterns. This turns out to not be the best strategy; it’s analogous to only receiving positive examples for a binary classification problem. What you really want is to also have exposure to a large number of bad papers and one way to get this is by reviewing papers. Most good conferences have an acceptance rate of about 25% so most papers you’ll review are bad, which will allow you to build a powerful binary classifier. You’ll read through a bad paper and realize how unclear it is, or how it doesn’t define its variables, how vague and abstract its intro is, or how it dives into the details too quickly, and you’ll learn to avoid the same pitfalls in your own papers. Another related valuable experience is to attend (or form) journal clubs - you’ll see experienced researchers critique papers and get an impression for how your own papers will be analyzed by others.
评审论文。如果你正在学习写更好的论文,阅读许多好论文并提取出其中的模式似乎是一个明智的选择。但事实证明这并不是最好的策略;这就好像是对于一个二元分类问题只接受正面的样本一样。你真正需要的是同时接触大量糟糕的论文,而做到这一点的一种方法就是评审论文。大部分好的会议的论文接收率大约为 25%,所以你评审的大部分论文都很差,这让你可以构建一个强大的二元分类器。你会在通读一篇糟糕的论文时意识到它的描述有多么不清楚,或者它如何没有定义自己的变量、引言有多模糊抽象、或者它如何过快地深入到了细节之中,然后你就能学会让自己的论文不落入同样的陷阱。另一个相关的有价值的经验是参加(或组织)期刊俱乐部(journal club):你将看到经验丰富的研究者批评论文,并且了解自己的论文将会被其他人怎样分析。
Get the gestalt right. I remember being impressed with Fei-Fei (my adviser) once during a reviewing session. I had a stack of 4 papers I had reviewed over the last several hours and she picked them up, flipped through each one for 10 seconds, and said one of them was good and the other three bad. Indeed, I was accepting the one and rejecting the other three, but something that took me several hours took her seconds. Fei-Fei was relying on the gestalt of the papers as a powerful heuristic. Your papers, as you become a more senior researcher, take on a characteristic look. An introduction of ~1 page. A ~1 page related work section with a good density of citations - not too sparse but not too crowded. A well-designed pull figure (on page 1 or 2) and system figure (on page 3) that were not made in MS Paint. A technical section with some math symbols somewhere, results tables with lots of numbers and some of them bold, one additional cute analysis experiment, and the paper has exactly 8 pages (the page limit) and not a single line less. You’ll have to learn how to endow your papers with the same gestalt because many researchers rely on it as a cognitive shortcut when they judge your work.
格式正确。我清楚地记得有一次和飞飞参加一次审阅会议。我在前面的几个小时里只评阅了 4 篇论文,而她拿起这些论文,每篇只翻了 10 秒钟就说其中一篇很好,其它都很糟糕。确实如此,我也接受了这一篇并拒绝了其它三篇,但这项花费我几个小时做成的事她只用几十秒就完成了。飞飞是将论文的格式作为强大的启发线索的。随着你变成越来越资深的研究者,你的论文将有一种特定风格的外观。一页引言/介绍。一页带有合适密度引用文献(不过于稀疏也不过于密集)的相关成果介绍。一张设计良好的 pull figure(在第一页或第二页)和系统图(在第三页)——不要用 MS Paint 制作。描写技术的章节在某个地方有些数学符号、带有大量数字的结果表(其中一些是粗体)、一个额外的聪明的分析实验、而且论文正好有 8 页(页数限制)且一行不少。你将不得不学习如何为你的论文赋予相同的格式,因为许多研究者在评价你的成果时都将其作为认知的捷径。
Identify the core contribution. Before you start writing anything it’s important to identify the single core contribution that your paper makes to the field. I would especially highlight the word single. A paper is not a random collection of some experiments you ran that you report on. The paper sells a single thing that was not obvious or present before. You have to argue that the thing is important, that it hasn’t been done before, and then you support its merit experimentally in controlled experiments. The entire paper is organized around this core contribution with surgical precision. In particular it doesn’t have any additional fluff and it doesn’t try to pack anything else on a side. As a concrete example, I made a mistake in one of my earlier papers on video classification where I tried to pack in two contributions: 1) a set of architectural layouts for video convnets and an unrelated 2) multi-resolution architecture which gave small improvements. I added it because I reasoned first that maybe someone could find it interesting and follow up on it later and second because I thought that contributions in a paper are additive: two contributions are better than one. Unfortunately, this is false and very wrong. The second contribution was minor/dubious and it diluted the paper, it was distracting, and no one cared. I’ve made a similar mistake again in my CVPR 2014 paper which presented two separate models: a ranking model and a generation model. Several good in-retrospect arguments could be made that I should have submitted two separate papers; the reason it was one is more historical than rational.
确定核心贡献。在你开始写任何东西之前,首先很重要的是要确定你的论文对该领域的单一核心贡献。我要特别强调「单一」这个词。一篇论文不是你运行的一些实验的随机集合的报告。论文要推销的是一个之前并不存在或并不明显的单一事物。你必须论证这个事物是重要的、之前没有人做过,然后通过对照实验来证明它的优点。整篇论文都应该围绕这一核心贡献精准地展开。尤其是不要有任何额外的无价值的扩展,也不要顺带塞进任何其它东西。举一个具体的例子,在我早期的一篇关于视频分类的论文(Large-scale Video Classification with Convolutional Neural Networks)中我就犯了这个错误,我尝试一次打包两个贡献:1)一个用于视频卷积网络的架构布局集合,2)一个不相关的、只带来很小改进的多分辨率架构。我把它加上去,一是因为我觉得也许有人会对此感兴趣并跟进后续研究,二是因为我觉得论文的贡献是可以叠加的:两个贡献好于一个贡献。不幸的是,这是一个非常彻底的错误。第二个贡献是微不足道的/可疑的,它稀释了这篇论文,分散了注意力,而且也没人关心。在我 CVPR 2014 的一篇论文(Deep Visual-Semantic Alignments for Generating Image Descriptions)中我又犯了类似的错误,我在该论文给出了两个独立的模型:一个排序模型和一个生成模型。事后看来,有不少好的论据可以说明我应该分开投两篇论文;最终只写成一篇的原因更多是历史上的,而非理智上的。
The structure. Once you’ve identified your core contribution there is a default recipe for writing a paper about it. The upper level structure is by default Intro, Related Work, Model, Experiments, Conclusions. When I write my intro I find that it helps to put down a coherent top-level narrative in LaTeX comments and then fill in the text below. I like to organize each of my paragraphs around a single concrete point stated in the first sentence that is then supported in the rest of the paragraph. This structure makes it easy for a reader to skim the paper. A good flow of ideas is then along the lines of 1) X (+define X if not obvious) is an important problem. 2) The core challenges are this and that. 3) Previous work on X has addressed these with Y, but the problems with this are Z. 4) In this work we do W (?). 5) This has the following appealing properties and our experiments show this and that. You can play with this structure a bit but these core points should be clearly made. Note again that the paper is surgically organized around your exact contribution. For example, when you list the challenges you want to list exactly the things that you address later; you don’t go meandering about things unrelated to what you have done (you can speculate a bit more later in the conclusion). It is important to keep a sensible structure throughout your paper, not just in the intro. For example, when you explain the model each section should: 1) explain clearly what is being done in the section, 2) explain what the core challenges are, 3) explain what a baseline approach is or what others have done before, 4) motivate and explain what you do, 5) describe it.
结构。一旦你确定了你的核心贡献,就有了一个写论文的默认配方。上层结构默认的是引言/介绍、相关工作、模型、实验、结论。当我写我的引言时,我发现先以 LaTeX 注释的形式写下一条条理分明的顶层叙述,然后再在下面填写正文,这会很有帮助。我喜欢围绕单个明确的点来组织我的每个段落:这个观点在段落的第一句就给出,并用该段的剩下部分来支撑。这样的结构可以让读者轻松地快速略览。然后我们需要一个好的思维流程,可以按以下线索进行:1)X(如果不明显,还要加上对 X 的定义)是一个重要的问题;2)核心的挑战是这样那样的;3)X 上之前的成果已经用 Y 处理过这些挑战,但这种做法的问题是 Z;4)在这项工作中,我们做了 W(?);5)这有以下有吸引力的特性,我们的实验表明了这样那样的结果。你可以稍微调整这个结构,但这些核心的点需要得到明确。再重申一下:论文需要围绕你的确切贡献精准地进行组织。比如说,当你罗列挑战的时候,你需要确切列出那些你将在后面解决的问题,而不要扯到与你所做工作无关的事情上(你可以在后面的结论中多做一点推测)。不只是在引言中,保持论文整体的合理结构也是很重要的。比如说,当你解释你的模型时,每一节应该:1)解释清楚在这一节做了什么,2)解释核心挑战,3)解释基线方法或之前其他人做了哪些工作,4)给出动机并解释你所做的工作,5)描述它。
Break the structure. You should also feel free (and you’re encouraged!) to play with these formulas to some extent and add some spice to your papers. For example, see this amusing paper from Razavian et al. in 2014 that structures the introduction as a dialog between a student and the professor. It’s clever and I like it. As another example, a lot of papers from Alyosha Efros have a playful tone and make great case studies in writing fun papers. As only one of many examples, see this paper he wrote with Antonio Torralba: Unbiased look at dataset bias. Another possibility I’ve seen work well is to include an FAQ section, possibly in the appendix.
打破结构。你也应该灵活应对这些格式,扩展你的论文,为之增加一点香料。比如说 Razavian et al. 的这篇论文(CNN Features off-the-shelf: an Astounding Baseline for Recognition)惊人地将引言做成了一位学生和教授的对话形式。这做得很聪明,我很喜欢。另一个例子,Alyosha Efros 的很多论文都带着一种俏皮的语气,为有趣论文的书写给出了绝佳的案例。比如说他与 Antonio Torralba 合著的这篇论文《Unbiased look at dataset bias》。另一种我见过的效果不错论文是问答式的章节,可能用在附录中。
Common mistake: the laundry list. One very common mistake to avoid is the “laundry list”, which looks as follows: “Here is the problem. Okay now to solve this problem first we do X, then we do Y, then we do Z, and now we do W, and here is what we get”. You should try very hard to avoid this structure. Each point should be justified, motivated, explained. Why do you do X or Y? What are the alternatives? What have others done? It’s okay to say things like this is common (add citation if possible). Your paper is not a report, an enumeration of what you’ve done, or some kind of a translation of your chronological notes and experiments into latex. It is a highly processed and very focused discussion of a problem, your approach and its context. It is supposed to teach your colleagues something and you have to justify your steps, not just describe what you did.
常见的错误:洗衣清单(laundry list)。洗衣清单是应该避免的一种非常常见的错误,它看起来像这样:「这里有一个问题。现在为了解决这个问题,我们首先做 X,然后我们做 Y,再做 Z,之后再做 W,就得到了我们的结果。」你应该竭力避免这种结构。每一个点都应该得到论证、给出动机和解释。为什么你要做 X 或 Y?有没有替代选择?其他人做了什么?写「这是常见做法」之类的话是可以的(尽可能附上引用)。你的论文不是一份报告,不是你做过的事情的枚举,也不是把你按时间排列的笔记和实验翻译成 LaTeX。论文是对于一个问题、你的方法和其背景的高度处理过的和高度聚焦的讨论。它应该能教给你的同行一些东西,你必须论证你的每一个步骤,而不只是描述你做了什么。
The language. Over time you’ll develop a vocabulary of good words and bad words to use when writing papers. Speaking about machine learning or computer vision papers specifically as concrete examples, in your papers you never “study” or “investigate” (these are boring, passive, bad words); instead you “develop” or even better you “propose”. And you don’t present a “system” or, shudder, a “pipeline”; instead, you develop a “model”. You don’t learn “features”, you learn “representations”. And god forbid, you never “combine”, “modify” or “expand”. These are incremental, gross terms that will certainly get your paper rejected :).
语言。随着时间的推移,你会积累一个写论文时的好词词典和坏词词典。具体可以机器学习或计算机视觉论文为例:在你的论文中永远不要出现「study」和「investigate」(这是无聊的、被动的、糟糕的词);而你应该使用「develop」或者更好的「propose」这样的词。你不要提出一个「system」或甚至更糟的「pipeline」;相反,你开发了一个「model」。你不是在学习「features」,你是在学习「representations」。而且你千万不要使用「combine」、「modify」或「expand」。这些增量式的、粗陋的术语肯定会让你的论文被拒 :)
An internal deadline 2 weeks prior. Not many labs do this, but luckily Fei-Fei is quite adamant about an internal deadline 2 weeks before the due date in which you must submit at least a 5-page draft with all the final experiments (even if not with final numbers) that goes through an internal review process identical to the external one (with the same review forms filled out, etc). I found this practice to be extremely useful because forcing yourself to lay out the full paper almost always reveals some number of critical experiments you must run for the paper to flow and for its argument to be coherent, consistent and convincing.
提前两周的内部截止时间。并没有许多实验室这样做,但幸运的是飞飞对这个提前两周的内部截止时间很是坚持,在这个时间,你必须提交至少 5 页带有所有最终实验的草稿(即使不是最终的数字);这份草稿会进入一个与外部完全一样的内部评审过程(填写相同的评审表等等)。我发现这种做法非常有用,因为迫使自己把整篇论文铺排出来,几乎总是会暴露出一些你必须补做的关键实验,只有做了这些实验,论文才能行文流畅,论证才能条理清晰、前后一致、有说服力。
Another great resource on this topic is Tips for Writing Technical Papers from Jennifer Widom.
关于这一主题的另一个好资源是 Jennifer Widom 写的《Tips for Writing Technical Papers》(https://cs.stanford.edu/people/widom/paper-writing.html)。
Writing code 写代码
A lot of your time will of course be taken up with the execution of your ideas, which likely involves a lot of coding. I won’t dwell on this too much because it’s not uniquely academic, but I would like to bring up a few points.
当然,你仍旧会花很多时间在实现你的想法上,也就是说,你还会编写很多代码。因为这并不是学术上独有的工作,所以我不会在此详谈,但还是有几点我想提一下。
Release your code. It’s a somewhat surprising fact but you can get away with publishing papers and not releasing your code. You will also feel a lot of incentive to not release your code: it can be a lot of work (research code can look like spaghetti since you iterate very quickly, you have to clean up a lot), it can be intimidating to think that others might judge you on your at most decent coding abilities, it is painful to maintain code and answer questions from other people about it (forever), and you might also be concerned that people could spot bugs that invalidate your results. However, it is precisely for some of these reasons that you should commit to releasing your code: it will force you to adopt better coding habits due to fear of public shaming (which will end up saving you time!), it will force you to learn better engineering practices, it will force you to be more thorough with your code (e.g. writing unit tests to make bugs much less likely), it will make others much more likely to follow up on your work (and hence lead to more citations of your papers) and of course it will be much more useful to everyone as a record of exactly what was done for posterity. When you do release your code I recommend taking advantage of docker containers; this will reduce the amount of headaches people email you about when they can’t get all the dependencies (and their precise versions) installed.
公开你的代码。虽然你可能会感到惊讶,但是你确实可以只发表论文而不公开代码。同时,你有很多动机将自己的代码藏起来:公开代码会花费许多精力(研究项目的代码看起来像是意大利面,因为它的迭代非常快,所以你需要做大量清理);同时,光是想到别人可能会对你顶多算是过得去的编码能力评头论足,就已经足够吓人了;维护代码以及(永远地)回答别人的问题是非常痛苦的;你甚至会担心别人可能会发现代码中的错误,从而推翻你的结果。然而,这正是你应该下决心公开代码的原因:出于对公开丢脸的恐惧,你会养成更好的编码习惯(而这最终会帮你节省时间!);你会被迫学习更好的工程实践;你会被迫对自己的代码更加严格要求(例如,编写单元测试以大幅降低错误出现的可能性);这会让别人更有可能跟进你的研究(并由此带来更多的引用次数);并且很自然地,作为一份准确记录了你到底做了什么的存档,它对所有人都更有用。当你真的准备公开代码的时候,我建议你好好利用 docker containers(https://www.docker.com/);它会减少人们因为装不上所有依赖项(及其精确版本)而发邮件找你的次数,从而减轻你的烦恼。
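As one way to picture the unit-testing advice above, here is a minimal Python sketch. The `softmax` helper and the test names are illustrative assumptions and do not come from the post; the point is only that a few cheap checks on a small numerical helper can catch the kind of bug that would otherwise quietly distort results.

```python
import math
import unittest


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1.

    Hypothetical research helper, used here only as something to test.
    """
    # Subtract the largest score first so that exp() cannot overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxTest(unittest.TestCase):
    def test_sums_to_one(self):
        probs = softmax([1.0, 2.0, 3.0])
        self.assertAlmostEqual(sum(probs), 1.0)

    def test_handles_large_scores(self):
        # A naive math.exp(1000.0) would raise OverflowError.
        probs = softmax([1000.0, 1000.0])
        self.assertAlmostEqual(probs[0], 0.5)
```

Saved as a file in the repository, tests like these can be run with `python -m unittest` before each release, so that a later refactoring of the helper is checked automatically.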
Think of the future you. Make sure to document all your code very well for yourself. I guarantee you that you will come back to your code base a few months later (e.g. to do a few more experiments for the camera ready version of the paper), and you will feel completely lost in it. I got into the habit of creating very thorough readme.txt files in all my repos (for my personal use) as notes to future self on how the code works, how to run it, etc.
为将来的你着想。为了你自己的便捷,务必将自己的所有代码妥善记录。我保证几个月之后你会回来看你的代码(例如,为论文的最终定稿(camera ready)版本再做几个实验),那时,你会一头雾水。我已经养成了在(供自己使用的)每一个代码仓库中编写非常详尽的 readme.txt 文件的习惯,以便未来的自己能够明白代码的原理和使用方法等等。
Giving talks 做演讲
So, you published a paper and it’s an oral! Now you get to give a few minute talk to a large audience of people - what should it look like?
现在,你的论文发表了,而且被选为口头报告(oral)!你需要就这篇论文向一大群观众进行几分钟的演讲,它应该是什么样的?
The goal of a talk. First, there’s a common misconception that the goal of your talk is to tell your audience about what you did in your paper. This is incorrect, and should only be a second or third degree design criterion. The goal of your talk is to 1) get the audience really excited about the problem you worked on (they must appreciate it or they will not care about your solution otherwise!) 2) teach the audience something (ideally while giving them a taste of your insight/solution; don’t be afraid to spend time on others’ related work), and 3) entertain (they will start checking their Facebook otherwise). Ideally, by the end of the talk the people in your audience are thinking some mixture of “wow, I’m working in the wrong area”, “I have to read this paper”, and “This person has an impressive understanding of the whole area”.
演讲的目的。首先,一个常有的误解是,演讲的目的是向听众介绍你在论文中做了什么。这是错误的,这一目的最多也只能排在第二或第三位。你的演讲应该:1)使听众对你研究的问题产生浓厚兴趣(如果大家对问题本身没兴趣,他们也不会在乎你的解决方法的!)2)教些东西给听众(最好同时让大家体会到你的洞见 / 解决方案;不要害怕在别人的相关工作上花时间)以及 3)有趣(否则很多人会开始刷 Facebook)。理想情况下,在演讲结束之后,你的听众应该在想这几件事情:「哇,我选错研究方向了」,「我一定要看看这篇论文」,以及「作者本人对整个领域的理解非常出众。」
A few do’s: There are several properties that make talks better. For instance, Do: Lots of pictures. People love pictures. Videos and animations should be used more sparingly because they distract. Do: make the talk actionable - talk about something someone can do after your talk. Do: give a live demo if possible, it can make your talk more memorable. Do: develop a broader intellectual arc that your work is part of. Do: develop it into a story (people love stories). Do: cite, cite, cite - a lot! It takes very little slide space to pay credit to your colleagues. It pleases them and always reflects well on you because it shows that you’re humble about your own contribution, and aware that it builds on a lot of what has come before and what is happening in parallel. You can even cite related work published at the same conference and briefly advertise it. Do: practice the talk! First for yourself in isolation and later to your lab/friends. This almost always reveals very insightful flaws in your narrative and flow.
一些可以做的事情:有些特征会让演讲更上一层楼,例如,要:有许多图片。人们喜欢图片。录像和动画应该更少一些,因为它们容易让人分心。要:让演讲内容可执行,讲一些人们在听完之后可以动手去做的东西。要:如果可能的话给一个现场 demo,它会让你的演讲更容易被记住。要:勾勒出一条更宏大的思想脉络,说明你的工作是其中的一部分。要:讲成一个故事(人们喜欢故事)。要:引用,引用,引用,大量引用!加入引用不会占用你的幻灯片多大的空间,而你的同行们会因此感到高兴,这也总会给你加分,因为它表明你对自己的贡献保持谦虚,并且意识到它是建立在许多前人成果和同期工作之上的。你甚至可以引用在同一个会议发表的相关文章,并为之做简短的推荐。要:进行练习!先自己练习,然后向实验室同事 / 朋友展示。这几乎总能帮你发现叙述和流程中很有启发性的问题。
Don’t: texttexttext. Don’t crowd your slides with text. There should be very few or no bullet points - speakers sometimes try to use these as a crutch to remind themselves what they should be talking about, but the slides are not for you, they are for the audience. These should be in your speaker notes. On the topic of crowding the slides, also avoid complex diagrams as much as you can - your audience has a fixed bit bandwidth and I guarantee that your own very familiar and “simple” diagram is not as simple or interpretable to someone seeing it for the first time.
不要加很多文字。不要让文字挤满你的幻灯片。你应该少用甚至不用要点列表(bullet points),演讲者们有时会用要点列表来提醒自己要讲些什么,但是幻灯片不是给你自己看的,而是给观众看的。这些要点应该出现在你的演讲笔记中。与此类似地,尽可能地避免使用复杂的图表,因为你的听众的带宽是固定的,并且我保证那些在你看来十分熟悉且「简单」的图表,对于那些第一次看到的人来说,就不是这么简单或好理解了。
Careful with: result tables: Don’t include dense tables of results showing that your method works better. You got a paper, I’m sure your results were decent. I always find these parts boring and unnecessary unless the numbers show something interesting (other than your method works better), or of course unless there is a large gap that you’re very proud of. If you do include results or graphs build them up slowly with transitions, don’t post them all at once and spend 3 minutes on one slide.
注意,结果表:不要使用信息十分密集的表格来展示你的方法有多么优秀。既然你的论文都被接收了,我相信你的结果还不错。我一直认为这一部分是非常无聊和多余的,除非数字能够表明一些(除了「你的方法更好」之外的)十分有趣的东西,或者数字所表明的差距确实非常巨大、让你引以为豪。如果你真的要展示结果或图表,请用过渡动画循序渐进地将它们展示出来,而不是把所有东西一次扔到页面上,然后在一页幻灯片上花上三分钟。
Pitfall: the thin band between bored/confused. It’s actually quite tricky to design talks where a good portion of your audience learns something. A common failure case (as an audience member) is to see talks where I’m painfully bored during the first half and completely confused during the second half, learning nothing by the end. This can occur in talks that have a very general (too general) overview followed by a technical (too technical) second portion. Try to identify when your talk is in danger of having this property.
陷阱:无聊与困惑之间的狭窄地带。要设计出一场能让相当一部分听众真正学到东西的演讲,其实相当困难。一个常见的失败案例是(作为一个听众),在演讲的前半段无聊至死,然后在后半段困惑不已,最后啥都没学到。经常出现这一情形的演讲的特点是,先给出一个非常概括性(过于概括了)的概述,然后紧接着是技术性(过于技术性)的后半部分。试着识别出你的演讲何时有陷入这种情况的危险。
Pitfall: running out of time. Many speakers spend too much time on the early intro parts (that can often be somewhat boring) and then frantically speed through all the last few slides that contain the most interesting results, analysis or demos. Don’t be that person.
陷阱:时间不够用。许多演讲者会在开始的引入部分花费过多的时间(而这部分往往有些无聊),然后火急火燎地匆匆翻过最后的几张幻灯片,而那些往往包含最有趣的结果、分析或 demo。不要做这样的演讲者。
Pitfall: formulaic talks. I might be a special case but I’m always a fan of non-formulaic talks that challenge conventions. For instance, I despise the outline slide. It makes the talk so boring, it’s like saying: “This movie is about a ring of power. In the first chapter we’ll see a hobbit come into possession of the ring. In the second we’ll see him travel to Mordor. In the third he’ll cast the ring into Mount Doom and destroy it. I will start with chapter 1” - Come on! I use outline slides for much longer talks to keep the audience anchored if they zone out (at 30min+ they inevitably will a few times), but it should be used sparingly.
陷阱:形式化的演讲。我可能是个特例,但是我一直都喜欢挑战传统的、规避形式化的演讲。例如,我鄙视在幻灯片中加入演讲大纲的行为。因为这使得整个演讲变得无聊,就像在说:「这部电影讲述的是一个有魔力的戒指,在第一章我们会看到一个霍比特人得到这个戒指,第二章我们会看到他去了 Mordor,第三章里他将戒指扔到了 Mount Doom 并将之毁坏了。我将从第一章开始讲起」——拜托别这样!我只在非常长的演讲中才使用大纲页面,以便于听众在走神之后重新恢复记忆(30 分钟后他们往往会走几次神),但是这应该尽量少用。
Observe and learn. Ultimately, the best way to become better at giving talks (as it is with writing papers too) is to make a conscious effort to pay attention to what great (and not so great) speakers do and build a binary classifier in your mind. Don’t just enjoy talks; analyze them, break them down, learn from them. Additionally, pay close attention to the audience and their reactions. Sometimes a speaker will put up a complex table with many numbers and you will notice half of the audience immediately look down at their phones and open Facebook. Build an internal classifier of the events that cause this to happen and avoid them in your talks.
观察并学习。最终,成为一个优秀演讲者的最好方法是(写论文也是这样),留意观察优秀的(和不怎么优秀的)演讲者的行为,然后在你的大脑里构建一个二元分类器。不要仅仅做演讲的听众;你要对它们进行分析、分解、然后从中学习。除此之外,留意现场反应。有时,当演讲者展示出一个复杂的数字表格时,你会注意到,许多观众立马低头看起了手机。为可能导致这一场景的行为构建一个内部分类器,并在你自己的演讲中避免这些行为。
Attending conferences 参加会议
On the subject of conferences:
对于会议:
Go. It’s very important that you go to conferences, especially the 1-2 top conferences in your area. If your adviser lacks funds and does not want to pay for your travel expenses (e.g. if you don’t have a paper) then you should be willing to pay for yourself (usually about $2000 for travel, accommodation, registration and food). This is important because you want to become part of the academic community and get a chance to meet more people in the area and gossip about research topics. Science might have this image of a few brilliant lone wolves working in isolation, but the truth is that research is predominantly a highly social endeavor - you stand on the shoulders of many people, you’re working on problems in parallel with other people, and it is these people that you’re also writing papers to. Additionally, it’s unfortunate but each field has knowledge that doesn’t get serialized into papers but is instead spread across a shared understanding of the community; things such as what are the next important topics to work on, what papers are most interesting, what is the inside scoop on papers, how they developed historically, what methods work (not just on paper, in reality), etc. It is very valuable (and fun!) to become part of the community and get direct access to the hivemind - to learn from it first, and to hopefully influence it later.
参加。参加会议是很重要的,特别是你所在的领域的最顶尖的 1-2 场会议。如果你的导师缺乏资金,不愿意为你的路费买单(例如,当你还没有论文的时候),那么你应当愿意自己买单(通常路费、住宿、注册费和餐饮合计约 2000 美元)。这是很重要的,因为你需要成为学术圈的一员,并能够见到更多同行,以及聊聊研究话题的八卦。科学界给人的印象可能是少数才华横溢的独行侠在孤军奋战,但是真相是,做研究很大程度上是一个高度社交性的事业:你是站在许多人的肩膀上的,还有许多人和你同时在研究这些问题,并且这些人也是你的论文的读者。此外,很遗憾的是,每一个领域都有一些没有写进论文里、而是散布在整个圈子共同认知中的知识,包括接下来的重要话题有什么,哪些论文是最有趣的,论文的内幕消息是什么,它们在历史上是如何发展的,哪些方法管用(不只是在论文里,而是在实际中),等等。成为圈子里的一员,并且直接接触到这个集体的共同智慧,是很有价值的(并且很有趣!):首先从中学习,然后最好能够影响这个圈子。
Talks: choose by speaker. One conference trick I’ve developed is that if you’re choosing which talks to attend it can be better to look at the speakers instead of the topics. Some people give better talks than others (it’s a skill, and you’ll discover these people in time) and in my experience I find that it often pays off to see them speak even if it is on a topic that isn’t exactly connected to your area of research.
讲座:根据演讲者进行选择。我摸索出的一个会议技巧是,在选择要听哪些讲座的时候,看演讲者可能比看主题更好。有些人就是比其他人讲得好(这是一项技能,慢慢地你会发现这些人),并且,根据我的经验,去听他们的演讲往往大有裨益,即使话题和你的研究领域并没有直接联系。
The real action is in the hallways. The speed of innovation (especially in Machine Learning) now works at timescales much faster than conferences so most of the relevant papers you’ll see at the conference are in fact old news. Therefore, conferences are primarily a social event. Instead of attending a talk I encourage you to view the hallway as one of the main events that doesn’t appear on the schedule. It can also be valuable to stroll the poster session and discover some interesting papers and ideas that you may have missed.
It is said that there are three stages to a PhD. In the first stage you look at a related paper’s reference section and you haven’t read most of the papers. In the second stage you recognize all the papers. In the third stage you’ve shared a beer with all the first authors of all the papers.
真正有价值的事情发生在走廊上。现在,创新的速度(尤其在机器学习领域)已经远远快于会议的周期了,所以你在会议上看到的大部分相关论文实际上都算是旧闻了。因此,会议更多地是一项社交活动。与其去听一个讲座,我建议你把走廊看作一项没有出现在日程表上的主要活动。你还可以去海报展示区逛逛,说不定会发现一些错过的有趣论文和想法。
据说一个博士生有三个阶段。在第一个阶段,一篇相关论文的引用你大部分都没看过;在第二个阶段,你能认出这些论文;在第三个阶段,你已经与所有论文的第一作者喝过一圈了。
Closing thoughts 最后的一些想法
I can’t find the quote anymore but I heard Sam Altman of YC say that there are no shortcuts or cheats when it comes to building a startup. You can’t expect to win in the long run by somehow gaming the system or putting up false appearances. I think that the same applies in academia. Ultimately you’re trying to do good research and push the field forward and if you try to game any of the proxy metrics you won’t be successful in the long run. This is especially so because academia is in fact surprisingly small and highly interconnected, so anything shady you try to do to pad your academic resume (e.g. self-citing a lot, publishing the same idea multiple times with small remixes, resubmitting the same rejected paper over and over again with no changes, conveniently trying to leave out some baselines etc.) will eventually catch up with you and you will not be successful.
尽管我现在找不到出处了,但是我曾听到 YC 的 Sam Altman 说,建立一个创业公司没有捷径可走。你不能指望通过玩弄体制,或者通过伪装来获得长久的胜利。我想在学术领域也是一样的。最终,你的目的是用优秀的研究推动这一领域的进步,如果你试图针对某些指标动手脚,从长远来看你无法成功。在学术界尤其如此,因为学术界令人惊讶地小,并且高度关联,所以,任何你为了粉饰学术履历而耍的阴招(例如,大量自引、将同一想法稍作修改后重复发表、把被拒的论文不作任何修改地一再重投、为了省事而故意略去某些基线对比,等等)最终都会让你尝到苦果,而你也不会成功。
So at the end of the day it’s quite simple. Do good work, communicate it properly, people will notice and good things will happen. Have a fun ride!
所以,总而言之就一句话:好好工作、适当交流,人们会注意到你,好事也会发生。祝博士之旅愉快!
EDIT: HN discussion link.
【Appendix: PhD thesis】
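Thesis: Connecting Images and Natural Language
Abstract: A long-standing goal in the field of artificial intelligence is to develop agents that can perceive and understand the rich visual world around us and communicate with us about it in natural language. Thanks to recent advances in computing infrastructure, data collection, and algorithms, significant progress has been made toward this goal. The progress has been especially rapid in visual recognition: computers can now classify images with performance that rivals that of humans, and even surpasses it in some cases, such as identifying dog breeds. However, despite many exciting developments, most of the advances in visual recognition still concern assigning one or a few discrete labels (e.g. person, boat, keyboard) to an image.
In this dissertation, we develop models and techniques that allow us to connect the domain of visual data with the domain of natural language utterances, enabling translation between elements of the two domains. Specifically, we first introduce a model that embeds both images and sentences into a common multi-modal embedding space. This space then lets us identify images that depict an arbitrary sentence description and, conversely, find sentences that describe an arbitrary image. Second, we develop an image captioning model that takes an image and directly generates a sentence description, without being restricted to a finite set of human-written sentences to choose from. Finally, we describe a model that can localize and describe all the salient parts of an image. Our work shows that this model can also be used in reverse: it takes an arbitrary description (e.g. white tennis shoes) as input and efficiently localizes the described concept in a large collection of images. We argue that these models, the techniques they use internally, and the interactions they enable are a stepping stone on the path to artificial intelligence, and that the connection between images and natural language also brings many practical benefits and immediately valuable applications.
From the modeling perspective, our contribution lies not in designing and presenting explicit algorithms that process images and sentences through complex processing pipelines, but in the design of hybrid convolutional and recurrent neural network architectures that connect visual data and natural language utterances within a single network. As a result, the computational processing of images, sentences, and the multi-modal embedding structures that relate them emerges automatically while optimizing a loss function with respect to the network's parameters on a training dataset of images and their descriptions. This approach enjoys many of the advantages of neural networks, including the use of simple, homogeneous computations that are easy to parallelize on hardware, and strong performance, because end-to-end training formulates the problem as a single optimization problem in which all components of the model share the same final objective. Our work shows that our models advance the state of the art on tasks that require joint processing of images and natural language, and that the architecture can be designed in a way that facilitates interpretable visual inspection of the network's predictions.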
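- For the previous part, see: [中/英双语] Andrej Karpathy:A Survival Guide to a PhD (一)
(This post was compiled by me and is for study and personal archiving only. The translation draws in part on the translation by 机器之心 (one passage was missing from it, which I added, along with minor revisions), for which I am grateful. Do not repost without permission; authorized reposts must credit the source. Thank you!)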