Tips for Reading and Writing an ML Research Paper

Peer reviewing is a staple process of the scientific community, and it always precedes the publication of a paper in a scientific journal or conference. Its goal is to gather feedback from peer researchers, who analyze your work in light of their own experience and judge it on criteria such as novelty, technical correctness and even readability.

In the fields related to “AI” (e.g. computer vision, natural language processing, speech processing and so on) this step is often delayed or skipped completely because of tools like arxiv.org. arXiv is a preprint server. Its goal is supposed to be the early dissemination and debate of new work, but many papers posted there become very popular before any peer review, and sometimes no peer review happens at all. Many of these papers are indeed very valuable pieces of work, but in some cases ideas spread that never received a “green light” from the community, and for good reasons.

Is this bad? Is this good? For sure, arXiv bypasses some gatekeeping mechanisms that can block young researchers’ careers. On the other hand, our world is flooded with papers, and for new researchers (or data scientists and engineers who need to read them) it can be very tough to tell good work from bad. With so many papers around, we need a way to narrow down our focus.

With this post I want to share the lessons on paper writing and reading that I learned from peer reviewing, and when I say reading I mean reviewing. It is a good exercise to approach every paper you read, even the most popular ones, as a review. Papers do not present established knowledge, and it usually takes some years before the community has replicated and reproduced the experiments extensively enough to absorb the new knowledge. So, let skepticism and criticism come before your enthusiasm. Reading will take longer but you will learn more in the process.

Writing tips

In the following section, I describe the different aspects of a paper to analyze, which incidentally are the ones you are required to evaluate in a conference peer review. These suggestions are the product of four years of writing and reading, of getting papers rejected and improving them according to the reviewers’ suggestions. I know that four years are nothing compared to the decades-long experience of many professors. However, these are lessons learned the hard way and I thought I could make someone’s life easier by writing them down. The points discussed are valid for both writing and reading, but writing is obviously more difficult and, as such, I want to say something about it before starting.

First of all, the better you master English and scientific writing, the better. Good research is necessary for having a paper accepted, but it is not sufficient. Your readers should be able to understand what you have done and its value. Read about scientific writing and try to improve continuously. Never stop learning, and never think that your writing skills are good enough. There are native speakers out there writing in the same area as you, and there are researchers trained as linguists, who will always write better than you. Fortunately, it is a continuous work in progress, and if everything goes well, when you reread your papers from a year ago you will be embarrassed by the writing issues. To improve faster, do not limit yourself to writing papers at the end of a research project. Write drafts, proposals, even a blog (look at what I do, not at what I say ;) ) and, most importantly, get feedback. You improve when your errors are pointed out and corrected.

Second, keep your audience in mind. Your paper should follow the writing conventions of the conference or journal you are writing for, and the results should also be of the right type. For instance, research on how technology helps translators will hardly be published at EMNLP, but it will be a perfect fit for a machine translation conference like the MT Summit. In the same fashion, a cool new interface for data collection can be very useful for many tasks, but unless it proposes new deep learning models it will never get into NeurIPS. Similarly, machine learning conferences require much more mathematical writing than other conferences.

Third, always write with your reader in mind. Remember that your reader is not inside your head. Your reader does not know what you know, does not have your experience, does not know what you were thinking while writing, and definitely does not know your assumptions, unless you write everything down. Also, writing a paper is not about self-celebration. Nobody reads your paper to learn that you can smash all the competition and break the state-of-the-art results. A research paper is a way to communicate new knowledge, and your readers want exactly this. Always ask yourself why a reader (and a reviewer) should be interested in your work, why they should care at all, and how you can make it more interesting. In the corporate world, the most successful companies are the ones that best understand the needs of their customers and propose products accordingly. Writing requires the same mindset.

Finally, scientific writing is not supposed to be aseptic. Whenever you write, tell a story for your reader. If every section of your paper is self-contained and serves mostly as a frame for your numbers, it may be useful as a reference but it will hardly excite your reader. A research paper needs to follow a progression similar to that of a modern tale: a strong enemy threatens to disrupt our lives (our problem); somebody really needs to fight him (motivation); a hero appears who can fight the enemy (proposed method); others have tried before him and failed (related work); the fight takes place (experiments and results); but then, as we are scientists, the triumph is replaced by an analysis of the fight. Finally, armed with the lessons learned from the analysis, we ask our community to join our fight and promise to come back to it soon. By writing a good story, you create engagement and provide much more than information.

Critical Aspects of a Research Paper

Novelty

Novelty is a tricky topic in science. Research obviously has to produce new knowledge, but this can mean different things to different people, and sometimes communities have overvalued one type of knowledge at the expense of the others, mostly because of research trends or what was perceived as “difficult” at the time.

The most evident example happened in NLP conferences (and other “AI”-related fields) when everybody started to use deep learning. It seemed that the only research worth doing was to propose new deep learning topologies for different tasks. Other important areas, like building linguistic resources and evaluating system quality, were relegated to minor conferences.

I think that the problem emerged because many inexperienced reviewers (as I was) were more interested in what could be achieved with deep learning than in actually understanding a problem better and how to solve it. This attitude led to the acceptance of many papers that we can no longer remember, probably because the “new” methods did not do anything substantially different from previous, simpler models.

Fortunately, conferences are now explicitly encouraging their reviewers to take a broader view of what counts as an “acceptable” paper. Indeed, novelty can mean making unexpected links between two different research areas, proposing needed corpora or better evaluation or, even better, paving the way for a new task.

Novelty is strongly related to what can be considered a research paper. A few years ago, showing that something could be done with deep learning, then with RNNs, CNNs, and so on, was a legitimate research question. Now it is no longer considered interesting (assuming you can even find a task where all these things have not been applied yet).

When you read a paper, connect it to previous work and try to understand how novel it is. Was this problem tackled before? Is this paper proposing a new way to look at the problem? Is the resource valuable, and is the construction method worth reproducing?

When writing, try to highlight the novelty of your paper in every possible way. If you do not do it, nobody else is going to do it. Highlighting the novelty can also suggest the story to tell with it.

Clarity

Content quality is of absolute importance in a research paper, but to convey the content, the paper must be written well. A good paper has a good structure that makes it easy to read. The abstract and introduction convey the message in broad terms and motivate the reader to keep reading. In the middle, it must be clear where to find your hypothesis, experimental setup, experiments, the needed background and how the work is positioned in the current literature. The conclusions wrap up the paper by highlighting the findings and why the research it contains is important.

Abstract, introduction and conclusions must be written with particular care, because their goal is to convince the audience that the paper is worth reading.

If you are writing a paper as a beginner, the best thing to do is to stick to the most common structure of papers similar to yours in the venue where you want to publish. With time and experience you will learn to change it to better fit your paper, but there is no need to rush on this point. Use a clean structure and focus on your writing.

Good English is the second, and more difficult, point about clarity. Grammar should be as close to perfect as possible, and sentences clear and concise. Avoid statements that are generic or ambiguous. Explain the figures and tables you present clearly, and never assume anything. And again, remember:

The reader is not inside your mind.

If you want your paper to be clear to peer researchers, first ask somebody with a different background to read it and tell you what they find unclear. You may be surprised by the actual readability.

When reading, enjoy a beautifully written paper, or simply set aside a poorly written one. If you cannot understand what it says, or if the content is too ambiguous, chances are that many others find it difficult to read too. Such a paper is not going to be useful for providing new knowledge.

Motivation

A paper’s introduction has to provide general background information as well as the scientific problem that it solves, some information about the approach and method used, and the motivation behind all this. The motivation is easy to underestimate when writing a paper, maybe because one assumes that its value is obvious, or because one thinks that only results are really needed in a paper. It turns out that, unless you are doing research in an overcrowded niche, your reader is unlikely to know why it matters. I have sometimes had papers rejected mainly because they were not well motivated.

You may ask why the motivation matters so much if the research in the paper is good. The reason is that, if you cannot explain the relevance of solving a problem, how can others know that it is a problem at all? Also, any publication venue has limited room for papers. So, everything else being equal, a paper that solves a compelling problem will be valued more than one with dubious motivation.

My suggestion is to think carefully about why your readers might find your research interesting and to state it explicitly. If the motivation resonates with your readers and the solution is good, they will become your biggest fans on the planet and fight to have it accepted and known. The opposite case will sound like: “the method is interesting, the results are good, yes ok, but why should I care?”. Try to be on the good side.

When reviewing, try to be more cautious. If the motivation does not resonate with you, it may be because you do not understand the problem being tackled. However, if the paper is not able to frame the general problem and the aspects it studies in an understandable way, then this is an issue to point out.

Hypothesis

One of the pillars of science is the falsifiable hypothesis: after observing a phenomenon, I make a hypothesis and design experiments where its assumptions hold true. I hope that my hypothesis is not proved wrong by my experiments, but it can be proved wrong/incomplete in the future.

A hypothesis should give an explanation for a phenomenon or justify an engineering improvement. In a paper, it should be stated clearly starting from the abstract and repeated throughout the paper. It is no coincidence that the most successful papers are also the ones whose hypotheses changed how we see problems. “Attention Is All You Need” hypothesized that self-attention is more effective than recurrent neural networks in modelling sequences; “BERT” hypothesized that large amounts of unlabeled text data can power up systems for many NLP tasks; “Distilling the Knowledge in a Neural Network” hypothesized that a neural network can learn latent relations among target classes that are more useful than the original data themselves for training new models. They are all common knowledge in the scientific community now, but their ideas were transmitted by strong and concise messages, not only through numbers in a table. By contrast, papers with weak hypotheses will be perceived as “incremental work” and ranked lower.

When writing, state your hypothesis clearly and be sure that the method you propose is consistent with it. Obviously, this should be done when designing the method, much earlier than writing the paper, but when you write it is time to make it explicit for your readers.

When reading or reviewing, ask yourself whether the hypothesis is clear, whether it is interesting and whether the experiments agree with it. If the method and the hypothesis are not consistent with each other, then the paper is producing confusion rather than knowledge. It always feels bad to click on “reject” if you feel that the research was good, but a method justified by a hypothesis that does not really hold leads to false claims. The same is true when the paper’s title does not agree with its content, or when the analysis or conclusions make claims out of the blue. In science, errors can happen, but openly false claims cannot be accepted.

Datasets/Evaluation

The dataset used for your experiments is not a secondary choice in the experimental design. A dataset embodies a set of domains (or languages, for speech and language processing), assumptions and biases that should be kept in mind during a study. The dataset should reflect the phenomenon under study. Some research works use datasets that are “proxies” for the real task, but this is usually not a good idea.

A common example of dataset problems from the machine translation world concerns papers about low-resourced language pairs, and I must admit that I have made the same mistake. Many papers present methods for a low-data condition, but then experiment on a small (and sometimes not-so-small) subset of a large dataset. However, many methods can appear to work here just because the small amount of data produces a weak baseline. Also, such an approach assumes that low-resourced and rich-resourced languages are alike, only with a smaller known vocabulary. This is mostly wishful thinking, since languages can be as diverse as you can imagine. For instance, the concepts of “to be” and “to have” can be expressed in many different ways, if they are expressed at all; different formality levels can lead to completely different vocabulary; idiomatic expressions are tightly linked to their culture; and a language may use different writing systems, sometimes several together. These are just some of the challenges of truly low-resourced languages that can hardly be learned from small datasets.

A different issue is evaluation. Sometimes researchers are eager to propose new solutions to problems that they still do not know how to evaluate. In this case, establishing an evaluation method can advance the research field much more than a “novel method”. For instance, “Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus” proposes a reproducible method to evaluate the quality of speech translation on gender-related phenomena. The evaluation method required building a test set with manual annotations. Can it be done better? For sure yes, but we still do not know how. And that paper is a big step forward compared to previous studies on gender bias in translation, which simply evaluated general translation quality and used it as a proxy for their task.

Do you want to tackle a task but the existing datasets or evaluation methods are not good enough? Build them, and your work will be much more valuable than work that uses a super complex neural network to solve a fake problem. Remember that anybody approaching a new research area will need software, datasets and automatic evaluation. By making these resources easy to access, you can gain attention (and citations) in your area. In my personal case, people started to notice me in the field of speech translation because of code and a dataset I worked on.

When writing a paper, be sure that the datasets and evaluation methods you used are consistent with the problem you want to solve.

When reading a paper, ensure that the claims in the introduction/conclusion really match the findings of the experiments.

Baseline

Your baselines are a strong indicator of the care you put into your work. A strong baseline, maybe stronger than in other published results, means that the improvements provided by your method are reliable. Sometimes I read or review papers whose baselines are competitive with the results of five years ago and completely neglect all the improvements produced in the meantime. In that case, although the improvement over the baseline can be meaningful, the final result is not reliable. The reason is that the contribution (in terms of final result) of many methods or inductive biases tends to zero when they are applied to stronger baselines. As an example, it is very difficult to appreciate the value of embedding pre-training in a neural machine translation model with a large amount of training data, whereas it is very effective when the training set is small.

When writing a paper, show more love to your baselines and it will be reciprocated. Stronger baselines can give you new insights into a problem and highlight flaws in how it has been tackled so far. Weak baselines are useful only if you are trying to publish a paper by hoping for reviewers who do not know your field. Please, do not do that.

When reading a paper, before being amazed by the shiny results presented in it, check whether it cites recent papers in the same area and verify that the baselines are consistent with other studies. Tables listing the state of the art are usually a plus, but always double-check that they really show the state of the art. I have seen papers that excluded results the authors should have been aware of (given the time that had passed and the small number of published papers). This way their results looked much better than they actually were. This is a really bad scientific practice and should be called out as soon as it is noticed.

Results/Analysis

The results are an important aspect in an empirical field like NLP, but the adoption of deep learning forces us to take them with a grain of salt. What can go wrong? After checking that the baselines are convincing, we need to estimate the size of the improvement. Relatively small improvements, even when statistically significant, can still be due to randomness (like a different random seed, or even a different implementation of the method) or to a better choice of hyperparameters. Just by reading a paper it is hard to tell what the reasons for the improvement are, so I prefer to focus on the analysis. If the analysis shows coherence between the hypothesis and the results, then I tend to have a positive attitude. If the analysis is there only because a paper has to have one, then the rest of the paper had better be really good.
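One rough way to make the random-seed concern concrete is sketched below. It assumes you have the same metric (say, BLEU) from several training runs of each system with different random seeds. The function names, the threshold `k` and all the scores are hypothetical and only for illustration; this is a quick sanity check, not a replacement for a proper significance test.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of one system's scores across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def gap_exceeds_seed_noise(baseline_scores, method_scores, k=2.0):
    """Check whether the mean improvement is larger than k times the seed-to-seed spread.

    The spread is taken as the larger of the two per-system standard deviations.
    The default k=2.0 is an arbitrary, illustrative threshold.
    """
    base_mean, base_std = summarize(baseline_scores)
    method_mean, method_std = summarize(method_scores)
    gap = method_mean - base_mean
    noise = max(base_std, method_std)
    return gap > k * noise


# Hypothetical scores from five runs each, differing only in the random seed.
baseline = [27.1, 27.4, 26.9, 27.3, 27.0]
proposed = [27.5, 27.2, 27.6, 27.1, 27.4]        # small average gain
clearly_better = [29.0, 29.3, 28.8, 29.1, 29.2]  # large average gain

print(gap_exceeds_seed_noise(baseline, proposed))
print(gap_exceeds_seed_noise(baseline, clearly_better))
```

With these made-up numbers, the small gain falls within the spread caused by the seeds alone, while the large one does not. A paper that reports a single run per system gives the reader no way to make even this basic check.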

However, sometimes it is really hard to produce an analysis that shows something interesting for the study. Usually, this happens because it would involve understanding what goes on inside the network weights, which are still largely uninterpretable. We are then forced to resort to proxies to gain some insight into our methods. The proxies should still be related to the phenomenon and the models we are interested in. Anyway, if you are reading a paper and find the analysis unsatisfying, do not be too negative unless you are able to propose a more valuable one.

When writing a paper, ensure that your results are convincing and the procedure followed is analogous to previous studies, so that it is easy for a reader to compare.

When reading a paper, try to figure out whether the results are really due to the method applied or to other, unrelated factors. Read the analysis carefully and judge whether it is relevant to the phenomenon under study.

Conclusions

A scientific paper can be very intimidating to write, read or review, but practice and experience will make it easier over time. Fortunately, many papers follow the same structure, which makes reading easier, and when they do not, they are either particularly good or particularly bad. Both cases are usually easy to spot. If you are new to writing scientific papers, you may want to improve your scientific writing skills. The web is full of resources, and the ones I linked are a very small sample. I learned many useful tips and recommendations this way. If you are new to reviewing, it is probably better to practice by reviewing published papers and asking somebody more experienced to comment on your reviews. I hope that this post helps you focus on the relevant aspects, so that you can understand the value of a paper more easily and avoid some of the mistakes I made when starting.

Source: https://towardsdatascience.com/tips-for-reading-and-writing-an-ml-research-paper-a505863055cf
