忘情摆渡

【转载】A Recipe for Training Neural Networks

A Recipe for Training Neural Networks

大神Karpathy的经验之谈，转载自Karpathy的博客。

Some few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets. The tweet got quite a bit more engagement than I anticipated (including a webinar ?). Clearly, a lot of people have personally encountered the large gap between “here is how a convolutional layer works” and “our convnet achieves state of the art results”.

So I thought it could be fun to brush off my dusty blog to expand my tweet to the long form that this topic deserves. However, instead of going into an enumeration of more common errors or fleshing them out, I wanted to dig a bit deeper and talk about how one can avoid making these errors altogether (or fix them very fast). The trick to doing so is to follow a certain process, which as far as I can tell is not very often documented. Let’s start with two important observations that motivate it.

1) Neural net training is a leaky abstraction

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. It’s common see things like:

>>> your_data = # plug your awesome dataset here
>>> model = SuperCrossValidator(SuperDuper.fit, your_data, ResNet50, SGDOptimizer)
conquer world here

These libraries and examples activate the part of our brain that is familiar with standard software - a place where clean APIs and abstractions are often attainable. Requests library to demonstrate:

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200

That’s cool! A courageous developer has taken the burden of understanding query strings, urls, GET/POST requests, HTTP connections, and so on from you and largely hidden the complexity behind a few lines of code. This is what we are familiar with and expect. Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier. I’ve tried to make this point in my post “Yes you should understand backprop” by picking on backpropagation and calling it a “leaky abstraction”, but the situation is unfortunately much more dire. Backprop + SGD does not magically make your network work. Batch norm does not magically make it converge faster. RNNs don’t magically let you “plug in” text. And just because you can formulate your problem as RL doesn’t mean you should. If you insist on using the technology without understanding how it works you are likely to fail. Which brings me to…

2) Neural net training fails silently

When you break or misconfigure code you will often get some kind of an exception. You plugged in an integer where something expected a string. The function only expected 3 arguments. This import failed. That key does not exist. The number of elements in the two lists isn’t equal. In addition, it’s often possible to create unit tests for a certain functionality.

This is just a start when it comes to training neural nets. Everything could be correct syntactically, but the whole thing isn’t arranged properly, and it’s really hard to tell. The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test. For example, perhaps you forgot to flip your labels when you left-right flipped the image during data augmentation. Your net can still (shockingly) work pretty well because your network can internally learn to detect flipped images and then it left-right flips its predictions. Or maybe your autoregressive model accidentally takes the thing it’s trying to predict as an input due to an off-by-one bug. Or you tried to clip your gradients but instead clipped the loss, causing the outlier examples to be ignored during training. Or you initialized your weights from a pretrained checkpoint but didn’t use the original mean. Or you just screwed up the settings for regularization strengths, learning rate, its decay rate, model size, etc. Therefore, your misconfigured neural net will throw exceptions only if you’re lucky; Most of the time it will train but silently work a bit worse.

As a result, (and this is reeaally difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering. Now, suffering is a perfectly natural part of getting a neural network to work well, but it can be mitigated by being thorough, defensive, paranoid, and obsessed with visualizations of basically every possible thing. The qualities that in my experience correlate most strongly to success in deep learning are patience and attention to detail.

The recipe

In light of the above two facts, I have developed a specific process for myself that I follow when applying a neural net to a new problem, which I will try to describe. You will see that it takes the two principles above very seriously. In particular, it builds from simple to complex and at every step of the way we make concrete hypotheses about what will happen and then either validate them with an experiment or investigate until we find some issue. What we try to prevent very hard is the introduction of a lot of “unverified” complexity at once, which is bound to introduce bugs/misconfigurations that will take forever to find (if ever). If writing your neural net code was like training one, you’d want to use a very small learning rate and guess and then evaluate the full test set after every iteration.

1. Become one with the data

The first step to training a neural net is to not touch any neural net code at all and instead begin by thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns. Luckily, your brain is pretty good at this. One time I discovered that the data contained duplicate examples. Another time I found corrupted images / labels. I look for data imbalances and biases. I will typically also pay attention to my own process for classifying the data, which hints at the kinds of architectures we’ll eventually explore. As an example - are very local features enough or do we need global context? How much variation is there and what form does it take? What variation is spurious and could be preprocessed out? Does spatial position matter or do we want to average pool it out? How much does detail matter and how far could we afford to downsample the images? How noisy are the labels?

In addition, since the neural net is effectively a compressed/compiled version of your dataset, you’ll be able to look at your network (mis)predictions and understand where they might be coming from. And if your network is giving you some prediction that doesn’t seem consistent with what you’ve seen in the data, something is off.

Once you get a qualitative sense it is also a good idea to write some simple code to search/filter/sort by whatever you can think of (e.g. type of label, size of annotations, number of annotations, etc.) and visualize their distributions and the outliers along any axis. The outliers especially almost always uncover some bugs in data quality or preprocessing.

2. Set up the end-to-end training/evaluation skeleton + get dumb baselines

Now that we understand our data can we reach for our super fancy Multi-scale ASPP FPN ResNet and begin training awesome models? For sure no. That is the road to suffering. Our next step is to set up a full training + evaluation skeleton and gain trust in its correctness via a series of experiments. At this stage it is best to pick some simple model that you couldn’t possibly have screwed up somehow - e.g. a linear classifier, or a very tiny ConvNet. We’ll want to train it, visualize the losses, any other metrics (e.g. accuracy), model predictions, and perform a series of ablation experiments with explicit hypotheses along the way.

Tips & tricks for this stage:

fix random seed. Always use a fixed random seed to guarantee that when you run the code twice you will get the same outcome. This removes a factor of variation and will help keep you sane.
simplify. Make sure to disable any unnecessary fanciness. As an example, definitely turn off any data augmentation at this stage. Data augmentation is a regularization strategy that we may incorporate later, but for now it is just another opportunity to introduce some dumb bug.
add significant digits to your eval. When plotting the test loss run the evaluation over the entire (large) test set. Do not just plot test losses over batches and then rely on smoothing them in Tensorboard. We are in pursuit of correctness and are very willing to give up time for staying sane.
verify loss @ init. Verify that your loss starts at the correct loss value. E.g. if you initialize your final layer correctly you should measure -log(1/n_classes) on a softmax at initialization. The same default values can be derived for L2 regression, Huber losses, etc.
init well. Initialize the final layer weights correctly. E.g. if you are regressing some values that have a mean of 50 then initialize the final bias to 50. If you have an imbalanced dataset of a ratio 1:10 of positives:negatives, set the bias on your logits such that your network predicts probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iteration your network is basically just learning the bias.
human baseline. Monitor metrics other than loss that are human interpretable and checkable (e.g. accuracy). Whenever possible evaluate your own (human) accuracy and compare to it. Alternatively, annotate the test data twice and for each example treat one annotation as prediction and the second as ground truth.
input-indepent baseline. Train an input-independent baseline, (e.g. easiest is to just set all your inputs to zero). This should perform worse than when you actually plug in your data without zeroing it out. Does it? i.e. does your model learn to extract any information out of the input at all?
overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.
verify decreasing training loss. At this stage you will hopefully be underfitting on your dataset because you’re working with a toy model. Try to increase its capacity just a bit. Did your training loss go down as it should?
visualize just before the net. The unambiguously correct place to visualize your data is immediately before your y_hat = model(x) (or sess.run in tf). That is - you want to * visualize exactly what goes into your network, decoding that raw tensor of data and labels into visualizations. This is the only “source of truth”. I can’t count the number of times this has saved me and revealed problems in data preprocessing and augmentation.
visualize prediction dynamics. I like to visualize model predictions on a fixed test batch during the course of training. The “dynamics” of how these predictions move will give you incredibly good intuition for how the training progresses. Many times it is possible to feel the network “struggle” to fit your data if it wiggles too much in some way, revealing instabilities. Very low or very high learning rates are also easily noticeable in the amount of jitter.
use backprop to chart dependencies. Your deep learning code will often contain complicated, vectorized, and broadcasted operations. A relatively common bug I’ve come across a few times is that people get this wrong (e.g. they use view instead of transpose/permute somewhere) and inadvertently mix information across the batch dimension. It is a depressing fact that your network will typically still train okay because it will learn to ignore data from the other examples. One way to debug this (and other related problems) is to set the loss to be something trivial like the sum of all outputs of example i, run the backward pass all the way to the input, and ensure that you get a non-zero gradient only on the i-th input. The same strategy can be used to e.g. ensure that your autoregressive model at time t only depends on 1…t-1. More generally, gradients give you information about what depends on what in your network, which can be useful for debugging.
generalize a special case. This is a bit more of a general coding tip but I’ve often seen people create bugs when they bite off more than they can chew, writing a relatively general functionality from scratch. I like to write a very specific function to what I’m doing right now, get that to work, and then generalize it later making sure that I get the same result. Often this applies to vectorizing code, where I almost always write out the fully loopy version first and only then transform it to vectorized code one loop at a time.

3. Overfit

At this stage we should have a good understanding of the dataset and we have the full training + evaluation pipeline working. For any given model we can (reproducibly) compute a metric that we trust. We are also armed with our performance for an input-independent baseline, the performance of a few dumb baselines (we better beat these), and we have a rough sense of the performance of a human (we hope to reach this). The stage is now set for iterating on a good model.

The approach I like to take to finding a good model has two stages: first get a model large enough that it can overfit (i.e. focus on training loss) and then regularize it appropriately (give up some training loss to improve the validation loss). The reason I like these two stages is that if we are not able to reach a low error rate with any model at all that may again indicate some issues, bugs, or misconfiguration.

A few tips & tricks for this stage:

picking the model. To reach a good training loss you’ll want to choose an appropriate architecture for the data. When it comes to choosing this my #1 advice is: Don’t be a hero. I’ve seen a lot of people who are eager to get crazy and creative in stacking up the lego blocks of the neural net toolbox in various exotic architectures that make sense to them. Resist this temptation strongly in the early stages of your project. I always advise people to simply find the most related paper and copy paste their simplest architecture that achieves good performance. E.g. if you are classifying images don’t be a hero and just copy paste a ResNet-50 for your first run. You’re allowed to do something more custom later and beat this.
adam is safe. In the early stages of setting baselines I like to use Adam with a learning rate of 3e-4. In my experience Adam is much more forgiving to hyperparameters, including a bad learning rate. For ConvNets a well-tuned SGD will almost always slightly outperform Adam, but the optimal learning rate region is much more narrow and problem-specific. (Note: If you are using RNNs and related sequence models it is more common to use Adam. At the initial stage of your project, again, don’t be a hero and follow whatever the most related papers do.)
complexify only one at a time. If you have multiple signals to plug into your classifier I would advise that you plug them in one by one and every time ensure that you get a performance boost you’d expect. Don’t throw the kitchen sink at your model at the start. There are other ways of building up complexity - e.g. you can try to plug in smaller images first and make them bigger later, etc.
do not trust learning rate decay defaults. If you are re-purposing code from some other domain always be very careful with learning rate decay. Not only would you want to use different decay schedules for different problems, but - even worse - in a typical implementation the schedule will be based current epoch number, which can vary widely simply depending on the size of your dataset. E.g. ImageNet would decay by 10 on epoch 30. If you’re not training ImageNet then you almost certainly do not want this. If you’re not careful your code could secretely be driving your learning rate to zero too early, not allowing your model to converge. In my own work I always disable learning rate decays entirely (I use a constant LR) and tune this all the way at the very end.

4. Regularize

Ideally, we are now at a place where we have a large model that is fitting at least the training set. Now it is time to regularize it and gain some validation accuracy by giving up some of the training accuracy. Some tips & tricks:

get more data. First, the by far best and preferred way to regularize a model in any practical setting is to add more real training data. It is a very common mistake to spend a lot engineering cycles trying to squeeze juice out of a small dataset when you could instead be collecting more data. As far as I’m aware adding more data is pretty much the only guaranteed way to monotonically improve the performance of a well-configured neural network almost indefinitely. The other would be ensembles (if you can afford them), but that tops out after ~5 models.
data augment. The next best thing to real data is half-fake data - try out more aggressive data augmentation.
creative augmentation. If half-fake data doesn’t do it, fake data may also do something. People are finding creative ways of expanding datasets; For example, domain randomization, use of simulation, clever hybrids such as inserting (potentially simulated) data into scenes, or even GANs.
pretrain. It rarely ever hurts to use a pretrained network if you can, even if you have enough data.
stick with supervised learning. Do not get over-excited about unsupervised pretraining. Unlike what that blog post from 2008 tells you, as far as I know, no version of it has reported strong results in modern computer vision (though NLP seems to be doing pretty well with BERT and friends these days, quite likely owing to the more deliberate nature of text, and a higher signal to noise ratio).
smaller input dimensionality. Remove features that may contain spurious signal. Any added spurious input is just another opportunity to overfit if your dataset is small. Similarly, if low-level details don’t matter much try to input a smaller image.
smaller model size. In many cases you can use domain knowledge constraints on the network to decrease its size. As an example, it used to be trendy to use Fully Connected layers at the top of backbones for ImageNet but these have since been replaced with simple average pooling, eliminating a ton of parameters in the process.
decrease the batch size. Due to the normalization inside batch norm smaller batch sizes somewhat correspond to stronger regularization. This is because the batch empirical mean/std are more approximate versions of the full mean/std so the scale & offset “wiggles” your batch around more.
drop. Add dropout. Use dropout2d (spatial dropout) for ConvNets. Use this sparingly/carefully because dropout does not seem to play nice with batch normalization.
weight decay. Increase the weight decay penalty.
early stopping. Stop training based on your measured validation loss to catch your model just as it’s about to overfit.
try a larger model. I mention this last and only after early stopping but I’ve found a few times in the past that larger models will of course overfit much more eventually, but their “early stopped” performance can often be much better than that of smaller models.

Finally, to gain additional confidence that your network is a reasonable classifier, I like to visualize the network’s first-layer weights and ensure you get nice edges that make sense. If your first layer filters look like noise then something could be off. Similarly, activations inside the net can sometimes display odd artifacts and hint at problems.

5. Tune

You should now be “in the loop” with your dataset exploring a wide model space for architectures that achieve low validation loss. A few tips and tricks for this step:

random over grid search. For simultaneously tuning multiple hyperparameters it may sound tempting to use grid search to ensure coverage of all settings, but keep in mind that it is best to use random search instead. Intuitively, this is because neural nets are often much more sensitive to some parameters than others. In the limit, if a parameter a matters but changing b has no effect then you’d rather sample a more throughly than at a few fixed points multiple times.
hyper-parameter optimization. There is a large number of fancy bayesian hyper-parameter optimization toolboxes around and a few of my friends have also reported success with them, but my personal experience is that the state of the art approach to exploring a nice and wide space of models and hyperparameters is to use an intern ?. Just kidding.

6. Squeeze out the juice

Once you find the best types of architectures and hyper-parameters you can still use a few more tricks to squeeze out the last pieces of juice out of the system:

ensembles. Model ensembles are a pretty much guaranteed way to gain 2% of accuracy on anything. If you can’t afford the computation at test time look into distilling your ensemble into a network using dark knowledge.
leave it training. I’ve often seen people tempted to stop the model training when the validation loss seems to be leveling off. In my experience networks keep training for unintuitively long time. One time I accidentally left a model training during the winter break and when I got back in January it was SOTA (“state of the art”).
Conclusion
Once you make it here you’ll have all the ingredients for success: You have a deep understanding of the technology, the dataset and the problem, you’ve set up the entire training/evaluation infrastructure and achieved high confidence in its accuracy, and you’ve explored increasingly more complex models, gaining performance improvements in ways you’ve predicted each step of the way. You’re now ready to read a lot of papers, try a large number of experiments, and get your SOTA results. Good luck!

深度学习论文: CAS-ViT: Convolutional Additive Self-attention Vision Transformers mingo_敏 Paper Reading 深度学习人工智能
深度学习论文:CAS-ViT:ConvolutionalAdditiveSelf-attentionVisionTransformersforEfficientMobileApplicationsCAS-ViT:ConvolutionalAdditiveSelf-attentionVisionTransformersforEfficientMobileApplicationsPDF:https:/
深度学习论文: Image Segmentation Using Text and Image Prompts mingo_敏 Paper Reading Semantic Segmentation 深度学习人工智能
深度学习论文:ImageSegmentationUsingTextandImagePromptsImageSegmentationUsingTextandImagePromptsPDF:https://arxiv.org/abs/2503.10622v1PyTorch代码:https://github.com/shanglianlm0525/CvPytorchPyTorch代码:https://g
SD模型微调之LoRA 好评笔记补档深度学习计算机视觉人工智能面试 AIGC SD stable diffusion
大家好，这里是Goodnote（好评笔记），关注公主号Goodnote，专栏文章私信限时Free。本文是SD模型微调方法LoRA的详细介绍，包括数据集准备，模型微调过程，推理过程，优缺点等。热门专栏机器学习机器学习笔记合集深度学习深度学习笔记合集文章目录热门专栏机器学习深度学习论文概念核心原理优点训练过程预训练模型加载选择微调的层LoRA优化的层Cross-Attention（跨注意力）层Self
深度学习论文阅读路线图喜欢打酱油的老鸟深度学习论文阅读路线图深度学习论文阅读路线图论文阅读路线图
https://www.toutiao.com/a6703859415763649031/作者：floodsun编译：ronghuaiyang这是作者一年前整理的东西，有些最新的论文没有包含进去，但是对于新手来说，入门足够了！如果你是深度学习领域的新人，你的第一个问题可能是“我该从哪些论文开始读起呢？”这就是深度学习论文的阅读路线图！这个路线图是根据下面几个规则构建的：从概要到细节从老的到最新的业
深度学习论文: Cultivated Land Extraction from High-Resolution Remote Sensing Image mingo_敏 Paper Reading Deep Learning Instance Segmentation python 人工智能机器学习
深度学习论文:CultivatedLandExtractionfromHigh-ResolutionRemoteSensingImageTheWinningSolutiontotheiFLYTEKChallenge2021CultivatedLandExtractionfromHigh-ResolutionRemoteSensingImagePDF:https://arxiv.org/pdf/22
深度学习论文精读（7）：MTCNN hwl19951007 计算机视觉论文精读
深度学习论文精读（7）：MTCNN论文地址：JointFaceDetectionandAlignmentusingMulti-taskCascadedConvolutionalNetworks译文地址：https://zhuanlan.zhihu.com/p/37884254参考博文1：https://zhuanlan.zhihu.com/p/38520597官方地址：https://kpzhan
易 AI - 使用 TensorFlow 2 Keras 实现 AlexNet CNN 架构 CatchZeng
原文：https://makeoptim.com/deep-learning/yiai-alexnet-implementation前言网络结构实现SequentialSubclassingDemo小结参考前言上一篇笔者使用如何阅读深度学习论文的方法阅读了AlexNet。为了加深理解，本文带大家使用TensorFlow2Keras实现AlexNetCNN架构。网络结构image从上一篇可以得到Al
【初读论文】 Selvaggia 深度学习 python
这里写目录标题万字长文解析深度学习中的术语面向小白的深度学习论文术语（持续更新）deepsolo不懂的知识pipelinebaselineRoI(RegionofInterest)分类问题中的正例负例指示函数（indicatorfunction）模型性能评估指标（PRF1……）深度学习中的FPN详解CNN解码Transformer：自注意力机制与编解码器机制详述与代码实现deepsolo前言知乎深
第4周：Pytorch——综合应用和实战项目 Day 28-30: 学习资源和社区参与 M.D 学习 pytorch tensorflow
第4周：综合应用和实战项目Day28-30:学习资源和社区参与在这个阶段，我们将探索更多的学习资源并鼓励参与PyTorch和TensorFlow的社区，以进一步提升技术和融入开发者社群。学习资源：论文：阅读最新的机器学习和深度学习论文，了解领域的最新进展。推荐资源包括arXiv、GoogleScholar。博客和教程：关注行业知名博客和教程，如TowardsDataScience,Medium,P
深度学习论文解读分享之diffGrad：一种卷积神经网络优化方法曦曦逆风深度学习深度学习 cnn 人工智能
IEEETNNLS2020：diffGrad:一种卷积神经网络优化方法题目diffGrad:AnOptimizationMethodforConvolutionalNeuralNetworks作者ShivRamDubey,Member,IEEE,SoumenduChakraborty,SwalpaKumarRoy,StudentMember,IEEE,SnehasisMukherjee,Membe
AI 论文精读，中文视频讲解：剖析人工智能本质 | 开源日报 No.120 开源服务指南开源日报人工智能开源
mli/paper-readingStars:21.8kLicense:Apache-2.0深度学习论文精读是一个深度学习相关论文列表，包括计算机视觉、生成模型、自然语言处理等多个领域。该项目的核心优势和特点包括：提供了大量关于深度学习各领域热门文章内容对不同年份发表的有较高引用率或近期比较有意思的文章进行详尽解读涵盖了计算机视觉、生成模型、自然语言处理等多个方面，为广大研究者提供全面而专业的知识
深度学习论文阅读：Generative Pre-Training(GPT) 阿正的梦工坊 DL Papers 深度学习 GPT BERT transformer
文章目录GPTAbstract1Introduction6Conclusion2RelatedWork3Framework3.1Unsupervisedpre-trainingGPT和BERT的区别3.2Supervisedfine-tuning3.3Task-specificinputtransformations4Experiments总结参考GPT核心点：预训练一个transformerde
推荐·人工智能+深度学习论文阅读小组我的昵称违规了
Pytorch学习到第5篇论文，这篇论文解读很少，就在网上搜了一下，不经意发现这个小组，推荐给大家。似乎不让放外链？我试一下PaperWeeklyPaperWeekly论文阅读小组阅读论文是小众活动，阅读者分散在全球各地。PaperWeekly论文阅读小组，把分散在全球的华人阅读者，聚合在一起。不仅互帮互助读懂论文，而且通过讨论，激发灵感。进入PaperWeekly的网站，阅读者不仅可以看到本周热
经典深度学习论文中英文翻译 MrUncle德鲁机器学习论文翻译深度学习中英文
DeepLearningPapersTranslation(CV)仅为方便查看。本文转自：SnailTyan的Github（侵删）ImageClassificationAlexNetImageNetClassificationwithDeepConvolutionalNeuralNetworks中文版中英文对照VGGVeryDeepConvolutionalNetworksforLarge-Sca
使用 PointNet 进行3D点集（即点云）的分类 TD程序员深度学习开发实践系列分类数据挖掘人工智能机器学习神经网络 3d
点云分类介绍无序3D点集（即点云）的分类、检测和分割是计算机视觉中的核心问题。此示例实现了开创性的点云深度学习论文PointNet（Qi等人，2017）。设置如果使用colab首先安装trimesh!pipinstalltrimesh。importosimportglobimporttrimeshimportnumpyasnpimporttensorflowastffromte
[深度学习论文笔记]Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation SerendipityQYK 深度学习之医学图像分割论文深度学习 transformer 医学图像处理肿瘤分割人工智能
HybridWindowAttentionBasedTransformerArchitectureforBrainTumorSegmentation基于混合窗口注意力的Transformer结构脑肿瘤分割Author：HimashiPeiris,MunawarHayat,ZhaolinChen,GaryEgan,MehrtashHarandiUnit：MonashUniversitySubmitt
FlyAI小课堂：深度学习论文翻译解析（3）：丰富的特征层次结构，可实现准确的目标检测和语义分割 iFlyAI 竞赛深度学习目标检测机器翻译目标检测语义分割深度学习
论文标题：Richfeaturehierarchiesforaccurateobjectdetectionandsemanticsegmentation标题翻译：丰富的特征层次结构，可实现准确的目标检测和语义分割论文作者：RossGirshickJeffDonahueTrevorDarrellJitendraMali论文地址：http://fcv2011.ulsan.ac.kr/files/ann
深度学习论文翻译 -- Inception-v4，Inception-ResNet and the Impact of Residual Connections on Learning X_Imagine 深度学习论文翻译 Inception-V4 图像分类深度学习
本文翻译论文为深度学习经典模型之一：Inception-V4论文链接：https://arxiv.org/pdf/1602.07261.pdf摘要：近些年，超深度卷积网络成为图像识别领域的核心算法。其中，Inception结构在图像分类中表现优秀，并且计算代价很低。最近，残差与更加传统的结构相结合，在ILSVRC挑战中获得Start-of-art的结果（与Inception-v3）的分类精度差不多
机器学习/深度学习论文里的损失函数 L字体书写方式 Echo_ac python
损失函数L\mathcal{L}L：\mathcal{L}损失函数l\mathcal{l}l：\mathcal{l}
深度学习论文: ISTDU-Net：Infrared Small-Target Detection U-Net及其PyTorch实现 mingo_敏 Paper Reading Deep Learning Semantic Segmentation 深度学习 pytorch 人工智能
深度学习论文:ISTDU-Net：InfraredSmall-TargetDetectionU-Net及其PyTorch实现ISTDU-Net：InfraredSmall-TargetDetectionU-NetPDF:https://doi.org/10.1109/LGRS.2022.3141584PyTorch代码:https://github.com/shanglianlm0525/CvPy
深度学习论文: Rethinking Mobile Block for Efficient Attention-based Models及其PyTorch实现 mingo_敏 Paper Reading Deep Learning 深度学习 pytorch 人工智能
深度学习论文:RethinkingMobileBlockforEfficientAttention-basedModels及其PyTorch实现RethinkingMobileBlockforEfficientAttention-basedModelsPDF:https://arxiv.org/pdf/2301.01146.pdfPyTorch代码:https://github.com/shang
ICCV 2023 | Ada3D: 利用动态推理挖掘3D感知任务中数据冗余性 AITIME论道 3d
点击蓝字关注我们AITIME欢迎每一位AI爱好者的加入！以下内容来源于将门创投作者：赵天辰机构：清华大学电子工程系研究方向：硬件友好的高效深度学习论文标题：Ada3D:ExploitingtheSpatialRedundancywithAdaptiveInferenceforEfficient3DObjectDetection论文地址：https://arxiv.org/abs/2307.0820
深度学习论文分享（六）Simple Baselines for Image Restoration 澪mio 深度学习论文分享深度学习人工智能
深度学习论文分享（六）SimpleBaselinesforImageRestoration前言Abstract1Introduction2RelatedWorks2.1ImageRestoration2.2GatedLinearUnits3BuildASimpleBaseline3.1Architecture3.2APlainBlock3.3Normalization3.4Activation3.
深度学习论文分享（七）Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild 澪mio 深度学习论文分享深度学习人工智能
深度学习论文分享（七）DenoisingDiffusionProbabilisticModelsforRobustImageSuper-ResolutionintheWild前言Abstract1.Introduction2.BackgroundonDiffusionModels3.RelatedWork4.Methodology4.1.Architecture4.2.Higher-orderde
深度学习论文分享（八）Learning Event-Driven Video Deblurring and Interpolation 澪mio 深度学习论文分享深度学习人工智能
深度学习论文分享（八）LearningEvent-DrivenVideoDeblurringandInterpolation前言Abstract1Introduction2Motivation2.1PhysicalModelofEvent-basedVideoReconstruction2.2SpatiallyVariantTriggeringThreshold3ProposedMethods3.
深度学习论文: Segment Any Anomaly without Training via Hybrid Prompt Regularization mingo_敏 Unsupervised Anomaly Detection Paper Reading Deep Learning 深度学习 prompt 人工智能
深度学习论文:SegmentAnyAnomalywithoutTrainingviaHybridPromptRegularizationSegmentAnyAnomalywithoutTrainingviaHybridPromptRegularizationPDF:https://arxiv.org/pdf/2305.10724.pdfPyTorch代码:https://github.com/sh
年末回顾：2021年 AI 领域十大研究趋势及必读论文夕小瑶人工智能大数据算法编程语言 python
编|小轶，Yimin_饭煲在本文中，我们将梳理近百篇的最新深度学习论文，以总结出“2021年十大AI研究趋势”。AI领域的论文可谓层出不穷。这篇文章或许能帮助你跟踪总体趋势和重要研究。下文中提及的部分工作可能并不发表于2021年，但对于形成2021年的AI趋势也起到了重要作用，因而也在本文中列出。1.OpenAICLIPOpenAI今年年初发布的CLIP模型可以说是今年AI行业最重要的里程碑。CL
深度学习论文: RepViT: Revisiting Mobile CNN From ViT Perspective及其PyTorch实现 mingo_敏 Paper Reading Deep Learning 深度学习 cnn pytorch
深度学习论文:RepViT:RevisitingMobileCNNFromViTPerspective及其PyTorch实现RepViT:RevisitingMobileCNNFromViTPerspectivePDF:https://arxiv.org/pdf/2307.09283.pdfPyTorch代码:https://github.com/shanglianlm0525/CvPytorch
深度学习论文: Towards Total Recall in Industrial Anomaly Detection及其PyTorch实现 mingo_敏 Unsupervised Anomaly Detection Paper Reading Deep Learning 深度学习 pytorch 人工智能
深度学习论文:TowardsTotalRecallinIndustrialAnomalyDetection及其PyTorch实现TowardsTotalRecallinIndustrialAnomalyDetectionPDF:https://arxiv.org/pdf/2106.08265.pdfPyTorch代码:https://github.com/shanglianlm0525/CvPyt
万字长文解析深度学习中的术语追忆苔上雪深度学习人工智能 pytorch 机器学习神经网络
引言新手在学习深度学习或者在看深度学习论文的过程中，有不少专业词汇，软件翻译不出来，就算是翻译出来也看不懂，因为不少术语是借用其他学科的概念，这里整理了一些在深度学习中常见的术语，并对一些概念进行解释。这里先教大家一个查概念的方法，比如我想查Ablationstudy，这个中文翻译是消融实验，这概念谁能明白呢，咱们可以从根源去查消融实验的含义，打开google，直接搜whatisxxxindeep
[星球大战]阿纳金的背叛 comsci
本来杰迪圣殿的长老是不同意让阿纳金接受训练的......... 但是由于政治原因,长老会妥协了...这给邪恶的力量带来了机会所以......现代的地球联邦接受了这个教训...绝对不让某些年轻人进入学院
看懂它，你就可以任性的玩耍了！ aijuans JavaScript
javascript作为前端开发的标配技能，如果不掌握好它的三大特点：1.原型 2.作用域 3. 闭包 ,又怎么可以说你学好了这门语言呢？如果标配的技能都没有撑握好，怎么可以任性的玩耍呢？怎么验证自己学好了以上三个基本点呢，我找到一段不错的代码，稍加改动，如果能够读懂它，那么你就可以任性了。 function jClass(b
Java常用工具包 Jodd Kai_Ge java jodd
Jodd 是一个开源的 Java 工具集，包含一些实用的工具类和小型框架。简单，却很强大！写道 Jodd = Tools + IoC + MVC + DB + AOP + TX + JSON + HTML < 1.5 Mb Jodd 被分成众多模块，按需选择，其中工具类模块有： jodd-core &nb
SpringMvc下载 120153216 springMVC
@RequestMapping(value = WebUrlConstant.DOWNLOAD) public void download(HttpServletRequest request,HttpServletResponse response,String fileName) { OutputStream os = null; InputStream is = null;
Python 标准异常总结 2002wmj python
Python标准异常总结 AssertionError 断言语句（assert）失败 AttributeError 尝试访问未知的对象属性 EOFError 用户输入文件末尾标志EOF（Ctrl+d） FloatingPointError 浮点计算错误 GeneratorExit generator.close()方法被调用的时候 ImportError 导入模块失
SQL函数返回临时表结构的数据用于查询 357029540 SQL Server
这两天在做一个查询的SQL，这个SQL的一个条件是通过游标实现另外两张表查询出一个多条数据，这些数据都是INT类型，然后用IN条件进行查询，并且查询这两张表需要通过外部传入参数才能查询出所需数据，于是想到了用SQL函数返回值，并且也这样做了，由于是返回多条数据，所以把查询出来的INT类型值都拼接为了字符串，这时就遇到问题了，在查询SQL中因为条件是INT值，SQL函数的CAST和CONVERST都
java 时间格式化 | 比较大小| 时区个人笔记 7454103 java eclipse tomcat c MyEclipse
个人总结！不当之处多多包含！引用 1.0 如何设置 tomcat 的时区：位置：(catalina.bat---JAVA_OPTS 下面加上) set JAVA_OPT
时间获取Clander的用法 adminjun Clander 时间
/** * 得到几天前的时间 * @param d * @param day * @return */ public static Date getDateBefore(Date d,int day){ Calend
JVM初探与设置 aijuans java
JVM是Java Virtual Machine（Java虚拟机）的缩写，JVM是一种用于计算设备的规范，它是一个虚构出来的计算机，是通过在实际的计算机上仿真模拟各种计算机功能来实现的。Java虚拟机包括一套字节码指令集、一组寄存器、一个栈、一个垃圾回收堆和一个存储方法域。 JVM屏蔽了与具体操作系统平台相关的信息，使Java程序只需生成在Java虚拟机上运行的目标代码（字节码）,就可以在多种平台
SQL中ON和WHERE的区别 avords
SQL中ON和WHERE的区别数据库在通过连接两张或多张表来返回记录时，都会生成一张中间的临时表，然后再将这张临时表返回给用户。 www.2cto.com 在使用left jion时，on和where条件的区别如下： 1、 on条件是在生成临时表时使用的条件，它不管on中的条件是否为真，都会返回左边表中的记录。
说说自信 houxinyou 工作生活
自信的来源分为两种,一种是源于实力,一种源于头脑.实力是一个综合的评定,有自身的能力,能利用的资源等.比如我想去月亮上,要身体素质过硬,还要有飞船等等一系列的东西.这些都属于实力的一部分.而头脑不同,只要你头脑够简单就可以了!同样要上月亮上,你想,我一跳,1米,我多跳几下,跳个几年,应该就到了!什么?你说我会往下掉?你笨呀你!找个东西踩一下不就行了吗? 无论工作还
WEBLOGIC事务超时设置 bijian1013 weblogic jta 事务超时
系统中统计数据，由于调用统计过程，执行时间超过了weblogic设置的时间，提示如下错误：统计数据出错! 原因：The transaction is no longer active - status: 'Rolling Back. [Reason=weblogic.transaction.internal
两年已过去，再看该如何快速融入新团队 bingyingao java 互联网融入架构新团队
偶得的空闲，翻到了两年前的帖子该如何快速融入一个新团队，有所感触，就记下来，为下一个两年后的今天做参考。时隔两年半之后的今天，再来看当初的这个博客，别有一番滋味。而我已经于今年三月份离开了当初所在的团队，加入另外的一个项目组，2011年的这篇博客之后的时光，我很好的融入了那个团队，而直到现在和同事们关系都特别好。大家在短短一年半的时间离一起经历了一
【Spark七十七】Spark分析Nginx和Apache的access.log bit1129 apache
Spark分析Nginx和Apache的access.log，第一个问题是要对Nginx和Apache的access.log文件进行按行解析，按行解析就的方法是正则表达式： Nginx的access.log解析正则表达式 val PATTERN = """([^ ]*) ([^ ]*) ([^ ]*) (\\[.*\\]) (\&q
Erlang patch bookjovi erlang
Totally five patchs committed to erlang otp, just small patchs. IMO, erlang really is a interesting programming language, I really like its concurrency feature. but the functional programming style
log4j日志路径中加入日期 bro_feng java log4j
要用log4j使用记录日志，日志路径有每日的日期，文件大小5M新增文件。实现方式 log4j: <appender name="serviceLog" class="org.apache.log4j.RollingFileAppender"> <param name="Encoding" v
读《研磨设计模式》-代码笔记-桥接模式 bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ /** * 个人觉得关于桥接模式的例子，蜡笔和毛笔这个例子是最贴切的：http://www.cnblogs.com/zhenyulu/articles/67016.html * 笔和颜色是可分离的，蜡笔把两者耦合在一起了：一支蜡笔只有一种
windows7下SVN和Eclipse插件安装 chenyu19891124 eclipse插件
今天花了一天时间弄SVN和Eclipse插件的安装，今天弄好了。svn插件和Eclipse整合有两种方式，一种是直接下载插件包，二种是通过Eclipse在线更新。由于之前Eclipse版本和svn插件版本有差别，始终是没装上。最后在网上找到了适合的版本。所用的环境系统：windows7JDK：1.7svn插件包版本：1.8.16Eclipse：3.7.2工具下载地址：Eclipse下在地址：htt
[转帖]工作流引擎设计思路 comsci 设计模式工作应用服务器 workflow 企业应用
作为国内的同行，我非常希望在流程设计方面和大家交流，刚发现篇好文(那么好的文章，现在才发现，可惜)，关于流程设计的一些原理，个人觉得本文站得高，看得远，比俺的文章有深度，转载如下 ================================================================================= 自开博以来不断有朋友来探讨工作流引擎该如何
Linux 查看内存，CPU及硬盘大小的方法 daizj linux cpu 内存硬盘大小
一、查看CPU信息的命令 [root@R4 ~]# cat /proc/cpuinfo |grep "model name" && cat /proc/cpuinfo |grep "physical id" model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz model name :
linux 踢出在线用户 dongwei_6688 linux
两个步骤： 1.用w命令找到要踢出的用户，比如下面： [root@localhost ~]# w 18:16:55 up 39 days, 8:27, 3 users, load average: 0.03, 0.03, 0.00 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
放手吧,就像不曾拥有过一样 dcj3sjt126com
内容提要：静悠悠编著的《放手吧就像不曾拥有过一样》集结“全球华语世界最舒缓心灵”的精华故事，触碰生命最深层次的感动，献给全世界亿万读者。《放手吧就像不曾拥有过一样》的作者衷心地祝愿每一位读者都给自己一个重新出发的理由，将那些令你痛苦的、扛起的、背负的，一并都放下吧！把憔悴的面容换做一种清淡的微笑，把沉重的步伐调节成春天五线谱上的音符，让自己踏着轻快的节奏，在人生的海面上悠然漂荡，享受宁静与
php二进制安全的含义 dcj3sjt126com PHP
PHP里，有string的概念。 string里，每个字符的大小为byte（与PHP相比，Java的每个字符为Character，是UTF8字符，C语言的每个字符可以在编译时选择）。 byte里，有ASCII代码的字符，例如ABC，123，abc，也有一些特殊字符，例如回车，退格之类的。特殊字符很多是不能显示的。或者说，他们的显示方式没有标准，例如编码65到哪儿都是字母A，编码97到哪儿都是字符
Linux下禁用T440s，X240的一体化触摸板(touchpad) gashero linux ThinkPad 触摸板
自打1月买了Thinkpad T440s就一直很火大，其中最让人恼火的莫过于触摸板。 Thinkpad的经典就包括用了小红点(TrackPoint)。但是小红点只能定位，还是需要鼠标的左右键的。但是自打T440s等开始启用了一体化触摸板，不再有实体的按键了。问题是要是好用也行。实际使用中，触摸板一堆问题，比如定位有抖动，以及按键时会有飘逸。这就导致了单击经常就
graph_dfs hcx2013 Graph
package edu.xidian.graph; class MyStack { private final int SIZE = 20; private int[] st; private int top; public MyStack() { st = new int[SIZE]; top = -1; } public void push(i
Spring4.1新特性——Spring核心部分及其他 jinnianshilongnian spring 4.1
目录 Spring4.1新特性——综述 Spring4.1新特性——Spring核心部分及其他 Spring4.1新特性——Spring缓存框架增强 Spring4.1新特性——异步调用和事件机制的异常处理 Spring4.1新特性——数据库集成测试脚本初始化 Spring4.1新特性——Spring MVC增强 Spring4.1新特性——页面自动化测试框架Spring MVC T
配置HiveServer2的安全策略之自定义用户名密码验证 liyonghui160com
具体从网上看 http://doc.mapr.com/display/MapR/Using+HiveServer2#UsingHiveServer2-ConfiguringCustomAuthentication LDAP Authentication using OpenLDAP Setting
一位30多的程序员生涯经验总结 pda158 编程工作生活咨询
1.客户在接触到产品之后，才会真正明白自己的需求。　　这是我在我的第一份工作上面学来的。只有当我们给客户展示产品的时候，他们才会意识到哪些是必须的。给出一个功能性原型设计远远比一张长长的文字表格要好。 2.只要有充足的时间，所有安全防御系统都将失败。　　安全防御现如今是全世界都在关注的大课题、大挑战。我们必须时时刻刻积极完善它，因为黑客只要有一次成功，就可以彻底打败你。 3.
分布式web服务架构的演变自由的奴隶 linux Web 应用服务器互联网
最开始，由于某些想法，于是在互联网上搭建了一个网站，这个时候甚至有可能主机都是租借的，但由于这篇文章我们只关注架构的演变历程，因此就假设这个时候已经是托管了一台主机，并且有一定的带宽了，这个时候由于网站具备了一定的特色，吸引了部分人访问，逐渐你发现系统的压力越来越高，响应速度越来越慢，而这个时候比较明显的是数据库和应用互相影响，应用出问题了，数据库也很容易出现问题，而数据库出问题的时候，应用也容易
初探Druid连接池之二——慢SQL日志记录 xingsan_zhang 日志连接池 druid 慢SQL
由于工作原因，这里先不说连接数据库部分的配置，后面会补上，直接进入慢SQL日志记录。 1.applicationContext.xml中增加如下配置： <bean abstract="true" id="mysql_database" class="com.alibaba.druid.pool.DruidDataSourc