DS Wannabe之5-AM Project: DS 30day int prep day5

Q1: What are Epochs?

One epoch is one complete pass of the ENTIRE training dataset forwards and backwards through the neural network.

Since the full dataset is usually too large to feed to the machine at once, we divide each epoch into several smaller batches.

We almost always train for more than one epoch, because a single pass over the data typically leads to underfitting.

As the number of epochs increases, the weights are updated more and more times, and the model moves from underfitting, through an optimal fit, and eventually into overfitting.

Q2. What is the batch size?

Batch Size

The number of training examples present in a single batch.

Unlike the learning rate hyperparameter, whose value does not affect computational time, the batch size must be examined in conjunction with the execution time of training. The batch size is limited by the hardware's memory, while the learning rate is not. Leslie Smith recommends using a batch size that fits in the hardware's memory and enables using a larger learning rate.

If our server has multiple GPUs, the total batch size is the batch size on a single GPU multiplied by the number of GPUs. If the architecture is small or your hardware permits very large batch sizes, then you might compare the performance of different batch sizes. Also, recall that small batch sizes add regularization while large batch sizes add less, so take this into account when balancing the proper amount of regularization. It is often better to use a large batch size so that a larger learning rate can be used.
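To make epochs and batch size concrete, here is a minimal Keras sketch using dummy random data (the shapes and layer sizes are illustrative assumptions, not a recommendation):

```python
import numpy as np
import tensorflow as tf

# Dummy data: 1,000 samples with 20 features, binary labels.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# epochs = how many full passes over the training data,
# batch_size = how many samples per gradient update
# (800 training samples / batch size 32 = 25 weight updates per epoch).
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```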

Q3: What is dropout in Neural network?

Dropout refers to ignoring a randomly chosen set of neurons (units) during the training phase. These units are not considered during a particular forward or backward pass.

More technically, at each training stage, individual nodes are either dropped out of the net with probability 1-p or kept with probability p, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed.

In a neural network, dropout is a regularization technique used to prevent overfitting. It works by randomly ignoring a proportion of neurons (and their connections) during training, which forces the network to learn more robust features. Dropout can be applied to any layer, and is typically used on hidden and input layers but not on the output layer. Although it may increase the number of training iterations needed, the training time per epoch decreases.

We need dropout to prevent over-fitting.

A dropout is an approach to regularization in neural networks which helps to reduce interdependent learning amongst the neurons.

Where to use

Dropout is implemented per-layer in a neural network.

It can be used with most types of layers, such as dense fully connected layers, convolutional layers, and recurrent layers such as the long short-term memory network layer.

Dropout may be implemented on any or all hidden layers in the network as well as the visible or input layer. It is not used on the output layer.

Benefits of dropout include:

1) It forces the network to learn more robust features that remain useful in conjunction with many different random subsets of the other neurons.

2) Although dropout may roughly double the number of iterations required to converge, the training time per epoch is shorter.
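A minimal sketch of per-layer dropout in a Keras model (the 0.5 and 0.2 rates and layer sizes are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly drop 50% of this layer's units during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),   # a lighter dropout rate for a smaller layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # no dropout on the output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Dropout is only active during training; Keras disables it automatically at inference time.
```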

Q4: List down hyperparameter tuning in deep learning.

The process of setting the hyper-parameters requires expertise and extensive trial and error. There are no simple and easy ways to set hyper-parameters — specifically, learning rate, batch size, momentum, and weight decay.

Approaches to searching for the best configuration:

  • Grid Search

  • Random Search

Approach

  1. Observe and understand the clues available during training by monitoring the validation/test loss early on, and tune your architecture and hyper-parameters with short runs of a few epochs.

  2. Signs of underfitting or overfitting of the test or validation loss early in the training process are useful for tuning the hyper-parameters.

Tools for Optimizing Hyperparameters

  • Amazon SageMaker

  • Comet.ml

  • Weights & Biases

  • Deep Cognition

  • Azure ML
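As a rough illustration of random search (grid search would instead loop over every combination of a fixed grid), here is a minimal sketch; train_and_evaluate is a hypothetical placeholder for a short training run:

```python
import random

def train_and_evaluate(lr, batch_size, dropout):
    """Hypothetical helper: train a small model for a few epochs and return a validation score.
    Here it just returns a random number so the search loop itself is runnable."""
    return random.random()

search_space = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [32, 64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
}

best_score, best_config = float("-inf"), None
for _ in range(20):  # 20 random trials with short training runs
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config

print("Best configuration:", best_config, "validation score:", best_score)
```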

Q5: What do you understand by activation function and error functions?

Error functions

In most learning networks, an error is calculated as the difference between the predicted output and the actual output.

The function that is used to compute this error is known as the loss function J(.). Different loss functions give different errors for the same prediction, and thus have a considerable effect on the performance of the model. One of the most widely used loss functions is mean squared error, which calculates the square of the difference between the actual value and the predicted value. Different loss functions are used to deal with different types of tasks, i.e. regression and classification.

Regressive loss functions:

1. Mean Square Error
2. Absolute Error
3. Smooth Absolute Error

Classification loss functions:

1. Binary Cross-Entropy
2. Negative Log-Likelihood
3. Margin Classifier
4. Soft Margin Classifier
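A minimal NumPy sketch of two of these losses, mean squared error and binary cross-entropy (the example values are illustrative only):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of squared differences between actual and predicted values.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print(mean_squared_error(y_true, y_pred))    # regression-style error
print(binary_cross_entropy(y_true, y_pred))  # classification-style error
```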

Activation functions decide whether a neuron should be activated or not by calculating a weighted sum and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

In a neural network, we update the weights and biases of the neurons based on the error at the outputs. This process is known as back-propagation. Activation functions make back-propagation possible, since their gradients are supplied along with the error to update the weights and biases.

Q6: Why do we need Non-linear activation functions?

A neural network without activation functions is essentially a linear regression model. The activation functions do the non-linear transformation to the input, making it capable of learning and performing more complex tasks.

  1. Identity
  2. Binary Step
  3. Sigmoid
  4. Tanh
  5. ReLU
  6. Leaky ReLU
  7. Softmax
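A minimal NumPy sketch of a few of these activations (illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))           # squashes inputs into (0, 1)

def relu(x):
    return np.maximum(0.0, x)                  # zero for negative inputs, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # small slope instead of zero for negatives

def softmax(x):
    e = np.exp(x - np.max(x))                  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```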

Q7: What do you understand by the vanishing gradient problem and how can we solve it?

The problem:

As more layers using certain activation functions are added to a neural network, the gradients of the loss function approach zero, making the network harder to train.

Why:

Certain activation functions, like the sigmoid function, squash a large input space into a small output space between 0 and 1. Therefore, a large change in the input of the sigmoid function causes only a small change in the output. Hence, the derivative becomes small.

For shallow networks with only a few layers that use these activations, this isn’t a big problem. However, when more layers are used, it can cause the gradient to be too small for training to work effectively.

When n hidden layers use an activation like the sigmoid function, n small derivatives are multiplied together, so the gradient decreases exponentially as we propagate back towards the initial layers. Common remedies include ReLU-family activations, careful weight initialization, batch normalization, and residual (skip) connections.
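A tiny NumPy illustration of this effect with toy numbers: the sigmoid derivative is at most 0.25, so multiplying many such factors together shrinks the gradient exponentially with depth (weights are ignored here for simplicity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # maximum value is 0.25, reached at x = 0

# Pretend each of n layers contributes one sigmoid-derivative factor
# at a typical pre-activation value of 1.0.
factor = sigmoid_derivative(1.0)   # about 0.197

for n in (1, 5, 10, 20):
    print(n, "layers -> gradient factor of about", factor ** n)
# The factor collapses towards zero as n grows: the vanishing gradient.
```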

Q8: What is Transfer learning in deep learning ?

Transfer learning: a machine learning method where a model developed for one task is reused as the starting point for a model on a second task.

It is a popular approach in deep learning where pre-trained models are used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems.

Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task.

Transfer learning is an optimization that allows rapid progress or improved performance when modelling the second task.

Transfer learning only works in deep learning if the model features learned from the first task are general.

Transfer learning is an approach in deep learning where a model already developed for one task is used as the starting point for a model on a second task. It is popular in deep learning, especially for computer vision and natural language processing tasks, because developing neural network models for these problems requires a great deal of compute and time. Transfer learning allows rapid progress or improved performance when modelling the second task, but it requires that the features the model learned on the first task are general.
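A minimal Keras sketch of the idea, reusing an ImageNet-pretrained VGG16 as a frozen feature extractor for a new task (the classification head and data shapes are illustrative assumptions):

```python
import tensorflow as tf

# Pretrained convolutional base (first task: ImageNet classification).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the learned, general-purpose features

# New head for the second task (here: an assumed binary classification problem).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5, batch_size=32)
```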

Q9: What is VGG16 and explain the architecture of VGG16?

VGG-16 is a simpler architecture model since it does not use many hyperparameters. It always uses 3 x 3 convolution filters with a stride of 1 and SAME padding, and 2 x 2 max-pooling layers with a stride of 2.

VGG16 is a deep learning model for computer vision tasks, especially image recognition. It was developed by the Visual Geometry Group at Oxford and contains 16 weight layers: 13 convolutional layers and 3 fully connected layers, plus 5 max-pooling layers. VGG16 is known for its simple yet effective structure, using small 3 x 3 convolution kernels and 2 x 2 max-pooling layers. This architecture allows VGG16 to learn rich feature representations and achieve excellent performance on many image recognition tasks. Although VGG16 is relatively large and computationally intensive, its structure is clean and easy to understand, and it is often used as a base model for transfer learning and other computer vision tasks.

This architecture is from the VGG group at Oxford. It improves on AlexNet by replacing large kernel-sized filters with multiple 3 x 3 kernel-sized filters stacked one after another. For a given receptive field (the effective area of the input image on which an output depends), multiple stacked smaller kernels are better than a single larger kernel, because the additional non-linear layers increase the depth of the network, enabling it to learn more complex features at a lower cost.

Three fully connected layers follow the VGG convolutional layers. The width of the network starts at the small value of 64 channels and increases by a factor of 2 after every sub-sampling/pooling layer. It achieves a top-5 accuracy of 92.3% on ImageNet.
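A minimal Keras sketch of the VGG-style pattern described above (stacked 3 x 3 convolutions followed by 2 x 2 max-pooling, with the channel width doubling per stage); this is an illustrative fragment, not the full 16-layer network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def vgg_block(x, filters, n_convs):
    # One VGG "stage": n_convs 3x3 convolutions, then a 2x2 max-pool with stride 2.
    for _ in range(n_convs):
        x = layers.Conv2D(filters, kernel_size=3, strides=1,
                          padding="same", activation="relu")(x)
    return layers.MaxPooling2D(pool_size=2, strides=2)(x)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = vgg_block(inputs, 64, 2)    # width starts at 64...
x = vgg_block(x, 128, 2)        # ...and doubles after each pooling stage
x = vgg_block(x, 256, 3)
x = layers.Flatten()(x)
outputs = layers.Dense(1000, activation="softmax")(x)  # ImageNet has 1,000 classes
model = tf.keras.Model(inputs, outputs)
```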

Q10: What is RESNET?

The winner of ILSVRC 2015, it is also called the Residual Neural Network (ResNet), by Kaiming He et al. This architecture introduced a concept called "skip connections". Typically, the input passes through two weight-layer transformations with ReLU activations. In a residual network, the input is also copied directly and added to the output of the second transformation before the final ReLU.
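A minimal Keras sketch of such a skip connection (an identity residual block; batch normalization is omitted for brevity and the sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                    # the "skip connection"
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])                 # add the input back onto the output
    return layers.ReLU()(y)                         # final ReLU applied to the sum

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, 64)
block = tf.keras.Model(inputs, outputs)
```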

The experiments in the paper show the power of the residual network. The plain 34-layer network had a higher validation error than the plain 18-layer network; this is where the degradation problem becomes apparent. The same 34-layer network, when converted to a residual network, has much lower training error than the 18-layer residual network.

ResNet, the residual network, is a deep learning model used for computer vision tasks. It addresses the degradation problem in deep networks by introducing "residual learning", allowing the network to be trained more effectively through cross-layer connections. This architecture makes it possible to build much deeper networks and improve performance without running into vanishing or exploding gradients. ResNet has achieved remarkable results on many image recognition and classification tasks.

Q11: What is ImageNet?

ImageNet is a project aimed at (manually) labelling and categorizing images into almost 22,000 separate object categories for computer vision research.

When we hear about "ImageNet" in the context of deep learning and convolutional neural networks, we are usually referring to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The main aim of this image classification challenge is to train a model that can correctly classify an input image into one of 1,000 separate object categories.

Models are trained on the ~1.2 million training images with another 50,000 images for validation and 100,000 images for testing.

These 1,000 image categories represent object classes that we encounter in our day-to-day lives, such as species of dogs, cats, various household objects, vehicle types, and much more.

When it comes to image classification, the ImageNet challenge is the de facto benchmark for computer vision classification algorithms, and the leaderboard for this challenge has been dominated by convolutional neural networks and deep learning techniques since 2012.

Q12: What is DarkNet?

DarkNet is an open-source framework for training neural networks, written in C/CUDA, and it serves as the basis for YOLO: it is the framework used to train YOLO and it defines the architecture of the network. DarkNet is widely used for various computer vision tasks, including object detection and classification.

Clone the repo locally, and you have it. To compile it, run make. But first, if you intend to use the GPU capability, you need to edit the first lines of the Makefile to tell it to compile for GPU usage with the CUDA drivers.

Q13: What is YOLO, explain the architecture of YOLO (You Only Look Once), and give one use case?

YOLO v1

The first You Only Look Once (YOLO) version came out around May 2016 and sets the core of the algorithm; the following versions are improvements that fix some of its drawbacks.

In short, YOLO is a network "inspired by" GoogLeNet. It has 24 convolutional layers working as feature extractors and two dense layers for making the predictions. The architecture it works upon is called Darknet, a neural network framework created by the first author of the YOLO paper.

Core concept:

The algorithm works by dividing the image into a grid of cells; for each cell, bounding boxes and their confidence scores are predicted, alongside class probabilities. The confidence is given in terms of the IoU (intersection over union) metric, which measures how much the detected box overlaps with the ground truth as a fraction of the total area spanned by the two together (the union).
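A minimal sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2) corners (illustrative only):

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
```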

YOLO v2

This improves on some of the shortcomings of the first version, namely that it is not very good at detecting objects that are very close together and that it tends to make some localization mistakes.

It introduces a few new things: anchor boxes (pre-determined sets of boxes, so that the network moves from predicting bounding boxes directly to predicting offsets from these) and the use of more fine-grained features so that smaller objects can be predicted better.

YOLO (You Only Look Once) is a popular object detection algorithm known for its speed and accuracy. YOLO treats object detection as a single regression problem, mapping directly from image pixels to bounding box coordinates and class probabilities. It divides the input image into a grid and, for each grid cell, predicts multiple bounding boxes together with their confidence scores and class probabilities. The YOLO architecture consists of several convolutional layers followed by fully connected layers. A common use case is object detection in real-time video streams, for example detecting pedestrians or vehicles in surveillance camera footage.
