I've been reading papers lately; this is the report my teacher asked me to write...
1.1. Title:
Localized Generalization Error Model and Its Application to Architecture Selection for Radial Basis Function Neural Network
1.2. Problem authors wanted to solve:
The generalization error bounds given by current error models, which use the number of effective parameters of a classifier and the number of training samples, are usually very loose. These bounds are intended for the entire input space. However, several learning machines such as SVM, RBFNN, and MLPNN are local learning machines, and they treat unseen samples near the training samples as more important.
1.3. Solution:
The authors proposed a localized generalization error model (LGEM) that bounds the generalization error more tightly, within a neighborhood of the training samples, for local learning machines, and used this model to develop an architecture selection method.
(1) First they defined the Q-Neighborhood and the Q-Union. For computational ease, they usually use a hyper-square to represent the shape of the Q-Neighborhood, and later calculate the localized generalization error in this region.
(2) Then they derived an upper bound for R_SM(Q), which depends on the training error and the stochastic sensitivity measure (ST-SM).
(3) Later, they estimated the ST-SM for the RBFNN using formula (9). Note that the derivation needs a function.
(4) Finally, they applied the model to architecture selection: preset a Q for the Q-Neighborhood, search the number M of hidden neurons from 1 to N for the RBFNN such that the error bound is the minimum one, and then calculate the connection weights between the hidden layer and the output layer.
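To make step (4) concrete for myself, here is a minimal Python sketch of the selection loop as described above. Several things here are my own assumptions and not taken from the paper: the centers are random training points, the output weights come from ordinary least squares, the sensitivity is estimated by Monte Carlo sampling inside the hyper-square (a stand-in for the analytic formula (9)), and the score `(sqrt(R_emp) + sqrt(ST-SM))^2` is a simplified combination that drops the constants of the real bound. All function names are hypothetical.

```python
import numpy as np

def rbf_design(X, centers, width):
    # Gaussian hidden-layer activations, one column per hidden neuron.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbfnn(X, y, M, width, rng):
    # Assumption: centers are M random training points; weights by least squares.
    centers = X[rng.choice(len(X), size=M, replace=False)]
    w, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return centers, w

def predict(X, centers, w, width):
    return rbf_design(X, centers, width) @ w

def sensitivity_mc(X, centers, w, width, Q, rng, n_draws=50):
    # Monte Carlo estimate of the mean squared output change when each
    # training sample is perturbed uniformly inside its hyper-square [-Q, Q]^n.
    base = predict(X, centers, w, width)
    total = 0.0
    for _ in range(n_draws):
        dx = rng.uniform(-Q, Q, size=X.shape)
        total += np.mean((predict(X + dx, centers, w, width) - base) ** 2)
    return total / n_draws

def select_architecture(X, y, Q, max_M, width=1.0, seed=0):
    # Try M = 1..max_M hidden neurons and keep the one with the smallest score.
    rng = np.random.default_rng(seed)
    best = None
    for M in range(1, max_M + 1):
        centers, w = fit_rbfnn(X, y, M, width, rng)
        r_emp = np.mean((predict(X, centers, w, width) - y) ** 2)
        st_sm = sensitivity_mc(X, centers, w, width, Q, rng)
        # Simplified score (assumption): constants of the paper's bound omitted.
        score = (np.sqrt(r_emp) + np.sqrt(st_sm)) ** 2
        if best is None or score < best[0]:
            best = (score, M, centers, w)
    return best
```

Calling `select_architecture(X, y, Q=0.1, max_M=5)` returns the score, the chosen M, and the centers and weights of that network. The point of the sketch is only the shape of the loop: a low training error alone is not enough, because a network whose output changes a lot inside the Q-Neighborhood gets penalized.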
1.4. Any further improvement or problem:
Problem:
(1) Why do we get a probability of (1-η) when using ?
(2) How should this formula be explained? Why does it contain 1/N?
(3) Why is a hyper-square easier than a hyper-sphere for representing the shape of the Q-Neighborhood?
Limitation:
(1) The derivation of ST-SM needs a function. What does that function mean?
(2) The distribution of the unseen samples is unknown. For different distributions of the input space, the user needs to derive a different LGE.
2.1. Title:
Active learning using localized generalization error of candidate sample as criterion
2.2. Problem authors wanted to solve:
Most existing active learning methods do not directly relate to the generalization error of classifiers. Some of them require long computation times or are based on strict assumptions.
2.3. Solution:
The authors used the LGEM to select samples to query. In every iteration, the sample that yields the largest generalization error is chosen for query. This method can be applied to different kinds of classifiers, and its complexity is low.
(1) Cluster the candidate dataset and decide the architecture of an RBFNN
(2) Select the initial dataset from
(3) Train an RBFNN using the
(4) Set b=1
(5) Select the next training sample from such that
(6) Query the label as F
(7) Add to , and delete from
(8) Train an RBFNN using and set b=b+1
(9) Stop when b=1 or =0. Otherwise go back to Step 5
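The steps above can be sketched as a small Python loop. This is only my reading of the procedure under stated assumptions: the RBFNN centers are passed in (they would come from the clustering in step 1), the per-candidate score is a hypothetical Monte Carlo stand-in for the localized generalization error (mean squared output change inside a hyper-square of half-width Q), and the stopping rule is assumed to be a query budget or an empty candidate pool. `oracle` is a hypothetical labeling function that plays the role of the human annotator.

```python
import numpy as np

def rbf_features(X, centers, width):
    # Gaussian hidden-layer activations for each row of X.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def train(X, y, centers, width):
    # Output weights by least squares (assumption, not from the paper).
    w, *_ = np.linalg.lstsq(rbf_features(X, centers, width), y, rcond=None)
    return w

def candidate_score(x, centers, w, width, Q, rng, n_draws=30):
    # Hypothetical stand-in for the candidate's localized generalization error:
    # mean squared output change under uniform perturbations within +/- Q.
    base = rbf_features(x[None, :], centers, width) @ w
    pert = x[None, :] + rng.uniform(-Q, Q, size=(n_draws, x.size))
    out = rbf_features(pert, centers, width) @ w
    return float(np.mean((out - base) ** 2))

def active_learn(pool, oracle, centers, n_init, budget, Q=0.1, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick the initial labeled set at random (assumption).
    init = set(rng.choice(len(pool), size=n_init, replace=False).tolist())
    labeled = sorted(init)
    unlabeled = [i for i in range(len(pool)) if i not in init]
    labels = {i: oracle(pool[i]) for i in labeled}
    # Step 3: train on the initial set.
    w = train(pool[labeled], np.array([labels[i] for i in labeled]), centers, width)
    for _ in range(budget):
        if not unlabeled:
            break
        # Step 5: choose the candidate with the largest score.
        scores = [candidate_score(pool[i], centers, w, width, Q, rng) for i in unlabeled]
        pick = unlabeled.pop(int(np.argmax(scores)))
        # Steps 6 to 8: query the label, move the sample, retrain.
        labels[pick] = oracle(pool[pick])
        labeled.append(pick)
        w = train(pool[labeled], np.array([labels[i] for i in labeled]), centers, width)
    return labeled, w
```

Note that the score in this sketch never needs the candidate's label, which is what makes it usable before the query is made.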
2.4. Any further improvement or problem:
Problem:
(1) A version space only exists if the training data are linearly separable in SVMs. Although SVMs can use a kernel function to map samples into a high-dimensional feature space, no one can guarantee that the dataset is linearly separable before all labels are known.
Why?
(2)
Why do we need to multiply by a factor ?
3.1. Title:
Active Learning Methods for Interactive Image Retrieval
3.2. Problem authors wanted to solve:
(1) Many learning strategies treat the content-based image retrieval (CBIR) process as a classical classification problem, without any adaptation to the characteristics of this context.
(2) Furthermore, the system has to handle classification with few training data, especially at the beginning of the search, where the query concept has to be estimated in a database of thousands of images with only a few examples.
3.3. Solution:
The authors focus on statistical learning techniques for interactive image retrieval. Their active learning strategy has three parts:
(1) Boundary correction, which corrects the noisy classification boundary in the first iterations;
They develop the update rule , in which the offset function h() satisfies
When using SVMs for computing the classifier ( ), where
(2) Average precision maximization, which selects the images so that classification and mean average precision are enhanced;
(3) Batch selection, which addresses the problem of the selection of multiple images in the same feedback iteration.
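For part (3), here is a minimal sketch of how a batch could be picked greedily, one image at a time. This is my own illustration and not the paper's exact criterion: I assume each image has a classifier output `f_values[i]` (values near 0 are close to the boundary), and I penalize a candidate by its largest Gaussian-kernel similarity to the images already in the batch. The names `select_batch`, `trade_off`, and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian similarity between two feature vectors (1.0 means identical).
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def select_batch(X, f_values, batch_size, trade_off=0.5, gamma=1.0):
    # Greedy selection: at each step add the image with the lowest cost, where
    # cost = trade_off * |f(x)| + (1 - trade_off) * max similarity to the batch.
    selected = []
    remaining = list(range(len(X)))
    while remaining and len(selected) < batch_size:
        best_i, best_cost = None, np.inf
        for i in remaining:
            uncertainty = abs(f_values[i])
            redundancy = max(
                (rbf_kernel(X[i], X[j], gamma) for j in selected), default=0.0
            )
            cost = trade_off * uncertainty + (1.0 - trade_off) * redundancy
            if cost < best_cost:
                best_i, best_cost = i, cost
        selected.append(best_i)
        remaining.remove(best_i)
    return selected
```

With `trade_off=1.0` this reduces to picking the images closest to the boundary, which can return near-duplicates. A lower value trades some of that closeness for variety inside the batch, and the greedy loop avoids searching over all possible subsets.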
3.4. Any further improvement or problem:
Problem:
(1)
What does b mean in this formula? What is the KKT condition?
(2) The boundary correction increases the performance the most in the first feedback steps, while the diversification increases the performance the most after several feedback steps.
A better batch selection can be achieved by selecting samples in an iterative way [36]. Although this strategy is sub-optimal, it requires little computation in comparison to an exhaustive search of the best subset.
What does diversification mean? Why is diversification responsible for the performance gain after several feedback steps?
(3)
Is the estimation of reasonable?