Reposted from a WeChat public account
原文链接: https://mp.weixin.qq.com/s?__biz=Mzg4MjgxMjgyMg==&mid=2247486049&idx=1&sn=1d98375dcbb9d0d68e8733f2dd0a2d40&chksm=cf51b898f826318ead24e414144235cfd516af4abb71190aeca42b1082bd606df6973eb963f0#rd
Video: https://www.youtube.com/watch?v=7l6fttRJzeU
Slides: https://nips.cc/media/neurips-2021/Slides/21895.pdf
Self-Supervised Learning
– Self-Prediction and Contrastive Learning
What is self-supervised learning and why do we need it?
the idea of constructing supervised learning tasks out of unsupervised datasets
Why?
✅ Data labeling is expensive, so high-quality labeled datasets are limited
✅ Learning good representations makes it easier to transfer useful information to a variety of downstream tasks ⇒ e.g. few-shot learning / zero-shot transfer to new tasks
Self-supervised learning tasks are also known as pretext tasks
Video Colorization (Vondrick et al. 2018)
Zero-shot CLIP (Radford et al. 2021)
Precursors to recent self-supervised approaches
Some ideas:
Restricted Boltzmann Machines
Autoencoders
Word2Vec
Autoregressive Modeling
Siamese networks
Multiple Instance / Metric Learning
Autoregressive model:
The autoregressive model has also been the basis for many self-supervised methods, such as GPT
Many contrastive self-supervised learning methods use a pair of neural networks and learn from their differences
– this idea can be traced back to Siamese networks
If you believe that a network f can encode x well and produce a good representation f(x),
then, for two different inputs x1 and x2, their distance can be defined as d(x1, x2) = L(f(x1), f(x2))
The idea of running two identical CNNs on two different inputs and then comparing them is exactly a Siamese network
Train by:
✅ If $x_i$ and $x_j$ are the same person, $\|f(x_i) - f(x_j)\|$ is small
✅ If $x_i$ and $x_j$ are different people, $\|f(x_i) - f(x_j)\|$ is large
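A minimal sketch of this shared-encoder comparison (not the original papers' code; the architecture, sizes, and names below are illustrative placeholders):

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """One shared encoder f applied to two inputs, compared by distance."""
    def __init__(self, in_dim=784, emb_dim=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x1, x2):
        # the SAME weights encode both inputs
        return self.f(x1), self.f(x2)

encoder = SiameseEncoder()
x1, x2 = torch.randn(8, 784), torch.randn(8, 784)
z1, z2 = encoder(x1, x2)
# d(x1, x2) = ||f(x1) - f(x2)||: small for matching pairs, large otherwise
distance = (z1 - z2).norm(dim=1)
```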
Precursors of the recent contrastive learning techniques: multiple instance learning and metric learning
They deviate from the typical framework of empirical risk minimization
Early work:
Metric learning:
Contrastive loss:
Triplet loss
N-pair loss:
The part to be predicted pretends to be missing
Relationship: can be based on the inherent logic within the data
✅ such as different camera views of the same scene
✅ or create multiple augmented versions of the same sample
The multiple samples can be selected from the dataset based on some known logic (e.g., the order of words / sentences), or fabricated by altering the original version
i.e., we know the true relationship between samples but pretend not to know it
Self-prediction constructs prediction tasks within each individual data sample
Categories:
The autoregressive model predicts future behavior based on past behavior
Examples :
Mask a random portion of the information and pretend it is missing, irrespective of the natural sequence
e.g.,
Examples :
Some transformations (e.g., segmentation, rotation) of a data sample should maintain the original information or follow the desired innate logic
Examples
Order of image patches
✅ e.g., shuffle the patches
✅ e.g., relative position, jigsaw puzzle
Image rotation
Counting features across patches
Hybrid Self-Prediction Models: combine different types of generative modeling
Goal:
Contrastive learning can be applied to both supervised and unsupervised settings
Category
Inter-sample classification
the most dominant approach
✅ “inter-sample”: to emphasize the distinction from “intra-sample”
Feature clustering
Multiview coding
Given both similar (“positive”) and dissimilar (“negative”) candidates, identifying which ones are similar to the anchor data point is a classification task
How to construct a set of data point candidates:
Common loss functions:
Contrastive loss (Chopra et al. 2005)
Works with labeled datasets
Encodes data into an embedding vector
Given two labeled data pairs $(x_i, y_i)$ and $(x_j, y_j)$:
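For reference, one common form of this pairwise contrastive loss, with margin hyperparameter $\epsilon$:

$$
\mathcal{L}_{\text{cont}}(x_i, x_j, \theta) = \mathbb{1}[y_i = y_j]\,\|f_\theta(x_i) - f_\theta(x_j)\|_2^2 + \mathbb{1}[y_i \neq y_j]\,\max\!\big(0,\; \epsilon - \|f_\theta(x_i) - f_\theta(x_j)\|_2\big)^2
$$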
Triplet loss (Schroff et al. 2015)
Given a triplet input $(x, x^{+}, x^{-})$
Triplet loss: so called because it demands an input triplet containing one anchor, one positive, and one negative
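One standard form of the triplet loss, again with margin $\epsilon$:

$$
\mathcal{L}_{\text{triplet}}(x, x^{+}, x^{-}) = \max\!\big(0,\; \|f(x) - f(x^{+})\|_2^2 - \|f(x) - f(x^{-})\|_2^2 + \epsilon\big)
$$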
Lifted structured loss (Song et al. 2015):
For large-scale training, the batch size is often very large
Noise Contrastive Estimation (NCE; Gutmann & Hyvärinen 2010)
Given target sample distribution p and noise distribution q:
proposed in 2010 for estimating unnormalized statistical models, and later widely used for learning word embeddings
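For reference (notation assumed here, not taken from the slides): NCE trains a logistic classifier to tell target samples $x_i \sim p$ from noise samples $\tilde{x}_i \sim q$, using the logit $\ell_\theta(u) = \log p_\theta(u) - \log q(u)$ and the sigmoid $\sigma$:

$$
\mathcal{L}_{\text{NCE}} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ \log \sigma\big(\ell_\theta(x_i)\big) + \log\big(1 - \sigma(\ell_\theta(\tilde{x}_i))\big) \Big]
$$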
InfoNCE (van den Oord et al. 2018)
Given a context vector c, the positive sample should be drawn from the conditional distribution $p(x|c)$
The probability of detecting the positive sample correctly is:
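In the standard formulation (van den Oord et al. 2018), with one positive drawn from $p(x|c)$, $N-1$ negatives drawn from the proposal $p(x)$, and a scoring function $f(x, c) \propto p(x|c)/p(x)$:

$$
p(\text{pos} \mid X, c) = \frac{f(x_{\text{pos}}, c)}{\sum_{j=1}^{N} f(x_j, c)}, \qquad
\mathcal{L}_{\text{InfoNCE}} = -\mathbb{E}\left[\log \frac{f(x_{\text{pos}}, c)}{\sum_{j=1}^{N} f(x_j, c)}\right]
$$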
Find similar data samples by clustering them with learned features
Core idea: use clustering algorithms to assign pseudo labels to samples so that we can then run inter-sample classification
Examples:
Apply the InfoNCE objective to two or more different views of input data
Became a mainstream contrastive learning method
Auto-Encoding Variational Bayes (Kingma & Welling 2014)
Image generation:
Jointly train an encoder in addition to the usual GAN components
GAN inversion: learning an encoder post hoc and/or optimizing the latent code for a given image
Denoising autoencoder (Vincent et al. 2008)
Context autoencoder (Pathak et al. 2016)
The prediction target can be not only the pixel values themselves, but also any subset of information derived from the image
Image Colorization
Split-brain autoencoder
In order to get representations that transfer well to downstream tasks
- Minimizing the loss function is equivalent to maximizing a lower bound on the mutual information between the predicted context $c_t$ and the future patch $x_{t+k}$
- i.e., making the latent representation of the predicted data as accurate as possible
CPC has been highly influential in contrastive learning
- showing the effectiveness of casting the problem as an inter-sample classification task
Each instance is a distinct class of its own
# classes = # training samples
Non-parametric softmax that compares features
Memory bank for storing representations of past samples $V = \{v_i\}$
The model learns to scatter the feature vectors on the hypersphere while mapping visually similar images into closer regions
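A rough sketch of a non-parametric softmax over a memory bank in this spirit (sizes, temperature, and the moving-average update are illustrative assumptions, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

num_samples, dim, tau = 10000, 128, 0.07
memory_bank = F.normalize(torch.randn(num_samples, dim), dim=1)  # V = {v_i}

def instance_discrimination_loss(features, indices):
    """features: batch embeddings f(x); indices: the instance id of each sample."""
    features = F.normalize(features, dim=1)
    # non-parametric softmax: compare against every stored representation v_i
    logits = features @ memory_bank.t() / tau
    loss = F.cross_entropy(logits, indices)  # each instance is its own class
    # refresh the memory bank entries for this batch (simple moving average here)
    with torch.no_grad():
        memory_bank[indices] = F.normalize(
            0.5 * memory_bank[indices] + 0.5 * features.detach(), dim=1
        )
    return loss

loss = instance_discrimination_loss(
    torch.randn(32, dim), torch.randint(0, num_samples, (32,))
)
```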
A natural question: are there better ways to create multi-view images? ↓
MoCo (Momentum Contrast; He et al. 2019)
MoCo v3:
Contrastive learning loss
f() – base encoder
g() – projection head layer
In-batch negative samples
✅ Use large batches to have a sufficient number of negative samples
fully symmetric;
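A minimal sketch of an in-batch-negative contrastive loss of this kind (NT-Xent style), assuming z1 and z2 are the projection-head outputs g(f(x)) for two augmented views of the same batch; note the loss is symmetric in the two views:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """z1, z2: (n, d) projections of two augmented views of the same n samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                    # 2n x d
    sim = z @ z.t() / tau                             # cosine similarities / temperature
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-similarity
    # positives sit n rows apart; every other sample in the batch is a negative,
    # which is why large batches supply plenty of negatives
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# example usage with random features
loss = nt_xent_loss(torch.randn(16, 128), torch.randn(16, 128))
```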
Barlow Twins (Zbontar et al. 2021)
Learn similar representations for different augmented views of the same sample, with no contrastive component involving negative samples
The objective pushes the cross-correlation matrix between the embeddings of the two views toward the identity matrix
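A sketch of that objective under assumed variable names; the weight `lambda_` on the off-diagonal (redundancy-reduction) term is illustrative:

```python
import torch

def barlow_twins_loss(z1, z2, lambda_=5e-3):
    """z1, z2: (n, d) embeddings of two augmented views of the same batch."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)   # standardize each feature over the batch
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = z1.t() @ z2 / n                  # d x d cross-correlation matrix
    # push diagonal entries to 1 (invariance) and off-diagonal entries to 0
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_ * off_diag

loss = barlow_twins_loss(torch.randn(32, 64), torch.randn(32, 64))
```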
Bootstrap Your Own Latent (BYOL; Grill et al. 2020)
SimSiam (Chen & He 2020)
BatchNorm seems to be playing an important role
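A rough sketch of the BYOL/SimSiam-style non-contrastive objective with stop-gradient; `f` (encoder) and `h` (predictor head) are placeholder modules, not the papers' exact architectures:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def simsiam_loss(f, h, x1, x2):
    """Negative cosine similarity with stop-gradient on the target branch."""
    z1, z2 = f(x1), f(x2)          # encoder outputs for the two views
    p1, p2 = h(z1), h(z2)          # predictor outputs
    def d(p, z):
        # .detach() implements the stop-gradient that helps prevent collapse
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

# toy example with linear stand-ins for the encoder and predictor
f = nn.Linear(128, 64)
h = nn.Linear(64, 64)
loss = simsiam_loss(f, h, torch.randn(8, 128), torch.randn(8, 128))
```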
Another major technique for self-supervised learning:
- to learn from clusters of features
Sinkhorn-Knopp: a clustering algorithm based on optimal transport (OT)
In this approach, novel ideas based on clustering are designed to be used in conjunction with other SSL methods
Supervised Contrastive Loss (SupCon; Khosla et al. 2021)
✅ less sensitive to hyperparameter choices
Tracking object movement in time
Temporal order Verification
Predict the arrow of time, forward or backward
Tracking emerges by colorizing videos (Vondrick et al. 2018)
Used for video segmentation or human pose estimation without fine-tuning
✅ because the model can propagate the colored markings from the labeled input frame directly into its predictions
TCN (Sermanet et al. 2017)
Multi-frame TCN (Dwibedi et al. 2019)
Because video files are huge, generating coherent continuations of video has been a difficult task
Predicting videos with VQ-VAE (Walker et al. 2021)
VideoGPT: Video generation using VQ-VAE and Transformers (Yan et al. 2021)
Jukebox (Dhariwal et al. 2020)
ASR: Automatic speech recognition
Wav2Vec 2.0 (Baevski et al. 2020)
applies a contrastive loss on the representations of masked portions of the audio
✅ to learn discrete tokens from them
Speech recognition models trained on these tokens show better performance than those trained on conventional audio features / raw audio
HuBERT (Hsu et al. 2021, FAIR)
Also employed by SpeechStew (Chan et al. 2021), Big SSL (Zhang et al. 2021)
Applied to multimodal data, although the definition of self-supervised learning gets somewhat blurry here, depending on whether you consider a multimodal dataset as a single unlabeled dataset or treat one modality as providing supervision for another
MIL-NCE (Miech et al. 2020)
CLIP (Radford et al. 2021), ALIGN (Jia et al. 2021)
Pretrained language models:
Some examples that have changed the landscape of NLP research quite a lot:
GPT
✅ Autoregressive;
✅ predict the next token based on the previous tokens
BERT
✅ as a bi-directional transformer model
✅ Masked language modeling (MLM)
✅ Next sentence prediction (NSP) ⇒ a binary classifier for telling whether one sentence is the next sentence of the other
ALBERT
✅ Sentence order prediction (SOP) ⇒ Positive sample: a pair of two consecutive segments from the same document; Negative sample: same as above but with the segment order switched
ELECTRA
✅ Replaced token detection (RTD) ⇒ random tokens are replaced and considered corrupted; in parallel, a binary discriminator is trained together with the generative model to predict whether each token has been replaced
Skip-thought vectors (Kiros et al. 2015)
Quick-thought vectors (Logeswaran & Lee, 2018)
IS-BERT (“Info-Sentence BERT”; Zhang et al. 2020)
SimCSE (“Simple Contrastive learning of Sentence Embeddings”; Gao et al. 2021)
- Most of the models for learning sentence embeddings rely on supervised NLI (Natural Language Inference) datasets, such as SBERT (Reimers & Gurevych 2019) and BERT-flow
- Unsupervised sentence embedding models (e.g., unsupervised SimCSE) still have a performance gap compared with the supervised versions (e.g., supervised SimCSE)
contrastive learning can provide good results in terms of transfer performance
Data augmentation setup is critical for learning good embeddings
Methods:
image augmentation; text augmentation
Basic Image Augmentation:
Augmentation Strategies
Image mixture
Mixup (Zhang et al. 2018): weighted pixel-wise combination of two images
✅ to create new samples based on existing ones (a code sketch follows this list)
Cutmix (Yun et al. 2019): mix a local region of one image into the other
MoCHi (Mixing of Contrastive Hard Negatives): mixture of hard negative samples
✅ explicitly maintains a queue of negative samples sorted by similarity to the query in descending order ⇒ the first few samples in the queue are the hardest negatives ⇒ new hard negatives can then be created by mixing samples from this queue together, or even with the query
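A minimal sketch of Mixup-style image mixing as referenced above; the Beta-distribution parameter `alpha` and the tensor shapes are illustrative:

```python
import torch

def mixup(x_a, x_b, alpha=0.2):
    """Pixel-wise convex combination of two images (or two batches of images)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * x_a + (1 - lam) * x_b, lam

mixed, lam = mixup(torch.rand(3, 32, 32), torch.rand(3, 32, 32))
```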
Lexical Edits
(Just changing the words or tokens)
EDA (Easy Data Augmentation; Wei & Zhou 2019): Synonym replacement, random insertion / swap / deletion
Contextual Augmentation (Kobayashi 2018): word substitution by BERT prediction
✅ try to find the replacement words using a bi-directional language model
Back-translation (Sennrich et al. 2015)
augments a sentence by first translating it to another language and then translating it back to the original language
✅ depends on the translation model ⇒ the meaning should stay largely unchanged
CERT (Fang et al. 2020) generates augmented sentences via back-translation
Dropout and Cutoff
SimCSE uses dropout (Gao et al. 2021)
✅ Dropout: a universal way to apply transformations to any input
✅ SimCSE: uses dropout to create different copies of the same text ⇒ universal because it does not need expert knowledge about the attributes of the input modality (the change happens at the architecture level)
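A sketch of the SimCSE idea (the `encoder` here is any module with dropout, e.g. a BERT-style encoder; the names and toy module are placeholders, not the paper's code):

```python
import torch
import torch.nn as nn

def simcse_views(encoder, inputs):
    """Two stochastic forward passes of the SAME input produce a positive pair."""
    encoder.train()        # keep dropout active
    z1 = encoder(inputs)   # first pass, one dropout mask
    z2 = encoder(inputs)   # second pass, a different dropout mask
    return z1, z2          # feed into an InfoNCE-style loss as (anchor, positive)

# toy example: an MLP with dropout stands in for a sentence encoder
encoder = nn.Sequential(nn.Linear(128, 128), nn.Dropout(0.1), nn.ReLU(), nn.Linear(128, 64))
z1, z2 = simcse_views(encoder, torch.randn(4, 128))
```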
Cutoff augmentation for text (Shen et al. 2020)
✅ masking randomly selected tokens, feature columns, or spans
Needs a large batch size
Why does contrastive learning work?
InfoNCE (van den Oord et al. 2018)
Minimizing InfoNCE leads to maximizing the MI between view1 and view2
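The bound behind this statement (van den Oord et al. 2018), with $N$ the number of samples scored against each anchor (one positive plus $N-1$ negatives):

$$
I(v_1; v_2) \ge \log N - \mathcal{L}_{\text{InfoNCE}}
$$

so minimizing the InfoNCE loss maximizes a lower bound on the mutual information between the two views.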
Q: How can we design good views?
Minimal sufficient encoder depends on downstream tasks (Tian et al. 2020)
Composite loss for finding the sweet spot (Tsai et al. 2020)
✅ helps converging to a minimal sufficient encoder
To perform well in transfer learning ⇒ we want our model to capture the mutual information between the data x and the downstream label y, $I(x; y)$
- If the mutual information between the views, $I(v_1; v_2)$, is smaller than $I(x; y)$ ⇒ the model will fail to capture useful information for the downstream tasks
- Meanwhile, if the mutual information between the views is too large ⇒ the views carry excess information unrelated to the downstream tasks ⇒ transfer performance decreases due to the noise
- ⇒ there is a sweet spot ⇒ the minimal sufficient encoder
Contrastively learned features are more uniform and aligned
- compared with a randomly initialized network or a network trained with supervised learning
- alignment is also measured, i.e., how close the features from two views of the same input are
In short, the theory of contrastive learning has been very useful, but there is still a long way to go
briefly discuss a few open research questions and areas of work to look into
Large batch size ⇒ improved transfer performance
High-quality large data corpus ⇒ better performance
Efficient negative sample selection
Combine multiple pretext tasks
Data augmentation tricks have critical impacts but are still quite ad-hoc
Modality-dependent: most augmentation methods only apply to a single modality ⇒ most of them are handcrafted by humans
Theoretical foundations
✅ e.g., on why certain augmentation works better than others
✅ to guide us to find more efficient data augmentation
Improving training efficiency
Self-supervised learning methods are pushing the deep learning arms race
❌ increases in model size and training batch size
❌ ⇒ lead to increased costs, both economic and environmental
Direct impact on economic and environmental costs
Social biases in the embedding space