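Original paper: GATE: Graph CCA for Temporal Self-Supervised Learning for Label-Efficient fMRI Analysis | IEEE Journals & Magazine | IEEE Xplore
The English here was typed entirely by hand! It is a summarizing and paraphrasing of the original paper. Spelling and grammar mistakes are hard to avoid, so if you spot any, please point them out in the comments! This post leans toward personal notes, so use it with care!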
1. TL;DR
1.1. Takeaways
1.2. Paper framework diagram
2. Section-by-section close reading
2.1. Abstract
2.2. Introduction
2.3. Related Work
2.3.1. Disease Prediction on fMRI Data
2.3.2. GCNs for Disease Prediction on fMRI Data
2.3.3. Self-Supervised Learning
2.4. Method
2.4.1. Multi-View fMRI Dynamic Functional Connectivity Generation
2.4.2. Graph Embedding
2.4.3. Objective Function
2.5. Theoretical Motivation and Analysis on CCA Loss
2.6. Experiments
2.6.1. Experimental Setup
2.6.2. Results and Analysis
2.6.3. Ablation Study
2.7. Discussion
2.7.1. The Needs of Label Efficiency for fMRI
2.7.2. Graph Learning for Neuroimaging
2.7.3. Technical Contributions of Our Work
2.8. Conclusion
3. Supplementary knowledge
3.1. Transductive learning
3.2. AdamW optimizer
3.3. Unknown
4. Reference List
①They designed a self-supervised learning (SSL) framework for optimizing GCNs, called Graph CCA for Temporal sElf-supervised learning on fMRI analysis (GATE).
②Traditional models always rely on plenty of labeled data, and they may be misled by mislabeled data
③Their training is based on fMRI dynamic functional connectivity (FC)
④They first pre-train with SSL on an unlabeled fMRI population graph and then fine-tune the result
spurious adj. false; fake; fallacious; founded on a mistaken idea (or way of thinking)
①Sliding window method is widely used in dynamic FC capturing
②Previous works rely on time-consuming labeling
③Contrastive-based SSL, reconstruction-based SSL, and similarity-based SSL are the three main categories of SSL strategies. Similarity-based SSL is chosen for their approach.
④Challenge 1 for similarity-based SSL: the data augmentations. They need augmented data in which the labels are only weakly coupled with spurious features.
⑤Challenge 2: design the corresponding consistency loss function. Maximizing the consistency of correlated signals is needed.
⑥The authors first augment the fMRI data and generate two views from the BOLD signals:
where each node denotes a subject, and SSL is used to capture information. They then adopt a GCN encoder to obtain the embedding matrices of the two views, and finally apply Canonical Correlation Analysis (CCA)
⑦Contributions: 1) high efficiency, 2) tackling spurious labels in dynamic FC by self-designed GCN-based CCA regularization, 3) includes theoretical discussion, 4) ablation study.
(1)Medical imaging approach examples:
①Magnetic Resonance Imaging (MRI)
② Computed Tomography (CT)
③Positron Emission Tomography (PET)
(2)Structural MRI and functional MRI
①sMRI: nodes are anatomical regions, edges are the anatomical connections (topology) between them
②fMRI: nodes are functional regions of the brain, edges are correlations between nodes. Additionally, fMRI captures dynamic changes over a short time
(1)Population graph-based models
①Classification based on population
②Nodes are subjects and edges are similarities between subjects
(2)Brain region graph based models
①Classification based on brain region
②Nodes are brain regions and edges are functional or structural connectivity among brain regions
①Contrastive-based SSL: increase the similarity between local and global representations by tuning positive and negative sample pairs. Additionally, it relies mostly on negative samples. However, it is not suitable for a small number of samples or a small number of classes.
②Reconstruction-based SSL: reconstruct the high-dimensional input from low-dimensional features
③Similarity-based SSL: benefit from the coupling between multiple views of the same data.
①Key components of GATE: 1) Dynamic FC augmentation, 2) GCN encoder, 3) Objective function
②Training procedure: 1) unsupervised pre-training, 2) fine-tuning of the pre-trained model with labels
③The whole framework
Main characteristics are kept, but the predictions may vary in spurious features.
(1)Dynamic Functional Connectivity:
①Sliding window method is used for capturing temporal information
②BOLD signals of each subject form a matrix whose size is the number of brain Regions-Of-Interest (ROIs) by the length of the segment
③The FC matrix is calculated by Pearson’s correlation between the matched BOLD segments of the paired ROIs
④Then they flatten the upper triangle of the FC matrix into a feature vector
⑤Population graph: the node feature matrix holds the feature of each individual, the adjacency matrix holds the similarities between subjects, and each node feature comes from the FC matrix
⑥Sliding-window size: the number of time points covered by each window
⑦Sliding-window step: the shift between two consecutive windows
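The sliding-window steps listed in (1) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `dynamic_fc_features`, the parameter names `window` and `step`, the toy sizes, and the choice to keep the diagonal when flattening are all assumptions made here.

```python
import numpy as np

def dynamic_fc_features(bold, window, step):
    """Slide a window over a (n_rois, n_timepoints) BOLD array and
    return one flattened FC vector per window position."""
    n_rois, n_time = bold.shape
    # Upper triangle with the diagonal included (assumption of this sketch).
    rows, cols = np.triu_indices(n_rois)
    features = []
    for start in range(0, n_time - window + 1, step):
        segment = bold[:, start:start + window]
        fc = np.corrcoef(segment)        # Pearson correlation between ROI time series
        features.append(fc[rows, cols])  # flatten the upper triangle
    return np.stack(features)

rng = np.random.default_rng(0)
bold = rng.standard_normal((5, 100))     # toy data: 5 ROIs, 100 time points
feats = dynamic_fc_features(bold, window=30, step=15)
print(feats.shape)
```

Each row of `feats` is the node feature that one window position would contribute for this subject.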
(2)Step Window Augmentation (S-A):
①The sliding window cuts each BOLD signal into a set of sub-segments, so every subject has its own set of sub-segments.
②S-A randomly selects one sub-segment as the first view and takes a neighboring sub-segment as the other view
(3)Multi-Scale Window Augmentation (M-A):
①Choose two different window sizes
②Then obtain two views, one for each window size
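The two augmentations can be sketched roughly as below. The helper names (`window_segments`, `step_window_views`, `multi_scale_views`) are hypothetical, and two details are assumptions of this sketch rather than statements from the paper: "neighbor" is taken to mean the adjacent sub-segment, and the two multi-scale windows are assumed to share one random start point.

```python
import numpy as np

def window_segments(bold, window, step):
    """Return the list of sub-segments cut by a sliding window."""
    n_time = bold.shape[1]
    return [bold[:, s:s + window] for s in range(0, n_time - window + 1, step)]

def step_window_views(bold, window, step, rng):
    """S-A: pick one sub-segment at random and pair it with an adjacent one."""
    segments = window_segments(bold, window, step)
    i = int(rng.integers(len(segments)))
    # Assumption: the neighbor is the next segment, or the previous one at the end.
    j = i + 1 if i + 1 < len(segments) else i - 1
    return segments[i], segments[j]

def multi_scale_views(bold, window_a, window_b, rng):
    """M-A: cut two windows of different sizes from the same signal."""
    n_time = bold.shape[1]
    # Assumption: both windows start at the same random time point.
    start = int(rng.integers(n_time - max(window_a, window_b) + 1))
    return bold[:, start:start + window_a], bold[:, start:start + window_b]

rng = np.random.default_rng(0)
bold = rng.standard_normal((4, 100))          # toy data: 4 ROIs, 100 time points
sa_a, sa_b = step_window_views(bold, window=30, step=15, rng=rng)
ma_a, ma_b = multi_scale_views(bold, window_a=20, window_b=40, rng=rng)
print(sa_a.shape, sa_b.shape, ma_a.shape, ma_b.shape)
```

In both cases each view would then be converted into FC features before it enters the population graph.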
①They adopt a GCN as their encoder. Each layer propagates the feature matrix of all subjects over the graph, which is normalized by its diagonal degree matrix, and multiplies the result by a learned weight matrix
②The two views are then normalized
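For intuition, a single GCN layer can be sketched as follows. The exact layer equation did not survive in these notes, so this sketch assumes the widely used symmetric normalization of Kipf and Welling with added self-loops; the names `gcn_layer` and `elu` and the toy graph are hypothetical.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation; the minimum avoids overflow in exp for large x."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def gcn_layer(adj, features, weight):
    """One propagation step: ELU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops (assumption)
    deg = a_hat.sum(axis=1)                 # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return elu(a_norm @ features @ weight)

rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])           # toy population graph with 3 subjects
x = rng.standard_normal((3, 4))             # 4 input features per subject
w = rng.standard_normal((4, 2))             # maps 4 features to 2 embedding dims
h = gcn_layer(adj, x, w)
print(h.shape)
```

With an edgeless graph the layer reduces to `elu(x @ w)`, which is a handy sanity check.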
①Reconstruction-based SSL may overfit to scattered noises
②They use GATE, which needs no negative samples and avoids reconstruction
③Maximize the correlation between the embedding matrices of the two views
④Input-consistency regularization loss:
where γ denotes the trade-off coefficient, and the loss is computed with dot products over the embedding matrices of the two views (a and b). The first part of this function is a regularization term that keeps the relative activity of features in the low-dimensional space. The second part ensures that the dimensions are uncorrelated with each other.
⑤Replacing with identity matrix in order to fine-tune
⑥Activation: ELU
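A two-part loss of this kind can be sketched as below. The paper's exact equation is not reproduced in these notes, so the form here follows the CCA-SSG-style objective (an agreement term between the two views plus a γ-weighted decorrelation term) and may differ from the authors' formula. The names `standardize` and `cca_style_loss` are hypothetical.

```python
import numpy as np

def standardize(z):
    """Zero mean and unit norm per column, so z.T @ z is a correlation matrix."""
    z = z - z.mean(axis=0)
    return z / np.linalg.norm(z, axis=0)

def cca_style_loss(z_a, z_b, gamma):
    """Agreement between the two views plus gamma times a decorrelation penalty."""
    z_a, z_b = standardize(z_a), standardize(z_b)
    identity = np.eye(z_a.shape[1])
    # Part 1: pull the embeddings of the two views together.
    invariance = np.sum((z_a - z_b) ** 2)
    # Part 2: push each view's feature correlation matrix toward the identity.
    decorrelation = (np.sum((z_a.T @ z_a - identity) ** 2)
                     + np.sum((z_b.T @ z_b - identity) ** 2))
    return invariance + gamma * decorrelation

rng = np.random.default_rng(0)
z_a = rng.standard_normal((50, 4))               # toy embeddings: 50 subjects, 4 dims
z_b = z_a + 0.1 * rng.standard_normal((50, 4))   # a slightly perturbed second view
loss = cca_style_loss(z_a, z_b, gamma=0.5)
print(loss)
```

Identical views with `gamma=0` give a loss of zero, and increasing `gamma` puts more weight on decorrelating the embedding dimensions.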
①Adding the input-consistency regularization decreases the correlation between spurious features and the true labels, and increases performance
②The CCA function:
where is a normalized non-linear embedding, is covariance matrix ()
③Connection between CCA and generalization error of downstream tasks:
where denotes representation operation, and are low rank approximation operator, Singular Value Decomposition (SVD) of is ,
④General theorem for non-linear CCA, which presents the approximation error:
where is the optimal function that can predict ,
⑤Upper bound of the excess risk of the downstream task:
(1)ABIDE dataset:
①Datasets: Autism brain imaging data exchange (ABIDE) I/II
②Task: healthy control (HC) vs. autism spectrum disorder (ASD) patient (classification 1)
③Samples: 485 ASD and 544 HCs in ABIDE
④Atlas: Bootstrap Analysis of Stable Cluster parcellation with 122 ROIs (BASC-122)
⑤Node: ROI
⑥Edge: Pearson’s correlation between the time series of BOLD signals of their ROIs
⑦Dimension: 7503 (= 122 × 123 / 2, the upper triangle of the 122 × 122 FC matrix with the diagonal included)
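A quick arithmetic check of where the 7503 dimensions can come from, assuming the feature vector is the flattened upper triangle of the FC matrix over the 122 BASC ROIs. Whether the diagonal is kept is not stated in these notes; the count only matches if it is.

```python
import numpy as np

n_rois = 122
with_diagonal = len(np.triu_indices(n_rois)[0])          # n * (n + 1) / 2
without_diagonal = len(np.triu_indices(n_rois, k=1)[0])  # n * (n - 1) / 2
print(with_diagonal, without_diagonal)  # 7503 7381
```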
(2)FTD dataset:
①Dataset: Frontotemporal dementia (FTD)
②Task: HC vs. dementia (classification 2)
③Samples: 86 HC and 95 FTD in FTD
④Pre-process: DPARSF
⑤Number of ROI: 116
(3)Graph Construction:
①Construct a similarity graph over the nodes of the population graph from low-dimensional, discriminative features extracted from the raw images. This approach mitigates the influence of noise, redundant features, and the curse of dimensionality brought by high-dimensional features.
②Then construct a phenotypic graph matrix from gender, age, genes, etc.
③Combine the two to get the initial graph
④Keep only the top-k edges of each node
⑤Add a diagonal matrix to the resulting graph
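The graph construction steps can be sketched as follows. Several choices here are assumptions of this sketch, since the original formulas are not preserved in these notes: cosine similarity as the similarity measure, a phenotypic matrix that is 1 when two subjects share an attribute, element-wise multiplication to combine the two, and the identity as the added diagonal matrix. The function name `population_graph` is hypothetical.

```python
import numpy as np

def population_graph(features, phenotype, top_k):
    """features: (n_subjects, d) low-dimensional features.
    phenotype: (n_subjects,) categorical attribute such as gender."""
    # Similarity graph: cosine similarity between subjects (assumption).
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # Phenotypic graph: 1 where two subjects share the attribute (assumption).
    pheno = (phenotype[:, None] == phenotype[None, :]).astype(float)
    graph = similarity * pheno                 # element-wise combination (assumption)
    np.fill_diagonal(graph, 0.0)
    # Keep only the top_k strongest edges of each node.
    sparse = np.zeros_like(graph)
    idx = np.argsort(-graph, axis=1)[:, :top_k]
    rows = np.arange(graph.shape[0])[:, None]
    sparse[rows, idx] = graph[rows, idx]
    return sparse + np.eye(graph.shape[0])     # add the diagonal matrix

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 3))            # toy data: 6 subjects, 3 features
gender = np.array([0, 1, 0, 1, 0, 1])
top_k = 2
adj = population_graph(feats, gender, top_k)
print(adj.shape)
```

Note that row-wise top-k selection does not guarantee a symmetric matrix; symmetrizing it afterwards is a further design choice left out of this sketch.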
(4)Comparison Methods (adopt the same window):
①Methods without SSL: vanilla GCN, GAT, SAC-GCN
②Contrastive-based SSL: DGI, MVGRL
③Similarity-based SSL: BGRL, CCA-SSG
(5) Implementation Details:
①Optimizer: AdamW
②Learning rate: 0.001
④In S-A, is 30, is 15
⑤In M-A, and are randomly chosen from
⑥Labeled data: 20%, 206 in ABIDE, 36 in FTD
⑦Validation: 5-fold cross validation
(6)Performance Evaluation:
①Evaluation metrics: accuracy, area under the ROC curve (AUC), precision, recall, F1 score
②The higher the metrics, the better the performance
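For reference, these metrics can be computed for a binary task as in the sketch below. The function names `binary_metrics` and `auc` and the toy labels are made up for illustration; the AUC is computed as the fraction of (positive, negative) pairs that the scores rank correctly.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```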
①Comparison table:
②Then they change the proportion of labeled data from 10% to 80%:
(1)Effectiveness of Dynamic FC Augmentation:
M-A and S-A significantly enhance the performance:
(2)Effectiveness of Different SSL Strategies:
They compare contrastive-based SSL (CL), reconstruction-based SSL (Re), and their model. For CL, the objective function is changed to the InfoNCE loss with randomly selected negative samples; for Re, it is changed to the MSE loss with an extra decoder:
(3)Different Dimensional Embedding:
The embedding dimension is chosen from {16, 32, 64, 128, 256, 512, 1024}; performance roughly peaks at 256 for ABIDE and at 128 for FTD:
Hence they chose 256 in all the experiments, since a low dimensionality lacks representation ability and a high dimensionality costs more computation time
(4)Effectiveness of γ in the Objective Function:
Performance tends to be stable when γ in the objective function takes values of 0.1-0.8:
(5)Effectiveness of Fine-Tuning and Graph:
GATE without fine-tuning or without graph (replace original by ) in SSL:
Fine-tuning makes use of the correctly labeled data, and the graph structure provides common biomarkers
(6)Low-Rank Representation:
As low-rank representations provide common biomarkers, GATE is able to reduce the upper bound of the excess risk for downstream tasks. Here is the comparison of GATE and vanilla GCN:
(7)Parameter Sensitivity Analysis:
They examine whether GATE is sensitive to sliding-window parameters, such as the window length, the step size, or the gap between multiple windows:
As expected, the more labeled data, the higher the accuracy. However, obtaining plenty of labeled images is clearly a big challenge. GATE is designed to achieve excellent performance with 20% of the labels: its accuracy is similar to that of a vanilla GCN with 50%.
GATE extracts the associations between subjects better and then maximizes the correlation.
①The SSL strategy produces multiple coupled views of an fMRI BOLD signal
②Pre-training and fine-tuning
GATE, which operates on a population graph, achieves high accuracy with small amounts of labeled data and in noisy settings
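Related link: Transductive learning, TBYourHero's blog on CSDN
(1)Adds weight decay regularization on top of the Adam optimizer, which amounts to decaying the existing weights
(2)Related link: [Optimizers] (6) AdamW principles & PyTorch code walkthrough, Lcm_Tech's blog on CSDN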
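To make point (1) concrete, here is a minimal NumPy sketch of one AdamW update with decoupled weight decay. The function name `adamw_step` is hypothetical and the hyperparameter defaults are common illustrative values, not the settings reported in the paper.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * weight_decay * w                # decoupled weight decay
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # usual Adam step
    return w, m, v

w0 = np.array([1.0, -2.0])
zeros = np.zeros_like(w0)
# With a zero gradient only the decay acts, shrinking every weight slightly.
w1, m1, v1 = adamw_step(w0, grad=zeros, m=zeros, v=zeros, t=1)
print(w1)
```

The decay is applied directly to the weights instead of being added to the gradient, which is what separates AdamW from Adam with an L2 penalty.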
(1)corruption function
(2)feature collapse
Peng, L. et al. (2022) 'GATE: Graph CCA for Temporal Self-Supervised Learning for Label-Efficient fMRI Analysis', IEEE Transactions on Medical Imaging, vol. 42, no. 2, pp. 391-402. doi: 10.1109/TMI.2022.3201974