Summaries of Three Papers about Phenotyping (Lasko, Harutyunyan & Ho)

Lasko et al. (2013)

1. Procedure:

Input (sparse, noisy, irregularly spaced observations of serum uric acid concentration) -> Gaussian process regression (transforms the raw data into a continuous longitudinal probability density) -> Autoencoder (feature learning on contiguous 30-day segments of the input vector) -> Output (learned features/phenotypes of the first and second layers)
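The Gaussian process step above can be sketched as follows. This is a minimal illustration, not the method used in the paper: the squared-exponential kernel, its length scale, the noise variance, and the measurement values are all assumptions chosen for the example. It turns a handful of irregular measurements into a daily posterior mean and standard deviation, then cuts the mean into 30-day windows of the kind an autoencoder could consume.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=30.0, variance=1.0):
    """Squared-exponential covariance between two vectors of time points (days)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(t_obs, y_obs, t_grid, noise_var=0.1):
    """Posterior mean and std on t_grid for a GP whose constant mean is the sample mean."""
    y_mean = y_obs.mean()
    K = rbf_kernel(t_obs, t_obs) + noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_grid, t_obs)
    K_ss = rbf_kernel(t_grid, t_grid)
    mean = y_mean + K_s @ np.linalg.solve(K, y_obs - y_mean)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # clip tiny negative round-off
    return mean, std

# Hypothetical irregular uric acid measurements: (day, mg/dL)
t_obs = np.array([0.0, 3.0, 40.0, 41.0, 95.0])
y_obs = np.array([6.1, 6.4, 8.2, 8.0, 5.5])

t_grid = np.arange(0.0, 100.0)            # one point per day
mean, std = gp_posterior(t_obs, y_obs, t_grid)

# Contiguous 30-day windows of the posterior mean (one row per window)
windows = np.stack([mean[i:i + 30] for i in range(len(mean) - 29)])
```

The standard deviation grows in the gaps between measurements, which is what makes the output a density rather than a single interpolated curve.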

2. Phenotypes (uric acid concentration over a 30-day span):

a) Phenotypes of the first layer: W_i, the i-th row of the weight matrix.

b) Types of features in the first layer: uphill/downhill, single/multiple-spot, short-/long-edge, mixed.

c) Phenotypes of the second layer: nonlinear combination of first-layer features
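To make the relation between the two layers concrete, here is a small forward pass through a two-layer encoder. It is only a sketch: the layer sizes, the sigmoid activation, and the random (untrained) weights are assumptions for illustration, and the input is a placeholder rather than real data. Each row of the first weight matrix is itself a 30-day pattern, while the second layer applies a nonlinearity to a weighted combination of first-layer activations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_h1, n_h2 = 30, 8, 4                 # hypothetical layer sizes

# Untrained placeholder weights; a real model would learn these
W1 = rng.normal(scale=0.1, size=(n_h1, n_in))
b1 = np.zeros(n_h1)
W2 = rng.normal(scale=0.1, size=(n_h2, n_h1))
b2 = np.zeros(n_h2)

x = rng.normal(size=n_in)                   # one standardized 30-day window (placeholder)

h1 = sigmoid(W1 @ x + b1)                   # first-layer feature activations
h2 = sigmoid(W2 @ h1 + b2)                  # nonlinear combination of first-layer features

first_feature = W1[0]                       # one row of W1: a pattern over 30 days
```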


3. Evaluation:

a) Face validity: The learned features are continuous even though continuity is never explicitly imposed. However, the regularization and sparsity constraints appear to be required for this continuity (I may be wrong about this).

b) Population subtypes: Besides separating the gout and leukemia phenotypes, the learned feature sets (first-layer and second-layer features, in contrast to the expert-engineered features) also reveal additional cluster structure when embedded into two-dimensional space with t-SNE (visualized as clusters).

c) Generalized discrimination performance for distinguishing gout from leukemia: logistic regression classifiers trained on four different feature sets: 1) first-layer; 2) second-layer; 3) expert-engineered; 4) sequence mean (baseline).



Harutyunyan et al. (2017)

1. Procedure:

Input (time-series clinical observations (e.g. capillary refill rate, blood pressure) from the ICU stays of 40,000 critical care patients; neonatal and pediatric patients and patients with multiple ICU stays are excluded) -> Output (predicted vector of binary phenotype labels)

2. Phenotypes: 

25 common diseases, each classified as chronic, acute, or mixed.


x_t: clinical observations at hour t

p_{1:K}: vector of K binary phenotype labels. The phenotype vector is predicted only at the last timestep T.
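The idea of predicting the whole label vector once, at the last timestep, can be sketched like this. Everything here is an assumption for illustration: the hidden states are random placeholders standing in for the output of a recurrent network, and the sizes and labels are made up. Only the last hidden state feeds a linear layer with one sigmoid output per phenotype, and the loss is the mean binary cross-entropy over the labels.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, K = 48, 16, 25                        # hours, hidden size, number of phenotypes

H = rng.normal(size=(T, d))                 # placeholder for hidden states h_1..h_T
W = rng.normal(scale=0.1, size=(K, d))      # hypothetical output layer weights
b = np.zeros(K)

logits = W @ H[-1] + b                      # use only the last timestep T
probs = 1.0 / (1.0 + np.exp(-logits))       # one independent probability per phenotype

labels = rng.integers(0, 2, size=K)         # made-up binary phenotype labels
eps = 1e-12                                 # avoids log(0)
loss = -np.mean(labels * np.log(probs + eps)
                + (1 - labels) * np.log(1 - probs + eps))
```

Because each label gets its own sigmoid, a patient can carry several phenotypes at once, unlike a softmax over mutually exclusive classes.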

3. Evaluation: 

Multitask LSTM vs. single-task models (logistic regression with hand-engineered features, and single-task LSTM)



Ho et al. (2014)

1. Procedure:

Input (counts of co-occurrences of clinical measurements across several modes (patients × procedures × diagnoses)) -> Marble (non-negative Poisson tensor decomposition of the data) -> Output (tensor V, which is used to define R candidate phenotypes (M=[C,V]))
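For intuition about the decomposition step, the following sketch fits a plain non-negative CP factorization of a 3-way count tensor using multiplicative updates for the KL (Poisson) objective. It is not Marble itself: it has no separate bias tensor C and no sparsity mechanism, and the function names, rank, and synthetic data are assumptions made for the example.

```python
import numpy as np

def reconstruct(A, B, C):
    """Rank-R CP model: M[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def kl_divergence(X, M, eps=1e-10):
    """Generalized KL divergence between counts X and model M (0*log 0 = 0)."""
    mask = X > 0
    return float(np.sum(X[mask] * np.log(X[mask] / (M[mask] + eps)))
                 - X.sum() + M.sum())

def poisson_cp(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Non-negative CP factors via KL multiplicative updates, one mode at a time."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    for _ in range(n_iter):
        ratio = X / (reconstruct(A, B, C) + eps)
        A *= np.einsum('ijk,jr,kr->ir', ratio, B, C) / (B.sum(0) * C.sum(0) + eps)
        ratio = X / (reconstruct(A, B, C) + eps)
        B *= np.einsum('ijk,ir,kr->jr', ratio, A, C) / (A.sum(0) * C.sum(0) + eps)
        ratio = X / (reconstruct(A, B, C) + eps)
        C *= np.einsum('ijk,ir,jr->kr', ratio, A, B) / (A.sum(0) * B.sum(0) + eps)
    return A, B, C

# Synthetic counts: 6 patients x 5 procedures x 4 diagnoses, 2 hidden phenotypes
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
X = rng.poisson(5.0 * reconstruct(*true_factors)).astype(float)

A, B, C = poisson_cp(X, rank=2)
fit = kl_divergence(X, reconstruct(A, B, C))
```

Each column r of (A, B, C) groups patients, procedures, and diagnoses that tend to occur together, which is the sense in which a rank-one component can be read as a candidate phenotype.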

2. Evaluation: 

Similarity of non-zeros between the computed solution and the actual solution:

a) Simulated dataset

b) Realistic EHR dataset
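One simple way to score how well the non-zeros of a computed factor match those of a known factor is the Jaccard overlap of their supports. This is an assumed stand-in for illustration; the exact similarity measure used in the paper may differ, and the function name, tolerance, and vectors below are hypothetical.

```python
import numpy as np

def support_jaccard(estimated, actual, tol=1e-8):
    """Jaccard overlap between the non-zero positions of two vectors."""
    est_nz = np.abs(estimated) > tol
    act_nz = np.abs(actual) > tol
    union = np.logical_or(est_nz, act_nz).sum()
    if union == 0:
        return 1.0                          # both vectors are entirely zero
    return float(np.logical_and(est_nz, act_nz).sum() / union)

actual = np.array([0.0, 0.7, 0.0, 0.3, 0.0])
estimated = np.array([0.0, 0.6, 0.1, 0.3, 0.0])

# Two shared non-zero positions out of three positions in the union
score = support_jaccard(estimated, actual)
```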



Glicksberg et al. (2018)

1. Procedure:

Input (diseases (ICD-9 codes), procedures, lab tests and medications) -> word2vec (treating the sequence of medical concepts within a time interval as a sentence) -> clinical embeddings -> extract disease cohorts: for each patient, get the distance to each disease (queried by medical concepts) -> average of all of the distances
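The last two steps of the pipeline above (distance to a disease query, then averaging) can be sketched as follows. The embeddings are hand-made 3-dimensional placeholders rather than trained word2vec vectors, and the concept names, the use of cosine distance, and the function names are all assumptions for illustration.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings; real ones would come from a trained word2vec model
embeddings = {
    "ICD9:250.00":       np.array([0.9, 0.1, 0.0]),   # diabetes diagnosis code
    "LAB:HbA1c":         np.array([0.8, 0.2, 0.1]),
    "MED:metformin":     np.array([0.7, 0.3, 0.0]),
    "PROC:appendectomy": np.array([0.0, 0.1, 0.9]),
}

def patient_score(patient_concepts, query_concept):
    """Average distance from a patient's concepts to the disease query concept."""
    query = embeddings[query_concept]
    distances = [cosine_distance(embeddings[c], query)
                 for c in patient_concepts if c in embeddings]
    if not distances:
        raise ValueError("patient has no concepts with known embeddings")
    return sum(distances) / len(distances)

likely = patient_score(["LAB:HbA1c", "MED:metformin"], "ICD9:250.00")
unlikely = patient_score(["PROC:appendectomy"], "ICD9:250.00")
```

A lower average distance places the patient closer to the queried disease, so ranking patients by this score gives a candidate cohort.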
