MIMIC-iv官方SQL概念语句标注——mimic_derived模块部分信息

1.mmic-iv与mimic-iii的区别之一就是前者分了3个模块,mimic_icu,mimic_hosp和mimic_core, 而在学习过程中我们还可以观察到另外一个模块mimic_derived, 而且这个模块仅存在与谷歌云的mimic数据库中, 本地建立的数据库中虽然有这个模块,但是没有内容(需要自己根据官方提供的代码生成,也是现在我们要查看的内容)。
由于mimic_derived 参与了很多内容的查询,熟悉这个模块的内容还是很有必要的,但是官方对这个模块缺乏详细的介绍。今天通过官方提供的源代码查看理解mimic_derived中各个表格的内容,记录在这里,供大家参考。

注:仅包含了作者认为重要的表格,并非全部。

  • age,为患者入院(admission)时的年龄,存储在age列。
-- This query calculates the age of a patient on admission to the hospital. 

-- The columns of the table patients: anchor_age, anchor_year, anchor_year_group
-- provide information regarding the actual patient year for the patient admission, 
-- and the patient's age at that time.

-- anchor_year is a shifted year for the patient.
-- anchor_year_group is a range of years - the patient's anchor_year occurred during this range.
-- anchor_age is the patient's age in the anchor_year.
-- Example: a patient has an anchor_year of 2153,
-- anchor_year_group of 2008 - 2010, and an anchor_age of 60.
-- The year 2153 for the patient corresponds to 2008, 2009, or 2010.
-- The patient was 60 in the shifted year of 2153, i.e. they were 60 in 2008, 2009, or 2010.
-- A patient admission in 2154 will occur in 2009-2011, 
-- an admission in 2155 will occur in 2010-2012, and so on.

-- Therefore, the age of a patient = hospital admission time - anchor_year + anchor_age
SELECT 	
	ad.subject_id
	, ad.hadm_id
	, ad.admittime
	, pa.anchor_age
	, pa.anchor_year
	, DATETIME_DIFF(ad.admittime, DATETIME(pa.anchor_year, 1, 1, 0, 0, 0), YEAR) + pa.anchor_age AS age
FROM `physionet-data.mimic_core.admissions` ad
INNER JOIN `physionet-data.mimic_core.patients` pa
ON ad.subject_id = pa.subject_id
;
  • 令人惊喜的发现derived中的一个表格icustay_detail提供了多种结局的数据, 包括是否发生院内死亡(hospital_expire_flag)、住院时间(los_hospital)、住ICU时间(los_icu)以及是否是首次进入院,是否是首次进入ICU等,为提取部分患者结局变量提供了方便。
SELECT ie.subject_id, ie.hadm_id, ie.stay_id

-- patient level factors
, pat.gender, pat.dod

-- hospital level factors
, adm.admittime, adm.dischtime
, DATETIME_DIFF(adm.dischtime, adm.admittime, DAY) as los_hospital
, DATETIME_DIFF(adm.admittime, DATETIME(pat.anchor_year, 1, 1, 0, 0, 0), YEAR) + pat.anchor_age as admission_age
, adm.ethnicity
, adm.hospital_expire_flag
, DENSE_RANK() OVER (PARTITION BY adm.subject_id ORDER BY adm.admittime) AS hospstay_seq
, CASE
    WHEN DENSE_RANK() OVER (PARTITION BY adm.subject_id ORDER BY adm.admittime) = 1 THEN True
    ELSE False END AS first_hosp_stay

-- icu level factors
, ie.intime as icu_intime, ie.outtime as icu_outtime
, ROUND(DATETIME_DIFF(ie.outtime, ie.intime, HOUR)/24.0, 2) as los_icu
, DENSE_RANK() OVER (PARTITION BY ie.hadm_id ORDER BY ie.intime) AS icustay_seq

-- first ICU stay *for the current hospitalization*
, CASE
    WHEN DENSE_RANK() OVER (PARTITION BY ie.hadm_id ORDER BY ie.intime) = 1 THEN True
    ELSE False END AS first_icu_stay

FROM `physionet-data.mimic_icu.icustays` ie
INNER JOIN `physionet-data.mimic_core.admissions` adm
    ON ie.hadm_id = adm.hadm_id
INNER JOIN `physionet-data.mimic_core.patients` pat
    ON ie.subject_id = pat.subject_id

  • weight_duration, ICU期间体重的变化,体重是反应患者营养状况的重要因素,常被纳入到预测因素中。
-- This query extracts weights for adult ICU patients with start/stop times
-- if an admission weight is given, then this is assigned from intime to outtime
WITH wt_stg as
(
    SELECT
        c.stay_id
      , c.charttime
      , case when c.itemid = 226512 then 'admit'
          else 'daily' end as weight_type
      -- TODO: eliminate obvious outliers if there is a reasonable weight
      , c.valuenum as weight
    FROM `physionet-data.mimic_icu.chartevents` c
    WHERE c.valuenum IS NOT NULL
      AND c.itemid in
      (
          226512 -- Admit Wt
          , 224639 -- Daily Weight
      )
      AND c.valuenum > 0
)
-- assign ascending row number
, wt_stg1 as
(
  select
      stay_id
    , charttime
    , weight_type
    , weight
    , ROW_NUMBER() OVER (partition by stay_id, weight_type order by charttime) as rn
  from wt_stg
  WHERE weight IS NOT NULL
)
-- change charttime to intime for the first admission weight recorded
, wt_stg2 AS
(
  SELECT 
      wt_stg1.stay_id
    , ie.intime, ie.outtime
    , wt_stg1.weight_type
    , case when wt_stg1.weight_type = 'admit' and wt_stg1.rn = 1
        then DATETIME_SUB(ie.intime, INTERVAL '2' HOUR)
      else wt_stg1.charttime end as starttime
    , wt_stg1.weight
  from wt_stg1
  INNER JOIN `physionet-data.mimic_icu.icustays` ie
    on ie.stay_id = wt_stg1.stay_id
)
, wt_stg3 as
(
  select
    stay_id
    , intime, outtime
    , starttime
    , coalesce(
        LEAD(starttime) OVER (PARTITION BY stay_id ORDER BY starttime),
        DATETIME_ADD(outtime, INTERVAL '2' HOUR)
      ) as endtime
    , weight
    , weight_type
  from wt_stg2
)
-- this table is the start/stop times from admit/daily weight in charted data
, wt1 as
(
  select
      stay_id
    , starttime
    , coalesce(endtime,
      LEAD(starttime) OVER (partition by stay_id order by starttime),
      -- impute ICU discharge as the end of the final weight measurement
      -- plus a 2 hour "fuzziness" window
      DATETIME_ADD(outtime, INTERVAL '2' HOUR)
    ) as endtime
    , weight
    , weight_type
  from wt_stg3
)
-- if the intime for the patient is < the first charted daily weight
-- then we will have a "gap" at the start of their stay
-- to prevent this, we look for these gaps and backfill the first weight
-- this adds (153255-149657)=3598 rows, meaning this fix helps for up to 3598 stay_id
, wt_fix as
(
  select ie.stay_id
    -- we add a 2 hour "fuzziness" window
    , DATETIME_SUB(ie.intime, INTERVAL '2' HOUR) as starttime
    , wt.starttime as endtime
    , wt.weight
    , wt.weight_type
  from `physionet-data.mimic_icu.icustays` ie
  inner join
  -- the below subquery returns one row for each unique stay_id
  -- the row contains: the first starttime and the corresponding weight
  (
    SELECT wt1.stay_id, wt1.starttime, wt1.weight
    , weight_type
    , ROW_NUMBER() OVER (PARTITION BY wt1.stay_id ORDER BY wt1.starttime) as rn
    FROM wt1
  ) wt
    ON  ie.stay_id = wt.stay_id
    AND wt.rn = 1
    and ie.intime < wt.starttime
)
-- add the backfill rows to the main weight table
SELECT
wt1.stay_id
, wt1.starttime
, wt1.endtime
, wt1.weight
, wt1.weight_type
FROM wt1
UNION ALL
SELECT
wt_fix.stay_id
, wt_fix.starttime
, wt_fix.endtime
, wt_fix.weight
, wt_fix.weight_type
FROM wt_fix;
  • bg (blood gas), 血气分析,代码注释中说来源是hosp模块的labevents,部分指标也和ICU相关。有个地方存疑,血气指标皆取了最大值,不知道这是否符合医学常识, 注释中解释为“simply collapse rows”(仅是为了压缩列),不一定符合实际中的需求,注意。
-- The aim of this query is to pivot entries related to blood gases
-- which were found in LABEVENTS
WITH bg AS
(
select 
  -- specimen_id only ever has 1 measurement for each itemid
  -- so, we may simply collapse rows using MAX()
    MAX(subject_id) AS subject_id
  , MAX(hadm_id) AS hadm_id
  , MAX(charttime) AS charttime
  -- specimen_id *may* have different storetimes, so this is taking the latest
  , MAX(storetime) AS storetime
  , le.specimen_id
  , MAX(CASE WHEN itemid = 52028 THEN value ELSE NULL END) AS specimen
  , MAX(CASE WHEN itemid = 50801 THEN valuenum ELSE NULL END) AS aado2
  , MAX(CASE WHEN itemid = 50802 THEN valuenum ELSE NULL END) AS baseexcess
  , MAX(CASE WHEN itemid = 50803 THEN valuenum ELSE NULL END) AS bicarbonate
  , MAX(CASE WHEN itemid = 50804 THEN valuenum ELSE NULL END) AS totalco2
  , MAX(CASE WHEN itemid = 50805 THEN valuenum ELSE NULL END) AS carboxyhemoglobin
  , MAX(CASE WHEN itemid = 50806 THEN valuenum ELSE NULL END) AS chloride
  , MAX(CASE WHEN itemid = 50808 THEN valuenum ELSE NULL END) AS calcium
  , MAX(CASE WHEN itemid = 50809 and valuenum <= 10000 THEN valuenum ELSE NULL END) AS glucose
  , MAX(CASE WHEN itemid = 50810 and valuenum <= 100 THEN valuenum ELSE NULL END) AS hematocrit
  , MAX(CASE WHEN itemid = 50811 THEN valuenum ELSE NULL END) AS hemoglobin
  , MAX(CASE WHEN itemid = 50813 and valuenum <= 10000 THEN valuenum ELSE NULL END) AS lactate
  , MAX(CASE WHEN itemid = 50814 THEN valuenum ELSE NULL END) AS methemoglobin
  , MAX(CASE WHEN itemid = 50815 THEN valuenum ELSE NULL END) AS o2flow
  -- fix a common unit conversion error for fio2
  -- atmospheric o2 is 20.89%, so any value <= 20 is unphysiologic
  -- usually this is a misplaced O2 flow measurement
  , MAX(CASE WHEN itemid = 50816 THEN
      CASE
        WHEN valuenum > 20 AND valuenum <= 100 THEN valuenum 
        WHEN valuenum > 0.2 AND valuenum <= 1.0 THEN valuenum*100.0
      ELSE NULL END
    ELSE NULL END) AS fio2
  , MAX(CASE WHEN itemid = 50817 AND valuenum <= 100 THEN valuenum ELSE NULL END) AS so2
  , MAX(CASE WHEN itemid = 50818 THEN valuenum ELSE NULL END) AS pco2
  , MAX(CASE WHEN itemid = 50819 THEN valuenum ELSE NULL END) AS peep
  , MAX(CASE WHEN itemid = 50820 THEN valuenum ELSE NULL END) AS ph
  , MAX(CASE WHEN itemid = 50821 THEN valuenum ELSE NULL END) AS po2
  , MAX(CASE WHEN itemid = 50822 THEN valuenum ELSE NULL END) AS potassium
  , MAX(CASE WHEN itemid = 50823 THEN valuenum ELSE NULL END) AS requiredo2
  , MAX(CASE WHEN itemid = 50824 THEN valuenum ELSE NULL END) AS sodium
  , MAX(CASE WHEN itemid = 50825 THEN valuenum ELSE NULL END) AS temperature
  , MAX(CASE WHEN itemid = 50807 THEN value ELSE NULL END) AS comments
FROM mimic_hosp.labevents le
where le.ITEMID in
-- blood gases
(
    52028 -- specimen
  , 50801 -- aado2
  , 50802 -- base excess
  , 50803 -- bicarb
  , 50804 -- calc tot co2
  , 50805 -- carboxyhgb
  , 50806 -- chloride
  -- , 52390 -- chloride, WB CL-
  , 50807 -- comments
  , 50808 -- free calcium
  , 50809 -- glucose
  , 50810 -- hct
  , 50811 -- hgb
  , 50813 -- lactate
  , 50814 -- methemoglobin
  , 50815 -- o2 flow
  , 50816 -- fio2
  , 50817 -- o2 sat
  , 50818 -- pco2
  , 50819 -- peep
  , 50820 -- pH
  , 50821 -- pO2
  , 50822 -- potassium
  -- , 52408 -- potassium, WB K+
  , 50823 -- required O2
  , 50824 -- sodium
  -- , 52411 -- sodium, WB NA +
  , 50825 -- temperature
)
GROUP BY le.specimen_id
)
, stg_spo2 as
(
  select subject_id, charttime
    -- avg here is just used to group SpO2 by charttime
    , AVG(valuenum) as SpO2
  FROM mimic_icu.chartevents
  where ITEMID = 220277 -- O2 saturation pulseoxymetry
  and valuenum > 0 and valuenum <= 100
  group by subject_id, charttime
)
, stg_fio2 as
(
  select subject_id, charttime
    -- pre-process the FiO2s to ensure they are between 21-100%
    , max(
        case
          when valuenum > 0.2 and valuenum <= 1
            then valuenum * 100
          -- improperly input data - looks like O2 flow in litres
          when valuenum > 1 and valuenum < 20
            then null
          when valuenum >= 20 and valuenum <= 100
            then valuenum
      else null end
    ) as fio2_chartevents
  FROM mimic_icu.chartevents
  where ITEMID = 223835 -- Inspired O2 Fraction (FiO2)
  and valuenum > 0 and valuenum <= 100
  group by subject_id, charttime
)
, stg2 as
(
select bg.*
  , ROW_NUMBER() OVER (partition by bg.subject_id, bg.charttime order by s1.charttime DESC) as lastRowSpO2
  , s1.spo2
from bg
left join stg_spo2 s1
  -- same hospitalization
  on  bg.subject_id = s1.subject_id
  -- spo2 occurred at most 2 hours before this blood gas
  and s1.charttime between DATETIME_SUB(bg.charttime, INTERVAL 2 HOUR) and bg.charttime
where bg.po2 is not null
)
, stg3 as
(
select bg.*
  , ROW_NUMBER() OVER (partition by bg.subject_id, bg.charttime order by s2.charttime DESC) as lastRowFiO2
  , s2.fio2_chartevents
  -- create our specimen prediction
  ,  1/(1+exp(-(-0.02544
  +    0.04598 * po2
  + coalesce(-0.15356 * spo2             , -0.15356 *   97.49420 +    0.13429)
  + coalesce( 0.00621 * fio2_chartevents ,  0.00621 *   51.49550 +   -0.24958)
  + coalesce( 0.10559 * hemoglobin       ,  0.10559 *   10.32307 +    0.05954)
  + coalesce( 0.13251 * so2              ,  0.13251 *   93.66539 +   -0.23172)
  + coalesce(-0.01511 * pco2             , -0.01511 *   42.08866 +   -0.01630)
  + coalesce( 0.01480 * fio2             ,  0.01480 *   63.97836 +   -0.31142)
  + coalesce(-0.00200 * aado2            , -0.00200 *  442.21186 +   -0.01328)
  + coalesce(-0.03220 * bicarbonate      , -0.03220 *   22.96894 +   -0.06535)
  + coalesce( 0.05384 * totalco2         ,  0.05384 *   24.72632 +   -0.01405)
  + coalesce( 0.08202 * lactate          ,  0.08202 *    3.06436 +    0.06038)
  + coalesce( 0.10956 * ph               ,  0.10956 *    7.36233 +   -0.00617)
  + coalesce( 0.00848 * o2flow           ,  0.00848 *    7.59362 +   -0.35803)
  ))) as specimen_prob
from stg2 bg
left join stg_fio2 s2
  -- same patient
  on  bg.subject_id = s2.subject_id
  -- fio2 occurred at most 4 hours before this blood gas
  and s2.charttime between DATETIME_SUB(bg.charttime, INTERVAL 4 HOUR) and bg.charttime
  AND s2.fio2_chartevents > 0
where bg.lastRowSpO2 = 1 -- only the row with the most recent SpO2 (if no SpO2 found lastRowSpO2 = 1)
)
select
    stg3.subject_id
  , stg3.hadm_id
  , stg3.charttime
  -- raw data indicating sample type
  , specimen 
  -- prediction of specimen for obs missing the actual specimen
  , case
        when specimen is not null then specimen
        when specimen_prob > 0.75 then 'ART.'
      else null end as specimen_pred
  , specimen_prob

  -- oxygen related parameters
  , so2
  , po2
  , pco2
  , fio2_chartevents, fio2
  , aado2
  -- also calculate AADO2
  , case
      when  po2 is null
        OR pco2 is null
      THEN NULL
      WHEN fio2 IS NOT NULL
        -- multiple by 100 because fio2 is in a % but should be a fraction
        THEN (fio2/100) * (760 - 47) - (pco2/0.8) - po2
      WHEN fio2_chartevents IS NOT NULL
        THEN (fio2_chartevents/100) * (760 - 47) - (pco2/0.8) - po2
      else null
    end as aado2_calc
  , case
      when PO2 is null
        THEN NULL
      WHEN fio2 IS NOT NULL
       -- multiply by 100 because fio2 is in a % but should be a fraction
        then 100 * PO2/fio2
      WHEN fio2_chartevents IS NOT NULL
       -- multiply by 100 because fio2 is in a % but should be a fraction
        then 100 * PO2/fio2_chartevents
      else null
    end as pao2fio2ratio
  -- acid-base parameters
  , ph, baseexcess
  , bicarbonate, totalco2

  -- blood count parameters
  , hematocrit
  , hemoglobin
  , carboxyhemoglobin
  , methemoglobin

  -- chemistry
  , chloride, calcium
  , temperature
  , potassium, sodium
  , lactate
  , glucose

  -- ventilation stuff that's sometimes input
  -- , intubated, tidalvolume, ventilationrate, ventilator
  -- , peep, o2flow
  -- , requiredo2
from stg3
where lastRowFiO2 = 1 -- only the most recent FiO2
;
  • blood_differential, 血细胞分类
-- For reference, some common unit conversions:
-- 10^9/L == K/uL == 10^3/uL
WITH blood_diff AS
(
SELECT
    MAX(subject_id) AS subject_id
  , MAX(hadm_id) AS hadm_id
  , MAX(charttime) AS charttime
  , le.specimen_id
  -- create one set of columns for percentages, and one set of columns for counts
  -- we harmonize all count units into K/uL == 10^9/L
  -- counts have an "_abs" suffix, percentages do not

  -- absolute counts
  , MAX(CASE WHEN itemid in (51300, 51301, 51755) THEN valuenum ELSE NULL END) AS wbc
  , MAX(CASE WHEN itemid = 52069 THEN valuenum ELSE NULL END) AS basophils_abs
  -- 52073 in K/uL, 51199 in #/uL
  , MAX(CASE WHEN itemid = 52073 THEN valuenum WHEN itemid = 51199 THEN valuenum / 1000.0 ELSE NULL END) AS eosinophils_abs
  -- 51133 in K/uL, 52769 in #/uL
  , MAX(CASE WHEN itemid = 51133 THEN valuenum WHEN itemid = 52769 THEN valuenum / 1000.0 ELSE NULL END) AS lymphocytes_abs
  -- 52074 in K/uL, 51253 in #/uL
  , MAX(CASE WHEN itemid = 52074 THEN valuenum WHEN itemid = 51253 THEN valuenum / 1000.0 ELSE NULL END) AS monocytes_abs
  , MAX(CASE WHEN itemid = 52075 THEN valuenum ELSE NULL END) AS neutrophils_abs
  -- convert from #/uL to K/uL
  , MAX(CASE WHEN itemid = 51218 THEN valuenum / 1000.0 ELSE NULL END) AS granulocytes_abs

  -- percentages, equal to cell count / white blood cell count
  , MAX(CASE WHEN itemid = 51146 THEN valuenum ELSE NULL END) AS basophils
  , MAX(CASE WHEN itemid = 51200 THEN valuenum ELSE NULL END) AS eosinophils
  , MAX(CASE WHEN itemid in (51244, 51245) THEN valuenum ELSE NULL END) AS lymphocytes
  , MAX(CASE WHEN itemid = 51254 THEN valuenum ELSE NULL END) AS monocytes
  , MAX(CASE WHEN itemid = 51256 THEN valuenum ELSE NULL END) AS neutrophils

  -- other cell count percentages
  , MAX(CASE WHEN itemid = 51143 THEN valuenum ELSE NULL END) AS atypical_lymphocytes
  , MAX(CASE WHEN itemid = 51144 THEN valuenum ELSE NULL END) AS bands
  , MAX(CASE WHEN itemid = 52135 THEN valuenum ELSE NULL END) AS immature_granulocytes
  , MAX(CASE WHEN itemid = 51251 THEN valuenum ELSE NULL END) AS metamyelocytes
  , MAX(CASE WHEN itemid = 51257 THEN valuenum ELSE NULL END) AS nrbc

  -- utility flags which determine whether imputation is possible
  , CASE
    -- WBC is available
    WHEN MAX(CASE WHEN itemid in (51300, 51301, 51755) THEN valuenum ELSE NULL END) > 0
    -- and we have at least one percentage from the diff
    -- sometimes the entire diff is 0%, which looks like bad data
    AND SUM(CASE WHEN itemid IN (51146, 51200, 51244, 51245, 51254, 51256) THEN valuenum ELSE NULL END) > 0
    THEN 1 ELSE 0 END AS impute_abs

FROM mimic_hosp.labevents le
WHERE le.itemid IN
(
    51146, -- basophils
    52069, -- Absolute basophil count
    51199, -- Eosinophil Count
    51200, -- Eosinophils
    52073, -- Absolute Eosinophil count
    51244, -- Lymphocytes
    51245, -- Lymphocytes, Percent
    51133, -- Absolute Lymphocyte Count
    52769, -- Absolute Lymphocyte Count
    51253, -- Monocyte Count
    51254, -- Monocytes
    52074, -- Absolute Monocyte Count
    51256, -- Neutrophils
    52075, -- Absolute Neutrophil Count
    51143, -- Atypical lymphocytes
    51144, -- Bands (%)
    51218, -- Granulocyte Count
    52135, -- Immature granulocytes (%)
    51251, -- Metamyelocytes
    51257,  -- Nucleated Red Cells

    -- wbc totals measured in K/uL
    51300, 51301, 51755
    -- 52220 (wbcp) is percentage

    -- below are point of care tests which are extremely infrequent and usually low quality
    -- 51697, -- Neutrophils (mmol/L)

    -- below itemid do not have data as of MIMIC-IV v1.0
    -- 51536, -- Absolute Lymphocyte Count
    -- 51537, -- Absolute Neutrophil
    -- 51690, -- Lymphocytes
    -- 52151, -- NRBC
)
AND valuenum IS NOT NULL
-- differential values cannot be negative
AND valuenum >= 0
GROUP BY le.specimen_id
)
SELECT 
subject_id, hadm_id, charttime, specimen_id

, wbc
-- impute absolute count if percentage & WBC is available
, ROUND(CASE
    WHEN basophils_abs IS NULL AND basophils IS NOT NULL AND impute_abs = 1
        THEN basophils * wbc
    ELSE basophils_abs
END, 4) AS basophils_abs
, ROUND(CASE
    WHEN eosinophils_abs IS NULL AND eosinophils IS NOT NULL AND impute_abs = 1
        THEN eosinophils * wbc
    ELSE eosinophils_abs
END, 4) AS eosinophils_abs
, ROUND(CASE
    WHEN lymphocytes_abs IS NULL AND lymphocytes IS NOT NULL AND impute_abs = 1
        THEN lymphocytes * wbc
    ELSE lymphocytes_abs
END, 4) AS lymphocytes_abs
, ROUND(CASE
    WHEN monocytes_abs IS NULL AND monocytes IS NOT NULL AND impute_abs = 1
        THEN monocytes * wbc
    ELSE monocytes_abs
END, 4) AS monocytes_abs
, ROUND(CASE
    WHEN neutrophils_abs IS NULL AND neutrophils IS NOT NULL AND impute_abs = 1
        THEN neutrophils * wbc
    ELSE neutrophils_abs
END, 4) AS neutrophils_abs

, basophils
, eosinophils
, lymphocytes
, monocytes
, neutrophils

-- impute bands/blasts?
, atypical_lymphocytes
, bands
, immature_granulocytes
, metamyelocytes
, nrbc
FROM blood_diff
;

  • cardiac_maker, 心脏标记物,三种酶
-- begin query that extracts the data
SELECT
    MAX(subject_id) AS subject_id
  , MAX(hadm_id) AS hadm_id
  , MAX(charttime) AS charttime
  , le.specimen_id
  -- convert from itemid into a meaningful column
  , MAX(CASE WHEN itemid = 51002 THEN value ELSE NULL END) AS troponin_i
  , MAX(CASE WHEN itemid = 51003 THEN value ELSE NULL END) AS troponin_t
  , MAX(CASE WHEN itemid = 50911 THEN valuenum ELSE NULL END) AS ck_mb
FROM mimic_hosp.labevents le
WHERE le.itemid IN
(
    -- 51002, -- Troponin I (troponin-I is not measured in MIMIC-IV)
    -- 52598, -- Troponin I, point of care, rare/poor quality
    51003, -- Troponin T
    50911  -- Creatinine Kinase, MB isoenzyme
)
GROUP BY le.specimen_id
;

-chemistry, 化学元素

-- extract chemistry labs
-- excludes point of care tests (very rare)
-- blood gas measurements are *not* included in this query
-- instead they are in bg.sql
SELECT 
    MAX(subject_id) AS subject_id
  , MAX(hadm_id) AS hadm_id
  , MAX(charttime) AS charttime
  , le.specimen_id
  -- convert from itemid into a meaningful column
  , MAX(CASE WHEN itemid = 50862 AND valuenum <=    10 THEN valuenum ELSE NULL END) AS albumin
  , MAX(CASE WHEN itemid = 50930 AND valuenum <=    10 THEN valuenum ELSE NULL END) AS globulin
  , MAX(CASE WHEN itemid = 50976 AND valuenum <=    20 THEN valuenum ELSE NULL END) AS total_protein
  , MAX(CASE WHEN itemid = 50868 AND valuenum <= 10000 THEN valuenum ELSE NULL END) AS aniongap
  , MAX(CASE WHEN itemid = 50882 AND valuenum <= 10000 THEN valuenum ELSE NULL END) AS bicarbonate
  , MAX(CASE WHEN itemid = 51006 AND valuenum <=   300 THEN valuenum ELSE NULL END) AS bun
  , MAX(CASE WHEN itemid = 50893 AND valuenum <= 10000 THEN valuenum ELSE NULL END) AS calcium
  , MAX(CASE WHEN itemid = 50902 AND valuenum <= 10000 THEN valuenum ELSE NULL END) AS chloride
  , MAX(CASE WHEN itemid = 50912 AND valuenum <=   150 THEN valuenum ELSE NULL END) AS creatinine
  , MAX(CASE WHEN itemid = 50931 AND valuenum <= 10000 THEN valuenum ELSE NULL END) AS glucose
  , MAX(CASE WHEN itemid = 50983 AND valuenum <=   200 THEN valuenum ELSE NULL END) AS sodium
  , MAX(CASE WHEN itemid = 50971 AND valuenum <=    30 THEN valuenum ELSE NULL END) AS potassium
FROM mimic_hosp.labevents le
WHERE le.itemid IN
(
  -- comment is: LABEL | CATEGORY | FLUID | NUMBER OF ROWS IN LABEVENTS
  50862, -- ALBUMIN | CHEMISTRY | BLOOD | 146697
  50930, -- Globulin
  50976, -- Total protein
  50868, -- ANION GAP | CHEMISTRY | BLOOD | 769895
  -- 52456, -- Anion gap, point of care test
  50882, -- BICARBONATE | CHEMISTRY | BLOOD | 780733
  50893, -- Calcium
  50912, -- CREATININE | CHEMISTRY | BLOOD | 797476
  -- 52502, Creatinine, point of care
  50902, -- CHLORIDE | CHEMISTRY | BLOOD | 795568
  50931, -- GLUCOSE | CHEMISTRY | BLOOD | 748981
  -- 52525, Glucose, point of care
  50971, -- POTASSIUM | CHEMISTRY | BLOOD | 845825
  -- 52566, -- Potassium, point of care
  50983, -- SODIUM | CHEMISTRY | BLOOD | 808489
  -- 52579, -- Sodium, point of care
  51006  -- UREA NITROGEN | CHEMISTRY | BLOOD | 791925
  -- 52603, Urea, point of care
)
AND valuenum IS NOT NULL
-- lab values cannot be 0 and cannot be negative
-- .. except anion gap.
AND (valuenum > 0 OR itemid = 50868)
GROUP BY le.specimen_id
;
  • GCS评分, 神经系统功能评分
-- This query extracts the Glasgow Coma Scale, a measure of neurological function.
-- The query has a few special rules:
--    (1) The verbal component can be set to 0 if the patient is ventilated.
--    This is corrected to 5 - the overall GCS is set to 15 in these cases.
--    (2) Often only one of three components is documented. The other components
--    are carried forward.

-- ITEMIDs used:

-- METAVISION
--    223900 GCS - Verbal Response
--    223901 GCS - Motor Response
--    220739 GCS - Eye Opening

-- Note:
--  The GCS for sedated patients is defaulted to 15 in this code.
--  This is in line with how the data is meant to be collected.
--  e.g., from the SAPS II publication:
--    For sedated patients, the Glasgow Coma Score before sedation was used.
--    This was ascertained either from interviewing the physician who ordered the sedation,
--    or by reviewing the patient's medical record.
with base as
(
  select 
    subject_id
  , ce.stay_id, ce.charttime
  -- pivot each value into its own column
  , max(case when ce.ITEMID = 223901 then ce.valuenum else null end) as GCSMotor
  , max(case
      when ce.ITEMID = 223900 and ce.VALUE = 'No Response-ETT' then 0
      when ce.ITEMID = 223900 then ce.valuenum
      else null 
    end) as GCSVerbal
  , max(case when ce.ITEMID = 220739 then ce.valuenum else null end) as GCSEyes
  -- convert the data into a number, reserving a value of 0 for ET/Trach
  , max(case
      -- endotrach/vent is assigned a value of 0
      -- flag it here to later parse specially
      when ce.ITEMID = 223900 and ce.VALUE = 'No Response-ETT' then 1 -- metavision
    else 0 end)
    as endotrachflag
  , ROW_NUMBER ()
          OVER (PARTITION BY ce.stay_id ORDER BY ce.charttime ASC) as rn
  from mimic_icu.chartevents ce
  -- Isolate the desired GCS variables
  where ce.ITEMID in
  (
    -- GCS components, Metavision
    223900, 223901, 220739
  )
  group by ce.subject_id, ce.stay_id, ce.charttime
)
, gcs as (
  select b.*
  , b2.GCSVerbal as GCSVerbalPrev
  , b2.GCSMotor as GCSMotorPrev
  , b2.GCSEyes as GCSEyesPrev
  -- Calculate GCS, factoring in special case when they are intubated and prev vals
  -- note that the coalesce are used to implement the following if:
  --  if current value exists, use it
  --  if previous value exists, use it
  --  otherwise, default to normal
  , case
      -- replace GCS during sedation with 15
      when b.GCSVerbal = 0
        then 15
      when b.GCSVerbal is null and b2.GCSVerbal = 0
        then 15
      -- if previously they were intub, but they aren't now, do not use previous GCS values
      when b2.GCSVerbal = 0
        then
            coalesce(b.GCSMotor,6)
          + coalesce(b.GCSVerbal,5)
          + coalesce(b.GCSEyes,4)
      -- otherwise, add up score normally, imputing previous value if none available at current time
      else
            coalesce(b.GCSMotor,coalesce(b2.GCSMotor,6))
          + coalesce(b.GCSVerbal,coalesce(b2.GCSVerbal,5))
          + coalesce(b.GCSEyes,coalesce(b2.GCSEyes,4))
      end as GCS

  from base b
  -- join to itself within 6 hours to get previous value
  left join base b2
    on b.stay_id = b2.stay_id
    and b.rn = b2.rn+1
    and b2.charttime > DATETIME_ADD(b.charttime, INTERVAL 6 HOUR)
)
-- combine components with previous within 6 hours
-- filter down to cohort which is not excluded
-- truncate charttime to the hour
, gcs_stg as
(
  select
    subject_id
  , gs.stay_id, gs.charttime
  , GCS
  , coalesce(GCSMotor,GCSMotorPrev) as GCSMotor
  , coalesce(GCSVerbal,GCSVerbalPrev) as GCSVerbal
  , coalesce(GCSEyes,GCSEyesPrev) as GCSEyes
  , case when coalesce(GCSMotor,GCSMotorPrev) is null then 0 else 1 end
  + case when coalesce(GCSVerbal,GCSVerbalPrev) is null then 0 else 1 end
  + case when coalesce(GCSEyes,GCSEyesPrev) is null then 0 else 1 end
    as components_measured
  , EndoTrachFlag
  from gcs gs
)
-- priority is:
--  (i) complete data, (ii) non-sedated GCS, (iii) lowest GCS, (iv) charttime
, gcs_priority as
(
  select 
      subject_id
    , stay_id
    , charttime
    , gcs
    , gcsmotor
    , gcsverbal
    , gcseyes
    , EndoTrachFlag
    , ROW_NUMBER() over
      (
        PARTITION BY stay_id, charttime
        ORDER BY components_measured DESC, endotrachflag, gcs, charttime DESC
      ) as rn
  from gcs_stg
)
select
  gs.subject_id
  , gs.stay_id
  , gs.charttime
  , GCS AS gcs
  , GCSMotor AS gcs_motor
  , GCSVerbal AS gcs_verbal
  , GCSEyes AS gcs_eyes
  , EndoTrachFlag AS gcs_unable
from gcs_priority gs
where rn = 1
;
  • inflammation, C反应蛋白,炎症标记物
SELECT
    MAX(subject_id) AS subject_id
  , MAX(hadm_id) AS hadm_id
  , MAX(charttime) AS charttime
  , le.specimen_id
  -- convert from itemid into a meaningful column
  , MAX(CASE WHEN itemid = 50889 THEN valuenum ELSE NULL END) AS crp
  -- , CAST(NULL AS NUMERIC) AS il6
  -- , CAST(NULL AS NUMERIC) AS procalcitonin
FROM mimic_hosp.labevents le
WHERE le.itemid IN
(
    50889 -- crp
    -- 51652 -- high sensitivity CRP
)
AND valuenum IS NOT NULL
-- lab values cannot be 0 and cannot be negative
AND valuenum > 0
GROUP BY le.specimen_id
;
  • vitalsign 包括心率、血压、血糖等
-- This query pivots the vital signs for the entire patient stay.
-- Vital signs include heart rate, blood pressure, respiration rate, and temperature
select
    ce.subject_id
  , ce.stay_id
  , ce.charttime
  , AVG(case when itemid in (220045) and valuenum > 0 and valuenum < 300 then valuenum else null end) as heart_rate
  , AVG(case when itemid in (220179,220050) and valuenum > 0 and valuenum < 400 then valuenum else null end) as sbp
  , AVG(case when itemid in (220180,220051) and valuenum > 0 and valuenum < 300 then valuenum else null end) as dbp
  , AVG(case when itemid in (220052,220181,225312) and valuenum > 0 and valuenum < 300 then valuenum else null end) as mbp
  , AVG(case when itemid = 220179 and valuenum > 0 and valuenum < 400 then valuenum else null end) as sbp_ni
  , AVG(case when itemid = 220180 and valuenum > 0 and valuenum < 300 then valuenum else null end) as dbp_ni
  , AVG(case when itemid = 220181 and valuenum > 0 and valuenum < 300 then valuenum else null end) as mbp_ni
  , AVG(case when itemid in (220210,224690) and valuenum > 0 and valuenum < 70 then valuenum else null end) as resp_rate
  , ROUND(
      AVG(case when itemid in (223761) and valuenum > 70 and valuenum < 120 then (valuenum-32)/1.8 -- converted to degC in valuenum call
              when itemid in (223762) and valuenum > 10 and valuenum < 50  then valuenum else null end)
    , 2) as temperature
  , MAX(CASE WHEN itemid = 224642 THEN value ELSE NULL END) AS temperature_site
  , AVG(case when itemid in (220277) and valuenum > 0 and valuenum <= 100 then valuenum else null end) as spo2
  , AVG(case when itemid in (225664,220621,226537) and valuenum > 0 then valuenum else null end) as glucose
  FROM mimic_icu.chartevents ce
  where ce.stay_id IS NOT NULL
  and ce.itemid in
  (
    220045, -- Heart Rate
    225309, -- ART BP Systolic
    225310, -- ART BP Diastolic
    225312, -- ART BP Mean
    220050, -- Arterial Blood Pressure systolic
    220051, -- Arterial Blood Pressure diastolic
    220052, -- Arterial Blood Pressure mean
    220179, -- Non Invasive Blood Pressure systolic
    220180, -- Non Invasive Blood Pressure diastolic
    220181, -- Non Invasive Blood Pressure mean
    220210, -- Respiratory Rate
    224690, -- Respiratory Rate (Total)
    220277, -- SPO2, peripheral
    -- GLUCOSE, both lab and fingerstick
    225664, -- Glucose finger stick
    220621, -- Glucose (serum)
    226537, -- Glucose (whole blood)
    -- TEMPERATURE
    223762, -- "Temperature Celsius"
    223761,  -- "Temperature Fahrenheit"
    224642 -- Temperature Site
    -- 226329 -- Blood Temperature CCO (C)
)
group by ce.subject_id, ce.stay_id, ce.charttime
;

-crrt, 连续肾脏替代治疗

with crrt_settings as
(
  select ce.stay_id, ce.charttime
  , CASE WHEN ce.itemid = 227290 THEN ce.value END AS CRRT_mode
  , CASE WHEN ce.itemid = 224149 THEN ce.valuenum ELSE NULL END AS AccessPressure
  , CASE WHEN ce.itemid = 224144 THEN ce.valuenum ELSE NULL END AS BloodFlow -- (ml/min)
  , CASE WHEN ce.itemid = 228004 THEN ce.valuenum ELSE NULL END AS Citrate -- (ACD-A)
  , CASE WHEN ce.itemid = 225183 THEN ce.valuenum ELSE NULL END AS CurrentGoal
  , CASE WHEN ce.itemid = 225977 THEN ce.value ELSE NULL END AS DialysateFluid
  , CASE WHEN ce.itemid = 224154 THEN ce.valuenum ELSE NULL END AS DialysateRate
  , CASE WHEN ce.itemid = 224151 THEN ce.valuenum ELSE NULL END AS EffluentPressure
  , CASE WHEN ce.itemid = 224150 THEN ce.valuenum ELSE NULL END AS FilterPressure
  , CASE WHEN ce.itemid = 225958 THEN ce.value ELSE NULL END AS HeparinConcentration -- (units/mL)
  , CASE WHEN ce.itemid = 224145 THEN ce.valuenum ELSE NULL END AS HeparinDose -- (per hour)
  -- below may not account for drug infusion/hyperalimentation/anticoagulants infused
  , CASE WHEN ce.itemid = 224191 THEN ce.valuenum ELSE NULL END AS HourlyPatientFluidRemoval
  , CASE WHEN ce.itemid = 228005 THEN ce.valuenum ELSE NULL END AS PrefilterReplacementRate
  , CASE WHEN ce.itemid = 228006 THEN ce.valuenum ELSE NULL END AS PostFilterReplacementRate
  , CASE WHEN ce.itemid = 225976 THEN ce.value ELSE NULL END AS ReplacementFluid
  , CASE WHEN ce.itemid = 224153 THEN ce.valuenum ELSE NULL END AS ReplacementRate
  , CASE WHEN ce.itemid = 224152 THEN ce.valuenum ELSE NULL END AS ReturnPressure
  , CASE WHEN ce.itemid = 226457 THEN ce.valuenum END AS UltrafiltrateOutput
  -- separate system integrity into sub components
  -- need to do this as 224146 has multiple unique values for a single charttime
  -- e.g. "Clots Present" and "Active" at same time
  , CASE
        WHEN ce.itemid = 224146
        AND ce.value IN ('Active', 'Initiated', 'Reinitiated', 'New Filter')
            THEN 1
        WHEN ce.itemid = 224146
        AND ce.value IN ('Recirculating', 'Discontinued')
            THEN 0
    ELSE NULL END as system_active
  , CASE
        WHEN ce.itemid = 224146
        AND ce.value IN ('Clots Present', 'Clots Present')
            THEN 1
        WHEN ce.itemid = 224146
        AND ce.value IN ('No Clot Present', 'No Clot Present')
            THEN 0
    ELSE NULL END as clots
  , CASE 
        WHEN ce.itemid = 224146
        AND ce.value IN ('Clots Increasing', 'Clot Increasing')
            THEN 1
    ELSE NULL END as clots_increasing
  , CASE
        WHEN ce.itemid = 224146
        AND ce.value IN ('Clotted')
            THEN 1
    ELSE NULL END as clotted
  from `physionet-data.mimic_icu.chartevents` ce
  where ce.itemid in
  (
    -- MetaVision ITEMIDs
    227290, -- CRRT Mode
    224146, -- System Integrity
    -- 225956,  -- Reason for CRRT Filter Change
    -- above itemid is one of: Clotted, Line Changed, Procedure
    -- only ~200 rows, not super useful
    224149, -- Access Pressure
    224144, -- Blood Flow (ml/min)
    228004, -- Citrate (ACD-A)
    225183, -- Current Goal
    225977, -- Dialysate Fluid
    224154, -- Dialysate Rate
    224151, -- Effluent Pressure
    224150, -- Filter Pressure
    225958, -- Heparin Concentration (units/mL)
    224145, -- Heparin Dose (per hour)
    224191, -- Hourly Patient Fluid Removal
    228005, -- PBP (Prefilter) Replacement Rate
    228006, -- Post Filter Replacement Rate
    225976, -- Replacement Fluid
    224153, -- Replacement Rate
    224152, -- Return Pressure
    226457  -- Ultrafiltrate Output
  )
  and ce.value is not null
)
-- use MAX() to collapse to a single row
-- there is only ever 1 row for unique combinations of stay_id/charttime/itemid
select stay_id
, charttime
, MAX(crrt_mode) AS crrt_mode
, MAX(AccessPressure) AS access_pressure
, MAX(BloodFlow) AS blood_flow
, MAX(Citrate) AS citrate
, MAX(CurrentGoal) AS current_goal
, MAX(DialysateFluid) AS dialysate_fluid
, MAX(DialysateRate) AS dialysate_rate
, MAX(EffluentPressure) AS effluent_pressure
, MAX(FilterPressure) AS filter_pressure
, MAX(HeparinConcentration) AS heparin_concentration
, MAX(HeparinDose) AS heparin_dose
, MAX(HourlyPatientFluidRemoval) AS hourly_patient_fluid_removal
, MAX(PrefilterReplacementRate) AS prefilter_replacement_rate
, MAX(PostFilterReplacementRate) AS postfilter_replacement_rate
, MAX(ReplacementFluid) AS replacement_fluid
, MAX(ReplacementRate) AS replacement_rate
, MAX(ReturnPressure) AS return_pressure
, MAX(UltrafiltrateOutput) AS ultrafiltrate_output
, MAX(system_active) AS system_active
, MAX(clots) AS clots
, MAX(clots_increasing) AS clots_increasing
, MAX(clotted) AS clotted
from crrt_settings
group by stay_id, charttime
  • ventilation, 机械通气
-- Calculate duration of mechanical ventilation.
-- Some useful cases for debugging:
--  stay_id = 30019660 has a tracheostomy placed in the ICU
--  stay_id = 30000117 has explicit documentation of extubation
-- classify vent settings into modes
WITH tm AS
(
  SELECT stay_id, charttime
  FROM `physionet-data.mimic_derived.ventilator_setting`
  UNION DISTINCT
  SELECT stay_id, charttime
  FROM `physionet-data.mimic_derived.oxygen_delivery`
)
, vs AS
(
    SELECT tm.stay_id, tm.charttime
    -- source data columns, here for debug
    , o2_delivery_device_1
    , COALESCE(ventilator_mode, ventilator_mode_hamilton) AS vent_mode
    -- case statement determining the type of intervention
    -- done in order of priority: trach > mech vent > NIV > high flow > o2
    , CASE
    -- tracheostomy
    WHEN o2_delivery_device_1 IN
    (
        'Tracheostomy tube'
    -- 'Trach mask ' -- 16435 observations
    )
        THEN 'Trach'
    -- mechanical ventilation
    WHEN o2_delivery_device_1 IN
    (
        'Endotracheal tube'
    )
    OR ventilator_mode IN
    (
        '(S) CMV',
        'APRV',
        'APRV/Biphasic+ApnPress',
        'APRV/Biphasic+ApnVol',
        'APV (cmv)',
        'Ambient',
        'Apnea Ventilation',
        'CMV',
        'CMV/ASSIST',
        'CMV/ASSIST/AutoFlow',
        'CMV/AutoFlow',
        'CPAP/PPS',
        'CPAP/PSV+Apn TCPL',
        'CPAP/PSV+ApnPres',
        'CPAP/PSV+ApnVol',
        'MMV',
        'MMV/AutoFlow',
        'MMV/PSV',
        'MMV/PSV/AutoFlow',
        'P-CMV',
        'PCV+',
        'PCV+/PSV',
        'PCV+Assist',
        'PRES/AC',
        'PRVC/AC',
        'PRVC/SIMV',
        'PSV/SBT',
        'SIMV',
        'SIMV/AutoFlow',
        'SIMV/PRES',
        'SIMV/PSV',
        'SIMV/PSV/AutoFlow',
        'SIMV/VOL',
        'SYNCHRON MASTER',
        'SYNCHRON SLAVE',
        'VOL/AC'
    )
    OR ventilator_mode_hamilton IN
    (
        'APRV',
        'APV (cmv)',
        'Ambient',
        '(S) CMV',
        'P-CMV',
        'SIMV',
        'APV (simv)',
        'P-SIMV',
        'VS',
        'ASV'
    )
        THEN 'InvasiveVent'
    -- NIV
    WHEN o2_delivery_device_1 IN
    (
        'Bipap mask ', -- 8997 observations
        'CPAP mask ' -- 5568 observations
    )
    OR ventilator_mode_hamilton IN
    (
        'DuoPaP',
        'NIV',
        'NIV-ST'
    )
        THEN 'NonInvasiveVent'
    -- high flow
    when o2_delivery_device_1 IN
    (
        'High flow neb', -- 10785 observations
        'High flow nasal cannula' -- 925 observations
    )
        THEN 'HighFlow'
    -- normal oxygen delivery
    WHEN o2_delivery_device_1 in
    (
        'Nasal cannula', -- 153714 observations
        'Face tent', -- 24601 observations
        'Aerosol-cool', -- 24560 observations
        'Non-rebreather', -- 5182 observations
        'Venti mask ', -- 1947 observations
        'Medium conc mask ', -- 1888 observations
        'T-piece', -- 1135 observations
        'Ultrasonic neb', -- 9 observations
        'Vapomist', -- 3 observations
        'Oxymizer' -- 1301 observations
    )
        THEN 'Oxygen'
    -- Not categorized:
    -- 'Other', 'None'
    ELSE NULL END AS ventilation_status
  FROM tm
  LEFT JOIN `physionet-data.mimic_derived.ventilator_setting` vs
      ON tm.stay_id = vs.stay_id
      AND tm.charttime = vs.charttime
  LEFT JOIN `physionet-data.mimic_derived.oxygen_delivery` od
      ON tm.stay_id = od.stay_id
      AND tm.charttime = od.charttime
)
, vd0 AS
(
    SELECT
      stay_id, charttime
      -- source data columns, here for debug
      , o2_delivery_device_1
      , vent_mode
      -- carry over the previous charttime which had the same state
      , LAG(charttime, 1) OVER (PARTITION BY stay_id, ventilation_status ORDER BY charttime) AS charttime_lag
      -- bring back the next charttime, regardless of the state
      -- this will be used as the end time for state transitions
      , LEAD(charttime, 1) OVER w AS charttime_lead
      , ventilation_status
      , LAG(ventilation_status, 1) OVER w AS ventilation_status_lag
    FROM vs
    WHERE ventilation_status IS NOT NULL
    WINDOW w AS (PARTITION BY stay_id ORDER BY charttime)
)
, vd1 as
(
    SELECT
        stay_id
        -- source data columns, here for debug
        , o2_delivery_device_1
        , vent_mode
        , charttime_lag
        , charttime
        , charttime_lead
        , ventilation_status

        -- calculate the time since the last event
        , DATETIME_DIFF(charttime, charttime_lag, MINUTE)/60 as ventduration

        -- now we determine if the current ventilation status is "new", or continuing the previous
        , CASE
            -- a 14 hour gap always initiates a new event
            WHEN DATETIME_DIFF(charttime, charttime_lag, HOUR) >= 14 THEN 1
            WHEN ventilation_status_lag IS NULL THEN 1
            -- not a new event if identical to the last row
            WHEN ventilation_status_lag != ventilation_status THEN 1
          ELSE 0
          END AS new_status
    FROM vd0
)
, vd2 as
(
    SELECT vd1.*
    -- create a cumulative sum of the instances of new ventilation
    -- this results in a monotonic integer assigned to each instance of ventilation
    , SUM(new_status) OVER (PARTITION BY stay_id ORDER BY charttime) AS vent_num
    FROM vd1
)
-- create the durations for each ventilation instance
SELECT stay_id
  , MIN(charttime) AS starttime
  -- for the end time of the ventilation event, the time of the *next* setting
  -- i.e. if we go NIV -> O2, the end time of NIV is the first row with a documented O2 device
  -- ... unless it's been over 14 hours, in which case it's the last row with a documented NIV.
  , MAX(
        CASE
            WHEN charttime_lead IS NULL
            OR DATETIME_DIFF(charttime_lead, charttime, HOUR) >= 14
                THEN charttime
        ELSE charttime_lead
        END
   ) AS endtime
   -- all rows with the same vent_num will have the same ventilation_status
   -- for efficiency, we use an aggregate here, but we could equally well group by this column
  , MAX(ventilation_status) AS ventilation_status
FROM vd2
GROUP BY stay_id, vent_num
HAVING min(charttime) != max(charttime)
;

-rrt 肾脏透析疗法

-- Creates a table with stay_id / time / dialysis type (if present)

with ce as
(
  select ce.stay_id
    , ce.charttime
          -- when ce.itemid in (152,148,149,146,147,151,150) and value is not null then 1
          -- when ce.itemid in (229,235,241,247,253,259,265,271) and value = 'Dialysis Line' then 1
          -- when ce.itemid = 466 and value = 'Dialysis RN' then 1
          -- when ce.itemid = 927 and value = 'Dialysis Solutions' then 1
          -- when ce.itemid = 6250 and value = 'dialys' then 1
          -- when ce.
          -- when ce.itemid = 582 and value in ('CAVH Start','CAVH D/C','CVVHD Start','CVVHD D/C','Hemodialysis st','Hemodialysis end') then 1
    , CASE
        -- metavision itemids

        -- checkboxes
        WHEN ce.itemid IN
        (
            226118 -- | Dialysis Catheter placed in outside facility      | Access Lines - Invasive | chartevents        | Checkbox
          , 227357 -- | Dialysis Catheter Dressing Occlusive              | Access Lines - Invasive | chartevents        | Checkbox
          , 225725 -- | Dialysis Catheter Tip Cultured                    | Access Lines - Invasive | chartevents        | Checkbox
        ) THEN 1
        -- numeric data
        WHEN ce.itemid IN
        (
            226499 -- | Hemodialysis Output                               | Dialysis
          , 224154 -- | Dialysate Rate                                    | Dialysis
          , 225810 -- | Dwell Time (Peritoneal Dialysis)                  | Dialysis
          , 225959 -- | Medication Added Amount  #1 (Peritoneal Dialysis) | Dialysis
          , 227639 -- | Medication Added Amount  #2 (Peritoneal Dialysis) | Dialysis
          , 225183 -- | Current Goal                     | Dialysis
          , 227438 -- | Volume not removed               | Dialysis
          , 224191 -- | Hourly Patient Fluid Removal     | Dialysis
          , 225806 -- | Volume In (PD)                   | Dialysis
          , 225807 -- | Volume Out (PD)                  | Dialysis
          , 228004 -- | Citrate (ACD-A)                  | Dialysis
          , 228005 -- | PBP (Prefilter) Replacement Rate | Dialysis
          , 228006 -- | Post Filter Replacement Rate     | Dialysis
          , 224144 -- | Blood Flow (ml/min)              | Dialysis
          , 224145 -- | Heparin Dose (per hour)          | Dialysis
          , 224149 -- | Access Pressure                  | Dialysis
          , 224150 -- | Filter Pressure                  | Dialysis
          , 224151 -- | Effluent Pressure                | Dialysis
          , 224152 -- | Return Pressure                  | Dialysis
          , 224153 -- | Replacement Rate                 | Dialysis
          , 224404 -- | ART Lumen Volume                 | Dialysis
          , 224406 -- | VEN Lumen Volume                 | Dialysis
          , 226457 -- | Ultrafiltrate Output             | Dialysis
        ) THEN 1

        -- text fields
        WHEN ce.itemid IN
        (
            224135 -- | Dialysis Access Site | Dialysis
          , 224139 -- | Dialysis Site Appearance | Dialysis
          , 224146 -- | System Integrity | Dialysis
          , 225323 -- | Dialysis Catheter Site Appear | Access Lines - Invasive
          , 225740 -- | Dialysis Catheter Discontinued | Access Lines - Invasive
          , 225776 -- | Dialysis Catheter Dressing Type | Access Lines - Invasive
          , 225951 -- | Peritoneal Dialysis Fluid Appearance | Dialysis
          , 225952 -- | Medication Added #1 (Peritoneal Dialysis) | Dialysis
          , 225953 -- | Solution (Peritoneal Dialysis) | Dialysis
          , 225954 -- | Dialysis Access Type | Dialysis
          , 225956 -- | Reason for CRRT Filter Change | Dialysis
          , 225958 -- | Heparin Concentration (units/mL) | Dialysis
          , 225961 -- | Medication Added Units #1 (Peritoneal Dialysis) | Dialysis
          , 225963 -- | Peritoneal Dialysis Catheter Type | Dialysis
          , 225965 -- | Peritoneal Dialysis Catheter Status | Dialysis
          , 225976 -- | Replacement Fluid | Dialysis
          , 225977 -- | Dialysate Fluid | Dialysis
          , 227124 -- | Dialysis Catheter Type | Access Lines - Invasive
          , 227290 -- | CRRT mode | Dialysis
          , 227638 -- | Medication Added #2 (Peritoneal Dialysis) | Dialysis
          , 227640 -- | Medication Added Units #2 (Peritoneal Dialysis) | Dialysis
          , 227753 -- | Dialysis Catheter Placement Confirmed by X-ray | Access Lines - Invasive
        ) THEN 1
      ELSE 0 END
      AS dialysis_present
    , CASE
        WHEN ce.itemid = 225965 -- Peritoneal Dialysis Catheter Status
          AND value = 'In use' THEN 1
        WHEN ce.itemid IN
        (
            226499 -- | Hemodialysis Output              | Dialysis
          , 224154 -- | Dialysate Rate                   | Dialysis
          , 225183 -- | Current Goal                     | Dialysis
          , 227438 -- | Volume not removed               | Dialysis
          , 224191 -- | Hourly Patient Fluid Removal     | Dialysis
          , 225806 -- | Volume In (PD)                   | Dialysis
          , 225807 -- | Volume Out (PD)                  | Dialysis
          , 228004 -- | Citrate (ACD-A)                  | Dialysis
          , 228005 -- | PBP (Prefilter) Replacement Rate | Dialysis
          , 228006 -- | Post Filter Replacement Rate     | Dialysis
          , 224144 -- | Blood Flow (ml/min)              | Dialysis
          , 224145 -- | Heparin Dose (per hour)          | Dialysis
          , 224153 -- | Replacement Rate                 | Dialysis
          , 226457 -- | Ultrafiltrate Output             | Dialysis
        ) THEN 1
      ELSE 0 END
      AS dialysis_active
    , CASE
        -- dialysis mode
        -- we try to set dialysis mode to one of:
        --   CVVH
        --   CVVHD
        --   CVVHDF
        --   SCUF
        --   Peritoneal
        --   IHD
        -- these are the modes in itemid 227290
        WHEN ce.itemid = 227290 THEN value
        -- itemids which imply a certain dialysis mode
        -- peritoneal dialysis
        WHEN ce.itemid IN 
        (
            225810 -- | Dwell Time (Peritoneal Dialysis) | Dialysis
          , 225806 -- | Volume In (PD)                   | Dialysis
          , 225807 -- | Volume Out (PD)                  | Dialysis
          , 225810 -- | Dwell Time (Peritoneal Dialysis)                  | Dialysis
          , 227639 -- | Medication Added Amount  #2 (Peritoneal Dialysis) | Dialysis
          , 225959 -- | Medication Added Amount  #1 (Peritoneal Dialysis) | Dialysis
          , 225951 -- | Peritoneal Dialysis Fluid Appearance | Dialysis
          , 225952 -- | Medication Added #1 (Peritoneal Dialysis) | Dialysis
          , 225961 -- | Medication Added Units #1 (Peritoneal Dialysis) | Dialysis
          , 225953 -- | Solution (Peritoneal Dialysis) | Dialysis
          , 225963 -- | Peritoneal Dialysis Catheter Type | Dialysis
          , 225965 -- | Peritoneal Dialysis Catheter Status | Dialysis
          , 227638 -- | Medication Added #2 (Peritoneal Dialysis) | Dialysis
          , 227640 -- | Medication Added Units #2 (Peritoneal Dialysis) | Dialysis
        )
          THEN 'Peritoneal'
        WHEN ce.itemid = 226499
          THEN 'IHD'
      ELSE NULL END as dialysis_type
  from `physionet-data.mimic_icu.chartevents` ce
  WHERE ce.itemid in
  (
    -- === MetaVision itemids === --
  
    -- Checkboxes
      226118 -- | Dialysis Catheter placed in outside facility      | Access Lines - Invasive | chartevents        | Checkbox
    , 227357 -- | Dialysis Catheter Dressing Occlusive              | Access Lines - Invasive | chartevents        | Checkbox
    , 225725 -- | Dialysis Catheter Tip Cultured                    | Access Lines - Invasive | chartevents        | Checkbox

    -- Numeric values
    , 226499 -- | Hemodialysis Output                               | Dialysis                | chartevents        | Numeric
    , 224154 -- | Dialysate Rate                                    | Dialysis                | chartevents        | Numeric
    , 225810 -- | Dwell Time (Peritoneal Dialysis)                  | Dialysis                | chartevents        | Numeric
    , 227639 -- | Medication Added Amount  #2 (Peritoneal Dialysis) | Dialysis                | chartevents        | Numeric
    , 225183 -- | Current Goal                     | Dialysis | chartevents        | Numeric
    , 227438 -- | Volume not removed               | Dialysis | chartevents        | Numeric
    , 224191 -- | Hourly Patient Fluid Removal     | Dialysis | chartevents        | Numeric
    , 225806 -- | Volume In (PD)                   | Dialysis | chartevents        | Numeric
    , 225807 -- | Volume Out (PD)                  | Dialysis | chartevents        | Numeric
    , 228004 -- | Citrate (ACD-A)                  | Dialysis | chartevents        | Numeric
    , 228005 -- | PBP (Prefilter) Replacement Rate | Dialysis | chartevents        | Numeric
    , 228006 -- | Post Filter Replacement Rate     | Dialysis | chartevents        | Numeric
    , 224144 -- | Blood Flow (ml/min)              | Dialysis | chartevents        | Numeric
    , 224145 -- | Heparin Dose (per hour)          | Dialysis | chartevents        | Numeric
    , 224149 -- | Access Pressure                  | Dialysis | chartevents        | Numeric
    , 224150 -- | Filter Pressure                  | Dialysis | chartevents        | Numeric
    , 224151 -- | Effluent Pressure                | Dialysis | chartevents        | Numeric
    , 224152 -- | Return Pressure                  | Dialysis | chartevents        | Numeric
    , 224153 -- | Replacement Rate                 | Dialysis | chartevents        | Numeric
    , 224404 -- | ART Lumen Volume                 | Dialysis | chartevents        | Numeric
    , 224406 -- | VEN Lumen Volume                 | Dialysis | chartevents        | Numeric
    , 226457 -- | Ultrafiltrate Output             | Dialysis | chartevents        | Numeric
    , 225959 -- | Medication Added Amount  #1 (Peritoneal Dialysis) | Dialysis | chartevents | Numeric
    -- Text values
    , 224135 -- | Dialysis Access Site | Dialysis | chartevents | Text
    , 224139 -- | Dialysis Site Appearance | Dialysis | chartevents | Text
    , 224146 -- | System Integrity | Dialysis | chartevents | Text
    , 225323 -- | Dialysis Catheter Site Appear | Access Lines - Invasive | chartevents | Text
    , 225740 -- | Dialysis Catheter Discontinued | Access Lines - Invasive | chartevents | Text
    , 225776 -- | Dialysis Catheter Dressing Type | Access Lines - Invasive | chartevents | Text
    , 225951 -- | Peritoneal Dialysis Fluid Appearance | Dialysis | chartevents | Text
    , 225952 -- | Medication Added #1 (Peritoneal Dialysis) | Dialysis | chartevents | Text
    , 225953 -- | Solution (Peritoneal Dialysis) | Dialysis | chartevents | Text
    , 225954 -- | Dialysis Access Type | Dialysis | chartevents | Text
    , 225956 -- | Reason for CRRT Filter Change | Dialysis | chartevents | Text
    , 225958 -- | Heparin Concentration (units/mL) | Dialysis | chartevents | Text
    , 225961 -- | Medication Added Units #1 (Peritoneal Dialysis) | Dialysis | chartevents | Text
    , 225963 -- | Peritoneal Dialysis Catheter Type | Dialysis | chartevents | Text
    , 225965 -- | Peritoneal Dialysis Catheter Status | Dialysis | chartevents | Text
    , 225976 -- | Replacement Fluid | Dialysis | chartevents | Text
    , 225977 -- | Dialysate Fluid | Dialysis | chartevents | Text
    , 227124 -- | Dialysis Catheter Type | Access Lines - Invasive | chartevents | Text
    , 227290 -- | CRRT mode | Dialysis | chartevents | Text
    , 227638 -- | Medication Added #2 (Peritoneal Dialysis) | Dialysis | chartevents | Text
    , 227640 -- | Medication Added Units #2 (Peritoneal Dialysis) | Dialysis | chartevents | Text
    , 227753 -- | Dialysis Catheter Placement Confirmed by X-ray | Access Lines - Invasive | chartevents | Text
  )
  AND ce.value IS NOT NULL
)
-- TODO:
--   charttime + dialysis_present + dialysis_active
--  for inputevents_cv, outputevents
--  for procedures_mv, left join and set the dialysis_type
, oe as
(
 select stay_id
    , charttime
    , 1 AS dialysis_present
    , 0 AS dialysis_active
    , NULL AS dialysis_type
 from `physionet-data.mimic_icu.outputevents`
 where itemid in
 (
       40386 -- hemodialysis
 )
 and value > 0 -- also ensures it's not null
)
, mv_ranges as
(
  select stay_id
    , starttime, endtime
    , 1 AS dialysis_present
    , 1 AS dialysis_active
    , 'CRRT' as dialysis_type
  from `physionet-data.mimic_icu.inputevents`
  where itemid in
  (
      227536 --	KCl (CRRT)	Medications	inputevents_mv	Solution
    , 227525 --	Calcium Gluconate (CRRT)	Medications	inputevents_mv	Solution
  )
  and amount > 0 -- also ensures it's not null
  UNION DISTINCT
  select stay_id
    , starttime, endtime
    , 1 AS dialysis_present
    , CASE WHEN itemid NOT IN (224270, 225436) THEN 1 ELSE 0 END AS dialysis_active
    , CASE
        WHEN itemid = 225441 THEN 'IHD'
        WHEN itemid = 225802 THEN 'CRRT'  -- CVVH (Continuous venovenous hemofiltration)
        WHEN itemid = 225803 THEN 'CVVHD' -- CVVHD (Continuous venovenous hemodialysis)
        WHEN itemid = 225805 THEN 'Peritoneal'
        WHEN itemid = 225809 THEN 'CVVHDF' -- CVVHDF (Continuous venovenous hemodiafiltration)
        WHEN itemid = 225955 THEN 'SCUF' -- SCUF (Slow continuous ultra filtration)
      ELSE NULL END as dialysis_type
  from `physionet-data.mimic_icu.procedureevents`
  where itemid in
  (
      225441 -- | Hemodialysis          | 4-Procedures              | procedureevents_mv | Process
    , 225802 -- | Dialysis - CRRT       | Dialysis                  | procedureevents_mv | Process
    , 225803 -- | Dialysis - CVVHD      | Dialysis                  | procedureevents_mv | Process
    , 225805 -- | Peritoneal Dialysis   | Dialysis                  | procedureevents_mv | Process
    , 224270 -- | Dialysis Catheter     | Access Lines - Invasive   | procedureevents_mv | Process
    , 225809 -- | Dialysis - CVVHDF     | Dialysis                  | procedureevents_mv | Process
    , 225955 -- | Dialysis - SCUF       | Dialysis                  | procedureevents_mv | Process
    , 225436 -- | CRRT Filter Change    | Dialysis                  | procedureevents_mv | Process
  )
  AND value IS NOT NULL
)
-- union together the charttime tables; append times from mv_ranges to guarantee they exist
, stg0 AS
(
  SELECT
    stay_id, charttime, dialysis_present, dialysis_active, dialysis_type
  FROM ce
  WHERE dialysis_present = 1
  UNION DISTINCT
--   SELECT
--     stay_id, charttime, dialysis_present, dialysis_active, dialysis_type
--   FROM oe
--   WHERE dialysis_present = 1
--   UNION DISTINCT
  SELECT
    stay_id, starttime AS charttime, dialysis_present, dialysis_active, dialysis_type
  FROM mv_ranges
)
SELECT
    stg0.stay_id
    , charttime
    , COALESCE(mv.dialysis_present, stg0.dialysis_present) AS dialysis_present
    , COALESCE(mv.dialysis_active, stg0.dialysis_active) AS dialysis_active
    , COALESCE(mv.dialysis_type, stg0.dialysis_type) AS dialysis_type
FROM stg0
LEFT JOIN mv_ranges mv
  ON stg0.stay_id = mv.stay_id
  AND stg0.charttime >= mv.starttime
  AND stg0.charttime <= mv.endtime
  • 列举不能全面, 遇到没有列举表格,其含义可以查询github上官方源代码中的concept文件夹内的代码和注释。
  • 表格名称的简单注释,gcs(Glasgow Coma Scale),icp(Intra Cranial Pressure ),lods(Logistic Organ Dysfunction Score ),apsiii(Acute Physiology Score III ),oasis(Oxford Acute Severity of Illness Score ),sirs(Systemic inflammatory response syndrome ),sofa(Sequential Organ Failure Assessment )

你可能感兴趣的:(数据挖掘,sql,数据库,database)