5-10mm | 10-30mm |
50% | 50% |
2. Data format:
seriesuid | coordX | coordY | coordZ | diameter_mm |
LKDS_00001 | -100.56 | 67.26 | -231.81 | 6.44 |
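The row layout above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the annotation file is comma-separated with a header row naming the five columns shown (the on-disk delimiter is an assumption; the pipes above may only be display formatting). `Nodule` and `parse_annotations` are illustrative names, not part of any official tooling.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Nodule:
    seriesuid: str
    x: float            # coordX as given in the file
    y: float            # coordY
    z: float            # coordZ
    diameter_mm: float


def parse_annotations(text):
    """Parse annotation rows from CSV text that starts with a header line."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Nodule(
            seriesuid=row["seriesuid"],
            x=float(row["coordX"]),
            y=float(row["coordY"]),
            z=float(row["coordZ"]),
            diameter_mm=float(row["diameter_mm"]),
        )
        for row in reader
    ]


# Assumed comma-separated version of the sample row shown above.
sample = (
    "seriesuid,coordX,coordY,coordZ,diameter_mm\n"
    "LKDS_00001,-100.56,67.26,-231.81,6.44\n"
)
nodules = parse_annotations(sample)
```

To read a real file, pass its contents (for example `open(path, newline="").read()`) to `parse_annotations`.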
3. Slice thickness (mm)
For this challenge, we use the publicly available LIDC/IDRI database. This data is released under the Creative Commons Attribution 3.0 Unported License. We excluded scans with a slice thickness greater than 2.5 mm. In total, 888 CT scans are included. The LIDC/IDRI database also contains annotations collected during a two-phase annotation process by four experienced radiologists. Each radiologist marked the lesions they identified as non-nodule, nodule < 3 mm, or nodule >= 3 mm. See this publication for the details of the annotation process. The reference standard of our challenge consists of all nodules >= 3 mm accepted by at least 3 out of 4 radiologists. Annotations that are not included in the reference standard (non-nodules, nodules < 3 mm, and nodules annotated by only 1 or 2 radiologists) are referred to as irrelevant findings. The list of irrelevant findings is provided inside the evaluation script package (annotations_excluded.csv).
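The inclusion rules in the paragraph above can be written as two small predicates. This is a sketch only: the record fields (`is_nodule`, `diameter_mm`, `votes`) are hypothetical and do not reflect the actual LIDC/IDRI annotation schema, and the sample findings are invented for illustration.

```python
MAX_SLICE_THICKNESS_MM = 2.5      # scans thicker than this were excluded
MIN_DIAMETER_MM = 3.0             # nodules >= 3 mm
MIN_AGREEING_RADIOLOGISTS = 3     # accepted by at least 3 of 4 radiologists


def scan_is_included(slice_thickness_mm):
    """A scan is kept unless its slice thickness exceeds 2.5 mm."""
    return slice_thickness_mm <= MAX_SLICE_THICKNESS_MM


def in_reference_standard(finding):
    """True for nodules >= 3 mm that at least 3 radiologists accepted."""
    return (
        finding["is_nodule"]
        and finding["diameter_mm"] >= MIN_DIAMETER_MM
        and finding["votes"] >= MIN_AGREEING_RADIOLOGISTS
    )


# Invented example findings (hypothetical field names).
findings = [
    {"id": "a", "is_nodule": True, "diameter_mm": 6.4, "votes": 4},
    {"id": "b", "is_nodule": True, "diameter_mm": 2.1, "votes": 4},
    {"id": "c", "is_nodule": True, "diameter_mm": 8.0, "votes": 2},
    {"id": "d", "is_nodule": False, "diameter_mm": 5.0, "votes": 3},
]
reference = [f["id"] for f in findings if in_reference_standard(f)]
irrelevant = [f["id"] for f in findings if not in_reference_standard(f)]
```

Everything that fails `in_reference_standard` corresponds to what the text calls an irrelevant finding.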
Data is available on the download page. The data is structured as follows:
Additional data includes:
Note: The same dataset is used for both training and testing. To make results easier to reproduce, please use the given subsets when training the algorithm with 10-fold cross-validation.
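One way to organize the 10-fold cross-validation described in the note is to hold out each given subset in turn and train on the other nine. The sketch below assumes the subsets are identified by names such as `subset0` through `subset9`; those names are an assumption made for illustration.

```python
def cross_validation_folds(subsets):
    """Yield (training_subsets, held_out_subset) once per subset."""
    for i, held_out in enumerate(subsets):
        train = [s for j, s in enumerate(subsets) if j != i]
        yield train, held_out


# Assumed subset names; substitute the names used in the download.
subsets = [f"subset{i}" for i in range(10)]
folds = list(cross_validation_folds(subsets))

for train, held_out in folds:
    print(f"test on {held_out}, train on {len(train)} subsets")
```

Using the predefined subsets rather than a random split keeps the folds identical across participants, which is what makes results comparable.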
In this dataset, you are given over a thousand low-dose CT images from high-risk patients in DICOM format. Each image contains a series with multiple axial slices of the chest cavity. The number of 2D slices varies from image to image, depending on the scanner and the patient.
The DICOM files have a header that contains the necessary information about the patient id, as well as scan parameters such as the slice thickness.
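Once the headers have been read, the per-scan information mentioned above can be summarized in a few lines. The sketch below assumes each slice header has already been loaded into a plain dict keyed by the standard DICOM attribute keywords `PatientID` and `SliceThickness`; reading the files themselves requires a DICOM library and is not shown. `summarize_scan` is an illustrative helper, and the sample values are invented.

```python
def summarize_scan(headers):
    """Summarize one scan from a list of per-slice header dicts."""
    patient_ids = {h["PatientID"] for h in headers}
    if len(patient_ids) != 1:
        raise ValueError(f"expected one patient id, got {sorted(patient_ids)}")
    # Collect distinct thickness values; a clean scan should have exactly one.
    thicknesses = sorted({float(h["SliceThickness"]) for h in headers})
    return {
        "patient_id": patient_ids.pop(),
        "n_slices": len(headers),
        "slice_thickness_mm": thicknesses,
    }


# Invented headers standing in for three slices of one scan.
headers = [
    {"PatientID": "patient_a", "SliceThickness": "2.5"},
    {"PatientID": "patient_a", "SliceThickness": "2.5"},
    {"PatientID": "patient_a", "SliceThickness": "2.5"},
]
summary = summarize_scan(headers)
```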
The competition task is to create an automated method capable of determining whether or not the patient will be diagnosed with lung cancer within one year of the date the scan was taken. The ground truth labels were confirmed by pathology diagnosis.
The images in this dataset come from many sources and will vary in quality. For example, older scans were imaged with less sophisticated equipment. You should expect the stage 2 data to be, on the whole, more recent and higher quality than the stage 1 data (generally having thinner slice thickness). Ideally, your algorithm should perform well across a range of image quality.
Each patient id has an associated directory of DICOM files. The patient id is found in the DICOM header and is identical to the patient name. The exact number of images differs from case to case, according to the number of slices. Images were compressed as .7z files due to the large size of the dataset.
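After extracting the archives, a quick inventory of slices per patient can be useful as a sanity check. The sketch below assumes one directory per patient, named after the patient id, with slice files carrying a `.dcm` extension; both the naming and the extension are assumptions. The demo builds a throwaway directory tree so the snippet runs without the real data.

```python
import tempfile
from pathlib import Path


def count_slices_per_patient(root):
    """Map each patient directory name to its number of .dcm files."""
    root = Path(root)
    return {
        d.name: sum(
            1 for f in d.iterdir()
            if f.is_file() and f.suffix.lower() == ".dcm"
        )
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


# Demo on a temporary tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for patient, n_slices in [("patient_a", 3), ("patient_b", 5)]:
        patient_dir = Path(tmp) / patient
        patient_dir.mkdir()
        for i in range(n_slices):
            (patient_dir / f"{i}.dcm").touch()
    counts = count_slices_per_patient(tmp)
```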
The DICOM standard is complex and there are a number of different tools to work with DICOM files. You may find the following resources helpful for managing the competition data: