• Data for the Datathon

    De-identified Real-world Electronic Medical Record Datasets

  • Datasets

    1. Electronic Medical Records Datasets

    During the datathon, teams will have access to three de-identified EMR datasets. Teams may use one, several, or all of these datasets to answer their clinical questions. The three datasets are: 1) the Medical Information Mart for Intensive Care (MIMIC) database from PhysioNet; 2) the Philips eICU Collaborative Research Database (https://eicu-crd.mit.edu/); and 3) the NUHS Department of Surgery EMR database. These three databases share similar data schemas. They contain hourly physiologic readings from bedside monitors, validated by ICU nurses. They also contain records of demographics, labs, nursing progress notes, discharge summaries, IV medications, fluid balance, and other clinical variables.

    2. Medical Imaging Datasets

    Three medical image datasets will be provided to all teams:

    1) Digital Database for Screening Mammography (DDSM): a resource for use by the mammographic image analysis research community. The database contains approximately 2,500 studies. Each study includes two images of each breast. Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions.

    2) ISIC Skin Lesion Analysis Towards Melanoma Detection dataset: the ISIC dataset contains the largest publicly available collection of quality-controlled dermoscopic images of skin lesions. The dataset contains a representative mix of images of both malignant and benign skin lesions. It was randomly partitioned into a training set of ~900 images and a test set of ~350 images.

    3) Brain Tumor Image Segmentation 2015 (BRATS 2015): a dataset of 3D brain MRI volumes from subjects with brain tumors. The dataset contains 220 subjects with high-grade gliomas and 54 subjects with low-grade gliomas. Four MRI modalities (T1, T2, T1C, FLAIR) are available for each subject, and each modality highlights different tumor tissues present in the image. The dataset also contains a pixel-level ground truth volume for each subject that annotates the different tissue types (necrosis, edema, non-enhancing tumor, enhancing tumor, and background). Of the 274 subjects, 244 will be used for training and the remaining 30 for testing.
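    The BRATS 2015 subject counts (220 high-grade plus 54 low-grade, split into 244 training and 30 test subjects) can be checked with a minimal sketch like the one below. It is an illustration only: the subject ID strings and the seeded random shuffle are assumptions made for the example, and the split actually used for the datathon may be defined differently.

```python
import random

# Subject counts taken from the BRATS 2015 description.
N_HGG = 220   # subjects with high-grade gliomas
N_LGG = 54    # subjects with low-grade gliomas
N_TEST = 30   # subjects held out for testing


def split_subjects(seed=0):
    """Return (train_ids, test_ids) from a seeded random shuffle.

    The ID strings ("HGG_000", ...) are hypothetical placeholders, and the
    random split is an assumption, not the organizers' documented procedure.
    """
    subjects = [f"HGG_{i:03d}" for i in range(N_HGG)]
    subjects += [f"LGG_{i:03d}" for i in range(N_LGG)]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    return subjects[N_TEST:], subjects[:N_TEST]


train_ids, test_ids = split_subjects()
print(len(train_ids), len(test_ids))  # 244 30
```

    Fixing the seed means every team member who reruns the sketch gets the same partition, which helps when comparing results across experiments.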

    DDSM Dataset

    BRATS-2015 Dataset

    Introduction & Documentation: https://www.smir.ch/BRATS/Start2015