A complete TensorFlow pipeline of training, inference, and feature-extraction notebooks used in the Kaggle competition OSIC Pulmonary Fibrosis Progression (July-Oct 2020)
The data consisted of DICOM (images + metadata) chest CT scans of patients, along with tabular data such as smoking status, age, and Forced Vital Capacity (FVC) values.
A preview of a patient's chest CT slices:
The lung mask segmentation process deployed (the third image is the final mask):
A 3D plot of the stacked 2D segmented masks, forming the lung:
Apart from the DICOM data, the tabular data was as follows:
A brief content description is provided here; for detailed descriptions, check the notebook.
A major task was engineering and extracting features from the .dcm slices.
In total, I engineered 5 features:
- Chest Volume:
- Calculated through numpy.trapz() integration over all 2D slices, using the pixel count and the SliceThickness and PixelSpacing (voxel spacing) metadata in the .dcm files, as sketched below
- Dealt with the inconsistencies in the data; the final distplot produced:
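A minimal sketch of the volume computation, assuming binary mask arrays and per-scan `slice_thickness` / `pixel_spacing` values read from the DICOM headers (the real notebook additionally handles the metadata inconsistencies):

```python
import numpy as np

def chest_volume(masks, slice_thickness, pixel_spacing):
    """Approximate volume in mm^3 from a stack of binary 2D masks.

    masks           : (n_slices, H, W) binary segmentation masks
    slice_thickness : DICOM SliceThickness, in mm
    pixel_spacing   : DICOM PixelSpacing, (row_mm, col_mm)
    """
    pixel_area = pixel_spacing[0] * pixel_spacing[1]       # mm^2 per pixel
    slice_areas = masks.sum(axis=(1, 2)) * pixel_area      # mm^2 per slice
    # Trapezoidal integration of per-slice areas along the z-axis
    return np.trapz(slice_areas, dx=slice_thickness)       # mm^3
```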
- Chest Area:
- Maximum chest area, calculated from the average of the 3 middle-most slices, in the same fashion as Chest Volume
- distplot
- Lung-to-Tissue Ratio:
- Ratio of the pixel area of the segmented lung mask to the total tissue pixel area in the original .dcm file (sketched below)
- The idea behind this feature was to detect lung shrinkage inside the chest
- distplot
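A minimal sketch of the ratio, assuming slices converted to Hounsfield units; the `tissue_threshold` HU cutoff is an assumed value, not necessarily the notebook's exact tissue definition:

```python
import numpy as np

def lung_tissue_ratio(lung_mask, hu_slice, tissue_threshold=-500):
    """Pixel area of the segmented lung mask relative to total tissue area.

    lung_mask        : (H, W) binary lung segmentation mask
    hu_slice         : (H, W) slice in Hounsfield units
    tissue_threshold : HU cutoff separating tissue from air (assumed value)
    """
    tissue_pixels = np.count_nonzero(hu_slice > tissue_threshold)
    lung_pixels = np.count_nonzero(lung_mask)
    return lung_pixels / max(tissue_pixels, 1)   # avoid division by zero
```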
- Chest Height:
- Chest height, calculated using SliceThickness and the number of slices spanning the lung
- distplot
- Height of the Patient:
- Approximate height, calculated from a patient's FVC values and age according to formulas and observations from external medical research data (an illustrative inversion is sketched below)
- distplot
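An illustrative inversion only: this assumes a generic linear spirometric relation `FVC ≈ a * height - b * age - c`, and the coefficients below are hypothetical placeholders rather than the published values the notebook derived from the external research:

```python
def approx_height_cm(fvc_ml, age, a=60.0, b=25.0, c=4000.0):
    """Estimate height by inverting a linear spirometric relation.

    Assumes FVC_ml ~ a * height_cm - b * age - c, with HYPOTHETICAL
    placeholder coefficients (a, b, c) for illustration only.
    """
    return (fvc_ml + b * age + c) / a
```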
Plots of Features vs FVC / Percent
The EffNet training notebook is described below; the custom TensorFlow tabular-data-only model is covered in the [INFERENCE] notebook itself.
- Pre-Processing:
- Handled the varying slice sizes and missing-slice issues
- Stratified 5-fold split based on PatientID (a grouped-split sketch follows this list)
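A minimal sketch of the patient-level split using scikit-learn's GroupKFold; the `Patient` column name matches the competition's tabular data, but the notebook's exact stratification may differ:

```python
from sklearn.model_selection import GroupKFold

def assign_folds(df, n_splits=5, group_col="Patient"):
    """Give every row a fold id such that all records of one patient
    stay in the same fold (grouped split on PatientID)."""
    df = df.copy()
    df["fold"] = -1
    gkf = GroupKFold(n_splits=n_splits)
    for fold, (_, val_idx) in enumerate(gkf.split(df, groups=df[group_col])):
        df.iloc[val_idx, df.columns.get_loc("fold")] = fold
    return df
```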
- Augmentations:
- Albumentations - RandomSizedCrop, Flips, GaussianBlur, CoarseDropout, Rotate (0-90); an illustrative Compose is sketched below
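An illustrative Albumentations pipeline with the listed transforms; all parameter values here are guesses, not the notebook's exact settings:

```python
import albumentations as A

train_aug = A.Compose([
    A.RandomSizedCrop(min_max_height=(384, 512), height=512, width=512),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.GaussianBlur(p=0.3),
    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.3),
    A.Rotate(limit=(0, 90), p=0.5),
])

# Usage: augmented = train_aug(image=image)["image"]
```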
- Configurations:
- Optimizer - NAdam
- LR Scheduler - ReduceLROnPlateau (initial LR = 0.0005, patience = 5, factor = 0.5)
- Model - EfficientNet-B5
- Input Size - 512 × 512 (a minimal setup sketch follows this list)
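A minimal sketch of this setup in tf.keras; only the backbone, optimizer, scheduler, and input size come from the config above, while the regression head and loss are assumptions:

```python
import tensorflow as tf

IMG_SIZE = 512

# EfficientNet-B5 backbone on 512x512 inputs
backbone = tf.keras.applications.EfficientNetB5(
    include_top=False, weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
model = tf.keras.Sequential([backbone, tf.keras.layers.Dense(1)])

model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=5e-4),
              loss="mae")  # loss choice is an assumption

# ReduceLROnPlateau with the listed settings
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5)
# model.fit(train_ds, validation_data=val_ds, callbacks=[reduce_lr])
```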
Also contains the training and inference of the custom tabular-data model.
- Custom Net:
- A tiny network of swish-activated dense layers over the given tabular data and engineered features
- A pinball loss over multiple quantiles was used; the difference between the first and last quantiles served as the uncertainty measure (sketched below)
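A minimal sketch of the multi-quantile pinball loss in TensorFlow; the specific quantile values are assumptions:

```python
import tensorflow as tf

QUANTILES = [0.2, 0.5, 0.8]   # assumed quantiles, not confirmed by the source

def pinball_loss(y_true, y_pred):
    """Pinball (quantile) loss averaged over the quantiles in QUANTILES.

    y_true : (batch, 1) targets
    y_pred : (batch, len(QUANTILES)) one column per quantile
    """
    q = tf.constant(QUANTILES, dtype=tf.float32)
    e = y_true - y_pred                       # broadcasts across columns
    return tf.reduce_mean(tf.maximum(q * e, (q - 1.0) * e))

# Uncertainty measure: spread between the outer quantiles
# sigma = y_pred[:, -1] - y_pred[:, 0]
```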
- Ensemble:
- The final submission was made using an ensemble of the EffNet image model and the custom tabular model (an illustrative blend follows)
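An illustrative blend only; the actual ensemble weights are not stated, so equal weighting is an assumption:

```python
import numpy as np

def blend(effnet_fvc, custom_fvc, w_img=0.5):
    """Weighted average of the two models' FVC predictions."""
    return w_img * np.asarray(effnet_fvc) + (1 - w_img) * np.asarray(custom_fvc)
```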
Just change the directories according to your environment.
Google Colab-deployed versions are available for:
[TRAIN] Effnet
[TRAIN] Base Custom Net