A complete TensorFlow pipeline of training, inference, and feature-extraction notebooks used in the Kaggle competition OSIC Pulmonary Fibrosis Progression (July-Oct 2020)
The data consisted of DICOM (images + metadata) chest CT scans of patients, along with tabular data such as smoking status, age, and Forced Vital Capacity (FVC) values.
A preview of a patient's chest CT slices:
The lung mask segmentation process deployed (the third image is the final mask):
A 3D plot of the stacked 2D segmented masks, forming the lung:
Apart from the DICOM data, the tabular data was as follows:
A brief content description is provided here; for detailed descriptions, check the notebook.
A major task was engineering and extracting features from the .dcm slices.
In total, I engineered 5 features:
- Chest Volume:
- Calculated through numpy.trapz() integration over all 2D slices, using the pixel count and the SliceThickness and PixelSpacing (voxel spacing) metadata in the .dcm files, as sketched below
- Dealt with the inconsistencies in the data; the final distplot produced:
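A minimal sketch of the volume computation, assuming binary mask arrays and per-scan `slice_thickness` / `pixel_spacing` values read from the DICOM headers (the real notebook additionally handles the metadata inconsistencies):

```python
import numpy as np

def chest_volume(masks, slice_thickness, pixel_spacing):
    """Approximate volume in mm^3 from a stack of binary 2D masks.

    masks           : (n_slices, H, W) binary segmentation masks
    slice_thickness : DICOM SliceThickness, in mm
    pixel_spacing   : DICOM PixelSpacing, (row_mm, col_mm)
    """
    pixel_area = pixel_spacing[0] * pixel_spacing[1]       # mm^2 per pixel
    slice_areas = masks.sum(axis=(1, 2)) * pixel_area      # mm^2 per slice
    # Trapezoidal integration of per-slice areas along the z-axis
    return np.trapz(slice_areas, dx=slice_thickness)       # mm^3
```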
- Chest Area:
- Maximum chest area, calculated from the average of the 3 middle-most slices, in the same fashion as Chest Volume
- distplot
- Lung-to-Tissue Ratio:
- Ratio of the pixel area of the segmented lung mask to the total tissue pixel area in the original .dcm file (sketched below)
- The idea behind this feature was to detect lung shrinkage inside the chest
- distplot
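A minimal sketch of the ratio, assuming slices converted to Hounsfield units; the `tissue_threshold` HU cutoff is an assumed value, not necessarily the notebook's exact tissue definition:

```python
import numpy as np

def lung_tissue_ratio(lung_mask, hu_slice, tissue_threshold=-500):
    """Pixel area of the segmented lung mask relative to total tissue area.

    lung_mask        : (H, W) binary lung segmentation mask
    hu_slice         : (H, W) slice in Hounsfield units
    tissue_threshold : HU cutoff separating tissue from air (assumed value)
    """
    tissue_pixels = np.count_nonzero(hu_slice > tissue_threshold)
    lung_pixels = np.count_nonzero(lung_mask)
    return lung_pixels / max(tissue_pixels, 1)   # avoid division by zero
```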
- Chest Height:
- Chest height, calculated using SliceThickness and the number of slices spanning the lung
- distplot
- Height of the Patient:
- Approximate height, calculated from a patient's FVC values and age according to formulas and observations from external medical research data (an illustrative inversion is sketched below)
- distplot
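An illustrative inversion only: this assumes a generic linear spirometric relation `FVC ≈ a * height - b * age - c`, and the coefficients below are hypothetical placeholders rather than the published values the notebook derived from the external research:

```python
def approx_height_cm(fvc_ml, age, a=60.0, b=25.0, c=4000.0):
    """Estimate height by inverting a linear spirometric relation.

    Assumes FVC_ml ~ a * height_cm - b * age - c, with HYPOTHETICAL
    placeholder coefficients (a, b, c) for illustration only.
    """
    return (fvc_ml + b * age + c) / a
```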
Plots of Features vs FVC / Percent
The EffNet training notebook is described below; the custom TensorFlow tabular-data-only model is covered in the [INFERENCE] notebook itself.
- Pre-Processing:
- Handled the varying slice sizes and missing-slice issues
- Stratified 5-fold split based on PatientID (a grouped-split sketch follows this list)
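A minimal sketch of the patient-level split using scikit-learn's GroupKFold; the `Patient` column name matches the competition's tabular data, but the notebook's exact stratification may differ:

```python
from sklearn.model_selection import GroupKFold

def assign_folds(df, n_splits=5, group_col="Patient"):
    """Give every row a fold id such that all records of one patient
    stay in the same fold (grouped split on PatientID)."""
    df = df.copy()
    df["fold"] = -1
    gkf = GroupKFold(n_splits=n_splits)
    for fold, (_, val_idx) in enumerate(gkf.split(df, groups=df[group_col])):
        df.iloc[val_idx, df.columns.get_loc("fold")] = fold
    return df
```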
- Augmentations:
- Albumentations - RandomSizedCrop, Flips, GaussianBlur, CoarseDropout, Rotate (0-90); an illustrative Compose is sketched below
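An illustrative Albumentations pipeline with the listed transforms; all parameter values here are guesses, not the notebook's exact settings:

```python
import albumentations as A

train_aug = A.Compose([
    A.RandomSizedCrop(min_max_height=(384, 512), height=512, width=512),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.GaussianBlur(p=0.3),
    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.3),
    A.Rotate(limit=(0, 90), p=0.5),
])

# Usage: augmented = train_aug(image=image)["image"]
```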
- Configurations:
- Optimizer - NAdam
- LR Scheduler - ReduceLROnPlateau (initial LR = 0.0005, patience = 5, factor = 0.5)
- Model - EfficientNet-B5
- Input Size - 512 × 512 (a minimal setup sketch follows this list)
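A minimal sketch of this setup in tf.keras; only the backbone, optimizer, scheduler, and input size come from the config above, while the regression head and loss are assumptions:

```python
import tensorflow as tf

IMG_SIZE = 512

# EfficientNet-B5 backbone on 512x512 inputs
backbone = tf.keras.applications.EfficientNetB5(
    include_top=False, weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
model = tf.keras.Sequential([backbone, tf.keras.layers.Dense(1)])

model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=5e-4),
              loss="mae")  # loss choice is an assumption

# ReduceLROnPlateau with the listed settings
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5)
# model.fit(train_ds, validation_data=val_ds, callbacks=[reduce_lr])
```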
Also contains the training and inference of the custom tabular-data model.
- Custom Net:
- A tiny network of swish-activated dense layers over the given tabular data and engineered features
- A pinball loss over multiple quantiles was used; the difference between the first and last quantiles served as the uncertainty measure (sketched below)
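A minimal sketch of the multi-quantile pinball loss in TensorFlow; the specific quantile values are assumptions:

```python
import tensorflow as tf

QUANTILES = [0.2, 0.5, 0.8]   # assumed quantiles, not confirmed by the source

def pinball_loss(y_true, y_pred):
    """Pinball (quantile) loss averaged over the quantiles in QUANTILES.

    y_true : (batch, 1) targets
    y_pred : (batch, len(QUANTILES)) one column per quantile
    """
    q = tf.constant(QUANTILES, dtype=tf.float32)
    e = y_true - y_pred                       # broadcasts across columns
    return tf.reduce_mean(tf.maximum(q * e, (q - 1.0) * e))

# Uncertainty measure: spread between the outer quantiles
# sigma = y_pred[:, -1] - y_pred[:, 0]
```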
- Ensemble:
- The final submission was made using an ensemble of the EffNet image model and the custom tabular model (an illustrative blend follows)
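An illustrative blend only; the actual ensemble weights are not stated, so equal weighting is an assumption:

```python
import numpy as np

def blend(effnet_fvc, custom_fvc, w_img=0.5):
    """Weighted average of the two models' FVC predictions."""
    return w_img * np.asarray(effnet_fvc) + (1 - w_img) * np.asarray(custom_fvc)
```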
Just change the directories according to your environment.
Google Colab-deployed versions are available for:
[TRAIN] Effnet
[TRAIN] Base Custom Net