The data is from Kaggle (the original data set is from here). Here we simply use Kaggle's copy for the analysis.
There are two categories in Kaggle's data set: NORMAL and PNEUMONIA. (PNEUMONIA can be further split into virus and bacteria, but for now we only consider NORMAL vs. PNEUMONIA.)

The train folder contains 5,216 jpg files (NORMAL: 1,341; PNEUMONIA: 3,875).
The val folder contains 16 jpg files (NORMAL: 8; PNEUMONIA: 8).
The test folder contains 624 jpg files (NORMAL: 234; PNEUMONIA: 390).
Remark: the train folder is an imbalanced data set for NORMAL vs. PNEUMONIA (about 1:3).
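These counts are easy to verify once the Kaggle archive is unpacked. A small helper, assuming train_dir (and likewise validation_dir and test_dir) are placeholder variables pointing at the three folders:

# Count jpgs per class in one split folder (directory variables are placeholders)
count_images <- function(dir) {
  sapply(c("NORMAL", "PNEUMONIA"), function(cls)
    length(list.files(file.path(dir, cls), pattern = "\\.jpe?g$", ignore.case = TRUE)))
}
count_images(train_dir)  # expected: NORMAL 1341, PNEUMONIA 3875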
Here we use the Keras function image_data_generator() for data augmentation. Below is my generator R code:

library(keras)

# Augmentation settings for the training images
datagen <- image_data_generator(
  rescale = 1/255,          # scale pixel values to [0, 1]
  rotation_range = 5,       # small random rotations (degrees)
  width_shift_range = 0.1,
  height_shift_range = 0.05,
  shear_range = 0.1,
  zoom_range = 0.15,
  horizontal_flip = TRUE,
  vertical_flip = FALSE,    # no vertical flip: X-rays have a fixed up-down orientation
  fill_mode = "reflect"
)
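The later code refers to train_generator, validation_generator and test_generator, which are not shown above. A minimal sketch of how they could be created with flow_images_from_directory(); the batch sizes here are placeholders, and the validation/test images get rescaling only, no augmentation:

training_batch_size   <- 32
validation_batch_size <- 16

train_generator <- flow_images_from_directory(
  train_dir,
  generator   = datagen,                  # augmented images defined above
  target_size = c(299, 299),              # Xception's input size
  batch_size  = training_batch_size,
  class_mode  = "categorical"
)

validation_generator <- flow_images_from_directory(
  validation_dir,
  generator   = image_data_generator(rescale = 1/255),
  target_size = c(299, 299),
  batch_size  = validation_batch_size,
  class_mode  = "categorical"
)

test_generator <- flow_images_from_directory(
  test_dir,
  generator   = image_data_generator(rescale = 1/255),
  target_size = c(299, 299),
  batch_size  = 1,                        # one image per step for prediction
  class_mode  = "categorical",
  shuffle     = FALSE                     # keep file order for the results table
)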
# Load Xception pre-trained on ImageNet, dropping its classification head
conv_base <- application_xception(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(299, 299, 3)
)

# Freeze the whole base first, then unfreeze from block3_sepconv1_act onwards,
# so only the later blocks are fine-tuned
freeze_weights(conv_base)
unfreeze_weights(conv_base, from = "block3_sepconv1_act")
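A quick way to confirm which base layers are now trainable (the layer objects expose $name and $trainable through the R interface):

# List each layer of the base and whether it will be updated during training
data.frame(
  layer     = sapply(conv_base$layers, function(l) l$name),
  trainable = sapply(conv_base$layers, function(l) l$trainable)
)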
# Classification head on top of the Xception base
input_tensor <- layer_input(shape = c(299, 299, 3), name = "input_tensor")
output_tensor <- input_tensor %>%
  conv_base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 1024, activation = "relu", name = "fc1") %>%
  layer_dropout(rate = 0.3, name = "dropout1") %>%
  layer_dense(units = 512, activation = "relu", name = "fc2") %>%
  layer_dropout(rate = 0.3, name = "dropout2") %>%
  layer_dense(units = 2, activation = "softmax", name = "fc3")   # 2 classes: NORMAL / PNEUMONIA
model <- keras_model(input_tensor, output_tensor)
model %>% compile(
  loss = "binary_crossentropy",              # with a 2-unit softmax this equals categorical crossentropy
  optimizer = optimizer_rmsprop(lr = 1e-5),  # small learning rate for fine-tuning
  metrics = c("accuracy")
)
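Aside: with a two-unit softmax and one-hot labels the two probabilities sum to 1, so binary and categorical crossentropy give the same loss. A quick numerical check:

p <- c(0.9, 0.1); y <- c(1, 0)               # predicted probabilities and one-hot label
-sum(y * log(p))                             # categorical crossentropy: 0.1054
-mean(y * log(p) + (1 - y) * log(1 - p))     # binary crossentropy:      0.1054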
# One pass over each data set per epoch
training_step_size   <- ceiling(length(list.files(train_dir,      recursive = TRUE)) / training_batch_size)
validation_step_size <- ceiling(length(list.files(validation_dir, recursive = TRUE)) / validation_batch_size)

# Class-weight ratio = #NORMAL / #PNEUMONIA (about 1341/3875, roughly 0.35)
weight_adjustment <- length(list.files(paste(train_dir, '/NORMAL/',    sep = ""), recursive = TRUE)) /
                     length(list.files(paste(train_dir, '/PNEUMONIA/', sep = ""), recursive = TRUE))
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = training_step_size,
  class_weight = list("0" = 1, "1" = weight_adjustment),  # down-weight the majority PNEUMONIA class (index 1)
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = validation_step_size
)
Below is my training progress; validation accuracy stabilises from the 7th epoch onwards (good: 100% accuracy on the validation set).
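Curves like these can be drawn directly from the fitted history object, using the plot method the keras package provides:

plot(history)  # accuracy and loss, training vs. validation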
preds <- predict_generator(
  model,
  test_generator,
  steps = length(list.files(test_dir, recursive = TRUE))  # one step per image (implies a test batch size of 1)
)
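Column 2 of preds is treated as the PNEUMONIA probability below; the mapping can be double-checked from the generator, which indexes the class folders alphabetically:

test_generator$class_indices  # expect NORMAL = 0, PNEUMONIA = 1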
# Assemble a results table: one row per test image
predictions <- data.frame(test_generator$filenames)
predictions$prob_pneumonia <- preds[, 2]    # column 2 = predicted probability of PNEUMONIA
colnames(predictions) <- c('Filename', 'Prob_Pneumonia')

# Predicted class: threshold the PNEUMONIA probability at 0.5
predictions$Class_predicted <- 'Normal'
predictions$Class_predicted[predictions$Prob_Pneumonia >= 0.5] <- 'Pneumonia'

# Actual class, recovered from the file path
predictions$Class_actual <- 'Normal'
predictions$Class_actual[grep("PNEUMONIA", predictions$Filename)] <- 'Pneumonia'

predictions$Class_predicted <- as.factor(predictions$Class_predicted)
predictions$Class_actual    <- as.factor(predictions$Class_actual)

library(caret)  # for confusionMatrix()
confusionMatrix(predictions$Class_predicted, predictions$Class_actual, positive = 'Pneumonia')
Below is the classification result on the test data set.
Precision (positive predictive value) is 92.57%.
Recall (true positive rate / sensitivity) is 95.90%.
Specificity (true negative rate) is 87.18%.
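As a sanity check, these three rates are mutually consistent with the test-set composition (234 Normal, 390 PNEUMONIA):

# Reconstruct the implied confusion-matrix cells from the reported rates
TP <- round(0.9590 * 390)  # 374 PNEUMONIA correctly flagged
FN <- 390 - TP             #  16 missed
TN <- round(0.8718 * 234)  # 204 NORMAL correctly cleared
FP <- 234 - TN             #  30 false alarms
TP / (TP + FP)             # precision = 374 / 404 = 0.9257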
The above result is comparable with Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning (precision 92.8%, recall 93.2%, specificity 90.1%).
In this example the validation set has only 16 jpgs, which is not enough to tune the model's hyperparameters, but it is a simple example for learning deep learning with R. I hope my code helps someone learn keras in R. There are two things I want to try:
- For PNEUMONIA, there are still two categories (virus and bacteria) that can be classified. Try to classify them and summarise the results; a rough sketch of the change follows below.
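An untested sketch of how the head could change for that three-class variant (NORMAL / bacterial / viral pneumonia); the virus/bacteria labels would have to be derived from the Kaggle filenames, which contain "bacteria" or "virus":

# Hypothetical three-class head on the same Xception base
output_tensor3 <- input_tensor %>%
  conv_base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 1024, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 3, activation = "softmax")  # 3 classes instead of 2
model3 <- keras_model(input_tensor, output_tensor3)
model3 %>% compile(
  loss = "categorical_crossentropy",              # proper multi-class loss
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)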
Finally, we are all standing on the shoulders of giants.

