The data is from Kaggle (the original data set is from here). Here we simply use Kaggle's copy for the analysis.
There are two categories in Kaggle's data set: NORMAL and PNEUMONIA. (PNEUMONIA can be further split into virus and bacteria, but for now we only consider NORMAL vs. PNEUMONIA.)

The train folder contains 5,216 jpg files (NORMAL: 1,341; PNEUMONIA: 3,875).
The val folder contains 16 jpg files (NORMAL: 8; PNEUMONIA: 8).
The test folder contains 624 jpg files (NORMAL: 234; PNEUMONIA: 390).
Remark: the train folder is an imbalanced data set for NORMAL vs. PNEUMONIA (about 1:3).
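These counts are easy to verify once the Kaggle archive is unpacked. A small helper, assuming train_dir (and likewise validation_dir and test_dir) are placeholder variables pointing at the three folders:

# Count jpgs per class in one split folder (directory variables are placeholders)
count_images <- function(dir) {
  sapply(c("NORMAL", "PNEUMONIA"), function(cls)
    length(list.files(file.path(dir, cls), pattern = "\\.jpe?g$", ignore.case = TRUE)))
}
count_images(train_dir)  # expected: NORMAL 1341, PNEUMONIA 3875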
Here we use the Keras function image_data_generator() for data augmentation. Below is my generator R code:

library(keras)

# Augmentation settings for the training images
datagen <- image_data_generator(
  rescale = 1/255,          # scale pixel values to [0, 1]
  rotation_range = 5,       # small random rotations (degrees)
  width_shift_range = 0.1,
  height_shift_range = 0.05,
  shear_range = 0.1,
  zoom_range = 0.15,
  horizontal_flip = TRUE,
  vertical_flip = FALSE,    # no vertical flip: X-rays have a fixed up-down orientation
  fill_mode = "reflect"
)
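The later code refers to train_generator, validation_generator and test_generator, which are not shown above. A minimal sketch of how they could be created with flow_images_from_directory(); the batch sizes here are placeholders, and the validation/test images get rescaling only, no augmentation:

training_batch_size   <- 32
validation_batch_size <- 16

train_generator <- flow_images_from_directory(
  train_dir,
  generator   = datagen,                  # augmented images defined above
  target_size = c(299, 299),              # Xception's input size
  batch_size  = training_batch_size,
  class_mode  = "categorical"
)

validation_generator <- flow_images_from_directory(
  validation_dir,
  generator   = image_data_generator(rescale = 1/255),
  target_size = c(299, 299),
  batch_size  = validation_batch_size,
  class_mode  = "categorical"
)

test_generator <- flow_images_from_directory(
  test_dir,
  generator   = image_data_generator(rescale = 1/255),
  target_size = c(299, 299),
  batch_size  = 1,                        # one image per step for prediction
  class_mode  = "categorical",
  shuffle     = FALSE                     # keep file order for the results table
)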
# Load Xception pre-trained on ImageNet, dropping its classification head
conv_base <- application_xception(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(299, 299, 3)
)

# Freeze the whole base first, then unfreeze from block3_sepconv1_act onwards,
# so only the later blocks are fine-tuned
freeze_weights(conv_base)
unfreeze_weights(conv_base, from = "block3_sepconv1_act")
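A quick way to confirm which base layers are now trainable (the layer objects expose $name and $trainable through the R interface):

# List each layer of the base and whether it will be updated during training
data.frame(
  layer     = sapply(conv_base$layers, function(l) l$name),
  trainable = sapply(conv_base$layers, function(l) l$trainable)
)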
# Classification head on top of the Xception base
input_tensor <- layer_input(shape = c(299, 299, 3), name = "input_tensor")
output_tensor <- input_tensor %>%
  conv_base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 1024, activation = "relu", name = "fc1") %>%
  layer_dropout(rate = 0.3, name = "dropout1") %>%
  layer_dense(units = 512, activation = "relu", name = "fc2") %>%
  layer_dropout(rate = 0.3, name = "dropout2") %>%
  layer_dense(units = 2, activation = "softmax", name = "fc3")   # 2 classes: NORMAL / PNEUMONIA
model <- keras_model(input_tensor, output_tensor)
model %>% compile(
  loss = "binary_crossentropy",              # with a 2-unit softmax this equals categorical crossentropy
  optimizer = optimizer_rmsprop(lr = 1e-5),  # small learning rate for fine-tuning
  metrics = c("accuracy")
)
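Aside: with a two-unit softmax and one-hot labels the two probabilities sum to 1, so binary and categorical crossentropy give the same loss. A quick numerical check:

p <- c(0.9, 0.1); y <- c(1, 0)               # predicted probabilities and one-hot label
-sum(y * log(p))                             # categorical crossentropy: 0.1054
-mean(y * log(p) + (1 - y) * log(1 - p))     # binary crossentropy:      0.1054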
# One pass over each data set per epoch
training_step_size   <- ceiling(length(list.files(train_dir,      recursive = TRUE)) / training_batch_size)
validation_step_size <- ceiling(length(list.files(validation_dir, recursive = TRUE)) / validation_batch_size)

# Class-weight ratio = #NORMAL / #PNEUMONIA (about 1341/3875, roughly 0.35)
weight_adjustment <- length(list.files(paste(train_dir, '/NORMAL/',    sep = ""), recursive = TRUE)) /
                     length(list.files(paste(train_dir, '/PNEUMONIA/', sep = ""), recursive = TRUE))
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = training_step_size,
  class_weight = list("0" = 1, "1" = weight_adjustment),  # down-weight the majority PNEUMONIA class (index 1)
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = validation_step_size
)
Below is my training progress; validation accuracy stabilises from the 7th epoch onwards (good: 100% accuracy on the validation set).
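Curves like these can be drawn directly from the fitted history object, using the plot method the keras package provides:

plot(history)  # accuracy and loss, training vs. validation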
preds <- predict_generator(
  model,
  test_generator,
  steps = length(list.files(test_dir, recursive = TRUE))  # one step per image (implies a test batch size of 1)
)
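Column 2 of preds is treated as the PNEUMONIA probability below; the mapping can be double-checked from the generator, which indexes the class folders alphabetically:

test_generator$class_indices  # expect NORMAL = 0, PNEUMONIA = 1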
# Assemble a results table: one row per test image
predictions <- data.frame(test_generator$filenames)
predictions$prob_pneumonia <- preds[, 2]    # column 2 = predicted probability of PNEUMONIA
colnames(predictions) <- c('Filename', 'Prob_Pneumonia')

# Predicted class: threshold the PNEUMONIA probability at 0.5
predictions$Class_predicted <- 'Normal'
predictions$Class_predicted[predictions$Prob_Pneumonia >= 0.5] <- 'Pneumonia'

# Actual class, recovered from the file path
predictions$Class_actual <- 'Normal'
predictions$Class_actual[grep("PNEUMONIA", predictions$Filename)] <- 'Pneumonia'

predictions$Class_predicted <- as.factor(predictions$Class_predicted)
predictions$Class_actual    <- as.factor(predictions$Class_actual)

library(caret)  # for confusionMatrix()
confusionMatrix(predictions$Class_predicted, predictions$Class_actual, positive = 'Pneumonia')
Below is the classification result on the test data set.
Precision (positive predictive value) is 92.57%.
Recall (true positive rate / sensitivity) is 95.90%.
Specificity (true negative rate) is 87.18%.
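As a sanity check, these three rates are mutually consistent with the test-set composition (234 Normal, 390 PNEUMONIA):

# Reconstruct the implied confusion-matrix cells from the reported rates
TP <- round(0.9590 * 390)  # 374 PNEUMONIA correctly flagged
FN <- 390 - TP             #  16 missed
TN <- round(0.8718 * 234)  # 204 NORMAL correctly cleared
FP <- 234 - TN             #  30 false alarms
TP / (TP + FP)             # precision = 374 / 404 = 0.9257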
The above result is comparable with Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning (precision 92.8%, recall 93.2%, specificity 90.1%).
In this example the validation set has only 16 jpgs, which is not enough to tune the model's hyperparameters, but it is a simple example for learning deep learning with R. I hope my code helps someone learn keras in R. There are two things I want to try:
- For PNEUMONIA, there are still two categories (virus and bacteria) that can be classified. Try to classify them and summarise the results; a rough sketch of the change follows below.
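An untested sketch of how the head could change for that three-class variant (NORMAL / bacterial / viral pneumonia); the virus/bacteria labels would have to be derived from the Kaggle filenames, which contain "bacteria" or "virus":

# Hypothetical three-class head on the same Xception base
output_tensor3 <- input_tensor %>%
  conv_base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 1024, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 3, activation = "softmax")  # 3 classes instead of 2
model3 <- keras_model(input_tensor, output_tensor3)
model3 %>% compile(
  loss = "categorical_crossentropy",              # proper multi-class loss
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)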
Finally, we are all standing on the shoulders of giants.

