
Brain Tumor Detection with VGG16: A Complete Guide to Transfer Learning and Fine-Tuning on MRI Images

How I built a binary classifier that detects brain tumors from MRI scans using only 253 images — and why the two-phase training strategy is the key to making it work.


The Problem: Medical Imaging with a Tiny Dataset

Brain tumor detection is one of the most critical and time-sensitive tasks in modern medicine. An MRI scan that goes unanalyzed for even a few hours can mean the difference between a treatable condition and an irreversible outcome. The natural question becomes: can we train a deep learning model to assist radiologists in this task?

The challenge is immediately obvious. Medical imaging datasets are notoriously small. Acquiring labeled MRI scans requires expert radiologists, ethical approvals, and expensive equipment. In this project, I worked with just 253 images — 98 MRI scans showing no tumor and 155 showing a positive tumor diagnosis. By any standard deep learning benchmark, this is a microscopic dataset.

Training a convolutional neural network from scratch on 253 images would be a recipe for overfitting. The model would simply memorize the training examples rather than learning the underlying patterns that distinguish a tumor from healthy brain tissue. We needed a smarter approach.

That smarter approach is Transfer Learning with Fine-Tuning, and specifically, using VGG16 as our backbone architecture.


Why VGG16? Understanding the Foundation

VGG16 is a convolutional neural network architecture developed by the Visual Geometry Group at Oxford University. It was introduced in 2014 and achieved exceptional results on the ImageNet Large Scale Visual Recognition Challenge, where it learned to classify over 1.2 million images across 1,000 different categories.

The architecture is elegantly simple. VGG16 stacks sixteen layers of learnable parameters: thirteen convolutional layers organized into five "blocks," each block followed by a max-pooling operation, plus three fully connected layers at the top. Every convolutional kernel is a tiny 3×3 filter, and the filter count grows as we go deeper: 64 filters in block 1, 128 in block 2, 256 in block 3, and 512 in blocks 4 and 5.

The crucial insight that makes transfer learning possible is this: the lower layers of VGG16 (blocks 1, 2, 3) have learned to detect universal low-level features — edges, curves, color gradients, and textures. These features are not specific to cats or cars; they are fundamental visual primitives that appear in every image, including MRI scans. The higher layers (blocks 4 and 5) have learned more abstract and domain-specific patterns.

When we load VGG16 with ImageNet weights, we are essentially downloading a "vision system" that has already invested millions of computations learning to see. We don't throw that away — we build on top of it.


The Two-Phase Strategy: Why You Can't Just Fine-Tune from the Start

The approach in this notebook is a two-phase training strategy, and understanding why both phases are necessary is fundamental to understanding the whole project.

Phase 1 — Feature Extraction: Every single layer of VGG16 is frozen, meaning their weights will not change during training. We add a custom "head" on top of VGG16 and train only this head. The base model acts purely as a feature extractor, transforming raw pixel input into a rich 512-dimensional feature vector.

This phase is essential because of a concept called weight shock. If you unfreeze a pre-trained model and immediately hit it with gradients from a randomly initialized classification head, the large error signals will destroy the carefully tuned ImageNet weights before they have a chance to be useful. Phase 1 lets the new head stabilize and produce reasonable, low-magnitude gradients before we touch the base model.

Phase 2 — Fine-Tuning: Once the head has stabilized, we selectively unfreeze the last four layers of VGG16 and retrain with an extremely low learning rate of 1e-5. This allows the block 5 convolutional filters to slowly re-specialize from ImageNet features to the specific visual patterns of brain MRI scans — tumor mass, irregular boundaries, contrast enhancement — while keeping the lower layers' general visual knowledge intact.


Cell 1: Importing Libraries

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
import os

print("TensorFlow Version:", tf.__version__)
print("GPU Available:", len(tf.config.list_physical_devices('GPU')) > 0)

numpy is the foundational numerical computing library. It is used later to expand image dimensions before inference (np.expand_dims) and to randomly select a test image (np.random.choice). pandas is imported alongside it out of habit, but it does not appear in any of the cells below and could safely be dropped.

matplotlib.pyplot is used to plot training/validation curves and to display test images inline in the notebook.

tensorflow is the core deep learning framework. Version 2.19.0 runs in eager execution mode by default, meaning operations are evaluated immediately rather than being deferred into a computational graph — this makes debugging significantly easier.

ImageDataGenerator is Keras's high-level utility for loading images from disk in batches, applying on-the-fly data augmentation, normalizing pixel values, and splitting data into training and validation subsets. It is the central data pipeline tool for this project.

VGG16 imports the architecture with its pre-trained ImageNet weights. Keras ships with several pre-trained architectures under tensorflow.keras.applications. VGG16 is chosen here because its straightforward sequential block structure makes the transfer learning and fine-tuning mechanics especially transparent.

The three layer imports serve distinct roles in the custom head: Dense is a fully connected layer; GlobalAveragePooling2D bridges the convolutional base and the dense layers by averaging each feature map into a single scalar; Dropout is a regularization layer that randomly deactivates neurons during training to prevent overfitting.

Model is the functional API class. Unlike the simpler Sequential API, it lets you graft a new head onto a pre-existing base by explicitly defining input tensors, threading them through layers, and specifying the output.

Adam is the optimizer used in both phases. It maintains per-parameter estimates of the first moment (mean of gradients) and second moment (uncentered variance), using these to compute adaptive step sizes for each parameter independently — making it highly effective for fine-tuning pre-trained networks.

The three callback imports each serve a specific protective function during training, which is covered in detail in Cell 5.

tf.config.list_physical_devices('GPU') returns a list of GPU devices detected by TensorFlow. The Kaggle environment provides two NVIDIA Tesla T4 GPUs with approximately 16 GB of VRAM each, and TensorFlow automatically routes tensor operations to the GPU when available.

Output:

TensorFlow Version: 2.19.0
GPU Available: True

Cell 2: Automated Data Path Discovery

python
base_search_path = '/kaggle/input'
DATA_PATH = ''

found = False
for root, dirs, files in os.walk(base_search_path):
    if 'no' in dirs and 'yes' in dirs:
        DATA_PATH = root
        print(f"✅ Success! Correct Data Path Found: {DATA_PATH}")
        found = True
        break

if not found:
    print("❌ Error: Could not find 'no' and 'yes' folders. Please check the dataset.")
else:
    no_count = len(os.listdir(os.path.join(DATA_PATH, 'no')))
    yes_count = len(os.listdir(os.path.join(DATA_PATH, 'yes')))
    print(f"Tumor Negative (No): {no_count} images")
    print(f"Tumor Positive (Yes): {yes_count} images")

Kaggle mounts all datasets under /kaggle/input, but the actual subdirectory structure beneath that root depends on the dataset name and version. Hardcoding a full path like /kaggle/input/brain-mri-images-for-brain-tumor-detection is fragile — if anyone forks the notebook and adds the dataset under a slightly different name, the hardcoded path immediately breaks.

os.walk() is a generator that yields a 3-tuple for every directory it visits: the path of the current directory (root), a list of subdirectory names (dirs), and a list of filenames (files). It performs a depth-first traversal of the entire directory tree. The condition if 'no' in dirs and 'yes' in dirs detects the directory that contains both class subfolders. As soon as this condition is met, the path is stored and the loop breaks immediately — there is no need to continue searching.
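
The search logic can be exercised outside Kaggle with a throwaway directory tree (the dataset folder name below is made up for illustration):

```python
import os
import tempfile

# Build a temporary tree mimicking the Kaggle layout:
# <root>/some-dataset/{no, yes}
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'some-dataset', 'no'))
os.makedirs(os.path.join(root, 'some-dataset', 'yes'))

found_path = None
for current, dirs, files in os.walk(root):
    if 'no' in dirs and 'yes' in dirs:
        found_path = current  # the directory holding both class folders
        break

print(found_path.endswith('some-dataset'))  # True
```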

os.path.join() constructs a platform-appropriate full path string. os.listdir() returns all items in the given directory, and len() counts them, giving us the class sizes.

Output:

✅ Success! Correct Data Path Found: /kaggle/input/brain-mri-images-for-brain-tumor-detection
Tumor Negative (No): 98 images
Tumor Positive (Yes): 155 images

The dataset has a mild class imbalance: 61.3% positive (tumor) and 38.7% negative, totaling 253 images.


Cell 3: Data Preprocessing & Augmentation

python
IMG_SIZE = (224, 224)
BATCH_SIZE = 16

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    validation_split=0.2
)

train_generator = train_datagen.flow_from_directory(
    DATA_PATH,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    classes=['no', 'yes'],
    subset='training',
    shuffle=True
)

validation_generator = train_datagen.flow_from_directory(
    DATA_PATH,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    classes=['no', 'yes'],
    subset='validation',
    shuffle=False
)

IMG_SIZE = (224, 224) is not arbitrary — it is a hard requirement of VGG16. The architecture was designed and trained on 224×224 images, and after five max-pooling operations that each halve the spatial dimensions, a 224×224 input becomes a 7×7 feature map at the final pooling layer. Using a different size would break the weight compatibility.
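
The arithmetic behind that 7×7 output is easy to verify: five halvings of 224 land exactly on 7.

```python
size = 224
for _ in range(5):      # VGG16's five max-pooling layers each halve H and W
    size //= 2
print(size)  # 7: the side length of the final 7x7 feature map
```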

BATCH_SIZE = 16 is deliberately small for this dataset. With only 203 training images, a larger batch size would mean fewer gradient updates per epoch and slower convergence. Smaller batches also introduce stochasticity into the training process, which acts as an implicit regularizer.

rescale=1./255 maps every pixel value from the integer range [0, 255] to the continuous range [0.0, 1.0]. This normalization is critical: neural network weight initialization schemes are calibrated for inputs near zero, and large input magnitudes cause large activations, large gradients, and unstable training.

rotation_range=15 randomly rotates images by up to ±15 degrees. For MRI scans, patients' heads may not be perfectly aligned in the scanner, so this range teaches the model to recognize tumors regardless of slight head tilt. A range larger than 15 degrees would risk rotating the image so much that anatomical orientation becomes meaningless.

width_shift_range=0.1 and height_shift_range=0.1 randomly shift images horizontally or vertically by up to 10% of the image dimension (up to ±22 pixels on a 224×224 image), simulating variability in how the brain is centered within the scan frame.

shear_range=0.1 applies a shear transformation — a distortion where the image is slanted along one axis while the other remains fixed — introducing mild geometric variation that helps the model become more invariant to perspective-like deformations.

zoom_range=0.1 randomly zooms images by up to 10% (scaling factor between 0.9 and 1.1), simulating variability in scanner distance or brain size relative to the image frame.

horizontal_flip=True flips images left-to-right with a probability of 0.5. This is acceptable for binary tumor presence/absence classification. If the task were tumor laterality classification (left vs. right hemisphere), flipping would be actively harmful and should be disabled.

fill_mode='nearest' determines how empty border areas are filled after rotation or shifting. The nearest strategy copies the value of the closest edge pixel outward, producing natural-looking borders without the artificial black bars that constant fill would introduce.

validation_split=0.2 reserves 20% of images for validation. Keras applies this deterministically based on file ordering: the last 20% of files in each class become validation data. With 253 total images: 203 for training and 50 for validation. One caveat of this single-generator setup: because both subsets come from the same augmenting ImageDataGenerator, the random transforms are applied to the validation images as well. A stricter pipeline would use a second generator with only rescale=1./255 for the validation subset.

In flow_from_directory, the classes=['no', 'yes'] parameter is subtle but critical. It explicitly defines both which subfolders to load and their integer encoding: no → 0, yes → 1. Without this, Keras would sort directory names alphabetically, which happens to give the same result here — but explicitly specifying it makes the encoding unambiguous and immune to filesystem quirks.

class_mode='binary' returns labels as 0 or 1 (float32 scalars), which is appropriate for binary cross-entropy loss paired with sigmoid output.

shuffle=True for training randomizes batch ordering, preventing the model from learning spurious patterns based on data order. shuffle=False for validation keeps the set in a consistent order, making metrics reproducible and enabling correct per-sample alignment if needed later.

Output:

Found 203 images belonging to 2 classes.
Found 50 images belonging to 2 classes.

Cell 4: Building the Model

python
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

base_model.trainable = False

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=predictions)

model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy')]
)

model.summary()

VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)) loads the architecture and downloads the pre-trained weights file (approximately 58.9 MB). The include_top=False argument discards VGG16's original three fully connected layers that were designed to classify ImageNet's 1,000 categories. Those layers are useless — and actively harmful — for our binary brain tumor task.

base_model.trainable = False is the key Phase 1 operation. It marks all 14,714,688 VGG16 parameters as non-trainable. Keras will not compute gradients for these parameters and will not update them during model.fit(). Of the model's total 14,846,273 parameters, only the 131,585 in the custom head will be trained — less than 1% of the total.

x = base_model.output retrieves the symbolic tensor representing the output of VGG16's final layer, block5_pool (MaxPooling2D), which produces a 7×7×512 tensor. In Keras's functional API, every layer call returns a symbolic tensor that can be passed to the next layer.

GlobalAveragePooling2D()(x) collapses the 7×7×512 tensor into a 512-dimensional vector by computing the spatial mean of each channel — for each of the 512 feature maps, it averages the 49 values across the 7×7 grid into a single scalar. Compared to the alternative Flatten(), which would produce a 25,088-dimensional vector requiring 6.4 million parameters in the next Dense layer, GlobalAveragePooling reduces the head to 131,328 parameters. It also acts as a natural regularizer by discarding spatial position information and retaining only whether each feature is activated, not where.
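
The shape and parameter arithmetic can be checked with plain NumPy, with random values standing in for real activations:

```python
import numpy as np

feature_map = np.random.rand(7, 7, 512)   # block5_pool output for one image

gap = feature_map.mean(axis=(0, 1))       # what GlobalAveragePooling2D computes
flat = feature_map.reshape(-1)            # what Flatten would produce

print(gap.shape)   # (512,)
print(flat.shape)  # (25088,)

# Parameter count of a Dense(256) layer attached to each (weights + biases):
print(gap.size * 256 + 256)    # 131328
print(flat.size * 256 + 256)   # 6422784
```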

Dense(256, activation='relu')(x) is a fully connected layer with 256 neurons and ReLU activation: f(x) = max(0, x). ReLU introduces the non-linearity that allows the network to learn complex decision boundaries. It avoids the vanishing gradient problem of older activations like sigmoid and tanh in deep networks because when an input is positive, the gradient is simply 1 — it flows back unchanged. This layer has 512 × 256 + 256 = 131,328 parameters.

Dropout(0.5)(x) randomly sets 50% of the 256 neurons to zero during each training forward pass. The remaining neurons' outputs are scaled up by 2.0 to maintain the expected output magnitude. This forces the network to learn redundant, distributed representations — no single neuron can be relied upon. During inference, Dropout is automatically deactivated. With a 203-image training set, overfitting is the primary risk, and a dropout rate of 0.5 is an aggressive but appropriate defense.

Dense(1, activation='sigmoid')(x) is the output layer. The sigmoid function squashes any real-valued input to (0, 1): σ(x) = 1 / (1 + e^(-x)). This output is interpreted as the probability that the input image belongs to class 1 — tumor detected. The layer has 256 × 1 + 1 = 257 parameters.
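
Both activations used in the head are one-liners; a quick sketch of their behavior:

```python
import math

def relu(x):
    # f(x) = max(0, x): gradient is exactly 1 for any positive input
    return max(0.0, x)

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-3.0), relu(2.5))   # 0.0 2.5
print(sigmoid(0.0))            # 0.5, the decision boundary
print(round(sigmoid(4.0), 3))  # 0.982, a strongly "tumor" output
```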

Model(inputs=base_model.input, outputs=predictions) assembles the final model by declaring the entry point and exit point. Keras traces the computational graph between these two tensors and registers all intermediate layers.

Adam(learning_rate=0.0001) uses a learning rate of 1e-4 for Phase 1: large enough for the randomly initialized head to learn meaningfully within 20 epochs, yet small enough to keep its updates stable while it converges. (The frozen base receives no weight updates at all in this phase, so stable convergence of the head itself is the only concern.)

loss='binary_crossentropy' is the appropriate loss for binary classification with sigmoid output. For a single prediction p and true label y ∈ {0, 1}: loss = -[y * log(p) + (1-y) * log(1-p)]. This loss heavily penalizes confident wrong predictions, pushing the model toward confident correct ones.
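
A few hand-computed values show how sharply this loss punishes confident mistakes:

```python
import math

def bce(y, p):
    # binary cross-entropy for a single prediction p and true label y
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label y = 1 (tumor present):
print(round(bce(1, 0.95), 3))  # 0.051, confident and correct: tiny loss
print(round(bce(1, 0.55), 3))  # 0.598, unsure: moderate loss
print(round(bce(1, 0.05), 3))  # 2.996, confident and wrong: huge loss
```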

Parameter summary after Phase 1 setup:

Total params: 14,846,273 — Trainable: 131,585 — Non-trainable: 14,714,688
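
The bookkeeping is quick to verify by hand:

```python
# Verify the parameter counts reported by model.summary().
vgg16_base = 14_714_688            # conv layers, all frozen in Phase 1
dense_256  = 512 * 256 + 256       # weights + biases = 131,328
dense_out  = 256 * 1 + 1           # = 257
head       = dense_256 + dense_out # the only trainable params in Phase 1

print(head)               # 131585
print(vgg16_base + head)  # 14846273
```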


Cell 5: Defining Callbacks

python
checkpoint = ModelCheckpoint(
    'best_brain_tumor_model.keras',
    monitor='val_accuracy',
    save_best_only=True,
    mode='max',
    verbose=1
)

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=3,
    min_lr=1e-6
)

ModelCheckpoint monitors validation accuracy after every epoch and saves the model to disk only when a new maximum is achieved (save_best_only=True, mode='max'). The .keras extension uses TensorFlow's native format, storing architecture, weights, optimizer state, and training config in a single file. Without this callback, the best-performing weights would be lost if the model later overfits.

EarlyStopping monitors validation loss with patience=5. If validation loss does not improve for five consecutive epochs, training is halted and the weights from the best epoch are restored (restore_best_weights=True). Validation loss is preferred over accuracy for this role because it is continuous and more sensitive — accuracy is a coarser, discrete metric that can plateau while the model is still genuinely improving in calibration.

ReduceLROnPlateau monitors validation loss with patience=3 and factor=0.2. If validation loss stagnates for three epochs, the learning rate is multiplied by 0.2. A learning rate of 1e-4 becomes 2e-5 on the first reduction, then 4e-6, then floored at min_lr=1e-6. The key interaction between callbacks: ReduceLROnPlateau fires first (after 3 stagnant epochs), giving the model a chance to recover at a smaller step size. Only if that recovery also fails for 5 total stagnant epochs does EarlyStopping terminate training.
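
The reduction schedule can be simulated in a few lines:

```python
lr, factor, min_lr = 1e-4, 0.2, 1e-6

schedule = [lr]
for _ in range(3):                 # three plateau-triggered reductions
    lr = max(lr * factor, min_lr)  # ReduceLROnPlateau never goes below min_lr
    schedule.append(lr)

print([f"{x:.0e}" for x in schedule])  # ['1e-04', '2e-05', '4e-06', '1e-06']
```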


Cell 6: Phase 1 Training — Feature Extraction

python
history_phase1 = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    validation_data=validation_generator,
    validation_steps=len(validation_generator),
    epochs=20,
    callbacks=[checkpoint, early_stop, reduce_lr]
)

steps_per_epoch=len(train_generator) evaluates to ceil(203 / 16) = 13. This tells Keras that one epoch consists of 13 generator steps — approximately 203 images. After 13 steps, the epoch ends and validation begins.

validation_steps=len(validation_generator) evaluates to ceil(50 / 16) = 4, covering all 50 validation images.
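
Both step counts follow from the same ceiling division:

```python
import math

train_images, val_images, batch = 203, 50, 16

print(math.ceil(train_images / batch))  # 13 steps per training epoch
print(math.ceil(val_images / batch))    # 4 validation steps (3 full + 1 partial batch)
```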

During Phase 1, the VGG16 base acts as a fixed feature extractor. For each input image, it produces a 512-dimensional feature vector representing everything VGG16 "sees" in that image through the lens of its ImageNet-trained filters. The Adam optimizer updates only the 131,585 custom head parameters, learning which combinations of those 512 features reliably indicate tumor presence. Validation accuracy typically climbs from around 62% in epoch 1 to the high 70s and low 80s by the time early stopping triggers.


Cell 7: Phase 2 Setup — Fine-Tuning

python
base_model.trainable = True

for layer in base_model.layers[:-4]:
    layer.trainable = False

model.compile(
    optimizer=Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy')]
)

base_model.trainable = True re-enables gradient computation for all VGG16 layers. We then immediately re-freeze the majority of them with the for loop.

base_model.layers[:-4] is Python slice notation for "all elements except the last 4." For VGG16, the last four layers are block5_conv1, block5_conv2, block5_conv3, and block5_pool. Everything from input_layer through block4_pool is frozen again. Only block 5's three convolutional layers remain trainable alongside the custom head.

Why block 5 specifically? VGG16's feature hierarchy progresses from universal to domain-specific as we go deeper. Blocks 1 and 2 detect edges and simple textures — completely universal, should never be touched. Block 3 detects textures and patterns — still broadly useful. Block 4 detects more structured shapes — beneficial to keep frozen given our dataset size. Block 5 detects complex, high-level, domain-specific patterns. In the ImageNet context, these might encode "fur texture" or "wheel shapes." For MRI brain tumor detection, we want to replace these with representations for "tumor mass characteristics" and "irregular contrast boundaries." These are the layers worth the cost of fine-tuning.
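
The slicing behavior is easy to illustrate without TensorFlow, using a stand-in list for the tail of base_model.layers (the names follow Keras's VGG16 layer naming):

```python
# Stand-in for the tail of base_model.layers.
layers = ['block4_conv3', 'block4_pool',
          'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']

trainable = {name: True for name in layers}  # base_model.trainable = True
for name in layers[:-4]:                     # everything except the last 4...
    trainable[name] = False                  # ...is re-frozen

print([n for n, t in trainable.items() if t])
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```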

model.compile(optimizer=Adam(learning_rate=1e-5), ...) — this recompilation is mandatory. After changing trainable attributes, Keras must rebuild its internal training graph. Without recompilation, the trainability changes have no effect. The learning rate drops to 1e-5 — ten times smaller than Phase 1. The block 5 weights are already in a good region of the loss landscape thanks to ImageNet training. A large learning rate would catastrophically overshoot and destroy this good initialization. With 1e-5, each update is a tiny perturbation that gradually nudges the weights toward better MRI representations without catastrophic forgetting of their prior visual knowledge.


Cell 8: Phase 2 Training — Executing Fine-Tuning

python
history_fine = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    validation_data=validation_generator,
    validation_steps=len(validation_generator),
    epochs=15,
    callbacks=[checkpoint, early_stop, reduce_lr]
)

Structurally identical to Phase 1, with epochs=15 since fine-tuning is refinement rather than learning from scratch. The characteristic behavior is that validation accuracy — which may have plateaued around 80–85% at the end of Phase 1 — climbs further during fine-tuning, typically reaching 85–92% on this dataset, as the block 5 filters gradually specialize for MRI-specific patterns.

The same three callbacks remain active. The best checkpoint continues to be updated if validation accuracy improves. Early stopping will terminate training if the model stops benefiting from fine-tuning.


Cell 9: Visualizing Training Performance

python
acc = history_fine.history['accuracy']
val_acc = history_fine.history['val_accuracy']
loss = history_fine.history['loss']
val_loss = history_fine.history['val_loss']

plt.figure(figsize=(14, 5))

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Fine-Tuning Accuracy')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Fine-Tuning Loss')
plt.legend()
plt.grid(True)

plt.show()

history_fine.history is a dictionary returned by model.fit() where each key is a metric name and the value is a list of that metric's value at each epoch. The four keys here are 'accuracy', 'val_accuracy', 'loss', and 'val_loss'.

plt.figure(figsize=(14, 5)) creates a figure 14 inches wide and 5 inches tall. plt.subplot(1, 2, 1) and plt.subplot(1, 2, 2) divide this figure into a 1-row, 2-column grid, placing the accuracy plot on the left and the loss plot on the right.

A healthy training curve shows training and validation metrics converging and moving together. Divergence — training accuracy rising while validation accuracy stagnates or falls — is the classic signature of overfitting. Given the small dataset and the Dropout regularization, some gap between training and validation is expected but should remain moderate.

Note that only the Phase 2 history is plotted here. For a comprehensive view of the entire training process, one could concatenate both history_phase1.history and history_fine.history lists before plotting.
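
Concatenation is a one-liner per metric; here with made-up per-epoch numbers standing in for the real histories:

```python
# Hypothetical values standing in for history_phase1.history / history_fine.history.
history_phase1 = {'val_accuracy': [0.62, 0.71, 0.78]}
history_fine   = {'val_accuracy': [0.80, 0.84, 0.88]}

full_curve = {
    key: history_phase1[key] + history_fine[key]
    for key in history_phase1
}
print(full_curve['val_accuracy'])  # one continuous 6-epoch curve ready for plotting
```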


Cell 10: Inference on a Test Image

python
from tensorflow.keras.preprocessing import image

test_folder = os.path.join(DATA_PATH, 'yes')
random_img = np.random.choice(os.listdir(test_folder))
img_path = os.path.join(test_folder, random_img)

img = image.load_img(img_path, target_size=IMG_SIZE)
plt.imshow(img)
plt.axis('off')
plt.title("Test Image (Actual: Tumor)")
plt.show()

img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array /= 255.

prediction = model.predict(img_array)
score = prediction[0][0]
print(f"Raw Score: {score:.4f}")

if score > 0.5:
    print(f"🚨 Prediction: TUMOR DETECTED (Confidence: {score*100:.2f}%)")
else:
    print(f"✅ Prediction: NO TUMOR (Confidence: {(1-score)*100:.2f}%)")

image.load_img(img_path, target_size=IMG_SIZE) loads the image from disk and resizes it to 224×224 using the PIL library internally. The result is a PIL Image object.

image.img_to_array(img) converts it to a NumPy array of shape (224, 224, 3) with dtype float32. Pixel values are still in [0, 255] at this point.

np.expand_dims(img_array, axis=0) inserts a new dimension at position 0, changing the shape from (224, 224, 3) to (1, 224, 224, 3). Keras models always expect batches. A single image must be wrapped in a batch dimension of size 1 before being passed to model.predict().
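
The shape change is a two-line NumPy check:

```python
import numpy as np

img_array = np.zeros((224, 224, 3), dtype=np.float32)  # a single image
batched = np.expand_dims(img_array, axis=0)            # wrap in a batch of size 1

print(img_array.shape)  # (224, 224, 3)
print(batched.shape)    # (1, 224, 224, 3)
```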

img_array /= 255. is an in-place division that normalizes pixel values from [0, 255] to [0.0, 1.0]. This step must exactly match the rescale=1./255 applied during training. If inference preprocessing differs from training preprocessing, the model sees inputs from a completely different distribution than it was trained on, and predictions become unreliable.

prediction = model.predict(img_array) runs the full forward pass and returns a NumPy array of shape (1, 1). score = prediction[0][0] extracts the single float probability value.

The decision threshold of 0.5 is the natural midpoint of the sigmoid output range. Values above 0.5 correspond to class 1 (tumor detected), and values at or below 0.5 correspond to class 0 (no tumor). In clinical deployment, this threshold could be adjusted: a lower threshold (e.g., 0.3) would increase sensitivity at the cost of more false positives, which is often the clinically preferable trade-off for cancer screening.
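
A toy example with hypothetical scores and labels shows the trade-off:

```python
# Hypothetical sigmoid scores for six scans whose true labels are known.
scores = [0.92, 0.56, 0.41, 0.35, 0.18, 0.07]
labels = [1,    1,    1,    0,    0,    0]     # 1 = tumor present

def flagged(threshold):
    # which scans the model would flag as "tumor" at this threshold
    return [int(s > threshold) for s in scores]

print(flagged(0.5))  # [1, 1, 0, 0, 0, 0] -> misses the 0.41 tumor
print(flagged(0.3))  # [1, 1, 1, 1, 0, 0] -> catches it, at the cost of one false positive
```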

When tumor is detected, confidence is displayed as score * 100. When no tumor is detected, it is displayed as (1 - score) * 100.

Output:

Raw Score: 0.5623
🚨 Prediction: TUMOR DETECTED (Confidence: 56.23%)

A score of 0.5623 on a known positive image is a correct prediction, but the modest confidence suggests this particular scan has characteristics that place it near the model's decision boundary — perhaps an ambiguous MRI slice or a small tumor.


Cell 11: Saving the Model

python
model.save('final_brain_tumor_vgg16_finetuned.keras')
print("Model saved successfully as 'final_brain_tumor_vgg16_finetuned.keras'")

model.save() with the .keras extension uses TensorFlow 2.x's native format. It stores the complete model in a single file: the architecture (layer types, configurations, and connections), all weight values (all 14.8 million parameters), the compile configuration (optimizer type, loss function, metrics), and the optimizer's internal state (Adam's moment estimates, allowing training to resume exactly where it left off).

Two model files are produced by the full notebook: best_brain_tumor_model.keras saved by ModelCheckpoint at peak validation accuracy, and final_brain_tumor_vgg16_finetuned.keras saved here at the end. Because EarlyStopping(restore_best_weights=True) is active, the two will usually be close, but not necessarily identical: EarlyStopping restores the weights from the epoch with the best validation loss, while ModelCheckpoint saves at the epoch with the best validation accuracy, and those can be different epochs. best_brain_tumor_model.keras remains the safer choice for deployment, since it is explicitly saved at the verified peak-accuracy epoch.


Architectural Overview

Input: (batch_size, 224, 224, 3) — RGB MRI image

VGG16 Base — 14,714,688 params, frozen in Phase 1, block 5 unfrozen in Phase 2: Block 1 → Block 2 → Block 3 → Block 4 → Block 5 → output: 7×7×512

Custom Head — 131,585 params, always trainable: GlobalAveragePooling2D → 512 → Dense(256, relu) → 256 → Dropout(0.5) → Dense(1, sigmoid) → probability

Decision rule: score > 0.5 → Tumor Detected. score ≤ 0.5 → No Tumor.


Key Takeaways

The core lesson of this project is that you don't need a massive dataset to build a meaningful medical image classifier — you need the right strategy. Transfer learning bridges the gap between what a model already knows about the visual world and what it needs to learn about your specific domain.

The two-phase approach is not optional; it is architecturally necessary. Phase 1 stabilizes the new classification head. Phase 2 carefully adapts the upper layers of the pre-trained backbone. Skipping Phase 1 risks gradient shock that destroys the transferred knowledge. Skipping Phase 2 leaves performance on the table by not adapting the highest-level features to your domain.

Data augmentation, when applied conservatively and thoughtfully for medical images, is a powerful way to extract more learning signal from limited training data. And the three-callback system — checkpoint, early stopping, and learning rate reduction — forms a robust training harness that protects against overfitting, wasted computation, and suboptimal convergence simultaneously.

The full notebook is available on Kaggle. If this walkthrough helped clarify the transfer learning pipeline for you, an upvote on the notebook goes a long way — it helps others in the community discover it.


Tags: Deep Learning, Computer Vision, Transfer Learning, Medical Imaging, TensorFlow, Keras, VGG16, Brain Tumor Detection, Python
