Integration tutorials
- 1: PyTorch
- 2: PyTorch Lightning
- 3: Hugging Face
- 4: TensorFlow
- 5: TensorFlow Sweeps
- 6: 3D brain tumor segmentation with MONAI
- 7: Keras
- 8: Keras models
- 9: Keras tables
- 10: XGBoost Sweeps
1 - PyTorch
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration.
 
What this notebook covers
We show you how to integrate W&B with your PyTorch code to add experiment tracking to your pipeline.
 
# import the library
import wandb
# start a new experiment
with wandb.init(project="new-sota-model") as run:
 
    # capture a dictionary of hyperparameters with config
    run.config.update({"learning_rate": 0.001, "epochs": 100, "batch_size": 128})
    # set up model and data
    model, dataloader = get_model(), get_data()
    # optional: track gradients
    run.watch(model)
    for batch in dataloader:
        metrics = model.training_step()
        # log metrics inside your training loop to visualize model performance
        run.log(metrics)
    # optional: save model at the end
    model.to_onnx()
    run.save("model.onnx")
Follow along with a video tutorial.
Note: Sections starting with Step are all you need to integrate W&B in an existing pipeline. The rest just loads data and defines a model.
Install, import, and log in
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.auto import tqdm
# Ensure deterministic behavior
# (str hashes are randomized per Python process, so use a fixed integer seed)
SEED = 42
torch.backends.cudnn.deterministic = True
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]
Step 0: Install W&B
To get started, we’ll need to get the library.
wandb is easily installed using pip.
!pip install wandb onnx -Uq
Step 1: Import W&B and Login
In order to log data to our web service, you’ll need to log in.
If this is your first time using W&B, you’ll need to sign up for a free account at the link that appears.
import wandb
wandb.login()
Define the Experiment and Pipeline
Track metadata and hyperparameters with wandb.init
Programmatically, the first thing we do is define our experiment: what are the hyperparameters? what metadata is associated with this run?
It’s a pretty common workflow to store this information in a config dictionary
(or similar object)
and then access it as needed.
For this example, we’re only letting a few hyperparameters vary
and hand-coding the rest.
But any part of your model can be part of the config.
We also include some metadata: we’re using the MNIST dataset and a convolutional architecture. If we later work with, say, fully connected architectures on CIFAR in the same project, this will help us separate our runs.
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN")
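The pipeline reads these values through attribute access (config.batch_size, config.kernels), which is how run.config exposes them. As a minimal sketch for trying a function like make offline, without starting a run, the same dictionary can be wrapped in a stand-in object. Here types.SimpleNamespace is an assumption of this sketch, not part of W&B, and demo_config is a hypothetical name.

```python
from types import SimpleNamespace

# Same hyperparameters and metadata as the config dictionary above.
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN",
)

# Hypothetical stand-in for run.config: exposes each key as an attribute.
demo_config = SimpleNamespace(**config)

print(demo_config.batch_size)   # attribute access, as used in make()
print(demo_config.kernels[-1])  # list values are preserved
```

Inside a real run, prefer run.config itself, so that the values that are logged are the values that are used.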
Now, let’s define the overall pipeline, which is pretty typical for model-training:
- we first make a model, plus associated data and optimizer, then
- we train the model accordingly and finally
- test it to see how training went.
We’ll implement these functions below.
def model_pipeline(hyperparameters):
    # tell wandb to get started
    with wandb.init(project="pytorch-demo", config=hyperparameters) as run:
        # access all HPs through run.config, so logging matches execution.
        config = run.config
        # make the model, data, and optimization problem
        model, train_loader, test_loader, criterion, optimizer = make(config)
        print(model)
        # and use them to train the model
        train(model, train_loader, criterion, optimizer, config)
        # and test its final performance
        test(model, test_loader)
    return model
The only difference here from a standard pipeline
is that it all occurs inside the context of wandb.init.
Calling this function sets up a line of communication
between your code and our servers.
Passing the config dictionary to wandb.init
immediately logs all that information to us,
so you’ll always know what hyperparameter values
you set your experiment to use.
To ensure the values you chose and logged are always the ones that get used
in your model, we recommend using the run.config copy of your object.
Check the definition of make below to see some examples.
Side Note: We take care to run our code in separate processes, so that any issues on our end (such as if a giant sea monster attacks our data centers) don’t crash your code. Once the issue is resolved, such as when the Kraken returns to the deep, you can log the data with wandb sync.
def make(config):
    # Make the data
    train, test = get_data(train=True), get_data(train=False)
    train_loader = make_loader(train, batch_size=config.batch_size)
    test_loader = make_loader(test, batch_size=config.batch_size)
    # Make the model
    model = ConvNet(config.kernels, config.classes).to(device)
    # Make the loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(), lr=config.learning_rate)
    
    return model, train_loader, test_loader, criterion, optimizer
Define the Data Loading and Model
Now, we need to specify how the data is loaded and what the model looks like.
This part is very important, but it’s
no different from what it would be without wandb,
so we won’t dwell on it.
def get_data(slice=5, train=True):
    full_dataset = torchvision.datasets.MNIST(root=".",
                                              train=train, 
                                              transform=transforms.ToTensor(),
                                              download=True)
    #  equiv to slicing with [::slice] 
    sub_dataset = torch.utils.data.Subset(
      full_dataset, indices=range(0, len(full_dataset), slice))
    
    return sub_dataset
def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size, 
                                         shuffle=True,
                                         pin_memory=True, num_workers=2)
    return loader
Defining the model is normally the fun part.
But nothing changes with wandb,
so we’re gonna stick with a standard ConvNet architecture.
Don’t be afraid to mess around with this and try some experiments – all your results will be logged on wandb.ai.
# Conventional and convolutional neural network
class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
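The 7 * 7 factor in the final linear layer follows from the spatial size of the feature map after both blocks. As a rough check, assuming 28 x 28 MNIST inputs and the standard output-size formula for convolution and pooling layers, floor((n + 2 * padding - kernel) / stride) + 1, the arithmetic can be sketched as follows. The helper out_size is a hypothetical name used only here.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


size = 28                                             # MNIST images are 28 x 28
size = out_size(size, kernel=5, stride=1, padding=2)  # layer1 conv keeps 28
size = out_size(size, kernel=2, stride=2)             # layer1 pool halves to 14
size = out_size(size, kernel=5, stride=1, padding=2)  # layer2 conv keeps 14
size = out_size(size, kernel=2, stride=2)             # layer2 pool halves to 7

kernels = [16, 32]
fc_inputs = size * size * kernels[-1]  # input features of the linear layer
print(size, fc_inputs)
```

If you change the kernel size, padding, or input resolution, rerun this arithmetic before editing the nn.Linear input size.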
Define Training Logic
Moving on in our model_pipeline, it’s time to specify how we train.
Two wandb functions come into play here: watch and log.
Track gradients with run.watch() and everything else with run.log()
run.watch will log the gradients and the parameters of your model,
every log_freq steps of training.
All you need to do is call it before you start training.
The rest of the training code remains the same:
we iterate over epochs and batches,
running forward and backward passes
and applying our optimizer.
def train(model, loader, criterion, optimizer, config):
    # Tell wandb to watch what the model gets up to: gradients, weights, and more.
    # Reuse the run that model_pipeline opened with wandb.init.
    run = wandb.run
    run.watch(model, criterion, log="all", log_freq=10)
    # Run training and track with wandb
    total_batches = len(loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
        for _, (images, labels) in enumerate(loader):
            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct +=  len(images)
            batch_ct += 1
            # Report metrics every 25th batch
            if batch_ct % 25 == 0:
                train_log(loss, example_ct, epoch)
def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)
    
    # Forward pass ➡
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward pass ⬅
    optimizer.zero_grad()
    loss.backward()
    # Step with optimizer
    optimizer.step()
    return loss
The only difference is in the logging code:
where previously you might have reported metrics by printing to the terminal,
now you pass the same information to run.log().
run.log() expects a dictionary with strings as keys.
These strings identify the objects being logged, which make up the values.
You can also optionally log which step of training you’re on.
Side Note: I like to use the number of examples the model has seen, since this makes for easier comparison across batch sizes, but you can use raw steps or batch count. For longer training runs, it can also make sense to log by epoch.
def train_log(loss, example_ct, epoch):
    # Log to the run that model_pipeline opened with wandb.init
    run = wandb.run
    # This is where we log the metrics to W&B
    run.log({"epoch": epoch, "loss": loss}, step=example_ct)
    print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
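To see why the number of examples makes a convenient step axis, here is a small sketch in plain Python, with no W&B calls. It assumes 12,000 training examples (every fifth MNIST training image, as in get_data) and logging on every 25th batch. The helper logged_steps is a hypothetical name.

```python
def logged_steps(num_examples, batch_size, every=25):
    """Return (batch_count, examples_seen) pairs at which metrics are logged."""
    steps = []
    examples_seen = 0
    batch_ct = 0
    remaining = num_examples
    while remaining > 0:
        current = min(batch_size, remaining)  # the last batch may be smaller
        remaining -= current
        examples_seen += current
        batch_ct += 1
        if batch_ct % every == 0:
            steps.append((batch_ct, examples_seen))
    return steps


print(logged_steps(12_000, batch_size=128))
print(logged_steps(12_000, batch_size=64)[:3])
```

The batch counts at each logging point are the same in both settings, but they correspond to different amounts of data. A given examples_seen value, such as 3200, means the same amount of data regardless of batch size.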
Define Testing Logic
Once the model is done training, we want to test it: run it against some fresh data from production, perhaps, or apply it to some hand-curated examples.
(Optional) Call run.save()
This is also a great time to save the model’s architecture
and final parameters to disk.
For maximum compatibility, we’ll export our model in the
Open Neural Network eXchange (ONNX) format.
Passing that filename to run.save() ensures that the model parameters
are saved to W&B’s servers: no more losing track of which .h5 or .pb
corresponds to which training runs.
For more advanced wandb features for storing, versioning, and distributing
models, check out our Artifacts tools.
def test(model, test_loader):
    model.eval()
    # Log to the run that model_pipeline opened with wandb.init
    run = wandb.run
    # Run the model on some test examples
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        print(f"Accuracy of the model on the {total} " +
              f"test images: {correct / total:%}")

        run.log({"test_accuracy": correct / total})
    # Save the model in the exchangeable ONNX format
    torch.onnx.export(model, images, "model.onnx")
    run.save("model.onnx")
Run training and watch your metrics live on wandb.ai
Now that we’ve defined the whole pipeline and slipped in those few lines of W&B code, we’re ready to run our fully tracked experiment.
We’ll report a few links to you: our documentation, the Project page, which organizes all the runs in a project, and the Run page, where this run’s results will be stored.
Navigate to the Run page and check out these tabs:
- Charts, where the model gradients, parameter values, and loss are logged throughout training
- System, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar), and more
- Logs, which has a copy of anything pushed to standard out during training
- Files, where, once training is complete, you can click on model.onnx to view our network with the Netron model viewer.
Once the run is finished, when the with wandb.init block exits,
we’ll also print a summary of the results in the cell output.
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)
Test Hyperparameters with Sweeps
We only looked at a single set of hyperparameters in this example. But an important part of most ML workflows is iterating over a number of hyperparameters.
You can use W&B Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies.
Check out a Colab notebook demonstrating hyperparameter optimization using W&B Sweeps.
Running a hyperparameter sweep with W&B is very easy. There are just 3 simple steps:
- Define the sweep: Create a dictionary or a YAML file that specifies the parameters to search through, the search strategy, the optimization metric, and so on.
- Initialize the sweep: sweep_id = wandb.sweep(sweep_config)
- Run the sweep agent: wandb.agent(sweep_id, function=train)
That’s all there is to running a hyperparameter sweep.
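As a sketch of the first step, a sweep configuration for this tutorial's model might look like the dictionary below. The keys method, metric, and parameters follow the W&B sweep configuration format. The specific values, the choice of random search, and the project name in the comments are assumptions for illustration.

```python
# Hypothetical search space over two of the hyperparameters used earlier.
sweep_config = {
    "method": "random",                              # search strategy
    "metric": {"name": "loss", "goal": "minimize"},  # a metric logged with run.log
    "parameters": {
        "learning_rate": {"values": [0.001, 0.005, 0.01]},
        "batch_size": {"values": [64, 128, 256]},
    },
}

# The remaining two steps need a W&B login, so they are left as comments:
# sweep_id = wandb.sweep(sweep_config, project="pytorch-demo")
# wandb.agent(sweep_id, function=train, count=5)

for name, spec in sweep_config["parameters"].items():
    print(name, spec["values"])
```

The function passed to wandb.agent is called without arguments, so it typically calls wandb.init itself and reads the sampled values from run.config.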
 
Example Gallery
Explore examples of projects tracked and visualized with W&B in our Gallery.
Advanced Setup
- Environment variables: Set API keys in environment variables so you can run training on a managed cluster.
- Offline mode: Use dryrun mode to train offline and sync results later.
- On-prem: Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
- Sweeps: Set up hyperparameter search quickly with our lightweight tool for tuning.
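The first two options can be configured from Python before a run is started. A minimal sketch, assuming the WANDB_PROJECT, WANDB_MODE, and WANDB_API_KEY environment variables. The project name is a placeholder, and the value offline is used here for the mode that the list above calls dryrun.

```python
import os

# Select the project and run without network access. Results are written
# locally and can be uploaded later with `wandb sync`.
os.environ["WANDB_PROJECT"] = "pytorch-demo"
os.environ["WANDB_MODE"] = "offline"

# On a managed cluster, the API key is usually injected by the scheduler
# or a secrets manager rather than written in code.
has_key = "WANDB_API_KEY" in os.environ
print("API key configured:", has_key)
```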
2 - PyTorch Lightning
We will build an image classification pipeline using PyTorch Lightning. We will follow this style guide to increase the readability and reproducibility of our code. A cool explanation of this is available here.
Setting up PyTorch Lightning and W&B
For this tutorial, we need PyTorch Lightning and W&B.
pip install lightning -q
pip install wandb -qU
import lightning.pytorch as pl
# your favorite machine learning tracking tool
from lightning.pytorch.loggers import WandbLogger
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import random_split, DataLoader
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import CIFAR10
import wandb
Now you’ll need to log in to your wandb account.
wandb.login()
DataModule - The Data Pipeline we Deserve
DataModules are a way of decoupling data-related hooks from the LightningModule so you can develop dataset agnostic models.
It organizes the data pipeline into one shareable and reusable class. A datamodule encapsulates the five steps involved in data processing in PyTorch:
- Download / tokenize / process.
- Clean and (maybe) save to disk.
- Load inside Dataset.
- Apply transforms (rotate, tokenize, etc…).
- Wrap inside a DataLoader.
Learn more about datamodules here. Let’s build a datamodule for the Cifar-10 dataset.
class CIFAR10DataModule(pl.LightningDataModule):
    def __init__(self, batch_size, data_dir: str = './'):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
        
        self.num_classes = 10
    
    def prepare_data(self):
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)
    
    def setup(self, stage=None):
        # Assign train/val datasets for use in dataloaders
        if stage == 'fit' or stage is None:
            cifar_full = CIFAR10(self.data_dir, train=True, transform=self.transform)
            self.cifar_train, self.cifar_val = random_split(cifar_full, [45000, 5000])
        # Assign test dataset for use in dataloader(s)
        if stage == 'test' or stage is None:
            self.cifar_test = CIFAR10(self.data_dir, train=False, transform=self.transform)
    
    def train_dataloader(self):
        return DataLoader(self.cifar_train, batch_size=self.batch_size, shuffle=True)
    def val_dataloader(self):
        return DataLoader(self.cifar_val, batch_size=self.batch_size)
    def test_dataloader(self):
        return DataLoader(self.cifar_test, batch_size=self.batch_size)
Callbacks
A callback is a self-contained program that can be reused across projects. PyTorch Lightning comes with a few built-in callbacks that are regularly used. Learn more about callbacks in PyTorch Lightning here.
Built-in Callbacks
In this tutorial, we will use Early Stopping and Model Checkpoint built-in callbacks. They can be passed to the Trainer.
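To make the idea behind Early Stopping concrete, here is a plain-Python sketch of patience-based stopping on a monitored metric. It illustrates the general technique only and is not Lightning's implementation. The function name, the patience of 3, and the loss values are all made up for illustration.

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best value: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Validation loss improves for three epochs, then plateaus.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.59]
print(stopping_epoch(losses))
```

In this tutorial, pl.callbacks.EarlyStopping(monitor="val_loss") plays this role, watching the val_loss value logged in validation_step.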
Custom Callbacks
If you are familiar with custom Keras callbacks, the ability to do the same in your PyTorch pipeline is just the cherry on the cake.
Since we are performing image classification, visualizing the model’s predictions on some sample images can be helpful. Doing this in a callback can help debug the model at an early stage.
class ImagePredictionLogger(pl.callbacks.Callback):
    def __init__(self, val_samples, num_samples=32):
        super().__init__()
        self.num_samples = num_samples
        self.val_imgs, self.val_labels = val_samples
    
    def on_validation_epoch_end(self, trainer, pl_module):
        # Move the tensors to the same device as the model
        val_imgs = self.val_imgs.to(device=pl_module.device)
        val_labels = self.val_labels.to(device=pl_module.device)
        # Get model prediction
        logits = pl_module(val_imgs)
        preds = torch.argmax(logits, -1)
        # Log the images as wandb Image
        trainer.logger.experiment.log({
            "examples":[wandb.Image(x, caption=f"Pred:{pred}, Label:{y}") 
                           for x, pred, y in zip(val_imgs[:self.num_samples], 
                                                 preds[:self.num_samples], 
                                                 val_labels[:self.num_samples])]
            })
        
LightningModule - Define the System
The LightningModule defines a system and not a model. Here a system groups all the research code into a single class to make it self-contained. LightningModule organizes your PyTorch code into 5 sections:
- Computations (__init__).
- Train loop (training_step)
- Validation loop (validation_step)
- Test loop (test_step)
- Optimizers (configure_optimizers)
One can thus build a dataset agnostic model that can be easily shared. Let’s build a system for Cifar-10 classification.
class LitModel(pl.LightningModule):
    def __init__(self, input_shape, num_classes, learning_rate=2e-4):
        super().__init__()
        
        # log hyperparameters
        self.save_hyperparameters()
        self.learning_rate = learning_rate
        
        self.conv1 = nn.Conv2d(3, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 32, 3, 1)
        self.conv3 = nn.Conv2d(32, 64, 3, 1)
        self.conv4 = nn.Conv2d(64, 64, 3, 1)
        self.pool1 = torch.nn.MaxPool2d(2)
        self.pool2 = torch.nn.MaxPool2d(2)
        
        n_sizes = self._get_conv_output(input_shape)
        self.fc1 = nn.Linear(n_sizes, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, num_classes)
        self.accuracy = Accuracy(task='multiclass', num_classes=num_classes)
    # returns the size of the output tensor going into Linear layer from the conv block.
    def _get_conv_output(self, shape):
        batch_size = 1
        dummy_input = torch.rand(batch_size, *shape)
        output_feat = self._forward_features(dummy_input)
        n_size = output_feat.view(batch_size, -1).size(1)
        return n_size
        
    # returns the feature tensor from the conv block
    def _forward_features(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = self.pool2(F.relu(self.conv4(x)))
        return x
    
    # will be used during inference
    def forward(self, x):
       x = self._forward_features(x)
       x = x.view(x.size(0), -1)
       x = F.relu(self.fc1(x))
       x = F.relu(self.fc2(x))
       x = F.log_softmax(self.fc3(x), dim=1)
       
       return x
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        
        # training metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('train_loss', loss, on_step=True, on_epoch=True, logger=True)
        self.log('train_acc', acc, on_step=True, on_epoch=True, logger=True)
        
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        # validation metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)
        return loss
    
    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        
        # test metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)
        return loss
    
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer
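LitModel.forward returns log-probabilities (F.log_softmax), and each step passes them to F.nll_loss. For a single example, that combination is the cross-entropy of the raw logits: the negative log of the softmax probability assigned to the true class. A small standard-library sketch of that arithmetic, with made-up logits:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
target = 0

loss = nll(log_softmax(logits), target)
prob = math.exp(-loss)    # softmax probability of the true class
print(round(loss, 3), round(prob, 3))
```

Because forward already applies log_softmax, pairing it with nn.CrossEntropyLoss instead of F.nll_loss would apply the softmax twice.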
Train and Evaluate
Now that we have organized our data pipeline using DataModule and model architecture+training loop using LightningModule, the PyTorch Lightning Trainer automates everything else for us.
The Trainer automates:
- Epoch and batch iteration
- Calling optimizer.step(), backward(), and zero_grad()
- Calling .eval() and enabling/disabling grads
- Saving and loading weights
- W&B logging
- Multi-GPU training support
- TPU support
- 16-bit training support
dm = CIFAR10DataModule(batch_size=32)
# To access the x_dataloader we need to call prepare_data and setup.
dm.prepare_data()
dm.setup()
# Samples required by the custom ImagePredictionLogger callback to log image predictions.
val_samples = next(iter(dm.val_dataloader()))
val_imgs, val_labels = val_samples[0], val_samples[1]
val_imgs.shape, val_labels.shape
model = LitModel((3, 32, 32), dm.num_classes)
# Initialize wandb logger
wandb_logger = WandbLogger(project='wandb-lightning', job_type='train')
# Initialize Callbacks
early_stop_callback = pl.callbacks.EarlyStopping(monitor="val_loss")
checkpoint_callback = pl.callbacks.ModelCheckpoint()
# Initialize a trainer
trainer = pl.Trainer(max_epochs=2,
                     logger=wandb_logger,
                     callbacks=[early_stop_callback,
                                ImagePredictionLogger(val_samples),
                                checkpoint_callback],
                     )
# Train the model 
trainer.fit(model, dm)
# Evaluate the model on the held-out test set ⚡⚡
trainer.test(dataloaders=dm.test_dataloader())
# Close wandb run
wandb.finish()
Final Thoughts
I come from the TensorFlow/Keras ecosystem and find PyTorch a bit overwhelming even though it’s an elegant framework. Just my personal experience though. While exploring PyTorch Lightning, I realized that almost all of the reasons that kept me away from PyTorch are taken care of. Here’s a quick summary of my excitement:
- Then: Conventional PyTorch model definition used to be all over the place, with the model in some model.py script and the training loop in the train.py file. It was a lot of looking back and forth to understand the pipeline.
- Now: The LightningModule acts as a system where the model is defined along with the training_step, validation_step, etc. Now it’s modular and shareable.
- Then: The best part about TensorFlow/Keras is the input data pipeline. Their dataset catalog is rich and growing. PyTorch’s data pipeline used to be the biggest pain point. In normal PyTorch code, the data download/cleaning/preparation is usually scattered across many files.
- Now: The DataModule organizes the data pipeline into one shareable and reusable class. It’s simply a collection of train_dataloader, val_dataloader(s), and test_dataloader(s), along with the matching transforms and data processing/download steps required.
- Then: With Keras, one can call model.fit to train the model and model.predict to run inference. model.evaluate offered a good old simple evaluation on the test data. This is not the case with PyTorch. One will usually find separate train.py and test.py files.
- Now: With the LightningModule in place, the Trainer automates everything. One just needs to call trainer.fit and trainer.test to train and evaluate the model.
- Then: TensorFlow loves TPU, PyTorch…
- Now: With PyTorch Lightning, it’s so easy to train the same model with multiple GPUs and even on TPU.
- Then: I am a big fan of Callbacks and prefer writing custom callbacks. Something as trivial as Early Stopping used to be a point of discussion with conventional PyTorch.
- Now: With PyTorch Lightning using Early Stopping and Model Checkpointing is a piece of cake. I can even write custom callbacks.
🎨 Conclusion and Resources
I hope you find this report helpful. I encourage you to play with the code and train an image classifier with a dataset of your choice.
Here are some resources to learn more about PyTorch Lightning:
- Step-by-step walk-through: This is one of the official tutorials. Their documentation is really well written and I highly encourage it as a good learning resource.
- Use PyTorch Lightning with W&B: This is a quick Colab that you can run through to learn more about how to use W&B with PyTorch Lightning.
3 - Hugging Face
 Visualize your Hugging Face model’s performance quickly with a seamless W&B integration.
Compare hyperparameters, output metrics, and system stats like GPU utilization across your models.
Why should I use W&B?
 
- Unified dashboard: Central repository for all your model metrics and predictions
- Lightweight: No code changes required to integrate with Hugging Face
- Accessible: Free for individuals and academic teams
- Secure: All projects are private by default
- Trusted: Used by machine learning teams at OpenAI, Toyota, Lyft and more
Think of W&B like GitHub for machine learning models: save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you’re running your scripts.
W&B’s lightweight integrations work with any Python script, and all you need to do is sign up for a free W&B account to start tracking and visualizing your models.
In the Hugging Face Transformers repo, we’ve instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step.
Here’s an in-depth look at how the integration works: Hugging Face + W&B Report.
Install, import, and log in
Install the Hugging Face and W&B libraries, and the GLUE dataset and training script for this tutorial.
- Hugging Face Transformers: Natural language models and datasets
- W&B: Experiment tracking and visualization
- GLUE dataset: A language understanding benchmark dataset
- GLUE script: Model training script for sequence classification
!pip install datasets wandb evaluate accelerate -qU
!wget https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/examples/pytorch/text-classification/run_glue.py
# the run_glue.py script requires transformers dev
!pip install -q git+https://github.com/huggingface/transformers
Before continuing, sign up for a free account.
Put in your API key
Once you’ve signed up, run the next cell and click on the link to get your API key and authenticate this notebook.
import wandb
wandb.login()
Optionally, we can set environment variables to customize W&B logging. See the Hugging Face integration guide.
# Optional: log both gradients and parameters
%env WANDB_WATCH=all
Train the model
Next, call the downloaded training script run_glue.py and see training automatically get tracked to the W&B dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus, which consists of pairs of sentences with human annotations indicating whether they are semantically equivalent.
%env WANDB_PROJECT=huggingface-demo
%env TASK_NAME=MRPC
!python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-4 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --overwrite_output_dir \
  --logging_steps 50
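Outside a notebook, the %env magics and the shell command above can be expressed in a Python launcher. The sketch below only builds the environment and the argument list, using the same flags as the command above. Actually running it assumes run_glue.py has been downloaded into the working directory, so that call is left as a comment.

```python
import os
import sys

task_name = "MRPC"

# Equivalent of the %env lines: copy the environment and add W&B settings.
env = dict(os.environ, WANDB_PROJECT="huggingface-demo", WANDB_WATCH="all")

cmd = [
    sys.executable, "run_glue.py",
    "--model_name_or_path", "bert-base-uncased",
    "--task_name", task_name,
    "--do_train",
    "--do_eval",
    "--max_seq_length", "256",
    "--per_device_train_batch_size", "32",
    "--learning_rate", "2e-4",
    "--num_train_epochs", "3",
    "--output_dir", f"/tmp/{task_name}/",
    "--overwrite_output_dir",
    "--logging_steps", "50",
]

print(" ".join(cmd[1:]))
# To launch: import subprocess; subprocess.run(cmd, env=env, check=True)
```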
Visualize results in dashboard
Click the link printed out above, or go to wandb.ai to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: “wandb: View run at [URL to your unique run]”
Visualize Model Performance
It’s easy to look across dozens of experiments, zoom in on interesting findings, and visualize highly dimensional data.
Compare Architectures
Here’s an example comparing BERT vs DistilBERT. It’s easy to see how different architectures affect the evaluation accuracy throughout training with automatic line plot visualizations.
 
Track key information effortlessly by default
W&B saves a new run for each experiment. Here’s the information that gets saved by default:
- Hyperparameters: Settings for your model are saved in Config
- Model Metrics: Time series data of metrics streaming in are saved in Log
- Terminal Logs: Command line outputs are saved and available in a tab
- System Metrics: GPU and CPU utilization, memory, temperature etc.
Learn more
4 - TensorFlow
What this notebook covers
- Easy integration of W&B with your TensorFlow pipeline for experiment tracking.
- Computing metrics with keras.metrics
- Using wandb.log to log those metrics in your custom training loop.
 
Note: Sections starting with Step are all you need to integrate W&B into existing code. The rest is just a standard MNIST example.
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
Install, Import, Login
Install W&B
%%capture
!pip install wandb
Import W&B and login
import wandb
from wandb.integration.keras import WandbMetricsLogger
wandb.login()
Side note: If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login() will take you to the sign-up/login page. Signing up is as easy as one click.
Prepare Dataset
# Prepare the training dataset
BATCH_SIZE = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# build input pipeline using tf.data
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(BATCH_SIZE)
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
val_dataset = val_dataset.batch(BATCH_SIZE)
Define the Model and the Training Loop
def make_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x1 = keras.layers.Dense(64, activation="relu")(inputs)
    x2 = keras.layers.Dense(64, activation="relu")(x1)
    outputs = keras.layers.Dense(10, name="predictions")(x2)
    return keras.Model(inputs=inputs, outputs=outputs)
def train_step(x, y, model, optimizer, loss_fn, train_acc_metric):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value
def test_step(x, y, model, loss_fn, val_acc_metric):
    val_logits = model(x, training=False)
    loss_value = loss_fn(y, val_logits)
    val_acc_metric.update_state(y, val_logits)
    return loss_value
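The train_step and test_step functions above rely on a metric object that accumulates across batches, reports a result at the end of the epoch, and is then reset. The sketch below mimics that update/result/reset pattern in plain Python. The RunningAccuracy class is a hypothetical stand-in for illustration, not the Keras implementation.

```python
class RunningAccuracy:
    """Hypothetical stand-in that mimics the update/result/reset pattern."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, labels, logits):
        # The predicted class is the index of the largest logit in each row.
        for label, row in zip(labels, logits):
            predicted = max(range(len(row)), key=row.__getitem__)
            self.correct += int(predicted == label)
            self.total += 1

    def result(self):
        # Accuracy over every example seen since the last reset.
        return self.correct / self.total if self.total else 0.0

    def reset_state(self):
        self.correct = 0
        self.total = 0


metric = RunningAccuracy()
metric.update_state([2, 0], [[0.1, 0.2, 3.0], [0.3, 0.9, 0.1]])  # batch 1
metric.update_state([1], [[0.0, 2.0, 1.0]])  # batch 2
epoch_accuracy = metric.result()  # two of three predictions are correct
metric.reset_state()  # start the next epoch from zero
```

Because the state carries over between calls, forgetting the reset at the end of an epoch would mix accuracy from earlier epochs into later ones.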
Add wandb.log to your training loop
def train(
    train_dataset,
    val_dataset,
    model,
    optimizer,
    train_acc_metric,
    val_acc_metric,
    epochs=10,
    log_step=200,
    val_log_step=50,
):
    run = wandb.init(
        project="my-tf-integration",
        config={
            "epochs": epochs,
            "log_step": log_step,
            "val_log_step": val_log_step,
            "architecture": "MLP",
            "dataset": "MNIST",
        },
    )
    for epoch in range(epochs):
        print("\nStart of epoch %d" % (epoch,))
        train_loss = []
        val_loss = []
        # Iterate over the batches of the dataset
        for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
            loss_value = train_step(
                x_batch_train,
                y_batch_train,
                model,
                optimizer,
                loss_fn,
                train_acc_metric,
            )
            train_loss.append(float(loss_value))
        # Run a validation loop at the end of each epoch
        for step, (x_batch_val, y_batch_val) in enumerate(val_dataset):
            val_loss_value = test_step(
                x_batch_val, y_batch_val, model, loss_fn, val_acc_metric
            )
            val_loss.append(float(val_loss_value))
        # Display metrics at the end of each epoch
        train_acc = train_acc_metric.result()
        print("Training acc over epoch: %.4f" % (float(train_acc),))
        val_acc = val_acc_metric.result()
        print("Validation acc: %.4f" % (float(val_acc),))
        # Reset metrics at the end of each epoch
        train_acc_metric.reset_state()
        val_acc_metric.reset_state()
        # Log metrics using run.log()
        run.log(
            {
                "epochs": epoch,
                "loss": np.mean(train_loss),
                "acc": float(train_acc),
                "val_loss": np.mean(val_loss),
                "val_acc": float(val_acc),
            }
        )
    run.finish()
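The training loop above collects one loss value per batch and logs a single averaged value per epoch. A minimal sketch of that aggregation step, assuming a hypothetical helper named summarize_epoch that builds the same dictionary passed to run.log():

```python
from statistics import fmean


def summarize_epoch(epoch, train_losses, val_losses, train_acc, val_acc):
    """Hypothetical helper: reduce per-batch losses to one row per epoch."""
    return {
        "epochs": epoch,
        "loss": fmean(train_losses),  # mean of per-batch training losses
        "acc": float(train_acc),
        "val_loss": fmean(val_losses),  # mean of per-batch validation losses
        "val_acc": float(val_acc),
    }


row = summarize_epoch(
    epoch=0,
    train_losses=[1.0, 0.5],
    val_losses=[0.25, 0.75],
    train_acc=0.9,
    val_acc=0.85,
)
```

Logging one dictionary per epoch keeps all five values on the same step, so they line up on the x-axis of the resulting charts.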
Run Training
Call wandb.init() to start a run
This lets us know you’re launching an experiment, so we can give it a unique ID and a dashboard.
Check out the official documentation
# initialize wandb with your project name and optionally with configuration.
# play around with the config values and see the result on your wandb dashboard.
config = {
    "learning_rate": 0.001,
    "epochs": 10,
    "batch_size": 64,
    "log_step": 200,
    "val_log_step": 50,
    "architecture": "MLP",
    "dataset": "MNIST",
}
run = wandb.init(project='my-tf-integration', config=config)
config = run.config
# Initialize model.
model = make_model()
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=config.learning_rate)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train(
    train_dataset,
    val_dataset, 
    model,
    optimizer,
    train_acc_metric,
    val_acc_metric,
    epochs=config.epochs, 
    log_step=config.log_step, 
    val_log_step=config.val_log_step,
)
run.finish()  # In Jupyter/Colab, let us know you're finished!
Visualize Results
Click on the run page link above to see your live results.
Sweep 101
Use W&B Sweeps to automate hyperparameter optimization and explore the space of possible models.
Check out a Colab notebook demonstrating hyperparameter optimization using W&B Sweeps
Benefits of using W&B Sweeps
- Quick setup: With just a few lines of code you can run W&B Sweeps.
- Transparent: We cite all the algorithms we’re using, and our code is open source.
- Powerful: Our sweeps are completely customizable and configurable. You can launch a sweep across dozens of machines, and it’s just as easy as starting a sweep on your laptop.
 
Example Gallery
Explore examples of projects tracked and visualized with W&B in our gallery of examples, Fully Connected →.
Best Practices
- Projects: Log multiple runs to a project to compare them. wandb.init(project="project-name")
- Groups: For multiple processes or cross-validation folds, log each process as a run and group them together. wandb.init(group="experiment-1")
- Tags: Add tags to track your current baseline or production model.
- Notes: Type notes in the table to track the changes between runs.
- Reports: Take quick notes on progress to share with colleagues and make dashboards and snapshots of your ML projects.
Advanced Setup
- Environment variables: Set API keys in environment variables so you can run training on a managed cluster.
- Offline mode
- On-prem: Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
- Artifacts: Track and version models and datasets in a streamlined way that automatically picks up your pipeline steps as you train models.
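As a rough illustration of the environment-variable and offline-mode items above, the sketch below sets W&B environment variables from Python before any run starts. WANDB_API_KEY and WANDB_MODE are environment variables that W&B reads. The configure_wandb_env helper is hypothetical, and on a managed cluster these values would more typically come from the job configuration than from code.

```python
import os


def configure_wandb_env(api_key=None, offline=False):
    """Hypothetical helper: set W&B environment variables before wandb.init()."""
    changed = []
    if api_key is not None:
        os.environ["WANDB_API_KEY"] = api_key  # avoids an interactive login
        changed.append("WANDB_API_KEY")
    if offline:
        os.environ["WANDB_MODE"] = "offline"  # store run data locally
        changed.append("WANDB_MODE")
    # Return only the variable names so a secret is never echoed back.
    return changed


changed = configure_wandb_env(offline=True)
```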
5 - TensorFlow Sweeps
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration. 
Use W&B Sweeps to automate hyperparameter optimization and explore model possibilities with interactive dashboards:
 
Why use sweeps
- Quick setup: Run W&B sweeps with a few lines of code.
- Transparent: The project cites all algorithms used, and the code is open source.
- Powerful: Sweeps provide customization options and can run on multiple machines or a laptop with ease.
For more information, see the Sweeps overview.
What this notebook covers
- Steps to start with W&B Sweep and a custom training loop in TensorFlow.
- Finding best hyperparameters for image classification tasks.
Note: Sections starting with Step show necessary code to perform a hyperparameter sweep. The rest sets up a simple example.
Install, import, and log in
Install W&B
pip install wandb
Import W&B and log in
import tqdm
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import wandb
from wandb.integration.keras import WandbMetricsLogger
wandb.login()
If you are not logged in, wandb.login() directs you to the sign-up/login page.
Prepare dataset
# Prepare the training dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
Build a classifier MLP
def Model():
    inputs = keras.Input(shape=(784,), name="digits")
    x1 = keras.layers.Dense(64, activation="relu")(inputs)
    x2 = keras.layers.Dense(64, activation="relu")(x1)
    outputs = keras.layers.Dense(10, name="predictions")(x2)
    return keras.Model(inputs=inputs, outputs=outputs)
def train_step(x, y, model, optimizer, loss_fn, train_acc_metric):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value
def test_step(x, y, model, loss_fn, val_acc_metric):
    val_logits = model(x, training=False)
    loss_value = loss_fn(y, val_logits)
    val_acc_metric.update_state(y, val_logits)
    return loss_value
Write a training loop
def train(
    train_dataset,
    val_dataset,
    model,
    optimizer,
    loss_fn,
    train_acc_metric,
    val_acc_metric,
    epochs=10,
    log_step=200,
    val_log_step=50,
):
    run = wandb.init(
        project="sweeps-tensorflow",
        job_type="train",
        config={
            "epochs": epochs,
            "log_step": log_step,
            "val_log_step": val_log_step,
            "architecture_name": "MLP",
            "dataset_name": "MNIST",
        },
    )
    for epoch in range(epochs):
        print("\nStart of epoch %d" % (epoch,))
        train_loss = []
        val_loss = []
        # Iterate over the batches of the dataset
        for step, (x_batch_train, y_batch_train) in tqdm.tqdm(
            enumerate(train_dataset), total=len(train_dataset)
        ):
            loss_value = train_step(
                x_batch_train,
                y_batch_train,
                model,
                optimizer,
                loss_fn,
                train_acc_metric,
            )
            train_loss.append(float(loss_value))
        # Run a validation loop at the end of each epoch
        for step, (x_batch_val, y_batch_val) in enumerate(val_dataset):
            val_loss_value = test_step(
                x_batch_val, y_batch_val, model, loss_fn, val_acc_metric
            )
            val_loss.append(float(val_loss_value))
        # Display metrics at the end of each epoch
        train_acc = train_acc_metric.result()
        print("Training acc over epoch: %.4f" % (float(train_acc),))
        val_acc = val_acc_metric.result()
        print("Validation acc: %.4f" % (float(val_acc),))
        # Reset metrics at the end of each epoch
        train_acc_metric.reset_state()
        val_acc_metric.reset_state()
        # 3. Log metrics using run.log()
        run.log(
            {
                "epochs": epoch,
                "loss": np.mean(train_loss),
                "acc": float(train_acc),
                "val_loss": np.mean(val_loss),
                "val_acc": float(val_acc),
            }
        )
    run.finish()
Configure the sweep
Steps to configure the sweep:
- Define the hyperparameters to optimize
- Choose the optimization method: random, grid, or bayes
- Set a goal and metric for bayes, like minimizing val_loss
- Use hyperband for early termination of poorly performing runs
See more in the sweep configuration guide.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "early_terminate": {"type": "hyperband", "min_iter": 5},
    "parameters": {
        "batch_size": {"values": [32, 64, 128, 256]},
        "learning_rate": {"values": [0.01, 0.005, 0.001, 0.0005, 0.0001]},
    },
}
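To make the search space concrete, the sketch below draws one trial the way a random search conceptually does, and counts how many runs an exhaustive grid over the same values would need. It is a plain-Python illustration, not W&B's sweep implementation. The helper names sample_random and grid_size are hypothetical.

```python
import random
from math import prod

sweep_parameters = {
    "batch_size": {"values": [32, 64, 128, 256]},
    "learning_rate": {"values": [0.01, 0.005, 0.001, 0.0005, 0.0001]},
}


def sample_random(parameters, rng):
    """Pick one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(spec["values"]) for name, spec in parameters.items()}


def grid_size(parameters):
    """Number of runs needed to try every combination once."""
    return prod(len(spec["values"]) for spec in parameters.values())


rng = random.Random(0)  # seeded so the sketch is repeatable
trial = sample_random(sweep_parameters, rng)
total_grid_runs = grid_size(sweep_parameters)  # 4 batch sizes * 5 learning rates
```

With 20 possible combinations, a random sweep capped at a handful of runs explores only part of the grid, which is the trade-off between the random and grid methods.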
Wrap the training loop
Create a function, like sweep_train, which reads hyperparameters from run.config before calling train.
def sweep_train(config_defaults=None):
    # Set default values
    config_defaults = {"batch_size": 64, "learning_rate": 0.01}
    # Initialize wandb with a sample project name
    run = wandb.init(config=config_defaults)  # this gets over-written in the Sweep
    # Specify the other hyperparameters to the configuration, if any
    run.config.epochs = 2
    run.config.log_step = 20
    run.config.val_log_step = 50
    run.config.architecture_name = "MLP"
    run.config.dataset_name = "MNIST"
    # build input pipeline using tf.data
    train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    train_dataset = (
        train_dataset.shuffle(buffer_size=1024)
        .batch(run.config.batch_size)
        .prefetch(buffer_size=tf.data.AUTOTUNE)
    )
    val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    val_dataset = val_dataset.batch(run.config.batch_size).prefetch(
        buffer_size=tf.data.AUTOTUNE
    )
    # initialize model
    model = Model()
    # Instantiate an optimizer to train the model.
    optimizer = keras.optimizers.SGD(learning_rate=run.config.learning_rate)
    # Instantiate a loss function.
    loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    # Prepare the metrics.
    train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
    val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
    train(
        train_dataset,
        val_dataset,
        model,
        optimizer,
        loss_fn,
        train_acc_metric,
        val_acc_metric,
        epochs=run.config.epochs,
        log_step=run.config.log_step,
        val_log_step=run.config.val_log_step,
    )
    run.finish()
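The comment in sweep_train notes that the default config is overwritten during a sweep. Conceptually, values chosen by the sweep take precedence over the defaults, and any default the sweep does not mention is kept. The sketch below shows that precedence with a plain dictionary merge. It is only an illustration of the idea, not how W&B resolves config internally, and resolve_config is a hypothetical name.

```python
def resolve_config(defaults, sweep_values):
    """Hypothetical illustration: sweep-chosen values override the defaults."""
    return {**defaults, **sweep_values}


defaults = {"batch_size": 64, "learning_rate": 0.01}
resolved = resolve_config(defaults, {"learning_rate": 0.0005})
```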
Initialize sweep and run agent
sweep_id = wandb.sweep(sweep_config, project="sweeps-tensorflow")
Limit the number of runs with the count parameter. Set to 10 for quick execution. Increase as needed.
wandb.agent(sweep_id, function=sweep_train, count=10)
Visualize results
Click on the Sweep URL link above to view live results.
Example gallery
Explore projects tracked and visualized with W&B in the Gallery.
Best practices
- Projects: Log multiple runs to a project to compare them. wandb.init(project="project-name")
- Groups: Log each process as a run for multiple processes or cross-validation folds, and group them. wandb.init(group='experiment-1')
- Tags: Use tags to track your baseline or production model.
- Notes: Enter notes in the table to track changes between runs.
- Reports: Use reports for progress notes, sharing with colleagues, and creating ML project dashboards and snapshots.
Advanced setup
- Environment variables: Set API keys for training on a managed cluster.
- Offline mode
- On-prem: Install W&B in a private cloud or air-gapped servers in your infrastructure. Local installations suit academics and enterprise teams.
6 - 3D brain tumor segmentation with MONAI
This tutorial demonstrates how to construct a training workflow for a multi-label 3D brain tumor segmentation task using MONAI, and how to use the experiment tracking and data visualization features of W&B. The tutorial contains the following features:
- Initialize a W&B Run and synchronize all configs associated with the run for reproducibility.
- MONAI transform API:
  - MONAI Transforms for dictionary format data.
  - How to define a new transform according to the MONAI transforms API.
  - How to randomly adjust intensity for data augmentation.
- Data Loading and Visualization:
  - Load Nifti images with metadata, load a list of images and stack them.
  - Cache IO and transforms to accelerate training and validation.
  - Visualize the data using wandb.Table and interactive segmentation overlay on W&B.
- Training a 3D SegResNet model:
  - Using the networks, losses, and metrics APIs from MONAI.
  - Training the 3D SegResNet model using a PyTorch training loop.
  - Track the training experiment using W&B.
  - Log and version model checkpoints as model artifacts on W&B.
- Visualize and compare the predictions on the validation dataset using wandb.Table and interactive segmentation overlay on W&B.
Setup and Installation
First, install the latest version of both MONAI and W&B.
!python -c "import monai" || pip install -q -U "monai[nibabel, tqdm]"
!python -c "import wandb" || pip install -q -U wandb
import os
import numpy as np
from tqdm.auto import tqdm
import wandb
from monai.apps import DecathlonDataset
from monai.data import DataLoader, decollate_batch
from monai.losses import DiceLoss
from monai.inferers import sliding_window_inference
from monai.metrics import DiceMetric
from monai.networks.nets import SegResNet
from monai.transforms import (
    Activations,
    AsDiscrete,
    Compose,
    LoadImaged,
    MapTransform,
    NormalizeIntensityd,
    Orientationd,
    RandFlipd,
    RandScaleIntensityd,
    RandShiftIntensityd,
    RandSpatialCropd,
    Spacingd,
    EnsureTyped,
    EnsureChannelFirstd,
)
from monai.utils import set_determinism
import torch
Then, authenticate the Colab instance to use W&B.
wandb.login()
Initialize a W&B Run
Start a new W&B Run to start tracking the experiment. Using a proper config system is a recommended best practice for reproducible machine learning. You can track the hyperparameters for every experiment using W&B.
with wandb.init(project="monai-brain-tumor-segmentation") as run:
    config = run.config
    config.seed = 0
    config.roi_size = [224, 224, 144]
    config.batch_size = 1
    config.num_workers = 4
    config.max_train_images_visualized = 20
    config.max_val_images_visualized = 20
    config.dice_loss_smoothen_numerator = 0
    config.dice_loss_smoothen_denominator = 1e-5
    config.dice_loss_squared_prediction = True
    config.dice_loss_target_onehot = False
    config.dice_loss_apply_sigmoid = True
    config.initial_learning_rate = 1e-4
    config.weight_decay = 1e-5
    config.max_train_epochs = 50
    config.validation_intervals = 1
    config.dataset_dir = "./dataset/"
    config.checkpoint_dir = "./checkpoints"
    config.inference_roi_size = (128, 128, 64)
    config.max_prediction_images_visualized = 20
You also need to set the random seed for modules to enable or turn off deterministic training.
set_determinism(seed=config.seed)
# Create directories
os.makedirs(config.dataset_dir, exist_ok=True)
os.makedirs(config.checkpoint_dir, exist_ok=True)
Data Loading and Transformation
Here, use the monai.transforms API to create a custom transform that converts the multi-class labels into a multi-label segmentation task in one-hot format.
class ConvertToMultiChannelBasedOnBratsClassesd(MapTransform):
    """
    Convert labels to multi channels based on brats classes:
    label 1 is the peritumoral edema
    label 2 is the GD-enhancing tumor
    label 3 is the necrotic and non-enhancing tumor core
    The possible classes are TC (Tumor core), WT (Whole tumor)
    and ET (Enhancing tumor).
    Reference: https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/brats_segmentation_3d.ipynb
    """
    def __call__(self, data):
        d = dict(data)
        for key in self.keys:
            result = []
            # merge label 2 and label 3 to construct TC
            result.append(torch.logical_or(d[key] == 2, d[key] == 3))
            # merge labels 1, 2 and 3 to construct WT
            result.append(
                torch.logical_or(
                    torch.logical_or(d[key] == 2, d[key] == 3), d[key] == 1
                )
            )
            # label 2 is ET
            result.append(d[key] == 2)
            d[key] = torch.stack(result, axis=0).float()
        return d
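To see what the transform above produces, the sketch below applies the same three rules to a tiny flat list of label values in plain Python. The to_multichannel function is a hypothetical, simplified stand-in. The real transform operates on tensors and stacks the channels along a new first axis.

```python
def to_multichannel(labels):
    """Hypothetical simplified version of the BraTS label conversion."""
    # TC (Tumor core): labels 2 and 3 merged.
    tumor_core = [int(v in (2, 3)) for v in labels]
    # WT (Whole tumor): labels 1, 2 and 3 merged.
    whole_tumor = [int(v in (1, 2, 3)) for v in labels]
    # ET (Enhancing tumor): label 2 only.
    enhancing_tumor = [int(v == 2) for v in labels]
    return [tumor_core, whole_tumor, enhancing_tumor]


# One voxel of each class: background, edema, enhancing, necrotic core.
channels = to_multichannel([0, 1, 2, 3])
```

A voxel can belong to several output channels at once (label 2 is set in all three), which is why the task becomes multi-label rather than multi-class.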
Next, set up transforms for training and validation datasets respectively.
train_transform = Compose(
    [
        # load 4 Nifti images and stack them together
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys="image"),
        EnsureTyped(keys=["image", "label"]),
        ConvertToMultiChannelBasedOnBratsClassesd(keys="label"),
        Orientationd(keys=["image", "label"], axcodes="RAS"),
        Spacingd(
            keys=["image", "label"],
            pixdim=(1.0, 1.0, 1.0),
            mode=("bilinear", "nearest"),
        ),
        RandSpatialCropd(
            keys=["image", "label"], roi_size=config.roi_size, random_size=False
        ),
        RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
        RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=1),
        RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=2),
        NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
        RandScaleIntensityd(keys="image", factors=0.1, prob=1.0),
        RandShiftIntensityd(keys="image", offsets=0.1, prob=1.0),
    ]
)
val_transform = Compose(
    [
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys="image"),
        EnsureTyped(keys=["image", "label"]),
        ConvertToMultiChannelBasedOnBratsClassesd(keys="label"),
        Orientationd(keys=["image", "label"], axcodes="RAS"),
        Spacingd(
            keys=["image", "label"],
            pixdim=(1.0, 1.0, 1.0),
            mode=("bilinear", "nearest"),
        ),
        NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
    ]
)
The Dataset
The dataset used for this experiment comes from http://medicaldecathlon.com/. It uses multi-modal multi-site MRI data (FLAIR, T1w, T1gd, T2w) to segment Gliomas, necrotic/active tumour, and oedema. The dataset consists of 750 4D volumes (484 Training + 266 Testing).
Use the DecathlonDataset to automatically download and extract the dataset. It inherits MONAI CacheDataset which enables you to set cache_num=N to cache N items for training and use the default arguments to cache all the items for validation, depending on your memory size.
train_dataset = DecathlonDataset(
    root_dir=config.dataset_dir,
    task="Task01_BrainTumour",
    transform=val_transform,
    section="training",
    download=True,
    cache_rate=0.0,
    num_workers=4,
)
val_dataset = DecathlonDataset(
    root_dir=config.dataset_dir,
    task="Task01_BrainTumour",
    transform=val_transform,
    section="validation",
    download=False,
    cache_rate=0.0,
    num_workers=4,
)
Note: Instead of applying train_transform to the train_dataset, apply val_transform to both the training and validation datasets. This is because, before training, you would be visualizing samples from both the splits of the dataset.
Visualizing the Dataset
W&B supports images, video, audio, and more. You can log rich media to explore your results and visually compare your runs, models, and datasets. Use the segmentation mask overlay system to visualize the data volumes. To log segmentation masks in tables, you must provide a wandb.Image object for each row in the table.
An example is provided in the pseudocode below:
table = wandb.Table(columns=["ID", "Image"])
for id, img, label in zip(ids, images, labels):
    mask_img = wandb.Image(
        img,
        masks={
            "prediction": {"mask_data": label, "class_labels": class_labels}
            # ...
        },
    )
    table.add_data(id, mask_img)
run.log({"Table": table})
Now write a simple utility function that takes a sample image, label, wandb.Table object, and some associated metadata, and populates the rows of a table that is logged to the W&B dashboard.
def log_data_samples_into_tables(
    sample_image: np.array,
    sample_label: np.array,
    split: str = None,
    data_idx: int = None,
    table: wandb.Table = None,
):
    num_channels, _, _, num_slices = sample_image.shape
    with tqdm(total=num_slices, leave=False) as progress_bar:
        for slice_idx in range(num_slices):
            ground_truth_wandb_images = []
            for channel_idx in range(num_channels):
                # Build one overlay mask per tumor region for this slice
                masks = {
                    "ground-truth/Tumor-Core": {
                        "mask_data": sample_label[0, :, :, slice_idx],
                        "class_labels": {0: "background", 1: "Tumor Core"},
                    },
                    "ground-truth/Whole-Tumor": {
                        "mask_data": sample_label[1, :, :, slice_idx] * 2,
                        "class_labels": {0: "background", 2: "Whole Tumor"},
                    },
                    "ground-truth/Enhancing-Tumor": {
                        "mask_data": sample_label[2, :, :, slice_idx] * 3,
                        "class_labels": {0: "background", 3: "Enhancing Tumor"},
                    },
                }
                ground_truth_wandb_images.append(
                    wandb.Image(
                        sample_image[channel_idx, :, :, slice_idx],
                        masks=masks,
                    )
                )
            table.add_data(split, data_idx, slice_idx, *ground_truth_wandb_images)
            progress_bar.update(1)
    return table
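In the function above, the Whole Tumor and Enhancing Tumor label channels are multiplied by 2 and 3 so that the values in each mask match the keys of its class_labels dictionary. A small plain-Python sketch of that scaling on a 2x2 slice, where scale_mask is a hypothetical helper:

```python
def scale_mask(binary_mask, class_id):
    """Hypothetical helper: turn a 0/1 mask into a 0/class_id mask."""
    return [[pixel * class_id for pixel in row] for row in binary_mask]


whole_tumor_slice = [[0, 1], [1, 0]]
class_labels = {0: "background", 2: "Whole Tumor"}

scaled = scale_mask(whole_tumor_slice, 2)
# Every value in the scaled mask is now a valid key of class_labels.
names = [[class_labels[pixel] for pixel in row] for row in scaled]
```

Giving each region its own integer keeps the overlays distinguishable when several masks are shown on the same image.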
Next, define the wandb.Table object and the columns it consists of so that it can be populated with the data visualizations.
table = wandb.Table(
    columns=[
        "Split",
        "Data Index",
        "Slice Index",
        "Image-Channel-0",
        "Image-Channel-1",
        "Image-Channel-2",
        "Image-Channel-3",
    ]
)
Then, loop over the train_dataset and val_dataset respectively to generate the visualizations for the data samples and populate the rows of the table, which is then logged to the dashboard.
# Generate visualizations for train_dataset
max_samples = (
    min(config.max_train_images_visualized, len(train_dataset))
    if config.max_train_images_visualized > 0
    else len(train_dataset)
)
progress_bar = tqdm(
    enumerate(train_dataset[:max_samples]),
    total=max_samples,
    desc="Generating Train Dataset Visualizations:",
)
for data_idx, sample in progress_bar:
    sample_image = sample["image"].detach().cpu().numpy()
    sample_label = sample["label"].detach().cpu().numpy()
    table = log_data_samples_into_tables(
        sample_image,
        sample_label,
        split="train",
        data_idx=data_idx,
        table=table,
    )
# Generate visualizations for val_dataset
max_samples = (
    min(config.max_val_images_visualized, len(val_dataset))
    if config.max_val_images_visualized > 0
    else len(val_dataset)
)
progress_bar = tqdm(
    enumerate(val_dataset[:max_samples]),
    total=max_samples,
    desc="Generating Validation Dataset Visualizations:",
)
for data_idx, sample in progress_bar:
    sample_image = sample["image"].detach().cpu().numpy()
    sample_label = sample["label"].detach().cpu().numpy()
    table = log_data_samples_into_tables(
        sample_image,
        sample_label,
        split="val",
        data_idx=data_idx,
        table=table,
    )
# Log the table to your dashboard
run.log({"Tumor-Segmentation-Data": table})
The data appears on the W&B dashboard in an interactive tabular format. You can see each channel of a particular slice from a data volume overlaid with the respective segmentation mask in each row. You can write Weave queries to filter the data on the table and focus on one particular row.
Figure: An example of logged table data.
Open an image and see how you can interact with each of the segmentation masks using the interactive overlay.
Figure: An example of visualized segmentation maps.
Loading the Data
Create the PyTorch DataLoaders for loading the data from the datasets. Before creating the DataLoaders, set the transform for train_dataset to train_transform to pre-process and transform the data for training.
# apply train_transforms to the training dataset
train_dataset.transform = train_transform
# create the train_loader
train_loader = DataLoader(
    train_dataset,
    batch_size=config.batch_size,
    shuffle=True,
    num_workers=config.num_workers,
)
# create the val_loader
val_loader = DataLoader(
    val_dataset,
    batch_size=config.batch_size,
    shuffle=False,
    num_workers=config.num_workers,
)
Creating the Model, Loss, and Optimizer
This tutorial creates a SegResNet model based on the paper 3D MRI brain tumor segmentation using auto-encoder regularization. The SegResNet model comes implemented as a PyTorch Module as part of the monai.networks API. The following code also creates an optimizer and a learning rate scheduler.
device = torch.device("cuda:0")
# create model
model = SegResNet(
    blocks_down=[1, 2, 2, 4],
    blocks_up=[1, 1, 1],
    init_filters=16,
    in_channels=4,
    out_channels=3,
    dropout_prob=0.2,
).to(device)
# create optimizer
optimizer = torch.optim.Adam(
    model.parameters(),
    config.initial_learning_rate,
    weight_decay=config.weight_decay,
)
# create learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=config.max_train_epochs
)
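To get a feel for how the learning rate changes over training, the sketch below evaluates the closed-form cosine annealing schedule in plain Python. It assumes a minimum learning rate of 0, which is the PyTorch default for CosineAnnealingLR. PyTorch computes the schedule step by step, so its values can differ from this closed form by floating-point error. The function name cosine_annealing_lr is hypothetical.

```python
import math


def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing from base_lr down to eta_min over t_max epochs."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


# Same settings as the config above: initial rate 1e-4 over 50 epochs.
schedule = [cosine_annealing_lr(e, base_lr=1e-4, t_max=50) for e in range(51)]
```

The rate starts at the initial value, reaches half of it at the midpoint, and decays to the minimum at the final epoch.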
Define the loss as multi-label DiceLoss using the monai.losses API and the corresponding dice metrics using the monai.metrics API.
loss_function = DiceLoss(
    smooth_nr=config.dice_loss_smoothen_numerator,
    smooth_dr=config.dice_loss_smoothen_denominator,
    squared_pred=config.dice_loss_squared_prediction,
    to_onehot_y=config.dice_loss_target_onehot,
    sigmoid=config.dice_loss_apply_sigmoid,
)
dice_metric = DiceMetric(include_background=True, reduction="mean")
dice_metric_batch = DiceMetric(include_background=True, reduction="mean_batch")
post_trans = Compose([Activations(sigmoid=True), AsDiscrete(threshold=0.5)])
# use automatic mixed-precision to accelerate training
scaler = torch.cuda.amp.GradScaler()
torch.backends.cudnn.benchmark = True
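The post-processing and metric above can be illustrated on a handful of values: apply a sigmoid, threshold at 0.5 to get a binary prediction, then compare it with the target using the Dice score, 2|A∩B| / (|A| + |B|). This is a plain-Python sketch on flat lists, not MONAI's implementation. The function names are hypothetical, and returning NaN for two empty masks is a choice made for this sketch.

```python
import math


def sigmoid_threshold(logits, threshold=0.5):
    """Sigmoid activation followed by binarization."""
    return [int(1 / (1 + math.exp(-x)) >= threshold) for x in logits]


def dice_score(prediction, target):
    """Dice overlap between two binary masks given as flat lists."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    denominator = sum(prediction) + sum(target)
    # Both masks empty: the score is undefined in this sketch.
    return 2 * intersection / denominator if denominator else float("nan")


prediction = sigmoid_threshold([2.0, 0.3, -1.5, -0.2])
score = dice_score(prediction, [1, 0, 0, 0])
```

Here two voxels are predicted as foreground and one of them matches the single foreground voxel in the target, so the score is 2 * 1 / (2 + 1).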
Define a small utility for mixed-precision inference. This will be useful during the validation step of the training process and when you want to run the model after training.
def inference(model, input):
    def _compute(input):
        return sliding_window_inference(
            inputs=input,
            roi_size=(240, 240, 160),
            sw_batch_size=1,
            predictor=model,
            overlap=0.5,
        )
    with torch.cuda.amp.autocast():
        return _compute(input)
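Sliding-window inference splits a volume into overlapping windows, runs the model on each window, and combines the outputs. The sketch below computes where windows would start along a single axis for a given window size and overlap. It is a simplified one-dimensional illustration under the assumption that the last window is shifted back to stay inside the volume, not MONAI's exact implementation. The sizes used are illustrative and window_starts is a hypothetical name.

```python
import math


def window_starts(image_size, roi_size, overlap):
    """Start indices of overlapping windows along one axis."""
    if image_size <= roi_size:
        return [0]  # a single window already covers the axis
    # Step between consecutive windows; overlap=0.5 moves half a window.
    interval = max(int(roi_size * (1 - overlap)), 1)
    count = math.ceil((image_size - roi_size) / interval) + 1
    # Clamp the final window so it ends exactly at the volume boundary.
    return [min(i * interval, image_size - roi_size) for i in range(count)]


starts = window_starts(image_size=240, roi_size=128, overlap=0.5)
single = window_starts(image_size=100, roi_size=128, overlap=0.5)
```

A higher overlap produces more windows and more compute per volume, in exchange for smoother predictions near window borders.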
Training and Validation
Before training, define the metric properties which will later be logged with run.log() for tracking the training and validation experiments.
run.define_metric("epoch/epoch_step")
run.define_metric("epoch/*", step_metric="epoch/epoch_step")
run.define_metric("batch/batch_step")
run.define_metric("batch/*", step_metric="batch/batch_step")
run.define_metric("validation/validation_step")
run.define_metric("validation/*", step_metric="validation/validation_step")
batch_step = 0
validation_step = 0
metric_values = []
metric_values_tumor_core = []
metric_values_whole_tumor = []
metric_values_enhanced_tumor = []
Execute Standard PyTorch Training Loop
with wandb.init(
    project="monai-brain-tumor-segmentation",
    config=config,
    job_type="train",
    reinit=True,
) as run:
    # Define a W&B Artifact object
    artifact = wandb.Artifact(
        name=f"{run.id}-checkpoint", type="model"
    )
    epoch_progress_bar = tqdm(range(config.max_train_epochs), desc="Training:")
    for epoch in epoch_progress_bar:
        model.train()
        epoch_loss = 0
        total_batch_steps = len(train_dataset) // train_loader.batch_size
        batch_progress_bar = tqdm(train_loader, total=total_batch_steps, leave=False)
        
        # Training Step
        for batch_data in batch_progress_bar:
            inputs, labels = (
                batch_data["image"].to(device),
                batch_data["label"].to(device),
            )
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():
                outputs = model(inputs)
                loss = loss_function(outputs, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            epoch_loss += loss.item()
            batch_progress_bar.set_description(f"train_loss: {loss.item():.4f}:")
            ## Log batch-wise training loss to W&B
            run.log({"batch/batch_step": batch_step, "batch/train_loss": loss.item()})
            batch_step += 1
        lr_scheduler.step()
        epoch_loss /= total_batch_steps
        ## Log epoch-wise mean training loss and learning rate to W&B
        run.log(
            {
                "epoch/epoch_step": epoch,
                "epoch/mean_train_loss": epoch_loss,
                "epoch/learning_rate": lr_scheduler.get_last_lr()[0],
            }
        )
        epoch_progress_bar.set_description(f"Training: train_loss: {epoch_loss:.4f}:")
        # Validation and model checkpointing step
        if (epoch + 1) % config.validation_intervals == 0:
            model.eval()
            with torch.no_grad():
                for val_data in val_loader:
                    val_inputs, val_labels = (
                        val_data["image"].to(device),
                        val_data["label"].to(device),
                    )
                    val_outputs = inference(model, val_inputs)
                    val_outputs = [post_trans(i) for i in decollate_batch(val_outputs)]
                    dice_metric(y_pred=val_outputs, y=val_labels)
                    dice_metric_batch(y_pred=val_outputs, y=val_labels)
                metric_values.append(dice_metric.aggregate().item())
                metric_batch = dice_metric_batch.aggregate()
                metric_values_tumor_core.append(metric_batch[0].item())
                metric_values_whole_tumor.append(metric_batch[1].item())
                metric_values_enhanced_tumor.append(metric_batch[2].item())
                dice_metric.reset()
                dice_metric_batch.reset()
                checkpoint_path = os.path.join(config.checkpoint_dir, "model.pth")
                torch.save(model.state_dict(), checkpoint_path)
                
                # Log and version model checkpoints using W&B artifacts.
                artifact.add_file(local_path=checkpoint_path)
                run.log_artifact(artifact, aliases=[f"epoch_{epoch}"])
                # Log validation metrics to W&B dashboard.
                run.log(
                    {
                        "validation/validation_step": validation_step,
                        "validation/mean_dice": metric_values[-1],
                        "validation/mean_dice_tumor_core": metric_values_tumor_core[-1],
                        "validation/mean_dice_whole_tumor": metric_values_whole_tumor[-1],
                        "validation/mean_dice_enhanced_tumor": metric_values_enhanced_tumor[-1],
                    }
                )
                validation_step += 1
    # Wait for this artifact to finish logging
    artifact.wait()
Instrumenting the code with run.log not only tracks all metrics associated with the training and validation process, but also logs system metrics (CPU and GPU usage in this case) to the W&B dashboard.
An example of training and validation process tracking on W&B.
Navigate to the artifacts tab in the W&B run dashboard to access the different versions of model checkpoint artifacts logged during training.
An example of model checkpoints logging and versioning on W&B.
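The training loop above only validates and checkpoints on epochs where (epoch + 1) % config.validation_intervals == 0, and it aliases each checkpoint with the zero-based epoch index. As a minimal sketch of that schedule (the function name and the example values are assumptions for illustration, not part of the tutorial):

```python
def checkpoint_aliases(max_train_epochs: int, validation_intervals: int) -> list[str]:
    """Return the alias each logged checkpoint would get, in order."""
    aliases = []
    for epoch in range(max_train_epochs):
        # Same condition as the validation branch in the training loop.
        if (epoch + 1) % validation_intervals == 0:
            aliases.append(f"epoch_{epoch}")
    return aliases


# Hypothetical settings: 10 epochs, validate every 2 epochs.
print(checkpoint_aliases(10, 2))
# ['epoch_1', 'epoch_3', 'epoch_5', 'epoch_7', 'epoch_9']
```

Note that the alias uses the loop index, so the checkpoint saved after the second epoch is aliased epoch_1.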
Inference
Using the artifacts interface, you can select the version of the artifact that holds the best model checkpoint, judged in this case by the mean epoch-wise training loss. You can also explore the entire lineage of the artifact and use the version that you need.
An example of model artifact tracking on W&B.
Fetch the version of the model artifact with the best epoch-wise mean training loss and load the checkpoint state dictionary to the model.
run = wandb.init(
    project="monai-brain-tumor-segmentation",
    job_type="inference",
    reinit=True,
)
model_artifact = run.use_artifact(
    "geekyrakshit/monai-brain-tumor-segmentation/d5ex6n4a-checkpoint:v49",
    type="model",
)
model_artifact_dir = model_artifact.download()
model.load_state_dict(torch.load(os.path.join(model_artifact_dir, "model.pth")))
model.eval()
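The version in the artifact path above (v49) was picked by hand in the artifacts UI. The same selection rule can be sketched in plain Python, assuming you kept the mean training loss for each checkpointed epoch; the history values below are made up for illustration.

```python
def best_checkpoint_alias(history: dict[str, float]) -> str:
    """Return the alias whose mean training loss is lowest."""
    return min(history, key=history.get)


# Hypothetical per-checkpoint mean training losses, keyed by alias.
history = {"epoch_1": 0.82, "epoch_3": 0.61, "epoch_5": 0.64}
print(best_checkpoint_alias(history))  # epoch_3
```

Because the training loop logs each checkpoint with an epoch_N alias, an alias chosen this way could be used in place of the version number in the artifact path passed to run.use_artifact.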
Visualizing Predictions and Comparing with the Ground Truth Labels
Create another utility function to visualize the predictions of the pre-trained model and compare them with the corresponding ground-truth segmentation masks using the interactive segmentation mask overlay.
def log_predictions_into_tables(
    sample_image: np.ndarray,
    sample_label: np.ndarray,
    predicted_label: np.ndarray,
    split: str = None,
    data_idx: int = None,
    table: wandb.Table = None,
):
    num_channels, _, _, num_slices = sample_image.shape
    with tqdm(total=num_slices, leave=False) as progress_bar:
        for slice_idx in range(num_slices):
            wandb_images = []
            for channel_idx in range(num_channels):
                wandb_images += [
                    wandb.Image(
                        sample_image[channel_idx, :, :, slice_idx],
                        masks={
                            "ground-truth/Tumor-Core": {
                                "mask_data": sample_label[0, :, :, slice_idx],
                                "class_labels": {0: "background", 1: "Tumor Core"},
                            },
                            "prediction/Tumor-Core": {
                                "mask_data": predicted_label[0, :, :, slice_idx] * 2,
                                "class_labels": {0: "background", 2: "Tumor Core"},
                            },
                        },
                    ),
                    wandb.Image(
                        sample_image[channel_idx, :, :, slice_idx],
                        masks={
                            "ground-truth/Whole-Tumor": {
                                "mask_data": sample_label[1, :, :, slice_idx],
                                "class_labels": {0: "background", 1: "Whole Tumor"},
                            },
                            "prediction/Whole-Tumor": {
                                "mask_data": predicted_label[1, :, :, slice_idx] * 2,
                                "class_labels": {0: "background", 2: "Whole Tumor"},
                            },
                        },
                    ),
                    wandb.Image(
                        sample_image[channel_idx, :, :, slice_idx],
                        masks={
                            "ground-truth/Enhancing-Tumor": {
                                "mask_data": sample_label[2, :, :, slice_idx],
                                "class_labels": {0: "background", 1: "Enhancing Tumor"},
                            },
                            "prediction/Enhancing-Tumor": {
                                "mask_data": predicted_label[2, :, :, slice_idx] * 2,
                                "class_labels": {0: "background", 2: "Enhancing Tumor"},
                            },
                        },
                    ),
                ]
            table.add_data(split, data_idx, slice_idx, *wandb_images)
            progress_bar.update(1)
    return table
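The overlays above give the ground-truth mask class id 1 and the predicted mask class id 2 (by multiplying the binary prediction by 2), so the two masks carry different class ids in the overlay. A small sketch of that encoding on a made-up 2x3 binary slice, using plain lists instead of NumPy arrays:

```python
def encode_prediction(mask: list[list[int]], class_id: int = 2) -> list[list[int]]:
    """Scale a binary mask so foreground pixels carry class_id."""
    return [[pixel * class_id for pixel in row] for row in mask]


# Hypothetical binary masks for one slice (1 = tumor, 0 = background).
ground_truth = [[0, 1, 1], [0, 0, 1]]
prediction = [[0, 1, 0], [0, 1, 1]]

encoded = encode_prediction(prediction)
print(encoded)  # [[0, 2, 0], [0, 2, 2]]

# Count pixels where both masks mark tumor.
overlap = sum(
    1
    for gt_row, pred_row in zip(ground_truth, encoded)
    for gt, pred in zip(gt_row, pred_row)
    if gt == 1 and pred == 2
)
print(overlap)  # 2
```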
Log the prediction results to the prediction table.
run = wandb.init(
    project="monai-brain-tumor-segmentation",
    job_type="inference",
    reinit=True,
)
# create the prediction table
prediction_table = wandb.Table(
    columns=[
        "Split",
        "Data Index",
        "Slice Index",
        # Image columns follow the order in which log_predictions_into_tables
        # appends them: channel by channel, three classes per channel.
        "Image-Channel-0/Tumor-Core",
        "Image-Channel-0/Whole-Tumor",
        "Image-Channel-0/Enhancing-Tumor",
        "Image-Channel-1/Tumor-Core",
        "Image-Channel-1/Whole-Tumor",
        "Image-Channel-1/Enhancing-Tumor",
        "Image-Channel-2/Tumor-Core",
        "Image-Channel-2/Whole-Tumor",
        "Image-Channel-2/Enhancing-Tumor",
        "Image-Channel-3/Tumor-Core",
        "Image-Channel-3/Whole-Tumor",
        "Image-Channel-3/Enhancing-Tumor",
    ]
)
# Perform inference and visualization
with torch.no_grad():
    # Cap the number of validation samples to visualize (0 or less means all of them)
    max_samples = (
        min(config.max_prediction_images_visualized, len(val_dataset))
        if config.max_prediction_images_visualized > 0
        else len(val_dataset)
    )
    progress_bar = tqdm(
        enumerate(val_dataset[:max_samples]),
        total=max_samples,
        desc="Generating Predictions:",
    )
    for data_idx, sample in progress_bar:
        val_input = sample["image"].unsqueeze(0).to(device)
        val_output = inference(model, val_input)
        val_output = post_trans(val_output[0])
        prediction_table = log_predictions_into_tables(
            sample_image=sample["image"].cpu().numpy(),
            sample_label=sample["label"].cpu().numpy(),
            predicted_label=val_output.cpu().numpy(),
            data_idx=data_idx,
            split="validation",
            table=prediction_table,
        )
    run.log({"Predictions/Tumor-Segmentation-Data": prediction_table})
# End the experiment
run.finish()
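log_predictions_into_tables builds each row channel by channel, appending the Tumor-Core, Whole-Tumor, and Enhancing-Tumor overlays for one channel before moving to the next. Column names in that same order can be generated rather than typed out; this is a sketch, and the helper name is an assumption:

```python
def prediction_table_columns(num_channels: int = 4) -> list[str]:
    """Build table column names in the order the row values are appended."""
    classes = ["Tumor-Core", "Whole-Tumor", "Enhancing-Tumor"]
    columns = ["Split", "Data Index", "Slice Index"]
    for channel_idx in range(num_channels):
        for class_name in classes:
            columns.append(f"Image-Channel-{channel_idx}/{class_name}")
    return columns


columns = prediction_table_columns()
print(len(columns))  # 15
print(columns[3:6])
```

The resulting list could be passed as the columns argument of wandb.Table, which keeps the headers aligned with the values in each row.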
Use the interactive segmentation mask overlay to analyze and compare the predicted segmentation masks and the ground-truth labels for each class.
An example of predictions and ground-truth visualization on W&B.
Acknowledgements and more resources
7 - Keras
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration. 
This Colab notebook introduces the WandbMetricsLogger callback. Use this callback for Experiment Tracking. It will log your training and validation metrics along with system metrics to W&B.
Setup and Installation
First, let us install the latest version of W&B. We will then authenticate this Colab instance to use W&B.
!pip install -qq -U wandb
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# W&B related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login() takes you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this Colab, we use a simple Python dict as our config system.
configs = dict(
    num_classes=10,
    shuffle_buffer=1024,
    batch_size=64,
    image_size=28,
    image_channels=1,
    earlystopping_patience=3,
    learning_rate=1e-3,
    epochs=10,
)
Dataset
In this Colab, we use the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load("fashion_mnist", split=["train", "test"])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
    # Get image
    image = example["image"]
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Get label
    label = example["label"]
    label = tf.one_hot(label, depth=configs["num_classes"])
    return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
    dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
    if dataloader_type == "train":
        dataloader = dataloader.shuffle(configs["shuffle_buffer"])
    dataloader = dataloader.batch(configs["batch_size"]).prefetch(AUTOTUNE)
    return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
    backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
        weights="imagenet", include_top=False
    )
    backbone.trainable = False
    inputs = layers.Input(
        shape=(configs["image_size"], configs["image_size"], configs["image_channels"])
    )
    resize = layers.Resizing(32, 32)(inputs)
    neck = layers.Conv2D(3, (3, 3), padding="same")(resize)
    preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
    x = backbone(preprocess_input)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
    return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top@5_accuracy"),
    ],
)
Train
# Initialize a W&B Run
run = wandb.init(project="intro-keras", config=configs)
# Train your model
model.fit(
    trainloader,
    epochs=configs["epochs"],
    validation_data=validloader,
    callbacks=[
        WandbMetricsLogger(log_freq=10)
    ],  # Notice the use of WandbMetricsLogger here
)
# Close the W&B Run
run.finish()
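With log_freq=10, WandbMetricsLogger logs batch-level metrics every 10 batches, alongside the epoch-level metrics. A rough sketch of how many batches that covers per epoch, assuming the standard Fashion-MNIST split sizes (60,000 training and 10,000 test images) and the batch_size from configs:

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


train_batches = batches_per_epoch(60_000, 64)
valid_batches = batches_per_epoch(10_000, 64)
print(train_batches, valid_batches)  # 938 157

# Approximate number of batch-level log points per epoch at log_freq=10.
print(train_batches // 10)  # 93
```

A smaller log_freq gives smoother batch-level curves at the cost of more logging calls.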
8 - Keras models
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration. 
This Colab notebook introduces the WandbModelCheckpoint callback. Use this callback to log your model checkpoints to W&B Artifacts.
Setup and Installation
First, let us install the latest version of W&B. We will then authenticate this colab instance to use W&B.
!pip install -qq -U wandb
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# W&B related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
from wandb.integration.keras import WandbModelCheckpoint
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login() takes you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this Colab, we use a simple Python dict as our config system.
configs = dict(
    num_classes = 10,
    shuffle_buffer = 1024,
    batch_size = 64,
    image_size = 28,
    image_channels = 1,
    earlystopping_patience = 3,
    learning_rate = 1e-3,
    epochs = 10
)
Dataset
In this Colab, we use the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load('fashion_mnist', split=['train', 'test'])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
    # Get image
    image = example["image"]
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Get label
    label = example["label"]
    label = tf.one_hot(label, depth=configs["num_classes"])
    return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
    dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
    if dataloader_type=="train":
        dataloader = dataloader.shuffle(configs["shuffle_buffer"])
      
    dataloader = (
        dataloader
        .batch(configs["batch_size"])
        .prefetch(AUTOTUNE)
    )
    return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
    backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(weights='imagenet', include_top=False)
    backbone.trainable = False
    inputs = layers.Input(shape=(configs["image_size"], configs["image_size"], configs["image_channels"]))
    resize = layers.Resizing(32, 32)(inputs)
    neck = layers.Conv2D(3, (3,3), padding="same")(resize)
    preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
    x = backbone(preprocess_input)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
    return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
    optimizer = "adam",
    loss = "categorical_crossentropy",
    metrics = ["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top@5_accuracy')]
)
Train
# Initialize a W&B Run
run = wandb.init(
    project = "intro-keras",
    config = configs
)
# Train your model
model.fit(
    trainloader,
    epochs = configs["epochs"],
    validation_data = validloader,
    callbacks = [
        WandbMetricsLogger(log_freq=10),
        WandbModelCheckpoint(filepath="models/model.keras") # Notice the use of WandbModelCheckpoint here
    ]
)
# Close the W&B Run
run.finish()
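WandbModelCheckpoint builds on the Keras ModelCheckpoint callback, which lets filepath contain format placeholders such as {epoch:02d} so that each save gets its own file name. The sketch below only mimics that string formatting with str.format; the template and metric values are illustrative assumptions, and the actual callback does this internally.

```python
def render_checkpoint_path(template: str, epoch: int, logs: dict[str, float]) -> str:
    """Fill a checkpoint path template the way Keras formats filepath.

    Keras passes a one-based epoch number, so epoch index 0 renders as 01.
    """
    return template.format(epoch=epoch + 1, **logs)


template = "models/model-{epoch:02d}-{val_loss:.2f}.keras"
print(render_checkpoint_path(template, 0, {"val_loss": 0.4567}))
# models/model-01-0.46.keras
```

With a fixed path such as models/model.keras, as in the tutorial, every save writes to the same local file.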
9 - Keras tables
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration. 
This Colab notebook introduces the WandbEvalCallback, an abstract callback that can be inherited to build useful callbacks for model prediction visualization and dataset visualization.
Setup and Installation
First, let us install the latest version of W&B. We will then authenticate this colab instance to use W&B.
!pip install -qq -U wandb
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# W&B related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
from wandb.integration.keras import WandbModelCheckpoint
from wandb.integration.keras import WandbEvalCallback
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login() takes you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this Colab, we use a simple Python dict as our config system.
configs = dict(
    num_classes=10,
    shuffle_buffer=1024,
    batch_size=64,
    image_size=28,
    image_channels=1,
    earlystopping_patience=3,
    learning_rate=1e-3,
    epochs=10,
)
Dataset
In this Colab, we use the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load("fashion_mnist", split=["train", "test"])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
    # Get image
    image = example["image"]
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Get label
    label = example["label"]
    label = tf.one_hot(label, depth=configs["num_classes"])
    return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
    dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
    if dataloader_type=="train":
        dataloader = dataloader.shuffle(configs["shuffle_buffer"])
      
    dataloader = (
        dataloader
        .batch(configs["batch_size"])
        .prefetch(AUTOTUNE)
    )
    return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
    backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
        weights="imagenet", include_top=False
    )
    backbone.trainable = False
    inputs = layers.Input(
        shape=(configs["image_size"], configs["image_size"], configs["image_channels"])
    )
    resize = layers.Resizing(32, 32)(inputs)
    neck = layers.Conv2D(3, (3, 3), padding="same")(resize)
    preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
    x = backbone(preprocess_input)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
    return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top@5_accuracy"),
    ],
)
WandbEvalCallback
The WandbEvalCallback is an abstract base class for building Keras callbacks, primarily for model prediction visualization and secondarily for dataset visualization.
This is a dataset- and task-agnostic abstract callback. To use it, inherit from this base callback class and implement the add_ground_truth and add_model_predictions methods.
The WandbEvalCallback is a utility class that provides helpful methods to:
- create data and prediction wandb.Table instances,
- log data and prediction Tables as wandb.Artifact,
- log the data table on on_train_begin,
- log the prediction table on on_epoch_end.
As an example, we have implemented WandbClfEvalCallback below for an image classification task. This example callback:
- logs the validation data (data_table) to W&B,
- performs inference and logs the prediction (pred_table) to W&B on every epoch end.
How the memory footprint is reduced
We log the data_table to W&B when the on_train_begin method is invoked. Once it’s uploaded as a W&B Artifact, we get a reference to this table, which can be accessed using the data_table_ref class variable. The data_table_ref is a 2D list that can be indexed like self.data_table_ref[idx][n], where idx is the row number and n is the column number. Let’s see the usage in the example below.
class WandbClfEvalCallback(WandbEvalCallback):
    def __init__(
        self, validloader, data_table_columns, pred_table_columns, num_samples=100
    ):
        super().__init__(data_table_columns, pred_table_columns)
        self.val_data = validloader.unbatch().take(num_samples)
    def add_ground_truth(self, logs=None):
        for idx, (image, label) in enumerate(self.val_data):
            self.data_table.add_data(idx, wandb.Image(image), np.argmax(label, axis=-1))
    def add_model_predictions(self, epoch, logs=None):
        # Get predictions
        preds = self._inference()
        table_idxs = self.data_table_ref.get_index()
        for idx in table_idxs:
            pred = preds[idx]
            self.pred_table.add_data(
                epoch,
                self.data_table_ref.data[idx][0],
                self.data_table_ref.data[idx][1],
                self.data_table_ref.data[idx][2],
                pred,
            )
    def _inference(self):
        preds = []
        for image, label in self.val_data:
            pred = self.model(tf.expand_dims(image, axis=0))
            argmax_pred = tf.argmax(pred, axis=-1).numpy()[0]
            preds.append(argmax_pred)
        return preds
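The indexing pattern in add_model_predictions can be sketched without W&B: each prediction row reuses the three cells of the matching data-table row and adds the epoch and the prediction. Plain lists stand in for the table reference here, and the image cells are placeholder strings.

```python
def build_prediction_rows(data_rows: list[list], preds: list[int], epoch: int) -> list[list]:
    """Combine data-table rows with predictions, one row per sample."""
    rows = []
    for idx, pred in enumerate(preds):
        rows.append(
            [
                epoch,
                data_rows[idx][0],  # idx column
                data_rows[idx][1],  # image column
                data_rows[idx][2],  # ground_truth column
                pred,
            ]
        )
    return rows


# Hypothetical data table with columns ["idx", "image", "ground_truth"].
data_rows = [[0, "<image 0>", 9], [1, "<image 1>", 2]]
pred_rows = build_prediction_rows(data_rows, preds=[9, 4], epoch=0)
print(pred_rows[1])  # [0, 1, '<image 1>', 2, 4]
```

The row layout matches the pred_table_columns used in the Train step: epoch, idx, image, ground_truth, prediction.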
Train
# Initialize a W&B Run
run = wandb.init(project="intro-keras", config=configs)
# Train your model
model.fit(
    trainloader,
    epochs=configs["epochs"],
    validation_data=validloader,
    callbacks=[
        WandbMetricsLogger(log_freq=10),
        WandbClfEvalCallback(
            validloader,
            data_table_columns=["idx", "image", "ground_truth"],
            pred_table_columns=["epoch", "idx", "image", "ground_truth", "prediction"],
        ),  # Notice the use of WandbEvalCallback here
    ],
)
# Close the W&B Run
run.finish()
10 - XGBoost Sweeps
Use W&B for machine learning experiment tracking, dataset versioning, and project collaboration. 
Squeezing the best performance out of tree-based models requires selecting the right hyperparameters.
How many early_stopping_rounds? What should the max_depth of a tree be?
Searching through high-dimensional hyperparameter spaces to find the most performant model can get unwieldy very fast. Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and crown a winner. They do this by automatically searching through combinations of hyperparameter values to find the best ones.
In this tutorial we’ll see how you can run sophisticated hyperparameter sweeps on XGBoost models in 3 easy steps using W&B.
For a teaser, check out the plots below:
 
Sweeps: An Overview
Running a hyperparameter sweep with W&B is very easy. There are just 3 simple steps:
- Define the sweep: create a dictionary-like object that specifies the sweep: which parameters to search through, which search strategy to use, and which metric to optimize.
- Initialize the sweep: with one line of code, initialize the sweep and pass in the dictionary of sweep configurations: sweep_id = wandb.sweep(sweep_config)
- Run the sweep agent: also accomplished with one line of code, call wandb.agent() and pass the sweep_id along with a function that defines your model architecture and trains it: wandb.agent(sweep_id, function=train)
That’s all there is to running a hyperparameter sweep.
In the notebook below, we’ll walk through these 3 steps in more detail.
We highly encourage you to fork this notebook, tweak the parameters, or try the model with your own dataset.
Resources
!pip install wandb -qU
import wandb
wandb.login()
1. Define the Sweep
W&B sweeps give you powerful levers to configure your sweeps exactly how you want them, with just a few lines of code. The sweeps config can be defined as a dictionary or a YAML file.
Let’s walk through some of them together:
- Metric: This is the metric the sweeps are attempting to optimize. Metrics can take a name (this metric should be logged by your training script) and a goal (maximize or minimize).
- Search Strategy: Specified using the "method" key. We support several different search strategies with sweeps.
  - Grid Search: Iterates over every combination of hyperparameter values.
  - Random Search: Iterates over randomly chosen combinations of hyperparameter values.
  - Bayesian Search: Builds a probabilistic model that maps hyperparameters to the probability of a metric score, and chooses parameters with a high probability of improving the metric. Bayesian optimization spends more time choosing each set of hyperparameter values so that it can try fewer of them.
- Parameters: A dictionary containing the hyperparameter names, and discrete values, a range, or distributions from which to pull their values on each iteration.
For details, see the list of all sweep configuration options.
sweep_config = {
    "method": "random", # try grid or random
    "metric": {
      "name": "accuracy",
      "goal": "maximize"   
    },
    "parameters": {
        "booster": {
            "values": ["gbtree","gblinear"]
        },
        "max_depth": {
            "values": [3, 6, 9, 12]
        },
        "learning_rate": {
            "values": [0.1, 0.05, 0.2]
        },
        "subsample": {
            "values": [1, 0.5, 0.3]
        }
    }
}
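To get a feel for the size of this search space, the discrete values can be expanded with itertools.product, which is what a grid search would iterate over. This is a sketch over a copy of the parameters above, not how W&B generates runs internally.

```python
import itertools

# Copy of the "parameters" section of sweep_config above.
parameters = {
    "booster": {"values": ["gbtree", "gblinear"]},
    "max_depth": {"values": [3, 6, 9, 12]},
    "learning_rate": {"values": [0.1, 0.05, 0.2]},
    "subsample": {"values": [1, 0.5, 0.3]},
}


def grid_combinations(parameters: dict) -> list[dict]:
    """Expand every combination of the listed hyperparameter values."""
    names = list(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


combos = grid_combinations(parameters)
print(len(combos))  # 72
print(combos[0])
```

With 2 x 4 x 3 x 3 = 72 combinations, a full grid is still feasible here; random search becomes more attractive as the number of parameters or values grows.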
2. Initialize the Sweep
Calling wandb.sweep starts a Sweep Controller: a centralized process that provides settings of the parameters to any who query it and expects them to return performance on metrics via wandb logging.
sweep_id = wandb.sweep(sweep_config, project="XGBoost-sweeps")
Define your training process
Before we can run the sweep, we need to define a function that creates and trains the model – the function that takes in hyperparameter values and spits out metrics.
We’ll also need wandb to be integrated into our script.
There are three main components:
- wandb.init(): Initialize a new W&B Run. Each run is a single execution of the training script.
- run.config: Save all your hyperparameters in a config object. This lets you use our app to sort and compare your runs by hyperparameter values.
- run.log(): Logs metrics and custom objects, such as images, videos, audio files, HTML, plots, or point clouds.
We also need to download the data:
!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
# XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
def train():
  config_defaults = {
    "booster": "gbtree",
    "max_depth": 3,
    "learning_rate": 0.1,
    "subsample": 1,
    "seed": 117,
    "test_size": 0.33,
  }
  with wandb.init(config=config_defaults) as run:  # defaults are overridden during the sweep
    config = run.config
    # load data and split into predictors and targets
    dataset = loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
    X, Y = dataset[:, :8], dataset[:, 8]
    # split data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, Y,
                                                        test_size=config.test_size,
                                                        random_state=config.seed)
    # fit model on train
    model = XGBClassifier(booster=config.booster, max_depth=config.max_depth,
                          learning_rate=config.learning_rate, subsample=config.subsample)
    model.fit(X_train, y_train)
    # make predictions on test
    y_pred = model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    # evaluate predictions
    accuracy = accuracy_score(y_test, predictions)
    print(f"Accuracy: {accuracy:.0%}")
    run.log({"accuracy": accuracy})
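Inside a sweep, the values chosen by the sweep take precedence over config_defaults, while keys the sweep does not set (seed and test_size here) keep their defaults. Conceptually this behaves like a dictionary merge; the sketch below is only an analogy, and the sweep values are made up.

```python
config_defaults = {
    "booster": "gbtree",
    "max_depth": 3,
    "learning_rate": 0.1,
    "subsample": 1,
    "seed": 117,
    "test_size": 0.33,
}


def effective_config(defaults: dict, sweep_values: dict) -> dict:
    """Later keys win, so sweep values override the defaults."""
    return {**defaults, **sweep_values}


# Hypothetical values picked by the sweep for one run.
sweep_values = {"booster": "gblinear", "max_depth": 9, "learning_rate": 0.05, "subsample": 0.5}
config = effective_config(config_defaults, sweep_values)
print(config["max_depth"], config["seed"])  # 9 117
```

Keeping the defaults in place also means train() can be called on its own, outside a sweep, for a quick sanity check.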
3. Run the Sweep with an agent
Now, we call wandb.agent to start up our sweep.
You can call wandb.agent on any machine where you’re logged into W&B that has
- the sweep_id,
- the dataset and train function
and that machine will join the sweep.
Note: by default, a random sweep runs forever, trying new parameter combinations until the cows come home, or until you turn the sweep off from the app UI. You can prevent this by providing the total count of runs you’d like the agent to complete.
wandb.agent(sweep_id, train, count=25)
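With count=25 the agent stops after 25 runs, which covers at most 25 of the 72 combinations in the sweep_config above. A sketch of what 25 random draws look like, assuming each run samples every parameter independently and uniformly (so repeats are possible); the seed is only there to make the sketch repeatable.

```python
import random

values = {
    "booster": ["gbtree", "gblinear"],
    "max_depth": [3, 6, 9, 12],
    "learning_rate": [0.1, 0.05, 0.2],
    "subsample": [1, 0.5, 0.3],
}


def sample_runs(values: dict, count: int, seed: int = 0) -> list[tuple]:
    """Draw `count` configurations, one uniform choice per parameter."""
    rng = random.Random(seed)
    return [tuple(rng.choice(options) for options in values.values()) for _ in range(count)]


runs = sample_runs(values, count=25)
unique_runs = set(runs)
print(len(runs), len(unique_runs))
```

If the number of unique configurations is lower than the number of runs, some of the budget went to repeated combinations, which is one reason to prefer grid search on a space this small.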
Visualize your results
Now that your sweep is finished, it’s time to look at the results.
W&B will generate a number of useful plots for you automatically.
Parallel coordinates plot
This plot maps hyperparameter values to model metrics. It’s useful for honing in on combinations of hyperparameters that led to the best model performance.
This plot seems to indicate that using a tree as our learner slightly, but not mind-blowingly, outperforms using a simple linear model as our learner.
 
Hyperparameter importance plot
The hyperparameter importance plot shows which hyperparameter values had the biggest impact on your metrics.
We report both the correlation (treating it as a linear predictor) and the feature importance (after training a random forest on your results) so you can see which parameters had the biggest effect and whether that effect was positive or negative.
Reading this chart, we see quantitative confirmation of the trend we noticed in the parallel coordinates chart above: the largest impact on validation accuracy came from the choice of learner, and the gblinear learners were generally worse than gbtree learners.
 
These visualizations can help you save both time and resources running expensive hyperparameter optimizations by honing in on the parameters (and value ranges) that are the most important, and thereby worthy of further exploration.