Classifying Men and Women Images using PyTorch

Apr 17, 2021

PyTorch is an open-source ML library developed by Facebook. It has many applications in Data Science and is notably used in computer vision and natural language processing.

So, as an exercise, I used PyTorch to classify images of men and women from a small Kaggle dataset: https://www.kaggle.com/saadpd/menwomen-classification.

I used Google Colab to write the code, and I will walk through it step by step.

Getting the Data

We first need to install the 'opendatasets' library with the following command.

pip install opendatasets --upgrade --quiet

Now, we import opendatasets and get the Kaggle dataset.

import opendatasets as od

dataset_url = 'https://www.kaggle.com/saadpd/menwomen-classification'
od.download(dataset_url)

You will need a Kaggle account to download this dataset, as the command above asks for your Kaggle username and API key. You can find the API key in the account settings of your Kaggle account. Once you enter those, the dataset is downloaded into Colab.

Importing required Libraries

import os
import torch
import torchvision
import tarfile
import torch.nn as nn                # used by the model layers later on
import torch.nn.functional as F      # used for F.cross_entropy later on
from torchvision.datasets.utils import download_url
from torch.utils.data import random_split
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
import torchvision.transforms as transforms
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.rcParams['figure.facecolor'] = '#ffffff'
from torch.utils.data.dataloader import DataLoader

We need torch and torchvision for the steps that follow, so we import them here along with matplotlib for plotting.

Checking out the Data

DATA_DIR = './menwomen-classification/traindata/traindata'
print(os.listdir(DATA_DIR))

The output of the above commands is ['men', 'women'].

print(os.listdir(DATA_DIR + '/men')[:10])

The above command prints ['00002001.jpg', 'misclassed (1).jpg', '00001267.jpeg', '00001243.jpg', '00000528.jpg', '00000836.jpg', '00002327.jpg', '00001781.jpg', '00000810.jpg', '00002155.jpg'].
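
It can also help to count how many files each class folder holds, since an imbalanced dataset changes how an accuracy figure should be read. Here is a minimal sketch, assuming the one-subfolder-per-class layout shown above. The helper name count_images_per_class is hypothetical (it is not part of the original notebook), and the demo runs on a throwaway temporary directory rather than the real dataset.

```python
import os
import tempfile

def count_images_per_class(root):
    """Return {class_name: number_of_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts

# Demo on a temporary directory that mimics the men/women layout.
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [("men", 3), ("women", 2)]:
        os.makedirs(os.path.join(tmp, cls))
        for i in range(n):
            # Create empty placeholder files standing in for images.
            open(os.path.join(tmp, cls, f"{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

On the real data, the same helper would be called as count_images_per_class(DATA_DIR).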

Transforming the Data

t = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
dataset = ImageFolder(DATA_DIR, transform=t)

Here, I have resized the dataset images to 32 by 32 pixels. You are free to try other sizes, but beware that a higher resolution means more computation and hence more time and resources.
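
To get a feel for how quickly the cost grows, the sketch below counts the numbers stored per RGB image at a few resolutions. The function name values_per_image is made up for this illustration. Doubling the side length quadruples the values per image, and the work done by the convolution layers grows roughly in step with the pixel count.

```python
def values_per_image(height, width, channels=3):
    """Number of scalar values in one image tensor of shape (channels, height, width)."""
    return channels * height * width

for side in (32, 64, 128):
    print(f"{side} x {side}: {values_per_image(side, side)} values per image")
```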

Checking the Images

The function below displays an image together with its label.

def show_example(img, label):
    print('Label: ', dataset.classes[label], "(" + str(label) + ")")
    # imshow expects channels last, so move (C, H, W) to (H, W, C)
    plt.imshow(img.permute(1, 2, 0))

show_example(*dataset[1010])

Creating Train and Validation Sets

val_size = 500
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

The random_split function gives us train and validation sets of the requested sizes by selecting random samples from our dataset. Now, we can use these to create dataloaders.

batch_size = 128
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)

Now, our train and validation dataloaders are created. Cool!
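
To see what the split and the dataloaders produce without downloading anything, here is a small sketch on synthetic tensors. Everything in it is an assumption made for illustration: the fake dataset of 1000 random images, the sizes, and the seed. Passing a seeded generator to random_split is an optional extra that makes the split repeatable between runs, which the code above does not do.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the image dataset: 1000 fake 3x32x32 images with labels 0/1.
images = torch.rand(1000, 3, 32, 32)
labels = torch.randint(0, 2, (1000,))
fake_ds = TensorDataset(images, labels)

val_size = 200
train_size = len(fake_ds) - val_size

# A seeded generator makes the random split repeatable across runs.
gen = torch.Generator().manual_seed(42)
train_part, val_part = random_split(fake_ds, [train_size, val_size], generator=gen)

train_loader = DataLoader(train_part, batch_size=128, shuffle=True)
val_loader = DataLoader(val_part, batch_size=256)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)                  # shapes of the first training batch
print(len(train_loader), len(val_loader))  # number of batches in each loader
```

The two subsets never share an index, so no validation image leaks into training.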

Checking the Batch

from torchvision.utils import make_grid

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.set_xticks([]); ax.set_yticks([])
        # Arrange the batch in a grid, 16 images per row
        ax.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
        break  # show only the first batch

show_batch(train_dl)

The above code displays one batch of training images as a grid.

Creating Functions for our Model

def apply_kernel(image, kernel):
    ri, ci = image.shape                # image dimensions
    rk, ck = kernel.shape               # kernel dimensions
    ro, co = ri - rk + 1, ci - ck + 1   # output dimensions
    output = torch.zeros([ro, co])
    for i in range(ro):
        for j in range(co):
            output[i, j] = torch.sum(image[i:i+rk, j:j+ck] * kernel)
    return output

The function above slides the kernel over the input matrix and returns the result of the convolution. The example below shows how.

sample_image = torch.tensor([
    [3, 3, 2, 1, 0],
    [0, 0, 1, 3, 1],
    [3, 1, 2, 2, 3],
    [2, 0, 0, 2, 2],
    [2, 0, 0, 0, 1]], dtype=torch.float32)

sample_kernel = torch.tensor([
    [0, 1, 2],
    [2, 2, 0],
    [0, 1, 2]], dtype=torch.float32)

apply_kernel(sample_image, sample_kernel)

The output tensor is:

tensor([[12., 12., 17.],
        [10., 17., 19.],
        [ 9.,  6., 14.]])
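
As a cross-check, the same numbers should come out of PyTorch's built-in F.conv2d, which computes the same sliding-window sum of products (without flipping the kernel). The sketch below is self-contained and repeats the sample tensors under new names. The only extra step is adding batch and channel dimensions, because F.conv2d expects 4-D inputs.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[3, 3, 2, 1, 0],
                      [0, 0, 1, 3, 1],
                      [3, 1, 2, 2, 3],
                      [2, 0, 0, 2, 2],
                      [2, 0, 0, 0, 1]], dtype=torch.float32)
kernel = torch.tensor([[0, 1, 2],
                       [2, 2, 0],
                       [0, 1, 2]], dtype=torch.float32)

# F.conv2d expects input (batch, channels, H, W) and weight (out_ch, in_ch, kH, kW),
# so add two leading singleton dimensions to each tensor, then strip them again.
result = F.conv2d(image[None, None], kernel[None, None])[0, 0]
print(result)
```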

Creating the Model

simple_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(2, 2)
)

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))

def accuracy(outputs, labels):
    # The predicted class is the index of the largest output
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class class_finder(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 64 x 16 x 16

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 128 x 8 x 8

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 256 x 4 x 4

            nn.Flatten(),
            nn.Linear(256*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 10))  # 10 outputs, although the data has only 2 classes; 2 would suffice

    def forward(self, xb):
        return self.network(xb)

Now, create the model and display it:

model = class_finder()
model

This prints the model as:

class_finder(
  (network): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU()
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU()
    (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (15): Flatten(start_dim=1, end_dim=-1)
    (16): Linear(in_features=4096, out_features=1024, bias=True)
    (17): ReLU()
    (18): Linear(in_features=1024, out_features=512, bias=True)
    (19): ReLU()
    (20): Linear(in_features=512, out_features=10, bias=True)
  )
)
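
The in_features=4096 of the first linear layer follows from the layer sizes. A convolution or pooling layer maps a side length n to floor((n + 2*padding - kernel) / stride) + 1, so the 3x3 convolutions with padding 1 keep the size and each 2x2 max pool halves it. The sketch below traces a 32 x 32 input through the three blocks. The helper name conv_out is introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

side = 32
for block in range(3):
    side = conv_out(side, kernel=3, stride=1, padding=1)  # first conv keeps the size
    side = conv_out(side, kernel=3, stride=1, padding=1)  # second conv keeps the size
    side = conv_out(side, kernel=2, stride=2)             # 2x2 max pool halves it
    print(f"after block {block + 1}: {side} x {side}")

flattened = 256 * side * side  # 256 channels come out of the last conv layer
print(flattened)
```

If you resize the images to 64 x 64 instead, the same arithmetic ends at 8 x 8, and the first linear layer would need 256*8*8 input features.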

Getting GPU Support for our Model

The functions below help us move our model and data to the GPU when one is available.

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

We store the selected device in a variable.

device = get_default_device()

Now, we wrap the train and validation dataloaders so that each batch is moved to the device, and we move the model there as well.

train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
to_device(model, device)

Creating the Evaluation Function

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
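
The fit loop clears the gradients after every optimizer step because PyTorch adds new gradients to whatever is already stored in .grad. The toy sketch below, using a single scalar parameter rather than the article's model, shows that accumulation and the effect of clearing it.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # gradient of 3*w with respect to w

(3 * w).backward()           # no clearing in between, so the gradients add up
accumulated = w.grad.item()

w.grad.zero_()               # clear the stored gradient, the role optimizer.zero_grad() plays in fit
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

Without the clearing step, every batch would update the weights with the sum of all gradients seen so far rather than with its own gradient.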

Running the Model and Getting Results

model = to_device(class_finder(), device)
evaluate(model, val_dl)

Before training, we get {'val_acc': 0.0, 'val_loss': 2.276859760284424}. Let's set the epochs and learning rate.

num_epochs = 10
opt_func = torch.optim.Adam
lr = 0.001

Now, we train the model and store the results in a variable named history for later use.

history = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)

You get the following output from the fit function.

Epoch [0], train_loss: 0.9649, val_loss: 0.6594, val_acc: 0.6277
Epoch [1], train_loss: 0.6432, val_loss: 0.6616, val_acc: 0.6277
Epoch [2], train_loss: 0.6455, val_loss: 0.6591, val_acc: 0.6277
Epoch [3], train_loss: 0.6388, val_loss: 0.6584, val_acc: 0.6277
Epoch [4], train_loss: 0.6347, val_loss: 0.6662, val_acc: 0.6277
Epoch [5], train_loss: 0.6389, val_loss: 0.6580, val_acc: 0.6277
Epoch [6], train_loss: 0.6273, val_loss: 0.6413, val_acc: 0.6277
Epoch [7], train_loss: 0.6216, val_loss: 0.6514, val_acc: 0.6277
Epoch [8], train_loss: 0.6086, val_loss: 0.6314, val_acc: 0.6476
Epoch [9], train_loss: 0.6218, val_loss: 0.6390, val_acc: 0.6454
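
These numbers can be sanity-checked against two simple baselines. An untrained network with 10 outputs that spreads probability evenly has a cross-entropy of ln(10), about 2.30, which is close to the 2.28 seen before training. The second check rests on an assumption the article does not confirm: that the constant 0.6277 is the share of the larger class in the validation data. A model that only predicts those class frequencies would then have a loss of about 0.66, which is close to the validation loss in the early epochs. The function names below are made up for this sketch.

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy of a model that spreads probability evenly over all classes."""
    return math.log(num_classes)

def prior_cross_entropy(p_majority):
    """Cross-entropy of always predicting the class frequencies (two classes)."""
    p, q = p_majority, 1.0 - p_majority
    return -(p * math.log(p) + q * math.log(q))

print(round(uniform_cross_entropy(10), 4))    # untrained 10-output model
print(round(uniform_cross_entropy(2), 4))     # coin-flip guess between two classes
print(round(prior_cross_entropy(0.6277), 4))  # ASSUMED majority share of 0.6277
```

If that assumption holds, it would suggest the model mostly predicted a single class until around epoch 8, when the accuracy first moves.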

Plotting our Results

Now, we define a plotting function to plot accuracies.

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)

Calling the plot accuracies function on history gives the accuracy plot.

As the training log shows, the accuracy stays constant for the first eight epochs and rises slightly towards the end of training. Similarly, we define a function to plot losses.

def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')

plot_losses(history)

Calling the plot losses function gives the loss plot. Both losses decrease overall during training, as expected.

Checking on the Test Set

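Now, it is time to bring the test set into the picture.

# Assumption: the test images sit in a 'testdata/testdata' folder next to the
# training data; the original code used data_dir without defining it, so adjust this path to your download.
data_dir = './menwomen-classification/testdata/testdata'
t_d = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
test_dataset = ImageFolder(data_dir, transform=t_d)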

The predict_image function below predicts the class of a single image.

def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return dataset.classes[preds[0].item()]

Now, we test on a single image.

img, label = test_dataset[1002]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))

We get the result as Label: women , Predicted: women
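
predict_image returns only the winning class. If you also want a rough confidence value, the raw outputs (logits) can be passed through a softmax first. The sketch below uses made-up logits for two classes purely for illustration. The article's model actually emits 10 values per image, but the same call applies.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model outputs (logits) for one image and two classes.
logits = torch.tensor([[0.3, 1.2]])
class_names = ['men', 'women']  # same order as dataset.classes in the article

probs = F.softmax(logits, dim=1)            # each row now sums to 1
confidence, pred = torch.max(probs, dim=1)  # highest probability and its index
print(class_names[pred.item()], round(confidence.item(), 3))
```

Softmax does not change which class wins, so the predicted label is the same as with torch.max on the raw outputs.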

Now, we test model accuracy on the test set.

test_loader = DeviceDataLoader(DataLoader(test_dataset, batch_size*2), device)
result = evaluate(model, test_loader)
result

We get {'acc': 0.7138802409172058, 'loss': 0.6005937457084656}, i.e. an accuracy of about 71% on the test set.

Saving the Model for Future Use

We now save the model weights for later use and load them back into a fresh model instance.

torch.save(model.state_dict(), 'men_women.pth')

model2 = to_device(class_finder(), device)
model2.load_state_dict(torch.load('men_women.pth'))
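
The same save-and-load round trip can be tried on a tiny stand-in model, as sketched below. The nn.Linear layer and the temporary file are assumptions for the demo only. The map_location argument is an optional addition that lets weights saved on a GPU load on a CPU-only machine.

```python
import os
import tempfile
import torch
import torch.nn as nn

tiny_model = nn.Linear(4, 2)  # stand-in for class_finder()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tiny.pth')
    torch.save(tiny_model.state_dict(), path)

    restored = nn.Linear(4, 2)  # must have the same architecture as the saved model
    # map_location='cpu' lets weights saved on a GPU load on a CPU-only machine.
    restored.load_state_dict(torch.load(path, map_location='cpu'))

restored.eval()  # switch to evaluation mode before predicting

# Compare every saved tensor with its restored counterpart.
same = all(torch.equal(a, b) for a, b in
           zip(tiny_model.state_dict().values(), restored.state_dict().values()))
print(same)
```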

Future Work

As we see, the accuracy is fairly low. Some ways to increase it would be to:

1) get a larger dataset,

2) use higher-resolution images,

3) apply data augmentation to the images, etc.

Now, you can experiment with other datasets using some of the techniques from this article. Carry on!