COSC420 Assignment 1
Classifying text from images
Weight: 20% Lecturer: Lech Szymanski
For this assignment, you will be building and training neural network models using
Tensorflow’s Keras library. Datasets for your training and testing will be provided and
they correspond to three different types of classification.
Simple classification
In this task you are given 28x28 pixel colour images of letters from the English alphabet
written in different fonts. The task is to identify the letter in the image, meaning 26
classes (same class for the lower and upper case character). There are two variants of this
dataset – one with black characters over a uniform light background (20000 train images,
4000 test images), and the other with multi-colour characters over a random image
background (60000 train images, 12000 test images).
Figure 1: Simple classification dataset – ‘clean’ on the left, ‘noisy’ on the right.
Fine-grained classification
In this task you are given the same set of images as in simple classification, but the labels
indicate which font the text is written in. There are 8 fonts in total. Compared with
simple classification, the differences between classes are more subtle and (presumably)
harder for the network to distinguish.
Figure 2: Fine-grained classification dataset – ‘clean’ on the left, ‘noisy’ on the right.
Multi-label classification
In this task you are given 48x48 pixel colour images of three-letter words made of random
letters written in one of 8 possible fonts. The label indicates the letters
found in the image (order is of no significance for this task) as well as the font.
There are two variants of this dataset – one with black characters over a uniform light
background (20000 train images, 4000 test images), and the other with multi-colour
characters over a random image background (60000 train images, 12000 test images).
Figure 3: Multi-label classification dataset – ‘clean’ on the left, ‘noisy’ on the right.
Task 1, Implementation (10 marks)
Select two classification tasks you want to work with, devise neural network
architecture(s), build the network(s) in Tensorflow and test their performance. The order
of tasks, as presented above, is roughly in the increasing order of difficulty of
classification. That is, simple classification should be pretty easy (in terms of the network
achieving good accuracy) and multi-label classification will most likely be the hardest to
get the network to perform well. If you manage to build a network that performs well on
the ‘clean’ version of the data, train it also on the ‘noisy’ version. Try to come up with
strategies for the network architecture, data augmentation, and anything else that you
can think of to get the best accuracy.
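As one illustration of what a data augmentation strategy could look like, the sketch below shifts each image by a few pixels in a random direction using NumPy only. The function name `random_shift` and the `max_shift` parameter are made up for this example, and whether such shifts actually improve accuracy on these datasets is something to test, not assume. Labels are unaffected, since moving a character does not change its letter or font.

```python
import numpy as np

def random_shift(images, max_shift=2, rng=None):
    """Return a copy of `images` (N x S x S x C) in which every image is
    shifted by a random offset of up to `max_shift` pixels in each direction."""
    rng = np.random.default_rng() if rng is None else rng
    n, s = images.shape[0], images.shape[1]
    # Pad each image by repeating its border pixels, then crop an S x S
    # window at a random offset inside the padded image.
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(images, pad, mode="edge")
    shifted = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        shifted[i] = padded[i, dy:dy + s, dx:dx + s]
    return shifted

# Stand-in data with the shape of the simple/fine-grained images
demo_images = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28, 3), dtype=np.uint8)
augmented = random_shift(demo_images, max_shift=2, rng=np.random.default_rng(1))
print(augmented.shape, augmented.dtype)
```

Edge padding is a guess that suits the uniform background of the ‘clean’ images; for the ‘noisy’ images a different fill may be more appropriate.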
The point of this exercise is not necessarily to attempt all tasks, but rather to find a
challenging problem and have a go at trying to solve it with a neural network.
Specifically, you should aim to have a go at two of the tasks. You might need to consider
different architectures for different tasks.
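One place where the tasks can call for different architectures is the output layer. The sketch below is a starting point under stated assumptions, not a recommended solution: the helper `build_cnn` and all layer sizes are invented for illustration. It uses a softmax output with integer labels for the single-label tasks, and independent sigmoid outputs with zero-one label vectors for the multi-label task.

```python
import tensorflow as tf

def build_cnn(image_size, n_outputs, multi_label=False):
    """Small convolutional network; the layer sizes are illustrative only."""
    if multi_label:
        # Several classes can be present at once: one sigmoid per class.
        activation, loss, metric = "sigmoid", "binary_crossentropy", "binary_accuracy"
    else:
        # Exactly one class per image: softmax over classes, integer labels.
        activation, loss, metric = "softmax", "sparse_categorical_crossentropy", "accuracy"
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(image_size, image_size, 3)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_outputs, activation=activation),
    ])
    model.compile(optimizer="adam", loss=loss, metrics=[metric])
    return model

simple_model = build_cnn(image_size=28, n_outputs=26)                   # letters
multi_model = build_cnn(image_size=48, n_outputs=34, multi_label=True)  # letters + fonts
```

Note that element-wise binary accuracy can look high on the multi-label task simply because most of the 34 entries are zero, so it is worth considering a stricter measure when reporting results.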
Don’t forget to clean up your code before submission and add comments. Make sure to
save your model after training, and it would be very helpful if you set up your script so
that it loads the pre-trained networks when I run it (so that I don’t have to wait for them to
train).
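A script can behave this way by checking for a saved model file before training. The following is a minimal sketch, assuming the Keras `.keras` file format; the file name `simple_clean.keras`, the helper `load_or_train`, and the `build_fn` argument (any function that returns a compiled Keras model) are hypothetical names chosen for this example.

```python
import os
import tensorflow as tf

MODEL_PATH = "simple_clean.keras"  # hypothetical file name

def load_or_train(path, build_fn, train_images, train_labels, epochs=10):
    """Load a saved model from `path` if it exists; otherwise build a new
    model with `build_fn`, train it, and save it to `path`."""
    if os.path.exists(path):
        return tf.keras.models.load_model(path)
    model = build_fn()
    model.fit(train_images, train_labels, epochs=epochs, verbose=2)
    model.save(path)
    return model

# Typical use, once the data is loaded and a build function is defined:
# model = load_or_train(MODEL_PATH, my_build_fn, train_images, train_labels)
# model.evaluate(test_images, test_labels)
```

With this arrangement the first run trains and saves the network, and every later run (including the marker's) only loads and evaluates it.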
I will test the accuracy of your model on a completely new set of test images.
Task 2, Report (10 marks)
Write a report of what you have done. What I am looking for here is a justification for
choices made in your implementation and a methodical approach to your investigation. If
you attempt different types of regularisations, record the results, report what happened
and provide some justification for why you think it did (or did not) work. This is meant
to be a technical report – concise, but clear. Including diagrams and figures, such as plots
of training and validation accuracy over the course of training, is a good way to give me
more insight into what you have done, and to back up any decisions made about
hyper-parameters, strategies, etc.
Importing the data into Python
Download cosc420_assig1_data.py into the folder that contains your Python scripts. In
your own script you can import the load_data function and invoke it (as shown below)
to load training data, testing data and a list of class names.
    from cosc420_assig1_data import load_data

    # Loading simple classification clean variant
    # train_images is a 20000x28x28x3 matrix of 20K images
    # train_labels is a 20000 vector of labels containing
    # indexes of 26 class names
    (train_images, train_labels), \
    (test_images, test_labels), \
    class_names = load_data(task='simple', dtype='clean')
To choose a different classification task you can change the string value for the task
argument to:
• ’simple’ for simple classification,
• ’fine-grained’ for fine-grained, and
• ’multi-label’ for multi-label.
To select the clean or noisy version of the data for a particular task, change the dtype
argument to:
• ’clean’, or
• ’noisy’
respectively. The first time you load a particular task, the load_data function might
take a while, since the data needs to be downloaded first (you need to be connected to
the Internet, but just for the first time loading a given dataset).
The train_images variable will be an N × S × S × 3 numpy array, where N is the number
of images, S × S is the size of each image in pixels and 3 is for three colour channels. For
the simple and fine-grained classification S = 28 and for multi-label S = 48. For the
clean version of the data N = 20000 and for the noisy N = 60000.
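Given an array of this shape, a common first step is to scale the pixel values and set aside part of the training set for validation. The sketch below shows one way to do this with NumPy; the helper `scale_and_split` and its parameters are invented for illustration, and the assumption that raw pixels lie in the range 0-255 should be checked against the loaded data.

```python
import numpy as np

def scale_and_split(images, labels, val_fraction=0.1, seed=0):
    """Scale pixels to [0, 1] and hold out a random validation subset."""
    x = images.astype("float32")
    if x.max() > 1.0:  # assumption: raw pixel values are in 0-255
        x /= 255.0
    # Shuffle indices once, then cut the shuffled order into two parts.
    order = np.random.default_rng(seed).permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], labels[train_idx]), (x[val_idx], labels[val_idx])

# Stand-in arrays with the same layout as the simple 'clean' data (smaller N)
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(100, 28, 28, 3), dtype=np.uint8)
demo_labels = np.arange(100) % 26
(x_train, y_train), (x_val, y_val) = scale_and_split(demo_images, demo_labels)
print(x_train.shape, x_val.shape)
```

Because labels are indexed with the same shuffled order as the images, this works unchanged for the N × 34 label matrix of the multi-label task.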
For the simple and fine-grained classification the train_labels will be an N-dimensional
array of integers in range 0-25 and 0-7 respectively (corresponding to 26 and 8 indexes for
classes in the simple and fine-grained tasks). For the multi-label classification task,
train_labels will be a zero-one N × 34 matrix. Each element of the 34-dimensional
label vector is set to 1 if that class is present in the image and to 0 otherwise.
The 34 classes comprise the 26 letters and the 8 fonts. Since each image contains 3 letters
in a single font, a label vector has at most four 1’s (fewer when a letter repeats)
and is zero everywhere else.
The test_images and test_labels follow the same format as the train images and
labels, except that there are fewer images in that set.
The class_names variable will be a list of strings giving the class names: 26 strings for
simple classification, 8 for fine-grained and 34 for multi-label.
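To make the multi-label format concrete, the sketch below turns one zero-one label vector back into class names. The helper `decode_labels` is made up for this example, and the stand-in list of names (26 letters followed by 8 fonts) is an assumption for illustration only; the real ordering is whatever `load_data` returns in class_names.

```python
import numpy as np

def decode_labels(label_vector, class_names):
    """Return the names of all classes marked with 1 in a zero-one label vector."""
    present = np.flatnonzero(np.asarray(label_vector) == 1)
    return [class_names[i] for i in present]

# Stand-in class names: the actual names and order come from load_data
letters = [chr(ord("a") + i) for i in range(26)]
fonts = [f"font{i}" for i in range(8)]
demo_class_names = letters + fonts

demo_label = np.zeros(34, dtype=int)
demo_label[[0, 2, 19, 29]] = 1  # three letters and one font
decoded = decode_labels(demo_label, demo_class_names)
print(decoded)  # ['a', 'c', 't', 'font3']
```

The same function works for a row of train_labels from the multi-label dataset, for example `decode_labels(train_labels[0], class_names)`.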
If you’d like to see a sample of the images and labels that you have just loaded, you can
use the show_data_images method from show_methods.py given in the Tensorflow
tutorial (Lecture 3, git url: https://altitude.otago.ac.nz/...).
For instance, to see a sample of 16 images from the training set you would do:
    import show_methods

    # Load the dataset using cosc420_assig1_data's
    # load_data method as shown above
    ...

    show_methods.show_data_images(
        images=train_images[:16],
        labels=train_labels[:16],
        class_names=class_names)
Submission
The assignment is due at 4pm on Monday of Week 7 (19th April). Zip the folder
including your code, saved model files and report and submit electronically via
Blackboard. Your scripts should pick up the included model files, load the corresponding
networks and evaluate the model using test data. At the start of your report include a
brief guide explaining which script does what.
Academic Integrity and Academic Misconduct
Academic integrity means being honest in your studying and assessments. It is the basis
for ethical decision-making and behaviour in an academic context. Academic integrity is
informed by the values of honesty, trust, responsibility, fairness, respect and courage.
Students are expected to be aware of, and act in accordance with, the University’s
Academic Integrity Policy.
Academic Misconduct, such as plagiarism or cheating, is a breach of Academic Integrity
and is taken very seriously by the University. Types of misconduct include plagiarism,
copying, unauthorised collaboration, taking unauthorised material into a test or exam,
impersonation, and assisting someone else’s misconduct. A more extensive list of the
types of academic misconduct and associated processes and penalties is available in the
University’s Student Academic Misconduct Procedures.
It is your responsibility to be aware of and use acceptable academic practices when
completing your assessments. To access the information in the Academic Integrity Policy
and learn more, please visit the University’s Academic Integrity website or ask at the
Student Learning Centre or Library. If you have any questions, ask your lecturer.
• Academic Integrity Policy