Faculty of Engineering and Information Technology
The University of Melbourne
COMP90073 Security Analytics, Semester 2, 2023
Assignment 2: Blue Team & Red Team Cybersecurity
Release: Fri 1 Sep 2023
Due: Tue 17 Oct 2023
Marks: The Project will contribute 25% of your overall mark for the subject. You will be assigned a mark out of 25, according to the criteria below.
1. Overview
You’ve recently been brought in to head up the cybersecurity team at FashionMarketData.com, a major player in providing platforms that let online fashion retailers leverage their data through machine learning. However, after recently witnessing high-profile data breaches, including at Medibank and AHM, the company’s leadership is concerned that the business might face existential financial, legal, and reputational risks stemming from hackers manipulating its data or exploiting its machine learning models.
The CEO has tasked you with heading up the newly formed Blue Team and Red Team cybersecurity groups inside the company, and with developing a report for the board that outlines both risks and opportunities for the company. The Blue Team are concerned about users uploading images that do not match their labels, either through mistaken use of the platform or in an active attempt to manipulate the company’s systems. As such, the Blue Team are working on designing and implementing systems that ensure only genuine, fashion-related images are processed and ingested into the company’s crowdsourced datasets. In practice, this will involve reliably detecting and distinguishing both anomalous and out-of-distribution samples.
The Red Team are taking a different approach. Rather than actively defending the company’s systems, they are more concerned with understanding the scope of vulnerabilities in the machine learning models that have rapidly become a core part of the company’s business practices. As such, the team plans to construct evasion and data poisoning attacks against exemplar, non-production models, and to use these results to build a picture of the vulnerabilities present within the company’s systems and processes.
Finally, you will need to present a report for the non-technical leadership of FashionMarketData.com, based upon your insights from working with both the Blue and Red Teams. Because understanding the risks the company may face from its data management and machine learning practices is critical, you must deliver this report at the next meeting of the company’s board, on Tuesday the 17th of October, 2023.
Datasets
To understand these vulnerabilities, you have been provided with images from 10 distinct fashion categories, drawn primarily from the Fashion-MNIST dataset, which consists of 28×28 grayscale images. This compilation serves as your in-distribution (normal) dataset, representative of the core, expected content within the fashion domain. Examples of the first 3 fashion categories in the given dataset are shown below.
Your primary task is to devise and refine multiple algorithms with the specific aim of identifying anomalies and out-of-distribution (OOD) images. OOD samples are images that do not belong to the fashion categories defined by the in-distribution data, such as airplanes, animals, and handwritten letters or characters. Anomaly samples, by contrast, are fashion images that diverge from typical fashion items. While they remain
categorically fashion images, they differ from the familiar images in the dataset due to distortions, rotations, cropping, and similar alterations.
To facilitate this objective, five separate datasets will be made available to you. Each dataset plays a crucial role in training, validating, and testing the effectiveness and accuracy of the algorithms you develop.
Training set [train_data.npy] [train_labels.npy]: This dataset features images from 10 unique fashion categories, labelled from 0 to 9. It acts as the principal guide to the standard content of the fashion domain. You will use this dataset for both the anomaly and OOD detection tasks.
For anomaly detection, validation set [anomaly_validation_data.npy]: This set comprises both original and distorted fashion items. Items labelled '1' are anomalies, while those labelled '0' are normal data. This validation set is primarily intended for tuning your model's hyperparameters and for reporting its performance using the relevant metrics in your analysis.
For anomaly detection, test set [anomaly_test_data.npy]: The test set comprises original and distorted fashion items, in a similar proportion to the validation set. However, unlike the validation set, this dataset contains no labels, so you are required to use your trained model to predict their anomaly statuses.
For OOD detection, validation set [ood_validation_data.npy]: This set contains a blend of fashion and non-fashion items. Items labelled '1' are out-of-distribution (OOD), meaning they do not align with the standard fashion categories, while samples labelled '0' are in-distribution data. This validation set is primarily intended for tuning your model's hyperparameters and for reporting performance using the relevant metrics in your analysis.
For OOD detection, test set [ood_test_data.npy]: The test set includes both fashion items and potential non-fashion items, mirroring the proportion observed in the validation set. Unlike the validation set, this dataset has no OOD labels, so you will be required to use your trained model to predict these OOD statuses.
Note that a NumPy array file (.npy) can be loaded via: data = np.load('input.npy').
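A minimal loading sketch, assuming the .npy image files hold raw pixel values from 0 to 255 with shape (n, 28, 28), as in Fashion-MNIST (this document does not confirm the stored range or shape). The helper names load_images and flatten are hypothetical; the demo writes a synthetic array to a temporary file so the snippet runs without the real data. Scaling to [0, 1] at load time also matches the normalisation the Red Team model requires.

```python
import os
import tempfile

import numpy as np


def load_images(path):
    """Load a .npy image array as float32 pixels scaled to [0, 1].

    Assumption: raw pixels lie in [0, 255] (as in Fashion-MNIST); arrays whose
    maximum is already <= 1 are treated as pre-scaled and only cast.
    """
    data = np.load(path).astype(np.float32)
    if data.max() > 1.0:
        data = data / 255.0
    return data


def flatten(images):
    """Reshape (n, 28, 28) images into (n, 784) feature vectors."""
    return images.reshape(len(images), -1)


# Self-contained demo: a synthetic uint8 array saved under a temporary
# directory stands in for train_data.npy.
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_data.npy")
    np.save(path, fake)
    x = flatten(load_images(path))

print(x.shape, x.dtype, float(x.min()), float(x.max()))
```

Flattened vectors suit shallow detectors and fully connected networks; a convolutional model would instead keep the (28, 28) layout.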
2. Blue Team Tasks
You will create anomaly detection and OOD detection algorithms using the provided training and validation sets. Following the development phase, these algorithms will be tested on the separate test sets provided, which you need to annotate with the anomaly and OOD statuses derived from each of the detectors.
For anomaly detection, you will develop two distinct detection algorithms:
1) Shallow model
Use a shallow (non-neural-network) model (e.g., OCSVM, LOF) to develop a detector for identifying anomalous items. It might be beneficial to apply dimensionality reduction techniques before inputting the data into the detection model.
2) Deep learning model
Develop a deep learning model, such as an autoencoder, to detect whether an item is a normal or an anomalous fashion item.
For OOD detection, you are required to develop a single algorithm.
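For the shallow anomaly detector, one way to prototype is to reduce dimensionality with PCA and score each sample by its distance from the training data in the reduced space. This NumPy-only sketch swaps in a k-nearest-neighbour distance score (related to, but not the same as, LOF) instead of OCSVM or LOF. The values n_components=10 and k=5 are illustrative assumptions, and the mean-shifted Gaussian points merely stand in for anomalies. A rank-based AUROC helper is included for scoring against validation labels. With scikit-learn, sklearn.svm.OneClassSVM or sklearn.neighbors.LocalOutlierFactor(novelty=True) could replace knn_anomaly_score. Note that their score_samples method returns higher values for more normal points, so the sign would need flipping.

```python
import numpy as np


def fit_pca(x_train, n_components):
    """Return the training mean and the top principal directions (via SVD)."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Project samples onto the retained principal directions."""
    return (x - mean) @ components.T


def knn_anomaly_score(z_train, z_query, k=5):
    """Mean Euclidean distance from each query point to its k nearest training
    points in PCA space; larger values suggest a more anomalous sample."""
    d2 = (
        (z_query ** 2).sum(axis=1)[:, None]
        + (z_train ** 2).sum(axis=1)[None, :]
        - 2.0 * z_query @ z_train.T
    )
    d = np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negatives from rounding
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


def auroc(scores, labels):
    """AUROC as the probability that a random positive (label 1) outscores a
    random negative (label 0), counting ties as one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Synthetic stand-ins (assumption): Gaussian "normal" data, and mean-shifted
# points playing the role of anomalies.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(200, 50))
x_normal = rng.normal(0.0, 1.0, size=(20, 50))
x_shifted = rng.normal(10.0, 1.0, size=(20, 50))

mean, comps = fit_pca(x_train, n_components=10)
z_train = project(x_train, mean, comps)
scores_normal = knn_anomaly_score(z_train, project(x_normal, mean, comps))
scores_shifted = knn_anomaly_score(z_train, project(x_shifted, mean, comps))

scores = np.concatenate([scores_normal, scores_shifted])
y = np.concatenate([np.zeros(20), np.ones(20)])
print(f"validation-style AUROC: {auroc(scores, y):.3f}")
```

On the full training set, the query-by-training distance matrix can become large. Computing scores in batches of query images, or subsampling the training set, is likely to be necessary. The pairwise AUROC helper has the same quadratic memory cost on very large validation sets.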
3. Deliverables
1) The predicted labels for the test sets (submit in a zip archive of .npy files)
• After running each of the three detection algorithms on the test sets, prepare the annotated results (the anomaly or OOD statuses determined by each detector) in the same structured format as the validation sets.
• For your Blue Team results, you will need to generate 3 result files, one for each of the Blue Team approaches. The filenames will be 1.npy (anomaly detection: shallow model), 2.npy (anomaly detection: deep learning model), and 3.npy (OOD detection).
2) Python code or Jupyter Notebook (submit as zip archive)
• This should contain the complete code for all three detection algorithms, starting from data import and preprocessing, progressing to algorithm implementation, and ending with an appropriate analysis of your results. This may include visualisations to help emphasise your points.
• If your code is a Jupyter Notebook, the notebook must contain the evaluated results of all cells; if you are using Python scripts, they must run to completion. In both cases, you must include a supplementary README file that lists the versions of all libraries used.
• When using any pre-existing code sourced from online resources, you need to clearly annotate this within your codebase using comments. Furthermore, please ensure that you provide a comprehensive reference to these sources in your report, detailing their origins and contributions to your work.
• Ensure the code is well-structured, with clear function definitions, variable names, and comments that explain the purpose of each section or step. Complex or nuanced sections of the code should have accompanying comments for clarity.
− If submitting Python code (.py), please provide comments in the code for major procedures and functions. Also, please include a README file (.txt) with instructions on how to run each script. You need to submit a zip archive containing all scripts (.py) and README.txt.
− If submitting a Jupyter Notebook (.ipynb), incorporate markdown cells to segment the code and provide explanatory notes or observations. Before submission, restart the kernel and run the notebook from the beginning to ensure all cells execute in order and produce the expected outputs. Ensure that all outputs, especially visualisations and essential printed results, are visible and saved in the notebook.
• Please include all data preprocessing or visualisation steps you may have undertaken, even those not included in the report. Use comments to specify the intent behind each result or graph, or what insights were derived from it.
3) Report (submit as PDF)
• Your report should be targeted towards your intended audience and should use qualitative and quantitative methods to describe your techniques, results, insights, and understandings. This should include an appropriate description of your choice of detection algorithms and evaluation methods, and a discussion of the ramifications of these choices. As part of this, it is important to include any challenges that you faced, and the decisions or assumptions you made throughout the process. Your results should be presented in a way that is readily comprehensible to a non-technical audience.
• Your report should include both an introductory executive summary and a conclusion. The executive summary should provide an overview of the underlying task, offering a snapshot that helps readers understand the context and objective of the report. Following the body of your report, the conclusion should encapsulate the primary findings of your investigation. Additionally, this section should present recommendations for potential enhancements or alternative strategies that might be considered in the future.
• The word limit for the Blue Team (Task I) report is 1500 words. Your main report should not exceed 7 pages in length. However, any supplementary diagrams, plots, and references that you wish to include can be added after the main report. These additional materials will not count towards the word or page limits.
• You should evaluate your model with at least three appropriate metrics for each detection algorithm. Some commonly used metrics include AUROC and false positive (FP) rate. However, depending on the context of your algorithm, other metrics might be equally or even more relevant. Ensure that your chosen metrics provide a well-rounded view of each algorithm's performance in its intended application.
• You could also evaluate samples that the model misclassified. For instance, if certain types of anomalies are consistently missed, what insights into patterns or consistencies could be gained from these failures? Additionally, if there are any extreme cases that cause your model to fail, what measures could be taken in future training? You could also discuss how these inaccuracies might manifest in real-world scenarios.
• To make your findings more accessible to readers, tables are recommended for the structured presentation of numerical results and comparisons. Meanwhile, visualisations such as bar graphs, scatter plots, or heat maps can offer intuitive insights into the data, helping to convey relationships, distributions, or anomalies that might be less apparent in raw numbers alone.
• The creativity marks will be allocated based upon both how you extend and present your work, and the insights that you provide. Extensions may be in terms of the techniques used, insights gained from your experiments, comparisons of model parameters, or your comprehensive analysis, even if the novel ideas you test are not successful. The amount of time you spend pursuing the creativity marks should be commensurate with the marks allocated.
4. Red Team Tasks
Because the company’s leadership is cautious about the Red Team attacking a production model, you will instead need to train a similar model to use as a proxy target. To ensure that this model closely matches what is used in production, the trained architecture that you produce should incorporate at least 3 linear layers, with a dropout layer (with probability 0.25) preceding the last two linear layers. You should train on the training set to an accuracy of at least 85%, using a cross-entropy loss, an Adam optimizer, and a learning rate of 10⁻⁴. All your input samples should be normalised to within [0,1] before being passed into the model.
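To make the required layer ordering concrete, here is a NumPy-only forward pass (no training) under one reading of the specification: a dropout layer with p = 0.25 before each of the last two linear layers. The hidden sizes (256 and 128) and the ReLU activations are assumptions that the specification does not fix. In PyTorch, the same structure would typically be built from nn.Linear, nn.ReLU and nn.Dropout(0.25), trained with nn.CrossEntropyLoss and torch.optim.Adam(model.parameters(), lr=1e-4). Call model.eval() before evaluating or attacking so that dropout is disabled.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(n_in, n_out):
    """Weights and bias for one fully connected (linear) layer."""
    w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    return w, np.zeros(n_out)


def dropout(h, p, training):
    """Inverted dropout: zero each unit with probability p during training and
    rescale survivors by 1/(1-p); identity at evaluation time."""
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)


def relu(h):
    return np.maximum(h, 0.0)


# Hypothetical layer sizes (assumption): 784 -> 256 -> 128 -> 10.
layers = [init_linear(784, 256), init_linear(256, 128), init_linear(128, 10)]


def forward(x, training=False, p=0.25):
    """Three linear layers, with dropout(p) before the 2nd and 3rd linear layers."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(x @ w1 + b1)
    h = relu(dropout(h, p, training) @ w2 + b2)
    return dropout(h, p, training) @ w3 + b3  # logits for the 10 classes


x = rng.random((4, 784))  # inputs already normalised to [0, 1]
logits = forward(x)
print(logits.shape)
```

Because inverted dropout rescales during training, no rescaling is needed at evaluation time. This makes the evaluation-mode forward pass deterministic, which matters when computing attack gradients.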
After training this model, you will need to design an iterative gradient-based attack. The code for this should be flexible enough that you could take any model and input sample and attack it for up to a specified number of iterations with a fixed step size, while ensuring that the attack image always remains within [0,1]. While the maximum number of iterations should be fixed, consider whether your attack can be modified to stop early if necessary. Your attack should be able to produce both targeted and untargeted attacks. Because the Red Team are trying to build up their own internal capabilities, your attack should avoid unnecessary use of off-the-shelf libraries that implement adversarial attacks themselves. Your focus should be on employing basic machine learning and mathematical libraries, as well as autograd.
To test your ability to attack the model, you should attack every 20th sample of the test set. As you do so, vary the step size for at most 100 steps, and analyse the untargeted performance of the attack, as well as its performance when targeted towards forcing the model to predict the 0th class. This analysis should include the success rates and the l2 norm distance between your tested images and your successful attacks (this l2 norm should be calculated using sum, square-root and power operations).
Given that you know the Blue Team are working on techniques that could be used to detect attacks, your report to the company’s leadership should consider how the techniques implemented by the Blue Team could be used to defend your model from adversarial attack. You may also wish to consider how changes to the model architecture, training process, or data handling procedures may influence the level of adversarial risk faced by your model, or how you might attack a model that incorporates defensive stratagems from the Blue Team.
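A minimal sketch of the attack and its evaluation, assuming a signed-gradient update in the style of the basic iterative method (iterative FGSM), which is one common choice among iterative gradient attacks. To stay self-contained, it uses NumPy and a hypothetical ToyLinearModel whose input gradient is derived by hand. With the real PyTorch proxy model, loss_grad would instead come from autograd: set requires_grad on the input, compute torch.nn.functional.cross_entropy, call backward() and read x.grad, with the model in eval mode. The attack needs only predict and loss_grad, so any model exposing those two methods can be plugged in.

Untargeted steps increase the true-label loss, and targeted steps decrease the target-class loss. Every step is clipped to [0, 1], and the loop stops early once the goal is met. Skipping targeted samples whose label already equals the target is an assumption about how class-0 results should be reported.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability (single 1-D logit vector)
    e = np.exp(z)
    return e / e.sum()


class ToyLinearModel:
    """Hypothetical stand-in for the trained network: logits = x @ W + b.
    Its input gradient is derived by hand so the sketch needs only NumPy."""

    def __init__(self, n_features, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_features, n_classes))
        self.b = np.zeros(n_classes)

    def predict(self, x):
        return int(np.argmax(x @ self.w + self.b))

    def loss_grad(self, x, label):
        """Gradient of the cross-entropy loss for `label` w.r.t. the input x."""
        p = softmax(x @ self.w + self.b)
        p[label] -= 1.0  # dL/dlogits = softmax(logits) - one_hot(label)
        return self.w @ p  # chain rule through logits = x @ W + b


def iterative_attack(model, x, label, step_size, max_iters=100, target=None,
                     early_stop=True):
    """Signed-gradient iterative attack that always keeps x_adv in [0, 1].

    Untargeted (target=None): each step increases the loss of the true label.
    Targeted: each step decreases the loss of the `target` class.
    Returns (x_adv, success, steps_taken).
    """
    x_adv = x.copy()
    for step in range(max_iters + 1):
        pred = model.predict(x_adv)
        success = pred != label if target is None else pred == target
        if (success and early_stop) or step == max_iters:
            return x_adv, bool(success), step
        if target is None:
            direction = np.sign(model.loss_grad(x_adv, label))
        else:
            direction = -np.sign(model.loss_grad(x_adv, target))
        x_adv = np.clip(x_adv + step_size * direction, 0.0, 1.0)


def l2_distance(a, b):
    # Explicit power, sum and square-root operations, as the task requires.
    return np.sqrt(np.sum(np.power(a - b, 2)))


def evaluate(model, xs, labels, step_size, target=None, max_iters=100):
    """Success rate, plus L2 norms of the successful attacks only. For
    targeted runs, samples whose label already equals the target are skipped."""
    norms, n_success, n_attacked = [], 0, 0
    for x, y in zip(xs, labels):
        if target is not None and y == target:
            continue
        n_attacked += 1
        x_adv, success, _ = iterative_attack(model, x, y, step_size,
                                             max_iters, target)
        if success:
            n_success += 1
            norms.append(l2_distance(x_adv, x))
    return n_success / max(n_attacked, 1), np.array(norms)


# Demo on synthetic data: every 20th of 400 random "images" (20 samples),
# labelled with the toy model's own clean predictions.
model = ToyLinearModel(n_features=50, n_classes=3)
xs = np.random.default_rng(1).random((400, 50))[::20]
labels = np.array([model.predict(x) for x in xs])
for step_size in (0.01, 0.05):
    rate, norms = evaluate(model, xs, labels, step_size)
    med = np.median(norms) if len(norms) else float("nan")
    print(f"untargeted, step {step_size}: success {rate:.2f}, median L2 {med:.3f}")
rate_t, norms_t = evaluate(model, xs, labels, 0.05, target=0)
print(f"targeted (class 0): success {rate_t:.2f}")
```

To produce the requested analysis, replace the synthetic inputs with every 20th test image and sweep step_size over a grid. Early stopping tends to reduce the L2 distance of successful attacks, because the image stops changing as soon as the prediction flips.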
5. Deliverables
Python code or Jupyter Notebook (submit as zip archive)
• This should contain the complete code for (1) training the underlying network, (2) performing adversarial attacks, and (3) evaluating their performance.
• The requirements and guidelines are identical to those outlined for the Blue Team (Task I).
Report (submit as PDF)
• Please prepare a separate report for the Red Team (Task II). The word limit for this task is 1000 words. Your main report should not exceed 4 pages in length. However, any supplementary diagrams, plots, and references that you wish to include can be added after the main report. These additional materials will not count towards the word or page limits.
• You should evaluate your model with appropriate metrics for your implemented attack. Some commonly used metrics include accuracy drops and perturbation size/distribution. However, other metrics might be equally or even more relevant.
• You could include visualisations of the adversarial noise and the perturbed images in the report, such as side-by-side comparisons that illuminate the slight alterations that result in significant prediction deviations, helping readers discern the vulnerabilities exploited by your implemented attack. Meanwhile, a plot of loss or performance changes versus iterations could provide a visual representation of the model's training dynamics, making it easier to diagnose issues, compare solutions, and communicate the model's behaviour to both technical and non-technical stakeholders.
• The creativity marks will be allocated based on both how you extend and present your work and the insights that you provide. This may be in terms of the network structure, training techniques, adversarial attack techniques, evaluation and analysis, or presenting interesting findings, even if the ideas you test are not successful. The amount of time you spend pursuing the creativity marks should be commensurate with the marks allocated.
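For the attack metrics mentioned in the report guidelines, a small helper can turn one attack run into headline numbers such as the accuracy drop and the perturbation size distribution. This sketch assumes you have collected the true labels, the clean and adversarial predictions, and the clean and adversarial inputs flattened to shape (n, d). The helper name attack_summary and the chosen statistics (median and 90th-percentile L2, maximum L∞) are illustrative choices, not requirements. The demo arrays are made-up values used only to exercise the function.

```python
import numpy as np


def attack_summary(labels, clean_preds, adv_preds, clean_x, adv_x):
    """Summarise one attack run for reporting.

    labels, clean_preds, adv_preds: integer arrays of shape (n,)
    clean_x, adv_x: arrays of shape (n, d) with pixels in [0, 1]
    """
    clean_acc = np.mean(clean_preds == labels)
    adv_acc = np.mean(adv_preds == labels)
    diff = adv_x - clean_x
    l2 = np.sqrt(np.sum(np.power(diff, 2), axis=1))  # per-sample L2 norm
    linf = np.max(np.abs(diff), axis=1)              # per-sample L-infinity
    return {
        "clean_accuracy": float(clean_acc),
        "adversarial_accuracy": float(adv_acc),
        "accuracy_drop": float(clean_acc - adv_acc),
        "l2_median": float(np.median(l2)),
        "l2_p90": float(np.percentile(l2, 90)),
        "linf_max": float(linf.max()),
    }


# Made-up demo values, purely to show the shapes involved.
labels = np.array([0, 1, 2, 1])
clean_preds = np.array([0, 1, 2, 0])
adv_preds = np.array([1, 1, 0, 0])
clean_x = np.zeros((4, 3))
adv_x = np.array([[0.1, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.3, 0.4, 0.0],
                  [0.0, 0.0, 0.2]])
summary = attack_summary(labels, clean_preds, adv_preds, clean_x, adv_x)
print(summary)
```

L2 statistics computed over successful attacks only answer a different question from statistics over all attacked samples, as computed here. It is worth stating which one each table reports.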
If we require any changes or clarifications to the project specifications, they will be posted on Canvas. Any addenda will supersede information included in this document. If you have assignment-related questions, you are welcome to post them on the discussion board.
Academic Misconduct
For most people, collaboration will form a natural part of undertaking this project. However, it is still an individual task, so reuse of ideas or excessive influence in algorithm choice and development will be considered cheating. We will be checking submissions for originality and will invoke the University’s Academic Misconduct policy (http://academichonesty.unimelb.edu.au/policy.html) where inappropriate levels of collusion or plagiarism are deemed to have taken place.
Late Submission Policy
You are strongly encouraged to submit by the time and date specified above, but if circumstances do not permit this, the marks will be adjusted as follows. For each day (or part thereof) that this project is submitted after the due date (and time) specified above, 10% will be deducted from the marks available, up until 5 days have passed, after which regular submissions will no longer be accepted.
Extensions
the subject “COMP90073 Extension Request” at the earliest possible opportunity. We will then assess whether an extension is appropriate. If you have a medical reason for your request, you will be asked to provide a medical certificate. Requests for extensions on medical grounds received after the deadline may be declined. Note that computer systems are often heavily loaded near project deadlines, and unexpected network or system downtime can occur. System downtime or failure will not be considered grounds for an extension. You should plan to avoid leaving things to the last minute, when unexpected problems may occur.