




School of Computing and Information Systems
The University of Melbourne
COMP90073 Security Analytics, Semester 2 2021

Project 2: Machine learning based cyberattack detection

Release: Tue 31 Aug 2021
Due: 1pm, Tue 12 Oct 2021
Marks: The Project will contribute 25% of your overall mark for the subject;
you will be assigned a mark out of 25, according to the criteria below.
Overview
There are three tasks in this project: Task I aims to develop your skills in applying
unsupervised machine learning techniques for anomaly detection. Task II helps you better
understand how to use gradient descent-based methods to generate adversarial samples
against supervised learning models beyond the computer vision domain. In Task III, you are
asked to read a recent paper on adversarial machine learning and write a review on it.
Specifically, (1) for Tasks I and II, two network traffic (NetFlow) datasets are provided, one
for each task. Both datasets contain botnet traffic and normal traffic. You need to identify
botnet IP addresses in both datasets. (2) For Task II, you also need to choose a
botnet IP address, and explain how to manipulate the corresponding raw network traffic
records in order to bypass detection. (3) Each student has been assigned a paper for Task
III, which will be sent individually via email.
Deliverables

  1. Task I – Source code (Python) and SPL queries used to do the following:
    a. Generate/Select features from the packet capture files (training and test datasets)
    using Splunk. You can use apps such as Splunk Machine Learning Toolkit, but all
    features have to be generated/selected within Splunk. (1 mark)
    b. Use two alternative feature generation/selection methods (e.g., filter-based or
    wrapper-based) to select features from the packet capture files (training and test datasets).
    (2 marks)
    c. Use Python/Splunk to build six models: apply two different anomaly detection
    techniques to each of the three sets of features generated/extracted in 1.a and
    1.b. (3 marks)
    d. Score the test data such that cyberattacks are assigned the highest (or lowest)
    scores. Whether anomalies receive the highest or the lowest scores depends on the applied
    technique: some anomaly detection techniques assign high scores to anomalies (e.g.,
    distance measures), while others assign them low scores (e.g., probabilities). (1 mark)
    e. Return the IP addresses of the attackers and the timestamps of their first and last
    attempts to attack the network service (for each attack scenario). (3 marks)
    f. Compare and discuss the results from different feature extraction and different
    anomaly detection techniques. (2 marks)
    g. Prepare a TXT file containing all stream IDs that your program classifies as attack
    traffic, with one stream ID per line. (1 mark)
  2. Task II
    a. Source code in Python, including:
    i. Building, training and testing the supervised learning model. (1 mark)
    ii. Generating adversarial samples for a chosen botnet IP address, i.e., how to
    modify its feature values. (2 marks)
    b. Explain how to change the raw traffic sent from/to the chosen botnet IP address in
    order to reflect the modified feature values. For example, suppose the following six
    features are extracted for each IP address: (1) mean outbound packet size, (2) variance of
    outbound packet size, (3) mean packet count per second, (4) max packet count per
    second, (5) mean of packet jitter, (6) variance of packet jitter. A supervised model
    is trained on these features to decide whether the corresponding IP address is
    malicious. Suppose you find that by manipulating the values of the third and fourth
    features, a botnet IP address is labelled as “normal” by the model. How, then, would you
    change the raw traffic records so that they are consistent with the modified feature values?
    For instance, if 1,000 raw traffic records relate to the bot, would you change all
    1,000 records, or only a subset (e.g., 100 or 200 of them)? How would you change each of
    the selected traffic records? (2 marks)
  • Note that for Task II, (1) the model is trained to classify each IP address, not each traffic
    record, as in the example above. (2) The focus is not on training an accurate
    detection model (i.e., do not spend too much effort on improving model performance), but on
    understanding how generating adversarial samples differs in domains other than computer
    vision. In the vision domain, raw pixels are often taken as input, and attackers can directly
    manipulate them. In other domains such as cyber security, however, raw data cannot be fed
    into a model directly; features need to be extracted first. Therefore, although it
    may not be difficult to work out how to manipulate the features to bypass detection, there are
    several possible ways to change the raw traffic records so that they are consistent with the
    modified feature values without affecting the botnet functionality.
  3. Task III (6 marks). In this task, you will learn how to write a review of an academic
    paper. Typically, a review should include the following parts:
    a. Summary. Your review starts with a brief summary of the main ideas of the paper.
    It helps meta-reviewers, program chairs and the authors to determine whether
    there are any misunderstandings.
    b. Merits. List the main contributions of the paper in this section. Contributions can be
    theoretical, methodological, algorithmic, empirical, etc.
    c. Main review. Provide a thorough review of the paper, including:
    i. Originality: Are the tasks or methods new? Is the work a novel combination of
    well-known techniques? Is it clear how this work differs from previous
    contributions? Is related work adequately cited?
    ii. Quality: Is the submission technically sound? Are claims well supported (e.g., by
    theoretical analysis or experimental results)? Are the methods used appropriate?
    Is this a complete piece of work or work in progress? Are the authors careful and
    honest about evaluating both the strengths and weaknesses of their work?
    iii. Clarity: Is the submission clearly written? Is it well organized? If not, please make
    constructive suggestions for improving its clarity.
    iv. Significance: Are the results important? Are others (researchers or practitioners)
    likely to use the ideas or build on them? Does the submission address a difficult
    task in a better way than previous work? Does it advance the state of the art in a
    demonstrable way? Does it provide unique data, unique conclusions about
    existing data, or a unique theoretical or experimental approach?
    *Note that (1) the questions listed in c.i to c.iv are for explanation only. DO NOT write the
    main review in the form of Q&A; write it as an essay instead. (2) Some papers include an
    appendix, which may contain proofs and additional experimental settings and results. The
    appendix helps you better understand the paper, but your review should focus on the main
    part of the paper.
    For Tasks I & II:
  4. A README that briefly describes how your program(s)/queries work. You may use
    any external resources for your program(s) that you wish, but you must cite them in
    the README file, stating what these resources are and where you obtained them.
    (0.5 marks × 2)
    *Note: please submit a separate README file for each of Tasks I and II.
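As a rough illustration of the Python side of Task I (feature selection, anomaly scoring, thresholding, per-IP first/last timestamps and the stream-ID list), one possible pipeline is sketched below. It operates on features that have already been extracted. The sketch assumes a variance-threshold filter as the feature selection step and a k-nearest-neighbour distance as the anomaly score. Because this is a distance measure, attacks get the highest scores. It also assumes a "mean plus three standard deviations" threshold. The record fields (`stream_id`, `src_ip`, `ts`, `bytes`, `pkts`, `proto`), the toy data and the thresholding rule are hypothetical and are not part of the specification. A real submission would generate features in Splunk and compare two techniques on three feature sets.

```python
import math
import statistics
from collections import defaultdict


def select_by_variance(rows, features, min_var=1e-6):
    """Filter-based selection: keep features whose variance on `rows` exceeds min_var."""
    return [f for f in features
            if statistics.pvariance([r[f] for r in rows]) > min_var]


def fit_standardizer(rows, features):
    """Return a function mapping a record to its z-scored feature vector,
    using means and standard deviations estimated on `rows` (the training data)."""
    mu = {f: statistics.fmean(r[f] for r in rows) for f in features}
    sd = {f: statistics.pstdev([r[f] for r in rows]) or 1.0 for f in features}
    return lambda r: [(r[f] - mu[f]) / sd[f] for f in features]


def knn_score(x, reference, k=3):
    """Distance-based anomaly score: mean Euclidean distance from x to its
    k nearest reference vectors. Higher score = more anomalous."""
    dists = sorted(math.dist(x, v) for v in reference)
    return statistics.fmean(dists[:k])


def detect(train, test, features, k=3, n_sigma=3.0):
    selected = select_by_variance(train, features)
    to_vec = fit_standardizer(train, selected)
    train_vecs = [to_vec(r) for r in train]
    # Leave-one-out scores on the training data form a reference distribution;
    # the threshold (mean + n_sigma * std) is an illustrative assumption.
    ref = [knn_score(v, train_vecs[:i] + train_vecs[i + 1:], k)
           for i, v in enumerate(train_vecs)]
    threshold = statistics.fmean(ref) + n_sigma * statistics.pstdev(ref)
    flagged = [r for r in test if knn_score(to_vec(r), train_vecs, k) > threshold]
    # Group flagged records by (hypothetical) source-IP field; keep first/last timestamps.
    times = defaultdict(list)
    for r in flagged:
        times[r["src_ip"]].append(r["ts"])
    attackers = {ip: (min(ts), max(ts)) for ip, ts in times.items()}
    return selected, [r["stream_id"] for r in flagged], attackers


# Made-up toy records; "proto" is constant, so the variance filter drops it.
train = [{"stream_id": i, "src_ip": "10.0.0.1", "ts": i,
          "bytes": 500 + 10 * (i % 5), "pkts": 10 + i % 3, "proto": 6}
         for i in range(20)]
test = [
    {"stream_id": 100, "src_ip": "10.0.0.2", "ts": 1000, "bytes": 520, "pkts": 11, "proto": 6},
    {"stream_id": 101, "src_ip": "10.0.0.9", "ts": 1001, "bytes": 9000, "pkts": 300, "proto": 6},
    {"stream_id": 102, "src_ip": "10.0.0.9", "ts": 1050, "bytes": 8800, "pkts": 280, "proto": 6},
]
selected, stream_ids, attackers = detect(train, test, ["bytes", "pkts", "proto"])
print(selected)
print(attackers)
print("\n".join(str(s) for s in stream_ids))  # TXT layout: one stream ID per line
```

Swapping `knn_score` for a probability-based technique would invert the scoring convention: attacks would then get the lowest scores, and the comparison against the threshold would flip.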
Technical Report
A technical report of around 2000 words comprising:
Task I:
  1. An overview of the test dataset using Splunk, and an explanation of feature
    generation/selection using SPL queries and native Splunk functionality.
  2. A description of your methodology for generating features. Briefly explain your method
    for Project 1, and discuss your modifications and new findings in Project 2.
  3. A review of at least two anomaly detection methods that you have used.
  4. A description of the experimental setup and an evaluation of the (two) methods in
    detecting anomalies on the test datasets, using both the features generated in Splunk and
    the features generated using alternative methods. The description should also include the IP
    addresses of the attacker(s) and victim(s), the attacked service(s), the timestamps, and
    the type of attack for each identified attack scenario.
  5. A description of your final CSV file and of the scoring and thresholding technique you used
    to detect the reported anomalies (for example, you may choose the best model as your final
    model or build an ensemble of models).
  6. Conclusion and discussion: describe which anomaly detection method worked best for
    the given attack scenario.
Task II:
  7. An explanation of the generated features and your choice of supervised learning
    model. Note that supervised learning is used here, and the model is the target against
    which adversarial samples will be generated.
  8. Choose one IP address classified as botnet by your model, and explain:
    a. How to perturb its features via a gradient descent-based method to bypass
    detection by your model;
    b. How to change the raw network traffic sent from/to it, in order to be consistent
    with the modified feature values and without affecting the botnet functionality.
You should include a bibliography and citations to relevant research papers, and to the
external resources and code you have used.
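Task II asks you to perturb a chosen botnet IP's features with a gradient descent-based method. One minimal way to do this is sketched below, under several assumptions. The target model is a logistic regression, standing in for whatever supervised model you choose, trained from scratch on made-up, pre-scaled per-IP features ordered as in the six-feature example of Task II(b). Only the third and fourth features (mean and max packet count per second) may change, mirroring that example. The perturbation descends the loss of the label "normal", L(x) = -log(1 - P(botnet | x)), whose gradient for logistic regression is P(botnet | x) · w_j. All function names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(botnet | x) under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic loss; y = 1 means botnet."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t
            gw = [g + err * xj / n for g, xj in zip(gw, x)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def evade(w, b, x, mutable, step=0.05, max_iter=1000, lower=0.0):
    """Gradient descent on L(x) = -log(1 - P(botnet | x)), changing only the
    features listed in `mutable` and clipping them at `lower` (rates cannot be
    negative). Stops as soon as the model predicts 'normal'."""
    x = list(x)
    for _ in range(max_iter):
        p = predict_proba(w, b, x)
        if p < 0.5:
            break
        for j in mutable:
            x[j] = max(lower, x[j] - step * p * w[j])  # dL/dx_j = p * w_j
    return x


# Hypothetical, pre-scaled per-IP features:
# [mean out pkt size, var out pkt size, mean pkts/s, max pkts/s, mean jitter, var jitter]
normal = [[1.0 + 0.1 * (i % 3), 0.5 + 0.05 * (i % 4), 1.0 + 0.1 * (i % 5),
           1.5 + 0.1 * (i % 2), 0.2, 0.1] for i in range(10)]
botnet = [[1.0 + 0.1 * (i % 3), 0.5 + 0.05 * (i % 4), 3.0 + 0.1 * (i % 5),
           4.0 + 0.1 * (i % 2), 0.2, 0.1] for i in range(10)]
X, y = normal + botnet, [0] * 10 + [1] * 10
w, b = train_logreg(X, y)

bot = botnet[0]
adv = evade(w, b, bot, mutable=[2, 3])  # third and fourth features only
print("P(botnet) before:", round(predict_proba(w, b, bot), 3))
print("P(botnet) after:", round(predict_proba(w, b, adv), 3))
print("perturbed features:", [round(v, 2) for v in adv])
```

The sketch deliberately stops at the feature level. It does not enforce cross-feature constraints, such as the max packet count per second staying at or above the mean. It also does not say how the raw traffic records would be changed to realise the new values, which is the separate question in Task II(b).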
Review
A review of 400–500 words of your assigned paper, comprising:
  1. Summary. This part should contain no more than three sentences. Please be brief,
    but specific.
  2. Merits. List at least the top three main contributions.
  3. Main review. To provide useful feedback to the authors, write this section
    top-down, starting from the most important aspects. Your arguments should be
    objective, specific, concise and polite.
Assessment Criteria
Code quality and README (2 marks)
Technical report (17 marks)
  1. Methodology (4 marks)
    You should describe your methodology in enough detail to make your work
    reproducible. In particular, describe:
    Tasks I and II
    a. The features that were generated and/or selected.
    b. The training data that was used to learn the anomaly detection models. You
    should explain how the parameters of your methods were set
    (e.g., setting the
