COMP9318_WEEK1

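Disclaimer: I am still learning this material myself, so some of my understanding may be shallow or even wrong. Please read critically, and if you spot a mistake, leave a comment or contact me directly.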

COMP9318 (Data Warehouse & Data Mining) course overview

Lecturer: Wei Wang (王伟), Professor

Email: please look it up yourself

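Lecturer's background: undergraduate degree from Shanghai Jiao Tong University; PhD from the Hong Kong University of Science and Technology; more than 90 published papers (3,746 citations as of 2014); PC co-chair of Asia Pacific Web Conference 2013, and the PC Area Chair of ICDE 2014 for the track of "Strings, Texts and Keyword Search".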

Lecturer's research interests (which I read as research directions). PS: if any of these interest you, it is worth talking to the lecturer.

My general research interests center around query processing and optimization in database systems and novel data management applications, including:

(1) Similarity search (including kNN and top-k search) and its applications (e.g., near duplicate object detection, record linkage, and data archiving)

(2) Integration of database and information retrieval technologies (DB + IR) (e.g., effective and efficient keyword search on the wikipedia data or freebase data)

(3) Spatial, text, graph, and multimedia databases (e.g., indexing objects in high dimensional spaces)

(4) Query processing issues in XML databases, Data Warehouses, and Data Mining

(5) Knowledge graph / natural language processing

(6) High-dimensional data / Similarity query processing

(7) AI

Course description: COMP9318 Data Warehouse & Data Mining

Data Warehouse: (a) Data Model for Data Warehouses. (b) Implementing Data Warehouses: data extraction, cleansing, transformation and loading, data cube computation, materialized view selection, OLAP query processing.

Data Mining: (a) Fundamentals: data mining process and system architecture, relationship with data warehouse and OLAP systems, data pre-processing. (b) Mining Techniques and Application: association rules, mining spatial databases, mining multimedia databases, web mining, mining sequence and time-series data, text mining, etc. The lecture materials will be complemented by projects/assignments.

Coursework: 1 written assignment + 1 programming project + 5 labs (the best 3 of the 5 lab marks count; each lab can be submitted multiple times, with immediate feedback)

Final mark = 0.15 * (assignment + project + lab) + 0.55 * exam

This course is double pass: you also need final exam >= 40.
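As a rough illustration of the marking scheme above, here is a minimal Python sketch. Several things in it are assumptions rather than course policy: every component is taken to be on a 0-100 scale, the lab component is taken to be the mean of the best 3 lab marks, and the overall pass mark of 50 is assumed. The function names are hypothetical.

```python
def best_lab_mark(lab_marks, keep=3):
    """Mean of the `keep` highest lab marks (assumed way of combining labs)."""
    top = sorted(lab_marks, reverse=True)[:keep]
    return sum(top) / len(top)


def final_mark(assignment, project, lab, exam):
    """Final mark = 0.15 * (assignment + project + lab) + 0.55 * exam.

    All four inputs are assumed to be on a 0-100 scale.
    """
    return 0.15 * (assignment + project + lab) + 0.55 * exam


def passes(final, exam, exam_hurdle=40, overall_pass=50):
    """Double pass: the exam hurdle (from the notes) AND an assumed overall pass mark."""
    return exam >= exam_hurdle and final >= overall_pass


# Example with made-up marks.
lab = best_lab_mark([50, 100, 80, 0, 90])
mark = final_mark(assignment=80, project=70, lab=lab, exam=60)
print(round(mark, 2), passes(mark, exam=60))
```

Note that a strong coursework mark cannot compensate for an exam mark below the hurdle: `passes` returns False whenever `exam` is under 40, whatever the final mark is.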

WARNING: This course has

  • Broad coverage

  • Heavy workload

  • High fail rate ≥ 20%

  • Plagiarism is not allowed.

Text Book:

  • Leskovec et al, Mining of Massive Datasets (ver 2.1). Available at http://infolab.stanford.edu/~ullman/mmds.html

  • Jensen et al, Multidimensional Databases and Data Warehousing. (Accessible from a UNSW IP)

  • Han et al, Data Mining: Concepts and Techniques, 1st/2nd edition, Morgan Kaufmann Publishers.

Reference Books:

  • Tan et al, Introduction to Data Mining, Addison-Wesley, 2005.

  • Witten et al, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 1st/2nd edition, Morgan Kaufmann.

  • Charu Aggarwal, Data Mining: The Textbook, Springer, 2015.

Software:

  • Anaconda

  • Python 3

  • Jupyter notebook

  • Python libs such as numpy, pandas, matplotlib, scikit-learn, ...
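One way to confirm that an environment has the libraries listed above is a small import check. This is only a sketch: the `REQUIRED` mapping and the `check_packages` name are my own, and the one non-obvious detail is that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Package name as listed above -> module name used in `import` statements.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_packages(required=REQUIRED):
    """Return {package: True/False} depending on whether the module can be found."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in required.items()
    }


for package, available in check_packages().items():
    print(f"{package}: {'ok' if available else 'MISSING'}")
```

A standard Anaconda installation normally ships with all four, so a `MISSING` line usually means the script is running under a different Python interpreter than intended.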

Reading Materials:

  • Papers from machine learning/data mining conferences/journals, white papers, surveys, etc.

  • All available from the course Web page.

Schedule:


[Figure: Schedule]

Objectives:

  1. Cover practically useful data mining/machine learning algorithms and concepts

  2. Foster deeper understanding of maths, models, and algorithms

  3. Gain hands-on experience with solving real problems

Requirements:

  1. You need to have a solid background in Maths (Linear Algebra, Calculus, Probability & Statistics) and programming (mainly Python).

  2. Understand (not memorize) concepts/equations/algorithms.

  • Ask why.

  • Describe it in your own language to a layman.

Assignment environment and submission:

1. Use Linux/command line.

(1) Project marked on Linux servers.

(2) You need to be able to upload, run, and test your program under Linux.

2. Assignment/Project submission

(1) Give to submit. Watch out for possible error messages.

(2) Classrun. Check your submission, marks, etc. Read https://wiki.cse.unsw.edu.au/give/Classrun

(3) Common errors:

File corrupt (during SFTP?), not in the correct format.

Submission not accepted by the system (wrong filename? Too large? ...).

3. Lab submission: our home-made Web submission system.

Other specialised courses in the Database or Data Science stream:

  1. COMP9319: Advanced algorithms on compression, text/XML databases, etc.

  2. COMP9313: Big data systems (Hadoop, Spark, etc.)

  3. COMP6714: Information retrieval, Natural language processing, Search engines.

Other machine learning courses:

  1. COMP9417: Machine Learning and Data Mining

  2. COMP9444: Neural Networks and Deep Learning

  3. COMP9418: Advanced Machine Learning

For those working in data management, text mining, machine learning, and natural language processing, there are two extra perks:

  1. PhD scholarship and/or top-ups available.

  2. Special research project (12UoC or 18UoC) for MIT students

(needs to contact me by the end of this semester)

The lecturer posed a question: John got a positive result for the α test, and the probability that patients with the deadly β disease have a positive α test result is 99%. Should John be worried about having the β disease?

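This is a conditional probability problem. Let P(α) be the probability of event α (a positive α test) and P(β) the probability of event β (having the β disease). The question tells us that the probability of α given β is 99%, i.e., P(α|β) = 99%.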

From this, we can work through the problem:


[Figure: image.png]
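The quantity John actually cares about is P(β|α), not P(α|β). The standard way to relate the two is Bayes' theorem, written here in the notation above. This is my own reconstruction of the usual argument, not a transcription of the lecturer's slide; the symbol ¬β (not having the disease) is introduced here.

```latex
\[
P(\beta \mid \alpha)
  = \frac{P(\alpha \mid \beta)\,P(\beta)}{P(\alpha)}
  = \frac{P(\alpha \mid \beta)\,P(\beta)}
         {P(\alpha \mid \beta)\,P(\beta) + P(\alpha \mid \neg\beta)\,P(\neg\beta)}
\]
```

The question supplies only P(α|β) = 99%. Without the prevalence P(β) and the false-positive rate P(α|¬β), P(β|α) cannot be determined, and when the disease is rare it can be far below 99%.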

Therefore:


[Figure: image.png]
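To make the point concrete, the sketch below plugs numbers into Bayes' theorem. Only the 99% figure comes from the question. The prevalence (1 in 10,000) and the false-positive rate (1%) are values I made up for illustration, and the function name is hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    prior               = P(beta), prevalence of the disease
    sensitivity         = P(alpha | beta), given as 0.99 in the question
    false_positive_rate = P(alpha | not beta)
    """
    # Total probability of a positive test, P(alpha).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical prevalence and false-positive rate; only 0.99 is from the question.
p = posterior(prior=0.0001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(beta | alpha) = {p:.4f}")  # about 0.0098, i.e. roughly 1%
```

Under these assumed numbers a positive result raises John's probability of having the disease from 0.01% to about 1%. That is a hundredfold increase, yet still a small probability, which is why P(α|β) = 99% alone does not tell John how worried to be.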

If conditional probability is unfamiliar, see: https://zh.wikipedia.org/wiki/%E6%9D%A1%E4%BB%B6%E6%A6%82%E7%8E%87
