MCM-Problem-C-Overview

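Problem C was newly added to the MCM in 2016. Described as a Data Insights problem, it focuses on mathematical models associated with data. Compared with earlier MCM problems, models from statistics, pattern recognition, and related fields are therefore likely to play a larger role.
Problem C poses a real-world, data-related problem, so modeling may run into various difficulties, such as a fairly large data set (though not at big-data scale), mixed data types, and missing data. It is not a big data problem, however: teams need neither specialized computer science knowledge, such as data-handling algorithms and analysis techniques, nor access to high-performance computing platforms.
The data for the problem is publicly accessible.
Although this is not a big data problem, the compressed data files may exceed 100 MB, which is larger than the data for MCM problems in previous years. When choosing a problem, consider whether your team is able to handle a data set of this size.
Besides the database files, the compressed archive may also contain a data dictionary, a data mapping file, and program code for creating value labels.
The data files will be provided in multiple formats, such as SAS, SPSS, STATA, and CSV.
Software such as Statistica, JMP, SAS, SPSS, Excel, R, and Matlab may be used, but no particular software is required. If you use specialized software or code in the contest, make sure you understand the mathematics behind it.
Only the paper needs to be submitted; the database files do not.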

The 2016 MCM introduces a new modeling challenge, Problem C, that is best described as
Data Insights. Problem C is intended to focus on and amplify specific elements of mathematical
modeling challenges associated with data. In this sense, techniques stemming from statistics and
pattern classification will play a larger role in creating a mathematical model on this problem
than in previous contests.
While not a ‘big data’ challenge in the sense of teams needing to develop specialized computer
science-based data handling algorithms and analysis techniques or have access to high
performance computing platforms, the problem will provide teams with an opportunity to
encounter real-world, challenging data that have interesting characteristics. Naturally occurring
complicating factors such as data set size (but not big data), blend of data types, breadth of
representation in data elements, cross-discipline sources, time series dependencies, censored or
missing data, and others could present themselves depending on the specifics of the modeling
problem.
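Two of the complicating factors named above, mixed data types and missing data, are easy to check for before any modeling starts. The following is a minimal sketch, not part of the contest material: it assumes the data is available as CSV, that missing values are marked by one of the tokens in `MISSING_TOKENS` (a real data set documents its own codes in the data dictionary), and that a cell counts as numeric if Python's `float` can parse it. The function names and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumption: these tokens mark a missing value. Real data sets document
# their own missing-value codes in the accompanying data dictionary.
MISSING_TOKENS = {"", "NA", "NULL", "."}


def classify(value):
    """Return a coarse type label for one raw CSV cell."""
    text = value.strip()
    if text in MISSING_TOKENS:
        return "missing"
    try:
        float(text)
        return "numeric"
    except ValueError:
        return "text"


def profile_columns(handle):
    """Count missing, numeric, and text cells per column of a CSV stream."""
    reader = csv.DictReader(handle)
    profile = defaultdict(Counter)
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header are ignored in this sketch.
                continue
            profile[column][classify(value or "")] += 1
    return {column: dict(counts) for column, counts in profile.items()}


# Hypothetical sample data, invented for illustration only.
SAMPLE = "id,income,state\n1,52000,NY\n2,NA,CA\n3,unknown,\n"
sample_profile = profile_columns(io.StringIO(SAMPLE))
```

A column that reports both `numeric` and `text` counts, like `income` in the sample, is a candidate for cleaning or recoding before it enters a model.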
MCM Problem C: Data Insights
• Teams will be given access to database files that will be made available from a public
website.
• The database files will be compressed for size, but the file size could still be 100 MB or
more, and teams should take this into consideration prior to choosing Problem C.
• Each zipped file may include the database files along with the data dictionary, data
mapping file, and program code to create value labels.
• The database will be made available in multiple formats: SAS, SPSS, STATA, and CSV.
• Software such as Statistica, JMP, SAS, SPSS, Excel, R, Matlab, or other applications may
be used to aid in your solution, but no one particular piece of software is endorsed or
required. If specialized software or custom code is used to support the contest effort,
teams should take care to clearly communicate an understanding of the mathematics and
assumptions applied via tools and algorithms in the software.
• When submitting your final electronic solution, you are NOT required to submit back the
database file or any data for that matter. The only thing that should be submitted is your
electronic (Word or PDF) solution.
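
Since the zipped files may be 100 MB or more, it can help to read the CSV version row by row straight from the archive instead of extracting and loading everything at once. The sketch below is one possible approach using only the Python standard library. The archive contents, the member name `data.csv`, and the function name are hypothetical, and UTF-8 encoding is an assumption; the actual contest files may differ.

```python
import csv
import io
import zipfile


def count_rows_in_zip(zip_source, member_name):
    """Stream one CSV member of a zip archive and count its data rows."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Assumption: the CSV is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            # Rows are consumed one at a time, so memory use stays small.
            row_count = sum(1 for _ in reader)
    return header, row_count


# Build a tiny in-memory archive so the sketch runs without any download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", "id,value\n1,10\n2,20\n3,30\n")
buffer.seek(0)

header, row_count = count_rows_in_zip(buffer, "data.csv")
```

The same loop can accumulate running sums or per-column counts instead of a row count, which keeps a first pass over a large file cheap.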
