Web science notes: Crowdsourcing, Stock prediction

Content

  • Crowdsourcing
    • 3 central aspects of crowdsourcing
    • Overall process
      • Process
      • Aggregating output
    • Benefits
  • Stock prediction
    • Background model
      • modern portfolio theory (MPT)
      • efficient market hypothesis (EMH)
      • Social media as a social sensor
      • Stock-net
    • stock price prediction
      • Data collection
      • Models

Crowdsourcing

Outsourcing some tasks to a crowd -> Crowdsourcing
Goal: improve the quality, timeliness and breadth of data

Key questions:

  • What computational problems can/should be solved?
    Data augmenting, Data processing

  • What are the programming paradigms/platforms?
    A programming paradigm is a classification, style or way of programming: an approach to solving problems with programming languages.

  • How do we guarantee that the solution is accurate, efficient and economical?
    Quality, cost and latency

  • How do we motivate participation and leverage the unique expertise and interests of workers?

  • How do we leverage the joint efforts of both automated and
    human computers as workers?

3 central aspects of crowdsourcing

  • What
    • What tasks can be performed by machines
    • Decompose the macro and micro tasks
  • Who
    • Expertise of workers (how to model workers' expertise)
    • Manage cultural aspects and language barrier
  • How
    • How to design and execute tasks
    • Aggregate noisy & complex output (this determines how intelligent the aggregation technique needs to be, e.g. hierarchical, cluster-based aggregation)

Overall process

Process

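  • Arranging workers in parallel
    • Operations & Control: several pipelines run side by side, which is expensive
    • Cost vs latency: high cost, low latency
  • Arranging workers sequentially
    • Operations & Control: one worker after another
    • Cost vs latency: high latency, because each worker waits for the previous worker's result. However, if three workers are planned and the first two agree on the result, the third HIT does not need to be run, which saves cost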
  • Operations & Control
    • Repetition
      Repeat the task until you are satisfied with the result
    • Selection
      Retrieve tasks using selection mechanisms
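
The cost vs latency trade-off above can be sketched as follows. This is a minimal illustration, not a platform API: `ask_worker` is a hypothetical callable standing in for "post one HIT and wait for its answer", and the 2-of-3 agreement rule is the one described in the notes.

```python
from collections import Counter
from typing import Callable, Tuple


def run_parallel(ask_worker: Callable[[int], str], planned: int = 3) -> Tuple[str, int]:
    """Post all planned HITs at once: low latency, but every HIT is paid for."""
    answers = [ask_worker(i) for i in range(planned)]  # conceptually simultaneous
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, planned


def run_sequential(ask_worker: Callable[[int], str],
                   planned: int = 3, needed: int = 2) -> Tuple[str, int]:
    """Post HITs one at a time and stop as soon as `needed` answers agree."""
    counts: Counter = Counter()
    for i in range(planned):
        counts[ask_worker(i)] += 1
        winner, votes = counts.most_common(1)[0]
        if votes >= needed:
            return winner, i + 1  # early stop: the remaining HITs are never posted
    # No agreement reached: fall back to the most frequent answer.
    return counts.most_common(1)[0][0], planned


# Hypothetical answers from three workers, in arrival order.
answers = ["cat", "cat", "dog"]
print(run_sequential(lambda i: answers[i]))  # two HITs are enough here
print(run_parallel(lambda i: answers[i]))    # always pays for three HITs
```

Both functions return the chosen answer together with the number of HITs paid for, so the cost difference is directly visible.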

Aggregating output

Challenges

  • Outputs are noisy (lack of expertise)
  • Humans are not always reliable (cheating)
  • Cultural context may bias the answers

Goal

  • Automatic procedure to merge HIT results

Assumptions

  • There exists a “true” answer
  • Redundancy helps
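
Under these two assumptions, the simplest automatic merge procedure is a majority vote over redundant HITs. A minimal sketch, assuming each result is a hypothetical `(task_id, worker_id, answer)` triple:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def majority_vote(hit_results: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, float]]:
    """Merge redundant HIT results.

    Returns task_id -> (winning answer, share of the votes it received).
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, answer in hit_results:
        votes[task_id][answer] += 1

    merged = {}
    for task_id, counter in votes.items():
        answer, count = counter.most_common(1)[0]
        merged[task_id] = (answer, count / sum(counter.values()))
    return merged


# Illustrative data: three workers label two images.
results = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "dog"), ("img2", "w3", "dog"),
]
merged = majority_vote(results)
print(merged)
```

The vote share gives a rough confidence signal: a task with a low share is a candidate for another round of HITs. Majority voting treats all workers as equally reliable, which is exactly what the challenges listed above call into question.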

Latent Class models
[Figure: crowdsourcing]
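
The notes do not say which latent class model is meant. As one possible illustration, the sketch below assumes a simple "one-coin" model for binary labels: each task has a hidden true label, each worker has a single unknown accuracy, and both are estimated with EM. All names and data are hypothetical.

```python
from typing import Dict, Tuple


def one_coin_em(labels: Dict[Tuple[str, str], int], n_iter: int = 20):
    """labels maps (task_id, worker_id) -> 0 or 1.

    Returns (post, acc): P(true label = 1) per task, and the estimated
    probability that each worker answers correctly.
    """
    tasks = sorted({t for t, _ in labels})
    workers = sorted({w for _, w in labels})
    eps = 1e-6  # keeps probabilities away from exactly 0 or 1

    # Initialise with the fraction of 1-votes per task (a soft majority vote).
    post = {}
    for t in tasks:
        votes = [lab for (tt, _), lab in labels.items() if tt == t]
        post[t] = sum(votes) / len(votes)

    acc = {}
    for _ in range(n_iter):
        # M-step: accuracy = expected share of a worker's answers that match the truth.
        for w in workers:
            agree = [post[t] if lab == 1 else 1.0 - post[t]
                     for (t, ww), lab in labels.items() if ww == w]
            acc[w] = min(max(sum(agree) / len(agree), eps), 1.0 - eps)
        prior = min(max(sum(post.values()) / len(tasks), eps), 1.0 - eps)

        # E-step: posterior over the hidden true label of each task.
        for t in tasks:
            like1, like0 = prior, 1.0 - prior
            for (tt, w), lab in labels.items():
                if tt != t:
                    continue
                like1 *= acc[w] if lab == 1 else 1.0 - acc[w]
                like0 *= acc[w] if lab == 0 else 1.0 - acc[w]
            post[t] = like1 / (like1 + like0)
    return post, acc


# Workers A and B agree with each other; C always says the opposite.
# Task t5 was labelled only by C.
labels = {
    ("t1", "A"): 1, ("t1", "B"): 1, ("t1", "C"): 0,
    ("t2", "A"): 1, ("t2", "B"): 1, ("t2", "C"): 0,
    ("t3", "A"): 0, ("t3", "B"): 0, ("t3", "C"): 1,
    ("t4", "A"): 0, ("t4", "B"): 0, ("t4", "C"): 1,
    ("t5", "C"): 1,
}
post, acc = one_coin_em(labels)
print(post, acc)
```

Unlike a plain majority vote, the model learns that worker C is unreliable, so for t5 it ends up distrusting C's lone answer.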

Designing a crowdsourcing solution
Preparation and initialization
Decomposition and Aggregation
Worker Management
Prior & External Information: machine-generated evidence, active learning

Benefits

  • Capturing important information in a timely fashion

  • Labeling datasets

  • Quality of the results

  • Breadth of data

Stock prediction

Investment factors

  • Liquidity principle: the financial assets held can be converted to cash quickly
  • Safety principle: the value of the financial asset and its capacity to absorb losses from unexpected risks
  • Profit principle: the level of investment income a financial asset yields

Background model

modern portfolio theory (MPT)

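MPT is used to select investments so that overall return is maximized within an acceptable level of risk.

It uses different sets of returns (intraday, close and adjusted close) and correlations (within an industry and with other markets) to predict future returns.

Investors choose the combination of risk and return that best matches their assessed risk tolerance. These optimal combinations form the efficient frontier, which is the cornerstone of MPT: the set of portfolios that offer the highest expected return for a given level of risk.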
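
To make the efficient frontier concrete, here is a minimal two-asset sketch. The mean returns and covariances are made-up illustrative numbers, not market data. Portfolio return is the weighted sum of asset returns, and portfolio variance is the double sum of `w_i * w_j * cov[i][j]`.

```python
import math
from typing import List, Sequence


def portfolio_return(weights: Sequence[float], mean_returns: Sequence[float]) -> float:
    return sum(w * m for w, m in zip(weights, mean_returns))


def portfolio_volatility(weights: Sequence[float], cov: Sequence[Sequence[float]]) -> float:
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical annualised figures for two assets (illustration only).
mean_returns = [0.08, 0.12]
cov = [[0.040, 0.006],
       [0.006, 0.090]]

# Sweep the weight of asset 0 from 0% to 100% in 10% steps.
candidates: List[tuple] = []
for step in range(11):
    w = step / 10
    weights = [w, 1 - w]
    candidates.append((portfolio_volatility(weights, cov),
                       portfolio_return(weights, mean_returns),
                       weights))

min_risk = min(candidates, key=lambda c: c[0])
# With two assets, the efficient part of the curve is everything whose
# return is at least that of the minimum-risk portfolio.
frontier = [c for c in candidates if c[1] >= min_risk[1]]

for vol, ret, weights in frontier:
    print(f"weights={weights[0]:.1f}/{weights[1]:.1f}  risk={vol:.3f}  return={ret:.3f}")
```

Because the two assets are only weakly correlated in this example, the minimum-risk mix is less volatile than either asset held alone, which is the diversification effect MPT relies on.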

efficient market hypothesis (EMH)

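EMH is a hypothesis in financial economics stating that asset prices reflect all available information. It holds that global financial markets are informationally efficient, meaning that a stock's price reflects all information relevant to the company in question.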

Social media as a social sensor

Social media carries discussion of, and information about, stocks.

Stock-net

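Stock-net is a deep learning solution with a three-layer architecture. The base layer is a market information encoder that encodes tweets and stock price data. The model tries to learn stock movement from tweets, using event-based sentiment analysis for stock prediction.

It relies more heavily on Twitter data.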

stock price prediction

Data collection

  • trading day data of a stock

    • Basic: Date, Open price, Close price, High, Low, Adjusted close (the close price adjusted for any corporate actions), Volume (the number of shares traded on the trading day)
    • More: Twitter data for stocks
  • Cleaning the data
    Clean the Twitter data, keeping only the text

  • Data processing

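    • Handle special symbols and unify the time format
    • Merge stock prices and tweet text by trading day
      • Using the open price requires assuming that tweets may come from any time of day
      • The close price makes the trend easier to see and helps determine whether the tweets had any effect on the stock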
  • Trend representation

    • Take the difference between the close price and the open price; label a positive trend 1 and a negative trend 0
  • Normalizing the dataset
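
The data preparation steps above can be sketched in plain Python as follows. The field names (`created_at`, `text`, `date`, `open`, `close`), the `$XYZ` ticker and the sample rows are hypothetical. Treating a flat day as trend 0 is also an assumption, since the notes only cover positive and negative moves.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def clean_tweet(text: str) -> str:
    """Drop URLs, @mentions and special symbols; keep lowercase words and $tickers."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^A-Za-z0-9$ ]+", " ", text)
    return " ".join(text.lower().split())


def to_trading_day(timestamp: str) -> str:
    """Unify timestamps to a YYYY-MM-DD key (assumes ISO-formatted input)."""
    return datetime.fromisoformat(timestamp).date().isoformat()


def build_dataset(prices: List[Dict], tweets: List[Dict]) -> List[Dict]:
    """Merge prices and cleaned tweet text by trading day and add the trend label."""
    tweets_by_day = defaultdict(list)
    for tw in tweets:
        tweets_by_day[to_trading_day(tw["created_at"])].append(clean_tweet(tw["text"]))

    rows = []
    for p in prices:
        day = p["date"]
        if day not in tweets_by_day:
            continue  # no tweets for this trading day
        rows.append({
            "date": day,
            "text": " ".join(tweets_by_day[day]),
            "close": p["close"],
            # 1 = close above open; 0 otherwise (flat days count as 0 here)
            "trend": 1 if p["close"] - p["open"] > 0 else 0,
        })
    return rows


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


prices = [
    {"date": "2021-03-01", "open": 100.0, "close": 103.0},
    {"date": "2021-03-02", "open": 103.0, "close": 101.0},
]
tweets = [
    {"created_at": "2021-03-01T09:15:00", "text": "$XYZ to the moon!!! https://example.com/a"},
    {"created_at": "2021-03-02T16:40:00", "text": "Selling $XYZ @trader :("},
]
rows = build_dataset(prices, tweets)
scaled_close = min_max([r["close"] for r in rows])
print(rows, scaled_close)
```

Each resulting row pairs one trading day's tweet text with its trend label, which is the shape the models below expect.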

Models

The models use the tweet text to predict the trend, with time as the index.

  1. LSTM/BiLSTM

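  2. BERT model
    BERT stands for Bidirectional Encoder Representations from Transformers. It is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weights between them are computed dynamically from their connection.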

  3. Dense neural network

  4. Distilled BERT
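
The "dynamically computed weights" in the BERT description refer to attention. The sketch below shows scaled dot-product self-attention in plain Python. It is a simplification: real Transformers first apply learned projections to obtain queries, keys and values, which are omitted here, and the token vectors are made up.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with queries = keys = values = x."""
    d = len(x[0])
    out = []
    for q in x:
        # One score per input element, computed from the data itself.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output is a weighted mix of all input elements.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)
print(contextual)
```

Every output vector depends on every input vector, and the mixing weights change with the input rather than being fixed parameters.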
