Feature engineering is the process of creating new features from raw data in order to improve how well a learning algorithm performs.

  • feature engineering: creates additional relevant features from the existing raw features in the data, in order to increase the predictive power of the learning algorithm.
  • feature selection: selects a key subset of the original features in order to reduce the dimensionality of the training problem.
    Normally, feature engineering is applied first to generate additional features, and feature selection is then performed to eliminate irrelevant, redundant, or highly correlated features.
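The two-step order described above (engineer first, select second) can be sketched in plain Python. This is only a minimal illustration: the toy values, the feature names (`height_m`, `weight_kg`, `height_cm`, `bmi`), the greedy strategy, and the 0.95 correlation threshold are all assumptions made for the example, not something the notes prescribe.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical raw features (toy values, not real data).
raw = {
    "height_m": [1.60, 1.75, 1.82, 1.68, 1.90],
    "weight_kg": [55.0, 82.0, 70.0, 90.0, 85.0],
}

# Step 1: feature engineering derives new columns from the raw ones.
features = dict(raw)
features["height_cm"] = [h * 100 for h in raw["height_m"]]  # redundant on purpose
features["bmi"] = [w / h ** 2 for w, h in zip(raw["weight_kg"], raw["height_m"])]

def select_features(columns, threshold=0.95):
    """Step 2: greedy selection. Keep a feature only if its absolute
    correlation with every already-kept feature is at most `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

selected = select_features(features)
print(selected)
```

Here `height_cm` is dropped because it is perfectly correlated with `height_m`, while the engineered `bmi` feature survives selection.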

  • For numeric features (numeric values)
    e.g., age and income have a non-linear relationship

Use buckets so that the model can learn a different weight for each range of values:


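import tensorflow as tf
age = tf.feature_column.numeric_column('age')  # source column to bucketize
age_buckets = tf.feature_column.bucketized_column(
  age, boundaries=[31, 46, 60, 75, 90])
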
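To make the bucketing idea concrete without depending on TensorFlow, here is a rough sketch using only the standard library. It assumes the same boundaries as above and assumes left-inclusive buckets, i.e. `[31, 46)`, `[46, 60)`, and so on; the helper names `bucket_index` and `one_hot_bucket` are made up for this illustration.

```python
import bisect

BOUNDARIES = [31, 46, 60, 75, 90]  # same boundaries as in the notes

def bucket_index(age, boundaries=BOUNDARIES):
    """Index of the bucket that contains `age`.
    Buckets are (-inf, 31), [31, 46), ..., [90, +inf)."""
    return bisect.bisect_right(boundaries, age)

def one_hot_bucket(age, boundaries=BOUNDARIES):
    """One-hot vector with len(boundaries) + 1 entries, one per bucket."""
    vec = [0] * (len(boundaries) + 1)
    vec[bucket_index(age, boundaries)] = 1
    return vec

for age in (25, 31, 59, 95):
    print(age, bucket_index(age), one_hot_bucket(age))
```

A linear model fed the one-hot vector gets one weight per bucket, which is how it can fit a non-linear age/income relationship.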
  • For categorical features (categorical values)
    For a small vocabulary: use the raw value

    Feature crossing: combine two or more categorical features into a single feature
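As a small sketch of both ideas, a small vocabulary can be one-hot encoded directly, and a feature cross can be encoded as a one-hot vector over all value combinations. The two vocabularies below and the helper names are hypothetical, chosen only for illustration.

```python
EDUCATION = ["high_school", "bachelors", "masters"]  # hypothetical vocabulary
MARITAL = ["single", "married"]                      # hypothetical vocabulary

def one_hot(value, vocabulary):
    """One-hot encode a raw categorical value against a known vocabulary."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

def cross_index(a, b, vocab_a, vocab_b):
    """Index of the pair (a, b) among all len(vocab_a) * len(vocab_b) pairs."""
    return vocab_a.index(a) * len(vocab_b) + vocab_b.index(b)

def one_hot_cross(a, b, vocab_a, vocab_b):
    """One-hot encode the crossed feature (a, b)."""
    vec = [0] * (len(vocab_a) * len(vocab_b))
    vec[cross_index(a, b, vocab_a, vocab_b)] = 1
    return vec

print(one_hot("bachelors", EDUCATION))
print(one_hot_cross("bachelors", "married", EDUCATION, MARITAL))
```

The crossed vector has one slot per combination, so the model can learn a weight for "bachelors AND married" that is not just the sum of the two individual weights.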

    For a larger vocabulary: use hashing or an embedding

occupation = tf.feature_column.categorical_column_with_hash_bucket('occupation', 1080)
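The hashing trick itself can be sketched in a few lines. Note the assumption: this example uses MD5 from the standard library purely as a stable stand-in hash, which is not the hash function TensorFlow uses internally, so the bucket ids will differ from what the feature column produces. The occupation strings are hypothetical.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map an arbitrary string to a bucket id in [0, num_buckets).
    MD5 is used only because it is deterministic across runs."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

occupations = ["engineer", "teacher", "nurse", "farmer"]  # hypothetical values
buckets = {name: hash_bucket(name, 1080) for name in occupations}
print(buckets)

# With more distinct values than buckets, collisions are unavoidable.
many_values = [f"occupation_{i}" for i in range(20)]
small = [hash_bucket(v, 8) for v in many_values]
print(len(set(small)), "distinct buckets for", len(many_values), "values")
```

No vocabulary needs to be stored, at the cost that unrelated values may share a bucket; a larger bucket count reduces collisions but increases the input dimension.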


Dense vectors vs. one-hot (sparse) vectors
TensorFlow Projector: a website for visualizing embeddings
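The contrast between dense and one-hot vectors can be illustrated with a tiny embedding table. This is a sketch under stated assumptions: the vocabulary, the embedding dimension of 2, and the random (untrained) table are all invented for the example. It shows that looking up a row is equivalent to multiplying the one-hot vector by the table.

```python
import random

VOCAB = ["engineer", "teacher", "nurse", "farmer"]  # hypothetical vocabulary
EMBEDDING_DIM = 2                                   # assumed for illustration

rng = random.Random(0)
# One dense row per vocabulary entry; in practice these values are learned.
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]
                   for _ in VOCAB]

def one_hot(index, size):
    """Sparse representation: all zeros except a single 1."""
    return [1 if i == index else 0 for i in range(size)]

def embed_by_lookup(index):
    """Dense representation: pick the row directly."""
    return embedding_table[index]

def embed_by_matmul(index):
    """Same result computed as one_hot(index) @ embedding_table."""
    v = one_hot(index, len(VOCAB))
    return [sum(v[i] * embedding_table[i][d] for i in range(len(VOCAB)))
            for d in range(EMBEDDING_DIM)]

idx = VOCAB.index("nurse")
print(one_hot(idx, len(VOCAB)))   # length grows with the vocabulary
print(embed_by_lookup(idx))       # length stays at EMBEDDING_DIM
```

The one-hot vector has as many entries as the vocabulary, while the dense vector keeps a fixed, much smaller size regardless of vocabulary size.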

  • Word Embeddings
    • word2vec
      • skipgram
      • CBOW
    • GloVe
    • Word Regularities
    • Doc2vec
  • Image Embeddings
    • Single layer embeddings
      • DeCAF
      • CNN Features off-the-shelf
    • Studies of transferability
      • Transferability of features
      • Factors of transferability
    • Multiple layer embeddings
      • Full-Network embedding
  • Multimodal Embeddings
    • Introduction
    • Image and Text Multimodal Embeddings
      • Two separate embeddings
      • Pairwise Ranking Loss
      • Available datasets for Image Captioning
      • Applications today
    • Other multimodal combinations
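For the skipgram and CBOW entries in the outline above, the difference lies in how training examples are built from a sentence: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its context. The sketch below only generates the training examples, not the model; the window size and the toy sentence are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: one pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """(context_words, center) examples: one example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

sentence = ["the", "cat", "sat"]  # toy sentence
print(skipgram_pairs(sentence))
print(cbow_examples(sentence))
```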

