Little Mouse Learns Python

(Study Notes Template)

Here are some guidelines outlining the scope that the final exam might cover.

  1. Data Processing (15%):
    • File Handling with Python
      • Read files (e.g., txt, CSV);
      • Write to files (e.g., txt, CSV);
    • Scientific Computing Library: using the numpy module.
      • Array Creation, Shaping, and Transposition
        • Array Creation Routines
        • Shaping
        • Transposition
      • Indexing & Slicing
      • Mathematical Operations
      • Statistics & Linear Algebra   
    • Python Data Analysis Library: using the pandas module.
      • Create DataFrame & Series;
      • Indexing & Slicing;
      • Calculation, Joining and Concatenation;
      • Some advanced pandas operations
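A minimal sketch of the data-processing topics above (file I/O, numpy array operations, basic pandas usage). The file name scores.csv, the column names, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# File handling: write a small CSV, then read it back line by line.
with open("scores.csv", "w", encoding="utf-8") as f:
    f.write("name,math,physics\nAlice,90,85\nBob,78,92\n")
with open("scores.csv", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())

# numpy: creation, shaping, transposition, indexing/slicing, statistics, linear algebra.
a = np.arange(12).reshape(3, 4)     # array creation routine + shaping
print(a.T.shape)                    # transposition -> (4, 3)
print(a[1, 1:3])                    # indexing & slicing
print(a.mean(axis=0), a.sum())      # statistics
print(np.dot(a, a.T))               # linear algebra: matrix product

# pandas: DataFrame/Series creation, indexing, calculation, joining, concatenation.
df = pd.read_csv("scores.csv")                       # DataFrame from the CSV above
s = pd.Series([1, 2, 3], name="bonus")               # a Series
print(df.loc[df["math"] > 80, ["name", "math"]])     # label-based indexing/slicing
df["total"] = df["math"] + df["physics"]             # column-wise calculation
extra = pd.DataFrame({"name": ["Alice", "Bob"], "chemistry": [88, 75]})
print(pd.merge(df, extra, on="name"))                # joining
print(pd.concat([df, df], ignore_index=True).shape)  # concatenation
```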
  2. Data Visualization (5%): line plot, scatter plot, bar plot, histogram.
    • Visualization using Matplotlib;
    • Visualization using Seaborn;
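A minimal sketch of the four required plot types with Matplotlib, plus the same data drawn once with Seaborn; the data frame is synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# A small synthetic dataset (the values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(50),
                   "y": np.cumsum(rng.normal(size=50)),
                   "group": rng.choice(["A", "B"], size=50)})

# The four required plot types with Matplotlib.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(df["x"], df["y"])                # line plot
axes[0, 1].scatter(df["x"], df["y"])             # scatter plot
means = df.groupby("group")["y"].mean()
axes[1, 0].bar(means.index, means.values)        # bar plot
axes[1, 1].hist(df["y"], bins=10)                # histogram
plt.tight_layout()

# The same data with Seaborn's higher-level interface.
plt.figure()
sns.scatterplot(data=df, x="x", y="y", hue="group")
plt.show()
```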
  3. Web Scraping (10%)
    • Navigate the parse tree of an HTML document using BeautifulSoup;
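A minimal BeautifulSoup sketch. Instead of downloading a real page, it parses a small made-up HTML string, so the tags and the URL are illustrative only.

```python
from bs4 import BeautifulSoup

# A tiny HTML document used in place of a downloaded page (content is made up).
html = """
<html><body>
  <h1>Course List</h1>
  <ul id="courses">
    <li class="core">Python</li>
    <li class="core">Statistics</li>
    <li class="elective">Networks</li>
  </ul>
  <a href="https://example.com/syllabus">Syllabus</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.h1.text)                            # navigate by tag name
for li in soup.find_all("li", class_="core"):  # search the tree with filters
    print(li.get_text())
print(soup.find("a")["href"])                  # read an attribute
print(soup.ul.parent.name)                     # move up the tree (-> body)
```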
  4. Time Series Analysis (10%): using statsmodels.tsa.
    • Create time series;
    • Check whether the time series is stationary;
    • Exponential smoothing;
    • Component Decomposition;
    • Autocorrelation function (ACF) & Partial autocorrelation function (PACF)
    • Model Diagnostics & Model Selection of ARIMA
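A minimal statsmodels.tsa sketch covering the points above. The monthly series is synthetic, and the ARIMA order (1, 1, 1) is only an illustrative candidate, not a recommended model.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA

# Create a monthly time series (synthetic trend + seasonality + noise).
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(0.5 * np.arange(96)
              + 5 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(0, 1, 96), index=idx)

# Stationarity check: a small ADF p-value suggests the series is stationary.
print("ADF p-value:", adfuller(y)[1])

# Exponential smoothing with additive trend and seasonality.
es = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(es.forecast(6))

# Component decomposition into trend, seasonal, and residual parts.
decomp = seasonal_decompose(y, model="additive", period=12)
print(decomp.trend.dropna().head())

# ACF / PACF values, typically used to suggest MA and AR orders.
print(acf(y, nlags=12))
print(pacf(y, nlags=12))

# Fit a candidate ARIMA model; AIC/BIC support model selection,
# and summary() / residual checks support diagnostics.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.aic, model.bic)
print(model.summary())
```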
  5. Machine Learning (30%): using the scikit-learn library.
    • Supervised Learning
      • Classification: Basic concepts of Logistic Regression (sigmoid function), Support Vector Machines (kernel), K-Nearest Neighbor (weight), Naive Bayes, Decision Tree, Random Forests.
        • Binary Classification
        • Multiclass Classification
          • One-versus-All (OvA) Strategy
          • One-versus-One (OvO) Strategy
      • Regression: OLS, Ridge Regression, LASSO;
        • Linear Regression
        • Polynomial Regression
    • Unsupervised Learning: Basic concepts of K-Means and PCA
      • Clustering
        • K-Means
          • Finding the Optimal Number of Clusters;
          • Inertia;
          • Silhouette Score    
      • Decomposition
        • Principal Component Analysis (PCA)
          • Variance Explained
    • Training ML Models;
      • Prepare data for model training;
      • Training, Validating, and Testing (e.g., Cross-validation);
      • Fine-tuning the model (e.g., GridSearchCV);
    • Evaluate ML Models;
      • Performance measures:
        • Classification: Accuracy, Confusion Matrix, Precision, Recall, F1 Score, ROC
        • Regression: MSE, RMSE, MAE, RMAE
    • Make Predictions;
    • Deal with Missing Values;
    • Feature Scaling;
    • Handling Text and Categorical Attributes: encoders (e.g., OrdinalEncoder, OneHotEncoder);
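A minimal end-to-end scikit-learn sketch of the workflow above (prepare data, cross-validate, tune with GridSearchCV, evaluate, predict, plus K-Means and PCA). The dataset, column names, and parameter grid are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, silhouette_score)
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# A small made-up dataset with a missing value and a categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 60, 200).astype(float),
    "income": rng.normal(50, 10, 200),
    "city": rng.choice(["north", "south"], 200),
    "bought": rng.integers(0, 2, 200),
})
df.loc[0, "age"] = np.nan

# Prepare data for model training: split, impute missing values, scale numeric
# features, and one-hot encode the categorical attribute.
X, y = df.drop(columns="bought"), df["bought"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])

# Cross-validation, then hyperparameter tuning with GridSearchCV (the grid is illustrative).
print(cross_val_score(clf, X_train, y_train, cv=5).mean())
grid = GridSearchCV(clf, {"model__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Evaluate on the held-out test set and make predictions.
y_pred = grid.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class

# Unsupervised learning: K-Means (inertia, silhouette score) and PCA.
X_num = SimpleImputer().fit_transform(df[["age", "income"]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_num)
print(km.inertia_, silhouette_score(X_num, km.labels_))
pca = PCA(n_components=2).fit(X_num)
print(pca.explained_variance_ratio_)   # variance explained by each component
```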
  6. Natural Language Processing (20%)
    • Simple Regular Expression using re module;
    • Tokenization and Stemming using NLTK;
    • Topic analysis using Gensim:
      • Latent Dirichlet Allocation (LDA);
      • Basic concepts: Document, Corpus, Dictionary, etc.
      • Bag-of-Words model;
    • Analyze text using SpaCy:
      • Tokenization;
      • Part of Speech (POS) Tagging
      • Named Entity Recognition (NER)
    • Others:
      • Feature representation/Vectorization: Count vectorizer and TF-IDF vectorizer using scikit-learn;
      • Text classification: such as using Naive Bayes to classify text;
      • Sentiment Analysis using nltk.sentiment.vader;
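A minimal sketch of the NLP topics above. The example sentences and class labels are made up, and it assumes the NLTK data packages (punkt/punkt_tab, vader_lexicon) and the spaCy model en_core_web_sm are available.

```python
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from gensim import corpora
from gensim.models import LdaModel
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
import spacy

# NLTK resources (punkt_tab is needed on newer NLTK versions).
for pkg in ("punkt", "punkt_tab", "vader_lexicon"):
    nltk.download(pkg, quiet=True)

docs = ["The cats are chasing the mice in the garden.",
        "Stock prices fell sharply after the announcement.",
        "Mice and cats rarely share a garden peacefully."]

# Simple regular expression with re: words of five or more letters.
print(re.findall(r"\b[a-z]{5,}\b", docs[0].lower()))

# Tokenization and stemming with NLTK.
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in word_tokenize(docs[0].lower()) if t.isalpha()])

# Topic analysis with Gensim: documents -> Dictionary -> bag-of-words corpus -> LDA.
texts = [[t for t in word_tokenize(d.lower()) if t.isalpha()] for d in docs]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words representation
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
print(lda.print_topics())

# spaCy: tokenization, POS tagging, NER (the model is installed separately with
# `python -m spacy download en_core_web_sm`).
nlp = spacy.load("en_core_web_sm")
doc = nlp(docs[1])
print([(tok.text, tok.pos_) for tok in doc])
print([(ent.text, ent.label_) for ent in doc.ents])

# Feature representation with scikit-learn, then a tiny Naive Bayes text classifier
# (the 0/1 labels are made up: 0 = animals, 1 = finance).
X_counts = CountVectorizer().fit_transform(docs)
X_tfidf = TfidfVectorizer().fit_transform(docs)
print(X_counts.shape, X_tfidf.shape)
labels = [0, 1, 0]
print(MultinomialNB().fit(X_counts, labels).predict(X_counts))

# Sentiment analysis with VADER.
print(SentimentIntensityAnalyzer().polarity_scores("I really love this course!"))
```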
  7. Network Analysis (10%): using the NetworkX module.
    • Basic concepts of network analysis: node, edge, graph, etc.
    • Creating a graph;
    • Network Properties
    • Graph visualization
    • Node2vec embedding;
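A minimal NetworkX sketch for the topics above. The toy graph is made up, and the embedding step assumes the third-party node2vec package is installed (it trains a gensim Word2Vec model over random walks).

```python
import networkx as nx
import matplotlib.pyplot as plt

# Create a graph from an edge list (nodes and edges are made up).
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")])

# Network properties: size, density, centrality, shortest paths.
print(G.number_of_nodes(), G.number_of_edges())
print(nx.density(G))
print(nx.degree_centrality(G))
print(nx.shortest_path(G, "A", "E"))

# Graph visualization.
nx.draw(G, with_labels=True, node_color="lightblue")
plt.show()

# node2vec embedding (requires `pip install node2vec`; parameters are illustrative).
from node2vec import Node2Vec
n2v = Node2Vec(G, dimensions=16, walk_length=10, num_walks=50, workers=1)
model = n2v.fit(window=5, min_count=1)   # trains a gensim Word2Vec model
print(model.wv["A"][:5])                 # first few embedding dimensions of node "A"
```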

The following template can be used to collect Python practice problems, each with the related knowledge points attached. This series will be updated continuously.

Question 1:

----
##

Question 2:  

----
##

Question 3:   

----
##
