How to apply a standard scaler on your own dataset?

This is a very short post with a few tips for scaling your data, plus some problems you may run into along the way.

There are many state-of-the-art libraries that can handle this problem easily for you. I'll introduce the one I am most familiar with: scikit-learn in Python.

These are the first 5 rows of our sample dataset, where 'open', 'high', 'low', 'volume' and 'amount' are our features and 'close' is the target we want to predict once the model is trained.

[Figure: Data Example]

But wait, before we start throwing our data into the model training process, what did you forget?

You need to standardize features by removing the mean and scaling to unit variance.
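To make that concrete, here is a minimal sketch with made-up numbers (not the dataset shown above). Standardizing a column means subtracting its mean and dividing by its standard deviation, which is what scikit-learn's `StandardScaler` does with its default settings. The toy matrix `X` is an assumption for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (made-up values): 4 samples, 2 features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

# Manual standardization: z = (x - mean) / std, computed per column.
# np.std defaults to the population standard deviation (ddof=0),
# which matches what StandardScaler uses.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same operation done by scikit-learn.
scaled = StandardScaler().fit_transform(X)

# Both approaches agree: each column ends up with mean 0 and unit variance.
print(np.allclose(manual, scaled))
```

After scaling, both columns live on the same scale even though the raw values differed by three orders of magnitude.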

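# `prep` is assumed to be scikit-learn's preprocessing module.
from sklearn import preprocessing as prep

def standard_scaler(X_train, X_test):
    # Inputs are 3D arrays of shape (samples, nx, ny).
    train_samples, train_nx, train_ny = X_train.shape
    test_samples, test_nx, test_ny = X_test.shape
    # StandardScaler expects 2D input, so flatten each sample into one row.
    X_train = X_train.reshape((train_samples, train_nx * train_ny))
    X_test = X_test.reshape((test_samples, test_nx * test_ny))
    # Fit on the training data only, then apply the same statistics to both sets.
    preprocessor = prep.StandardScaler().fit(X_train)
    X_train = preprocessor.transform(X_train)
    X_test = preprocessor.transform(X_test)
    # Restore the original 3D shapes.
    X_train = X_train.reshape((train_samples, train_nx, train_ny))
    X_test = X_test.reshape((test_samples, test_nx, test_ny))
    return X_train, X_test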
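
One thing to be aware of: the function above flattens each sample into `nx * ny` columns, so every (row, column) position inside a sample gets its own mean and variance. If you would rather have a single mean and variance per feature, a possible variant is sketched below. This is not the method used above, just an alternative. The name `standard_scaler_per_feature` and the random stand-in data are assumptions for illustration, and the sketch assumes the last axis holds the features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standard_scaler_per_feature(X_train, X_test):
    """Scale 3D arrays (samples, steps, features) with one mean/std per feature."""
    n_features = X_train.shape[-1]
    # Collapse samples and steps into rows so that each column is one feature.
    scaler = StandardScaler().fit(X_train.reshape(-1, n_features))
    # Transform in 2D, then restore the original 3D shapes.
    X_train_scaled = scaler.transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
    X_test_scaled = scaler.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape)
    return X_train_scaled, X_test_scaled


# Random stand-in data: 80 training and 20 test samples of 10 steps x 5 features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(80, 10, 5))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 10, 5))

X_train_s, X_test_s = standard_scaler_per_feature(X_train, X_test)
print(X_train_s.shape, X_test_s.shape)
```

In both versions the scaler is fitted on the training set only, so no information from the test set leaks into the statistics.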

TODO...
