[Machine Learning] numpy-ml: the machine learning resource blowing up on GitHub

Dr. David Bourgin of UC Berkeley hand-implemented a wide range of machine learning algorithms in pure NumPy, and the project has racked up 13.3k stars on GitHub.

https://github.com/ddbourgin/numpy-ml/tree/master

Why am I recommending this resource?

There are plenty of open-source machine learning frameworks out there, such as sklearn, scipy, and tensorflow.

But when you want to debug, or to see how a particular detail is implemented, you will find that these frameworks pull in many other libraries.

numpy-ml, on the other hand, depends only on numpy.

Because it uses no other third-party libraries, most methods are implemented from scratch, which makes numpy-ml a good choice when you want to check the theory against the source code.

For ALS matrix factorization, for example, you can follow in the code how each factor matrix is solved for during the alternating updates, as in the sketch below.
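
To give a flavor of what reading such code is like, here is a minimal sketch of regularized ALS on a fully observed matrix. This is my own illustrative NumPy code, not copied from numpy-ml; the names `als_factorize`, `W`, `H`, and `lam` are assumptions made for the example.

```python
import numpy as np

def als_factorize(R, W, H, lam=0.1, n_iters=10):
    """Illustrative regularized ALS sketch (not the numpy-ml API):
    alternately solve the ridge subproblem for one factor while
    holding the other fixed."""
    k = W.shape[1]
    I = np.eye(k)
    for _ in range(n_iters):
        # Closed-form ridge solution for W given H
        W = R @ H @ np.linalg.inv(H.T @ H + lam * I)
        # ...and symmetrically for H given W
        H = R.T @ W @ np.linalg.inv(W.T @ W + lam * I)
    return W, H

# Toy usage: approximate a 5x4 matrix with rank-2 factors
rng = np.random.default_rng(0)
R = rng.random((5, 4))
W, H = als_factorize(R, rng.random((5, 2)), rng.random((4, 2)))
print(np.round(R - W @ H.T, 2))  # residual after 10 sweeps
```

Each update is just the closed-form ridge-regression solution with the other factor held fixed, which is exactly the kind of step you can trace line by line in the repository.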

For building decision trees, the code that computes the split criterion from information gain is also laid out in detail; the snippet after this paragraph illustrates the idea.
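
As a rough illustration of that calculation (again a standalone sketch with my own function names, not the repository's implementation), the information gain of a candidate threshold can be computed like this:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y, threshold):
    """Entropy reduction from splitting feature x at `threshold`."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return entropy(y) - weighted

# Toy usage: pick the threshold with the largest information gain
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 1])
print(max((information_gain(x, y, t), t) for t in x[:-1]))
```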

Main contents:

1.Gaussian mixture model
	EM training
2.Hidden Markov model
	Viterbi decoding
	Likelihood computation
	MLE parameter estimation via Baum-Welch/forward-backward algorithm
3.Latent Dirichlet allocation (topic model)
	Standard model with MLE parameter estimation via variational EM
	Smoothed model with MAP parameter estimation via MCMC
4.Neural networks
	Layers / Layer-wise ops
		Add
		Flatten
		Multiply
		Softmax
		Fully-connected/Dense
		Sparse evolutionary connections
		LSTM
		Elman-style RNN
		Max + average pooling
		Dot-product attention
		Embedding layer
		Restricted Boltzmann machine (w. CD-n training)
		2D deconvolution (w. padding and stride)
		2D convolution (w. padding, dilation, and stride)
		1D convolution (w. padding, dilation, stride, and causality)
	Modules
		Bidirectional LSTM
		ResNet-style residual blocks (identity and convolution)
		WaveNet-style residual blocks with dilated causal convolutions
		Transformer-style multi-headed scaled dot product attention
	Regularizers
		Dropout
	Normalization
		Batch normalization (spatial and temporal)
		Layer normalization (spatial and temporal)
	Optimizers
		SGD w/ momentum
		AdaGrad
		RMSProp
		Adam
	Learning Rate Schedulers
		Constant
		Exponential
		Noam/Transformer
		Dlib scheduler
	Weight Initializers
		Glorot/Xavier uniform and normal
		He/Kaiming uniform and normal
		Standard and truncated normal
	Losses
		Cross entropy
		Squared error
		Bernoulli VAE loss
		Wasserstein loss with gradient penalty
		Noise contrastive estimation loss
	Activations
		ReLU
		Tanh
		Affine
		Sigmoid
		Leaky ReLU
		ELU
		SELU
		GELU
		Exponential
		Hard Sigmoid
		Softplus
	Models
		Bernoulli variational autoencoder
		Wasserstein GAN with gradient penalty
		word2vec encoder with skip-gram and CBOW architectures
	Utilities
		col2im (MATLAB port)
		im2col (MATLAB port)
		conv1D
		conv2D
		deconv2D
		minibatch
5.Tree-based models
	Decision trees (CART)
	[Bagging] Random forests
	[Boosting] Gradient-boosted decision trees
6.Linear models
	Ordinary least squares
	Ridge regression
	Logistic regression
	Weighted linear regression
	Generalized linear model (log, logit, and identity link)
	Gaussian naive Bayes classifier
	Bayesian linear regression w/ conjugate prior
		Unknown mean, known variance (Gaussian prior)
		Unknown mean, unknown variance (Normal-Gamma / Normal-Inverse-Wishart prior)
7.n-Gram sequence models
	Maximum likelihood scores
	Additive/Lidstone smoothing
	Simple Good-Turing smoothing
8.Multi-armed bandit models
	Beta-Bernoulli sampler
	UCB1
	LinUCB
	Epsilon-greedy
	Thompson sampling w/ conjugate priors
9.Reinforcement learning models
	Cross-entropy method agent
	First visit on-policy Monte Carlo agent
	Weighted incremental importance sampling Monte Carlo agent
	Expected SARSA agent
	TD-0 Q-learning agent
	Dyna-Q / Dyna-Q+ with prioritized sweeping
10.Nonparametric models
	Nadaraya-Watson kernel regression
	k-Nearest neighbors classification and regression	
	Gaussian process regression
11.Matrix factorization
	Regularized alternating least-squares
	Non-negative matrix factorization
12.Preprocessing
	Discrete Fourier transform (1D signals)
	Discrete cosine transform (type-II) (1D signals)
	Bilinear interpolation (2D signals)
	Nearest neighbor interpolation (1D and 2D signals)
	Autocorrelation (1D signals)
	Signal windowing
	Text tokenization
	Feature hashing
	Feature standardization
	One-hot encoding / decoding
	Huffman coding / decoding
	Byte pair encoding / decoding
	Term frequency-inverse document frequency (TF-IDF) encoding
	MFCC encoding
13.Utilities
	Similarity kernels
	Distance metrics
	Priority queue
	Ball tree
	Discrete sampler
	Graph processing and generators

Since numpy supports all kinds of numerical operations, why do we still need other machine learning frameworks?

NumPy is a powerful library that supports many kinds of numerical operations, but it focuses on array manipulation and numerical computation. Machine learning involves many complex tasks and algorithms beyond basic number crunching, which is why dedicated machine learning frameworks exist. The main reasons include:

1. Advanced machine learning algorithms: NumPy itself offers only low-level numerical building blocks, not the ready-made models that a full framework ships with. When you need more advanced functionality and optimizations, you turn to dedicated frameworks such as TensorFlow, PyTorch, or scikit-learn.

2. Automatic differentiation and gradient computation: when training neural networks and other deep learning models, computing gradients is the key step that drives parameter updates during backpropagation. NumPy has no automatic differentiation, whereas dedicated frameworks compute gradients for you.
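
To make the contrast concrete, here is a hand-derived gradient step for logistic regression in plain NumPy. This is an illustrative sketch with made-up names (`logistic_sgd_step`, `lr`); with an autodiff framework, the gradient line would be generated for you instead of derived by hand.

```python
import numpy as np

def logistic_sgd_step(w, X, y, lr=0.1):
    """One gradient step for logistic regression with a hand-derived gradient.
    An autodiff framework would compute `grad` automatically."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
    grad = X.T @ (p - y) / len(y)       # gradient of the log loss, derived by hand
    return w - lr * grad

# Toy usage: fit weights on a separable synthetic dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(200):
    w = logistic_sgd_step(w, X, y)
print(np.round(w, 2))
```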

3. Advanced data handling and preprocessing: data processing and preprocessing matter a great deal in machine learning tasks. Dedicated frameworks ship rich tools and functions for data loading, transformation, feature engineering, and data augmentation, making data preparation more convenient and flexible.

4. Distributed and accelerated computing: large datasets and complex models call for distributed computation and high-performance acceleration. Some frameworks support distributed training and can run training and inference on clusters or on accelerators such as GPUs, improving efficiency and speed.
