There are many paths into the field of machine learning and most start with theory.
If you are a programmer then you already have the skills to decompose problems into their constituent parts and to prototype small projects in order to learn new technologies, libraries and methods. These are important skills for any professional programmer and these skills can be used to get started in machine learning, today.
You must learn the theory to be effective in machine learning, but you can use your interests and thirst for knowledge to motivate you, moving from working examples toward a mathematical understanding of the algorithms.
In this post you will learn four strategies a programmer can follow to get started in machine learning. This is the path of the technician, which is practical and empirical and will require you to perform research and complete experiments in order to build up your own intuitions.
The four strategies are:
Read through these strategies and select one that you feel suits you the best, then execute with abandon.
Select a tool or library that you like and learn how to use it well.
I recommend you start with an environment that provides tools for data preparation, machine learning algorithms and the presentation of results. Learning an environment like this will allow you to get good at the process of machine learning end-to-end which is more valuable to you than learning a specific data preparation technique or machine learning algorithm.
Alternatively, perhaps you are interested in a specific technique or family of techniques. You could use this as an opportunity to deep dive into a library or tool that offers these methods, and master the technique by mastering the library that provides it.
Some tactics you could follow for this strategy are:
Some environments you should consider include: R, Weka, scikit-learn, waffles, and orange.
Select a dataset and understand it intimately and discover which algorithm class or type addresses it the best.
I recommend you select a modestly sized dataset that fits into memory and that has been well studied before. There are excellent libraries of data sources available for you to browse and choose from. Your objective is to understand the underlying problem that the data source represents, the structure in the dataset and the types of solutions that are most suited to the problem.
Use a machine learning or statistical environment to study the dataset. This will allow you to focus on the questions you are seeking to answer about the dataset rather than being distracted with learning about a given technique and learning how to implement it in code.
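As a rough sketch of the first questions you might ask of a dataset, the snippet below computes per-feature summary statistics and the class distribution. Any environment you choose will offer these as built-in features; plain Python is used here only so the example is self-contained. The five rows are invented for illustration and do not come from any real dataset, and the function names are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset, invented for illustration only:
# two numeric features followed by a class label.
rows = [
    (5.1, 3.5, "a"),
    (4.9, 3.0, "a"),
    (6.2, 2.9, "b"),
    (5.9, 3.0, "b"),
    (6.7, 3.1, "b"),
]


def summarize_column(values):
    """Return min, max, mean and sample standard deviation of one column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def summarize(rows):
    """Summarize each numeric column and count how often each label occurs."""
    # Transpose rows into columns; the last column holds the labels.
    *feature_columns, labels = zip(*rows)
    column_stats = [summarize_column(column) for column in feature_columns]
    class_counts = Counter(labels)
    return column_stats, class_counts


column_stats, class_counts = summarize(rows)
for index, stats in enumerate(column_stats):
    print(f"feature {index}: {stats}")
print(f"class distribution: {dict(class_counts)}")
```

Summaries like these tell you the scale of each feature and whether the classes are balanced, which are useful things to know before you try any algorithm on the data.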
Some tactics that can help you with your study of an experimental machine learning dataset are:
Some repositories of high quality datasets you may like to consider are: UCI ML Repository, Kaggle and data.gov.
Select an algorithm and understand it intimately and discover parameter configurations that are stable across different datasets.
I recommend that you start with an algorithm of modest complexity. Select an algorithm that is well understood, has many open source implementations for you to choose from and has few parameters for you to explore. Your objective is to build up intuitions for how the algorithm performs across a range of problems and parameter configurations.
Use a machine learning environment or library. This will allow you to focus on the behaviors of the algorithm as a “system” as opposed to concerning yourself with canonical mathematical descriptions and reference literature.
Some tactics you can use when studying your chosen machine learning algorithm are:
Your studies can be as simple or as complex as you like. At the higher end you can explore so-called heuristics or rules of thumb for applying algorithms and empirically demonstrate whether they have merit and if so under what circumstances they correlate with successful outcomes.
Some algorithms you may consider to start with include: least squares linear regression, logistic regression, k-nearest neighbor classification and the perceptron.
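To make the idea of a parameter study concrete, here is a minimal sketch that varies the single parameter k of k-nearest neighbor classification and records leave-one-out accuracy for each setting. The six data points are invented for illustration, the helper names are hypothetical, and leave-one-out evaluation is just one assumed way to score a configuration. In practice you would use the implementation in your chosen environment and a real dataset.

```python
import math
from collections import Counter

# Hypothetical toy dataset, invented for illustration only:
# (feature vector, class label), forming two well-separated groups.
data = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.8, 1.1), "a"),
    ((3.0, 3.0), "b"),
    ((3.2, 2.9), "b"),
    ((2.9, 3.3), "b"),
]


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, train on the rest, and score the prediction."""
    correct = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, point, k) == label:
            correct += 1
    return correct / len(data)


# The experiment: one algorithm, one dataset, several parameter settings.
results = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
for k, accuracy in results.items():
    print(f"k={k}: accuracy={accuracy:.2f}")
```

On this toy data, k=1 and k=3 classify every held-out point correctly, while k=5 gets every point wrong, because each training fold contains only two points from the held-out point's own class. Repeating the same loop over several datasets is how you would look for settings that stay stable.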
Select an algorithm and implement or port an existing implementation to a language of your choice.
Select an algorithm of modest complexity to implement. I recommend performing some detailed research on the algorithm you wish to implement, or selecting an implementation you like and porting it to your chosen target programming language.
Implementing an algorithm by hand from scratch is a great way to learn about the myriad of micro-decisions that have to be made in transforming an algorithm description into a functioning system. By repeating this process with multiple algorithms you will quickly gain an intuition for how to read the mathematical descriptions of algorithms in research papers and books.
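As an illustration of those micro-decisions, here is a minimal from-scratch sketch of the perceptron, one of the simpler algorithms you could start with. It is not a reference implementation. The zero initial weights, the learning rate, the fixed number of epochs and the choice to predict 1 when the activation is exactly zero are all assumptions made for this example, and each is a decision that a textbook description may leave open.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum plus bias is non-negative, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    # Micro-decision: an activation of exactly zero is treated as class 1.
    return 1 if activation >= 0.0 else 0


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Train with the classic perceptron update rule for a fixed number of epochs."""
    n_features = len(samples[0][0])
    # Micro-decision: start from zero weights rather than random values.
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        # Micro-decision: visit samples in a fixed order, without shuffling.
        for features, target in samples:
            error = target - predict(weights, bias, features)
            bias += learning_rate * error
            weights = [
                w + learning_rate * error * x
                for w, x in zip(weights, features)
            ]
    return weights, bias


# Logical AND, a small linearly separable problem chosen for illustration.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, features) for features, _ in samples]
print(f"weights={weights}, bias={bias}, predictions={predictions}")
```

Changing any one of the marked decisions and rerunning is a small experiment in itself, and a good way to see which choices matter for the result.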
Five tactics that may help you when implementing machine learning algorithms from scratch are:
The four strategies belong to a methodology I call “small projects”. It is an approach you can use to very quickly build up practical skills in technical fields of study, like machine learning. The general idea is that you design and execute small projects that each target a specific question you want to answer.
Small projects are small in a few dimensions to ensure that they are completed and that you extract the learning benefits and move on to the next project. Below are constraints you should consider imposing on your projects:
The principle of these strategies is to take action and make use of your programmer skill set. Below are three tips to help you adjust your mindset in order to take action:
Here are the four strategies again with a clear one-liner for each to help you choose the one that is right for you.
Pick One!
Which strategy would you choose and what will be your first step? Pick one and declare your intentions in a comment below.
If you like this self-study strategy, I have created a 32-page PDF guide you can use to learn and practice applied machine learning. Check it out:
Small Projects Methodology: Learn and Practice Applied Machine Learning
I have also created a list of 90 project ideas (yeah, I went overboard) and provided it as a bonus with the guide.
Reposted from: http://machinelearningmastery.com/self-study-machine-learning-projects/