Python, data collection, data analysis

This post collects links to good materials and articles I come across; it is updated continually.

1.http://www.cnblogs.com/ming5536/archive/2012/11/21/2781062.html

How do you become an awesome data analyst?

        There are two classes of skills needed to be a successful data analyst: soft skills and technical skills.  The core workflow for a data analyst has several steps.  Once a problem has been defined and a hypothesis is to be tested, the data must be drawn out and then analyzed.  The resulting analysis is written up and communicated to the interested stakeholder.  Doing this requires several hard and soft skills.

Technical Skills:
  1. Anything from a basic knowledge of statistics to a rigorous understanding of machine learning.  Most consumers of analysis will not look at more than descriptive analysis (means, medians, significance).
  2. Useful computer skills include a query language (SQL, Hive, Pig), a scripting language (Python, Matlab), a statistical language (R, SAS, SPSS), and a spreadsheet (Excel).
Soft Skills:
  1. Defining the problem and narrowing the analysis down often requires a lot of soft skills.  Balancing the demands on your time, cutting down the infinite what-if scenarios, and understanding the requester's needs all require good communication and an understanding of the business.  Avoid agreeing to deliver information that will not be useful for solving the core issues.
  2. Knowing the audience.  A PM and a CEO require different presentations.  As a data analyst, you will often be required to answer to both.  A typical PM will want a more collaborative interaction with more scenarios spelled out and a less polished presentation.  A CEO will often be looking for a specific recommendation in a small, polished presentation.
  3. Delivery.  A wonderfully accurate predictive model that has been backtested to deliver a low RMSE, or an A/B test that can increase conversion by 15% without reducing the sales price, is a great result.  However, without a great presentation, key findings may be left out of product road maps and sit in the backlog for months or years.
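     After reading it, I would sum up the author's point in two layers: first, you need solid skills; second, you need the right way of thinking. Solid skills include an understanding of machine learning and proficiency in one query language (SQL, Hive, or Pig), one scripting language (Python or Matlab), one statistical language (R, SAS, or SPSS), and one software tool (Excel). The way of thinking includes pinning down the core of the problem and understanding what the client really wants.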
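
The descriptive measures and the RMSE mentioned in the excerpt can be computed with the Python standard library alone. The following is a minimal sketch; the sample values and the helper name `rmse` are made up for illustration and do not come from the linked article.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Descriptive statistics: arithmetic mean and median of the observed values.
mean_actual = statistics.mean(actual)
median_actual = statistics.median(actual)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse(actual, predicted)
print(mean_actual, median_actual, round(error, 4))
```

A lower RMSE means the predictions sit closer to the observed values on average, which is why the excerpt treats a low backtested RMSE as a good result.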

II. Random thoughts (programmers and income)
A very good article on programmers' work and income; worth thinking about.
http://blog.csdn.net/justjavac/article/details/8686805 

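III. Learning Python
3-1: Strings and their encoding
1.http://www.cnblogs.com/pylemon/archive/2011/05/18/2050179.html    Using strip, lstrip, and rstrip in Python (removing specified characters)
2.http://www.rmi.net/~lutz/strings30.html        String encoding in Python 3.x: unicode and bytes
3.http://woodpecker.org.cn/diveintopython3/strings.html      str and bytes, the strings chapter of Dive Into Python 3. A bytes object has a decode() method that takes a character encoding as its argument and converts the bytes object to a string according to that encoding. Correspondingly, a string has an encode() method that also takes a character encoding as its argument and converts the string to a bytes object according to it.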
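
As a small sketch of the string methods covered by the links above (the sample strings are made up for illustration): the `strip` family removes any of the given characters from the ends of a string, and `encode()` / `decode()` convert between `str` and `bytes`.

```python
# strip/lstrip/rstrip treat their argument as a SET of characters to remove,
# not as a literal prefix or suffix.
padded = "  --hello--  "
no_spaces = padded.strip()         # whitespace removed from both ends
no_dashes = padded.strip(" -")     # spaces and dashes removed from both ends
left_only = "xxhelloxx".lstrip("x")
right_only = "xxhelloxx".rstrip("x")

# str -> bytes with encode(), bytes -> str with decode().
text = "数据分析"
raw = text.encode("utf-8")         # a bytes object
round_trip = raw.decode("utf-8")   # back to the original str

print(no_spaces, no_dashes, left_only, right_only)
print(type(raw).__name__, len(text), len(raw), round_trip == text)
```

Note that `len(text)` counts characters while `len(raw)` counts bytes, so the two differ for non-ASCII text.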
3-2: Regular expressions
1.http://www.cnblogs.com/coderzh/archive/2008/05/06/1185755.html    Regular expressions (used in web crawlers to match specific content; import re)
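
A minimal sketch of the crawler use case mentioned above, using the standard `re` module. The HTML snippet and the pattern are made up for illustration; for real pages an HTML parser is usually more robust than a regular expression.

```python
import re

# Hypothetical page fragment, invented for illustration.
html = (
    '<title>Example page</title>'
    '<a href="http://example.com/a">A</a> '
    '<a href="http://example.com/b">B</a>'
)

# Non-greedy group captures everything between href=" and the next quote.
LINK_RE = re.compile(r'href="(.*?)"')
links = LINK_RE.findall(html)

# search() returns the first match object, or None if nothing matches.
title_match = re.search(r"<title>(.*?)</title>", html)
title = title_match.group(1) if title_match else None

print(links)
print(title)
```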


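IV. Social network analysis
1. http://www.kazemjahanbakhsh.com/codes/cmty.html   A program implementing the GN community detection algorithm with Python's networkx and igraph packages.
2. http://www.kazemjahanbakhsh.com/              An expert in social network analysis; PhD from the University of Victoria.
3. http://blog.sina.com.cn/s/blog_622245920100vscb.html   igraph/networkx study notes: data structures (how)
4. http://www.zhizhihu.com/html/y2012/3912.html       Comparison of community detection methods in the igraph library
5.http://blog.csdn.net/chaishen10000/article/details/5869445   Six major social network analysis software packages.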

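V. Machine learning + recommender systems
1.http://blog.sina.com.cn/s/blog_7ad79389010184w3.html    Introduction to user-based collaborative filtering
2.http://webdam.inria.fr/Jorge/html/wdmch19.html              A recommender systems tutorial, in English   2013/4/10
3.http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html    A recommended machine learning framework, implemented in Python
4.http://w800927.iteye.com/blog/1329937                          Common mistakes in data mining, such as how the training and test sets are split. "Shake before drinking."
5.http://blog.sina.com.cn/s/blog_6b1c9ed50101akb6.html        Classic machine learning packages.
6. http://www.lfd.uci.edu/~gohlke/pythonlibs/                    Unofficial Windows Binaries for Python Extension Packages: many unofficial packages, including builds for Python 3.x
[How to install the scikit-learn package on Python 3.2 win32. The official release does not support this yet; this is an unofficial build collected from the site above, which I hope helps everyone learn: http://download.csdn.net/detail/database_zbye/5258021]
For using the scikit-learn package, refer mainly to the official reference. Here is a simple example: http://www.shahuwang.com/?p=1018
7. http://www.aiseminar.cn/bbs/forum.php?mod=viewthread&tid=798     Introduction to machine learning libraries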
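
One reading of the "shake before drinking" advice in item 4 is to shuffle the data before splitting it into training and test sets, so that an ordered dataset does not put one class entirely in one split. The sketch below assumes that interpretation; the function name `shuffled_split`, its parameters, and the toy data are hypothetical and not taken from the linked article.

```python
import random


def shuffled_split(samples, test_ratio=0.25, seed=0):
    """Shuffle a copy of the samples, then cut it into (train, test)."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    shuffled = list(samples)    # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset sorted by label: the first 50 samples are class 0, the rest class 1.
data = [(i, 0 if i < 50 else 1) for i in range(100)]

train, test = shuffled_split(data)
print(len(train), len(test))
```

Without the shuffle, taking the first 25 rows of `data` as the test set would yield only class 0 samples, which is the kind of splitting mistake the link warns about.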

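VI. Data mining
1. http://blog.csdn.net/aladdina/article/details/4141177     The top 10 classic data mining algorithms!
2. http://www.douban.com/group/topic/35168224/      A casual discussion of data mining from beginner to advanced [a detailed introduction to a machine learning study path]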

VII. C/C++
1.http://www.cnblogs.com/zjfdbz/archive/2011/12/17/2291233.html     EOF: end of file
2. http://blog.csdn.net/stone_sky/article/details/7288013 
    http://blog.csdn.net/mkowzy/article/details/1848647  The difference between "cout << char array name / array pointer" and "cout << int array name / array pointer"
