Python: A Summary of the Basics

Data Types and Expressions

Basic data types

  • int
  • float
  • str (string)
  • bool
a=2
b=2.0
c='xiniulab'
d=True
a,b,c,d
(2, 2.0, 'xiniulab', True)

Expressions

y=a+b
y
4.0
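
As a small sketch of why the result above is 4.0 rather than 4: when an int and a float meet in an arithmetic expression, the int is converted to float first. The variable z is introduced here only for contrast; print() is called with a single argument so the snippet should behave the same under Python 2.7 and Python 3.

```python
a = 2        # int
b = 2.0      # float

y = a + b    # the int operand is converted to float before the addition
z = a * 3    # int * int stays an int

print(type(y).__name__)   # float
print(type(z).__name__)   # int
print(y == 4)             # True: 4.0 and 4 compare equal
```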

String Operations

Strings

  • Strings can be concatenated with "+"
  • \n represents a newline
str1="xiniu\nhello"
print str1
xiniu
hello
str2='\nI am coming'
str3=str1+str2
print str3
xiniu
hello
I am coming

Slicing

test_str="xiniulabhello"
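#take the indices in the range [1,4)
print test_str[1:4]
#take the indices in the range [1,4) with a step of 2
print test_str[1:4:2]
#a negative index counts from the end
print test_str[-5:]
#a negative step walks the string in reverse
print test_str[::-1]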
ini
ii
hello
ollehbaluinix
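
A few more slicing cases, as a sketch on the same test_str: omitted bounds default to the start or the end of the string, and an out-of-range stop is clipped instead of raising an error. The variable names are chosen for this example only.

```python
test_str = "xiniulabhello"

first_five = test_str[:5]      # omitted start defaults to 0
from_five = test_str[5:]       # omitted stop defaults to the end
every_other = test_str[::2]    # step 2 over the whole string
clipped = test_str[8:100]      # a stop past the end is clipped, not an error

print(first_five)    # xiniu
print(from_five)     # labhello
print(every_other)   # xnuahlo
print(clipped)       # hello
```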

Utility functions: dir & help

#dir lists the attributes and methods of a class or module
#for example
dir(str)
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_formatter_field_name_split',
 '_formatter_parser',
 'capitalize',
 'center',
 'count',
 'decode',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'index',
 'isalnum',
 'isalpha',
 'isdigit',
 'islower',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
#help shows the documentation of a function
#for example
help(str.strip)
Help on method_descriptor:

strip(...)
    S.strip([chars]) -> string or unicode

    Return a copy of the string S with leading and trailing
    whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping
import pandas
help(pandas)
Help on package pandas:

NAME
    pandas

FILE
    /opt/ds/local/lib/python2.7/site-packages/pandas/__init__.py

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    =====================================================================

    See http://pandas.pydata.org/ for full documentation. Otherwise, see the
    docstrings of the various objects in the pandas namespace:

    Series
    DataFrame
    Panel
    Index
    DatetimeIndex
    HDFStore
    bdate_range
    date_range
    read_csv
    read_fwf
    read_table
    ols

PACKAGE CONTENTS
    _hash
    _join
    _period
    _sparse
    _testing
    _version
    _window
    algos
    api (package)
    compat (package)
    computation (package)
    core (package)
    formats (package)
    hashtable
    index
    indexes (package)
    info
    io (package)
    json
    lib
    msgpack (package)
    parser
    rpy (package)
    sparse (package)
    stats (package)
    tests (package)
    tools (package)
    tseries (package)
    tslib
    types (package)
    util (package)

SUBMODULES
    offsets

DATA
    IndexSlice = 
    NaT = NaT
    __docformat__ = 'restructuredtext'
    __version__ = u'0.19.2'
    datetools = 
    get_option = 
    options = 
    plot_params = {'xaxis.compat': False}
    reset_option = 
    set_option = 

VERSION
    0.19.2

Stripping whitespace, splitting, and joining

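  • str.strip removes whitespace from both ends
  • str.lstrip removes whitespace from the left end
  • str.rstrip removes whitespace from the right end
  • str.split splits a string into a list of pieces
  • str.join joins the pieces back into one string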
l_str='    this is xiniu lab     '
l_str.lstrip()
'this is xiniu lab     '
new_str=l_str.strip()
new_str
'this is xiniu lab'
str_list=new_str.split(" ")
str_list
['this', 'is', 'xiniu', 'lab']
new_str2="#".join(str_list)
new_str2
'this#is#xiniu#lab'
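
Two details that the example above does not show, sketched on a made-up string: split(" ") cuts at every single space, so repeated spaces produce empty strings, while split() with no argument cuts at runs of any whitespace. strip also accepts the set of characters to remove.

```python
raw = "  this  is xiniu lab \n"

# split(" ") cuts at every single space, so the double space yields an empty string
by_space = raw.strip().split(" ")
# split() with no argument cuts at runs of whitespace and drops empty pieces
by_whitespace = raw.split()
# strip(chars) removes any of the given characters from both ends
trimmed = "##xiniu##".strip("#")

print(by_space)        # ['this', '', 'is', 'xiniu', 'lab']
print(by_whitespace)   # ['this', 'is', 'xiniu', 'lab']
print(trimmed)         # xiniu
```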

Introduction to Containers

Containers

  • list
  • dict
  • set
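
Lists and dicts are covered in the sections that follow. For set, here is a minimal sketch with made-up data: a set keeps each value once and supports fast membership tests and the usual set algebra.

```python
tags = ['python', 'data', 'python', 'ml', 'data']

unique_tags = set(tags)          # duplicates collapse to one entry each
has_ml = 'ml' in unique_tags     # membership test

a = set([1, 2, 3])
b = set([3, 4])
union = a | b                    # elements in either set
common = a & b                   # elements in both sets
only_a = a - b                   # elements in a but not in b

print(sorted(unique_tags))       # sets are unordered, so sort for display
print(sorted(union))
```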

list

  • Similar to a C++ array
  • A list does not care about element types; one list can mix them
  • Supports slicing
  • Membership can be tested with in
str_list[3]=2.0
str_list
['this', 'is', 'xiniu', 2.0]
str_list[1:]
['is', 'xiniu', 2.0]
2 in str_list
True

The difference between append and extend

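  • append adds its argument as one single element, even when the argument is itself a list
  • extend adds each element of its argument in place, like "+="; a plain "+" builds a new list instead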
#append and extend
help(list.append)
Help on method_descriptor:

append(...)
    L.append(object) -- append object to end
new_list=[1,2,3]
new_list.append(str_list)
new_list
[1, 2, 3, ['this', 'is', 'xiniu', 2.0]]
new_list.remove(str_list)
new_list
[1, 2, 3]
new_list.extend(str_list)
new_list
[1, 2, 3, 'this', 'is', 'xiniu', 2.0]
[1,2]+[3]
[1, 2, 3]
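
The in-place versus new-list distinction can be checked directly. This is a sketch with illustrative names (nums, alias, combined): alias is a second name for the same list object, so it reveals whether an operation modified that object or created another one.

```python
nums = [1, 2, 3]
alias = nums                 # a second name for the same list object

nums.append([4, 5])          # adds one element, which happens to be a list
after_append = list(nums)    # snapshot: [1, 2, 3, [4, 5]]
nums.pop()                   # undo the append

nums.extend([4, 5])          # adds each element, modifying nums in place
combined = nums + [6]        # "+" builds a new list and leaves nums alone

print(after_append)          # [1, 2, 3, [4, 5]]
print(nums)                  # [1, 2, 3, 4, 5]
print(alias is nums)         # True: extend did not create a new object
print(combined is nums)      # False
```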

Conditional statements

if 1 in str_list:
    print "yes"
else:
    print "No"
No
age=25
if(age>58):
    print "old man"
elif(age>35):
    print "middle age"
else:
    print "young man"
young man

Loops

for elem in str_list:
    print elem
this
is
xiniu
2.0
#while
index=0
while index<len(str_list):
    if type(str_list[index])==float:
        print str_list[index]
    index+=1
2.0
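
The index-based while loop can also be written as a for loop. A sketch, assuming the same four-element list: enumerate yields (index, element) pairs, and isinstance is the more common way to test a type. float_positions is a name made up for this example.

```python
str_list = ['this', 'is', 'xiniu', 2.0]

float_positions = []
for index, elem in enumerate(str_list):   # enumerate yields (index, element) pairs
    if isinstance(elem, float):
        float_positions.append(index)

print(float_positions)   # [3]
```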

Advanced list usage

  • reverse
  • sort and sorted
help(list.reverse)
Help on method_descriptor:

reverse(...)
    L.reverse() -- reverse *IN PLACE*
str_list.reverse()
str_list
[2.0, 'xiniu', 'is', 'this']
#sort and sorted
test=[5,2,4,6,1]
help(list.sort)
Help on method_descriptor:

sort(...)
    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
    cmp(x, y) -> -1, 0, 1
help(sorted)
Help on built-in function sorted in module __builtin__:

sorted(...)
    sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
#sorted does not sort in place; it returns a new list
print sorted(test)
#reverse=True sorts in descending order
print sorted(test,reverse=True)
print test
[1, 2, 4, 5, 6]
[6, 5, 4, 2, 1]
[5, 2, 4, 6, 1]
#sort works in place and changes the original list
test.sort()
test
[1, 2, 4, 5, 6]
  • The key parameter of sorted

  • key takes a function that is applied to each element before comparison

strs=['ccc','aaaaa','dd','e']
sorted(strs,key=len)
['e', 'dd', 'ccc', 'aaaaa']
#by default strings are ordered by character code, so uppercase letters come first
new_str_list=['aa','BB','CC','zz']
sorted(new_str_list)
['BB', 'CC', 'aa', 'zz']
sorted(new_str_list,key=str.lower)
['aa', 'BB', 'CC', 'zz']
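
The key function may also return a tuple, which gives a multi-level sort. A sketch with a made-up word list: sort by length first and alphabetically within the same length. Because the sort is stable, elements with equal keys keep their original relative order, also when reverse=True is used.

```python
words = ['ccc', 'e', 'aaaaa', 'dd', 'bb', 'a']

# a tuple key sorts by length first, then alphabetically within each length
by_len_then_alpha = sorted(words, key=lambda w: (len(w), w))

# longest first; ties ('dd'/'bb' and 'e'/'a') keep their original order
longest_first = sorted(words, key=len, reverse=True)

print(by_len_then_alpha)   # ['a', 'e', 'bb', 'dd', 'ccc', 'aaaaa']
print(longest_first)       # ['aaaaa', 'ccc', 'dd', 'bb', 'e', 'a']
```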

dict

  • Comparable to a C++ hash map
animal_legs={'dog':4,'chicken':2,'spider':8}
print 'dog has '+str(animal_legs['dog'])+' legs'
dog has 4 legs
animal_legs.keys()
['chicken', 'dog', 'spider']
animal_legs.values()
[2, 4, 8]
help(dict)
Help on class dict in module __builtin__:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __cmp__(...)
 |      x.__cmp__(y) <==> cmp(x,y)
 |  
 |  __contains__(...)
 |      D.__contains__(k) -> True if D has a key k, else False
 |  
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __init__(...)
 |      x.__init__(...) initializes x; see help(type(x)) for signature
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
 |  
 |  __sizeof__(...)
 |      D.__sizeof__() -> size of D in memory, in bytes
 |  
 |  clear(...)
 |      D.clear() -> None.  Remove all items from D.
 |  
 |  copy(...)
 |      D.copy() -> a shallow copy of D
 |  
 |  fromkeys(...)
 |      dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.
 |      v defaults to None.
 |  
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
 |  
 |  has_key(...)
 |      D.has_key(k) -> True if D has a key k, else False
 |  
 |  items(...)
 |      D.items() -> list of D's (key, value) pairs, as 2-tuples
 |  
 |  iteritems(...)
 |      D.iteritems() -> an iterator over the (key, value) items of D
 |  
 |  iterkeys(...)
 |      D.iterkeys() -> an iterator over the keys of D
 |  
 |  itervalues(...)
 |      D.itervalues() -> an iterator over the values of D
 |  
 |  keys(...)
 |      D.keys() -> list of D's keys
 |  
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(...)
 |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |  
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
 |  
 |  update(...)
 |      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
 |      If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
 |      If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
 |      In either case, this is followed by: for k in F: D[k] = F[k]
 |  
 |  values(...)
 |      D.values() -> list of D's values
 |  
 |  viewitems(...)
 |      D.viewitems() -> a set-like object providing a view on D's items
 |  
 |  viewkeys(...)
 |      D.viewkeys() -> a set-like object providing a view on D's keys
 |  
 |  viewvalues(...)
 |      D.viewvalues() -> an object providing a view on D's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None
 |  
 |  __new__ = 
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T
animal_legs.items()
[('chicken', 2), ('dog', 4), ('spider', 8)]
animal_legs['dog']=3
animal_legs['dog']
3
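
A few everyday dict operations from the help text above, as a sketch: in tests for a key, get returns a default instead of raising KeyError, and the get-with-default idiom makes a simple counter. The counts dict and the input list are made up for this example.

```python
animal_legs = {'dog': 4, 'chicken': 2, 'spider': 8}

has_cat = 'cat' in animal_legs          # membership tests look at the keys
cat_legs = animal_legs.get('cat', 0)    # default value instead of a KeyError

# counting occurrences with get()
counts = {}
for animal in ['dog', 'spider', 'dog']:
    counts[animal] = counts.get(animal, 0) + 1

# iterate over (key, value) pairs; sorted() gives a stable display order
for name, legs in sorted(animal_legs.items()):
    print("%s has %d legs" % (name, legs))
```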

List comprehensions

num_list=range(10)
num_list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[tmp**2 for tmp in num_list]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# add a condition
[tmp**2 for tmp in num_list if tmp%2==0]
[0, 4, 16, 36, 64]
# add a compound condition
[tmp**2 for tmp in num_list if tmp%2==0 and tmp>=4]
[16, 36, 64]

Dicts have comprehensions too

# dictionary comprehension
{tmp:tmp**2 for tmp in num_list}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
  • A comprehension is usually faster than the equivalent for loop that calls append
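
One way to check the speed claim on your own machine is the standard timeit module. This is only a measuring sketch: the two helper functions are written for this example, and the timings depend on the interpreter and hardware, so no particular numbers are implied.

```python
import timeit

def squares_loop(n):
    # explicit loop with append
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def squares_comprehension(n):
    # the same computation as a list comprehension
    return [i ** 2 for i in range(n)]

# both versions must produce the same list before timing means anything
same_result = squares_loop(1000) == squares_comprehension(1000)

loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(1000), number=200)
print("loop: %.4fs, comprehension: %.4fs" % (loop_time, comp_time))
```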

Functions

Functions

def my_sum(a=3,b=5):
    return a+b
my_sum(1,3)
4

Variable-length arguments

#variable-length arguments
def print_all(*args):
    print type(args)
    print(args)
print_all('hello','word','xiniu','lab')
<type 'tuple'>
('hello', 'word', 'xiniu', 'lab')
  • A tuple is much like a list but immutable: it cannot be modified in place, so it is used less often
my_sum()
8
  • Default parameter values: called without arguments, my_sum falls back to a=3 and b=5
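
A short sketch that ties these points together. summarize is a hypothetical helper written for this example: *args collects extra positional arguments into a tuple and **kwargs collects keyword arguments into a dict. The last lines show a default being overridden by keyword, a tuple being unpacked with *, and the immutability of tuples.

```python
def summarize(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    return len(args), sorted(kwargs)

def my_sum(a=3, b=5):
    return a + b

summary = summarize('hello', 'word', lab='xiniu')
only_b = my_sum(b=10)        # a keeps its default value 3
pair = (1, 3)
unpacked = my_sum(*pair)     # * unpacks the tuple into a and b

try:
    pair[0] = 10             # tuples do not support item assignment
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(summary)               # (2, ['lab'])
print(only_b)                # 13
print(unpacked)              # 4
print(tuple_is_mutable)      # False
```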

File Operations

Reading and writing files

infile=open('ShangHai.txt','r')
for line in infile:
    print line.strip().split(" ")
['On', 'the', 'morning', 'of', 'June', '20th', '1830,', 'Lord', 'Amnerst,', 'the', 'first', 'British', 'ship', 'to', 'visit', 'Shanghai', 'was', 'anchored', 'at', 'the', 'mouth', 'of', 'Huangpu,', 'two', 'Europeans', 'strode', 'ashore.', 'These', 'men', 'were', 'Charles', 'Gutzlaff,', 'translator', 'and', 'missionary,', 'and', 'Hill', 'Lynsay,', 'representative', 'of', 'the', 'British', 'East', 'India', 'Company.', 'Crowds', 'gathered', 'together', 'to', 'witness', 'these', 'so-called', 'barbarians;', 'though', 'in', 'his', 'report', 'Linsay', 'mentioned', 'cotton', 'cloth', 'and', 'calico,', 'his', 'real', 'objective', 'was', 'to', 'sell', 'opium.', 'Nine', 'years', 'later,', 'the', 'opium', 'war', 'broke', 'out.', 'After', 'the', 'Chinese', 'was', 'defeated', 'by', 'Britain,', 'Shanghai', 'became', 'one', 'of', 'the', 'cities', 'opened', 'to', 'foreign', 'trade', 'by', 'the', '1842', 'Treaty', 'of', 'Nanking,', 'and', 'a', 'new', 'city', 'began', 'to', 'develop.']
['Shanghailanders']
['Until', 'the', '19th', 'century', 'and', 'the', 'first', 'opium', 'war,', 'Shanghai', 'was', 'considered', 'to', 'be', 'essentially', 'a', 'fishing', 'village.', 'However,', 'in', '1914,', 'Shanghai', 'had', '200', 'banks', 'dealing', 'with', '80%', 'of', 'its', 'foreign', 'investments', 'in', 'China.', 'Citizens', 'of', 'many', 'countries', 'on', 'all', 'continents', 'gathered', 'in', 'Shanghai', 'to', 'live', 'and', 'work', 'in', 'the', 'ensuing', 'decades.', 'By', '1932,', 'Shanghai', 'had', 'become', 'the', 'world\xe2\x80\x99s', '5th', 'largest', 'city', 'and', 'home', 'to', '70,000', 'foreigners.', 'Foreign', 'residents', 'of', 'the', 'city', 'called', 'themselves', 'Shanghailanders.', 'From', '1842', 'to', '1949,', 'while', 'the', 'British', 'established', 'settlement', 'in', 'a', 'section', 'of', 'Shanghai,', 'the', 'French', 'and', 'the', 'American', 'also', 'established', 'their', 'own', 'settlements;', 'these', 'settlements', 'were', 'later', 'called', 'concessions.', 'World', 'War', 'II', 'marked', 'Shanghai', 'as', 'a', 'destination', 'for', 'refugees.', 'Between', '1937', 'and', '1939,', 'an', 'estimated', '20,000', 'Jews', 'traveled', 'to', 'Shanghai', 'to', 'flee', 'the', 'Nazis,', 'Shanghai', 'was', 'the', 'only', 'city', 'where', 'Jews', 'were', 'welcome', 'without', 'condition.', 'Today,', 'the', 'streets', 'of', 'the', 'French', 'concession', 'and', 'other', 'foreign', 'settlements', 'had', 'changed', 'to', 'become', 'what-to-do', 'n\xe2\x80\x99', 'you-need', 'avenues,', 'while', 'the', 'Bund,', 'a', 'stretch', 'of', 'Western', 'buildings', 'is', 'still', 'representing', 'the', 'Western', 'influence', 'that', 'dominated', 'so', 'much', 'of', 'the', 'city\xe2\x80\x99s', 'history.']
['General', 'Facts']
['Shanghai', 'is', 'a', 'city', 'in', 'East', 'China;', 'it', 'is', 'the', 'largest', 'city', 'of', 'the', 'People\xe2\x80\x99s', 'Republic', 'of', 'China', 'and', 'the', '8th', 'largest', 'city', 'in', 'the', 'world.', 'Due', 'to', 'its', 'rapid', 'growth', 'of', 'the', 'last', 'two', 'decades,', 'it', 'has', 'again', 'become', 'a', 'global', 'city;', 'it', 'is', 'also', 'known', 'as', 'the', 'Paris', 'of', 'the', 'East.', 'According', 'to', 'the', '2009', 'census,', 'Shanghai', 'has', 'a', 'population', 'of', 'about', '19', 'millions,', 'four', 'times', 'more', 'than', 'the', 'people', 'in', 'New', 'Zealand,', 'registered', 'migrants', 'comprise', 'of', 'one-third', 'of', 'the', 'population', 'in', '2007.', 'However,', 'as', 'the', 'most', 'success', 'of', 'cities', 'of', 'the', 'one-child', 'policy,', 'Shanghai', 'has', 'the', 'lowest', 'fertility', 'rate', 'in', 'China.', 'The', 'main', 'language', 'spoken', 'in', 'Shanghai', 'is', 'Shanghainese,', 'one', 'of', 'the', '248', 'Chinese', 'dialects', 'identified', 'by', 'Wikipedia.', 'It', 'is', 'gigantically', 'different', 'from', 'Mandarin.', 'If', 'you', 'were', 'to', 'say', 'something', 'in', 'Shanghainese', 'to', 'a', 'Beijinger,', 'he\xe2\x80\x99s', 'bound', 'to', 'get', 'a', 'confused', 'stroke', 'and', 'possibly', 'get', 'some', 'eye-rolling.', 'Shanghainese', 'kids', 'start', 'learning', 'English', 'in', 'the', 'first', 'grade,', 'like', 'it', 'or', 'not,', 'English', 'is', 'now', 'a', 'compulsory', 'course', 'for', 'all', 'pupils', 'in', 'Shanghai.', 'In', 'a', 'decade\xe2\x80\x99s', 'time,', 'everyone', 'in', 'the', 'city', 'may', 'speak', 'English', 'or', 'a', 'hybrid', 'language', 'of', 'Chinese', 'and', 'English,', 'known', 'as', 'Chinglish.']
['Economy']
['Shanghai', 'means', 'on', 'top', 'of', 'the', 'sea,', 'but', 'the', 'fact', 'is,', 'quite', 'a', 'lot', 'of', 'local', 'Shanghainese', 'have', 'never', 'seen', 'the', 'sea', 'despite', 'Shanghai', 'is', 'not', 'more', 'than', 'one', 'hundred', 'miles', 'from', 'the', 'Pacific', 'Ocean;', 'and', 'it', 'is', 'not', 'blue', 'as', 'you', 'may', 'expect,', 'because', 'of', 'pollutions', 'from', 'factories', 'around', 'the', 'Yangtze', 'River', 'delta.', 'In', '2005,', 'Shanghai', 'was', 'termed', 'to', 'be', 'the', 'world\xe2\x80\x99s', 'largest', 'port', 'for', 'cargo', 'and', 'it', 'is', 'now', 'the', 'world\xe2\x80\x99s', 'busiest', 'seaport.', 'It', 'handled', '29', 'million', 'TEUs', 'in', '2010,', '25%', 'of', 'Chinese', 'industrial', 'output', 'comes', 'from', 'the', 'city', 'out', 'of', 'sea,', 'and', 'Shanghai', 'produces', '30%', 'of', 'China\xe2\x80\x99s', 'GDP.', 'By', 'the', 'end', 'of', '2009,', 'there', 'were', '787', 'financial', 'institutions', 'in', 'Shanghai,', 'of', 'which', '170', 'were', 'foreign', 'invested.', 'In', '2009,', 'the', 'Shanghai', 'Stock', 'Exchange', 'ranked', 'third', 'among', 'worldwide', 'stock', 'exchanges', 'in', 'terms', 'of', 'traded', 'volume', 'and', 'trading', 'volume', 'of', 'six', 'key', 'commodities', 'including', 'rubber,', 'copper', 'and', 'zinc', 'under', 'Shanghai', 'Future', 'Exchange', 'all', 'ranked', 'first', 'across', 'the', 'world.', 'Shanghai', 'is', 'now', 'ranked', '5th', 'in', 'the', 'latest', 'edition', 'of', 'the', 'Global', 'Financial', 'Center', 'Index', 'published', 'by', 'the', 'city', 'of', 'London.']
['Urban', 'Development']
['One', 'uniquely', 'Shanghainese', 'cultural', 'element', 'is', 'the', 'SHI', 'Ku', 'Men', 'residences,', 'which', 'is', 'a', 'two', 'or', 'three', 'storey', 'townhouses.', 'The', 'Shi', 'Ku', 'Men', 'is', 'a', 'cultural', 'blend', 'of', 'elements', 'found', 'in', 'Western', 'architecture,', 'traditional', 'Chinese', 'architecture', 'and', 'social', 'behavior.', 'Today,', 'many', 'of', 'the', 'area', 'with', 'classic', 'Shi', 'Ku', 'Men', 'stood', 'had', 'been', 'redeveloped', 'for', 'modern', 'Shanghai,', 'with', 'only', 'a', 'few', 'areas', 'remaining.', 'During', 'the', '1990s,', 'Shanghai', 'had', 'the', 'largest', 'agglomeration', 'of', 'construction', 'cranes;', 'since', '2008,', 'Shanghai', 'has', 'boasted', 'more', 'free', 'standing', 'buildings', 'for', '400', 'meters', 'than', 'any', 'other', 'cities,', 'The', 'Shanghai', 'World', 'Financial', 'Center', 'is', 'currently', 'the', 'third', 'tallest', 'building', 'in', 'the', 'world;', 'in', 'the', 'future,', 'the', 'Shanghai', 'Tower,', 'straight', 'to', 'completion', 'in', '2014,', 'will', 'be', 'the', 'tallest', 'in', 'China.', 'Meanwhile,', 'Shanghai', 'is', 'sinking', 'at', 'a', 'rate', 'of', '1.5cm', 'a', 'year.', 'Shanghai\xe2\x80\x99s', 'rapid', 'transit', 'system,', 'Shanghai', 'Metro,', 'extends', 'to', 'every', 'core', 'neighbor', 'districts', 'in', 'and', 'to', 'every', 'suburban', 'district.', 'As', 'of', '2010,', 'there', 'were12', 'metro', 'lines,', '273', 'stations', 'and', 'over', '420', 'km', 'of', 'tracks', 'in', 'operation,', 'making', 'it', 'the', 'largest', 'network', 'in', 'the', 'world.']
['And', 'the', 'shuttle', 'maglev', 'train', 'linking', 'the', 'airport', 'to', 'the', 'city', 'center', 'built', 'in', '2004', 'is', 'the', 'world\xe2\x80\x99s', 'fastest', 'passenger', 'train,', 'reaching', 'a', 'maximum', 'cruising', 'speed', 'of', '431', 'km', 'per', 'hour.', 'Shanghai', 'has', 'the', 'largest', 'bus', 'system', 'in', 'the', 'planet', 'with', '1424', 'bus', 'lines.']
file_name="ShangHai.txt"
with open(file_name,'r') as f:
    line=f.readline()
    print line
    lines=f.readlines()
    print lines
On the morning of June 20th 1830, Lord Amnerst, the first British ship to visit Shanghai was anchored at the mouth of Huangpu, two Europeans strode ashore. These men were Charles Gutzlaff, translator and missionary, and Hill Lynsay, representative of the British East India Company. Crowds gathered together to witness these so-called barbarians; though in his report Linsay mentioned cotton cloth and calico, his real objective was to sell opium. Nine years later, the opium war broke out. After the Chinese was defeated by Britain, Shanghai became one of the cities opened to foreign trade by the 1842 Treaty of Nanking, and a new city began to develop.

['Shanghailanders\n', 'Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village. However, in 1914, Shanghai had 200 banks dealing with 80% of its foreign investments in China. Citizens of many countries on all continents gathered in Shanghai to live and work in the ensuing decades. By 1932, Shanghai had become the world\xe2\x80\x99s 5th largest city and home to 70,000 foreigners. Foreign residents of the city called themselves Shanghailanders. From 1842 to 1949, while the British established settlement in a section of Shanghai, the French and the American also established their own settlements; these settlements were later called concessions. World War II marked Shanghai as a destination for refugees. Between 1937 and 1939, an estimated 20,000 Jews traveled to Shanghai to flee the Nazis, Shanghai was the only city where Jews were welcome without condition. Today, the streets of the French concession and other foreign settlements had changed to become what-to-do n\xe2\x80\x99 you-need avenues, while the Bund, a stretch of Western buildings is still representing the Western influence that dominated so much of the city\xe2\x80\x99s history.  \n', 'General Facts\n', 'Shanghai is a city in East China; it is the largest city of the People\xe2\x80\x99s Republic of China and the 8th largest city in the world. Due to its rapid growth of the last two decades, it has again become a global city; it is also known as the Paris of the East. According to the 2009 census, Shanghai has a population of about 19 millions, four times more than the people in New Zealand, registered migrants comprise of one-third of the population in 2007. However, as the most success of cities of the one-child policy, Shanghai has the lowest fertility rate in China. The main language spoken in Shanghai is Shanghainese, one of the 248 Chinese dialects identified by Wikipedia. It is gigantically different from Mandarin. 
If you were to say something in Shanghainese to a Beijinger, he\xe2\x80\x99s bound to get a confused stroke and possibly get some eye-rolling. Shanghainese kids start learning English in the first grade, like it or not, English is now a compulsory course for all pupils in Shanghai. In a decade\xe2\x80\x99s time, everyone in the city may speak English or a hybrid language of Chinese and English, known as Chinglish. \n', 'Economy\n', 'Shanghai means on top of the sea, but the fact is, quite a lot of local Shanghainese have never seen the sea despite Shanghai is not more than one hundred miles from the Pacific Ocean; and it is not blue as you may expect, because of pollutions from factories around the Yangtze River delta. In 2005, Shanghai was termed to be the world\xe2\x80\x99s largest port for cargo and it is now the world\xe2\x80\x99s busiest seaport. It handled 29 million TEUs in 2010, 25% of Chinese industrial output comes from the city out of sea, and Shanghai produces 30% of China\xe2\x80\x99s GDP. By the end of 2009, there were 787 financial institutions in Shanghai, of which 170 were foreign invested. In 2009, the Shanghai Stock Exchange ranked third among worldwide stock exchanges in terms of traded volume and trading volume of six key commodities including rubber, copper and zinc under Shanghai Future Exchange all ranked first across the world. Shanghai is now ranked 5th in the latest edition of the Global Financial Center Index published by the city of London.\n', 'Urban Development\n', 'One uniquely Shanghainese cultural element is the SHI Ku Men residences, which is a two or three storey townhouses. The Shi Ku Men is a cultural blend of elements found in Western architecture, traditional Chinese architecture and social behavior. Today, many of the area with classic Shi Ku Men stood had been redeveloped for modern Shanghai, with only a few areas remaining. 
During the 1990s, Shanghai had the largest agglomeration of construction cranes; since 2008, Shanghai has boasted more free standing buildings for 400 meters than any other cities, The Shanghai World Financial Center is currently the third tallest building in the world; in the future, the Shanghai Tower, straight to completion in 2014, will be the tallest in China. Meanwhile, Shanghai is sinking at a rate of 1.5cm a year. Shanghai\xe2\x80\x99s rapid transit system, Shanghai Metro, extends to every core neighbor districts in and to every suburban district. As of 2010, there were12 metro lines, 273 stations and over 420 km of tracks in operation, making it the largest network in the world.         \n', 'And the shuttle maglev train linking the airport to the city center built in 2004 is the world\xe2\x80\x99s fastest passenger train, reaching a maximum cruising speed of 431 km per hour. Shanghai has the largest bus system in the planet with 1424 bus lines.\n']
  • readline reads a single line, including its trailing newline
  • readlines reads all remaining lines and returns them as a list
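
The heading mentions writing as well, so here is a self-contained sketch that writes a small file and reads it back. To avoid depending on ShangHai.txt it uses a throwaway file in a temporary directory; the file name demo.txt and its contents are made up for this example.

```python
import os
import tempfile

# write a small throwaway file so the example does not depend on ShangHai.txt
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

with open(path, "r") as f:
    first = f.readline()     # one line, trailing newline included
    rest = f.readlines()     # the remaining lines, as a list

with open(path, "r") as f:
    # iterating over the file object reads it line by line
    stripped = [line.strip() for line in f]

print(repr(first))   # 'first line\n'
print(rest)          # ['second line\n', 'third line\n']
print(stripped)      # ['first line', 'second line', 'third line']
```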

Classes and Objects

#define class
class Employee(object):
    def __init__(self,name,ID,title,salary,manager=None):
        self.name=name
        self.ID=ID
        self.title=title
        self.salary=salary
        self.manager=manager

    def get_info(self):
        return "Employee name:"+self.name+","+"Employee ID:" + str(self.ID)
boss=Employee("mayun",1,"CEO",999999)
print boss.get_info()
Employee name:mayun,Employee ID:1

Subclasses

#define a subclass
class CEO(Employee):
    def __init__(self,name,ID,title,salary,manager=None):
        super(CEO,self).__init__(name,ID,title,salary,manager)

    def work(self):
        print "I do CEO's work"
one_person=CEO("Wang Da chui",3,"CEO",99999)
print one_person.get_info()
one_person.work()
Employee name:Wang Da chui,Employee ID:3
I do CEO's work
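
A subclass can also override an inherited method and still reuse the parent's version through super. In this sketch the Manager class and its sample data are hypothetical; Employee is repeated so that the block runs on its own.

```python
class Employee(object):
    def __init__(self, name, ID, title, salary, manager=None):
        self.name = name
        self.ID = ID
        self.title = title
        self.salary = salary
        self.manager = manager

    def get_info(self):
        return "Employee name:" + self.name + "," + "Employee ID:" + str(self.ID)


class Manager(Employee):  # hypothetical subclass for illustration
    def get_info(self):
        # reuse the parent's text, then append the title
        base = super(Manager, self).get_info()
        return base + ",Title:" + self.title


m = Manager("Li Lei", 7, "Manager", 5000)
info = m.get_info()
print(info)                      # Employee name:Li Lei,Employee ID:7,Title:Manager
print(isinstance(m, Employee))   # True: a Manager is also an Employee
```

Manager defines no __init__ of its own, so Employee.__init__ is inherited unchanged.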

Regular Expressions

Regular expression practice site

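Syntax   Description
.        Matches any character except '\n'
[ ]      Character set; characters can be listed one by one or given as a range
\s       Whitespace character
\S       Non-whitespace character
\w       Word character: uppercase and lowercase letters, digits, and underscore
\W       Non-word character
\d       Digit
\D       Non-digit
{m}      Matches the preceding item exactly m times
{m,n}    Matches the preceding item m to n times
+        One or more times
*        Zero or more times
?        Zero or one time
^        Boundary match: start of the string
$        Boundary match: end of the string
\b       Boundary match: between a word character and a non-word character
( )      Group; the expression inside the parentheses is treated as a unit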
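
Several of these constructs in action, as a sketch on a made-up sentence. It uses the module-level functions re.search, re.findall and re.match from Python's standard re module.

```python
import re

text = "Order 66 shipped on 2017-03-15 to room B12."

date = re.search(r'\d{4}-\d{2}-\d{2}', text)        # {m}: exact repetition counts
numbers = re.findall(r'\b\d+\b', text)              # \b: digit runs that stand alone
words = re.findall(r'[A-Za-z]+', text)              # [ ]: character set with ranges
starts_with_order = re.match(r'^Order\s', text)     # ^ and \s at the start of the string
optional = re.findall(r'colou?r', "color colour")   # ?: the "u" may appear zero or one time

print(date.group())   # 2017-03-15
print(numbers)        # ['66', '2017', '03', '15']  ("12" in B12 is not a stand-alone number)
print(words)          # ['Order', 'shipped', 'on', 'to', 'room', 'B']
print(optional)       # ['color', 'colour']
```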

Using regular expressions in Python

Python supports regular expressions through the re module.

Importing the re module

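The usual steps for using re are:
* 1. Compile the regular expression string into a Pattern instance
* 2. Use the Pattern instance to process text and obtain a result (a Match instance)
* 3. Use the Match instance to extract information and carry out further processing.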

# encoding: UTF-8
import re

# compile the regular expression into a Pattern object
pattern = re.compile(r'hello.*\!')

# match the text against the Pattern; the result is None when nothing matches
match = pattern.match('hello, hanxiaoyang! How are you?')

if match:
    # use the Match object to get the matched text
    print match.group()
hello, hanxiaoyang!

Related functions

re.compile

Compiles a regular expression given as a string into a Pattern object

match.group
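  • Returns the string captured by one or more groups
  • The groups are the ones defined in the regular expression; for example, r'(\w+) (\w+)(?P<name>.*)' has 3 groups, the last of which is a named group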
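
A sketch of how groups are read back from a Match object. The group name sign and the input string are assumptions made for this example; group, groups and groupdict are methods of the Match objects returned by the standard re module.

```python
import re

# "sign" is an illustrative group name chosen for this sketch
pattern = re.compile(r'(\w+) (\w+)(?P<sign>.*)')
m = pattern.match('hello hanxiaoyang!')

whole = m.group()          # no argument (or 0): the entire match
first = m.group(1)         # a single group by number
both = m.group(1, 2)       # several groups at once, returned as a tuple
named = m.group('sign')    # a named group by name
all_groups = m.groups()    # every group, in order
by_name = m.groupdict()    # named groups only, as a dict

print(whole)        # hello hanxiaoyang!
print(both)         # ('hello', 'hanxiaoyang')
print(all_groups)   # ('hello', 'hanxiaoyang', '!')
```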
pattern.match
pattern.search
pattern.findall

Differences among the three:

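  • findall returns all matching substrings as a list; match and search return at most one result (a Match object)
  • match only matches at the beginning of the string, while search scans the string for the first match; both return None when nothing matches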
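
The three calls side by side, as a sketch on a made-up string that does not start with a digit:

```python
import re

pattern = re.compile(r'\d+')
text = 'abc 123 def 456'

at_start = pattern.match(text)      # None: the string does not begin with a digit
first_hit = pattern.search(text)    # Match object for the first occurrence
all_hits = pattern.findall(text)    # list of every matching substring

print(at_start)            # None
print(first_hit.group())   # 123
print(all_hits)            # ['123', '456']
```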
pattern.finditer

Searches the string and returns an iterator that yields each match (a Match object) in order.

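  • Compare this with findall: findall returns a list whose elements are strings (or tuples of strings when the pattern has several groups), while finditer returns an iterator whose elements are Match objects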
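
Because finditer yields Match objects, positions are available in addition to the matched text. A sketch with the same made-up string as a plain findall example would use:

```python
import re

pattern = re.compile(r'\d+')
text = 'abc 123 def 456'

# each Match object carries the matched text and where it was found
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

print(spans)   # [('123', 4, 7), ('456', 12, 15)]
```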
pattern.split

Splits the string at every match of the regular expression
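
A sketch of pattern.split with a separator pattern that accepts commas, semicolons, and whitespace. The input string is made up; the optional second argument limits the number of splits.

```python
import re

pattern = re.compile(r'[,;\s]+')       # one or more commas, semicolons, or whitespace
text = 'one, two;three   four'

parts = pattern.split(text)            # split at every match
limited = pattern.split(text, 2)       # at most 2 splits; the rest stays intact

print(parts)     # ['one', 'two', 'three', 'four']
print(limited)   # ['one', 'two', 'three   four']
```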

pattern.sub

Replaces matched substrings

  • To reuse parts of the match in the replacement (for example \1 or \2), the regular expression must define groups
pattern.subn
  • Unlike sub, the return value also includes the number of replacements made
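
A sketch of sub and subn on a made-up string. The replacement can be a template that refers to groups with \1 and \2, or a function that receives each Match object.

```python
import re

pattern = re.compile(r'(\w+) (\w+)')
text = 'hello world, data mining'

# \1 and \2 in the replacement refer to the two groups of the pattern
swapped = pattern.sub(r'\2 \1', text)

# the replacement may also be a function that receives each Match object
upper_first = pattern.sub(lambda m: m.group(1).upper() + ' ' + m.group(2), text)

# subn returns a (new_string, number_of_replacements) tuple
result, count = pattern.subn(r'\2 \1', text)

print(swapped)       # world hello, mining data
print(upper_first)   # HELLO world, DATA mining
print(count)         # 2
```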
