#第一章 文本
##1.1每次处理一个字符
```python
1. thelist = list(thestring)
for c in tehstring:
do_something_with(c)
2. results = [do_something_with(c) for c in thestring]
3. results = map(do_something,thestring)
```
##1.2字符和字符值之间的转换
将一个字符转化为相应的ASC或Unicode
```python
>>>print ord('a') 97 >>>print chr(91)
a
>>>print ord(u'\u2020')
8224
>>>print repr(unichr(8224))
u'\u2020'
```
##1.3测试一个对象是否是类字符串
```python
isinstance 和 basestring
basestring是str 和unicode的共同基类
def isAString(anobj):
return isinstance(anobj,basestring)
```
##1.4字符串对齐
```python
>>>print '|','hej'.ljust(20),'|','hej'.rjust(20),'|','hej'.center(20),'|'
|hej | hej | hej |
```
##1.5去掉字符串两端的空格
```python
>>>x = ' hej ' >>>print '|',x.lstrip(),'|',x.rstrip(),'|',x.strip(),'|'
| hej | hej | hej |
```
##1.6合并字符串
1.大量字符优先,可以用中间数据list来容纳后来的字符,最后再一并处理
largeString = ''.join(pieces)
2.少量字符优先
largeString = '%s%s something %s yet more' % (small1,small2,small3)
3.效率低下,会产生许多字符串的中间结果
largeString = small1 + small2 + 'something' + small3 + 'yet more' 4. import operator
largeString = reduce(operator.add,pieces,'')
##1.7将字符串逐个字符或逐个词进行反转
1.字符,切片方法
revchars = astring[::-1]
2.单词
```python
revwords = astring.split()
revwords.reverse()
revwords = ' '.join(revwords)
revwords = ' '.join(astring.split()[::-1])
```
reversed返回一个迭代器
##1.8检查字符中是否包含某字符集合中的字符
```python
def containsAny(seq,aset):
for c in aset:
if c inaset: return True
return False
```
##1.9简化字符串的translate方法的使用
##1.10过滤字符串中不属于指定集合的字符(translate)
__Perl的判定方法,如果字符串中包含了空值或者其中有超过30%的字符的高位被置为1或是奇怪的控制码,我们就认为这段数据是二进制数据__
```pyhon
from __future__ import division
import string
text_characters = ''.join(map(chr,range(32,127))) + '\n\r\t\b'
_null_trans = string.maketrans('','')
def istext(s,text_characters = text_characters,threshold=0.30):
if '\0' in s:
return False
if not s:
return True
t = s.translate(_null_trans,text_characters)
return len(t)/len(c) <= threshold
```
##1.12控制大小写
big = little.upper()
little = big.lower()
```python
>>>print 'one tWo thrEe'.capitalize()
One two three
>>>print 'one tWo threEe'.title()
One Two Threee
```
##1.13访问子字符串
1.切片方式
afield = theline[3:8]
2.struct.unpack方法
```python
import struct
format = '5s 4x 3s'
#取前5个字符,跳过4个字符,再取3个字符
print ''.join(struct.unpack(format,'Test astring'))
结果: Testing
```
##1.14改变多行文本字符串的缩进
1.统一的缩进
```python
defa reindent(s,numSpace):
leading_space = numSpaces * ' '
lines = [leading_space + line.strio() for line in s.splitlines()]
return '\n'.join(lines)
```
2.相对缩进
```python
def addSpaces(s,numAdd):
white = " " * numAdd
return white + white.join(s.splitlines(True))
def numSpaces(s):
return [len(line) - len(line.lstrip()) for line in s.splitlines()]
def delSpaces(s,numDel):
if numDel > min(numSpaces(s)):
raise ValueError,"removing more spaces than there are!"
return '\n'.join([line[numDel:] for line in s.splitlines()])
```
3.让缩进最小的行与左端对齐
```python
def unIndentBlock(s):
return delSpaces(s,min(numSpaces(s)))
```