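There are so many books that you will not have time to read each one twice, so take notes and distill the essentials.
Companion code for the book: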
https://github.com/cbrownley/foundations-for-analytics-with-python
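Copying and pasting at the right moments is also part of programming efficiently.
Data ETL (extract, transform, load)
First obtain the data from a customer or supplier, then extract and keep only the data you need. After that you may transform or reformat the data, and finally you save it to a database or data warehouse.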
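The ETL steps above can be sketched with the standard library alone. This is a minimal illustration rather than code from the book: the sample rows, the column names (`supplier`, `part`, `cost`, `notes`), and the table name `purchases` are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, standing in for a file received from a supplier
raw_csv = """supplier,part,cost,notes
Supplier X,001-1001,$500.00,rush order
Supplier Y,50-9501,$250.00,
Supplier Z,920-4803,"$1,750.00",check invoice
"""

# Extract: read the rows and keep only the columns we need
reader = csv.DictReader(io.StringIO(raw_csv))
extracted = [(row["supplier"], row["part"], row["cost"]) for row in reader]

# Transform: turn the cost strings into floats
transformed = [
    (supplier, part, float(cost.replace("$", "").replace(",", "")))
    for supplier, part, cost in extracted
]

# Load: save the cleaned rows into an in-memory SQLite database
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE purchases (supplier TEXT, part TEXT, cost REAL)")
connection.executemany("INSERT INTO purchases VALUES (?, ?, ?)", transformed)
connection.commit()

total = connection.execute("SELECT SUM(cost) FROM purchases").fetchone()[0]
print("Total cost: {0:.2f}".format(total))
```

An in-memory database keeps the sketch self-contained; a real pipeline would read from an actual file and write to a persistent database or data warehouse.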
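A typical data analysis workflow consists of data acquisition, data preparation, data analysis, and presentation of results.
Parsing, reading, and writing Excel workbooks
Follow the installer's instructions.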
#!/usr/bin/env python3
x = 4
y = 5
z = x + y
# 9
print(z)
# 9
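print(format(z))
# "{0:d}".format(z): {} is a placeholder for a value passed into the print statement, here the variable z.
# The 0 refers to the first argument of the format() method. Here there is only one argument, z, so 0 points to it; with several arguments, 0 still refers to the first one.
# The colon (:) separates the value from its format specification; d means the value is formatted as an integer.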
# Output #2: Four plus five equals 9.
print("Output #2: Four plus five equals {0:d}.".format(z))
a = [1, 2, 3, 4]
b = ["first", "second", "third", "fourth"]
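c = a + b
# "{0}, {1}, {2}".format(a, b, c) shows how to include several values in one print statement: a goes to {0}, b to {1}, and c to {2}. All three values are lists rather than numbers, so no numeric format is specified.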
# Output #3: [1, 2, 3, 4], ['first', 'second', 'third', 'fourth'], [1, 2, 3, 4, 'first', 'second', 'third', 'fourth']
print("Output #3: {0}, {1}, {2}".format(a, b, c))
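Python does not require every print statement to use .format, but .format is powerful and can save you a lot of typing. In the example above, the final result of print("Output #3: {0}, {1}, {2}".format(a, b, c)) is the three variables separated by commas. To get much the same result without .format you would have to write print("Output #3: ",a,", ",b,", ",c), which is very easy to mistype (and, because print puts a space between its arguments, the spacing comes out slightly different). Other uses of .format come up later, but it is worth getting comfortable with it now so that you can use it whenever you need it.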
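To make the comparison concrete, here is a small sketch that builds the same line both ways. The shortened list `b` and the use of `io.StringIO` to capture what print writes are choices made for this example only.

```python
import io

a = [1, 2, 3, 4]
b = ["first", "second"]

# With .format, the template controls the spacing exactly
formatted = "Output #3: {0}, {1}".format(a, b)

# With comma-separated arguments, print inserts a space between each one;
# capture what it writes so the two results can be compared
buffer = io.StringIO()
print("Output #3: ", a, ", ", b, file=buffer)
comma_version = buffer.getvalue().rstrip("\n")

print(repr(formatted))      # 'Output #3: [1, 2, 3, 4], [\'first\', \'second\']' (shown with repr quoting)
print(repr(comma_version))  # note the extra spaces around the comma
print(formatted == comma_version)  # False
```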
x = 9
print("Output #4: {0}".format(x))
# 3 to the power of 4
print("Output #5: {0}".format(3**4))
# Convert both values to integers, then divide
print("Output #6: {0}".format(int(8.3)/int(2.7)))
Output:
# =============================================================================
# Output #4: 9
# Output #5: 81
# Output #6: 4.0
# =============================================================================
# {0:.3f}: format the value with three decimal places
# .format(x): passes x to the placeholder {0}
print("Output #7: {0:.3f}".format(8.3/2.7))
y = 2.5*4.8
print("Output #8: {0:.1f}".format(y))
r = 8/float(3)
print("Output #9: {0:.2f}".format(r))
print("Output #10: {0:.4f}".format(8.0/3))
Output:
Output #7: 3.074
Output #8: 12.0
Output #9: 2.67
Output #10: 2.6667
The type function: check the data type of a value
type(x)
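As a quick illustration, type can be called on any value. The variable names below are only for this sketch.

```python
x = 9
y = 2.5 * 4.8
s = "short string."
a_list = [1, 2, 3]

print(type(x))       # <class 'int'>
print(type(y))       # <class 'float'>
print(type(s))       # <class 'str'>
print(type(a_list))  # <class 'list'>
```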
Using some functions from the math module
# Add from math import [function name] below the shebang line at the top of the script
from math import exp,log,sqrt
print("Output #11: {0:.4f}".format(exp(3)))
print("Output #12: {0:.2f}".format(log(4)))
print("Output #13: {0:.1f}".format(sqrt(81)))
Output:
Output #11: 20.0855
Output #12: 1.39
Output #13: 9.0
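A string can be enclosed in single quotes, double quotes, three single quotes, or three double quotes.
# A single quote inside a single-quoted string must be escaped with a backslash
print("{0:s}".format('I\'m enjoying learning Python.'))
# Inside single or double quotes, end each line with \ to continue the string on the next line;
# the backslash and the line break are not part of the resulting string
print("{0:s}".format("a\
b\
c\
d"))
# Triple single or triple double quotes create a multi-line string with no \ needed;
# the line breaks are kept in the string
print("{0:s}".format('''a
b
c
d'''))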
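The difference between the two multi-line forms is easiest to see with repr, which shows the characters the string actually contains. A minimal sketch, with variable names chosen for this example:

```python
# Backslash continuation: the line breaks are NOT part of the string
continued = "a\
b\
c"

# Triple quotes: the line breaks ARE part of the string
multiline = '''a
b
c'''

print(repr(continued))  # 'abc'
print(repr(multiline))  # 'a\nb\nc'
print(len(continued), len(multiline))  # 3 5
```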
string1 = "This is a "
string2 = "short string."
sentence = string1 + string2
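# Output: This is a short string.
print("Output: {0:s}".format(sentence))
# The * operator repeats a string a given number of times
# Output: She is very very very very  beautiful.
# (two spaces before "beautiful.": one from the trailing space in "very " and one from the format string)
print("Output: {0:s} {1:s} {2:s}".format("She is", "very "*4, "beautiful."))
m = len(sentence)
# Output: 23
print("Output: {0:d}".format(m))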
# Using split() to break a string apart
string1 = "My deliverable is due in May"
string1_list1 = string1.split()
# With no arguments, split() splits on whitespace (the default)
# and returns a list of substrings
# Output: ['My', 'deliverable', 'is', 'due', 'in', 'May']
print("Output: {0}".format(string1_list1))
# Split on the first two spaces only
string1_list2 = string1.split(" ",2)
# Output: FIRST PIECE:My SECOND PIECE:deliverable THIRD PIECE:is due in May
print("Output: FIRST PIECE:{0} SECOND PIECE:{1} THIRD PIECE:{2}"\
.format(string1_list2[0], string1_list2[1], string1_list2[2]))
string2 = "Your,deliverable,is,due,in,June"
string2_list = string2.split(",")
# Output: ['Your', 'deliverable', 'is', 'due', 'in', 'June']
print("Output: {0}".format(string2_list))
# Output: deliverable June June
print("Output: {0} {1} {2}".format(string2_list[1], string2_list[5], string2_list[-1]))
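The join function combines the substrings in a list into a single string.
The string placed in front of join is the character (or string) inserted between the substrings.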
string1 = "My deliverable is due in May"
string1_list2 = string1.split(" ",2)
# Output: ['My', 'deliverable', 'is due in May']
print("Output: {0}".format(string1_list2))
# Output: My,deliverable,is due in May
print("Output: {0}".format(",".join(string1_list2)))
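Two details of join are worth a short sketch: it undoes a split on the same separator, and it only accepts strings, so other values have to be converted first. The sample values are invented for this example.

```python
# split and join with the same separator are inverses of each other
original = "Your,deliverable,is,due,in,June"
pieces = original.split(",")
rebuilt = ",".join(pieces)
print(rebuilt == original)  # True

# join works on strings only, so convert numbers with str() first
numbers = [1, 2, 3]
joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)  # 1-2-3

# Joining non-string items directly raises a TypeError
try:
    "-".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True
```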
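Use the strip, lstrip, and rstrip functions to remove unwanted characters from the ends of a string.
string4 = "$$The unwanted characters have been removed.__---++"
# Strip the $, _, - and + characters from both ends of the string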
string4_strip = string4.strip('$_-+')
# Output #31: The unwanted characters have been removed.
print("Output #31: {0:s}".format(string4_strip))
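The example above uses only strip. A brief sketch of the one-sided variants on the same string, plus the default behavior with no argument (the `padded` string is made up for this example):

```python
string4 = "$$The unwanted characters have been removed.__---++"

# lstrip removes the listed characters from the left end only
left_only = string4.lstrip("$")
# rstrip removes the listed characters from the right end only
right_only = string4.rstrip("_-+")

print(left_only)   # The unwanted characters have been removed.__---++
print(right_only)  # $$The unwanted characters have been removed.

# With no argument, all three functions remove whitespace
padded = "  \tsome text \n"
print(repr(padded.strip()))  # 'some text'
```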
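The replace function replaces one character or group of characters in a string with another character or group of characters.
string5 = "Let's replace the spaces in this sentence with other characters."
# Replace the spaces with commas; replace() returns a new string, so keep the result
string5_replace = string5.replace(" ", ",")
print("{0:s}".format(string5_replace))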
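lower and upper convert the letters in a string to lowercase and uppercase.
capitalize makes the first letter uppercase and the remaining letters lowercase.
string5 = "here's WHAT Happens WHEN you use Capitalize."
string5_list = string5.split()
print("Capitalize each word:")
for word in string5_list:
    print("{0:s}".format(word.capitalize()))
Output:
Capitalize each word: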
Here's
What
Happens
When
You
Use
Capitalize.
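Only capitalize is demonstrated above, so here is a short sketch of lower and upper on the same string, and of capitalize applied to the whole string rather than word by word.

```python
string5 = "here's WHAT Happens WHEN you use Capitalize."

lowered = string5.lower()
uppered = string5.upper()
# On the whole string, capitalize uppercases only the very first letter
capitalized = string5.capitalize()

print(lowered)      # here's what happens when you use capitalize.
print(uppered)      # HERE'S WHAT HAPPENS WHEN YOU USE CAPITALIZE.
print(capitalized)  # Here's what happens when you use capitalize.
```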
To use regular expressions, import the re module first: import re
#!/usr/bin/env python3
import re
string = "The quick brown fox jumps over the lazy dog."
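# Split the string into a list of words
string_list = string.split()
# Build the regular expression with re.compile(); a compiled pattern can be reused and may run faster.
# The re.I flag makes the match case-insensitive; the r prefix marks a raw string, so backslash sequences such as \t or \n are not processed as escapes
pattern = re.compile(r"The", re.I)
count = 0
for word in string_list:
    # Compare each word with the pattern; pattern.search() returns a match object (truthy) on success and None otherwise
    if pattern.search(word):
        count += 1
print("Output #38: {0:d}".format(count))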
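A small sketch of what pattern.search actually returns, and of one caveat: search looks for the pattern anywhere inside the word, so a word such as "other" also counts. The `\b` word-boundary pattern is an addition for this example, not something from the notes above.

```python
import re

pattern = re.compile(r"The", re.I)

hit = pattern.search("the")
miss = pattern.search("fox")
print(hit.group())  # the
print(miss)         # None

# search matches anywhere in the string, so "other" matches too
print(pattern.search("other") is not None)  # True

# \b anchors the pattern to word boundaries, so only the whole word matches
word_pattern = re.compile(r"\bthe\b", re.I)
print(word_pattern.search("other"))  # None

# findall counts the matches without an explicit loop
string = "The quick brown fox jumps over the lazy dog."
match_count = len(word_pattern.findall(string))
print(match_count)  # 2
```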
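# When the regular expression is long, it helps to store it in a variable first
string_to_find = r"The"
pattern = re.compile(string_to_find, re.I)
# This is equivalent to the inline form:
pattern = re.compile(r"The", re.I)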
import re
string = "The quick brown fox jumps over the lazy dog."
string_to_find = r"The"
pattern = re.compile(string_to_find, re.I)
# Find every occurrence of "the" (in any letter case) in string and replace it with "a"
print("Output #40: {:s}".format(pattern.sub("a", string)))
from datetime import date, time, datetime, timedelta
# A date holds only the year, month, and day
today = date.today()
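# In {0!s}, the !s converts the value to a string with str(), even when the value is not a string (for example, a number or a date)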
print("Output #41: today: {0!s}".format(today))
print("Output #42: {0!s}".format(today.year))
print("Output #43: {0!s}".format(today.month))
print("Output #44: {0!s}".format(today.day))
# A datetime also includes the hours, minutes, and seconds
current_datetime = datetime.today()
print("Output #45: {0!s}".format(current_datetime))
# Output (the values depend on when the script is run)
Output #41: today: 2018-02-26
Output #42: 2018
Output #43: 2
Output #44: 26
Output #45: 2018-02-26 20:43:23.966000
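The import line above also brings in timedelta, which the example does not use. A minimal sketch of date arithmetic with it, using a fixed date (the one from the output above) so that the results are reproducible; the variable names are chosen for this example.

```python
from datetime import date, timedelta

# A fixed date instead of date.today(), so the output does not change
some_day = date(2018, 2, 26)

# Adding or subtracting a timedelta shifts a date
yesterday = some_day - timedelta(days=1)
next_week = some_day + timedelta(weeks=1)
print(yesterday)  # 2018-02-25
print(next_week)  # 2018-03-05

# Subtracting two dates gives a timedelta
gap = next_week - yesterday
print(gap.days)  # 8

# strftime formats a date as a string in a chosen layout
us_style = some_day.strftime("%m/%d/%Y")
print(us_style)  # 02/26/2018
```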
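# Use square brackets to create a list
# Use len() to count the number of elements in a list
# Use max() and min() to find the largest and smallest values
# Use count() to count how many times a value appears in a list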
a_list = [1, 2, 3]
print("Output #58: {}".format(a_list))
print("Output #59: a_list has {} elements.".format(len(a_list)))
print("Output #60: the maximum value in a_list is {}.".format(max(a_list)))
print("Output #61: the minimum value in a_list is {}.".format(min(a_list)))
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #62: {}".format(another_list))
print("Output #63: another_list also has {} elements.".format\
(len(another_list)))
print("Output #64: 5 is in another_list {} time.".format(another_list.count(5)))
# Output
Output #58: [1, 2, 3]
Output #59: a_list has 3 elements.
Output #60: the maximum value in a_list is 3.
Output #61: the minimum value in a_list is 1.
Output #62: ['printer', 5, ['star', 'circle', 9]]
Output #63: another_list also has 3 elements.
Output #64: 5 is in another_list 1 time.
# Use index values to access specific elements of a list
# [0] is the first element and [-1] is the last element
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #65: {}".format(a_list[0]))
print("Output #66: {}".format(a_list[1]))
print("Output #67: {}".format(a_list[2]))
print("Output #68: {}".format(a_list[-1]))
print("Output #69: {}".format(a_list[-2]))
print("Output #70: {}".format(a_list[-3]))
print("Output #71: {}".format(another_list[2]))
print("Output #72: {}".format(another_list[-1]))
# Output
Output #65: 1
Output #66: 2
Output #67: 3
Output #68: 3
Output #69: 2
Output #70: 1
Output #71: ['star', 'circle', 9]
Output #72: ['star', 'circle', 9]
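Because the last element of another_list is itself a list, a second index reaches inside it. A short sketch using the same list:

```python
another_list = ['printer', 5, ['star', 'circle', 9]]

# The first index selects the inner list, the second selects within it
inner = another_list[2]
first_shape = inner[0]
second_shape = another_list[2][1]
last_of_last = another_list[-1][-1]

print(first_shape)   # star
print(second_shape)  # circle
print(last_of_last)  # 9
```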
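# Use list slicing to access a subset of a list's elements
# When slicing from the beginning, the first index can be omitted
# When slicing through to the end, the second index can be omitted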
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #73: {}".format(a_list[0:2]))
print("Output #74: {}".format(another_list[:2]))
print("Output #75: {}".format(a_list[1:3]))
print("Output #76: {}".format(another_list[1:]))
# Output
Output #73: [1, 2]
Output #74: ['printer', 5]
Output #75: [2, 3]
Output #76: [5, ['star', 'circle', 9]]
# Use [:] to copy a list
a_new_list = a_list[:]
# a_new_list is a separate copy of a_list: you can add or remove elements in a_new_list, or sort it, without affecting a_list
print("Output #77: {}".format(a_new_list))
# Output
Output #77: [1, 2, 3]
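One caveat, sketched here as a side note: [:] makes a shallow copy, so a nested list inside the copy is still shared with the original. The use of copy.deepcopy is an addition for this example, not something the notes above cover.

```python
import copy

# For a flat list, the copy is fully independent
a_list = [1, 2, 3]
a_new_list = a_list[:]
a_new_list.append(4)
print(a_list)      # [1, 2, 3]
print(a_new_list)  # [1, 2, 3, 4]

# For a nested list, the inner list is shared between original and copy
another_list = ['printer', 5, ['star', 'circle', 9]]
shallow = another_list[:]
shallow[2].append('square')
print(another_list[2])  # ['star', 'circle', 9, 'square']

# copy.deepcopy copies the inner lists as well
deep = copy.deepcopy(another_list)
deep[2].append('triangle')
print(another_list[2])  # ['star', 'circle', 9, 'square']
```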
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use + to concatenate two or more lists
a_longer_list = a_list + another_list
print("Output #78: {}".format(a_longer_list))
# Output
Output #78: [1, 2, 3, 'printer', 5, ['star', 'circle', 9]]
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use in and not in to check whether a list contains a particular element
a = 2 in a_list
print("Output #79: {}".format(a))
if 2 in a_list:
    print("Output #80: 2 is in {}.".format(a_list))
b = 6 not in a_list
print("Output #81: {}".format(b))
if 6 not in a_list:
    print("Output #82: 6 is not in {}.".format(a_list))
# Output
Output #79: True
Output #80: 2 is in [1, 2, 3].
Output #81: True
Output #82: 6 is not in [1, 2, 3].
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use append() to add a new element to the end of a list
# Use remove() to delete a specific element from a list
# Use pop() to remove an element from the end of a list
a_list.append(4)
a_list.append(5)
a_list.append(6)
print("Output #83: {}".format(a_list))
a_list.remove(5)
print("Output #84: {}".format(a_list))
a_list.pop()
a_list.pop()
print("Output #85: {}".format(a_list))
# Output
Output #83: [1, 2, 3, 4, 5, 6]
Output #84: [1, 2, 3, 4, 6]
Output #85: [1, 2, 3]
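A few details of these methods that the example above does not show, as a short sketch: pop returns the element it removes and accepts an optional index, while remove deletes only the first matching value and raises ValueError when the value is absent. The list contents are made up for this example.

```python
a_list = [1, 2, 3, 2, 5, 6]

# pop() returns the removed element
last = a_list.pop()
print(last)    # 6
print(a_list)  # [1, 2, 3, 2, 5]

# pop(index) removes and returns the element at that position
first = a_list.pop(0)
print(first)   # 1
print(a_list)  # [2, 3, 2, 5]

# remove(value) deletes only the first occurrence of the value
a_list.remove(2)
print(a_list)  # [3, 2, 5]

# Removing a value that is not present raises ValueError
try:
    a_list.remove(42)
    remove_failed = False
except ValueError:
    remove_failed = True
print(remove_failed)  # True
```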
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# reverse() reverses a list in place, which modifies the original list
# To reverse a list without modifying the original, make a copy first
a_list.reverse()
print("Output #86: {}".format(a_list))
a_list.reverse()
print("Output #87: {}".format(a_list))
# Output
Output #86: [3, 2, 1]
Output #87: [1, 2, 3]
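The copy-first approach mentioned in the comments can be sketched as follows. The slice form [::-1] and the built-in reversed() are alternatives added here for comparison; neither modifies the original list.

```python
a_list = [1, 2, 3]

# Copy first, then reverse the copy in place
list_copy = a_list[:]
list_copy.reverse()

# A slice with step -1 builds a reversed copy in one step
sliced = a_list[::-1]

# reversed() returns an iterator, so wrap it in list()
from_iterator = list(reversed(a_list))

print(list_copy)      # [3, 2, 1]
print(sliced)         # [3, 2, 1]
print(from_iterator)  # [3, 2, 1]
print(a_list)         # [1, 2, 3]
```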
# sort() sorts a list in place, which modifies the original list
# To sort a list without modifying the original, make a copy first
unordered_list = [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
print("Output #88: {}".format(unordered_list))
list_copy = unordered_list[:]
list_copy.sort()
print("Output #89: {}".format(list_copy))
print("Output #90: {}".format(unordered_list))
# Output
Output #88: [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
Output #89: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Output #90: [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
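As an alternative to copying and then calling sort(), the built-in sorted() returns a new sorted list and leaves the original alone. This is a sketch of that alternative, not the approach used in the example above.

```python
unordered_list = [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]

# sorted() builds a new list; the original is unchanged
ascending = sorted(unordered_list)
descending = sorted(unordered_list, reverse=True)

print(ascending)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(descending)      # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(unordered_list)  # [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
```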