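The interpreter is a program that reads code from top to bottom and from left to right.
Opening a CSV file with Python to analyze and process it
The more Pythonic way to open a file is the with keyword, which does not exist in C.
The name after as becomes the variable that refers to the open file, and the file is closed automatically when the with block ends.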
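import csv

# The file is closed automatically when the with block ends.
with open("demo.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        favorite = row[1]  # second column of each row
        print(favorite)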
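To see the automatic close in action, here is a minimal sketch. It assumes a small made-up file (the names and values are hypothetical stand-ins for demo.csv) written to a temporary directory so the example can run anywhere.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for demo.csv.
rows = [["Name", "Favorite"], ["Alice", "Python"], ["Bob", "C"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", newline="") as file:
        csv.writer(file).writerows(rows)

    favorites = []
    with open(path, "r", newline="") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row
        for row in reader:
            favorites.append(row[1])
        closed_inside = file.closed  # still open inside the with block

    closed_after = file.closed  # closed once the with block has ended

print(favorites, closed_inside, closed_after)
```

No explicit file.close() call is needed, which is the main convenience compared with C's fopen and fclose.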
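DictReader is more commonly used: it takes the column names from the first row, so the code does not need to change if the columns are reordered.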
import csv

with open("demo.csv", "r") as file:
    reader = csv.DictReader(file)  # uses the header row as dictionary keys
    for row in reader:
        print(row["Name"])
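The benefit of looking columns up by name can be checked with a small sketch. The two CSV texts below are made-up examples holding the same data with the columns in a different order; io.StringIO is used only so that no file is needed.

```python
import csv
import io

# Same hypothetical data, with the two columns swapped in the second text.
original = "Name,Favorite\nAlice,Python\nBob,C\n"
reordered = "Favorite,Name\nPython,Alice\nC,Bob\n"

def names(text):
    """Return the Name column using csv.DictReader."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["Name"] for row in reader]

print(names(original))
print(names(reordered))
```

With csv.reader and row[0], the second call would return the languages instead of the names.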
Counting how many people chose each language
import csv

with open("favorites.csv", "r") as file:
    reader = csv.DictReader(file)
    scratch, c, python = 0, 0, 0  # one counter per language
    for row in reader:
        favorite = row["language"]
        if favorite == "C":
            c += 1
        elif favorite == "Python":
            python += 1
        elif favorite == "Scratch":
            scratch += 1

print(f"c:{c} python: {python} scratch: {scratch}")
A more Pythonic approach is to use a dictionary: a single dictionary replaces the three variables.
import csv

with open("favorites.csv", "r") as file:
    reader = csv.DictReader(file)
    counts = {}  # maps each language to the number of times it was seen
    for row in reader:
        favorite = row["language"]
        if favorite in counts:
            counts[favorite] += 1
        else:
            counts[favorite] = 1

for favorite in counts:
    print(f"{favorite}: {counts[favorite]}")
counts is a dictionary, and sorted() can order its keys by their values when counts.get is passed as the key function.
import csv

with open("favorites.csv", "r") as file:
    reader = csv.DictReader(file)
    counts = {}
    for row in reader:
        favorite = row["language"]
        if favorite in counts:
            counts[favorite] += 1
        else:
            counts[favorite] = 1

# Sort the keys by their counts, largest first.
for favorite in sorted(counts, key=counts.get, reverse=True):
    print(f"{favorite}: {counts[favorite]}")
The Counter class from the collections module does the dictionary counting automatically.
import csv
from collections import Counter

with open("favorites.csv", "r") as file:
    reader = csv.DictReader(file)
    counts = Counter()  # missing keys start at 0, so no "in" check is needed
    for row in reader:
        favorite = row["language"]
        counts[favorite] += 1

for favorite in sorted(counts, key=counts.get, reverse=True):
    print(f"{favorite}: {counts[favorite]}")
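Counter can also do the sorting. As a small sketch, using a made-up list of answers instead of favorites.csv, most_common() returns the (value, count) pairs ordered from most to least frequent:

```python
from collections import Counter

# Hypothetical answers standing in for the "language" column.
languages = ["C", "Python", "Python", "Scratch", "Python", "C"]

counts = Counter(languages)    # counts every item in one step
ranked = counts.most_common()  # list of (language, count), largest count first

for language, n in ranked:
    print(f"{language}: {n}")
```

This replaces both the manual loop and the sorted(..., key=counts.get, reverse=True) call when the data is already in a list.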
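A database program is software that runs on a computer, often on a server, and supports a database-specific language.
SQL is one such database-specific language. The name stands for Structured Query Language; you use it to describe the data you want returned and the questions you want answered. It does not have many keywords.
SQL follows the CRUD paradigm: in a relational database you can do only four things, namely create, read, update, and delete data.
Tables can be created with SQL.
Here we use sqlite3; SQLite is one implementation of the SQL language.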
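The four CRUD operations can be tried without leaving Python. The sketch below uses the standard-library sqlite3 module with a throwaway in-memory database; the table layout and the rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk

# Create: define a table and insert some rows.
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("C", "Credit"), ("Python", "Runoff"), ("Scratch", "Scratch")],
)

# Read: query the rows back.
query = "SELECT language, problem FROM favorites ORDER BY language"
after_create = conn.execute(query).fetchall()

# Update: change one row.
conn.execute("UPDATE favorites SET problem = 'Mario' WHERE language = 'Python'")

# Delete: remove one row.
conn.execute("DELETE FROM favorites WHERE language = 'Scratch'")

after_changes = conn.execute(query).fetchall()
conn.close()

print(after_create)
print(after_changes)
```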
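SQL is short for "Structured Query Language". It is a standardized language designed for managing and operating relational database systems. With SQL, users can query, update, insert, and delete data in a database, and create and modify the database structure.
SELECT statements retrieve data from database tables.
INSERT, UPDATE, and DELETE statements insert, update, and delete rows.
CREATE TABLE, DROP TABLE, and ALTER TABLE statements create and modify tables and the database structure.
GRANT and REVOKE statements control access permissions for database operations.
SQL was invented in the early 1970s and has since gone through several rounds of revision and standardization.
Beyond the standard versions, many database management systems (DBMS), such as MySQL, PostgreSQL, Oracle, and SQL Server, support the SQL standard while adding their own proprietary extensions to improve functionality and performance. The basic syntax and operations are therefore largely the same across systems, but specific features and performance optimizations can differ.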
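To load the CSV into a proper database, we can use another language, called SQL.
The code command is normally used for text files, whereas sqlite3 creates a binary file, made up of 0s and 1s.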
src/python/ $ sqlite3 favorites.db
Are you sure you want to create favorites.db? [y/N] y
sqlite> .mode csv
sqlite> .import favorites.csv favorites
sqlite> .quit
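1. `src/python/ $ sqlite3 favorites.db`
- This command starts the SQLite command-line tool and tries to open a database named `favorites.db`. If the database does not exist, you are asked whether to create it.
2. `Are you sure you want to create favorites.db? [y/N] y`
- This is a prompt asking whether you really want to create the `favorites.db` database. Typing `y` means yes, and the database file is created.
3. `sqlite> .mode csv`
- This command puts SQLite into CSV mode, so the import and export operations that follow assume the data is in comma-separated values (CSV) format.
4. `sqlite> .import favorites.csv favorites`
- This command imports the CSV file `favorites.csv` into the database. The final argument, `favorites`, is the name of the table that receives the data. If the table already exists, SQLite appends the data to it; if not, SQLite creates the table and its columns automatically from the first row of the CSV file (which usually holds the column names).
5. `sqlite> .quit`
- This command exits the SQLite command-line tool.
The whole session takes place in the SQLite command-line environment: creating the database file, setting the import mode, importing the CSV file into a named table, and finally exiting.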
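The same import can be approximated from Python. This is only a sketch of the idea behind `.import`, not how the sqlite3 shell is implemented: it assumes a tiny made-up CSV text, an in-memory database, and that every column is stored as TEXT, matching the schema that `.schema` reports for the imported table.

```python
import csv
import io
import sqlite3

# Hypothetical two-row excerpt standing in for favorites.csv.
csv_text = (
    "Timestamp,language,problem\n"
    "10/24/2022 8:33:26,C,Credit\n"
    "10/24/2022 10:32:26,Python,Runoff\n"
)

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # the first row supplies the column names

conn = sqlite3.connect(":memory:")

# Build the CREATE TABLE statement from the header; every column is TEXT.
columns = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f'CREATE TABLE IF NOT EXISTS "favorites" ({columns})')

# One ? placeholder per column; the remaining CSV rows become table rows.
placeholders = ", ".join("?" for _ in header)
conn.executemany(f'INSERT INTO "favorites" VALUES ({placeholders})', reader)
conn.commit()

imported = conn.execute(
    "SELECT language, problem FROM favorites ORDER BY rowid"
).fetchall()
print(imported)
```

Column names are pasted into the SQL text here because placeholders cannot stand for identifiers, so this approach is only reasonable for a header you trust.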
src/python/ $ sqlite3 favorites.db
sqlite> .schema
CREATE TABLE IF NOT EXISTS "favorites"(
"Timestamp" TEXT, "language" TEXT, "problem" TEXT);
Select one or more columns from a table. All SQL keywords here are written in uppercase.
sqlite> SELECT * FROM favorites;
+---------------------+----------+--------------+
| Timestamp | language | problem |
+---------------------+----------+--------------+
| 10/24/2022 8:33:26 | C | Credit |
| 10/24/2022 10:32:26 | Python | Runoff |
| 10/24/2022 11:10:47 | Python | Mario |
| 10/24/2022 11:22:35 | Python | Scratch |
| 10/24/2022 11:39:06 | Python | Readability |
| 10/24/2022 11:53:00 | Scratch | Scratch |
| 10/24/2022 13:26:23 | C | Bulbs |
| 10/24/2022 13:32:09 | Python | Filter |
| 10/24/2022 13:36:35 | Python | DNA |
| 10/24/2022 13:37:20 | Scratch | Scratch |
| 10/24/2022 13:37:22 | Scratch | Scratch |
| 10/24/2022 13:37:23 | Python | Hello |
| 10/24/2022 13:37:24 | Python | DNA |
| 10/24/2022 13:37:25 | Python | Hello |
| 10/24/2022 13:37:26 | Scratch | Cash |
| 10/24/2022 13:37:28 | Python | Readability |
| 10/24/2022 13:37:29 | Scratch | Scratch |
.....
sqlite> SELECT language FROM favorites;
+----------+
| language |
+----------+
| C |
| Python |
| Python |
| Python |
| Python |
| Scratch |
| C |
| Python |
| Python |
| Scratch |
| Scratch |
| Python |
| Python |
| Python |
...
sqlite> SELECT language FROM favorites limit 10;
+----------+
| language |
+----------+
| C |
| Python |
| Python |
| Python |
| Python |
| Scratch |
| C |
| Python |
| Python |
| Scratch |
+----------+
The queries above use a few of SQL's keywords.
sqlite> SELECT COUNT(*) FROM favorites;
+----------+
| COUNT(*) |
+----------+
| 430 |
+----------+
sqlite> SELECT DISTINCT(language) FROM favorites;
+----------+
| language |
+----------+
| C |
| Python |
| Scratch |
+----------+
sqlite> SELECT COUNT( DISTINCT(language)) FROM favorites;
+----------------------------+
| COUNT( DISTINCT(language)) |
+----------------------------+
| 3 |
+----------------------------+
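For comparison with the earlier Python code, COUNT and DISTINCT correspond roughly to len() and set(). The sketch below checks this on a made-up list of answers loaded into an in-memory database through the standard-library sqlite3 module.

```python
import sqlite3

# Hypothetical answers standing in for the "language" column.
languages = ["C", "Python", "Python", "Scratch", "C", "Python"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT)")
conn.executemany(
    "INSERT INTO favorites (language) VALUES (?)",
    [(name,) for name in languages],
)

# COUNT(*) counts rows, like len() on the list.
total = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

# DISTINCT removes duplicates, like building a set.
distinct = [
    row[0]
    for row in conn.execute("SELECT DISTINCT language FROM favorites ORDER BY language")
]

# COUNT(DISTINCT ...) counts the unique values, like len(set(...)).
n_distinct = conn.execute(
    "SELECT COUNT(DISTINCT language) FROM favorites"
).fetchone()[0]

print(total, distinct, n_distinct)
```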
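SQLite is a popular embedded SQL database engine, known for being lightweight, highly reliable, self-contained, zero-configuration, and cross-platform. SQLite3 is the third major version of SQLite, with many features and improvements that have made it one of the most widely used embedded database solutions. It supports standard SQL syntax and can be accessed from many programming languages, including C, C++, Python, and JavaScript.
Because it is lightweight and easy to integrate, SQLite3 is a database that needs no complex configuration or administration, which makes it well suited to use inside a single application or device.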
Some new keywords
sqlite> SELECT COUNT(*) FROM favorites WHERE language = 'C';
+----------+
| COUNT(*) |
+----------+
| 98 |
+----------+
sqlite> SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem = 'Hello';
+----------+
| COUNT(*) |
+----------+
| 9 |
+----------+
Now for the most impressive part, which is also what we set out to do
sqlite> SELECT language, COUNT(*) FROM favorites GROUP BY language;
+----------+----------+
| language | COUNT(*) |
+----------+----------+
| C | 98 |
| Python | 270 |
| Scratch | 62 |
+----------+----------+
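This SQL statement queries the favorites table and returns, for each value of the language column, how many times it appears. Its parts work as follows:
SELECT language, COUNT(*): selects two columns. language is taken directly from the favorites table, while COUNT(*) is an aggregate function that counts the rows in each group. It counts every row, regardless of whether any column value is NULL.
FROM favorites: names the table to select from, here the table called favorites.
GROUP BY language: tells the database to group the result set by the value of the language column. Each distinct language value forms one group, and COUNT(*) counts the rows in each group separately.
The result is a two-column table. The first column, language, contains every distinct language value in the favorites table; the second column holds the number of rows for each language, that is, how many times that language appears. This is useful for understanding the popularity or distribution of the programming languages in favorites.
For example, if favorites records each user's favorite programming language, this statement tells us how many users chose each language, from which the most popular language can be determined.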
sqlite> SELECT language, COUNT(*) FROM favorites GROUP BY language ORDER BY COUNT(*);
+----------+----------+
| language | COUNT(*) |
+----------+----------+
| Scratch | 62 |
| C | 98 |
| Python | 270 |
+----------+----------+
sqlite> SELECT language, COUNT(*) FROM favorites GROUP BY language ORDER BY COUNT(*) DESC;
+----------+----------+
| language | COUNT(*) |
+----------+----------+
| Python | 270 |
| C | 98 |
| Scratch | 62 |
+----------+----------+
sqlite> SELECT language, COUNT(*) AS n FROM favorites GROUP BY language ORDER BY n DESC;
+----------+-----+
| language | n |
+----------+-----+
| Python | 270 |
| C | 98 |
| Scratch | 62 |
+----------+-----+
sqlite> SELECT language, COUNT(*) AS n FROM favorites GROUP BY language ORDER BY n DESC limit 1;
+----------+-----+
| language | n |
+----------+-----+
| Python | 270 |
+----------+-----+
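GROUP BY with ORDER BY and LIMIT does in one statement what the Counter code did earlier in Python. The following sketch, which assumes a small made-up list of answers and an in-memory database, runs the same kind of query through the standard-library sqlite3 module and compares it with Counter.

```python
import sqlite3
from collections import Counter

# Hypothetical answers standing in for the "language" column.
languages = ["C", "Python", "Python", "Scratch", "Python", "C"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT)")
conn.executemany(
    "INSERT INTO favorites (language) VALUES (?)",
    [(name,) for name in languages],
)

# One group per language, ordered from most to least common.
grouped = conn.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC"
).fetchall()

# LIMIT 1 keeps only the first row, i.e. the most popular language.
top = conn.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC LIMIT 1"
).fetchone()

print(grouped)
print(top)
print(Counter(languages).most_common())
```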
sqlite> INSERT INTO favorites (language, problem) VALUES('SQL', 'Fiftyville');
sqlite> SELECT * FROM favorites;
sqlite> DELETE FROM favorites WHERE Timestamp IS NULL;
sqlite> SELECT * FROM favorites;
sqlite> UPDATE favorites SET language = 'SQL',problem = 'Fiftyville';
sqlite> SELECT * FROM favorites;
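Note that the UPDATE above has no WHERE clause, so it rewrites every row in the table. The sketch below replays the three statements on a made-up two-row table (an assumption for illustration) and uses the cursor's rowcount attribute to show how many rows each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("10/24/2022 8:33:26", "C", "Credit"),       # hypothetical sample rows
        ("10/24/2022 10:32:26", "Python", "Runoff"),
    ],
)

# INSERT without a Timestamp leaves that column NULL.
conn.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', 'Fiftyville')")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE Timestamp IS NULL"
).fetchone()[0]

# DELETE with a WHERE clause removes only the matching row.
deleted = conn.execute("DELETE FROM favorites WHERE Timestamp IS NULL").rowcount

# UPDATE with a WHERE clause changes only the matching row.
targeted = conn.execute(
    "UPDATE favorites SET problem = 'Mario' WHERE language = 'C'"
).rowcount

# UPDATE without a WHERE clause changes every row.
untargeted = conn.execute(
    "UPDATE favorites SET language = 'SQL', problem = 'Fiftyville'"
).rowcount

remaining = conn.execute("SELECT language, problem FROM favorites").fetchall()
print(null_rows, deleted, targeted, untargeted, remaining)
```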
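IMDb: the Internet Movie Database
IMDb (Internet Movie Database) is an online database of information about films, television programs, home videos, video games, and streaming content, including cast, production crew, personal biographies, plot summaries, technical data, ratings, and user reviews. It is a broad information resource for film enthusiasts, industry professionals, and general audiences.
Since its creation in the early 1990s, IMDb has become one of the most authoritative and comprehensive online resources for film and entertainment content. It is a rich source of information for film fans and critics, and an important reference for professionals in film and television production.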
Behind the scenes, your input is passed into a SQL query.
https://docs.google.com/spreadsheets/d/1VMqXYzR4rbZQaJx6ZJbUNXDxGhVFgsGxjY9hInmjazY/edit?usp=sharing
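The third table associates the same show ID with four different people.
This may look like duplicating some numbers, but computers can process numbers more quickly.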
sqlite> .schema
CREATE TABLE shows (
id INTEGER,
title TEXT NOT NULL,
year NUMERIC,
episodes INT,
PRIMARY KEY(id)
);
CREATE TABLE genres (
show_id INTEGER NOT NULL,
genre TEXT NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id)
);
CREATE TABLE stars (
show_id INTEGER NOT NULL,
person_id INTEGER NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id),
FOREIGN KEY(person_id) REFERENCES people(id)
);
CREATE TABLE writers (
show_id INTEGER NOT NULL,
person_id INTEGER NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id),
FOREIGN KEY(person_id) REFERENCES people(id)
);
CREATE TABLE ratings (
show_id INTEGER NOT NULL,
rating REAL NOT NULL,
votes INTEGER NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id)
);
CREATE TABLE people (
id INTEGER,
name TEXT NOT NULL,
birth NUMERIC,
PRIMARY KEY(id)
);
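SQLite has five main data types: BLOB, INTEGER, NUMERIC, REAL, and TEXT.
Other databases, such as Oracle, MySQL, and PostgreSQL, have more data types.
A relational database has multiple tables, with relationships between them.
The id column is the primary key of its table.
A foreign key is simply another table's primary key appearing in this table. The primary key is the column that uniquely identifies each row, and foreign keys are what let us link two tables together.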
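The two kinds of key can be seen at work in a short sketch with the standard-library sqlite3 module. It uses cut-down versions of the shows and ratings tables and an in-memory database. One assumption to be aware of: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    "CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))"
)
conn.execute(
    "CREATE TABLE ratings ("
    "show_id INTEGER NOT NULL, rating REAL NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

conn.execute("INSERT INTO shows (id, title) VALUES (63881, 'Catweazle')")
conn.execute("INSERT INTO ratings (show_id, rating) VALUES (63881, 7.8)")

# A primary key must be unique: a second show with the same id is rejected.
try:
    conn.execute("INSERT INTO shows (id, title) VALUES (63881, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key must point at an existing row: show 999 does not exist.
try:
    conn.execute("INSERT INTO ratings (show_id, rating) VALUES (999, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(duplicate_rejected, orphan_rejected)
```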
sqlite> SELECT * FROM ratings WHERE rating >= 6.0 LIMIT 10;
+---------+--------+-------+
| show_id | rating | votes |
+---------+--------+-------+
| 62614 | 6.6 | 177 |
| 63881 | 7.8 | 696 |
| 63962 | 7.9 | 2523 |
| 65269 | 8.2 | 85 |
| 65270 | 7.2 | 14 |
| 65272 | 6.5 | 2357 |
| 65273 | 7.1 | 132 |
| 65274 | 7.2 | 86 |
| 65276 | 7.7 | 38 |
| 65277 | 7.9 | 20 |
+---------+--------+-------+
sqlite> SELECT * FROM shows WHERE id = 62614
...> ;
+-------+-------------+------+----------+
| id | title | year | episodes |
+-------+-------------+------+----------+
| 62614 | Zeg 'ns Aaa | 1981 | 214 |
+-------+-------------+------+----------+
sqlite> SELECT * FROM shows WHERE id IN
...> (SELECT show_id FROM ratings WHERE rating >= 6.0) LIMIT 10;
+-------+-----------------------------+------+----------+
| id | title | year | episodes |
+-------+-----------------------------+------+----------+
| 62614 | Zeg 'ns Aaa | 1981 | 214 |
| 63881 | Catweazle | 1970 | 26 |
| 63962 | UFO | 1970 | 26 |
| 65269 | Ace of Wands | 1970 | 46 |
| 65270 | The Adventures of Don Quick | 1970 | 6 |
| 65272 | All My Children | 1970 | 3914 |
| 65273 | Archie's Funhouse | 1970 | 23 |
| 65274 | Arnie | 1970 | 48 |
| 65276 | Barefoot in the Park | 1970 | 12 |
| 65277 | The Best of Everything | 1970 | 114 |
+-------+-----------------------------+------+----------+
sqlite> SELECT title FROM shows WHERE id IN
...> (SELECT show_id FROM ratings WHERE rating >= 6.0) LIMIT 10;
+-----------------------------+
| title |
+-----------------------------+
| Zeg 'ns Aaa |
| Catweazle |
| UFO |
| Ace of Wands |
| The Adventures of Don Quick |
| All My Children |
| Archie's Funhouse |
| Arnie |
| Barefoot in the Park |
| The Best of Everything |
+-----------------------------+
When a database has two or more tables, they can be combined.
JOIN connects two tables on a value they have in common.
sqlite> SELECT * FROM shows JOIN ratings ON shows.id = ratings.show_id WHERE rating >= 6.0 LIMIT 10;
+-------+-----------------------------+------+----------+---------+--------+-------+
| id | title | year | episodes | show_id | rating | votes |
+-------+-----------------------------+------+----------+---------+--------+-------+
| 62614 | Zeg 'ns Aaa | 1981 | 214 | 62614 | 6.6 | 177 |
| 63881 | Catweazle | 1970 | 26 | 63881 | 7.8 | 696 |
| 63962 | UFO | 1970 | 26 | 63962 | 7.9 | 2523 |
| 65269 | Ace of Wands | 1970 | 46 | 65269 | 8.2 | 85 |
| 65270 | The Adventures of Don Quick | 1970 | 6 | 65270 | 7.2 | 14 |
| 65272 | All My Children | 1970 | 3914 | 65272 | 6.5 | 2357 |
| 65273 | Archie's Funhouse | 1970 | 23 | 65273 | 7.1 | 132 |
| 65274 | Arnie | 1970 | 48 | 65274 | 7.2 | 86 |
| 65276 | Barefoot in the Park | 1970 | 12 | 65276 | 7.7 | 38 |
| 65277 | The Best of Everything | 1970 | 114 | 65277 | 7.9 | 20 |
+-------+-----------------------------+------+----------+---------+--------+-------+
sqlite> SELECT title,rating FROM shows JOIN ratings ON shows.id = ratings.show_id WHERE rating >= 6.0
LIMIT 10;
+-----------------------------+--------+
| title | rating |
+-----------------------------+--------+
| Zeg 'ns Aaa | 6.6 |
| Catweazle | 7.8 |
| UFO | 7.9 |
| Ace of Wands | 8.2 |
| The Adventures of Don Quick | 7.2 |
| All My Children | 6.5 |
| Archie's Funhouse | 7.1 |
| Arnie | 7.2 |
| Barefoot in the Park | 7.7 |
| The Best of Everything | 7.9 |
+-----------------------------+--------+
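The nested query and the JOIN select the same shows; the JOIN can additionally return columns from both tables. Here is a minimal sketch using the standard-library sqlite3 module, an in-memory database, and three rows, the last of which is a made-up low-rated show added so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
conn.execute("CREATE TABLE ratings (show_id INTEGER NOT NULL, rating REAL NOT NULL)")
conn.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(62614, "Zeg 'ns Aaa"), (63881, "Catweazle"), (99999, "Hypothetical Flop")],
)
conn.executemany(
    "INSERT INTO ratings (show_id, rating) VALUES (?, ?)",
    [(62614, 6.6), (63881, 7.8), (99999, 3.0)],
)

# Nested query: find the matching ids first, then look up the titles.
nested = conn.execute(
    "SELECT title FROM shows WHERE id IN "
    "(SELECT show_id FROM ratings WHERE rating >= 6.0) ORDER BY title"
).fetchall()

# JOIN: line the two tables up on the shared id, then filter.
joined = conn.execute(
    "SELECT title, rating FROM shows JOIN ratings ON shows.id = ratings.show_id "
    "WHERE rating >= 6.0 ORDER BY title"
).fetchall()

print(nested)
print(joined)
```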
The IMDb database contains one-to-many relationships
sqlite> SELECT * FROM genres LIMIT 10;
+---------+-----------+
| show_id | genre |
+---------+-----------+
| 62614 | Comedy |
| 63881 | Adventure |
| 63881 | Comedy |
| 63881 | Family |
| 63962 | Action |
| 63962 | Sci-Fi |
| 65269 | Family |
| 65269 | Fantasy |
| 65270 | Comedy |
| 65270 | Sci-Fi |
+---------+-----------+
sqlite> SELECT * FROM shows WHERE id = 63881;
+-------+-----------+------+----------+
| id | title | year | episodes |
+-------+-----------+------+----------+
| 63881 | Catweazle | 1970 | 26 |
+-------+-----------+------+----------+
sqlite> SELECT title FROM shows WHERE id IN
...> (SELECT show_id FROM genres WHERE genre = 'Comedy' LIMIT 10);
+-----------------------------+
| title |
+-----------------------------+
| Zeg 'ns Aaa |
| Catweazle |
| The Adventures of Don Quick |
| Albert and Victoria |
| Archie's Funhouse |
| Arnie |
| Barefoot in the Park |
| Comedy Tonight |
| The Culture Vultures |
| Make Room for Granddaddy |
+-----------------------------+
sqlite> SELECT * FROM shows JOIN genres ON shows.id = genres.show_id WHERE id = 63881;
+-------+-----------+------+----------+---------+-----------+
| id | title | year | episodes | show_id | genre |
+-------+-----------+------+----------+---------+-----------+
| 63881 | Catweazle | 1970 | 26 | 63881 | Adventure |
| 63881 | Catweazle | 1970 | 26 | 63881 | Comedy |
| 63881 | Catweazle | 1970 | 26 | 63881 | Family |
+-------+-----------+------+----------+---------+-----------+
Many-to-many relationships
sqlite> SELECT title FROM shows , stars, people
...> WHERE shows.id = stars.show_id
...> AND people.id = stars.person_id
...> AND name = 'Steve Carell';
+------------------------------------+
| title |
+------------------------------------+
| The Dana Carvey Show |
| Over the Top |
| Watching Ellie |
| Come to Papa |
| The Office |
| Entertainers with Byron Allen |
| The Naked Trucker and T-Bones Show |
| Made in Hollywood |
| ES.TV HD |
| Mark at the Movies |
| Inside Comedy |
| Rove LA |
| Metacafe Unfiltered |
| Fabrice Fabrice Interviews |
| Riot |
| Séries express |
| Hollywood Sessions |
| IMDb First Credit |
| First Impressions with Dana Carvey |
| The Morning Show |
| LA Times: The Envelope |
+------------------------------------+
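The comma-separated FROM list with conditions in WHERE can also be written with explicit JOIN clauses. The sketch below compares both forms with the standard-library sqlite3 module; the ids and the second actor are invented for illustration, and the stars table plays the role of the linking table between shows and people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT NOT NULL, PRIMARY KEY(id))")
conn.execute("CREATE TABLE stars (show_id INTEGER NOT NULL, person_id INTEGER NOT NULL)")

# Hypothetical ids; one show can have many stars and one star many shows.
conn.executemany("INSERT INTO shows VALUES (?, ?)",
                 [(1, "The Office"), (2, "The Morning Show"), (3, "Catweazle")])
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(10, "Steve Carell"), (11, "Hypothetical Actor")])
conn.executemany("INSERT INTO stars VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 11), (1, 11)])

implicit = conn.execute(
    "SELECT title FROM shows, stars, people "
    "WHERE shows.id = stars.show_id "
    "AND people.id = stars.person_id "
    "AND name = ? ORDER BY title",
    ("Steve Carell",),
).fetchall()

explicit = conn.execute(
    "SELECT title FROM shows "
    "JOIN stars ON shows.id = stars.show_id "
    "JOIN people ON stars.person_id = people.id "
    "WHERE name = ? ORDER BY title",
    ("Steve Carell",),
).fetchall()

print(implicit)
print(explicit)
```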
SQL keywords are not case-sensitive.
sqlite> .timer on
sqlite> SELECT * FROM shows WHERE title = 'The Office'
...> ;
+---------+------------+------+----------+
| id | title | year | episodes |
+---------+------------+------+----------+
| 112108 | The Office | 1995 | 6 |
| 290978 | The Office | 2001 | 14 |
| 386676 | The Office | 2005 | 188 |
| 1791001 | The Office | 2010 | 30 |
| 2186395 | The Office | 2012 | 8 |
| 8305218 | The Office | 2019 | 28 |
+---------+------------+------+----------+
Run Time: real 0.027 user 0.024752 sys 0.001951
If you search certain columns often, you can create an index on them.
sqlite> CREATE INDEX title_index ON shows (title);
Run Time: real 0.298 user 0.147998 sys 0.033431
sqlite> SELECT * FROM shows WHERE title = 'The Office';
+---------+------------+------+----------+
| id | title | year | episodes |
+---------+------------+------+----------+
| 112108 | The Office | 1995 | 6 |
| 290978 | The Office | 2001 | 14 |
| 386676 | The Office | 2005 | 188 |
| 1791001 | The Office | 2010 | 30 |
| 2186395 | The Office | 2012 | 8 |
| 8305218 | The Office | 2019 | 28 |
+---------+------------+------+----------+
Run Time: real 0.000 user 0.000169 sys 0.000152
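At this point the index is held in a B-tree, which is not the same thing as a binary tree.
As a result, the lookup is no longer a simple linear search over the whole column.
A B-tree is a self-balancing tree data structure that keeps data sorted and allows search, sequential access, insertion, and deletion in logarithmic time. B-trees are especially suitable for data kept on external storage such as hard disks, because their design reduces the number of disk I/O operations. They are widely used for indexes in database systems and file systems because of their efficient data access and storage.
B-trees have several variants, including B+ trees and B* trees, which offer extra advantages in particular applications. B+ trees, for example, are especially popular for database indexes because they keep all the data in the leaf nodes and link those nodes with pointers, which optimizes sequential access and range queries.
In short, the B-tree is a powerful data structure for managing large amounts of data that is read and written frequently, especially when system performance is limited by physical storage.
Whenever you declare a primary key in a database, you get an index on it for free.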
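One way to check whether a query uses an index, rather than timing it, is SQLite's EXPLAIN QUERY PLAN. The sketch below runs it through the standard-library sqlite3 module on an in-memory table filled with made-up titles. The exact wording of the plan differs between SQLite versions, so the code only looks for the words SCAN and SEARCH and for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
conn.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {i}",) for i in range(1000)],  # hypothetical titles
)

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the description

by_title = "SELECT * FROM shows WHERE title = 'Show 500'"
by_id = "SELECT * FROM shows WHERE id = 500"

plan_before = plan(by_title)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX title_index ON shows (title)")
plan_after = plan(by_title)   # now the lookup goes through title_index
plan_by_id = plan(by_id)      # the primary key needs no extra index

print(plan_before)
print(plan_after)
print(plan_by_id)
```

An index is not free: it takes extra space and must be updated on every insert, which is why it is usually created only for columns that are searched often.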
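The point is that we no longer need to open a CSV file and iterate over its rows: Python can execute SQL queries directly.
SQL is best at reading data from a database, while Python may be the best choice for building a user interface or a web application.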
from cs50 import SQL
db = SQL("sqlite:///favorites.db")
favorite = input("favorite: ")
rows = db.execute("SELECT COUNT(*) AS n FROM favorites WHERE problem = ?", favorite)
row = rows[0]
print(row["n"])
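The same query can be written with Python's standard-library sqlite3 module instead of the CS50 library. This is a sketch under stated assumptions: an in-memory table with made-up rows replaces favorites.db, and count_problem is a hypothetical helper name. As in the code above, the ? placeholder passes the user's input as a value, so it is never spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("C", "Credit"), ("Python", "Mario"), ("Python", "Mario")],  # sample rows
)

def count_problem(conn, favorite):
    """Count the rows whose problem equals favorite."""
    row = conn.execute(
        "SELECT COUNT(*) AS n FROM favorites WHERE problem = ?",
        (favorite,),  # parameters are passed as a tuple
    ).fetchone()
    return row["n"]

print(count_problem(conn, "Mario"))
# Input that looks like SQL is treated as plain text and matches nothing.
print(count_problem(conn, "Mario' OR '1'='1"))
```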
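A "like" button might run code like the above behind the scenes.
Code like this can be dangerous, because it is open to race conditions.
Atomic means that a group of operations should either happen without interruption or not happen at all, which ensures the math does not go wrong.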
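The danger can be illustrated with a hypothetical likes counter; the posts table and its columns are assumptions, not part of favorites.db. The sketch simulates two requests on one connection: both read the count before either writes, so one like is lost. Doing the increment inside a single UPDATE statement avoids the separate read and write.

```python
import sqlite3

def fresh_db():
    """Create an in-memory database with one post that has 0 likes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER NOT NULL)")
    conn.execute("INSERT INTO posts (id, likes) VALUES (1, 0)")
    conn.commit()
    return conn

def read_likes(conn):
    return conn.execute("SELECT likes FROM posts WHERE id = 1").fetchone()[0]

# Racy version: read the value, add one in Python, write it back.
conn = fresh_db()
seen_by_a = read_likes(conn)  # request A reads 0
seen_by_b = read_likes(conn)  # request B also reads 0 before A has written
conn.execute("UPDATE posts SET likes = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE posts SET likes = ? WHERE id = 1", (seen_by_b + 1,))
conn.commit()
racy_total = read_likes(conn)  # two likes were sent, only one was counted

# Atomic version: the database does the read and the write in one statement.
conn = fresh_db()
for _ in range(2):
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE posts SET likes = likes + 1 WHERE id = 1")
atomic_total = read_likes(conn)

print(racy_total, atomic_total)
```

When several statements must succeed or fail together, they can be wrapped in one transaction, which is what the with conn: block provides here.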