PyODPS is the Python SDK for MaxCompute. It provides a DataFrame framework and basic operations on MaxCompute objects.
API documentation: https://pyodps.readthedocs.io/zh_CN/latest/?spm=a2c4g.11186623.0.0.1aaf3d94n84mIN
SDK download: https://github.com/aliyun/aliyun-odps-python-sdk
from odps import ODPS
access_id = "username"
access_key = "password"
project_name = "zxcp_bz"
endpoint = "address"
db_entry = ODPS(access_id=access_id, secret_access_key=access_key, project=project_name, endpoint=endpoint)
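Hard-coding credentials as above is fine for a quick test, but they are easy to leak. Below is a minimal sketch of reading them from environment variables instead. The variable names (`ODPS_ACCESS_ID` and so on) and both helper names are assumptions made for illustration; PyODPS does not define them.

```python
import os


def odps_settings_from_env(env=None):
    """Collect the keyword arguments for ODPS(...) from environment variables.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    names = {
        "access_id": "ODPS_ACCESS_ID",
        "secret_access_key": "ODPS_ACCESS_KEY",
        "project": "ODPS_PROJECT",
        "endpoint": "ODPS_ENDPOINT",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in names.items()}


def connect_from_env():
    """Create the entry object; requires pyodps to be installed."""
    from odps import ODPS  # imported lazily so the helper above works without pyodps
    return ODPS(**odps_settings_from_env())
```

With the four variables exported, `db_entry = connect_from_env()` would replace the hard-coded assignment above.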
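Method | Description |
---|---|
db_entry.exist_project(name=) | Returns True if the project exists and False if it does not; raises an exception if the project exists but you lack permission to access it. |
db_entry.get_project(name=) | Returns the Project object with the given name. |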
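Because `exist_project` can either return a boolean or raise, callers usually need to handle three outcomes. A minimal sketch under that assumption follows; the helper name and the three status strings are illustrative, and catching plain `Exception` is a simplification that would also swallow unrelated failures such as network errors.

```python
def project_status(entry, name):
    """Classify a project as 'exists', 'missing' or 'inaccessible'.

    `entry` is the ODPS entry object (db_entry above). Any exception raised by
    exist_project is reported as 'inaccessible' in this sketch.
    """
    try:
        return "exists" if entry.exist_project(name) else "missing"
    except Exception:
        return "inaccessible"
```

For example, `project_status(db_entry, "zxcp_bz")` could be checked before calling `db_entry.get_project`.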
for table in db_entry.list_tables():
    print(table.name)
Attribute/Method | Description |
---|---|
table.name | |
table.schema | |
table.owner | |
table.comment | |
table.creation_time | |
table.last_modified_time | |
table.record_num | |
table.lifecycle | |
table.size | |
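To inspect several of these attributes at once, they can be gathered into a plain dict. This is a small sketch: the helper name is hypothetical, `schema` is left out because it is an object rather than a simple value, and `getattr` with a default is used in case an attribute is not available in the installed PyODPS version.

```python
TABLE_ATTRIBUTES = (
    "name", "owner", "comment", "creation_time",
    "last_modified_time", "record_num", "lifecycle", "size",
)


def describe_table(table):
    """Return the attributes listed above as {attribute: value}.

    Attributes the table object does not provide are reported as None.
    """
    return {attr: getattr(table, attr, None) for attr in TABLE_ATTRIBUTES}
```

Inside the `list_tables()` loop above, `print(describe_table(table))` would show one summary per table.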
Creating a table
db_entry.create_table(name=, schema=, project=, comment=, if_not_exists=, lifecycle=)
Note: schema can be either an odps.models.Schema object or a string of column definitions such as "col_a string, col_b bigint".
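When the columns are already known as (name, type) pairs, the string form of the schema can be assembled from them. The sketch below assumes the string form is a comma-separated list of `name type` items; both helper names are hypothetical. The table name and schema are passed positionally, and the keyword arguments are the ones listed in the signature above.

```python
def schema_string(columns):
    """Build a 'name type, name type' column definition from (name, type) pairs."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def create_table_if_missing(entry, table_name, columns, lifecycle=None):
    """Create a table from (name, type) pairs unless it already exists.

    `entry` is the ODPS entry object (db_entry above).
    """
    return entry.create_table(
        table_name, schema_string(columns), if_not_exists=True, lifecycle=lifecycle
    )
```

A call might look like `create_table_if_missing(db_entry, "t_demo", [("id", "string"), ("cnt", "bigint")], lifecycle=7)`, where `t_demo` is a made-up table name.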
Reading data
Option 1: table.open_reader()
o_table = db_entry.get_table(name="table_name", project="schema_name")
with o_table.open_reader(partition=None) as reader:
    for record in reader:
        print(record)
Option 2: db_entry.read_table()
for record in db_entry.read_table(name="", project="", limit=, start=, step=, partition=):
    print(record)
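Both options yield records one at a time, so a preview does not have to iterate over the whole table. A minimal sketch that stops after a fixed number of records is shown below; it works on any iterable, and the helper name is hypothetical.

```python
from itertools import islice


def take_records(records, limit):
    """Return at most `limit` items from any record iterable as a list."""
    return list(islice(records, limit))
```

For instance, `take_records(reader, 5)` inside the `with o_table.open_reader(...)` block would collect the first five records.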
Writing data
Option 1: table.open_writer()
o_table = db_entry.get_table(name="t_tunnel_test_t", project="zxcp_bz")
with o_table.open_writer(partition=None) as writer:
    # rows given as plain lists of values
    records_a = [["QOI908", "768905", "pyjdk,pyodps", "202107"],
                 ["HOI908", "345890", "pyjdk,pyodps", "202203"]]
    # the same rows given as Record objects
    records_b = [o_table.new_record(["QOI908", "768905", "pyjdk,pyodps", "202107"]),
                 o_table.new_record(["HOI908", "345890", "pyjdk,pyodps", "202203"])]
    writer.write(records_a)
    writer.write(records_b)
Note: the records passed to write() can be plain lists of values or a collection of Record objects.
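For larger inputs it can be convenient to call `writer.write` once per batch rather than once for the whole list. The sketch below shows one way to do that; the helper names and the default batch size of 1000 are arbitrary choices for illustration, not PyODPS recommendations.

```python
def chunked(rows, size):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_in_batches(writer, rows, size=1000):
    """Call writer.write once per batch and return the number of batches written."""
    batches = 0
    for batch in chunked(rows, size):
        writer.write(batch)
        batches += 1
    return batches
```

Inside the `with o_table.open_writer(...)` block, `write_in_batches(writer, records_a, size=1)` would issue one write per row.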
Option 2: db_entry.write_table()
records_c = [["QOI908", "768905", "pyjdk,pyodps", "202107"],
             ["HOI908", "345890", "pyjdk,pyodps", "202203"]]
db_entry.write_table("zxcp_bz.t_tunnel_test_t", records_c)
Notes:
1) With this method, pass the arguments positionally rather than as keyword arguments.
2) The project and table name can be joined as "project.table_name" and passed as the name argument.
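The two notes above can be folded into a small wrapper that builds the qualified name and passes both arguments positionally. This is only a sketch, and the helper name is hypothetical.

```python
def write_rows(entry, project, table, rows):
    """Write rows through entry.write_table using a 'project.table' name.

    Both arguments are passed positionally, as the notes above recommend.
    Returns the qualified table name that was used.
    """
    full_name = f"{project}.{table}"
    entry.write_table(full_name, rows)
    return full_name
```

With the values from the example, this would be `write_rows(db_entry, "zxcp_bz", "t_tunnel_test_t", records_c)`.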
Skipped here.
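(1) DDL and DML SQL statements in ODPS can be executed with the following methods:
A: instance_sql = db_entry.execute_sql(sql=, project=, hints=)
Note: this method runs synchronously and returns once the SQL statement has finished.
B: instance_sql = db_entry.run_sql(sql=, project=, hints=)
Note: this method runs asynchronously; add the following statement to wait for the SQL statement to finish:
instance_sql.wait_for_success()
# blocks until the instance completes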
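The asynchronous form is useful when several independent statements can run at the same time: submit them all first, then wait for each. A minimal sketch using only `run_sql` and `wait_for_success` follows; the helper name is hypothetical, and whether the statements really run concurrently depends on the service.

```python
def run_all(entry, statements, hints=None):
    """Submit every statement with run_sql, then wait for each instance.

    `entry` is the ODPS entry object (db_entry above). Instances are returned
    in submission order; an exception raised by wait_for_success propagates
    to the caller.
    """
    instances = [entry.run_sql(sql, hints=hints) for sql in statements]
    for instance in instances:
        instance.wait_for_success()
    return instances
```

Statements that depend on one another should still be run one by one with `execute_sql`.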
(2) DCL SQL statements in ODPS can be executed with the following method:
db_entry.run_security_query(query=,project=)
(3) Runtime parameters
The hints parameter takes a dict such as:
hints={'odps.sql.mapper.split.size': 16}
Global runtime parameters:
from odps import options
options.sql.settings = {'odps.sql.mapper.split.size': 16}
# every statement executed after this point carries these hints
db_entry.execute_sql(sql=, project=)
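When only some statements need different values, an alternative to the global setting is to keep a base dict and merge per-statement values into it before passing the result as `hints`. A minimal sketch follows; the helper name is hypothetical.

```python
def merged_hints(defaults, overrides=None):
    """Return a new dict in which per-statement hints override the defaults.

    Neither input dict is modified.
    """
    hints = dict(defaults)
    hints.update(overrides or {})
    return hints
```

The result can be passed directly, for example `hints=merged_hints({'odps.sql.mapper.split.size': 16}, {'odps.sql.mapper.split.size': 32})`.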
(4) Query results
instance_sql = db_entry.execute_sql(sql="select * from zxcp_bz.t_tunnel_test_t")
with instance_sql.open_reader() as reader:
    for record in reader:
        print(record)
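For further processing it is often easier to work with dicts than with record objects. The sketch below assumes that each record supports lookup by column name (`record["col"]`) and that the caller supplies column names matching the query output; the helper name is hypothetical.

```python
def rows_as_dicts(records, columns):
    """Turn each record into a {column: value} dict.

    Assumes record[column_name] works for every name in `columns`.
    """
    return [{col: record[col] for col in columns} for record in records]
```

Inside the `with instance_sql.open_reader() as reader:` block, this would be called as `rows_as_dicts(reader, ["column_1", "column_2"])`, with the placeholder names replaced by real columns.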
from odps.df import DataFrame
o_table = db_entry.get_table(name="table_name", project="project_name")
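t_df = DataFrame(o_table)  # wrap the table as a PyODPS DataFrame
t_df.dtypes  # column names and types
t_df.head(n)  # first n rows; n is an integer such as 10
t_df[["column_1", "column_2"]]  # keep only these columns
t_df.exclude("column_1", "column_2").head()  # every column except these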
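The selection, exclusion and `head` calls above can be combined into one preview helper. This is a sketch that uses only those three operations; the helper name and its parameters are hypothetical.

```python
def preview(df, keep=None, drop=None, n=5):
    """Select or exclude columns, then return the first n rows.

    Uses only df[[...]], df.exclude(...) and df.head(n), as shown above.
    """
    if keep and drop:
        raise ValueError("pass either keep or drop, not both")
    if keep:
        df = df[list(keep)]
    elif drop:
        df = df.exclude(*drop)
    return df.head(n)
```

For example, `preview(t_df, drop=["column_1"], n=10)` would show ten rows without `column_1`.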