In MVC-style development today, the M layer supplies the underlying data. Whether the application is C/S or B/S, the trend is to work with business objects rather than with rows returned by hand-written SQL.
A generic framework that maps relational database fields onto model objects is therefore important; it saves a great deal of development time.
This post briefly analyzes the ORM in Django.
Since an ORM aims to be generic, five key questions arise:
1: how to support multiple databases
2: how to provide the object field types
3: how to convert sql -> object
4: how to store object -> sql
5: how to map database-level relations onto object attributes
Let's analyze these questions one by one.
1: Multi-database support is the easiest to solve, because most SQL operation code is shared across databases; the differences lie in fine details, database-specific features, and connection handling. The solution is layered: the upper layer encapsulates generic SQL method classes, feature classes, and database operation classes, and each concrete backend's implementation file overrides them where needed.
Working from that guess, examining the files under the django.db.backends module gives:
backends
….
|----sqlite3
     |----__init__.py
     |----base.py          builds on sqlite3 to subclass the database base classes; fills in sqlite3's features and completes its operations and cursor handling
     |----client.py        runs the sqlite3 command-line client against the local db file
     |----creation.py      provides the model-field-to-database-column mapping used when creating tables, and fills in the creation methods
     |----introspection.py provides the database-column-to-model-field mapping, and fills in the introspection methods
__init__.py     base-class wrappers for the database connection, features, operations, introspection, client, validation, and so on
creation.py     base class for database creation
signals.py      database creation signals
util.py         wrappers for the cursor and the debug cursor (which records query times)
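The layering described above can be illustrated with a toy sketch (these are not Django's actual classes, just the pattern): a generic operations base class, plus a backend-specific subclass that overrides only what differs.

```python
class BaseDatabaseOperations:
    """Generic SQL behaviour shared by all backends."""
    def quote_name(self, name):
        return '"%s"' % name          # ANSI-style quoting by default

    def limit_sql(self, n):
        return "LIMIT %d" % n

class MySQLOperations(BaseDatabaseOperations):
    """Override only the backend-specific details."""
    def quote_name(self, name):
        return "`%s`" % name          # MySQL uses backticks

ops = MySQLOperations()
sql = "SELECT * FROM %s %s" % (ops.quote_name("book"), ops.limit_sql(10))
# sql is now: SELECT * FROM `book` LIMIT 10
```

Because the generic code talks only to the base-class interface, adding a new backend means overriding a handful of methods rather than rewriting the whole SQL layer.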
Question 2: providing the object field types.
SQL column types are like iPhone views or Android widgets: a family of classes that share mostly the same methods and attributes, with the small differences handled by subclass overrides.
The column types SQL offers include integer, boolean, character, float, date, and so on.
I was less familiar with this part, so examining the models module gives the following layout:
models
|----fields
     |----__init__.py     the field base class and the concrete field classes
     |----files.py        file fields
     |----proxy.py        proxy class; purpose unclear to me so far
     |----related.py      relations (one-to-many, many-to-many, reverse, primary key, one-to-one)
     |----subclassing.py  unclear so far
|----sql
     |----__init__.py
     |----aggregates.py   aggregate classes used to fill SQL fragment templates; the key method is as_sql
     |----compiler.py     assembles the SQL statement; the final query goes through this class
     |----constants.py    constants
     |----datastructures.py unclear so far
     |----expressions.py  expression evaluator? unclear so far
     |----query.py        the query
     |----subqueries.py   insert/delete/update expression classes (subclassed from the Query class)
     |----where.py
__init__.py
aggregates.py   aggregate classes: average, count, max, min, StdDev, sum, variance
base.py         the model base class
constants.py
deletion.py     the deletion Collector class
expressions.py  logical/arithmetic expression classes
loading.py      wrappers for loading applications and models, registering models, and so on
manager.py      the object manager class and the manager descriptor class
options.py      the options a model class supports
query.py        the QuerySet classes (base, values, values-list, dates, empty)
query_utils.py  query wrappers
related.py
signals.py      signals for model operations
Reading the actual source in detail against the layout above makes it much easier to follow.
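The spirit of the fields package can be sketched in a few lines, assuming simplified names that are illustrative only, not Django's real API: each field class knows its database column type and how to coerce a raw value into a Python value.

```python
class Field:
    """Minimal field base class: subclasses override the differences."""
    def db_type(self):
        raise NotImplementedError

    def to_python(self, value):
        return value                     # subclasses coerce as needed

class IntegerField(Field):
    def db_type(self):
        return "integer"

    def to_python(self, value):
        return int(value)                # coerce db value to Python int

class CharField(Field):
    def __init__(self, max_length):
        self.max_length = max_length

    def db_type(self):
        return "varchar(%d)" % self.max_length

f = IntegerField()
```

The common interface lives in the base class; creation.py-style code only needs db_type(), and the sql -> object path only needs to_python().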
With connection management for SQL in place and models providing the fields, how are the objects themselves managed?
Django's own usage pattern shows the way:
Model.manager.method()[slice] yields a QuerySet, and QuerySet lives in models/query.py.
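That Model.manager.method()[slice] chain can be sketched in plain Python (the names are illustrative, not Django's implementation): a descriptor-style manager hands back a chainable, sliceable queryset-like object.

```python
class QuerySet:
    def __init__(self, rows):
        self._rows = rows

    def filter(self, **kw):
        keep = [r for r in self._rows
                if all(r.get(k) == v for k, v in kw.items())]
        return QuerySet(keep)            # chainable, like Django's QuerySet

    def __getitem__(self, k):
        sliced = self._rows[k]
        return QuerySet(sliced) if isinstance(k, slice) else sliced

class Manager:
    def __get__(self, instance, owner):  # descriptor: reached as Model.objects
        return QuerySet(owner.rows)

class Book:
    rows = [{"title": "A", "year": 2000}, {"title": "B", "year": 2000},
            {"title": "C", "year": 2010}]
    objects = Manager()

first_two = Book.objects.filter(year=2000)[:2]
```

The descriptor is why `objects` is reachable only on the class, and each filter() returning a fresh QuerySet is what makes the calls chainable.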
That leaves three questions.
3: sql -> object. Start from QuerySet and look at iterator():
def iterator(self):
    """
    An iterator over the results from applying this QuerySet to the
    database.
    """
    ...
    # Cache db and model outside the loop
    db = self.db
    model = self.model
    compiler = self.query.get_compiler(using=db)
    if fill_cache:
        klass_info = get_klass_info(model, max_depth=max_depth,
                                    requested=requested, only_load=only_load)  # get the class info
    for row in compiler.results_iter():  # iterate over the rows coming back from the database
        if fill_cache:
            obj, _ = get_cached_row(row, index_start, db, klass_info,
                                    offset=len(aggregate_select))  # build the object and its related objects
        else:
            # Omit aggregates in object creation.
            row_data = row[index_start:aggregate_start]
            if skip:
                obj = model_cls(**dict(zip(init_list, row_data)))
            else:
                obj = model(*row_data)  # build a model instance from the row data
        # Store the source database of the object
        obj._state.db = db
        # This object came from the database; it's not being added.
        obj._state.adding = False
        if extra_select:
            for i, k in enumerate(extra_select):
                setattr(obj, k, row[i])
        # Add the aggregates to the model
        if aggregate_select:
            for i, aggregate in enumerate(aggregate_select):
                setattr(obj, aggregate, row[i + aggregate_start])
        # Add the known related objects to the model, if there are any
        if self._known_related_objects:
            for field, rel_objs in self._known_related_objects.items():
                pk = getattr(obj, field.get_attname())
                try:
                    rel_obj = rel_objs[pk]
                except KeyError:
                    pass  # may happen in qs1 | qs2 scenarios
                else:
                    setattr(obj, field.name, rel_obj)
        yield obj
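Stripped of caching and aggregates, the core sql -> object step is just zipping a database row with the field names and feeding the result to the model constructor. A minimal sketch, with an illustrative Book class standing in for a real model:

```python
class Book:
    def __init__(self, id, title, price):
        self.id = id
        self.title = title
        self.price = price

def rows_to_objects(model_cls, field_names, rows):
    """Mimic iterator(): zip each row with the field names and
    pass the pairs to the model constructor as keyword args."""
    for row in rows:
        obj = model_cls(**dict(zip(field_names, row)))
        obj._state_adding = False  # the object came from the db; it is not new
        yield obj

rows = [(1, "Dive Into Python", 9.9), (2, "Two Scoops", 19.9)]
books = list(rows_to_objects(Book, ["id", "title", "price"], rows))
```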
def get_cached_row(row, index_start, using, klass_info, offset=0):
    """
    Helper function that recursively returns an object with the specified
    related attributes already populated.

    This method may be called recursively to populate deep select_related()
    clauses.

    Arguments:
         * row - the row of data returned by the database cursor
         * index_start - the index of the row at which data for this
           object is known to start
         * offset - the number of additional fields that are known to
           exist in row for `klass`. This usually means the number of
           annotated results on `klass`.
         * using - the database alias on which the query is being executed.
         * klass_info - result of the get_klass_info function
    """
    if klass_info is None:
        return None
    klass, field_names, field_count, related_fields, reverse_related_fields, pk_idx = klass_info

    fields = row[index_start : index_start + field_count]
    # If the pk column is None (or the Oracle equivalent ''), then the related
    # object must be non-existent - set the relation to None.
    if fields[pk_idx] == None or fields[pk_idx] == '':
        obj = None
    elif field_names:
        obj = klass(**dict(zip(field_names, fields)))
    else:
        obj = klass(*fields)  # build the object

    # If an object was retrieved, set the database state.
    if obj:
        obj._state.db = using
        obj._state.adding = False

    # Instantiate related fields
    index_end = index_start + field_count + offset
    # Iterate over each related object, populating any
    # select_related() fields
    for f, klass_info in related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the base object exists, populate the
                # descriptor cache
                setattr(obj, f.get_cache_name(), rel_obj)
            if f.unique and rel_obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache on the related object
                setattr(rel_obj, f.related.get_cache_name(), obj)

    # Now do the same, but for reverse related objects.
    # Only handle the restricted case - i.e., don't do a depth
    # descent into reverse relations unless explicitly requested
    for f, klass_info in reverse_related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache
                setattr(obj, f.related.get_cache_name(), rel_obj)
            if rel_obj is not None:
                # If the related object exists, populate
                # the descriptor cache.
                setattr(rel_obj, f.get_cache_name(), obj)
                # Now populate all the non-local field values on the related
                # object. If this object has deferred fields, we need to use
                # the opts from the original model to get non-local fields
                # correctly.
                opts = rel_obj._meta
                if getattr(rel_obj, '_deferred'):
                    opts = opts.proxy_for_model._meta
                for rel_field, rel_model in opts.get_fields_with_model():
                    if rel_model is not None:
                        setattr(rel_obj, rel_field.attname, getattr(obj, rel_field.attname))
                        # populate the field cache for any related object
                        # that has already been retrieved
                        if rel_field.rel:
                            try:
                                cached_obj = getattr(obj, rel_field.get_cache_name())
                                setattr(rel_obj, rel_field.get_cache_name(), cached_obj)
                            except AttributeError:
                                # Related object hasn't been cached yet
                                pass
    return obj, index_end
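The recursive descent is the interesting part: each call consumes a contiguous slab of the flat row and returns how far it got, so the caller knows where the next related object's columns begin. A toy version, where each klass_info is assumed to be a (constructor, field_names, related_infos) tuple (illustrative, not Django's actual structure):

```python
def build_from_row(row, start, klass_info):
    """Consume row[start:] for this class, then recurse into relations."""
    ctor, field_names, related_infos = klass_info
    end = start + len(field_names)
    obj = ctor(**dict(zip(field_names, row[start:end])))
    for attr, child_info in related_infos:
        child, end = build_from_row(row, end, child_info)
        setattr(obj, attr, child)  # populate the relation cache on the parent
    return obj, end

class NS:  # simple namespace-style constructor standing in for a model class
    def __init__(self, **kw):
        self.__dict__.update(kw)

author_info = (NS, ["id", "name"], [])
book_info = (NS, ["id", "title"], [("author", author_info)])

# one flat row holds the book's columns followed by the author's columns
book, consumed = build_from_row((1, "Dune", 7, "Frank Herbert"), 0, book_info)
```

This is why a select_related() query can hydrate an entire object graph from a single joined row.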
That is the brief analysis of the read path.
Next is the other direction: object -> sql.
Calling object.save() persists the data to the db, so look at the model base class's save() method, which leads to save_base():
def save_base(self, raw=False, cls=None, origin=None, force_insert=False,
              force_update=False, using=None, update_fields=None):
    ...
    manager = cls._base_manager
    ...
    result = manager._insert([self], fields=fields, return_id=update_pk,
                             using=using, raw=raw)  # this is the insert path

def _insert(self, objs, fields, **kwargs):
    return insert_query(self.model, objs, fields, **kwargs)

def insert_query(model, objs, fields, return_id=False, raw=False, using=None):
    query = sql.InsertQuery(model)              # build the insert-query object
    query.insert_values(fields, objs, raw=raw)  # attach the field values
    return query.get_compiler(using=using).execute_sql(return_id)  # compile to SQL and execute it
The value-insertion part:
def insert_values(self, fields, objs, raw=False):
    self.fields = fields
    # Check that no Promise object reaches the DB. Refs #10498.
    for field in fields:
        for obj in objs:
            value = getattr(obj, field.attname)
            if isinstance(value, Promise):
                setattr(obj, field.attname, force_text(value))
    self.objs = objs
    self.raw = raw
And the execution:
def execute_sql(self, return_id=False):
    assert not (return_id and len(self.query.objs) != 1)
    self.return_id = return_id
    cursor = self.connection.cursor()
    for sql, params in self.as_sql():
        cursor.execute(sql, params)
    if not (return_id and cursor):
        return
    if self.connection.features.can_return_id_from_insert:
        return self.connection.ops.fetch_returned_insert_id(cursor)
    return self.connection.ops.last_insert_id(cursor,
            self.query.model._meta.db_table, self.query.model._meta.pk.column)
def as_sql(self):
    qn = self.connection.ops.quote_name
    opts = self.query.model._meta
    result = ['INSERT INTO %s' % qn(opts.db_table)]

    has_fields = bool(self.query.fields)
    fields = self.query.fields if has_fields else [opts.pk]
    result.append('(%s)' % ', '.join([qn(f.column) for f in fields]))

    if has_fields:
        params = values = [
            [
                f.get_db_prep_save(getattr(obj, f.attname) if self.query.raw else f.pre_save(obj, True), connection=self.connection)
                for f in fields
            ]
            for obj in self.query.objs
        ]
    else:
        values = [[self.connection.ops.pk_default_value()] for obj in self.query.objs]
        params = [[]]
        fields = [None]
    can_bulk = (not any(hasattr(field, "get_placeholder") for field in fields) and
        not self.return_id and self.connection.features.has_bulk_insert)

    if can_bulk:
        placeholders = [["%s"] * len(fields)]
    else:
        placeholders = [
            [self.placeholder(field, v) for field, v in zip(fields, val)]
            for val in values
        ]
        # Oracle Spatial needs to remove some values due to #10888
        params = self.connection.ops.modify_insert_params(placeholders, params)
    if self.return_id and self.connection.features.can_return_id_from_insert:
        params = params[0]
        col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
        result.append("VALUES (%s)" % ", ".join(placeholders[0]))
        r_fmt, r_params = self.connection.ops.return_insert_id()
        # Skip empty r_fmt to allow subclasses to customize behaviour for
        # 3rd party backends. Refs #19096.
        if r_fmt:
            result.append(r_fmt % col)
            params += r_params
        return [(" ".join(result), tuple(params))]
    if can_bulk:
        result.append(self.connection.ops.bulk_insert_sql(fields, len(values)))
        return [(" ".join(result), tuple([v for val in values for v in val]))]
    else:
        return [
            (" ".join(result + ["VALUES (%s)" % ", ".join(p)]), vals)
            for p, vals in zip(placeholders, params)
        ]
The SQL assembly is plain to see in the code above.
So the object -> sql logic is:
object -> obtain the table name, column names, and values
format them into  insert into table (columns) values (values)  and execute.
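That two-step logic can be demonstrated end to end with the stdlib sqlite3 module; the Book class and column list here are illustrative, not Django's API.

```python
import sqlite3

class Book:
    db_table = "book"
    columns = ("title", "price")

    def __init__(self, title, price):
        self.title = title
        self.price = price

def insert(conn, obj):
    """object -> sql: read the table, columns, and values off the object,
    format an INSERT statement, and execute it with bound parameters."""
    cols = ", ".join(obj.columns)
    placeholders = ", ".join("?" for _ in obj.columns)  # sqlite3 uses ? placeholders
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (obj.db_table, cols, placeholders)
    params = tuple(getattr(obj, c) for c in obj.columns)
    cur = conn.execute(sql, params)
    return cur.lastrowid  # same role as ops.last_insert_id() above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
pk = insert(conn, Book("Dive Into Python", 9.9))
title = conn.execute("SELECT title FROM book WHERE id = ?", (pk,)).fetchone()[0]
```

Note that, like Django's compiler, the values travel as bound parameters rather than being interpolated into the SQL string.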
The last question, mapping database relations onto objects, is a bit involved and deserves deeper study on its own.