[Python] Understanding dataclass in One Article

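Why we need dataclasses

Python 3.7 (PEP 557) introduced the @dataclass decorator, which simplifies writing data classes by automatically generating special methods such as __init__() and __repr__().

A data class is an ordinary class, but one designed to store data: it has a simple structure, groups related data together, and has clearly defined fields.

Such classes, also called data structures, are very common. For example, a class that stores point coordinates is just a class with three fields (x, y, z).

If we don't want to use a class, Python also offers other data structures that can serve the same purpose.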


Suppose we need a data object to hold some basketball player information: the player's name, jersey number, position, and age.

Using tuple

harden = ('James Harden', 1, 'PG', 34)
print(harden[2])  # PG

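Drawback: inflexible. Values are created and accessed by position, so you have to remember which index holds which piece of information.

Using namedtuple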

from collections import namedtuple

Player = namedtuple('Player', ['name', 'number', 'position', 'age', 'grade'])

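harden = Player('James Harden', 1, 'PG', 34, 'S+')

print(harden)  # Player(name='James Harden', number=1, position='PG', age=34, grade='S+')

print(harden.name)  # James Harden

With namedtuple, fields are accessed with dot notation and have explicit names, but some problems remain, for example:

  • The data cannot be modified (instances are immutable).
  • Comparison is plain tuple comparison and cannot be customized directly; default values (only via the defaults= argument, added in Python 3.7) and custom methods (only via subclassing) are awkward to add.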
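A quick sketch of both points, assuming Python 3.7+ (where namedtuple gained the defaults= argument): defaults fill the rightmost fields, attribute assignment raises AttributeError, and _replace() returns a modified copy instead of changing the original.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so grade defaults to 'S'
Player = namedtuple('Player', ['name', 'number', 'position', 'age', 'grade'],
                    defaults=['S'])

harden = Player('James Harden', 1, 'PG', 34)
print(harden.grade)  # S

try:
    harden.age = 35  # namedtuple instances are immutable
except AttributeError as err:
    print('cannot modify:', err)

# _replace() builds a new tuple; the original is unchanged
older = harden._replace(age=35)
print(older.age, harden.age)  # 35 34
```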

Using typing.NamedTuple

from typing import NamedTuple

class Player(NamedTuple):
    name: str
    number: int
    position: str
    age: int
    grade: str

harden = Player('James Harden', 1, 'PG', 34, 'S+')

print(harden)        # Player(name='James Harden', number=1, position='PG', age=34, grade='S+')
print(harden.name)   # James Harden

Type hints make the code more readable and maintainable, but this approach shares namedtuple's limitations, such as immutability.

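Using dict

A dict can hold parameters and configuration, and compared with a tuple it supports more complex nested structures.

harden = {'name': 'James Harden', 'number': 1, 'position': 'PG', 'age': 34}
print(harden['position'])  # PG

Drawback: there is no control over the attribute (key) names.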
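To make the drawback concrete: a mistyped key is accepted silently and simply creates a new entry, so the mistake only surfaces later, if at all. The misspelled key 'agee' below is deliberate.

```python
harden = {'name': 'James Harden', 'number': 1, 'position': 'PG', 'age': 34}

# Intended to update 'age', but the typo silently adds a new key
harden['agee'] = 35

print(harden['age'])   # 34, unchanged
print(sorted(harden))  # ['age', 'agee', 'name', 'number', 'position']
```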

Using typing.TypedDict

It lets static type checkers catch more mistakes, and it also helps other developers understand complex data structures.

from typing import TypedDict

class Player(TypedDict):
    name: str
    number: int
    position: str
    age: int

harden: Player = {'name': 'James Harden', 'number': 1, 'position': 'PG', 'age': 34}

print(harden['position'])  # PG
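Keep in mind that TypedDict only helps static type checkers such as mypy; at runtime the value is an ordinary dict and nothing is validated. A minimal check, where the str value for age is a deliberate mistake:

```python
from typing import TypedDict


class Player(TypedDict):
    name: str
    number: int
    position: str
    age: int


# A type checker would flag the wrong type for 'age',
# but at runtime this is accepted without complaint
harden: Player = {'name': 'James Harden', 'number': 1, 'position': 'PG', 'age': '34'}

print(type(harden) is dict)  # True
print(harden['age'])         # 34 (a str, not an int)
```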

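Overall, tuple, namedtuple, and dict still have their place in simple scenarios, but in more complex ones, such as comparing data or setting default values, they become much less convenient.

So for complex scenarios we usually write a custom class.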

class Player:

    def __init__(self, name, number, position, age, grade):
        self.name = name
        self.number = number
        self.position = position
        self.age = age
        self.grade = grade


harden = Player('James Harden', 1, 'PG', 34, 'S+')
bryant = Player(name='Kobe Bryant', number=24, position='PG', age=41, grade='S+')

print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant

print(harden)  # <__main__.Player object at 0x000002431AFC6E00>

print(harden < bryant)

Output:

James Harden
Kobe Bryant
<__main__.Player object at 0x000002431AFC6E00>
Traceback (most recent call last):
  File "F:\study\django-restframesork-jwt-demo\test\1.py", line 33, in <module>
    print(harden < bryant)
TypeError: '<' not supported between instances of 'Player' and 'Player'

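However, a class defined this way still has the following problems:

  1. It does not support comparison.
  2. Its string representation is not very friendly.

To solve these two problems, we can implement __repr__ to customize the representation, and implement comparison methods such as __gt__ (together with __eq__) to support comparison.

Suppose we compare players by age; the updated code is: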

class Player:

    def __init__(self, name, number, position, age, grade):
        self.name = name
        self.number = number
        self.position = position
        self.age = age
        self.grade = grade

    def __repr__(self):
        return f'Player: {self.name} : {self.age}'

    def __gt__(self, other):
        return self.age > other.age

    def __eq__(self, other):
        return self.age == other.age


harden = Player('James Harden', 1, 'PG', 34, 'S+')
bryant = Player(name='Kobe Bryant', number=24, position='PG', age=41, grade='S+')

print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant

print(harden)  # Player: James Harden : 34

print(harden < bryant)  # True

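Now the data object has a more readable representation and supports comparison. Note that harden < bryant works even though only __gt__ is defined: Player has no __lt__ of its own, so Python falls back to the reflected call bryant.__gt__(harden).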
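As an aside, the standard library can fill in part of this boilerplate for hand-written classes: functools.total_ordering derives the remaining ordering methods from __eq__ plus one of __lt__/__le__/__gt__/__ge__. A trimmed-down sketch of the Player class (only name and age kept for brevity):

```python
from functools import total_ordering


@total_ordering
class Player:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f'Player: {self.name} : {self.age}'

    def __eq__(self, other):
        if not isinstance(other, Player):
            return NotImplemented
        return self.age == other.age

    def __lt__(self, other):
        if not isinstance(other, Player):
            return NotImplemented
        return self.age < other.age


harden = Player('James Harden', 34)
bryant = Player('Kobe Bryant', 41)

print(harden < bryant)   # True
print(harden >= bryant)  # False, __ge__ derived by total_ordering
print(sorted([bryant, harden]))  # [Player: James Harden : 34, Player: Kobe Bryant : 41]
```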

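We often need to add constructors, representation methods, comparison functions, and so on. Writing them is tedious, and this is exactly the kind of thing the language should handle for us transparently.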

from dataclasses import dataclass


@dataclass(order=True)
class Player:
    name: str
    number: int
    position: str

    grade: str
    age: int = 18  # default value; as in a function definition, fields with defaults must come last


harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=41)

print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant

print(harden)  # Player(name='James Harden', number=1, position='PG', grade='S+', age=34)

# Comparison goes field by field, in the order the fields are defined
print(harden < bryant)  # True

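Compared with dict and tuple, dataclass has clear advantages. It declares the type of each member variable (as a hint; it is not enforced at runtime), and the generated __init__ rejects unknown field names, which greatly reduces the chance of errors. Compared with a traditional class definition, dataclass is more concise: there is no lengthy __init__ method to write, only the member variables to list.

Data classes are easier to read and understand, and type hints make the structure of the data natural to grasp. When a data class is clear, readers form accurate assumptions more easily and find and fix potential bugs more easily.

After converting to dataclass, the results look as expected, but we need to understand how it works under the hood, or bugs can slip in unnoticed.

Are you curious what the magic methods added by dataclass look like? For example, what logic does the comparison use?

Next, let's look at the source code and the official documentation; then you will know whether the code above has a problem!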
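Before reading the source, here is a quick experiment hinting at the problem: with order=True the generated methods compare fields as a tuple in definition order, so name is compared before age. Giving Harden a hypothetical age of 50 does not change the result, because 'James Harden' < 'Kobe Bryant' alphabetically.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player('Kobe Bryant', 24, 'PG', 'S+', 41)
print(harden < bryant)  # True

# Make Harden the older player: the result is still True,
# because the names already differ and name is compared first
old_harden = Player('James Harden', 1, 'PG', 'S+', 50)
print(old_harden < bryant)  # True
```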

How dataclass decorates a class

def dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False,
              unsafe_hash=False, frozen=False, match_args=True,
              kw_only=False, slots=False):
    """Returns the same class as was passed in, with dunder methods
    added based on the fields defined in the class.

    Examines PEP 526 __annotations__ to determine fields.

    If init is true, an __init__() method is added to the class. If
    repr is true, a __repr__() method is added. If order is true, rich
    comparison dunder methods are added. If unsafe_hash is true, a
    __hash__() method function is added. If frozen is true, fields may
    not be assigned to after instance creation. If match_args is true,
    the __match_args__ tuple is added. If kw_only is true, then by
    default all fields are keyword-only. If slots is true, an
    __slots__ attribute is added.
    """

    def wrap(cls):
        return _process_class(cls, init, repr, eq, order, unsafe_hash,
                              frozen, match_args, kw_only, slots)

    # See if we're being called as @dataclass or @dataclass().
    if cls is None:
        # We're called with parens.
        return wrap

    # We're called as @dataclass without parens.
    return wrap(cls)

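dataclass takes several parameters; based on them, the decorator adds generated method definitions to the class to support instance initialization, repr, comparison methods, and, optionally, the other methods described in the specification.

  • init: if True (the default), generate an __init__ method.
  • repr: if True (the default), generate a __repr__ method.
  • eq: if True (the default), generate an __eq__ method that compares the fields as a tuple, in definition order.
  • order: if True (default False), generate __lt__, __le__, __gt__, and __ge__ methods.
  • unsafe_hash: if True (default False), force generation of a __hash__ method.
  • frozen: if True (default False), instances are immutable (read-only).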
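A small sketch of how these flags change behavior, using a throwaway Point class: with the defaults (eq=True, order=False) instances compare equal by value but cannot be ordered, while order=True adds the ordering methods.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass(order=True)
class OrderedPoint:
    x: int
    y: int


print(Point(1, 2) == Point(1, 2))  # True: eq=True compares field values

try:
    Point(1, 2) < Point(2, 3)  # order=False: no __lt__ is generated
except TypeError as err:
    print('TypeError:', err)

print(OrderedPoint(1, 2) < OrderedPoint(1, 3))  # True: compares (1, 2) < (1, 3)
print('__lt__' in vars(Point), '__lt__' in vars(OrderedPoint))  # False True
```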

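Such a class is called a data class, but there is nothing special about it: the decorator adds the generated methods to the class and returns the same class it was given.

For example: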

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

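The @dataclass decorator adds the equivalents of these methods to the InventoryItem class, and which ones are added is controlled by its parameters: the ordering methods below are only generated with order=True, and current Python versions no longer generate __ne__, because Python derives != from __eq__ automatically: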

def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
def __repr__(self):
    return f'InventoryItem(name={self.name!r}, unit_price={self.unit_price!r}, quantity_on_hand={self.quantity_on_hand!r})'
def __eq__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
def __ne__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
def __lt__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
def __le__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
def __gt__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
def __ge__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
    return NotImplemented
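The `other.__class__ is self.__class__` check matters: comparing against a different type returns NotImplemented, so == falls back to False and ordering raises TypeError. A quick check, assuming InventoryItem is declared with order=True so the ordering methods exist (with a plain @dataclass, only __eq__ would be generated among the comparison methods):

```python
from dataclasses import dataclass


@dataclass(order=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


a = InventoryItem('widget', 3.0, 10)
b = InventoryItem('widget', 3.0, 10)

print(a == b)                    # True: the field tuples are equal
print(a == ('widget', 3.0, 10))  # False: different class, so NotImplemented
print(a <= b, a.total_cost())    # True 30.0

try:
    a < ('widget', 3.0, 10)
except TypeError as err:
    print('TypeError:', err)
```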

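With this example we have a rough idea of how it works: dataclass simplifies defining data classes to a large extent, but when we need precise control over the program's behavior, we still have to override the relevant magic methods ourselves.

Back to the player example, rewritten with dataclass for more precise control: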

from dataclasses import dataclass


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18

    def __eq__(self, other):
        return self.age == other.age  # compare age only

    def __lt__(self, other):
        return self.age < other.age  # compare age only


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=41)

result = harden < bryant  # compared by age
print(result)  # True, because 34 < 41
print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant
print(bryant == harden)  # False

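Of course, if we had to override everything ourselves, dataclass would not look very smart. When only some fields should take part, dataclass offers other mechanisms to make this easier. (Note that the generated __eq__ does not overwrite one you define yourself, as above; with order=True, however, defining __lt__, __le__, __gt__, or __ge__ yourself raises a TypeError.)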
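One such mechanism is field(compare=False): fields marked this way are left out of the generated __eq__ and ordering methods, so the age-only comparison above can be had without hand-written methods. A sketch (order=True is needed for <; 'Someone Else' is made-up test data):

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Player:
    # Only age takes part in __eq__, __lt__ and the other comparisons
    name: str = field(compare=False)
    number: int = field(compare=False)
    position: str = field(compare=False)
    grade: str = field(compare=False)
    age: int = 18


harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=41)

print(harden < bryant)   # True, because 34 < 41
print(bryant == harden)  # False
print(harden == Player('Someone Else', 0, 'C', 'A', 34))  # True: same age
```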

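Using dataclass

From the examples above, we see that dataclass generates a batch of magic methods from templates; all we need to do is adjust its parameters to our needs, or override some of the methods where appropriate to fit the real scenario.

Type hints and default values

As with function parameters, attributes with default values must come after attributes without default values.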

from dataclasses import dataclass
from typing import Any


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18
    team: Any = "nba"


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+')

print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant
print(bryant.age)  # 18
print(bryant.team)  # nba

数据嵌套

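Data classes can be nested as fields of other data classes. For example, we can create a team with two players: the Clippers, with Harden and Kawhi Leonard.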

from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


@dataclass
class Team:
    name: str
    players: List[Player]


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
leonard = Player(name='Kawhi Leonard', number=2, position='SF', grade='S+')

clippers = Team("clippers", [harden, leonard])
print(harden.name)  # James Harden
print(leonard.name)  # Kawhi Leonard
print(leonard.age)  # 18

print(clippers)  # Team(name='clippers', players=[Player(name='James Harden', number=1, position='PG', grade='S+', age=34), Player(name='Kawhi Leonard', number=2, position='SF', grade='S+', age=18)])

Inheritance

from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    name: str
    age: int


@dataclass(order=True)
class Player(Person):
    number: int
    position: str
    grade: str
    team: str = "nba"


# Example usage
harden = Player(name='James Harden', age=34, number=1, position='PG', grade='S+')
bryant = Player(name='Kobe Bryant', age=41, number=24, position='PG', grade='S+')

print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant
print(bryant.age)  # 41
print(bryant.team)  # nba

# With order=True, objects can be compared (e.g. for sorting); fields are compared in order, so name comes first
print(harden < bryant)  # True

Fields are ordered with the parent class's fields first, followed by the current class's fields.

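Variable-length arguments

Data classes are generally expected to declare their attributes explicitly. If you want to accept some extra arguments, the following approach may work for you.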

from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', args=(1, 2), kwargs={"hello": "world"})

print(bryant)

Output:

Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=18, args=(1, 2), kwargs={'hello': 'world'})

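The field object

If an attribute of a data class is of an immutable type, you can assign it a default value directly; but when the attribute is of a mutable type (such as list, dict, or set), assigning a default value directly raises an error.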

from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
leonard = Player(name='Kawhi Leonard', number=2, position='SF', grade='S+')


@dataclass
class Team:
    name: str
    players: List[Player] = [leonard]  # this line raises an error


clippers = Team("clippers", [harden, leonard])
print(harden.name)  
print(leonard.name)  
print(leonard.age) 

print(clippers)  

Output:

ValueError: mutable default <class 'list'> for field players is not allowed: use default_factory

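By default, dataclass refuses mutable values as defaults, because a single default object would be shared by every instance.

As the error message suggests, this is where the field object comes in.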

from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
leonard = Player(name='Kawhi Leonard', number=2, position='SF', grade='S+')


@dataclass
class Team:
    name: str = field(metadata={'unit': 'name'})
    players: List[Player] = field(default_factory=lambda: [leonard], metadata={'unit': 'players'})


clippers = Team("clippers", [harden])
clippers1 = Team("clippers")
print(harden.name)
print(leonard.name)
print(leonard.age)

print(clippers.players)
print(clippers1.players)

print(fields(clippers))
print(fields(clippers)[1].metadata)

Output:

James Harden
Kawhi Leonard
18
[Player(name='James Harden', number=1, position='PG', grade='S+', age=34)]
[Player(name='Kawhi Leonard', number=2, position='SF', grade='S+', age=18)]
(Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x0000029523A65060>,default_factory=<dataclasses._MISSING_TYPE object at 0x0000029523A65060>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'unit': 'name'}),kw_only=False,_field_type=_FIELD), Field(name='players',type=typing.List[__main__.Player],default=<dataclasses._MISSING_TYPE object at 0x0000029523A65060>,default_factory=<function Team.<lambda> at 0x0000029523B44B80>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'unit': 'players'}),kw_only=False,_field_type=_FIELD))
{'unit': 'players'}
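To isolate why default_factory is needed: the factory runs once per instance, so each object gets its own list instead of sharing one. A minimal sketch with a simplified Team whose players are plain strings:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Team:
    name: str
    players: List[str] = field(default_factory=list)  # a new list per instance


clippers = Team('clippers')
lakers = Team('lakers')

clippers.players.append('James Harden')

print(clippers.players)  # ['James Harden']
print(lakers.players)    # [], not shared with clippers
print(clippers.players is lakers.players)  # False
```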

Let's look at the signature of field:

def field(*, default=MISSING, default_factory=MISSING, init=True, repr=True,
          hash=None, compare=True, metadata=None, kw_only=MISSING):
    """Return an object to identify dataclass fields.

    default is the default value of the field.  default_factory is a
    0-argument function called to initialize a field's value.  If init
    is true, the field will be a parameter to the class's __init__()
    function.  If repr is true, the field will be included in the
    object's repr().  If hash is true, the field will be included in the
    object's hash().  If compare is true, the field will be used in
    comparison functions.  metadata, if specified, must be a mapping
    which is stored but not otherwise examined by dataclass.  If kw_only
    is true, the field will become a keyword-only parameter to
    __init__().

    It is an error to specify both default and default_factory.
    """

    if default is not MISSING and default_factory is not MISSING:
        raise ValueError('cannot specify both default and default_factory')
    return Field(default, default_factory, init, repr, hash, compare,
                 metadata, kw_only)

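Parameters:

  • default: the field's default value. If omitted, the field has no default.
  • default_factory: like default, but a zero-argument callable that supplies the default value. The factory is called each time an instance is created without that argument, so every instance gets a fresh value. It cannot be combined with default.
  • init: whether the field is included as a parameter of __init__. Default: True.
  • repr: whether the field is included in __repr__(). Default: True.
  • compare: whether the field is included when comparing objects (__eq__ and the ordering methods). Default: True.
  • hash: whether the field is included when computing hash(). Default: None, which means "use the value of compare".
  • metadata: a mapping with extra information about the field; dataclass stores it but does not use it. Default: None.

For example, to keep name out of comparisons, set: name: str = field(compare=False)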

Metadata

Metadata can be used for data validation:

from dataclasses import dataclass, field, fields
from datetime import datetime


class ValidationError(Exception):
    def __init__(self, field_name, condition, actual_value):
        self.field_name = field_name
        self.condition = condition
        self.actual_value = actual_value
        super().__init__(f"{field_name} validation failed: {condition} (Actual value: {actual_value})")


class Color:
    RED = '\033[91m'
    END = '\033[0m'


@dataclass
class Player:
    # Each validator returns True when the value is INVALID
    name: str = field(default="", metadata={"validation": [lambda x: len(x) == 0]})
    number: int = field(default=0, metadata={"validation": [lambda x: not 0 < x <= 100]})
    position: str = field(default="", metadata={"validation": [lambda x: len(x) == 0]})
    grade: str = field(default="", metadata={"validation": [lambda x: x not in {'S+', 'S', 'A', 'B', 'C'}]})
    age: int = field(default=0, metadata={"validation": [lambda x: not 0 < x <= 150]})
    foundation_date: datetime = field(default_factory=datetime.now)

    def validation(self):
        for field_ in fields(self):
            validations = field_.metadata.get("validation", [])
            for validation in validations:
                if validation(getattr(self, field_.name)):
                    raise ValidationError(field_.name, str(validation), getattr(self, field_.name))


harden = Player(name='James Harden', number=13, position='PG', grade='S+', age=32)
bryant = Player(name='Kobe Bryant', number=24, position='SG', grade='S', age=41)
rookie = Player(name='Rookie', number=0, position='SF', grade='A', age=20)  # hypothetical invalid data: number 0 is out of range

# Valid players pass silently; invalid data raises an exception
for player in (harden, bryant, rookie):
    try:
        player.validation()
    except ValidationError as e:
        print(f"{Color.RED}{e}{Color.END}")

Output (the function address varies between runs):

number validation failed: <function Player.<lambda> at 0x...> (Actual value: 0)

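Custom attributes

From our look at field(), we know we can specify whether an attribute takes part in comparison, in hash computation, and so on.

Since comparison follows the field-definition order, we can also add an extra attribute to compare on exactly what we need: place the comparison attribute first in the data class and assign it flexibly in the __post_init__ magic method.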

from dataclasses import dataclass, field


@dataclass(order=True)
class Player:
    sort_index: tuple = field(init=False)  # a sort_index field that is not set by __init__
    name: str
    number: int
    position: str
    grade: str
    age: int = 18

    def __post_init__(self):
        self.sort_index = (self.age, self.grade)  # compute sort_index in __post_init__


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=41)

result = harden < bryant  # compared by sort_index, i.e. (age, grade)
print(result)  # True, because 34 < 41
print(harden.name)  # James Harden
print(bryant.name)  # Kobe Bryant
print(bryant == harden)  # False
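sort_index also shows up in repr(). A variant, assuming you prefer to hide it, passes repr=False as well; sorted() then orders players by (age, grade) directly:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Player:
    # init=False: computed, not passed in; repr=False: hidden from repr()
    sort_index: tuple = field(init=False, repr=False)
    name: str
    number: int
    position: str
    grade: str
    age: int = 18

    def __post_init__(self):
        self.sort_index = (self.age, self.grade)


harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player('Kobe Bryant', 24, 'PG', 'S+', 41)
leonard = Player('Kawhi Leonard', 2, 'SF', 'S+')  # age defaults to 18

print([p.name for p in sorted([bryant, harden, leonard])])
# ['Kawhi Leonard', 'James Harden', 'Kobe Bryant']
print(harden)  # Player(name='James Harden', number=1, position='PG', grade='S+', age=34)
```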

Immutable data classes

def dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False,
              unsafe_hash=False, frozen=False, match_args=True,
              kw_only=False, slots=False):

Data classes created with dataclass are mutable by default; to make one immutable, set frozen=True when defining the class.

from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)

harden.age = 33  # dataclasses.FrozenInstanceError: cannot assign to field 'age'
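Since a frozen instance cannot be changed in place, the usual way to "modify" one is dataclasses.replace(), which returns a new object. And because frozen=True together with eq=True makes dataclass generate __hash__, frozen instances can also serve as dict keys or set members. A short sketch with a reduced Player:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Player:
    name: str
    age: int = 18


harden = Player('James Harden', 34)

try:
    harden.age = 33
except FrozenInstanceError as err:
    print('FrozenInstanceError:', err)  # cannot assign to field 'age'

older = replace(harden, age=35)  # a new instance; harden is untouched
print(harden.age, older.age)     # 34 35

positions = {harden: 'PG'}       # hashable, so usable as a dict key
print(positions[Player('James Harden', 34)])  # PG
```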

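Deduplicating data class instances

With unsafe_hash=True, instances get a __hash__ method and can be deduplicated, for example in a set. Which fields take part in the hash can also be controlled with field objects. Note, however, that hash=False only removes a field from the hash: the field is still used by __eq__, so instances that differ in that field are still treated as distinct, as the second set in the output shows.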

from dataclasses import dataclass, field


@dataclass(order=True, unsafe_hash=True)
class Player:
    name: str
    number: int
    position: str = field(hash=False)  # excluded from the hash (still used by __eq__)
    grade: str
    age: int = 18


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
harden2 = Player('James Harden', 1, 'PG', 'S+', 34)

harden3 = Player('James Harden', 1, 'SG', 'S+', 34)

print({harden, harden2})
print({harden, harden3})

Output:

{Player(name='James Harden', number=1, position='PG', grade='S+', age=34)}
{Player(name='James Harden', number=1, position='PG', grade='S+', age=34), Player(name='James Harden', number=1, position='SG', grade='S+', age=34)}
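If the goal is for players that differ only in position to count as duplicates, position has to be left out of equality as well. field(compare=False) does that, and since hash=None follows compare by default, it also drops the field from the hash. A sketch (keep in mind that unsafe_hash on a mutable class is "unsafe" because changing a hashed field after the object is in a set breaks lookups):

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Player:
    name: str
    number: int
    position: str = field(compare=False)  # ignored by __eq__ and __hash__
    grade: str
    age: int = 18


harden = Player('James Harden', 1, 'PG', 'S+', 34)
harden3 = Player('James Harden', 1, 'SG', 'S+', 34)

print(harden == harden3)       # True
print(len({harden, harden3}))  # 1: treated as duplicates
```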

Converting a data class to a tuple or dict

from dataclasses import dataclass, field, asdict, astuple


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
bryant = Player(name='Kobe Bryant', number=24, position='PG', grade='S+', args=(1, 2), kwargs={"hello": "world"})

alist = [harden, bryant]
print(sorted(alist, key=lambda x: x.age))

print(asdict(bryant))
print(astuple(harden))

Output:

[Player(name='Kobe Bryant', number=24, position='PG', grade='S+', age=18, args=(1, 2), kwargs={'hello': 'world'}), Player(name='James Harden', number=1, position='PG', grade='S+', age=34, args=(), kwargs={})]
{'name': 'Kobe Bryant', 'number': 24, 'position': 'PG', 'grade': 'S+', 'age': 18, 'args': (1, 2), 'kwargs': {'hello': 'world'}}
('James Harden', 1, 'PG', 'S+', 34, (), {})
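asdict() also recurses into nested data classes (including lists of them), which makes JSON serialization a one-liner. A sketch reusing the Team/Player shape from the nesting section:

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


@dataclass
class Team:
    name: str
    players: List[Player]


clippers = Team('clippers', [Player('James Harden', 1, 'PG', 'S+', 34)])

data = asdict(clippers)  # nested Player objects become dicts too
print(data['players'][0]['name'])  # James Harden
print(json.dumps(data))
# {"name": "clippers", "players": [{"name": "James Harden", "number": 1, "position": "PG", "grade": "S+", "age": 34}]}
```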

The replace function

This function creates a new instance in which some fields get new values while the other fields keep their values.

from dataclasses import dataclass, field, fields, replace
from typing import List


@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: str
    age: int = 18


# Example usage
harden = Player('James Harden', 1, 'PG', 'S+', 34)
leonard = Player(name='Kawhi Leonard', number=2, position='SF', grade='S+')


@dataclass
class Team:
    name: str = field(metadata={'unit': 'name'})
    players: List[Player] = field(default_factory=lambda: [leonard], metadata={'unit': 'players'})


clippers = Team("clippers", [leonard])

# Use replace() to change field values of the Team instance (a new instance is returned)
new_clippers = replace(clippers, name="new_clippers", players=[leonard, harden])

print("Original Clippers:", clippers)
print("New Clippers:", new_clippers)

Output:

Original Clippers: Team(name='clippers', players=[Player(name='Kawhi Leonard', number=2, position='SF', grade='S+', age=18)])
New Clippers: Team(name='new_clippers', players=[Player(name='Kawhi Leonard', number=2, position='SF', grade='S+', age=18), Player(name='James Harden', number=1, position='PG', grade='S+', age=34)])
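One detail worth knowing: replace() is shallow. Fields you don't pass are copied by reference, so a list is shared between the old and the new instance. A sketch with a simplified Team whose players are plain strings:

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Team:
    name: str
    players: List[str] = field(default_factory=list)


clippers = Team('clippers', ['Kawhi Leonard'])
renamed = replace(clippers, name='new_clippers')  # players not passed

renamed.players.append('James Harden')

print(clippers.players)  # ['Kawhi Leonard', 'James Harden']
print(renamed.players is clippers.players)  # True: the list is shared
```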

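Examples of dataclass in Python applications

Data extraction [parameter validation]:

Data classes can be combined with validation and data-extraction packages to extract data or validate parameters. Below is an example that uses marshmallow and desert for validation and extraction: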

import requests
from dataclasses import dataclass
import dataclasses
from marshmallow import fields, EXCLUDE, validate
import desert


@dataclass
class Activity:
    activity: str
    participants: int = dataclasses.field(metadata=desert.metadata(
        fields.Int(required=True,
                   validate=validate.Range(min=1, max=50,
                                           error="Participants must be between 1 and 50 people"))
    ))
    price: float = dataclasses.field(metadata=desert.metadata(
        fields.Float(required=True,
                     validate=validate.Range(
                         min=0, max=50,
                         error="Price must be between $1 and $50"))
    ))

    def __post_init__(self):
        self.price = self.price * 100


def get_activity():
    # resp = requests.get("https://www.boredapi.com/api/activity").json()
    resp = {
        "activity": "Improve your touch typing",
        "type": "busywork",
        "participants": 1,
        "price": 1.0,
        # "price": 51,
        "link": "https://en.wikipedia.org/wiki/Touch_typing",
        "key": "2526437",
        "accessibility": 0.8
    }
    # Extract only the fields we care about; ignore unknown ones
    schema = desert.schema(Activity, meta={"unknown": EXCLUDE})
    return schema.load(resp)


print(get_activity())

Output:

Activity(activity='Improve your touch typing', participants=1, price=100.0)

If you change the value in resp, for example making price greater than 50, you get a validation error:

marshmallow.exceptions.ValidationError: {'price': ['Price must be between $1 and $50']}

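Simple objects for storing data

dataclasses shine in many situations, especially when defining simple objects that store data. They are particularly suitable for configuration, data transfer objects (DTOs), domain objects, and other data-only structures.

Requirement: automatically persist a configuration object to a config file before the program exits.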

import json
import atexit
import logging
import threading
from pathlib import Path
from dataclasses import dataclass, asdict


@dataclass
class Config(object):
    name: str = "mysql"
    port: int = 3306

    _instance = None
    _lock = threading.Lock()
    _registered = False  # class attribute (no annotation, so not a dataclass field)

    def __new__(cls, *args, **kw):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance

    def load_from_file(self, file_path):
        """Load settings from the config file; keep the default values if the file does not exist or cannot be loaded.
        """
        if file_path.exists():
            try:
                with file_path.open() as f:
                    json_data = json.load(f)
                    for key, value in json_data.items():
                        setattr(self, key, value)
            except Exception as err:
                logging.error(f"Failed to load config from file: {err}")
        else:
            logging.warning(f"Config file '{file_path}' not exists. Using default values.")

    def save_to_file(self, file_path):
        """Save the configuration to a file.
        """
        json_str = json.dumps(asdict(self), indent=4)
        with file_path.open('w') as f:
            logging.warning(f"Saving configs to '{file_path}'")
            f.write(json_str)

    @classmethod
    def register_atexit(cls):
        """Register saving the configuration to the config file at program exit."""
        with cls._lock:
            if not cls._registered:
                atexit.register(cls._instance.save_to_file, Path("./config.json"))
                cls._registered = True

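    # Keep the loading and saving logic separate
    def __post_init__(self):
        config_file = Path("./config.json")

        # Load settings from the config file
        self.load_from_file(config_file)

        # Register saving the configuration to the config file at program exit
        self.register_atexit()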


if __name__ == "__main__":
    # Create a Config instance
    config_instance = Config(name="redis", port=6379)

    # Print the current configuration
    print("Current Config:", config_instance)

    # Change a setting and print again
    config_instance.port = 8080
    print("Updated Config:", config_instance)

    # Create another Config instance to demonstrate the singleton pattern
    another_instance = Config()
    print("Another Instance Config:", another_instance)

    # Save the configuration to a file
    another_instance.save_to_file(Path("./another_config.json"))

    # Load the configuration from the file
    another_instance.load_from_file(Path("./another_config.json"))
    print("Loaded Config from File:", another_instance)

Output:

Current Config: Config(name='mysql', port=3306)
Updated Config: Config(name='mysql', port=8080)
Another Instance Config: Config(name='mysql', port=3306)
Loaded Config from File: Config(name='mysql', port=3306)
WARNING:root:Saving configs to 'another_config.json'
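WARNING:root:Saving configs to 'config.json'

Note that this output comes from a run in which config.json already existed from an earlier run. Because __new__ always returns the same object, every Config(...) call re-runs the generated __init__ (resetting the fields to the arguments or defaults) and then __post_init__ (which reloads config.json). That is why the redis/6379 arguments never appear, and why the port=8080 change is lost once Config() is called again.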

Making function return values explicit

from dataclasses import dataclass
from enum import Enum

class Grade(Enum):
    S_PLUS = 'S+'
    # Define other grades...

@dataclass
class Player:
    name: str
    number: int
    position: str
    grade: Grade
    age: int = 18

def create_player(name: str, number: int, position: str, grade: Grade, age: int) -> Player:
    return Player(name, number, position, grade, age)

# Example usage
harden = create_player('James Harden', 1, 'PG', Grade.S_PLUS, 34)
bryant = create_player('Kobe Bryant', 24, 'SG', Grade.S_PLUS, 41)

print(harden)
print(bryant)

Output:

Player(name='James Harden', number=1, position='PG', grade=<Grade.S_PLUS: 'S+'>, age=34)
Player(name='Kobe Bryant', number=24, position='SG', grade=<Grade.S_PLUS: 'S+'>, age=41)
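The benefit is clearer when compared with returning a bare tuple. A sketch with a hypothetical summarize_ages helper (not from the original): the tuple version forces callers to remember positions, while the dataclass version names each value.

```python
from dataclasses import dataclass
from typing import List, Tuple


def summarize_ages_tuple(ages: List[int]) -> Tuple[int, int, float]:
    # Callers must remember the order: (youngest, oldest, average)
    return min(ages), max(ages), sum(ages) / len(ages)


@dataclass(frozen=True)
class AgeSummary:
    youngest: int
    oldest: int
    average: float


def summarize_ages(ages: List[int]) -> AgeSummary:
    return AgeSummary(min(ages), max(ages), sum(ages) / len(ages))


ages = [34, 41, 33]
print(summarize_ages_tuple(ages)[1])  # 41, but which value was index 1?
summary = summarize_ages(ages)
print(summary.oldest)  # 41
print(summary)         # AgeSummary(youngest=33, oldest=41, average=36.0)
```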

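Alternatives to dataclasses

dataclasses provides many convenient features, but PEP 557 also mentions attrs, an equally powerful data class library that additionally supports validators and other features.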

import attr

@attr.s
class Point:
    x = attr.ib(type=int)
    y = attr.ib(type=int)

p = Point(1, 2)
print(p)  # Output: Point(x=1, y=2)

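Whether to use dataclasses or attrs depends on the project's needs and personal preference. dataclasses is simpler and more straightforward, while attrs offers more extensibility. If you only need basic automatic generation of special methods, dataclasses is a good choice; if you need more advanced features and customization options, consider attrs.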

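Summary

dataclass is a powerful tool that makes creating and managing classes simpler and more efficient.

In practice, especially in data processing and object modeling, the @dataclass decorator greatly improves code clarity and removes redundant boilerplate.

A deeper understanding of dataclass's features helps us use it more flexibly, improving code quality and development efficiency.

For more tips, see the official documentation!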



References

https://peps.python.org/pep-0557/
https://realpython.com/python-data-classes/#more-flexible-data-classes
https://docs.python.org/zh-cn/3/library/dataclasses.html#module-contents
https://www.pythontutorial.net/python-oop/python-dataclass/
https://github.com/python-desert/desert
https://glyph.twistedmatrix.com/2016/08/attrs.html
https://github.com/pviafore/RobustPython
