Python data class: dataclass

Keywords: Python

Original text: https://www.cnblogs.com/dan-baishucaizi/p/14786600.html

1. Introduction to dataclass

dataclass is a new feature (a class decorator) introduced in Python 3.7. A dataclass is often described as "a mutable namedtuple with default values". In essence it is an ordinary class: its attributes can normally be accessed directly, and the class carries methods that operate on those attributes. In short, it is a class that holds data together with the methods that operate on it.

Differences between a dataclass and an ordinary class

  • Compared with ordinary classes, a dataclass usually exposes its attributes for direct access instead of hiding them as private ones (although private attributes are still possible);
  • The repr() function converts an object into a form that can be read by the interpreter; the __repr__ generated for a dataclass has a fixed format that shows the class name, the field names and the field values;
  • A dataclass gets a generated __eq__, and its __hash__ is managed by the decorator;
  • A dataclass has a single, fixed way of being constructed. It sometimes needs operators to be overloaded (for example for comparison), which is usually not required for ordinary classes.

Note: namedtuple is a subclass of tuple, and its elements are named!
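
As a rough illustration of the "mutable namedtuple with default values" description, the sketch below puts the two side by side. The names PointTuple and PointData are made up for this comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical names, used only for this comparison.
PointTuple = namedtuple("PointTuple", ["x", "y"])


@dataclass
class PointData:
    x: int
    y: int = 0  # fields may carry default values


t = PointTuple(1, 2)
d = PointData(1)

# A namedtuple is a tuple: it supports indexing and cannot be modified.
first = t[0]
try:
    t.x = 10
    tuple_is_mutable = True
except AttributeError:
    tuple_is_mutable = False

# A dataclass instance is an ordinary object: attributes can be reassigned.
d.y = 5

print(first, tuple_is_mutable)  # 1 False
print(d)                        # PointData(x=1, y=5)
```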

2. Introducing the dataclass decorator

The ordinary way to define a class

class Elfin:
    def __init__(self, name, age):
        self.name = name
        self.age = age

Using the dataclass decorator

from dataclasses import dataclass

@dataclass
class Elfin:
    name: str
    age: int

With @dataclass we get the same constructor as in the ordinary class, plus generated __repr__ and __eq__ methods, so the code is more concise!
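
To make the comparison concrete, here is a simplified sketch of what the decorator roughly writes for us by default. ManualElfin is a hypothetical hand-written counterpart; the real generated code differs in detail.

```python
from dataclasses import dataclass


@dataclass
class Elfin:
    name: str
    age: int


class ManualElfin:
    """Roughly what @dataclass generates by default (a simplified sketch)."""

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"{type(self).__name__}(name={self.name!r}, age={self.age!r})"

    def __eq__(self, other):
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is self.__class__:
            return (self.name, self.age) == (other.name, other.age)
        return NotImplemented


a = Elfin("firstelfin", 23)
b = ManualElfin("firstelfin", 23)
print(a)  # Elfin(name='firstelfin', age=23)
print(b)  # ManualElfin(name='firstelfin', age=23)
print(a == Elfin("firstelfin", 23), b == ManualElfin("firstelfin", 23))  # True True
```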

The __post_init__ method

If an attribute needs further processing after __init__ has run, the logic can be placed in __post_init__, which the generated __init__ calls as its last step!

@dataclass
class Elfin:
    name: str
    age: int
    
    def __post_init__(self):
        if type(self.name) is str:
            self.identity = identity_dict[self.name]

Test the above case:

>>> from dataclasses import dataclass
>>> identity_dict = {
... "firstelfin": "boss",
... "secondelfin": "master",
... "thirdelfin": "captain"
... }
>>> @dataclass
... class Elfin:
...     name: str
...     age: int
...
...     def __post_init__(self):
...         if type(self.name) is str:
...             self.identity = identity_dict[self.name]
>>> print(Elfin)
... Out[1]: <class '__main__.Elfin'>
>>> elfin_ins = Elfin("firstelfin", 23)
>>> elfin_ins
... Out[2]: Elfin(name='firstelfin', age=23)
>>> elfin_ins.identity
... Out[3]: 'boss'

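The above case shows that although the generated __init__ does not create the identity attribute, the instance still has it, because __post_init__ sets it! Note that identity is an ordinary instance attribute rather than a field, so it does not appear in the repr.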
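
Another common use of __post_init__ is validating the arguments and computing derived values. The following is a minimal sketch; the Rectangle class and its area attribute are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float

    def __post_init__(self):
        # Runs at the end of the generated __init__, after the fields are set.
        if self.width < 0 or self.height < 0:
            raise ValueError("width and height must be non-negative")
        # A plain instance attribute, not a dataclass field.
        self.area = self.width * self.height


r = Rectangle(3, 4)
print(r, r.area)  # Rectangle(width=3, height=4) 12

try:
    Rectangle(-1, 2)
except ValueError as exc:
    print("rejected:", exc)
```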

Next, we will look at the options of the dataclass decorator in more detail.

3. dataclass decorator options

Using the options of the dataclass decorator, we can customize the data classes we want. The default options are:

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
class Elfin:
    pass

Parameter options of the decorator:

  • init controls whether to generate the __init__ method;
  • repr controls whether to generate the __repr__ method;
  • eq controls whether to generate the __eq__ method, which is used to judge whether instances are equal;
  • order controls whether to generate the four ordering methods: __lt__, __le__, __gt__, __ge__. If order is True, eq must also be True (otherwise a ValueError is raised), and the class must not define any of these four methods itself (otherwise a TypeError is raised).
  • unsafe_hash controls how __hash__ is generated.
    • When unsafe_hash is False, __hash__ is handled according to the eq and frozen parameters:
      1. When eq and frozen are both True, __hash__ is generated;
      2. When eq is True and frozen is False, __hash__ is set to None, so instances are unhashable;
      3. When eq is False, __hash__ is left untouched, so the __hash__ of the superclass is used (for object, a hash based on the object's identity).
    • When unsafe_hash is True, __hash__ is generated from the fields of the class even though they are mutable. As the name says, this is unsafe: if a field changes after the instance has been hashed, the hash changes too. Set it to True only if you can guarantee that the fields never change after creation.
  • frozen controls whether assignment to fields is blocked. When set to True, instances are immutable: assigning to or deleting a field raises FrozenInstanceError. If the class itself defines __setattr__ or __delattr__, a TypeError is raised when the class is created.
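
The interplay between eq, frozen and __hash__ is easier to see in a small experiment. The class names below are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass  # eq=True, frozen=False: __hash__ is set to None
class Mutable:
    x: int


@dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated
class Frozen:
    x: int


@dataclass(eq=False)  # eq=False: __hash__ is inherited from object
class Identity:
    x: int


print(Mutable.__hash__ is None)              # True
print(hash(Frozen(1)) == hash(Frozen(1)))    # True
print(Identity.__hash__ is object.__hash__)  # True

# Assigning to a field of a frozen instance is rejected.
try:
    Frozen(1).x = 2
except FrozenInstanceError as exc:
    print("frozen:", exc)

# order=True requires eq=True.
try:
    @dataclass(order=True, eq=False)
    class Broken:
        x: int
except ValueError as exc:
    print("invalid options:", exc)
```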

We have actually seen the effect of the first two parameters in the previous chapter. Let's check the parameters eq and order:

>>> @dataclass(init=True, repr=True, eq=True, order=True)
... class Elfin:
...     name: str
...     age: int
...
...     def __post_init__(self):
...         if type(self.name) is str:
...             self.identity = identity_dict[self.name]
>>> elfin_ins1 = Elfin("thirdelfin", 18)
>>> elfin_ins2 = Elfin("secondelfin", 20)
>>> elfin_ins1 == elfin_ins2
... Out[4]: False
>>> elfin_ins1 >= elfin_ins2
... Out[5]: True
>>> 

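As you can see, instances can now be ordered! The generated methods compare instances as if they were tuples of their fields, in definition order, so here the result is decided by name. Ordinary classes, by contrast, do not support ordering unless they define the comparison methods themselves: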

>>> class A:
...     def __init__(self, age):
...         self.age = age
>>> a1 = A(20)
>>> a2 = A(30)
>>> a1 > a2
... TypeError                  Traceback (most recent call last)
... <ipython-input-24-854e76ddfa09> in <module>
... ----> 1 a1 > a2
...
... TypeError: '>' not supported between instances of 'A' and 'A'
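
Since the generated ordering methods compare the fields like a tuple, the order in which the fields are declared decides the result. A small sketch, reusing the Elfin fields from above with made-up sample values:

```python
from dataclasses import dataclass


@dataclass(order=True)
class Elfin:
    name: str
    age: int


first = Elfin("thirdelfin", 18)
second = Elfin("secondelfin", 20)

# name is compared first; age is only consulted when the names are equal.
print(first >= second)  # True
print((first.name, first.age) >= (second.name, second.age))  # True

# Field order matters: instances sort by name, then by age.
people = [Elfin("b", 30), Elfin("a", 40), Elfin("a", 25)]
ordered = sorted(people)
print(ordered)
```

If sorting by age is what you actually want, declaring age before name is the simplest way to get it.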

We mentioned field above. In fact, all data class attributes are controlled by field, which represents a data entity and its meta information. Let's take a look at dataclasses.field.

4. The cornerstone of data classes -- dataclasses.field

field is defined as follows (as of Python 3.7; later versions add further parameters such as kw_only):

def field(*, default=MISSING, default_factory=MISSING, init=True, repr=True,
          hash=None, compare=True, metadata=None):
    if default is not MISSING and default_factory is not MISSING:
        raise ValueError('cannot specify both default and default_factory')
    return Field(default, default_factory, init, repr, hash, compare,
                 metadata)

Generally, we don't need to use it directly. The decorator will automatically generate fields according to the type annotations we give, but sometimes we need to customize this process, so dataclasses.field is particularly important!

Parameter description:

  • default: the default value of the field. If it is not specified, it is the sentinel MISSING (not None), which means the field has no default value;

  • default_factory: controls how the default value is generated. It takes a callable that can be called with no arguments; the callable is invoked each time a default value is needed, and its return value becomes the initial value of the field. It cannot be specified together with default.

  • init: controls whether the field becomes a parameter of __init__. In the case in the previous section, we want the self.identity attribute to exist, but we don't want to pass it to __init__, so we can use field.

    >>> from dataclasses import dataclass, field
    >>> @dataclass(init=True, repr=True, eq=True, order=True)
    ... class Elfin:
    ...     name: str
    ...     age: int
    ...     identity: str = field(init=False)
    ...
    ...     def __post_init__(self):
    ...         if type(self.name) is str:
    ...             self.identity = identity_dict[self.name]
    >>> elfin_ins3 = Elfin("firstelfin", 20)
    >>> elfin_ins3
    ... Out[6]: Elfin(name='firstelfin', age=20, identity='boss')
    
  • repr: whether the field is included in the string returned by the generated __repr__. It is included by default, as in the above case.

  • compare: whether the field is included in the generated equality and ordering methods (__eq__, __lt__ and so on).

  • hash: whether the field is included in the generated __hash__. The default None means "use the value of compare", which is normally what you want.

  • metadata: a mapping that is not used by dataclasses itself. It is meant for third-party components that read meta information from it, so we normally don't need this parameter.
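
The parameters above can be tried out together in one small class. This is only a sketch: the Account class, its fields and the "unit" metadata key are invented for the example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Account:  # hypothetical example class
    user: str
    # A fresh list per instance; a bare [] default would be rejected.
    tags: list = field(default_factory=list)
    # Hidden from repr and ignored by == (and by ordering, if generated).
    password: str = field(default="", repr=False, compare=False)
    # metadata is ignored by dataclasses; the "unit" key is made up here.
    balance: float = field(default=0.0, metadata={"unit": "EUR"})


a = Account("elfin", password="secret")
b = Account("elfin", password="other")

same = a == b            # password does not take part in the comparison
a.tags.append("admin")   # only a's list changes

print(same)    # True
print(a)       # Account(user='elfin', tags=['admin'], balance=0.0)
print(b.tags)  # []

meta = {f.name: dict(f.metadata) for f in fields(Account)}
print(meta["balance"])  # {'unit': 'EUR'}
```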

Init-only variables

If the type annotation of a field is specified as dataclasses.InitVar, the field is only used during initialization: it is a parameter of __init__ and is passed on to __post_init__. It is not a real field, so after initialization it is not stored on the instance, and it does not appear in the repr, in comparisons or in dataclasses.fields(). Accessing the name afterwards yields only the class-level default value (if one was given) or whatever __post_init__ chose to store under that name.

>>> from dataclasses import InitVar
>>> @dataclass(init=True, repr=True, eq=True, order=True)
... class Elfin:
...     name: str
...     age: int
...     identity: InitVar[str] = None
...
...     def __post_init__(self, identity):
...         if type(self.name) is str:
...             self.identity = identity_dict[self.name]
>>> elfin_ins3 = Elfin("firstelfin", 20)
>>> elfin_ins3
... Out[7]: Elfin(name='firstelfin', age=20)

Note that identity no longer appears in the repr, because it is not a field. elfin_ins3.identity still returns 'boss' here, but only because __post_init__ stores it as an ordinary instance attribute; without that assignment, the lookup would fall back to the class-level default None.
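
A more typical use of InitVar is a value that is consumed in __post_init__ and then thrown away. The sketch below assumes a hypothetical User class whose password is only used to compute a stored length.

```python
from dataclasses import dataclass, field, fields, InitVar


@dataclass
class User:  # hypothetical example class
    name: str
    password: InitVar[str]  # passed to __init__, never stored
    password_length: int = field(init=False)

    def __post_init__(self, password):
        # InitVar values arrive here as arguments, in declaration order.
        self.password_length = len(password)


u = User("elfin", "s3cret")
print(u)                            # User(name='elfin', password_length=6)
print([f.name for f in fields(u)])  # ['name', 'password_length']
print(hasattr(u, "password"))       # False
```

Because password has no default value, there is not even a class attribute to fall back to, so the name simply does not exist on the instance.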

5. Common functions of dataclass

5.1 Converting data to a dictionary: dataclasses.asdict

>>> from dataclasses import asdict
>>> asdict(elfin_ins3)
... Out[8]: {'name': 'firstelfin', 'age': 20}

5.2 Converting data to a tuple: dataclasses.astuple

>>> from dataclasses import astuple
>>> astuple(elfin_ins3)
... Out[9]: ('firstelfin', 20)
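
Both functions work recursively, which matters once data classes are nested. A short sketch with two made-up classes, Address and Person, and invented sample values:

```python
from dataclasses import dataclass, field, asdict, astuple


@dataclass
class Address:  # hypothetical example classes
    city: str


@dataclass
class Person:
    name: str
    address: Address
    nicknames: list = field(default_factory=list)


p = Person("firstelfin", Address("Chengdu"), ["boss"])

d = asdict(p)
print(d)           # {'name': 'firstelfin', 'address': {'city': 'Chengdu'}, 'nicknames': ['boss']}
print(astuple(p))  # ('firstelfin', ('Chengdu',), ['boss'])

# The result is a copy: changing it does not affect the instance.
d["nicknames"].append("chief")
print(p.nicknames)  # ['boss']
```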

5.3 Checking whether something is a dataclass: dataclasses.is_dataclass

>>> from dataclasses import is_dataclass
>>> is_dataclass(Elfin)
... Out[10]: True
>>> is_dataclass(elfin_ins3)
... Out[11]: True
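
Since is_dataclass returns True for both the class and its instances, an extra check is needed when only instances should pass. The helper name is_dataclass_instance below is our own choice, not part of the dataclasses module.

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Elfin:
    name: str


def is_dataclass_instance(obj):
    """True only for instances of a dataclass, not for the class itself."""
    return is_dataclass(obj) and not isinstance(obj, type)


print(is_dataclass_instance(Elfin("firstelfin")))  # True
print(is_dataclass_instance(Elfin))                # False
print(is_dataclass(dict))                          # False
```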

6. dataclass inheritance

One of the main reasons why Python 3.7 introduced data classes is that, compared with namedtuple, data classes enjoy the convenience of inheritance.

The dataclass decorator checks all base classes of the current class. For every base class that is itself a dataclass, its fields are collected in order, and then the fields of the current class are processed. All generated methods follow the same process. Therefore, if a field in the subclass has the same name as a field in a base class, the subclass definition unconditionally overrides the base class one, while the field keeps its original position in the field order. The subclass then regenerates its constructor from the complete list of fields, so the inherited fields are initialized there as well.

Case:

>>> @dataclass(init=True, repr=True, eq=True, order=True)
... class Elfin:
...     name: str = "firstelfin"
...     age: int = 20
...     identity: InitVar[str] = None
...
...     def __post_init__(self, identity):
...         if type(self.name) is str:
...             self.identity = identity_dict[self.name]
>>> @dataclass
... class Wude(Elfin):
...     age: int = 68
>>> Wude()
... Out[12]: Wude(name='firstelfin', age=68)

As can be seen from the above, the Wude class inherits the name field from the Elfin class, and its own default for age overrides the one defined in Elfin.
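
The field order that results from inheritance can be inspected with dataclasses.fields. The sketch below uses hypothetical Base and Child classes, and also shows a restriction that follows from the ordering: once an inherited field has a default, a new field without one is rejected.

```python
from dataclasses import dataclass, fields


@dataclass
class Base:  # hypothetical example classes
    name: str = "firstelfin"
    age: int = 20


@dataclass
class Child(Base):
    age: int = 68        # overrides the default, keeps the original position
    title: str = "boss"  # new fields are appended after the inherited ones


print([f.name for f in fields(Child)])  # ['name', 'age', 'title']
print(Child())  # Child(name='firstelfin', age=68, title='boss')

try:
    @dataclass
    class Broken(Base):
        level: int  # no default, but follows inherited fields with defaults
except TypeError as exc:
    print("rejected:", exc)
```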

7. Summary

Rational use of dataclass will greatly reduce the burden of development and liberate us from a lot of repetitive work. This is the charm of dataclass, but there are always traps behind the charm. Finally, I want to put forward some precautions:

  • A data class is unhashable by default: with the default options the generated __hash__ is set to None, so instances cannot be used as dictionary keys. If you need that, declare the data class as frozen (@dataclass(frozen=True));
  • Be careful when you define a method with the same name as one generated by dataclass;
  • When a field needs a mutable default (such as a list), use the default_factory of field;
  • The attributes of data classes are public. If a value is only needed during initialization and should not be stored on the instance, use dataclasses.InitVar.
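
Two of these precautions can be reproduced in a few lines. The Team class and the deliberately broken classes below are hypothetical and only serve as a sketch.

```python
from dataclasses import dataclass, field

# A mutable default is rejected when the class is created.
try:
    @dataclass
    class BadTeam:
        members: list = []
except ValueError as exc:
    print("rejected:", exc)


@dataclass
class Team:  # hypothetical example class
    members: list = field(default_factory=list)

    # An explicitly defined __repr__ is kept; the decorator does not replace it.
    def __repr__(self):
        return f"<Team of {len(self.members)}>"


a, b = Team(), Team()
a.members.append("firstelfin")
print(a, b)  # <Team of 1> <Team of 0>

# Defining an ordering method together with order=True is an error.
try:
    @dataclass(order=True)
    class BadOrder:
        x: int

        def __lt__(self, other):
            return self.x < other.x
except TypeError as exc:
    print("rejected:", exc)
```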

As long as we avoid these pitfalls, dataclass can certainly become a powerful tool to improve productivity.

Posted by Dimitri89 on Sat, 27 Nov 2021 20:31:31 -0800