- In the hashlib.md5(data) function, the type of the data parameter should be bytes.
Data must be converted to bytes before hashing:
    from hashlib import md5
    c = md5("helloworld")  # TypeError: Unicode-objects must be encoded before hashing
    c = md5("helloworld".encode("utf-8"))  # works: the str is encoded to bytes first
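As a minimal sketch of the point above (the variable names here are illustrative and not from the original notes), the following catches the TypeError explicitly and shows that an encoded str and the equivalent bytes literal hash to the same value:

```python
from hashlib import md5

text = "helloworld"

# Passing a str raises TypeError; catch it to show the failure explicitly.
try:
    md5(text)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding the str first produces bytes, which md5 accepts.
from_encoded = md5(text.encode("utf-8")).hexdigest()

# A bytes literal with the same ASCII content hashes identically.
from_literal = md5(b"helloworld").hexdigest()

print(str_accepted)                  # False
print(from_encoded == from_literal)  # True
```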
- Function description
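- hashlib.md5(data): creates an MD5 hash object from data
- digest(): returns the hash value as a bytes object whose length is digest_size; hexdigest() returns the same value as a hexadecimal string
- digest_size: the size of the digest in bytes (16 for MD5)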
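To make the difference between these members concrete, here is a small sketch (the variable names raw and hex_str are illustrative) that compares digest(), hexdigest() and digest_size on one hash object:

```python
import hashlib

h = hashlib.md5("64".encode("utf-8"))

raw = h.digest()         # raw digest as a bytes object
hex_str = h.hexdigest()  # the same digest as a hexadecimal string

print(h.digest_size)         # 16
print(len(raw))              # 16
print(len(hex_str))          # 32 (two hex characters per byte)
print(raw.hex() == hex_str)  # True
```

Each byte of the digest corresponds to two hexadecimal characters, which is why the hex string is twice as long as digest_size.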
Example program
    import hashlib

    a = hashlib.md5("64".encode("utf-8"))
    print(a)                # the hash object itself
    print(a.digest())       # raw digest bytes
    print(a.digest_size)    # digest length in bytes
    print(a.digest()[-1])   # last byte of the digest, as an int

    b = hashlib.md5("64".encode("utf-8"))
    print(a.digest() == b.digest())
Program output:
First output:
<md5 HASH object @ 0x000001F196B02418>
b'\xea]/\x1cF\x08#.\x07\xd3\xaa=\x99\x8eQ5'
16
53
True
Second output:
<md5 HASH object @ 0x0000016759262418>
b'\xea]/\x1cF\x08#.\x07\xd3\xaa=\x99\x8eQ5'
16
53
True
Note:
- Although the printed hash object a differs between the two runs (its memory address changes), the output of digest() is exactly the same.
- a and b are separate hash objects built from the same input, and their digest() values are identical.
Using these two points, we can build a simple password check that stores only the MD5 hash of each password rather than the plaintext:
    import hashlib

    # table maps each nickname to the md5 hash object of its password
    # Check an account
    def in_database(name, passw):
        if name in table and hashlib.md5(passw.encode("utf-8")).digest() == table[name].digest():
            print('Account in system')  # proceed with login
        else:
            print('Account and password are inconsistent')
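The check above can be tried end to end with a small self-contained sketch. The nicknames and passwords below are made-up sample data, and check_password is a hypothetical variant of the function that returns a bool instead of printing, which makes the result easier to inspect:

```python
import hashlib

# Hypothetical sample data: nickname -> md5 hash object of the password.
table = {
    "alice": hashlib.md5("s3cret".encode("utf-8")),
    "bob": hashlib.md5("hunter2".encode("utf-8")),
}

def check_password(name, passw):
    """Return True if passw hashes to the digest stored for name."""
    if name not in table:
        return False
    candidate = hashlib.md5(passw.encode("utf-8")).digest()
    return candidate == table[name].digest()

print(check_password("alice", "s3cret"))  # True
print(check_password("alice", "wrong"))   # False
print(check_password("carol", "s3cret"))  # False (unknown nickname)
```

The comparison works because hashing the same bytes always yields the same digest, so the plaintext password never has to be kept in the table.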
This program uses the hashlib library to decide whether an item goes into the test set.
If the last byte of the item's hash is less than 256 * test_ratio, the item is put into the test set. data is a pandas DataFrame.
    import hashlib
    import numpy as np

    def test_set_check(identifier, test_ratio, hash):
        # Hash the id and compare the last digest byte (0..255) with the threshold
        return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio

    def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
        ids = data[id_column]
        in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
        # Rows whose id falls below the threshold form the test set
        return data.loc[~in_test_set], data.loc[in_test_set]
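The same last-byte rule can be illustrated without pandas or NumPy. The sketch below is a standard-library approximation, not the function above: goes_to_test_set is a hypothetical helper, and it assumes each id is encoded as 8 little-endian bytes, which matches the byte layout of np.int64 only on little-endian machines.

```python
import hashlib

def goes_to_test_set(identifier, test_ratio):
    # Assumption: encode the id as 8 little-endian signed bytes.
    data = int(identifier).to_bytes(8, "little", signed=True)
    # The last digest byte is an int in 0..255; compare it with the threshold.
    return hashlib.md5(data).digest()[-1] < 256 * test_ratio

ids = range(1000)
test_ids = [i for i in ids if goes_to_test_set(i, 0.2)]
train_ids = [i for i in ids if not goes_to_test_set(i, 0.2)]

# Repeating the check gives the same split, because md5 is deterministic.
test_ids_again = [i for i in ids if goes_to_test_set(i, 0.2)]

print(len(test_ids) / len(ids))    # roughly 0.2, not exactly
print(test_ids == test_ids_again)  # True
```

Because the decision depends only on the id, an item stays in the same set across runs and after new rows are added, which a random shuffle would not guarantee.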