Python3实现哈希散列算法,包含MD5和sha256。
Hash函数算法编写
本实验是使用python来编写MD5和SHA256加密函数,并对加密函数的正确性进行验证。
验证的方式是通过和已有的标准库加密结果进行比较,如果结果相同,则加密函数正确。
1.实验目的
- 熟悉MD5和SHA256加密函数的原理和应用
- 实现MD5和SHA256加密函数并验证
2.实验工具
- Jupyter Notebook
- Python3.5
3.实验环境
- Ubuntu16.04LTS操作系统
- Python3标准库
4.实验步骤
4.1 回顾课程,查阅资料
4.2 熟悉MD5的原理
MD5(Message-Digest Algorithm)消息摘要算法是一种广泛使用的密码散列函数,可以产生出一个128位(16字节)的散列值(hash value),用于确保信息传输完整一致。
MD5将可变长度的消息处理成128位的固定长度输出。输入消息被分解成512位块(16个32位字)块;该消息被填充以使其长度可以被512整除。
填充的工作原理如下:
- 首先将单个位1附加到消息的末尾
- 接着填充0,直到消息长度比512的整数倍少64
- 在接下来64位中填充真实消息的长度值
主MD5算法在128位状态下运行,分为四个32位字,分别表示为A,B,C和D.这些字被初始化为某些固定的常量。
主算法然后依次使用每个512位消息块来修改状态。
消息块的处理由四个相似的阶段组成,称为循环;每轮由基于非线性函数F,模加法和左旋等16种类似操作组成。
- 图1说明了一轮内的一个操作流程。
- 图2表示F有四种可能的功能;每一轮使用不同的一个:
下图是非线性函数的四种可能
4.3 查阅得MD5算法的伪代码
//Note: All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
var int[64] s, K
var int i
//s specifies the per-round shift amounts
s[ 0..15] := { 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22 }
s[16..31] := { 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20 }
s[32..47] := { 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23 }
s[48..63] := { 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 }
//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
K[i] := floor(232 × abs(sin(i + 1)))
end for
//Initialize variables:
var int a0 := 0x67452301 //A
var int b0 := 0xefcdab89 //B
var int c0 := 0x98badcfe //C
var int d0 := 0x10325476 //D
//Pre-processing: adding a single 1 bit
append "1" bit to message
// Notice: the input bytes are considered as bits strings,
// where the first bit is the most significant bit of the byte.[48]
//Pre-processing: padding with zeros
append "0" bit until message length in bits ≡ 448 (mod 512)
append original length in bits mod 264 to message
//Process the message in successive 512-bit chunks:
for each 512-bit chunk of padded message
break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15
//Initialize hash value for this chunk:
var int A := a0
var int B := b0
var int C := c0
var int D := d0
//Main loop:
for i from 0 to 63
var int F, g
if 0 ≤ i ≤ 15 then
F := (B and C) or ((not B) and D)
g := i
else if 16 ≤ i ≤ 31
F := (D and B) or ((not D) and C)
g := (5×i + 1) mod 16
else if 32 ≤ i ≤ 47
F := B xor C xor D
g := (3×i + 5) mod 16
else if 48 ≤ i ≤ 63
F := C xor (B or (not D))
g := (7×i) mod 16
//Be wary of the below definitions of a,b,c,d
F := F + A + K[i] + M[g]
A := D
D := C
C := B
B := B + leftrotate(F, s[i])
end for
//Add this chunk's hash to result so far:
a0 := a0 + A
b0 := b0 + B
c0 := c0 + C
d0 := d0 + D
end for
var char digest[16] := a0 append b0 append c0 append d0 //(Output is in little-endian)
//leftrotate function definition
leftrotate (x, c)
return (x << c) binary or (x >> (32-c));
4.4 编写MD5算法测试用例
Python标准库hashlib中提供了生成MD5值的方法,是通过C语言实现的,我们用自己实现的MD5函数计算出MD5值,在和标准库的结果进行比较,如果结果相同,那么证明了MD5加密是正确的。
注意:
- 只实现了对ASCII码字符串的处理
- demo中存储了测试用例
- my_md5()返回自己生成的MD5值
- true_md5()返回Python标准库生成的MD5值
- 若两者结果不同,抛出AssertionError异常
测试程序如下:
import hashlib
def my_md5(message):
pass
def true_md5(message):
m = hashlib.md5()
m.update(message)
return m.hexdigest()
if __name__=='__main__':
demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
for message in demo:
print(my_md5(message),' <= "',message.decode('ascii'),'"', sep='')
assert true_md5(message)==my_md5(message) # 若和标准库中不同,会抛出异常
4.5 查阅sha256算法原理和伪代码
sha256是SHA2(Secure Hash Algorithm 2)算法的一个变体,具体内容。
篇幅原因不再贴sha256的伪代码,维基百科中有详细说明。
4.6 编写sha256算法的测试代码
import hashlib
def my_sha256(message):
pass
def true_sha256(message):
m = hashlib.sha256()
m.update(message)
return m.hexdigest()
if __name__=='__main__':
demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
for message in demo:
print(my_sha256(message),' <= "',message.decode('ascii'),'"', sep='')
assert true_sha256(message)==my_sha256(message) # 若和标准库中不同,会抛出异常
4.7 编写MD5和SHA256的python实现,并进行测试验证
代码和测试结果在报告最后
4.8 总结实验,编写实验报告
5. 实验总结
- 需要善于借鉴前人经验,搜集资料
- 实验只支持ascii码,对其他字符编码没有解决
- 利用优秀的工具,如Jupyter可以提高学习效率
6. 实验完整代码
5.1 MD5实现
# MD5 实现及其验证
import math
import hashlib
rotate_amounts = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]
constants = [int(abs(math.sin(i+1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
# A B C D
init_values = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476]
# 非线性函数
functions = 16*[lambda b, c, d: (b & c) | (~b & d)] + \
16*[lambda b, c, d: (d & b) | (~d & c)] + \
16*[lambda b, c, d: b ^ c ^ d] + \
16*[lambda b, c, d: c ^ (b | ~d)]
index_functions = 16*[lambda i: i] + \
16*[lambda i: (5*i + 1)%16] + \
16*[lambda i: (3*i + 5)%16] + \
16*[lambda i: (7*i)%16]
# 对x左移amount位
def left_rotate(x, amount):
x &= 0xFFFFFFFF
return ((x<<amount) | (x>>(32-amount))) & 0xFFFFFFFF
def md5(message):
message = bytearray(message) #copy our input into a mutable buffer
orig_len_in_bits = (8 * len(message)) & 0xffffffffffffffff
message.append(0x80)
while len(message)%64 != 56:
message.append(0)
message += orig_len_in_bits.to_bytes(8, byteorder='little')
hash_pieces = init_values[:]
for chunk_ofst in range(0, len(message), 64):
a, b, c, d = hash_pieces
chunk = message[chunk_ofst:chunk_ofst+64]
for i in range(64):
f = functions[i](b, c, d)
g = index_functions[i](i)
to_rotate = a + f + constants[i] + int.from_bytes(chunk[4*g:4*g+4], byteorder='little')
new_b = (b + left_rotate(to_rotate, rotate_amounts[i])) & 0xFFFFFFFF
a, b, c, d = d, new_b, b, c
for i, val in enumerate([a, b, c, d]):
hash_pieces[i] += val
hash_pieces[i] &= 0xFFFFFFFF
return sum(x<<(32*i) for i, x in enumerate(hash_pieces))
def md5_to_hex(digest):
raw = digest.to_bytes(16, byteorder='little')
return '{:032x}'.format(int.from_bytes(raw, byteorder='big'))
def true_md5(message):
m = hashlib.md5()
m.update(message)
return m.hexdigest()
def my_md5(message):
return md5_to_hex(md5(message))
if __name__=='__main__':
demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
for message in demo:
print(my_md5(message),' <= "',message.decode('ascii'),'"', sep='')
assert true_md5(message)==my_md5(message) # 若和标准库中不同,会抛出异常
print('\nMD5测试全部通过')
# d41d8cd98f00b204e9800998ecf8427e <= ""
# 0cc175b9c0f1b6a831c399e269772661 <= "a"
# 900150983cd24fb0d6963f7d28e17f72 <= "abc"
# f96b697d7cb7938d525a2f31aaf161d0 <= "message digest"
# c3fcd3d76192e4007dfb496cca67e13b <= "abcdefghijklmnopqrstuvwxyz"
# d174ab98d277d9f5a5611c2c9f419d9f <= "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
# 57edf4a22be3c955ac49da2e2107b67a <= "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
#
# MD5测试全部通过
5.2 Sha256实现
import copy
import struct
import binascii
import hashlib
F32 = 0xFFFFFFFF
_k = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]
_h = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
def _pad(msglen):
mdi = msglen & 0x3F
length = struct.pack('!Q', msglen << 3)
if mdi < 56:
padlen = 55 - mdi
else:
padlen = 119 - mdi
return b'\x80' + (b'\x00' * padlen) + length
def _rotr(x, y):
return ((x >> y) | (x << (32 - y))) & F32
def _maj(x, y, z):
return (x & y) ^ (x & z) ^ (y & z)
def _ch(x, y, z):
return (x & y) ^ ((~x) & z)
class SHA256:
_output_size = 8
blocksize = 1
block_size = 64
digest_size = 32
def __init__(self, m=None):
self._counter = 0
self._cache = b''
self._k = copy.deepcopy(_k)
self._h = copy.deepcopy(_h)
self.update(m)
def _compress(self, c):
w = [0] * 64
w[0:16] = struct.unpack('!16L', c)
for i in range(16, 64):
s0 = _rotr(w[i-15], 7) ^ _rotr(w[i-15], 18) ^ (w[i-15] >> 3)
s1 = _rotr(w[i-2], 17) ^ _rotr(w[i-2], 19) ^ (w[i-2] >> 10)
w[i] = (w[i-16] + s0 + w[i-7] + s1) & F32
a, b, c, d, e, f, g, h = self._h
for i in range(64):
s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
t2 = s0 + _maj(a, b, c)
s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
t1 = h + s1 + _ch(e, f, g) + self._k[i] + w[i]
h = g
g = f
f = e
e = (d + t1) & F32
d = c
c = b
b = a
a = (t1 + t2) & F32
for i, (x, y) in enumerate(zip(self._h, [a, b, c, d, e, f, g, h])):
self._h[i] = (x + y) & F32
def update(self, m):
if not m:
return
self._counter += len(m)
m = self._cache + m
for i in range(0, len(m) // 64):
self._compress(m[64 * i:64 * (i + 1)])
self._cache = m[-(len(m) % 64):]
def digest(self):
r = copy.deepcopy(self)
r.update(_pad(self._counter))
data = [struct.pack('!L', i) for i in r._h[:self._output_size]]
return b''.join(data)
def hexdigest(self):
return binascii.hexlify(self.digest()).decode('ascii')
def true_sha256(message):
m = hashlib.sha256()
m.update(message)
return m.hexdigest()
def my_sha256(message):
return SHA256(message).hexdigest()
if __name__=='__main__':
demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
for message in demo:
print(my_sha256(message),' <= "',message.decode('ascii'),'"', sep='')
assert true_sha256(message)==my_sha256(message) # 若和标准库中不同,会抛出异常
print('\nSHA256测试全部通过')
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 <= ""
# ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb <= "a"
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad <= "abc"
# f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 <= "message digest"
# 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 <= "abcdefghijklmnopqrstuvwxyz"
# db4bfcbd4da0cd85a60c3c37d3fbd8805c77f15fc6b1fdfe614ee0a7c8fdb4c0 <= "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
# f371bc4a311f2b009eef952dd83ca80e2b60026c8e935592d0f9c308453c813e <= "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
#
# SHA256测试全部通过