Python3实现哈希散列算法,包含MD5和sha256。

Hash函数算法编写

本实验是使用python来编写MD5和SHA256加密函数,并对加密函数的正确性进行验证。
验证的方式是通过和已有的标准库加密结果进行比较,如果结果相同,则加密函数正确。

1.实验目的

  1. 熟悉MD5和SHA256加密函数的原理和应用
  2. 实现MD5和SHA256加密函数并验证

2.实验工具

  • Jupyter Notebook
  • Python3.5

3.实验环境

  • Ubuntu16.04LTS操作系统
  • Python3标准库

4.实验步骤

4.1 回顾课程,查阅资料

4.2 熟悉MD5的原理

MD5(Message-Digest Algorithm)消息摘要算法是一种广泛使用的密码散列函数,可以产生出一个128位(16字节)的散列值(hash value),用于确保信息传输完整一致。
MD5将可变长度的消息处理成128位的固定长度输出。输入消息被分解成512位块(16个32位字)块;该消息被填充以使其长度可以被512整除。
填充的工作原理如下:

  • 首先将单个位1附加到消息的末尾
  • 接着填充0,直到消息长度比512的整数倍少64
  • 在接下来64位中填充真实消息的长度值

主MD5算法在128位状态下运行,分为四个32位字,分别表示为A,B,C和D.这些字被初始化为某些固定的常量。
主算法然后依次使用每个512位消息块来修改状态。
消息块的处理由四个相似的阶段组成,称为循环;每轮由基于非线性函数F,模加法和左旋等16种类似操作组成。

  • 图1说明了一轮内的一个操作流程。
  • 图2表示F有四种可能的功能;每一轮使用不同的一个:

Drawing

下图是非线性函数的四种可能

Drawing

4.3 查阅得MD5算法的伪代码

//Note: All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
var int[64] s, K
var int i

//s specifies the per-round shift amounts
s[ 0..15] := { 7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22 }
s[16..31] := { 5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20 }
s[32..47] := { 4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23 }
s[48..63] := { 6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21 }

//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
    K[i] := floor(232 × abs(sin(i + 1)))
end for

//Initialize variables:
var int a0 := 0x67452301   //A
var int b0 := 0xefcdab89   //B
var int c0 := 0x98badcfe   //C
var int d0 := 0x10325476   //D

//Pre-processing: adding a single 1 bit
append "1" bit to message    
// Notice: the input bytes are considered as bits strings,
//  where the first bit is the most significant bit of the byte.[48]

//Pre-processing: padding with zeros
append "0" bit until message length in bits  448 (mod 512)
append original length in bits mod 264 to message

//Process the message in successive 512-bit chunks:
for each 512-bit chunk of padded message
    break chunk into sixteen 32-bit words M[j], 0  j  15
//Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0
//Main loop:
    for i from 0 to 63
        var int F, g
        if 0  i  15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16  i  31
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32  i  47
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48  i  63
            F := C xor (B or (not D))
            g := (7×i) mod 16
//Be wary of the below definitions of a,b,c,d
        F := F + A + K[i] + M[g]
        A := D
        D := C
        C := B
        B := B + leftrotate(F, s[i])
    end for
//Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 //(Output is in little-endian)

//leftrotate function definition
leftrotate (x, c)
    return (x << c) binary or (x >> (32-c));

4.4 编写MD5算法测试用例

Python标准库hashlib中提供了生成MD5值的方法,是通过C语言实现的,我们用自己实现的MD5函数计算出MD5值,在和标准库的结果进行比较,如果结果相同,那么证明了MD5加密是正确的。
注意:

  • 只实现了对ASCII码字符串的处理
  • demo中存储了测试用例
  • my_md5()返回自己生成的MD5值
  • true_md5()返回Python标准库生成的MD5值
  • 若两者结果不同,抛出AssertionError异常

测试程序如下:

import hashlib

def my_md5(message):
    pass

def true_md5(message):
    m = hashlib.md5()
    m.update(message)
    return m.hexdigest()
    
if __name__=='__main__':
    demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
            b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
    for message in demo:
        print(my_md5(message),' <= "',message.decode('ascii'),'"', sep='')
        assert true_md5(message)==my_md5(message)  # 若和标准库中不同,会抛出异常

4.5 查阅sha256算法原理和伪代码

sha256是SHA2(Secure Hash Algorithm 2)算法的一个变体,具体内容
篇幅原因不再贴sha256的伪代码,维基百科中有详细说明。

4.6 编写sha256算法的测试代码

import hashlib

def my_sha256(message):
    pass

def true_sha256(message):
    m = hashlib.sha256()
    m.update(message)
    return m.hexdigest()
    
if __name__=='__main__':
    demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
            b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
    for message in demo:
        print(my_sha256(message),' <= "',message.decode('ascii'),'"', sep='')
        assert true_sha256(message)==my_sha256(message)  # 若和标准库中不同,会抛出异常

4.7 编写MD5和SHA256的python实现,并进行测试验证

代码和测试结果在报告最后

4.8 总结实验,编写实验报告

5. 实验总结

  • 需要善于借鉴前人经验,搜集资料
  • 实验只支持ascii码,对其他字符编码没有解决
  • 利用优秀的工具,如Jupyter可以提高学习效率

6. 实验完整代码

5.1 MD5实现


# MD5 实现及其验证
import math
import hashlib
 
rotate_amounts = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
                  5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
                  4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
                  6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]
 
constants = [int(abs(math.sin(i+1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

# A B C D
init_values = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476]
# 非线性函数
functions = 16*[lambda b, c, d: (b & c) | (~b & d)] + \
            16*[lambda b, c, d: (d & b) | (~d & c)] + \
            16*[lambda b, c, d: b ^ c ^ d] + \
            16*[lambda b, c, d: c ^ (b | ~d)]
 
index_functions = 16*[lambda i: i] + \
                  16*[lambda i: (5*i + 1)%16] + \
                  16*[lambda i: (3*i + 5)%16] + \
                  16*[lambda i: (7*i)%16]
# 对x左移amount位
def left_rotate(x, amount):
    x &= 0xFFFFFFFF
    return ((x<<amount) | (x>>(32-amount))) & 0xFFFFFFFF
 
def md5(message):
 
    message = bytearray(message) #copy our input into a mutable buffer
    orig_len_in_bits = (8 * len(message)) & 0xffffffffffffffff
    message.append(0x80)
    while len(message)%64 != 56:
        message.append(0)
    message += orig_len_in_bits.to_bytes(8, byteorder='little')
 
    hash_pieces = init_values[:]
 
    for chunk_ofst in range(0, len(message), 64):
        a, b, c, d = hash_pieces
        chunk = message[chunk_ofst:chunk_ofst+64]
        for i in range(64):
            f = functions[i](b, c, d)
            g = index_functions[i](i)
            to_rotate = a + f + constants[i] + int.from_bytes(chunk[4*g:4*g+4], byteorder='little')
            new_b = (b + left_rotate(to_rotate, rotate_amounts[i])) & 0xFFFFFFFF
            a, b, c, d = d, new_b, b, c
        for i, val in enumerate([a, b, c, d]):
            hash_pieces[i] += val
            hash_pieces[i] &= 0xFFFFFFFF
 
    return sum(x<<(32*i) for i, x in enumerate(hash_pieces))
 
def md5_to_hex(digest):
    raw = digest.to_bytes(16, byteorder='little')
    return '{:032x}'.format(int.from_bytes(raw, byteorder='big'))

def true_md5(message):
    m = hashlib.md5()
    m.update(message)
    return m.hexdigest()

def my_md5(message):
    return md5_to_hex(md5(message))

if __name__=='__main__':
    demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
            b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
    for message in demo:
        print(my_md5(message),' <= "',message.decode('ascii'),'"', sep='')
        assert true_md5(message)==my_md5(message)  # 若和标准库中不同,会抛出异常
    print('\nMD5测试全部通过')

# d41d8cd98f00b204e9800998ecf8427e <= ""
# 0cc175b9c0f1b6a831c399e269772661 <= "a"
# 900150983cd24fb0d6963f7d28e17f72 <= "abc"
# f96b697d7cb7938d525a2f31aaf161d0 <= "message digest"
# c3fcd3d76192e4007dfb496cca67e13b <= "abcdefghijklmnopqrstuvwxyz"
# d174ab98d277d9f5a5611c2c9f419d9f <= "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
# 57edf4a22be3c955ac49da2e2107b67a <= "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
# 
# MD5测试全部通过

5.2 Sha256实现


import copy
import struct
import binascii
import hashlib


F32 = 0xFFFFFFFF

_k = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
      0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
      0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
      0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
      0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
      0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
      0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
      0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
      0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
      0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
      0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
      0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
      0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
      0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
      0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
      0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]

_h = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]


def _pad(msglen):
    mdi = msglen & 0x3F
    length = struct.pack('!Q', msglen << 3)

    if mdi < 56:
        padlen = 55 - mdi
    else:
        padlen = 119 - mdi

    return b'\x80' + (b'\x00' * padlen) + length


def _rotr(x, y):
    return ((x >> y) | (x << (32 - y))) & F32


def _maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def _ch(x, y, z):
    return (x & y) ^ ((~x) & z)


class SHA256:
    _output_size = 8
    blocksize = 1
    block_size = 64
    digest_size = 32

    def __init__(self, m=None):
        self._counter = 0
        self._cache = b''
        self._k = copy.deepcopy(_k)
        self._h = copy.deepcopy(_h)

        self.update(m)

    def _compress(self, c):
        w = [0] * 64
        w[0:16] = struct.unpack('!16L', c)

        for i in range(16, 64):
            s0 = _rotr(w[i-15], 7) ^ _rotr(w[i-15], 18) ^ (w[i-15] >> 3)
            s1 = _rotr(w[i-2], 17) ^ _rotr(w[i-2], 19) ^ (w[i-2] >> 10)
            w[i] = (w[i-16] + s0 + w[i-7] + s1) & F32

        a, b, c, d, e, f, g, h = self._h

        for i in range(64):
            s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
            t2 = s0 + _maj(a, b, c)
            s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
            t1 = h + s1 + _ch(e, f, g) + self._k[i] + w[i]

            h = g
            g = f
            f = e
            e = (d + t1) & F32
            d = c
            c = b
            b = a
            a = (t1 + t2) & F32

        for i, (x, y) in enumerate(zip(self._h, [a, b, c, d, e, f, g, h])):
            self._h[i] = (x + y) & F32

    def update(self, m):
        if not m:
            return

        self._counter += len(m)
        m = self._cache + m

        for i in range(0, len(m) // 64):
            self._compress(m[64 * i:64 * (i + 1)])
        self._cache = m[-(len(m) % 64):]

    def digest(self):
        r = copy.deepcopy(self)
        r.update(_pad(self._counter))
        data = [struct.pack('!L', i) for i in r._h[:self._output_size]]
        return b''.join(data)

    def hexdigest(self):
        return binascii.hexlify(self.digest()).decode('ascii')
    
    
def true_sha256(message):
    m = hashlib.sha256()
    m.update(message)
    return m.hexdigest()

def my_sha256(message):
    return SHA256(message).hexdigest()
    
if __name__=='__main__':
    demo = [b"", b"a", b"abc", b"message digest", b"abcdefghijklmnopqrstuvwxyz",
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
            b"12345678901234567890123456789012345678901234567890123456789012345678901234567890"]
    for message in demo:
        print(my_sha256(message),' <= "',message.decode('ascii'),'"', sep='')
        assert true_sha256(message)==my_sha256(message)  # 若和标准库中不同,会抛出异常
    print('\nSHA256测试全部通过')

# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 <= ""
# ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb <= "a"
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad <= "abc"
# f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 <= "message digest"
# 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 <= "abcdefghijklmnopqrstuvwxyz"
# db4bfcbd4da0cd85a60c3c37d3fbd8805c77f15fc6b1fdfe614ee0a7c8fdb4c0 <= "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
# f371bc4a311f2b009eef952dd83ca80e2b60026c8e935592d0f9c308453c813e <= "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
# 
# SHA256测试全部通过