Weekly Notes, 2017-04-24

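Downloading WeChat Pay and Alipay Reconciliation Statements

Background: I was helping a friend download a merchant's daily statements from WeChat Pay and Alipay.
WeChat statement API
Alipay statement API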

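WeChat statements

One of the request parameters of the statement API is a signature, which the server uses to verify that the request is legitimate. The signature is MD5(request parameters + user-defined secret key). Because MD5 is one-way, the key cannot be recovered from the signature, and that is what keeps the scheme safe.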
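
To make the signing step concrete, here is a minimal sketch of a generic signing helper in Python 3. The name wechat_sign and the sample values are made up for illustration. Sorting the parameters by name and skipping empty values is my understanding of the signing rule; the script in these notes simply hard-codes the parameters in that order.

```python
import hashlib


def wechat_sign(params, key):
    """Return the upper-case MD5 signature for a dict of request parameters."""
    # Drop empty values and the signature field itself, then sort by name.
    items = sorted(
        (k, v) for k, v in params.items()
        if v not in (None, '') and k != 'sign'
    )
    query = '&'.join('%s=%s' % (k, v) for k, v in items)
    # The secret key is appended last, outside the sorted part.
    to_sign = '%s&key=%s' % (query, key)
    return hashlib.md5(to_sign.encode('utf-8')).hexdigest().upper()


# Hypothetical sample values, not real credentials.
example = {
    'appid': 'wx_demo_appid',
    'mch_id': '1900000000',
    'bill_date': '20170424',
    'bill_type': 'ALL',
    'nonce_str': 'abc123',
}
signature = wechat_sign(example, 'demo_key')
print(signature)
```

The result is a 32-character upper-case hex string that goes into the sign field of the request.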

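If the request fails, the API returns a status code and an error message; if it succeeds, it returns the statement directly as plain text (CSV).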

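Statement contents:
Transaction time,Official account ID,Merchant ID,Sub-merchant ID,Device ID,WeChat order ID,Merchant order ID,User ID,Transaction type,Transaction status,Paying bank,Currency,Total amount,Voucher or instant-discount amount,WeChat refund ID,Merchant refund ID,Refund amount,Voucher or instant-discount refund amount,Refund type,Refund status,Product name,Merchant data,Fee,Rate
.......................................
Total transactions,Total transaction amount,Total refund amount,Total voucher or instant-discount refund amount,Total fees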
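
The layout above (a header line, the data rows, then a summary header and a summary row) can be split apart with a few lines of Python 3. This is only a sketch: parse_bill is a hypothetical helper, the sample text uses shortened made-up columns, and it assumes lines end with \r\n and that no field contains a comma.

```python
def parse_bill(text):
    """Split statement text into a list of row dicts and a summary dict."""
    lines = [line for line in text.split('\r\n') if line]
    header = lines[0].split(',')
    # The last two non-empty lines are the summary header and the summary row.
    summary = dict(zip(lines[-2].split(','), lines[-1].split(',')))
    records = [dict(zip(header, line.split(','))) for line in lines[1:-2]]
    return records, summary


# Made-up sample with only three columns, for illustration.
sample = (
    'trade_time,order_id,amount\r\n'
    '2017-04-24 10:00:00,1001,5.00\r\n'
    '2017-04-24 11:30:00,1002,7.50\r\n'
    'total_orders,total_amount\r\n'
    '2,12.50\r\n'
)
records, summary = parse_bill(sample)
print(records[0]['order_id'], summary['total_amount'])
```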

Python code (2.7):
ExportBill.py

# -*- coding:utf-8 -*-
# Call the WeChat API to download the statement
# pip install requests
# pip install mysql-connector-python-rf
# pip install MySQL-python

import sys
import requests
import uuid
import hashlib
from MysqlUtil import Mysql

reload(sys)
sys.setdefaultencoding('utf8')

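# Official account ID
appid = 'xxx'

# Merchant ID
mch_id = 'xxx'

# Secret key
key = 'xxx'

# Statement type
# ALL: all orders of the day (default)
# SUCCESS: orders paid successfully that day
# REFUND: orders refunded that day
# RECHARGE_REFUND: recharge refunds of the day (has one extra column, "returned fee", compared with the other types)
bill_type = 'ALL'

# Random string, at most 32 characters
nonce_str = str(uuid.uuid4()).replace('-', '')

# Statement download endpoint
url = 'https://api.mch.weixin.qq.com/pay/downloadbill'

# Statement date
bill_date = 'xxx'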

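# Request body (XML); the fields are filled in the order used in __main__
params = '''
    <xml>
      <appid>%s</appid>
      <bill_date>%s</bill_date>
      <bill_type>%s</bill_type>
      <mch_id>%s</mch_id>
      <nonce_str>%s</nonce_str>
      <sign>%s</sign>
    </xml>
'''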

encoding = 'utf-8'

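# Build the signature: MD5 over the name-sorted parameters plus the secret key
def md5_count():
    strs = 'appid=%s&bill_date=%s&bill_type=%s&mch_id=%s&nonce_str=%s&key=%s' % (
        appid, bill_date, bill_type, mch_id, nonce_str, key)
    # Use a fresh hash object per call so repeated calls do not accumulate input
    return hashlib.md5(strs).hexdigest().upper()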


# Persist the statement rows to MySQL
def persistence(text):
    con = None  # keeps the finally block safe if the connection fails
    try:
        con = Mysql(user='root', password='', host='127.0.0.1', port=3306, database='test')
        print 'Persisting data:'
        lines = text.split('\r\n')
        print 'Header: %s' % lines.pop(0)
        # Stop before the summary section at the end of the text
        while len(lines) > 3:
            line = lines.pop(0)
            con.save_list('insert into YM_wx_pay values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,'
                          '%s,%s,%s,%s,%s,%s,%s,%s)', line.split(','))
        print '-----------------------'
        print lines.pop(0)  # summary header
        print lines.pop(0)  # summary row
    finally:
        if con:
            con.close()


if __name__ == '__main__':
    params = params % (appid, bill_date, bill_type, mch_id, nonce_str, md5_count())
    response = requests.post(url, data=params)
    response.encoding = encoding
    # An error response contains return_code; a successful one is plain CSV text
    if response.text and 'return_code' not in response.text:
        persistence(response.text)
    else:
        print response.text
        print 'The API returned an error!'

MysqlUtil.py

# -*- coding:utf-8 -*-
import mysql.connector


class Mysql(object):
    def __init__(self, user, password, host, port, database):
        self.__conn__ = mysql.connector.connect(user=user, password=password,
                                                host=host, port=port,
                                                database=database)
        self.__cursor__ = self.__conn__.cursor()

    def save_list(self, sql, params):
        self.__cursor__.execute(sql, params)
        self.__conn__.commit()

    def close(self):
        self.__cursor__.close()
        self.__conn__.close()

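Alipay statements

Alipay requires you to pull in its SDK and fetch the statement by following the sample code. The SDK is currently available for Java, .NET and PHP, so it is less flexible than the WeChat API, and the almighty Python cannot be used here.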

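Sogou Search

When we start exploring an unfamiliar field, the first step is usually to look for material online, and Google is undoubtedly a great search tool. These days, however, a lot of solid content is published on Zhihu or on WeChat official accounts, and Google does not cover those two sources well. Sogou Search can search both WeChat official accounts and Zhihu, so it makes a good complement.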

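A few things about threads in Python

I ran a data job with a Python script and it took more than two hours. Looking at the time spent in each step afterwards, most of it was waiting for SQL statements to execute. My first thought was to use multiple threads to raise concurrency and cut the time the CPU spends waiting. A better approach is coroutines, which raise concurrency without causing frequent CPU context switches.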
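
As a rough illustration of the coroutine idea, here is a small Python 3 sketch (it needs 3.7+ for asyncio.run, so it is not the 2.7 used elsewhere in these notes). fake_query is a stand-in that just sleeps; a real job would also need a database driver that supports asyncio, which I am assuming rather than showing.

```python
import asyncio
import time


async def fake_query(name, seconds):
    # Stand-in for a slow SQL call: yields control while "waiting on I/O".
    await asyncio.sleep(seconds)
    return name


async def run_all():
    # All three waits overlap on a single thread.
    return await asyncio.gather(
        fake_query('q1', 0.2),
        fake_query('q2', 0.2),
        fake_query('q3', 0.2),
    )


start = time.time()
results = asyncio.run(run_all())
elapsed = time.time() - start
print(results, 'took %.2fs' % elapsed)
```

The three waits overlap, so the total is close to the longest single wait rather than the sum of all three.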

Starting a thread:

import threading

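def print_str():
    print 'hello world %s.' % threading.current_thread().name


t = threading.Thread(target=print_str)  # Create a thread; much simpler than in Java

t.start()  # Start the thread
t.join()  # The main thread waits until thread t finishes
t.join(1)  # The main thread waits for thread t, for at most 1 second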

Locks:

lock = threading.Lock()  # Create a lock
lock.acquire()  # Acquire the lock
lock.release()  # Release the lock; usually done in a finally block
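
A lock can also be used as a context manager, which releases it even if the body raises and so replaces the explicit try/finally. A minimal Python 3 sketch with a made-up shared counter:

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # "with lock" acquires on entry and always releases on exit.
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```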

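Thread-local variables:

local = threading.local()  # Thread-local storage; commonly used in HTTP request handling
local.name = 'xxx'

try:
    age = local.age  # Note: reading an attribute that was never set on local raises an exception
except AttributeError:
    pass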
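
To show that each thread really gets its own copy, here is a short Python 3 sketch. The worker names and the seen dictionary are made up for the example, and getattr with a default is used as an alternative to catching AttributeError.

```python
import threading

local = threading.local()
seen = {}


def worker(name):
    local.name = name  # visible only inside this thread
    # "age" was never set in this thread, so the default is returned.
    seen[name] = (local.name, getattr(local, 'age', None))


local.name = 'main'
threads = [threading.Thread(target=worker, args=(n,)) for n in ('worker-1', 'worker-2')]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen, local.name)
```

The main thread's local.name is untouched by the workers.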

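Thread pool:

from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool()  # By default the number of threads equals the number of CPU cores

def test(url):
    print url


results = pool.map(test, ['http://www.baidu.com', 'http://www.sina.com', 'http://www.qq.com'])  # The worker threads process the strings in the list concurrently
pool.close()  # No more tasks will be submitted; must be called before join()
pool.join()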

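Notes:
Because of the Python GIL, only one thread executes Python bytecode at a time, so multithreading is not truly parallel and CPU-bound code effectively still runs on a single CPU.
Threads are a good fit for I/O-bound tasks, which spend most of their time on I/O (database I/O, for example) and little on CPU computation.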
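
A quick way to see the I/O-bound benefit is to time a pool of threads that only wait. This Python 3 sketch uses the same multiprocessing.dummy pool; fake_io is a stand-in for a database call, and the numbers are only illustrative.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_io(task_id):
    # time.sleep releases the GIL, just like waiting on a socket or a database.
    time.sleep(0.2)
    return task_id


pool = ThreadPool(4)
start = time.time()
results = pool.map(fake_io, range(4))
elapsed = time.time() - start
pool.close()
pool.join()

print(results, 'took %.2fs' % elapsed)
```

Run one after another, the four tasks would need about 0.8 seconds; with four threads the waits overlap.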

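Java PriorityQueue vs. DelayQueue

PriorityQueue: a priority queue. As mentioned in an earlier post, it is essentially a binary heap whose root is the smallest element; adding an element and removing the head both take O(log n) time.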

        PriorityQueue<Integer> priorityQueue = new PriorityQueue<>();  // Without a comparator, elements are ordered by their compareTo method.
        priorityQueue.add(10);
        priorityQueue.add(5);
        priorityQueue.add(3);
        System.out.println(priorityQueue.peek());  // Look at the head of the heap without removing it
        System.out.println(priorityQueue.poll());  // Remove and return the head of the heap
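
For comparison with the Python sections above, the same binary min-heap behaviour can be sketched with Python's heapq module. This is an analogue for illustration, not how the Java class is implemented.

```python
import heapq

heap = []
for value in (10, 5, 3):
    heapq.heappush(heap, value)  # O(log n) insert

smallest = heap[0]             # like peek(): look at the root without removing it
popped = heapq.heappop(heap)   # like poll(): remove and return the root
next_smallest = heap[0]

print(smallest, popped, next_smallest)
```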

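DelayQueue: a delay queue, built on top of PriorityQueue.

A simple illustration: suppose there are three tasks. Task 1 must run after 5 minutes, task 2 after 1 minute and task 3 after 10 minutes. Add all three to a DelayQueue; it then releases task 2 after 1 minute, task 1 after 5 minutes and task 3 after 10 minutes.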

    static class T implements Delayed {

        public long t;

        public T(long t) {
            this.t = t;
        }

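        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining delay, converted from nanoseconds into the requested unit
            return unit.convert(t - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), o.getDelay(TimeUnit.NANOSECONDS));
        }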
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<T> delayQueue = new DelayQueue<>();
        long time = System.nanoTime();
        delayQueue.add(new T(time + 1000000000L));  // 1 second
        delayQueue.add(new T(time + 10000000000L));  // 10 seconds
        delayQueue.add(new T(time + 5000000000L));  // 5 seconds
        System.out.println(delayQueue.take().t);  // take() blocks until the head element expires
        System.out.println(delayQueue.take().t);
        System.out.println(delayQueue.take().t);
    }

How does take() block while no element has expired, and then wake up and pop an element once one has?

    public E take() throws InterruptedException {
        final ReentrantLock lock = this.lock;
        lock.lockInterruptibly();  // Lock to avoid multithreading problems
        try {
            for (;;) {
                E first = q.peek();
                if (first == null)
                    available.await();
                else {
                    long delay = first.getDelay(NANOSECONDS);  // Remaining delay of the head element
                    if (delay <= 0)
                        return q.poll();  // Already expired: pop it
                    first = null; // don't retain ref while waiting
                    if (leader != null)
                        available.await();
                    else {
                        Thread thisThread = Thread.currentThread();
                        leader = thisThread;
                        try {
                            available.awaitNanos(delay);  // Not yet due: wait for (due time - current time), then wake up automatically. Note: the unit is nanoseconds
                        } finally {
                            if (leader == thisThread)
                                leader = null;
                        }
                    }
                }
            }
        } finally {
            if (leader == null && q.peek() != null)
                available.signal();
            lock.unlock();
        }
    }
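
The core of that loop (peek at the head, return it if it is due, otherwise wait for exactly the remaining delay) can be sketched in Python 3 with heapq and threading.Condition. SimpleDelayQueue is a made-up name, and the sketch leaves out the leader-thread optimization from the Java source, so every waiting thread uses a timed wait.

```python
import heapq
import threading
import time


class SimpleDelayQueue:
    def __init__(self):
        self._heap = []  # entries are (due_time, sequence, item)
        self._cond = threading.Condition()
        self._seq = 0    # tie-breaker so items themselves are never compared

    def put(self, item, delay_seconds):
        due = time.monotonic() + delay_seconds
        with self._cond:
            heapq.heappush(self._heap, (due, self._seq, item))
            self._seq += 1
            self._cond.notify()  # the head may have changed; wake a waiter

    def take(self):
        with self._cond:
            while True:
                if not self._heap:
                    self._cond.wait()  # empty: block until something is put
                    continue
                remaining = self._heap[0][0] - time.monotonic()
                if remaining <= 0:
                    return heapq.heappop(self._heap)[2]  # due: pop it
                self._cond.wait(remaining)  # not due: sleep for the remaining delay


# The three-task example from above, scaled down from minutes to fractions of a second.
q = SimpleDelayQueue()
q.put('task1', 0.15)
q.put('task2', 0.05)
q.put('task3', 0.25)
order = [q.take() for _ in range(3)]
print(order)
```

The tasks come out in order of their due time, not in the order they were added.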

Weekly Notes, 2017-04-24 to 2017-04-30