Fragmented Notes

A place to jot down problems I run into in day-to-day work, so I can look them up later or write them up properly when needed.

Technical

1. Auto-incrementing primary key in Postgres

https://zhiwei.li/text/2012/02/15/postgresql%E4%B8%BB%E9%94%AE%E8%87%AA%E5%A2%9E/

Declare the column with the data type SERIAL.

2. Chinese text is garbled when viewed in vim on a server

References

http://www.cnblogs.com/joeyupdo/archive/2013/03/03/2941737.html
http://www.jianshu.com/p/Rww1Tp
https://www.zhihu.com/question/22363620

Add the following to ~/.vimrc:

set fileencodings=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936
set termencoding=utf-8
set encoding=utf-8

3. Decoding Chinese characters in a URL

Reference

http://www.cnblogs.com/stemon/p/6602185.html

Solution in Python (Python 3)

Just use urllib.parse.unquote.

For example:

>>> import urllib.parse
>>> urllib.parse.unquote('%E9%93%B8%E9%95%B0%E9%BE%99')
'铸镰龙'
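
The reverse direction, and URLs that were not encoded as UTF-8, can be handled with the same module. The sketch below assumes the GBK case purely for illustration; the original note only covers UTF-8.

```python
from urllib.parse import quote, unquote

text = '铸镰龙'

# quote() percent-encodes the UTF-8 bytes of the string by default
encoded = quote(text)

# unquote() reverses it, also assuming UTF-8 unless told otherwise
decoded = unquote(encoded)

# For a URL produced by a GBK page, pass the encoding explicitly (assumed case)
gbk_encoded = quote(text, encoding='gbk')
gbk_decoded = unquote(gbk_encoded, encoding='gbk')
```

If unquote returns replacement characters, the URL was probably encoded with something other than UTF-8.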

4. Buffering large files

Reference:

https://chenqx.github.io/2014/10/29/Python-fastest-way-to-read-a-large-file/

Didn't end up using it, though...
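
For the record, one common way to keep memory bounded is to read the file in fixed-size chunks. This is a minimal sketch and not necessarily the approach in the linked post; read_in_chunks and the chunk size are names and values chosen here for illustration.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Demo on a small temporary file standing in for a large one
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 2500)
    demo_path = tmp.name

chunk_sizes = [len(c) for c in read_in_chunks(demo_path, chunk_size=1000)]
os.remove(demo_path)
```

Only one chunk is held in memory at a time, so the file size does not matter.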

5. Reading tar and zip archives

tar: https://docs.python.org/3/library/tarfile.html

zip: https://docs.python.org/3/library/zipfile.html
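
A minimal sketch of reading both formats with the standard library. The archives are built in memory here only so the example is self-contained; the file name hello.txt is made up.

```python
import io
import tarfile
import zipfile

# Build a small zip archive in memory, then read it back
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, 'w') as zf:
    zf.writestr('hello.txt', 'hello from zip')
zip_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()
    zip_text = zf.read('hello.txt').decode('utf-8')

# Same idea for a gzip-compressed tar archive
tar_buf = io.BytesIO()
payload = b'hello from tar'
with tarfile.open(fileobj=tar_buf, mode='w:gz') as tf:
    info = tarfile.TarInfo(name='hello.txt')
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
tar_buf.seek(0)
with tarfile.open(fileobj=tar_buf, mode='r:gz') as tf:
    tar_names = tf.getnames()
    tar_text = tf.extractfile('hello.txt').read().decode('utf-8')
```

With a real file on disk, pass the path instead of a buffer: zipfile.ZipFile(path) or tarfile.open(path).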

6. Python re

(?P<regex_name>{regex}) captures whatever the regular expression {regex} matches and names the group regex_name, which makes it easy to retrieve, e.g.:

import re
pattern = 'baidu_(?P<part>.+)_info'
items = re.search(pattern, 'baidu_baike_info')
item = items.group('part')  # assert item == 'baike'

https://docs.python.org/3/howto/regex.html

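7. Invisible characters cause a database insert to fail

Something like this, for the text to be written:

Characters as displayed: 我是一个作者
Actual string: "我是一个\x00作者"

The NUL character \x00 (hex 00) is invisible, but trying to insert it into the database raises an error.

The simple fix is to remove such characters with the string replace method. According to xy, they can also be filtered out by character set; I have not tried that yet and may do so later.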
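
Both ideas can be sketched as follows. The replace call is the approach described above; strip_control_chars is only one possible reading of "filter by character set" (dropping the Unicode control category Cc), and the function name and the keep parameter are assumptions made here.

```python
import unicodedata

raw = '我是一个\x00作者'

# Simplest fix: drop the NUL character explicitly
cleaned = raw.replace('\x00', '')

def strip_control_chars(text, keep='\n\t'):
    """Remove characters in Unicode category Cc, except those in keep."""
    return ''.join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != 'Cc'
    )

# \x00 and \x07 are removed; the newline is kept on purpose
filtered = strip_control_chars('我是一个\x00作者\x07\n')
```

The category filter also catches other control characters that replace('\x00', '') would miss.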

8. logging config: splitting log output (routing to separate streams)

[loggers]
keys=root

[handlers]
keys=infoHandler,warningHandler

[logger_root]
level=DEBUG
handlers=infoHandler,warningHandler

[handler_infoHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[handler_warningHandler]
class=StreamHandler
level=WARNING
formatter=simpleFormatter
args=(sys.stderr,)

[formatters]
keys=simpleFormatter

[formatter_simpleFormatter]
format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s
datefmt=%a, %d %b %Y %H:%M:%S

9. SQL: find rows where a column contains a known string

https://zhidao.baidu.com/question/193581250.html

SELECT * FROM some_table WHERE spec_column LIKE '%known string%';
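
Substring matching with LIKE can be tried out quickly from Python. The sketch uses the standard-library sqlite3 module with an in-memory database instead of Postgres, purely so it runs anywhere; the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE some_table (id INTEGER PRIMARY KEY, spec_column TEXT)')
conn.executemany(
    'INSERT INTO some_table (spec_column) VALUES (?)',
    [('baidu_baike_info',), ('baidu_zhidao_info',), ('other',)],
)

# % matches any run of characters, so this finds values containing 'baike'
known = 'baike'
rows = conn.execute(
    'SELECT spec_column FROM some_table WHERE spec_column LIKE ?',
    ('%' + known + '%',),
).fetchall()
conn.close()
```

Passing the pattern as a parameter avoids building the SQL string by hand.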

10. SQL LIMIT / OFFSET usage

https://www.postgresql.org/docs/8.2/static/queries-limit.html

SELECT * FROM some_table WHERE cond LIMIT 100 OFFSET 2;

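The statement above means: take the rows of some_table that satisfy cond, skip the first 2, and return up to 100 rows starting from row 2+1. Without an ORDER BY, which rows come first is not guaranteed.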
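
A small check of the skip-then-take behaviour, again using an in-memory SQLite database as a stand-in for Postgres. The column n and the numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE some_table (n INTEGER)')
conn.executemany('INSERT INTO some_table VALUES (?)', [(i,) for i in range(1, 11)])

# Skip the first 2 rows, then return at most 3: rows 3, 4, 5
page = [row[0] for row in conn.execute(
    'SELECT n FROM some_table ORDER BY n LIMIT 3 OFFSET 2'
)]
conn.close()
```

The ORDER BY is what makes the page contents predictable.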

11. SQL sorting

http://www.w3school.com.cn/sql/sql_orderby.asp

Use ORDER BY column to sort by the values of the column named column.

Use DESC for descending order.
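
A quick illustration with made-up data, using in-memory SQLite as a stand-in database:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE some_table (name TEXT, score INTEGER)')
conn.executemany(
    'INSERT INTO some_table VALUES (?, ?)',
    [('a', 2), ('b', 3), ('c', 1)],
)

# Ascending is the default; DESC reverses it
ascending = [r[0] for r in conn.execute('SELECT name FROM some_table ORDER BY score')]
descending = [r[0] for r in conn.execute('SELECT name FROM some_table ORDER BY score DESC')]
conn.close()
```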

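12. SQL JOIN

There are two kinds: INNER JOIN and (LEFT/RIGHT) OUTER JOIN.
The former keeps only rows whose join key exists in every joined table; the latter also keeps rows whose key is missing on the other side, with the missing columns filled in as NULL.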
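
The difference is easiest to see on a tiny example. The authors and books tables below are made up, and in-memory SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE authors (id INTEGER, name TEXT);
    CREATE TABLE books (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO books VALUES (1, 'first book');
''')

# INNER JOIN: bob has no book, so he is dropped
inner = conn.execute('''
    SELECT a.name, b.title FROM authors a
    INNER JOIN books b ON b.author_id = a.id
    ORDER BY a.id
''').fetchall()

# LEFT OUTER JOIN: bob is kept, with NULL (None in Python) for the title
left = conn.execute('''
    SELECT a.name, b.title FROM authors a
    LEFT OUTER JOIN books b ON b.author_id = a.id
    ORDER BY a.id
''').fetchall()
conn.close()
```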

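13. The type of a value fetched from SQL depends on the column definition; don't assume it is always a string. For example, a typical queryid is sometimes defined as bigint, yet it looks like a long string made up entirely of digits. I wanted to match a known queryid for debugging and wrote if queryid == '123', but the if branch was never entered. Then I wondered: the data is all there but it doesn't match, so could the types be different? Printing type() showed they were all int, and that explained it.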
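
The pitfall in miniature, with 123 standing in for a real queryid:

```python
queryid = 123  # as in the note, the bigint column comes back as int

matches_string = (queryid == '123')       # int vs str: never equal, no error raised
matches_int = (queryid == 123)            # compare with the same type
matches_as_text = (str(queryid) == '123') # or convert explicitly
type_name = type(queryid).__name__
```

Python raises no error for the int-to-str comparison; it is simply False, which is why the bug is silent.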

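14. The source values exist, but the CSV output is empty. If you are using Pandas to_csv, check whether the columns you pass match the keys of the dicts you printed (verify them one by one).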
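
The one-by-one check can be automated. This is a minimal plain-Python sketch; missing_columns is a hypothetical helper and the record contents are made up.

```python
def missing_columns(columns, records):
    """Return the requested columns that appear in none of the record dicts."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    return [col for col in columns if col not in seen]

records = [{'queryid': 1, 'title': 'a'}, {'queryid': 2, 'title': 'b'}]
columns = ['queryid', 'Title']  # 'Title' differs from the key 'title' by case

problems = missing_columns(columns, records)
```

Anything listed in problems is a column name that will never find data, typically because of a typo or a case difference.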

15. Serving a file download with tornado

http://yobin.sinaapp.com/topic/2923/Tornado%E7%9B%B4%E6%8E%A5%E4%B8%8B%E8%BD%BD%E6%96%87%E4%BB%B6%E7%9A%84%E6%96%B9%E6%B3%95
https://blog.bbzhh.com/index.php/archives/52.html
https://gist.github.com/alejandrobernardis/1790864
https://en.wikipedia.org/wiki/MIME#Content-Disposition
https://blog.robotshell.org/2012/deal-with-http-header-encoding-for-file-download/comment-page-1/

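The key part:

    # Fragment from inside a tornado.web.RequestHandler method;
    # filename and buf_size (e.g. 4096) must be defined beforehand.
    self.set_header('Content-Type', 'application/octet-stream')
    self.set_header('Content-Disposition', 'attachment; filename=' + filename)
    # Adjust the open mode to match the actual file
    with open(filename, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            self.write(data)
    # Remember to call finish
    self.finish()

Note: finish must be called, otherwise the download does not work.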
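
File names containing Chinese characters need extra care in the Content-Disposition header, which is what the last reference above deals with. The sketch below shows one way to build the header value using the filename* form from RFC 5987; content_disposition is a hypothetical helper, not part of tornado, and the underscore fallback is a choice made here.

```python
from urllib.parse import quote

def content_disposition(filename):
    """Build an attachment header value that survives non-ASCII names."""
    # Plain ASCII fallback for old clients: non-ASCII characters become '_'
    fallback = filename.encode('ascii', 'replace').decode('ascii').replace('?', '_')
    fallback = fallback.replace('"', '_')
    # filename*= carries the real name, percent-encoded as UTF-8
    encoded = quote(filename, safe='')
    return 'attachment; filename="{}"; filename*=UTF-8\'\'{}'.format(fallback, encoded)

header_value = content_disposition('报告.csv')
```

In the handler, the result would be passed as the value of the Content-Disposition header in place of the plain string concatenation.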

16. Can't type Chinese on the command line, what to do? (Ubuntu)

http://forum.ubuntu.com.cn/viewtopic.php?f=77&t=277959

Run on the command line:

sudo dpkg-reconfigure locales

Then select:

en_US.UTF-8
zh_CN.GB2312
zh_CN.GB18030
zh_CN.UTF-8
zh_CN.GBK

17. pandas merge and drop

merge: https://pandas.pydata.org/pandas-docs/stable/merging.html

drop: https://stackoverflow.com/questions/14661701/how-to-drop-a-list-of-rows-from-pandas-dataframe

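18. In a crontab command, using source causes the command to fail. A likely cause: cron runs commands with /bin/sh by default, which on Ubuntu is dash and has no source builtin. Using . in place of source, or setting SHELL=/bin/bash at the top of the crontab, should avoid this.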
