Running a Scrapy Crawler in the Background on Alibaba Cloud

Problem

When running a scrapy crawler on an Alibaba Cloud server, closing the PuTTY connection also kills the running crawler process. Since it is impractical to keep a PuTTY session open for the whole crawl, we need another way to run the crawler in the background.

Solution

  1. Write a Python script run.py containing the code that launches scrapy:
    import os
    
    if __name__ == '__main__':
        os.system("scrapy crawl yourspidername")


  2. Start it with the nohup command:
nohup python -u run.py > job.log 2>&1 &

This way, output that would normally go to the terminal is written to job.log instead (the -u flag disables Python's output buffering, so log lines appear immediately).
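If you would rather stay in Python instead of relying on nohup, the detachment can be sketched with the standard subprocess module: start_new_session=True puts the child in its own session, so it is not sent SIGHUP when the terminal closes. The -c command below is only a stand-in for run.py:

```python
import subprocess
import sys

# A rough Python analogue of `nohup python -u run.py > job.log 2>&1 &`.
# start_new_session=True detaches the child from the terminal's session,
# so closing the SSH connection no longer kills it.
with open("job.log", "ab") as log:
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", "print('crawler running')"],  # stand-in for ["-u", "run.py"]
        stdout=log,
        stderr=subprocess.STDOUT,  # merge stderr into the same log, like 2>&1
        start_new_session=True,
    )
proc.wait()  # for demonstration only; a real crawl would be left running
```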

Key points

1. os.system(command) executes command (a string) in a subshell. On Unix, the return value is the exit status of the process encoded in wait() format; on Windows, it is whatever the system shell returns after running the command.
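A quick way to see the Unix wait()-format encoding, assuming Python 3.9+ for os.waitstatus_to_exitcode:

```python
import os

# os.system returns a wait()-style status on Unix, not the raw exit code:
# the exit code lives in the high byte of the returned value.
status = os.system("exit 3")

# Decode the status into the plain exit code (Python 3.9+).
code = os.waitstatus_to_exitcode(status)
print(code)  # 3 on Unix
```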

Execute the command (a string) in a subshell. This is implemented by calling the Standard C function system(), and has the same limitations. Changes to sys.stdin, etc. are not reflected in the environment of the executed command. If command generates any output, it will be sent to the interpreter standard output stream.

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command. The shell is given by the Windows environment variable COMSPEC: it is usually cmd.exe, which returns the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

Availability: Unix, Windows
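Following the docs' advice, run.py could use subprocess instead of os.system. A minimal sketch, where the --version call is a stand-in for the real scrapy crawl invocation:

```python
import subprocess
import sys

# subprocess.run replaces os.system and gives direct access to the
# child's exit code and captured output.
result = subprocess.run(
    [sys.executable, "--version"],  # stand-in for ["scrapy", "crawl", "yourspidername"]
    capture_output=True,
    text=True,
)
print(result.returncode)  # 0 on success
```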


2. nohup runs a command immune to hangups, so it keeps running after the terminal disconnects.

In the example above, 0 is stdin (standard input), 1 is stdout (standard output), and 2 is stderr (standard error).

2>&1 redirects standard error (fd 2) to wherever standard output (fd 1) currently points, so both streams end up in the job.log file via the > redirection.
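The same stream merging can be reproduced from Python with stderr=subprocess.STDOUT, the programmatic counterpart of 2>&1:

```python
import subprocess
import sys

# Equivalent of `command > job.log 2>&1`: stderr is merged into stdout,
# and stdout is redirected into the log file.
child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
with open("job.log", "w") as log:
    subprocess.run([sys.executable, "-c", child], stdout=log, stderr=subprocess.STDOUT)

print(open("job.log").read())  # both lines end up in job.log
```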

References: https://www.ibm.com/developerworks/cn/linux/l-cn-nohup/

https://www.cnblogs.com/allenblogs/archive/2011/05/19/2051136.html
