A Full Disk Caused Our Program to Crash

We recently found that our application crashes when the disk runs out of space.

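Debugging showed that we call fclose() to close the current file, but fclose() returns an error when the disk is full.
The thread then calls fclose() again and again, and it fails every time.
Meanwhile, another thread happened to open a file to read data, but the read failed, and the program went wrong.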

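The Linux manual page for close() explains why:
After close() returns an error, calling close() again on the same file is the wrong thing to do, because it may close a file that another thread has opened.
During a close, the kernel releases the FD first and only afterwards performs the remaining steps, such as flushing data.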

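For example: flushing the data fails, so close() returns an error, but the FD has already been released, and a file opened by another thread takes over that FD number. When the first thread calls close() again, it closes the other thread's file.
From then on, reads and writes on that file fail.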
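The FD reuse described above can be reproduced in a single thread. The following is only a sketch: the function name `stale_close_closes_new_file` is made up for illustration, `/dev/null` stands in for the real files, and the second open() plays the role of the other thread. It relies on the POSIX rule that open() returns the lowest-numbered free descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper for illustration.
 * Returns 1 if a stale, repeated close() on an old descriptor number ended up
 * closing a file opened afterwards, 0 if the number was not reused,
 * -1 if the setup itself failed. */
int stale_close_closes_new_file(void)
{
    int old_fd = open("/dev/null", O_RDONLY);
    if (old_fd < 0)
        return -1;
    close(old_fd);                 /* first close: the number is released */

    /* Stands in for "another thread opens a file". */
    int new_fd = open("/dev/null", O_RDONLY);
    if (new_fd < 0)
        return -1;

    if (new_fd != old_fd) {        /* number not reused: nothing to show */
        close(new_fd);
        return 0;
    }

    close(old_fd);                 /* the erroneous retry */

    /* fcntl(F_GETFD) fails with EBADF when the descriptor is not open. */
    return fcntl(new_fd, F_GETFD) == -1 && errno == EBADF;
}
```

In a multi-threaded program the same thing happens without any visible link between the two pieces of code, which is what makes the bug hard to trace.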

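So would it be fine to never close the file at all?

No. The number of file descriptors is limited, both per process and system-wide. Once they run out, no more files can be opened and the system can start to malfunction.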
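To see how many descriptors a process may hold, the per-process limit can be queried with getrlimit(). This is a minimal sketch; the function name `max_open_files` is an assumption made for this example.

```c
#include <limits.h>
#include <sys/resource.h>

/* Hypothetical helper for illustration.
 * Returns the soft limit on open file descriptors for this process,
 * LONG_MAX if the limit is unlimited, or -1 if getrlimit() fails. */
long max_open_files(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}
```

Once a process reaches this limit, open() fails with EMFILE, so a program that leaks descriptors eventually cannot open anything.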

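One more observation: fwrite() keeps returning success even when the disk is full.

That is because fwrite() writes into a buffer. The data only goes toward the disk when the buffer fills up or when fflush() or fclose() is called.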
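The buffering effect can be observed on Linux with /dev/full, a device that fails every write with ENOSPC. The sketch below is only illustrative: `struct write_result` and `buffered_write` are names invented for this example, and it assumes the message is smaller than the stdio buffer.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for illustration. */
struct write_result {
    size_t written;     /* bytes fwrite() reported as written */
    int    flush_rc;    /* return value of fflush(): 0 or EOF */
    int    flush_errno; /* errno after a failed fflush(), otherwise 0 */
};

/* Hypothetical helper for illustration.
 * Writes `msg` to `path` through stdio, then flushes explicitly.
 * Returns 0 if the file could be opened, -1 otherwise. */
int buffered_write(const char *path, const char *msg, struct write_result *out)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* A small message only lands in the stdio buffer at this point. */
    out->written = fwrite(msg, 1, strlen(msg), fp);

    /* The buffer is handed to the kernel here, so errors show up here. */
    errno = 0;
    out->flush_rc = fflush(fp);
    out->flush_errno = (out->flush_rc == EOF) ? errno : 0;

    fclose(fp); /* called exactly once; its result is ignored in this sketch */
    return 0;
}
```

Called with "/dev/full", fwrite() reports the full length while fflush() returns EOF with errno set to ENOSPC. The practical point is to check the return value of fflush() or fclose(), not only that of fwrite().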

Solution:

If closing a file fails, do not try to close it again.
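One way to enforce this rule is a small wrapper that forgets the stream before closing it, so that no later code path can retry. This is a sketch, not the code from our application; the name `close_once` is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper for illustration.
 * Closes *fpp at most once. The pointer is cleared whether or not fclose()
 * succeeds, because the stream is unusable after fclose() either way.
 * Returns 0 on success or if the stream was already closed,
 * EOF if fclose() reported an error. */
int close_once(FILE **fpp)
{
    if (fpp == NULL || *fpp == NULL)
        return 0;

    FILE *fp = *fpp;
    *fpp = NULL;        /* forget the stream before closing it */
    return fclose(fp);  /* report the error to the caller; never retry */
}
```

A caller that gets EOF back should treat the data as possibly lost and report it (for example, log that the disk is full), but must not call fclose() on that stream again.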

References:
       Retrying the close() after a failure return is the wrong thing to
       do, since this may cause a reused file descriptor from another
       thread to be closed.  This can occur because the Linux kernel
       always releases the file descriptor early in the close operation,
       freeing it for reuse; the steps that may return an error, such as
       flushing data to the filesystem or device, occur only later in
       the close operation.

However, if your code runs for a long time, continuously opening files without closing them, it may eventually crash at run time.

When you open a file, the operating system creates an entry to represent that file and stores information about it. So if 100 files are open in your OS, there will be 100 entries in the OS (somewhere in the kernel). These entries are represented by integers like (...100, 101, 102....). This entry number is the file descriptor, so it is just an integer that uniquely represents an open file in the operating system. If your process opens 10 files, then your process table will have 10 entries for file descriptors.

Also, this is why you can run out of file descriptors if you open lots of files at once, which will prevent *nix systems from running, since they open descriptors to stuff in /proc all the time.

A similar thing should happen on all operating systems.

fwrite is buffered. Nothing hits disk until fflush or fclose.

https://man7.org/linux/man-pages/man2/close.2.html
https://stackoverflow.com/questions/28253569/what-happens-if-i-never-call-close-on-an-open-file-stream
https://stackoverflow.com/questions/16508305/fwrite-not-return-0-when-harddisk-is-full-in-linux-why
