An experience of fixing a memory-corruption bug (3)

(7) Using the mprotect function

Because the second element of the array was always overwritten with NULL, I stopped allocating the array with malloc and instead put it in memory that I marked read-only with mprotect. That way, any write to the array would crash the program at the exact instruction doing the write. However, mprotect works on whole pages, so the array now occupied at least one page of its own, and this could change the program's behavior. I was not sure whether the bug would occur again.
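
The idea can be sketched roughly as follows. This is not the original code: the helper names (guarded_array_alloc, guarded_array_lock, guarded_array_unlock) are made up for illustration, and I assume the pages come from an anonymous mmap, since POSIX only defines mprotect for memory obtained that way. On some systems MAP_ANONYMOUS is spelled MAP_ANON.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE           /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve room for `count` pointers in whole pages.
 * The rounded-up size is returned through `mapped_len` because mprotect
 * and munmap both operate on page-sized ranges. */
static void **guarded_array_alloc(size_t count, size_t *mapped_len)
{
    size_t page  = (size_t)sysconf(_SC_PAGESIZE);
    size_t bytes = count * sizeof(void *);
    size_t len   = (bytes + page - 1) / page * page;   /* round up to pages */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *mapped_len = len;
    return (void **)p;
}

/* Make the array read-only: a stray write now raises SIGSEGV, and the
 * core dump points at the instruction that did the write. */
static int guarded_array_lock(void **arr, size_t mapped_len)
{
    return mprotect(arr, mapped_len, PROT_READ);
}

/* Re-enable writes around the places that legitimately update the array. */
static int guarded_array_unlock(void **arr, size_t mapped_len)
{
    return mprotect(arr, mapped_len, PROT_READ | PROT_WRITE);
}
```

A caller would fill the array, call guarded_array_lock, and call guarded_array_unlock only around intentional updates. Note that even a tiny array takes a full page this way, which is exactly the change in memory layout that made me unsure whether the bug would show up again.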


(8) Finding the cause

About 2 weeks later, when a colleague stopped the application, it crashed again. This time the cause was that a global variable had been changed. I immediately checked all the old core dumps and found that the global variable had been changed in every one of them, and that it pointed to the address of the array! It seemed I was very close to the truth.


After about 2 days of analysis, the cause was found: when a thread calls pthread_create to create another thread, it changes a global variable. In a few cases, however, the child thread runs first, and it also changes the global variable. The code assumed that the parent thread always runs first, and when that assumption failed, the program crashed.
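
One common way to remove this kind of ordering assumption is to make the child wait until the parent has finished its update. The sketch below is only an illustration, not the fix that went into the application: all names (g_shared, g_parent_done, child_main, run_ordered_start) are hypothetical stand-ins, and it uses a mutex plus a condition variable so that the result is the same no matter which thread the scheduler runs first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t g_lock       = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_ready_cond = PTHREAD_COND_INITIALIZER;
static int g_parent_done = 0;   /* set once the parent has updated g_shared */
static int g_shared      = 0;   /* hypothetical stand-in for the global     */
static int g_child_saw   = -1;  /* value the child observed, for checking   */

static void *child_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_lock);
    /* Do not touch g_shared until the parent says it is ready, even if
     * this thread happens to be scheduled before the parent continues. */
    while (!g_parent_done)
        pthread_cond_wait(&g_ready_cond, &g_lock);
    g_child_saw = g_shared;
    g_shared += 1;
    pthread_mutex_unlock(&g_lock);
    return NULL;
}

/* Returns the final value of g_shared, or -1 on a pthread error. */
static int run_ordered_start(void)
{
    pthread_t tid;

    /* No other thread exists yet, so plain resets are safe here. */
    g_parent_done = 0;
    g_shared      = 0;
    g_child_saw   = -1;

    if (pthread_create(&tid, NULL, child_main, NULL) != 0)
        return -1;

    /* The parent's update after pthread_create, now made explicit. */
    pthread_mutex_lock(&g_lock);
    g_shared      = 100;
    g_parent_done = 1;
    pthread_cond_signal(&g_ready_cond);
    pthread_mutex_unlock(&g_lock);

    if (pthread_join(tid, NULL) != 0)
        return -1;
    return g_shared;
}
```

With the wait in place, the child always sees the parent's value before making its own change. Without it, the order of the two writes depends on scheduling, which is why the crash appeared only occasionally.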


Looking back on this 4-month experience, I learned a lot of tools while debugging this bug: libumem, valgrind, libefence, etc. It was a really memorable and cool experience!
