Implementing a Timer Protection Mechanism in Embedded Systems

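Embedded systems are usually designed to run without interruption. To recover automatically when something goes wrong, they typically include a watchdog that restarts the system when a software bug or memory corruption stops the program from running normally. A well-designed watchdog can recover from the vast majority of software problems, but there is one class of problem it cannot handle: timer failures.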

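On some SoCs, a timer that is repeatedly configured, started, and stopped can enter an abnormal state in which it no longer raises its timeout interrupt, so the functions that depend on it stop working. Because the rest of the system keeps running normally, the watchdog is never triggered, and the system cannot recover from this partial failure.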

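This article describes a software protection mechanism for this class of problem: when a timer failure is detected, the system is restarted automatically. The mechanism requires a monotonically increasing, free-running counter in the system (called the main counter below). It consists of the following three parts: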

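  • When a timer is started, record the current value of the main counter and compute the protection value for that timer.
  • When the timer is stopped or expires, clear the protection value.
  • In the main loop, read the main counter periodically and check whether the protection value has been exceeded. If it has, the started timer has failed, and the system is restarted.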
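
The three parts above can be sketched in a few lines before looking at the full listing. This is only a minimal illustration under stated assumptions, not the implementation given later: the names `guard_t`, `guard_arm`, `guard_disarm`, and `guard_expired` are hypothetical, the protection value is simply twice the timeout with a caller-supplied minimum, and the main counter is assumed to be a free-running 32-bit value that the caller passes in as `now`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard record for one timer; all names here are illustrative. */
typedef struct {
    bool     armed;   /* true while the guarded timer is supposed to be running */
    uint32_t start;   /* main counter value captured when the timer started     */
    uint32_t budget;  /* main-counter ticks allowed before a failure is assumed */
} guard_t;

/* Part 1: on timer start, record the main counter and derive the guard value. */
static void guard_arm(guard_t *g, uint32_t now, uint32_t timeout_ticks, uint32_t min_budget)
{
    assert(g != NULL);
    /* Allow twice the timeout, saturating instead of overflowing. */
    uint32_t budget = (timeout_ticks <= UINT32_MAX / 2u) ? timeout_ticks * 2u : UINT32_MAX;
    if (budget < min_budget) {
        budget = min_budget;
    }
    g->start  = now;
    g->budget = budget;
    g->armed  = true;
}

/* Part 2: on timer stop or expiry, clear the guard. */
static void guard_disarm(guard_t *g)
{
    assert(g != NULL);
    g->armed = false;
}

/* Part 3: polled from the main loop; true means the timer overran its guard
 * and the caller should restart the system. */
static bool guard_expired(const guard_t *g, uint32_t now)
{
    assert(g != NULL);
    /* Unsigned subtraction stays correct across one counter wrap-around. */
    return g->armed && (uint32_t)(now - g->start) >= g->budget;
}
```

The sketch stores the start value and compares elapsed ticks on every poll, whereas the listing below decrements a per-timer counter; both rely on the main counter itself running correctly.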

Sample code in C is given below. The functions are meant to be called as follows:

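  • When a timer is initialized, call setCountNo() to set how many main-counter ticks correspond to one time unit of that timer.
  • When the timer is started, call startSupervisedCounter() to start the protection mechanism for it.
  • When the timer expires or is stopped, call stopSupervisedCounter() to stop the protection mechanism.
  • In the main loop (in a multi-task system, in the lowest-priority task), call superviseTimer() to check periodically whether any timer has failed.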

 

  • TimerProtect.h

#ifndef TIMER_PROTECT_H
#define TIMER_PROTECT_H

#include <stdint.h>  /* uint8_t, uint32_t */

typedef enum {
  Timer0 = 0,
  Timer1,
  Timer2,
  Timer3,
  Timer4,
  Timer5,
  Timer_Max,
} Timer_e;

//Set how many main-counter ticks make up one unit of the given timer. Each timer may use
//a different unit. Call it once when the timer is initialized.
void setCountNo(Timer_e timer, uint32_t ulCountNo);

//Start or restart supervision of a timer. ulNewTimer is the timeout in timer units.
//Pass ucWithCpuLock = TRUE when calling from interrupt context or with the CPU already
//locked; otherwise the function disables interrupts itself.
void startSupervisedCounter(Timer_e timer, uint32_t ulNewTimer, uint8_t ucWithCpuLock);

//Stop timer supervision for the indicated timer.
//ucWithCpuLock has the same meaning as in startSupervisedCounter().
void stopSupervisedCounter(Timer_e timer, uint8_t ucWithCpuLock);

//Check the running status of all timers. Call it once per loop of the main task. If a
//running timer's supervision counter is used up, the system is restarted.
void superviseTimer(void);

#endif /* TIMER_PROTECT_H */
 

  • TimerProtect.c


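#include <stdint.h>
#include "TimerProtect.h"
/*
 * TRUE/FALSE, _25M_1s, the main-counter read function, disable_irq()/enable_irq(), PRINT(),
 * restart_system() and the CYCLE_*_UL macros are expected to come from the platform's own
 * headers, which must be included here as well.
 */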

/********************************************************************************************************
*
* In field tests it was found that some timers stop running after long periods of operation, so the
* device no longer works correctly. The software itself keeps running and cannot recover. Therefore a
* supervision mechanism checks the timers periodically and restarts the system if a timer is found
* to be working abnormally.
*
* Algorithm:
* 1.When a timer is started, a parallel supervision counter is set. Normally the counter is 2 * timer,
*   but if that is less than 1s it is raised to 1s. If 2 * timer does not fit in a uint32 variable,
*   timer + 1s is used. If timer + 1s also exceeds the uint32 range, the counter is set to 0xFFFFFFFF.
* 2.In each loop of the main task the counter is decreased by the time that has passed on the main
*   counter. If the timer is still running when its counter is used up, the timer is considered
*   failed and the system is restarted.
* 3.Each time a timer is started, a running flag and the counter are set. When the timer times out
*   or is stopped, the flag is cleared.
*
* Attention:
*  The mechanism works only if the main counter itself works correctly.
*
********************************************************************************************************/
/*
 * ucRunningFlag:        Indicates whether the timer is running. TRUE - running, FALSE - not running
 * ulCounter:            Remaining supervision time, in timer units.
 * ulLastMCCounter:      Main counter (MC) value at the last update of ulCounter.
 * ulCountNoInTimerUnit: Number of MC ticks corresponding to one unit of the timer.
 */
typedef struct {
  uint8_t     ucRunningFlag;
  uint32_t    ulCounter;
  uint32_t    ulLastMCCounter;
  uint32_t    ulCountNoInTimerUnit;
} TimerSuperviser_t;

/*
 * The index of the array indicates the sequence of timer referring to enum Timer_e
 */
static TimerSuperviser_t timerSupervisers[Timer_Max] = {0};

/************************************************************************************
 *
 * Set the main counter's numbers of a timer's unit. Each timer's unit is different.
 * It's called when each timer is initialized.
 *
 * Parameters:
 * IN:
 *  timer:         index into timerSupervisers.
 *  ulCountNo:     The main counter's numbers of a timer's unit.
 *
 * OUT:
 *   none
 *
 * Return:
 *   none
 *
 ************************************************************************************/
void setCountNo(Timer_e timer, uint32_t ulCountNo) {
  uint8_t ucTimerIdx = (uint8_t)timer;

  timerSupervisers[ucTimerIdx].ulCountNoInTimerUnit = ulCountNo;
}

/************************************************************************************
 *
 * Set or reset counter value.
 * ATTENTION:
 *   The input value is the timer value. It is not copied to the counter directly, to avoid
 *   unnecessary restarts. The counter is set according to the following rules:
 *     - The counter is 2 * timer normally. If that is less than 1s, it is raised to 1s.
 *     - If 2 * timer does not fit in a uint32 variable, timer + 1s is used.
 *     - If timer + 1s also exceeds the uint32 range, the counter is set to
 *       0xFFFFFFFF.
 *
 * Parameters:
 * IN:
 *   timer:         index into timerSupervisers.
 *   ulNewTimer:    The new timer value, in timer units. It is converted to the counter value.
 *                  Zero means: restart supervision with the old counter.
 *   ucWithCpuLock: TRUE - called with the CPU already locked, FALSE - called without CPU lock.
 *
 * OUT:
 *   none
 *
 * Return:
 *   none
 *
 ************************************************************************************/
void startSupervisedCounter(Timer_e timer, uint32_t ulNewTimer, uint8_t ucWithCpuLock) {
  uint8_t ucTimerIdx = (uint8_t)timer;
  uint32_t ul1SToTimer;

  /*
   * Timer supervision doesn't work if main counter is under time correction. In this case
   * the main counter may jump and lead to restart the system unnecessarily.
   */

  //Without a valid unit (setCountNo() not called yet) the division below would divide by zero.
  if(0 == timerSupervisers[ucTimerIdx].ulCountNoInTimerUnit)
    return;

  //Number of timer units in one second.
  ul1SToTimer = _25M_1s / timerSupervisers[ucTimerIdx].ulCountNoInTimerUnit;

  if(!ucWithCpuLock)
    disable_irq();

  if(0 == ulNewTimer) {
    //It means to restart supervision with the old counter.
    if(0 == timerSupervisers[ucTimerIdx].ulCounter) {

      if(!ucWithCpuLock)
        enable_irq();

      return;  //Nothing to do.
    }

    //Otherwise, restart it.
  }
  else if(ulNewTimer <= 0x7FFFFFFF) {
    timerSupervisers[ucTimerIdx].ulCounter = ulNewTimer<<1;

    //We expand the supervision timer to 1s at least.
    if(timerSupervisers[ucTimerIdx].ulCounter < ul1SToTimer)
      timerSupervisers[ucTimerIdx].ulCounter = ul1SToTimer;
  }
  else {
    //2 * timer would overflow: use timer + 1s, saturating at 0xFFFFFFFF.
    //Compare against the remaining headroom, because the uint32 sum itself would wrap around.
    if(ul1SToTimer <= 0xFFFFFFFF - ulNewTimer)
      timerSupervisers[ucTimerIdx].ulCounter = ulNewTimer + ul1SToTimer;
    else
      timerSupervisers[ucTimerIdx].ulCounter = 0xFFFFFFFF;
  }

  timerSupervisers[ucTimerIdx].ulLastMCCounter = Get_25MCnt_Cur();
  timerSupervisers[ucTimerIdx].ucRunningFlag   = TRUE;

  if(!ucWithCpuLock)
    enable_irq();

}

/************************************************************************************
 *
 * Stop timer supervision for the indicated timer. However the counter will not be
 * cleared in order to make it possible to restart.
 *
 * Parameters:
 * IN:
 *  timer:      index into timerSupervisers.
 *  ucWithCpuLock: TRUE - called with the CPU already locked, FALSE - called without CPU lock.
 *
 * OUT:
 *   none
 *
 * Return:
 *   none
 *
 ************************************************************************************/
void stopSupervisedCounter(Timer_e timer, uint8_t ucWithCpuLock) {
  uint8_t ucTimerIdx = (uint8_t)timer;

  if(!ucWithCpuLock)
    disable_irq();
  
  timerSupervisers[ucTimerIdx].ucRunningFlag = FALSE;

  if(!ucWithCpuLock)
    enable_irq();

}

/************************************************************************************
 *
 * Check the running status of all timers. It's called once in each loop of the main task. If a
 * running timer's counter is used up, the system will be restarted.
 *
 * Parameters:
 * IN:
 *  none.
 *
 * OUT:
 *   none
 *
 * Return:
 *   none
 *
 ************************************************************************************/
void superviseTimer(void) {
  uint32_t ulCurMCCounter, ulCounterPassed;
  uint8_t ucStep;

  /*
   * Timer supervision doesn't work if main counter is under time correction. In this case
   * the main counter may jump and lead to restart the system unnecessarily.
   */
  
  for(ucStep = 0; ucStep < Timer_Max; ucStep++) {

    //This part must be executed with the CPU locked, because the corresponding fields may be
    //updated in an interrupt handler.
    
    disable_irq();
   
    if(timerSupervisers[ucStep].ucRunningFlag) {
 
      //Get passed time in main counter.
      ulCurMCCounter = Get_25MCnt_Cur();  //Same main-counter read as in startSupervisedCounter().

      if(CYCLE_LESS_UL(timerSupervisers[ucStep].ulLastMCCounter, ulCurMCCounter)) {
      
        ulCounterPassed = CYCLE_REDUCTION_UL(ulCurMCCounter,  timerSupervisers[ucStep].ulLastMCCounter) /
                          timerSupervisers[ucStep].ulCountNoInTimerUnit;

        if(ulCounterPassed >= timerSupervisers[ucStep].ulCounter) {
          PRINT("OOPS: Timer [%d] failed. Ps: Flag = %d, LMCC = [0x%08lx], Counter = %lu, Unit = %lu, CMCC = [0x%08lx], CP = %lu\n",
                ucStep, timerSupervisers[ucStep].ucRunningFlag, (unsigned long)timerSupervisers[ucStep].ulLastMCCounter,
                (unsigned long)timerSupervisers[ucStep].ulCounter, (unsigned long)timerSupervisers[ucStep].ulCountNoInTimerUnit,
                (unsigned long)ulCurMCCounter, (unsigned long)ulCounterPassed);
          //A timer counter becomes zero, restart the system by watchdog.
          restart_system();
        }
        else {
          if(ulCounterPassed > 0) {
            //Update the counter.
            timerSupervisers[ucStep].ulCounter -= ulCounterPassed;
          
            //Even overflow, the result is correct.
            timerSupervisers[ucStep].ulLastMCCounter += (ulCounterPassed * timerSupervisers[ucStep].ulCountNoInTimerUnit);
          }
          //Otherwise, needn't update in the current loop.
        }
      }
      else {
        /*
         * It means the main counter is corrected, just skip.
         */
      }
    }

    enable_irq();

  }
}
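
The listing relies on two macros it does not define, CYCLE_LESS_UL and CYCLE_REDUCTION_UL, and on the counter-setting rule described in the comments. The following is a hedged sketch of how such helpers might look on a free-running 32-bit counter. The function names `cycle_diff_u32`, `cycle_less_u32`, and `supervision_budget` are hypothetical stand-ins, and the real platform macros may be defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed from 'earlier' to 'later' on a free-running 32-bit counter.
 * Assumed equivalent of CYCLE_REDUCTION_UL(later, earlier). */
static uint32_t cycle_diff_u32(uint32_t later, uint32_t earlier)
{
    return later - earlier;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* Nonzero if 'a' lies strictly before 'b', assuming the two values are less
 * than 2^31 ticks apart. Assumed equivalent of CYCLE_LESS_UL(a, b). */
static int cycle_less_u32(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(b - a) < 0x80000000u;
}

/* Counter-setting rule from the listing's comments, written without overflow:
 * 2 * timer, at least one second; timer + 1s if doubling would overflow;
 * 0xFFFFFFFF if even that would overflow. Both arguments are in timer units. */
static uint32_t supervision_budget(uint32_t timer_units, uint32_t units_per_second)
{
    assert(timer_units != 0u);  /* zero means "reuse the old counter" in the listing */
    if (timer_units <= UINT32_MAX / 2u) {
        uint32_t budget = timer_units * 2u;
        return (budget < units_per_second) ? units_per_second : budget;
    }
    /* Compare against the remaining headroom instead of adding first. */
    if (units_per_second <= UINT32_MAX - timer_units) {
        return timer_units + units_per_second;
    }
    return UINT32_MAX;
}
```

With these definitions, a main counter that has wrapped from 0xFFFFFFFB to 0x00000005 still yields an elapsed time of 10 ticks, which is why the decrement in superviseTimer() can remain correct across a wrap-around.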

 
