STM32 - Read/Write External SRAM

STM32: Using External SRAM

This article is about using external SRAM with STM32. Through the FSMC (Flexible Static Memory Controller), STM32 MCUs can access external
SRAM. Hopefully, this saves us from the constant thirst for RAM.

Requirements

This article assumes some basic knowledge of STM32 development, including:

  1. The C programming language
  2. What SRAM is
  3. The official documentation on the STM32 FSMC
  4. Some basic debugging skills
  5. An STM32 dev board that supports FSMC and has an external SRAM on board
  6. A debugger, such as ST-Link or J-Link
  7. An IDE; I'm using System Workbench for STM32

SRAM/PSRAM

All right, here comes a new acronym: PSRAM. When you search for PSRAM on Wikipedia, this page comes up and tells us that it is not a real SRAM but a pseudo one. JEDEC has a good explanation of PSRAM. Usually, SRAM is too expensive (compared to PSRAM) and comes in smaller sizes (usually counted in KB). PSRAM, however, is far cheaper than SRAM and provides sufficient capacity. Here is my favorite one.

Even the manual won't tell you it is a PSRAM, but we can infer it from its price and capacity. Fortunately, my dev board has a PSRAM of the same type on board.
A real SRAM looks like this one: 32KB for 38 RMB. Since we don't have an unlimited budget, let's assume the MCU's built-in RAM is as fast as we need and treat it as the real SRAM. Then we can write some code to compare the performance of SRAM and PSRAM.

Some tools make your life easier

I use STM32CubeMx a lot. With its GUI we can get the board set up and some basic code generated with just a few clicks. Furthermore, with the following Eclipse plugins installed, coding is more pleasant:

  1. TM Terminal
  2. RxTx

They are used to display log strings from the serial port.
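
The test code later in this article calls a LOG macro that is never shown. The following is a minimal sketch of how such a macro could be built, not the author's implementation. The names log_sink_fn, log_set_sink and log_printf are hypothetical, and the 128-byte buffer is an arbitrary choice.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output hook: receives the formatted bytes. */
typedef void (*log_sink_fn)(const char *data, size_t len);

static log_sink_fn log_sink;   /* NULL until a sink is installed */

/* Install the function that actually sends the bytes somewhere. */
void log_set_sink(log_sink_fn fn)
{
    log_sink = fn;
}

/* Format into a small stack buffer and hand the result to the sink.
 * Returns the number of characters passed on (truncated to the buffer). */
int log_printf(const char *fmt, ...)
{
    char buf[128];
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0) {
        return n;                      /* formatting error */
    }
    if ((size_t)n >= sizeof buf) {
        n = (int)(sizeof buf - 1);     /* output was truncated */
    }
    if (log_sink != NULL) {
        log_sink(buf, (size_t)n);
    }
    return n;
}

#define LOG(...) log_printf(__VA_ARGS__)
```

On the board, the sink would typically wrap HAL_UART_Transmit for whichever UART handle is wired to the serial adapter. Keeping the transport behind a function pointer also lets the formatting be tried out on a PC.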

Enough talk, let's code

The manual is boring, boring and boring, especially the part about timing. At the very beginning I was frustrated by the timing sequences and waveform figures. Soon, after some attempts, I found that our poor PSRAM, err, SRAM doesn't care about too much except the data
According to the board manufacturer's manual, the board has a 1MB external SRAM on board, an IS62WV51216B.
This SRAM, unsurprisingly, is a PSRAM, which we can easily figure out on Taobao.com.
By the way, we use HAL everywhere, so please enable HAL support in STM32CubeMx.

FSMC Configuration

Actually, even though our PSRAM supports 18-bit addressing, in my test 1-bit addressing worked very well, so we can use 1-bit addressing throughout. The choice between an 8-bit and a 16-bit data bus doesn't matter either. So we can configure our chip with 1-bit addressing and an 8-bit data bus without any pressure, which also saves some IO pins, a scarce resource. Keep in mind that in this configuration the FSMC drives only address line A0. Here is the FSMC initialization code generated by STM32CubeMx.

// in main.c
static void MX_FSMC_Init(void)
{
  FSMC_NORSRAM_TimingTypeDef Timing;

  /** Perform the SRAM3 memory initialization sequence
  */
  hsram3.Instance = FSMC_NORSRAM_DEVICE;
  hsram3.Extended = FSMC_NORSRAM_EXTENDED_DEVICE;
  /* hsram3.Init */
  hsram3.Init.NSBank = FSMC_NORSRAM_BANK3;    // my board has PSRAM connected on bank3.
  hsram3.Init.DataAddressMux = FSMC_DATA_ADDRESS_MUX_DISABLE;    // Address/data multiplexing is not used
  hsram3.Init.MemoryType = FSMC_MEMORY_TYPE_SRAM;
  hsram3.Init.MemoryDataWidth = FSMC_NORSRAM_MEM_BUS_WIDTH_8;    // using 8bit for data bus
  hsram3.Init.BurstAccessMode = FSMC_BURST_ACCESS_MODE_DISABLE;    // PSRAM won't care
  hsram3.Init.WaitSignalPolarity = FSMC_WAIT_SIGNAL_POLARITY_LOW;
  hsram3.Init.WrapMode = FSMC_WRAP_MODE_DISABLE;
  hsram3.Init.WaitSignalActive = FSMC_WAIT_TIMING_BEFORE_WS;
  hsram3.Init.WriteOperation = FSMC_WRITE_OPERATION_ENABLE; // Of course, we want to write the memory
  hsram3.Init.WaitSignal = FSMC_WAIT_SIGNAL_DISABLE;    // Let FSMC manage this
  hsram3.Init.ExtendedMode = FSMC_EXTENDED_MODE_DISABLE;    // What is extended mode? keep default
  hsram3.Init.AsynchronousWait = FSMC_ASYNCHRONOUS_WAIT_DISABLE;    // FSMC won't care
  hsram3.Init.WriteBurst = FSMC_WRITE_BURST_DISABLE;    // Write burst is not supported
  /* Timing */
  Timing.AddressSetupTime = 0;    // doesn't matter
  Timing.AddressHoldTime = 0;    // doesn't matter
  Timing.DataSetupTime = 3;        // NOTE: the lower, the better; 2 also worked for me, but 1 failed
  Timing.BusTurnAroundDuration = 0;    // doesn't matter
  Timing.CLKDivision = 0;    // doesn't care
  Timing.DataLatency = 0;    // doesn't care
  Timing.AccessMode = FSMC_ACCESS_MODE_A;
    .....
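
The HAL timing fields such as DataSetupTime are counted in HCLK cycles, so it helps to translate them into time before comparing them with the access time in the memory's datasheet. Here is a small sketch of that arithmetic. The helper name hclk_cycles_to_ps is hypothetical, and the 72 MHz HCLK used in the note below is an assumption (a common setting for STM32F1 parts), since the article does not state the clock frequency.

```c
#include <stdint.h>

/* Duration of `cycles` HCLK periods in picoseconds, rounded down.
 * hclk_hz must be non-zero. 64-bit math avoids overflow. */
uint64_t hclk_cycles_to_ps(uint32_t hclk_hz, uint32_t cycles)
{
    return ((uint64_t)cycles * 1000000000000ULL) / hclk_hz;
}
```

Under the 72 MHz assumption, 3 cycles last about 41.7 ns and 2 cycles about 27.8 ns. The real bus cycle is longer than the data setup phase alone, so treat these numbers as a rough guide only.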

IO Pin Configuration

Actually, CubeMx is a good nanny and does a great job. We don't have to care about which pins the FSMC uses, but we can still take a look.

// in stm32f1xx_hal_msp.c
static void HAL_FSMC_MspInit(void){
  /* USER CODE BEGIN FSMC_MspInit 0 */

  /* USER CODE END FSMC_MspInit 0 */
  GPIO_InitTypeDef GPIO_InitStruct;
  if (FSMC_Initialized) {
    return;
  }
  FSMC_Initialized = 1;
  /* Peripheral clock enable */
  __HAL_RCC_FSMC_CLK_ENABLE();
  
  /** FSMC GPIO Configuration  
  PF0   ------> FSMC_A0        // See, 1bit addressing
  PE7   ------> FSMC_D4
  PE8   ------> FSMC_D5
  PE9   ------> FSMC_D6
  PE10   ------> FSMC_D7
  PD14   ------> FSMC_D0
  PD15   ------> FSMC_D1
  PD0   ------> FSMC_D2
  PD1   ------> FSMC_D3
  PD4   ------> FSMC_NOE
  PD5   ------> FSMC_NWE
  PG10   ------> FSMC_NE3
  */
  GPIO_InitStruct.Pin = GPIO_PIN_0;
  GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
  GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
  HAL_GPIO_Init(GPIOF, &GPIO_InitStruct);

  GPIO_InitStruct.Pin = GPIO_PIN_7|GPIO_PIN_8|GPIO_PIN_9|GPIO_PIN_10;
  GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
  GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
  HAL_GPIO_Init(GPIOE, &GPIO_InitStruct);

  GPIO_InitStruct.Pin = GPIO_PIN_14|GPIO_PIN_15|GPIO_PIN_0|GPIO_PIN_1 
                          |GPIO_PIN_4|GPIO_PIN_5;
  GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
  GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
  HAL_GPIO_Init(GPIOD, &GPIO_InitStruct);

  GPIO_InitStruct.Pin = GPIO_PIN_10;
  GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
  GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
  HAL_GPIO_Init(GPIOG, &GPIO_InitStruct);

  /* USER CODE BEGIN FSMC_MspInit 1 */

  /* USER CODE END FSMC_MspInit 1 */
}

With 12 pins, we can fully control our PSRAM.
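
The access routines in this article add a byte offset to a base address called FSMC_BANK1_3, which is never defined in the listings. On STM32 parts with an FSMC, the NOR/SRAM bank starts at 0x60000000 and is split into four 64 MB regions, one per chip-select line NE1 to NE4. The sketch below shows that arithmetic. The names NORSRAM_BANK1_BASE_ADDR, NORSRAM_REGION_SIZE and norsram_region_base are made up for illustration, and the device header may already provide an equivalent constant.

```c
#include <stdint.h>

#define NORSRAM_BANK1_BASE_ADDR  0x60000000UL  /* start of the NOR/SRAM bank */
#define NORSRAM_REGION_SIZE      0x04000000UL  /* 64 MB per NEx region       */

/* Base address of region 1..4 (chip select NE1..NE4). */
uint32_t norsram_region_base(unsigned region)
{
    return (uint32_t)(NORSRAM_BANK1_BASE_ADDR
                      + (uint32_t)(region - 1u) * NORSRAM_REGION_SIZE);
}
```

For the board used here, with the memory on NE3, this gives 0x68000000 as the address of the first external byte.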

Memory Access Code

CubeMx is so sweet that it even prepares HAL-based SRAM read/write code for us. If we look carefully, we can also find the DMA routines. However, we won't discuss DMA or interrupts (IT) here; we will test our SRAM with plain CPU accesses. The following listing shows the typical memory R/W routines.

// Write sram byte by byte, the normal way
void sram_write(unsigned char* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *(__IO unsigned char *)(FSMC_BANK1_3 + addr) = *pbuf;
        addr++;
        pbuf++;
    }
}
// Read byte by byte
void sram_read(unsigned char* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *pbuf = *(__IO unsigned char *)(FSMC_BANK1_3 + addr);
        pbuf++;
        addr++;
    }
}
// Faster, word by word
// NOTE: size is the number of 16-bit words, not bytes; addr is a byte offset.
void sram_write_word(unsigned short* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *(__IO unsigned short *)(FSMC_BANK1_3 + addr) = *pbuf;
        addr += sizeof(unsigned short);    // advance by one word, not one byte
        pbuf++;
    }
}
// Read word by word
void sram_read_word(unsigned short* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *pbuf = *(__IO unsigned short*)(FSMC_BANK1_3 + addr);
        pbuf++;
        addr += sizeof(unsigned short);    // advance by one word, not one byte
    }
}
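// One step further, try double word
// NOTE: size is the number of 32-bit double words, not bytes; addr is a byte offset.
void sram_write_dword(unsigned int* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *(__IO unsigned int *)(FSMC_BANK1_3 + addr) = *pbuf;
        addr += sizeof(unsigned int);    // advance by one double word
        pbuf++;
    }
}

// Read double word by double word
void sram_read_dword(unsigned int* pbuf, unsigned long addr, size_t size) {
    while(size--) {
        *pbuf = *(__IO unsigned int*)(FSMC_BANK1_3 + addr);
        pbuf++;
        addr += sizeof(unsigned int);    // advance by one double word
    }
}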

// NOTE: the following code uses two tricks:
// 1. Loop reduction (fewer iterations, so less loop overhead)
// 2. Code unrolling (the copy statement is repeated inline)
// Fast write, 8 bytes per iteration
void sram_fast_write8(unsigned char* pbuf, unsigned int addr, size_t size) {
    const int align = 2 * sizeof(unsigned int);

    if (size <= align) {
        sram_write(pbuf, addr, size);
        return ;
    }

    size_t remains = size & 7;
    size_t count = (size - remains) / sizeof(unsigned int);
    unsigned int* psrc= (unsigned int *)pbuf;
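    __IO unsigned int* pdst = (__IO unsigned int *)(FSMC_BANK1_3 + addr);

    while(count){
        // Write 2 ints (8 bytes) each time
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
    }

    if (remains) {
        // Copy the tail byte by byte; psrc points just past the copied words
        sram_write((unsigned char *)psrc, addr + (size - remains), remains);
    }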
}

// Fast write 16 bytes
void sram_fast_write16(unsigned char* pbuf, unsigned int addr, size_t size) {
    const int align = 4 * sizeof(unsigned int);

    if (size <= align) {
        sram_write(pbuf, addr, size);
        return ;
    }

    size_t remains = size & 15;
    size_t count = (size - remains) / sizeof(unsigned int);
    unsigned int* psrc= (unsigned int *)pbuf;
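    __IO unsigned int* pdst = (__IO unsigned int *)(FSMC_BANK1_3 + addr);

    while(count){
        // Write 4 ints (16 bytes) each time
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
    }

    if (remains) {
        // Copy the tail byte by byte; psrc points just past the copied words
        sram_write((unsigned char *)psrc, addr + (size - remains), remains);
    }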
}

// Fast write 32 bytes
void sram_fast_write32(unsigned char* pbuf, unsigned int addr, size_t size) {
    const int align = 8 * sizeof(unsigned int);

    if (size <= align) {
        sram_write(pbuf, addr, size);
        return ;
    }

    size_t remains = size & 31;
    size_t count = (size - remains) / sizeof(unsigned int);
    unsigned int* psrc= (unsigned int *)pbuf;
    __IO unsigned int* pdst = (__IO unsigned int *)(FSMC_BANK1_3 + addr);

    while(count){
        // Write 8 ints (32 bytes) each time
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;

        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
        *pdst++ = *psrc++; count--;
    }

    if (remains) {
        // Copy the tail byte by byte; psrc points just past the copied words
        sram_write((unsigned char *)psrc, addr + (size - remains), remains);
    }
}

Testing Code

Here is the test code. First, the variables used by the main loop:

    ...
  const unsigned int mem_size = 1 * 1024 * 1024;    // We have 1MB memory
  const unsigned int buf_size = 4 * 1024;            // read/write buffer size: 4KB, the unit we do W/R test
  const unsigned int test_loop = 16;                // Run 16 times for each kind test

  unsigned char pbuf[buf_size];    // Source buffer for writes (reused as the read target in the benchmarks)
  unsigned char pres[buf_size];    // Read-back buffer for validation
    ...

The main loop looks like this:

 /* USER CODE BEGIN WHILE */
  while (1)
  {

  /* USER CODE END WHILE */

  /* USER CODE BEGIN 3 */
      LOG("-------------------- Begin test -------------------------\r\n");

      {
          LOG("Validating....");
          memset(pbuf, 0xAB, sizeof(pbuf));
          sram_write(pbuf, 0, sizeof(pbuf));
          memset(pres, 0, sizeof(pres));
          sram_read(pres, 0, sizeof(pres));

          if (0 != memcmp(pbuf, pres, sizeof(pbuf))) {    // compare the whole buffer, not sizeof(buf_size) bytes
              LOG("Failed\r\n");
              HAL_Delay(1000);
              continue ;
          } else {
              LOG("Success\r\n");
          }
      }
      /////////////////////////////////////////////////////
      {
          LOG("Built in:");
          unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    memcpy(pbuf, pres, buf_size);
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            LOG("\r\n");
      }
      /////////////////////////////////////////////////////
      {
            LOG("Byte:");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_write(pbuf, addr, sizeof(pbuf));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_read(pbuf, addr, sizeof(pbuf));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tRt: %lu\tRs: %lu KB/t\r\n", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));
//            HAL_Delay(1000);
      }

      ///////////////////////////////
      {
            LOG("Word:");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_write_word((unsigned short *)pbuf, addr, sizeof(pbuf) / sizeof(unsigned short));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_read_word((unsigned short *)pbuf, addr, sizeof(pbuf) / sizeof(unsigned short));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tRt: %lu\tRs: %lu KB/t\r\n", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));
//            HAL_Delay(1000);
        }

      ///////////////////////////////
        {
            LOG("Dword:");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_write_dword((unsigned int *)pbuf, addr, sizeof(pbuf) / sizeof(unsigned int));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_read_dword((unsigned int *)pbuf, addr, sizeof(pbuf) / sizeof(unsigned int));
                }
            }

            ticks = HAL_GetTick() - ticks;
            LOG("\tRt: %lu\tRs: %lu KB/t\r\n", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));
//            HAL_Delay(1000);
        }
        //////////////////////////////////////////
        {
            LOG("Fast(8B):");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_fast_write8(pbuf, addr, sizeof(pbuf));
                }
            }
            ticks = HAL_GetTick() - ticks;

            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            LOG("\r\n");
        }

        //////////////////////////////////////////
        {
            LOG("Fast(16B):");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_fast_write16(pbuf, addr, sizeof(pbuf));
                }
            }
            ticks = HAL_GetTick() - ticks;

            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            LOG("\r\n");
        }

        //////////////////////////////////////////
        {
            LOG("Fast(32B):");
            unsigned int ticks = HAL_GetTick();

            for(unsigned int n = 0; n < test_loop; n++) {
                for(unsigned int addr = 0; addr < mem_size; addr += buf_size) {
                    sram_fast_write32(pbuf, addr, sizeof(pbuf));
                }
            }
            ticks = HAL_GetTick() - ticks;

            LOG("\tWt: %lu\tWs: %lu KB/t", ticks / test_loop, mem_size / (ticks * 1024 / test_loop));

            LOG("\r\n");
        }


        HAL_Delay(1000); // cooling down
  } // while
  /* USER CODE END 3 */
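
The validation step in the main loop fills the buffer with the constant byte 0xAB. A constant pattern reads back correctly even when several addresses map to the same memory cell, which can happen when fewer address lines are wired than the chip has. The following is a sketch of a classic address-line check that can catch this. The function name sram_find_address_alias is hypothetical, and the base pointer is passed in so that the same code can also be tried against ordinary RAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks the power-of-two offsets below `size` (size should be a power of two).
 * Returns 0 if every tested offset behaves as an independent cell, otherwise
 * the first offset at which a write was seen to disturb another location. */
size_t sram_find_address_alias(volatile uint8_t *base, size_t size)
{
    const uint8_t pattern = 0xAA;
    const uint8_t anti    = 0x55;
    size_t off, test;

    /* Put the pattern at every power-of-two offset. */
    for (off = 1; off < size; off <<= 1) {
        base[off] = pattern;
    }

    /* A write to offset 0 must not change any of those locations. */
    base[0] = anti;
    for (off = 1; off < size; off <<= 1) {
        if (base[off] != pattern) {
            return off;
        }
    }
    base[0] = pattern;

    /* A write to one power-of-two offset must not change the others. */
    for (test = 1; test < size; test <<= 1) {
        base[test] = anti;
        if (base[0] != pattern) {
            return test;
        }
        for (off = 1; off < size; off <<= 1) {
            if (off != test && base[off] != pattern) {
                return test;
            }
        }
        base[test] = pattern;
    }
    return 0;
}
```

On the board, base would be the start of the external memory cast to a volatile byte pointer, and size the capacity to be verified. A non-zero result indicates that the corresponding address line is not reaching the chip.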

Final Result

Running the test in debug mode, I got the following results.
With Timing.DataSetupTime = 3;

-------------------- Begin test -------------------------
Validating....Success
Built in:       Wt: 234 Ws: 4 KB/t
Byte:   Wt: 204 Ws: 4 KB/t      Rt: 409 Rs: 2 KB/t
Word:   Wt: 175 Ws: 5 KB/t      Rt: 274 Rs: 3 KB/t
Dword:  Wt: 145 Ws: 7 KB/t      Rt: 202 Rs: 5 KB/t
Fast(8B):       Wt: 95  Ws: 10 KB/t
Fast(16B):      Wt: 95  Ws: 10 KB/t
Fast(32B):      Wt: 95  Ws: 10 KB/t

Surprisingly, reading is much slower than writing: the read speed is only about 50-70% of the write speed. Another interesting thing is that the built-in RAM, which I believe is located on the chip, achieves about the same speed as the byte-by-byte method. The fast method is really fast, but it has its limit: beyond 8 bytes per loop iteration, further unrolling brings no more gain.

With Timing.DataSetupTime = 2;

-------------------- Begin test -------------------------
Validating....Success
Built in:       Wt: 234 Ws: 4 KB/t
Byte:   Wt: 204 Ws: 4 KB/t      Rt: 395 Rs: 2 KB/t
Word:   Wt: 164 Ws: 6 KB/t      Rt: 259 Rs: 3 KB/t
Dword:  Wt: 134 Ws: 7 KB/t      Rt: 187 Rs: 5 KB/t
Fast(8B):       Wt: 80  Ws: 12 KB/t
Fast(16B):      Wt: 80  Ws: 12 KB/t
Fast(32B):      Wt: 80  Ws: 12 KB/t

Notice that with the latter configuration, the fast write improves by 2 KB/tick.
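
The Ws and Rs columns are integer KB per tick, so the truncation hides a lot of detail. To convert the Wt and Rt columns (ticks per 1 MB pass) into bytes per second, the tick rate must be known. The sketch below assumes the HAL default of one tick per millisecond, which the article does not state explicitly. The helper name throughput_bytes_per_s is hypothetical.

```c
#include <stdint.h>

/* Bytes per second for a pass that moved `bytes_per_pass` bytes in
 * `ticks_per_pass` ticks, with `tick_hz` ticks per second.
 * Returns 0 when the pass was too short to measure. */
uint64_t throughput_bytes_per_s(uint32_t bytes_per_pass,
                                uint32_t ticks_per_pass,
                                uint32_t tick_hz)
{
    if (ticks_per_pass == 0u) {
        return 0u;
    }
    return ((uint64_t)bytes_per_pass * tick_hz) / ticks_per_pass;
}
```

With a 1 kHz tick, the fast write at 80 ticks per megabyte works out to 13,107,200 bytes/s (12.5 MiB/s), and at 95 ticks to roughly 10.5 MiB/s.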

Conclusion

The external "P"SRAM is fast enough. In most cases we can use it as an additional memory resource. Some applications, such as color LCD handling, can use the external PSRAM as a double buffer to avoid lag. Furthermore, we can probably run programs from external PSRAM and have more fun.

Good luck!
