Using new and delete in C++ on Cortex-M3

A nice way to precisely time the instantiation of an object in C++  is to use the new and delete operators. Normally, an global object is instantiated at a undefined time – the compiler decides on this, not a language standard.

Whilst delete would not be of great use in an embedded system without good memory management (the easiest implementation would be the  BGET package), new can be used efficiently to cater for a linear instantiation where the user decides when to instantiate.

If you want to use C++ without the bloat, you’re probably forced to provide your own new and delete routines. On Cortex-M3, this can be tricky.

Although the core supports unaligned memory access, unaligned multiple LDM, STM, LDRD and STRD calls will cause a Usage Fault exception. Also, using unaligned data decreases performance. When instantiating a class with a multitude of variables listed in the class, the allocated memory block could well be unaligned. Also, code could conceivably be generated that will execute a number of LDM or LDRD instructions, for example, in the constructor of the object.  It is therefore imperative that a new call will always return an aligned block of data.

A simple way to implement new and delete would be like this :

static void * toewijzing(const size_t size) /* let op: size geeft het aantal bytes aan */
{
    static UInt8 *heap_ptr = (UInt8 *)(__bss_end);
    const UInt8 *base = heap_ptr;       /*  Point to end of heap.  */
    const UInt8 *eindeSRAM = (const UInt8 *)__TOP_STACK;        

    if ((heap_ptr+size) < eindeSRAM)
    {
        heap_ptr += size;       /*  Increase heap met size aantal bytes  */
    }
    else
    {
        while(1);
        KiwandaAssert(0);
    }

    return((void *)base);               /*  Return pointer to start of new heap area.  */
}

void * operator new(size_t size) /* let op: size geeft het aantal bytes aan */
{
    return(toewijzing(size));
} 

void * operator new[](size_t size)
{
    return(toewijzing(size));
} 

void operator delete(void * ptr)
{
    // free(ptr);
} 

void operator delete[](void * ptr)
{
    //free(ptr);
}

The linker will provide the location of the end of the used SRAM space (__bss_end) and the end of the SRAM block (__TOP_STACK).  An example would be the following Linker Load file, used for the Energy Micro EFM32G880F64 processor:

************************************************************************
ARM Cortex M-3 standaard microcontroller loadscript.
Memory I/O adressen, aangepast aan eigenschappen
van de Energy Micro EFM32G880F64 Microcontroller

Alle aanpassingen in dit bestand zijn
(c) 2008-2011 Kiwanda Embedded Systemen
Voor alle overige software geldt het copyright van Atmel

$Id: efm32g880f64.ld 3041 2011-08-28 20:29:52Z oldenburgh $
************************************************************************/

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)

MEMORY
{
  FLASH (rx)   : ORIGIN = 0x00000000, LENGTH = 64K
  DATA (rw)    : ORIGIN = 0x20000000, LENGTH = 16K
}

__TOP_STACK    = ORIGIN(DATA) + LENGTH(DATA);

/* Section Definitions */

SECTIONS
{
  /* first section is .text which is used for code */

  . = ORIGIN(FLASH);
  .text :
  {
        KEEP(*(.cortexm3vectors))      /* core exceptions */
        KEEP(*(.geckovectors))          /* efm32 exceptions */
        . = ALIGN(4);
        KEEP(*(.init))
        CREATE_OBJECT_SYMBOLS
        *(.text .text.*)                   /* C en C++ code */
        *(.gnu.linkonce.t.*)
        *(.glue_7)
        *(.glue_7t)
        *(.vfp11_veneer)
        *(.ARM.extab* .gnu.linkonce.armextab.*)
        *(.gcc_except_table)
        . = ALIGN(4);
   } >FLASH

__exidx_start = .;
.ARM.exidx   : { *(.ARM.exidx* .gnu.linkonce.armexidx.*) }
__exidx_end = .;

   . = ALIGN(4);

        .rodata : ALIGN (4)
        {
                *(.rodata .rodata.* .gnu.linkonce.r.*)

                . = ALIGN(4);
                KEEP(*(.init))

                . = ALIGN(4);
                __preinit_array_start = .;
                KEEP (*(.preinit_array))
                __preinit_array_end = .;

                . = ALIGN(4);
                __init_array_start = .;
                KEEP (*(SORT(.init_array.*)))
                KEEP (*(.init_array))
                __init_array_end = .;

                . = ALIGN(4);
                KEEP(*(.fini))

                . = ALIGN(4);
                __fini_array_start = .;
                KEEP (*(.fini_array))
                KEEP (*(SORT(.fini_array.*)))
                __fini_array_end = .;

                . = ALIGN(0x4);

                __CTOR_LIST__ = .;
                LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)
                *(.ctors)
                LONG(0)
                __CTOR_END__ = .;
                __DTOR_LIST__ = .;
                LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)
                *(.dtors)
                LONG(0)
                __DTOR_END__ = .;

                . = ALIGN(0x4);

                KEEP (*crtbegin.o(.ctors))
                KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
                KEEP (*(SORT(.ctors.*)))
                KEEP (*crtend.o(.ctors))

                . = ALIGN(0x4);
                KEEP (*crtbegin.o(.dtors))
                KEEP (*(EXCLUDE_FILE (*crtend.o) .dtors))
                KEEP (*(SORT(.dtors.*)))
                KEEP (*crtend.o(.dtors))

                *(.init .init.*)
                *(.fini .fini.*)

                PROVIDE_HIDDEN (__preinit_array_start = .);
                KEEP (*(.preinit_array))
                PROVIDE_HIDDEN (__preinit_array_end = .);
                PROVIDE_HIDDEN (__init_array_start = .);
                KEEP (*(SORT(.init_array.*)))
                KEEP (*(.init_array))
                PROVIDE_HIDDEN (__init_array_end = .);
                PROVIDE_HIDDEN (__fini_array_start = .);
                KEEP (*(.fini_array))
                KEEP (*(SORT(.fini_array.*)))
                PROVIDE_HIDDEN (__fini_array_end = .);

                . = ALIGN (8);
                *(.rom)
                *(.rom.b)

        } >FLASH

   . = ALIGN(4);
   _etext = . ;
   PROVIDE (etext = .);

  /* .data section which is used for initialized data */

   _bdata = . ;
   PROVIDE (bdata = .);

  .data : AT(_bdata)
  {
    _data = . ;
    *(.data)            /* normale data */
    *(.data.*)        

    SORT(CONSTRUCTORS)
    . = ALIGN(4);
  } >DATA

   _edata = . ;
   PROVIDE (edata = .);

  . = ALIGN(4);

  /* .bss section which is used for uninitialized data */
  .bss(NOLOAD) :
  {
    _bss = . ;
    __bss_start = . ;
    __bss_start__ = . ;
    *(.bss)
    *(.bss.*)
    *(COMMON)
   __bss_end__ = . ;
   __bss_end   = . ;
   PROVIDE (_ebss = .) ;
  } >DATA

  . = ALIGN(4);

   _end = .;
   PROVIDE (end = .);    /* einde intern geheugen */

  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
   *(.data)            /* normale data */
    *(.data.*)        

    SORT(CONSTRUCTORS)
    . = ALIGN(4);
  } >DATA

   _edata = . ;
   PROVIDE (edata = .);

  . = ALIGN(4);

  /* .bss section which is used for uninitialized data */
  .bss(NOLOAD) :
  {
    _bss = . ;
    __bss_start = . ;
    __bss_start__ = . ;
    *(.bss)
    *(.bss.*)
    *(COMMON)
   __bss_end__ = . ;
   __bss_end   = . ;
   PROVIDE (_ebss = .) ;
  } >DATA

  . = ALIGN(4);

   _end = .;
   PROVIDE (end = .);    /* einde intern geheugen */

  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }

} /* einde secties */

When the size of the allocated block is not a multiple of 4, the next allocated block will be unaligned. In order to prevent this, allocation must always be done at multiples of 4 bytes. The next example will show this:

static void * toewijzing(const size_t size) /* let op: size geeft het aantal bytes aan */
{
    static UInt8 *heap_ptr = (UInt8 *)(__bss_end);
    const UInt8 *base = heap_ptr;       /*  Point to end of heap.  */
    const UInt8 *eindeSRAM = (const UInt8 *)__TOP_STACK;        

    if ((heap_ptr+size) < eindeSRAM)
    {
        heap_ptr += size;       /*  Increase heap met size aantal bytes en eindig op 32 bit boundary */

        while (((unsigned int)heap_ptr & 0x3) != 0)   /* bit 0 en 1 = 0 --> stappen in 4 * byte = 32 bytes */
            heap_ptr++;
    }
    else
    {
        while(1);
        KiwandaAssert(0);
    }

    return((void *)base);               /*  Return pointer to start of new heap area.  */
}

void * operator new(size_t size) /* let op: size geeft het aantal bytes aan */
{
    return(toewijzing(size));
} 

void * operator new[](size_t size)
{
    return(toewijzing(size));
} 

void operator delete(void * ptr)
{
     /* not applicable in embedded application */
} 

void operator delete[](void * ptr)
{
    /* not applicable in embedded application */
}

Advancing the heap_ptr to the next aligned address will prevent unaligned allocation to take place. The price paid for this is the loss of, at most, 3 bytes in between two allocated blocks. Since most systems have SRAM memory equal to or in excess of 16 KBytes, this won’t pose a major problem. During design of a class, careful placement of the data members will also prevent unaligned variables. Simply stated, group your equal variables together (for example, all unsigned char or unsigned short) and pad them, if necessary,  at the end with a dummy variable to grow to the 32 bit boundary:

UInt8 var1;
UInt16 var2[3];

/* total = 8+2*16 = 40 bits  -> we need 24 bits padding */
UInt8 padding[3];

This will prevent costly unaligned acces.