Why does this article have an stm tag?
Segmentfault doesn't offer an STM32 tag, and I don't have enough points to create a new one.
An implementation of fixed-point arithmetic for the STM32 or any other MCU that lacks an FPU.
For Your Information
- Motive
On an STM32F10x MCU, the lack of an FPU makes floating-point computation too slow for time-sensitive processing, for example an interrupt handler that must finish its job within milliseconds. Assembly language is a good option but not the best one, because it means learning another language and the result is less portable. I'm not sure whether others have already built this wheel, but I like to reinvent it, for no other reason than that I can.
- Compiler
The implementation was written in C++ with templates. By now, most embedded GCC compilers should work well.
- MCU Architecture
The implementation is designed to work on a 32-bit MCU. It also works on 16-bit MCUs.
- Limitation of Precision and Range
To represent a floating-point number, the implementation splits a 32-bit or 64-bit integer into an integer part and a fraction part, so both the precision and the range it can represent are limited. In the 32-bit representation, the integer part uses 21 bits and the fraction part uses 11 bits. In the 64-bit representation, the widths are simply doubled to 42 and 22 bits. Furthermore, overflow is not handled; that is your responsibility. Keep this in mind when using the code.
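To make the 32-bit layout concrete, here is a minimal sketch of the resolution and range it implies. It assumes a signed 32-bit raw integer whose 21 integer bits include the sign bit. The names `kFractionBits`, `kScale`, `resolution`, `max_value` and `min_value` are illustrative only and are not part of fxfloat.hpp.

```cpp
#include <cstdint>
#include <limits>

// Illustrative constants for the 32-bit layout described above:
// 11 fraction bits, the remaining 21 bits (sign included) for the integer part.
constexpr int kFractionBits = 11;
constexpr std::int64_t kScale = std::int64_t{1} << kFractionBits; // 2048

// Smallest step between two representable values: 1 / 2^11.
constexpr double resolution() { return 1.0 / kScale; }

// Largest representable value of a signed 32-bit raw integer.
constexpr double max_value() {
    return static_cast<double>(std::numeric_limits<std::int32_t>::max()) / kScale;
}

// Smallest (most negative) representable value.
constexpr double min_value() {
    return static_cast<double>(std::numeric_limits<std::int32_t>::min()) / kScale;
}
```

Under these assumptions the step is 1/2048 (about 0.00049), and representable values run from -1048576 to just under 1048576.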
- Understanding The Fixed Point
I assume you already know how fixed-point arithmetic works. If you don't, please read up on it first. It's very simple.
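For readers who want a quick refresher, the basic idea can be sketched as follows, assuming 11 fraction bits in a 32-bit integer. The helpers `to_fixed`, `to_real`, `fx_mul` and `fx_div` are hypothetical names used only for illustration; they are independent of fxfloat.hpp.

```cpp
#include <cstdint>

// Hypothetical helpers, independent of fxfloat.hpp, using 11 fraction bits.
constexpr int kFracBits = 11;
constexpr std::int32_t kOne = 1 << kFracBits; // raw representation of 1.0

// Encode a real number by scaling it by 2^11 and truncating toward zero.
inline std::int32_t to_fixed(float x) {
    return static_cast<std::int32_t>(x * kOne);
}

// Decode by dividing the raw integer by the same scale.
inline float to_real(std::int32_t raw) {
    return static_cast<float>(raw) / kOne;
}

// Addition and subtraction work directly on raw values.
// A product of two raw values carries the scale twice, so divide one out.
// The 64-bit intermediate keeps the product from overflowing.
inline std::int32_t fx_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * b) / kOne);
}

// A quotient of two raw values loses the scale, so pre-scale the dividend.
// The caller must ensure b != 0.
inline std::int32_t fx_div(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * kOne) / b);
}
```

For example, 1.5 is stored as 3072 and 2.0 as 4096, so `fx_mul` yields 6144 (3.0) and `fx_div` yields 1536 (0.75).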
- Liability
The code is free for anyone to use. In no event shall I be liable for any situation arising in any way out of the use of the code. Use it at your own risk.
Enough talk, here is the code.
/*
 * fxfloat.hpp
 * Created on: Mar 4, 2019
 * Author: igame
 */
#ifndef FXFLOAT_HPP_
#define FXFLOAT_HPP_
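#include <stdint.h>

// INLINE is not defined in this listing; plain `inline` is assumed here.
#ifndef INLINE
#define INLINE inline
#endif

namespace igame {

template <typename T>
class FxFloat {
public:
    static constexpr uint32_t fixed_point_32bits = 11;
    static constexpr uint32_t fixed_point_64bits = 22;
    // Fraction width follows the size of the underlying integer type T.
    static constexpr uint32_t fixed_point_bits = sizeof(T) == 8 ? fixed_point_64bits : fixed_point_32bits;
    static constexpr int64_t fixed_point_scale = (int64_t)1 << fixed_point_bits;
    // Mask selecting only the fraction bits of a raw value.
    static constexpr uint64_t fixed_point_fraction_mask = ((uint64_t)1 << fixed_point_bits) - 1;
    // Value stored on division by zero; it is narrowed to T on assignment.
    static constexpr int64_t fixed_point_max = (int64_t)0x3F3F3F3F3F3F3F3F;
    static constexpr int64_t fixed_point_min = -fixed_point_max;

    typedef T value_type;
    typedef const T const_value_type;

    value_type value{ 0 };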
    FxFloat() { }

    FxFloat(const float x) {
        // Multiply instead of shifting so negative inputs are well defined.
        int64_t integer = (int64_t)x * fixed_point_scale;
        int64_t fract = (int64_t)((x - (int64_t)x) * fixed_point_scale);
        this->value = (value_type)(integer + fract);
    }

    /// convert back to float
    INLINE operator float() {
        return to_float();
    }
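    /// Assignment (the int64_t overload takes a raw, already scaled value)
    INLINE FxFloat& operator = (const int64_t& right) {
        this->value = (value_type)right;
        return *this;
    }

    INLINE FxFloat& operator = (const float& right) {
        this->value = (value_type)from_float(right);
        return *this;
    }

    INLINE FxFloat& operator = (const FxFloat& right) {
        // The cast avoids an ambiguous overload when value_type is 32-bit.
        return this->operator = ((int64_t)right.value);
    }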
    /// Add
    INLINE FxFloat& operator += (int64_t right) {
        this->value += right;
        return *this;
    }

    INLINE FxFloat& operator += (const float& right) {
        return this->operator += (from_float(right));
    }

    INLINE FxFloat& operator += (const FxFloat& right) {
        // The cast avoids an ambiguous overload when value_type is 32-bit.
        return this->operator += ((int64_t)right.value);
    }

    /// Sub
    INLINE FxFloat& operator -= (const int64_t& right) {
        this->value -= right;
        return *this;
    }

    INLINE FxFloat& operator -= (const float& right) {
        return this->operator -= (from_float(right));
    }

    INLINE FxFloat& operator -= (const FxFloat& right) {
        // The cast avoids an ambiguous overload when value_type is 32-bit.
        return this->operator -= ((int64_t)right.value);
    }
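    /// Mul
    INLINE FxFloat& operator *= (const int64_t& right) {
        // Widen first so a 32-bit value_type does not overflow mid-product,
        // then drop the extra scale factor.
        int64_t product = (int64_t)this->value * right;
        this->value = (value_type)(product >> fixed_point_bits);
        return *this;
    }

    INLINE FxFloat& operator *= (const float& right) {
        return this->operator *= (from_float(right));
    }

    INLINE FxFloat& operator *= (const FxFloat& right) {
        return this->operator *= ((int64_t)right.value);
    }

    /// Div
    INLINE FxFloat& operator /= (const int64_t& right) {
        if (right == 0) {
            // Division by zero stores the sentinel instead of trapping.
            this->value = (value_type)fixed_point_max;
        }
        else {
            // Pre-scale the dividend in 64 bits so the quotient keeps its scale.
            int64_t dividend = (int64_t)this->value * fixed_point_scale;
            this->value = (value_type)(dividend / right);
        }
        return *this;
    }

    INLINE FxFloat& operator /= (const float& right) {
        return this->operator /= (from_float(right));
    }

    INLINE FxFloat& operator /= (const FxFloat& right) {
        return this->operator /= ((int64_t)right.value);
    }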
    INLINE int64_t from_float(const float& x) {
        int64_t integer = (int64_t)x * fixed_point_scale;
        int64_t fract = (int64_t)((x - (int64_t)x) * fixed_point_scale);
        return integer + fract;
    }

    INLINE float to_float() const {
        return (float)this->value / fixed_point_scale;
    }
};
/// Global Operator Overloading
template <typename T>
INLINE FxFloat<T> operator + (const FxFloat<T>& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res += right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator + (const FxFloat<T>& left, const float& right) {
    FxFloat<T> res{ left };
    res += right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator + (const float& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res += right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator - (const FxFloat<T>& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res -= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator - (const FxFloat<T>& left, const float& right) {
    FxFloat<T> res{ left };
    res -= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator - (const float& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res -= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator * (const FxFloat<T>& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res *= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator * (const FxFloat<T>& left, const float& right) {
    FxFloat<T> res{ left };
    res *= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator * (const float& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res *= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator / (const FxFloat<T>& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res /= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator / (const FxFloat<T>& left, const float& right) {
    FxFloat<T> res{ left };
    res /= right;
    return res;
}

template <typename T>
INLINE FxFloat<T> operator / (const float& left, const FxFloat<T>& right) {
    FxFloat<T> res{ left };
    res /= right;
    return res;
}
} // ns igame
#endif /* FXFLOAT_HPP_ */
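Since the header leaves overflow handling to the caller, one possible guard is a saturating add on raw 32-bit values. This is only a sketch under that assumption: `saturating_add` is a hypothetical helper, not part of fxfloat.hpp, and clamping is just one of several policies a caller might choose.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of fxfloat.hpp: adds two raw 32-bit
// fixed-point values and clamps the result instead of letting it overflow.
inline std::int32_t saturating_add(std::int32_t a, std::int32_t b) {
    // Do the sum in 64 bits, where it cannot overflow.
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    if (wide > hi) return static_cast<std::int32_t>(hi);
    if (wide < lo) return static_cast<std::int32_t>(lo);
    return static_cast<std::int32_t>(wide);
}
```

The same widen-then-clamp pattern could be applied to subtraction and multiplication when the raw type is 32-bit; a 64-bit raw type would need a different check because no wider built-in integer is guaranteed.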