There has been the C++ project BasicExcel for quite some years now. It is very useful to read and write Excel sheets in .xls format. However, it contains only very basic functionality. There are missing features like text formatting using fonts, and display formats and formulas are not yet supported. This article covers this additional functionality. The new, derived project ExcelFormat
is based on the old BasicExcel
code, and just adds what I was missing for my own needs. If you want to see where I am using the new library, you can look at the Export functions of Service Manager and Uninstall Manager.
This library processes Excel files in the BIFF8 XLS file format. For a detailed documentation of the format and all its internals, there are two documents available:
You can use the code on MS Windows with the MSVC compilers of VC++ 6.0 SP3 or later. Using GCC on MacOSX or Linux is also supported, but with some limitations. May be it even works in other environments like Sun Solaris, but this is not yet tested. Here you can see a table of environments with their respective status:
small XLS files | larger XLS files | |
---|---|---|
WIN32, using Windows API IStorage |
OK | OK |
WIN64, using Windows API IStorage |
OK | OK |
MacOS X / Linux 32 bit, using BasicExcel CompoundFile implementation |
OK | issues |
MacOS X / Linux 64 bit, using BasicExcel CompoundFile implementation |
OK | issues |
The issues listed above are because of the incomplete CompoundFile
implementation of the underlyingBasicExcel
library. I already fixed some part of the BasicExcel
code regarding numbers in RKValue
format and 64 bit compatibility, but there are still issues that show up when reading or writing larger XLS files which use XBAT or SBAT entries.
To use the new formatting functionality, first create an XLSFormatManager
object, like in the example1()
function, and attach it to an existing BasicExcel
object:
void example1(const char* path) { BasicExcel xls; // create sheet 1 and get the associated BasicExcelWorksheet pointer xls.New(1); BasicExcelWorksheet* sheet = xls.GetWorksheet(0); XLSFormatManager fmt_mgr(xls);
You can find all the examples of this article in the source code file Examples.cpp.
To define a custom font, create an ExcelFont
object and set any needed properties, for example, the font weight for a bold font:
ExcelFont font_bold; font_bold._weight = FW_BOLD; // 700=bold, 400=normal
The format of an Excel cell can be defined by a CellFormat
object, which holds the chosen font and some more properties:
CellFormat fmt_bold(fmt_mgr); fmt_bold.set_font(font_bold);
After you have prepared the CellFormat
, you can choose the font and display settings of Excel cells by callingSetFormat()
:
// Create a table containing a header row in bold and four rows below. int col, row = 0; for(col=0; col<10; ++col) { BasicExcelCell* cell = sheet->Cell(row, col); cell->Set("TITLE"); cell->SetFormat(fmt_bold); } while(++row < 4) { for(int col=0; col<10; ++col) sheet->Cell(row, col)->Set("text"); } ++row;
Text color is specified by setting color indices in ExcelFont
, for example:
ExcelFont font_red_bold; font_red_bold._weight = FW_BOLD; font_red_bold._color_index = EGA_RED; CellFormat fmt_red_bold(fmt_mgr, font_red_bold); fmt_red_bold.set_color1(COLOR1_PAT_SOLID); // solid background fmt_red_bold.set_color2(MAKE_COLOR2(EGA_BLUE,0)); // blue background CellFormat fmt_green(fmt_mgr, ExcelFont().set_color_index(EGA_GREEN)); for(col=0; col<10; ++col) { BasicExcelCell* cell = sheet->Cell(row, col); cell->Set("xxx"); cell->SetFormat(fmt_red_bold); cell = sheet->Cell(row, ++col); cell->Set("yyy"); cell->SetFormat(fmt_green); }
ExcelFormat.h contains constants to define basic palette colors in the enumeration EXCEL_COLORS
, you can use in calls to ExcelFont()::set_color_index()
. The macro MAKE_COLOR2
accepts two color indices to specify the pattern line and pattern background colors. As a shortcut to calling CellFormat::set_color1()
andCellFormat::set_color2()
, you can also use CellFormat::set_background()
to define cells with solid background colors or colorize patterns.
After creating and formatting the Excel cells in memory, all you have to do is to save the new Excel sheet as a file:
xls.SaveAs(path); }
This should be enough for a quick overview on how to use the new ExcelFormat
objects. In the source code fileExcelFormat.cpp, you can find some more examples.
There are nine font properties available:
struct ExcelFont { ExcelFont() ... wstring _name; short _height; short _weight; short _options; short _color_index; short _escapement_type; char _underline_type; char _family; char _character_set; ... };
For detailed information, please look into the XLS file format documentation.
These CellFormat
properties are available in addition to the chosen ExcelFont
and the display format:
struct CellFormat { CellFormat(XLSFormatManager& mgr) ... char _alignment; char _rotation; char _text_props; int _borderlines; int _color1; short _color2; ... };
In example2()
, you can see how to change the font face and the font size:
ExcelFont font_header; font_header.set_weight(FW_BOLD); font_header.set_underline_type(EXCEL_UNDERLINE_SINGLE); font_header.set_font_name(L"Times New Roman"); // font face "Times New Roman" font_header.set_color_index(EGA_BLUE); font_header._options = EXCEL_FONT_STRUCK_OUT; CellFormat fmt_header(fmt_mgr, font_header); fmt_header.set_rotation(30); // rotate the header cell text 30° to the left int row = 0; for(int col=0; col<10; ++col) { BasicExcelCell* cell = sheet->Cell(row, col); cell->Set("TITLE"); cell->SetFormat(fmt_header); }
The example3()
function shows how to use CellFormat::set_format_string()
andBasicExcelCell::SetFormat()
to define text, number, and date format string
s. There are some predefined constants for format string
s:
#define XLS_FORMAT_GENERAL L"General" #define XLS_FORMAT_TEXT L"@" #define XLS_FORMAT_INTEGER L"0" #define XLS_FORMAT_DECIMAL L"0.00" #define XLS_FORMAT_PERCENT L"0%" #define XLS_FORMAT_DATE L"M/D/YY" #define XLS_FORMAT_TIME L"h:mm:ss" #define XLS_FORMAT_DATETIME L"M/D/YY h:mm"
However, you can use any valid Excel format string
to define custom display formats.
example4()
shows how to use many different fonts and colors in one sheet:
Formulas are now preserved when reading and writing an Excel sheet. You can even store a formula into an Excel cell by calling BasicExcelCell::SetFormula()
. However, currently, you either have to copy aWorksheet::CellTable::RowBlock::CellBlock::Formula
object from an already existing cell, or you have to define it yourself, which is quite tricky because Excel uses a pre-parsed formula containing RPN tokens.
For those who are interested in a bit of background information, I want to provide a description of what has been changed since the predecessor library BasicExcel.
There are two different BasicExcel
implementations, one for VC++ version 6.0, and one for newer compilers.ExcelFormat
now merges these two code bases by using conditional compilation to handle the differences. Most of this is accomplished in the header file by this code snippet (and modifications using the newly defined macros):
#if _MSC_VER<=1200 // VC++ 6.0 #pragma warning(disable: 4786) #define LONGINT __int64 #define LONGINT_CONST(x) x #define COMPOUNDFILE #else // newer Microsoft compilers #define LONGINT long long #define LONGINT_CONST(x) x##LL #define COMPOUNDFILE CompoundFile:: #ifdef _DEBUG #define _ITERATOR_DEBUG_LEVEL 0 // speedup iterator operations while debugging #endif #endif
To distinguish between MS Windows environments using MSVC and others using GCC, the macro _MSC_VER is tested for existence. This switches between using the Windows API to access the compound binary file format and the BasicExcel implementation of CompoundFile.
The #pragma warning
statement disables compiler warnings of VC++ 6.0 because of long compiler internal names in object files, which occur for non-trivial usage of STL classes.
Immediately after these defines, there are these preprocessor statements for VS 2005 and above:
#if _MSC_VER>=1400 // VS 2005 #define _CRT_SECURE_NO_WARNINGS //MF #define _SCL_SECURE_NO_WARNINGS //MF #endif
They disable the VC++ security library warnings, because the BasicExcel
code is not yet prepared for this new runtime library additions. By the way, the MF comments mark additions and fixes by me to the old BasicExcel
code.
Additionally, I eliminated some compiler warnings by fixing data types. Typically, this was a change from integral types like int
to the C runtime library type size_t
.
To switch between 32 bit and 64 bit mode using GCC, use the compiler option -m32
respective -m64
like in the Makefile example in the source download. In the VC++ environment you can choose the target environment in the project settings.
In addition to the new API described above, I had to add code in BasicExcel
in order to implement these new features:
Formula
structure from/to XLS filesFORMAT
structure from/to XLS filesBasicExcelCell
objects for all the various cell typesThe information about the format of an Excel cell in the BIFF8 file format is stored using a so called XF index. This refers to an XF (extended format
) record, which consists of the following members:
XF { short fontRecordIndex // FORMAT index short formatRecordIndex // FONT index short protectionType char alignment char rotation char textProperties char usedAttributes int borderLines int colour1 short colour2 }
Besides a few directly associated attributes specifying alignment, rotation, etc. of the Excel cell, there are two more index values: fontRecordIndex
and formatRecordIndex
. These are used to define font and display format descriptions. In the whole picture, this two-level index based formatting architecture enables cell formatting with small file sizes and low memory usage, because there are typically only a few different fonts and display formats used in an Excel worksheet. The font index refers to a FONT
record with the following attributes:
FONT { short height short options short colourIndex short weight short escapementType char underlineType char family char characterSet char unused SmallString name }
The third index is a special one. This format index is associated with a record containing only the index itself and a text representation of the display format:
FORMAT {
short index
LargeString fmtstring
}
XLSFormatManager
manages these three formatting substructures to format Excel cells using the C++ structuresCellFormat
and ExcelFont
:
struct CellFormat -> XF {FORMAT index, FONT index, XF attributes} struct ExcelFont -> FONT {FONT attributes}
On calling CellFormat::set_font(const ExcelFont& font)
, the manager class searches for an already registered matching font description. If there is none yet, a new FONT
record is created to be stored in the Excel sheet. On calling CellFormat::set_format_string(const wstring& fmt_str)
, the manager class searches for an already registered matching display format string
. If there is none yet, a new FORMAT
record stores the display format string
. The same strategy is used when applying a CellFormat
to a cell object by callingBasicExcelCell::SetFormat(const CellFormat& fmt)
: the manager class searches for an already registered matching XF description with identical font and format indices as well as matching XF attributes. If there is none yet, a new XF
record is created to be stored in the Excel sheet. In the end, this results in a list of FORMAT
,FONT
, and XF
records, which are stored in the header of the Excel workbook file. Each cell is formatted by storing a single related XF index, which determines all the cell formatting attributes, the font, and the display format.
For further implementation details, please have a look into ExcelFormat.h, ExcelFormat.cpp, BasicExcel.hpp andBasicExcel.cpp of the source code.
To limit memory usage, a reference counting SmartPtr
is used to manage the heap structures needed for formula storage. Every cell contains only an optionally filled pointer to the struct Formula
. For text or number cells, formula information is not needed. So the smart pointer stays empty, and just contains the value NULL
.
In the following section, you can see the implementation of the struct RefCnt
as the base for struct Formula
and the template struct SmartPtr
to hold the reference counted heap objects:
// reference counter for SmartPtr managed objects struct RefCnt { // On construction the reference counter // is initialized with an usage count of 0. RefCnt() : _ref_cnt(0) { } int _ref_cnt; }; // reference counting smart pointer template<typename T> struct SmartPtr { // default constructor SmartPtr() : _ptr(NULL) { } // The initialized SmartPtr constructor increments the reference counter // in struct RefCnt. SmartPtr(T* p) : _ptr(p) { if (p) ++_ptr->_ref_cnt; } // The copy constructor increments the reference counter. SmartPtr(const SmartPtr& other) : _ptr(other._ptr) { if (_ptr) ++_ptr->_ref_cnt; } // The destructor decreases the reference counter and // frees the managed memory as the counter reaches zero. ~SmartPtr() { if (_ptr) { if (!--_ptr->_ref_cnt) delete _ptr; } } // The assignment operator increments the reference counter. SmartPtr& operator=(T* p) { if (_ptr) { if (!--_ptr->_ref_cnt) delete _ptr; _ptr = NULL; } if (p) { _ptr = p; ++_ptr->_ref_cnt; } return *this; } // operator bool() to check for non-empty smart pointers operator bool() const {return _ptr != NULL;} // operator!() to check for empty smart pointers bool operator!() const {return !_ptr;} // operator->() to access the managed objects T* operator->() {return _ptr;} const T* operator->() const {return _ptr;} // Dereference pointed memory T& operator*() {return *_ptr;} const T& operator*() const {return *_ptr;} private: T* _ptr; };
There are some new string
conversion functions: stringFromSmallString()
, stringFromLargeString()
,wstringFromSmallString()
, wstringFromLargeString()
using narrow_string()
/widen_string()
are used to convert from internal Excel string
structures to STL string
classes and vice versa. You can use them to access the internal data storage of BasicExcel
.
ExcelFormat
(BasicExcel
Version 2.0)BasicExcel::Close()
, CellFormat::get
/set_text_props()
andget/set_borderlines()
fstream
instead of closing it followed by open to prevent races in conjunction with virus scannersset_borderlines()
overloadExcelFont::set_italic()
, CellFormat::set_wrapping()
COLINFO
Load()
and SaveAs()
RKValues
written by OpenOffice CalcThis article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)