The 'af' directory contains all source code for the cross-platform application framework. This directory contains the following subdirectories:
ev
Source code for the event mechanism for the cross-platform application framework. This code contains the machinery to do key bindings, mouse bindings, menu bars, and tool bars. This code is currently used by the word processor, but we expect to also use it for the spreadsheet application.
gr
Source code for graphics (drawing code, font code, etc.)
util
Source code for general-purpose utility functions.
xap
Source code for the application-neutral portion of the cross-platform framework defined in ev.
Squiggles are used to underline misspelled words. Instead of simply erasing all squiggles and rechecking all words in a block when the block is changed, the squiggles are handled much like the words they underline.
The word currently being edited is the pending word. When the cursor no longer touches the pending word (because it was moved away or the user typed a word separator), the word is spell-checked. If it is misspelled, it is squiggled.
When text is added to the block, fl_Squiggles::textInserted is called with information about where in the block the text was added, and how much. It then removes any squiggle located at that offset, moves all following squiggles (so they end up aligned with the words they should underline), and spell-checks the words in the added text (via fl_BlockLayout::_recalcPendingWord).
When text is deleted from the block, fl_Squiggles::textDeleted is called with information about where in the block the text was deleted, and how much. It removes squiggles intersecting that area, moves all following squiggles, and makes the word at the deletion point pending, since two words may have been joined or a word may have lost some of its letters.
When a block is split in two, fl_Squiggles::split is called with information about where the block was split and a pointer to the new block. The squiggles from the old block are divided between it and the new block. The word at the end of the old block (which may have been broken) is spell-checked, and the first word of the new block is made the pending word.
When two blocks are merged into one, fl_Squiggles::join is called with the offset at which squiggles from the second block should be joined onto the first block. The word at the merge point is made the pending word.
There is one known buglet: typing "gref' " correctly squiggles the word when the ' is typed, since the ' then acts as a word separator. However, deleting the s in "gref's" leaves gref unsquiggled, because the word gref was not pending when the ' changed from a word character to a word delimiter. (This is hard to explain; just try it.)
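The offset bookkeeping described above for textInserted and textDeleted can be sketched roughly as follows. This is only an illustration: Squiggle and SquiggleList are hypothetical stand-ins rather than the real fl_Squiggles class, the rule for when a squiggle "touches" an insertion point is an assumption, and the pending-word and spell-checking steps are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one squiggle: the half-open range
// [offset, offset + length) of block-relative character positions.
struct Squiggle {
    std::size_t offset;
    std::size_t length;
};

// Hypothetical model of a block's squiggle list (not the real fl_Squiggles).
struct SquiggleList {
    std::vector<Squiggle> items;

    // `len` characters were inserted at `pos`: drop any squiggle touching
    // `pos` (assumed to include both ends of the word), then shift the
    // squiggles that follow so they stay aligned with their words.
    void textInserted(std::size_t pos, std::size_t len) {
        removeIf([pos](const Squiggle& s) {
            return s.offset <= pos && pos <= s.offset + s.length;
        });
        for (Squiggle& s : items)
            if (s.offset > pos) s.offset += len;
    }

    // `len` characters were deleted starting at `pos`: drop squiggles that
    // intersect the deleted range, then shift the squiggles that follow.
    void textDeleted(std::size_t pos, std::size_t len) {
        removeIf([pos, len](const Squiggle& s) {
            return s.offset < pos + len && pos < s.offset + s.length;
        });
        for (Squiggle& s : items) {
            if (s.offset >= pos + len) {
                assert(s.offset >= len);  // guards against unsigned underflow
                s.offset -= len;
            }
        }
    }

private:
    template <typename Pred>
    void removeIf(Pred pred) {
        items.erase(std::remove_if(items.begin(), items.end(), pred),
                    items.end());
    }
};
```

In the real code the word at the edit point would additionally be made pending or rechecked, which this sketch does not model.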
FL_DocLayout is a formatted representation of a specific PD_Document, formatted for a specific GR_Graphics context.
An FL_DocLayout encapsulates two related hierarchies of objects.
The logical (or content) hierarchy corresponds to the logical structure of the document.
Each fl_BlockLayout corresponds to a logical element in the PD_Document (usually a paragraph of text).
The physical (or layout) hierarchy, by contrast, encapsulates the subdivision of physical space into objects of successively finer granularity.
Each fp_Run contains some fragment of content from the original document, usually text.
A pt_PieceTable is the data structure used to represent the document. It presents an interface to access the document content as a sequence of (Unicode) characters. It includes an interface to access document structure and formatting information. It provides efficient editing operations, complete undo, and crash recovery.
The PieceTable consists of the following classes:
InitialBuffer -- This is a read-only character array consisting of the entire character content of the document and initially read from the disk. (All XML tags and other non-content items are omitted from this buffer.)
ChangeBuffer -- This is an append-only character array consisting of all character content inserted into the document during the editing session.
InitialAttrPropTable -- This is a read-only table of Attribute/Property structures extracted from the original document.
ChangeAttrPropTable -- This is an append-only table of Attribute/Property structures that are created during the editing session.
Piece -- This class represents a piece of the sequence of the document; that is, a contiguous sub-sequence having the same properties, such as a span of text or an object (such as an in-line image). It contains links to the previous and next Pieces in the document. Pieces are created in response to editing and formatting commands.
TextPiece -- This subclass represents a span of contiguous text in one of the buffers. All text within the span has the same (CSS) properties. A TextPiece is not necessarily the longest contiguous span; it is possible to have adjacent (both in order and in buffer position) TextPieces with the same properties. A TextPiece contains a buffer offset and length for the location and size of the text, and a flag to indicate which buffer. A TextPiece contains (or contains a link to) the text formatting information. Note that the buffer offset only gives the location of the content of the span in one of the buffers; it does not specify the absolute position of the span in the document.
ObjectPiece -- This subclass represents an in-line object or image. It has no references to the buffers, but does provide a place-holder in the sequence.
PieceList -- This is a doubly-linked list of Pieces. They are linked in document order. A forward traversal of this list will reveal the entire content of the document; in doing so, it may wildly jump around both of the buffers, but that is not an issue.
PX_ChangeRecord -- Each editing and formatting change is represented as a ChangeRecord. A ChangeRecord represents an atomic change that was made to one or more pieces. This includes offset/length changes to a TextPiece and changes to the PieceList.
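To make the relationship between the two buffers and the PieceList concrete, here is a minimal sketch of the forward traversal that recovers the document content. It makes several simplifying assumptions that are not part of the design above: only text pieces are modelled, std::string stands in for the Unicode buffers, and the names TextPiece and readDocument are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Illustrative text piece: which buffer it refers to, and where in it.
struct TextPiece {
    bool inInitialBuffer;  // true: InitialBuffer, false: ChangeBuffer
    std::size_t offset;    // start of the span within that buffer
    std::size_t length;    // number of characters in the span
};

// Walk the pieces in document order and concatenate the spans they refer to.
// The traversal may jump back and forth between the two buffers.
std::string readDocument(const std::string& initialBuffer,
                         const std::string& changeBuffer,
                         const std::list<TextPiece>& pieceList) {
    std::string out;
    for (const TextPiece& p : pieceList) {
        const std::string& buf = p.inInitialBuffer ? initialBuffer : changeBuffer;
        assert(p.offset + p.length <= buf.size());
        out.append(buf, p.offset, p.length);
    }
    return out;
}
```

For example, with an initial buffer of "Hello world", a change buffer of "big ", and pieces referring to "Hello ", "big ", and "world" in that order, the traversal yields "Hello big world" without either buffer having been modified.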
Insert(position,bAfter,c) -- To insert one or more characters c into the document, either before or after the absolute document position given by position, we do the following:
A ChangeRecord cr of type InsertSpan is created.
cr.span.m_documentOffset contains the document position of the insertion.
cr.span.m_span marks the buffer position of the text that was inserted.
cr.span.m_bAfter remembers whether the insertion was before or after the document position.
Delete(position,bAfter,length) -- To delete one or more characters from the document, either before or after the absolute document position given by position, we do the following:
A ChangeRecord cr of type DeleteSpan is created.
cr.span.m_documentOffset contains the document position of the deletion.
cr.span.m_span marks the buffer position of the text that was deleted.
cr.span.m_bAfter remembers whether the deletion was before or after the document position.
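As a rough illustration of how Insert and Delete can operate on the pieces alone, the sketch below splits pieces at the edit position and never moves buffer content. It is a simplified model under stated assumptions, not the actual implementation: std::string replaces the Unicode buffers, only text pieces exist, the bAfter flag and the ChangeRecord bookkeeping are omitted, and the names PieceTable, Piece, and splitAt are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>

struct Piece {
    bool inInitial;      // which buffer the span lives in
    std::size_t offset;  // start of the span within that buffer
    std::size_t length;  // number of characters in the span
};

class PieceTable {
public:
    explicit PieceTable(std::string initial) : initial_(std::move(initial)) {
        if (!initial_.empty())
            pieces_.push_back(Piece{true, 0, initial_.size()});
    }

    // Insert `s` so that it starts at absolute document position `pos`.
    void insert(std::size_t pos, const std::string& s) {
        if (s.empty()) return;
        auto it = splitAt(pos);
        pieces_.insert(it, Piece{false, change_.size(), s.size()});
        change_ += s;  // append-only: existing buffer content never moves
    }

    // Delete `len` characters starting at absolute document position `pos`.
    void erase(std::size_t pos, std::size_t len) {
        auto first = splitAt(pos);
        auto last = splitAt(pos + len);
        pieces_.erase(first, last);  // buffers are left untouched
    }

    std::string text() const {
        std::string out;
        for (const Piece& p : pieces_) {
            const std::string& buf = p.inInitial ? initial_ : change_;
            out.append(buf, p.offset, p.length);
        }
        return out;
    }

    std::size_t pieceCount() const { return pieces_.size(); }

private:
    // Ensure a piece boundary exists at `pos` and return the piece that
    // starts there (or end() when `pos` is the end of the document).
    std::list<Piece>::iterator splitAt(std::size_t pos) {
        std::size_t start = 0;
        for (auto it = pieces_.begin(); it != pieces_.end(); ++it) {
            if (pos == start) return it;
            if (pos < start + it->length) {
                std::size_t left = pos - start;
                Piece right{it->inInitial, it->offset + left, it->length - left};
                it->length = left;  // the existing piece becomes the left half
                return pieces_.insert(std::next(it), right);
            }
            start += it->length;
        }
        assert(pos == start && "position past end of document");
        return pieces_.end();
    }

    const std::string initial_;  // read-only original content
    std::string change_;         // append-only inserted content
    std::list<Piece> pieces_;    // pieces in document order
};
```

A linked list is used here because list iterators stay valid when neighbouring pieces are inserted, which keeps the split-then-erase logic short; the design above leaves open whether a list or a tree is used.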
ChangeFormatting()
Undo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is greater than zero, we have undo information. The information in the ChangeRecord prior to the CurrentPosition is used to undo the editing operation. After an undo the CurrentPosition is decremented.
If the ChangeRecord is of type InsertSpan: we perform a delete operation using cr.span.m_documentOffset, cr.span.m_span.m_length, and cr.span.m_bAfter.
If the ChangeRecord is of type DeleteSpan: we perform an insert operation using cr.span.m_documentOffset, cr.span.m_span, and cr.span.m_bAfter.
ChangeFormatting:
InsertFormatting:
Redo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is less than the length of the ChangeVector, the redo has not been invalidated and may be applied. The information in the ChangeRecord at the CurrentPosition provides complete information to describe the editing operation to be redone. After a redo the CurrentPosition is advanced.
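The Undo and Redo rules above can be sketched with a change vector and a current position. The following is an illustration only and simplifies heavily: the document is a plain std::string rather than a piece table, each record stores a copy of the affected text instead of a buffer span, bAfter is omitted, and discarding the redo tail when a new edit arrives is an assumption about how redo is invalidated. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ChangeType { InsertSpan, DeleteSpan };

struct ChangeRecord {
    ChangeType type;
    std::size_t documentOffset;  // where the change happened
    std::string text;            // simplification: copy of the affected text
};

class Document {
public:
    void insert(std::size_t pos, const std::string& s) {
        record(ChangeRecord{ChangeType::InsertSpan, pos, s});
        text_.insert(pos, s);
    }

    void erase(std::size_t pos, std::size_t len) {
        record(ChangeRecord{ChangeType::DeleteSpan, pos, text_.substr(pos, len)});
        text_.erase(pos, len);
    }

    // Undo uses the record just before the current position, then steps back.
    bool undo() {
        if (current_ == 0) return false;
        apply(changes_[current_ - 1], /*invert=*/true);
        --current_;
        return true;
    }

    // Redo re-applies the record at the current position, then advances.
    bool redo() {
        if (current_ >= changes_.size()) return false;
        apply(changes_[current_], /*invert=*/false);
        ++current_;
        return true;
    }

    const std::string& text() const { return text_; }

private:
    void record(ChangeRecord cr) {
        // Assumption: a new edit invalidates any records that could be redone.
        changes_.erase(changes_.begin() + static_cast<std::ptrdiff_t>(current_),
                       changes_.end());
        changes_.push_back(std::move(cr));
        current_ = changes_.size();
    }

    void apply(const ChangeRecord& cr, bool invert) {
        bool doInsert = (cr.type == ChangeType::InsertSpan) != invert;
        if (doInsert)
            text_.insert(cr.documentOffset, cr.text);
        else
            text_.erase(cr.documentOffset, cr.text.size());
    }

    std::string text_;
    std::vector<ChangeRecord> changes_;
    std::size_t current_ = 0;  // plays the role of CurrentPosition
};
```

Undoing an InsertSpan deletes the recorded length at the recorded offset, and undoing a DeleteSpan re-inserts the recorded text, mirroring the two cases listed under Undo.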
The content of the original file is never modified. Pieces in the PieceList describe the current document; the original content is referenced in a random-access fashion. For systems with small memory or for very large documents, it may be worth demand-loading blocks of the original content rather than loading it completely into the InitialBuffer.
Document content data (in the two buffers) are never moved once written. Insert and delete operations change the Pieces in the PieceList, but do not move or change the contents of the two buffers.
TextPieces represent spans of text that are convenient for the structure of the document and a result of the sequence of editing operations. They are not optimized for layout or display.
const char * into the buffers along with a length, which the caller could use in text drawing or measuring calls, but not C-style, zero-terminated strings.
Mapping an absolute document position to a Piece involves a linear search of the PieceList to compute the absolute document position and find the correct Piece. The number of Pieces in a document is a function of the number of editing operations that have been performed in the session and of the complexity of the structure and formatting of the original document. A linear search might be painfully slow.
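The linear search mentioned above can be sketched as follows. The Piece and Location types and the locate function are hypothetical, and buffer references are left out because only piece lengths matter for the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Only the length matters for position lookup; buffer details are omitted.
struct Piece {
    std::size_t length;
};

struct Location {
    bool found;                 // false when pos is past the end of the document
    std::size_t pieceIndex;     // index of the piece containing pos
    std::size_t offsetInPiece;  // pos relative to the start of that piece
};

// Accumulate piece lengths until the piece containing `pos` is reached.
// The cost grows linearly with the number of pieces in the document.
Location locate(const std::vector<Piece>& pieces, std::size_t pos) {
    std::size_t start = 0;  // absolute document position of pieces[i]
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (pos < start + pieces[i].length)
            return Location{true, i, pos - start};
        start += pieces[i].length;
    }
    return Location{false, pieces.size(), 0};
}
```

Every lookup walks the list from the front, which is why the number of pieces directly affects editing latency; a tree keyed on cumulative length would be one way to reduce this, though the text above does not commit to that.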
We provide a complete, but first-order, undo with redo. That is, unlike Emacs, we do not record the undo operation itself in the undo history.
TODO The before and after stuff on insert and delete is a bit of a hand-wave.
class PT_PieceTable
{
    const UT_UCSChar *  m_InitialBuffer;
    const UT_UCSChar *  m_ChangeBuffer;
    pt_PieceList *      m_pieceList;
    pt_AttrPropTable    m_InitialAttrPropTable;
    pt_AttrPropTable    m_ChangeAttrPropTable;
    ...
};

class pt_Piece
{
    enum PieceType { TextPiece, ObjectPiece, StructurePiece };
    PieceType           m_pieceType;
    <linked-list or tree pointers>
    ...
};

class pt_Span
{
    UT_Bool             m_bInInitialBuffer;
    UT_uint32           m_offset;
    UT_uint32           m_length;
};

class pt_TextPiece : public pt_Piece
{
    pt_Span             m_span;
    pt_AttrPropReference m_apr;
    ...
};

class pt_ObjectPiece : public pt_Piece
{
    ...
};

class pt_StructurePiece : public pt_Piece
{
    pt_AttrPropReference m_apr;
    ...
};

class pt_PieceList
{
    <container for linked-list or tree structure>
    ...
};

class pt_AttrPropReference
{
    UT_Bool             m_bInInitialTable;
    UT_uint32           m_index;
    ...
};

class pt_AttrProp
{
    UT_HashTable *      m_pAttributes;
    UT_HashTable *      m_pProperties;
    ...
};

class pt_AttrPropTable
{
    UT_vector<pt_AttrProp *> m_Table;
    ...
};

class pt_ChangeRecord
{
    UT_Bool             m_bMultiStepStart;
    UT_Bool             m_bMultiStepEnd;
    enum ChangeType { InsertSpan, DeleteSpan, ChangeFormatting, InsertFormatting, ... };
    struct {
        UT_uint32       m_documentOffset;
        UT_Bool         m_bAfter;
        pt_Span         m_span;
    } span;
    struct {
        UT_uint32       m_documentOffset1;
        UT_uint32       m_documentOffset2;
        pt_AttrPropReference m_apr;
    } fmt;
    ...
};

class pt_ChangeVector
{
    UT_vector           m_vecChangeRecords;
    UT_uint32           m_undoPosition;
    ...
};
The 'text' directory contains the text-editing engine used by AbiWord and other AbiSuite apps. There is one subdirectory per module.
fmt/xp (Formatter):
Contains formatting and layout code, including views.
ptbl/xp (PieceTable):
Contains the editable document, implemented using piece tables.
This part contains all the importer and exporter code used by AbiWord. IE_Imp_* classes are the document importers. IE_Exp_* classes are exporters. IE_ImpGraphic_* classes are graphics importers.
Importers and exporters are also used for clipboard operations.
IE_Imp -- This is the base class for all WP importers.
IE_Imp_AbiWord_1 -- Imports version 1 (i.e., the current version) of AbiWord documents.
IE_Imp_Applix -- This is the importer for Applix Words documents.
IE_Imp_DocBook -- Importer for DocBook SGML documents.
IE_Imp_GraphicAsDocument -- Imports a graphic as an otherwise empty document containing that graphic. Uses the available IE_ImpGraphic_* importers.
IE_Imp_GZipAbiWord -- Imports gzip compressed AbiWord documents (.zabw)
IE_Imp_MsWord_97 -- Imports MS Word 97 documents using libwv.
IE_Imp_RTF -- This is the RTF importer.
IE_Imp_Text -- Plain text importer. Also handles non-ASCII text.
IE_Imp_WordPerfect -- Imports WordPerfect documents.
IE_Imp_XHTML -- Import valid XHTML documents.
IE_Imp_XML -- Generic XML importer. Used as a base class for all other XML work.
IE_ImpGraphic -- This is the base class for all graphics importers.
IE_ImpGraphic_JPEG -- This is the JPEG importer, using jpeglib. Converts the JPEG image to a PNG image.
IE_ImpGraphic_PNG -- This is the PNG importer. Simply reads the PNG file.
IE_ImpGraphic_BMP -- This is the BMP importer. Converts the BMP file to PNG.
IE_ImpGraphic_WMF -- WMF Importer.
IE_ImpGraphic_SVG -- SVG Importer. Currently worthless.
IE_Exp_AbiWord_1 -- Write AbiWord XML files version 1, the native file format as of today.
IE_Exp_AWT -- Writes AbiWord template documents. Most of the functionality is inherited from IE_Exp_AbiWord_1.
IE_Exp_HTML -- Output HTML 4.0 or XHTML.
IE_Exp_RTF -- Exports RTF.
IE_Exp_Text -- Exports plain text.
The 'wp' directory contains all source code specific to AbiWord. There is one subdirectory per module.
ap (AP)
Contains source code for the application-specific portion of the cross-platform framework defined in src/af/xap and src/af/ev. This contains application key bindings, mouse bindings, menu layouts, and toolbar layouts. It contains the menu string tables. It contains the table of application functions to which events may be bound. It contains the code to manage the document window (rulers, scroll bars, and the actual document window itself).
impexp (ImpExp)
Contains importers and exporters for various file formats.
main (main)
Subdirectories below may have additional hierarchy to further break things down by module. However, eventually, source code should find itself in a directory which indicates the portability of the code within it. For example, cross-platform code should always be placed in a subdirectory called 'xp'. Win32-specific code should be in a subdirectory called 'win'.