Arc/Info Binary Coverage Format Analysis

 

Last Update: 2006-06-14, Daniel Morissette

 

 

TABLE OF CONTENTS

 

  • 1. Introduction
    • 1.1 PC Arc/Info and other variants
    • 1.2 Byte Ordering
  • 2. ARC Coverage Files
    • 2.1 File Header
    • 2.2 Index Files
    • 2.3 ARC
    • 2.4 PAL
    • 2.5 LAB
    • 2.6 CNT
    • 2.7 PRJ
    • 2.8 LOG
    • 2.9 TOL
    • 2.10 TX6/TX7 Annotations
    • 2.11 TXT Annotations
    • 2.12 RXP - Specific to Region coverages
    • 2.13 RPL - Specific to Region coverages
  • 3. The Attribute Files
    • 3.1 INFO Files in V7.x Coverages
      • 3.1.1 INFO/ARC.DIR
      • 3.1.2 INFO/ARC####.DAT
      • 3.1.3 INFO/ARC####.NIT
      • 3.1.4 Table Data files (.adf, ...)
      • 3.1.5 Name and location of TABLE DATA files
    • 3.2 INFO Files in "Weird" Coverages
    • 3.3 DBF Files in PC Coverages

 

 

1. INTRODUCTION

This is an attempt to document the binary vector coverage files used by Arc/Info V7.x for Unix and Windows NT. Since the coverage file format is not documented by ESRI, this document is based mainly on the analysis of binary dumps of the files, which implies that the information may be incomplete (or even inaccurate!) in some cases. As with any document of this type, it is expected to evolve as we learn more.

Another great source of information to help understand the format is the (world famous) "ANALYSIS OF ARC EXPORT FILE FORMAT FOR ARC/INFO (REV 6.1.1)" (from which I "borrowed" some extracts ;-)... you can find it at:

 

http://www.geocities.com/~vmushinskiy/fformats/files/e00.txt

Since the contents of the E00 and binary coverage files are very close, the current document will often refer you to an updated version of the E00 Analysis Document mentioned above instead of duplicating the details about a specific file.

The first section of this document covers the coverage vector files (ARC, PAL, CNT, LAB, ...) and the second section covers the INFO files.

 

 

1.1 PC ARC/INFO COVERAGES AND VARIANTS

Even though this document covers mainly Arc/Info V7.x for Unix coverages, some notes have been included to document the differences between the V7.x Unix coverages and some variants.

In each section, the Unix V7.x coverage format is always discussed first (sometimes referred to as "V7 Coverages"), and when applicable, the following variants will also be discussed:

 

  • "PC Coverages V1": Coverages produced by the 16-bit version of Arc/Info for PCs (DOS or Windows?).

     

  • "PC Coverages V2": Looks like a hybrid between PC Coverages V1 and V7.x Coverages. Probably produced on Unix systems with Motorola byte ordering. They use DBF files for the info tables (located in the coverage directory) just like PC Coverages V1, but use .adf files for the other files like V7.x Coverages. They also use the same byte ordering as V7.x Coverages.

     

  • "Weird Coverages": Refers to some kind of hybrid between V7 and PC coverages. Probably produced by an early version of Arc/Info for Unix.
    These coverages use the same byte ordering as V7 Coverages.
    The attribute files in these coverages are located in an INFO directory and have names similar (but not identical) to V7 Coverages. The coverage files (ARC, PAL, etc.) are named the same way as in PC Coverages, but they do not have the first 256 bytes header of PC Coverages and they are not padded to a multiple of 256 bytes at the end.
    The name "Weird coverages" refers to our reaction when we saw those coverages for the first time. ;-)

 

1.2 BYTE ORDERING

V7.x Coverages always use MSB-First (Motorola) byte ordering for both the ARC coverage files and the INFO tables. This is true even for coverages produced by Arc/Info V7.x for Windows NT on an Intel platform.

PC Arc/Info coverages V1 always use LSB-First (Intel) byte ordering.

PC Arc/Info coverages V2 always use MSB-First (Motorola) byte ordering.

The Weird coverages use MSB-First (Motorola) byte ordering (Same as V7 Coverages).
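
As an illustration of the byte ordering rules above, here is a minimal Python sketch that selects the byte order for each coverage variant before decoding a value. The variant keys and the read_int32() helper are names made up for this example, not part of any Arc/Info tool.

```python
import struct

# Byte-order prefix for the struct module, per coverage variant.
# The keys are illustrative labels for the variants named in this document.
BYTE_ORDER = {
    "V7": ">",      # MSB-first (Motorola), even on Windows NT / Intel
    "PC_V1": "<",   # LSB-first (Intel)
    "PC_V2": ">",   # MSB-first (Motorola)
    "WEIRD": ">",   # MSB-first (Motorola)
}

def read_int32(buf, offset, variant="V7"):
    """Decode a 4-byte signed integer at `offset` using the variant's byte order."""
    return struct.unpack_from(BYTE_ORDER[variant] + "i", buf, offset)[0]
```

The same prefix can be reused for every int16, int32, float and double value read from a given coverage.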

 

 

2. ARC COVERAGE FILES

All the vector (ARC) coverage files are stored in the same directory. The name of this directory is the name of the coverage.

The name of the coverage directory (and thus the name of the coverage) appears to be limited to 13 characters.

 

2.1 File Header

 

     

    2.1.1 V7.x Coverage File Header

    Most of these files have a 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - Constant for a given file type
    	4-7	int32	Precision - Usually > 0 for single precision, 
    	                            and < 0 for double precision, but
                                        there are exceptions.
    	8-11	int32	Record size, for files with fixed size records
    	                (or 0 for variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
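
The header layout above could be decoded along these lines. This is only a sketch: the CoverageHeader tuple and parse_header() are hypothetical names, and the default byte order assumes a V7 coverage.

```python
import struct
from collections import namedtuple

# Hypothetical container for the documented header fields.
CoverageHeader = namedtuple(
    "CoverageHeader", "signature precision record_size file_size_bytes")

def parse_header(buf, byte_order=">"):
    """Parse the 100-byte header found at the start of most coverage files."""
    if len(buf) < 100:
        raise ValueError("the header needs 100 bytes")
    # Bytes 0-11: signature, precision, record size.
    signature, precision, record_size = struct.unpack_from(byte_order + "3i", buf, 0)
    # Bytes 24-27: file size in 2 byte words, including the header.
    (size_words,) = struct.unpack_from(byte_order + "i", buf, 24)
    return CoverageHeader(signature, precision, record_size, size_words * 2)
```

Since the sign of the precision field has exceptions, it is probably safer to interpret it per file type than to rely on a generic "negative means double" test.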
    

     

     

    2.1.2 PC Coverage File Header

    PC Coverages V1 first start with a 256 bytes header, followed by the 100 bytes header described above, for a total of 356 bytes of header. All the files that have this 256 bytes header have an actual size which is a multiple of 256 bytes, padded with junk at the end. So it is very important to take the size specified in the header into account when reading these files.

    Here is what we find in this 256 bytes header specific to PC coverages:

     

    	Bytes	Type	Description
    
    	0-1	int16	Signature ??? 0x0400 or 0x0000
    	2-5	int32	File size (in 2 byte words), including the 100 bytes
                            header size, but not including this 256 bytes header.
    	                This same value will be repeated in bytes 24-27 of
    			the 100 bytes header.
    	6-255		All zeros
    

    Also note that PC Coverages are ALWAYS SINGLE PRECISION, no matter what the value in bytes 4-7 of the 100 bytes header is (i.e. the precision flag value is sometimes negative, but the data is really always single precision).
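
Because of the junk padding, a reader of PC Coverages V1 files has to derive the end of the useful data from the size stored in the 256 bytes header. A minimal sketch, assuming the whole file has been read into memory (pc_v1_data_extent() is a name invented for this example):

```python
import struct

PC_HEADER_SIZE = 256

def pc_v1_data_extent(buf):
    """Return (start, end) byte offsets of the meaningful part of a PC V1 file.

    `start` is where the usual 100-byte header begins; `end` excludes the junk
    that pads the file to a multiple of 256 bytes.
    """
    # Bytes 2-5: size in 2 byte words, LSB-first (Intel) in PC Coverages V1.
    (size_words,) = struct.unpack_from("<i", buf, 2)
    start = PC_HEADER_SIZE
    return start, start + size_words * 2
```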

     

    2.1.3 PC V2 and Weird Coverage File Header

    PC V2 and Weird Coverages have only one 100 bytes header, just like V7 Coverages.

    These coverages can exist in both single and double precision form.

     

 

2.2 Index files

 

    The files that contain variable length records (i.e. ARC, PAL, CNT, etc.) are accompanied by an index file. The name of the index file (when present) will be specified in the documentation for each file type below.

    All index files have the same 100 bytes header as the file that they correspond to (it is identical, except for the size value at byte 24).

    Then starting at byte 100 in the file, you have one index entry for each object from the master file:

     

    	Bytes	Type	Description
    
    	0-3	int32	Start position of the record in the file.  This
    	                value is the number of 2 byte words from the beginning
    			of the file.  So the position for the first object
    			is always 50.
    	4-7	int32	Record length, excluding the first 8 bytes of the
    	                record (number of 2 byte words - 4).  This value
    	                is the same value that we usually find at byte 4 in 
    			the corresponding object record.  See 
    

    PC Coverage V1 index files start with the 256 bytes header followed by the usual 100 bytes header, followed by index entries.

    PC Coverage V2 index files are identical to V7 indexes.

    Weird Coverage index files are identical to V7 indexes, except that they have a different filename.
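
The index entries described above can be turned into byte offsets as follows. This is a sketch for V7 style index files only (no 256 bytes PC header), and read_index() is a hypothetical helper name.

```python
import struct

def read_index(buf, byte_order=">"):
    """Return (byte_offset, byte_length) for each record listed in an index file.

    byte_length covers the whole record, including its first 8 bytes.
    """
    # Use the file size from the header (bytes 24-27) to bound the entry list.
    (size_words,) = struct.unpack_from(byte_order + "i", buf, 24)
    end = min(size_words * 2, len(buf))
    entries = []
    for pos in range(100, end - 7, 8):
        start_words, length_words = struct.unpack_from(byte_order + "2i", buf, pos)
        entries.append((start_words * 2, length_words * 2 + 8))
    return entries
```

With such a list, record number N of the master file can be reached with a single seek instead of scanning all the preceding records.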

 

2.3 ARC.ADF

 

    The "arc.adf" file contains the arcs definitions and their vertices.

    It comes with an index file called "arx.adf".

     

    2.3.1 ARC.ADF file in V7.x Coverages

    The file starts with the usual 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - 9994
    	4-7	int32	Precision - +1 for single precision, 
    	                            and -1 for double precision.
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Then variable length arc records follow:

     

    	Bytes	Type	Description
    
    	0-3	int32	Arc_Id
    	4-7	int32	Record length, number of 2 byte words that follow
    	                the current value. (= (24 + size of vertices list in bytes)/2)
    	8-11	int32	Arc_UserId
    	12-15	int32	From_Node
    	16-19	int32	To_Node
    	20-23	int32	Left_Poly
    	24-27	int32	Right_Poly
    	28-31	int32	Num_Vertices
    	32+             Vertices list (see below)
    

    Then the vertices follow (Num_Vertices pairs of x,y values).

    For SINGLE PRECISION:

     

    	Bytes	Type	Description
    
    	32-35	float	x1
    	36-39	float	y1
    	40-43	float	x2 ...
    	44-47	float	y2 ...
    	...
    	...
    

    For DOUBLE PRECISION:

     

    	Bytes	Type	Description
    
    	32-39	double	x1
    	40-47	double	y1
    	48-55	double	x2 ...
    	56-63	double	y2 ...
    	...
    	...
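
A single arc record, as laid out in the tables above, could be decoded like this. It is a minimal sketch: parse_arc() and the dictionary keys are illustrative names, and the caller is assumed to know the precision from the file header.

```python
import struct

def parse_arc(buf, offset, double_precision=False, byte_order=">"):
    """Parse one arc record starting at `offset`; return (arc, next_offset)."""
    (arc_id, length_words, user_id, from_node, to_node,
     left_poly, right_poly, num_vertices) = struct.unpack_from(
        byte_order + "8i", buf, offset)
    # Vertices start at byte 32 of the record: x1, y1, x2, y2, ...
    coord = "d" if double_precision else "f"
    flat = struct.unpack_from(
        byte_order + str(2 * num_vertices) + coord, buf, offset + 32)
    arc = {
        "arc_id": arc_id, "user_id": user_id,
        "from_node": from_node, "to_node": to_node,
        "left_poly": left_poly, "right_poly": right_poly,
        "vertices": list(zip(flat[0::2], flat[1::2])),
    }
    # The record length counts the 2 byte words that follow bytes 4-7.
    return arc, offset + 8 + 2 * length_words
```

The returned next_offset is based on the record length field rather than on the vertex count, so the two can be cross-checked when validating a file.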
    

     

    2.3.2 ARC file in PC Coverages V1

    In PC Coverages V1, the main file is called "ARC" and the index "ARX".

    They both start with the 256 bytes header specific to PC Coverages, followed by the 100 bytes header and the data records as described above.

    Note that PC Coverages files are ALWAYS single precision.

     

    2.3.3 ARC file in PC Coverages V2

    Identical to V7.x

     

     

    2.3.4 ARC file in Weird Coverages

    Same as V7 Coverages, except that the main file is called "ARC" and the index "ARX".

 

2.4 PAL.ADF

 

    The "pal.adf" file contains the polygon definitions. It is present only inside coverages with clean polygon topology.

    It comes with an index file called "pax.adf".

    Notes on the fields of the arc list entries of each polygon record (described in 2.4.1):

    • Arc_Id will be negative if the direction of the arc is reversed
    • From_Node_Id is the arc's FNODE#. If the arc is reversed, then From_Node_Id will be the arc's TNODE#.
    • Adjacent_Polygon_Id is the Id of the polygon that shares this arc with the current polygon.

     

    2.4.1 PAL.ADF file in V7.x Coverages

    The file starts with the usual 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - 9994
    	4-7	int32	Precision - +11 for single precision, 
    	                            and -11 or 1011 (yep!) for double prec.
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Then variable length polygon records follow:

    For SINGLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Polygon Id
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-11	float	Min. X coordinate
    	12-15	float	Min. Y coordinate
    	16-19	float	Max. X coordinate
    	20-23	float	Max. Y coordinate
    	24-27	int32	Number of Arcs
    	28+     int32	List of Arc records (see below)
    

    For DOUBLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Polygon Id
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-15	double	Min. X coordinate
    	16-23	double	Min. Y coordinate
    	24-31	double	Max. X coordinate
    	32-39	double	Max. Y coordinate
    	40-43	int32	Number of Arcs
    	44+     int32	List of Arc records (see below)
    

    For each arc in the arc list, we have a fixed length record:

     

    	Bytes	Type	Description
    
    	0-3	int32	Arc_Id
    	4-7	int32	From_Node_Id
    	8-11	int32	Adjacent_Polygon_Id
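
Putting the polygon record and its arc list together, a reader might look like the sketch below. The names parse_pal(), "reversed" and "adjacent_poly" are invented for this example; a negative Arc_Id is reported as a reversed arc, as noted in this section.

```python
import struct

def parse_pal(buf, offset, double_precision=False, byte_order=">"):
    """Parse one polygon record starting at `offset`; return (polygon, next_offset)."""
    coord = "d" if double_precision else "f"
    poly_id, length_words = struct.unpack_from(byte_order + "2i", buf, offset)
    pos = offset + 8
    bbox_fmt = byte_order + "4" + coord           # min x, min y, max x, max y
    bbox = struct.unpack_from(bbox_fmt, buf, pos)
    pos += struct.calcsize(bbox_fmt)
    (num_arcs,) = struct.unpack_from(byte_order + "i", buf, pos)
    pos += 4
    arcs = []
    for _ in range(num_arcs):
        arc_id, from_node, adjacent_poly = struct.unpack_from(
            byte_order + "3i", buf, pos)
        arcs.append({
            "arc_id": abs(arc_id),
            "reversed": arc_id < 0,               # negative id: arc is reversed
            "from_node": from_node,
            "adjacent_poly": adjacent_poly,
        })
        pos += 12
    polygon = {"polygon_id": poly_id, "bbox": bbox, "arcs": arcs}
    return polygon, offset + 8 + 2 * length_words
```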
    

     

     

    2.4.2 PAL file in PC Coverages V1

    In PC Coverages, the main file is called "PAL" and the index "PAX".

    They both start with the 256 bytes header specific to PC Coverages, followed by the 100 bytes header and the data records as described above.

    Note that PC Coverages files are ALWAYS single precision.

     

    2.4.3 PAL file in Weird Coverages

    Same as V7 Coverages, except that the main file is called "PAL" and the index "PAX".

 

2.5 LAB.ADF

 

    The "lab.adf" file contains label point records.

    This file has no associated index since it has fixed size records.

     

    2.5.1 LAB.ADF file in V7.x Coverages

    The file starts with the usual 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - 9993
    	4-7	int32	Precision - +2 for single precision, 
    	                            and -2 for double precision.
    	8-11	int32	Label Record size, in 2 byte words 
    	                    (16 for single prec. and 28 for double prec.)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Then fixed size label point records follow:

    For SINGLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Label Value
    	4-7	int32	Polygon_Id
    	8-11	float	Label X coord.
    	12-15	float	Label Y coord.
    	16-19	float	Label X coord.
    	20-23	float	Label Y coord.
    	24-27	float	Label X coord.
    	28-31	float	Label Y coord.
    

    For DOUBLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Label Value
    	4-7	int32	Polygon_Id
    	8-15	double	Label X coord.
    	16-23	double	Label Y coord.
    	24-31	double	Label X coord.
    	32-39	double	Label Y coord.
    	40-47	double	Label X coord.
    	48-55	double	Label Y coord.
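
Since the label records have a fixed size, the whole file can be read in one loop. The sketch below keeps only the first of the three coordinate pairs; parse_labels() and its dictionary keys are hypothetical names, and a V7 file (100 bytes header, no padding) is assumed.

```python
import struct

def parse_labels(buf, double_precision=False, byte_order=">"):
    """Return the label point records that follow the 100-byte header."""
    fmt = byte_order + "2i" + ("6d" if double_precision else "6f")
    size = struct.calcsize(fmt)          # 32 bytes (single) or 56 bytes (double)
    labels = []
    for pos in range(100, len(buf) - size + 1, size):
        fields = struct.unpack_from(fmt, buf, pos)
        labels.append({
            "label": fields[0],
            "polygon_id": fields[1],
            "xy": (fields[2], fields[3]),  # first of the three coordinate pairs
        })
    return labels
```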
    

     

    2.5.2 LAB file in PC Coverages V1

    In PC Coverages, this file is called "LAB".

    It starts with the 256 bytes header specific to PC Coverages, followed by the 100 bytes header and the data records as described above.

    Note that PC Coverages files are ALWAYS single precision.

     

    2.5.3 LAB file in Weird Coverages

    Same as V7 Coverages, except that the file is called "LAB".

 

2.6 CNT.ADF

 

    The "cnt.adf" file contains polygon centroid information.

    It comes with an index file called "cnx.adf".

     

    2.6.1 CNT.ADF file in V7.x Coverages

    The file starts with the usual 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - 9994
    	4-7	int32	Precision - +14 for single precision, 
    	                            and -14 for double precision.
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Then variable length centroid records follow:

    For SINGLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Polygon Id
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-11	float	Centroid X coordinate
    	12-15	float	Centroid Y coordinate
    	16-19	int32	Num_Labels ( >= 0 )
    	20+     int32	List of Label Ids (Only if Num_Labels > 0 )
    

    For DOUBLE PRECISION:

     

    	Bytes	Type	Description
    
    	0-3	int32	Polygon Id
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-15	double	Centroid X coordinate
    	16-23	double	Centroid Y coordinate
    	24-27	int32	Num_Labels ( >= 0 )
    	28+     int32	List of Label Ids (Only if Num_Labels > 0 ) 
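
A centroid record could be decoded as sketched here (parse_centroid() is an illustrative name; the precision is assumed to be known from the header):

```python
import struct

def parse_centroid(buf, offset, double_precision=False, byte_order=">"):
    """Parse one centroid record starting at `offset`; return (centroid, next_offset)."""
    fmt = byte_order + "2i" + ("2d" if double_precision else "2f") + "i"
    poly_id, length_words, x, y, num_labels = struct.unpack_from(fmt, buf, offset)
    pos = offset + struct.calcsize(fmt)
    # The list of label ids is present only when num_labels > 0.
    label_ids = list(struct.unpack_from(
        byte_order + str(num_labels) + "i", buf, pos))
    centroid = {"polygon_id": poly_id, "xy": (x, y), "label_ids": label_ids}
    return centroid, offset + 8 + 2 * length_words
```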
    

     

    2.6.2 CNT file in PC Coverages V1

    In PC Coverages, the main file is called "CNT" and the index "CNX".

    They both start with the 256 bytes header specific to PC Coverages, followed by the 100 bytes header and the data records as described above.

    Note that PC Coverages files are ALWAYS single precision.

     

    2.6.3 CNT file in Weird Coverages

    Same as V7 Coverages, except that the main file is called "CNT" and the index "CNX".

 

2.7 PRJ.ADF - Projection file

 

     

    2.7.1 PRJ.ADF file in V7.x Coverages

    The PRJ.ADF file is a simple ASCII file with one line for each piece of projection information. The lines have a variable length and are terminated by a newline character.

    Here is an example of a prj.adf file:

         Projection    GEOGRAPHIC
         Zunits        NO
         Units         DD
         Spheroid      CLARKE1866
         Xshift        0.0000000000
         Yshift        0.0000000000
         Parameters
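
Reading such a file is straightforward. The sketch below simply splits each line into a keyword and the rest of the line; treating the first word as a keyword is an assumption based on the example above, and parse_prj() is a made-up name.

```python
def parse_prj(text):
    """Split each non-empty line of a prj.adf file into (keyword, value)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            value = parts[1].strip() if len(parts) > 1 else ""
            entries.append((parts[0], value))
    return entries
```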
    

     

    2.7.2 PRJ file in PC Coverages V1

    PC Coverages do not appear to carry a PRJ file... or at least we never encountered any.

     

    2.7.3 PRJ file in Weird Coverages

    Just like for PC Coverages... they do not appear to carry a PRJ file... or at least we never encountered any.

 

2.8 LOG - Coverage history

 

     

    2.8.1 LOG file in V7.x Coverages

    The LOG file (named "log", not "log.adf"!) is an ASCII file with variable length lines each terminated with a newline.

    The lines have no known length limit; they can certainly be longer than 80 characters.

     

    2.8.2 LOG file in PC Coverages V1

    Nothing special... the file is called LOG as well.

     

    2.8.3 LOG file in Weird Coverages

    Probably the same... but we've never encountered any.

 

2.9 TOL - Coverage Tolerances

 

    2.9.1 TOL.ADF file in V7.x Coverages

    The TOL file contains the tolerance values to use when processing a polygon coverage. It usually contains 10 tolerance entries. For each entry, we have a tolerance type, a tolerance status, and a tolerance value. The tolerance types are:

    • 1. fuzzy
    • 2. generalize (unused)
    • 3. node match (unused)
    • 4. dangle
    • 5. tic match
    • 6. undefined
    • 7. undefined
    • 8. undefined
    • 9. undefined
    • 10. undefined

    The tolerance status "is set to 1 if the tolerance is verified (been applied to operations of the coverage) and to 2 if the tolerance is not verified (been set by the TOLERANCE command, but not yet used in processing)."

    In a SINGLE PRECISION coverage, the file is named "tol.adf", it has no header, and for each tolerance value, we have:

     

    	Bytes	Type	Description
    
    	0-3	int32	Tolerance type (usually goes from 1 to 10)
    	4-7	int32	Tolerance status
    	8-11	float	Tolerance value
    

    In DOUBLE PRECISION coverages, the file is named "par.adf", and it DOES have the usual 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - Always 9993
    	4-7	int32	Value of 40 (this should be the precision field???)
    	8-11	int32	Tolerance record size, in 2 byte words (always 8)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Then for each double precision tolerance value, we have:

     

    	Bytes	Type	Description
    
    	0-3	int32	Tolerance type (usually goes from 1 to 10)
    	4-7	int32	Tolerance status
    	8-15	double	Tolerance value
    

    Note: The double precision file ("par.adf") header does not seem to follow the general rule for the header... its precision field value is > 0 while this value is negative for most other double precision files(???). Also, the third field has a value of 8, while it is 0 in the headers of the files with variable length records.
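
The two tolerance file layouts can be handled by one small reader. This is a sketch under the assumptions stated in this section ("tol.adf" without header and with float values, "par.adf" with a 100 bytes header and double values); parse_tolerances() is a hypothetical name.

```python
import struct

def parse_tolerances(buf, double_precision=False, byte_order=">"):
    """Return a list of (type, status, value) tolerance entries.

    Single precision ("tol.adf"): no header, 12-byte entries.
    Double precision ("par.adf"): 100-byte header, 16-byte entries.
    """
    fmt = byte_order + ("2id" if double_precision else "2if")
    start = 100 if double_precision else 0
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, buf, pos)
            for pos in range(start, len(buf) - size + 1, size)]
```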

     

    2.9.2 TOL file in PC Coverages V1

    In PC Coverages, this file is called "TOL".

    Contrary to most other files, the TOL file in PC Coverages does not have any header... it starts immediately with the tolerance entries like the "tol.adf" in single precision V7.x coverages.

    Note that PC Coverages files are ALWAYS single precision.

     

    2.9.3 TOL file in Weird Coverages

    Same as "tol.adf" in single precision V7 Coverages except that the file is called "TOL".

 

2.10 TX6/TX7 - Annotations

 

    2.10.1 TX6/TX7 files in V7.x Coverages

    TXT, TX6, and TX7 are 3 variations of text annotations that we find in E00 files. TX6 and TX7 annotations usually come with a .TAT info table, or a set of .TAT tables.

    It seems that you can have several "subclasses" of annotations, in the E00 file they are sub-sections of the main TX6/TX7 section, and in a binary coverage they are stored in separate files.

    There is no difference between the binary files for a TX6 and the files for a TX7. However, in the E00 format, there is an additional value in the first line of a TX7 entry (that is not present in a TX6), this value is very often 0 (or 1 in some cases), but even when it is set, it is not present in the binary file... I have no idea where it comes from!?!

    For the subclass of annotations called TEST, you will find 3 files in the coverage directory:

    • test.txt - The actual annotations file
    • test.txx - Index file
    • test.tat - INFO table for this set of annotations

    The file "test.txt" has the usual 100 bytes header, followed by variable length records for each piece of text.

    File Header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - Always 9994
    	4-7	int32	Precision - +67 for single precision, 
    	                            and -67 for double precision.
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    Followed by records of data for each piece of text:

     

    	Bytes	Type	Description
    
    	0-3	int32	System ID (TEST#)
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-11	int32	User ID (TEST-ID)
    	12-15	int32	??? LEVEL
    	16-19	float	??? Defaults to -1e+02 but is sometimes different
    	                    (this value is always a 4 bytes float value,
    			     even for double-prec. coverages)
    	20-23	int32	SYMBOL (Text font)
    	24-27	int32	num_vertices1: for the line along which the text
                                           is drawn.
    	28-31	int32	??? n28: Always 0 (Verified that it corresponds to the
    	                                   6th value of 1st line in a TX7-E00)
    	32-35	int32	Number of chars in text string
    	36-39	int32	num_vertices2: for the text arrow.  If this value is
                                           negative then the arrow is reversed.
    
    	40-41	int16	??? Always 1  - Corresponds to the second set of
    	42-43	int16	??? Always 0    20 values in a E00 TX7 entry
    	44-45	int16	??? Always 0
    	...             ...
    	78-79	int16	??? Always 0
    
    	80-81	int16	Text justification  - Corresponds to the first set of
    	82-83	int16	??? Always 0          20 values in a E00 TX7 entry
    	84-85	int16	??? Always 0
    	...             ...
    	118-119	int16	??? Always 0
    

    The rest of the record depends on the precision. For SINGLE PRECISION, we have:

     

    	120-123	float	??? v1, Text Height ???
    	124-127	float	??? v2 (always 0)
    	128-131 float	??? v3 (always 0)
    	132+    chars	Text String (padded with spaces to the 
    	                             next 4 bytes boundary)
    
    		float	x1 - Vertices list 
    		float	y1   (num_vertices1+num_vertices2) vertice pairs
    		float	x2
    		float	y2
    		float	... 
    
    		int32	??? Unused ???  - The last 8 bytes look like junk
    		int32	??? Unused ???  - See note below.
    

    And for DOUBLE PRECISION, we would have:

     

    	120-127	double	??? v1, Text Height ???
    	128-135	double	??? v2 (always 0)
    	136-143 double	??? v3 (always 0)
    	144+    chars	Text String (padded with spaces to the 
    	                             next 4 bytes boundary)
    
    		double	x1 - Vertices list 
    		double	y1   (num_vertices1+num_vertices2) vertice pairs
    		double	x2
    		double	y2
    		double	... 
    
    		int32	??? Unused ???  - The last 8 bytes look like junk
    		int32	??? Unused ???  - See note below.
    

    Note:
    The last 8 bytes of junk appear to be always present in V7 coverages. However, they are sometimes present and sometimes not present in Weird coverages. Thus, the only safe way to know whether there is junk to skip at the end of a TX6 record is to use the record length value in bytes 4-7.
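
In practice this means stepping from one record to the next with the record length rather than with the sizes of the decoded fields. A minimal sketch of such a loop (iter_records() is a made-up helper; it should apply to any of the variable length record files, whose records all start with an id and a record length):

```python
import struct

def iter_records(buf, start=100, byte_order=">"):
    """Yield (record_id, payload) for each variable length record in `buf`.

    `payload` is everything after the first 8 bytes of the record, including
    any trailing junk, so the caller never has to guess whether it is there.
    """
    pos = start
    while pos + 8 <= len(buf):
        rec_id, length_words = struct.unpack_from(byte_order + "2i", buf, pos)
        end = pos + 8 + 2 * length_words
        if length_words < 0 or end > len(buf):
            raise ValueError("invalid record length at byte %d" % pos)
        yield rec_id, buf[pos + 8:end]
        pos = end
```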

     

     

    2.10.2 TX6/TX7 files in PC Coverages V1

    PC Coverages probably can't have TX6/TX7 files but they can have TXT files though... see below.

     

    2.10.3 TX6/TX7 files in Weird Coverages

    Weird coverages can have TX6/TX7 files, and they work the same way as for V7 coverages, except that the name ends with "txt", instead of ".txt". (e.g. we have "testtxt" instead of "test.txt")

 

2.11 TXT - Annotations

 

    2.11.1 TXT.ADF file in V7.x Coverages

    TXT type of annotations use the exact same file structure as TX6/TX7 above, with the following differences:

    • The file names will be "txt.adf" and "txx.adf" for the Index file
    • The values in bytes 40-119 of each entry look like junk... there appears to be absolutely no correlation with what you find in the corresponding TXT section of an E00 file.
    • When the binary TXT structure is converted to E00-TXT, the first vertex of the vertices list for the text's polyline is always ignored (the first and second vertices in the vertices list are always the same).
      For instance, if num_vertices1==3 in the binary file, then we should ignore the first vertex, and the corresponding E00-TXT entry would have num_vertices1=2 (corresponding to vertices 2 and 3 in the vertices list).

     

    2.11.2 TXT file in PC Coverages V1

    In PC Coverages, the main file is called "TXT" and the index "TXX".

    They both start with the 256 bytes header specific to PC Coverages, followed by the 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - Always 9994
    	4-7	int32	Precision - Always 1 (always single precision)
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    However, contrary to what we find with most other file types, the data records in the TXT file are different from what we find in V7.x TXT.ADF files.

    PC Coverage TXT entries are always single precision. For each piece of text, we have:

     

    	Bytes	Type	Description
    
    	0-3	int32	System ID (TEST#)
    	4-7	int32	Record Length, number of 2 byte words that follow
                            the current value.
    	8-11	int32	??? LEVEL (Corresponds to bytes 12-15 in a V7 TXT)
    	12-15	int32	Number of vertice pairs that are valid ( [1..4] )
    	16-19	float	x1 - (1st float value in a E00 TXT section)
    	20-23	float	y1 - (5th float value in a E00 TXT section)
    	24-27	float	x2 - (2nd float value in a E00 TXT section)
    	28-31	float	y2 - (6th float value in a E00 TXT section)
    	32-35	float   x3 
    	36-39	float   y3
    	40-43	float   x4
    	44-47	float   y4
    	48-75	float	Always 0 ??? Probably corresponds to the other
    			float values in the E00 TXT section
    	76-79	float	??? Text Height ???
    	                Corresponds to the 15th float value in a E00 TXT
    	80-83	float	??? Defaults to -1e+02 but is sometimes different
    	84-87	int32	SYMBOL (Text font)
    	88-91	int32	Number of chars in text string
    	92+	chars	Text String (padded with spaces to the 
    	                             next 4 bytes boundary... it was also
    				     noted that strings that are a multiple
    				     of 4 chars in length are also padded 
    				     with 4 spaces)
    

     

     

    2.11.3 TXT file in Weird Coverages

    Weird coverages can have their TXT files stored using either the PC structure or the V7 structure. In both cases the filenames are the same ("TXT" and "TXX"). The only way to tell if the file is in PC TXT format or in V7 TXT/TX6/TX7 format is by looking at the precision field in the 100 bytes header:

     

    	Bytes	Type	Description
    
    	0-3	int32	Signature - Always 9994
    	4-7	int32	Precision - +16 for single precision in PC TXT format,
                                        +67 for single precision in V7 format, 
    	                            and -67 for double precision in V7 format.
    	8-11	int32	Record size (always 0: variable length records)
    	12-23		All zeros
    	24-27	int32	File size (in 2 byte words), including header size 
    	28-99		All zeros
    

    When the V7 structure is used, the files are identical to the V7 TXT/TX6/TX7 files described above except for the filename.

    When the PC TXT structure is used, the files are similar to PC Coverage TXT files, except for the byte ordering and the fact that there is no 256 byte header in the Weird Coverage ones. Another minor difference: in Weird Coverages, when a text string has a length that is a multiple of 4 chars, it won't be padded with 4 spaces as it would have been in a PC Coverage file. This is a minor detail, but it is interesting to notice that this bug has been fixed between PC Arc/Info and the version of Arc/Info that produced the weird coverages.
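
The format test described in this section boils down to a lookup on the precision field of the 100 bytes header. A sketch, with invented names and return values:

```python
def weird_txt_format(precision):
    """Map the header precision field of a Weird Coverage TXT file to
    (record structure, is double precision)."""
    formats = {
        16: ("PC", False),    # PC TXT structure, single precision
        67: ("V7", False),    # V7 TXT/TX6/TX7 structure, single precision
        -67: ("V7", True),    # V7 TXT/TX6/TX7 structure, double precision
    }
    if precision not in formats:
        raise ValueError("unexpected precision value: %d" % precision)
    return formats[precision]
```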

     

 

2.12 RXP - Specific to Regions

 

    RXP sections define the list of polygons from the PAL section that form each region, and they occur only in region coverages. There is one .rxp file for each region in the coverage.

    Example of an RXP section (as found in an E00 file):

      RXP  2
      OLD
               1       120
               2        11
               3        12
               4        13
               4       202
               5        16
               6        19
               7        14
               7        20
               7        21
               7       125
               8        22
      ...
      ...

    RXP files were never encountered in PC Coverages and Weird Coverages.

    .rxp files have no header, and they contain fixed size records:

     

    	Bytes	Type	Description
    
    	0-3	int32	Region Polygon ID
    	4-7	int32	PAL Polygon ID
    

    Regions that consist of multiple polygons will have several records with the same Region Polygon Id and differing PAL Polygon Ids.
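
Grouping the records by Region Polygon Id gives the list of PAL polygons for each region polygon. A minimal sketch (read_rxp() is a hypothetical name, and the byte order assumes a V7 coverage):

```python
import struct
from collections import defaultdict

def read_rxp(buf, byte_order=">"):
    """Group the fixed size .rxp records: {region polygon id: [PAL polygon ids]}."""
    regions = defaultdict(list)
    for pos in range(0, len(buf) - 7, 8):     # no header, 8-byte records
        region_id, pal_id = struct.unpack_from(byte_order + "2i", buf, pos)
        regions[region_id].append(pal_id)
    return dict(regions)
```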

     

 

2.13 RPL - Specific to Regions

 

    E00-RPL sections are also specific to region coverages. In the binary coverage, they correspond to files with a ".pal" extension. There is one .pal file for each region in the coverage and they use the exact same structure as "pal.adf" files. Each .pal file probably contains the definition of the polygons that belong to that region.

    RPL files come with an index file with a ".pax" extension.

    See the "pal.adf" description...

    RPL files were never encountered in PC Coverages and Weird Coverages.

 

 

3. THE ATTRIBUTE FILES

Each type of coverage has a different way to store attribute information:

 

  • V7.x Coverages maintain a "../info" directory with the attribute files for all the coverages that are located in the same parent directory.

     

  • PC Coverages V1 and V2 store their attribute information in regular DBF files inside the coverage directory.

     

  • Weird Coverages also use a "../info" directory shared by a number of coverages, but the organization of that info directory differs a little from what we find in V7.x Coverages.

 

3.1 INFO FILES IN V7.x COVERAGES

 

    The INFO files are tables with the attribute information attached to an Arc/Info coverage. The data files themselves are stored in the coverage directory, but the definition of the table fields are stored in the "../info" directory.

    The ../info directory is shared by all the coverages stored in its parent directory, and contains the following files:

     

            arc.dir
            arc0000.dat
            arc0000.nit
            arc0001.dat
            arc0001.nit
            ...
            ...
            ...
    

 

3.1.1 INFO/ARC.DIR

 

    Contains one record for each attribute table (arc*.*) in this info directory. The file has no header, and each record has a fixed size of 380 bytes.

     

    	Bytes	Type	Description
    
    	0-31	char	Table name (as shown by Arc/Info) padded with spaces
    	32-39	char	Internal Name ("ARC#### " file name)
    	40-41	int16	Number of fields in table (valid fields... see below)
    	42-43	int16	Table Record size (rounded up to a multiple of 2 bytes)
    	44-59	char	 ??? 16 spaces
    	60-61	int16	 ??? Always 132
    	62-63	int16	 ??? Always 0
    	64-67	int32	Number of records  (may also be only an int16???)
    	68-77		 ??? All zeros
    	78-79	char	External flag ("  " or "XX", see note below)
    	80-317		 ??? All zeros
    	318-325	char	 ??? 8 spaces
    	326-379		 ??? All zeros

    Note on the External flag (bytes 78-79):

    • A value of "  " indicates an internal table, i.e. the data is stored directly in the info/arc####.dat file.
    • A value of "XX" indicates an external table, i.e. the data is stored in a file outside of the info directory. In this case the arc####.dat file contains one 80 chars string with the path to this external data file relative to the info directory (padded with spaces).
    

    Note that the Arc/Info table name (first field above) is always the coverage name followed by an extension (ex: TEST.AAT, TEST.TIC, TEST.BND, TEST.PAT, TEST.PATCOUNTRY, etc.). So this name can be used to search the arc.dir for all tables related to a given coverage.

    The arc.dir entry contains the number of valid fields in the table, but the arc####.NIT file can contain deleted field definitions (and these deleted field definitions are even exported in the E00 table headers produced by Arc/Info). In this case the number of field entries in the arc####.nit file will be bigger than the number of fields found here. Unfortunately there does not appear to be anything in the arc.dir that would allow us to tell if the table has deleted fields (or not) until we go and read the arc####.nit file.

    In some cases, the "number of records" field for a table in the arc.dir does not correspond to the real number of records in the data file. In this kind of situation, the number of records returned by Arc/Info in the corresponding E00 file will be based on the real data file size (obtained with stat()), and not on the value from the arc.dir. (i.e. use num_records = physical_data_file_size/record_size)

    The external flag tells where the data file is located.

     

    When the value for "number of records" in the arc.dir is 0, then the data file for this table may not exist yet.

    The value for "Table Record size" in the arc.dir is the real size used by each record in the data file, and thus must be a multiple of 2, since data records are padded at the end to be aligned on a 2-byte boundary.

    There does not appear to be any difference between single and double precision table entries.
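
    The record layout above can be sketched as a small parser. This is only an illustration under stated assumptions: the function name and the returned dictionary keys are hypothetical, and the byte order is passed as a parameter (big-endian is assumed as the default here; see the Byte Ordering section for the variant you are reading).

```python
import struct

ARC_DIR_RECORD_SIZE = 380  # fixed record size given in the table above


def parse_arc_dir_record(rec, byte_order=">"):
    """Decode one 380-byte arc.dir record into a dict (hypothetical helper).

    byte_order is a struct prefix: ">" (big-endian, assumed default) or "<".
    Only the fields whose meaning is known are decoded.
    """
    if len(rec) != ARC_DIR_RECORD_SIZE:
        raise ValueError("arc.dir records must be exactly 380 bytes")
    num_fields, record_size = struct.unpack(byte_order + "hh", rec[40:44])
    (num_records,) = struct.unpack(byte_order + "i", rec[64:68])
    return {
        "table_name": rec[0:32].decode("ascii").rstrip(),
        "info_name": rec[32:40].decode("ascii").rstrip(),
        "num_fields": num_fields,          # valid (non-deleted) fields only
        "record_size": record_size,        # multiple of 2 bytes
        "num_records": num_records,        # may be stale, see text above
        "external": rec[78:80] == b"XX",   # two spaces means internal table
    }


def read_arc_dir(path, byte_order=">"):
    """Return the list of decoded records found in an arc.dir file."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % ARC_DIR_RECORD_SIZE
    return [
        parse_arc_dir_record(data[i:i + ARC_DIR_RECORD_SIZE], byte_order)
        for i in range(0, usable, ARC_DIR_RECORD_SIZE)
    ]
```

    To list the tables of one coverage, the decoded entries can be filtered on table names that start with the coverage name followed by a dot.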

 

3.1.2 INFO/ARC*.DAT

 

    For internal tables (see external flag in the arc.dir entry), this file contains the table data.

    For external tables, it is an 80 characters ASCII file that contains the relative path of the file that contains the table data. The end of the path is padded to 80 chars with spaces.

     

    Ex: 
    ../test/tic.adf
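
    As a rough sketch, resolving the external data file from the contents of an arc####.dat file could look like the following. The function name is hypothetical, and it assumes the stored path uses "/" separators as in the example above.

```python
import posixpath


def resolve_external_path(info_dir, dat_contents):
    """Return the data file path named by an external table's arc####.dat.

    info_dir is the path of the info directory; dat_contents is the raw
    content of the arc####.dat file (an 80-char, space-padded string).
    Assumes "/" separators, as in the "../test/tic.adf" example.
    """
    relative = dat_contents[:80].decode("ascii").rstrip(" \x00\r\n")
    if not relative:
        raise ValueError("arc####.dat does not contain a path")
    # The stored path is relative to the info directory itself.
    return posixpath.normpath(posixpath.join(info_dir, relative))
```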
    

 

3.1.3 INFO/ARC*.NIT

 

    Contains the table fields definition. The file has no header, and each field definition record has a fixed size of 144 bytes.

    The meaning of the items marked with a question mark is unknown, but they could be recognized in the following E00 IFO table header.

     

      ----------------------------------------------------------------------
      FNODE#            4-1   14-1   5-1 50-1  -1  -1-1                   1
      ----------------------------------------------------------------------
    
    	Bytes	Type	Description
    
    	0-15	char	(FNODE#) Field name padded with spaces
    	16-17	int16	(4)	 Storage size in bytes
    	18-19	int16	(-1)	 ?
    	20-21	int16	(1)	 1-based offset of the field in a record
    	22-23	int16	(4)	 ? (always 4 !!!)
    	24-25	int16	(-1)	 ?
    	26-27	int16	(5)	 Display format: width
    	28-29	int16	(-1)	 Display format: number of decimals or -1
    						 if not applicable.
    	30-31	int16	(5)	 First digit of field type
    	32-33	int16	(0)	 2nd digit of field type (always 0)
    	34-35	int16	(-1)	 ?
    	36-37	int16	(-1)	 ?
    	38-39	int16	(-1)	 ?
    	40-41	int16	(-1)	 ?
    	42-57	char		 ? Alternate Name (always blank!) ?
    	...
    	114-115	int16	(1)	 1-based field index (-1 if field is deleted)
    	116-143			 ? All zeros
    

    The field type is specified by the following codes:

     

    	10 (D) Date (stored as 8 bytes, display width must be either
    	       8 chars (12/31/99) or 10 chars (12/31/1999) )
    	20 (C) Character string
    	30 (I) Integer with fixed number of digits (1 byte storage per digit)
    	40 (N) Numeric value with decimals and fixed number of digits 
    	       (1 byte storage per digit, value is right-justified)
    	50 (B) Binary integer (2 or 4 bytes)
    	60 (F) Binary float (4 or 8 bytes, depends on coverage precision)
    
    Ref: Understanding GIS... p.6-5, 6-6
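
    The field definition record and the type codes above can be decoded along these lines. This is a minimal sketch: the helper name and dictionary keys are hypothetical, the unknown items are skipped, and big-endian byte order is an assumed default.

```python
import struct

NIT_RECORD_SIZE = 144  # fixed size of one field definition record

# Type codes listed above, mapped to their one-letter Arc/Info names.
FIELD_TYPE_LETTERS = {10: "D", 20: "C", 30: "I", 40: "N", 50: "B", 60: "F"}


def parse_nit_record(rec, byte_order=">"):
    """Decode one 144-byte arc####.nit record (hypothetical helper)."""
    if len(rec) != NIT_RECORD_SIZE:
        raise ValueError("nit records must be exactly 144 bytes")
    # Bytes 16-33 hold nine consecutive int16 values; three are unknown.
    (size, _unk1, offset, _unk2, _unk3,
     width, decimals, type1, type2) = struct.unpack(byte_order + "9h", rec[16:34])
    (index,) = struct.unpack(byte_order + "h", rec[114:116])
    type_code = type1 * 10 + type2  # e.g. digits 5 and 0 give type 50
    return {
        "name": rec[0:16].decode("ascii").rstrip(),
        "size": size,                # storage size in bytes
        "offset": offset,            # 1-based offset inside a data record
        "width": width,              # display width
        "decimals": decimals,        # -1 when not applicable
        "type_code": type_code,
        "type_letter": FIELD_TYPE_LETTERS.get(type_code, "?"),
        "index": index,              # 1-based, -1 for a deleted field
        "deleted": index == -1,
    }
```

    Since the arc.dir only counts valid fields, a reader would typically parse every record of the .nit file and drop those flagged as deleted.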

    When exported to E00, here is the form that each type takes:

     

    	10 (D) 8 characters
    	20 (C) Nbr of chars = field storage size.
    	30 (I) Nbr of chars = field storage size, value is right-justified
    	40 (N) stored as single prec. floats = 14 chars, ex: "-1.7735416E+00"
                   (Uses 1 byte storage per digit internally, but always stored
                    as single precision floats in both single and double 
                    precision E00 tables.)
    	50 (B) 32 bits integer use 11 chars, right-justified
    	       16 bits integer????? Never saw an E00 that contained any!
    		    but it would probably be 6 chars since the biggest 
    		    value to store would be "-32767"
    	60 (F) single prec. = 14 chars total, ex: "-1.7735416E+00"
    	       double prec. = 24 chars total, ex: "-2.60358875000000000E+05"
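
    The widths listed above can be summarized in a small lookup function. The name is hypothetical, and the 6-character width for 16-bit integers is only the guess made above, not an observed value.

```python
def e00_field_width(type_code, storage_size):
    """Return the number of characters a field occupies in an E00 table.

    type_code is the two-digit field type (10..60) and storage_size is the
    field's storage size in bytes from the .nit record.
    """
    if type_code == 10:            # date
        return 8
    if type_code in (20, 30):      # char, fixed-digit integer
        return storage_size
    if type_code == 40:            # numeric, always written as single float
        return 14
    if type_code == 50:            # binary integer
        if storage_size == 4:
            return 11
        if storage_size == 2:
            return 6               # unverified guess from the text above
    if type_code == 60:            # binary float
        if storage_size == 4:
            return 14
        if storage_size == 8:
            return 24
    raise ValueError("unsupported type/size: %r, %r" % (type_code, storage_size))
```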
    

 

3.1.4 TABLE DATA files (.adf, ...)

 

    The table data itself is stored in binary files inside the coverage directory. They usually have a .adf extension in simple coverages, (ex: tic.adf, bnd.adf, aat.adf, pat.adf, ...) but it may not always be the case for coverages with regions, etc.

    These files do not have any header, and they have fixed size records of the size specified in the corresponding ../info/arc.dir entry and described by the ../info/arc####.nit file.
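
    Because the files are headerless and use fixed size records, reading them reduces to slicing the file by the record size. The sketch below (hypothetical function names) derives the record count from the physical file size, which is what section 3.1.1 recommends when the count in the arc.dir cannot be trusted.

```python
import os


def record_count(data_file_size, record_size):
    """Number of complete records in a data file of the given size."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    return data_file_size // record_size


def iter_table_records(path, record_size):
    """Yield each fixed-size record of a table data file as raw bytes.

    record_size is the table record size taken from the arc.dir entry.
    """
    count = record_count(os.path.getsize(path), record_size)
    with open(path, "rb") as f:
        for _ in range(count):
            yield f.read(record_size)
```

    Each raw record can then be split into fields using the 1-based offsets and storage sizes from the .nit definitions.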

     

 

3.1.5 Name and location of TABLE DATA files

 

    When reading a coverage, the information found in the arc.dir (and in the arc####.dat for external tables) is sufficient to establish the location of the actual data file.

    However, when the time comes to create a new coverage, one needs to know how to name and where to place the data files.

    For internal tables, the data goes directly in the info directory, inside the arc####.dat, so there is not much to worry about.

    For external tables, the table name (first field in the arc.dir, and in an E00 table header) is composed of 3 parts:

             [COVERNAME].[EXT][SUBCLASSNAME]

    • [COVERNAME]:
      The first part of the table name (before the '.') is the name of the coverage to which the table belongs, and the data file will be created in this coverage's directory... so it is assumed that the directory "../[covername]" already exists and is writable.
    • [EXT]:
      The coverage name is followed by a 3 chars extension that will be used to build the name of the external table to create.
    • [SUBCLASSNAME]:
      For some table types, the extension is followed by a subclass name.
    

    When [SUBCLASSNAME] is present, then the data file name will be:

                "../[covername]/[subclassname].[ext]"
    
    e.g. The table named "TEST.PATCOUNTY" would be stored in the file "../test/county.pat" (this path is relative to the info directory)

    When the [SUBCLASSNAME] is not present, then the name of the data file will be:

                "../[covername]/[ext].adf"
    

    e.g. The table named "TEST.PAT" would be stored in the file "../test/pat.adf"

    Of course, it would be too easy if there were no exceptions to these rules! Single precision ".TIC" and ".BND" follow the above rules and will be named "tic.adf" and "bnd.adf" but in double precision coverages, they will be named "dbltic.adf" and "dblbnd.adf".
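
    The naming rules of this section, including the TIC/BND exception, can be sketched as one function. The function name and the double_precision parameter are hypothetical, and the lower-casing simply follows the examples above.

```python
def external_table_path(table_name, double_precision=False):
    """Build the data file path (relative to the info directory) for an
    external table named [COVERNAME].[EXT][SUBCLASSNAME].
    """
    cover, dot, rest = table_name.partition(".")
    if not cover or not dot or len(rest) < 3:
        raise ValueError("expected COVERNAME.EXT[SUBCLASSNAME]: %r" % table_name)
    cover = cover.lower()
    ext = rest[:3].lower()        # 3-character extension
    subclass = rest[3:].lower()   # empty when there is no subclass
    if subclass:
        return "../%s/%s.%s" % (cover, subclass, ext)
    if double_precision and ext in ("tic", "bnd"):
        # Exception: double precision TIC and BND tables get a "dbl" prefix.
        return "../%s/dbl%s.adf" % (cover, ext)
    return "../%s/%s.adf" % (cover, ext)
```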

     

 

3.2 INFO FILES IN "WEIRD" COVERAGES

 

    Weird coverages use the same method to store their INFO tables as V7.x Coverages except for the file names used.

     

          V7 Filename        Corresponding Weird Filename
    
          info/arc.dir	   info/arcdr9
          info/arc0000.dat	   info/arc000dat
          info/arc0000.nit     info/arc000nit
          covername/aat.adf    covername/aat
    

    Weird coverage filenames and directory names are often in upper case. We've also observed some coverages in which the DAT/NIT filenames were truncated to 8 characters, e.g. "ARC000DA", "ARC000NI", ...

    Another difference that was noted is that the "ARCDR9" file can contain multiple entries for the same table name, and the only way to tell which one is valid is by looking for the corresponding DAT and NIT files.

    V7 coverages will overwrite old tables in the arc.dir, but weird coverages seem to always append to the end of the index.

 

3.3 DBF FILES IN PC COVERAGES V1 and V2

 

    PC Coverages store their attribute information in regular DBF files inside the coverage directory.

    File names:

    There is no equivalent to the arc.dir with the list of tables for each coverage. You have to look for "???.DBF" in the coverage directory to get the list of tables.

    Here are the most common .DBF table filenames we can find:

    • AAT.DBF
    • PAT.DBF
    • TIC.DBF
    • BND.DBF
    • LUT.DBF

    Double precision PC Coverages V2 may contain:

    • AAT.DBF
    • PAT.DBF
    • DBLTIC.DBF
    • DBLBND.DBF
    • LANDUSE.DBF

     

    Field Names:

    Because of restrictions in the DBF specs for attribute names, some special attribute names have to be repaired when they are read from a DBF file. For instance, in a coverage named "TEST", the following DBF field names will contain "_" characters in place of some characters that are not permitted in DBF field names:

        .DBF Attribute Name    Arc/Info Name
    	
    	TEST_                 TEST#
    	TEST_ID		      TEST-ID
    	FNODE_                FNODE#
    	TNODE_                TNODE#
    	LPOLY_                LPOLY#
    	RPOLY_                RPOLY#
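
    A reader could repair these names with a simple lookup keyed on the coverage name. This sketch (hypothetical function name) only handles the names listed in the table above; other special attributes may exist.

```python
def dbf_to_info_field_name(dbf_name, cover_name):
    """Map a DBF field name back to its Arc/Info name.

    Only the substitutions listed in the table above are applied; any
    other name is returned unchanged (upper-cased).
    """
    cover = cover_name.upper()
    name = dbf_name.upper()
    special = {
        cover + "_": cover + "#",
        cover + "_ID": cover + "-ID",
        "FNODE_": "FNODE#",
        "TNODE_": "TNODE#",
        "LPOLY_": "LPOLY#",
        "RPOLY_": "RPOLY#",
    }
    return special.get(name, name)
```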
    

    It is also important to note that DBF field names are limited to 10 characters while Arc/Info field names can have up to 16 characters (the size of the field name in the arc####.nit record).

    Field Data Types:

    DBF and INFO files do not use the same code for field data types. The DBF data types have to be mapped to Arc/Info data types:

     

        Arc/Info Data Type            DBF Field Data Type
    
    	10 (D) Date                ??? Never seen any... probably 'D' (date)
    	20 (C) Char		   'C' - char
    	30 (I) Integer		   'N' - Numeric, decimals=0
    	40 (N) Numeric		   ??? Never seen any... probably 'N'
    	50 (B) Binary int.	   'N' - Numeric, decimals=0
    	60 (F) Binary float	   'N' - Numeric, see note below
    

    Note: Floating point values (type 60) are stored inside the DBF file using exponent notation in 13 characters numeric ('N') fields with 0 significant digits before the point. (e.g. -110.333300 is stored as -.1103333E+03, and 65.277460 is stored as 0.6527746E+02)
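
    The 13-character notation described in this note can be reproduced with a short formatter. The function name is hypothetical, the representation chosen for zero is an assumption (no sample was observed), and exponents of three digits would not fit in 13 characters.

```python
def format_dbf_float(value):
    """Format a float as 13 chars with no significant digit before the
    point, e.g. -110.3333 becomes "-.1103333E+03".
    """
    if value == 0:
        digits, exponent = "0000000", 0   # assumption: zero written this way
    else:
        # 7 significant digits in d.ddddddE+xx form, then shift the point.
        mantissa, exp = ("%.6E" % abs(value)).split("E")
        digits = mantissa.replace(".", "")
        exponent = int(exp) + 1
    lead = "-" if value < 0 else "0"
    return "%s.%sE%+03d" % (lead, digits, exponent)
```

    Reading such values back needs no special code in most languages, since the stored text is ordinary exponent notation (in Python, float("-.1103333E+03") works directly).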

    Note2: What is the difference between types 30 and 50 when stored in DBF files? It seems that all system attributes (TEST#, TEST-ID, FNODE#, etc...) are always stored as type 50, and all user-defined integer fields would always be stored as type 30.

     
