As promised, here is a little How-I-did-it / How-To.
First off: I am not an experienced SAX user, so this approach might be tackling the problem from the wrong end, but it is one that DOM users will feel comfortable with ;)
Let’s assume we want to parse the following XML:
transit.xml
<root>
  <schedules>
    <schedule id="0">
      <from>SourceA</from>
      <to>DestinationA</to>
      <links>
        <link id="0">
          <departure>2008-01-01 01:01</departure>
          <arrival>2008-01-01 01:02</arrival>
          <info>With food</info>
          <parts>
            <part id="0">
              <departure>2008-01-01 01:01</departure>
              <arrival>2008-01-01 01:02</arrival>
              <vehicle>Walk</vehicle>
            </part>
            <part id="1">
              <departure>2008-01-01 01:01</departure>
              <arrival>2008-01-01 01:02</arrival>
              <trackfrom>1</trackfrom>
              <trackto>2</trackto>
              <vehicle>Train</vehicle>
            </part>
          </parts>
        </link>
        <link id="1"> ... </link>
        <link id="2"> ... </link>
      </links>
    </schedule>
    <schedule id="1"> ... </schedule>
    <schedule id="2"> ... </schedule>
  </schedules>
</root>
In human-readable terms, this means: we have multiple schedules, each with from/to etc. Each schedule consists of multiple links (different connections for the same route) with departure/arrival etc. Each link in turn consists of multiple parts/sections, whose elements are not all guaranteed to be present.
With a simple “let’s find the element called ‘part’” approach, you won’t get anywhere: <departure> and <arrival> appear in both <link> and <part>, so the element name alone doesn’t tell you where you are.
The Basics
So what do we want to achieve? We want a list/array of Schedules, each with the given members. One member is a list/array of Links, each again consisting of the given members plus a list/array of Parts with their respective members.
This is also the basic idea behind my approach: for every new node-container, use a new class/object (an array will also work, but it’s kinda crap..)
Now we have a Schedule class, a Link class and a Part class.
This is an example of the Link class interface:
Link.h
#import "Part.h"

@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}

@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (readonly, retain) NSMutableArray *parts;

- (void)addPart:(Part *)part;

@end
We use an accessor method for the parts, because it just feels better when dealing with arrays. (Instead of later writing [link.parts addObject:..] we write [link addPart:..].)
We also make life easier for ourselves by using retain properties.
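The article only shows the interface of Link. A minimal sketch of what a matching implementation could look like follows. It assumes manual retain/release (no ARC), as in the rest of the code here. The reduced Part class, the init/dealloc bodies and the main function are my assumptions for illustration, not the original classes.

```objective-c
// Build (assumption): clang -fno-objc-arc -framework Foundation link_sketch.m
#import <Foundation/Foundation.h>

// Hypothetical stand-in for Part.h, reduced to a single member.
@interface Part : NSObject {
    NSString *vehicle;
}
@property (nonatomic, retain) NSString *vehicle;
@end

@implementation Part
@synthesize vehicle;
- (void)dealloc {
    [vehicle release];
    [super dealloc];
}
@end

// Same interface as in Link.h above.
@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}
@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (readonly, retain) NSMutableArray *parts;
- (void)addPart:(Part *)part;
@end

@implementation Link
@synthesize departure, arrival, info, parts;

- (id)init {
    if ((self = [super init])) {
        parts = [[NSMutableArray alloc] init]; // owned by the Link
    }
    return self;
}

- (void)addPart:(Part *)part {
    [parts addObject:part]; // the array retains the part
}

- (void)dealloc {
    [departure release];
    [arrival release];
    [info release];
    [parts release];
    [super dealloc];
}
@end

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Link *link = [[Link alloc] init];
    Part *part = [[Part alloc] init];
    part.vehicle = @"Walk";
    [link addPart:part];
    [part release]; // the Link's array keeps it alive
    NSLog(@"parts in link: %lu", (unsigned long)[link.parts count]);
    [link release];
    [pool drain];
    return 0;
}
```

Because parts is readonly, callers can read the array but have to go through addPart: to let the Link hand out a new array, which keeps ownership in one place.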
The Parser setup
A short introduction into SAX:
The parsing goes node by node and is not nesting-sensitive. That means that first we get root, then schedules, then schedule, then from, then to, then links, then link, then departure etc. As soon as the parser hands you a node such as departure, you no longer know which schedule you were in. As long as you have a clearly defined structure in which every element is always present, you could keep track with a counter, but as soon as you have repeated nodes with no fixed count, you have a problem.
What we do is known as recursive parsing. What does this mean? We implement some kind of memory: the parser remembers which nested node it is currently inside.
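To see the flat event stream and the simplest possible "memory" in action, here is a small sketch. It is not the approach used in the rest of this article: it keeps a stack of element names instead of one member per node type. PathLogger and the shortened XML string are made up for the illustration, and the code again assumes manual retain/release.

```objective-c
// Build (assumption): clang -fno-objc-arc -framework Foundation path_sketch.m
#import <Foundation/Foundation.h>

// Hypothetical delegate: remembers the names of all elements we are inside.
@interface PathLogger : NSObject <NSXMLParserDelegate> {
    NSMutableArray *path;
}
@end

@implementation PathLogger

- (id)init {
    if ((self = [super init])) {
        path = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)dealloc {
    [path release];
    [super dealloc];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // The parser only tells us the name; the context comes from our own stack.
    [path addObject:elementName];
    NSLog(@"entered %@", [path componentsJoinedByString:@"/"]);
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [path removeLastObject]; // we left the innermost element
}

@end

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *xml = @"<root><schedules><schedule id=\"0\"><from>SourceA</from>"
                    @"<links><link id=\"0\"><departure>2008-01-01 01:01</departure>"
                    @"</link></links></schedule></schedules></root>";
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    PathLogger *logger = [[PathLogger alloc] init];
    [parser setDelegate:logger]; // the parser does not retain its delegate
    [parser parse];
    [parser release];
    [logger release];
    [pool drain];
    return 0;
}
```

Without the path array, the callback for departure would look exactly the same whether it belongs to a link or to a part.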
In our parser, we have 5 properties (four that track the current state, plus the result list) and 1 method (to make actual use of the parser..):
@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, retain) Schedule *currentSchedule;
@property (nonatomic, retain) Link *currentLink;
@property (nonatomic, retain) Part *currentPart;
@property (nonatomic, readonly) NSMutableArray *schedules;

- (void)parseScheduleData:(NSData *)data parseError:(NSError **)error;
(Yes, currentProperty needs to be an NSMutableString, because the text of one node may arrive in several pieces that we append to it..)
Your parseScheduleData method should look similar to the following:
parseScheduleData
- (void)parseScheduleData:(NSData *)data parseError:(NSError **)err {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

    // Create our schedule list. The property is readonly, so we set the
    // instance variable directly (which also avoids an extra retain).
    [schedules release];
    schedules = [[NSMutableArray alloc] init];

    [parser setDelegate:self];                    // The parser calls methods in this class
    [parser setShouldProcessNamespaces:NO];       // We don't care about namespaces
    [parser setShouldReportNamespacePrefixes:NO]; // ..nor about their prefixes
    [parser setShouldResolveExternalEntities:NO]; // We just want data, no other stuff

    [parser parse];                               // Parse that data..

    if (err && [parser parserError]) {
        // Keep the error alive after the parser is released below
        *err = [[[parser parserError] retain] autorelease];
    }
    [parser release];
}
Now we need those delegate methods.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
This method is called by the parser when it reads something between tags (text, that is): for an element containing the text blah, it would read “blah”. It is possible that this method is called multiple times within one node, which is why we append instead of assign. As you will see later, we set the property “currentProperty” only when we find a node we care about. That’s why we test this property first: if it is nil, we are not interested in the text. This will then look something like this:
Parser
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.currentProperty) {
        [currentProperty appendString:string];
    }
}
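If you want to see for yourself in how many pieces the text of one node arrives, a rough sketch like the following can help. TextCollector, its chunk counter and the sample XML are hypothetical and only serve the illustration; how many chunks you actually get depends on the parser and the input, so the code just prints the number. Manual retain/release is assumed.

```objective-c
// Build (assumption): clang -fno-objc-arc -framework Foundation chunks_sketch.m
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the text of <info> and counts the pieces.
@interface TextCollector : NSObject <NSXMLParserDelegate> {
    NSMutableString *currentProperty;
    NSUInteger chunkCount;
}
@end

@implementation TextCollector

- (void)dealloc {
    [currentProperty release];
    [super dealloc];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"info"]) {
        // A node we care about: start with an empty buffer
        [currentProperty release];
        currentProperty = [[NSMutableString alloc] init];
        chunkCount = 0;
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (currentProperty) {
        chunkCount++;
        [currentProperty appendString:string]; // append, never assign
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"info"]) {
        NSLog(@"info = \"%@\" (received in %lu piece(s))",
              currentProperty, (unsigned long)chunkCount);
    }
    // Reset, so text outside of <info> is ignored
    [currentProperty release];
    currentProperty = nil;
}

@end

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *xml = @"<link><info>With food &amp; drinks</info></link>";
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    TextCollector *collector = [[TextCollector alloc] init];
    [parser setDelegate:collector];
    [parser parse];
    [parser release];
    [collector release];
    [pool drain];
    return 0;
}
```

Whatever the number of pieces turns out to be, the appended string is the complete text, which is the whole point of using a mutable string.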
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
This is called when the parser finds an opening element. Here we need to distinguish a few cases:
Either it is a plain property of the schedule (like <from> etc.) or it is a deeper nested node (like <link>); the same goes for all the other nodes.
How to? We define that a member is only set while we are inside that node. That means currentPart is set only once we have entered a <part>; otherwise it’s nil. The same goes for the others.
We then need to check them in reverse order of their nesting level, deepest first. Why? Because while we are inside a <part>, currentLink is set as well; if we checked currentLink before currentPart, it would also evaluate to YES/true, and we would have a problem if there are elements with the same name (like <departure>). If we aren’t in any node, then there is probably a new main node coming -> handled in the else..
When we hit a nested node, we need to allocate the respective member of our class, so we can use it when the parser gets deeper into it.
This will look like this:
Parser
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] ||
            [elementName isEqualToString:@"arrival"] ||
            [elementName isEqualToString:@"vehicle"] ||
            [elementName isEqualToString:@"trackfrom"] ||
            [elementName isEqualToString:@"trackto"]) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] ||
            [elementName isEqualToString:@"arrival"] ||
            [elementName isEqualToString:@"info"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"part"]) {
            // Create the element (autoreleased, the retain property keeps it)
            self.currentPart = [[[Part alloc] init] autorelease];
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"] ||
            [elementName isEqualToString:@"to"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"link"]) {
            // Create the element (autoreleased, the retain property keeps it)
            self.currentLink = [[[Link alloc] init] autorelease];
        }
    } else { // We are outside of everything, so we need a <schedule>
        // Check for deeper nested node
        if ([elementName isEqualToString:@"schedule"]) {
            self.currentSchedule = [[[Schedule alloc] init] autorelease];
        }
    }
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
Basically, the same things apply as for didStartElement above. This time, we need to clean things up and assign the values if they are set :) This is a bit of a pity, since it’s a lot of code.. *(for not so much)
It’s the same checker-structure..
If we are in a deeper nested node (like <link>) and we hit the ending element of that nested node (like </link>), then we need to add this element to the parent (like <schedule>) and set it to nil.
See for yourself:
Parser
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentPart.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentPart.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"vehicle"]) {
            self.currentPart.vehicle = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackfrom"]) {
            self.currentPart.trackfrom = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackto"]) {
            self.currentPart.trackto = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"part"]) {
            [currentLink addPart:self.currentPart]; // Add to parent
            self.currentPart = nil;                 // Set nil
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentLink.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentLink.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"info"]) {
            self.currentLink.info = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"link"]) {
            // Add to parent (assuming Schedule declares addLink:,
            // analogous to addPart: in Link)
            [currentSchedule addLink:self.currentLink];
            self.currentLink = nil;                 // Set nil
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"]) {
            self.currentSchedule.from = self.currentProperty;
        } else if ([elementName isEqualToString:@"to"]) {
            self.currentSchedule.to = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"schedule"]) {
            // Corrected thanks to Muhammad Ishaq
            [schedules addObject:self.currentSchedule]; // Add to the result node
            self.currentSchedule = nil;                 // Set nil
        }
    }

    // We reset the currentProperty, for the next textnodes..
    self.currentProperty = nil;
}
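If you want to try the deepest-first checks without building all three classes first, here is a cut-down sketch in a single file. It handles only link, part and the departure element they share, and it uses dictionaries as a shortcut in place of the Link and Part classes (the very shortcut I called crap above, but it keeps the example short). MiniParser and the sample XML are made up for this illustration. Manual retain/release is assumed.

```objective-c
// Build (assumption): clang -fno-objc-arc -framework Foundation mini_sketch.m
#import <Foundation/Foundation.h>

// Hypothetical mini-parser; dictionaries stand in for Link and Part.
@interface MiniParser : NSObject <NSXMLParserDelegate> {
    NSMutableString *currentProperty;
    NSMutableDictionary *currentLink;
    NSMutableDictionary *currentPart;
}
@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, retain) NSMutableDictionary *currentLink;
@property (nonatomic, retain) NSMutableDictionary *currentPart;
@end

@implementation MiniParser
@synthesize currentProperty, currentLink, currentPart;

- (void)dealloc {
    [currentProperty release];
    [currentLink release];
    [currentPart release];
    [super dealloc];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if (self.currentPart) { // deepest level first
        if ([elementName isEqualToString:@"departure"]) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentLink) {
        if ([elementName isEqualToString:@"departure"]) {
            self.currentProperty = [NSMutableString string];
        } else if ([elementName isEqualToString:@"part"]) {
            self.currentPart = [NSMutableDictionary dictionary];
        }
    } else if ([elementName isEqualToString:@"link"]) {
        self.currentLink = [NSMutableDictionary dictionary];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.currentProperty) {
        [self.currentProperty appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.currentPart) {
        if ([elementName isEqualToString:@"departure"]) {
            [self.currentPart setObject:self.currentProperty forKey:@"departure"];
        } else if ([elementName isEqualToString:@"part"]) {
            NSLog(@"part departs %@", [self.currentPart objectForKey:@"departure"]);
            self.currentPart = nil;
        }
    } else if (self.currentLink) {
        if ([elementName isEqualToString:@"departure"]) {
            [self.currentLink setObject:self.currentProperty forKey:@"departure"];
        } else if ([elementName isEqualToString:@"link"]) {
            NSLog(@"link departs %@", [self.currentLink objectForKey:@"departure"]);
            self.currentLink = nil;
        }
    }
    self.currentProperty = nil;
}
@end

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *xml = @"<link><departure>2008-01-01 01:01</departure><parts>"
                    @"<part><departure>2008-01-01 01:05</departure></part>"
                    @"</parts></link>";
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    MiniParser *mini = [[MiniParser alloc] init];
    [parser setDelegate:mini];
    [parser parse];
    [parser release];
    [mini release];
    [pool drain];
    return 0;
}
```

Swapping the currentPart and currentLink branches would store the part's departure in the link, which is exactly the same-name problem described above.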
Finally..
Well, that’s it. You can expand / shrink this principle as you like. You can also add a maxElements counter, like in the SeismicXML example of the iPhone SDK, to get only a certain number of elements. You can abort the parser with [parser abortParsing]. It is important that you don’t abort while inside a deeper nested node, because this could lead to inconsistencies (a half-filled element that never gets added to its parent). You will need to skip them..
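A rough sketch of such a counter could look like this. It is not taken from the SeismicXML sample; kMaxSchedules, CountingDelegate and the sample XML are names I made up. The idea is to abort only right after a </schedule>, that is, when no nested node is half-finished. Manual retain/release is assumed.

```objective-c
// Build (assumption): clang -fno-objc-arc -framework Foundation abort_sketch.m
#import <Foundation/Foundation.h>

static const NSUInteger kMaxSchedules = 2; // hypothetical limit

// Hypothetical delegate: counts finished schedules and stops at the limit.
@interface CountingDelegate : NSObject <NSXMLParserDelegate> {
    NSUInteger finishedSchedules;
}
@property (nonatomic, readonly) NSUInteger finishedSchedules;
@end

@implementation CountingDelegate
@synthesize finishedSchedules;

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (finishedSchedules >= kMaxSchedules) {
        return; // skip anything that might still arrive after the limit
    }
    if ([elementName isEqualToString:@"schedule"]) {
        finishedSchedules++;
        if (finishedSchedules >= kMaxSchedules) {
            // We just closed a <schedule>, so nothing is half-parsed here
            [parser abortParsing];
        }
    }
}
@end

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *xml = @"<root><schedules>"
                    @"<schedule id=\"0\"><from>A</from></schedule>"
                    @"<schedule id=\"1\"><from>B</from></schedule>"
                    @"<schedule id=\"2\"><from>C</from></schedule>"
                    @"</schedules></root>";
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    CountingDelegate *counter = [[CountingDelegate alloc] init];
    [parser setDelegate:counter];
    BOOL finished = [parser parse]; // NO when parsing was aborted or failed
    NSLog(@"finished: %d, schedules counted: %lu",
          finished, (unsigned long)counter.finishedSchedules);
    [parser release];
    [counter release];
    [pool drain];
    return 0;
}
```

Keep in mind that an aborted parse also leaves a parserError behind, so code like parseScheduleData above would hand that error back to the caller even though the abort was intentional.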
Please note, that I wrote this, while watching TV, so you may need to fix some syntax errors ;) But I hope you get the idea..
from:http://codesofa.com/blog/archive/2008/07/23/make-nsxmlparser-your-friend.html