Parsing XML files on iPhone
Hello All,
I'm doing a little iPhone project, and they've got a few XML files. Anyone know if there's an official 'Apple way' to parse XML files, or should I use some open source library? Any suggestions? (p.s. I am a bit of an XML newbie).
Thanks,
Blog - http://bit.ly/mrqwak
Games - http://bit.ly/mrqwakgames
Twitter - http://www.twitter.com/mrqwak
Hi Jamie,
I use libxml2 in my own iPhone projects. You can find a description of setting up libxml2 in Xcode at Jeff LaMarche's site.
If you're coding in Objective-C you may prefer to use NSXMLParser instead.
Good luck with your project.
The Monkey Hustle - Now available on the App Store!
Damn it! I mean, thank you for your reply. I just posted a response, thanking you, and explaining I found a solution, and was posting it here, when the forum software claimed it believed I was posting spam! Maybe the forum admin should be aware that the software is flagging legit helpful posts as spam?...
Haha, this happened to me as well... Fortunately Carlos and the others responded, scared me for a bit.
Hehehe. Well, for anyone else coming across this thread: as well as what mariocaprino suggests, you might want to try googling 'RapidXML' (scared to post links now!).
If you need a quick and dirty way to read XML files in your game, you can use the following Python script. It parses an XML file and outputs a simple binary format that mirrors the SAX protocol.
The binary format is divided into three parts: a header, a string list, and the XML structure (SAX command protocol).
- The header is just a magic token at the start of the file
- The string list contains all strings in the original XML document, each in UTF-8 encoding with a terminating zero. Each string is given an index that is used to refer to the string in the later XML structure section. Duplicate strings share the same index. Each string is padded to be 32-bit aligned.
- The XML structure is described using SAX commands.
All values are 32-bit aligned so you can easily traverse the file in memory using a UINT pointer.
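To make the layout a bit more concrete, here is a minimal Python sketch that walks the top-level sections of such a file. It assumes little-endian 32-bit words and that every section, the header included, starts with a 4-byte tag followed by its payload length counted in 32-bit words. The name `iter_sections` and the hand-built sample are made up for illustration and are not part of the original script.

```python
import struct

def iter_sections(data):
    """Yield (tag, payload) pairs for each top-level section.

    Assumption: little-endian words, and each section is laid out as
    a 4-byte tag, a word count, then that many 32-bit words of payload.
    """
    offset = 0
    while offset < len(data):
        tag, nwords = struct.unpack_from('<4sI', data, offset)
        start = offset + 8
        end = start + nwords * 4
        if end > len(data):
            raise ValueError('section %r runs past the end of the file' % tag)
        yield tag, data[start:end]
        offset = end

# Hand-built sample: header, a one-string list ("a"), and an empty doc.
sample = (struct.pack('<4sI4sI', b'hdr ', 2, b'sax ', 0)
          + struct.pack('<4sI', b'list', 2) + struct.pack('<I4s', 1, b'a')
          + struct.pack('<4sI', b'doc ', 0))

sections = list(iter_sections(sample))
print([tag for tag, _ in sections])
```

Treating the header as just another tagged section keeps the loop uniform, but that is a choice of this sketch rather than a requirement of the format.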
Header
The header contains the following 4 32-bit values ('hdr ', 2, 'sax ', 0)
String list
The string list starts with the following header ('list', sizeof(list) / 4). Each string has a length prefix. Be aware that the length is given as a number of 32-bit values, not bytes! For example, you can use the following code to count the number of strings in the list:
Code:
static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    return count;
}
XML structure
The XML structure starts with the following header ('doc ', sizeof(doc) / 4). The contents are a list of commands and parameters simulating the SAX protocol. Each command is described by an identifier and the length of its parameters. The parameters are a list of string indices into the string list. You'll notice that the file format only uses 32-bit unsigned integers to describe the whole XML structure.
Here are some examples of SAX commands:
- startElement ('selm'): The parameters are the element name followed by pairs of key/value indices. The length is (1 + numattributes * 2).
- endElement ('eelm'): The parameter is the element name. The length will always be 1.
- characters ('chr '): The parameter is the character string. The length will always be 1 because I compress all neighbouring character commands into one during encoding.
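For anyone who wants to check an encoded file on the desktop, the string list and the commands above could be decoded roughly like this. It is only a sketch under assumptions: little-endian words, payloads already stripped of their 'list' and 'doc ' section headers, and an event tuple shape ('start', 'end', 'characters') that is invented here. The helpers `decode_strings`, `decode_commands` and `pack_string` are hypothetical names.

```python
import struct

def decode_strings(payload):
    """Return the strings of a 'list' payload, in index order."""
    strings, offset = [], 0
    while offset < len(payload):
        (nwords,) = struct.unpack_from('<I', payload, offset)  # length in words
        raw = payload[offset + 4:offset + 4 + nwords * 4]
        strings.append(raw.split(b'\0', 1)[0].decode('utf-8'))  # drop zero padding
        offset += 4 + nwords * 4
    return strings

def decode_commands(payload, strings):
    """Yield SAX-like event tuples from a 'doc' payload."""
    offset = 0
    while offset < len(payload):
        tag, nparams = struct.unpack_from('<4sI', payload, offset)
        params = struct.unpack_from('<%dI' % nparams, payload, offset + 8)
        args = [strings[i] for i in params]
        if tag == b'selm':
            # name first, then alternating attribute keys and values
            yield ('start', args[0], dict(zip(args[1::2], args[2::2])))
        elif tag == b'eelm':
            yield ('end', args[0])
        elif tag == b'chr ':
            yield ('characters', args[0])
        else:
            raise ValueError('unknown command %r' % tag)
        offset += 8 + nparams * 4

def pack_string(text):
    """Build one string-list entry (used only to make the sample below)."""
    raw = text.encode('utf-8') + b'\0'
    raw += b'\0' * (-len(raw) % 4)  # pad to a 32-bit boundary
    return struct.pack('<I', len(raw) // 4) + raw

# Sample equivalent to <font size="12">hi</font>
strings_payload = b''.join(pack_string(s) for s in ['font', 'size', '12', 'hi'])
doc_payload = (struct.pack('<4sI3I', b'selm', 3, 0, 1, 2)
               + struct.pack('<4sII', b'chr ', 1, 3)
               + struct.pack('<4sII', b'eelm', 1, 0))

events = list(decode_commands(doc_payload, decode_strings(strings_payload)))
for event in events:
    print(event)
```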
Limitations
Compared to other binary formats this solution has the following limitations:
- The whole file needs to be read into memory (because of the string list)
- The file size is usually close to that of the original (because all values are 32-bit aligned)
- The format simulates the SAX protocol and therefore has the same limitations
Python script
You're free to use the following code in your own projects - I hope you find it useful.
Code:
#!/usr/bin/env python
# Encodes an XML file into the binary SAX format described above.
# Meant to run on the development machine, not on the device.
import io
import os
import re
import struct
import xml.sax
import xml.sax.handler

class SaxHandler (xml.sax.handler.ContentHandler):
    def __init__ (self, out):
        xml.sax.handler.ContentHandler.__init__ (self)
        self.out= out
        self.charlist= None
        self.stringlist= []
        self.re= re.compile (r"\s+", re.UNICODE)

    def stringIndex (self, text):
        #duplicate strings share one index
        try:
            return self.stringlist.index (text)
        except ValueError:
            self.stringlist.append (text)
            return len (self.stringlist) - 1

    def stringList (self):
        return self.stringlist

    def flushCharacters (self):
        if self.charlist is None:
            return
        #command
        self.out.write (struct.pack ('4sI', b'chr ', 1))
        #characters (neighbouring chunks joined, whitespace runs collapsed)
        text= ''.join (self.charlist)
        self.charlist= None
        text= self.re.sub (' ', text)
        self.out.write (struct.pack ('I', self.stringIndex (text)))

    #---
    def startElement (self, name, attrs):
        self.flushCharacters()
        attrs= attrs.items()
        #command
        self.out.write (struct.pack ('4sI', b'selm', 1 + len (attrs) * 2))
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))
        #attrs
        for (key, value) in attrs:
            k= self.stringIndex (key)
            v= self.stringIndex (value)
            self.out.write (struct.pack ('II', k, v))

    def endElement (self, name):
        self.flushCharacters()
        #command
        self.out.write (struct.pack ('4sI', b'eelm', 1))
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))

    def characters (self, ch):
        if self.charlist is None:
            self.charlist= []
        self.charlist.append (ch)

#---
def write_doc (out, filename):
    handler= SaxHandler (out)
    xml.sax.parse (filename, handler)
    return handler.stringList()

def align (size, boundary):
    return ((size) + ((boundary) - 1)) & ~((boundary) - 1)

def write_list (out, strings):
    for text in strings:
        data= text.encode ('utf-8')
        l= align (len (data) + 1, 4) #+1 for the terminating zero
        fmt= "I%ds" % (l)
        out.write (struct.pack (fmt, l // 4, data)) #length prefix counts 32-bit words

def encode (inpath):
    docstr= io.BytesIO()
    strings= write_doc (docstr, inpath)
    docstr= docstr.getvalue()
    liststr= io.BytesIO()
    write_list (liststr, strings)
    liststr= liststr.getvalue()
    #header
    out= io.BytesIO()
    out.write (struct.pack ('4sI4sI', b'hdr ', 2, b'sax ', 0))
    #list
    out.write (struct.pack ('4sI', b'list', len (liststr) // 4))
    out.write (liststr)
    #doc
    out.write (struct.pack ('4sI', b'doc ', len (docstr) // 4))
    out.write (docstr)
    return out.getvalue()

if __name__ == '__main__':
    #input and output paths are taken from the build environment
    inpath= os.getenv ('INPUT_FILE_PATH')
    outstr= encode (inpath)
    outpath= os.path.join (os.getenv ('DERIVED_FILES_DIR'), os.getenv ('INPUT_FILE_REGION_PATH_COMPONENT') + os.getenv ('INPUT_FILE_NAME'))
    with open (outpath, 'wb') as out:
        out.write (outstr) #write the encoded data, not the path
@mariocaprino Python isn't going to be much help on the iPhone.
NSXMLParser leaks (personal experience circa iPhone OS 2, could have been fixed in a more recent version) but it does use libxml2 under the hood.
If you're not doing a whole lot of XML parsing, use NSXMLParser. If it's something that you plan on doing a whole lot, use libxml2 and be careful to eliminate all memory leaks.
Hi cmiller,
The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.
The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)
Another Objective C option, though I don't know how recently it's been updated:
http://code.google.com/p/touchcode/wiki/TouchXML
KB Productions, Car Care for iPhone/iPod Touch
@karlbecker_com
All too often, art is simply the loss of practicality.
Another good XML parser for the iPhone is TBXML. If you only want to parse XML and not create new DOMs or write XML out, this is a very quick parser with a really simple API.
http://www.tbxml.co.uk
iPhone Game Development Blog - 71Squared
There are several XML parsers that can be used on the iPhone, such as NSXMLParser, TouchXML and libxml2. I recommend NSXMLParser because it is simple, easy to use, and has an Objective-C API.
(Aug 5, 2010 03:23 PM)mariocaprino Wrote: Hi cmiller,
The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.
The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)
Hey there! :-)
I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.
Cheers
Mark
(Jan 25, 2011 06:59 AM)markhula Wrote: I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.
I'll gladly post the parser I currently use to read the output of my Python encoder. Sadly, for people who might want to use the code, it requires the Apache Portable Runtime (APR), so you might want to check out its XML documentation to see how I store the DOM document in memory. APR also uses memory pools, which make memory deallocation for large DOM structures super easy but might not fit your project's memory model.
If you want to read more about Apache Portable Runtime I can recommend the following tutorial.
Code:
const apr_xml_doc* xml_parsebinary (const fsfile_t* self, apr_pool_t* pool)
{
    UINT len;
    const UINT* ints= fsfile_read (self, &len, pool); //read the whole contents of a binary file
    //because the file is encoded as a list of 32bit values (even the strings fit this model) it can be treated as an array of ints

    //header
    LOG_ASSERT (xml_istag (ints)); //check file header
    ints+= 2 + ints[1]; //skip the block tag and block length

    //list
    //multi-character constants are implementation-defined; with GCC/Clang on a
    //little-endian CPU the reversed spelling 'tsil' matches the bytes "list" in the file
    xml_context_t ctx;
    LOG_ASSERT (ints[0] == 'tsil'); //check block id
    xml_init (&ctx, ints + 2, ints[1], pool); //initialise string table
    ints+= 2 + ints[1]; //skip...

    //doc
    LOG_ASSERT (ints[0] == ' cod'); //check...
    UINT length= ints[1];
    ints+= 2;
    apr_xml_doc* doc= apr_pcalloc (pool, sizeof (apr_xml_doc));
    while (0 < length)
    {
        //ints[0] is the command id, ints[1] the number of parameters that follow
        switch (ints[0])
        {
            case 'mles': xml_startelem (&ctx, ints + 2, ints[1]); break;
            case 'mlee': doc->root= xml_endelem (&ctx, ints + 2, ints[1]); break;
            case ' rhc': xml_characters (&ctx, ints + 2, ints[1]); break;
            default:
                LOG_BADCASE (ints[0]);
                break;
        }
        UINT skip= 2 + ints[1];
        ints+= skip;
        length-= skip;
    }
    apr_pool_destroy (ctx.temp);
    return doc;
}
So that's the basic idea of a SAX parser for the in-memory structure. You would replace the functions xml_startelem, xml_endelem and xml_characters with your own SAX callback functions.
If you are interested in a DOM parser - keep on reading.
Sadly it's not a drop-in solution because of its dependence on APR - but hopefully it can help you develop your own parser or give you some new ideas about how to proceed.
Here are the details...
Code:
static BOOL xml_istag (const UINT* bytes)
{
    //file header ('hdr ', 2, 'sax ', 0) spelled as reversed multi-character constants
    const UINT tag[]= {' rdh', 2, ' xas', 0};
    return memcmp (bytes, tag, sizeof (tag)) == 0;
}

static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    return count;
}

typedef struct
{
    apr_pool_t* pool, *temp;
    apr_xml_elem* elem;
    apr_array_header_t* strings;
} xml_context_t;

static void xml_init (xml_context_t* ctx, const UINT* ints, UINT length, apr_pool_t* pool)
{
    ctx->pool= pool;
    ctx->elem= NULL;
    UINT nelts= xml_countlist (ints, length);
    POOL_CREATETEMP (&ctx->temp);
    //index table of pointers into the file buffer (the strings are not copied)
    ctx->strings= apr_array_make (ctx->temp, nelts, sizeof (char*));
    for (UINT i= 0; i < nelts; i++)
    {
        const char* str= (const char*) (ints + 1);
        APR_ARRAY_PUSH (ctx->strings, const char*)= str;
        ints+= 1 + ints[0];
    }
}

static const char* xml_getstring (const xml_context_t* ctx, UINT i)
{
    const apr_array_header_t* arr= ctx->strings;
    LOG_ASSERT (i < arr->nelts);
    const char** base= (const char**) arr->elts;
    return base[i];
}

#define xml_topelem(ctx) ((ctx)->elem)

static apr_xml_elem* xml_pushelem (xml_context_t* ctx)
{
    apr_xml_elem* parent= xml_topelem (ctx);
    apr_xml_elem* elem= apr_pcalloc (ctx->pool, sizeof (apr_xml_elem));
    ctx->elem= elem;
    if (parent == NULL)
        return elem;
    //append the new element to the parent's list of children
    elem->parent= parent;
    if (parent->first_child == NULL)
    {
        LOG_ASSERT (parent->last_child == NULL);
        parent->first_child= elem;
    }
    else
    {
        LOG_ASSERT (parent->last_child != NULL);
        parent->last_child->next= elem;
    }
    parent->last_child= elem;
    return elem;
}

static void xml_popelem (xml_context_t* ctx)
{
    LOG_ASSERT (ctx->elem != NULL);
    ctx->elem= ctx->elem->parent;
}

//---
static void xml_startelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_pushelem (ctx);
    elem->name= xml_getstring (ctx, ints[0]);
    //the remaining parameters are key/value pairs
    apr_xml_attr** next= &elem->attr;
    for (UINT i= 1; i < length; i+= 2)
    {
        *next= apr_pcalloc (ctx->pool, sizeof (apr_xml_attr));
        (*next)->name= xml_getstring (ctx, ints[i]);
        (*next)->value= xml_getstring (ctx, ints[i + 1]);
        next= &(*next)->next;
    }
}

static apr_xml_elem* xml_endelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    LOG_ASSERT (length == 1);
    const char* name= xml_getstring (ctx, ints[0]);
    apr_xml_elem* elem= xml_topelem (ctx);
    LOG_ASSERT (strcmp (elem->name, name) == 0);
    xml_popelem (ctx);
    return elem;
}

static void xml_characters (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_topelem (ctx);
    //text before the first child belongs to the element, later text follows the last child
    apr_text_header* hdr= (elem->last_child == NULL) ? &elem->first_cdata : &elem->last_child->following_cdata;
    for (UINT i= 0; i < length; i++)
    {
        const char* str= xml_getstring (ctx, ints[i]);
        apr_text_append (ctx->pool, hdr, str);
    }
}
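If you want to try the same push/pop idea on the desktop without APR, a rough Python equivalent can lean on the standard library's `xml.etree.ElementTree.TreeBuilder`. This is only a sketch: the event tuples ('start', name, attrs), ('end', name) and ('characters', text) are a hypothetical shape chosen for the example, and `build_tree` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def build_tree(events):
    """Feed SAX-like event tuples to a TreeBuilder and return the root element."""
    builder = ET.TreeBuilder()
    for event in events:
        kind = event[0]
        if kind == 'start':
            builder.start(event[1], event[2])   # push a new element
        elif kind == 'end':
            builder.end(event[1])               # pop back to the parent
        elif kind == 'characters':
            builder.data(event[1])              # text for the current position
        else:
            raise ValueError('unknown event %r' % (kind,))
    return builder.close()

# Events equivalent to <font size="12"><char id="65"/>hi</font>
events = [
    ('start', 'font', {'size': '12'}),
    ('start', 'char', {'id': '65'}),
    ('end', 'char'),
    ('characters', 'hi'),
    ('end', 'font'),
]
root = build_tree(events)
print(root.tag, root.attrib, root[0].tag, root[0].tail)
```

Text that arrives after a child element ends up in that child's `tail`, which plays much the same role as `following_cdata` in the APR structures.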
(Jan 25, 2011 06:59 AM)markhula Wrote: I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.
Try this.
http://codesuppository.blogspot.com/2009...tream.html
If you don't know who the author is ... here is some info:
http://en.wikipedia.org/wiki/John_W._Ratcliff