Parsing XML files on iPhone

Member
Posts: 129
Joined: 2009.03
Post: #1
Hello All,

I'm doing a little iPhone project, and they've got a few XML files. Anyone know if there's an official 'Apple way' to parse XML files, or should I use some open source library? Any suggestions? (p.s. I am a bit of an XML newbie).

Thanks,
Member
Posts: 23
Joined: 2010.08
Post: #2
Hi Jamie,

I use libxml2 in my own iPhone projects. You can find a description of setting up libxml2 in Xcode at Jeff LaMarche's site.

If you're coding in Objective-C you may prefer to use NSXMLParser instead.

Good luck with your project.

The Monkey Hustle - Now available on the App Store!
Member
Posts: 129
Joined: 2009.03
Post: #3
Damn it! I mean, thank you for your reply. I just posted a response, thanking you, and explaining I found a solution, and was posting it here, when the forum software claimed it believed I was posting spam! Maybe the forum admin should be aware that the software is flagging legit helpful posts as spam?...
Member
Posts: 227
Joined: 2008.08
Post: #4
Haha, this happened to me as well... Fortunately Carlos and the others responded, scared me for a bit.
Member
Posts: 129
Joined: 2009.03
Post: #5
Hehehe. Well, for anyone else coming across this thread: as well as mariocaprino's suggestions, you might want to try googling 'RapidXML' (scared to post links now!). ;-)
Member
Posts: 23
Joined: 2010.08
Post: #6
If you need a quick and dirty way to read XML files in your game - you can use the following Python script, which parses an XML file and outputs a simple binary format that mirrors the SAX protocol.

The binary format is divided into three parts: a header, a string list, and the XML structure (SAX command protocol).
  1. The header is just a magic token at the start of the file
  2. The string list contains all strings in the original XML document in UTF-8 encoding, each with a terminating zero. Each string is given an index that is used to refer to it in the later XML structure section. Duplicate strings share the same index. Each string is padded to be 32-bit aligned.
  3. The XML structure is described using SAX commands.

All values are 32-bit aligned so you can easily traverse the file in memory using a UINT pointer.

Header
The header contains the following four 32-bit values: ('hdr ', 2, 'sax ', 0)
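
As a minimal sketch of the header described above, the following Python 3 snippet packs and checks those four values. The function names are made up for illustration, and little-endian byte order is an assumption (the encoder script in this post uses the native byte order of the machine it runs on).

```python
import struct

# Four 32-bit values: 'hdr ', 2, 'sax ', 0.
# "<" (little-endian) is an assumption, not something the format mandates.
HEADER_FORMAT = "<4sI4sI"
HEADER_VALUES = (b"hdr ", 2, b"sax ", 0)

def pack_header():
    """Return the 16-byte file header."""
    return struct.pack(HEADER_FORMAT, *HEADER_VALUES)

def is_valid_header(data):
    """Check that data starts with the expected header values."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        return False
    return struct.unpack_from(HEADER_FORMAT, data) == HEADER_VALUES
```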

String list
The string list starts with the following header ('list', sizeof(list) / 4). Each string has a length prefix. Be aware that the length is given as a count of 32-bit values, not bytes! For example, you can use the following code to count the number of strings in the list:

Code:
static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    
    return count;
}
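
For comparison, here is a rough Python 3 sketch of how one string entry is laid out and how the list can be read back. The helper names are hypothetical, and little-endian byte order is assumed. The payload passed to read_string_list is the list contents without the ('list', length) block header.

```python
import struct

def pack_string(text):
    """Encode one entry: length in 32-bit words, then zero-terminated, zero-padded UTF-8."""
    raw = text.encode("utf-8") + b"\0"
    padded_len = (len(raw) + 3) & ~3          # round up to a multiple of 4 bytes
    return struct.pack("<I", padded_len // 4) + raw.ljust(padded_len, b"\0")

def read_string_list(payload):
    """Decode all entries; a string's position in the result is its index."""
    strings = []
    offset = 0
    while offset < len(payload):
        (words,) = struct.unpack_from("<I", payload, offset)
        start = offset + 4
        raw = payload[start:start + words * 4]
        # Everything after the first zero byte is padding.
        strings.append(raw.split(b"\0", 1)[0].decode("utf-8"))
        offset = start + words * 4
    return strings
```

Note that the length prefix counts 32-bit words, so "font" (4 bytes plus the terminating zero, padded to 8) is stored with a prefix of 2.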

XML structure
The XML structure starts with the following header ('doc ', sizeof(doc) / 4). The contents are a list of commands and parameters simulating the SAX protocol. Each command is described by an identifier and the length of its parameter list. The parameters are string indices into the string list. You'll notice that the file format uses only 32-bit unsigned integers to describe the whole XML structure.

Here are some examples of SAX commands:
  • startElement ('selm'): Parameters are the element name and pairs of key/value indices. The length is (1 + numattributes * 2).
  • endElement ('eelm'): The only parameter is the element name. The length will always be 1.
  • characters ('chr '): The only parameter is the character string. The length will always be 1 because I merge all neighbouring character commands into one during encoding.
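
The command framing above can be sketched in Python 3 as follows. This is only an illustration: iter_commands is a made-up name, little-endian byte order is assumed, and the input is the contents of the 'doc ' block without its block header.

```python
import struct

def iter_commands(doc_payload):
    """Yield (identifier, parameters) for each command in the 'doc ' block contents."""
    offset = 0
    while offset < len(doc_payload):
        # Each command starts with a 4-byte identifier and a parameter count.
        tag, count = struct.unpack_from("<4sI", doc_payload, offset)
        # The parameters are string indices, one 32-bit value each.
        params = struct.unpack_from("<%dI" % count, doc_payload, offset + 8)
        yield tag, params
        offset += 8 + count * 4
```

For a document like `<a k="v">hi</a>` with the string list ["a", "k", "v", "hi"], this would yield 'selm' with (0, 1, 2), 'chr ' with (3,) and 'eelm' with (0,).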

Limitations
Compared to other binary formats this solution has the following limitations:
  • The whole file needs to be read into memory (because of the string list)
  • The file size is usually close to the original (because all values are 32-bit aligned)
  • The format simulates the SAX protocol and therefore also has the same limitations

Python script
You're free to use the following code in your own projects - I hope you find it useful.
Code:
#!/usr/bin/env python

import os
import re
import xml.sax
import struct
import StringIO

class SaxHandler (xml.sax.handler.ContentHandler):
    def __init__ (self, file):
        self.out= file
        self.charlist= None
        self.stringlist= [];
        self.re= re.compile (r"\s+", re.UNICODE | re.LOCALE)
        
    
    def stringIndex (self, str):
        try:
            return self.stringlist.index (str)    
        except ValueError:
            self.stringlist.append (str)
            return len (self.stringlist) - 1
    
    def stringList (self):
        return self.stringlist
    
    def flushCharacters (self):
        if self.charlist is None:
            return

        #command
        str= struct.pack ('4sI', 'chr ', 1)
        self.out.write (str)

        #characters
        str= ''.join (self.charlist)
        self.charlist= None
        
        str= self.re.sub (' ', str)
        str= struct.pack ('I', self.stringIndex (str))
        self.out.write (str)        
    
    #---
        
    def startElement (self, name, attrs):
        self.flushCharacters()
        attrs= attrs.items()
        
        #command
        self.out.write (struct.pack ('4sI', 'selm', 1 + len (attrs) * 2))
        
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))
        
        #attrs
        for (key, value) in attrs:
            k= self.stringIndex (key)
            v= self.stringIndex (value)
            self.out.write (struct.pack ('II', k, v))
            
    def endElement (self, name):
        self.flushCharacters()
        
        #command
        self.out.write (struct.pack ('4sI', 'eelm', 1))
        
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))
        
    def characters (self, ch):
        if self.charlist is None:
            self.charlist= []
        self.charlist.append (ch)
        

#---
def write_doc (out, filename):
    handler= SaxHandler (out)
    xml.sax.parse (filename, handler)
    return handler.stringList()

def align (size, boundary):
    return ((size) + ((boundary) - 1)) & ~((boundary) - 1)
    
def write_list (out, list):    
    for str in list:
        str= str.encode ('utf-8')
        l= align (len (str) + 1, 4)
        
        fmt= "I%ds" % (l)
        out.write (struct.pack (fmt, l / 4, str))    

def encode (inpath):
    docstr= StringIO.StringIO()
    list= write_doc (docstr, inpath)
    docstr= docstr.getvalue()
    
    liststr= StringIO.StringIO()
    write_list (liststr, list)
    liststr= liststr.getvalue()
    
    #header
    out= StringIO.StringIO()
    out.write (struct.pack ('4sI4sI', 'hdr ', 2, 'sax ', 0))
    
    #list
    out.write (struct.pack ('4sI', 'list', len (liststr) / 4))
    out.write (liststr)
        
    #doc
    out.write (struct.pack ('4sI', 'doc ', len (docstr) / 4))
    out.write (docstr)
    
    return out.getvalue()
    

if __name__ == '__main__':
    inpath= os.getenv ('INPUT_FILE_PATH')
    outstr= encode (inpath)
    
    outpath= os.path.join (os.getenv ('DERIVED_FILES_DIR'), os.getenv ('INPUT_FILE_REGION_PATH_COMPONENT') + os.getenv ('INPUT_FILE_NAME'))
    out= open (outpath, 'wb')
    out.write (outstr)
    out.close()
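
The script above targets Python 2 (it relies on the StringIO module and on str being a byte string). For anyone on Python 3, here is a condensed sketch of the same encoding. It is an adaptation, not the original: the names BinarySaxHandler and encode are mine, byte order is fixed to little-endian instead of native, duplicate strings are tracked with a dict, and it takes the XML as bytes rather than reading Xcode build variables.

```python
import io
import re
import struct
import xml.sax
from xml.sax.handler import ContentHandler

WHITESPACE = re.compile(r"\s+")

class BinarySaxHandler(ContentHandler):
    """Writes SAX events as ('selm' | 'eelm' | 'chr ', count, string indices...)."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.strings = []      # string list in index order
        self.index = {}        # string -> index, so duplicates share an index
        self.pending = None    # buffered character chunks

    def string_index(self, text):
        if text not in self.index:
            self.index[text] = len(self.strings)
            self.strings.append(text)
        return self.index[text]

    def flush_characters(self):
        # Merge neighbouring character events into a single 'chr ' command.
        if self.pending is None:
            return
        text = WHITESPACE.sub(" ", "".join(self.pending))
        self.pending = None
        self.out.write(struct.pack("<4sII", b"chr ", 1, self.string_index(text)))

    def startElement(self, name, attrs):
        self.flush_characters()
        items = attrs.items()
        self.out.write(struct.pack("<4sII", b"selm", 1 + 2 * len(items),
                                   self.string_index(name)))
        for key, value in items:
            self.out.write(struct.pack("<II", self.string_index(key),
                                       self.string_index(value)))

    def endElement(self, name):
        self.flush_characters()
        self.out.write(struct.pack("<4sII", b"eelm", 1, self.string_index(name)))

    def characters(self, content):
        if self.pending is None:
            self.pending = []
        self.pending.append(content)

def encode(xml_bytes):
    """Return the binary file contents (header, string list, doc) for an XML document."""
    doc = io.BytesIO()
    handler = BinarySaxHandler(doc)
    xml.sax.parseString(xml_bytes, handler)
    doc_bytes = doc.getvalue()

    string_list = io.BytesIO()
    for text in handler.strings:
        raw = text.encode("utf-8") + b"\0"
        padded = (len(raw) + 3) & ~3   # 32-bit alignment; 's' pads with zero bytes
        string_list.write(struct.pack("<I%ds" % padded, padded // 4, raw))
    list_bytes = string_list.getvalue()

    return b"".join([
        struct.pack("<4sI4sI", b"hdr ", 2, b"sax ", 0),
        struct.pack("<4sI", b"list", len(list_bytes) // 4), list_bytes,
        struct.pack("<4sI", b"doc ", len(doc_bytes) // 4), doc_bytes,
    ])
```

Calling encode(b'<a k="v">hi</a>') returns the complete file contents as a bytes object, which can then be written to disk in binary mode.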

Member
Posts: 144
Joined: 2009.11
Post: #7
@mariocaprino Python isn't going to be much help on the iPhone.

NSXMLParser leaks (personal experience circa iPhone OS 2, could have been fixed in a more recent version) but it does use libxml2 under the hood.

If you're not doing a whole lot of XML parsing, use NSXMLParser. If it's something that you plan on doing a whole lot, use libxml2 and be careful to eliminate all memory leaks.

Everyone's favourite forum lurker!
https://github.com/NSError
Member
Posts: 23
Joined: 2010.08
Post: #8
Hi cmiller,

The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.

The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)

Moderator
Posts: 385
Joined: 2002.08
Post: #9
Another Objective-C option, though I don't know how recently it's been updated:
http://code.google.com/p/touchcode/wiki/TouchXML

KB Productions, Car Care for iPhone/iPod Touch
@karlbecker_com
All too often, art is simply the loss of practicality.
Member
Posts: 65
Joined: 2009.03
Post: #10
Another good XML parser for the iPhone is TBXML. If you only want to parse XML and not create new DOMs or write XML out, this is a very quick parser with a really simple API.

http://www.tbxml.co.uk

iPhone Game Development Blog - 71Squared
Apprentice
Posts: 6
Joined: 2010.12
Post: #11
There are several XML parsers that can be used on the iPhone, such as NSXMLParser, TouchXML and libxml2. I recommend NSXMLParser because it is simple and written in Objective-C.
Member
Posts: 117
Joined: 2010.09
Post: #12
(Aug 5, 2010 03:23 PM)mariocaprino Wrote:  Hi cmiller,

The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.

The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)

Hey there! :-)
I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

Cheers

Mark
Member
Posts: 23
Joined: 2010.08
Post: #13
(Jan 25, 2011 06:59 AM)markhula Wrote:  I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

I'll gladly post the parser I currently use for parsing output from my Python encoder. Sadly, for people who might want to use the code, it requires the Apache Portable Runtime (APR). You might therefore want to check out its XML documentation to see how I store the DOM document in memory. APR also uses memory pools - which make memory deallocation for large DOM structures super easy - but they might not fit your project's memory model.

If you want to read more about Apache Portable Runtime I can recommend the following tutorial.

Code:
const apr_xml_doc* xml_parsebinary (const fsfile_t* self, apr_pool_t* pool)
{
    UINT len;    
    const UINT* ints= fsfile_read (self, &len, pool); //read the whole contents of a binary file
    //because the file is encoded as a list of 32bit values (even the strings fit this model) it can be treated as an array of ints
    
    //header
    LOG_ASSERT (xml_istag (ints)); //check file header
    ints+= 2 + ints[1]; //skip the block tag and block length
    
    
    //list
    xml_context_t ctx;
    LOG_ASSERT (ints[0] == 'tsil'); //check block id
    xml_init (&ctx, ints + 2, ints[1], pool); //initialise string table
    ints+= 2 + ints[1]; //skip...
    
    //doc
    LOG_ASSERT (ints[0] == ' cod'); //check...
    UINT length= ints[1];
    ints+= 2;
    
    apr_xml_doc* doc= apr_pcalloc (pool, sizeof (apr_xml_doc));
    while (0 < length)
    {
        switch (ints[0])
        {
            case 'mles': xml_startelem (&ctx, ints + 2, ints[1]);    break;
            case 'mlee': doc->root= xml_endelem (&ctx, ints + 2, ints[1]);    break;
            case ' rhc': xml_characters (&ctx, ints + 2, ints[1]);    break;
            default:
                LOG_BADCASE (ints[0]);
                break;
        }
        
        UINT skip= 2 + ints[1];
        ints+= skip;
        length-= skip;
    }
    
    apr_pool_destroy (ctx.temp);
    return doc;
}

So that's the basic idea of a SAX parser for the in-memory structure. You would replace the functions xml_startelem, xml_endelem and xml_characters with your own SAX callback functions.
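
To illustrate the callback idea without APR, here is a small Python 3 sketch of the same switch statement. It assumes the string list and the commands have already been decoded into a list of strings and (identifier, parameter indices) pairs. The dispatch function and the callback names are hypothetical.

```python
def dispatch(commands, strings, on_start, on_end, on_chars):
    """Call the matching SAX-style callback for each decoded command."""
    for tag, params in commands:
        # Parameters are indices into the string list.
        values = [strings[i] for i in params]
        if tag == b"selm":
            # First value is the element name, the rest are key/value pairs.
            on_start(values[0], dict(zip(values[1::2], values[2::2])))
        elif tag == b"eelm":
            on_end(values[0])
        elif tag == b"chr ":
            on_chars(values[0])
        else:
            raise ValueError("unknown command: %r" % (tag,))
```

Swapping in different callbacks is all it takes to go from a plain event logger to a tree builder.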

If you are interested in a DOM parser - keep on reading.
Sadly it's not a drop-in solution because of its dependence on APR - but hopefully it can help you develop your own parser or give you some new ideas about how to proceed.

Here are the details...
Code:
static BOOL xml_istag (const UINT* bytes)
{
    const UINT tag[]= {' rdh', 2, ' xas', 0};
    return memcmp (bytes, tag, sizeof (tag)) == 0;
}

static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    
    return count;
}

typedef struct
{
    apr_pool_t* pool, *temp;
    apr_xml_elem* elem;
    apr_array_header_t* strings;
} xml_context_t;

static void xml_init (xml_context_t* ctx, const UINT* ints, UINT length, apr_pool_t* pool)
{    
    ctx->pool= pool;
    ctx->elem= NULL;
    
    UINT nelts= xml_countlist (ints, length);
    POOL_CREATETEMP (&ctx->temp);
    ctx->strings= apr_array_make (ctx->temp, nelts, sizeof (char*));
    for (UINT i= 0; i < nelts; i++)
    {
        const char* str= (const char*) (ints + 1);
        APR_ARRAY_PUSH (ctx->strings, const char*)= str;
        ints+= 1 + ints[0];
    }
}

static const char* xml_getstring (const xml_context_t* ctx, UINT i)
{
    const apr_array_header_t* arr= ctx->strings;
    
    LOG_ASSERT (i < arr->nelts);
    const char** base= (const char**) arr->elts;
    return base[i];
}

#define xml_topelem(ctx)    ((ctx)->elem)

static apr_xml_elem* xml_pushelem (xml_context_t* ctx)
{
    apr_xml_elem* parent= xml_topelem (ctx);
    apr_xml_elem* elem= apr_pcalloc (ctx->pool, sizeof (apr_xml_elem));
    ctx->elem= elem;
    
    if (parent == NULL)
        return elem;
    
    elem->parent= parent;
    if (parent->first_child == NULL)
    {
        LOG_ASSERT (parent->last_child == NULL);
        parent->first_child= elem;
    }
    else
    {
        LOG_ASSERT (parent->last_child != NULL);
        parent->last_child->next= elem;
    }
    
    parent->last_child= elem;
    return elem;
}

static void xml_popelem (xml_context_t* ctx)
{
    LOG_ASSERT (ctx->elem != NULL);
    ctx->elem= ctx->elem->parent;
}

//---

static void xml_startelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_pushelem (ctx);
    elem->name= xml_getstring (ctx, ints[0]);
    
    apr_xml_attr** next= &elem->attr;
    for (UINT i= 1; i < length; i+= 2)
    {
        *next= apr_pcalloc (ctx->pool, sizeof (apr_xml_attr));
        (*next)->name= xml_getstring (ctx, ints[i]);
        (*next)->value= xml_getstring (ctx, ints[i + 1]);
        next= &(*next)->next;
    }
}

static apr_xml_elem* xml_endelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    LOG_ASSERT (length == 1);
    const char* name= xml_getstring (ctx, ints[0]);
    apr_xml_elem* elem= xml_topelem (ctx);
    LOG_ASSERT (strcmp (elem->name, name) == 0);
    xml_popelem (ctx);
    return elem;
}

static void xml_characters (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_topelem (ctx);
    apr_text_header* hdr= (elem->last_child == NULL) ? &elem->first_cdata : &elem->last_child->following_cdata;
    
    for (UINT i= 0; i < length; i++)
    {
        const char* str= xml_getstring (ctx, ints[i]);
        apr_text_append (ctx->pool, hdr, str);
    }
}
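
If you don't want the APR dependency, the push/pop logic above is easy to reproduce. Here is a simplified Python 3 sketch with hypothetical Element and TreeBuilder classes. Unlike apr_xml_elem, it stores all text on the enclosing element rather than distinguishing first_cdata from following_cdata.

```python
class Element:
    """A minimal DOM node: name, attributes, children and text chunks."""

    def __init__(self, name, attrs, parent=None):
        self.name = name
        self.attrs = attrs
        self.parent = parent
        self.children = []
        self.text = []

class TreeBuilder:
    """Builds a tree from start/end/characters events using a current-element pointer."""

    def __init__(self):
        self.current = None   # innermost open element (like ctx->elem)
        self.root = None      # last closed element; the root once parsing is done

    def start(self, name, attrs):
        elem = Element(name, attrs, self.current)
        if self.current is not None:
            self.current.children.append(elem)
        self.current = elem

    def end(self, name):
        if self.current is None or self.current.name != name:
            raise ValueError("unbalanced end of element: %r" % (name,))
        self.root = self.current
        self.current = self.current.parent

    def characters(self, text):
        if self.current is not None:
            self.current.text.append(text)
```

As in the C version, every end event overwrites root, so after the last one it points at the document's root element.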

Member
Posts: 166
Joined: 2009.04
Post: #14
(Jan 25, 2011 06:59 AM)markhula Wrote:  I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

Try this.

http://codesuppository.blogspot.com/2009...tream.html

If you don't know who the author is ... here is some info:

http://en.wikipedia.org/wiki/John_W._Ratcliff