iDevGames Forums
Parsing XML files, on iPhone.. - Printable Version




Parsing XML files, on iPhone.. - Jamie W - Aug 3, 2010 04:25 AM

Hello All,

I'm doing a little iPhone project, and it uses a few XML files. Does anyone know if there's an official 'Apple way' to parse XML files, or should I use some open source library? Any suggestions? (P.S. I am a bit of an XML newbie.)

Thanks,


RE: Parsing XML files, on iPhone.. - mariocaprino - Aug 3, 2010 05:49 AM

Hi Jamie,

I use libxml2 in my own iPhone projects. You can find a description of setting up libxml in Xcode at Jeff LaMarche's site.

If you're coding in Objective-C you may prefer to use NSXMLParser instead.

Good luck with your project.


RE: Parsing XML files, on iPhone.. - Jamie W - Aug 4, 2010 04:06 PM

Damn it! I mean, thank you for your reply. I just posted a response, thanking you, and explaining I found a solution, and was posting it here, when the forum software claimed it believed I was posting spam! Maybe the forum admin should be aware that the software is flagging legit helpful posts as spam?...


RE: Parsing XML files, on iPhone.. - Oddity007 - Aug 4, 2010 04:51 PM

Haha, this happened to me as well... Fortunately Carlos and the others responded, scared me for a bit.


RE: Parsing XML files, on iPhone.. - Jamie W - Aug 5, 2010 06:26 AM

Hehehe. Well, for anyone else coming across this thread: as well as what mariocaprino suggested, you might want to try googling 'RapidXML' (scared to post links now!). ;)


RE: Parsing XML files, on iPhone.. - mariocaprino - Aug 5, 2010 08:52 AM

If you need a quick and dirty way to read XML files in your game - you can use the following Python script that parses an XML file and outputs a simple binary format that mirrors the SAX protocol.

The binary format is divided into three parts: a header, a string list, and the XML structure (SAX command protocol).
  1. The header is just a magic token at the start of the file
  2. The string list contains all strings in the original XML document in UTF-8 encoding, each with a terminating zero. Each string is given an index that is used to refer to the string in the later XML structure section. Duplicate strings share the same index. Each string is padded to be 32-bit aligned.
  3. The XML structure is described using SAX commands.

All values are 32-bit aligned so you can easily traverse the file in memory using a UINT pointer.

Header
The header contains the following four 32-bit values: ('hdr ', 2, 'sax ', 0)
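
As a rough illustration (not part of the original post), the header can be built and checked with Python's struct module. This is a Python 3 sketch and assumes little-endian byte order, which matches what the native-order script produces on x86 and on the iPhone's ARM. The names HEADER_FORMAT, pack_header and is_valid_header are made up for the example.

```python
import struct

# Assumption: little-endian layout; the original script packs in native order,
# which gives the same bytes on x86 and ARM.
HEADER_FORMAT = "<4sI4sI"

def pack_header():
    """Build the 16-byte file header: ('hdr ', 2, 'sax ', 0)."""
    return struct.pack(HEADER_FORMAT, b"hdr ", 2, b"sax ", 0)

def is_valid_header(data):
    """Return True if data starts with the expected header."""
    size = struct.calcsize(HEADER_FORMAT)
    return len(data) >= size and data[:size] == pack_header()

header = pack_header()
print(len(header))              # 16
print(is_valid_header(header))  # True
```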

String list
The string list starts with the following header ('list', sizeof(list) / 4). Each string has a length prefix. Be aware the length is described in number of 32-bit values, not bytes! You can for example use the following code to count the number of strings in the list:

Code:
static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    
    return count;
}
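
For anyone who wants to poke at the string list from the desktop side, here is a minimal Python 3 sketch of how one entry is laid out and how the same counting walk looks in Python. It assumes little-endian data; pack_string and count_strings are hypothetical helper names, not functions from the encoder script.

```python
import struct

def pack_string(text):
    """Encode one entry: length in 32-bit words, then zero-padded UTF-8 bytes."""
    raw = text.encode("utf-8")
    padded = (len(raw) + 1 + 3) & ~3   # room for the terminating zero, rounded up to 4
    return struct.pack("<I%ds" % padded, padded // 4, raw)

def count_strings(block):
    """Count entries in a packed string list (same walk as xml_countlist)."""
    words = struct.unpack("<%dI" % (len(block) // 4), block)
    count, pos = 0, 0
    while pos < len(words):
        pos += 1 + words[pos]          # skip the length word plus the string's words
        count += 1
    return count

# "font" is 4 bytes + 1 terminator = 5 bytes, padded to 8 bytes = 2 words
entry = pack_string("font")
print(len(entry))                                # 12 (length word + 8 bytes)
print(count_strings(entry + pack_string("a")))   # 2
```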

XML structure
The XML structure starts with the following header ('doc ', sizeof(doc) / 4). The content is a list of commands and parameters simulating the SAX protocol. Each command is described by an identifier and the length of its parameters. The parameters are a list of string indices from the string list. You'll notice that the file format only uses 32-bit unsigned integers to describe the whole XML structure.

Here are some examples of SAX commands:
  • startElement ('selm'): The parameters are the element name and pairs of key/value indices. The length is (1 + numattributes * 2).
  • endElement ('eelm'): The parameter is the element name. The length will always be 1.
  • characters ('chr '): The parameter is the character string. The length will always be 1 because I merge all neighbouring character commands into one during encoding.
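
To make the command layout concrete, here is a small Python 3 sketch that walks a command stream and yields each command with its parameters. It is only an illustration under the assumption of little-endian data; iter_commands is a made-up name and the sample stream is built by hand rather than produced by the encoder.

```python
import struct

def iter_commands(doc):
    """Yield (tag, params) for each command in the body of a 'doc' block."""
    pos = 0
    while pos < len(doc):
        tag, length = struct.unpack_from("<4sI", doc, pos)
        params = struct.unpack_from("<%dI" % length, doc, pos + 8)
        yield tag, params
        pos += 8 + 4 * length          # identifier + length word + parameters

# Hand-built stream for <a k="v">text</a>, with string indices
# 0='a', 1='k', 2='v', 3='text'
doc = (struct.pack("<4sI3I", b"selm", 3, 0, 1, 2)
       + struct.pack("<4sII", b"chr ", 1, 3)
       + struct.pack("<4sII", b"eelm", 1, 0))

for tag, params in iter_commands(doc):
    print(tag, params)
```

The startElement command carries 3 parameters here: one element name plus one key/value pair, matching the (1 + numattributes * 2) rule.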

Limitations
Compared to other binary formats this solution has the following limitations:
  • The whole file needs to be read into memory (because of the string list)
  • The file size is usually close to that of the original (because all values are 32-bit aligned)
  • The format simulates the SAX protocol and therefore also has the same limitations

Python script
You're free to use the following code in your own projects - I hope you find it useful.
Code:
#!/usr/bin/env python

import os
import re
import xml.sax
import struct
import StringIO

class SaxHandler (xml.sax.handler.ContentHandler):
    def __init__ (self, file):
        self.out= file
        self.charlist= None
        self.stringlist= [];
        self.re= re.compile (r"\s+", re.UNICODE | re.LOCALE)
        
    
    def stringIndex (self, str):
        try:
            return self.stringlist.index (str)    
        except ValueError:
            self.stringlist.append (str)
            return len (self.stringlist) - 1
    
    def stringList (self):
        return self.stringlist
    
    def flushCharacters (self):
        if self.charlist is None:
            return

        #command
        str= struct.pack ('4sI', 'chr ', 1)
        self.out.write (str)

        #characters
        str= ''.join (self.charlist)
        self.charlist= None
        
        str= self.re.sub (' ', str)
        str= struct.pack ('I', self.stringIndex (str))
        self.out.write (str)        
    
    #---
        
    def startElement (self, name, attrs):
        self.flushCharacters()
        attrs= attrs.items()
        
        #command
        self.out.write (struct.pack ('4sI', 'selm', 1 + len (attrs) * 2))
        
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))
        
        #attrs
        for (key, value) in attrs:
            k= self.stringIndex (key)
            v= self.stringIndex (value)
            self.out.write (struct.pack ('II', k, v))
            
    def endElement (self, name):
        self.flushCharacters()
        
        #command
        self.out.write (struct.pack ('4sI', 'eelm', 1))
        
        #name
        n= self.stringIndex (name)
        self.out.write (struct.pack ('I', n))
        
    def characters (self, ch):
        if self.charlist is None:
            self.charlist= []
        self.charlist.append (ch)
        

#---
def write_doc (out, filename):
    handler= SaxHandler (out)
    xml.sax.parse (filename, handler)
    return handler.stringList()

def align (size, boundary):
    return ((size) + ((boundary) - 1)) & ~((boundary) - 1)
    
def write_list (out, list):    
    for str in list:
        str= str.encode ('utf-8')
        l= align (len (str) + 1, 4)
        
        fmt= "I%ds" % (l)
        out.write (struct.pack (fmt, l / 4, str))    

def encode (inpath):
    docstr= StringIO.StringIO()
    list= write_doc (docstr, inpath)
    docstr= docstr.getvalue()
    
    liststr= StringIO.StringIO()
    write_list (liststr, list)
    liststr= liststr.getvalue()
    
    #header
    out= StringIO.StringIO()
    out.write (struct.pack ('4sI4sI', 'hdr ', 2, 'sax ', 0))
    
    #list
    out.write (struct.pack ('4sI', 'list', len (liststr) / 4))
    out.write (liststr)
        
    #doc
    out.write (struct.pack ('4sI', 'doc ', len (docstr) / 4))
    out.write (docstr)
    
    return out.getvalue()
    

if __name__ == '__main__':
    inpath= os.getenv ('INPUT_FILE_PATH')
    outstr= encode (inpath)
    
    outpath= os.path.join (os.getenv ('DERIVED_FILES_DIR'), os.getenv ('INPUT_FILE_REGION_PATH_COMPONENT') + os.getenv ('INPUT_FILE_NAME'))
    out= open (outpath, 'wb')
    out.write (outstr) #write the encoded data, not the output path
    out.close()



RE: Parsing XML files, on iPhone.. - cmiller - Aug 5, 2010 02:40 PM

@mariocaprino Python isn't going to be much help on the iPhone.

NSXMLParser leaks (personal experience circa iPhone OS 2, could have been fixed in a more recent version) but it does use libxml2 under the hood.

If you're not doing a whole lot of XML parsing, use NSXMLParser. If it's something that you plan on doing a whole lot, use libxml2 and be careful to eliminate all memory leaks.


RE: Parsing XML files, on iPhone.. - mariocaprino - Aug 5, 2010 03:23 PM

Hi cmiller,

The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.

The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)
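
In the meantime, a quick way to sanity-check the encoder output on the development machine is a few lines of Python. This is a Python 3 sketch (the encoder above is Python 2), it assumes little-endian data, and read_binary_xml is a hypothetical name. It only splits the file into the string table and the raw command block.

```python
import struct

def read_binary_xml(data):
    """Split an encoded file into (strings, doc_bytes); raise ValueError if malformed."""
    if data[:16] != struct.pack("<4sI4sI", b"hdr ", 2, b"sax ", 0):
        raise ValueError("bad header")
    tag, words = struct.unpack_from("<4sI", data, 16)
    if tag != b"list":
        raise ValueError("missing string list")
    pos, end = 24, 24 + 4 * words
    strings = []
    while pos < end:
        (n,) = struct.unpack_from("<I", data, pos)   # string length in 32-bit words
        raw = data[pos + 4 : pos + 4 + 4 * n]
        strings.append(raw.split(b"\0", 1)[0].decode("utf-8"))
        pos += 4 + 4 * n
    tag, words = struct.unpack_from("<4sI", data, pos)
    if tag != b"doc ":
        raise ValueError("missing doc block")
    return strings, data[pos + 8 : pos + 8 + 4 * words]

# Hand-built file with a single string 'a' and a single endElement command
sample = (struct.pack("<4sI4sI", b"hdr ", 2, b"sax ", 0)
          + struct.pack("<4sI", b"list", 2) + struct.pack("<I4s", 1, b"a")
          + struct.pack("<4sI", b"doc ", 3) + struct.pack("<4sII", b"eelm", 1, 0))
strings, doc = read_binary_xml(sample)
print(strings)   # ['a']
```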


RE: Parsing XML files, on iPhone.. - funkboy - Aug 6, 2010 05:42 AM

Another Objective-C option, though I don't know how recently it's been updated:
http://code.google.com/p/touchcode/wiki/TouchXML


RE: Parsing XML files, on iPhone.. - MikeD - Aug 10, 2010 05:56 AM

Another good XML parser for the iPhone is TBXML. If you only want to parse XML and not create new DOMs or write XML out, this is a very quick parser with a really simple API.

http://www.tbxml.co.uk


RE: Parsing XML files, on iPhone.. - JohnEdward - Jan 25, 2011 02:45 AM

There are different XML parsers that can be used on the iPhone, such as NSXMLParser, TouchXML and libxml2. I recommend NSXMLParser because it is simple, easy to use and written in Objective-C.


RE: Parsing XML files, on iPhone.. - markhula - Jan 25, 2011 06:59 AM

(Aug 5, 2010 03:23 PM)mariocaprino Wrote:  Hi cmiller,

The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.

The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)

Hey there! :-)
I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

Cheers

Mark


RE: Parsing XML files, on iPhone.. - mariocaprino - Jan 26, 2011 01:00 AM

(Jan 25, 2011 06:59 AM)markhula Wrote:  I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

I'll gladly post the parser I currently use for parsing output from my Python encoder. Sadly, for people who might want to use the code, it requires the Apache Portable Runtime. You might therefore want to check out their XML documentation to see how I store the DOM document in memory. APR also uses memory pools - which make memory deallocation for large DOM structures super easy - but they might not fit your project's memory model.

If you want to read more about Apache Portable Runtime I can recommend the following tutorial.

Code:
const apr_xml_doc* xml_parsebinary (const fsfile_t* self, apr_pool_t* pool)
{
    UINT len;    
    const UINT* ints= fsfile_read (self, &len, pool); //read the whole contents of a binary file
    //because the file is encoded as a list of 32bit values (even the strings fit this model) it can be treated as an array of ints
    
    //header
    LOG_ASSERT (xml_istag (ints)); //check file header
    ints+= 2 + ints[1]; //skip the block tag and block length
    
    
    //list
    xml_context_t ctx;
    LOG_ASSERT (ints[0] == 'tsil'); //check block id
    xml_init (&ctx, ints + 2, ints[1], pool); //initialise string table
    ints+= 2 + ints[1]; //skip...
    
    //doc
    LOG_ASSERT (ints[0] == ' cod'); //check...
    UINT length= ints[1];
    ints+= 2;
    
    apr_xml_doc* doc= apr_pcalloc (pool, sizeof (apr_xml_doc));
    while (0 < length)
    {
        switch (ints[0])
        {
            case 'mles': xml_startelem (&ctx, ints + 2, ints[1]);    break;
            case 'mlee': doc->root= xml_endelem (&ctx, ints + 2, ints[1]);    break;
            case ' rhc': xml_characters (&ctx, ints + 2, ints[1]);    break;
            default:
                LOG_BADCASE (ints[0]);
                break;
        }
        
        UINT skip= 2 + ints[1];
        ints+= skip;
        length-= skip;
    }
    
    apr_pool_destroy (ctx.temp);
    return doc;
}

So that's the basic idea of a SAX parser for the in-memory structure. You would replace the functions xml_startelem, xml_endelem and xml_characters for your SAX callback functions.
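
The same dispatch idea, sketched in Python 3 for trying things out on the desktop (this is not a port of the C code above). It assumes little-endian data and that the string table has already been read into a list; replay, Recorder and the handler method names are invented for the example.

```python
import struct

def replay(doc, strings, handler):
    """Walk the command stream and call the handler's methods, SAX style."""
    pos = 0
    while pos < len(doc):
        tag, length = struct.unpack_from("<4sI", doc, pos)
        params = struct.unpack_from("<%dI" % length, doc, pos + 8)
        if tag == b"selm":
            # first parameter is the name, the rest are key/value index pairs
            attrs = {strings[params[i]]: strings[params[i + 1]]
                     for i in range(1, length, 2)}
            handler.start_element(strings[params[0]], attrs)
        elif tag == b"eelm":
            handler.end_element(strings[params[0]])
        elif tag == b"chr ":
            handler.characters(strings[params[0]])
        else:
            raise ValueError("unknown command %r" % tag)
        pos += 8 + 4 * length

class Recorder:
    """Hypothetical handler that just records the events it receives."""
    def __init__(self):
        self.events = []
    def start_element(self, name, attrs):
        self.events.append(("start", name, attrs))
    def end_element(self, name):
        self.events.append(("end", name))
    def characters(self, text):
        self.events.append(("chars", text))

# Hand-built stream for <a k="v">text</a>
strings = ["a", "k", "v", "text"]
doc = (struct.pack("<4sI3I", b"selm", 3, 0, 1, 2)
       + struct.pack("<4sII", b"chr ", 1, 3)
       + struct.pack("<4sII", b"eelm", 1, 0))
recorder = Recorder()
replay(doc, strings, recorder)
print(recorder.events)
```

Swapping Recorder for your own class is the Python equivalent of replacing the three xml_* functions.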

If you are interested in a DOM parser - keep on reading.
Sadly it's not a drop-in solution because of its dependence on APR - but hopefully it can help you develop your own parser or give you some new ideas about how to proceed.

Here are the details...
Code:
static BOOL xml_istag (const UINT* bytes)
{
    const UINT tag[]= {' rdh', 2, ' xas', 0};
    return memcmp (bytes, tag, sizeof (tag)) == 0;
}

static UINT xml_countlist (const UINT* list, UINT length)
{
    UINT count= 0;
    for (; 0 < length; count++)
    {
        UINT skip= 1 + list[0];
        list+= skip;
        length-= skip;
    }
    
    return count;
}

typedef struct
{
    apr_pool_t* pool, *temp;
    apr_xml_elem* elem;
    apr_array_header_t* strings;
} xml_context_t;

static void xml_init (xml_context_t* ctx, const UINT* ints, UINT length, apr_pool_t* pool)
{    
    ctx->pool= pool;
    ctx->elem= NULL;
    
    UINT nelts= xml_countlist (ints, length);
    POOL_CREATETEMP (&ctx->temp);
    ctx->strings= apr_array_make (ctx->temp, nelts, sizeof (char*));
    for (UINT i= 0; i < nelts; i++)
    {
        const char* str= (const char*) (ints + 1);
        APR_ARRAY_PUSH (ctx->strings, const char*)= str;
        ints+= 1 + ints[0];
    }
}

static const char* xml_getstring (const xml_context_t* ctx, UINT i)
{
    const apr_array_header_t* arr= ctx->strings;
    
    LOG_ASSERT (i < arr->nelts);
    const char** base= (const char**) arr->elts;
    return base[i];
}

#define xml_topelem(ctx)    ((ctx)->elem)

static apr_xml_elem* xml_pushelem (xml_context_t* ctx)
{
    apr_xml_elem* parent= xml_topelem (ctx);
    apr_xml_elem* elem= apr_pcalloc (ctx->pool, sizeof (apr_xml_elem));
    ctx->elem= elem;
    
    if (parent == NULL)
        return elem;
    
    elem->parent= parent;
    if (parent->first_child == NULL)
    {
        LOG_ASSERT (parent->last_child == NULL);
        parent->first_child= elem;
    }
    else
    {
        LOG_ASSERT (parent->last_child != NULL);
        parent->last_child->next= elem;
    }
    
    parent->last_child= elem;
    return elem;
}

static void xml_popelem (xml_context_t* ctx)
{
    LOG_ASSERT (ctx->elem != NULL);
    ctx->elem= ctx->elem->parent;
}

//---

static void xml_startelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_pushelem (ctx);
    elem->name= xml_getstring (ctx, ints[0]);
    
    apr_xml_attr** next= &elem->attr;
    for (UINT i= 1; i < length; i+= 2)
    {
        *next= apr_pcalloc (ctx->pool, sizeof (apr_xml_attr));
        (*next)->name= xml_getstring (ctx, ints[i]);
        (*next)->value= xml_getstring (ctx, ints[i + 1]);
        next= &(*next)->next;
    }
}

static apr_xml_elem* xml_endelem (xml_context_t* ctx, const UINT* ints, UINT length)
{
    LOG_ASSERT (length == 1);
    const char* name= xml_getstring (ctx, ints[0]);
    apr_xml_elem* elem= xml_topelem (ctx);
    LOG_ASSERT (strcmp (elem->name, name) == 0);
    xml_popelem (ctx);
    return elem;
}

static void xml_characters (xml_context_t* ctx, const UINT* ints, UINT length)
{
    apr_xml_elem* elem= xml_topelem (ctx);
    apr_text_header* hdr= (elem->last_child == NULL) ? &elem->first_cdata : &elem->last_child->following_cdata;
    
    for (UINT i= 0; i < length; i++)
    {
        const char* str= xml_getstring (ctx, ints[i]);
        apr_text_append (ctx->pool, hdr, str);
    }
}



RE: Parsing XML files, on iPhone.. - warmi - Jan 26, 2011 03:12 PM

(Jan 25, 2011 06:59 AM)markhula Wrote:  
(Aug 5, 2010 03:23 PM)mariocaprino Wrote:  Hi cmiller,

The Python script is just a binary XML-encoder to be run on your development environment. The binary output format is meant to be simple and fast to parse on your target platform. I can upload a sample decoder/parser in C for use on the iPhone if there is interest.

The reason I wrote the binary XML solution is that I experienced slow performance with libxml when parsing AngelCode's font files. Maybe somebody finds it useful - instead of inventing Yet Another Binary XML Format :-)

Hey there! :-)
I'd certainly be interested in seeing the 'C' decoder/parser; if you wouldn't mind.

Cheers

Mark

Try this.

http://codesuppository.blogspot.com/2009/02/fastxml-extremely-lightweight-stream.html

If you don't know who the author is ... here is some info:

http://en.wikipedia.org/wiki/John_W._Ratcliff