File parsing... the best way...

Member
Posts: 277
Joined: 2004.10
Post: #1
Very hard question...

What is the best way to handle file parsing?


See, I have a map that looks like this:
Code:
world
name whatever
size 400
end

box
name box1
size 1.0 1.0 1.0
position 1.0 1.0 1.0
rotation 1.0 1.0 1.0
end

pyramid
name box1
size 1.0 1.0 1.0
position 1.0 1.0 1.0
rotation 1.0 1.0 1.0
end

Now, I've looked at tokenizing in Cocoa and in C++, and at NSScanner.

They all look horribly ugly, and I even had a short conversation about this on IRC... but I don't want to get flamed when I write my own handlers.
"when I have plenty of code that I just need to port"
Member
Posts: 321
Joined: 2004.10
Post: #2
This is certainly not a token system, but it might give
you some ideas.

If it is too simple or not what you're looking for, then my apologies.

From your data example, it looks like the data is fairly regular and straightforward. Just as a rough approximation, maybe something like:

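// stick this in the while() down below.
inFile >> tag
       >> obj.size.x
       >> obj.size.y
       >> obj.size.z;
if (tag != "size")
    ErrorAndLogServices::infoOrError( "token \"size\" expected",
                                      __FILE__, __FUNCTION__, __LINE__, ABORT );

inFile >> tag
       >> obj.position.x
       >> obj.position.y
       >> obj.position.z;
if (tag != "position")
    ErrorAndLogServices::infoOrError( "token \"position\" expected",
                                      __FILE__, __FUNCTION__, __LINE__, ABORT );

// etc. etc.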

You'll need some logic for the header data, and you'll need to put
// at the beginning of each blank line in your data file.

I particularly like the // comment feature because you can
comment the data if you want.


Code:
void Guns::enterGunIntoGunWarehouse( Guns &gunEntry )
{
   string path = GlobalParameterServices::getDataPath();
  
   string imageFile = "";   // Targa, PNG
   string comment = "//";
   string token = "";
   string restOfLine = "";
      
  
   string fullyQualifiedFile = path + gunEntry.name + ".txt";
  
  
   ErrorAndLogServices::infoOrError( "Attempting to read file: " + fullyQualifiedFile,
                                     __FILE__, __FUNCTION__, __LINE__, CONTINUE );  
      
   ifstream inFile( fullyQualifiedFile.c_str(), ios::in );
      
   if (!inFile)
   {
      ErrorAndLogServices::infoOrError( "Unable to open or read file: " + fullyQualifiedFile,
                                        __FILE__, __FUNCTION__, __LINE__, ABORT );  
   }
  
   while ( inFile >> token )
   {
      if ( token == comment )
      {
          getline(inFile, restOfLine); // read rest of comment line
      }
      else
      {
         // first token is an image file specification
      
         imageFile = path + token + ".png";

         TextureServices::loadImageDataFromFileAndBindToNewTexture( PNG , imageFile, gunEntry.texture,
                                                                    gunEntry.pixelWidth, gunEntry.pixelHeight );
                            
         inFile >> gunEntry.lengthFeet
                >> gunEntry.trunnionOverhang
                >> gunEntry.maxRange
                >> gunEntry.shellWeight
                >> gunEntry.timeToLoad
                >> gunEntry.trainingRate;        
      }      
   }        
    
}

The data file looks something like this:


Code:
// 16 inch/50  Mark 7 main guns
//
//   Note: pixel dimensions are now found from the image file
//
//   png file        length (feet)   trunnionOverhang   maxRange   shellWeight   timeToLoad   trainingRate
//   ---------------------------------------------------------------------------------------------------
16in-50Mk7           32.0            149.0              0.0        0.0           0.0          1.0
5in-38Mk30or48       24.23           140.0              0.0        0.0           1.0          0.0
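
Applied to the box block from the first post, the same tag-checking idea might look like the sketch below. The names Vec3 and readVec3 are made up for illustration, and the sketch throws an exception where the code above would call the error service. Like the approach above, it assumes the fields always arrive in a fixed order.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical three-component value such as size, position or rotation.
struct Vec3 { double x = 0, y = 0, z = 0; };

// Reads "<tag> x y z" from the stream and checks that the tag is the expected one.
Vec3 readVec3(std::istream& in, const std::string& expectedTag)
{
    std::string tag;
    Vec3 v;
    if (!(in >> tag >> v.x >> v.y >> v.z))
        throw std::runtime_error("could not read values for \"" + expectedTag + "\"");
    if (tag != expectedTag)
        throw std::runtime_error("token \"" + expectedTag + "\" expected, got \"" + tag + "\"");
    return v;
}
```

Calling readVec3(inFile, "size") and then readVec3(inFile, "position") reads the two lines in order and fails loudly as soon as the file deviates from that order.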
Member
Posts: 277
Joined: 2004.10
Post: #3
Here's another example showing as much of the format as possible:
Code:
box
size 1.0 1 1.0005
name boxy
position 2.7 220.5 1
rot 0.1 #(comment) pos and rot can be used in place of position and rotation
end

Now, your code is extremely helpful but kinda hard to understand...
(I don't work with ifstream :-P)
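
The two rules in that example (a '#' starts a comment, and pos/rot are short forms of position/rotation) could be handled before any real parsing happens. A minimal sketch, with made-up function names:

```cpp
#include <string>

// Removes a trailing '#' comment from one line of the map file.
std::string stripComment(const std::string& line)
{
    // find() returns npos when there is no '#', and substr(0, npos) keeps the whole line
    return line.substr(0, line.find('#'));
}

// Maps the short keywords onto their long forms; other keywords pass through unchanged.
std::string canonicalKeyword(const std::string& keyword)
{
    if (keyword == "pos") return "position";
    if (keyword == "rot") return "rotation";
    return keyword;
}
```

With those two in front, the rest of the reader only ever sees comment-free lines and the long keyword names.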

Global warming is caused by hobos and mooses
Moderator
Posts: 1,140
Joined: 2005.07
Post: #4
Here's some commented C code showing how I would implement it.
Code:
//needs <stdio.h>, <string.h> and <ctype.h>
FILE *file = fopen(pathName, "r");
char buffer[BUFSIZ];
int currentChar;
//the magical struct that holds whatever you are reading in
MagicalStruct *object = newMagicalStruct();
//type of object that's currently used; the types are defined somewhere
//else in the file
int type = NO_TYPE;
//loop until the end of the file (will explicitly break out at "end")
while (file != NULL)
{
    //skip whitespace, then get the current char to check for comments
    do
        currentChar = getc(file);
    while (isspace(currentChar));
    if (currentChar == EOF)
        break;
    //see comment, so skip the rest of the line
    if (currentChar == '#')
    {
        fgets(buffer, BUFSIZ, file);
        continue;
    }
    //put the character back into the stream
    ungetc(currentChar, file);

    //get the current option (BUFSIZ is at least 256, so 255 chars always fit)
    if (fscanf(file, "%255s", buffer) != 1)
        break;
    //end of the input
    if (!strcmp(buffer, "end"))
        break;

    //read the types
    if (!strcmp(buffer, "box"))
    {
        type = BOX;
        continue;
    }
    //other types go here

    if (!strcmp(buffer, "size"))
    {
        if (type == BOX)
            fscanf(file, "%f %f %f", &object->box->length, &object->box->width,
            &object->box->height);
        //other types go here with else if statements
        continue;
    }
    if (!strcmp(buffer, "name"))
    {
        //the name that will be read in
        char *name = NULL;
        //maximum length of the string
        int maxLength = 0;
        //the length of the string read in (used to replace the newline at the end)
        size_t stringLength;
        if (type == BOX)
        {
            name = object->box->name;
            maxLength = object->box->nameLength;
        }
        //other types go here with else if statements

        //skip the blanks after the keyword, then read the rest of the line
        //for the name
        fscanf(file, "%*[ \t]");
        if (name != NULL && fgets(name, maxLength, file) != NULL)
        {
            stringLength = strlen(name);
            if (stringLength > 0 && name[stringLength - 1] == '\n')
                name[stringLength - 1] = '\0';
        }
        continue;
    }
    if (!strcmp(buffer, "position") || !strcmp(buffer, "pos"))
    {
        if (type == BOX)
            fscanf(file, "%f %f %f", &object->box->x, &object->box->y,
            &object->box->z);
        //other types go here with else if statements
        continue;
    }
    if (!strcmp(buffer, "rotation") || !strcmp(buffer, "rot"))
    {
        if (type == BOX)
            fscanf(file, "%f", &object->box->rot);
        //other types go here with else if statements
        continue;
    }
    //other commands go here
}
if (file != NULL)
    fclose(file);

Edit: I suggest you just copy and paste that into another window, since these code tags are too small to actually see anything and quote tags would destroy any indenting.
Member
Posts: 257
Joined: 2004.06
Post: #5
The way I handled it in my contest entry, Chemical Bonds, was that I used sscanf. No, really, I did. I have lines that look like:
Code:
name This is the level name.
speed 2000
goal 5

Then in my data loading code:
Code:
if ( strstr(line, "goal") == line )
{
    int goal;
    if ( sscanf(line, "%*s %d", &goal) == 1 )
    {
        fGoal = goal;
    }
}
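else if ( strstr(line, "name") == line )
{
    char* name = line + 5; // skip past "name "
    fName = name; // fName is of type std::string
    // drop the trailing newline, if the line has one
    if ( !fName.empty() && fName[fName.length() - 1] == '\n' )
        fName.erase(fName.length() - 1);
}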
else if ( strstr(line, "speed") == line )
{
    int speed;
    if ( sscanf(line, "%*s %d", &speed) == 1 )
    {
        fSpeed = speed;
    }
}

If you're just loading simple sets of data, then you probably won't need complicated parsing.
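
For reference, here is a self-contained sketch of the same sscanf idea as a single function. LevelInfo and parseLevelLine are made-up names, and the sketch swaps the strstr prefix test for strncmp, which checks only the start of the line and also requires the space after the keyword.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical container for the three settings shown above.
struct LevelInfo {
    std::string name;
    int speed = 0;
    int goal = 0;
};

// Parses one line of the form "keyword value"; returns false if nothing matched.
bool parseLevelLine(const char* line, LevelInfo& info)
{
    int value = 0;
    if (std::strncmp(line, "goal ", 5) == 0) {
        if (std::sscanf(line, "%*s %d", &value) != 1) return false;
        info.goal = value;
        return true;
    }
    if (std::strncmp(line, "speed ", 6) == 0) {
        if (std::sscanf(line, "%*s %d", &value) != 1) return false;
        info.speed = value;
        return true;
    }
    if (std::strncmp(line, "name ", 5) == 0) {
        info.name = line + 5;                        // everything after "name "
        while (!info.name.empty() &&
               (info.name.back() == '\n' || info.name.back() == '\r'))
            info.name.pop_back();                    // strip the line ending
        return true;
    }
    return false;
}
```

Feeding it each line from fgets or std::getline is enough for a flat keyword/value file; the return value tells the caller which lines were not understood.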

The brains and fingers behind Malarkey Software (plus caretaker of the world's two brattiest felines).
Luminary
Posts: 5,143
Joined: 2002.04
Post: #6
File parsing is unfortunately inherently ugly...

Edit: Outnumbered uses sscanf for all its file parsing...
Apprentice
Posts: 19
Joined: 2004.10
Post: #7
I typically use lex/yacc for file parsing. I know they are overkill, but I know that by using them I will not be constrained in the future by my choice of syntax.

First you start by defining a set of tokens in flex:

Code:
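%{
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strncpy */
#include "y.tab.h"    /* ID_ and NUMBER_, as generated by yacc -d */
char lastToken[256];  /* text of the last identifier; the size is an arbitrary choice */
%}

letter    [a-zA-Z_]
num       [0-9]
number    {num}+
id        {letter}({letter}|{num})*
comment   #.*

%%
{id}      { strncpy(lastToken, yytext, sizeof(lastToken) - 1); return ID_; }
{number}  { yylval = atoi(yytext); return NUMBER_; }

{comment} { /* Do nothing */ }
[ \t\r\n]+ { /* Skip whitespace */ }

.         { /* Return the character itself if nothing else matched */ return yytext[0]; }
%%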

And your yacc file:
Code:
%token ID_ NUMBER_

%%
lines:
  assign
  | lines assign
;

assign:
  ID_ '=' NUMBER_ ';' { /* Insert the value into the token table */ }
;
%%

(The above example is simplistic, but it gives you a flavor.)

There are a number of tutorials on flex & yacc online. (Also look for flex and bison, the freely available implementations.) The O'Reilly book is horrible, or at least the first version was; there may be a new one.
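
To give a feel for what the generated lexer does with those token rules, here is a hand-written C++ sketch of the same identifier, number, comment and single-character rules. It is only an illustration of the rules, not what flex actually emits, and the Token and tokenize names are made up.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Token kinds mirroring the flex rules: identifier, number, or a single other character.
enum class TokenKind { Id, Number, Char };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into tokens, skipping whitespace and '#' comments.
std::vector<Token> tokenize(const std::string& input)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '#') {                               // comment runs to end of line
            while (i < input.size() && input[i] != '\n') ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {            // {letter}({letter}|{num})*
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_'))
                ++i;
            tokens.push_back({TokenKind::Id, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {                 // {num}+
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
            tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
        } else {                                      // any other single character
            ++i;
            tokens.push_back({TokenKind::Char, std::string(1, static_cast<char>(c))});
        }
    }
    return tokens;
}
```

The grammar file then only has to describe how those tokens may follow one another, which is the part yacc takes over.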
Sage
Posts: 1,199
Joined: 2004.10
Post: #8
While I think the use of lex and yacc may be a little extreme, JFaller's on the money in my opinion. Any filetype which denotes meaning by order is brittle, and easy to break if you add functionality.

I'm not going to pimp my own yet-another-markup but my files read like so:

Code:
[begin:someblock anAttribute="foo" anotherAttribute="bar"]
someParam:someDatatype=...
[begin:nestedBlock]
aParameter:aDatatype=...
[end]
[end]

It's basically like a lightweight XML, and I wrote a simple SAX parser for it in a matter of hours in C++. Then I wrote a DOM builder which builds off the SAX parser in a few more hours after that. The whole thing took maybe a weekend to design, develop and debug in total. The filetype is *super* robust, and I've been using it for three years now, in many *very* different projects.

Plus, since I can build a DOM representation dynamically, I can write properly formatted files too, not just read them.

While a simple position-denotes-meaning filetype is sufficient for a one-off parser, I'd recommend taking a few days to design something you can use in the future.
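
A SAX-style reader for a block format like the one above can be quite small. The sketch below is a guess at the general shape, not the actual parser described in this post: BlockHandler and parseBlocks are made-up names, attributes are handed over as one unparsed string, and all three callbacks must be set before parsing.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Callbacks fired while reading; hypothetical names, not the poster's API.
struct BlockHandler {
    std::function<void(const std::string& name, const std::string& rawAttributes)> onBegin;
    std::function<void(const std::string& name, const std::string& type,
                       const std::string& value)> onParam;
    std::function<void()> onEnd;
};

// Streams events to the handler; returns false on a malformed line or unbalanced blocks.
bool parseBlocks(std::istream& in, const BlockHandler& handler)
{
    int depth = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;         // blank line
        std::size_t last = line.find_last_not_of(" \t\r");
        line = line.substr(first, last - first + 1);      // trim both ends

        if (line == "[end]") {
            if (depth == 0) return false;                  // [end] without a [begin:...]
            --depth;
            handler.onEnd();
        } else if (line.compare(0, 7, "[begin:") == 0 && line.back() == ']') {
            std::string body = line.substr(7, line.size() - 8);
            std::size_t space = body.find(' ');
            std::string name = body.substr(0, space);
            std::string attrs = (space == std::string::npos) ? "" : body.substr(space + 1);
            ++depth;
            handler.onBegin(name, attrs);
        } else {
            std::size_t colon = line.find(':');
            std::size_t equals = line.find('=');
            if (colon == std::string::npos || equals == std::string::npos || colon > equals)
                return false;                              // not name:type=value
            handler.onParam(line.substr(0, colon),
                            line.substr(colon + 1, equals - colon - 1),
                            line.substr(equals + 1));
        }
    }
    return depth == 0;                                     // every block must be closed
}
```

A DOM builder then becomes a handler that pushes a node in onBegin, stores values in onParam and pops in onEnd, which is why the parser itself never needs to know what the blocks mean.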
Member
Posts: 277
Joined: 2004.10
Post: #9
Strange, I could have sworn I posted...

Anyway, I need to read in a specific file format and nothing else.
If I do make my own game with maps, they're gonna be binary, not text.

I don't want to create some super nice map file opener... just a specific file format reader.

And I have a hard enough time getting the devs to make a Jaguar version... they decided to distribute like ten libraries in the actual source, but the only one they don't distribute isn't on Jaguar...
And then they're amazed when I install a library that even OSC said I couldn't install (IRC channel Rasp) and get the game to run.
(And then I have like twenty people asking me how to do it.)

Anyway, enough of my ranting...
I'm gonna do it this way:

tokenize by \n
tokenize each tokenized line by " "
loop through each filled line (line that has more than "\n")
find if line has an object name.
pass that object name and the offset to that object handler.
wait for object handler to get to end and give us back offset of "end"

So far, I have everything but the pass to the object handler...
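
The missing piece, passing the object name and offset to a handler and getting the offset of "end" back, might be sketched like this. It is only one possible shape for it: the Line and ObjectHandler types and the function names are made up, and '#' comments are not handled here.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Line = std::vector<std::string>;   // one line split into words

// Tokenize by '\n', then by whitespace; blank lines are dropped.
std::vector<Line> splitIntoLines(const std::string& text)
{
    std::vector<Line> lines;
    std::istringstream all(text);
    std::string raw;
    while (std::getline(all, raw)) {
        std::istringstream words(raw);
        Line line;
        std::string word;
        while (words >> word) line.push_back(word);
        if (!line.empty()) lines.push_back(line);
    }
    return lines;
}

// A handler receives all lines plus the index of the object's first line
// and returns the index of the matching "end" line.
using ObjectHandler = std::function<std::size_t(const std::vector<Line>&, std::size_t)>;

// Walks the lines and hands each known object to its handler.
// Returns the number of objects handled.
std::size_t dispatchObjects(const std::vector<Line>& lines,
                            const std::map<std::string, ObjectHandler>& handlers)
{
    std::size_t handled = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        auto it = handlers.find(lines[i][0]);
        if (it == handlers.end()) continue;    // not an object name, skip it
        i = it->second(lines, i);              // the loop's ++i then steps past "end"
        ++handled;
    }
    return handled;
}

// Helper a handler can use: index of the next "end" line at or after `start`.
std::size_t findEnd(const std::vector<Line>& lines, std::size_t start)
{
    std::size_t i = start;
    while (i < lines.size() && lines[i][0] != "end") ++i;
    return i;
}
```

Each object type (world, box, pyramid) gets its own entry in the map, so adding a new type means writing one more handler rather than touching the main loop.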

thanks guys!

Global warming is caused by hobos and mooses