File parsing... the best way...
Very hard question...
What is the best way to handle file parsing?
See, I have a map that looks like this:
Code:
world
name whatever
size 400
end
box
name box1
size 1.0 1.0 1.0
position 1.0 1.0 1.0
rotation 1.0 1.0 1.0
end
pyramid
name box1
size 1.0 1.0 1.0
position 1.0 1.0 1.0
rotation 1.0 1.0 1.0
end
Now, I've looked at tokenizing in Cocoa and in C++, and at NSScanner.
They all look horribly ugly, and I even had a short conversation about this on IRC... but I don't want to get flamed for writing my own handlers
"when I have plenty of code that I just need to port".
This is certainly not a token system, but it might give
you some ideas.
If it is too simple or not what you're looking for, then my apologies.
From your data example, it looks like the data is fairly regular and straightforward. Just as a rough approximation, maybe something like:
// stick this in the while() down below.
inFile >> tag
>> obj.size.x
>> obj.size.y
>> obj.size.z;
if ( tag != "size" )
abort(); // token "size" expected
inFile >> tag
>> obj.position.x
>> obj.position.y
>> obj.position.z;
if ( tag != "position" )
abort(); // token "position" expected
// etc. etc.
You'll need some logic for the header data, and you should put
// at the beginning of each blank line in your data file.
I particularly like the // comment feature because you can
comment the data if you want.
Code:
void Guns::enterGunIntoGunWarehouse( Guns &gunEntry )
{
string path = GlobalParameterServices::getDataPath();
string imageFile = ""; // Targa, PNG
string comment = "//";
string token = "";
string restOfLine = "";
string fullyQualifiedFile = path + gunEntry.name + ".txt";
ErrorAndLogServices::infoOrError( "Attempting to read file: " + fullyQualifiedFile,
__FILE__, __FUNCTION__, __LINE__, CONTINUE );
ifstream inFile( fullyQualifiedFile.c_str(), ios::in );
if (!inFile)
{
ErrorAndLogServices::infoOrError( "Unable to open or read file: " + fullyQualifiedFile,
__FILE__, __FUNCTION__, __LINE__, ABORT );
}
while ( inFile >> token )
{
if ( token == comment )
{
getline(inFile, restOfLine); // read rest of comment line
}
else
{
// first token is an image file specification
imageFile = path + token + ".png";
TextureServices::loadImageDataFromFileAndBindToNewTexture( PNG , imageFile, gunEntry.texture,
gunEntry.pixelWidth, gunEntry.pixelHeight );
inFile >> gunEntry.lengthFeet
>> gunEntry.trunnionOverhang
>> gunEntry.maxRange
>> gunEntry.shellWeight
>> gunEntry.timeToLoad
>> gunEntry.trainingRate;
}
}
}
Where the data file looks something like this:
Code:
// 16 inch/50 Mark 7 main guns
//
// Note: pixel dimensions are now found from the image file
// png file length (feet) trunionOverhang maxRange shellWeight timeToLoad trainingRate
//
// ----------------------------------------------------------------------------------
16in-50Mk7 32.0 149.0 0.0 0.0 0.0 1.0
5in-38Mk30or48 24.23 140.0 0.0 0.0 1.0 0.0
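For anyone who wants to try the idea without the rest of that project, here is a minimal self-contained sketch of the same loop. The GunRecord struct and readGunRecords function are made-up names for illustration, not part of the code above, and it treats any token that starts with // as a comment (a small change from the exact == "//" comparison above).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the fields mirror the columns of the data file.
struct GunRecord {
    std::string imageName;
    double lengthFeet = 0, trunnionOverhang = 0, maxRange = 0;
    double shellWeight = 0, timeToLoad = 0, trainingRate = 0;
};

// Reads every non-comment record from the stream. A token that starts with
// "//" begins a comment, and the rest of that line is discarded.
std::vector<GunRecord> readGunRecords(std::istream& in) {
    std::vector<GunRecord> records;
    std::string token, restOfLine;
    while (in >> token) {
        if (token.compare(0, 2, "//") == 0) {
            std::getline(in, restOfLine);  // skip the rest of the comment line
            continue;
        }
        GunRecord r;
        r.imageName = token;  // first token of a record is the image name
        if (!(in >> r.lengthFeet >> r.trunnionOverhang >> r.maxRange
                 >> r.shellWeight >> r.timeToLoad >> r.trainingRate)) {
            break;  // malformed or truncated record: stop rather than guess
        }
        records.push_back(r);
    }
    return records;
}
```

Passing a std::istream instead of opening the file inside the function means the same code can be fed a std::ifstream or a std::istringstream, which makes it easy to test.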
Here's another example showing as much as possible
Code:
box
size 1.0 1 1.0005
name boxy
position 2.7 220.5 1
rot 0.1 #(comment) pos and rot can be used in place of position and rotation
end
Now, your code is extremely helpful but kinda hard to understand...
(I don't work with ofstream :-P)
Global warming is caused by hobos and mooses
Here's some commented C code that would show how I would implement it.
Code:
FILE *file = fopen(pathName, "r");
char buffer[BUFSIZ];
//the magical struct that holds whatever you are reading in
MagicalStruct *object = newMagicalStruct();
//type of object that's currently used; the types are defined somewhere
//else in the file
int type = NO_TYPE;
//fopen returns NULL on failure, so stop here in that case
if (file == NULL)
return; //or however the caller reports errors
//loop until no more tokens can be read (will explicitly break out at "end")
//note: "%s" is unbounded, so this assumes no token is longer than BUFSIZ - 1
while (fscanf(file, "%s", buffer) == 1)
{
//a token starting with '#' begins a comment, so skip the rest of the line
if (buffer[0] == '#')
{
fgets(buffer, BUFSIZ, file);
continue;
}
//end of the input
if (!strcmp(buffer, "end"))
break;
//read the types
if (!strcmp(buffer, "box"))
{
type = BOX;
continue;
}
//other types go here
if (!strcmp(buffer, "size"))
{
if (type == BOX)
fscanf(file, "%f %f %f", &object->box->length, &object->box->width,
&object->box->height);
//other types go here with else if statements
continue;
}
if (!strcmp(buffer, "name"))
{
//the name that will be read in
char *name = NULL;
//maximum length of the string
int maxLength = 0;
//the length of the string read in (used to replace the newline at the end)
size_t stringLength;
//used to skip the blanks between "name" and the value
int c;
if (type == BOX)
{
name = object->box->name;
maxLength = object->box->nameLength;
}
//other types go here with else if statements
while ((c = getc(file)) == ' ' || c == '\t')
;
if (c != EOF)
ungetc(c, file);
//read the rest of the line for the name
if (name != NULL && fgets(name, maxLength, file) != NULL)
{
stringLength = strlen(name);
if (stringLength > 0 && name[stringLength - 1] == '\n')
name[stringLength - 1] = '\0';
}
continue;
}
if (!strcmp(buffer, "position") || !strcmp(buffer, "pos"))
{
if (type == BOX)
fscanf(file, "%f %f %f", &object->box->x, &object->box->y,
&object->box->z);
//other types go here with else if statements
continue;
}
if (!strcmp(buffer, "rotation") || !strcmp(buffer, "rot"))
{
if (type == BOX)
fscanf(file, "%f", &object->box->rot);
//other types go here with else if statements
continue;
}
//other commands go here
}
Edit: I suggest you just copy and paste that into another window, since these code tags are too small to actually see anything and quote tags would destroy any indenting.
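For comparison, the same keyword-per-line idea can be sketched in C++ by reading whole lines and splitting each one with a string stream. This is only a rough sketch: the Box struct and parseBox function are hypothetical names, and it assumes a '#' anywhere on a line starts a comment that runs to the end of that line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical box type, loosely following the fields used in the C code above.
struct Box {
    std::string name;
    float length = 0, width = 0, height = 0;
    float x = 0, y = 0, z = 0;
    float rot = 0;
};

// Parses one "box ... end" block. Returns false if no complete block was found.
bool parseBox(std::istream& in, Box& box) {
    std::string line;
    bool inBox = false;
    while (std::getline(in, line)) {
        // Strip a trailing '#' comment, if any.
        std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);

        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key)) continue;  // blank or comment-only line

        if (key == "box") {
            inBox = true;
        } else if (!inBox) {
            continue;  // ignore anything outside a box block
        } else if (key == "end") {
            return true;
        } else if (key == "size") {
            fields >> box.length >> box.width >> box.height;
        } else if (key == "name") {
            fields >> std::ws;               // skip blanks before the name
            std::getline(fields, box.name);  // the name may contain spaces
        } else if (key == "position" || key == "pos") {
            fields >> box.x >> box.y >> box.z;
        } else if (key == "rotation" || key == "rot") {
            fields >> box.rot;
        }
    }
    return false;  // ran out of input before "end"
}
```

Because each line is handled on its own, the keys can appear in any order inside the block, which is the main advantage over the fixed-order approach.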
The way I handled it in my contest entry, Chemical Bonds, was that I used sscanf. No, really, I did. I have lines that look like:
Code:
name This is the level name.
speed 2000
goal 5
Then in my data loading code:
Code:
if ( strstr(line, "goal") == line )
{
int goal;
if ( sscanf(line, "%*s %d", &goal) == 1 )
{
fGoal = goal;
}
}
else if ( strstr(line, "name") == line )
{
char* name = line + 5;
fName = name; // fName is of type std::string
fName.erase(fName.length() - 1);
}
else if ( strstr(line, "speed") == line )
{
int speed;
if ( sscanf(line, "%*s %d", &speed) == 1 )
{
fSpeed = speed;
}
}
If you're just loading simple sets of data, then you probably won't need complicated parsing.
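A self-contained variation on the sscanf approach might look like the sketch below. LevelInfo and applyLevelLine are invented names for illustration; the real game code uses its own member variables (fGoal, fName, fSpeed). The sketch reads the keyword with a bounded %15s and uses %n to find where the value starts, instead of the hard-coded line + 5.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical container for the three settings shown in the level file above.
struct LevelInfo {
    std::string name;
    int speed = 0;
    int goal = 0;
};

// Applies one line of the level file to `info`. Returns true if the line
// started with a known keyword and its value could be parsed.
bool applyLevelLine(const char* line, LevelInfo& info) {
    char key[16];
    int consumed = 0;
    // %15s bounds the keyword; %n records how many characters were consumed.
    if (std::sscanf(line, "%15s%n", key, &consumed) != 1) return false;

    if (std::strcmp(key, "name") == 0) {
        const char* value = line + consumed;
        while (*value == ' ' || *value == '\t') ++value;  // skip separator
        info.name = value;
        // Drop the trailing newline (and carriage return, if present).
        while (!info.name.empty() &&
               (info.name.back() == '\n' || info.name.back() == '\r')) {
            info.name.pop_back();
        }
        return true;
    }
    if (std::strcmp(key, "speed") == 0)
        return std::sscanf(line + consumed, "%d", &info.speed) == 1;
    if (std::strcmp(key, "goal") == 0)
        return std::sscanf(line + consumed, "%d", &info.goal) == 1;
    return false;  // unknown keyword
}
```

Each line from fgets or std::getline would be passed to applyLevelLine in turn; unknown keywords simply return false so the caller can decide whether to warn or ignore them.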
The brains and fingers behind Malarkey Software (plus caretaker of the world's two brattiest felines).
File parsing is unfortunately inherently ugly...
Edit: Outnumbered uses sscanf for all its file parsing...
I typically use lex/yacc for file parsing. I know they are overkill, but I know that by using them I will not be constrained in the future by my choice of syntax.
First you start by defining a set of tokens in flex:
Code:
letter [a-zA-Z_]
num [0-9]
number {num}+
id {letter}({letter}|{num})*
comment #.*
%%
{id} { strcpy(lastToken, yytext); return ID_; }
{number} { yylval = atoi(yytext); return NUMBER_; }
{comment} {/* Do nothing */}
[ \t\r\n]+ {/* Skip whitespace */}
. { /* Return the token if no match */ return yytext[0]; }
%%
And your yacc file:
Code:
%token ID_ NUMBER_
%%
lines:
assign
| lines assign
;
assign:
ID_ '=' NUMBER_ ';' { /* Insert the value into the token table */ }
;
%%
(The above example is simplistic, but it gives you a flavor.)
There are a number of tutorials on flex & yacc online. (Also look for flex/bison, the GNU implementations.) The O'Reilly book is horrible, or at least the first version was; there may be a newer one.
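To get a feel for what that grammar accepts without installing flex or yacc, here is a hand-written C++ sketch of the same language: a sequence of "id = number ;" assignments with '#' comments. It is not what flex/yacc would generate, just a rough equivalent, and parseAssignments is a made-up name.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Parses text of the form "id = number ;" (repeated), skipping whitespace and
// '#' comments. Returns false on the first syntax error.
bool parseAssignments(const std::string& text, std::map<std::string, int>& table) {
    std::string::size_type i = 0;
    const std::string::size_type n = text.size();

    // Skips whitespace and '#'-to-end-of-line comments.
    auto skipBlank = [&]() {
        while (i < n) {
            if (std::isspace(static_cast<unsigned char>(text[i]))) {
                ++i;
            } else if (text[i] == '#') {
                while (i < n && text[i] != '\n') ++i;
            } else {
                break;
            }
        }
    };
    // Consumes one expected punctuation character.
    auto expect = [&](char c) -> bool {
        skipBlank();
        if (i < n && text[i] == c) { ++i; return true; }
        return false;
    };

    for (;;) {
        skipBlank();
        if (i == n) return true;  // clean end of input

        // id: a letter or underscore, then letters, digits, or underscores
        if (!(std::isalpha(static_cast<unsigned char>(text[i])) || text[i] == '_'))
            return false;
        std::string::size_type start = i;
        while (i < n && (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
            ++i;
        const std::string id = text.substr(start, i - start);

        if (!expect('=')) return false;

        // number: one or more digits
        skipBlank();
        start = i;
        while (i < n && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
        if (i == start) return false;
        const int value = std::atoi(text.substr(start, i - start).c_str());

        if (!expect(';')) return false;
        table[id] = value;  // insert the value into the token table
    }
}
```

The trade-off is the one described above: this is quick to write, but every syntax change means editing the parser by hand, whereas with flex/yacc you would only touch the rules.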
While I think the use of lex and yacc may be a little extreme, JFaller's on the money in my opinion. Any filetype which denotes meaning by order is brittle, and easy to break if you add functionality.
I'm not going to pimp my own yet-another-markup but my files read like so:
Code:
[begin:someblock anAttribute="foo" anotherAttribute="bar"]
someParam:someDatatype=...
[begin:nestedBlock]
aParameter:aDatatype=...
[end]
[end]
It's basically like a lightweight XML, and I wrote a simple SAX parser for it in a matter of hours in C++. Then I wrote a DOM builder which builds off the SAX parser in a few more hours after that. The whole thing took maybe a weekend to design, develop, and debug in total. The filetype is *super* robust, and I've been using it for three years now, in many *very* different projects.
Plus, since I can build a DOM representation dynamically, I can write properly formatted files too, not just read them.
While a simple position-denotes-meaning filetype is sufficient for a one-off parser, I'd recommend taking a few days to design something you can use in the future.
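To give a rough idea of what a SAX-style parser for a format like that could look like, here is a minimal sketch. It is not the poster's actual parser: BlockHandler and parseBlocks are hypothetical names, attributes are passed through as raw text rather than split into key/value pairs, and parameter lines are not interpreted at all.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical callback set, in the spirit of a SAX parser: the parser reports
// events and the caller decides what to build from them.
struct BlockHandler {
    std::function<void(const std::string& header)> onBegin;  // text after "begin:"
    std::function<void(const std::string& line)> onParam;    // any other line
    std::function<void()> onEnd;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns true if every [begin:...] was matched by an [end].
bool parseBlocks(std::istream& in, const BlockHandler& h) {
    int depth = 0;
    std::string raw;
    while (std::getline(in, raw)) {
        const std::string line = trim(raw);
        if (line.empty()) continue;
        if (line.compare(0, 7, "[begin:") == 0 && line.back() == ']') {
            ++depth;
            // Pass along everything between "[begin:" and the closing ']'.
            if (h.onBegin) h.onBegin(line.substr(7, line.size() - 8));
        } else if (line == "[end]") {
            if (depth == 0) return false;  // stray [end]
            --depth;
            if (h.onEnd) h.onEnd();
        } else {
            if (h.onParam) h.onParam(line);
        }
    }
    return depth == 0;
}
```

A DOM builder in this style would keep a stack of nodes, pushing a new child in onBegin and popping in onEnd.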
Strange, I could have sworn I posted...
Anyway, I need to read in a specific file format and nothing else.
If I do make my own game with maps, they're gonna be binary, not text.
I don't want to create some super nice map file opener... just a specific file format reader.
And I have a hard enough time getting the devs to make a Jaguar version... they decided to distribute like ten libraries in the actual source, but the only one they don't isn't on Jaguar...
And then they're amazed when I install a library that even OSC said I couldn't install (IRC channel) and get the game to run.
(And then I have like twenty people asking me how to do it.)
Anyway, enough of my ranting...
I'm gonna do it this way:
tokenize by \n
tokenize each tokenized line by " "
loop through each filled line (line that has more than "\n")
find if line has an object name.
pass that object name and the offset to that object handler.
wait for object handler to get to end and give us back offset of "end"
So far, I have everything but the pass to the object handler...
thanks guys!
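The plan above, including the missing hand-off to the object handlers, could be sketched roughly like this in C++. All the names here (tokenize, findEnd, dispatch, ObjectHandler) are invented for the sketch, and it assumes each handler is given every line plus the index of its object's first line and returns the index of the matching "end" line.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
using Lines = std::vector<Tokens>;

// Splits text on '\n', then each line on whitespace; blank lines are dropped.
Lines tokenize(const std::string& text) {
    Lines lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Tokens tokens;
        std::string tok;
        while (fields >> tok) tokens.push_back(tok);
        if (!tokens.empty()) lines.push_back(tokens);
    }
    return lines;
}

// A handler receives all lines plus the index of the object's first line and
// returns the index of the matching "end" line.
using ObjectHandler = std::function<std::size_t(const Lines&, std::size_t)>;

// Helper a handler could use: scans forward from `start` to the "end" line.
std::size_t findEnd(const Lines& lines, std::size_t start) {
    std::size_t i = start;
    while (i < lines.size() && lines[i][0] != "end") ++i;
    return i;  // equals lines.size() if "end" is missing
}

// Walks the lines, passes each known object name to its handler, and resumes
// after the offset the handler gives back. Returns how many objects were handled.
std::size_t dispatch(const Lines& lines,
                     const std::map<std::string, ObjectHandler>& handlers) {
    std::size_t handled = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        auto it = handlers.find(lines[i][0]);
        if (it == handlers.end()) continue;  // not an object name
        i = it->second(lines, i);            // handler returns offset of "end"
        ++handled;
    }
    return handled;
}
```

Registering one handler per object name ("world", "box", "pyramid") in the map keeps the main loop unchanged when a new object type is added.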