
C++ .NET experts - need help with Tokenizing (with a little RegEx stuff on the side)

Status
Not open for further replies.

djtiesto

I don't normally ask for help like this, but I'm having a problem and it's been frustrating me all day.

I'm trying to read in a file and break up the contents into tokens. There's a lot of whitespace in this file that I would like to remove (i.e. no token should consist of whitespace). My code looks like this:
Code:
Regex* r = new Regex("\\n+|\\s+|\\t+"); // Trim Whitespace

StreamReader* sr = new StreamReader(S"file.txt");
try 
{
  String* line = sr->ReadLine();
  while (line != 0) {                    // ReadLine() returns null at end of file
     String* split[] = r->Split(line);   // split once per line, before reading split->Length
     for (int l=0; l<split->Length; l++) {
         Console::Write(split[l]);
         Console::WriteLine(";");
      }
     Console::WriteLine("-----");
     line=sr->ReadLine();
  }
}

and so on...

The file looks similar to this:
 route
16  blah  blah  blah  (32)
  20  50  60
 8  blah  blah  blah  (32)
  20  50  60
I would like my output to be:
route;
16;
blah;
blah;
blah;
(32);
20;
50;
60;
8;
... you get the point.

But instead I get it like:
; <-- shouldn't be here
route;
16;
blah;
blah;
blah;
(32);
20;
50;
60;
; <-- shouldn't be here
8;
...etc

Unfortunately, with the above code I'm getting several blank tokens, usually at the beginning of a line. Any C++ gurus, your help is GREATLY appreciated!
 
just tried it and it worked... get rid of \t and \n, preface the string with @, and use only one backslash, so
Code:
Regex* r = new Regex(@"\s+");
should work fine.
 
borghe said:
just tried it and it worked... get rid of \t and \n, preface the string with @, and use only one backslash, so
Code:
Regex* r = new Regex(@"\s+");
should work fine.

Gave that a try, but the compiler isn't liking that @ character... I looked at some MS docs, and it looks like the @ and the single \s are used for C#. I'm just using standard C++ .NET with the System::IO and System::Text::RegularExpressions namespaces.
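
For anyone comparing the two languages: the closest standard C++ analogue of the C# @ prefix is the raw string literal, which only exists from C++11 onward, so older compilers will reject it and need the doubled backslash. A minimal sketch, assuming std::regex in place of the .NET Regex class from this thread (the names escaped, raw, same_pattern and matches_whitespace are hypothetical), showing that both spellings give the same three-character pattern:

```cpp
#include <regex>
#include <string>

// Ordinary literal: each backslash meant for the regex engine is doubled.
const std::string escaped = "\\s+";

// Raw string literal (C++11 and later): backslashes are taken literally,
// similar in spirit to a C# @"..." string.
const std::string raw = R"(\s+)";

// True when both literals spell the same pattern text.
bool same_pattern() { return escaped == raw; }

// True when the pattern matches a run of mixed whitespace in full.
bool matches_whitespace() {
    return std::regex_match(std::string(" \t "), std::regex(raw));
}
```

Either way the regex engine sees \s+, and \s already covers spaces, tabs and newlines, which is why the separate \t and \n alternatives are redundant.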
 
ok, \s is the whitespace class, it just isn't a C++ escape character, so the backslash has to be doubled.

try just "\\s+" again without the @
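
One more thing worth noting: simplifying the pattern will not by itself remove the stray leading ";". When a line starts with whitespace, the separator matches at position 0 and .NET's Regex::Split puts an empty string in front of the first real token. Trimming the line first, or skipping empty pieces, avoids that. A rough sketch of the skip-empties approach in standard C++, assuming std::regex rather than the .NET classes used above (tokenize is a hypothetical helper name, not something from the thread):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a line on runs of whitespace and drop empty pieces.
std::vector<std::string> tokenize(const std::string& line) {
    static const std::regex ws("\\s+");
    std::vector<std::string> tokens;
    // Submatch index -1 walks the text between separator matches.
    std::sregex_token_iterator it(line.begin(), line.end(), ws, -1), end;
    for (; it != end; ++it) {
        std::string piece = it->str();
        if (!piece.empty()) {   // a blank or indented line can yield an empty piece
            tokens.push_back(piece);
        }
    }
    return tokens;
}
```

Calling tokenize on each line read from the file and printing every element followed by ";" should give the output asked for in the first post. In the Managed C++ version, the equivalent would be calling line->Trim() before the split or checking split[l]->Length inside the loop.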
 
Please don't use Managed C++. It's garbage and ugly. If you want to use .net, use C#. If you want to use C++, use C++. MC++ is a GLUE LANGUAGE.
 
maharg said:
Please don't use Managed C++. It's garbage and ugly. If you want to use .net, use C#. If you want to use C++, use C++. MC++ is a GLUE LANGUAGE.

Glad SOMEBODY said it first so it wouldn't have to be me :)
 