
Lexer fails to recognize Unicode escape sequences in identifiers

description

Just a note: C# allows Unicode escape sequences in identifiers, such as \u0061 for 'a'. The parser, as it stands, does not.
 
The issue seems to lie in the lexer: the strings table it emits contains the part of the identifier before the escape, and then the 'u...' part as a separate entry.
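A minimal sketch of the missing step, written in Python rather than C# purely for illustration. It assumes the lexer's identifier rule is extended to consume `\u` plus four hex digits (and the eight-digit `\U` form, which the C# specification also defines) as part of the same token. The hypothetical helper `decode_identifier` then turns the whole lexeme into one name, so the strings table would get a single entry instead of a prefix plus a separate `u...` fragment.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_identifier(raw: str) -> str:
    r"""Decode \uXXXX and \UXXXXXXXX escapes inside one identifier lexeme.

    Hypothetical helper: the lexer is assumed to have already consumed the
    whole token (escapes included), so 'b\u0061r' becomes the single name 'bar'.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in "uU":
            # \u takes exactly 4 hex digits, \U takes exactly 8.
            width = 4 if raw[i + 1] == "u" else 8
            digits = raw[i + 2:i + 2 + width]
            if len(digits) != width or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"malformed Unicode escape at offset {i}")
            # chr() itself raises ValueError for code points above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)
```

For example, `decode_identifier(r"b\u0061r")` returns `"bar"`, while a malformed escape such as `\u00g1` raises `ValueError` instead of silently splitting the token.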
 
You could easily add parsing for such a sequence. According to the C# language specification, an identifier follows a few initial rules: it cannot be a keyword unless it is prefixed with the '@' character or contains a Unicode escape sequence. Beyond that, it starts with a character in one of the following classes: Lu, Ll, Lt, Lm, Lo, Nl; a Unicode escape sequence denoting a character in those classes; or the underscore character (U+005F).
Every subsequent character is in one of the following classes: Lu, Ll, Lt, Lm, Lo, Nl, Mn, Mc, Nd, Pc, Cf; a Unicode escape sequence denoting a character in those classes; or, again, the underscore character.
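The character-class rules above can be checked with Python's `unicodedata.category`, which returns the same two-letter codes. This is a minimal sketch, assuming any escapes have already been decoded; the helper names are hypothetical, and the keyword and '@' rules are deliberately left out.

```python
import unicodedata

# Classes allowed for the first character, and for every later character.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_identifier_start(ch: str) -> bool:
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch: str) -> bool:
    # '_' is already in Pc, but it is listed explicitly to mirror the rules.
    return ch == "_" or unicodedata.category(ch) in PART_CATEGORIES


def is_valid_identifier_body(name: str) -> bool:
    """Check only the character-class rules on an already-decoded name."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(c) for c in name[1:]
    )
```

`is_valid_identifier_body("_x1")` is true, while `"1abc"` fails because `Nd` is only permitted after the first character.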
 
To obtain a character's class, you can use char.GetUnicodeCategory(ch), where 'ch' is the character whose category you want.
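In .NET, `char.GetUnicodeCategory` returns a `System.Globalization.UnicodeCategory` value rather than a two-letter code. When cross-checking a Python prototype against a C# lexer, a small lookup like the following relates the two naming schemes for the classes listed in the identifier rules (the table and helper names are hypothetical). Python's and .NET's Unicode tables may come from different Unicode versions, so rare characters could be classified differently.

```python
import unicodedata

# Two-letter general-category codes mapped to the matching
# System.Globalization.UnicodeCategory member names (identifier classes only).
DOTNET_NAMES = {
    "Lu": "UppercaseLetter",
    "Ll": "LowercaseLetter",
    "Lt": "TitlecaseLetter",
    "Lm": "ModifierLetter",
    "Lo": "OtherLetter",
    "Nl": "LetterNumber",
    "Mn": "NonSpacingMark",
    "Mc": "SpacingCombiningMark",
    "Nd": "DecimalDigitNumber",
    "Pc": "ConnectorPunctuation",
    "Cf": "Format",
}


def category_of(ch: str) -> str:
    """Return the .NET-style name for identifier classes, else the raw code."""
    code = unicodedata.category(ch)
    return DOTNET_NAMES.get(code, code)
```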
 
I've e-mailed Debreuil asking about potentially updating the project in the future for later versions of the specification (namely keywords, lambda expressions and so on), and asked for his insight on state machines and the like. I noticed the error in this project while trying to fix my own (I figured I might as well check; I don't use Unicode character escapes, but I know they're allowed).
 
I could work on a fix of my own, but I'm mainly interested in different methodologies; my preferred method is code generation, so I'm writing a program that generates parsers for me.
 
If there's no plan to fix this, I apologize; I couldn't find any information about it in your other issues or Discussions.

comments

debreuil wrote Mar 13, 2009 at 6:18 AM

Hey Alexander,

Absolutely, this should be fixed... thanks for the catch. I am on a tight sprint for this video game project until Thursday, but maybe I can track it down over lunch... if you have the fix, feel free to : ) (I'm still one fix behind here :( )

wrote Feb 14, 2013 at 6:33 PM