Automated Parser Generation

Jul 26, 2009 at 5:27 AM
Edited Aug 1, 2009 at 3:21 AM

It's been about four months, and I'm finally getting somewhere interesting.

Previously, as you all [may] know, I decided to attempt a backtrack-less, left-most derivation, recursive descent parser.  In the end, all the research told me: it's not worth it. Other LL(*) implementations, such as ANTLR, that use recursive descent and error recovery features to handle back-tracking, typically involve large amounts of spaghetti code that make reading or discovering the original language description nigh impossible.

I knew there had to be a better way, so I decided to try a few things (hence the lack of activity on my behalf).  The research into this has been a struggle, to say the least, because it's just been difficult to find documents associated to the methods I'm trying to use (that is non-recursive descent top-down left-most derivation with focus on LL(*) vs LL(k)). 

Another issue with Recursive Descent is Left recursion, in three different varieties: Direct, indirect and hidden. Hidden is the most irksome case, but important nonetheless. To alleviate this entire issue, I shifted gears and started down a deterministic model that would work similar to a deterministic finite automation for Regular Expressions, except this one would accept recursion.

Instead of pre-compiling this information into the application, due to cyclic model issues, I decided that it would be more appropriate to focus on embedding the logic to compute this directly in the resultant application. With this, the application can delay large computational operations related to look-ahead/follow sets until the information is absolutely necessary (as in, what's next right now?)

Presently I'm hand-writing the logic, using the C# parser as a basis for data, and once I'm done (and it... works) I can bake the logic associated into the parser builder.  Below, is the ClassDeclaration's flat-DFA view.  Flat because it doesn't include the description of the sub-rules referenced within the rule.  This is area is generated by the parser builder at present, and is used by the next stage.

internal static DeclarationSiteRule ClassDeclaration
{
    get
    {
        if (_ClassDeclaration == null)
        {
            DeclarationSiteRule rule = new DeclarationSiteRule(false, RuleIdentifiers.Rules1.ClassDeclaration);
            DeclarationSiteState State1 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State2 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State3 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State4 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State5 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State6 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State7 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State8 = new DeclarationSiteState(false, rule);
            DeclarationSiteState State9 = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateA = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateB = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateC = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateD = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateE = new DeclarationSiteState(false, rule);
            DeclarationSiteState StateF = new DeclarationSiteState(false, rule);
            DeclarationSiteState State11 = new DeclarationSiteState(true, rule);
            DeclarationSiteState State10 = new DeclarationSiteState(false, rule);
            // Transitions for state 0
            // Transition: DocumentationComment
            rule.Transitions.Add(dstd[37], State1);
            // Transition: CustomAttributeDeclarations
            rule.Transitions.Add(dstd[15], State2);
            // Transition: Modifiers (Abstract | Internal | New | Private | Protected | Public | Sealed | Static)
            rule.Transitions.Add(dstd[38], State3);
            // Transition: Modifiers (Partial)
            rule.Transitions.Add(dstd[39], State4);
            // Transition: Keywords (Class)
            rule.Transitions.Add(dstd[40], State5);
            // Transitions for state 1
            // Transition: CustomAttributeDeclarations
            State1.Transitions.Add(dstd[15], State1);
            // Transition: Modifiers (Abstract | Internal | New | Private | Protected | Public | Sealed | Static)
            State1.Transitions.Add(dstd[38], State3);
            // Transition: Modifiers (Partial)
            State1.Transitions.Add(dstd[39], State4);
            // Transition: Keywords (Class)
            State1.Transitions.Add(dstd[40], State5);
            // Transitions for state 2
            // Transition: CustomAttributeDeclarations
            State2.Transitions.Add(dstd[15], State2);
            // Transition: DocumentationComment
            State2.Transitions.Add(dstd[37], State6);
            // Transition: Modifiers (Abstract | Internal | New | Private | Protected | Public | Sealed | Static)
            State2.Transitions.Add(dstd[38], State3);
            // Transition: Modifiers (Partial)
            State2.Transitions.Add(dstd[39], State4);
            // Transition: Keywords (Class)
            State2.Transitions.Add(dstd[40], State5);
            // Transitions for state 3
            // Transition: Whitespace
            State3.Transitions.Add(dstd[1], State6);
            // Transitions for state 4
            // Transition: Whitespace
            State4.Transitions.Add(dstd[1], State7);
            // Transitions for state 5
            // Transition: Whitespace
            State5.Transitions.Add(dstd[1], State8);
            // Transitions for state 6
            // Transition: Modifiers (Abstract | Internal | New | Private | Protected | Public | Sealed | Static)
            State6.Transitions.Add(dstd[38], State3);
            // Transition: Modifiers (Partial)
            State6.Transitions.Add(dstd[39], State4);
            // Transition: Keywords (Class)
            State6.Transitions.Add(dstd[40], State5);
            // Transitions for state 7
            // Transition: Keywords (Class)
            State7.Transitions.Add(dstd[40], State5);
            // Transitions for state 8
            // Transition: Identifier
            State8.Transitions.Add(dstd[2], State9);
            // Transitions for state 9
            // Transition: GenericTypeParamsDef
            State9.Transitions.Add(dstd[41], StateA);
            // Transition: Operators (Colon)
            State9.Transitions.Add(dstd[42], StateB);
            // Transition: Operators (OpenBlock)
            State9.Transitions.Add(dstd[4], StateC);
            // Transitions for state A
            // Transition: Operators (Colon)
            StateA.Transitions.Add(dstd[42], StateD);
            // Transition: GenericTypeWhereClause
            StateA.Transitions.Add(dstd[43], StateE);
            // Transition: Operators (OpenBlock)
            StateA.Transitions.Add(dstd[4], StateC);
            // Transitions for state B
            // Transition: TypeIdentifier
            StateB.Transitions.Add(dstd[17], State10);
            // Transitions for state C
            // Transition: ClassMember
            StateC.Transitions.Add(dstd[44], StateC);
            // Transition: Operators (CloseBlock)
            StateC.Transitions.Add(dstd[8], State11);
            // Transitions for state D
            // Transition: TypeIdentifier
            StateD.Transitions.Add(dstd[17], StateF);
            // Transitions for state E
            // Transition: GenericTypeWhereClause
            StateE.Transitions.Add(dstd[43], StateE);
            // Transition: Operators (OpenBlock)
            StateE.Transitions.Add(dstd[4], StateC);
            // Transitions for state F
            // Transition: Operators (Comma)
            StateF.Transitions.Add(dstd[21], StateD);
            // Transition: GenericTypeWhereClause
            StateF.Transitions.Add(dstd[43], StateE);
            // Transition: Operators (OpenBlock)
            StateF.Transitions.Add(dstd[4], StateC);
            // Transitions for state 10
            // Transition: Operators (Comma)
            State10.Transitions.Add(dstd[21], StateB);
            // Transition: Operators (OpenBlock)
            State10.Transitions.Add(dstd[4], StateC);
            // Rule edge setup.
            rule.SetEdges(new DeclarationSiteState[]
                    {
                        State11
                    });
            _ClassDeclaration = rule;
        }
        return _ClassDeclaration;
    }
}

The next phase uses two classes called the CSharpTreeRule and CSharpTreeState (which are done), which links in the individual rules.  After this will be the CSharpUnionTree Rule and State (not done), which effectively creates the look-ahead and follow set information from the branched model observed from the CSharpTreeRule/CSharpTreeState.

Originally I had intended to create a FirstSetState which generated the appropriate look-ahead information for the current state, the issue with this became that it needs to include the full look-ahead and follow set information in order to work properly, this is primarily due to a structural flaw in how I originally coded the first set state, as well as issues with hidden Left Recursion.

The dstd reference in the code above is a common transition cache which is used by all rules.  I decided to cache the transition information because the bit sets are larger than 32 or even 64 bits in size, so the logic needed to do bitwise computation requires custom work, so I decided to reduce redundant computations.  Comments were added by the generator to alleviate this issue, and say pretty much what the original computation would have anyway.

Does anyone have any suggestions as far as readability goes?  As strange as it sounds, I think that how well a project can be maintained is more important than whether it does a good job.

Any comments appreciated,