Two-level Grammar of JinXML

Overview

JinXML's layout is whitespace-insensitive, so it is convenient to split the syntax into two levels: a lower-level tokenisation phase and an upper-level parsing phase. This page describes both levels for JinXML in EBNF and also illustrates the grammars with railroad diagrams, thanks to the excellent Railroad Diagram Generator.

Upper-Level Grammar in EBNF, corresponding to the parse phase

JinXML ::= Element | JSON | Call
Call ::= ( NCName | '&' ) '<' Attribute* ( '/>' | '>' '(' Item* ')' )
Element ::= StartTag  Item* EndTag | FusedTag
StartTag ::= '<' Name Attribute* '>'
EndTag ::= '</' Name '>'
FusedTag ::= '<' Name Attribute* '/>'
Attribute ::= FieldPrefix String
JSON ::= Reserved | Number | String | Array | Object
Reserved ::= 'null' | 'true' | 'false'
Item ::= ( Entry | JinXML ) Terminator?
Array ::= '[' ( JinXML Terminator? )*  ']'
Object ::= '{' ( Entry Terminator? )* '}'
Entry ::= FieldPrefix JinXML 
FieldPrefix ::= Name ( ':' | '=' | '+:' | '+=' )
Name ::= NCName | '&' | String
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Terminator ::= ',' | ';'
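To make the optional Terminator and the FieldPrefix separators concrete, here is a minimal recursive-descent sketch of the JSON branch only (Reserved, Number, String, Array, Object and Entry). It rests on several assumptions that are not part of the grammar above: the input is already a flat list of token strings, NCName is approximated by an ASCII identifier pattern, string escapes are left undecoded, and Element and Call are omitted. The function names are hypothetical.

```python
import re

# Assumptions: NCName is approximated by an ASCII identifier, and the token
# list has already been produced by a lexer.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][0-9]+)?")
SEPARATORS = {":", "=", "+:", "+="}      # the four FieldPrefix separators
TERMINATORS = {",", ";"}                 # Terminator ::= ',' | ';'
RESERVED = {"null": None, "true": True, "false": False}


def is_string(tok):
    """True for a single- or double-quoted string token."""
    return len(tok) >= 2 and tok[0] in "\"'" and tok[-1] == tok[0]


def parse_value(toks, i):
    """Parse the JSON branch of the JinXML rule; return (value, next_index)."""
    tok = toks[i]
    if tok in RESERVED:
        return RESERVED[tok], i + 1
    if NUMBER.fullmatch(tok):
        return float(tok), i + 1
    if is_string(tok):
        return tok[1:-1], i + 1          # escapes are not decoded in this sketch
    if tok == "[":
        return parse_array(toks, i + 1)
    if tok == "{":
        return parse_object(toks, i + 1)
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_array(toks, i):
    """Array ::= '[' ( JinXML Terminator? )* ']'"""
    items = []
    while toks[i] != "]":
        value, i = parse_value(toks, i)
        items.append(value)
        if toks[i] in TERMINATORS:       # the terminator is optional
            i += 1
    return items, i + 1


def parse_object(toks, i):
    """Object ::= '{' ( Entry Terminator? )* '}'"""
    entries = []                         # (key, separator, value) triples
    while toks[i] != "}":
        name, sep = toks[i], toks[i + 1]
        if not (is_string(name) or name == "&" or NAME.fullmatch(name)):
            raise SyntaxError(f"bad entry name {name!r}")
        if sep not in SEPARATORS:
            raise SyntaxError(f"bad separator {sep!r}")
        key = name[1:-1] if is_string(name) else name
        value, i = parse_value(toks, i + 2)
        entries.append((key, sep, value))
        if toks[i] in TERMINATORS:
            i += 1
    return entries, i + 1
```

The separator is kept in each triple rather than interpreted, because the grammar alone does not say how ':' differs from '+:' or '='.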

The following side-conditions apply:

The names in a paired StartTag and EndTag must not differ, where & is considered to match automatically.
& can only be used in a StartTag when its element appears on the right of an Entry.
& can only be used as an Entry name when followed by a named StartTag (not &).
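The first side-condition lies outside what the EBNF can express, so a parser has to check it separately. The sketch below shows one way to do that with a stack. It assumes that & matches on either side of a pair (the text above does not say which side), and the event representation and function names are hypothetical.

```python
def tags_match(start_name, end_name):
    """Paired tag names must agree; '&' is assumed to match any name."""
    return start_name == "&" or end_name == "&" or start_name == end_name


def check_pairing(events):
    """Check a sequence of ("start", name) / ("end", name) tag events.

    Returns True when every EndTag closes the most recent open StartTag
    with a matching name and no StartTag is left open.
    """
    stack = []
    for kind, name in events:
        if kind == "start":
            stack.append(name)
        elif kind == "end":
            if not stack or not tags_match(stack.pop(), name):
                return False
    return not stack
```

For example, `check_pairing([("start", "a"), ("start", "b"), ("end", "&"), ("end", "a")])` accepts the sequence, while `[("start", "a"), ("end", "b")]` is rejected.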

Upper-Level Grammar as Railroad Diagrams

JinXML: JinXML is the non-terminal through which all recursion happens

Image of JinXML rule

Call: A function-call-like syntax for elements

Image of Call rule

Element: Elements are made up of tags

Image of Element rule

StartTag: Must be paired with an EndTag

Image of StartTag rule

EndTag: Must be paired with a StartTag

Image of EndTag rule

FusedTag: Combines a start-and-end tag pair when there are no children

Image of FusedTag rule

Attribute: An attribute pairs up a name with a string value

Image of Attribute rule

JSON: Denotes a JSON-styled expression

Image of JSON rule

Reserved: JSON reserves null, true and false

Image of Reserved rule

Array: JSON-style array brackets

Image of Array rule

Object: JSON-style object brackets

Image of Object rule

Entry: Member of JSON-style object

Image of Entry rule

FieldPrefix: A name followed by one of the separators ':', '=', '+:' or '+='

Image of FieldPrefix rule

ElementName: Element names, attribute keys and object keys are almost identical - but ‘+’ is allowed for element names.

Image of ElementName rule

NCName: Same as XML spec

Image of NCName URL

Terminator: Optional comma or semicolon between members of arrays, objects or elements.

Image of Terminator rule

Lower-Level Grammar for Tokenisation in EBNF, corresponding to the lexical analysis phase

Note that Shebang sequences may only occur at the start of a stream.

Reserved ::= 'null' | 'true' | 'false'
Number ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( ( 'e' | 'E' ) [0-9]+ )?
String ::= SingleQuotedString | DoubleQuotedString
DoubleQuotedString ::= '"' ([^"\]|Escape)* '"'
SingleQuotedString ::= "'" ([^'\]|Escape)* "'"
Escape ::= '\' ( ["'\/bfnrt] | 'u' Hex Hex Hex Hex | XEscape )
XEscape ::= '&' (NamedCharacterReference|'#' [0-9]+|'#x' Hex+)';'
NamedCharacterReference ::= [http://www.w3.org/TR/html5/syntax.html#named-character-references]
Hex ::= [0-9a-fA-F]
Discard ::= ( Whitespace | XComment | XOther | JComment )+
XComment ::= '<!--' ( [^-]* | '-'+ [^->] )* '-'* '-->' 
XOther ::= '<' [?!] [^>]* '>' 
JComment ::= LongComment | EoLComment
LongComment ::=  '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
EoLComment ::= '//' [^#xA]* #xA
Whitespace ::= (#x20 | #x9 | #xD | #xA)+
Shebang ::= ('#!' [^#xA]* #xA)+
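As a rough illustration of the tokenisation phase, the sketch below turns the lexical rules into a single regular expression. It is an approximation rather than a faithful implementation: NCName is reduced to an ASCII identifier, comments use non-greedy matching instead of the exact XComment and LongComment rules, and any character is accepted after a backslash inside a string. The punctuation set is inferred from the upper-level grammar, and all names in the code are hypothetical.

```python
import re

RESERVED = {"null", "true", "false"}

# Shebang lines are only recognised at the very start of the stream.
SHEBANG_RE = re.compile(r"(?:#![^\n]*\n)+")

TOKEN_RE = re.compile(r"""
    (?P<discard> [ \t\r\n]+ | <!--.*?--> | <[?!][^>]*> | /\*.*?\*/ | //[^\n]*\n )
  | (?P<number>  -?[0-9]+(?:\.[0-9]+)?(?:[eE][0-9]+)? )
  | (?P<string>  "(?:[^"\\]|\\.)*" | '(?:[^'\\]|\\.)*' )
  | (?P<name>    [A-Za-z_][A-Za-z0-9_.\-]* )
  | (?P<punct>   </ | /> | \+: | \+= | [<>\[\]{}():=,;&] )
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping Discard tokens."""
    pos = 0
    shebang = SHEBANG_RE.match(text)
    if shebang:
        pos = shebang.end()
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "name" and lexeme in RESERVED:
            kind = "reserved"            # null, true and false are not names
        if kind != "discard":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

Whitespace and both comment styles share the one discard group, so the parser never sees them, which is what makes the layout whitespace-insensitive at the upper level.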