
Two-level Grammar of JinXML

Overview

JinXML has a whitespace-insensitive layout, so it is natural to split its syntax into two levels: a lower-level tokenisation phase and an upper-level parsing phase. This page describes both levels of JinXML in EBNF and illustrates the grammars with railroad diagrams, thanks to the excellent Railroad Diagram Generator.

Upper-Level Grammar in EBNF, corresponding to the parsing phase

JinXML ::= Element | JSON | Call
Call ::= ( NCName | '&' ) '<' Attribute* ( '/>' | '>' '(' Item* ')' )
Element ::= StartTag Item* EndTag | FusedTag
StartTag ::= '<' Name Attribute* '>'
EndTag ::= '</' Name '>'
FusedTag ::= '<' Name Attribute* '/>'
Attribute ::= FieldPrefix String
JSON ::= Reserved | Number | String | Array | Object
Reserved ::= 'null' | 'true' | 'false'
Item ::= ( Entry | JinXML ) Terminator?
Array ::= '[' ( JinXML Terminator? )*  ']'
Object ::= '{' ( Entry Terminator? )* '}'
Entry ::= FieldPrefix JinXML 
FieldPrefix ::= Name ( ':' | '=' | '+:' | '+=' )
Name ::= NCName | '&' | String
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Terminator ::= ',' | ';'

The following side-conditions apply:

Upper-Level Grammar as Railroad Diagrams

JinXML: JinXML is the non-terminal through which all recursion happens

Image of JinXML rule

Call: A function-call-like syntax for elements

Image of Call rule

Element: Elements are made up of tags

Image of Element rule

StartTag: Must be paired with an EndTag

Image of StartTag rule

EndTag: Must be paired with a StartTag

Image of EndTag rule
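The pairing of StartTag and EndTag can be checked in isolation with a stack. The Python sketch below is only an illustration: tags_balanced is a hypothetical helper, and its deliberately crude regular expression assumes ASCII tag names, no '>' inside attribute values and no stray '<' elsewhere in the text. A real parser would enforce the pairing while parsing.

```python
import re

# Hypothetical helper, not part of any JinXML library. Simplified tag pattern
# for '<name ...>', '</name>' and '<name .../>'; assumes ASCII tag names,
# no '>' inside attribute values and no other '<' in the text.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


def tags_balanced(text: str) -> bool:
    """Check that every StartTag is closed by an EndTag with the same name."""
    stack = []
    for m in TAG_RE.finditer(text):
        closing, name, fused = m.groups()
        if fused:  # FusedTag: opens and closes in one tag
            continue
        if closing:  # EndTag: must match the innermost open StartTag
            if not stack or stack.pop() != name:
                return False
        else:  # StartTag: remember it until its EndTag arrives
            stack.append(name)
    return not stack  # any StartTag left over was never closed
```

For example, '<a><b/></a>' is balanced, while '<a><b></a></b>' is not, because '</a>' arrives while '<b>' is still the innermost open tag.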

FusedTag: Combines a start and end tag pair into a single tag when the element has no children

Image of FusedTag rule

Attribute: An attribute pairs up a name with a string value

Image of Attribute rule

JSON: Denotes a JSON-style expression

Image of JSON rule

Reserved: JSON reserves null, true and false

Image of Reserved rule

Array: JSON-style array brackets

Image of Array rule

Object: JSON-style object brackets

Image of Object rule

Entry: A named member of a JSON-style object; entries may also appear among the children of an element or call

Image of Entry rule

FieldPrefix: The name and separator (':', '=', '+:' or '+=') that introduce an attribute or an entry

Image of FieldPrefix rule
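As a sketch of how a FieldPrefix might be recognised at the character level, the Python below approximates Name as an ASCII-only NCName, the literal '&', or a quoted string without escapes, and allows whitespace before the separator (the layout is whitespace-insensitive). These approximations and the helper name split_field_prefix are assumptions for illustration; quoted names are returned with their quotes intact.

```python
import re

# Hypothetical helper. Name is approximated (assumption) as an ASCII-only
# NCName, the literal '&', or a single- or double-quoted string without escapes.
NAME = r"""[A-Za-z_][A-Za-z0-9_.-]*|&|"[^"]*"|'[^']*'"""
# FieldPrefix ::= Name ( ':' | '=' | '+:' | '+=' ), whitespace allowed between.
FIELD_PREFIX_RE = re.compile(rf"(?P<name>{NAME})\s*(?P<sep>\+:|\+=|:|=)")


def split_field_prefix(text: str):
    """Return (name, separator, rest) if text starts with a FieldPrefix, else None."""
    m = FIELD_PREFIX_RE.match(text)
    if m is None:
        return None
    return m.group("name"), m.group("sep"), text[m.end():]
```

For instance, split_field_prefix('x="1"') splits off the name x and the separator '=', leaving '"1"' for the value.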

ElementName: Element names, attribute keys and object keys are almost identical, but ‘+’ is allowed in element names.

Image of ElementName rule

NCName: Same as NCName in the XML Namespaces specification

Image of NCName URL

Terminator: An optional comma or semicolon between members of arrays, objects or elements.

Image of Terminator rule
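Putting the two phases together, the following Python sketch tokenises a string and then parses a subset of the upper-level grammar by recursive descent. Everything about it is an assumption made for illustration rather than part of JinXML: names are ASCII-only, strings have no escapes, comments are not recognised, Call and the '&' name are not handled, the four separators are accepted but given no meaning, elements come back as (name, attributes, children) tuples and entries as (key, value) pairs. The names tokenize, Parser and parse are hypothetical.

```python
import re

# Hypothetical sketch: ASCII names, strings without escapes, no comments.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<punct></|/>|\+:|\+=|[<>\[\]{}:=,;])
  | (?P<number>-?[0-9]+(?:\.[0-9]+)?(?:[eE][0-9]+)?)
  | (?P<string>"[^"]*"|'[^']*')
  | (?P<name>[A-Za-z_][A-Za-z0-9_.-]*)
""", re.VERBOSE)
RESERVED = {"null": None, "true": True, "false": False}
SEPARATORS = {":", "=", "+:", "+="}

def tokenize(text):  # lower-level phase: whitespace dropped, EOF token appended
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens + [("eof", None)]

class Parser:  # upper-level phase: recursive descent, roughly one method per rule
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0
    def peek(self, k=0):  # never reads past the final EOF token
        return self.toks[min(self.i + k, len(self.toks) - 1)]
    def take(self, expected=None):
        kind, val = self.peek()
        if kind == "eof" or (expected is not None and val != expected):
            raise ValueError(f"unexpected token {val!r} at position {self.i}")
        self.i += 1
        return kind, val

    def name(self):  # Name ::= NCName | String  ('&' is not handled)
        kind, val = self.take()
        if kind not in ("name", "string"):
            raise ValueError(f"expected a name, got {val!r}")
        return val if kind == "name" else val[1:-1]

    def entry(self):  # Entry ::= FieldPrefix JinXML, returned as (key, value)
        key, sep = self.name(), self.take()[1]
        if sep not in SEPARATORS:  # what each separator means is not modelled
            raise ValueError(f"expected ':', '=', '+:' or '+=', got {sep!r}")
        return key, self.jinxml()

    def members(self, close, parse_member):  # ( Member Terminator? )* close
        out = []
        while self.peek()[1] != close:
            out.append(parse_member())
            if self.peek()[1] in (",", ";"):  # optional Terminator
                self.take()
        self.take(close)
        return out

    def item(self):  # Item ::= Entry | JinXML, chosen by one token of lookahead
        if self.peek()[0] in ("name", "string") and self.peek(1)[1] in SEPARATORS:
            return self.entry()
        return self.jinxml()

    def element(self):  # called after '<': StartTag Item* EndTag | FusedTag
        tag, attrs = self.name(), {}
        while self.peek()[1] not in (">", "/>"):
            key, value = self.entry()  # Attribute ::= FieldPrefix String
            if not isinstance(value, str):
                raise ValueError("attribute values must be strings")
            attrs[key] = value
        if self.take()[1] == "/>":
            return (tag, attrs, [])  # FusedTag: no children
        children = self.members("</", self.item)  # also consumes the '</'
        if self.name() != tag:
            raise ValueError(f"end tag does not match <{tag}>")
        self.take(">")
        return (tag, attrs, children)

    def jinxml(self):  # JinXML ::= Element | JSON  (Call is not handled)
        kind, val = self.take()
        if val == "<":
            return self.element()
        if val == "[":
            return self.members("]", self.jinxml)
        if val == "{":
            return dict(self.members("}", self.entry))
        if kind == "number":
            return float(val) if any(c in val for c in ".eE") else int(val)
        if kind == "string":
            return val[1:-1]
        if kind == "name" and val in RESERVED:
            return RESERVED[val]
        raise ValueError(f"unexpected token {val!r}")

def parse(text):
    p = Parser(text)
    value = p.jinxml()
    if p.peek()[0] != "eof":
        raise ValueError("unexpected trailing input")
    return value
```

Since Terminator is optional, [1 2] and [1, 2] give the same list in this sketch, and an entry such as name: "x" among an element's children is recognised by looking one token ahead for a separator.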

Lower-Level Grammar for Tokenisation in EBNF, corresponding to the lexical analysis phase

Note that Shebang sequences may only occur at the start of a stream.

Reserved ::= 'null' | 'true' | 'false'
Number ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( ( 'e' | 'E' ) [0-9]+ )?
String ::= SingleQuotedString | DoubleQuotedString
DoubleQuotedString ::= '"' ([^"\]|Escape)* '"'
SingleQuotedString ::= "'" ([^'\]|Escape)* "'"
Escape ::= '\' ( ["'\/bfnrt] | 'u' Hex Hex Hex Hex | XEscape )
XEscape ::= '&' (NamedCharacterReference|'#' [0-9]+|'#x' Hex+)';'
NamedCharacterReference ::= [http://www.w3.org/TR/html5/syntax.html#named-character-references]
Hex ::= [0-9a-fA-F]
Discard ::= ( Whitespace | XComment | XOther | JComment )+
XComment ::= '<!--' ( [^-]* | '-'+ [^->] )* '-'* '-->' 
XOther ::= '<' [?!] [^>]* '>' 
JComment ::= LongComment | EoLComment
LongComment ::=  '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
EoLComment ::= '//' [^#xA]* #xA
Whitespace ::= (#x20 | #x9 | #xD | #xA)+
Shebang ::= ('#!' [^#xA]* #xA)+

Lower-Level Grammar for Tokenisation as Railroad Diagrams

Reserved: identifiers that play the role of literal constants

Image of Reserved rule

Number: Only base-10 numbers are supported so far

Image of Number rule
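The Number rule translates directly into a regular expression. A minimal Python check, assuming the rule is applied to a whole token (is_number is a hypothetical helper name):

```python
import re

# Hypothetical helper; a direct transcription of the Number rule:
#   '-'? [0-9]+ ( '.' [0-9]+ )? ( ( 'e' | 'E' ) [0-9]+ )?
NUMBER_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][0-9]+)?")


def is_number(token: str) -> bool:
    """True if the whole token matches the Number rule as written."""
    return NUMBER_RE.fullmatch(token) is not None
```

Read literally, the rule accepts leading zeros such as 007 and rejects signed exponents such as 1e-3; both differ from JSON, whose number grammar forbids leading zeros and allows an optional sign after e or E.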

String: Single- and double-quoted strings, with the same escape sequences available in both

Image of String rule

DoubleQuotedString: JSON-like double-quoted strings

Image of DoubleQuotedString rule

SingleQuotedString: XML-like single-quoted strings

Image of SingleQuotedString rule
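Both quoted forms have the same shape and differ only in the quote character. A Python sketch of a matcher for either form follows; it assumes an escape is a backslash followed by any single character other than a newline, leaving the exact Escape forms to be checked later, and match_string is a hypothetical helper name.

```python
import re

# DoubleQuotedString ::= '"' ([^"\] | Escape)* '"'
# SingleQuotedString ::= "'" ([^'\] | Escape)* "'"
# Assumption: Escape is approximated as a backslash plus one non-newline character.
DOUBLE_QUOTED = r'"(?:[^"\\]|\\.)*"'
SINGLE_QUOTED = r"'(?:[^'\\]|\\.)*'"
STRING_RE = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def match_string(text: str, pos: int = 0):
    """Hypothetical helper: the quoted string starting at pos, quotes included, or None."""
    m = STRING_RE.match(text, pos)
    return m.group() if m else None
```

Because the two patterns are mirror images, an unescaped single quote is ordinary content inside a double-quoted string and vice versa.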

Escape: JSON-style escapes, plus XML-style character references via XEscape

Image of Escape rule
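A minimal Python sketch of decoding the JSON-style part of the Escape rule, that is the single-character escapes and \uXXXX. The helper names decode_escape and unescape are hypothetical, the XEscape branch is deliberately left out, and \u surrogate pairs are not combined.

```python
# Hypothetical helpers. Single-character escapes ["'\/bfnrt] and their meanings.
SIMPLE_ESCAPES = {'"': '"', "'": "'", "\\": "\\", "/": "/",
                  "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escape(s: str, i: int) -> tuple[str, int]:
    """Decode the Escape whose backslash is at s[i]; return (character, next index)."""
    if s[i] != "\\" or i + 1 >= len(s):
        raise ValueError(f"no complete escape at index {i}")
    c = s[i + 1]
    if c in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[c], i + 2
    digits = s[i + 2:i + 6]
    if c == "u" and len(digits) == 4 and all(d in HEX_DIGITS for d in digits):
        return chr(int(digits, 16)), i + 6
    # XEscape ('&' ... ';') is deliberately not handled in this sketch.
    raise ValueError(f"invalid escape at index {i}")


def unescape(body: str) -> str:
    """Decode every Escape in the body of a quoted string (quotes already removed)."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            ch, i = decode_escape(body, i)
        else:
            ch, i = body[i], i + 1
        out.append(ch)
    return "".join(out)
```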

XEscape: XML-style character references, written after a backslash (for example \&amp;)

Image of XEscape rule
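Since XEscape is reached only through Escape, an XML-style reference is written after a backslash, for example \&amp; or \&#x3B1;. The Python sketch below decodes one such reference. It relies on the standard library's html.entities.html5 table for NamedCharacterReference, which is assumed here to match the HTML5 list the grammar refers to; decode_xescape is a hypothetical helper name.

```python
import html.entities
import re

# The text after the backslash when an Escape is an XEscape:
#   '&' ( NamedCharacterReference | '#' [0-9]+ | '#x' Hex+ ) ';'
XESCAPE_RE = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_xescape(s: str, i: int) -> tuple[str, int]:
    """Hypothetical helper: decode the XEscape whose backslash is at s[i]; return (text, next index)."""
    if s[i] != "\\":
        raise ValueError(f"expected a backslash at index {i}")
    m = XESCAPE_RE.match(s, i + 1)
    if m is None:
        raise ValueError(f"invalid XEscape at index {i}")
    decimal, hexadecimal, name = m.groups()
    if decimal is not None:
        return chr(int(decimal)), m.end()
    if hexadecimal is not None:
        return chr(int(hexadecimal, 16)), m.end()
    # html.entities.html5 keys carry the trailing ';', e.g. 'amp;' -> '&'.
    text = html.entities.html5.get(name + ";")
    if text is None:
        raise ValueError(f"unknown named character reference &{name};")
    return text, m.end()
```

Following the grammar literally, only a lowercase '#x' introduces a hexadecimal reference, so \&#X41; is rejected.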

NamedCharacterReference: The named character references defined by HTML5

Image of NamedCharacterReference rule

Hex: Hexadecimal digits

Image of Hex rule

Discard: Text that is discarded between tokens (whitespace, comments and other XML markup)

Image of Discard rule
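One way to apply the Discard rule during tokenisation is to transcribe each alternative into a regular expression and skip any run of them before reading the next token. A Python sketch under that assumption, with skip_discards as a hypothetical helper name:

```python
import re

# Hypothetical helper. Regex transcriptions of the Discard alternatives.
# XComment's inner [^-]* is written as [^-]; under the enclosing * both accept
# the same strings, and this avoids nested repetition.
WHITESPACE = r"[ \t\r\n]+"
XCOMMENT = r"<!--(?:[^-]|-+[^->])*-*-->"
XOTHER = r"<[?!][^>]*>"
LONG_COMMENT = r"/\*(?:[^*]|\*+[^*/])*\**\*/"
EOL_COMMENT = r"//[^\n]*\n"
# Alternatives are tried in order, so XComment comes before XOther; otherwise
# '<!--' would be consumed by XOther only up to the first '>'.
DISCARD_RE = re.compile(
    "(?:" + "|".join([WHITESPACE, XCOMMENT, XOTHER, LONG_COMMENT, EOL_COMMENT]) + ")+"
)


def skip_discards(text: str, pos: int = 0) -> int:
    """Return the index just past the run of Discard text at pos (pos if none)."""
    m = DISCARD_RE.match(text, pos)
    return m.end() if m else pos
```

As in the grammar, an EoLComment must end with a newline, so a // comment on the last line of a stream without a trailing newline is not skipped by this sketch.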

XComment: XML-style comment

Image of XComment rule

XOther: Other XML markup to be discarded, such as processing instructions and declarations

Image of XOther rule

JComment: JavaScript-style comment

Image of JComment rule

LongComment: Multi-line JavaScript-style comment

Image of LongComment rule

EoLComment: End-of-line JavaScript-style comment

Image of EoLComment rule

Whitespace: Space, tab, carriage return and line feed characters

Image of Whitespace rule

Shebang: One or more lines starting with '#!', allowed only at the start of a stream

Image of Shebang rule
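Because a Shebang may only occur at the start of a stream, a tokeniser can strip it once before it begins scanning. A minimal Python sketch, with strip_shebang as a hypothetical helper name:

```python
import re

# Hypothetical helper. Shebang ::= ('#!' [^#xA]* #xA)+
SHEBANG_RE = re.compile(r"(?:#![^\n]*\n)+")


def strip_shebang(text: str) -> str:
    """Remove Shebang lines at the very start of text; later '#!' lines are kept."""
    m = SHEBANG_RE.match(text)  # match() only tries index 0
    return text[m.end():] if m else text
```

Like the grammar, this requires each Shebang line to end with a line feed, so a lone '#!' line with no newline is left in place.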