12. Parsers
Module Parser.XML
- Method
node_to_struct
mapping(string:string|array|mapping) node_to_struct(.NSTree.NSNode|.Tree.Node rootnode)
- Description
XML parsing made easy.
- Returns
A hierarchical structure of nested mappings and arrays representing the XML structure starting at rootnode, using a minimal depth.
"" : string The text content of the node.
"/" : mapping The arguments on this node.
"..." : string The text content of a simple subnode.
"..." : array A list of subnodes.
"..." : mapping A complex subnode (recurse).
- Example
Parser.XML.node_to_struct(Parser.XML.NSTree.parse_input("<foo>bar</foo>"));
- Method
text_quote
string text_quote(string data)
- Description
Quotes the string given in data by escaping &, < and >.
Class Parser.XML.Simple
- Method
compat_allow_errors
void compat_allow_errors(string version)
- Description
Set whether the parser should allow certain errors for compatibility with earlier versions. version can be:
"7.2" Allow more data after the root element.
"7.6" Allow multiple and invalidly placed "<?xml ... ?>" and "<!DOCTYPE ... >" declarations (invalid "<?xml ... ?>" declarations are otherwise treated as normal PIs). Allow "<![CDATA[ ... ]]>" outside the root element. Allow the root element to be absent.
version can also be zero to enable all error checks.
- Method
define_entity
void define_entity(string entity, string s, function(:void) cb, mixed ... extras)
- Description
Define an entity or an SMEG.
- Parameter
entity Entity name, or SMEG name (if preceded by a "%").
- Parameter
s Expansion of the entity. Entity evaluation will be performed.
- See also
define_entity_raw()
- Method
define_entity_raw
void define_entity_raw(string entity, string raw)
- Description
Define an entity or an SMEG.
- Parameter
entity Entity name, or SMEG name (if preceded by a "%").
- Parameter
raw Verbatim expansion of the entity.
- See also
define_entity()
- Method
lookup_entity
string lookup_entity(string entity)
- Returns
Returns the verbatim expansion of the entity.
- Method
parse
array parse(string xml, string context, function(:void) cb, mixed ... extra_args)
array parse(string xml, function(:void) cb, mixed ... extra_args)
- Method
parse_dtd
mixed parse_dtd(string dtd, string context, function(:void) cb, mixed ... extras)
mixed parse_dtd(string dtd, function(:void) cb, mixed ... extras)
Class Parser.XML.Simple.Context
- Method
create
Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, string context, int flags, function(:void) cb, mixed ... extra_args)
Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, int flags, function(:void) cb, mixed ... extra_args)
- Parameter
s
- Parameter
context These two arguments are passed along to push_string().
- Parameter
flags Parser flags.
- Parameter
cb Callback function. This function gets called at various stages during the parsing.
- Method
push_string
void push_string(string s)
void push_string(string s, string context)
- Description
Add a string to parse at the current position.
- Parameter
s String to insert at the current parsing position.
- Parameter
context Optional context used to refer to the inserted string. This is typically a URL, but may also be an entity (preceded by an "&") or an SMEG reference (preceded by a "%"). It is not used by the XML parser as such, but is simply passed into the callback info mapping as the field "context", where it can be useful, e.g. for resolving relative URLs when parsing DTDs, or for determining where errors occur.
- Method
create
- Method
compat_allow_errors
Class Parser.XML.Validating
- Description
Validating XML parser.
Validates an XML file according to a DTD.
- Method
get_external_entity
string|zero get_external_entity(string sysid, string|void pubid, mapping|void info, mixed ... extra)
- Description
Get an external entity.
Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.
- Parameter
sysid The SYSTEM identifier.
- Parameter
pubid The PUBLIC identifier (if any).
- Parameter
info The callback info mapping containing the current parser state.
- Parameter
extra The extra arguments as passed to parse() or parse_dtd().
- Returns
Returns a string with a DTD fragment on success. Returns 0 (zero) on failure.
- Note
Returning zero will cause the validator to report an error.
- Note
In Pike 7.7 and earlier info had the value 0 (zero).
- Note
The default implementation always returns 0 (zero). Override this function to provide other behaviour.
- See also
parse(), parse_dtd()
- Method
parse
array parse(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, mixed ... extra)
- FIXME
Document this function
- Method
parse_dtd
array parse_dtd(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, mixed ... extra)
- FIXME
Document this function
- Method
validate
private mixed validate(string kind, string name, mapping attributes, array|string contents, mapping(string:mixed) info, function(string, string|zero, mapping|zero, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, array(mixed) extra)
- Description
The validation callback function.
- See also
::parse()
Class Parser.XML.Validating.Element
- Description
XML Element node.
Module Parser.XML.DOM
Class Parser.XML.DOM.DOMException
Class Parser.XML.DOM.DOMParser
Class Parser.XML.DOM.NonValidatingDOMParser
Module Parser.XML.NSTree
- Description
A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.
- Method
parse_input
NSNode parse_input(string data, void|string default_ns)
- Description
Takes an XML string data and produces a namespace node tree. If default_ns is given, it will be used as the default namespace.
- Throws
Throws an error when an error is encountered during XML parsing.
- Method
visualize
string visualize(Node n, void|string indent)
- Description
Makes a visualization of a node graph suitable for printing out on a terminal.
- Example
> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201
Class Parser.XML.NSTree.NSNode
- Description
Namespace aware node.
- Method
add_namespace
void add_namespace(string ns, void|string symbol, void|bool chain)
- Description
Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.
- Method
change_namespace
void change_namespace(string from, string to)
- Description
Change all elements and attributes in the subtree in namespace from to namespace to. In case an attribute is defined in both namespaces it will be overwritten.
- Method
child_namespaces
mapping child_namespaces(mapping(Node:mapping(string:string)) intermediate)
- Description
Return the defined namespaces from the tree.
- Parameter
intermediate If namespaces are clobbered, the nodes that need additional xmlns attributes are added to this mapping.
- Method
diff_namespaces
mapping(string:string) diff_namespaces()
- Description
Returns the difference between this node's namespaces and its parent's namespaces.
- Method
get_default_ns
string get_default_ns()
- Description
Returns the default namespace in the current scope.
- Method
get_defined_nss
mapping(string:string) get_defined_nss()
- Description
Returns a mapping with all the namespaces defined in the current scope, except the default namespace.
- Note
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
- Method
get_ns
string get_ns()
- Description
Returns the namespace in which the current element is defined.
- Method
get_ns_attributes
mapping(string:mapping(string:string)) get_ns_attributes()
- Description
Returns all the attributes in all namespaces that are associated with this node.
- Note
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
- Method
get_ns_attributes
mapping(string:string) get_ns_attributes(string namespace)
- Description
Returns the attributes in this node that are declared in the provided namespace.
- Method
get_ns_short
string get_ns_short(string ns)
- Description
Returns the short name for the given namespace in this context. Returns the empty string if the namespace is the default namespace. Returns 0 if the namespace is unknown.
- Method
get_short_attributes
mapping(string:string) get_short_attributes()
- Description
Return the attributes for the element with the names given their short name prefixes.
- Method
get_xml_name
string get_xml_name()
- Description
Returns the element name as it occurs in XML files, e.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found a new label will be generated by hashing the namespace.
- Method
remove_child
void remove_child(NSNode child)
- Description
remove_child is not updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.
- Method
rename_namespace
void rename_namespace(string from, string to)
- Description
Renames the namespace prefix of a namespace. No checks will be made to see if the namespace represented is the same throughout the subtree.
Module Parser.XML.SloppyDOM
- Description
A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.
Implementation note: This is generally more pragmatic than Parser.XML.DOM, meaning it's not so pretty and compliant, but more efficient.
Implementation status: There's only enough implemented to parse a node tree from source and access it, i.e. modification functions aren't implemented. Data hiding stuff like NodeList and NamedNodeMap is not implemented, partly since it's cumbersome to meet the "live" requirement. Also, Parser.HTML is used in XML mode to parse the input. Thus it's too error tolerant to be XML compliant, and it currently doesn't handle DTD elements, like "<!DOCTYPE", or the XML declaration (i.e. "<?xml version='1.0'?>").
- Method
parse
Document parse(string source, void|int raw_values)
- Description
Normally entities are decoded, and Node.xml_format will encode them again. If raw_values is nonzero then all text and attribute values are instead kept in their original form.
Class Parser.XML.SloppyDOM.Document
- Note
The node tree is very likely a cyclic structure, so it might be a good idea to destruct it when you're finished with it, to avoid garbage. Destructing the Document object always destroys all nodes in it.
- Method
get_elements
array(Element) get_elements(string name)
- Description
Note that this one looks among the top level elements, as opposed to get_elements_by_tag_name. This means that if the document is correct, you can only look up the single top level element here.
- Note
Not DOM compliant.
Class Parser.XML.SloppyDOM.Node
- Description
Basic node.
- Method
get_text_content
string get_text_content()
- Description
If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.
- See also
parse
- Method
simple_path
mapping(string:string)|Node|array(mapping(string:string)|Node)|string|zero simple_path(string path, void|int xml_format)
- Description
Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.
That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.
"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.
The currently allowed AxisSpecifier NodeTest combinations are:
name to select all elements with the given name. The name can be "*" to select all.
@name to select all attributes with the given name. The name can be "*" to select all.
comment() to select all comments.
text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.
processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.
node() to select all nodes, i.e. the whole content of an element node.
. to select the currently selected element itself.
A Predicate is of the form [PredicateExpr], where PredicateExpr currently can be in any of the following forms:
An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.
A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.
A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.
If xml_format is nonzero, the return value is an XML formatted string of all the matched nodes, in document order. Otherwise the return value is as follows:
Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression is of a form that can give at most one answer (i.e. there's a predicate with an integer index) then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.
- Note
Not DOM compliant.
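The path forms described above can be tried out with a short sketch like the following. It is only an illustration under stated assumptions: the sample XML and the helper name last_item_xml are made up here, and it assumes that simple_path can be called on the Document returned by parse(), with the first Step matched against the document's top level element.

```pike
// Hypothetical helper: returns the last <item> under <list> as an XML
// formatted string, using a negative integer predicate.
string last_item_xml(string xml)
{
  Parser.XML.SloppyDOM.Document doc = Parser.XML.SloppyDOM.parse(xml);
  // A nonzero xml_format argument makes simple_path return a string.
  return doc->simple_path("list/item[-1]", 1);
}

int main()
{
  string xml = "<list><item id='a'>one</item><item id='b'>two</item></list>";
  Parser.XML.SloppyDOM.Document doc = Parser.XML.SloppyDOM.parse(xml);

  write("%O\n", last_item_xml(xml));

  // Without xml_format, attribute matches come back as mappings.
  write("%O\n", doc->simple_path("list/item/@id"));

  // A predicate may compare the text result of a sub-path with a value.
  write("%O\n", doc->simple_path("list/item[@id='b']/text()", 1));

  // The tree is likely cyclic, so destruct the document when done.
  destruct(doc);
  return 0;
}
```

The three calls correspond to the integer predicate, the attribute NodeTest and the ="value" predicate form, respectively.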
Class Parser.XML.SloppyDOM.NodeWithChildElements
- Description
Node with child elements.
- Method
get_descendant_elements
array(Element) get_descendant_elements()- Description
Returns all descendant elements in document order.
- Note
Not DOM compliant.
- Method
get_descendant_nodes
array(Node) get_descendant_nodes()- Description
Returns all descendant nodes (except attribute nodes) in document order.
- Note
Not DOM compliant.
Module Parser.XML.Tree
- Description
XML parser that generates node-trees.
Has some support for XML namespaces (http://www.w3.org/TR/REC-xml-names/, RFC 2518 section 23.4).
- Note
This module defines two sets of node trees: the SimpleNode-based, and the Node-based. The main difference between the two is that the Node-based trees have parent pointers, which tend to generate circular data references and thus garbage.
There are some more subtle differences between the two. Please read the documentation carefully.
- Constant
XML_ATTR
constant int Parser.XML.Tree.XML_ATTR
- Description
Attribute nodes are created on demand.
- Method
attribute_quote
string attribute_quote(string data, void|string ignore)
- Description
Quotes the string given in data by escaping &, <, >, ' and ".
- Method
parse_file
Node parse_file(string path, bool|void parse_namespaces)
- Description
Loads the XML file path, creates a node tree representation and returns the root node.
- Method
parse_input
RootNode parse_input(string data, void|bool no_fallback, void|bool force_lowercase, void|mapping(string:string) predefined_entities, void|bool parse_namespaces, ParseFlags|void flags)
- Description
Takes an XML string and produces a node tree.
- Note
flags is not used for PARSE_WANT_ERROR_CONTEXT, PARSE_FORCE_LOWERCASE or PARSE_ENABLE_NAMESPACES since they are covered by the separate flag arguments.
- Method
roxen_attribute_quote
string roxen_attribute_quote(string data, void|string ignore)
- Description
Quotes strings just like attribute_quote, but entities in the form &foo.bar; will not be quoted.
- Method
roxen_text_quote
string roxen_text_quote(string data)
- Description
Quotes strings just like text_quote, but entities in the form &foo.bar; will not be quoted.
- Method
simple_parse_file
SimpleRootNode simple_parse_file(string path, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
- Description
Loads the XML file path, creates a SimpleNode tree representation and returns the root node.
- Method
simple_parse_input
SimpleRootNode simple_parse_input(string data, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
- Description
Takes an XML string and produces a SimpleNode tree.
- Method
text_quote
string text_quote(string data)
- Description
Quotes the string given in data by escaping &, < and >.
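As a small illustration of the difference between the two quoting functions in this module, the sketch below quotes the same string for use as element text and as an attribute value. The sample string is made up for the example.

```pike
int main()
{
  string raw = "if (a < b && b > c) say \"hi\"";

  // text_quote() escapes only &, < and >; the double quotes are kept.
  write("%s\n", Parser.XML.Tree.text_quote(raw));
  // → if (a &lt; b &amp;&amp; b &gt; c) say "hi"

  // attribute_quote() additionally escapes ' and ", so the result can be
  // placed inside a quoted attribute value.
  write("<x note=\"%s\"/>\n", Parser.XML.Tree.attribute_quote(raw));
  return 0;
}
```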
Enum Parser.XML.Tree.ParseFlags
- Description
Flags used together with simple_parse_input() and simple_parse_file().
Class Parser.XML.Tree.AbstractNode
- Annotations
@Pike.Annotations.Implements(AbstractSimpleNode)
- Description
Base class for nodes with parent pointers.
- Method
add_child
AbstractNode add_child(AbstractNode c)
- Description
Adds the node c to the list of children of this node. The new node is added last in the list.
- Note
Returns the new child node, NOT the current node.
- Returns
The new child node is returned.
- Method
add_child_after
AbstractNode add_child_after(AbstractNode c, AbstractNode old)
- Description
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
- Returns
The current node.
- Method
add_child_before
AbstractNode add_child_before(AbstractNode c, AbstractNode old)
- Description
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
- Returns
The current node.
- Method
clone
AbstractNode clone(void|int(-1..1) direction)
- Description
Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set; if direction is 1 the cloned node's children will be set.
- Method
fix_tree
void fix_tree()
- Description
Fix all parent pointers recursively in a tree that has been built with tmp_add_child.
- Method
get_ancestors
array(AbstractNode) get_ancestors(bool include_self)
- Description
Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.
- Method
get_following
array(AbstractNode) get_following()
- Description
Returns all the nodes that follow after the current one.
- Method
get_following_siblings
array(AbstractNode) get_following_siblings()
- Description
Returns all following siblings, i.e. all siblings present after this node in the parent's children list.
- Method
get_preceding
array(AbstractNode) get_preceding()
- Description
Returns all preceding nodes, excluding this node's ancestors.
- Method
get_preceding_siblings
array(AbstractNode) get_preceding_siblings()
- Description
Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.
- Method
get_root
AbstractNode get_root()
- Description
Follows all parent pointers and returns the root node.
- Method
get_siblings
array(AbstractNode) get_siblings()
- Description
Returns all siblings, including this node.
- Method
low_clone
optional AbstractNode low_clone()
- Description
Returns an initialized copy of the node.
- Note
The returned node has no children, and no parent.
- Method
remove_child
void remove_child(AbstractNode c)
- Description
Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.
- Method
remove_node
void remove_node()
- Description
Removes this node from its parent. The parent reference is set to null.
- Method
replace_child
AbstractNode|zero replace_child(AbstractNode old, AbstractNode|array(AbstractNode) new)
- Description
Replaces the first occurrence of the old node child with the new node child or children. All parent references are updated.
- Note
The returned value is NOT the current node.
- Returns
Returns the new child node.
- Method
replace_children
void replace_children(array(AbstractNode) children)
- Description
Replaces the node's children with the provided ones. All parent references are updated.
- Method
replace_node
AbstractNode|array(AbstractNode) replace_node(AbstractNode|array(AbstractNode) new)
- Description
Replaces this node with the provided one.
- Returns
Returns the new node.
- Method
tmp_add_child
- Method
tmp_add_child_before
- Method
tmp_add_child_after
AbstractNode tmp_add_child(AbstractNode c)
AbstractNode tmp_add_child_before(AbstractNode c, AbstractNode old)
AbstractNode tmp_add_child_after(AbstractNode c, AbstractNode old)
- Description
Variants of add_child, add_child_before and add_child_after that don't set the parent pointer in the newly added children.
This is useful while building a node tree, to get efficient refcount garbage collection if the build stops abruptly. fix_tree has to be called on the root node when the building is done.
Class Parser.XML.Tree.AbstractSimpleNode
- Description
Base class for nodes.
- Method
`[]
AbstractSimpleNode|zero res = Parser.XML.Tree.AbstractSimpleNode()[ pos ]
- Description
The [] operator indexes among the node's children, so node[0] returns the first node and node[-1] the last.
- Note
The [] operator will select a node from all the node's children, not just its element children.
- Method
add_child
AbstractSimpleNode add_child(AbstractSimpleNode c)
- Description
Adds the given node to the list of children of this node. The new node is added last in the list.
- Note
The return value differs from the one returned by Node()->add_child().
- Returns
The current node.
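Because the SimpleNode variant of add_child returns the current node rather than the new child, calls can be chained on the parent. The following is a minimal sketch of that; the helper name make_list and the element names are invented for the example, and it assumes that SimpleElementNode is constructed from a name and an attribute mapping.

```pike
import Parser.XML.Tree;

// Hypothetical helper: builds a <list> element with n <item> children.
SimpleElementNode make_list(int n)
{
  SimpleElementNode list = SimpleElementNode("list", ([]));
  for (int i = 0; i < n; i++) {
    // The return value is the parent (list), so it can be ignored here.
    list->add_child(SimpleElementNode("item", ([ "n": (string)i ])));
  }
  return list;
}

int main()
{
  SimpleElementNode list = make_list(2);

  // Chaining: both calls add to list, since each call returns list.
  list->add_child(SimpleElementNode("item", ([])))
      ->add_child(SimpleElementNode("item", ([])));

  write("%d children\n", sizeof(list->get_children()));  // 4 children
  write("%s\n", list->render_xml());
  return 0;
}
```

With the Node-based classes the same chain would instead nest the second item inside the first, since Node()->add_child() returns the new child.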
- Method
add_child_after
AbstractSimpleNode add_child_after(AbstractSimpleNode c, AbstractSimpleNode old)
- Description
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
- Returns
The current node.
- Method
add_child_before
AbstractSimpleNode add_child_before(AbstractSimpleNode c, AbstractSimpleNode old)
- Description
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
- Returns
The current node.
- Method
clone
optional AbstractSimpleNode clone()
- Description
Returns a clone of the sub-tree rooted in the node.
- Method
get_children
array(AbstractSimpleNode) get_children()
- Description
Returns all the node's children.
- Method
get_descendants
array(AbstractSimpleNode) get_descendants(bool include_self)
- Description
Returns a list of all descendants in document order. Includes this node if include_self is set.
- Method
get_last_child
AbstractSimpleNode|zero get_last_child()
- Description
Returns the last child node or zero.
- Method
iterate_children
int iterate_children(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Iterates over the node's children from left to right, calling the function callback for every node. If the callback function returns STOP_WALK the iteration is promptly aborted and STOP_WALK is returned.
- Method
low_clone
optional AbstractSimpleNode low_clone()
- Description
Returns an initialized copy of the node.
- Note
The returned node has no children.
- Method
node_factory
optional this_program node_factory(int type, string name, mapping attr, string text)
- Description
Optional factory for creating contained nodes.
- Parameter
type Type of node to create. One of:
XML_TEXT XML text. text contains a string with the text.
XML_COMMENT XML comment. text contains a string with the comment text.
XML_HEADER <?xml?>-header. attr contains a mapping with the attributes.
XML_PI XML processing instruction. name contains the name of the processing instruction and text the remainder.
XML_ELEMENT XML element tag. name contains the name of the tag and attr the attributes.
XML_DOCTYPE, DTD_ENTITY, DTD_ELEMENT, DTD_ATTLIST, DTD_NOTATION DTD information.
- Parameter
name Name of the tag if applicable.
- Parameter
attr Attributes for the tag if applicable.
- Parameter
text Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Define this function to provide application-specific XML nodes.
- Returns
Returns one of:
AbstractSimpleNode A node object representing the XML tag.
int(0) 0 (zero) if the subtree rooted here should be cut.
zero UNDEFINED to fall back to the next level of parser (i.e. behave as if this function does not exist).
- Note
This function is only relevant for XML_ELEMENT nodes.
- Note
This function is not available in Pike 7.6 and earlier.
- Note
In Pike 8.0 and earlier this function was only called in root nodes.
- Method
remove_child
void remove_child(AbstractSimpleNode c)
- Description
Removes all occurrences of the provided node from the list of children of this node.
- Method
replace_child
AbstractSimpleNode|zero replace_child(AbstractSimpleNode old, AbstractSimpleNode|array(AbstractSimpleNode) new)
- Description
Replaces the first occurrence of the old node child with the new node child or children.
- Note
The return value differs from the one returned by Node()->replace_child().
- Returns
Returns the current node on success, and 0 (zero) if the node old wasn't found.
- Method
replace_children
void replace_children(array(AbstractSimpleNode) children)
- Description
Replaces the node's children with the provided ones.
- Method
walk_inorder
int walk_inorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in inorder, left subtree first, then root node, and finally the remaining subtrees, calling the function callback for every node. If the function callback returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
- Method
walk_postorder
int walk_postorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in postorder, first subtrees from left to right, then the root node, calling the function callback for every node. If the function callback returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
- Method
walk_preorder
int walk_preorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in preorder, root node first, then subtrees from left to right, calling the callback function for every node. If the callback function returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
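A preorder walk can be sketched as follows, collecting the names of all element nodes in document order. The helper name element_names and the sample XML are made up for the example; get_any_name() is assumed to return the plain tag name when namespace parsing is not enabled.

```pike
import Parser.XML.Tree;

// Hypothetical helper: names of all element nodes, in preorder.
array(string) element_names(string xml)
{
  array(string) names = ({});
  SimpleRootNode root = simple_parse_input(xml);
  // The callback is called for every node, including the root node and
  // text nodes, so it filters on the node type. Returning nothing (void)
  // lets the walk continue.
  root->walk_preorder(lambda(object n) {
    if (n->get_node_type() == XML_ELEMENT)
      names += ({ n->get_any_name() });
  });
  return names;
}

int main()
{
  // Parents are visited before their children, siblings left to right.
  write("%O\n", element_names("<a><b><c/></b><d/></a>"));
  return 0;
}
```

For this input the names should come out in the order a, b, c, d. With walk_postorder the children would be reported before their parents.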
- Method
walk_preorder_2
int walk_preorder_2(function(AbstractSimpleNode, mixed ... : int|void) cb_1, function(AbstractSimpleNode, mixed ... : int|void) cb_2, mixed ... args)
- Description
Traverse the node subtree in preorder, root node first, then subtrees from left to right. For each node we call cb_1 before iterating through children, and then cb_2 (which always gets called even if the walk is aborted earlier). If a callback function returns STOP_WALK the descent is aborted and STOP_WALK is returned once all waiting cb_2 functions have been called.
Class Parser.XML.Tree.AttributeNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.CommentNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDAttlistNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDElementNode
- Annotations
@Pike.Annotations.Implements(Node)
@Pike.Annotations.Implements(DTDElementHelper)
Class Parser.XML.Tree.DTDEntityNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDNotationNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DoctypeNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.ElementNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.HeaderNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.Node
- Annotations
@Pike.Annotations.Implements(AbstractNode)
@Pike.Annotations.Implements(VirtualNode)
- Description
XML node with parent pointers.
- Method
get_attribute_nodes
array(Node) get_attribute_nodes()
- Description
Creates and returns an array of new nodes; they will not be added as proper children to the parent node, but the parent link in the nodes is set so that upwards traversal is made possible.
Class Parser.XML.Tree.PINode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.RootNode
- Annotations
@Pike.Annotations.Implements(Node)
- Description
The root node of an XML-tree consisting of Nodes.
- Method
create
Parser.XML.Tree.RootNode Parser.XML.Tree.RootNode(string|void data, mapping|void predefined_entities, ParseFlags|void flags)
- Method
flush_node_id_cache
void flush_node_id_cache()
- Description
Clears the node id cache built and used by get_element_by_id.
- Method
get_element_by_id
ElementNode get_element_by_id(string id, int|void force)
- Description
Find the element with the specified id.
- Parameter
id The XML id of the node to search for.
- Parameter
force Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
- Returns
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
- See also
flush_node_id_cache
Class Parser.XML.Tree.SimpleCommentNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDAttlistNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDElementNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
@Pike.Annotations.Implements(DTDElementHelper)
Class Parser.XML.Tree.SimpleDTDEntityNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDNotationNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDoctypeNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleElementNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleHeaderNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleNode
- Annotations
@Pike.Annotations.Implements(AbstractSimpleNode)
@Pike.Annotations.Implements(VirtualNode)
- Description
XML node without parent pointers and attribute nodes.
Class Parser.XML.Tree.SimplePINode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleRootNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)- Description
The root node of an XML-tree consisting of
SimpleNodes.
- Method
create
Parser.XML.Tree.SimpleRootNodeParser.XML.Tree.SimpleRootNode(string|voiddata,mapping|voidpredefined_entities,ParseFlags|voidflags,string|voiddefault_namespace)
- Method
flush_node_id_cache
voidflush_node_id_cache()- Description
Clears the node id cache built and used by
get_element_by_id.
- Method
get_element_by_id
SimpleElementNodeget_element_by_id(stringid,int|voidforce)- Description
Find the element with the specified id.
- Parameter
id The XML id of the node to search for.
- Parameter
force Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
- Returns
Returns the element node with the specified id, if any. Returns UNDEFINED otherwise.
- See also
flush_node_id_cache
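- Example
A minimal sketch of an id lookup, assuming the cache is keyed on the plain "id" attribute of each element. The XML snippet, the id values and the helper name find_by_id are made up for illustration.

```pike
// Hypothetical helper: parse a document and look up one element by id.
Parser.XML.Tree.SimpleElementNode find_by_id(string xml, string id)
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);
  // The first lookup builds the id cache; later lookups on the same
  // root reuse it until flush_node_id_cache() is called or force is set.
  return root->get_element_by_id(id);
}

int main()
{
  string xml =
    "<doc><item id='a'>first</item><item id='b'>second</item></doc>";
  Parser.XML.Tree.SimpleElementNode node = find_by_id(xml, "a");
  if (node)
    write("found <%s>\n", node->get_tag_name());
  else
    write("no such id\n");
  return 0;
}
```

If the tree is modified after the first lookup, passing a non-zero force argument (or calling flush_node_id_cache() first) should make the next lookup rebuild the cache.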
Class Parser.XML.Tree.SimpleTextNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.TextNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.VirtualNode
- Description
Node in XML tree
- Method
cast
(int)Parser.XML.Tree.VirtualNode()
(float)Parser.XML.Tree.VirtualNode()
(string)Parser.XML.Tree.VirtualNode()
(array)Parser.XML.Tree.VirtualNode()
(mapping)Parser.XML.Tree.VirtualNode()
(multiset)Parser.XML.Tree.VirtualNode()
- Description
It is possible to cast a node to a string, which will return render_xml() for that node.
- Method
create
Parser.XML.Tree.VirtualNode Parser.XML.Tree.VirtualNode(int type, string|zero name, mapping|zero attr, string|zero text)
- Method
get_attributes
mapping(string:string) get_attributes()
- Description
Returns this node's attributes. The returned mapping can be altered destructively to change the node's attributes.
- See also
replace_attributes()
- Method
get_elements
array(AbstractNode) get_elements(string|void name, bool|void full)
- Description
Returns all element children of this node.
- Parameter
name If provided, only elements with that name are returned.
- Parameter
full If specified, name matching will be done against the full name.
- Returns
Returns an array with matching nodes.
- Method
get_first_element
AbstractNode|zero get_first_element(string|void name, bool|void full)
- Description
Returns the first element child of this node.
- Parameter
name If provided, the first element child with that name is returned.
- Parameter
full If specified, name matching will be done against the full name.
- Returns
Returns the first matching node, and 0 if no such node was found.
- Method
get_full_name
string get_full_name()
- Description
Returns the fully qualified name of the element node.
- Method
get_node_type
int get_node_type()
- Description
Returns the node type. See defined node type constants.
- Method
get_short_attributes
mapping get_short_attributes()
- Description
Returns this node's namespace-adjusted attributes.
- Note
set_short_namespaces() or set_short_attributes() must have been called before calling this function.
- Method
get_tag_name
string get_tag_name()
- Description
Returns the name of the element node, or the nearest element above if an attribute node.
- Method
render_to_file
void render_to_file(Stdio.File f, void|bool preserve_roxen_entities)
- Description
Creates an XML representation of the node subtree and streams the output to the file f. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
- Method
render_xml
string render_xml(void|bool preserve_roxen_entities, void|mapping(string:string) namespace_lookup, void|string encoding, void|int(2bit) quote_mode)
- Description
Creates an XML representation of the node subtree. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
- Parameter
namespace_lookup Mapping from namespace prefix to namespace symbol prefix.
- Parameter
encoding Force a specific output character encoding. By default the encoding set in the document XML processing instruction will be used, with UTF-8 as a fallback. Setting this value will change the XML processing instruction, if present.
- Parameter
quote_mode
0 Defaults to single quote, but use double quote if it avoids escaping.
1 Defaults to double quote, but use single quote if it avoids escaping.
2 Use only single quote.
3 Use only double quote.
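- Example
A small sketch of rendering a parsed tree back to text. The input document and the helper name render_with_quotes are illustrative assumptions; UNDEFINED is passed for the arguments that should keep their defaults.

```pike
// Hypothetical helper: parse xml and render it with a given quote_mode.
string render_with_quotes(string xml, int quote_mode)
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);
  return root->render_xml(UNDEFINED, UNDEFINED, UNDEFINED, quote_mode);
}

int main()
{
  string xml = "<a x='1'><b/>text</a>";
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);

  // Default rendering, and the equivalent string cast.
  write("%s\n", root->render_xml());
  write("%s\n", (string)root);

  // quote_mode 3: attribute values use double quotes only.
  write("%s\n", render_with_quotes(xml, 3));
  return 0;
}
```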
- Method
replace_attributes
void replace_attributes(mapping(string:string) attrs)
- Description
Replace the entire set of attributes.
- See also
get_attributes()
- Method
set_short_attributes
void set_short_attributes(mapping short_attrs)
- Description
Sets this node's namespace-adjusted attributes.
- Method
set_tag_name
void set_tag_name(string name)
- Description
Change the tag name destructively. Can only be used on element and processing-instruction nodes.
Class Parser.XML.Tree.XMLNSParser
- Description
Namespace aware parser.
Class Parser.XML.Tree.XMLParser
- Description
Mixin for parsing XML.
Uses Parser.XML.Simple to perform the actual parsing.
- Method
node_factory
protected AbstractSimpleNode node_factory(int type, string name, mapping attr, string text)
- Description
Factory for creating nodes.
- Parameter
type Type of node to create. One of:
XML_TEXT XML text. text contains a string with the text.
XML_COMMENT XML comment. text contains a string with the comment text.
XML_HEADER <?xml?>-header. attr contains a mapping with the attributes.
XML_PI XML processing instruction. name contains the name of the processing instruction and text the remainder.
XML_ELEMENT XML element tag. name contains the name of the tag and attr the attributes.
XML_DOCTYPE DTD information.
DTD_ENTITY
DTD_ELEMENT
DTD_ATTLIST
DTD_NOTATION
- Parameter
name Name of the tag if applicable.
- Parameter
attr Attributes for the tag if applicable.
- Parameter
text Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Overload this function to provide application-specific XML nodes.
- Returns
Returns a node object representing the XML tag, or 0 (zero) if the subtree rooted in the tag should be cut.
- Note
This function is not available in Pike 7.6 and earlier.
- See also
node_factory_dispatch(), AbstractSimpleNode()->node_factory()
Class Parser.HTML
- Description
This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.
The simple way to use it is to give it some information about available tags and containers, and what callbacks those are to call.
The object is easily reused by calling the clone() function.
- See also
add_tag, add_container, finish
- Method
_inspect
mapping _inspect()
- Description
This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.
The format and contents of this mapping may change without further notice.
- Method
_set_tag_callback
Method _set_entity_callback
Method _set_data_callback
Parser.HTML _set_tag_callback(function(:void)|string|array to_call)
Parser.HTML _set_entity_callback(function(:void)|string|array to_call)
Parser.HTML _set_data_callback(function(:void)|string|array to_call)
- Description
These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.
The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. Both endtags and normal tags are called; there is no container parsing.
The return values from the callbacks are handled in the same way as the return values from callbacks registered with add_tag and similar functions.
The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.
If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.
- Returns
Returns the object being called.
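- Example
A minimal sketch of a catch-all tag callback, assuming no extra arguments have been set with set_extra(). The helper name collect_tags and the sample markup are hypothetical.

```pike
// Hypothetical helper: record the raw text of every tag the parser sees.
array(string) collect_tags(string markup)
{
  array(string) seen = ({});
  Parser.HTML p = Parser.HTML();
  p->_set_tag_callback(lambda(Parser.HTML parser, string raw) {
                         seen += ({ raw });
                         // An empty array emits nothing and is not reparsed.
                         return ({});
                       });
  p->finish(markup);
  p->read();  // Drain the output; only the side effect matters here.
  return seen;
}

int main()
{
  // Both the start tag and the end tag trigger the callback.
  write("%O\n", collect_tags("a<b>c</b>d"));
  return 0;
}
```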
- Method
add_tag
Method add_container
Method add_entity
Method add_quote_tag
Method add_tags
Method add_containers
Method add_entities
Parser.HTML add_tag(string name, mixed to_do)
Parser.HTML add_container(string name, mixed to_do)
Parser.HTML add_entity(string entity, mixed to_do)
Parser.HTML add_quote_tag(string name, mixed to_do, string end)
Parser.HTML add_tags(mapping(string:mixed) tags)
Parser.HTML add_containers(mapping(string:mixed) containers)
Parser.HTML add_entities(mapping(string:mixed) entities)
- Description
Registers the actions to take when parsing various things. Tags, containers, entities are as usual.
add_quote_tag() adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.
- Parameter
to_do This argument can be any of the following.
function(:void) The function will be called as a callback function. It will get the following arguments, depending on the type of callback.
mixed tag_callback(Parser.HTML parser, mapping args, mixed ... extra)
mixed container_callback(Parser.HTML parser, mapping args, string content, mixed ... extra)
mixed entity_callback(Parser.HTML parser, mixed ... extra)
mixed quote_tag_callback(Parser.HTML parser, string content, mixed ... extra)
string This tag/container/entity is then replaced by the string. The string is normally not reparsed, i.e. it's equivalent to writing a function that returns the string in an array (but a lot faster). If reparse_strings is set the string will be reparsed, though.
array The first element is a function as above. It will receive the rest of the array as extra arguments. If extra arguments are given by set_extra(), they will appear after the ones in this array.
int(0..) If there is a tag/container/entity with the given name in the parser, it's removed.
The callback function can return:
string This string will be pushed on the parser stack and be parsed. Be careful not to return anything in this way that could lead to an infinite recursion.
array The element(s) of the array is the result of the function. This will not be parsed. This is useful for avoiding infinite recursion. The array can be of any size, which means the empty array is the most efficient thing to return if you don't care about the result. If the parser is operating in mixed_mode, the array can contain anything. Otherwise only strings are allowed.
int(0) This means "don't do anything", i.e. the item that generated the callback is left as it is, and the parser continues.
int(1) Reparse the last item again. This is useful to parse a tag as a container, or vice versa: just add or remove callbacks for the tag and return this to jump to the right callback.
- Returns
Returns the object being called.
- See also
tags, containers, entities
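- Example
The following sketch registers one container, one tag and one entity. The tag names, replacement strings and the helper name transform are illustrative choices, not part of the API.

```pike
// Hypothetical helper: rewrite a small fragment of markup.
string transform(string markup)
{
  Parser.HTML p = Parser.HTML();

  // Container callback: receives the parser, the argument mapping and
  // the unparsed content. Returning an array avoids reparsing.
  p->add_container("b", lambda(Parser.HTML parser, mapping args,
                               string content) {
                          return ({ "*" + content + "*" });
                        });

  // A plain string replaces the tag as is (not reparsed by default).
  p->add_tag("hr", "----");

  // Entities are registered without the surrounding "&" and ";".
  p->add_entity("amp", "&");

  return p->finish(markup)->read();
}

int main()
{
  write("%s\n", transform("x <b>bold</b><hr> &amp; y"));
  return 0;
}
```

Returning arrays from callbacks, as the container callback does here, is the usual way to avoid the recursion that a returned string could trigger.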
- Method
at
Method at_line
Method at_char
Method at_column
array(int) at()
int at_line()
int at_char()
int at_column()
- Description
Returns the current position. Characters and columns count from 0, lines count from 1.
at() gives an array with the following layout.
Array
int 0 Line.
int 1 Character.
int 2 Column.
- Method
case_insensitive_tag
int case_insensitive_tag(void|int value)
- Description
All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.
- Method
clear_tags
Method clear_containers
Method clear_entities
Method clear_quote_tags
Parser.HTML clear_tags()
Parser.HTML clear_containers()
Parser.HTML clear_entities()
Parser.HTML clear_quote_tags()
- Description
Removes all registered definitions in the different categories.
- Returns
Returns the object being called.
- See also
add_tag, add_tags, add_container, add_containers, add_entity, add_entities
- Method
clone
Parser.HTML clone(mixed ... args)
- Description
Clones the Parser.HTML object. A new object of the same class is created, filled with the parse setup from the old object.
This is the simplest way of flushing a parse feed/output.
The arguments to clone are sent to the new object, simplifying work for custom classes that inherit Parser.HTML.
- Returns
Returns the new object.
- Note
create is called _before_ the setup is copied.
- Method
tags
Method containers
Method entities
mapping(string:mixed) tags()
mapping(string:mixed) containers()
mapping(string:mixed) entities()
- Description
Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: These run in constant time since they return copy-on-write mappings.
- See also
add_tag, add_tags, add_container, add_containers, add_entity, add_entities
- Method
context
string context()
- Description
Returns the current output context as a string.
"data" In top level data. This is always returned when called from tag or container callbacks.
"arg" In an unquoted argument.
"splice_arg" In a splice argument.
The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.
This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.
- See also
splice_arg
- Method
current
string current()
- Description
Gives the current range of data, i.e. the whole tag/entity/etc being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.
- Method
feed
Parser.HTML feed()
Parser.HTML feed(string s, void|int do_parse)
- Description
Feed new data to the Parser.HTML object. This will start a scan and may result in callbacks. Note that not all of the data fed is necessarily processed; to make sure it is, call finish().
If the function is called without arguments, no data is fed, but the parser is run. If the string argument is followed by a 0, as in ->feed(s,0);, the string is fed, but the parser isn't run.
- Returns
Returns the object being called.
- See also
finish, read, feed_insert
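- Example
A sketch of incremental parsing, where a tag is split across two chunks. The helper name parse_in_chunks, the "br" replacement and the sample chunks are assumptions made for illustration.

```pike
// Hypothetical helper: feed the parser piece by piece, then finish.
string parse_in_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_tag("br", "\n");

  foreach (chunks, string chunk)
    p->feed(chunk);  // An incomplete tag waits for more data.

  p->finish();       // Process whatever is still pending.
  return p->read();
}

int main()
{
  // The "<br>" tag arrives in two pieces.
  write("%O\n", parse_in_chunks(({ "one<b", "r>two" })));
  return 0;
}
```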
- Method
feed_insert
Parser.HTML feed_insert(string s)
- Description
This pushes a string on the parser stack.
- Returns
Returns the object being called.
- Note
Don't use!
- Method
finish
Parser.HTML finish()
Parser.HTML finish(string s)
- Description
Finish a parser pass. A string may be sent here, similar to feed().
- Returns
Returns the object being called.
- Method
get_extra
array get_extra()
- Description
Gets the extra arguments set by set_extra().
- Returns
Returns the array of extra arguments.
- Method
ignore_tags
int ignore_tags(void|int value)
- Description
Do not look for tags at all. Normally tags are matched even when there are no callbacks for them at all. When this is set, the tag delimiters '<' and '>' will be treated as any normal character.
- Method
ignore_unknown
int ignore_unknown(void|int value)
- Description
Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.
- Note
When functions are specified with _set_tag_callback() or _set_entity_callback(), all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.
- Method
lazy_argument_end
int lazy_argument_end(void|int value)
- Description
A '>' in a tag argument closes both the argument and the tag, even if the argument is quoted.
- Method
lazy_entity_end
int lazy_entity_end(void|int value)
- Description
Normally, the parser searches indefinitely for the entity end character (i.e. ';'). When this flag is set, the characters '&', '<', '>', '"', ''', and any whitespace break the search for the entity end, and the entity text is then ignored, i.e. treated as data.
- Method
match_tag
int match_tag(void|int value)
- Description
Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.
- Method
max_parse_depth
int max_parse_depth(void|int value)
- Description
Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is 10.
- Method
mixed_mode
int mixed_mode(void|int value)
- Description
Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.
- Method
parse_tag_args
mapping parse_tag_args(string tag)
- Description
Parses the tag arguments from a tag string without the name and surrounding brackets, i.e. a string of the form "some='tag' args".
- Returns
Returns a mapping containing the tag arguments.
- See also
tag_args
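- Example
A short sketch of parsing an argument string on its own, outside of any callback. The argument string and the helper name split_args are made up.

```pike
// Hypothetical helper: turn "a=1 b='two'" style text into a mapping.
mapping split_args(string arg_string)
{
  return Parser.HTML()->parse_tag_args(arg_string);
}

int main()
{
  mapping args = split_args("href='x.html' target=_blank");
  write("href:   %O\n", args->href);
  write("target: %O\n", args->target);
  return 0;
}
```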
- Method
parse_tag_name
string parse_tag_name(string tag)
- Description
Parses the tag name from a tag string without the surrounding brackets, i.e. a string of the form "tagname some='tag' args".
- Returns
Returns the tag name or an empty string if none.
- Method
quote_stapling
int quote_stapling(int|void enable)
- Description
Enable old-style attribute quoting by stapling.
- Parameter
enable Enable/disable the mode. Defaults to keeping the old setting.
- Returns
Returns the prior setting.
- Note
Any use of this mode is discouraged, and is only provided for compatibility with versions of Pike prior to 8.0.
- Note
Note also that this mode will output runtime warnings whenever the mode has had an effect on the parsing.
- Method
quote_tags
mapping(string:array(mixed|string)) quote_tags()
- Description
Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: quote_tags() allocates a new mapping for every call and thus, unlike e.g. tags(), runs in linear time.
- See also
add_quote_tag
- Method
read
string|array(mixed) read()
string|array(mixed) read(int max_elems)
- Description
Read parsed data from the parser object.
- Returns
Returns a string of parsed data if the parser isn't in mixed_mode, an array of arbitrary data otherwise.
- Method
reparse_strings
int reparse_strings(void|int value)
- Description
When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.
- Method
set_extra
Parser.HTML set_extra(mixed ... args)
- Description
Sets the extra arguments passed to all tag, container and entity callbacks.
- Returns
Returns the object being called.
- Method
splice_arg
string splice_arg(void|string name)
- Description
If given a string, it sets the splice argument name to it. It returns the old splice argument name.
If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:
<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
becomes
<foo arg1="val 1" arg2='val 2' arg3 arg4>
if "splice" is set as the splice argument name.
- Method
tag
array tag(void|mixed default_value)
- Description
Returns the equivalent of the following calls.
Array
string 0 tag_name()
mapping(string:mixed) 1 tag_args(default_value)
string 2 tag_content()
- Method
tag_args
mapping(string:mixed) tag_args(void|mixed default_value)
- Description
Gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero.
default_value is used for arguments which have no value in the tag. If default_value isn't given, the value is set to the same string as the key.
- Method
tag_content
string tag_content()
- Description
Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.
- Method
tag_name
string|zero tag_name()
- Description
Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.
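- Example
A sketch of querying the current tag from inside a callback, here a catch-all callback installed with _set_tag_callback(). The helper name describe_tags and the sample markup are hypothetical.

```pike
// Hypothetical helper: list ({ name, args }) for every tag in markup.
array(array) describe_tags(string markup)
{
  array(array) found = ({});
  Parser.HTML p = Parser.HTML();
  p->_set_tag_callback(lambda(Parser.HTML parser, string raw) {
                         // tag_name() and tag_args() refer to the tag
                         // that triggered this callback.
                         found += ({ ({ parser->tag_name(),
                                        parser->tag_args() }) });
                         return ({});
                       });
  p->finish(markup)->read();
  return found;
}

int main()
{
  foreach (describe_tags("<a href='x'>text"), array entry)
    write("%O %O\n", entry[0], entry[1]);
  return 0;
}
```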
- Method
write_out
Parser.HTML write_out(mixed ... args)
- Description
Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.
Any data is allowed when the parser is running in mixed_mode. Only strings are allowed otherwise.
- Returns
Returns the object being called.
- Method
ws_before_tag_name
int ws_before_tag_name(void|int value)
- Description
Allow whitespace between the tag start character and the tag name.
- Method
xml_tag_syntax
int xml_tag_syntax(void|int value)
- Description
Whether or not to use XML syntax to tell empty tags and container tags apart.
0 Use HTML syntax only. If there's a '/' last in a tag, it's just treated as any other argument.
1 Use HTML syntax, but ignore a '/' if it comes last in a tag. This is the default.
2 Use XML syntax, but when a tag that does not end with '/>' is found which only got a non-container tag callback, treat it as a non-container (i.e. don't start to seek for the container end).
3 Use XML syntax only. If a tag got both container and non-container callbacks, the non-container callback is called when the empty element form (i.e. the one ending with '/>') is used, and the container callback otherwise. If only a container callback exists, it gets the empty string as content when there's none to be parsed. If only a non-container callback exists, it will be called (without the content argument) for both kinds of tags.
Module Parser
- Method
decode_numeric_xml_entity
string|zero decode_numeric_xml_entity(string chref)
- Description
Decodes the numeric XML entity chref, e.g. "#x34" for the character "4", and returns the character as a string. chref is the name part of the entity, i.e. without the leading '&' and trailing ';'. Returns zero if chref isn't of a recognized form or if the character number is too large to be represented in a string.
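- Example
A brief sketch of decoding character references, assuming both the hexadecimal ("#x...") and decimal ("#...") forms are recognized. The sample inputs are arbitrary.

```pike
int main()
{
  // Hexadecimal and decimal references, given without "&" and ";".
  write("%O\n", Parser.decode_numeric_xml_entity("#x34"));
  write("%O\n", Parser.decode_numeric_xml_entity("#65"));

  // Not a numeric reference: zero is expected here.
  write("%O\n", Parser.decode_numeric_xml_entity("amp"));
  return 0;
}
```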
- Method
encode_html_entities
string encode_html_entities(string raw)
- Description
Encode characters to HTML entities, e.g. turning "<" into "&lt;".
The characters that will be encoded are characters <= 32, "\"&'<>" and characters >= 127 and <= 160 and characters >= 255.
- Method
get_xml_parser
HTML get_xml_parser()
- Description
Returns a Parser.HTML initialized for parsing XML. It has all the flags set properly for XML syntax and callbacks to ignore comments, CDATA blocks and unknown PI tags, but it has no registered tags and doesn't decode any entities.
- Method
html_entity_parser
Method parse_html_entities
HTML html_entity_parser()
string parse_html_entities(string in)
HTML html_entity_parser(int noerror)
string parse_html_entities(string in, int noerror)
- Description
Parse any HTML entities in the string to Unicode characters. Either return a complete parser (to build on or use) or parse a string. Unless noerror is set, an error is thrown if the string contains an unrecognized entity.
- Note
Currently using XHTML 1.0 tables.
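- Example
A small round-trip sketch combining encode_html_entities() and parse_html_entities(). The helper name roundtrip and the sample text are illustrative only.

```pike
// Hypothetical helper: encode a string and decode it again.
string roundtrip(string raw)
{
  string encoded = Parser.encode_html_entities(raw);
  return Parser.parse_html_entities(encoded);
}

int main()
{
  string raw = "a < b & c";
  write("encoded: %s\n", Parser.encode_html_entities(raw));
  write("decoded: %s\n", roundtrip(raw));
  return 0;
}
```

Note that encode_html_entities() also encodes characters <= 32, so spaces in the encoded form appear as numeric entities.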
Class Parser.CSV
- Description
This is a parser for line-oriented data that is either comma, semicolon or tab separated. It extends the functionality of Parser.Tabular with some specific functionality related to a header and record-oriented parsing of huge datasets.
Only the differences from the basic Parser.Tabular are documented here.
- See also
Parser.Tabular
- Method
fetchrecord
mapping fetchrecord(void|array|mapping format)
- Description
This function consumes a single record from the input. To be used in conjunction with parsehead().
- Returns
It returns the mapping describing the record.
- See also
parsehead(), fetch()
- Method
parsehead
int parsehead(void|string delimiters, void|string|object matchfieldname)
- Description
This function consumes the header-line preceding a typical comma, semicolon or tab separated value list and autocompiles a format description from that. After this function has successfully parsed a header-line, you can proceed with either fetchrecord() or fetch() to get the remaining records.
- Parameter
delimiters Explicitly specify a string containing all the characters that should be considered field delimiters. If not specified or empty, the function will try to autodetect the single delimiter in use.
- Parameter
matchfieldname A string containing a regular expression, using Regexp.SimpleRegexp syntax, or an object providing a Regexp.SimpleRegexp.match() single string argument compatible method, that must match all the individual fieldnames before the header will be considered valid.
- Returns
It returns true if a CSV head has successfully been parsed.
- See also
fetchrecord(), fetch(), compile()
Class Parser.RCS
- Description
A RCS file parser that eats a RCS *,v file and presents nice pike data structures of its contents.
- Constant
max_revisions_supported
constant int Parser.RCS.max_revisions_supported
- Description
Feature detection constant for the max_revisions argument to create(), parse() and parse_delta_sections().
- Variable
access
array(string) Parser.RCS.access
- Description
The usernames listed in the ACCESS section of the RCS file.
- Variable
branch
string|int(0) Parser.RCS.branch
- Description
The default branch (or revision), if present, 0 otherwise.
- Variable
branches
mapping(string:string) Parser.RCS.branches
- Description
Maps branch numbers (indices) to branch names (values).
- Note
The indices are short branch revision numbers (i.e. "1.1.2" and not "1.1.0.2").
- Variable
comment
string|int(0) Parser.RCS.comment
- Description
The RCS file comment if present, 0 otherwise.
- Variable
expand
string Parser.RCS.expand
- Description
The keyword expansion options (as named by RCS) if present, 0 otherwise.
- Variable
locks
mapping(string:string) Parser.RCS.locks
- Description
Maps from username to revision for users that have acquired locks on this file.
- Variable
rcs_file_name
string Parser.RCS.rcs_file_name
- Description
The filename of the RCS file as sent to create().
- Variable
revisions
mapping(string:Revision) Parser.RCS.revisions
- Description
Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.
- Variable
strict_locks
bool Parser.RCS.strict_locks
- Description
1 if strict locking is set, 0 otherwise.
- Variable
tags
mapping(string:string) Parser.RCS.tags
- Description
Maps tag names (indices) to tagged revision numbers (values).
- Note
This mapping typically contains raw revision numbers for branches (i.e. "1.1.0.2" and not "1.1.2").
- Variable
trunk
array(Revision) Parser.RCS.trunk
- Description
Data for all revisions on the trunk, sorted in the same order as the RCS file stored them, i.e. descending, most recent first, I'd assume (rcsfile(5), of course, fails to state such irrelevant information).
- Method
create
Parser.RCS Parser.RCS(string|void file_name, string|int(0)|void file_contents, void|int max_revisions)
- Description
Initializes the RCS object.
- Parameter
file_name The path to the raw RCS file (includes trailing ",v"). Used mainly for error reporting (truncated RCS file or similar). Stored in rcs_file_name.
- Parameter
file_contents If a string is provided, that string will be parsed to initialize the RCS object. If a zero (0) is sent, no initialization will be performed at all. If no value is given at all, but file_name was provided, that file will be loaded and parsed for object initialization.
- Parameter
max_revisions Maximum number of revisions to process. If unset, all revisions will be processed.
- Method
expand_keywords_for_revision
string|zero expand_keywords_for_revision(string|Revision rev, string|void text, int|void expansion_mode)
- Description
Expand keywords and return the resulting text according to the expansion rules set for the file.
- Parameter
rev The revision to apply the expansion for.
- Parameter
text If supplied, substitute keywords in that text instead, using values that would apply for the given revision. Otherwise, the contents of revision rev are used.
- Parameter
expansion_mode Expansion mode
1 Perform expansion even if the file was checked in as binary.
0 Perform expansion only if the file was checked in as non-binary with expansion enabled.
-1 Perform contraction if the file was checked in as non-binary.
- Note
The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv, -ko and -kb expansion modes.
- Note
Does not perform any line-ending conversion.
- See also
get_contents_for_revision
- Method
get_contents_for_revision
string|zero get_contents_for_revision(string|Revision rev, void|bool dont_cache_data)
- Description
Returns the file contents from the revision rev, without performing any keyword expansion. If dont_cache_data is set, intermediate revisions are not kept in memory unless they already existed. This cuts down memory use at the expense of slower access to older revisions.
- See also
expand_keywords_for_revision()
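- Example
A sketch of typical read-only use, under the assumption that an RCS file exists at the given path. The path "example.c,v" is hypothetical, and the head member is the one named in the parse_admin_section description.

```pike
int main()
{
  // Hypothetical path; replace with a real RCS ",v" file.
  string path = "example.c,v";
  if (!Stdio.exist(path)) {
    werror("No such file: %s\n", path);
    return 1;
  }

  // With only a file name given, the file is loaded and parsed.
  Parser.RCS rcs = Parser.RCS(path);

  // List the trunk revisions with their authors.
  foreach (rcs->trunk, Parser.RCS.Revision rev)
    write("%s by %s\n", rev->revision, rev->author);

  // Fetch the unexpanded contents of the head revision.
  string|zero text = rcs->get_contents_for_revision(rcs->head);
  if (text)
    write("head revision is %d characters long\n", sizeof(text));
  return 0;
}
```

To get keyword-expanded text instead, expand_keywords_for_revision() takes the same revision argument.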
- Method
parse
this_program parse(array raw, void|function(string:void) progress_callback, void|int max_revisions)
- Description
Parse the RCS file raw and fully initialize all members of this object.
- Parameter
raw The unprocessed RCS file.
- Parameter
progress_callback Passed on to parse_deltatext_sections.
- Parameter
max_revisions Maximum number of revisions to process. If unset, all revisions will be processed.
- Returns
The fully initialized object (only returned for API convenience; the object itself is destructively modified to match the data extracted from raw).
- See also
parse_admin_section, parse_delta_sections, parse_deltatext_sections, create
- Method
parse_admin_section
array parse_admin_section(string|array raw)
- Description
Lower-level API function for parsing only the admin section (the initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_admin_section, the RCS object will be initialized with the values for head, branch, access, branches, tokenize, tags, locks, strict_locks, comment and expand.
- Parameter
raw The tokenized RCS file, or the raw RCS-file data.
- Returns
The rest of the RCS file, admin section removed.
- See also
parse_delta_sections, parse_deltatext_sections, parse, create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Method
parse_delta_sections
array parse_delta_sections(array raw, void|int max_revisions)
- Description
Lower-level API function for parsing only the delta sections (the second chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_delta_sections, the RCS object will be initialized with the value of description and populated revisions mapping and trunk array. Their Revision members are however only populated with the members Revision->revision, Revision->branch, Revision->time, Revision->author, Revision->state, Revision->branches, Revision->rcs_next, Revision->ancestor and Revision->next.
- Parameter
raw The tokenized RCS file, with admin section removed. (See parse_admin_section.)
- Parameter
max_revisions Maximum number of revisions to process. If unset, all revisions will be processed.
- Returns
The rest of the RCS file, delta sections removed.
- See also
parse_admin_section, tokenize, parse_deltatext_sections, parse, create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Method
parse_deltatext_sections
void parse_deltatext_sections(array raw, void|function(string:void) progress_callback, array|void callback_args)
- Description
Lower-level API function for parsing only the deltatext sections (the final and typically largest chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After a parse_deltatext_sections run, the RCS object will be fully populated.
- Parameter
raw The tokenized RCS file, with admin and delta sections removed. (See parse_admin_section, tokenize and parse_delta_sections.)
- Parameter
progress_callback This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
- Parameter
callback_args Optional extra trailing arguments to be sent to progress_callback.
- See also
parse_admin_section, parse_delta_sections, parse, create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Method
tokenize
array(array(string)) tokenize(string data)
- Description
Tokenize an RCS file into tokens suitable as argument to the various parse functions.
- Parameter
data The RCS file data.
- Returns
An array with arrays of tokens.
Class Parser.RCS.DeltatextIterator
- Description
Iterator for the deltatext sections of the RCS file. Typical usage:
- Example
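string raw = Stdio.read_file(my_rcs_filename);
// Passing 0 as the contents skips parsing in create().
Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
// Parse the admin and delta sections; what remains is the deltatext part.
array rest = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
foreach(rcs->DeltatextIterator(rest); int n; Parser.RCS.Revision rev)
  do_something(rev);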
- Method
_iterator_index
protectedint_iterator_index()- Returns
the number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the rcs file)
- Method
_iterator_next
protectedbool_iterator_next()- Description
Advance the iterator one step.
Returns UNDEFINED when the iterator is finished, and 1 otherwise.
- Method
_iterator_value
protectedRevision_iterator_value()- Returns
the Revision at whose deltatext data we are, updated with its info
- Method
create
Parser.RCS.DeltatextIteratorParser.RCS.DeltatextIterator(arraydeltatext_section,void|function(string,mixed... :void)progress_callback,void|array(mixed)progress_callback_args)- Parameter
deltatext_section the deltatext section of the RCS file in its entirety
- Parameter
progress_callback This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
- Parameter
progress_callback_args Optional extra trailing arguments to be sent to
progress_callback- See also
the rcsfile(5) manpage outlines the sections of an RCS file
- Syntax
int Parser.RCS.DeltatextIterator.n
protected bool read_next()- Description
Drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.
- Method
parse_deltatext_section
protected int parse_deltatext_section(array raw, int o)- Description
Chops off the first deltatext section from the token array
raw and returns the rest of the string, or the value 0 (zero) if we had already visited the final deltatext entry. The deltatext's data is stored destructively in the appropriate entry of the revisions array.- Note
raw+o must start with a deltatext entry for this method to work- FIXME
does not handle rcsfile(5) newphrase skipping
- FIXME
if the rcs file is truncated, this method writes a descriptive error to stderr and then returns 0 - some nicer error handling wouldn't hurt
Class Parser.RCS.Revision
- Description
All data tied to a particular revision of the file.
- Variable
added
intParser.RCS.Revision.added- Description
The number of lines that were added from the previous revision to make this revision (for the initial revision too).
- See also
lines,removed
- Variable
ancestor
string|zeroParser.RCS.Revision.ancestor- Description
The revision of the ancestor of this revision, or
0 if this was the initial revision.- See also
next
- Variable
author
stringParser.RCS.Revision.author- Description
The userid of the user that committed the revision.
- Variable
branch
stringParser.RCS.Revision.branch- Description
The branch name on which this revision was committed (calculated according to how cvs manages branches).
- Variable
branches
array(string) Parser.RCS.Revision.branches- Description
When there are branches from this revision, an array with the first revision number for each of the branches, otherwise
0. Follow the
next fields to get to the branch head.
- Variable
lines
intParser.RCS.Revision.lines- Description
The number of lines this revision contained, altogether (not of particular interest for binary files).
- See also
added,removed
- Variable
next
string|zeroParser.RCS.Revision.next- Description
The revision that succeeds this revision, or
0 if none exists (i.e. if this is the HEAD of the trunk or of a branch).- See also
ancestor
- Variable
rcs_next
string|zeroParser.RCS.Revision.rcs_next- Description
The revision stored next in the RCS file, or
0 if none exists.- Note
This field is straight from the RCS file, and has somewhat weird semantics. Usually you will want to use one of the derived fields
next or prev or possibly rcs_prev.- See also
next,prev,rcs_prev
- Variable
rcs_prev
string|zeroParser.RCS.Revision.rcs_prev- Description
The revision that this revision is based on, or
0 if it is the HEAD. This is the reverse pointer of
rcs_next and branches, and is used by get_contents_for_revision() when applying the deltas to set text.- See also
rcs_next
- Variable
rcs_text
stringParser.RCS.Revision.rcs_text- Description
The raw delta as stored in the RCS file.
- See also
text,get_contents_for_revision()
- Variable
removed
intParser.RCS.Revision.removed- Description
The number of lines that were removed from the previous revision to make this revision.
- See also
lines,added
- Variable
revision
stringParser.RCS.Revision.revision- Description
The revision number (i.e.
rcs_file->revisions["1.1"]->revision == "1.1").
- Variable
state
stringParser.RCS.Revision.state- Description
The state of the revision - typically
"Exp" or "dead".
- Variable
text
string|zeroParser.RCS.Revision.text- Description
The text as committed or
0 if get_contents_for_revision() hasn't been called for this revision yet. Typically you don't access this field directly, but use
get_contents_for_revision() to retrieve it.- See also
get_contents_for_revision(),rcs_text
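As an illustration of how the Revision fields fit together, here is a small sketch that walks the history backwards through the ancestor links. It assumes that the RCS object exposes the head revision number as rcs->head (documented elsewhere in Parser.RCS, not in this section) and that the file path is given as the first command-line argument. The column layout is arbitrary.

```pike
int main(int argc, array(string) argv)
{
  if (argc < 2) {
    werror("Usage: %s file,v\n", argv[0]);
    return 1;
  }

  // With only a file name, the constructor reads and parses the file.
  Parser.RCS rcs = Parser.RCS(argv[1]);

  // Assumption: rcs->head holds the revision number of the trunk head.
  string rev = rcs->head;

  while (rev) {
    Parser.RCS.Revision r = rcs->revisions[rev];
    if (!r) break;  // Defensive: unknown revision number.

    write("%-10s %-12s %-5s +%d -%d\n",
          r->revision, r->author, r->state, r->added, r->removed);

    // ancestor is 0 for the initial revision, which ends the loop.
    rev = r->ancestor;
  }
  return 0;
}
```

Following ancestor rather than rcs_next keeps the walk in commit order regardless of how the deltas are physically chained in the file.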
Class Parser.SGML
- Description
This is a handy, simple parser for SGML-like syntax such as HTML. It doesn't do anything advanced beyond finding the corresponding end-tags.
It's used like this:
array res=Parser.SGML()->feed(string)->finish()->result();
The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.
- Example
A string
"<gat> <gurka> </gurka> <banan> <kiwi> </gat>" results in
({ tag "gat" object with data:
   ({ tag "gurka" object with data:
      ({ " " })
      tag "banan" object with data:
      ({ " "
         tag "kiwi" object with data:
         ({ " " })
      })
   })
})
i.e., simple "tags" (not containers) are not detected, but containers are ended implicitly by a surrounding container _with_ an end tag.
The 'tag' is an object with the following variables:
string name; - name of tag
mapping args; - argument to tag
int line,char,column; - position of tag
int eline,echar,ecolumn; - end position of tag, src[char..echar-1] got the block. add by Xuesong Guo
string file; - filename (see create)
array(SGMLatom) data; - contained data
int open; - is not an empty element and has no end tag. add by Xuesong Guo
- Method
create
Parser.SGMLParser.SGML()Parser.SGMLParser.SGML(stringfilename,function(:void)|voidname_formater,function(:void)|voidargname_formater)- Description
This object is created with this filename. It's passed to all created tags, for debug and trace purposes. All tag names will be replaced by name_formater(name). All argument names will be replaced by argname_formater(arg_name).
- Note
No, it doesn't read the file itself. See
feed().
- Method
feed
Method finish
Method result
object feed(string s)
array(SGMLatom|string) finish()
array(SGMLatom|string) result(string s)- Description
Feed new data to the object, or finish the stream. No result can be used until
finish() is called.
Both finish() and result() return the computed data. feed() returns the called object.
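A short sketch of feeding a string and walking the resulting atom tree follows. It relies only on what is stated above: feed() returns the parser object and finish() returns the computed data. The dump() helper and its indentation scheme are hypothetical and exist only for this illustration.

```pike
// Hypothetical helper: prints strings and tags, indented by nesting depth.
void dump(array atoms, int depth)
{
  foreach (atoms, mixed atom) {
    if (stringp(atom)) {
      // Plain text between tags.
      write("%s%O\n", "  " * depth, atom);
    } else {
      // A tag atom: show its name and start line, then its contents.
      write("%s<%s> at line %d\n", "  " * depth, atom->name, atom->line);
      dump(atom->data || ({}), depth + 1);
    }
  }
}

int main()
{
  array res = Parser.SGML()
    ->feed("<gat> <gurka> </gurka> <banan> <kiwi> </gat>")
    ->finish();
  dump(res, 0);
  return 0;
}
```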
Class Parser.SGML.SGMLatom
- Variable
name
Variable args
Variable line
Variable char
Variable column
Variable eline
Variable echar
Variable ecolumn
Variable file
Variable data
Variable open
string Parser.SGML.SGMLatom.name
mapping Parser.SGML.SGMLatom.args
int Parser.SGML.SGMLatom.line
int Parser.SGML.SGMLatom.char
int Parser.SGML.SGMLatom.column
int Parser.SGML.SGMLatom.eline
int Parser.SGML.SGMLatom.echar
int Parser.SGML.SGMLatom.ecolumn
string Parser.SGML.SGMLatom.file
array(SGMLatom) Parser.SGML.SGMLatom.data
int Parser.SGML.SGMLatom.open
Class Parser.Tabular
- Description
This is a parser for line and block oriented data. It provides a flexible yet concise record-description language to parse character/column/delimiter-organised records.
- See also
Parser.LR, http://www.wikipedia.org/wiki/Comma-separated_values, http://www.wikipedia.org/wiki/EDIFACT
- Method
compile
array|mappingcompile(string|Stdio.File|Stdio.FILEinput)- Description
Compiles the format description language into a compiled structure that can be fed to
setformat, fetch, or create. The format description is case sensitive.
The format description starts with a single line containing:
[Tabular description begin]
The format description ends with a single line containing:
[Tabular description end]
Any lines before the startline are skipped.
Any lines after the endline are not consumed.
Empty lines are skipped.
Comments start after a
# or ;.
The depth level of a field is indicated by the number of leading spaces or colons at the beginning of the line.
The fieldname must not contain any whitespace.
An arbitrary number of single character field delimiters can be specified between brackets, e.g.
[,;] or [,] would be for CSV.
When field delimiters are being used: in case of CSV type delimiters
[\t,; ] the standard CSV quoting rules apply, in case other delimiters are used, no quoting is supported and the last field on a line should not specify a delimiter, but should specify a 0 fieldwidth instead.
A fixed field width can be specified by a plain decimal integer, a value of 0 indicates a field with arbitrary length that extends till the end of the line.
A matching regular expression can be enclosed in
"", it has to match the complete field content and uses Regexp.SimpleRegexp syntax.
On records the following options are supported:
- mandatory
This record is required.
- fold
Fold this record's contents in the enclosing record.
- single
This record is present at most once.
On fields the following options are supported:
- drop
After reading and matching this field, drop the field content from the resulting mapping structure.
- See also
setformat(),create(),fetch()- Example
Example of the description language:
[Tabular description begin]
csv
:gtz
::mybankno [,]
::transferdate [,]
::mutatiesoort [,]
::volgnummer [,]
::bankno [,]
::name [,]
::kostenplaats [,] drop
::amount [,]
::afbij [,]
::mutatie [,]
::reference [,]
::valutacode [,]
mt940
:messageheader1 mandatory
::exporttime "0000" drop
::CS1 " " drop
::exportday "01" drop
::exportaddress 12
::exportnumber 5 "[0-9]+"
:messageheader3 mandatory fold single
::messagetype "940" drop
::CS1 " " drop
::messagepriority "00" drop
:TRN fold
::tag ":20:" drop
::reference "GTZPB|MPBZ|INGEB"
:accountid fold
::tag ":25:" drop
::accountno 10
:statementno fold
::tag ":28C:" drop
::settlementno 0 drop
:openingbalance mandatory single
::tag ":60F:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:statements
::statementline mandatory fold single
:::tag ":61:" drop
:::valuedate 6
:::creditdebit 1
:::amount "[0-9]+,[0-9][0-9]"
:::CS1 "N" drop
:::transactiontype 3 # 3 for Postbank, 4 for ING
:::paymentreference 0
::informationtoaccountowner fold single
:::tag ":86:" drop
:::accountno "[0-9]*( |)"
:::accountname 0
::description fold
:::description 0 "|[^:].*"
:closingbalance mandatory single
::tag ":62[FM]:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:informationtoaccountowner fold single
::tag ":86:" drop
::debit "D" drop
::debitentries 6
::credit "C" drop
::creditentries 6
::debit "D" drop
::debitamount "[0-9]+,[0-9][0-9]"
::credit "C" drop
::creditamount "[0-9]+,[0-9][0-9]" drop
::accountname "(\n[^-:][^\n]*)*" drop
:messagetrailer mandatory single
::start "-"
::end "XXX"
[Tabular description end]
- Method
create
Parser.TabularParser.Tabular(void|string|Stdio.File|Stdio.FILEinput,void|array|mapping|string|Stdio.File|Stdio.FILEformat,void|intverbose)- Description
This function initialises the parser.
- Parameter
input The input stream or string.
- Parameter
format The format to be used (either precompiled or not). The format description language is documented under
compile().- Parameter
verbose If >1, it specifies the number of characters to display of the beginning of each record as a progress indicator. Special values are:
-4 Turns on format debugging with visible mismatches.
-3 Turns on format debugging with named field contents.
-2 Turns on format debugging with field contents.
-1 Turns on basic format debugging.
0 Turns off verbosity. Default.
1 Is the same as setting it to 70.- See also
compile(),setformat(),fetch()
- Method
feed
objectfeed(stringcontent)- Parameter
content Is injected into the input stream.
- Returns
This object.
- See also
fetch()
- Method
fetch
mapping|zerofetch(void|array|mappingformat)- Description
This function consumes as much input as needed to parse the full tabular structures at once.
- Parameter
format Describes (precompiled only) formats to be parsed. If no format is specified, the format specified on
create() is used, and empty lines are automatically skipped.- Returns
A nested mapping that contains the complete structure as described in the specified format.
If nothing matches the specified format, no input is consumed (except empty lines, if the default format is used), and zero is returned.
- See also
compile(),create(),setformat(),skipemptylines()
- Method
setformat
array|mappingsetformat(array|mappingformat)- Parameter
format Replaces the default (precompiled only) format.
- Returns
The previous default format.
- See also
compile(),fetch()
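The compile/create/fetch flow described above might be used as in the following sketch. The two-column record layout (person, name, age) and the sample input are invented for illustration. It also assumes that a string passed to compile() is taken as the description text itself rather than as a file name. The exact shape of the returned mapping is not asserted here; the sketch simply prints it.

```pike
// Hypothetical format: one CSV record type with two comma-delimited fields.
constant description = #"[Tabular description begin]
csv
:person
::name [,]
::age [,]
[Tabular description end]
";

int main()
{
  // Precompile the description (assumption: the string is the
  // description text, not a path).
  array|mapping format = Parser.Tabular()->compile(description);

  // Parse two sample records from an in-memory string.
  Parser.Tabular parser = Parser.Tabular("alice,30\nbob,25\n", format);

  // fetch() returns zero if nothing matched the format.
  mapping result = parser->fetch();
  write("%O\n", result);
  return 0;
}
```

Precompiling is useful when the same description is applied to many inputs, since fetch() and setformat() accept precompiled formats only.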
Module Parser.C
- Method
group
array(Token|array) group(array(string|Token)tokens,void|mapping(string:string)groupings)- Description
Fold sub blocks of an array of tokens into sub arrays, for grouping purposes.
- Parameter
tokens The token array to fold.
- Parameter
groupings Supplies the tokens marking the boundaries of blocks to fold. The indices of the mapping mark the start of a block, the corresponding values mark where the block ends. The sub arrays will start and end in these tokens. If no groupings mapping is provided, {}, () and [] are used as block boundaries.
- Method
hide_whitespaces
arrayhide_whitespaces(arraytokens)- Description
Folds all whitespace tokens into the previous token's trailing_whitespaces.
- Method
reconstitute_with_line_numbers
stringreconstitute_with_line_numbers(array(string|Token|array)tokens)- Description
Like
simple_reconstitute, but adding additional #line n "file" preprocessor statements in the output wherever a new line or file starts.
- Method
simple_reconstitute
stringsimple_reconstitute(array(string|Token|array)tokens)- Description
Reconstitutes the token array into a plain string again; essentially reversing
split() and whichever of the tokenize, group and hide_whitespaces methods may have been invoked.
- Method
split
array(string) split(stringdata,void|mapping(string:string)state)- Description
Splits the
data string into an array of tokens. An additional element with a newline will be added to the resulting array of tokens. If the optional argument state is provided the split function is able to pause and resume splitting inside #"" and /**/ tokens. The state argument should be an initially empty mapping, in which split will store its state between successive calls.
- Method
strip_line_statements
array(Token|array) strip_line_statements(array(Token|array)tokens)- Description
Strips off all (preprocessor) line statements from a token array.
- Method
tokenize
array(Token) tokenize(array(string)s,void|stringfile)- Description
Returns an array of
Token objects given an array of string tokens.
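The module functions above are usually combined into a pipeline. The following sketch shows one plausible ordering using only the functions documented in this module. The sample source string and the "example.c" file label are arbitrary.

```pike
int main()
{
  string src = "int add(int a, int b) { return a + b; }\n";

  // 1. Split into raw string tokens (whitespace is a token too).
  array(string) parts = Parser.C.split(src);

  // 2. Wrap the strings in Token objects; the file name is only a label.
  array tokens = Parser.C.tokenize(parts, "example.c");

  // 3. Fold whitespace tokens into the preceding token.
  tokens = Parser.C.hide_whitespaces(tokens);

  // 4. Nest {}, () and [] blocks into sub arrays (default groupings).
  array grouped = Parser.C.group(tokens);

  write("%d top-level elements\n", sizeof(grouped));

  // 5. Turn the structure back into a plain string.
  write("%s", Parser.C.simple_reconstitute(grouped));
  return 0;
}
```

Hiding whitespace before grouping keeps the grouped arrays free of whitespace-only elements, while simple_reconstitute() can still restore the text from each token's trailing_whitespaces.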
Class Parser.C.Token
- Description
Represents a C token, along with a selection of associated data and operations.
- Variable
trailing_whitespaces
stringParser.C.Token.trailing_whitespaces- Description
Trailing whitespaces.
- Method
_sprintf
stringsprintf(stringformat, ... Parser.C.Tokenarg ... )- Description
If the object is printed as %s it will only output its text contents.
- Method
`+
stringres =Parser.C.Token()+s- Description
A string can be added to the Token, which will be added to the text contents.
- Method
`==
intres =Parser.C.Token()==foo- Description
Tokens are considered equal if the text contents are equal. It is also possible to compare the Token object with a text string directly.
- Method
`[]
int|stringres =Parser.C.Token()[a]- Description
Characters and ranges may be indexed from the text contents of the token.
- Method
``+
stringres =s+Parser.C.Token()- Description
A string can be added to the Token, which will be added to the text contents.
- Method
cast
(int)Parser.C.Token()
(float)Parser.C.Token()
(string)Parser.C.Token()
(array)Parser.C.Token()
(mapping)Parser.C.Token()
(multiset)Parser.C.Token()- Description
It is possible to cast a Token object to a string. The text content will be returned.
Class Parser.C.UnterminatedCharacterError
- Description
Error thrown when an unterminated character token is encountered.
Class Parser.C.UnterminatedCommentError
- Description
Error thrown when an unterminated comment token is encountered.
Module Parser.ECMAScript
- Description
ECMAScript/JavaScript token parser based on ECMAScript 2017 (ECMA-262), chapter 11: Lexical Grammar.
Module Parser.LR
- Description
LALR(1) parser generator.
Enum Parser.LR.SeverityLevel
- Description
Severity level
Class Parser.LR.ErrorHandler
- Description
Class handling reporting of errors and warnings.
- Variable
verbose
optionalint(-1..1)Parser.LR.ErrorHandler.verbose- Description
Verbosity level
-1 Just errors.
0 Errors and warnings.
1 Also notices.
Class Parser.LR.Parser
- Description
This object implements an LALR(1) parser and compiler.
Normal use of this object would be:
set_error_handler
{add_rule, set_priority, set_associativity}*
set_symbol_to_string
compile
{parse}*
- Variable
error_handler
function(SeverityLevel,string,string,mixed... :void) Parser.LR.Parser.error_handler- Description
Compile error and warning handler.
- Variable
known_states
mapping(string:Kernel) Parser.LR.Parser.known_states- Description
LR0 states that are already known to the compiler.
- Variable
s_q
StateQueue|zeroParser.LR.Parser.s_q- Description
Contains all states used. In the queue section are the states that remain to be compiled.
- Method
_sprintf
stringsprintf(stringformat, ... Parser.LR.Parserarg ... )- Description
Pretty-prints the current grammar to a string.
- Method
cast
(int)Parser.LR.Parser()
(float)Parser.LR.Parser()
(string)Parser.LR.Parser()
(array)Parser.LR.Parser()
(mapping)Parser.LR.Parser()
(multiset)Parser.LR.Parser()- Description
Implements casting.
- Parameter
type Type to cast to.
- Method
compile
intcompile()- Description
Compiles the grammar into a parser, so that parse() can be called.
- Method
item_to_string
stringitem_to_string(Itemi)- Description
Pretty-prints an item to a string.
- Parameter
i Item to pretty-print.
- Method
parse
mixedparse(object|function(void:string|array(string|mixed))scanner,void|objectaction_object)- Description
Parse the input according to the compiled grammar. The last value reduced is returned.
- Note
The parser must have been compiled (with compile()) prior to calling this function.
- Bugs
Errors should be throw()n.
- Parameter
scanner The scanner function. It returns the next symbol from the input. It should either return a string (terminal) or an array with a string (terminal) and a mixed (value). EOF is indicated with the empty string.
- Parameter
action_object Object used to resolve those actions that have been specified as strings.
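The scanner protocol described above (a string for a bare terminal, an array of terminal and value otherwise, and the empty string for EOF) can be sketched as a closure. The make_scanner() helper and the terminal names "num" and "+" are hypothetical; a real grammar would need to declare matching terminals. The sketch only drives the scanner by hand and does not call parse().

```pike
// Hypothetical helper: returns a scanner function over a simple
// arithmetic string such as "12 + 30".
function(void:string|array) make_scanner(string input)
{
  int pos = 0;
  return lambda() {
    // Skip blanks between tokens.
    while (pos < sizeof(input) && input[pos] == ' ') pos++;

    // End of input is signalled with the empty string.
    if (pos >= sizeof(input)) return "";

    int c = input[pos];
    if (c >= '0' && c <= '9') {
      // A number: return the terminal name together with its value.
      int start = pos;
      while (pos < sizeof(input) && input[pos] >= '0' && input[pos] <= '9')
        pos++;
      return ({ "num", (int)input[start..pos - 1] });
    }

    // Any other character is returned as a one-character terminal.
    pos++;
    return input[pos - 1..pos - 1];
  };
}

int main()
{
  function scanner = make_scanner("12 + 30");
  mixed tok;
  while ((tok = scanner()) != "")
    write("%O\n", tok);
  return 0;
}
```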
- Method
rule_to_string
stringrule_to_string(Ruler)- Description
Pretty-prints a rule to a string.
- Parameter
r Rule to print.
- Method
set_associativity
voidset_associativity(stringterminal,intassoc)- Description
Sets the associativity of a terminal.
- Parameter
terminal Terminal to set the associativity for.
- Parameter
assoc Associativity; negative - left, positive - right, zero - no associativity.
- Method
set_error_handler
voidset_error_handler(void|function(SeverityLevel,string,string,mixed... :void)handler)- Description
Sets the error report function.
- Parameter
handler Function to call to report errors and warnings. If zero or not specified, use the built-in function.
- Method
set_priority
voidset_priority(stringterminal,intpri_val)- Description
Sets the priority of a terminal.
- Parameter
terminal Terminal to set the priority for.
- Parameter
pri_val Priority; higher = prefer this terminal.
- Method
set_symbol_to_string
voidset_symbol_to_string(void|function(int|string:string)s_to_s)- Description
Sets the symbol to string conversion function. The conversion function is used by the various *_to_string functions to make comprehensible output.
- Parameter
s_to_s Symbol to string conversion function. If zero or not specified, use the built-in function.
- Method
state_to_string
stringstate_to_string(Kernelstate)- Description
Pretty-prints a state to a string.
- Parameter
state State to pretty-print.
Class Parser.LR.Parser.Item
- Description
An LR(0) item, a partially parsed rule.
- Variable
direct_lookahead
multiset(string) Parser.LR.Parser.Item.direct_lookahead- Description
Look-ahead set for this item.
- Variable
error_lookahead
multiset(string) Parser.LR.Parser.Item.error_lookahead- Description
Look-ahead set used for detecting conflicts
- Variable
item_id
intParser.LR.Parser.Item.item_id- Description
Used to identify the item. Equal to r->number + offset.
- Variable
master_item
Item|zeroParser.LR.Parser.Item.master_item- Description
Item representing this one (used for shifts).
- Variable
next_state
Kernel|zeroParser.LR.Parser.Item.next_state- Description
The state we will get if we shift according to this rule
- Variable
number
intParser.LR.Parser.Item.number- Description
Item identification number (used when compiling).
- Variable
offset
intParser.LR.Parser.Item.offset- Description
How long into the rule the parsing has come.
Class Parser.LR.Parser.Kernel
- Description
Implements an LR(1) state
- Variable
action
mapping(int|string:Kernel|Rule) Parser.LR.Parser.Kernel.action- Description
The action table for this state
object(kernel) SHIFT to this state on this symbol. object(rule) REDUCE according to this rule on this symbol.
- Variable
closure_set
multisetParser.LR.Parser.Kernel.closure_set- Description
The symbols that closure has been called on.
- Variable
item_id_to_item
mapping(int:Item) Parser.LR.Parser.Kernel.item_id_to_item- Description
Used to lookup items given rule and offset
- Variable
rules
multiset(Rule) Parser.LR.Parser.Kernel.rules- Description
Used to check if a rule already has been added when doing closures.
- Variable
symbol_items
mapping(int:multiset(Item)) Parser.LR.Parser.Kernel.symbol_items- Description
Contains the items whose next symbol is this non-terminal.
- Method
closure
voidclosure(intnonterminal)- Description
Make the closure of this state.
- Parameter
nonterminal Nonterminal to make the closure on.
- Method
do_goto
Kerneldo_goto(int|stringsymbol)- Description
Generates the state reached when doing goto on the specified symbol. i.e. it compiles the LR(0) state.
- Parameter
symbol Symbol to make goto on.
Class Parser.LR.Parser.StateQueue
- Description
This is a queue, which keeps the elements even after they are retrieved.
Class Parser.LR.Priority
- Description
Specifies the priority and associativity of a rule.
Class Parser.LR.Rule
- Description
This object is used to represent a BNF-rule in the LR parser.
- Variable
action
function(:void)|string|zeroParser.LR.Rule.action- Description
Action to do when reducing this rule.
function - call this function.
string - call this function by name in the object given to the parser.
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default rule is to return the first argument.
- Variable
num_nonnullables
intParser.LR.Rule.num_nonnullables- Description
This rule has this many non-nullable symbols at the moment.
- Variable
number
intParser.LR.Rule.number- Description
Sequence number of this rule (used for conflict resolving). Also used to identify the rule.
- Method
create
Parser.LR.RuleParser.LR.Rule(intnt,array(string|int)r,function(:void)|string|voida)- Description
Create a BNF rule.
- Example
The rule
rule : nonterminal ":" symbols ";" { add_rule };
might be created as
rule(4, ({ 9, ":", 5, ";" }), "add_rule");
where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.
- Parameter
nt Non-terminal to reduce to.
- Parameter
r Symbol sequence that reduces to nt.
- Parameter
a Action to do when reducing according to this rule.
function - Call this function.
string - Call this function by name in the object given to the parser.
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default rule is to return the first argument.
Module Parser.LR.GrammarParser
- Description
This module generates an LR parser from a grammar specified according to the following grammar:
directives : directive ;
directives : directives directive ;
directive : declaration ;
directive : rule ;
declaration : "%token" terminals ";" ;
rule : nonterminal ":" symbols ";" ;
rule : nonterminal ":" symbols action ";" ;
symbols : symbol ;
symbols : symbols symbol ;
terminals : terminal ;
terminals : terminals terminal ;
symbol : nonterminal ;
symbol : "string" ;
action : "{" "identifier" "}" ;
nonterminal : "identifier" ;
terminal : "string";
- Method
make_parser
Parsermake_parser(stringstr,object|voidm)- Description
Compiles the parser-specification given in the first argument. Named actions are taken from the object if available, otherwise left as is.
- Bugs
Returns error-code in both GrammarParser.error and return_value->lr_error.
Module Parser.Markdown
- Description
This is a port of the JavaScript Markdown parser 'Marked' https://github.com/chjj/marked. The only method you need to use is
parse(), which will transform Markdown text to HTML. For a description of Markdown, go to the web page of the inventor of Markdown https://daringfireball.net/projects/markdown/.
- Method
encode_html
protectedstringencode_html(stringhtml,void|boolenc)- Description
HTML encode <>"'. If
enc is true, & will also be encoded.
- Method
encode_tex
protectedstringencode_tex(stringstr)- Description
TeX encode special characters, e.g. #&\\ etc.
- Method
parse
stringparse(stringmd,void|mappingoptions)- Description
Convert Markdown
md to HTML.- Parameter
options "gfm" : bool Enable GitHub Flavoured Markdown. (true)
"tables" : bool Enable GFM tables. Requires "gfm" (true)
"breaks" : bool Enable GFM "breaks". Requires "gfm" (false)
"pedantic" : bool Conform to obscure parts of markdown.pl as much as possible. Don't fix any of the original markdown bugs or poor behavior. (false)
"sanitize" : bool Sanitize the output. Ignore any HTML that has been input. (false)
"mangle" : bool Mangle (obfuscate) autolinked email addresses (true)
"smart_lists" : bool Use smarter list behavior than the original markdown. (true)
"smartypants" : bool Use "smart" typographic punctuation for things like quotes and dashes. (false)
"header_prefix" : string Add prefix to ID attributes of header tags (empty)
"xhtml" : bool Generate self closing XHTML tags (false)
"newline" : bool Add a newline after tags. If false the output will be on one line (well, newlines in text will be kept). (false)
"renderer" : Renderer Use this renderer to render output. (Renderer)
"lexer" : Lexer Use this lexer to parse blocks of text. (Lexer)
"inline_lexer" : InlineLexer Use this lexer to parse inline text. (InlineLexer)
"parser" : Parser Use this parser instead of the default. (Parser)
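A minimal usage sketch of parse() with a couple of the options listed above follows. The sample Markdown text is made up, and the choice of "newline" and "xhtml" is only meant to show how the options mapping is passed.

```pike
int main()
{
  string md =
    "# Title\n"
    "\n"
    "Some *emphasised* text and a [link](https://pike.lysator.liu.se/).\n";

  // Options not given here keep the defaults listed above.
  string html = Parser.Markdown.parse(md, ([
    "newline": 1,  // one tag per line instead of a single long line
    "xhtml": 1,    // self-closing tags
  ]));

  write("%s\n", html);
  return 0;
}
```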
- Method
replace1
protectedstringreplace1(stringsubject,stringfrom,stringto)- Description
Replaces the first occurrence of
from in subject with to.
Class Parser.Markdown.InlineLexer
- Description
Lexer used for inline text (eg bold text inside a paragraph).
Class Parser.Markdown.LaTeXRenderer
- Method
attrs
stringattrs(mappingtoken,mapping|voiddflt)- Description
Attributes are currently not supported and will be ignored.
- Method
html
Method text
Method strong
Method em
Method del
Method codespan
Method br
string html(string text, mapping token)
string text(string t, mapping token)
string strong(string t, mapping token)
string em(string t, mapping token)
string del(string t, mapping token)
string codespan(string t, mapping token)
string br(mapping token)
Class Parser.Markdown.Lexer
- Description
Block-level lexer (parses paragraphs, lists, tables, etc).
Class Parser.Markdown.Parser
- Description
Top-level parsing handler. It's usually easier to replace the Renderer instead.
Class Parser.Markdown.Renderer
- Method
attrs
stringattrs(mappingtoken,mapping|voiddflt)- Description
Collect additional attributes from the token and render them as HTML attributes. Default attributes can be provided.
- Method
html
Method text
Method strong
Method em
Method del
Method codespan
Method br
string html(string text, mapping token)
string text(string t, mapping token)
string strong(string t, mapping token)
string em(string t, mapping token)
string del(string t, mapping token)
string codespan(string t, mapping token)
string br(mapping token)
Module Parser.Pike
- Description
This module parses and tokenizes Pike source code.
Module Parser.Python
Module Parser._parser
- Description
Low-level helpers for parsers.
- Note
You probably don't want to use the modules contained in this module directly, but instead use the other
Parser modules. See instead the modules below.- See also
Parser,Parser.C,Parser.Pike,Parser.RCS,Parser.HTML,Parser.XML
Module Parser._parser._C
- Description
Low-level helpers for
Parser.C.- Note
You probably want to use
Parser.C instead of this module.- See also
Parser.C,_Pike.
Module Parser._parser._Pike
- Description
Low-level helpers for
Parser.Pike.- Note
You probably want to use
Parser.Pike instead of this module.- See also
Parser.Pike,_C.
Module Parser._parser._RCS
- Description
Low-level helpers for
Parser.RCS.- Note
You probably want to use
Parser.RCS instead of this module.- See also
Parser.RCS
- Method
decode_numeric_xml_entity