Mixp is a Scheme interface to James Clark's expat library. It may be used to parse XML documents with Guile.
If you do not know expat, first have a look at the sample programs (see section 1.1 Sample programs). Typically, you will create a parser object with expat:parser-create, then associate one or more handlers with it (usually with expat:set-element-handler and expat:set-character-data-handler), then parse the document with expat:parse or mixp:parse-file. The most commonly used functions are documented here. You may guess how the others work from their prototypes. See also the test programs in the test/ subdirectory of the distribution.
If you already know expat, you will easily find what you are looking for by taking a C expat function name, replacing XML_ with expat:, using hyphens instead of capital letters to separate the words, and searching for the result in the reference documentation (see section 2 Expat interface). In most cases, the prototype is the same, modulo the differences between C and Scheme.
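The renaming rule above can be sketched in a few lines of plain Scheme. This is only an illustration: the procedure name c-name->mixp-name is hypothetical and is not part of Mixp, and the sketch assumes that the C name starts with XML_ and contains no runs of consecutive capital letters.

```scheme
;; Hypothetical helper, not part of Mixp: apply the renaming rule
;; described above to a C expat function name.
;; Assumption: C-NAME starts with the four characters "XML_".
(define (c-name->mixp-name c-name)
  (let loop ((chars (string->list
                     (substring c-name 4 (string-length c-name))))
             (first? #t)
             (acc '()))
    (cond ((null? chars)
           (string-append "expat:" (list->string (reverse acc))))
          ((char-upper-case? (car chars))
           ;; A capital letter starts a new word: insert a hyphen
           ;; (except before the very first word) and lowercase it.
           (loop (cdr chars) #f
                 (cons (char-downcase (car chars))
                       (if first? acc (cons #\- acc)))))
          (else
           (loop (cdr chars) #f (cons (car chars) acc))))))

(display (c-name->mixp-name "XML_SetElementHandler"))
(newline)
;; prints: expat:set-element-handler
```

The resulting name is what you would then search for in the reference documentation.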
Another source of information, maybe more accurate, is expat itself: expat.html, xmlparse.h, and http://www.jclark.com/xml/expatfaq.html.
The following sample program reads an XML file (provided with the Mixp distribution) and prints its start and end tags. You can launch a Guile shell from the samples/ directory of the distribution and execute this code. Your GUILE_LOAD_PATH variable should contain the directory in which you installed Mixp (that is, the directory which contains the xml/ subdirectory).
(use-modules (xml expat) (xml mixp))

;; Create the parser object
(let ((parser (expat:parser-create)))
  ;; Specify callback functions
  (expat:set-element-handler parser
    (lambda (p name attribs)
      (display "start ")(display name)(newline))
    (lambda (p name)
      (display "end ")(display name)(newline)))
  ;; Parse the file
  (mixp:parse-file parser "REC-xml-19980210.xml"))
For more information about the Expat interface and handlers, see section 2 Expat interface.
The following sample program builds a hierarchical tree structure from an XML document which resides in a string. This tree structure should be easy to use with traditional Scheme functions.
(use-modules (xml mixp))

(let ((xml-doc "<foo name='Paul'><bar>Some text</bar><void/></foo>"))
  (display (call-with-input-string xml-doc mixp:xml->tree)))
The result is:
((element ("foo" (("name" . "Paul"))) (element ("bar" ()) (character-data "Some text")) (element ("void" ()))))
For more information about this interface, see section 3 High-level extensions.
From the Guile shell or from a Guile script, you should type the following commands before using the Mixp API:
(use-modules (xml expat))
(use-modules (xml mixp))
Actually, you may load just xml:expat if you intend to use only the raw expat interface (i.e. the functions whose name is prefixed by expat:, see section 2 Expat interface). You need xml:mixp if you want to use the extension functions (see section 3 High-level extensions).
Mixp contains two Scheme modules:
xml:expat
immediately. All the functions in this module are prefixed with expat:.
xml:mixp
mixp:parse-file than expat:parse. This module may evolve into a higher-level interface, for example an object-based interface. All the functions in this module are prefixed with mixp:.
From another point of view, Mixp contains two files: a shared library, libexpat.so, which defines the xml:expat interface and a part of the xml:mixp interface, and a Scheme file, mixp.scm, which defines the other parts of the xml:mixp interface. Both files are located in the xml directory somewhere along your GUILE_LOAD_PATH.
This section describes a few common tasks which may be solved with Mixp.
mixp:parse-data without specifying a parser. A default one will be created, and it will do nothing interesting except raise an error if the document contains one:
(call-with-input-string "<doc><elem></elem>" mixp:parse-data)
See section 3 High-level extensions.
<title> and </title> in an XML document:
(use-modules (xml expat) (xml mixp))

(let ((parser (expat:parser-create))
      (in-title-p #f)   ;; becomes #t inside the tag
      (title ""))       ;; will contain the result
  (expat:set-element-handler parser
    (lambda (p name attribs)
      (if (equal? name "title")
          (set! in-title-p #t)))
    (lambda (p name)
      (if (equal? name "title")
          (set! in-title-p #f))))
  (expat:set-character-data-handler parser
    (lambda (data value)
      (if (equal? in-title-p #t)
          (set! title (string-append title value)))))
  (mixp:call-with-input-string "<doc><title>Hello</title></doc>" parser)
  (display title)(newline))
mixp:xml->tree.
(call-with-input-file "file.xml" mixp:xml->tree)
See section 3 High-level extensions.
DOCTYPE declaration and expand the entities.
(use-modules (xml expat) (xml mixp) (ice-9 format))
(use-syntax (ice-9 syncase))

;; Create the parser object
(let ((parser (expat:parser-create))
      (external-entity-ref-handler
       (lambda (my-parser context base system-id public-id)
         (display (format "Reference to external entity: ~A.\n" system-id))
         (open-file system-id "r"))))
  (expat:set-param-entity-parsing parser 'expat:XML_PARAM_ENTITY_PARSING_ALWAYS)
  ;; Specify callback functions
  (expat:set-character-data-handler parser
    (lambda (p value)
      (display "Char: ")
      (display value)
      (display ".\n")))
  (expat:set-external-entity-ref-handler parser external-entity-ref-handler)
  ;; Parse the file
  (mixp:parse-file parser "foo.xml"))
This section contains the reference documentation for the expat interface, i.e. the xml:expat module.
Handlers are functions called by the parser at specific points in the XML document (for example when finding a new tag, or a comment, etc). This section explains how to specify these handlers, and what their prototype should be.
Most of the time, the first argument will be user-data. user-data is either a user-specified buffer (see expat:set-user-data) or the parser object itself (see expat:use-parser-as-handler-arg).
parser is the parser object returned by expat:parser-create.
start-handler is a function to be called when an opening tag is found. This function will be called as follows:
(start-handler user-data name attributes)
name is the tag name. attributes is an association list which contains the attributes with their values.
end-handler is a function to be called when a closing tag is found (</foo>). This function will be called as follows:
(end-handler user-data name)
The arguments have the same meaning as in the start handler.
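Since attributes is an association list, the usual Scheme procedures such as assoc can be used on it. The following sketch shows a start handler which looks up one attribute. The names attribute-ref and my-start-handler are hypothetical, and the handler is called by hand here (with #f as user-data) instead of being registered with a parser, so that the example does not depend on Mixp.

```scheme
;; Hypothetical helper: return the value of attribute NAME in the
;; association list ATTRIBUTES, or DEFAULT if it is absent.
(define (attribute-ref attributes name default)
  (let ((pair (assoc name attributes)))
    (if pair (cdr pair) default)))

;; A start handler with the prototype described above.
(define (my-start-handler user-data name attributes)
  (display name)
  (display " name=")
  (display (attribute-ref attributes "name" "(none)"))
  (newline))

;; Call the handler by hand, with the arguments the parser would
;; pass for the tag <foo name='Paul'>.
(my-start-handler #f "foo" '(("name" . "Paul")))
;; prints: foo name=Paul
```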
(handler user-data value)
value is the text encoded in UTF-8.
Sets the processing instruction handler, which should have the following prototype:
(handler user-data pi-data)
This handler will be called by Mixp every time it finds a processing instruction (<? ... ?>).
Sets the comment handler, which should have the following prototype:
(handler user-data comment-data)
This handler will be called by Mixp every time it finds a comment (<!-- ... -->).
Sets the CDATA section handlers, which should have the following prototypes:
(start-handler user-data)
(end-handler user-data)
These handlers will be called by Mixp every time it finds a CDATA section (<![CDATA[ ... ]]>): start-handler at the beginning of the section and end-handler at its end.
Sets the default handler and also inhibits expansion of internal entities. The entity reference will be passed to the default handler.
(handler user-data string)
The default handler is called for any characters in the XML document for which there is no applicable handler. This includes both characters that are part of markup which is of a kind that is not reported (comments, markup declarations), or characters that are part of a construct which could be reported but for which no handler has been supplied. The characters are passed exactly as they were in the XML document except that they will be encoded in UTF-8. Line boundaries are not normalized. Note that a byte order mark character is not passed to the default handler. There are no guarantees about how characters are divided between calls to the default handler: for example, a comment might be split between multiple calls.
Sets the default handler but does not inhibit expansion of internal entities. The entity reference will not be passed to the default handler.
(handler user-data string)
See expat:set-default-handler for a description of the handler.
Sets the unparsed entity declaration handler, which should have the following prototype:
(handler user-data entity-name base system-id public-id notation-name)
The handler is called by Mixp every time it finds a declaration of an unparsed entity (`<!ENTITY Antarctica SYSTEM "http://www.antarctica.net" NDATA vrml>').
The base argument is whatever was set by expat:set-base. The entity-name, system-id and notation-name arguments will never be #f. The other arguments may be.
(handler user-data notation-name base system-id public-id)
The base argument is whatever was set by expat:set-base. notation-name will never be #f. The other arguments may be.
Sets the Namespace declaration handler, which should have the following prototype:
(start-namespace-decl-handler user-data prefix uri)
(end-namespace-decl-handler user-data prefix uri)
When namespace processing is enabled, these are called once for each namespace declaration. The calls to the start and end element handlers occur between the calls to the start and end namespace declaration handlers. For an xmlns attribute, prefix will be null. For an xmlns="" attribute, uri will be null.
Sets the Standalone declaration handler, which should have the following prototype:
(not-standalone-handler user-data)
This is called if the document is not standalone (it has an external subset or a reference to a parameter entity, but does not have standalone="yes"). If this handler returns 0, then processing will not continue, and the parser will return an expat:XML_ERROR_NOT_STANDALONE error.
(external-entity-ref-handler user-data context base system-id public-id)
The external entity reference handler is called by Mixp when it finds a reference to an external entity in the document. For example, the <!DOCTYPE ...> declaration contains an external entity reference when it specifies an external DTD. In that case, you should also call (expat:set-param-entity-parsing parser), because you probably want Mixp to expand the references to entities declared in your DTD. See section 1.4 How to... for an example.
The external entity reference handler should return an open port to the external entity. For example, assuming that system-id refers to a relative file path, you may define the handler as follows:
(lambda (my-parser context base system-id public-id)
  (display (format "Reference to external entity: ~A.\n" system-id))
  (open-file system-id "r"))
The system identifier is defined by the XML specification as a URI. Therefore, the example above will only work if you know that the system id is actually a file path. You may need to use, for example, an HTTP library if you want to support URIs which start with "http://".
Note that the behaviour of this handler is very different in expat.
Also see expat:set-external-entity-ref-handler-arg.
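As a sketch of the caveat about URIs, a handler may check the system identifier before trying to open it as a file. The names system-id-remote? and open-external-entity are hypothetical, and the sketch simply refuses identifiers which start with "http://" instead of fetching them. It only assumes what is stated above, namely that the handler receives the system identifier as a string and must return an open port.

```scheme
;; Hypothetical predicate: does SYSTEM-ID start with "http://"?
(define (system-id-remote? system-id)
  (let ((prefix "http://"))
    (and (>= (string-length system-id) (string-length prefix))
         (string=? (substring system-id 0 (string-length prefix))
                   prefix))))

;; Hypothetical handler: open local files, refuse remote URIs.
(define (open-external-entity my-parser context base system-id public-id)
  (if (system-id-remote? system-id)
      (error "Remote external entities are not supported:" system-id)
      ;; Assumption: any other system id is a usable file path.
      (open-file system-id "r")))
```

Such a procedure could then be passed to expat:set-external-entity-ref-handler in place of the lambda shown earlier.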
(unknown-encoding-handler encoding-handler-data name info)
#t if there is none specified.
expat:use-parser-as-handler-arg has been called. This value can be any Scheme value.
expat:set-user-data and retrieved with expat:get-user-data.
expat:get-error-code to obtain more information. The last call to expat:parse must have is-final set to #t.
expat:parse has returned 0 (i.e. an error). The result is a symbol which may have one of the following values:
An error message describing the error may be obtained by calling expat:error-string.
expat:get-error-code on the parser object.
Expat supports the following encodings: UTF-8, UTF-16, ISO-8859-1, US-ASCII.
The encoding is usually indicated in the first line of an XML file (the <?xml... ?> declaration). However, all the data you receive in your handlers (tag names, attributes, character data...) will be encoded in UTF-8, whatever the original encoding was. UTF-8 represents ASCII characters with no modification, but represents other characters with multi-byte sequences. In other words, text with non-ASCII characters looks very strange on most terminals when it is encoded in UTF-8. ISO-8859-1 is better supported in standard editors, but is too euro-centric.
The encoding features of expat are not completely supported in Mixp. Using unknown encoding handlers will not work, or at least I have not tested that feature. However, XML documents whose encoding (as specified in the <?xml... ?> declaration) is supported by expat should be parsed correctly. For example, you should get an error if you parse a document which claims to be US-ASCII but contains 8-bit characters.
In the Expat interface, expat:parse returns 0 when an error is encountered, i.e. when the document is not well-formed. Then expat:get-error-code should be called to retrieve an error code, as a Scheme symbol, which identifies the error. The error codes are listed in the documentation of expat:get-error-code (see section 2.2 Other expat functions).
The functions in the Mixp extensions use the same error codes, but they throw them as exceptions instead of returning 0. The following code demonstrates simple error handling with mixp:parse-data:
(let ((bad-xml "<doc>dfssfd</do>"))
  (catch #t
    (lambda ()
      (call-with-input-string bad-xml mixp:parse-data))
    ;; Accept any extra arguments passed along with the key
    (lambda (key . args)
      (display "Received an error: ")(display key)(newline))))
=> Received an error: expat:XML_ERROR_TAG_MISMATCH
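If you prefer to test for an error code rather than print it, the same catch pattern can be wrapped in a small procedure. The following sketch uses only standard Guile catch and throw. The name call-returning-error-key is hypothetical, and the parse is simulated by a thunk which throws the error key by hand, so that the example does not depend on Mixp.

```scheme
;; Hypothetical wrapper: call THUNK and return #f if it completes,
;; or the key of the exception if it throws.
(define (call-returning-error-key thunk)
  (catch #t
    (lambda () (thunk) #f)
    (lambda (key . args) key)))

;; Simulated failure, standing in for a call to a Mixp extension
;; function on a document which is not well-formed.
(define (simulated-bad-parse)
  (throw 'expat:XML_ERROR_TAG_MISMATCH))

(display (call-returning-error-key simulated-bad-parse))
(newline)
;; prints: expat:XML_ERROR_TAG_MISMATCH
```

With a real document, the thunk would contain the call to mixp:parse-data shown above.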
The following function is a part of the expat interface, but it was not implemented.
You should also read the section about encodings (see section 2.3 Encodings).
The following functions are extensions to the raw expat interface, but I still don't know exactly what to do here.
expat:parser-create, and this procedure will just check that the document is well-formed. port must be an open input port (see the Ports section of the Guile reference manual). Parsing will continue until the end of the port data is reached.
expat:parser-create.
expat:parser-create.
expat:parser-create. Each XML node is a small list which describes a part of the XML file. The first item in that list is a symbol whose value is the node type. The meaning of the other items depends upon the node type. The following node types are supported (other kinds of data in the XML file are ignored):
start-element
end-element
character-data
notation-decl
entity-decl
mixp:xml->list above for the arguments, and for the supported node types. To give an idea of the tree structure which is supported, let us consider the following sample XML document.
<foo name='Paul'><bar>Some text</bar><void/></foo>
For this document, mixp:xml->list will return the following list:
((start-element "foo" (("name" . "Paul")))
 (start-element "bar" ())
 (character-data "Some text")
 (end-element "bar")
 (start-element "void" ())
 (end-element "void")
 (end-element "foo"))
And this is the data structure produced by mixp:xml->tree:
(element ("foo" (("name" . "Paul")))
  (element ("bar" ())
    (character-data "Some text"))
  (element ("void" ())))
Hint: use call-with-input-file or call-with-input-string in conjunction with mixp:xml->list or mixp:xml->tree to create structured views of XML documents:
(call-with-input-file "foobar.xml" mixp:xml->tree)
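Once you have such a tree, ordinary list procedures are enough to walk it. The following sketch collects all the character data below a node. The name tree-text is hypothetical, and the sketch assumes the node layout of the sample document in this section: (element (name attributes) child ...) and (character-data text). The tree is written as a literal so that the example does not depend on Mixp.

```scheme
;; Hypothetical helper: concatenate all the character data found
;; in NODE and its descendants.
(define (tree-text node)
  (case (car node)
    ((character-data) (cadr node))
    ;; The children of an element start after its (name attributes) item.
    ((element) (apply string-append (map tree-text (cddr node))))
    (else "")))

;; The tree of the sample document, written as a literal.
(define sample-tree
  '(element ("foo" (("name" . "Paul")))
     (element ("bar" ())
       (character-data "Some text"))
     (element ("void" ()))))

(display (tree-text sample-tree))
(newline)
;; prints: Some text
```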
mixp:xml->tree, into a list, as returned by mixp:xml->list.
mixp:xml->list, into a tree, as returned by mixp:xml->tree.
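To make the relationship between the two representations concrete, here is a sketch of a conversion from the tree form to the list form, written in plain Scheme. It is not Mixp's own implementation: the name tree->event-list is hypothetical, and the sketch only handles the element and character-data nodes of the sample document in this section.

```scheme
;; Hypothetical sketch: flatten one tree node into a list of
;; start-element / character-data / end-element nodes.
(define (tree->event-list node)
  (if (eq? (car node) 'element)
      (let ((name (car (cadr node)))      ; tag name
            (attrs (cadr (cadr node)))    ; attribute alist
            (children (cddr node)))       ; child nodes
        (append (list (list 'start-element name attrs))
                (apply append (map tree->event-list children))
                (list (list 'end-element name))))
      ;; Any other node (e.g. character-data) is kept unchanged.
      (list node)))

(define sample-tree
  '(element ("foo" (("name" . "Paul")))
     (element ("bar" ())
       (character-data "Some text"))
     (element ("void" ()))))

(write (tree->event-list sample-tree))
(newline)
```

Applied to the sample tree, this produces the same seven nodes as the list shown for mixp:xml->list in this section.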
#t if the arg is a parser object (as created by expat:parser-create).
This document was generated on 12 August 2001 using the texi2html translator version 1.51.