In many application environments that deal with XML-formatted data, it is useful to be able to process an XML document in an "event driven" manner, where particular Java objects are created (or methods of existing objects are invoked) when particular patterns of nested XML elements have been recognized. Developers familiar with the Simple API for XML Parsing (SAX) approach to processing XML documents will recognize that the Digester provides a higher level, more developer-friendly interface to SAX events, because most of the details of navigating the XML element hierarchy are hidden -- allowing the developer to focus on the processing to be performed.
In order to use a Digester, the following basic steps are required:

1. Create a new instance of the org.apache.commons.digester.Digester class.  Previously
    created Digester instances may be safely reused, as long as you have
    completed any previously requested parse, and you do not try to utilize
    a particular Digester instance from more than one thread at a time.
2. Set any desired configuration properties (described below) that will
    customize the operation of the Digester on the next parse.
3. Optionally, push any desired initial object(s) onto the Digester's
    object stack.
4. Register the element matching patterns for which you wish to have
    processing rules fired, together with the rules themselves.
5. Call the digester.parse() method, passing a reference to the
    XML document to be parsed in one of a variety of forms.  See the
    Digester.parse() documentation for details.  Note that you will need
    to be prepared to catch any IOException or SAXException that is
    thrown by the parser, or any runtime exception that is thrown by one of
    the processing rules.

An org.apache.commons.digester.Digester instance contains several
configuration properties that can be used to customize its operation.  These
properties must be configured before you call one of the
parse() variants in order for them to take effect on that
parse.
- classLoader: You can optionally specify the class loader that will be used to load classes when required by the ObjectCreateRule and FactoryCreateRule rules. If not specified, application classes will be loaded from the thread's context class loader (if the useContextClassLoader property is set to true) or from the same class loader that was used to load the Digester class itself.
- debug: An integer defining the amount of debugging output that will be written to System.out as the parse progresses. This is useful when tracking down where parsing problems are occurring. The default value of zero means no debugging output will be generated; increasing values generally cause the generation of more verbose and detailed debugging information.
- errorHandler: You can optionally specify a SAX ErrorHandler that is notified when parsing errors occur. By default, any parsing errors that are encountered are logged, but Digester will continue processing as well.
- namespaceAware: A boolean that is set to true to perform parsing in a manner that is aware of XML namespaces. Among other things, this setting affects how elements are matched to processing rules. See Namespace Aware Parsing for more information.
- ruleNamespaceURI: The public URI of the namespace with which all subsequently added rules are associated, or null for adding rules that are not associated with any namespace. See Namespace Aware Parsing for more information.
- rules: The Rules component that actually performs matching of Rule instances against the current element nesting pattern is pluggable. By default, Digester includes a Rules implementation that behaves as described in this document. See Pluggable Rules Processing for more information.
- useContextClassLoader: A boolean that is set to true if you want application classes required by FactoryCreateRule and ObjectCreateRule to be loaded from the context class loader of the current thread. By default, classes will be loaded from the class loader that loaded this Digester class. NOTE: This property is ignored if you set a value for the classLoader property; that class loader will be used unconditionally.
- validating: A boolean that is set to true if you wish to validate the XML document against a Document Type Definition (DTD) that is specified in its DOCTYPE declaration. The default value of false requests a parse that only detects "well formed" XML documents, rather than "valid" ones.
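The class loader precedence described above can be summarized in a few lines of plain Java. This is a minimal sketch, not Digester's code. ClassLoaderChoice and choose() are hypothetical names. Falling back to the owner's loader when no context class loader is set is an assumption made by this sketch.

```java
// Hypothetical helper (not part of Digester) expressing the precedence of the
// classLoader and useContextClassLoader properties described above.
final class ClassLoaderChoice {
    private ClassLoaderChoice() { }

    // configured: value of the classLoader property (may be null)
    // useContextClassLoader: value of the useContextClassLoader property
    // owner: the class whose loader is the default (Digester itself, in the text)
    static ClassLoader choose(ClassLoader configured,
                              boolean useContextClassLoader,
                              Class<?> owner) {
        if (configured != null) {
            return configured;                  // an explicit classLoader wins unconditionally
        }
        if (useContextClassLoader) {
            ClassLoader context = Thread.currentThread().getContextClassLoader();
            if (context != null) {              // assumption: fall back if none is set
                return context;
            }
        }
        return owner.getClassLoader();          // default: the loader of the owner class
    }
}
```

A class named in an object create rule could then be loaded with loader.loadClass(className).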
In addition to the scalar properties defined above, you can also register
a local copy of a Document Type Definition (DTD) that is referenced in a
DOCTYPE declaration.  Such a registration tells the XML parser
that, whenever it encounters a DOCTYPE declaration with the
specified public identifier, it should utilize the actual DTD content at the
registered system identifier (a URL), rather than the one in the
DOCTYPE declaration.
For example, the Struts framework controller servlet uses the following registration in order to tell Struts to use a local copy of the DTD for the Struts configuration file. This allows usage of Struts in environments that are not connected to the Internet, and speeds up processing even at Internet connected sites (because it avoids the need to go across the network).
    URL url = this.getClass().getResource
      ("/org/apache/struts/resources/struts-config_1_0.dtd");
    if (url != null) {    // register only if the DTD resource was found
        digester.register
          ("-//Apache Software Foundation//DTD Struts Configuration 1.0//EN",
           url.toString());
    }
As a side note, the path used in this example is a class path resource name
(the kind of path passed to java.lang.Class.getResource()
or java.lang.Class.getResourceAsStream()), not a network location.  The actual DTD
resource is loaded through the same class loader that loads all of the Struts
classes, typically from the struts.jar file.
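At the SAX level, a registration like this amounts to an org.xml.sax.EntityResolver that replaces a known public identifier with a local URL. The following sketch shows the idea using only the JDK. LocalDtdResolver and its register() method are hypothetical; they are not Digester's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps public identifiers to locally available URLs,
// similar in spirit to the registration described above.
class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> registrations = new HashMap<>();

    void register(String publicId, String entityUrl) {
        registrations.put(publicId, entityUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : registrations.get(publicId);
        if (local == null) {
            return null;               // not registered: the parser uses systemId as usual
        }
        return new InputSource(local); // the parser reads the DTD from the registered URL instead
    }
}
```

A resolver like this would be installed with XMLReader.setEntityResolver() before parsing.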
One very common use of org.apache.commons.digester.Digester
technology is to dynamically construct a tree of Java objects, whose internal
organization, as well as the details of property settings on these objects,
are configured based on the contents of the XML document.  In fact, the
primary reason that the Digester package was created (it was originally part
of Struts, and then moved to the Commons project because it was recognized
as being generally useful) was to facilitate the
way that the Struts controller servlet configures itself based on the contents
of your application's struts-config.xml file.
To facilitate this usage, the Digester exposes a stack that can be manipulated by processing rules that are fired when element matching patterns are satisfied. The usual stack-related operations are made available, including push(), pop(), peek() and clear().
A typical design pattern, then, is to fire a rule that creates a new object and pushes it on the stack when the beginning of a particular XML element is encountered. The object will remain there while the nested content of this element is processed, and it will be popped off when the end of the element is encountered. As we will see, the standard "object create" processing rule supports exactly this functionality in a very convenient way.
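The push-on-begin, pop-on-end pattern can be sketched without Digester, using the JDK's SAX parser and an ArrayDeque as the object stack. TreeNode and StackBuilder are hypothetical names. This sketch links each finished child to the object beneath it when its element ends, as a stand-in for a parent-child rule. It also keeps the first object pushed so that it remains reachable after the stack empties.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical object created for each element (stands in for your own classes).
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode(String name) { this.name = name; }
}

// Sketch of the create-on-begin / pop-on-end stack pattern with plain JDK SAX.
class StackBuilder extends DefaultHandler {
    private final Deque<TreeNode> stack = new ArrayDeque<>();
    private TreeNode first;                       // first object ever pushed

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        TreeNode node = new TreeNode(qName);      // "object create" at the start of the element
        if (first == null) {
            first = node;
        }
        stack.push(node);                         // stays on top while nested content is parsed
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        TreeNode done = stack.pop();              // popped at the end of the element
        if (!stack.isEmpty()) {
            stack.peek().children.add(done);      // link to the object beneath it
        }
    }

    static TreeNode build(String xml) {
        StackBuilder handler = new StackBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.first;
    }
}
```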
Several potential issues with this design pattern are addressed by other features of the Digester functionality:
A primary feature of the org.apache.commons.digester.Digester
parser is that the Digester automatically navigates the element hierarchy of
the XML document you are parsing for you, without requiring any developer
attention to this process.  Instead, you focus on deciding what functions you
would like to have performed whenever a certain arrangement of nested elements
is encountered in the XML document being parsed.  The strings you use to
specify such arrangements are called element matching patterns.
The simplest element matching pattern is a single element name, such as "a".  This
pattern is matched whenever an <a> top-level element is
encountered in the XML document, no matter how many times it occurs.  Note that
nested <a> elements will not match this
pattern -- we will describe means to support this kind of matching later.
The next step up in matching pattern complexity is "a/b".  This pattern will
be matched when a <b> element is found nested inside a
top-level <a> element.  Again, this match can occur as many
times as desired, depending on the content of the XML document being parsed.
You can use multiple slashes to define a hierarchy of any desired depth that
will be matched appropriately.
For example, assume you have registered processing rules that match patterns "a", "a/b", and "a/b/c". For an input XML document with the following contents, the indicated patterns will be matched when the corresponding element is parsed:
  <a>         -- Matches pattern "a"
    <b>       -- Matches pattern "a/b"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
    </b>
    <b>       -- Matches pattern "a/b"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
    </b>
  </a>
It is also possible to match a particular XML element, no matter how it is
nested (or not nested) in the XML document, by using the "*" wildcard character
in your matching pattern strings.  For example, an element matching pattern
of "*/a" will match an <a> element at any nesting position
within the document.
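To make these matching rules concrete, the sketch below uses JDK SAX to track the slash-separated path of the current element and tests it against exact patterns and leading "*/" wildcards. PatternMatcher is a hypothetical name. This is our own approximation, not Digester's matching code, and it assumes that "*/a" also matches a top-level <a>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of element matching patterns (not Digester's own implementation).
class PatternMatcher {
    // True if the pattern matches a slash-separated element path such as "a/b/c".
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("*/")) {
            String tail = pattern.substring(2);                 // "a" for "*/a"
            return path.equals(tail) || path.endsWith("/" + tail);
        }
        return pattern.equals(path);                            // otherwise exact match only
    }

    // Parses the XML and, at each element start, records every pattern matching its path.
    static List<String> matchAll(String xml, List<String> patterns) {
        List<String> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder path = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (path.length() > 0) {
                    path.append('/');
                }
                path.append(qName);
                for (String p : patterns) {
                    if (matches(p, path.toString())) {
                        hits.add(p);
                    }
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                int cut = path.lastIndexOf("/");
                path.setLength(cut < 0 ? 0 : cut);              // drop the last path component
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return hits;
    }
}
```

Run against the sample document above with the patterns "a", "a/b" and "a/b/c", it records one, two and five matches respectively.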
It is quite possible that, when a particular XML element is being parsed, the pattern for more than one registered processing rule will be matched, either because you registered more than one processing rule with the same matching pattern, or because one or more exact pattern matches and wildcard pattern matches are satisfied by the same element.
When this occurs, the corresponding processing rules will all be fired in order.
The begin() (and body()) method calls are executed in the
order that the rules were initially registered with the
Digester, whilst the end() method calls are executed in
reverse order. In other words, the order is first in, last out.
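The firing order can be illustrated with a small recording rule. RecordingRule and FiringOrder are hypothetical and deliberately much simpler than Digester's Rule class. The sketch shows only that begin() and body() run in registration order and end() runs in reverse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal rule that records which of its event methods ran.
class RecordingRule {
    private final String name;
    private final List<String> log;

    RecordingRule(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void begin() { log.add(name + ".begin"); }
    void body(String text) { log.add(name + ".body"); }
    void end() { log.add(name + ".end"); }
}

class FiringOrder {
    // Fires the rules matched for one element: begin() and body() in registration
    // order, end() in reverse registration order (first in, last out).
    static void fire(List<RecordingRule> matched, String bodyText) {
        for (RecordingRule r : matched) {
            r.begin();
        }
        // ... nested elements would be processed here ...
        for (RecordingRule r : matched) {
            r.body(bodyText);
        }
        for (int i = matched.size() - 1; i >= 0; i--) {
            matched.get(i).end();
        }
    }
}
```

For two rules A and B registered in that order, the log reads A.begin, B.begin, A.body, B.body, B.end, A.end.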
The previous section documented how you identify when you wish to have certain actions take place. The purpose of processing rules is to define what should happen when the patterns are matched.
Formally, a processing rule is a Java class that subclasses the abstract org.apache.commons.digester.Rule class. Each Rule implements one or more of the event methods begin(), body() and end(), which are called at well-defined times when the matching patterns corresponding to this rule trigger it. A rule may also implement finish(), which is called once the whole parse has completed.
As you are configuring your digester, you can call the
addRule() method to register a specific element matching pattern,
along with an instance of a Rule class that will have its event
handling methods called at the appropriate times, as described above.  This
mechanism allows you to create Rule implementation classes
dynamically, to implement any desired application specific functionality.
In addition, a set of processing rule implementation classes is provided, which deal with many common programming scenarios. These classes include the following:

- ObjectCreateRule: When the begin() method is called, this rule instantiates a new
    instance of a specified Java class, and pushes it on the stack.  The
    class name to be used is defaulted according to a parameter passed to
    this rule's constructor, but can optionally be overridden by a classname
    passed via the specified attribute to the XML element being processed.
    When the end() method is called, the top object on the stack
    (presumably, the one we added in the begin() method) will
    be popped, and any reference to it (within the Digester) will be
    discarded.
- FactoryCreateRule: A variation of ObjectCreateRule that is useful when the Java class with
    which you wish to create an object instance does not have a no-arguments
    constructor, or where you wish to perform other setup processing before
    the object is handed over to the Digester.
- SetPropertiesRule: When the begin() method is called, the digester uses the standard
    Java Reflection API to identify any JavaBeans property setter methods
    (on the object at the top of the digester's stack)
    whose property names match the attributes specified on this XML
    element, and then calls them individually, passing the corresponding
    attribute values. These natural mappings can be overridden, which allows
    (for example) a class attribute to be mapped correctly.
    It is recommended that this feature not be overused; in most cases,
    it's better to use the standard BeanInfo mechanism.
    A very common idiom is to define an object create
    rule, followed by a set properties rule, with the same element matching
    pattern.  This causes the creation of a new Java object, followed by
    "configuration" of that object's properties based on the attributes
    of the same XML element that created this object.
- SetPropertyRule: When the begin() method is called, the digester calls a specified
    property setter (where the property itself is named by an attribute)
    with a specified value (where the value is named by another attribute),
    on the object at the top of the digester's stack.
    This is useful when your XML file conforms to a particular DTD, and
    you wish to configure a particular property that does not have a
    corresponding attribute in the DTD.
- SetNextRule: When the end() method is called, the digester analyzes the
    next-to-top element on the stack, looking for a method with the
    specified name.  It then calls this method, passing the object
    at the top of the stack as an argument.  This rule is commonly used to
    establish one-to-many relationships between the two objects, with the
    method name commonly being something like "addChild".
- SetTopRule: When the end() method is called, the digester analyzes the
    top element on the stack, looking for a method with the specified
    name.  It then calls this method, passing the next-to-top
    object on the stack as an argument.  This rule would be used as an
    alternative to a SetNextRule, with a typical method name "setParent",
    if the API supported by your object classes prefers this approach.
- CallMethodRule: This rule sets up a call to a named method of the top object on
    the digester's stack, which actually takes place when the end() method is
    called.  You configure this rule by specifying the name of the method
    to be called, the number of arguments it takes, and (optionally) the
    Java class name(s) defining the type(s) of the method's arguments.
    The actual parameter values, if any, will typically be accumulated from
    the body content of nested elements within the element that triggered
    this rule, using the CallParamRule discussed next.
- CallParamRule: This rule identifies the source of a particular numbered
    (zero-relative) parameter for the CallMethodRule within which it is
    nested.  The value can be taken from a named attribute, or from the
    body content of the matched element.
- NodeCreateRule: A specialized rule that converts part of the document into a DOM Node and then
    pushes it onto the stack.

You can create instances of the standard Rule classes and
register them by calling digester.addRule(), as described above.
However, because their usage is so common, shorthand registration methods are
defined for each of the standard rules, directly on the Digester
class.  For example, the following code sequence:
    Rule rule = new SetNextRule(digester, "addChild",
                                "com.mycompany.mypackage.MyChildClass");
    digester.addRule("a/b/c", rule);
can be replaced by:
    digester.addSetNext("a/b/c", "addChild",
                        "com.mycompany.mypackage.MyChildClass");
Logging is a vital tool for debugging Digester rulesets. Digester can log copious amounts of debugging information. So, you need to know how logging works before you start using Digester seriously.
Digester uses Jakarta Commons Logging. This component is not really a logging framework - rather an extensible, configurable bridge. It can be configured to swallow all log messages, to provide very basic logging by itself or to pass logging messages on to more sophisticated logging frameworks. Commons-logging comes with connectors for many popular logging frameworks. Consult the commons-logging documentation for more information.
Two main logs are used by Digester:

- org.apache.commons.digester.Digester.sax: This log gives
    information about the basic SAX events received by Digester.
- org.apache.commons.digester.Digester: This log is used for
    everything else. You'll probably want to have this log turned up during debugging but turned down
    during production due to the high message volume.

As stated earlier, the primary reason that the
Digester package was created is that the
Struts controller servlet itself needed a robust, flexible, easy to extend
mechanism for processing the contents of the struts-config.xml
configuration that describes nearly every aspect of a Struts-based application.
Because of this, the controller servlet contains a comprehensive, real world
example of how the Digester can be employed for this type of use case.
See the initDigester() method of class
org.apache.struts.action.ActionServlet for the code that creates
and configures the Digester to be used, and the initMapping()
method for where the parsing actually takes place.
(Struts binary and source distributions can be acquired at http://jakarta.apache.org/struts/.)
The following discussion highlights a few of the matching patterns and processing rules that are configured, to illustrate the use of some of the Digester features. First, let's look at how the Digester instance is created and initialized:
    Digester digester = new Digester();
    digester.push(this);
    digester.setDebug(detail);
    digester.setValidating(true);
We see that a new Digester instance is created, and is configured to use a validating parser. Validation will occur against the struts-config_1_0.dtd DTD that is included with Struts (as discussed earlier). In order to provide a means of tracking the configured objects, the controller servlet instance itself will be added to the digester's stack.
    digester.addObjectCreate("struts-config/global-forwards/forward",
                             forwardClass, "className");
    digester.addSetProperties("struts-config/global-forwards/forward");
    digester.addSetNext("struts-config/global-forwards/forward",
                        "addForward",
                        "org.apache.struts.action.ActionForward");
    digester.addSetProperty
      ("struts-config/global-forwards/forward/set-property",
       "property", "value");
The rules created by these lines are used to process the global forward
declarations.  When a <forward> element is encountered,
the following actions take place:

- A new object instance is created: the ActionForward
    instance that will represent this definition.  The Java class name
    defaults to that specified as an initialization parameter (which
    we have stored in the String variable forwardClass), but can
    be overridden by using the "className" attribute (if it is present in the
    XML element we are currently parsing).  The new ActionForward
    instance is pushed onto the stack.
- The properties of the ActionForward instance (at the top of
    the stack) are configured based on the attributes of the
    <forward> element.
- Nested occurrences of the <set-property> element
    cause calls to additional property setter methods to occur.  This is
    required only if you have provided a custom implementation of the
    ActionForward class with additional properties that are
    not included in the DTD.
- The addForward() method of the next-to-top object on
    the stack (i.e. the controller servlet itself) will be called, passing
    the object at the top of the stack (i.e. the ActionForward
    instance) as an argument.  This causes the global forward to be
    registered, and as a result it will be remembered even after
    the stack is popped.
- At the end of the <forward> element, the top element
    (i.e. the ActionForward instance) will be popped off the
    stack.

Later on, the digester is actually executed as follows:
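    InputStream input =
      getServletContext().getResourceAsStream(config);
    ...
    try {
        digester.parse(input);
    } catch (IOException e) {
        ... deal with the problem ...
    } catch (SAXException e) {
        ... deal with the problem ...
    } finally {
        input.close();   // close the stream even if parsing failed
    }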
As a result of the call to parse(), all of the configuration
information that was defined in the struts-config.xml file is
now represented as collections of objects cached within the Struts controller
servlet, as well as being exposed as servlet context attributes.
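The forward rules above depend on reflection. The set properties step calls setters named after the <forward> attributes, and the set next step calls addForward() on the object beneath. The sketch below is a rough model of those two steps. It assumes only public String-typed setters. ReflectionRules, ParentBean and ChildBean are hypothetical names, and real Digester rules also handle type conversion and BeanInfo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what "set properties" and "set next" steps do with reflection.
class ReflectionRules {
    // For each attribute, call a public setXxx(String) method on the bean, if one exists.
    static void setProperties(Object bean, Map<String, String> attributes) {
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            String name = attr.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                bean.getClass().getMethod(setter, String.class).invoke(bean, attr.getValue());
            } catch (NoSuchMethodException e) {
                // No matching String setter: this sketch silently skips the attribute.
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Call parent.methodName(child), e.g. addChild or addForward, as a "set next" step would.
    static void setNext(Object parent, Object child, String methodName, Class<?> paramType) {
        try {
            parent.getClass().getMethod(methodName, paramType).invoke(parent, child);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Hypothetical bean classes for the example.
class ChildBean {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class ParentBean {
    final List<ChildBean> children = new ArrayList<>();
    public void addChild(ChildBean child) { children.add(child); }
}
```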
The Digester module also allows you to process the nested body text in an
XML file, not just the elements and attributes that are encountered.  The
following example is based on an assumed need to parse the web application
deployment descriptor (/WEB-INF/web.xml) for the current web
application, and record the configuration information for a particular
servlet.  To record this information, assume the existence of a bean class
with the following method signatures (among others):
  package com.mycompany;
  public class ServletBean {
    public void setServletName(String servletName);
    public void setServletClass(String servletClass);
    public void addInitParam(String name, String value);
  }
We are going to process the web.xml file that declares the
controller servlet in a typical Struts-based application (abridged for
brevity in this example):
  <web-app>
    ...
    <servlet>
      <servlet-name>action</servlet-name>
      <servlet-class>org.apache.struts.action.ActionServlet</servlet-class>
      <init-param>
        <param-name>application</param-name>
        <param-value>org.apache.struts.example.ApplicationResources</param-value>
      </init-param>
      <init-param>
        <param-name>config</param-name>
        <param-value>/WEB-INF/struts-config.xml</param-value>
      </init-param>
    </servlet>
    ...
  </web-app>
Next, let's define some Digester processing rules for this input file:
  digester.addObjectCreate("web-app/servlet",
                           "com.mycompany.ServletBean");
  digester.addCallMethod("web-app/servlet/servlet-name", "setServletName", 0);
  digester.addCallMethod("web-app/servlet/servlet-class",
                         "setServletClass", 0);
  digester.addCallMethod("web-app/servlet/init-param",
                         "addInitParam", 2);
  digester.addCallParam("web-app/servlet/init-param/param-name", 0);
  digester.addCallParam("web-app/servlet/init-param/param-value", 1);
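For comparison, these rules can be approximated by hand with the JDK's SAX parser. WebXmlSketch and ServletBeanSketch are hypothetical stand-ins, the latter for com.mycompany.ServletBean. The patterns are hard-coded rather than registered, and trimming body text is an assumption of this sketch. It is meant only to show when each call happens.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for com.mycompany.ServletBean.
class ServletBeanSketch {
    String servletName;
    String servletClass;
    final Map<String, String> initParams = new LinkedHashMap<>();

    public void setServletName(String servletName) { this.servletName = servletName; }
    public void setServletClass(String servletClass) { this.servletClass = servletClass; }
    public void addInitParam(String name, String value) { initParams.put(name, value); }
}

// Hand-written approximation of the object create / call method / call param rules above.
class WebXmlSketch extends DefaultHandler {
    private final StringBuilder path = new StringBuilder();
    private final StringBuilder body = new StringBuilder();
    private ServletBeanSketch top;        // object created for "web-app/servlet"
    private String[] params;              // pending addInitParam() arguments
    final List<ServletBeanSketch> popped = new ArrayList<>(); // kept so callers can inspect results

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (path.length() > 0) path.append('/');
        path.append(qName);
        body.setLength(0);
        String p = path.toString();
        if (p.equals("web-app/servlet")) {
            top = new ServletBeanSketch();             // object create: pushed at the start
        } else if (p.equals("web-app/servlet/init-param")) {
            params = new String[2];                    // call set up with 2 params, not made yet
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        body.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = body.toString().trim();          // body content without surrounding whitespace
        switch (path.toString()) {
            case "web-app/servlet/servlet-name":  top.setServletName(text); break;
            case "web-app/servlet/servlet-class": top.setServletClass(text); break;
            case "web-app/servlet/init-param/param-name":  params[0] = text; break; // param 0
            case "web-app/servlet/init-param/param-value": params[1] = text; break; // param 1
            case "web-app/servlet/init-param":
                top.addInitParam(params[0], params[1]); // the pending call finally runs
                params = null;
                break;
            case "web-app/servlet":
                popped.add(top);                        // popped off the object stack
                top = null;
                break;
            default:
                break;
        }
        body.setLength(0);
        int cut = path.lastIndexOf("/");
        path.setLength(cut < 0 ? 0 : cut);
    }

    static List<ServletBeanSketch> parse(String xml) {
        WebXmlSketch handler = new WebXmlSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.popped;
    }
}
```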
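Now, as elements are parsed, the following processing occurs:

- When the <servlet> element is encountered, a new com.mycompany.ServletBean
    object is created, and pushed on to the object stack.
- At the end of the <servlet-name> element, the setServletName() method
    of the top object on the stack (our ServletBean) is called,
    passing the body content of this element as a single parameter.
- At the end of the <servlet-class> element, the setServletClass() method
    of the top object on the stack (our ServletBean) is called,
    passing the body content of this element as a single parameter.
- When an <init-param> element is encountered, a call to the addInitParam
    method of the top object on the stack (our ServletBean) is
    set up, but it is not called yet.  The call will be
    expecting two String parameters, which must be set up by
    subsequent call parameter rules.
- The body content of the <param-name> element is saved as the first
    (index 0) parameter of the pending call.
- The body content of the <param-value> element is saved as the second
    (index 1) parameter of the pending call.
- At the end of the <init-param> element, the call to addInitParam()
    that we have set up is now executed, which will cause a new name-value
    combination to be recorded in our bean.
- The same rules fire again for the second <init-param> element,
    causing a second call to addInitParam() with the
    second parameter's name and value.
- At the end of the <servlet> element, the top object on the stack
    (the ServletBean we pushed earlier) is
    popped off the object stack.

For digesting XML documents that do not use XML namespaces, the default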
behavior of Digester, as described above, is generally sufficient.
However, if the document you are processing uses namespaces, it is often
convenient to have sets of Rule instances that are only
matched on elements that belong to a particular namespace.  This
approach, for example, elegantly deals with element names that are the same
in different namespaces, but where you want to perform different processing
for each namespace.  To accomplish this, follow these steps:
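1. Tell the Digester that you will be doing namespace
    aware parsing, by adding this statement in your initialization
    of the Digester's properties:

    digester.setNamespaceAware(true);

2. Declare the public URI of the namespace with which subsequently
    added rules will be associated (this sets the ruleNamespaceURI property):

    digester.setRuleNamespaceURI("http://www.mycompany.com/MyNamespace");

3. Add the rules that correspond to this namespace in the usual way, by
    calling methods such as addObjectCreate() or
    addSetProperties().  In the matching patterns you specify,
    use only the local name portion of the elements (i.e. the
    part after the prefix and the associated colon (":") character):

    digester.addObjectCreate("foo/bar", "com.mycompany.MyFoo");
    digester.addSetProperties("foo/bar");

4. Repeat the previous two steps for each additional namespace URI that
    should be recognized on this Digester run.

Now, consider that you might wish to digest the following document, using the rules that were set up in the steps above: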
  <m:foo xmlns:m="http://www.mycompany.com/MyNamespace"
         xmlns:y="http://www.yourcompany.com/YourNamespace">
    <m:bar name="My Name" value="My Value"/>
    <y:bar id="123" product="Product Description"/>
  </m:foo>
Note that your object create and set properties rules will be fired for the
first occurrence of the bar element, but not the
second one.  This is because we declared that our rules only matched
for the particular namespace we are interested in.  Any elements in the
document that are associated with other namespaces (or no namespaces at all)
will not be processed.  In this way, you can easily create rules that digest
only the portions of a compound document that they understand, without placing
any restrictions on what other content is present in the document.
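Namespace-aware matching can be sketched with the JDK's SAX parser. When namespace awareness is on, each element reports its namespace URI and its local name separately, so a rule can ignore the prefix entirely. NamespaceMatchSketch is hypothetical and is not Digester's code. It returns the qualified names of matching elements so that the effect is easy to see.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of namespace-aware matching with plain JDK SAX (not Digester's code).
class NamespaceMatchSketch {
    // Returns the qualified names of elements whose local-name path equals `pattern`
    // and whose namespace URI equals `ruleNamespaceURI`. Prefixes play no part in matching.
    static List<String> match(String xml, String ruleNamespaceURI, String pattern) {
        List<String> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder path = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (path.length() > 0) path.append('/');
                path.append(localName);                      // local name only, e.g. "bar"
                if (ruleNamespaceURI.equals(uri) && pattern.equals(path.toString())) {
                    hits.add(qName);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                int cut = path.lastIndexOf("/");
                path.setLength(cut < 0 ? 0 : cut);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);                 // report URIs and local names
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return hits;
    }
}
```

Applied to the document above with the pattern "foo/bar", it returns only m:bar for the MyNamespace URI and only y:bar for the YourNamespace URI.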
You might also want to look at Encapsulated Rule Sets if you wish to reuse a particular set of rules, associated with a particular namespace, in more than one application context.
Using rules with namespaces is very useful when you have orthogonal rulesets. One ruleset applies to a namespace and is independent of other rulesets applying to other namespaces. However, if your rule logic requires mixed namespaces, then matching namespace prefix patterns might be a better strategy.
When you set the NamespaceAware property to false, digester uses
the qualified element name (which includes the namespace prefix) rather than the
local name as the pattern component for the element. This means that your pattern
matches can include namespace prefixes as well as element names. So, rather than
create namespace-aware rules, create pattern matches including the namespace
prefixes.
For example, with NamespaceAware set to false, the pattern 
'foo:bar' will match a top level element named 'bar' in the 
namespace with (local) prefix 'foo'.
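When namespace awareness is off, a SAX parser reports only qualified names, so prefixes end up inside the path. The following sketch collects the qualified-name path of every element (PrefixPathSketch is a hypothetical name). The output shows the kind of string that a prefixed pattern such as 'foo:bar' is compared against.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: with namespace awareness off, path components are qualified names,
// so a pattern can mention prefixes directly (not Digester's own code).
class PrefixPathSketch {
    static List<String> paths(String xml) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder path = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (path.length() > 0) path.append('/');
                path.append(qName);                      // e.g. "foo:bar", prefix included
                seen.add(path.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                int cut = path.lastIndexOf("/");
                path.setLength(cut < 0 ? 0 : cut);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false);            // the default, stated for clarity
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```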
By default, Digester selects the rules that match a particular
pattern of nested elements as described under
Element Matching Patterns.  If you prefer to use
different selection policies, however, you can create your own implementation
of the org.apache.commons.digester.Rules interface,
or subclass the corresponding convenience base class
org.apache.commons.digester.RulesBase.
Your implementation of the match() method will be called when the
processing for a particular element is started or ended, and you must return
a List of the rules that are relevant for the current nesting
pattern.  The order of the rules you return is significant,
and should match the order in which rules were initially added.
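As an illustration of a custom selection policy, here is a simplified, self-contained analogue of a Rules implementation. SimpleRules is hypothetical and its rules are plain strings. Its policy (exact match first, otherwise the longest matching "*/" pattern) is our reading of the default behaviour, not a copy of RulesBase. Rules registered for the same pattern come back in registration order, as required above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified analogue of a Rules implementation; rules are plain strings.
class SimpleRules {
    // Insertion-ordered so rules for one pattern come back in registration order.
    private final Map<String, List<String>> byPattern = new LinkedHashMap<>();

    void add(String pattern, String rule) {
        byPattern.computeIfAbsent(pattern, k -> new ArrayList<>()).add(rule);
    }

    // Exact match first; otherwise the rules of the longest matching "*/" pattern.
    List<String> match(String path) {
        List<String> exact = byPattern.get(path);
        if (exact != null) {
            return exact;
        }
        String longest = null;
        for (String key : byPattern.keySet()) {
            if (!key.startsWith("*/")) {
                continue;
            }
            String tail = key.substring(2);
            boolean hit = path.equals(tail) || path.endsWith("/" + tail);
            if (hit && (longest == null || key.length() > longest.length())) {
                longest = key;
            }
        }
        return longest == null ? Collections.<String>emptyList() : byPattern.get(longest);
    }
}
```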
Your policy for rule selection should generally be sensitive to whether
Namespace Aware Parsing is taking place.  In
general, if namespaceAware is true, you should select only rules
that:

- are registered for the public namespace URI of the element being processed, and
- match on the local name portion of the element (so that the prefix is ignored).

The ExtendedBaseRules implementation adds some additional expression syntax for pattern matching to the default mechanism, but it also executes more slowly. See the Javadocs for more details on the new pattern matching syntax, and suggestions on when this implementation should be used. To use it, simply do the following as part of your Digester initialization:
    Digester digester = ...
    ...
    digester.setRules(new ExtendedBaseRules());
    ...
RegexRules is an advanced Rules
implementation which does not build on the default pattern matching rules.
It uses a pluggable RegexMatcher implementation to test whether a path matches the pattern
for a Rule. All matching rules are returned (note that this behaviour differs from
the default pattern matching rules, which select only the best, longest, matching pattern). See the Javadocs for more details.
Example usage:
    Digester digester = ...
    ...
    digester.setRules(new RegexRules(new SimpleRegexMatcher()));
    ...
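By contrast with the default policy, a RegexRules-style policy returns every matching rule. The sketch below is not the RegexRules implementation. GlobRules is hypothetical, and its wildcard semantics are an assumption modelled on SimpleRegexMatcher: '*' matches any run of characters, including '/', and '?' matches exactly one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the "return every matching rule" policy described for RegexRules.
// Wildcard semantics are an assumption: '*' = any run of characters, '?' = one character.
class GlobRules {
    private final List<String> patterns = new ArrayList<>();
    private final List<String> rules = new ArrayList<>();

    void add(String pattern, String rule) {
        patterns.add(pattern);
        rules.add(rule);
    }

    // All rules whose pattern matches the whole path, in registration order.
    List<String> match(String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            if (toRegex(patterns.get(i)).matcher(path).matches()) {
                result.add(rules.get(i));
            }
        }
        return result;
    }

    // Translate the wildcard pattern into an equivalent java.util.regex Pattern.
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```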
All of the examples above have described a scenario where the rules to be
processed are registered with a Digester instance immediately
after it is created.  However, this approach makes it difficult to reuse the
same set of rules in more than one application environment.  Ideally, one
could package a set of rules into a single class, which could be easily
loaded and registered with a Digester instance in one easy step.
The RuleSet interface (and the convenience base
class RuleSetBase) make it possible to do this.
In addition, the rule instances registered with a particular
RuleSet can optionally be associated with a particular namespace,
as described under Namespace Aware Parsing.
An example of creating a RuleSet might be something like this:
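import org.apache.commons.digester.Digester;
import org.apache.commons.digester.RuleSetBase;

public class MyRuleSet extends RuleSetBase {
  // Leading portion of every matching pattern, e.g. "baz/" (empty for top level)
  protected String prefix = null;
  public MyRuleSet() {
    this("");
  }
  public MyRuleSet(String prefix) {
    super();
    this.prefix = prefix;
    // namespaceURI is inherited from RuleSetBase; the rules added below are
    // associated with this namespace
    this.namespaceURI = "http://www.mycompany.com/MyNamespace";
  }
  public void addRuleInstances(Digester digester) {
    digester.addObjectCreate(prefix + "foo/bar",
      "com.mycompany.MyFoo");
    digester.addSetProperties(prefix + "foo/bar");
  }
}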
You might use this RuleSet as follows to initialize a
Digester instance:
  Digester digester = new Digester();
  ... configure Digester properties ...
  digester.addRuleSet(new MyRuleSet("baz/"));
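Without the Digester specifics, the prefix technique in MyRuleSet simply prepends a caller-supplied string to every pattern that the rule set registers. In the sketch below, PatternRegistry is a hypothetical stand-in for a Digester that only records what would be registered. PrefixedRuleSet is a hypothetical analogue of MyRuleSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Digester: it only records what would be registered.
class PatternRegistry {
    final List<String> registered = new ArrayList<>();

    void addObjectCreate(String pattern, String className) {
        registered.add(pattern + " -> create " + className);
    }

    void addSetProperties(String pattern) {
        registered.add(pattern + " -> set properties");
    }
}

// Analogue of MyRuleSet: the same rules, re-rooted under a caller-supplied prefix.
class PrefixedRuleSet {
    private final String prefix;

    PrefixedRuleSet() { this(""); }
    PrefixedRuleSet(String prefix) { this.prefix = prefix; }

    void addRuleInstances(PatternRegistry registry) {
        registry.addObjectCreate(prefix + "foo/bar", "com.mycompany.MyFoo");
        registry.addSetProperties(prefix + "foo/bar");
    }
}
```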
A couple of interesting notes about this approach:

- The application that uses these rules does not need to know that the
    RuleSet being used is associated
    with a particular namespace URI.  That knowledge is embedded inside the
    RuleSet class itself.
- If desired, you could make a set of rules work for more than one
    namespace URI by providing constructors on the RuleSet to
    allow this to be specified dynamically.
- The MyRuleSet example above illustrates another technique
    that increases reusability: you can specify (as an argument to the
    constructor) the leading portion of the matching pattern to be used.
    In this way, you can construct a Digester that recognizes
    the same set of nested elements at different nesting levels within an
    XML document.

If you're using a JAXP 1.1 parser, you might see the following warning (in your log):
[WARN] Digester - -Error: JAXP SAXParser property not recognized: http://java.sun.com/xml/jaxp/properties/schemaLanguage
Digester ships with a sample application: a mapping for the Rich Site Summary format used by many newsfeeds. Download the source distribution to see how it works.
The Rich Site Summary application is intended to be a sample application. It works but we have no plans to add support for other versions of the format.
We would consider donations of standard digester applications, but it's unlikely that these would ever be shipped with the base digester distribution. If you want to discuss this, please post to the commons-dev mailing list.
There is an issue when invoking public methods contained in a default access superclass.
Reflection locates these methods and correctly reports them as public.
However, an IllegalAccessException is thrown if the method is invoked.
MethodUtils contains a workaround for this situation.
It will attempt to call setAccessible on this method.
If this call succeeds, then the method can be invoked as normal.
This call will only succeed when the application has sufficient security privileges.
If this call fails, then a warning will be logged and the method may fail.
Digester uses MethodUtils and so there may be an issue accessing methods
of this kind from a high security environment. If you think that you might be experiencing this 
problem, please ask on the mailing list.
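The workaround can be sketched with plain reflection. AccessibleInvoker, HiddenBase and VisibleChild are hypothetical names, and this is not MethodUtils' actual code. Because the sketch's classes share one package, the first invoke() normally succeeds. The setAccessible() fallback only comes into play when the caller genuinely lacks access.

```java
import java.lang.reflect.Method;

// Sketch of the fallback described above: try a normal reflective call first and,
// only if it is rejected with IllegalAccessException, retry after setAccessible(true).
class AccessibleInvoker {
    static Object invokeNoArg(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            try {
                return method.invoke(target);
            } catch (IllegalAccessException denied) {
                // May itself throw a runtime exception (e.g. SecurityException)
                // when the environment does not grant the needed privileges.
                method.setAccessible(true);
                return method.invoke(target);
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Hypothetical shape of the problem: a public method declared in a default-access class.
class HiddenBase {
    public String greet() { return "hello"; }
}

// In real code this subclass would be public, and the reflective caller would
// live in a different package from HiddenBase.
class VisibleChild extends HiddenBase { }
```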