Tuesday, November 06, 2007

HtmlParser: Parsing by tag attribute

HTML Parser is a Java library for parsing HTML in either a linear or nested fashion. I have tried various open source parsers such as WebHarvest and found this one the most robust at handling bad and nasty HTML. My primary purpose in using a parser is to extract content from websites; other people have other needs that it might not satisfy. It has some pretty cool features like filters, visitors, custom tags, and easy-to-use JavaBeans. It is a fast, robust, and well tested package. One drawback is the documentation front: there is minimal documentation around, and most of what I discovered was by playing around.

So my need here was to extract content from a given tag, identifying the tag by its id attribute. For instance, I want to extract the text "some text two" from the page below:

<html><body><div id='one'> some text one </div> <div id='two'> some text two </div></body></html>

Here's the code sample to accomplish this:


import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.Div;
import org.htmlparser.util.NodeList;

....

Parser parse = new Parser("[[url here]]");
// if you already have the HTML as a string you can alternatively use parse.setInputHTML(...)
NodeList lstNodes = parse.extractAllNodesThatMatch(
        new AndFilter(new NodeClassFilter(Div.class), new HasAttributeFilter("id")));
if (lstNodes != null && lstNodes.size() > 0)
{
    Div tag = null;
    for (int itr = 0; itr < lstNodes.size(); itr++)
    {
        tag = (Div) lstNodes.elementAt(itr);
        String idAttribute = tag.getAttribute("id");
        if (idAttribute != null && idAttribute.equals("two"))
        {
            // this will print the div html: <div id='two'> some text two </div>
            System.out.println(tag.toHtml());

            // now I need to extract the text from this div tag
            Parser tagParser = new Parser();
            tagParser.setInputHTML(tag.toHtml());
            StringBean sb = new StringBean();
            tagParser.visitAllNodesWith(sb);
            System.out.println(sb.getStrings()); // this will print the content "some text two"
        }
    }
}
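
If all you need is the text of one specific div, the same filters can match on the attribute value directly and skip the manual loop. A minimal sketch, assuming HasAttributeFilter's two-argument (attribute, value) constructor:

Parser parser = new Parser("[[url here]]");
// match only the <div id='two'> element in one pass
NodeList nodes = parser.extractAllNodesThatMatch(
        new AndFilter(new NodeClassFilter(Div.class), new HasAttributeFilter("id", "two")));
if (nodes != null && nodes.size() > 0)
{
    System.out.println(nodes.elementAt(0).toPlainTextString()); // prints " some text two "
}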

Friday, August 17, 2007

Resolving hibernate Duplicate Entry Issue

So the other day I encountered this Hibernate duplicate entry issue. The error is usually of the sort shown in the exception stack trace below. We use Spring with Hibernate in our project, which explains the DataIntegrityViolationException being thrown here: Spring translates the black-box SQLException into more granular data access exceptions.

I encountered this error when running my test case. The situation looked fairly simple to resolve at first sight: the exception pointed me to the erroring model "save" call in the code, and all I had to do was ensure that the duplicate key being reported did not already exist in the corresponding model DB table. However, the model table did not actually contain the offending key. Then I tried flushing my session in a desperate attempt to pin the blame on Hibernate. That did not resolve the issue.

Looking at it in more detail, I noticed that several other model inserts were happening before the erroring "save" call, and given that these objects are related to each other, any of the previous/related inserts might have been contributing to the problem. The erroring model was also tough to track down because the test case initially cleans all the test-related tables in the test database, so pretty much every table was starting with a key of 1. So it pretty much boiled down to identifying the erroring model.

I enabled the Hibernate logs to print the executing SQL statements (add the line log4j.logger.net.sf.hibernate.SQL=DEBUG to your log4j properties and the executing native SQL queries are written to the root appender). By doing this I was able to find the exact insert statement causing the issue. It turned out the erroring table was not being cleaned before the test case ran, and the existing data in that table caused the duplicate.
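
For reference, this is roughly what the entry looks like in log4j.properties (this project uses Hibernate 2.x, hence the net.sf package; on Hibernate 3.x the equivalent logger would be org.hibernate.SQL):

# print Hibernate's generated SQL to the root appender
log4j.logger.net.sf.hibernate.SQL=DEBUG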

Exception stack trace from duplicate entry key issue:

12:37:30,969 WARN  [main] JDBCExceptionReporter.logExceptions(38) - SQL Error: 1062, SQLState: 23000
[junit] 12:37:30,970 ERROR [main] JDBCExceptionReporter.logExceptions(46) - Duplicate entry '1' for key 1
[junit] 12:37:30,971 WARN [main] JDBCExceptionReporter.logExceptions(38) - SQL Error: 1062, SQLState: 23000
[junit] 12:37:30,971 ERROR [main] JDBCExceptionReporter.logExceptions(46) - Duplicate entry '1' for key 1
[junit] 12:37:30,973 ERROR [main] JDBCException.<init>(38) - Could not execute JDBC batch update
[junit] java.sql.BatchUpdateException: Duplicate entry '1' for key 1
[junit] at com.mysql.jdbc.ServerPreparedStatement.executeBatch(ServerPreparedStatement.java:647)
[junit] at org.apache.commons.dbcp.DelegatingStatement.executeBatch(DelegatingStatement.java:294)
[junit] at net.sf.hibernate.impl.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:54)
[junit] at net.sf.hibernate.impl.BatcherImpl.executeBatch(BatcherImpl.java:126)
[junit] at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2440)
[junit] at net.sf.hibernate.impl.SessionImpl.executeInserts(SessionImpl.java:2329)
[junit] at net.sf.hibernate.impl.SessionImpl.doSave(SessionImpl.java:884)
[junit] at net.sf.hibernate.impl.SessionImpl.doSave(SessionImpl.java:865)
[junit] at net.sf.hibernate.impl.SessionImpl.saveWithGeneratedIdentifier(SessionImpl.java:783)
[junit] at net.sf.hibernate.impl.SessionImpl.save(SessionImpl.java:746)
[junit] at net.sf.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:1396)
[junit] at org.springframework.orm.hibernate.HibernateTemplate$12.doInHibernate(HibernateTemplate.java:583)
[junit] at org.springframework.orm.hibernate.HibernateTemplate.execute(HibernateTemplate.java:357)
[junit] at org.springframework.orm.hibernate.HibernateTemplate.saveOrUpdate(HibernateTemplate.java:580)
[junit] Hibernate operation: Could not execute JDBC batch update; SQL []; Duplicate entry '2' for key 1; nested exception is java.sql.BatchUpdateException: Duplicate entry '1' for key 1
[junit] org.springframework.dao.DataIntegrityViolationException: Hibernate operation: Could not execute JDBC batch update; SQL []; Duplicate entry '1' for key 1; nested exception is java.sql.BatchUpdateException: Duplicate entry '1' for key 1
[junit] java.sql.BatchUpdateException: Duplicate entry '1' for key 1
[junit] at com.mysql.jdbc.ServerPreparedStatement.executeBatch(ServerPreparedStatement.java:647)
[junit] at org.apache.commons.dbcp.DelegatingStatement.executeBatch(DelegatingStatement.java:294)
[junit] at net.sf.hibernate.impl.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:54)
[junit] at net.sf.hibernate.impl.BatcherImpl.executeBatch(BatcherImpl.java:126)
[junit] at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2440)
[junit] at net.sf.hibernate.impl.SessionImpl.executeInserts(SessionImpl.java:2329)
[junit] at net.sf.hibernate.impl.SessionImpl.doSave(SessionImpl.java:884)
[junit] at net.sf.hibernate.impl.SessionImpl.doSave(SessionImpl.java:865)
[junit] at net.sf.hibernate.impl.SessionImpl.saveWithGeneratedIdentifier(SessionImpl.java:783)
[junit] at net.sf.hibernate.impl.SessionImpl.save(SessionImpl.java:746)
[junit] at net.sf.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:1396)
[junit] at org.springframework.orm.hibernate.HibernateTemplate$12.doInHibernate(HibernateTemplate.java:583)
[junit] at org.springframework.orm.hibernate.HibernateTemplate.execute(HibernateTemplate.java:357)
[junit] at org.springframework.orm.hibernate.HibernateTemplate.saveOrUpdate(HibernateTemplate.java:580)

Saturday, August 11, 2007

eclipse debugging tomcat using jpda issues

So the other day I was getting a "Connection refused" error when trying to connect my Eclipse debugger to a local Tomcat started with JPDA enabled. Try these suggestions:

1) Some sites online suggest adding these lines at the end of the startup.bat file:

...
set JPDA_TRANSPORT=dt_socket
set JPDA_ADDRESS=8000
call "%EXECUTABLE%" jpda start %CMD_LINE_ARGS%
...

The above settings may not work because the values are overridden in catalina.bat, so the best way is to change the values in catalina.bat itself, as follows:

.....
if not ""%1"" == ""jpda"" goto noJpda
set JPDA=jpda
if not "%JPDA_TRANSPORT%" == "" goto gotJpdaTransport
set JPDA_TRANSPORT=dt_socket
:gotJpdaTransport
if not "%JPDA_ADDRESS%" == "" goto gotJpdaAddress
set JPDA_ADDRESS=8000
:gotJpdaAddress
....

2) First ensure that JPDA is listening at the specified port. When Tomcat starts with JPDA enabled it prints a line of this sort: "Listening for transport dt_socket at address: 8000". This line confirms that JPDA has started (a rough sketch of the JVM flags the jpda switch expands to is shown below). Make sure you specify the same attributes in Eclipse. My Eclipse version only allows a connection type of Socket, so dt_socket was the only option for me.
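
For reference, the jpda start switch essentially makes catalina.bat launch the JVM with debug flags roughly like the following (a sketch; the exact option syntax varies by Tomcat and JDK version):

set JPDA_OPTS=-Xdebug -Xrunjdwp:transport=%JPDA_TRANSPORT%,address=%JPDA_ADDRESS%,server=y,suspend=n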

3) On Windows, the Windows Firewall may be blocking the connection (it may not notify you of the blockage, even if configured to do so). Go to Control Panel / Windows Firewall / Exceptions / Add Port, and set Name = remote debugging, Port = 8000, TCP.

Saturday, July 21, 2007

Agile Management of Database Changes

The last couple of weeks I have been researching a better way to manage our application database changes. We have a small team (8-9 people) working on the application, and we have a hard time managing database changes. The way we handle them currently is not the most elegant approach, but it is followed in many team environments.

In the current system we store the database changes as SQL statements in source control, categorized by component (tables, sequences, indexes, etc.). So we have a "db" folder under source control containing separate folders for the various DB components. For each new DB component a new SQL file is created in the corresponding folder, and changes to existing components are appended to the corresponding creation script.

Here are some of the issues with our current approach:

1) Hard to compare db changes between build versions

2) No elegant way to release DB changes to production, staging, or other team members' dev boxes. To push the DB changes to the production system, a separate changes.sql file is created at build time and run at release time. Sometimes, for developers to catch up with the current DB state, they need to go through a maze of SQL files and manually determine the changes that need to be synced.

3) Hard to switch the database back to a previous release version; database changes must be correctly segregated to make that possible.

Looking around for a more elegant approach, I stumbled across the concept of agile databases. Among the core principles of agile databases are incremental change and easy change management. Along the way I found a tool called MigrateDB that can be used to manage DB changes.

MigrateDB is a simple XML-based solution that applies all the changes you define in an XML file. For each SQL change you provide a pre-condition (test) for its execution, and the SQL action runs only when the condition is met. The tool provides command line support and also ANT integration.

Here are some of the salient features (taken from the MigrateDB documentation):

1) Allows construction of a database at a particular version
2) Allows migration from an existing database to a later version
3) Human readable format for releases
4) Release ‘action’ available on multiple environments (i.e., various operating systems) allowing development on a different platform than production
5) Provides complete history of changes for each database object
6) The source code can be branched and merged
7) Allows multiple developers to work with/on the same database source code, at the same time
8) Supports an ‘automated build’ / ‘continuous integration’ environment


Here's the setup I played around with:

Database: Postgres

Sample changes.xml file:


<project>



<!-- test & action to create a new employee table -->
<change>
<sqltest exists="false">
select * from information_schema.tables where table_name = 'employee';
</sqltest>
<sqlaction>
CREATE TABLE employee (
emp_id int4 NOT NULL DEFAULT nextval('employee_empId_seq'::text),
first_name VARCHAR(12) NOT NULL,
middle_initial CHAR(1) NOT NULL,
last_name VARCHAR(15) NOT NULL,
dept_id int4,
phone_number CHAR(4),
hire_date DATE,
job VARCHAR(50),
education_level int2 NOT NULL,
sex CHAR(1) ,
birth_date DATE ,
salary NUMERIC ,
bonus NUMERIC ,
commission NUMERIC ,
CONSTRAINT employee_emp_id_pk PRIMARY KEY (emp_id),
CONSTRAINT employee_phone_number_ck CHECK (phone_number >= '0000' AND phone_number <= '9999')
);
</sqlaction>
<sqlaction>
ALTER TABLE employee OWNER TO pgsql;
</sqlaction>
<sqlaction>
GRANT ALL ON TABLE employee TO pgsql;
</sqlaction>
</change>



<!-- test & action to create a new department table -->
<change>
<sqltest exists="false">
select * from information_schema.tables where table_name = 'department';
</sqltest>
<sqlaction>
CREATE TABLE department (
dept_id int4 NOT NULL DEFAULT nextval('department_dept_id_seq'::text),
dept_name VARCHAR(36) NOT NULL,
manager_emp_id int4,
admin_dept_id int4,
location VARCHAR(50),
CONSTRAINT department_dept_id_pk PRIMARY KEY (dept_id)
);
</sqlaction>
<sqlaction>
ALTER TABLE department OWNER TO pgsql;
</sqlaction>
<sqlaction>
GRANT ALL ON TABLE department TO pgsql;
</sqlaction>
</change>



<!-- test & action to create a new referential integrity -->
<change>
<sqltest exists="false">
SELECT * FROM information_schema.referential_constraints where
constraint_name = 'department_admin_dept_id_fk';
</sqltest>
<sqlaction>
alter table department add CONSTRAINT department_admin_dept_id_fk FOREIGN KEY
(admin_dept_id) REFERENCES department(dept_id) ON DELETE SET NULL;
</sqlaction>
</change>




<!-- test & action to create a new sequence -->
<change>
<sqltest exists="false">
SELECT * FROM pg_catalog.pg_statio_user_sequences WHERE relname = 'department_dept_id_seq';
</sqltest>
<sqlaction>
CREATE SEQUENCE department_dept_id_seq
INCREMENT 1
MINVALUE 0
MAXVALUE 9223372036854775807
START 1
CACHE 1;
</sqlaction>
<sqlaction>
ALTER TABLE department_dept_id_seq OWNER TO pgsql;
</sqlaction>
</change>




<!-- test & action to create a new index -->
<change>
<sqltest exists="false">
SELECT * FROM pg_catalog.pg_statio_user_indexes WHERE indexrelname = 'department_admin_dept_id_n_idx';
</sqltest>
<sqlaction>
CREATE INDEX department_admin_dept_id_n_idx ON department (admin_dept_id);
</sqlaction>
<sqlaction>
ALTER INDEX department_admin_dept_id_n_idx OWNER TO pgsql;
</sqlaction>
</change>




</project>



The above changes.xml file is stored in source control and can be used to detect database changes between versions. A custom XML parser can be built to track the changes for a particular DB object (see the sketch below).
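
As an illustration, such a parser could be as simple as walking the <change> elements with the JDK's DOM API and reporting each test along with its action count (a hypothetical sketch, not part of MigrateDB):

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ChangeLister {
    public static void main(String[] args) throws Exception {
        // parse the changes file checked into source control
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse("changes.xml");
        NodeList changes = doc.getElementsByTagName("change");
        for (int i = 0; i < changes.getLength(); i++) {
            Element change = (Element) changes.item(i);
            // each <change> has one <sqltest> and one or more <sqlaction> elements
            String test = change.getElementsByTagName("sqltest").item(0).getTextContent().trim();
            int actions = change.getElementsByTagName("sqlaction").getLength();
            System.out.println("change #" + (i + 1) + ": " + actions + " action(s), test = " + test);
        }
    }
}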

For databases that do not expose a data dictionary, an alternative approach called Generic Migrations can be used.

Sample ant target:


<project name="sample_migrateDB">
<description>Ant build file for database synchronization</description>
<property file="${basedir}/buildMigrateDB.properties" />

<target name="init">
<property value="${db.user}" name="db.user" />
<property value="${db.password}" name="db.password" />
<property value="${db.drivername}" name="driver.name" />
<property value="${db.url}" name="db.url" />
<echo message="DB Name: ${db.url}" />
</target>

<!-- Define the build path -->
<path id="buildMigrateDB.classpath.ref">
<!-- ensure the following libs are present in the {lib.dir}: migratedb.jar and postgres jdbc jar -->

<fileset dir="${lib.dir}">
<include name="*.jar" />
<include name="*.zip" />
</fileset>
</path>

<property name="buildMigrateDB.classpath" refid="buildMigrateDB.classpath.ref" />
<!-- End defining the build path -->

<taskdef name="dbrelease" classname="net.sf.migratedb.ant.MigrateDbTask" classpath="${buildMigrateDB.classpath}" />

<target name="dbmigrate" depends="init" description="Create Latest Database Version">

<dbrelease driver="${db.drivername}"
url="${db.url}"
userid="${db.user}"
password="${db.password}"
apply="true"
verbose="true"
<!-- location for above change xml file -->
file="${root.dir}/database/postgres/db.xml"
/>
</target>
</project>

Thursday, July 19, 2007

Managing Log4j Appenders @ Runtime

Many times, when debugging production issues, we wish the log level had been at DEBUG or some other lower level to get more insight into the issue. The usual way to debug a non-obvious production issue is to reproduce it in your local development environment. In many cases that works, but in some cases it is just hard to reproduce the exact environment that caused the error, for instance a complex Hibernate transaction error. In our application we make a lot of web service calls to various third-party hosted services. It is almost impractical to log all the SOAP interactions, and often reproducing the environment that generated the faulty SOAP call is not possible.

For this reason we manage our Log4j loggers at runtime: we can add new loggers and change the priority levels of existing ones on the fly. Once the stack trace or the required log output has been gathered, the priority levels can be reverted to their normal, less voluminous settings.

We have a Log4j administration page in our application that can only be accessed by super ninjas, a.k.a. developers. This page can be used to change or add loggers at runtime; it has sections to add new loggers and to change the priority levels of existing ones.

To start with, we display all the current loggers with their current priority levels. This can be achieved using the code below (for simplicity's sake I am showing plain servlet and JSP snippets):



// collect all current loggers
List appenders = new ArrayList(50);
Enumeration e = LogManager.getCurrentLoggers();

while (e.hasMoreElements())
{
    appenders.add(e.nextElement());
}
request.setAttribute("appenders", appenders);

// all possible priority levels
Priority[] prios = Priority.getAllPossiblePriorities();
request.setAttribute("possiblePriorities", prios);



In the JSP page you can use a simple JSTL tag to display the above collected information in a tabular format.



<%@ page import="java.util.*,org.apache.log4j.*" %>

....
....

<table>
<tr>
<td> Appender </td>
<c:forEach var="priority" items='${possiblePriorities}'>
<td><c:out value='${priority}'/></td>
</c:forEach>
</tr>

<c:forEach var="appender" items='${appenders}'>
<tr>
<td><c:out value='${appender.name}'/></td>
<c:forEach var="priority" items='${possiblePriorities}'>
<td>
<input type="radio" name="'${appender.getName()}'" value="'${priority}'"
<c:if test='${appender.getChainedPriority() == priority)}'> checked </c:if>
>
</td>
</c:forEach>
</tr>
</c:forEach>

<tr>
<td rowspan=5 align="center"> <button type="submit" name="submit" value="update">Update</button></td>
</tr>

</table>

<!-- section of page to add new appender-->

<table style="width:auto;">
<tr>
<td align="center">
<input type="text" name="newLogger" size="70">
</td>
<td>
<select name="newLoggerLevel">
<c:forEach var="priority" items='${possiblePriorities}'>
<option value="<c:out value='${priority}' />"><c:out value='${priority}' /></option>
</c:forEach>
</select>
</td>
<td>
<button type="submit" name="submit" value="Add">Add Logger</button>
</td>
</tr>
</table>



The idea is to get a view of this sort: a table of loggers versus levels, with a radio button marking each logger's current level, and a section below it to add a new logger.

Two events can be generated from the above page:
1) add a new appender
2) update level of existing appender

Both of these events can be handled with the below servlet code:



import org.apache.log4j.*;

....
....

Enumeration e = LogManager.getCurrentLoggers();
while (e.hasMoreElements())
{
Logger logger = (Logger) e.nextElement();
String prio = request.getParameter(logger.getName());
if (prio != null && prio.length() > 0)
{
Level p = Level.toLevel(prio);
if (p != null && ! p.equals(logger.getEffectiveLevel()))
{
logger.setLevel(p);
}
}
}

// add new loggers desired
String newLogger = request.getParameter("newLogger");
String newLoggerLevel = request.getParameter("newLoggerLevel");
if (newLogger != null)
{
Level p = Level.toLevel(newLoggerLevel);
Logger logger = Logger.getLogger(newLogger);
logger.setLevel(p);
}

Sunday, June 24, 2007

Hibernate fetching strategies

A fetching strategy is the strategy Hibernate will use for retrieving associated objects when the application needs to navigate the association. Fetch strategies may be declared in the O/R mapping metadata, or overridden by a particular HQL or Criteria query.

Hibernate3 defines the following fetching strategies:

  • Join fetching - Hibernate retrieves the associated instance or collection in the same SELECT, using an OUTER JOIN.

    <set name="permissions" fetch="join">
    <key column="userId"/>
    <one-to-many class="Permission"/>
    </set>


    <many-to-one name="mother" class="Cat" fetch="join"/>


    Can be specified at query level:


    User user = (User) session.createCriteria(User.class)
    .setFetchMode("permissions", FetchMode.JOIN)
    .add( Restrictions.idEq(userId) )
    .uniqueResult();
  • Select fetching - a second SELECT is used to retrieve the associated entity or collection. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association. Use fetch="select" or FetchMode.SELECT to accomplish this.
  • Subselect fetching - a second SELECT is used to retrieve the associated collections for all entities retrieved in a previous query or fetch. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association. Use fetch="subselect" to accomplish this.

  • Batch fetching - an optimization strategy for select fetching: Hibernate retrieves a batch of entity instances or collections in a single SELECT by specifying a list of primary keys or foreign keys. You may also enable batch fetching of collections. For example, if each Person has a lazy collection of Cats, and 10 persons are currently loaded in the Session, iterating through all persons will generate 10 SELECTs, one for every call to getCats(). If you enable batch fetching for the cats collection in the mapping of Person, Hibernate can pre-fetch collections. With a batch-size of 3 (as in the mapping below), Hibernate will load 3, 3, 3, 1 collections in four SELECTs. The value of the attribute depends on the expected number of uninitialized collections in a particular Session. Batch fetching of collections is particularly useful if you have a nested tree of items, i.e. the typical bill-of-materials pattern. (Although a nested set or a materialized path might be a better option for read-mostly trees.)

<class name="Person">
<set name="cats" batch-size="3">
...
</set>
</class>


Hibernate also distinguishes between:

  • Immediate fetching - an association, collection or attribute is fetched immediately, when the owner is loaded.
  • Lazy collection fetching - a collection is fetched when the application invokes an operation upon that collection. (This is the default for collections.)
  • "Extra-lazy" collection fetching - individual elements of the collection are accessed from the database as needed. Hibernate tries not to fetch the whole collection into memory unless absolutely needed (suitable for very large collections)
  • Proxy fetching - a single-valued association is fetched when a method other than the identifier getter is invoked upon the associated object.
  • "No-proxy" fetching - a single-valued association is fetched when the instance variable is accessed. Compared to proxy fetching, this approach is less lazy (the association is fetched even when only the identifier is accessed) but more transparent, since no proxy is visible to the application. This approach requires buildtime bytecode instrumentation and is rarely necessary.
  • Lazy attribute fetching - an attribute or single valued association is fetched when the instance variable is accessed. This approach requires buildtime bytecode instrumentation and is rarely necessary.

By default, Hibernate3 uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for almost all associations in almost all applications; when a particular query needs eager loading, it can override them, as in the HQL sketch below.
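
For example, a query-level override with an HQL fetch join might look like this (a sketch assuming the User/permissions mapping shown above, with session and userId in scope):

// eagerly load the permissions collection for just this query,
// overriding the mapping's lazy select fetch
List users = session.createQuery(
        "from User u left join fetch u.permissions where u.id = :id")
        .setParameter("id", userId)
        .list();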


To enable lazy property loading, set the lazy attribute on your particular property mappings:

<class name="Document">
<id name="id">
<generator class="native"/>
</id>
<property name="name" not-null="true" length="50"/>
<property name="summary" not-null="true" length="200" lazy="true"/>
<property name="text" not-null="true" length="2000" lazy="true"/>
</class>

Saturday, June 09, 2007

XWORK external reference resolver

Sometimes XWork action classes need to be wired with external references, for example Spring-managed resources. In such cases the dependencies are defined using an <external-ref> tag. For example, see below a sample XWork configuration in which an action class is wired with a Spring-managed datasource:



<package name="auditIntegration" extends="auditResultRun">
<action name="executeAllAudits" class="com.xxx.AllSchemasAuditAction">
<external-ref name="dataSource" required="true">/db/dw/DataSource</external-ref>
<param name="nestedActionName">executeAudit</param>
</action>
</package>


In such cases an external reference resolver needs to be specified to perform the property setting. XWork defines the interface com.opensymphony.xwork.config.ExternalReferenceResolver and provides a default implementation, com.opensymphony.xwork.spring.SpringExternalReferenceResolver, for resolving Spring references. Custom implementations can be provided to resolve other external references, such as JNDI dependencies, into XWork. In any case, the resolver must be explicitly declared for XWork to use it. See below for a sample declaration (which extends the webwork default config):



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xwork PUBLIC "-//OpenSymphony Group//XWork 1.0//EN" "http://www.opensymphony.com/xwork/xwork-1.0.dtd">
<xwork>
<include file="webwork-default.xml"/>
<package name="xwork-common" extends="webwork-default"
externalReferenceResolver="com.xxx.SpringExternalReferenceResolver">
</package>
</xwork>



Sample implementation for the external resolver (code adapted from XWORK com.opensymphony.xwork.spring.SpringExternalReferenceResolver implementation):



public class SpringExternalReferenceResolver implements com.opensymphony.xwork.config.ExternalReferenceResolver
{
private static final Logger log = Logger.getLogger(SpringExternalReferenceResolver.class);

/**
* resolve the references for this invocation
* @param invocation the invocation to resolve
* @throws ReferenceResolverException if we had issues.
*/
public void resolveReferences(ActionInvocation invocation) throws ReferenceResolverException
{
ApplicationContext ctx = {{get app context handle here}};
if (ctx == null)
throw new IllegalStateException("application context has not been set for this "
+ "external reference resolver!");

List externalRefs = invocation.getProxy().getConfig().getExternalRefs();
for (Iterator iter = externalRefs.iterator(); iter.hasNext(); )
{
ExternalReference reference = (ExternalReference) iter.next();
if (log.isDebugEnabled())
log.debug("resolving " + reference.getName() + " to " + reference.getExternalRef());
if (reference.getExternalRef() == null)
{
throw new ReferenceResolverException(
"reference " + reference.getName() + " has no external ref");
}

Object bean = null;
try
{
// no such bean exception
bean = ctx.getBean(reference.getExternalRef());
if (log.isDebugEnabled())
log.debug("resolved " + reference.getExternalRef() + " to " +
(bean != null ? bean.getClass().toString() : "null"));

// other exceptions
Map context = Ognl.createDefaultContext(invocation.getAction());
if (log.isDebugEnabled())
log.debug("setting bean into property " + reference.getName() + " of " +
invocation.getAction().getClass());

// unbelievably, this actually throws a RuntimeException! Unbelievable
OgnlUtil.setProperty(reference.getName(), bean, invocation.getAction(),
context, true);
if (log.isDebugEnabled())
log.debug("resolved " + reference.getName() + " to " +
reference.getExternalRef() + ": " + bean);
}
catch (NoSuchBeanDefinitionException e)
{
if (reference.isRequired())
{
//if a dependency is required but wasn't found throw an exception
throw new ReferenceResolverException(
"Failed to find external reference: " + reference.getExternalRef(), e);
}
else
{
log.warn("Bean '" + reference.getExternalRef() +
"' could not be found in spring");
// just keep going
continue;
}
}
catch (Exception e)
{
throw new ReferenceResolverException(
"Failed to set external reference: " + reference.getExternalRef()
+ " for bean attribute: " + reference.getName() + ": " + e.getMessage() +
" bean hashcode: " + (bean != null ? bean.getClass().hashCode() : -1), e);
}
}
if (log.isDebugEnabled())
log.debug("external reference resolution for " + invocation.getAction() + " complete.");
}


}

WS-BPEL

- What is WS-BPEL?

An XML-based grammar for describing the logic that orchestrates the interaction between web services in a business process. It defines a set of basic control structures, like conditions and loops, as well as elements to invoke web services and receive messages from them. It relies on WSDL to express web service interfaces. Message structures can be manipulated, assigning parts or the whole of them to variables that can in turn be used to send other messages.
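
For a flavor of the grammar, here is a stripped-down, purely illustrative process skeleton (element names follow BPEL4WS 1.1; partner link types, port types and message types are elided as "..." since they live in the accompanying WSDL). It receives a request, invokes a partner service, and replies to the caller:

<process name="OrderProcess"
         xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
  <partnerLinks>
    <partnerLink name="client" partnerLinkType="..." myRole="..."/>
    <partnerLink name="supplier" partnerLinkType="..." partnerRole="..."/>
  </partnerLinks>
  <variables>
    <variable name="request" messageType="..."/>
    <variable name="response" messageType="..."/>
  </variables>
  <sequence>
    <receive partnerLink="client" portType="..." operation="submitOrder"
             variable="request" createInstance="yes"/>
    <invoke partnerLink="supplier" portType="..." operation="placeOrder"
            inputVariable="request" outputVariable="response"/>
    <reply partnerLink="client" portType="..." operation="submitOrder" variable="response"/>
  </sequence>
</process>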

- Why is it needed?

• Web services --> move towards service-oriented computing
• Applications are viewed as “services”
• Loosely coupled, dynamic interactions
• Heterogeneous platforms
• No single party has complete control
• How do you compose services in this domain?
• WSDL defined Web services have a stateless interaction model
• Messages are exchanged using
• Synchronous invocation
• Uncorrelated asynchronous invocations
• Most “real-world” business processes require a more robust interaction model
• Support for Messages exchanged in a two-way, peer-to-peer conversation lasting minutes, hours, days, etc.
• BPEL provides the ability to express stateful, long-running interactions


- Relationship with WSDL?

BPEL is layered on top of and extends the WSDL service model
• WSDL defines the specific operations allowed
• BPEL defines how WSDL operations are orchestrated to satisfy a business process
• BPEL also specifies extensions to WSDL in support of long-running asynchronous business processes
• Expressed entirely in XML
• Uses and extends WSDL 1.1
• Uses XML Schema 1.0 for the data model

- What is Apache ODE?
Apache ODE (Orchestration Director Engine) executes business processes written following the WS-BPEL standard. It talks to web services, sending and receiving messages, handling data manipulation and error recovery as described by your process definition. It supports both long and short living process executions to orchestrate all the services that are part of your application.


- References
Real use case scenarios

Friday, May 25, 2007

ACEGI Authentication Provider Examples

Acegi provides a very flexible way to configure the authentication provider. By default it provides two implementations:

- InMemoryDaoImpl : Retrieves user details from an in-memory list created by the bean context, so the list of users and their passwords is specified in the bean configuration file (http://www.acegisecurity.org/multiproject/acegi-security/apidocs/org/acegisecurity/userdetails/memory/InMemoryDaoImpl.html). See below for a sample bean configuration:

<bean id="inMemoryDaoImpl" class="org.acegisecurity.userdetails.memory.InMemoryDaoImpl">
<property name="userMap">
<value>
marissa=koala,ROLE_TELLER,ROLE_SUPERVISOR
dianne=emu,ROLE_TELLER
scott=wombat,ROLE_TELLER
peter=opal,disabled,ROLE_TELLER
</value>
</property>
</bean>


- JdbcDaoImpl : Retrieves user details (username, password, enabled flag, and authorities) from a JDBC location. A default database structure is assumed, which most users of this class will need to override if using an existing schema. This may be done by setting the default query strings used (see the sketch after the samples below). If this does not provide enough flexibility, another strategy would be to subclass this class and override the MappingSqlQuery instances used, via the initMappingSqlQueries() extension point (http://www.acegisecurity.org/multiproject/acegi-security/apidocs/org/acegisecurity/userdetails/jdbc/JdbcDaoImpl.html). See below for a code sample configuring it with a JDBC DataSource. Irrespective of the database used and how a DataSource is obtained, a standard schema must exist in the database.

-- Define the DataSource (here using the JDBC driver manager)
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName"><value>org.hsqldb.jdbcDriver</value></property>
<property name="url"><value>jdbc:hsqldb:hsql://localhost:9001</value></property>
<property name="username"><value>sa</value></property>
<property name="password"><value></value></property>
</bean>

-- Wire the DataSource into JdbcDaoImpl
<bean id="jdbcDaoImpl" class="org.acegisecurity.userdetails.jdbc.JdbcDaoImpl">
<property name="dataSource"><ref bean="dataSource"/></property>
</bean>
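
If the default schema does not match an existing database, the two query strings mentioned above can be overridden on the bean instead of subclassing. A sketch (table and column names here are hypothetical; each query must return the columns JdbcDaoImpl expects):

<bean id="jdbcDaoImpl" class="org.acegisecurity.userdetails.jdbc.JdbcDaoImpl">
<property name="dataSource"><ref bean="dataSource"/></property>
<!-- must return username, password, enabled -->
<property name="usersByUsernameQuery">
<value>SELECT login, pwd, active FROM app_user WHERE login = ?</value>
</property>
<!-- must return username, authority -->
<property name="authoritiesByUsernameQuery">
<value>SELECT login, role_name FROM app_user_role WHERE login = ?</value>
</property>
</bean>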


- Custom implementation : The above two implementations basically implement the UserDetailsService interface. If you have complex needs (such as a special schema or would like a certain UserDetails implementation returned), you'd be better off writing your own UserDetailsService(http://www.acegisecurity.org/multiproject/acegi-security/apidocs/index.html?org/acegisecurity/userdetails/UserDetailsService.html).

Code Sample:

<bean id="authenticationManager" class="org.acegisecurity.providers.ProviderManager">
<property name="providers">
<list>
<ref local="daoAuthenticationProvider"/>
</list>
</property>
</bean>

<bean id="daoAuthenticationProvider" class="org.acegisecurity.providers.dao.DaoAuthenticationProvider">
<property name="userDetailsService"><ref bean="UserService"/></property>
</bean>

<bean id="UserService" class = "com.icrossing.xxx.CustomAuthenticationProvider"/>

-- Sample implementation

class CustomAuthenticationProvider implements org.acegisecurity.userdetails.UserDetailsService
{
public UserDetails loadUserByUsername(String userId) throws UsernameNotFoundException, DataAccessException
{
User user = null;
GrantedAuthority[] grantedAuthorities = null;
try {
user = getUserDAO().lookupUser(userId);

if(user==null) {
throw new UsernameNotFoundException("Invalid User");
}

Set roles = user.getRoles();
int i = 0;
grantedAuthorities = new GrantedAuthority[roles.size()];
for (Iterator iter = roles.iterator(); iter.hasNext(); i++) {
Role role = (Role) iter.next();

GrantedAuthority authority = new GrantedAuthorityImpl(role.getRole());
grantedAuthorities[i] = authority;
}
} catch (DataStoreException e) {
throw new DataRetrievalFailureException("Cannot loadUserByUsername userId:"+userId+ " Exception:" + e.getMessage(), e);
}

UserDetails userDetails = new org.acegisecurity.userdetails.User(
user.getUserId(),
user.getPassword(),
user.isEnabled(), //enabled
user.isEnabled(), //accountNonExpired
user.isEnabled(), //credentialsNonExpired
user.isEnabled(), //accountNonLocked
grantedAuthorities
);
return userDetails;
}
}

FTPClient Default Buffer Policy

Just found this while researching the buffer policy of org.apache.commons.net.ftp.FTPClient (Apache commons net FTP).

Methods storeFile() and retrieveFile() in FTPClient use a default buffer size of 1024 (http://jakarta.apache.org/commons/net/apidocs/org/apache/commons/net/io/Util.html#DEFAULT_COPY_BUFFER_SIZE).

Methods storeFileStream() and retrieveFileStream() do not use a default buffer when the file type is BINARY; however, when the file type is ASCII they use a default buffer of 1024. Here's the developer's comment as to why:

// We buffer ascii transfers because the buffering has to
// be interposed between ToNetASCIIOutputSream and the underlying
// socket output stream. We don't buffer binary transfers
// because we don't want to impose a buffering policy on the
// programmer if possible. Programmers can decide on their
// own if they want to wrap the SocketOutputStream we return
// for file types other than ASCII.
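
So if you want your own buffering policy for a BINARY transfer, you can wrap the returned stream yourself. A rough sketch (hypothetical host and path; imports from org.apache.commons.net.ftp and java.io; error handling omitted):

FTPClient ftp = new FTPClient();
ftp.connect("ftp.example.com");
ftp.login("user", "pass");
ftp.setFileType(FTP.BINARY_FILE_TYPE);
// retrieveFileStream gives us the raw socket stream for BINARY transfers; buffer it ourselves
InputStream in = new BufferedInputStream(ftp.retrieveFileStream("/remote/file.bin"), 64 * 1024);
// ... copy "in" to a local file, then close it ...
in.close();
if (ftp.completePendingCommand()) {
    // transfer finished cleanly
}
ftp.logout();
ftp.disconnect();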

Tuesday, May 22, 2007

FTPClient timeout values

In looking at the docs for org.apache.commons.net.ftp.FTPClient there are three timeouts which can be configured:

setDefaultTimeout : Sets the default timeout in milliseconds to use when opening a socket. This value is only used prior to a call to connect() and should not be confused with setSoTimeout(), which operates on the currently open socket.

setSoTimeout : Set the timeout in milliseconds of a currently open connection. Only call this method after a connection has been opened by connect().

setDataTimeout : Sets the timeout in milliseconds to use when reading from the data connection. This timeout will be set immediately after opening the data connection.

This seemed confusing so I went ahead and peeked at the source code for FTPClient and the whole thing made sense. So basically the FTPClient uses the underlying java.net.Socket and the various timeouts apply at the various stages of socket usage.

If setDefaultTimeout is set, then the underlying java.net.Socket.setSoTimeout() is set with this value for every connection made using this FTPClient instance. It basically saves you the trouble of calling setSoTimeout() after every connection is established.

If setSoTimeout is set, then the underlying java.net.Socket.setSoTimeout() is set for the current connection, and at disconnect() the value reverts to the default timeout set via setDefaultTimeout. If you call it before connecting, you'll get a NullPointerException.

If setDataTimeout is set, then the underlying java.net.Socket.setSoTimeout() is set before a read is performed, and after the read completes the timeout value is restored to its pre-read state. It should therefore be called before a data connection is established (e.g., a file transfer), because it doesn't affect an already active data connection. Usually when a read() tries to read data from a socket, the program blocks until data arrives; if you set the timeout, the read() will only wait the specified number of milliseconds and then, if no data was received, throw an InterruptedIOException. The data timeout applies to each individual socket read() call; it is not cumulative across reads.

It seems obvious that defaultTimeout will suffice for most purposes, but there may be a need for read-specific data timeouts (e.g., you don't want a 2 GB file transfer to die just because there is a 10-minute loss of connectivity)...

On another note, what value is optimal for each timeout given its implications at the various stages? Online research recommends roughly 5 seconds for the connect timeout and 120 seconds for reads/writes. The sketch below summarizes where each setter applies.
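
To summarize where each setter applies, here is a small sketch (hypothetical host; values in milliseconds; exception handling omitted):

FTPClient ftp = new FTPClient();
ftp.setDefaultTimeout(5 * 1000);   // used as SO_TIMEOUT for sockets opened by subsequent connect() calls
ftp.connect("ftp.example.com");
ftp.login("user", "pass");
ftp.setSoTimeout(120 * 1000);      // SO_TIMEOUT of the already-open control connection
ftp.setDataTimeout(120 * 1000);    // applied to each data-connection read (file transfers, listings)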

Update: 08/23/2007
So the interesting fact is that after all the babbling above, I was not able to make the above timeouts work. Here's what I tried: I have a file of 57 MB and I tried various combinations of timeouts for the upload:

- set defaultTimeout (120 secs) and dataTimeout (1200 secs) before establishing login connection --> Result: upload failed with timeout

- set defaultTimeout (120 secs) before login connection and dataTimeout (1200 secs) after login connection --> Result: upload failed with timeout

- set defaultTimeout (120 secs) before login connection and (1200 secs) after login connection --> Result: upload failed with timeout

- set defaultTimeout (1200 secs) before login connection --> Result: upload succeeded

So I tried the above at various times to make sure I was not dealing with a network spike or anything of that sort, and got the same results. I will update once I find out more about this.

Wednesday, May 02, 2007

OutOfMemory issue in JUnit

Recently one of my colleagues encountered this issue when running a JUnit test. Despite increasing the JVM memory settings, the error kept coming. On further research we found that if the junit task is set with fork="true", the tests are executed in a forked VM, so the memory settings of the default VM are not effective. You need to set the maxmemory attribute of the junit task to avoid the OutOfMemoryError.

Example:

<target name="test.class.inner" if="test.class">
      <echo message="test.classpath"/>
      <mkdir dir="${test.output.dir}"/>
      <mkdir dir="${build.dir}/tmp"/>
      <junit dir="${build.dir}" haltonfailure="yes" haltonerror="yes" printsummary="on"
            fork="true" filtertrace="true" '''maxmemory="1024m"'''>
      <sysproperty key="merchantize.env" value="test"/>
....