Exp not working in Oracle 11gR2

Starting with Oracle 11gR2, exp is deprecated and excludes empty tables from the dump file (not even the table definition is exported): the culprit is the new deferred segment creation feature, since exp skips tables that have no segment allocated. The fact that exp is deprecated does not make this any less of a bug, especially when the proposed replacement, expdp, is a DBA-only tool that generates the dump files on the database server...

At least in this release, setting DEFERRED_SEGMENT_CREATION to false and issuing alter table T move; and alter index I rebuild; for all tables and indexes created previously seems to make exp work again. In future releases this workaround may stop working, and there may be no suitable alternative.
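In practice the workaround looks like this (T and I stand for your own table and index names; the two queries simply generate the statements to run for every table and index in the current schema):
alter system set deferred_segment_creation = false;
alter table T move;
alter index I rebuild;
-- generate the statements for a whole schema
select 'alter table ' || table_name || ' move;' from user_tables;
select 'alter index ' || index_name || ' rebuild;' from user_indexes;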

Oracle really needs to rethink expdp and make it work from a client installation alone: the workaround that involves installing another Oracle database of the exact same version and connecting the two databases via a DB link is nothing short of absurd.


Better logging on ATG 2007.1 and JBoss

Here's a way to have actual Nucleus component paths in log files with ATG 2007.1 and JBoss: create a localconfig/atg/dynamo/service/logging/ScreenLog.properties file with the following content:
useNucleusPathForClassName=true
useFullPaths=true
The result is:
15:52:15,650 INFO  [/atg/registry/PipelineRegistry] Starting Pipeline Registry.
15:52:16,232 INFO  [/atg/reporting/md/DatasetDomain] Domain service started
15:52:16,442 INFO  [/atg/reporting/ChartTemplateScheduler] DSS Template scheduler started
15:52:16,754 INFO  [/atg/epub/file/RepositoryGroupRefresherService] Refreshed repository groups
15:52:16,847 INFO  [/atg/epub/file/RestartableComponentCheckinListener] Restartable component checkin listener started up successfully.
15:52:17,323 INFO  [/atg/epub/DeploymentServer] no targets defined in topology definition file
This configuration may cause a small performance hit, but in my view it makes the log much more useful and should become the default.

One could also play with the Log4J configuration; for example, this pattern in log4j.xml
<param name="ConversionPattern" value="**** %-5p %d{ABSOLUTE} %c{1} %m%n"/>
gives a more familiar look (for those accustomed to DAS, anyway):
**** INFO  15:57:29,842 /atg/epub/file/RepositoryGroupRefresherService Refreshed repository groups
**** INFO  15:57:29,936 /atg/epub/file/RestartableComponentCheckinListener Restartable component checkin listener started up successfully.
**** INFO  15:57:30,552 /atg/epub/DeploymentServer no targets defined in topology definition file

The Usual Suspects

I decided to try some of the best-known Java IDEs on the market to see which one would work best for me. The idea was to install each IDE and work with it for at least a week on a real project, to test whether I could adapt to a new IDE, and the IDE to my working patterns, in a relatively short time and in a "real world" situation. I wanted a comparison that made sense to me at this time and in the context of this particular project, not an exhaustive tour of every feature of each IDE. I wanted to test:

  • editing and refactoring code
  • cvs integration
  • configurability (how much the IDE can adapt and how much it gets in my way of working)
  • remote debugging
  • speed of operation

I didn't care at all about Web services, JUnit integration, JSF support, etc.: none of that matters to me at this particular time.

Setup and constraints
The setup on this project is a little unusual, but there are good reasons for it: the IDE runs on my laptop, while the code being edited and run lives on another workstation, mounted as a Windows shared drive (Z:\) on the laptop. The code is a multi-module ATG application consisting of several hundred Java classes, properties files and JSP pages: all the files are kept in a CVS server, and the CVS working directory is also the ATG module directory, so that only one copy of each file exists at any given time. IDE-specific project files should not go into CVS, to avoid conflicts between workstations with different path names.

Eclipse 3.2.1
Despite my good intentions, Eclipse lasted only half a day... I had configured my workspace and the project files on drive D:\, then attempted to configure the project source folders via the Link Source feature: it worked fine up to the point of checking out the sources from CVS, when I discovered that Eclipse is unable to check out or check in linked source folders. There is a long-standing bug that apparently won't be fixed.

This is a pity, because Eclipse is a decent tool despite its shortcomings and many inconsistencies (debugging often cannot "see" the project sources, there are 2 or 3 different perspectives to work with source control, etc.). I tried to run it on the workstation instead, but adding it to the three or four ATG instances already running was asking too much of the hardware.

NetBeans 5.5
NetBeans was more willing to cooperate with my setup, and it lasted a whole week of intense work. There is not much to say about the CVS integration and the project organization; at least it allowed me to place the files where I wanted them to be. The speed of the IDE was something I could live with, even if not extraordinary, but the editor had its own share of problems:
  • at first I could not find how to generate getters and setters for a class member variable; then I found I must select the variable and choose Refactor > Encapsulate Field. I found that counterintuitive but acceptable; what I could not accept was that it didn't respect the code conventions I had configured ("m" prefix for member variables, "p" prefix for parameters)
  • saving a file took 30 seconds on average, probably due to the working directory being on a shared drive and the IDE trying to synchronize too many files in the process
  • the JSP code assistance produced weird code: for example, I typed <dsp:param name="propertyName" value="phoneNumber"/> and then Enter, and it added a </dsp:param> that makes no sense
  • after writing a Java method with all of its parameters and return type, typing "/**" to start a javadoc comment did not generate the @param and @return tags
  • very often the remote debugger didn't work: in many cases I couldn't get the local variables of a method ("Evaluating..." was displayed for minutes in the debugger panel), and in some other cases it hung the whole IDE (the only recourse was force-quitting it)
  • once, after an intense, day-long editing session, an icon started flashing in the lower-right corner of the IDE window with a "too many window handles" message: switching between Windows processes with Alt-Tab gave weird results, with windows flashing all over the screen, and in NetBeans I couldn't do anything; in the end I had to kill it
So NetBeans was better than Eclipse in my situation (I could at least *use* it to work), and it didn't have many of Eclipse's interface annoyances; however, I didn't find it good enough to keep working with it.

JDeveloper 10.1.3.1.0
I spent more time setting up a project with JDeveloper than with the other IDEs, as at first I didn't understand the system and application navigators: what I want is to see the files as they are laid out on disk, and only then maybe overlay a custom view on top. I discovered in the end that setting the "flat level" to 1 in the application navigator came close to what I wanted. I liked the structure panel of the JSP editor, but I couldn't find an option to open JSPs by default in the source editor rather than in the visual editor, which was useless to me; saving files was even slower than in NetBeans, and refactoring was sometimes bizarre (renaming a DynamoHttpServletRequest method parameter to pReq produced invalid syntax...).

I don't have much else to say about JDeveloper because in the end it lasted less than one day... On a couple of occasions it took 99% of the CPU and swallowed more than 680MB of RAM; maybe the sources on a shared drive bothered it, but I didn't spend too much time analyzing the problem and quickly proceeded to the next IDE.

IntelliJ IDEA 6.0.4
I know, I know... this is not a free IDE. But I never said I would limit myself to free offerings, and in my tests it came out as the best of the pack anyway. I was able to configure a multi-module project without difficulties; the only trick is to start small (for example with a Java module) and then add the config directory and the JSP directories, either as source directories or as a separate web module. The editor is the best part of the product: fast, straightforward to use and full of class acts (try selecting a method parameter, hit Shift-F6, type something and see what happens to the method source code). I very much liked the multi-language support, which is very handy when editing a complex JSP page that uses several JavaScript libraries and fragments; code assistance is available for XML documents even when a DTD is not loaded, as after a while the editor "learns" the syntax and pops up a choice from the previously entered tags and attributes.
The CVS integration worked well for the basic operations of check-ins, check-outs and version comparison; however, I haven't had the opportunity to try tagging or branch merging, so I can't say how it compares to Eclipse's. Debugging also worked fine: the basics (setting breakpoints, evaluating expressions, hotswap reloading) did just what I expected without getting in the way.

And that's the major decider for me... IDEA simply worked, without getting in my way with arbitrary limitations. Where it shines is in the editor, the configurability and the speed at which everything operates: in a "shopping list" of features it might lose to the others, but it wins because the implementation and the design feel more consistent to me.

Stop the pollution in WEB-INF/classes

Of the 12 or so Java libraries used in this site, 6 of them by default have their configuration files in the CLASSPATH:
  • log4j (log4j.properties / log4j.xml)
  • commons-logging (commons-logging.properties)
  • XWork (xwork.xml)
  • WebWork (webwork.properties, validators.xml plus all the <action>-validator.xml)
  • OSCache (oscache.properties)
  • iBATIS (the sql map configuration and definition)
Various resource bundles also litter the CLASSPATH, thanks mainly to WebWork action messages and global resource bundles. Enough of that for me; time to end that incestuous relationship and stop wasting time with .cvsignore and svn:ignore: I want Java classes in the CLASSPATH and nothing else. I could convince at least log4j and iBATIS to look elsewhere:
  • log4j has PropertyConfigurator.configureAndWatch(), where the argument is the full path of WEB-INF/config/log4j/log4j.properties
  • iBATIS has SqlMapClientBuilder.buildSqlMapClient(), where the argument is a java.io.Reader on the full path of WEB-INF/config/ibatis/sqlMapConfig.xml (nice side effect: I can freely reload my SQL map definitions at runtime); both calls are sketched right after this list
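A minimal sketch of both calls, wired up in a ServletContextListener (the listener class, the attribute name and the error handling are my own assumptions, not something either library prescribes):
import java.io.FileReader;
import javax.servlet.ServletContext;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.log4j.PropertyConfigurator;
import com.ibatis.sqlmap.client.SqlMapClient;
import com.ibatis.sqlmap.client.SqlMapClientBuilder;

public class ConfigListener implements ServletContextListener {
  public void contextInitialized(ServletContextEvent pEvent) {
    ServletContext ctx = pEvent.getServletContext();
    String configRoot = ctx.getRealPath("/") + "/WEB-INF/config";
    // log4j: watch the file and reload it when it changes
    PropertyConfigurator.configureAndWatch(configRoot + "/log4j/log4j.properties");
    try {
      // iBATIS: build the client from a Reader on the real path
      SqlMapClient client = SqlMapClientBuilder.buildSqlMapClient(
          new FileReader(configRoot + "/ibatis/sqlMapConfig.xml"));
      ctx.setAttribute("sqlMapClient", client);
    } catch (Exception e) {
      throw new RuntimeException("Cannot configure iBATIS", e);
    }
  }
  public void contextDestroyed(ServletContextEvent pEvent) { }
}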
For resource bundles I now use a database: a simple table with (key, language, message) columns and a class that extends ResourceBundle (sketched after the list below); no more .properties files and native2ascii to deal with. As you can see, I've started to use WEB-INF/config and its subdirectories, whose full paths are determined via ServletContext.getRealPath("/") + "/WEB-INF/config". The benefits:
  • the files won't be served to HTTP clients
  • very easy to put under source control, no gymnastics with .cvsignore or svn:ignore required
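For illustration, here is a minimal sketch of the database-backed bundle (the MESSAGES table name, the class name and the DataSource argument are my own assumptions; the columns are the ones mentioned above):
import java.sql.*;
import java.util.*;
import javax.sql.DataSource;

public class DatabaseResourceBundle extends ResourceBundle {
  private final Map messages = new HashMap();

  public DatabaseResourceBundle(DataSource pDataSource, String pLanguage) throws SQLException {
    Connection c = pDataSource.getConnection();
    try {
      // load all the messages of one language up front
      PreparedStatement ps = c.prepareStatement(
          "select key, message from MESSAGES where language = ?");
      ps.setString(1, pLanguage);
      ResultSet rs = ps.executeQuery();
      while (rs.next())
        messages.put(rs.getString(1), rs.getString(2));
    } finally {
      c.close();
    }
  }

  protected Object handleGetObject(String pKey) {
    return messages.get(pKey);
  }

  public Enumeration getKeys() {
    return Collections.enumeration(messages.keySet());
  }
}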
There is one disadvantage: it won't work if the web application runs from a compressed WAR file. My guess is that WEB-INF/classes became a popular dumping ground because, other than Thread.getContextClassLoader().getResource(), there is no portable way of reading files in a web application. How many web applications actually run from a compressed WAR file? What's the rationale for having it in the Servlet spec?

Never send email from a web page...

... synchronously. Recently I had to deal with nonsensical email bounces from a site I occasionally update, errors of the type:
unrouteable mail domain "hotmail.com"
Of course, domains such as hotmail.com, yahoo.com, etc. are perfectly capable of receiving all the messages one wants to throw at them, so the problem was in our own mail server. It turns out that the mail server returns this bogus message when you've tried to send too many messages over a certain period (say, for example, more than 20 pieces per minute; the actual limit may vary from configuration to configuration), and to make matters worse, the limit is a global one: all the sites hosted on the same box count toward the limit, so if site A sends 15 pieces and site B sends 5, then site C cannot send any email at all in that minute.

My site was indeed sending messages synchronously, after a form submission, and it tried to send more than 250 pieces each time; this also made the user interface completely unresponsive, to the point that some users repeatedly clicked on the "Send" button, aggravating the problem even further. And all the bounces arrived in my inbox...

Here's how I solved the problem: I created a new database table, EMAIL_MESSAGES, to store the messages, with the following structure:
create table EMAIL_MESSAGES (
  id integer not null primary key,
  sender varchar2(200),
  reply_to varchar2(200),
  destination varchar2(200),
  subject varchar2(200),
  body clob,
  creation_date date default sysdate,
  sent_date date,
  sent_flag integer default 0
)

and then changed the form submission code to insert into the table a copy of the message for each recipient. I created a script that selects at most X messages with sent_flag = 0 and sends each copy, setting the flag to 1 afterwards: the trick is scheduling the script with the correct frequency, not too often, to avoid crossing the limits, and not too rarely, to avoid freaking out people who aren't receiving their emails. We can also selectively resend only the messages that bounced and not the whole lot, and the user interface has greatly improved, since it returns immediately after the form submission. Why is this solution scalable? Because at any time I could easily schedule another instance of the script to send via a different SMTP server...

Old messages do not stay in the database indefinitely: in fact, the first thing the script does is delete messages older than 15 days, irrespective of whether they've been sent successfully or not.
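Here is a minimal sketch of the sending script in Java, against the table defined above (the JDBC URL, the credentials, the SMTP relay and the batch size of 20 are my own assumptions):
import java.sql.*;
import java.util.Properties;
import javax.mail.*;
import javax.mail.internet.*;

public class EmailQueueSender {
  public static void main(String[] args) throws Exception {
    Connection db = DriverManager.getConnection(
        "jdbc:oracle:thin:@localhost:1521:XE", "scott", "tiger"); // assumed
    // first, purge messages older than 15 days, sent or not
    db.createStatement().executeUpdate(
        "delete from EMAIL_MESSAGES where creation_date < sysdate - 15");
    Properties props = new Properties();
    props.put("mail.smtp.host", "smtp.example.com"); // assumed SMTP relay
    Session session = Session.getInstance(props);
    // pick at most 20 unsent copies, to stay under the relay's per-minute limit
    PreparedStatement select = db.prepareStatement(
        "select id, sender, reply_to, destination, subject, body" +
        " from EMAIL_MESSAGES where sent_flag = 0 and rownum <= 20");
    PreparedStatement update = db.prepareStatement(
        "update EMAIL_MESSAGES set sent_flag = 1, sent_date = sysdate where id = ?");
    ResultSet rs = select.executeQuery();
    while (rs.next()) {
      MimeMessage msg = new MimeMessage(session);
      msg.setFrom(new InternetAddress(rs.getString("sender")));
      String replyTo = rs.getString("reply_to");
      if (replyTo != null)
        msg.setReplyTo(new Address[] { new InternetAddress(replyTo) });
      msg.setRecipient(Message.RecipientType.TO,
          new InternetAddress(rs.getString("destination")));
      msg.setSubject(rs.getString("subject"));
      msg.setText(rs.getString("body"));
      Transport.send(msg);
      // flag the copy as sent only after the SMTP server accepted it
      update.setLong(1, rs.getLong("id"));
      update.executeUpdate();
    }
    db.close();
  }
}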

While my actual implementation was in PHP/MySQL (it's difficult to find a worse combination...), the principles of this solution are applicable to any language and database: for example, Tom Kyte wrote a while back about how to do the same thing with the Oracle database, ATG has the TemplateEmailSender class, and many others undoubtedly exist out there. Make sure you look for one of them the next time you have to send some emails.

On my bookshelf: Agile Project Management with Scrum

Agile Project Management with Scrum (Microsoft Professional)

The book is interesting in the way it presents case studies of applying Scrum to software development projects in different kinds of organizations, and in what one has to do to stop relying on meaningless Gantt charts and Microsoft Project constructions to plan and report project progress. I felt that the day-to-day activities of Scrum needed a bit more space than the appendix, as did the question of how Scrum can be integrated with other practices, such as task estimation techniques: for example, we're told that at the Product Backlog meeting the Team should estimate the work it can do in a Sprint, and while I agree that the estimating activity isn't part of Scrum, I would have found it very useful to mention which kinds of estimating techniques can be used in such a short time.

Also, the treatment of Scrum for fixed-price, fixed-date contracts is way too light for me, as that is one of the situations I'm most likely to find myself in (I find it very interesting that fixed-scope doesn't get mentioned alongside fixed-price and fixed-date). For this reason, and for the anecdotal tone of the book, my overall judgment is only fair: it was difficult not to feel that a Scrum project cannot be successful unless the author is present...

Wiping out an ATG repository

Or, wiping out all the data in an ATG repository. Sometimes you have to do it, for example to reset some tables and start from scratch: ATG provides a documented operational tag, remove-all-items, that can be written in an XML file and executed with startSQLRepository. This tag demands an item-descriptor attribute, so to completely wipe out a repository you would have to copy and paste the descriptor names from the repository definition file.

Here's a very quick and dirty page to produce the XML file:
<%@ page contentType="text/xml;charset=UTF-8" import="atg.repository.Repository, atg.servlet.DynamoHttpServletRequest" %>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE gsa-template SYSTEM "dynamosystemresource:/atg/dtds/gsa/gsa_1.0.dtd">
<gsa-template>

<%@ taglib uri="/dspEL" prefix="dsp" %>

<%--
  Generates an XML file that contains the "remove-all-items" tags to remove all items of all item
  descriptors in the repository component passed as the repository parameter.
--%>

<dsp:droplet name="/atg/dynamo/droplet/IsEmpty">
  <dsp:param name="value" param="repository" />
  <dsp:oparam name="false">
    <%
      // we need a Dynamo request to be able to resolveName()
      DynamoHttpServletRequest req = atg.servlet.ServletUtil.getDynamoRequest(request);
      Repository rep = (Repository)req.resolveName(request.getParameter("repository"));
      if (rep != null) {
        String[] itemDescNames = rep.getItemDescriptorNames();
        if (itemDescNames != null) {
          for (int i = 0; i < itemDescNames.length; i++) {
            // emit the remove-all-items tag
            out.print("<remove-all-items item-descriptor=\"" + itemDescNames[i] + "\" />" + "\r\n");
          }
        }
      }
    %>
  </dsp:oparam>
</dsp:droplet>

</gsa-template>
Save this page in a webapp and call it, specifying the repository to be wiped out:
http://localhost:8840/default/remove-all-items.jsp?repository=/TestRepository
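The output will look something like this (the item descriptor names are of course those of the repository being wiped):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE gsa-template SYSTEM "dynamosystemresource:/atg/dtds/gsa/gsa_1.0.dtd">
<gsa-template>
<remove-all-items item-descriptor="user" />
<remove-all-items item-descriptor="address" />
</gsa-template>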
You can then save the result in an XML file in the localconfig directory and execute it with:
bin/startSQLRepository -m module -repository /TestRepository /test-repository-remove-all-items.xml
Again, see the documentation for the system property that must be set to make it all work.

Cannot find web application MyWebApp in web app registry WebApplicationRegistry

I'm sure it has happened to most people, sooner or later. The problem: you're trying to send an email from a scenario or from a workflow, and the action fails with the following message:
**** Error      Thu May 26 18:05:06 CEST 2005   1117123506889
  /atg/epub/workflow/process/WorkflowProcessManager
  Error executing action emailNotifyTaskActors[taskElementId=6,
  scenarioPathInfo=MyWebApp:/notification/notifyToPreviewApproval.jsp,
  recipientSet=permittedActors]
  on process instance 42700003; deleting the process instance
  atg.process.ProcessException: Cannot find web application MyWebApp
  in web app registry WebApplicationRegistry

There seems to be nothing strange in the scenario or workflow editor, yet the email doesn't get sent and the log shows the error message. There are 2 (documented) approaches to solving the problem:
  1. configure MyWebApp in the StaticWebAppRegistry:
    create a MyWebApp/config/atg/registry/webappregistry/MyWebApp.properties file with
    $class=atg.service.webappregistry.WebApp
    properties=\
            display-name=MyWebApp,\
            appState=started,\
            context-root=/mywebapp
    
    and a MyWebApp/config/atg/registry/webappregistry/StaticWebAppRegistry.properties with
    preConfiguredWebApps+=/atg/registry/webappregistry/MyWebApp
    
  2. let NucleusServlet configure MyWebApp in the ServletContextWebAppRegistry:
    add a display-name tag to MyWebApp's web.xml file and invoke NucleusServlet at the webapp's startup:
    <display-name>MyWebApp</display-name>
    ...
    <servlet>
      <servlet-name>NucleusServlet</servlet-name>
      <display-name>NucleusServlet</display-name>
      <servlet-class>atg.nucleus.servlet.NucleusServlet</servlet-class>
      <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
      <servlet-name>NucleusServlet</servlet-name>
      <url-pattern>/nucleus</url-pattern>
    </servlet-mapping>
    

How to explain thread-safety in another language?

In between a site redesign, learning Python and being swamped with work, I have an issue: some of the people I'm working with configured a couple of instances of ItemLookupDroplet in request scope. The reasoning behind this is that each invocation of the droplet will then not interfere with the others.

As the documentation states, this droplet communicates with a page through input, output and open parameters; it has no member variables. Since each invocation of the droplet pushes a context onto a stack and is stateless, there is no need to instantiate a droplet per request: a droplet in the default global scope suffices. A droplet in request scope is a very special case, and I don't remember ever having seen one.
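To make the point concrete, here is a minimal sketch of a stateless droplet (the class and the parameter names are illustrative, not ItemLookupDroplet's actual source): all per-invocation state lives in local variables and in the request's own parameter stack, never in instance fields, so a single globally-scoped instance can safely serve concurrent requests.
import java.io.IOException;
import javax.servlet.ServletException;
import atg.servlet.DynamoHttpServletRequest;
import atg.servlet.DynamoHttpServletResponse;
import atg.servlet.DynamoServlet;

public class StatelessLookupDroplet extends DynamoServlet {
  public void service(DynamoHttpServletRequest pRequest, DynamoHttpServletResponse pResponse)
      throws ServletException, IOException {
    // the input parameter is read into a local variable: nothing is shared
    String id = pRequest.getParameter("id");
    Object element = lookup(id);
    // the output parameter is pushed onto this request's own parameter stack
    pRequest.setParameter("element", element);
    pRequest.serviceLocalParameter("output", pRequest, pResponse);
  }

  // stand-in for the real lookup (e.g. a repository call)
  private Object lookup(String pId) {
    return pId;
  }
}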

But the real problem is: my French is still limited, and I'm aware that I can't explain this concept as well as I'd like. It's frustrating and unproductive, so are there any online resources to which I can refer?

On my bookshelf: The Data Warehouse Toolkit

The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses

I wanted to read this book to get an overview of data warehousing, a subject I never really had the opportunity to tackle in my career so far: the book indeed seems aimed at people with some experience in enterprise applications, especially in the OLTP field, and it is full of very practical, no-nonsense advice gathered from the author's experience.

The book is written in an easy-to-follow style, and domain-specific terminology is introduced gradually; the first chapter outlines the difference between a (dimensional) data warehouse (DW) and an OLTP database, and explains why someone would want a data warehouse. Database design and normalization being one of my favorite subjects, getting to grips with dimensional modeling at first took me some time, and some faith in the author: however, the book shows rather clearly how dimensional modeling serves different needs from "traditional" entity-relationship modeling, and why the two differ.

There is a comprehensive example of dimensional modeling for a grocery store, which gives the author the opportunity to introduce other elements of dimensional modeling, such as the treatment of data that changes over time. There are other chapters that I didn't find as interesting, such as the examples on financial services and insurance; on the other hand, Chapter 12 is excellent, with a description of a methodology, of some usual design decisions, and suggestions for a DW design team interviewing end users and IS people. There are not only good questions to ask, but also good explanations of why one should ask them... I'm probably going to adapt and use many of the questions described for the projects I'm involved in.

Throughout the whole book there are clearly marked design principles, which are described and exemplified in the main text. At the end of the book there is a checklist of the technical steps involved in building a dimensional data warehouse and a glossary of the terminology (very useful for understanding the vendors' literature).

The book is dated 1996, but it's nevertheless interesting to read the chapters about front-end applications and the future, and to try to match the text with the features advertised in today's RDBMSs and database tools.