The Java Data Mining API

Data Mining Tasks:

The main tasks in data mining are building a model, testing the model, applying the model to data, importing and exporting mining objects, and computing statistics on data. Each is explained below.

The JDM provides a mechanism to build models based on the mining functions described earlier. The settings define the kind of model required and which attributes of the build data the model will use. A build task is then defined with the model name, the build data, and those settings. When the task completes, the built model is stored in the metadata repository.
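The build workflow can be illustrated with a small sketch. The code below does not use the JDM classes; BuildSettings, Model, REPOSITORY, and executeBuildTask are hypothetical stand-ins, and the "model" is a deliberately trivial one that only remembers the most frequent target value. The point is the shape of the task: settings plus a model name plus build data go in, and a named model ends up in a repository.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the javax.datamining types.
final class BuildSketch {

    /** Settings: the kind of model required and the attributes of the build data to use. */
    static final class BuildSettings {
        final String function;               // e.g. "classification"
        final String targetAttribute;        // attribute the model should predict
        final List<String> activeAttributes; // predictors; ignored by the trivial model below

        BuildSettings(String function, String targetAttribute, List<String> activeAttributes) {
            this.function = function;
            this.targetAttribute = targetAttribute;
            this.activeAttributes = activeAttributes;
        }
    }

    /** A deliberately trivial model: it always predicts the most frequent target value. */
    static final class Model {
        final String name;
        final String majorityValue;

        Model(String name, String majorityValue) {
            this.name = name;
            this.majorityValue = majorityValue;
        }
    }

    /** Stand-in for the metadata repository: built models are stored under their name. */
    static final Map<String, Model> REPOSITORY = new HashMap<>();

    /** The build task combines the model name, the build data, and the settings. */
    static Model executeBuildTask(String modelName,
                                  List<Map<String, String>> buildData,
                                  BuildSettings settings) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Map<String, String> row : buildData) {
            String value = row.get(settings.targetAttribute);
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                best = value;
                bestCount = count;
            }
        }
        Model model = new Model(modelName, best);
        REPOSITORY.put(modelName, model); // the built model is stored once the task finishes
        return model;
    }
}
```

A real DME would of course run an actual mining algorithm in place of the counting loop; only the inputs and the final store step mirror the description above.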

The next important task after building the model is to test it. Testing indicates how accurate the model is. The test task takes the model and test data as input and returns a TestResult object, whose content depends on the kind of model: classification models produce a confusion matrix, while regression models produce error estimates. The user can also ask for lift information to be computed as part of the test task. (Lift, confusion matrix, and error estimates are discussed in the Data mining terms section.)
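To make the two kinds of test output concrete, here is a minimal sketch in plain Java. It is not the JDM TestResult; the class and method names are hypothetical, classes are assumed to be encoded as integers from 0 to numClasses - 1, and root mean squared error is used as one possible error estimate for regression.

```java
// Hypothetical helper for illustration; NOT part of the javax.datamining API.
final class TestSketch {

    /** confusion[i][j] counts the cases whose actual class is i and predicted class is j. */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int[][] matrix = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;
        }
        return matrix;
    }

    /** Accuracy is the share of cases on the diagonal of the confusion matrix. */
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** One possible regression error estimate: root mean squared error. */
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sumOfSquares += error * error;
        }
        return Math.sqrt(sumOfSquares / actual.length);
    }
}
```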

Next comes the task of applying the model. Applying a model to data produces one or more predictions. In supervised mining, applying a model produces predictions along with their probabilities. In unsupervised mining, such as clustering, applying the model assigns each case to a cluster.
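The difference between the two kinds of apply can be sketched as follows. The names are hypothetical and not taken from the JDM. The supervised part assumes the model has already produced a probability per class and simply picks the most probable one. The unsupervised part assumes clusters are represented by centroids and assigns a case to the nearest centroid.

```java
import java.util.Map;

// Hypothetical helper for illustration; NOT part of the javax.datamining API.
final class ApplySketch {

    /** A supervised prediction: the chosen class and its probability. */
    static final class Prediction {
        final String label;
        final double probability;

        Prediction(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    /** Supervised apply: return the class with the highest probability. */
    static Prediction predict(Map<String, Double> classProbabilities) {
        Prediction best = null;
        for (Map.Entry<String, Double> entry : classProbabilities.entrySet()) {
            if (best == null || entry.getValue() > best.probability) {
                best = new Prediction(entry.getKey(), entry.getValue());
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no class probabilities given");
        }
        return best;
    }

    /** Unsupervised apply: return the index of the centroid nearest to the case. */
    static int assignCluster(double[] caseValues, double[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("no centroids given");
        }
        int nearest = 0;
        double nearestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0; // squared Euclidean distance is enough for comparison
            for (int d = 0; d < caseValues.length; d++) {
                double diff = caseValues[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < nearestDistance) {
                nearest = c;
                nearestDistance = distance;
            }
        }
        return nearest;
    }
}
```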

The JDM specification also defines tasks to export and import mining objects, although these are used less frequently. The JDM allows system metadata to be imported and exported in XML format, in Java serialized object format, and in proprietary formats. Since XML is the most common format for data transfer, the JDM names two standard XML definitions for data mining metadata: PMML and CWM. Importing and exporting is usually done to exchange model information with other DMEs, or to persist model information to storage other than the metadata repository.
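As an illustration of the Java serialized object format mentioned above, the sketch below round-trips a small object through standard Java serialization. ModelInfo, exportModel, and importModel are hypothetical names, not JDM classes or tasks; only the java.io calls are real. The resulting byte array could be written to a file or sent to another system.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for illustration; NOT part of the javax.datamining API.
final class ExportImportSketch {

    /** A minimal description of a model that can be serialized. */
    static final class ModelInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String function;

        ModelInfo(String name, String function) {
            this.name = name;
            this.function = function;
        }
    }

    /** Export: serialize the model information to bytes. */
    static byte[] exportModel(ModelInfo info) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(info);
        }
        return bytes.toByteArray();
    }

    /** Import: rebuild the model information from previously exported bytes. */
    static ModelInfo importModel(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ModelInfo) in.readObject();
        }
    }
}
```

Serialized Java objects are only readable by other Java programs that have the same classes, which is one reason an XML format is the better choice when exchanging models between different engines.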

Another very important task of data mining is to analyze the data. For this purpose the JDM provides a compute statistics task to calculate statistics on data. The details of which statistics are calculated are left to the vendor implementing the specification.
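Since the choice of statistics is left to the vendor, the following is only one possible example of what such a task might compute for a single numeric attribute. The class and method names are hypothetical, and the selection of minimum, maximum, mean, and population standard deviation is an assumption made for illustration.

```java
// Hypothetical helper for illustration; NOT part of the javax.datamining API.
final class StatisticsSketch {

    /** Summary statistics for one numeric attribute. */
    static final class Summary {
        final double min;
        final double max;
        final double mean;
        final double standardDeviation; // population standard deviation

        Summary(double min, double max, double mean, double standardDeviation) {
            this.min = min;
            this.max = max;
            this.mean = mean;
            this.standardDeviation = standardDeviation;
        }
    }

    /** Compute the summary in two passes: first min, max, and mean, then the spread. */
    static Summary summarize(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values given");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }
        double mean = sum / values.length;

        double squaredDiffs = 0.0;
        for (double value : values) {
            squaredDiffs += (value - mean) * (value - mean);
        }
        double standardDeviation = Math.sqrt(squaredDiffs / values.length);
        return new Summary(min, max, mean, standardDeviation);
    }
}
```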

Conclusion:

The goal of the Java Data Mining API is to provide a unified API for all data mining applications and clients. As of today, individual database providers have their own implementations of data mining, which makes it difficult for clients with multiple data sources to consolidate their data. At the time of writing, the specification has just finished its second public review and is waiting for the approval ballot. Some of the big players in the database industry, such as IBM and Oracle, are in the expert group for this specification. This gives developers some confidence that these vendors might start implementing the specification for their own database systems in the near future. Once there is some momentum, other database vendors are likely to follow suit. Still, a unified API that works on any data from any vendor will take some time to arrive. For more information on the specification, see the JCP site (JSR 73).


Benoy Jose is a web developer with over six years of experience in J2EE and Microsoft technologies. He is a Sun Certified programmer and enjoys writing technical and non-technical articles for various magazines.
