Data Mining Tasks:
The main tasks involved in data mining are explained below:
building a model, testing the model, applying the model to data,
importing and exporting mining objects, and analyzing data.
JDM provides a mechanism to build models based on the mining
functions described above. To build a model, a build task is
defined with the model name, the build data, and the build
settings. The settings specify the mining function, and therefore
the kind of model required, along with the attributes of the
build data that the model will use. Once the model is built, it
is stored in the metadata repository.
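The build step can be pictured with a minimal sketch. It does not
use the javax.datamining API; BuildSketch, Settings, Model,
REPOSITORY, and build are hypothetical names chosen for
illustration, and the "model" is deliberately trivial (it only
remembers the most frequent target value).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the javax.datamining types.
final class BuildSketch {
    // Settings name the mining function, the target, and the attributes to use.
    static final class Settings {
        final String function;
        final String target;
        final List<String> attributes;

        Settings(String function, String target, List<String> attributes) {
            this.function = function;
            this.target = target;
            this.attributes = attributes;
        }
    }

    // A trivial "model": it always predicts the most frequent target value.
    static final class Model {
        final String name;
        final Settings settings;
        final String majorityClass;

        Model(String name, Settings settings, String majorityClass) {
            this.name = name;
            this.settings = settings;
            this.majorityClass = majorityClass;
        }
    }

    // In-memory stand-in for the metadata repository, keyed by model name.
    static final Map<String, Model> REPOSITORY = new HashMap<>();

    // The build task: model name, build data, and settings go in; the
    // resulting model is stored in the repository and returned.
    static Model build(String modelName, List<Map<String, String>> buildData, Settings settings) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (Map<String, String> row : buildData) {
            String label = row.get(settings.target);
            int n = counts.merge(label, 1, Integer::sum);
            if (best == null || n > counts.get(best)) {
                best = label;
            }
        }
        Model model = new Model(modelName, settings, best);
        REPOSITORY.put(modelName, model);
        return model;
    }

    public static void main(String[] args) {
        List<Map<String, String>> data = new ArrayList<>();
        data.add(Map.of("age", "30", "buys", "yes"));
        data.add(Map.of("age", "45", "buys", "no"));
        data.add(Map.of("age", "52", "buys", "yes"));
        Settings settings = new Settings("classification", "buys", List.of("age"));
        build("buyerModel", data, settings);
        System.out.println(REPOSITORY.get("buyerModel").majorityClass);
    }
}
```

A real DME would run the build task asynchronously and use a far
richer algorithm; the point here is only the flow from name, data,
and settings to a stored model.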
The next important task after building the model is to test it.
Testing gives information about the accuracy of the model. The
testing task takes the model and test data as input and returns
a TestResult object. The content of the TestResult depends on
the kind of model being tested: classification models produce a
confusion matrix, while regression models produce error
estimates. In addition, the user can ask the testing task to
compute lift information. (Lift, confusion matrix, and error
estimates are discussed in the Data mining terms section.)
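To make the two kinds of test output concrete, here is a small
sketch that computes a confusion matrix and accuracy for a
classifier, and a root mean squared error for a regression model.
TestMetricsSketch and its method names are assumptions made for
this example; they are not the JDM TestResult type, and root mean
squared error is just one possible error estimate.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the JDM API.
final class TestMetricsSketch {
    // Confusion matrix: actual label -> (predicted label -> count).
    static Map<String, Map<String, Integer>> confusionMatrix(List<String> actual,
                                                             List<String> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("actual and predicted differ in size");
        }
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        for (int i = 0; i < actual.size(); i++) {
            matrix.computeIfAbsent(actual.get(i), k -> new TreeMap<>())
                  .merge(predicted.get(i), 1, Integer::sum);
        }
        return matrix;
    }

    // Fraction of cases where the predicted label equals the actual label.
    static double accuracy(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size() || actual.isEmpty()) {
            throw new IllegalArgumentException("need two non-empty lists of equal size");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }

    // One common regression error estimate: root mean squared error.
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        List<String> actual = List.of("yes", "yes", "no", "no");
        List<String> predicted = List.of("yes", "no", "no", "no");
        System.out.println(confusionMatrix(actual, predicted));
        System.out.println(accuracy(actual, predicted));
        System.out.println(rootMeanSquaredError(new double[] {1, 2, 3}, new double[] {1, 2, 5}));
    }
}
```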
Next comes the task of applying the model. When a model is
applied to data, it produces one or more predictions for each
case. In supervised mining, applying a model produces predictions
along with their probabilities. In unsupervised mining such as
clustering, applying the model assigns each case to a cluster.
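The difference between the two kinds of apply can be sketched as
follows. The example assumes an already built logistic model for
the supervised case and a set of cluster centroids for the
unsupervised case; ApplySketch, Prediction, applyClassifier, and
assignCluster are hypothetical names, not JDM types.

```java
// Hypothetical helper for illustration; not part of the JDM API.
final class ApplySketch {
    // A prediction paired with the probability the model gives it.
    static final class Prediction {
        final String label;
        final double probability;

        Prediction(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    // Supervised apply: a logistic model (assumed already built) scores one case
    // and returns the more probable of the two labels with its probability.
    static Prediction applyClassifier(double[] weights, double bias, double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        double pYes = 1.0 / (1.0 + Math.exp(-z));
        return pYes >= 0.5
                ? new Prediction("yes", pYes)
                : new Prediction("no", 1.0 - pYes);
    }

    // Unsupervised apply: assign the case to the cluster whose centroid is
    // nearest by squared Euclidean distance; returns the cluster index.
    static int assignCluster(double[][] centroids, double[] x) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = centroids[c][i] - x[i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Prediction p = applyClassifier(new double[] {0.8, -0.3}, 0.1, new double[] {2.0, 1.0});
        System.out.println(p.label + " " + p.probability);
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(assignCluster(centroids, new double[] {9.0, 8.0}));
    }
}
```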
The JDM specification also defines tasks to import and export
mining objects, although these are used less frequently. JDM
allows system metadata to be imported and exported in XML
format, in Java serialized object format, and in proprietary
formats. Since XML is the most common format for data transfer,
JDM names two standard XML representations for data mining
metadata: PMML and CWM. Importing and exporting is usually done
to exchange model information with other DMEs, or to persist
model information to storage other than the metadata repository.
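Of the formats mentioned, the Java serialized object route is the
easiest to illustrate with the standard library alone. The sketch
below round-trips a small model description through a byte array.
ExportImportSketch and ModelInfo are hypothetical names, and this
is plain java.io serialization rather than the JDM export and
import tasks; it does not attempt PMML or CWM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDM API.
final class ExportImportSketch {
    // A small, serializable description of a mining model.
    static final class ModelInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String function;

        ModelInfo(String name, String function) {
            this.name = name;
            this.function = function;
        }
    }

    // "Export": write the object in Java serialized object format.
    static byte[] exportModel(ModelInfo model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Import": read the object back, for example in another process.
    static ModelInfo importModel(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ModelInfo) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] exported = exportModel(new ModelInfo("buyerModel", "classification"));
        ModelInfo copy = importModel(exported);
        System.out.println(copy.name + " " + copy.function);
    }
}
```

Serialized objects are only readable by Java code that has the
same classes available, which is one reason an XML format is the
better choice when exchanging models with a different DME.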
Another very important data mining task is analyzing the data.
For this purpose, JDM provides a compute statistics task to
calculate statistics on data. The details of which statistics
are calculated are left to the vendor implementing the
specification.
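Since the specification leaves the choice of statistics to the
vendor, the following is only one plausible shape for such a
task: per-attribute count, minimum, maximum, and mean, computed
with the standard DoubleSummaryStatistics class. StatisticsSketch
and computeStatistics are hypothetical names, not JDM types.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.DoubleStream;

// Hypothetical helper for illustration; not part of the JDM API.
final class StatisticsSketch {
    // Computes count, min, max, and mean for each numeric attribute (column).
    static Map<String, DoubleSummaryStatistics> computeStatistics(Map<String, double[]> columns) {
        Map<String, DoubleSummaryStatistics> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> column : columns.entrySet()) {
            result.put(column.getKey(),
                       DoubleStream.of(column.getValue()).summaryStatistics());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> columns = new LinkedHashMap<>();
        columns.put("age", new double[] {30, 45, 52});
        columns.put("income", new double[] {40000, 52000, 61000});
        computeStatistics(columns).forEach((name, stats) ->
                System.out.println(name + ": " + stats));
    }
}
```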
Conclusion:
The goal of the Java Data Mining API is to provide a unified API
for all data mining applications and clients. Today, individual
database vendors have their own data mining implementations,
which makes it difficult for clients with multiple data sources
to consolidate their data. The specification has just finished
its second public review and is awaiting the approval ballot.
Big players in the database industry such as IBM and Oracle are
in the expert group for this specification, which gives
developers some confidence that these vendors may implement it
for their own database systems in the near future. Once there is
some momentum, other database vendors are likely to follow suit.
Still, it will take some time before a unified API can be used
on any data from any vendor. For more information on the
specification, see the JCP site (JSR 73).
Benoy Jose is a web developer with over six years of experience
in J2EE and Microsoft technologies. He is a Sun Certified
programmer and enjoys writing technical and non-technical
articles for various magazines.