The Java Data Mining API
by Benoy Jose
Introduction:
Data mining is an important process at most companies today. It
involves sifting through large volumes of business data for
potential leads, sales analysis, audits, data warehousing,
business intelligence, and many other functions. Most companies
store their data in a variety of sources, such as mainframes,
databases, and files. Grouping and analyzing data from such
disparate sources is a big problem. Most data source vendors
provide an API for analyzing the data held in their own product.
Imagine the plight of a company that has data in different
sources, such as Oracle and a mainframe, and needs to analyze all
of it. It would have to analyze the data in each source
separately and then consolidate the results. This is where JDM
fits in. The Java Data Mining API (JDM) proposes a pure Java API
for developing data mining applications. The idea is to have a
common API for data mining that clients can use without being
aware of, or affected by, the actual vendor implementations.
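
The idea of a common API can be illustrated with a minimal sketch.
Everything in it is hypothetical: the MiningService interface, the
two vendor classes, and the averageSale method are invented for
illustration and are not part of the JDM specification. The point
is only that the client method report() is written once against
the interface and never names a vendor class.

```java
import java.util.List;

// Hypothetical vendor-neutral interface (an assumption for this
// sketch, NOT the real JDM API).
interface MiningService {
    String vendor();
    double averageSale(List<Double> sales);
}

// Hypothetical implementation standing in for one vendor's product.
class VendorAService implements MiningService {
    public String vendor() { return "VendorA"; }

    public double averageSale(List<Double> sales) {
        double sum = 0.0;
        for (double s : sales) {
            sum += s;
        }
        return sales.isEmpty() ? 0.0 : sum / sales.size();
    }
}

// A second hypothetical vendor that computes the same result
// with a different internal approach.
class VendorBService implements MiningService {
    public String vendor() { return "VendorB"; }

    public double averageSale(List<Double> sales) {
        return sales.stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0);
    }
}

class CommonApiSketch {
    // Client code depends only on the interface, so swapping
    // vendors requires no change here.
    static String report(MiningService service, List<Double> sales) {
        return service.vendor() + " average: " + service.averageSale(sales);
    }

    public static void main(String[] args) {
        List<Double> sales = List.of(10.0, 20.0, 30.0);
        System.out.println(report(new VendorAService(), sales));
        System.out.println(report(new VendorBService(), sales));
    }
}
```

Both vendor classes return the same average for the same input, so
the client cannot tell from the result which one it is using.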
Architecture:
The JDM architecture consists of three logical components: the
API, the data mining engine (DME), and the metadata repository
(MR). The API is the exposed programming interface that provides
access to the services of the DME. It shields the data mining
user from the actual implementation of the DME and from any
subcomponents the DME uses. The DME is the engine that provides
the data mining services that users reach through the API. When
the DME is implemented as a server, it is called a data mining
server. The third component, the metadata repository (MR),
persists data mining objects, which the DME then uses in later
data mining operations. The metadata repository can be a flat
file system or a relational database. The three logical
components can be grouped into one physical system or can exist
independently as separate components. Beyond these three, JDM
implementers can add components and tools to enhance a vendor
implementation of the JDM, but such additions are not defined in
the JDM specification.
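
The relationship between the three components can be sketched in a
few lines of Java. This is only an illustration under stated
assumptions: the class names (SketchApi, SketchEngine,
MetadataRepository, InMemoryRepository) are hypothetical and do
not come from the JDM specification, the "model" is just a stored
mean, and an in-memory map stands in for the flat file or
relational database that a real repository would use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// MR: persists named mining objects (hypothetical interface).
interface MetadataRepository {
    void save(String name, double model);
    Optional<Double> load(String name);
}

// An in-memory map standing in for a flat file or database.
class InMemoryRepository implements MetadataRepository {
    private final Map<String, Double> store = new HashMap<>();

    public void save(String name, double model) {
        store.put(name, model);
    }

    public Optional<Double> load(String name) {
        return Optional.ofNullable(store.get(name));
    }
}

// DME: does the actual work and uses the MR to persist
// and later reload its mining objects.
class SketchEngine {
    private final MetadataRepository repository;

    SketchEngine(MetadataRepository repository) {
        this.repository = repository;
    }

    // "Builds" a trivial model: the mean of the input data.
    void build(String name, double[] data) {
        double sum = 0.0;
        for (double d : data) {
            sum += d;
        }
        repository.save(name, data.length == 0 ? 0.0 : sum / data.length);
    }

    // Applies a persisted model to a new value.
    boolean isAboveAverage(String name, double value) {
        double mean = repository.load(name).orElseThrow(
            () -> new IllegalArgumentException("Unknown model: " + name));
        return value > mean;
    }
}

// API: the only layer the user sees; it hides the engine
// and the repository behind two simple calls.
class SketchApi {
    private final SketchEngine engine;

    SketchApi(SketchEngine engine) {
        this.engine = engine;
    }

    void buildModel(String name, double[] data) {
        engine.build(name, data);
    }

    boolean applyModel(String name, double value) {
        return engine.isAboveAverage(name, value);
    }
}

class ArchitectureSketch {
    public static void main(String[] args) {
        SketchApi api =
            new SketchApi(new SketchEngine(new InMemoryRepository()));
        api.buildModel("sales", new double[] {10.0, 20.0, 30.0});
        System.out.println(api.applyModel("sales", 25.0));
    }
}
```

Here all three components live in one process, which corresponds to
grouping them into one physical system. Because the engine depends
only on the MetadataRepository interface, a file-backed or
database-backed repository could replace InMemoryRepository
without changing the engine or the API layer.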