[edm-discuss] Re: Clustering with more DB tables

  • From: "Joseph E. Beck" <joseph.beck@xxxxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Wed, 24 Jun 2009 15:07:05 -0400


Most data mining tools are designed for a tabular representation, so you'll
have to get your data into that form to work with them.  Doing a full join
across all tables is generally impractical, so the solution I use is to
create a table for each analysis, or, if possible, think of a table that
will generalize to a large set of analyses and create it.  This general
table often has rows and columns you won't need for a particular analysis,
so the procedure then is to simply select the columns (and cases) you want
from the table, and feed them to your analysis package.  One problem you may
encounter is that Weka does not handle large data sets very well--at least earlier
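
A minimal sketch of the "one table per analysis" approach described above.
Everything in it is an assumption made for illustration: the table and column
names (students, attempts, analysis, pct_correct, ...) are hypothetical, and
Python's built-in sqlite3 module stands in for MySQL so the example runs
without a server.  The same CREATE TABLE ... AS SELECT idea should carry over
to MySQL, though the details may differ.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database mentioned in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables, one row per student and one per attempt.
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, grade INTEGER);
    CREATE TABLE attempts (student_id INTEGER, correct INTEGER, seconds REAL);
    INSERT INTO students VALUES (1, 7), (2, 8), (3, 7);
    INSERT INTO attempts VALUES
        (1, 1, 12.0), (1, 0, 30.0), (2, 1, 8.0), (2, 1, 10.0), (3, 0, 45.0);

    -- The general-purpose analysis table: one row per student, with the
    -- attempt data aggregated so that no full join has to be kept around.
    CREATE TABLE analysis AS
    SELECT s.student_id,
           s.grade,
           COUNT(a.student_id) AS n_attempts,
           AVG(a.correct)      AS pct_correct,
           AVG(a.seconds)      AS mean_seconds
    FROM students s
    LEFT JOIN attempts a ON a.student_id = s.student_id
    GROUP BY s.student_id, s.grade;
""")

# For a particular analysis, select only the columns and cases needed.
rows = conn.execute(
    "SELECT pct_correct, mean_seconds FROM analysis "
    "WHERE grade = 7 ORDER BY student_id"
).fetchall()
print(rows)
```

The selected rows can then be written out in whatever flat format the analysis
package reads, for example a CSV file.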

The other solution is to use a data mining package that is designed to work
on databases and structured data.  I have no experience in this area, but
perhaps others could say more?


On Wed, Jun 24, 2009 at 1:49 PM, qazmlp q <qazmlp1209@xxxxxxxxxxxxxx> wrote:

> On Wed, 24 Jun 2009 21:24:32 +0530 wrote
> >Most of the tools do work that way.
> Which way? Supporting only single table? or with multiple tables?
> >  As a simple point you can form a
> >temporary table containing all of the results for the purposes of a
> > single process.  What Database or wrapper are you using?
> I use mySQL. But I assume that the problem is a common one.
> Isn't this a very common case? Having only a single table is unrealistic
> for me.
> Regards,
> Michael

Joseph E. Beck
Research Scientist
Computer Science Department, Fuller Labs 138
Worcester Polytechnic Institute
