[edm-discuss] Re: Clustering with more DB tables

  • From: Markus Weimer <markus@xxxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Thu, 25 Jun 2009 15:36:18 +0200

Hi,

one key insight here is that you want to find *new* structures in your
data when applying data mining techniques. If the methods explicitly
used your current data structure, which is probably very biased (and I
mean that in a good way), they would hinder the overall goal of the
data mining procedure.

The least common denominator, and a rather unbiased one, when it comes
to structure in data mining is a vector of attributes; hence the
methods focus on this. And with only 500 dimensions, running the join
shouldn't be much of a resource hog, either. Just make sure that each
row of the result set corresponds to an entity you want to operate
on.
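To illustrate "one row per entity", here is a minimal sketch using Python's built-in sqlite3 module. The schema (a hypothetical `student` table plus two one-to-many tables, `login` and `quiz`) and all column names are assumptions made up for this example. The one-to-many tables are aggregated per student before the join, so the result set contains exactly one attribute vector per student rather than one row per login or quiz attempt.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE login   (student_id INTEGER, minutes REAL);
CREATE TABLE quiz    (student_id INTEGER, score REAL);

INSERT INTO student VALUES (1, 20), (2, 23);
INSERT INTO login   VALUES (1, 30), (1, 45), (2, 10);
INSERT INTO quiz    VALUES (1, 0.8), (2, 0.5), (2, 0.7);
""")

# Aggregate each one-to-many table per student first, then LEFT JOIN,
# so every row of the result set is exactly one student (one entity).
query = """
SELECT s.id,
       s.age,
       COALESCE(l.n_logins, 0)  AS n_logins,
       COALESCE(l.total_min, 0) AS total_minutes,
       q.avg_score
FROM student AS s
LEFT JOIN (SELECT student_id, COUNT(*) AS n_logins, SUM(minutes) AS total_min
           FROM login GROUP BY student_id) AS l ON l.student_id = s.id
LEFT JOIN (SELECT student_id, AVG(score) AS avg_score
           FROM quiz GROUP BY student_id) AS q ON q.student_id = s.id
ORDER BY s.id
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)  # one attribute vector per student
```

Joining the raw `login` and `quiz` tables directly would instead multiply rows (one per login/quiz combination), which is exactly what the "one row per entity" rule guards against. The same query could also be saved as a view, so a mining tool sees a single flat table without the data being copied.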

Hope that helps,

Markus

On Thu, Jun 25, 2009 at 2:39 PM, qazmlp q <qazmlp1209@xxxxxxxxxxxxxx> wrote:
> On Thu, 25 Jun 2009 01:10:23 +0530 wrote
>>Well then, depending upon your tool, I would suggest using an SQL
>>interface as suggested by Joe.
>>
>>Many of the better tools will let you form join queries that can achieve
>>the benefits of a single table without the costs.
> OK, everybody seems to suggest the same thing.
>
> I have 65 tables, and if I assume each table has around 10
> attributes/columns on average, I will end up with a single very big table
> with more than 500 attributes. Do you really mean that? If so, I seriously
> feel we have a very inefficient (at least from a layman's point of view)
> way of performing data mining operations.
> Isn't there a better way of dealing with this very common issue?
>
> Regards,
> Michael
>
