[edm-discuss] Re: Clustering with more DB tables

  • From: Markus Weimer <markus@xxxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Thu, 25 Jun 2009 15:36:18 +0200


One key insight here is that you want to find *new* structures in your
data when applying data mining techniques. If the methods explicitly
used your current data structure, which is probably very biased (and I
mean that in a good way), they would hinder the overall goal of data
mining.

The least common denominator, and a rather unbiased one when it comes
to structure in data mining, is a vector of attributes, hence the
methods focus on this. And with only 500 dimensions, running the join
shouldn't be that much of a resource hog, either. Just make sure that
each row of the result set corresponds to an entity you want to operate
on.
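
To illustrate the "one row per entity" point, here is a minimal sketch
using Python's built-in sqlite3 module. The tables (student,
course_result) and their columns are made up for the example and are
not taken from your schema. A plain join on a one-to-many relation would
repeat each student once per result, so the sketch aggregates the "many"
side so that every student ends up as exactly one attribute vector.

```python
import sqlite3

# Hypothetical two-table schema, held in memory just for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE course_result (student_id INTEGER, score REAL);
    INSERT INTO student VALUES (1, 20), (2, 22), (3, 19);
    INSERT INTO course_result VALUES (1, 80.0), (1, 90.0), (2, 70.0);
""")

# LEFT JOIN keeps students that have no results at all.
# GROUP BY collapses the one-to-many side into aggregate attributes,
# so the result set has exactly one row per student.
query = """
    SELECT s.student_id,
           s.age,
           COUNT(r.score)              AS n_results,
           COALESCE(AVG(r.score), 0.0) AS avg_score
    FROM student AS s
    LEFT JOIN course_result AS r ON r.student_id = s.student_id
    GROUP BY s.student_id, s.age
    ORDER BY s.student_id
"""
rows = conn.execute(query).fetchall()

# Map each entity id to its attribute vector (age, n_results, avg_score).
vectors = {row[0]: row[1:] for row in rows}
for student_id, vector in vectors.items():
    print(student_id, vector)
```

Which aggregates make sense (count, average, maximum, ...) depends on
your data; the ones above are only placeholders. The same pattern
extends to more tables by adding further joins and aggregate columns.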

Hope that helps,


On Thu, Jun 25, 2009 at 2:39 PM, qazmlp q<qazmlp1209@xxxxxxxxxxxxxx> wrote:
> On Thu, 25 Jun 2009 01:10:23 +0530 wrote
>>Well then, depending upon your tool, I would suggest using a SQL
>>interface as suggested by Joe.
>>Many of the better tools will let you form join queries that can achieve
>>the benefits of a single table without the costs.
> Ok, everybody seems to suggest the same.
> I have 65 tables, and if I assume on average that each table has around 10
> attributes/columns, I will have to arrive at a single very big table with
> more than 500 attributes. Do you really mean that? If yes, I seriously feel
> we have a very inefficient (at least from a layman's point of view) way of
> performing the data mining operations.
> Isn't there any better way of dealing with this very common issue?
> Regards,
> Michael
