[edm-discuss] Re: Clustering with more DB tables

  • From: "Ryan S.J.d. Baker" <rsbaker@xxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Thu, 25 Jun 2009 08:51:34 -0400

500 attributes is really not that big.

The data sets I work with are definitely not large by feature-space
standards, and I often work with tables with over 500 features once
I've derived my set of composite features.

The kinds of tables one sees in publications at KDD (a principal data
mining conference) can get much, much bigger than 500 attributes, by
several orders of magnitude.

If your preferred DM algorithm can't handle data this size, you might
check out the literature on dimensionality reduction. I particularly
like Yu & Liu's work on fast correlation-based filtering.
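To make the idea concrete, here is a minimal sketch in the spirit of
correlation-based filtering: rank discrete features by their symmetrical
uncertainty (SU) with the class label and keep only those above a
threshold. This is an illustration of the general approach, not Yu &
Liu's exact FCBF algorithm (which also removes redundant features by
comparing features against each other); the function names and the
threshold value are mine.

```python
# Sketch of correlation-based feature filtering: score each discrete
# feature by symmetrical uncertainty with the class label and keep the
# features that score above a threshold. Illustrative only, not FCBF.
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2 * mutual_info / (hx + hy)

def filter_features(columns, label, threshold=0.1):
    """Return indices of features whose SU with the label exceeds threshold."""
    return [i for i, col in enumerate(columns)
            if symmetrical_uncertainty(col, label) > threshold]

# Toy example: feature 0 matches the label exactly, feature 1 is mostly noise.
features = [[0, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]]
label    = [0, 0, 1, 1, 0, 1]
print(filter_features(features, label))  # -> [0]
```

Scoring each feature against the label is linear in the number of
features, so a 500-attribute table is cheap to screen this way before
handing the surviving columns to a clustering algorithm.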


> I have 65 tables, and if I assume on average that each table has around 10
> attributes/columns, I will have to arrive at a single very big table with
> more than 500 attributes. Do you really mean that? If yes, I seriously feel
> we have a very inefficient (at least from a layman's point of view) way of
> performing data mining operations.
> Isn't there any better way of dealing with this very common issue?
> Regards,
> Michael
