[edm-discuss] Re: Clustering with more DB tables

  • From: "Joseph E. Beck" <joseph.beck@xxxxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Thu, 25 Jun 2009 09:47:38 -0400

As a counterpoint, I don't think I've ever built a table with 500
attributes.  Typically, you're not going to need every attribute for an
analysis, so build a table with just the attributes you need.  I find
smaller tables easier to think about and to error-check, and if you have a
lot of rows, 500 attributes can be a problem for many packages.
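
To make "just the attributes you need" concrete, here is a minimal sketch
using Python's built-in sqlite3 module.  The table and column names
(students, attempts, pct_correct, and so on) are made up for illustration
and are not from anyone's actual schema; the point is only that the join
pulls a handful of columns rather than every column from every table.

```python
import sqlite3


def build_analysis_table(conn):
    """Build a narrow analysis table from two hypothetical source tables.

    Only the columns needed for the analysis are selected; everything
    else (names, schools, item text, timings) stays in the source tables.
    Returns the column names of the new table.
    """
    conn.executescript("""
        DROP TABLE IF EXISTS analysis;
        CREATE TABLE analysis AS
        SELECT s.student_id,
               s.grade_level,
               AVG(a.correct) AS pct_correct,
               COUNT(*)       AS n_attempts
        FROM students AS s
        JOIN attempts AS a ON a.student_id = s.student_id
        GROUP BY s.student_id, s.grade_level;
    """)
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute("PRAGMA table_info(analysis)")]


# Toy data standing in for two of the many source tables (hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT,
                           grade_level INTEGER, school TEXT);
    CREATE TABLE attempts (attempt_id INTEGER PRIMARY KEY,
                           student_id INTEGER, item TEXT,
                           correct INTEGER, seconds REAL);
    INSERT INTO students VALUES (1, 'A', 6, 'X'), (2, 'B', 7, 'Y');
    INSERT INTO attempts VALUES (1, 1, 'q1', 1, 12.0),
                                (2, 1, 'q2', 0, 30.5),
                                (3, 2, 'q1', 1, 8.0);
""")

columns = build_analysis_table(conn)
rows = conn.execute("SELECT * FROM analysis ORDER BY student_id").fetchall()
print(columns)
print(rows)
```

The same idea extends to more source tables: add one join per table you
actually need, and leave the rest out of the analysis table entirely.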

As to whether it's better to use automated techniques (like the correlation
approach Ryan mentioned) or knowledge-based ones (i.e. thinking through what
will be useful) for shrinking from 500 attributes to something more
manageable, that comes down to preference in how to conduct research, and is
a much longer discussion.
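
For a rough idea of what the automated route can look like, here is a small
sketch of a correlation-based filter in plain Python.  To be clear, this is
not Yu & Liu's FCBF algorithm (which is built on symmetric uncertainty); it
is a simpler Pearson-correlation stand-in, and the function names,
thresholds, and toy data are all assumptions made for illustration.  It keeps
features that correlate with the target and drops any feature that is nearly
a duplicate of one already kept.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(vx * vy)


def correlation_filter(features, target, min_relevance=0.1, max_redundancy=0.9):
    """Pick features that are relevant to the target and not redundant.

    features: dict mapping feature name -> list of values (hypothetical layout)
    target:   list of target values, same length as each feature column
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    # Consider the most relevant features first.
    ranked = sorted(
        (name for name in features if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Skip a feature that is nearly collinear with one already kept.
        if all(abs(pearson(features[name], features[k])) < max_redundancy for k in kept):
            kept.append(name)
    return kept


# Toy data (made up): hints_x2 duplicates hints, noise is weakly related.
features = {
    "hints": [0, 1, 2, 3, 4, 5],
    "hints_x2": [0, 2, 4, 6, 8, 10],
    "seconds": [30, 10, 25, 5, 20, 2],
    "noise": [1, -1, 1, -1, 1, -1],
}
target = [1, 2, 2, 4, 5, 5]

kept = correlation_filter(features, target, min_relevance=0.5)
print(kept)
```

The two thresholds are where the judgment calls live: a higher
min_relevance or a lower max_redundancy gives a smaller table, at the risk
of throwing away something a domain expert would have kept.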


On Thu, Jun 25, 2009 at 8:51 AM, Ryan S.J.d. Baker <rsbaker@xxxxxxx> wrote:

> 500 attributes is really not that big.
> The data sets I work with are definitely not large feature space,
> and I often work with tables with over 500 features, once I've
> derived my set of composite features.
> The kinds of tables one sees in publications at KDD (a principal data
> mining conference) can get much, much, much, much
> bigger than 500 attributes. By several orders of magnitude.
> If your preferred DM algorithm can't handle data this size, you might
> check out the literature on dimensionality reduction. I particularly
> like Yu & Liu's work on fast correlation-based filtering.
> Cheers,
> Ryan
> > I have 65 tables, and if I assume that each table has around 10
> > attributes/columns on average, I will have to arrive at a single very
> > big table with more than 500 attributes. Do you really mean that? If
> > yes, I seriously feel we have a very inefficient (at least from a
> > layman's point of view) way of performing the data mining operations.
> > Isn't there a better way of dealing with this very common issue?
> >
> > Regards,
> > Michael
> >

Joseph E. Beck
Research Scientist
Computer Science Department, Fuller Labs 138
Worcester Polytechnic Institute
