Managing large physical mailings ...

  • From: stv <stvsmth@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 2 Oct 2006 16:49:18 -0600

Hi all,

We're a non-profit that does quite a bit of physical mailings:
newsletter, catalogs, etc. There are three data sources and there's
quite a bit of overlap. It'll take some time to clean that mess up ...
meanwhile we'll have to work with what we have.

I'm new here and I have a few weeks before the next "big" mailing goes
out (only 20,000). I've never dealt with bulk snail mailing before,
but it seems to me the problem of isolating duplicates from multiple
data sources was solved many, many, many years ago.

Do y'all do this much yourself? Or do you outsource it? If you do it
yourself, what advice would you have for a newbie in this area?

Are there generic queries or regex that I can start from and expand?
We are dealing solely with US addresses and our databases use the
basic first, last, addr1, addr2, city, st, zip columns.

It should be noted that we'd prefer to over-filter on folks and save
on postage; if we miss a few folks because they cannot type in a
proper two-letter state code, then oh well. We're not mailing out
1099s are anything.

I would think the obvious steps are to

1) Normalize the data: all upper, strip some punctuation, check for
valid state (2 char) & zips (5nums and/or ???) maybe clean up the
street/st Ave/ave situations. Maybe. However, this seems endless: 4th
vs Fourth, all the combinations of apartment number ids, etc. Surely
there are some existing queries for this

2) Remove obvious duplicates

3) Create some compound "keys" addr1 || addr2 || zip  ... possibly a
number of others in order to find less obvious duplicates.

3) Probably need to do some selects & see if the proposed dupes are
actually dupes; refine the queries and/or keys; repeat.

4) Remove less obvious duplicates.

The previous DBA did something like the above, but he would strip all
rows with same first/last name :( Not the end of the world, as I said,
but I think we can be a bit more sophisticated than that. Also, no
normalization of data before creating compound keys, aside from UPPER.
We had a bunch of people complain about multiple mailings from the
last round, so I'm not too high on re-using his stuff.

--Steve Smith
--
//www.freelists.org/webpage/oracle-l


Other related posts: