[codeface] Re: [PATCH 3/3] Add global email analysis to database

  • From: Andreas Ringlstetter <andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Mon, 26 Oct 2015 14:56:24 +0100



Am 26.10.2015 um 14:15 schrieb Mitchell Joblin:

On Mon, Oct 26, 2015 at 12:43 PM, Andreas Ringlstetter
<andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx> wrote:
Good Morning,

Am 26.10.2015 um 11:54 schrieb Mitchell Joblin:
+
+-- -----------------------------------------------------
+-- Table `codeface`.`mail`
+-- -----------------------------------------------------
+DROP TABLE IF EXISTS `codeface`.`mail` ;
+
+CREATE TABLE IF NOT EXISTS `codeface`.`mail` (
+ `id` BIGINT NOT NULL AUTO_INCREMENT,
+ `projectId` BIGINT NOT NULL,
+ `emailId` VARCHAR(255) NOT NULL,

Why are you using a varchar for an ID?
Or is this supposed to be the rfc2111 "Message-ID"?
Why "NOT NULL" in the latter case? It's optional.

This is an id created during the corpus generation, its not the
Message-ID. I originally thought we needed to maintain this for some
local operations on the corpus but I think we won't have to store it.

Ah, the local message ID generated by snatm? I don't think that can be
reused in any way.


Also questioning the length,
https://tools.ietf.org/html/rfc4130#section-5.3.3 states that the ID
should be less than 255 characters, but that's not enforced. Neither the
code nor the database scheme contains any safeguard for this case.

And if it actually is the "Message-ID", why is the corresponding
"Reply-To" field discarded? (Which should btw. be resolved prior to
storing it in the database.) Without it, the thread structure can't be
reconstructed.

Also, if it's the Message-ID, is it the mangled form without (optional)
brackets or the raw form?

+ `threadId` BIGINT NOT NULL,
+ `mlId` BIGINT NOT NULL,
+ `author` BIGINT NOT NULL,
+ `subject` VARCHAR(255) NULL DEFAULT NULL,

This will break as soon as a thread contains only a single subject line
longer than expected.
Is it even necessary to record the subject? It shouldn't differ to much
from the parent topic?
And why allowing NULL? It's a mandatory field AFAIK.

This is currently how we handle the subject line in the mail_thread
table. If we want to handle the subject in a different way then please
provide a patch for the mail_thread table as well so that we can
handle it uniformly. Maybe Wolfgang can provide the rationalization
for using VARCHAR(255) for the subject. So far we have had no issues.


+ `creationDate` DATETIME NOT NULL,
+ PRIMARY KEY (`id`),
+ INDEX `mail_author_idx` (`author` ASC),
+ INDEX `mail_projectId_idx` (`projectId` ASC),
+ INDEX `mail_mlId_idx` (`mlId` ASC),

Please do me a favor and don't only add the keys required for contraints
to work, but also composite keys covering attribute combinations
commonly used in queries. This table is going to have a lot of data in
it, and you don't want to cause a full table scan, less so on possible
joins.

Sorry but I don't understand the suggestion. Do you have an example,
or a table which already does this?

Non of the existing tables it using it, which would explain the bad
performance when dealing with complex joins or larger tables.

If you have expensive queries which are querying for multiple constant
field values in a single table, add a composite index covering precisely
these fields. Especially true when such an table is used as a cross
table in a multiple table join.

MySQL will be happy to run your query solely on indexes if it can,
avoiding table or range scans entirely if possible. But that only works
if there are suitable indexes for costly queries.


At least add indexes on the text rows, or querying them will be slow
beyond reason.

Which text are being referred to? I cannot imagine that we ever query
by subject.

So the field is just stored for decoration?

-- Andreas

Thanks,

Mitchell


+ CONSTRAINT `mail_author`
+ FOREIGN KEY (`author`)
+ REFERENCES `codeface`.`person` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ CONSTRAINT `mail_projectId`
+ FOREIGN KEY (`projectId`)
+ REFERENCES `codeface`.`project` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ CONSTRAINT `mail_mlId`
+ FOREIGN KEY (`mlId`)
+ REFERENCES `codeface`.`mailing_list` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE)
+ENGINE = InnoDB;
+
USE `codeface` ;

-- -----------------------------------------------------
--

-- Andreas



Other related posts: