[codeface] Re: [PATCH 3/3] Add global email analysis to database

  • From: Andreas Ringlstetter <andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Mon, 26 Oct 2015 12:43:08 +0100

Good Morning,

Am 26.10.2015 um 11:54 schrieb Mitchell Joblin:

+
+-- -----------------------------------------------------
+-- Table `codeface`.`mail`
+-- -----------------------------------------------------
+DROP TABLE IF EXISTS `codeface`.`mail` ;
+
+CREATE TABLE IF NOT EXISTS `codeface`.`mail` (
+ `id` BIGINT NOT NULL AUTO_INCREMENT,
+ `projectId` BIGINT NOT NULL,
+ `emailId` VARCHAR(255) NOT NULL,

Why are you using a varchar for an ID?
Or is this supposed to be the rfc2111 "Message-ID"?
Why "NOT NULL" in the latter case? It's optional.

Also questioning the length,
https://tools.ietf.org/html/rfc4130#section-5.3.3 states that the ID
should be less than 255 characters, but that's not enforced. Neither the
code nor the database scheme contains any safeguard for this case.

And if it actually is the "Message-ID", why is the corresponding
"Reply-To" field discarded? (Which should btw. be resolved prior to
storing it in the database.) Without it, the thread structure can't be
reconstructed.

Also, if it's the Message-ID, is it the mangled form without (optional)
brackets or the raw form?

+ `threadId` BIGINT NOT NULL,
+ `mlId` BIGINT NOT NULL,
+ `author` BIGINT NOT NULL,
+ `subject` VARCHAR(255) NULL DEFAULT NULL,

This will break as soon as a thread contains only a single subject line
longer than expected.
Is it even necessary to record the subject? It shouldn't differ to much
from the parent topic?
And why allowing NULL? It's a mandatory field AFAIK.

+ `creationDate` DATETIME NOT NULL,
+ PRIMARY KEY (`id`),
+ INDEX `mail_author_idx` (`author` ASC),
+ INDEX `mail_projectId_idx` (`projectId` ASC),
+ INDEX `mail_mlId_idx` (`mlId` ASC),

Please do me a favor and don't only add the keys required for contraints
to work, but also composite keys covering attribute combinations
commonly used in queries. This table is going to have a lot of data in
it, and you don't want to cause a full table scan, less so on possible
joins.

At least add indexes on the text rows, or querying them will be slow
beyond reason.

+ CONSTRAINT `mail_author`
+ FOREIGN KEY (`author`)
+ REFERENCES `codeface`.`person` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ CONSTRAINT `mail_projectId`
+ FOREIGN KEY (`projectId`)
+ REFERENCES `codeface`.`project` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ CONSTRAINT `mail_mlId`
+ FOREIGN KEY (`mlId`)
+ REFERENCES `codeface`.`mailing_list` (`id`)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE)
+ENGINE = InnoDB;
+
USE `codeface` ;

-- -----------------------------------------------------
--

-- Andreas

Other related posts: