[comixed] Re: Comics never completing import

From: "Darryl L. Pierce" <mcpierce@xxxxxxxxx>
To: comixed@xxxxxxxxxxxxx
Date: Mon, 7 Sep 2020 08:41:43 -0400

The way a single comic book is modeled is like this:
* Comic -> the parent object that holds the filename and the meta-data for
a comic
* Page -> a many-to-one relationship to Comic, each represents a single
page in the comic archive
* ComicFileEntry -> a many-to-one relationship to Comic, each represents a
single file (not just images like Page) in the comic archive
* ComicFileDetails -> basic details on the physical file (currently, just
the hash for the file itself)
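
As a rough illustration only, the relationships above could be sketched in
plain Java like this. The field names and constructors are assumptions made
for the example; they are not copied from the CX source, and the real
classes are persisted entities with more on them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified versions of the four model classes.
class Page {
    final String filename; // name of one image entry in the archive

    Page(String filename) { this.filename = filename; }
}

class ComicFileEntry {
    final String filename; // name of any entry in the archive, image or not

    ComicFileEntry(String filename) { this.filename = filename; }
}

class ComicFileDetails {
    final String hash; // hash of the physical comic file

    ComicFileDetails(String hash) { this.hash = hash; }
}

class Comic {
    final String filename;
    final List<Page> pages = new ArrayList<>();                 // many Pages to one Comic
    final List<ComicFileEntry> fileEntries = new ArrayList<>(); // many entries to one Comic
    ComicFileDetails fileDetails;                               // not set when the Comic is first created

    Comic(String filename) { this.filename = filename; }
}
```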

When importing starts, an AddComicWorkerTask is enqueued and given the
filename for the comic. When the AddComicWorkerTask runs, it creates the
Comic object and inserts it into the database, which is what causes the entry
to pop up in the browser. At this stage the Comic object only knows the
filename for the comic and nothing else. AddComicWorkerTask then enqueues
an instance of ProcessComicWorkerTask, which is the workhorse of the
library.

When ProcessComicWorkerTask runs, it extracts the images from the archive,
gets their metrics and creates the Page objects. It also processes the
ComicInfo.xml file to pull out any metadata, it creates the ComicFileEntry
objects, and, at the end of import, gets a hash of the comic file itself.
When this last step is done, ProcessComicWorkerTask creates the
ComicFileDetails object, sets the hash value, and saves everything to the
database. Until this is done the Comic in the database only has a filename
and an added date, nothing else.
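
To make the two stages concrete, here is a minimal in-memory sketch. It is
not the CX code: ImportSketch, ComicRecord, the list standing in for the
database, and passing the archive entries and hash in as arguments are all
assumptions for the example. The real task reads the archive and computes
the hash itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class ImportSketch {
    // Hypothetical stand-in for the persisted Comic record.
    static class ComicRecord {
        final String filename;
        final List<String> pages = new ArrayList<>();       // image entries only
        final List<String> fileEntries = new ArrayList<>(); // every archive entry
        String fileHash;                                     // stands in for ComicFileDetails

        ComicRecord(String filename) { this.filename = filename; }
    }

    static final Queue<Runnable> taskQueue = new ArrayDeque<>();
    static final List<ComicRecord> database = new ArrayList<>();

    // Stage 1: save a bare record, then enqueue the processing stage.
    static void addComic(String filename, List<String> archiveEntries, String hash) {
        ComicRecord comic = new ComicRecord(filename);
        database.add(comic); // from here on the browser can list the comic
        taskQueue.add(() -> processComic(comic, archiveEntries, hash));
    }

    // Stage 2: record pages and file entries; the hash is set last.
    static void processComic(ComicRecord comic, List<String> archiveEntries, String hash) {
        for (String entry : archiveEntries) {
            comic.fileEntries.add(entry);
            String lower = entry.toLowerCase();
            if (lower.endsWith(".jpg") || lower.endsWith(".png") || lower.endsWith(".webp")) {
                comic.pages.add(entry);
            }
        }
        comic.fileHash = hash; // only now does the comic count as fully processed
    }

    // Runs queued tasks in order until the queue is empty.
    static void runAll() {
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
        }
    }
}
```

If stage 2 never reaches its last line, the record stays in the state that
stage 1 left it in: a filename and nothing else.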

When the browser gets an update with a comic, it looks to see if the comic
has an associated ComicFileDetails object. If it does not then the browser
shows the progress spinner to indicate the comic isn't fully processed.
When it receives an update that includes the ComicFileDetails object, that
tells it the comic has been fully processed, and it removes the progress
spinner.
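
The check itself amounts to a null test. The snippet below only shows that
logic in Java; it is not the browser code, and ComicUpdate and showSpinner
are hypothetical names used for the example.

```java
// Hypothetical shape of a comic update as the browser receives it.
class ComicUpdate {
    final String filename;
    final String fileDetailsHash; // null while no ComicFileDetails exists yet

    ComicUpdate(String filename, String fileDetailsHash) {
        this.filename = filename;
        this.fileDetailsHash = fileDetailsHash;
    }
}

class SpinnerSketch {
    // The spinner stays up for as long as the update carries no file details.
    static boolean showSpinner(ComicUpdate update) {
        return update.fileDetailsHash == null;
    }
}
```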

The feature I mentioned in the first post here had to do with how tasks in
general, and ProcessComicWorkerTask specifically, are processed.
Before, when a task started processing, it was removed from the queue
*first*. If you interrupted the task before it completed (as opposed to it
failing before completion) then you would lose the task and never see it
finish. So, for example, you start importing 100 comics and then shut down
CX. If five comics were being processed when you shut down and 95 were
still enqueued, then starting the server up would only ever finish those
95: the previous five were lost. The change here was to make it so that the
five that would have been lost would still be enqueued and you could
recover them.
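
The difference between the two orderings can be shown with a small
simulation. This is a sketch under assumptions, not the CX task runner:
TaskQueueSketch, the list standing in for the task table, and the Shutdown
exception used to imitate the server being stopped mid-task are all made up
for the example.

```java
import java.util.ArrayList;
import java.util.List;

class TaskQueueSketch {
    // Stands in for the task records stored in the database.
    final List<String> persisted = new ArrayList<>();

    // Imitates the server being shut down while a task is running.
    static class Shutdown extends RuntimeException {}

    interface Worker {
        void run(String task);
    }

    // Old ordering: the record is deleted before the task executes.
    void runDeleteFirst(Worker worker) {
        while (!persisted.isEmpty()) {
            String task = persisted.remove(0);
            worker.run(task); // an interruption here loses the task for good
        }
    }

    // New ordering: the record is deleted only after the task has executed.
    void runDeleteAfter(Worker worker) {
        while (!persisted.isEmpty()) {
            String task = persisted.get(0);
            worker.run(task); // an interruption here leaves the record in place
            persisted.remove(0);
        }
    }
}
```

With tasks a, b and c queued and an interruption during b, the old ordering
leaves only c behind, while the new ordering leaves b and c, so b can be
rerun on the next start.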

But, if a task fails because of an issue with the comic or with memory,
then the task gets deleted (which it should since it would otherwise become
a bottleneck blocking other tasks from running) and the failure is captured
and put into the Task Audit Log. So whatever issue is causing the error
you're hitting, the details for it should be in the Task Audit Log. If it's
still the WebP library throwing an exception, that stack trace should be in
the database and you can post it here or add it to the WebP project's issue
so they can fix the underlying problem.
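
For contrast with the interrupted case, here is roughly what "fail, record,
delete" looks like. Again this is a hedged sketch and not the CX
implementation: AuditSketch and its fields are hypothetical, the list stands
in for the Task Audit Log, and catching Throwable so that memory errors are
recorded too is an assumption.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AuditSketch {
    final Queue<Runnable> tasks = new ArrayDeque<>();
    final List<String> auditLog = new ArrayList<>(); // stands in for the Task Audit Log

    void runAll() {
        while (!tasks.isEmpty()) {
            Runnable task = tasks.peek();
            try {
                task.run();
            } catch (Throwable error) { // assumption: errors such as OutOfMemoryError are captured as well
                StringWriter trace = new StringWriter();
                error.printStackTrace(new PrintWriter(trace));
                auditLog.add(trace.toString()); // keep the stack trace for later inspection
            } finally {
                tasks.remove(); // success or failure, the task leaves the queue
            }
        }
    }
}
```

A failing task does not block the ones behind it, and its stack trace is
the only thing left of it afterwards.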

On Mon, Sep 7, 2020 at 7:41 AM bareheiny <dmarc-noreply@xxxxxxxxxxxxx>
wrote:

Didn’t notice this change was available - checked it tonight.

The out of memory issue with multiple, large, webp formatted archives
still exists.

I had thought that it may be resolved - but in hindsight that was a
misconception I think.  The old out of memory error I used to strike caused
CX to stop...this new one doesn’t.  So I’m assuming CX keeps trucking along
as if the comic had processed successfully, and removes the import task.

What would cause the comics to still be in a processing state though?  The
spinning wheel indicates CX knows it’s not finished...but there aren’t any
signs of page hash generation in the logs.

On 29/08/2020, at 00:54, Darryl L. Pierce <mcpierce@xxxxxxxxx> wrote:

I've pushed a change up for PR [1] that should address this issue.

When tasks like importing and processing a comic are created, they're
first put into the database to queue them up. Then they are popped out and
executed in order. Previously they were pulled out of the database, the
record deleted, and then they were executed. Which is fine until the server
exits before a task finishes, since then the task couldn't be restarted
because it was lost.

With this change, the task is loaded and executed and THEN it's deleted
from the database. So if you're importing a comic and kill the server
halfway through, then the next time you start the server the task is still
there in the database and can be rerun.

Hoping this fixes a lot of the headaches bareheiny's been having. :D

--
Darryl L. Pierce <mcpierce@xxxxxxxxx>
"Le centre du monde est partout." - Blaise Pascal
"Let's try and find some point of transcendence and leap together." - Gord
Downie
