[comixed-dev] Re: New library code for fetching

From: "bareheiny" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "bareheiny" for DMARC)
To: "comixed-dev@xxxxxxxxxxxxx" <comixed-dev@xxxxxxxxxxxxx>
Date: Mon, 2 Dec 2019 11:59:44 +1300

I just checked the Chrome log – it took between 0.2 and 11.7 minutes to
retrieve each cover image (I’m assuming they were being pulled concurrently),
and that’s image sizes of between 0.1MB and 5MB  (assuming I’m reading the log
correctly).

I guess the question is how is the image being grabbed?  I’m assuming the comic
file is loaded into memory so the image can be extracted....I’d expect that can
take a bit of time depending on the size of the file.

Either way, as long as it’s on your radar to look at, I’m happy.

From: Darryl L. Pierce
Sent: Monday, 2 December 2019 6:15 AM
To: comixed-dev@xxxxxxxxxxxxx
Subject: [comixed-dev] Re: New library code for fetching

I'm still wondering about what's the cause of that. I've used fully loaded
meta-data comics in my tests, load a page of 100 comics, and it'll load in what
feels like a reasonable amount of time. But that sort of load time (6 minutes)
is not in an acceptable range IMO. So we'll definitely push caching up higher
on the priorities list after 0.5 is released, maybe even make it an 0.5.1
feature target. I'm just afraid of pulling something that big into the
development cycle right now since I want to get 0.5 out by the end of the year.

On Fri, Nov 29, 2019 at 9:02 PM bareheiny <dmarc-noreply@xxxxxxxxxxxxx> wrote:
Right-o – I’ve imported a clean library.

By “clean”, I mean there is no metadata (all comic.info files removed), and all
comic files that triggered an image processing error have been corrected or
quarantined.

Once that library was fully loaded (all hashes created etc.), I logged into CX
and went to the library page.  It took around 9 minutes for the covers of 100
comics to load.

The time for a get 100 to complete ranged from 0.04 seconds to 4 minutes,
averaging 0.62 seconds – similar to when processing comics with metadata.

The time between get 100 requests ranged from 0.5 to 3.5 seconds, averaging a
little over 0.5 seconds.
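
For anyone wanting to reproduce the two measurements above from a browser log, here is a minimal sketch. It assumes the log has already been reduced to a list of (start, end) timestamps per request, sorted by start time; the function names and the sample timestamps are made up for illustration and are not taken from the actual log.

```python
from datetime import datetime, timedelta
from statistics import mean


def request_timings(requests):
    """Split a list of (start, end) datetimes into two series, in seconds:
    how long each request took, and how long passed between one request
    ending and the next one starting."""
    durations = [(end - start).total_seconds() for start, end in requests]
    gaps = [
        (next_start - prev_end).total_seconds()
        for (_, prev_end), (next_start, _) in zip(requests, requests[1:])
    ]
    return durations, gaps


def summarize(values):
    """Return the min / max / mean of a series of timings."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Illustrative timestamps only (hypothetical, not real log data).
t0 = datetime(2019, 11, 29, 20, 0, 0)
sample = [
    (t0, t0 + timedelta(seconds=1)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=4)),
    (t0 + timedelta(seconds=7), t0 + timedelta(seconds=8)),
]
durations, gaps = request_timings(sample)
print(summarize(durations))
print(summarize(gaps))
```

If requests overlap (for example, covers being pulled concurrently), a gap comes out negative, so the two series are only cleanly separable for requests that run one after another.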

So the biggest bottleneck in loading a library is the time between get 100s
firing.  Given what I’m seeing from the two imports...this seems likely due to
the metadata (no metadata = shorter times between gets), and how that’s being
handled.

Cover population is also slow – but that would be easily (at face value...not
meaning to imply it’ll be easily coded) addressed by having a cache of cover
images.  Honestly, I’m not sure I have the patience to wait over 5 minutes each
time I navigate between library pages.

I think I’ve gone as far as I can with this for now – I need to get back to
fixing the remaining image issues, and doing some other library stuff.

From: bareheiny
Sent: Tuesday, 26 November 2019 10:54 AM
To: comixed-dev@xxxxxxxxxxxxx
Subject: RE: [comixed-dev] Re: New library code for fetching

Cheers – that pretty much confirms what I thought was happening.

I’ve just had a quick look at the log data, and there seem to be a couple of
things going on.

The time between get 100 requests ranges from 1.4 seconds to 6 minutes,
averaging 46 seconds.  The time for a get 100 to complete ranges from 0 seconds
to 5 minutes, averaging 48 seconds.

This leads to two questions – what’s causing the variability in the time
between one get completing and the next beginning, and what’s causing the
variability in the time for a get to return a result?

At first blush, I would think that the time for a get to complete would depend
on the size of the comics in the get request – I’m assuming each file needs
to be loaded for the cover to be extracted – which could take some time for the
larger files.  The time between a get completing and the next starting...maybe
processing the metadata for loading into the drop down lists etc.?

I do have a copy of my comics with the comic info files removed – I’m
interested in knowing how loading a library with no metadata changes the load
times – that’s my next job, but will take a number of days to load everything
in 😊

Meanwhile, I’ll wait for your change to make it to a release, and have another
look...but I’m expecting that I may see delays of up to 6 minutes for 100
comics to be displayed (based on the above).

From: Darryl L. Pierce
Sent: Tuesday, 26 November 2019 9:21 AM
To: comixed-dev@xxxxxxxxxxxxx
Subject: [comixed-dev] Re: New library code for fetching

Not a problem. :D

When you log into the app, it will (very soon) stop trying to download the
entire library (in chunks of 100 comics/request).

Instead, when you go to the library page (/comics), the browser will send a
request to the backend and ask for comics to display on the current page. That's
determined by three things:
1. the current page of comics you're on,
2. how many comics you are showing on a page, and
3. the sort order for the comics.

The display widget (called DataView) knows how many total comics there are, and
how many you want to show per page, and divides that into the number of pages.
So if you're viewing comics 25 at a time then the first page is comics 1-25,
the second is 26-50, etc. So as you navigate through the pages of comics, the
browser sends a request for the comics to show just for that page.
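
The page arithmetic described above can be sketched roughly as follows. This is only an illustration of the division into pages; the function names are hypothetical and are not taken from DataView or the CX code.

```python
import math


def page_count(total_comics, per_page):
    """Number of pages needed to show every comic."""
    return math.ceil(total_comics / per_page)


def page_range(page, per_page, total_comics):
    """Return the (first, last) comic numbers shown on a 1-based page.

    Comic numbers are 1-based and inclusive, so page 1 at 25 per page
    covers comics 1-25 and page 2 covers 26-50.
    """
    if not 1 <= page <= page_count(total_comics, per_page):
        raise ValueError("page out of range")
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total_comics)
    return first, last


print(page_range(1, 25, 110))
print(page_range(2, 25, 110))
print(page_range(5, 25, 110))
```

The last page is usually a partial one, which is why the upper bound is clamped to the total number of comics.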

The long and short of it is, if you're only viewing 25 comics per page, then
the browser will only have those 25 in memory. When you go to the next or
previous page of comics the browser will go and get them from the backend and
then those are the ones it has in memory.

Is that a better description?

On Mon, Nov 25, 2019 at 2:34 PM bareheiny Alexander
<dmarc-noreply@xxxxxxxxxxxxx> wrote:
Mmmm....this is sounding familiar, I’ll need to revisit the open tickets to
refresh my memory.

Can you outline what happens when the library is being loaded (in layman’s
terms)?

I have a rough idea, but want to double check before I say something stupid.

On 26/11/2019, at 05:06, Darryl L. Pierce <mcpierce@xxxxxxxxx> wrote:

With the code that I checked in yesterday, it should be a lot faster to load
since it's no longer going to try and load everything once you log in. It's
going to now start pulling strategic chunks of data as needed, only as much as
need be shown on the current page. The old code is still there as the
collections views still depend on it. But for the main library viewing it only
loads what it needs.

Regarding caching things, we had/have a ticket for caching the cover images so
they can be more quickly retrieved. I'm not sure if pulling them from a
database is going to be any faster than extracting them out of the comic file,
but my thought was to have CX maintain a thumbnails/caching directory where it
would store covers hierarchically by hash value (break up the hash into 4-char
pieces and each represents a subdirectory). Since file access is fastest, it
can get the hash, look for the file and, if it's not there, grab it from the
archive, cache it and return it. But I'm open to other ideas for how to make it
faster.
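
A minimal sketch of that idea, under a few assumptions that are not in the message above: every 4-character piece of the hash becomes a subdirectory, the cached image is named after the full hash with a .jpg extension, and the caller supplies a function that extracts the cover from the archive. All names here are hypothetical.

```python
from pathlib import Path


def cover_cache_path(cache_root, file_hash, chunk=4, ext=".jpg"):
    """Map a hash to a nested path, one subdirectory per 4-char piece."""
    pieces = [file_hash[i:i + chunk] for i in range(0, len(file_hash), chunk)]
    return Path(cache_root).joinpath(*pieces) / (file_hash + ext)


def get_cover(cache_root, file_hash, extract_cover):
    """Return cover bytes, reading the cache first and filling it on a miss.

    extract_cover is a caller-supplied function (an assumption here) that
    opens the comic archive and returns the cover image as bytes.
    """
    path = cover_cache_path(cache_root, file_hash)
    if path.is_file():
        return path.read_bytes()  # fast path: already cached
    data = extract_cover()  # slow path: pull it out of the archive
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Splitting the hash keeps any single directory from accumulating one entry per comic, which is the usual reason for this kind of layout. Whether it beats a database lookup would still need measuring.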

On Mon, Nov 25, 2019 at 5:32 AM bareheiny Alexander
<dmarc-noreply@xxxxxxxxxxxxx> wrote:
Over the years, I’ve explored YAC, Ubooquity and ComicRack (something’s
happening here...the website is showing a new landing page).

As far as I can tell, all of the applications extract and store comic covers –
making the thumbnail generation a lot quicker.  Is that something you’ve
considered for CX?

My (non-developer) thoughts are that cover display would be pulled from
pre-extracted images...statistics and filter fields (publisher, characters
etc.) would be pulled from the database via SQL (I’m a BI developer, so I see
SQL everywhere – if all you have is a hammer, all you see are nails as they
say) as the user navigates to the relevant pages – rather than having to wait
for the entire library to load and get everything populated.
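
As a rough illustration of pulling a filter field straight from the database, here is a sketch using an in-memory SQLite database. The table name, the columns, and the sample rows are all hypothetical and say nothing about the real CX schema.

```python
import sqlite3


def publisher_counts(conn):
    """Return (publisher, number of comics) pairs, largest first."""
    rows = conn.execute(
        "SELECT publisher, COUNT(*) FROM comics "
        "GROUP BY publisher ORDER BY COUNT(*) DESC, publisher"
    )
    return rows.fetchall()


# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comics (id INTEGER PRIMARY KEY, publisher TEXT, series TEXT)"
)
conn.executemany(
    "INSERT INTO comics (publisher, series) VALUES (?, ?)",
    [
        ("Publisher A", "Series 1"),
        ("Publisher A", "Series 2"),
        ("Publisher B", "Series 3"),
    ],
)
print(publisher_counts(conn))
```

The point is that the count comes back as one small result set per page visit, without any comic files or full comic records being loaded.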

As it stands, with CX trying to load the entire library and all metadata....it
takes far too long to load a large library (as I’ve mentioned on GitHub, my
session times out before the library fully loads).  If there is no metadata,
the library loads quick smart (to be fully confirmed).

Not being an application developer, I’m likely missing vital information /
experience...I’m more than happy to be enlightened.  Also happy to be told that
this isn’t the appropriate list for my input :)

From: Darryl L. Pierce
Sent: Monday, 25 November 2019 2:43 PM
To: comixed-dev@xxxxxxxxxxxxx
Subject: [comixed-dev] New library code for fetching

I spent time yesterday and today writing a new set of actions for the library
to move away from loading the whole library into the browser. What it does now
is, when you go to the library view, it fetches just the comics for the current
page. And as you move about, changing sorting, switching pages, changing the
number of comics to show, etc, it makes a REST call to fetch the comics to
display and refreshes the view.
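
What such a request has to do on the receiving end can be sketched as a sort followed by a slice. This is an in-memory illustration with hypothetical names; a real backend would more likely push the sorting and paging into the database query.

```python
from operator import itemgetter


def fetch_page(comics, page, per_page, sort_field, ascending=True):
    """Return only the comics for one 1-based page, in the requested order."""
    ordered = sorted(comics, key=itemgetter(sort_field), reverse=not ascending)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


# Hypothetical library of 60 comics, for illustration only.
library = [{"id": n, "title": f"Comic {n:03d}"} for n in range(1, 61)]
second_page = fetch_page(library, page=2, per_page=25, sort_field="id")
print(second_page[0]["id"], second_page[-1]["id"])
```

Changing the sort order or the page size just changes the arguments, so every interaction in the view maps onto one call like this.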

It's going to be a process to refactor other pages, but what I'm thinking is
this:

1. Have the main page load specific statistics (rather than extracting it from
the comics as they're loaded) with a single REST API.
2. Have each of the collections pages use a similar request (or enhancements to
the new request) to get comics in pages.
3. Change the current continuous update to do some other, simpler actions to
get statistics on the library rather than loading everything. Or just dump that
continuous update entirely.

And these are all things I think we can get done by the end of the year when I
want to put the 0.5 release out for RC.

I merged this one, but would like to start putting things up for PRs to get
feedback if you guys are comfortable with getting more hands-on now. :D

--
Darryl L. Pierce <mcpierce@xxxxxxxxx>
"Le centre du monde est partout." - Blaise Pascal
"Let's try and find some point of transcendence and leap together." - Gord
Downie
