[contestms] Re: Handling large number of total users/submissions

  • From: Artem Iglikov <artem.iglikov@xxxxxxxxx>
  • To: contestms@xxxxxxxxxxxxx
  • Date: Wed, 23 Apr 2014 13:36:46 +0600

Thank you guys for analysing the situation.

Just a small correction: my estimate of 20000 submits is the overall
number of submits made across all virtual contests, which will be
spread more or less evenly over 1 or even 2 days.
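
As a rough illustration of what that estimate implies, the average load can
be worked out as below. This is only back-of-the-envelope arithmetic and
assumes a perfectly even spread, which a real contest will not have; the
function name is made up for the example.

```python
# Average submit rate implied by ~20000 submits spread evenly over 1 or 2 days.
TOTAL_SUBMITS = 20000            # estimate quoted in this thread
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_minute(total_submits, days):
    """Average submits per minute, assuming a perfectly even distribution."""
    return total_submits * 60 / (days * SECONDS_PER_DAY)


for days in (1, 2):
    rate = average_rate_per_minute(TOTAL_SUBMITS, days)
    print(f"{days} day(s): {rate:.1f} submits/minute")
# 1 day(s): 13.9 submits/minute
# 2 day(s): 6.9 submits/minute
```

Peaks will of course be higher than these averages, so they are a lower
bound rather than a sizing target.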

Also, stress testing from master is broken again :-) I suppose the last
commit broke something in log storage.

I ran a stress test with 400 actors (4 instances of StressTest.py with
100 actors each), 12 workers, about 20000 submits in total, and default
database settings without PgBouncer. Everything seemed fine except for the
following:
- I got the exception from #254 a few times.
- I ran out of db connections (which was expected).
- Two workers died, possibly because of the db connection problems. I saw
that they stayed in the "compiling" state for about 10 minutes (though their
console showed that the job was already done), and after that they became
disabled. I guess they were not able to deliver the result back to ES and
because of this ES kicked them (maybe I'm wrong).
- The AWS overview page is not designed to handle a very large queue: each
refresh forces ES to use 100% of CPU. This obviously shouldn't happen
during a real contest if everything goes as expected, but I'm going to fix
it anyway. It is partially done here:
https://github.com/artikz/cms/commit/50a1c3235a374bf5695178058c27c4e798c1f096

Then I repeated the stress test, this time with the database
max_connections limit raised to 200 (I also had to increase SHMMAX). The
only problems left were #254 and the AWS overview page.
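
For anyone repeating this kind of test, the headroom left under
max_connections can be watched while the stress test runs. Below is a
minimal sketch, assuming a DB-API cursor (for example one obtained from
psycopg2) connected to the PostgreSQL server used by CMS. The function name
connection_headroom is made up for illustration and is not part of CMS.

```python
def connection_headroom(cursor):
    """Return (in_use, limit, free) for the server behind `cursor`.

    `cursor` is assumed to be a DB-API cursor on a PostgreSQL connection.
    The count is approximate: it includes the monitoring connection itself,
    and some slots are reserved for superusers.
    """
    # SHOW returns the setting as text, so convert it explicitly.
    cursor.execute("SHOW max_connections")
    limit = int(cursor.fetchone()[0])

    # pg_stat_activity has one row per server process it tracks.
    cursor.execute("SELECT count(*) FROM pg_stat_activity")
    in_use = int(cursor.fetchone()[0])

    return in_use, limit, limit - in_use
```

Polling this every few seconds during the run would show whether the limit
of 200 is approached or whether there is still room to spare.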

All these problems seem minor to me except one: I couldn't bring a
"dead" worker back to life. Restarting the worker doesn't help; it seems
only restarting ES (or another core service) works. Is there a reason for
this? Shouldn't the "dead" state of a worker be cleared when the worker
reconnects?



On Tue, Apr 22, 2014 at 10:26 PM, Giovanni Mascellani <
mascellani@xxxxxxxxxxxxxxxxxxxx> wrote:

> On 22/04/2014 18:23, Luca Chiodini wrote:
> >> For instance, a few weeks ago Luca Chiodini complained on this
> >> mailing list that StressTest had a problem, but I didn't have time to
> >> check it out and most probably I won't in the near future.
> >
> > I did, but Artem has already fixed it with #265 [0]
> > and now StressTest works fine.
>
> Yes, sorry, I remembered about that just after having sent my reply.
>
> Gio.
> --
> Giovanni Mascellani <giovanni.mascellani@xxxxxx>
> PhD Student - Scuola Normale Superiore, Pisa, Italy
>
> http://poisson.phc.unipi.it/~mascellani
>
>


-- 
Artem Iglikov
