Hello again.

It seems a large number of evaluated submissions is a real problem for ES
and, consequently, for AWS. It doesn't seem to affect the evaluation part of
ES, but rather the "presentation" part: to calculate the number of different
submissions, it loads the evaluation results of all of them from the
database.

Just a reminder: our contest starts on May 3, 00:00 UTC and will last all
day. If any of the core developers could be online then, I would appreciate
it.

On Wed, Apr 30, 2014 at 1:52 PM, Artem Iglikov <artem.iglikov@xxxxxxxxx> wrote:

> The evaluation is almost finished. I have about 22000 evaluated
> submissions; ES takes 2.6GB, SS takes 1.7GB of RAM.
>
>
> On Tue, Apr 29, 2014 at 11:26 PM, Artem Iglikov <artem.iglikov@xxxxxxxxx> wrote:
>
>> Hello again.
>>
>> Just to clarify, which of the issues are we talking about?
>>
>> If it is the one I mentioned several weeks ago: I have no access to the
>> hardware used on that system, and I cannot reproduce the issue on new
>> hardware. Because of this, I think it is something hardware-related
>> (slowness, bugginess, some relation to magnetic storms...).
>>
>> If we are talking about the issues I mentioned in this thread, then, as I
>> pointed out, running out of connections seems expected to me: by default
>> PostgreSQL allows 100 connections, and I had several instances of CWS and
>> many users.
>>
>> I have just run a quick stress test with default settings and couldn't
>> run out of connections; I probably have to fill the database a bit more.
>> In any case, I'll certainly do a lot of stress testing over the next
>> several days, and if I can reproduce the result, I'll send you logs.
>>
>> Also, I would like to note that #254 reproduces easily. I'm not sure
>> about the memory usage - I need to wait for the evaluation to finish
>> (right now ES takes 752m with 600 submissions evaluated and 24000
>> submissions being evaluated).
>>
>> By now the situation with database connections is as shown in the
>> attachment. The numbers do not change if I stop the stress testing (but
>> perhaps I am not waiting long enough after stopping). If you think there
>> is something unusual, I can send you the CWS logs, but could you give me
>> your public GPG key for them?
>>
>>
>>
>> On Tue, Apr 29, 2014 at 6:52 PM, Artem Iglikov <artem.iglikov@xxxxxxxxx> wrote:
>>
>>> I was quite busy these days, so I haven't done any additional tests,
>>> but I'll try to reproduce the issue today with your patch applied.
>>> Thanks.
>>>
>>> On Apr 29, 2014 6:42 PM, "Luca Wehrstedt" <luca.wehrstedt@xxxxxxxxx> wrote:
>>>
>>>> I'd like to fix the database connections issue, but neither I nor
>>>> Giovanni have been able to reproduce it. We need to diagnose it on your
>>>> system, I'm sorry.
>>>>
>>>> Could you please apply the attached patch again, manually start CWS
>>>> from the shell, redirecting its standard output & error to a file,
>>>> reproduce the issue and send us that file? We need the stdout+stderr,
>>>> as it's the only place where a detailed access log is available (each
>>>> request, with URL and other info). Thanks for your help!
>>>>
>>>> Luca
>>>>
>>>> PS: the extreme memory use is also unexpected; as it may be related to
>>>> the connection issue, I'll tackle it after we've solved this.
>>>>
>>>>
>>>> On Wed, Apr 23, 2014 at 9:36 AM, Artem Iglikov <artem.iglikov@xxxxxxxxx> wrote:
>>>>
>>>>> Thank you guys for analysing the situation.
>>>>>
>>>>> Just a small correction: my estimate of 20000 submissions is the
>>>>> overall number made during all the virtual contests, which will be
>>>>> distributed more or less evenly over 1 or even 2 days.
>>>>>
>>>>> And stress testing from master doesn't work again :-) The last commit
>>>>> broke something in storing logs, I suppose.
>>>>>
>>>>> I did a stress test with 400 actors (4 instances of StressTest.py
>>>>> with 100 actors each), 12 workers, about 20000 submissions in total,
>>>>> and default database settings without PgBouncer, and everything seemed
>>>>> fine except for these:
>>>>> - I got the exception from #254 a few times.
>>>>> - I ran out of db connections (which was expected).
>>>>> - Possibly because of the problems with db connections, two workers
>>>>> died. I saw that they were in the "compiling" state for about 10
>>>>> minutes (though in their consoles I saw that the job was already
>>>>> done), and after that they became disabled. I guess they were not able
>>>>> to deliver the result back to ES, and because of that ES kicked them
>>>>> (maybe I'm wrong).
>>>>> - The AWS overview page is not designed to handle a very large queue;
>>>>> each refresh forces ES to use 100% of the CPU. This obviously
>>>>> shouldn't happen during a real contest if everything goes as
>>>>> expected, but I'm going to fix it anyway; it is partially done here:
>>>>> https://github.com/artikz/cms/commit/50a1c3235a374bf5695178058c27c4e798c1f096
>>>>>
>>>>> Then I repeated the stress test, but with the database's
>>>>> max_connections limit raised to 200 (I also had to increase SHMMAX).
>>>>> The only problems left were #254 and the AWS overview page.
>>>>>
>>>>> All these problems seem minor to me except one: I couldn't get a
>>>>> "dead" worker back alive. Restarting the worker doesn't help; it seems
>>>>> only restarting ES (or another core service) works. Is there a reason
>>>>> for this? Shouldn't the "dead" state of a worker be cleared when the
>>>>> worker reconnects?
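[For anyone reproducing the max_connections change described above: on a typical setup it amounts to roughly the following. The path and values are illustrative; postgresql.conf's location depends on the distribution and PostgreSQL version.]

```
# /etc/postgresql/9.x/main/postgresql.conf  (path varies by distribution)
max_connections = 200        # default is 100

# PostgreSQL's shared memory use grows with max_connections; on older
# kernels the SHMMAX limit may also need raising, e.g. in /etc/sysctl.conf:
#   kernel.shmmax = <value large enough for the new shared memory size>
# Apply with `sysctl -p`, then restart PostgreSQL for the changes to
# take effect.
```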
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 22, 2014 at 10:26 PM, Giovanni Mascellani <mascellani@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> On 22/04/2014 18:23, Luca Chiodini wrote:
>>>>>> >> For instance, a few weeks ago Luca Chiodini complained on this
>>>>>> >> mailing list that StressTest had a problem, but I didn't have time
>>>>>> >> to check it out and most probably I won't in the near future.
>>>>>> >
>>>>>> > I did, but Artem has already fixed it with #265 [0]
>>>>>> > and now StressTest works fine.
>>>>>>
>>>>>> Yes, sorry, I remembered that just after having sent my reply.
>>>>>>
>>>>>> Gio.
>>>>>> --
>>>>>> Giovanni Mascellani <giovanni.mascellani@xxxxxx>
>>>>>> PhD Student - Scuola Normale Superiore, Pisa, Italy
>>>>>>
>>>>>> http://poisson.phc.unipi.it/~mascellani
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Artem Iglikov
>>>>
>>>>
>>
>>
>> --
>> Artem Iglikov
>
>
>
> --
> Artem Iglikov

--
Artem Iglikov
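[The counting problem described at the top of the thread - ES loading every evaluation result just to report how many submissions there are - can be sketched in miniature. The schema and queries below are illustrative only, using sqlite3, not CMS's actual models; the point is that an SQL aggregate avoids materialising every row.]

```python
import sqlite3

# Illustrative schema (not CMS's actual models): 1000 submissions,
# half of them flagged as evaluated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions (id INTEGER PRIMARY KEY, evaluated INTEGER)")
conn.executemany(
    "INSERT INTO submissions (evaluated) VALUES (?)",
    [(i % 2,) for i in range(1000)])

# Expensive approach (what the overview page effectively does): fetch
# every row and count in application code.
evaluated_slow = sum(
    1 for (flag,) in conn.execute("SELECT evaluated FROM submissions")
    if flag)

# Cheap approach: let the database do the counting.
(evaluated_fast,) = conn.execute(
    "SELECT COUNT(*) FROM submissions WHERE evaluated = 1").fetchone()

assert evaluated_slow == evaluated_fast == 500
```

[With tens of thousands of submissions, the difference is the memory and CPU spike observed on each refresh of the overview page.]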