[procps] bug #627257, top: memory leaks

  • From: Jim Warner <james.warner@xxxxxxxxxxx>
  • To: procps@xxxxxxxxxxxxx
  • Date: Fri, 9 Mar 2012 11:15:32 -0600

Hi Craig,

Here's a little more information on the above bug.

The problem began with this.
. Bug #506303: from Russell Coker <russell@xxxxxxxxxxxx> 
. Date: Thu, 20 Nov 2008 22:01:53 +1100
. Subject: ps should have an option to display the supplementary groups

The memory leak was created with this reply to that bug.
. Patch: from Alfredo Esteban <aedelatorre@xxxxxxxxx>
. Date: Wed, 18 Feb 2009 21:25:38 +0100
. Attachment: patch_supplementary_groups.diff

In that diff the library was given the ability to parse the supplementary Groups 
field in the status2proc function.  Internally, the library called a new 
external(?) allocsupgrp function.  However, the user of the proc_t, which now 
potentially contained extra dynamically acquired memory, was responsible for 
calling freesupgrp, another newly exported function.

The patch arranged for the ps/display module to dutifully call freesupgrp.  
Unfortunately, top was never told about this nasty hack.  Thus, when *any* 
/proc/#/status field was displayed, poor old top would get hit with Alfredo's 
memory leak.  In top terms, such fields are those with the L_status or L_EITHER 
library flags.
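
To make that ownership split concrete, here is a small self-contained sketch of 
the pattern the patch created.  The names proc_rec, fill_supgrp and free_supgrp 
are simplified stand-ins for proc_t, allocsupgrp and freesupgrp -- this is not 
actual library code, just the shape of the contract:

#include <stdlib.h>
#include <string.h>

typedef struct proc_rec {            /* simplified stand-in for proc_t */
    char **supgrp;                   /* supplementary group names      */
    int    nsupgid;
} proc_rec;

/* library side: analogous to allocsupgrp() called from status2proc() */
static void fill_supgrp(proc_rec *p, const char *const *names, int n) {
    p->nsupgid = n;
    p->supgrp  = malloc(n * sizeof *p->supgrp);
    for (int i = 0; i < n; i++)
        p->supgrp[i] = strdup(names[i]);
}

/* caller side: analogous to freesupgrp(); only ps/display called it  */
static void free_supgrp(proc_rec *p) {
    for (int i = 0; i < p->nsupgid; i++)
        free(p->supgrp[i]);
    free(p->supgrp);
    p->supgrp  = NULL;
    p->nsupgid = 0;
}

int main(void) {
    const char *groups[] = { "adm", "wheel" };
    proc_rec p = { 0 };

    fill_supgrp(&p, groups, 2);  /* every status read allocates...     */
    free_supgrp(&p);             /* ...ps freed it afterwards; top was
                                    never told to, so each refresh of a
                                    status-based field leaked          */
    return 0;
}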

Now, with procps-ng, all dynamic memory management is the responsibility of the 
library.  That responsibility is discharged when a proc_t is reused, which also 
explains the valgrind "possibly lost" categories.  Those are simply proc_t 
memory (the proc_t itself plus associated extra memory) that never gets 
recycled and/or freed by program end.
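
For comparison, here is a minimal standalone sketch of that free-on-reuse idea 
(assumed shapes, not the actual library code): before a record slot is refilled 
for the next pass, anything still hung off it from the previous pass is released 
first, so per-refresh allocations never pile up.  Whatever remains attached at 
program end is what valgrind reports as "possibly lost".

#include <stdlib.h>
#include <string.h>

struct rec { char *extra; };                 /* stand-in for a proc_t  */

static void refill(struct rec *r, const char *data) {
    free(r->extra);                          /* discharge last pass    */
    r->extra = strdup(data);                 /* then acquire anew      */
}

int main(void) {
    struct rec r = { 0 };
    refill(&r, "pass 1");                    /* allocates              */
    refill(&r, "pass 2");                    /* frees pass 1, reuses   */
    return 0;                                /* pass 2 still attached  */
}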

There remain, however, some minor valgrind "definitely lost" library memory 
leaks for which top again gets blamed.  They are associated with the library's 
pwcache.c hashing of results from getpwuid and getgrgid calls.  So whenever 
user_from_uid or group_from_gid is invoked for a user/group that was not 
already hashed, memory is acquired that will never be freed.

The amount of such memory is modest and reaches some high-water mark dependent 
on the particular system at a particular point in time.  I've included examples 
below.  In any case, this is by design and makes perfect sense -- trade a 
little memory for reduced function call overhead.
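
For anyone curious, here is a minimal standalone sketch of the pwcache.c idea 
under my own simplified assumptions (the hash shape and sizes are made up; the 
point is only the cache-and-never-free behaviour that valgrind attributes to 
user_from_uid/group_from_gid):

#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define HASHSIZE 64                        /* small power of two       */

struct pwbucket {
    struct pwbucket *next;
    uid_t  uid;
    char   name[32];
};

static struct pwbucket *pwhash[HASHSIZE];

/* return a cached name for uid, hitting getpwuid() only on a miss */
static const char *uid_to_name(uid_t uid) {
    struct pwbucket **b = &pwhash[uid & (HASHSIZE - 1)];

    for (; *b; b = &(*b)->next)
        if ((*b)->uid == uid)
            return (*b)->name;             /* hit: nothing allocated   */

    *b = calloc(1, sizeof **b);            /* miss: allocated once and,
                                              by design, never freed   */
    if (!*b)
        exit(EXIT_FAILURE);
    (*b)->uid = uid;

    struct passwd *pw = getpwuid(uid);
    if (pw)
        snprintf((*b)->name, sizeof (*b)->name, "%s", pw->pw_name);
    else
        snprintf((*b)->name, sizeof (*b)->name, "%u", (unsigned)uid);
    return (*b)->name;
}

int main(void) {
    printf("%s\n", uid_to_name(0));        /* first call fills a bucket */
    printf("%s\n", uid_to_name(0));        /* second call is a cheap hit */
    return 0;
}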

So, could we please swat/squash/kill/bury this bug?

Regards,
Jim


examples of "definitely lost" categories due to hashing
-------------------------------------------------------

 300 (60 direct, 240 indirect) bytes in 1 blocks are definitely lost...
    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
    by 0x5390F76: nss_parse_service_list (nsswitch.c:626)
    by 0x5391528: __nss_database_lookup (nsswitch.c:167)
    by 0x5C34553: ???
    by 0x53444BC: getpwuid_r@@GLIBC_2.2.5 (getXXbyYY_r.c:256)
    by 0x5343DAE: getpwuid (getXXbyYY.c:117)
    by 0x4E3539E: user_from_uid (pwcache.c:56)
    by 0x4E36FA2: simple_readproc (readproc.c:765)
    by 0x4E373BD: readproc (readproc.c:1041)
    by 0x404EAD: procs_refresh (top.c:2042)
    by 0x407346: frame_make (top.c:3849)
    by 0x40A46E: main (top.c:3912)

 300 (60 direct, 240 indirect) bytes in 1 blocks are definitely lost...
    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
    by 0x5390F76: nss_parse_service_list (nsswitch.c:626)
    by 0x5391528: __nss_database_lookup (nsswitch.c:167)
    by 0x5C3251B: ???
    by 0x5342A9C: getgrgid_r@@GLIBC_2.2.5 (getXXbyYY_r.c:256)
    by 0x53421EE: getgrgid (getXXbyYY.c:117)
    by 0x4E3546E: group_from_gid (pwcache.c:84)
    by 0x4E36595: supgrps_from_supgids.clone.0 (readproc.c:430)
    by 0x4E370A1: simple_readproc (readproc.c:754)
    by 0x4E373BD: readproc (readproc.c:1041)
    by 0x404E0E: procs_refresh (top.c:2042)
    by 0x407346: frame_make (top.c:3849)
