I forgot to ask (but I can guess): how were you doing the RGB to CMYK conversion? Were you using device links with icclink -G? (I would guess not.) You will notice that the result using this path is a lot smoother than the result using the B2A table of a profile.
This doesn't explain anything, but it narrows the phenomenon down to the nature and resolution of the B2A table in the destination profile. Some trials indicate that the spacing of the "bumps" is related directly to the resolution of the B2A table, and nothing else.
It does suggest a less hand-waving explanation:
The L*a*b* B2A table is fairly inefficient (few of its cells occupy in-gamut space), and the interpolation within each cell, which is linear in device space, has a fair degree of inaccuracy. Between the grid vertices the black generation values therefore follow a series of straight lines (in device space), and this results in perceptually non-linear interpolation when the table is used to translate RGB values into CMYK values. (Luckily, linearly interpolating C, M, Y and K independently gives color values reasonably close to the target for most devices.)
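To make the argument above concrete, here is a minimal one-dimensional sketch. It is not taken from any real profile: `black_target` is a made-up smooth black generation curve, and the grid is a stand-in for one axis of a B2A CLUT. It only illustrates how straight-line interpolation between grid vertices leaves one error "bump" per cell, with a size that depends on the grid resolution.

```python
# Hypothetical smooth black-generation curve: K as a function of a
# normalised PCS coordinate t in [0, 1]. An assumption for illustration only.
def black_target(t):
    return (1.0 - t) ** 3


def make_grid(res):
    """Sample the target curve at `res` evenly spaced grid vertices."""
    return [black_target(i / (res - 1)) for i in range(res)]


def lookup(grid, t):
    """Linear interpolation between the two neighbouring grid vertices."""
    pos = t * (len(grid) - 1)
    i = min(int(pos), len(grid) - 2)  # keep t == 1.0 inside the last cell
    frac = pos - i
    return grid[i] + frac * (grid[i + 1] - grid[i])


def cell_errors(res, per_cell=100):
    """Largest interpolation error found inside each cell of the grid."""
    grid = make_grid(res)
    errors = []
    for cell in range(res - 1):
        worst = 0.0
        for s in range(per_cell + 1):
            t = (cell + s / per_cell) / (res - 1)
            worst = max(worst, abs(lookup(grid, t) - black_target(t)))
        errors.append(worst)
    return errors


if __name__ == "__main__":
    for res in (9, 17, 33):
        errs = cell_errors(res)
        print(res, "vertices:", len(errs), "bumps, largest error", max(errs))
```

In this toy case, doubling the number of cells doubles the number of bumps and cuts the largest error to roughly a quarter, which is consistent with the bump spacing tracking the table resolution.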
Why is this less evident with min or max black? Possibly because the black does not change so much through the gamut, so the effect of interpolation errors is minimized, or because the black response is more accurately interpolated by straight lines in device space at min or max black. (This isn't a completely convincing explanation.)
It is possible to reformulate the B2A Lut so that its efficiency is much improved, by using the matrix element and input curves to map an XYZ PCS into a pseudo-RGB space that encompasses the CMYK space, and then using the CLUT to translate between them. This approach might give a result much closer to that of a device link made with -G. The disadvantage is that the clipping behaviour for out of gamut colors won't be very good (you'll get per device channel clipping in XYZ/RGB space, rather than nearest color clipping), and (with ICC V2 profiles) the A2B will have to be a CLUT table in XYZ PCS, which may affect accuracy badly. This might be alleviated somewhat by mapping the input to a log encoded (or L* curve) XYZ, or by using ICC V4 to convert between L*a*b* and XYZ between the CLUT and the PCS.
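As a rough sketch of the front end of such a table (matrix, then per-channel input curve, then clipping into the CLUT's input range), assuming things the above does not specify: the matrix here is the familiar XYZ (D65) to linear sRGB matrix, used purely as a stand-in for a pseudo-RGB space chosen to enclose the CMYK gamut, and `lstar_curve` is one possible L*-style input curve. All function names are hypothetical.

```python
# Stand-in matrix: XYZ (D65) -> linear sRGB. A real table would need a
# matrix (relative to the D50 PCS white) picked so the pseudo-RGB cube
# just encloses the CMYK gamut. This choice is an assumption.
XYZ_TO_RGB = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)


def xyz_to_pseudo_rgb(xyz):
    """Matrix stage: 3x3 multiply of the PCS XYZ value."""
    return tuple(sum(m * v for m, v in zip(row, xyz)) for row in XYZ_TO_RGB)


def clip_per_channel(rgb):
    """What the table input does to out-of-range values: each channel is
    clamped independently, which is not a nearest-color clip."""
    return tuple(min(1.0, max(0.0, c)) for c in rgb)


def lstar_curve(v):
    """CIE L*-style curve scaled to 0..1, as a per-channel input curve."""
    if v > 216.0 / 24389.0:
        return (116.0 * v ** (1.0 / 3.0) - 16.0) / 100.0
    return (24389.0 / 27.0) * v / 100.0


def clut_input(xyz):
    """Matrix -> clip -> input curve: the coordinates that index the CLUT."""
    return tuple(lstar_curve(c) for c in clip_per_channel(xyz_to_pseudo_rgb(xyz)))


if __name__ == "__main__":
    out_of_gamut = (0.1, 0.3, 0.3)  # gives a negative first channel here
    print(xyz_to_pseudo_rgb(out_of_gamut))
    print(clut_input(out_of_gamut))
```

The clamped channel simply sticks at the edge of the cube while the other channels pass through unchanged, which is the per-channel clipping behaviour described above.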