[linux-cirrus] Re: bootmem_init_node failure with 64M TS-7250

> On Thu, Dec 28, 2006 at 11:02:19PM -0500, Charles Moschel wrote:
>
>> > > Booting with four mem=8M@ (lower banks) on the command line works OK,
>> > > but only with 32M of course.  Adding any one of the single higher
>> > > 0xex000000 banks results in a hang.
>> >
>> > The problem here is that Linux 2.6 ep93xx port is somewhat naive and
>> > expects all RAM to be within a 1G physical address range.  This isn't
>> > always true on ep93xx hardware (such as your board), but it _is_ true
>> > for all ep93xx hardware I have access to (for example, I have the 32M
>> > variant of the TS7250), so I never fixed the problem, even though Linux
>> > supports not having all RAM within a 1G range just fine.
>>
>> Is it _worth_ fixing?
>
> It's probably not a lot of work, so I think it's worth fixing at some
> point.
>
> But even if it does get fixed, IMHO sparsemem/discontigmem should be
> disabled by default for ep93xx as it has a performance impact.
>
>
>> This board has (2) 32MB SDRAM chips, but there are other TS boards
>> that have a single 64M chip which would (I guess) still have this
>> problem [1].
>
> With a single 64M chip, I think you wouldn't have this problem, as
> only one chip select would be used in that case.

I think the costs are approaching parity, but one 64M chip has been
significantly more costly than two 32M chips for some time.  We do put on
64M chips from time to time for customers that need it as our 2.4 kernel
modifications allow the chips to be pretty much dropped in.  Both our
RedBoot bootloader and our Linux-based bootloader (Linux booting Linux)
auto-detect the SDRAM chip sizes and pass appropriate ATAGs.
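
To make the ATAG step above concrete, here is a minimal sketch of how a
bootloader might lay out one ATAG_MEM entry per detected SDRAM region.  The
tag IDs and the header/payload layout are those of the standard ARM Linux
tagged list; the function names and the bank addresses are hypothetical and
only for illustration (they are not taken from the TS bootloaders).

```python
import struct

# Tag IDs from the ARM Linux tagged-list (ATAG) boot protocol.
ATAG_CORE = 0x54410001
ATAG_MEM = 0x54410002
ATAG_NONE = 0x00000000


def atag(tag_id, payload=b""):
    """Pack one tag: header (size in 32-bit words, tag id) plus payload."""
    assert len(payload) % 4 == 0, "payload must be whole 32-bit words"
    size_words = 2 + len(payload) // 4  # the header itself is 2 words
    return struct.pack("<II", size_words, tag_id) + payload


def build_atags(banks):
    """Build a tagged list from (start, size) pairs, one per RAM region."""
    out = atag(ATAG_CORE)  # empty core tag (header only)
    for start, size in banks:
        # ATAG_MEM payload is the region size followed by its start address.
        out += atag(ATAG_MEM, struct.pack("<II", size, start))
    out += struct.pack("<II", 0, ATAG_NONE)  # terminator: size 0, tag 0
    return out


# Hypothetical example: one low region and one high region of 8M each.
EXAMPLE_BANKS = [(0x00000000, 8 << 20), (0xE0000000, 8 << 20)]
EXAMPLE_BLOB = build_atags(EXAMPLE_BANKS)
```

The point of the sketch is only that each fragment gets its own ATAG_MEM
entry, so the kernel learns about every region the bootloader detected.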

>
> (It also wouldn't have been a problem if TS had strapped the ep9302
> to put the 0....... chip at c....... (phys) instead, because then all
> RAM would have been in the same 1G range.)

When we laid out this board, I don't think we were aware that Linux 2.6
had this bug, err I mean, "quirk", regarding physical memory layout.  :-) 
I'm pretty sure we just picked the chip-select line that resulted in the
most efficient hardware track to the SDRAM chip.  Our hardware design
culture at TS really hates to compromise to work around software bugs, and
I'd probably get heckled if I had to suggest/explain this.  Also, we had
custom boards that use all 4 chip selects for memory so it wouldn't have
saved us any time to avoid the 0xd and 0xe chip selects-- we had to get
them to work anyway.

>
> (I remember trying switching async/sync boot mode on the fly on the
> ep9302 a while ago and vaguely remember that that worked.  It might be
> another option to try and put a bit of trampoline code at e....... to
> toggle the sync/async boot flag (relocating the RAM at 0....... to
> c.......) before decompressing the kernel to the c....... area.)
>
>
>> I don't know the market, are there other boards / manufacturers that
>> would benefit from a proper fix?
>
> No idea, really.  I have four ep93xx boards, and none of them have
> this issue.

FWIW, other OSes have exhibited similar short-sightedness regarding
assumptions like this about physical memory layout.  We talked to a QNX
engineer about a port to the ep93xx and he expressed doubts that the QNX
VM system would tolerate such fragmented memory without kernel rework
($$$).  About the only OSes that didn't mind the heavily fragmented memory
were the BSDs.  There, IIRC, you just need to call
uvm_physload(physaddr, sz) for each fragment of memory.

>
>
>> > > Should I try a DISCONTIGMEM or SPARSEMEM build?
>> >
>> > Yes, this is exactly what needs to be done to fix the problem in a
>> > generic way, but it does involve writing a bit of code.
>>
>> Well, if it's more than sprinkling printks, tweaking #DEFINEs, or
>> editing .config files, I'm out of my league :)
>
> :)
>
>
>> > Another option _might_ be to set PHYS_OFFSET to e0000000, and tell
>> > Redboot to load the kernel at 0xe0008000.  This will likely mess up ATAG
>> > passing (so you'll have to comment out the boot_params line in ts7250.c,
>> > hardcode the machine ID and pass mem= parameters for every memory block
>> > by hand), but it'll cause physical RAM to be within a 1G range (e0000000
>> > .. 1fffffff, (ab)using 4G address wrap) and might just work.
>>
>> Yes, I saw that you made the suggestion of setting PHYS_OFFSET to
>> e0000000 earlier this year on the ts-7000 yahoo list [2].  I tried that
>> as well, but naively, didn't realize the other changes needed to be
>> made.
>
> If you end up trying this, I'd be curious to know whether it works or
> not.
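
The address-wrap arithmetic behind the PHYS_OFFSET suggestion quoted above
can be checked with a short sketch.  It models the constraint only as it is
described in this thread (every bank must lie within 1G of PHYS_OFFSET,
counting modulo 4G); it is not the kernel's actual code.  The bank addresses
are assumptions for illustration (four 8M regions per chip, low chip at
0......., high chip at e.......), and the helper names are hypothetical.

```python
GIB = 1 << 30
ADDR_MASK = 0xFFFFFFFF  # 32-bit physical addresses wrap at 4G
MIB8 = 8 << 20

# Assumed bank layout, as (start, size) pairs; not read from real hardware.
LOW_BANKS = [(0x00000000, MIB8), (0x01000000, MIB8),
             (0x04000000, MIB8), (0x05000000, MIB8)]
HIGH_BANKS = [(0xE0000000, MIB8), (0xE1000000, MIB8),
              (0xE4000000, MIB8), (0xE5000000, MIB8)]


def offset_from(phys_offset, addr):
    """Distance of addr above phys_offset, wrapping around at 4G."""
    return (addr - phys_offset) & ADDR_MASK


def fits_in_1g(phys_offset, banks):
    """True if every bank ends within 1G of phys_offset (modulo 4G)."""
    return all(offset_from(phys_offset, start) + size <= GIB
               for start, size in banks)


def mem_args(banks):
    """Format one mem=SIZE@START argument per bank (sizes in whole MiB)."""
    return " ".join(f"mem={size >> 20}M@0x{start:08x}"
                    for start, size in banks)


ALL_BANKS = LOW_BANKS + HIGH_BANKS
default_ok = fits_in_1g(0x00000000, ALL_BANKS)   # PHYS_OFFSET at 0
wrapped_ok = fits_in_1g(0xE0000000, ALL_BANKS)   # PHYS_OFFSET at e0000000
cmdline = mem_args(HIGH_BANKS + LOW_BANKS)
```

Under these assumed addresses, the low banks sit 512M above e0000000 once
the subtraction wraps, which is why the e0000000 .. 1fffffff window can
cover both chips while a window starting at 0 cannot.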
>
>
>> Is it relatively simple for you to properly fix the problem once you
>> have access to an afflicted board?
>
> I think so.  [ Maybe I can trade in my 32M ts7250 for a 64M model. :) ]

No problem.  Where would you like it sent?  We could probably give you a
different board too if you wanted.  Since the TS-7250, we've released a
7260, 7300, and 7400 based on the same processor.

//Jesse Off


