[haiku-development] Re: Q: recover partially corrupt bfs without reinitializing?

  • From: Marcus Jacob <rossi@xxxxxxxxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Fri, 10 Apr 2009 00:41:33 +0200

On 09.04.2009, at 12:09, "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx> wrote:

Marcus Jacob wrote:
On 08.04.2009, at 17:53, "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
wrote:
checkfs seems to do more harm than good ...
I would doubt that.
Multiple runs of checkfs produce additional errors in my case instead
of stating that everything is fine, as I would expect from the second
run.

The errors on your partition cannot be fixed by checkfs, so why would
it not report them on subsequent runs? :-)
If two files share the same block, you should check manually which of
the two files is corrupted, delete it, and then run checkfs again. Then
it will be able to solve at least this particular problem.
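A rough sketch of that manual step, with entirely hypothetical file names and paths (only the final checkfs rerun is the real Haiku command; it is left commented out since it needs a mounted BFS volume):

```shell
# Hypothetical scenario: checkfs reported that live/foo.c and live/bar.c
# share the same disk block. Compare each suspect against a known-good
# copy (backup or repository checkout); the one that differs is corrupted.
mkdir -p live backup
printf 'good data\n' > backup/foo.c
printf 'good data\n' > live/foo.c      # this copy matches its backup
printf 'trashed!!\n' > live/bar.c      # this copy is the corrupted one
if cmp -s live/foo.c backup/foo.c; then echo "foo.c intact"; fi
rm live/bar.c                          # delete the corrupted file
# checkfs /MyData                      # rerun checkfs; with only one owner
#                                      # left, the shared block can be fixed
```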

I would understand this if the reported errors were within the same files or nodes, but the errors reported are always slightly different. I can collect all checkfs output in the future if this might be helpful. Additionally, I can offer to dump a corrupted partition for analysis. My data partition usually doesn't contain any confidential data, and I can usually reproduce the problem after a while with a checked-out source tree ...

Corruption occurred in a single directory, see ticket #3150.
So far, I haven't been able to reproduce it, unfortunately.
Well ;) I don't know what's so special about my usage, but my file system
gets corrupted frequently. My main system is trashed every other week.

While I wouldn't like this on my production system, I'd love to be able
to reproduce it this well. I've been running Haiku for quite some time as
my main OS, and I haven't seen any BFS problems yet.

Well, at least Murphy has been kind so far: my StreetPainter sources have
never been affected, but they are also in a repository, and the old rule
that backed-up files never get corrupted applies here ;-)

But it's not a nice feeling knowing that you shouldn't trust it yet.

Yep. Btw, I had the system checked intensively to exclude memory or other hardware defects.

Most errors appear in checked-out source trees, but also in other
places.

Is there anything I can do the next time my file system gets corrupted
to help locate the problem?

Btw, once the error mentioned in the ticket appears, I also get
frequent panics "vnode already exists".

I think the main problem is located in the block cache, at least that's
pretty much the only component that could be responsible for this kind
of errors.
So far I'm aware of three kinds of problems:
1) file entries are left in B+trees where they shouldn't be anymore
2) inodes are written over existing inodes
3) data ends up in files that shouldn't contain it

I think it's very likely that all of this has one root cause, which can
then only be in the block cache: outdated block data finding its way to
the disk.
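A toy illustration of that failure mode (this is not Haiku's actual block_cache code, just a minimal model of a stale dirty block): if an outdated copy of a block is still marked dirty in the cache, a later flush writes it over the newer on-disk version, which would produce exactly the symptoms above.

```python
# Toy model: "disk" and "cache" are dicts mapping block numbers to bytes.
disk = {7: b"new inode data"}      # block 7 was updated and correctly flushed

cache = {7: b"old inode data"}     # a stale copy still sits in the cache...
cache_dirty = {7}                  # ...and is wrongly marked dirty

# A later flush writes the outdated block over the newer on-disk version:
# old B+tree entries reappear, inodes get overwritten, files pick up
# foreign data.
for block in cache_dirty:
    disk[block] = cache[block]

print(disk[7])   # b'old inode data' -- the newer data is silently gone
```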

I guess I should spend more time testing this component - I've already
written some tests, but they obviously don't cause any error (anymore).

This is your domain; I can't really judge the issue, but I believe your suspicions make sense.

Where do I find this tool? I remember such a command from BeOS but
haven't seen it in Haiku.
There is none. I would actually prefer to fix bugs instead of
delivering (and shipping!) workarounds like that :-)
Granted. I just hate to reinstall my primary system once a week. And a
separate data partition doesn't help, as it's the data which gets
corrupted.

Indeed. Having a file system you cannot rely on is not really suited for
a primary system - I'll improve the tracing/debugging capabilities of the
block cache next week; maybe something useful shows up.

Let me know if I can help by enabling whatever you are going to implement.

Cheers,
Rossi
