Help! Linux 'Segmentation fault'

MatiasZ

I have a Linux server running at home, and about 4 days ago I prepared a RAID1 array using two 200GB IDE hard drives. Everything seemed to work fine, but yesterday I woke up and all of a sudden the system had hung with a kernel error, and I had to reboot it. Once it came back up everything seemed to be working fine. However, I did not actually use the file server on the RAID disks, so I don't know for sure how that part was doing.

Today I went to look for something and it said it couldn't connect to the drive (from Windows), so I SSHed in and saw that nothing was mounted except for the / mount point and the usual proc and such. Now whenever I try to mount either the RAID device /dev/md0 or either of the member partitions /dev/hdb1 and /dev/hdd1, I get a 'segmentation fault' error and that's it. I don't know what else to try, and I have some info I'd like to save from those disks, so wiping them is not an option for now - that's why I set up the RAID in the first place.
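
For reference, here are a few read-only checks that can show the state of the array and its members before another mount attempt; this is just a sketch assuming the md driver and mdadm are present (as on a stock Slackware install) and the device names match the ones above:

Code:
# Show which md arrays the kernel currently knows about and their state
cat /proc/mdstat

# Inspect the RAID superblock on each member partition (read-only)
mdadm --examine /dev/hdb1
mdadm --examine /dev/hdd1

# Look at the last kernel messages for md or IDE errors
dmesg | tail -n 50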

If anyone has any info that might help me through this, it would be great.

Regards,

Matias
 
OK, I have new info :D:

I tried to perform a mount -a and see what dmesg showed, and this appeared:

Code:
Unable to handle kernel NULL pointer dereference at virtual address 00000009
 printing eip:
c013b0e6
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c013b0e6>]    Not tainted
EFLAGS: 00010282
eax: c0d7cb70   ebx: c031b600   ecx: ca58a000   edx: c0d7cb70
esi: 00000009   edi: ca58a000   ebp: ca4e1000   esp: ca5fdeec
ds: 0018   es: 0018   ss: 0018
Process mount (pid: 4382, stackpage=ca5fd000)
Stack: 00000000 ca58a000 ca5fc000 c013b38c ca58a000 00000000 ca5fdf64 c013bea2 
       ca58a000 00000000 ca5fdf64 ca5fc000 ca4e1000 c014d5cc ca58a000 00000000 
       ca4e1000 ca4e0000 00000000 00000000 ca5fdf64 ca4e1000 c014d883 ca5fdf64 
Call Trace:    [<c013b38c>] [<c013bea2>] [<c014d5cc>] [<c014d883>] [<c014d6fb>]
  [<c014dc8b>] [<c0108a93>]

Code: ac ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 74 0a

This error is the one I got when I had to reset the system.
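
One way to get a rough idea of where the oops happened is to look the EIP up in the kernel's symbol map. This is a sketch, assuming the map for the running kernel lives at /boot/System.map (the path and filename can differ per install):

Code:
# System.map is sorted by address, so the last symbol at or below the
# faulting EIP (c013b0e6) names the function the oops occurred in.
# The comparison is textual, which works here because the addresses
# are fixed-width lowercase hex.
awk '$1 <= "c013b0e6"' /boot/System.map | tail -n 1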

So I tried to mount another partition (hda4), which is on the same disk as /, so it has nothing to do with the RAID array or the RAID disks. I figured it wasn't mounted at boot because its mount point sits inside the other one (the RAID is /home and hda4 is /home/downloads), and I got the same error. Something appears to be messed up with mounting in general, but this is more or less as far as I can get in testing before trashing the OS, so any input would save me a lot of time ;)
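
If the nesting itself turns out to matter, the only ordering requirement is that the parent (/home on the array) gets mounted before the nested one. A hypothetical /etc/fstab fragment for that layout could look like this; the filesystem types and options here are assumptions, so adjust them to whatever is actually in use:

Code:
# parent mount point first, nested mount point after it
/dev/md0     /home             reiserfs   defaults   0   2
/dev/hda4    /home/downloads   reiserfs   defaults   0   2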
 
Does dmesg have any other information pertaining to the drives, RAID controller or anything else?
 
No, not apparently... but it's full of Shorewall stuff, so it's hard to dig through... I could try rebooting and see what comes up.
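
To cut the firewall noise down you could filter it out; a quick sketch, assuming the Shorewall entries carry its usual 'Shorewall' log prefix:

Code:
# Drop the firewall log lines and page through the rest
dmesg | grep -vi shorewall | less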
 
OK, after rebooting the system and running 'raidstart /dev/md0' I seem to be up again... and the disks are intact, as I was able to mount the array and all the data was there. I'll keep researching to see what caused this and how to avoid it. While we're at it, does anybody know if I'm making any fundamental mistakes in my disk arrangement? I currently have:

hda - Primary Master
hda1 /
hda2 /var
hda4 /home/downloads

hdc - Secondary Master -> /mnt/cdrom mount point

hdb - Primary Slave
hdb1 /dev/md0 device 0

hdd - Secondary Slave
hdd1 /dev/md0 device 1

md0 - RAID Device
mounted at /home (note that hda4 is mounted under it in the fs tree)
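
In case it helps, an /etc/raidtab describing this setup (which is what raidstart reads) would be roughly the following; treat it as a sketch and check it against the existing file rather than dropping it in blindly:

Code:
# /etc/raidtab - RAID1 across the two slave drives
raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              32
    device                  /dev/hdb1
    raid-disk               0
    device                  /dev/hdd1
    raid-disk               1

Note that each member shares its IDE cable with another device (hdb with hda, hdd with hdc), which costs some throughput when both devices on a channel are busy, but doesn't hurt the redundancy since the members sit on separate channels.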

If anyone has any suggestion on how to approach this in a better way, please let me know ;)
 
Slackware 10.1. I like it because you have to set up everything yourself in order to get it working, so you learn a lot in the process, and the file layout across the different directories is simple and very organized. I don't know Linux all that well, but I've been getting better over the last few months because of this.

I think one of the problems was that the partitions weren't marked as type fd (Linux raid autodetect), so the RAID array wouldn't come up correctly at boot. I've managed to change this and rebooted the system, and everything is working more or less fine, but although reiserfsck reports no errors, when I run mdadm --detail /dev/md0 I get a 'dirty' comment in the status. Any idea how to clean it??
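
For anyone following along, the type change is done from fdisk, and the array state can be watched from /proc/mdstat; a sketch, with the caveat that as far as I can tell 'dirty' on an active array mostly means it hasn't been shut down cleanly yet (or a resync is pending), and it should report clean again after a clean stop:

Code:
# Change the partition type to fd (Linux raid autodetect),
# e.g. fdisk /dev/hdb, then: t, <partition number>, fd, w
# (repeat for /dev/hdd)

# Watch the array and any resync progress
cat /proc/mdstat
mdadm --detail /dev/md0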
 
No, it's ReiserFS. I see Reiser4 has been released and is somewhat praised on their webpage; what is the problem with it?
 
It's unstable, doesn't have decent tools (I don't know of a working fsck for it), and doesn't adhere to 4k kernel stacks, it uses 8k instead. Fun stuff like that.
 