You will find this error repeated a number of times in ‘/var/log/messages’. It occurs when a disk drive encounters a fully recoverable read error, has a bad sector, or experiences a failure.
Don’t panic: this error message does not always indicate a true disk failure. Often it only indicates a period when the disk was too busy to service an active read request.
To deal with this situation, first confirm whether the drive is indeed corrupted or has too many bad sectors to work correctly. If so, back up the data immediately and look for a replacement. Otherwise, try to repair the filesystem with fsck. Let’s start with the first step.
1. Check disk health
SMART stands for Self-Monitoring, Analysis and Reporting Technology, a system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. Its purpose is to monitor the reliability of the hard drive, predict drive failures, and carry out different types of drive self-tests. Here we will use the smartctl command to help find out what is wrong with the disk.
# smartctl -H /dev/sda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
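If you want to script this check, the verdict line can be pulled out of the output. The following is only a minimal sketch: the function name smart_health_status is made up for illustration, and it recognizes only the ATA-style "overall-health" line shown above, so other drive types may report UNKNOWN.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and prints
# PASSED, FAILED or UNKNOWN. It only looks for the ATA-style verdict line.
smart_health_status() {
    awk '
        /overall-health self-assessment test result:/ {
            # The verdict is the last word on the line; anything other
            # than PASSED is treated as a failure.
            status = ($NF == "PASSED") ? "PASSED" : "FAILED"
        }
        END { print (status == "" ? "UNKNOWN" : status) }
    '
}
```

As root, it could be used as: smartctl -H /dev/sda | smart_health_status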
It passed for now, but this is only a general check-up, so let’s take a deeper look.
2. Check detailed disk health
# smartctl -q errorsonly -H -l selftest -l error /dev/sda
ATA Error Count: 2
Error 2 occurred at disk power-on lifetime: 36795 hours (1533 days + 3 hours)
Error 1 occurred at disk power-on lifetime: 31542 hours (1314 days + 6 hours)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 60% 39255 –
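Here we can see that the disk has hardware problems: the short offline self-test ended with a read failure, and two ATA errors are logged. fsck might not help much, since it only repairs logical (filesystem) errors, but let’s give it a shot.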
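To keep an eye on the self-test log without reading it by hand, you could count the entries that did not finish cleanly. This is a rough sketch under assumptions: count_failed_selftests is a made-up name, it expects log lines that start with "# N" as in the output above, and it counts every entry whose status is not "Completed without error", so aborted or interrupted tests are counted too.

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# prints how many logged self-tests did not complete without error.
count_failed_selftests() {
    awk '
        # Self-test entries start with "#" followed by the entry number.
        /^# *[0-9]+ / && !/Completed without error/ { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}
```

As root, it could be used as: smartctl -l selftest /dev/sda | count_failed_selftests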
3. Before proceeding with fsck, back up your critical data to another server.
Remount the / partition read-only, since checking a filesystem that is mounted read-write can damage it.
# mount -o remount,ro /
Run the e2fsck command to check the ext3 file system (if the filesystem lives on a partition such as /dev/sda1, use that device name instead).
# e2fsck /dev/sda
Now try remounting the partition. If you still see the error, move on to a forced check (-f) that answers yes to every repair prompt (-y).
# fsck -f -y /dev/sda
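The exit status of fsck is a sum of flag bits that tells you what the check found, so it is worth looking at rather than assuming success. The sketch below decodes it using the values listed in the fsck man page; the function name describe_fsck_status is made up for illustration.

```shell
# Hypothetical helper: prints a readable description of an fsck exit
# status, which is a bitwise OR of the conditions listed below.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    for pair in "1:errors corrected" "2:reboot recommended" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:canceled by user" \
                "128:shared-library error"; do
        bit=${pair%%:*}     # number before the colon
        text=${pair#*:}     # description after the colon
        if [ $((status & bit)) -ne 0 ]; then
            result="${result:+$result, }$text"
        fi
    done
    echo "${result:-unknown status $status}"
}
```

Right after the check finishes, you could run: describe_fsck_status $?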
The forced fsck run should fix the error, but keep monitoring the log for a few days. If the smartd error comes back, it is time to replace the disk. Otherwise, consider yourself lucky and keep using the same disk.
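For the monitoring part, a per-day count of the message makes it easy to see whether the error is fading or coming back. This is a minimal sketch with assumptions: count_matches_per_day is a made-up name, the search text is whatever message you see in your own log, and the log is assumed to use the traditional syslog timestamp ("Mon DD HH:MM:SS ...") at the start of each line.

```shell
# Hypothetical helper: prints "Mon DD count" for each day on which the
# given fixed string appears in the log file, in order of first appearance.
count_matches_per_day() {
    pattern=$1
    logfile=$2
    awk -v pat="$pattern" '
        index($0, pat) {                 # fixed-string match, not a regex
            day = $1 " " $2              # month and day from the timestamp
            if (!(day in count)) order[++n] = day
            count[day]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$logfile"
}
```

For example, with the text of your own error message in place of the placeholder: count_matches_per_day 'your error text' /var/log/messages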