
Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

Yet another bit rotation scanner for storage

Posted on 2018-10-19 22:24:00 by Vincent in OpenBSD, yabitrot

To make sure that the backups of my NAS were coherent and consistent, I looked for a bit rotation (bitrot) scanner.

Unfortunately, I did not find what I needed, so I wrote my own.

This post explains the why and the how.


What is a bitrot application?

Let me first recall what such an application does.

As explained very well on Solene's blog, a bitrot scanner detects bits that have "rotated" (flipped) on your storage device, which could be a hard disk, an SSD, a USB stick or a CD.

Over time, some bits can lose their value and become "rotated". This can damage your files if you do not take care of it.

In short, a bitrot application checks whether the checksums of unmodified files are still equal to the ones computed earlier. If they are not, those files are damaged, and restoring them from backup is more than welcome.
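The core decision can be sketched in a few lines of Python. This is a minimal sketch, not yabitrot's code: it assumes that, for each file, the database keeps the stored checksum and the time at which it was computed, and that a file counts as "unmodified" when its modification time is not later than that time. The function name `classify` and the record layout are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical record kept per file: (stored_checksum, checked_at), where
# checked_at is the Unix time at which the checksum was computed.
Record = Tuple[int, float]


def classify(stored: Optional[Record], mtime: float, checksum: int) -> str:
    """Return 'new', 'modified', 'ok' or 'bitrot' for one file.

    mtime is the file's current modification time and checksum the value
    just computed from its current content.
    """
    if stored is None:
        return "new"         # never seen before: just record it
    stored_checksum, checked_at = stored
    if mtime > checked_at:
        return "modified"    # legitimately changed since the last scan
    if checksum != stored_checksum:
        return "bitrot"      # content changed although the file was not modified
    return "ok"
```

Only the last two outcomes need the stored checksum; a modified file simply gets a fresh checksum.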

Why a new bitrot application?

I use a NAS server (running OpenBSD) to store the important data of several users (including me). It is a 1 TB disk, regularly backed up to a second, external disk. I do not want PROD and backup in the same place: fire, water, etc. could damage both, and I would lose all my data.

This NAS makes heavy use of the "time machine" concept and creates lots of hard links between the "versions" of the local backups (details here).

I do not use a smart application to back up my data from PROD to backup; I simply copy the disk with the command "pax -rw". So if my PROD machine has bad bits, they are copied to the backup, with very bad consequences. I therefore need a bitrot scanner that can guarantee that all my files are in good shape before they are copied to the backup disk.

Why write another bitrot application?

There are indeed several bitrot apps on the internet, for example bitrot in Python, bitrot in JavaScript, bitrot in Go, and a bitrot scanner in Go.

In total I found more than 10 bitrot applications. Unfortunately, none of them met the two requirements that matter to me:

  • use a very light checksum algorithm. Many of them use SHA-1 or MD5, but here the need is only to detect corruption of a file, not deliberate tampering. It is very unlikely that a flipped bit leaves the checksum unchanged (which is exactly what an attacker would try to achieve). I do not want such a scan to take more than 3 hours on my small NAS server.
  • be smart and take hard links into account. All the bitrot apps I tested are based on file names. So if 10 different file names point to the same inode, these tools read the file 10 times and compute its checksum 10 times. Not really efficient in my case.
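To check the first point on your own hardware, a quick Python benchmark can compare the zlib checksums (Adler-32, CRC-32) with the hashlib digests (MD5, SHA-1) on the same buffer. This is an illustrative sketch: the function name `benchmark` and its parameters are hypothetical, the numbers depend on the CPU and Python build, and no particular ranking is claimed here.

```python
import hashlib
import os
import time
import zlib


def benchmark(size_mb: int = 32, rounds: int = 3) -> dict:
    """Time a few checksum/hash functions on the same random buffer.

    Returns the best observed throughput in MiB/s for each algorithm.
    """
    data = os.urandom(size_mb * 1024 * 1024)
    candidates = {
        "adler32": zlib.adler32,
        "crc32": zlib.crc32,
        "md5": lambda b: hashlib.md5(b).digest(),
        "sha1": lambda b: hashlib.sha1(b).digest(),
    }
    results = {}
    for name, func in candidates.items():
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        # Guard against a zero duration on very small buffers.
        results[name] = size_mb / best if best > 0 else float("inf")
    return results


if __name__ == "__main__":
    for name, mibps in sorted(benchmark().items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {mibps:10.1f} MiB/s")
```

Note that Adler-32 is known to be weak on very short inputs; for whole-file integrity checks that weakness matters much less.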

So I wrote yabitrot, which takes hard links into account, and after evaluation I decided to use Adler (Adler-32) as the checksum. Adler is one of the fastest checksums I tested.
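Adler-32 can be computed incrementally, so a large file never needs to be loaded into memory at once. The following Python sketch (not yabitrot's code; the names `adler32_chunks` and `adler32_file` are hypothetical) folds `zlib.adler32` over successive chunks, passing the running value back in as the second argument.

```python
import zlib
from functools import partial
from typing import Iterable


def adler32_chunks(chunks: Iterable[bytes]) -> int:
    """Fold zlib.adler32 over successive chunks (same result as one call)."""
    value = 1  # Adler-32 starting value, also zlib.adler32's default
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return value


def adler32_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Adler-32 of a file, read in 1 MiB chunks so memory use stays flat."""
    with open(path, "rb") as f:
        return adler32_chunks(iter(partial(f.read, chunk_size), b""))
```

For example, `adler32_chunks([b"Wiki", b"pedia"])` gives the same value as `zlib.adler32(b"Wikipedia")`, namely `0x11E60398`.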

In short, yabitrot is based on the inode of each file. For each inode, yabitrot computes the checksum and stores the timestamp at which this calculation was made.
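Grouping file names by inode is what avoids the 10x rereads mentioned above. A minimal Python sketch of that grouping, assuming the folder sits on a single filesystem (the name `files_by_inode` is hypothetical, and this is not yabitrot's implementation):

```python
import os
import stat
from collections import defaultdict
from typing import Dict, List


def files_by_inode(root: str) -> Dict[int, List[str]]:
    """Map each regular-file inode under root to all of its path names.

    Assumes root lives on a single filesystem, since inode numbers are only
    unique within one filesystem.
    """
    inodes: Dict[int, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue             # entry vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                inodes[st.st_ino].append(path)
    return dict(inodes)
```

Each inode then needs to be read only once, through any one of its names (for example the first path in its list).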

How does it work?

yabitrot takes a few parameters. One of them is the path of the folder you want to scan.

Since yabitrot identifies files by their inode numbers, which are only unique within one filesystem, you cannot scan, in one pass, files spread across different filesystems.
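A scanner built on inodes can enforce this restriction by not descending into other mount points, similar to what `find -xdev` does. The Python sketch below is only an illustration (the name `walk_one_filesystem` is hypothetical, and nothing here describes how yabitrot itself handles it): it compares each subdirectory's device id (`st_dev`) with that of the scanned folder.

```python
import os
from typing import Iterator


def walk_one_filesystem(root: str) -> Iterator[str]:
    """Yield file paths under root without crossing into other filesystems.

    Subdirectories whose device id (st_dev) differs from root's are mount
    points of another filesystem and are pruned.
    """
    root_dev = os.stat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend there.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```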

The first time you run yabitrot, it scans all files, computes their checksums and stores them in its database (a file called .cksum.db, located at the root of your folder).
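The post does not describe the internal format of .cksum.db. Purely as an assumption, here is how such a per-inode table could be persisted with Python's built-in `sqlite3` module; the table name `files`, its columns and the helpers `load_db`/`save_db` are all hypothetical, and inode numbers are assumed to fit in a signed 64-bit integer.

```python
import os
import sqlite3
from typing import Dict, Tuple

DB_NAME = ".cksum.db"       # file name used by yabitrot; the schema is hypothetical
Record = Tuple[int, float]  # (checksum, checked_at) per inode


def load_db(root: str) -> Dict[int, Record]:
    """Read the stored records, or return an empty dict on the first run."""
    path = os.path.join(root, DB_NAME)
    if not os.path.exists(path):
        return {}
    con = sqlite3.connect(path)
    try:
        rows = con.execute("SELECT inode, checksum, checked_at FROM files")
        return {ino: (ck, ts) for ino, ck, ts in rows}
    finally:
        con.close()


def save_db(root: str, db: Dict[int, Record]) -> None:
    """Replace the stored records with the content of db."""
    con = sqlite3.connect(os.path.join(root, DB_NAME))
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS files ("
                "inode INTEGER PRIMARY KEY, checksum INTEGER, checked_at REAL)"
            )
            con.execute("DELETE FROM files")
            con.executemany(
                "INSERT INTO files VALUES (?, ?, ?)",
                ((ino, ck, ts) for ino, (ck, ts) in db.items()),
            )
    finally:
        con.close()
```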

During the next runs, yabitrot checks that the checksums of unmodified files are still equal to the ones stored in the DB. For new or modified files, it just computes the checksum and stores it. It also removes from the DB every file that is recorded there but no longer present in the folder.
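Putting these rules together, one run over a folder might look like the following Python sketch. It illustrates the logic described above under stated assumptions and is not yabitrot's source: the in-memory `db` dict keyed by inode (value: checksum and the time it was computed), the rule "unmodified means the modification time is not later than the stored time", and the names `scan` and `adler32_of` are all hypothetical.

```python
import os
import stat
import time
import zlib
from typing import Dict, List, Tuple

DB_NAME = ".cksum.db"       # the DB file itself must not be scanned
Record = Tuple[int, float]  # hypothetical: (checksum, checked_at) per inode


def adler32_of(path: str, chunk_size: int = 1 << 20) -> int:
    """Adler-32 of a file, read chunk by chunk."""
    value = 1
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value


def scan(root: str, db: Dict[int, Record]) -> Dict[str, List]:
    """Run one pass over root, updating db in place; return what happened."""
    report: Dict[str, List] = {
        "added": [], "updated": [], "bitrot": [], "errors": [], "removed": [],
    }
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if dirpath == root and name == DB_NAME:
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode) or st.st_ino in seen:
                    continue  # not a regular file, or an inode already handled
                seen.add(st.st_ino)
                stored = db.get(st.st_ino)
                if stored is not None and st.st_mtime <= stored[1]:
                    # Unmodified since its last checksum: content must match.
                    if adler32_of(path) != stored[0]:
                        report["bitrot"].append(path)
                    continue
                started = time.time()  # taken before reading the content
                db[st.st_ino] = (adler32_of(path), started)
                report["added" if stored is None else "updated"].append(path)
            except OSError:
                report["errors"].append(path)
    # Inodes still in the DB but no longer found under root.
    for ino in set(db) - seen:
        del db[ino]
        report["removed"].append(ino)
    return report
```

Taking the timestamp before reading the file means that a write happening during the read leaves the file looking "modified" on the next run, rather than being reported as bitrot.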

On OpenBSD, I suggest triggering yabitrot from the weekly.local or the monthly.local file. You will then receive an email listing the files that have bitrot issues.

You do not need to be root to run it, but you must run it as a user with enough permissions to read all targeted files and to write the DB at the root of the targeted folder.

Download

You can find this code on Sourceforge: here
You can download it here

Conclusion

I have been running this script for several weeks now without much trouble.

I can scan my 720 GB NAS disk in 2h40 on my small board with 2 CPUs at 3.3 GHz and 4 GB of RAM. This disk holds 6.8 million files but only 760,000 inodes (so each file has about 9 different names).

Here are the log details after the first scan:

No cleanup required
760238 files added
0 files updates
0 files error
6092222 files analysed in 8612.82 sec, 706.492 GB
760238 entries in the DB

As the listing below shows, this DB takes 29 MB!

obsd-nas:~#ls -alh /mnt/sd1
total 60004
drwxrwxrwx   7 root  wheel   512B Oct 13 20:37 .
drwxr-xr-x   4 root  wheel   512B Mar 24  2018 ..
-rw-------   1 root  wheel  29.1M Oct 13 20:37 .cksum.db
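The first-scan log and the listing above can be turned into a few per-unit figures. This is plain arithmetic on the logged values (rather than on the rounded numbers quoted earlier), reading the 29.1M size as 29.1 × 2^20 bytes, which is how `ls -h` usually scales:

```latex
\begin{aligned}
\frac{6\,092\,222\ \text{names analysed}}{760\,238\ \text{inodes}} &\approx 8.0\ \text{names per inode} \\[4pt]
\frac{706.492\ \text{GB}}{8612.82\ \text{s}} &\approx 0.082\ \text{GB/s}
  \qquad (8612.82\ \text{s} \approx 2\ \text{h}\ 24\ \text{min}) \\[4pt]
\frac{29.1 \times 2^{20}\ \text{bytes}}{760\,238\ \text{entries}} &\approx 40\ \text{bytes per entry}
\end{aligned}
```

The first ratio is the saving brought by working on inodes: each inode is read once instead of about eight times.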

Here are the log details after the next scan:

Wed Oct 17 07:54:04 2018: 1025 files removed from DB
Wed Oct 17 07:54:04 2018: 1302 files added
Wed Oct 17 07:54:04 2018: 205 files updates
Wed Oct 17 07:54:04 2018: 0 files error
Wed Oct 17 07:54:04 2018: 6094853 files analysed in 8643.72 sec, 706.798 GB
Wed Oct 17 07:54:04 2018: 760515 entries in the DB
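The counters of the two logs are consistent with each other: updated entries replace existing ones, so only removals and additions change the number of entries in the DB.

```latex
\underbrace{760\,238}_{\text{after 1st scan}}
- \underbrace{1\,025}_{\text{removed}}
+ \underbrace{1\,302}_{\text{added}}
= 760\,515\ \text{entries}
```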

Extra note

This tool is not specific to any filesystem.

I have tested it on Windows and it works, so it should run everywhere :-).


