Hard Disk wakes up very frequently

Re: Hard Disk wakes up very frequently

Postby Jocko » Sun Sep 20, 2020 8:33 am

Hi Mijzelf

I ran diskmon on my 5big2 last night (because I also have a spin-down issue on the system disk), and indeed I noticed these surprising lines in its output:
Code: Select all
==== Sun Sep 20 00:40:01 CEST 2020 ====
[24452.405340] diskmon(10430): dirtied inode 176 (ld-linux.so.3) on sda2
[24452.405399] diskmon(10430): dirtied inode 126 (ld-2.17.so) on sda2
[24452.406715] awk(10430): dirtied inode 143 (libm.so.6) on sda2
[24452.406775] awk(10430): dirtied inode 186 (libm-2.17.so) on sda2
[24452.407257] awk(10430): dirtied inode 177 (libc.so.6) on sda2
[24452.407315] awk(10430): dirtied inode 149 (libc-2.17.so) on sda2
[24452.441991] diskmon(10431): dirtied inode 63 (busybox) on sda2
[24452.443351] dmesg(10431): dirtied inode 10833 (libtirpc.so.1) on sda2
[24452.443413] dmesg(10431): dirtied inode 10257 (libtirpc.so.1.0.10) on sda2
[24452.444003] dmesg(10431): dirtied inode 146 (libnsl.so.1) on sda2
[24452.444061] dmesg(10431): dirtied inode 198 (libnsl-2.17.so) on sda2
[24452.444365] dmesg(10431): dirtied inode 180 (libpthread.so.0) on sda2
[24452.444413] dmesg(10431): dirtied inode 130 (libpthread-2.17.so) on sda2
...
[25653.174654] sleep(20297): dirtied inode 178 (librt.so.1) on sda2
[25653.174720] sleep(20297): dirtied inode 172 (librt-2.17.so) on sda2
[25654.278750] diskmon(20306): dirtied inode 70 (date) on sda2
...
[26252.913683] diskmon(25222): dirtied inode 11180 (awk) on sda2
[26252.915790] diskmon(25224): dirtied inode 11078 (gawk) on sda2
[26252.917478] awk(25224): dirtied inode 10282 (libgcc_s.so.1) on sda2
Later on, these lines no longer appear.

Thank you for your explanation about dirtied inodes :thumbup because until now I did not understand why we got them, and indeed the firmware partition is mounted with the relatime option :pound
Code: Select all
root@Acrab:/ # mount
/dev/sda2 on / type ext3 (rw,relatime)                                        <=============== here
none on /proc type proc (rw,relatime)
none on /sys type sysfs (rw,relatime)
none on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
/dev/sda5 on /rw_fs type ext3 (rw,noatime)

Mijzelf wrote:Does diskmon run in busybox ash? AFAIK that uses 'internal' du, grep, ...
Not entirely: the firmware uses busybox applets but also standard commands:
Code: Select all
root@Acrab:/ # ls -al /usr/bin/du
-rwxr-xr-x  1 root root 100312 2012-10-01 20:30 /usr/bin/du
root@Acrab:/ # whereis grep
grep: /bin/grep /usr/bin/grep
root@Acrab:/ # ls -l /bin/grep /usr/bin/grep
lrwxrwxrwx  1 root root      7 2017-05-27 19:41 /bin/grep -> busybox
-rwxr-xr-x  1 root root 100420 2009-02-13 21:45 /usr/bin/grep
So we have both command types for grep, and only the standard du command.
=> So the script needs to be updated to use full paths.
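A minimal sketch of that fix, assuming diskmon is a /bin/sh script and the layout shown above (/bin/busybox as the applet multiplexer, /usr/bin/du as a standalone binary); the wrappers invoke busybox directly, so the shell never has to resolve the grep/awk symlinks on disk:

```shell
#!/bin/sh
# Hypothetical sketch: route the helpers through explicit paths so
# no symlink lookup via $PATH is needed at each call.
BB=/bin/busybox              # grep, awk, ... are symlinks to this binary
grep() { "$BB" grep "$@"; }
awk()  { "$BB" awk  "$@"; }
du()   { /usr/bin/du "$@"; } # du is a standalone binary on this firmware
# ...the rest of diskmon can keep calling grep/awk/du unchanged.
```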
Jocko
Site Admin - expert
 
Posts: 11367
Joined: Tue Apr 12, 2011 4:48 pm
Location: Orleans, France

Re: Hard Disk wakes up very frequently

Postby Mijzelf » Sun Sep 20, 2020 10:36 am

I did some research; apparently the busybox shell only uses internal applets (without searching the path) when 'FEATURE_PREFER_APPLETS' was set at compile time.
On the ZyXEL NASes for which I wrote this script it doesn't matter anyway, as the root filesystem is in RAM. But indeed, providing a full path (to the busybox applets) might prevent the disk from being woken up. (Unless the disk's cache was flushed, so it has to read the disk anyway to find out that the applets are symlinks to busybox.)
Jocko wrote:
Code: Select all
root@Acrab:/ # mount
/dev/sda2 on / type ext3 (rw,relatime)                                        <=============== here

In that case executing the binary shouldn't dirty the inode, I think?
man mount wrote:relatime
Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than the current modify or change time. (Similar to noatime, but doesn't break mutt or other applications that need to know if a file has been read since the last time it was modified.)

On my laptop the filesystems are mounted relatime. I searched the /bin directory for some obscure command which was never used, and looked at the timestamps:
Code: Select all
[user@Anthraciet ~]$ stat /bin/msgen
  File: /bin/msgen
  Size: 23912        Blocks: 48         IO Block: 4096   regular file
Device: 801h/2049d   Inode: 2753749     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-11-15 14:00:55.000000000 +0100
Modify: 2018-11-15 14:00:55.000000000 +0100
Change: 2019-12-01 21:02:37.201100375 +0100
 Birth: 2019-12-01 21:02:37.201100375 +0100
So on this file the access timestamp is older than the change timestamp. It should change on execution, but only while access < change. And so it did:
Code: Select all
[user@Anthraciet ~]$ /bin/msgen
/bin/msgen: no input file given
Try '/bin/msgen --help' for more information.
[user@Anthraciet ~]$ stat /bin/msgen
  File: /bin/msgen
  Size: 23912        Blocks: 48         IO Block: 4096   regular file
Device: 801h/2049d   Inode: 2753749     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-09-20 12:12:13.483275025 +0200
Modify: 2018-11-15 14:00:55.000000000 +0100
Change: 2019-12-01 21:02:37.201100375 +0100
  Birth: 2019-12-01 21:02:37.201100375 +0100
# Wait one minute...
[user@Anthraciet ~]$ /bin/msgen
/bin/msgen: no input file given
Try '/bin/msgen --help' for more information.
[user@Anthraciet ~]$ stat /bin/msgen
  File: /bin/msgen
  Size: 23912        Blocks: 48         IO Block: 4096   regular file
Device: 801h/2049d   Inode: 2753749     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-09-20 12:12:13.483275025 +0200
Modify: 2018-11-15 14:00:55.000000000 +0100
Change: 2019-12-01 21:02:37.201100375 +0100
 Birth: 2019-12-01 21:02:37.201100375 +0100

As you can see, the access time is changed only the first time, not the second time, just as relatime should behave. With noatime it wouldn't have changed at all.
So why are your inodes dirtied? Are the executables touched in between? Or are the symlinks created on each boot? In the latter case each first access after boot will dirty the inode.
Or maybe the 'change' time is bogus/in the future? I haven't tested it, but I can imagine that if 'change' or 'modify' is in the future, each access will update 'access'.
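A quick sketch to test that last guess, assuming a stat that supports -c (GNU coreutils or a recent busybox); it prints the raw epoch timestamps so a bogus or future 'change' time is easy to spot:

```shell
#!/bin/sh
# Compare the raw timestamps of a binary against the current time.
f=/bin/sh                 # substitute any suspect executable here
now=$(date +%s)
atime=$(stat -c %X "$f")  # access
mtime=$(stat -c %Y "$f")  # modify
ctime=$(stat -c %Z "$f")  # change
echo "now=$now atime=$atime mtime=$mtime ctime=$ctime"
if [ "$ctime" -gt "$now" ] || [ "$mtime" -gt "$now" ]; then
    echo "change/modify time is in the future: every access will dirty the inode"
elif [ "$atime" -lt "$ctime" ] || [ "$atime" -lt "$mtime" ]; then
    echo "access < change/modify: the next access will update atime (relatime)"
fi
```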
Mijzelf
 
Posts: 254
Joined: Wed Nov 21, 2012 9:12 am

Re: Hard Disk wakes up very frequently

Postby serram » Sun Sep 20, 2020 11:12 am

Here is the log1 file from almost 12 hours of logging with diskmon on my Lacie cloudbox, with the inactivity period set to 10 minutes and the LAN cable disconnected.
serram
Donator VIP
 
Posts: 29
Joined: Thu Aug 27, 2020 7:21 pm

Re: Hard Disk wakes up very frequently

Postby Jocko » Sun Sep 20, 2020 11:42 am

Hi

Thank you for your post.

After examining the log, here is what I can say:
- You will have to repeat the check with an updated diskmon version (one that uses only busybox applets) and the noatime option on the firmware system partition. (I just made a test and that fixes some I/O.)
- hostd updated the hosts file every 2 hours at xx:51, so at those times the disk was spun up.
- Mainly, the I/O accesses have a cyclic behaviour (30 min): xx:06, xx:21, xx:36, xx:51. I think noflushd still uses a 30 min timeout value. I found a bug last Friday: from the web interface, the firmware fails to kill the running daemon, and then noflushd does not use the new settings.
- Between these I/O cycles (JBD + noflushd) I noticed nothing.
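For the noatime part, here is a sketch of what I mean, assuming the firmware root is /dev/sda2 as in the mount output earlier (guarded so it only remounts when actually run as root with that root device; test before adopting):

```shell
#!/bin/sh
# Sketch: remount the firmware partition noatime so plain reads stop
# updating access times. Only acts as root when / really is /dev/sda2.
if [ "$(id -u)" -eq 0 ] && grep -q '^/dev/sda2 / ' /proc/mounts; then
    mount -o remount,noatime /
fi
grep ' / ' /proc/mounts   # inspect the current root mount options
```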

To confirm you have this bug, can you post the output of
Code: Select all
 ps -ef|grep noflushd
Note: do not reboot the NAS, because after a reboot noflushd would use the expected conf.

Re: Hard Disk wakes up very frequently

Postby Mijzelf » Sun Sep 20, 2020 11:57 am

I see a lot of unexplained activity, like the dirtying and flushing of inodes which shouldn't have been dirtied in the first place.

Another thing:
Code: Select all
==== Sun Sep 20 06:06:08 CEST 2020 ====
[18074.686786] jbd2/sda5-8(785): WRITE block 530784 on sda5 (8 sectors)
[18074.686822] jbd2/sda5-8(785): WRITE block 530792 on sda5 (8 sectors)
[18074.686840] jbd2/sda5-8(785): WRITE block 530800 on sda5 (8 sectors)
[18074.686854] jbd2/sda5-8(785): WRITE block 530808 on sda5 (8 sectors)
[18074.686867] jbd2/sda5-8(785): WRITE block 530816 on sda5 (8 sectors)
[18074.686880] jbd2/sda5-8(785): WRITE block 530824 on sda5 (8 sectors)
[18080.406232] jbd2/sda5-8(785): WRITE block 530832 on sda5 (8 sectors)
[18080.526928] noflushd(1315): WRITE block 0 on sda5 (8 sectors)
[18080.526971] noflushd(1315): WRITE block 8 on sda5 (8 sectors)
[18080.526994] noflushd(1315): WRITE block 786856 on sda5 (8 sectors)
[18080.527020] noflushd(1315): WRITE block 786864 on sda5 (8 sectors)
[18080.527040] noflushd(1315): WRITE block 790608 on sda5 (8 sectors)
This repeats until 7:51. The journal block numbers are incrementing, as is to be expected for a journal file. (I suppose it will wrap around when the end of the journal file is reached.)
And then each time noflushd writes blocks 0, 8, 786856, 786864 and 790608. I suppose 0 and 8 are header data, but it would be interesting to know which file(s) those high block numbers belong to.
If you have the tool debugfs, it can be found using
Code: Select all
debugfs -R "icheck 786856 786864 790608" /dev/sda5
Which will give the inode number(s), and
Code: Select all
debugfs -R "ncheck <inode-number>" /dev/sda5
which will give the (relative) filename.

It should also be possible to use debugfs to read the journal, but at the moment I don't know how.
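The two debugfs steps can also be chained into one helper; a sketch, assuming e2fsprogs' debugfs, root access, and the tabular 'Block / Inode number' output that icheck prints:

```shell
#!/bin/sh
# Map filesystem block numbers to filenames on /dev/sda5 (needs root).
DEV=/dev/sda5
blocks_to_files() {
    debugfs -R "icheck $*" "$DEV" 2>/dev/null |
        awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' |  # keep resolved inodes
        sort -u |
        while read -r ino; do
            debugfs -R "ncheck $ino" "$DEV" 2>/dev/null | awk 'NR > 1'
        done
}
blocks_to_files 786856 786864 790608
```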

Re: Hard Disk wakes up very frequently

Postby serram » Sun Sep 20, 2020 12:05 pm

Hi Jocko

Thank you for your conclusions.
I've observed that the disk only stays idle for a very few minutes, usually less than 10, so I think the hostd behaviour (every 2 hours) wouldn't be a big problem for me.
Still, the most desirable behaviour would be for the disk to stay idle as long as possible, to avoid future problems such as head failure.

Jocko wrote:To confirm you have this bug, can you post the output of
Code: Select all
 ps -ef|grep noflushd

This is the output of the ps and grep commands:
Code: Select all
root@fvdwsl-base:/ # ps -ef|grep noflushd
root      1315     1  0 01:05 ?        00:00:08 /usr/sbin/noflushd -n 10 /dev/sda
root     24926 24906  0 13:51 pts/0    00:00:00 grep noflushd


I'll repeat the test with the updated diskmon version when you tell me; thanks again.
Last edited by serram on Sun Sep 20, 2020 12:19 pm, edited 1 time in total.

Re: Hard Disk wakes up very frequently

Postby serram » Sun Sep 20, 2020 12:12 pm

Hi Mijzelf, this is the output of the debugfs commands:
Code: Select all
root@fvdwsl-base:/ # debugfs -R "icheck 786856 786864 790608" /dev/sda5
debugfs 1.43.4 (31-Jan-2017)
Block   Inode number
786856  <block not found>
786864  <block not found>
790608  <block not found>

root@fvdwsl-base:/ # debugfs -R "ncheck <inode-number>" /dev/sda5
debugfs 1.43.4 (31-Jan-2017)
<inode-number>: Bad inode - <inode-number>

Re: Hard Disk wakes up very frequently

Postby Jocko » Sun Sep 20, 2020 12:50 pm

From the web, it seems you have to divide the block number by 8 (the fs block size is 4096 = 8 x 512):
So run
Code: Select all
debugfs -R "icheck 98357 98358 98826" /dev/sda5

Re: Hard Disk wakes up very frequently

Postby serram » Sun Sep 20, 2020 1:08 pm

Here is the output:
Code: Select all
root@fvdwsl-base:/ # debugfs -R "icheck 98357 98358 98826" /dev/sda5
debugfs 1.43.4 (31-Jan-2017)
Block   Inode number
98357   <block not found>
98358   <block not found>
98826   22428

Re: Hard Disk wakes up very frequently

Postby Jocko » Sun Sep 20, 2020 1:13 pm

and then run
Code: Select all
debugfs -R "ncheck 22428" /dev/sda5
