Admin console not accessible after 15-1 update

Re: Admin console not accessible after 15-1 update

Postby fvdw » Thu Mar 20, 2014 9:08 am

My test with kernel 164 is still running OK (1.5 days now) with continuous downloading.

Yes, some changes were made to the kernel in 15-1; the main ones are some netfilter functions and support for wireless dongles.

The paging is managed by the kernel. Probably this results in wrong pointer values being passed by nzbget to memcpy...

Is it possible to check with diff tooling whether the memcpy and memory management implementations changed between the kernel versions? Or is memcpy written in assembly?

Kernels 199 and 164 use the same kernel code (3.9.5). Undoubtedly there will be changes compared with earlier versions, but analysing that goes beyond my level of competence. And if memcpy had a problem it would surely be known already, as it is a very frequently used function.
About your remark on paging: it can also be the other way around, with memcpy resulting in a wrong paging instruction.

If we could find out which routine, and then which function, in nzbget causes the oops, we would have a better chance of solving this bug.

Re: Admin console not accessible after 15-1 update

Postby brinka123 » Thu Mar 20, 2014 9:58 am

I checked crash2.txt again; at the top it says kernel #164. So we can be sure that #164 can crash too, although it looks more stable.

What I also see is that different GCC versions were used: kernel #199 was compiled with gcc 4.8.1 and kernel #164 with gcc 4.5.4.

It may look silly, but see what newer GCC versions can introduce (just an example, probably nothing to do with the error we are looking at):
Changed in GCC 4.7:
•On ARM, when compiling for ARMv6 (but not ARMv6-M), ARMv7-A, ARMv7-R, or ARMv7-M, the new option -munaligned-access is active by default, which for some sources generates code that accesses memory on unaligned addresses. This requires the kernel of those systems to enable such accesses (controlled by CP15 register c1, refer to ARM documentation). Alternatively, or for compatibility with kernels where unaligned accesses are not supported, all code has to be compiled with -mno-unaligned-access. Upstream

Check this link:
https://www.gnu.org/software/gcc/releases.html

Are nzbget and the kernel compiled with the same toolchain and compiler version?
Thanks for your time.

Re: Admin console not accessible after 15-1 update

Postby brinka123 » Thu Mar 20, 2014 1:37 pm

I'm not sure, but I did a very quick code review of nzbget.
ReadLine does not test whether pBuffer is NULL.

Code:
char* Connection::ReadLine(char* pBuffer, int iSize, int* pBytesRead)
{
   if (m_eStatus != csConnected)
   {
      return NULL;
   }

   char* pBufPtr = pBuffer;
   iSize--; // for trailing '0'
   int iBytesRead = 0;
   int iBufAvail = m_iBufAvail; // local variable is faster
   char* szBufPtr = m_szBufPtr; // local variable is faster
   while (iSize)
   {
      if (!iBufAvail)
      {
         iBufAvail = recv(m_iSocket, m_szReadBuf, CONNECTION_READBUFFER_SIZE, 0);
         if (iBufAvail < 0)
         {
            ReportError("Could not receive data on socket", NULL, true, 0);
            break;
         }
         else if (iBufAvail == 0)
         {
            break;
         }
         szBufPtr = m_szReadBuf;
         m_szReadBuf[iBufAvail] = '\0';
      }
      
      int len = 0;
      char* p = (char*)memchr(szBufPtr, '\n', iBufAvail);
      if (p)
      {
         len = (int)(p - szBufPtr + 1);
      }
      else
      {
         len = iBufAvail;
      }
      
      if (len > iSize)
      {
         len = iSize;
      }
      
      memcpy(pBufPtr, szBufPtr, len);
      pBufPtr += len;
      szBufPtr += len;
      iBufAvail -= len;
      iBytesRead += len;
      iSize -= len;
      
      if (p)
      {
         break;
      }
   }
   *pBufPtr = '\0';
   
   m_iBufAvail = iBufAvail > 0 ? iBufAvail : 0; // copy back to member
   m_szBufPtr = szBufPtr; // copy back to member
   
   if (pBytesRead)
   {
      *pBytesRead = iBytesRead;
   }
   
   if (pBufPtr == pBuffer)
   {
      return NULL;
   }
   
   return pBuffer;
}



The function is called by:
Code:
WebDownloader::EStatus WebDownloader::DownloadHeaders()
{
   EStatus Status = adRunning;

   m_bConfirmedLength = false;
   const int LineBufSize = 1024*10;
   char* szLineBuf = (char*)malloc(LineBufSize);
   m_iContentLen = -1;
   bool bFirstLine = true;
   m_bGZip = false;
   m_bRedirecting = false;
   m_bRedirected = false;

   // Headers
   while (!IsStopped())
   {
      SetLastUpdateTimeNow();

      int iLen = 0;
      char* line = m_pConnection->ReadLine(szLineBuf, LineBufSize, &iLen);


malloc does not always return a valid pointer: when not enough memory is available, it returns NULL. So it looks like memcpy can receive a NULL pointer when the malloc fails.

Could a failed malloc be what generates the "Unable to handle kernel paging request at virtual address 0009fc28" error?
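
To make the concern concrete, here is a minimal sketch of a copy step that refuses a NULL destination. It is not nzbget code: CopyLineChecked and its parameters are hypothetical names made up for illustration, and it reads from an in-memory buffer instead of a socket so that it is self-contained. It only shows the kind of guard that ReadLine lacks; it does not show that a failed malloc is what triggers the oops.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of nzbget: copies one line (up to and
// including '\n') from src into dst, like the memcpy step in ReadLine,
// but rejects a NULL or zero-sized destination instead of writing to it.
// Returns the number of bytes copied, or -1 if the arguments are unusable.
int CopyLineChecked(char* dst, int dstSize, const char* src, int srcLen)
{
    if (dst == nullptr || src == nullptr || dstSize <= 0 || srcLen < 0)
    {
        return -1; // e.g. the caller's malloc failed
    }

    int room = dstSize - 1; // reserve one byte for the trailing '\0'
    const char* nl = static_cast<const char*>(
        std::memchr(src, '\n', static_cast<std::size_t>(srcLen)));
    int len = nl ? static_cast<int>(nl - src + 1) : srcLen;
    if (len > room)
    {
        len = room; // truncate rather than overflow the destination
    }

    std::memcpy(dst, src, static_cast<std::size_t>(len));
    dst[len] = '\0';
    return len;
}
```

A caller would treat -1 as "no usable buffer" and stop, rather than carry on reading into a pointer that was never allocated.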
Last edited by brinka123 on Thu Mar 20, 2014 4:39 pm, edited 1 time in total.

Re: Admin console not accessible after 15-1 update

Postby brinka123 » Thu Mar 20, 2014 4:32 pm

nzbget 12.0 with kernel #164 has now been running for 24 hours...

What I see is that free memory swings from 57000 down to 3000; when it reaches 3000 it goes back to 57000 again. Previously it stayed around 3000... Weird...
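
If someone wants to log this swing over time rather than watch it by hand, a small sketch like the one below could be run periodically. It assumes the figures above correspond to something like the MemFree field of /proc/meminfo, which is a guess; ParseMeminfoField and ReadMemFreeKb are hypothetical helper names.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the numeric value of a "/proc/meminfo"-style field such as
// "MemFree" (reported in kB on Linux), or -1 if the field is not found
// or has no number after it.
long ParseMeminfoField(std::istream& in, const std::string& field)
{
    const std::string key = field + ":";
    std::string line;
    while (std::getline(in, line))
    {
        if (line.compare(0, key.size(), key) == 0)
        {
            std::istringstream rest(line.substr(key.size()));
            long value = -1;
            if (rest >> value)
            {
                return value;
            }
            return -1;
        }
    }
    return -1;
}

// Convenience wrapper that reads the live value from /proc/meminfo.
long ReadMemFreeKb()
{
    std::ifstream meminfo("/proc/meminfo");
    return ParseMeminfoField(meminfo, "MemFree");
}
```

Calling ReadMemFreeKb() every few seconds and writing the result with a timestamp would show whether the drop towards 3000 lines up with the moment of a crash.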

Re: Admin console not accessible after 15-1 update

Postby fvdw » Thu Mar 20, 2014 5:32 pm

Indeed, kernel 164 behaves more stably with respect to this bug.
And that makes you wonder whether it really is a software bug. Bugs should be reproducible. I tried, for a second time, a file that had made nzbget 12.0 crash on firmware 15-1, and it downloaded fine without a crash.
Something else is interfering, or several conditions must hold at the same time to make it fail. This makes it very hard to find the cause and a solution.

brinka123 wrote: Is the nzbget and the kernel compiled with the same compilation tool chain/version??

Kernel 199 and nzbget 12 are both compiled using gcc 4.8.1 and the same toolchain (based on the latest binutils and glibc-2.17).

nzbget-9.0 could even have been built with a version older than 4.5.4; I don't remember. I can compile the kernel and nzbget with the 4.5.4 compiler as well, but that will be against glibc-2.17.
On the older firmwares we still had glibc-2.3.6; I no longer have a development NAS running that version.
But if the compiler created the bug, then all programs compiled with it should suffer. Until now only nzbget seems to be causing problems.

brinka123 wrote: Could it be possible that a faulty malloc generates the "Unable to handle kernel paging request at virtual address 0009fc28" error?

To be honest I don't know, but it is not likely, as it would then cause problems in all compiled applications.

A nice challenge to crack this nut.

Re: Admin console not accessible after 15-1 update

Postby brinka123 » Thu Mar 20, 2014 5:55 pm

Kernel #164 crashed again.

I think it is reproducible.

When nzbget is downloading and you start heavy disk access from the samba side (QuickPar on Windows, for example), it crashes:

Code:
[89940.409526] Unable to handle kernel paging request at virtual address 0009a2d0
[89940.416813] pgd = cf3b0000
[89940.420068] [0009a2d0] *pgd=0f224831, *pte=0c0f514f, *ppte=0c0f5ffe
[89940.426581] Internal error: Oops: 805 [#1] PREEMPT ARM
[89940.431703] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt(O) usblp usb_storage ehci_hcd
[89940.442948] CPU: 0    Tainted: G           O  (3.9.5 #164)
[89940.448420] PC is at memcpy+0xc4/0x3a4
[89940.452156] LR is at 0x81d301c3
[89940.455287] pc : [<c02a6164>]    lr : [<81d301c3>]    psr: 20000013
[89940.455287] sp : cf157ccc  ip : 00000010  fp : cf157ce8
[89940.466712] r10: c91c52e4  r9 : cf157cec  r8 : 259bb798
[89940.471916] r7 : a7061182  r6 : 085c537b  r5 : 0009a2d0  r4 : c02a612c
[89940.478413] r3 : 00000010  r2 : 000003d0  r1 : c91c52f4  r0 : 0009a2d0
[89940.484910] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[89940.492012] Control: 0005317f  Table: 0f15c000  DAC: 00000015
[89940.497731] Process nzbget (pid: 14700, stack limit = 0xcf1561b8)



Is it possible to add a log statement to nzbget that checks whether the malloc fails (a NULL pointer check)?

malloc succeeds in 99.999% of cases. When more processes are running it may fail, which is not incorrect behavior. Maybe the behavior of malloc has changed in newer kernels. Applications that use it must be aware that malloc does not always allocate memory, and it looks like nzbget is not programmed robustly enough to handle this case.
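
A log statement of the kind asked for could look roughly like this. It is only a sketch: MallocLogged and AllocateLineBuffer are hypothetical names, not nzbget functions, and the message goes to stderr here because nzbget's own logging calls are not shown in this thread.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical wrapper, not part of nzbget: behaves like malloc but
// writes a log line naming the call site when the allocation fails.
void* MallocLogged(std::size_t size, const char* where)
{
    void* p = std::malloc(size);
    if (p == nullptr)
    {
        std::fprintf(stderr, "malloc of %zu bytes failed in %s\n", size, where);
    }
    return p;
}

// Hypothetical caller shaped like the start of DownloadHeaders: it
// reports failure to its own caller instead of handing a NULL buffer
// on to ReadLine.
bool AllocateLineBuffer(char** out, std::size_t size)
{
    *out = static_cast<char*>(MallocLogged(size, "DownloadHeaders"));
    return *out != nullptr;
}
```

If such a log line never shows up before a crash, that would suggest the malloc itself is not what fails.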

Re: Admin console not accessible after 15-1 update

Postby fvdw » Thu Mar 20, 2014 7:44 pm

I returned home late this evening. I checked the test nwsp2 running nzbget and found that it had crashed after 40 hours:

Code:
[145062.630124] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt(O) usblp usb_storage ehci_hcd
[145062.641455] CPU: 0    Tainted: G           O  (3.9.5 #164)
[145062.647006] PC is at memcpy+0xc4/0x3a4
[145062.650829] LR is at 0x2e1b4709
[145062.654046] pc : [<c02a6164>]    lr : [<2e1b4709>]    psr: 20000013
[145062.654046] sp : cf283ccc  ip : 00000010  fp : cf283ce8
[145062.665644] r10: c3e19504  r9 : cf283cec  r8 : 226918d4
[145062.670933] r7 : 2ab9c13a  r6 : 7938624c  r5 : 001aa330  r4 : c02a612c
[145062.677517] r3 : 00000010  r2 : 000003d0  r1 : c3e19514  r0 : 001aa330
[145062.684101] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[145062.691289] Control: 0005317f  Table: 09684000  DAC: 00000015
[145062.697094] Process nzbget (pid: 6037, stack limit = 0xcf2821b8)
[145062.703160] Stack: (0xcf283ccc to 0xcf284000)
[145062.707590] 3cc0:                            001aa330 cf282000 00000400 00000000 001aa330
[145062.715827] 3ce0: 00000400 c02b1574 cf2c46d8 cace86a8 00000400 cf283f78 00000400 00000400
[145062.724061] 3d00: cf282000 c3e19504 cf283f70 00000400 cf282000 c0412668 00000400 00000000
[145062.732294] 3d20: cf2eac20 00000400 000005a8 c0413050 00000400 cf158ea0 00000000 00000400
[145062.740528] 3d40: cf2eac20 cf15917c 00000000 00000400 cf282000 c045d0ac 00000000 c012498c
[145062.748761] 3d60: c079c420 00000000 00000006 00000000 cf283f54 c0ef9440 00000001 00000000
[145062.756997] 3d80: 00000000 cf158ee4 cdb88514 0000176f c0518bf0 cf283dd0 cf34c5a0 cf283f54
[145062.765229] 3da0: c07034d8 00000000 00000400 00000000 cf282000 c0ef9440 aecfe8b4 c047c410
[145062.773463] 3dc0: 00000000 00000000 cf283dd4 00000000 cf283df0 00000000 cf283df0 cf58e320
[145062.781697] 3de0: cf283f54 c04072ec 00000000 01c38000 00000000 00000000 00000000 00000001
[145062.789930] 3e00: ffffffff 00000000 00000000 00000000 00000000 00000000 c0ef9440 00000000
[145062.798163] 3e20: 00000000 00000000 00000000 00000000 cf283e70 00000000 00000000 00000000
[145062.806397] 3e40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[145062.814632] 3e60: 00000000 00000000 00000000 00000000 00000001 c0094bd8 00000000 00000400
[145062.822867] 3e80: cf58e320 c0065020 00000000 cf283f54 00000001 cf283ec0 cf34c5a0 cf283f80
[145062.831102] 3ea0: cf356280 c00ab2dc fffffff7 cf283f7c cf283f78 cf58e320 00000000 001aa330
[145062.839334] 3ec0: 00000400 00000000 00000000 c0408cf4 ffffffff cf34c5a0 00000000 00000000
[145062.847569] 3ee0: 00000000 00000000 cf80c440 c0040094 00000001 00000002 cdb88458 c00c7bf8
[145062.855801] 3f00: 00000000 00000000 00001000 00000000 00001000 00000000 00000000 00000000
[145062.864034] 3f20: 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00001000
[145062.872271] 3f40: cf34c5a8 00000002 cdb88458 00000000 00001000 cf283ed4 00000080 cf283f70
[145062.880504] 3f60: 00000001 00000000 00000000 aecfe8d0 001aa330 00000400 00000001 fffffff7
[145062.888738] 3f80: b3d86840 b1bcacf8 00000000 00000123 c0012548 c0408d60 00000000 00000000
[145062.896973] 3fa0: 00000400 c00123c0 b3d86840 b1bcacf8 00000009 001aa330 00000400 00000000
[145062.905207] 3fc0: b3d86840 b1bcacf8 00000000 00000123 aecff460 b6007914 00000000 aecfe8b4
[145062.913440] 3fe0: 00000000 aecfe890 b6deccf0 b6ded934 80000010 00000009 00000000 00000000
[145062.921676] Code: e1a00000 e4803004 e4804004 e4805004 (e4806004)
[145062.928556] ---[ end trace 3dd905a334b254a8 ]---
[145062.933246] note: nzbget[6037] exited with preempt_count 1
root@nwsp2-5:/ #


That nwsp2 is only busy with nzbget, so nothing is going on with samba; even so, what you say could be a cause.
We could also run a trial with the 2.6.39.4 kernel. It will run fine with 15-1; we only need to install the 2.6.39.4 modules. We could do without them, but then USB won't work.

I will see if I can implement your suggestion in the nzbget code.

PS: it seems the error code is sometimes 805 and sometimes 817 (whatever they mean exactly), but both seem to originate from the memcpy call.

Re: Admin console not accessible after 15-1 update

Postby fvdw » Thu Mar 20, 2014 8:48 pm

Attached is kernel linux 2.6.39.4 #166 for the nwsp2.
You can run it via the fvdw-sl console by loading it as an external kernel together with fvdw-sl-15-1.
Some features might not work, but nzbget 12-0 seems to run OK; it is running right now on my test nwsp2.
PS: the USB port won't work, as the required kernel modules are not installed for this kernel, but that's not required for this test.

Re: Admin console not accessible after 15-1 update

Postby fvdw » Thu Mar 20, 2014 10:07 pm

If you follow this discussion http://nzbget.net/forum/viewtopic.php?f ... t=10#p3271 and believe that recv() is the problem,
and read this kernel bug http://cxsecurity.com/issue/WLB-2014010055

then maybe kernel 3.13.6 might do better?

But adapting it for our nwsp2 could be some work, as we use some tweaks.

Re: Admin console not accessible after 15-1 update

Postby brinka123 » Fri Mar 21, 2014 9:18 am

Kernel #166 with nzbget 12 has run for one night.

So according to the theory, this setup should run without problems.

I am still focused on the malloc: nzbget's use of this function is not robust. There is no error handling for the case where malloc can't allocate memory.
The pointer handling further on in the module also needs more condition checking.

These programming "errors" lead to failures because the behavior of the kernel has (probably) changed. This change in kernel behavior doesn't have to be wrong; on the other hand, it could be...

Just my few cents...