This forum is locked: you cannot post, reply to, or edit topics.  This topic is locked: you cannot edit posts or make replies.
understanding output of "/usr/bin/time"
POTM-MASTER


Joined: 24 Aug 2004
Posts: 629
Location: New Jersey, USA
This afternoon SYSTEST slowed to a crawl as programs which previously
ran in seconds started accumulating large amounts of WALL CLOCK time.
When this happens, clearly a match with 100 moves in it may take over
an hour to run whereas normally it might take a minute.

Now - as one of thousands of users on my 1and1 host I don't have access to
accounting data or sar or top or other goodies, so I'll ask a generic
question and hope one of you smart folks can point me to possible
causes - even if they are outside my sphere of influence on the box.

The following is the output from /usr/bin/time of a single move by SYSTEST2:
Code:
0.02user 0.01system 1:38.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (329major+176minor)pagefaults 0swaps

A minute and a half of wall clock but little sys or user time.

Here's another one:
Code:
0.24user 0.39system 0:07.76elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3348major+18233minor)pagefaults 0swaps

LOTS of major and minor page faults ...
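
To see how the numbers in the two reports above relate, here is a minimal Python sketch (not part of the original thread) that parses the default GNU time report and splits elapsed time into CPU time and everything else. The function names are made up for illustration, and the regular expressions assume exactly the output layout shown above.

```python
import re

# Assumed layout: the default GNU time report, e.g.
# "0.24user 0.39system 0:07.76elapsed 8%CPU ... (3348major+18233minor)pagefaults"
_TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<sys>[\d.]+)system\s+(?P<elapsed>[\d:.]+)elapsed"
)
_FAULT_RE = re.compile(r"\((?P<major>\d+)major\+(?P<minor>\d+)minor\)pagefaults")


def elapsed_seconds(text):
    """Convert an elapsed field such as '1:38.56' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(report):
    """Return CPU time, wall time, non-CPU time and fault counts for one report."""
    times = _TIME_RE.search(report)
    faults = _FAULT_RE.search(report)
    if times is None or faults is None:
        raise ValueError("not a default GNU time report")
    cpu = float(times["user"]) + float(times["sys"])
    wall = elapsed_seconds(times["elapsed"])
    return {
        "cpu": cpu,                      # user + system seconds
        "wall": wall,                    # elapsed seconds
        "not_on_cpu": wall - cpu,        # time the process was not running
        "cpu_percent": int(100 * cpu / wall) if wall > 0 else 0,
        "major": int(faults["major"]),
        "minor": int(faults["minor"]),
    }


# The two reports quoted in this post.
SAMPLE_SLOW = (
    "0.02user 0.01system 1:38.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k\n"
    "0inputs+0outputs (329major+176minor)pagefaults 0swaps"
)
SAMPLE_FAST = (
    "0.24user 0.39system 0:07.76elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k\n"
    "0inputs+0outputs (3348major+18233minor)pagefaults 0swaps"
)

if __name__ == "__main__":
    for sample in (SAMPLE_SLOW, SAMPLE_FAST):
        print(summarize(sample))
```

For the first report this works out to about 0.03 s of CPU out of 98.56 s elapsed, so nearly all of the wall clock passed while the process was not running on a CPU at all. The report alone cannot say whether that time went to disk waits, page-ins, or waiting for a turn on a busy machine.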

Now I'm not real sure what page faults are, but my suspicion is that this
is the root cause of the elapsed time numbers - perhaps multiple retries
that don't get charged to either sys or user time. As I look back at older
runs I also notice things like 500%CPU or 129%CPU that have low wall
clocks even with the page faults. Having written this, I'd NOW guess that
CPU availability is the issue ... presumably CPU over the entire world of
users rather than just my little login.

So - I don't want to penalize a proggie for long "elapsed" time when the
cause may be well out of their control ...
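
One way a test harness could avoid that penalty is to judge a program by the CPU time it consumed rather than by elapsed time. The sketch below is only an illustration under that assumption. The actual SYSTEST harness is not shown in this thread, the function name is hypothetical, and the `resource` module it relies on is available on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(argv):
    """Run argv to completion and return (wall_seconds, cpu_seconds).

    cpu_seconds is user + system time charged to child processes that
    were waited for during the call; wall_seconds is plain elapsed time.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Stand-in for a real entry: a Python child that does nothing.
    wall, cpu = run_and_measure([sys.executable, "-c", "pass"])
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```

A limit applied to `cpu` instead of `wall` stays roughly the same whether the host is idle or overloaded, although time spent waiting for the disk is charged to neither user nor system time.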

Just trying to understand ... and if your system test results end up a
few hours late I'm hoping you'll understand!

_________________
=Fred (The omnipotent POTM-MASTER)
Corrosion


Joined: 26 Nov 2004
Posts: 34
Page faults pile up when the system is low on memory: it wants to access a piece of memory, but it is not "online" so it has to grab it from the disk.

no idea what the difference between major and minor is though... probably the... wait.

look here. :)

http://en.wikipedia.org/wiki/Page_fault
hagman
POTM WINNER

Joined: 14 Sep 2005
Posts: 119
Location: Bonn, Germany
Corrosion wrote:
it wants to access a piece of memory, but it is not "online" so it has to grab it from the disk.

Apart from memory pressure caused by other processes, this can also happen when a single program allocates several gigabytes of memory and accesses it at random locations.

A minor page fault happens when no disk I/O is involved (and should therefore not cost much wall-clock time), e.g. when the requested data is already in memory because of another process (typically popular executables or shared libraries).
I guess that accessing an uninitialized page (i.e. one with no existing swapped-out copy on disk), as in my example, also counts as a minor fault. While this does not require I/O immediately, some other RAM pages get "forgotten" and need to be reloaded later by other processes, or other dirty cache pages need to be written out, before memory is available.
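
The guess about freshly allocated pages can be checked with a small experiment. The following Python sketch (Unix-only, function name made up for illustration) maps anonymous memory, writes one byte into every page, and reports how the process's own minor and major fault counters changed. The exact counts depend on the kernel and its settings, for example on whether large pages are in use, so none are promised here.

```python
import mmap
import resource


def faults_while_touching(size_bytes):
    """Touch every page of a fresh anonymous mapping.

    Returns (minor_faults, major_faults) charged to this process
    while the pages were being written for the first time.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    buf = mmap.mmap(-1, size_bytes)  # anonymous mapping, no file behind it
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        buf[offset] = 1  # first write to this page
    after = resource.getrusage(resource.RUSAGE_SELF)
    buf.close()
    return (
        after.ru_minflt - before.ru_minflt,
        after.ru_majflt - before.ru_majflt,
    )


if __name__ == "__main__":
    minor, major = faults_while_touching(64 * 1024 * 1024)
    print(f"minor={minor} major={major}")
```

On a machine with plenty of free RAM one would expect the minor count to grow and the major count to stay near zero, which fits the description of minor faults as the ones that need no disk I/O.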

hagman

See also http://dinsights.com/POTM/phpBB2/viewtopic.php?t=470