This entry was spawned by a post in SitePoint's forums about benchmarking. It reminded me of the jaded view many programmers have of quick-and-dirty benchmarking options. Most people reach for the microtime()/time() functions, which on the surface makes sense. However, I never like to stop at the surface, so I would like to dig a little deeper and show everyone a Unix/Linux alternative to the standard microtime() benchmark.
3 times to live by
There are three kinds of time an application consumes: system time, user time, and what I like to call wall-clock time. To summarize: system time is the amount of time the system (or the processor) spends on behalf of the current process. User time is the amount of time spent executing in user mode; this often excludes the file I/O or socket operations that the program uses. Wall-clock time is the amount of time that passes on your wall clock while your process runs.
Now the question we have to ask ourselves is: which of these three measures gives the most accurate picture of the time our process actually spends consuming resources? Because we live in an age of multitasking and background processing, wall-clock time must be thrown right out the window. Just because our browser has been open for an hour on the desktop doesn't mean the computer has spent a solid hour dedicating its resources to the browser. (Well, unless it's IE of course… :P)
What is interesting is that even though we quickly dismiss wall-clock time as the worst way to benchmark resource (specifically time) usage, we are always quick to jump on the microtime() function for our benchmarks. Yet microtime() measures the very wall-clock time we've just determined doesn't work!
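To make that concrete, here is the familiar microtime()-style benchmark most of us have written at some point (a minimal sketch, assuming PHP 5's microtime(true); the loop is just a stand-in workload):

```php
<?php
// Wall-clock benchmarking with microtime(true) -- this measures
// elapsed real time, which includes time spent by every OTHER
// process running on the machine, not just our own.
$start = microtime(true);

// ... the code being benchmarked (stand-in workload) ...
for ($i = 0; $i < 100000; $i++) {
    $x = sqrt($i);
}

$elapsed = microtime(true) - $start;
printf("Wall-clock time: %.6f seconds\n", $elapsed);
```

Run this on a busy machine and on an idle one and you will see the numbers drift, even though the script does identical work both times.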
So, what does that leave us with? System time and user time. Which of these is better for benchmarking an application? Well, that depends on the application. The advantage of system time is that it includes the time taken by any subprocesses, right down to the low-level system operations (i.e. file I/O, socket operations). This can also be a disadvantage, though, because it may make it harder to determine whether a bottleneck is in your code or in some subsystem your code uses.
The next question, I am sure, is how to measure system or user time. For that we turn to the least appreciated tool in the arsenal of the resource-conscious developer: getrusage(). Before we go on I must mention that if you are using a Windows server you will NOT have access to this function, and I do not know of any alternatives at this time.
getrusage() provides a direct interface to the Unix getrusage() system call. It returns a detailed array containing many values related to resource consumption. The names and meanings of these values may vary across platforms; you can find out exactly what information is available by calling print_r() on the array returned by getrusage(). Here is a pretty good list of the values you will probably find:
- ru_utime.tv_sec, ru_utime.tv_usec: the total amount of time spent executing in user mode.
- ru_stime.tv_sec, ru_stime.tv_usec: the total amount of time spent in the system executing on behalf of the process(es).
- ru_maxrss: the maximum resident set size utilized (in kilobytes).
- ru_ixrss: an integral value indicating the amount of memory used by the text segment that was also shared among other processes (expressed in units of kilobytes * ticks-of-execution).
- ru_idrss: an integral value of the amount of unshared memory residing in the data segment of a process (expressed in units of kilobytes * ticks-of-execution).
- ru_isrss: an integral value of the amount of unshared memory residing in the stack segment of a process (expressed in units of kilobytes * ticks-of-execution).
- ru_minflt: the number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation.
- ru_majflt: the number of page faults serviced that required I/O activity.
- ru_nswap: the number of times a process was swapped out of main memory.
- ru_inblock: the number of times the file system had to perform input.
- ru_oublock: the number of times the file system had to perform output.
- ru_msgsnd: the number of IPC messages sent.
- ru_msgrcv: the number of IPC messages received.
- ru_nsignals: the number of signals delivered.
- ru_nvcsw: the number of times a context switch resulted from a process voluntarily giving up the processor before its time slice was completed (usually to await availability of a resource).
- ru_nivcsw: the number of times a context switch resulted from a higher-priority process becoming runnable or because the current process exceeded its time slice.
The values we care about for a benchmark are the ru_utime.* and ru_stime.* entries. The example I am going to show you will help you benchmark the system time used; if you would like to capture the user time instead, it is just a matter of swapping variables.
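A minimal sketch of such a benchmark might look like this (the helper name rutime() is mine, not a built-in; it relies on PHP's getrusage() returning keys like "ru_stime.tv_sec" and "ru_stime.tv_usec"):

```php
<?php
// Helper: collapse a getrusage() time pair (whole seconds plus
// microseconds) into a single microsecond count. $prefix is
// "ru_stime" for system time or "ru_utime" for user time.
function rutime($usage, $prefix) {
    return $usage["$prefix.tv_sec"] * 1000000 + $usage["$prefix.tv_usec"];
}

$before = getrusage();

// ... the code being benchmarked (stand-in workload) ...
for ($i = 0; $i < 100000; $i++) {
    $x = sqrt($i);
}

$after = getrusage();

// System time: time the kernel spent on our behalf.
printf("System time: %d microseconds\n",
       rutime($after, 'ru_stime') - rutime($before, 'ru_stime'));

// User time: time spent executing our own code in user mode.
printf("User time:   %d microseconds\n",
       rutime($after, 'ru_utime') - rutime($before, 'ru_utime'));
```

Note that unlike the microtime() approach, time spent by other processes between the two getrusage() calls never shows up in these numbers, which is exactly the property we were after.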
I hope everybody learned a little something from this.