Sunday, April 10, 2016

G-Profiling

To enable g-profiling you need to add the -pg flag to the compiler and linker options. These options can be found in the Makefile. Just look for the variables "CFLAGS", "CPPFLAGS", and "LDFLAGS". Append the -pg flag to each of these and g-profiling should be enabled.

At first I couldn't figure out why my software wasn't producing the gmon.out file, which a program built with -pg writes when it exits and which gprof then reads. I had added -pg to all the flags in the top-level Makefile, but it turned out there were more Makefiles whose flags needed changing. There were 6 Makefiles in total, so I added -pg to the compiler flags in all of them.

Luckily this did generate the gmon.out file so I could take a look at the profiling by running the following command:
gprof md5deep > gprof.txt
This generates a readable text file containing the profile. The strange thing was that my gprof file reported 0 seconds for every function, even though it said that one function got called 8 times. gprof measures time by sampling (its flat profile notes that each sample counts as 0.01 seconds), so most likely the whole run finished so quickly that no samples landed in any function.
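One way to get non-zero timings from a sampling profiler is to give it more work to sample, for example by repeating a call until a measurable amount of CPU time has passed. The following is only a minimal sketch of that idea, not something taken from md5deep: demo_work and repeat_until_measurable are hypothetical names, and demo_work is a made-up stand-in for a single hashing call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for one short hashing call. */
static unsigned long demo_work(unsigned long seed)
{
    unsigned long h = seed;
    for (int i = 0; i < 1000; i++)
        h = h * 31UL + (unsigned long) i;
    return h;
}

/*
 * Calls demo_work() in a loop until at least min_seconds of CPU time
 * (as reported by clock()) has elapsed. Returns the number of calls made.
 * The accumulated result is written to *sink so the work has a visible
 * effect and is less likely to be optimized away.
 */
static unsigned long repeat_until_measurable(double min_seconds,
                                             unsigned long *sink)
{
    assert(min_seconds >= 0.0);

    clock_t start = clock();
    unsigned long calls = 0;
    unsigned long acc = 0;

    do {
        acc ^= demo_work(acc + calls);
        calls++;
    } while ((double) (clock() - start) / CLOCKS_PER_SEC < min_seconds);

    if (sink != NULL)
        *sink = acc;
    return calls;
}
```

Calling repeat_until_measurable(0.5, &sink) keeps the loop busy for roughly half a second of CPU time, which is many times the 0.01-second sample size. For md5deep itself, hashing a much larger input would presumably have a similar effect.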

The function that gets called 8 times is hash_final_sha1 which looks like this:
void hash_final_sha1(void * ctx, unsigned char *sum) {
    sha1_finish((sha1_context *)ctx, sum);
}
Since it's a one-liner there isn't much I can optimize here. But it does call other functions which I can take a look at. I followed the chain of calls until I found a function that actually did work itself instead of just forwarding to another function. I ended up at the following function:
void sha1_update( sha1_context *ctx, const unsigned char *input, size_t ilen )
{
    size_t fill;
    unsigned long left;

    if( ilen <= 0 )
        return;

    left = ctx->total[0] & 0x3F;
    fill = 64 - left;

    ctx->total[0] += (unsigned long) ilen;
    ctx->total[0] &= 0xFFFFFFFF;

    if( ctx->total[0] < (unsigned long) ilen )
        ctx->total[1]++;

    if( left && ilen >= fill )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, fill );
        sha1_process( ctx, ctx->buffer );
        input += fill;
        ilen  -= fill;
        left = 0;
    }

    while( ilen >= 64 )
    {
        sha1_process( ctx, input );
        input += 64;
        ilen  -= 64;
    }

    if( ilen > 0 )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, ilen );
    }
}
Looking at it, I could not find a way to optimize this function. Since it took a while to find a suitable function, and I couldn't find an optimization, I'm going to look into the compiler flags next: the build uses -O2, and it should be possible to change this to -O3.
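
To make the buffering in sha1_update easier to follow, here is a stripped-down sketch of the same control flow. It is not the md5deep code: demo_ctx, demo_process, demo_update and demo_run are hypothetical names, the context keeps a single byte counter instead of the two-element total array, and demo_process only counts 64-byte blocks instead of hashing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sha1_context with only the fields needed here. */
typedef struct {
    unsigned long total;       /* number of bytes seen so far */
    unsigned char buffer[64];  /* partial block carried between calls */
    unsigned long blocks;      /* number of 64-byte blocks "processed" */
} demo_ctx;

/* Hypothetical stand-in for sha1_process: counts blocks, hashes nothing. */
static void demo_process(demo_ctx *ctx, const unsigned char *block)
{
    (void) block;
    ctx->blocks++;
}

/* Same three-phase structure as sha1_update. */
static void demo_update(demo_ctx *ctx, const unsigned char *input, size_t ilen)
{
    size_t left, fill;

    if (ilen == 0)
        return;

    left = (size_t) (ctx->total & 0x3F);  /* bytes already buffered */
    fill = 64 - left;                     /* bytes needed to fill the block */
    assert(left < 64);
    ctx->total += (unsigned long) ilen;

    /* Phase 1: top up a partially filled buffer and process it. */
    if (left && ilen >= fill) {
        memcpy(ctx->buffer + left, input, fill);
        demo_process(ctx, ctx->buffer);
        input += fill;
        ilen  -= fill;
        left = 0;
    }

    /* Phase 2: process whole blocks straight from the input. */
    while (ilen >= 64) {
        demo_process(ctx, input);
        input += 64;
        ilen  -= 64;
    }

    /* Phase 3: keep the remaining bytes for the next call. */
    if (ilen > 0)
        memcpy(ctx->buffer + left, input, ilen);
}

/* Feeds 100 bytes, then 60 bytes, and returns the blocks processed. */
static unsigned long demo_run(void)
{
    static const unsigned char data[100] = {0};
    demo_ctx ctx;

    memset(&ctx, 0, sizeof ctx);
    demo_update(&ctx, data, 100);  /* 1 block processed, 36 bytes buffered */
    demo_update(&ctx, data, 60);   /* 28 bytes complete the block, 32 buffered */
    return ctx.blocks;
}
```

Here demo_run() returns 2: the 160 bytes make up two full 64-byte blocks, and the last 32 bytes stay in the buffer. The real work happens in sha1_process, once per 64-byte block, so sha1_update itself is mostly bookkeeping plus memcpy.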
