Thursday, April 21, 2016

Trying to get my changes upstream

As noted before, my optimization was to change the compiler flag from -O2 to -O3, which increased the speed by 11% on x86_64. During the test phase of this optimization I changed the compiler flags in the Makefile itself. If I wanted this change committed upstream, I would have to find the place where the Makefiles are generated.

That would normally be the configure script. However, I could not find where the flags for the Makefiles were set, which I thought was very strange, because I am quite sure that when you configure and make the program, it compiles with the -O2 flag.

I used grep to find the files where -O2 appears, and the only hit was an instructions file explaining how you can manually add -O2 while configuring; it is not set as a default anywhere I could find.
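For what it's worth, autoconf-generated configure scripts do accept flags passed on the command line, so an end user could get my optimization without patching any Makefile. Something like this (my guess at what that instructions file was describing):
./configure CFLAGS="-g -O3"
That only helps people who know to do it, though; it doesn't change the project's default.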

Then I tried using grep to find CFLAGS and where it gets defined. What I discovered is that they use an autoconf macro that figures out the right compiler flags for building with POSIX threads (https://manned.org/pthread-config/b1cf9868). I quote from the md5deep documentation:
#   This macro figures out how to build C programs using POSIX threads. It
#   sets the PTHREAD_LIBS output variable to the threads library and linker
#   flags, and the PTHREAD_CFLAGS output variable to any special C compiler
#   flags that are needed.
I did not know how to manipulate this pthread configuration so it would always use -O3. Their instructions did mention an email address for questions or remarks about the compiler options, supposedly in the README file; however, it was not actually there, so I could not contact them personally either.

On that note, I'm sad to share that I could not get the right flags set in the configure step. This means I could not contribute anything to the GitHub project, because I also could not contact the maintainers for help or an explanation.

Sunday, April 10, 2016

Compiling Optimization Betty

After benchmarking x86_64 with the -O3 optimization flag, it's time to test it on AArch64.
Since I can only work from the command line on the server, I needed a command to replace -O2 with -O3 in all the Makefiles at once. The easiest one I found was the following:
find -name Makefile | xargs sed -i 's/-O2/-O3/' 
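To double-check that the substitution reached every Makefile, a quick grep afterwards works; this lists any Makefile that still contains -O2 (the -- stops grep from treating -O2 as an option):
find . -name Makefile | xargs grep -l -- '-O2'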
The following benchmarks were done on AArch64, on the server Betty.

With -O2 flag
        10.5 MB file    105 MB file    1.5 GB file
real:   0m0.037s        0m0.345s       0m4.792s
user:   0m0.028s        0m0.323s       0m4.551s
sys:    0m0.001s        0m0.027s       0m0.400s

With -O3 flag
        10.5 MB file    105 MB file    1.5 GB file
real:   0m0.036s        0m0.343s       0m4.768s
user:   0m0.028s        0m0.323s       0m4.499s
sys:    0m0.001s        0m0.027s       0m0.426s

As you can see, -O3 did practically nothing on AArch64. I thought this was very strange and checked the executable to see if it had changed at all. The file did change: it got larger, as expected. Yet there is no gain in speed; comparing the real times for the 1.5 GB file again, it wasn't even 1% faster. So for AArch64 I recommend keeping -O2, because -O3 barely changes the run time and the binary stays smaller.

For Betty I'll have to find another optimization possibility, though I don't yet know what that would be. Probably something written specifically for AArch64, but that would cost far more time.

Another way to go is to add the compiler flag -fopt-info-missed to report optimizations the compiler attempted but missed, and see if I can do something about those. Source: https://gcc.gnu.org/onlinedocs/gccint/Dump-examples.html
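The invocation would look something like this; the output file name and somefile.c are placeholders, and recent GCC versions also accept -fopt-info-missed without the =file part, printing to stderr instead:
gcc -O3 -fopt-info-missed=missed.txt -c somefile.c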


Compiling Optimization x86_64

As mentioned in the previous post I am going to take a look at the compiler flags and how I might be able to optimize them.

The current flags used for the C compiler are:
-pthread -g -pg -O2 -MD -D_FORTIFY_SOURCE=2 -Wpointer-arith -Wmissing-declarations -Wmissing-prototypes -Wshadow -Wwrite-strings -Wcast-align -Waggregate-return -Wbad-function-cast -Wcast-qual -Wundef -Wredundant-decls -Wdisabled-optimization -Wfloat-equal -Wmissing-format-attribute -Wmultichar -Wc++-compat -Wmissing-noreturn -funit-at-a-time -Wall -Wstrict-prototypes -Weffc++
As you can see, there are a lot of flags for compiling this program with the C compiler. What caught my eye is that they use the -O2 flag and not -O3, so I saw this as an optimization possibility and went straight in to change it. Once I had changed all the flags, I ran another benchmark of the program to see if it helped at all.

To check whether the change did anything, I looked at the executable. The -O3 optimization made the file larger, as expected: it went from 2.7 MB to 3.5 MB.

Note: don't forget to take out the -pg flag as well to get a correct benchmark. -pg adds profiling overhead that slows your program down, so you wouldn't get a fair comparison.

The following benchmarks are for x86_64

With -O2 flag
        10.5 MB file    105 MB file    1.5 GB file
real:   0m0.053s        0m0.287s       0m3.458s
user:   0m0.021s        0m0.200s       0m2.793s
sys:    0m0.003s        0m0.012s       0m0.184s

With -O3 flag
        10.5 MB file    105 MB file    1.5 GB file
real:   0m0.045s        0m0.258s       0m3.071s
user:   0m0.020s        0m0.197s       0m2.756s
sys:    0m0.002s        0m0.014s       0m0.177s

Comparing the real times for the 1.5 GB file, the -O3 flag makes the program 11.2% faster, which is a pretty nice optimization. So I recommend they replace the -O2 flag with -O3.

Now it's time to benchmark the software on Betty and hope to see an improvement there as well.


G-Profiling

To enable gprof profiling you need to add the -pg flag to the compiler options. These options can be found in the Makefile; just look for the variables CFLAGS, CPPFLAGS, and LDFLAGS. Add the -pg flag to each of these and profiling should be enabled.

At first I couldn't figure out why my software wasn't producing the gmon.out file, which a program compiled with -pg writes out when it runs. I had added -pg to all the flags in the top-level Makefile, but apparently there were more Makefiles where the flags needed changing. There were 6 Makefiles in total, so I added -pg to the compiler flags in all of them.
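In hindsight, a one-liner like the one I used later for the -O2/-O3 swap would have saved the manual editing. Assuming the flag lines in those Makefiles literally start with CFLAGS =, CPPFLAGS =, or LDFLAGS =, something like this (untested) should hit all six at once:
find . -name Makefile | xargs sed -i 's/^\(CFLAGS\|CPPFLAGS\|LDFLAGS\) =/& -pg/'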

Luckily, with -pg in all six Makefiles, running the program did generate the gmon.out file, so I could take a look at the profile with the following command:
gprof md5deep > gprof.txt
This generates a readable file which contains the profile. The strange thing was that my gprof file said every function took 0 seconds, even though it also said some functions got called 8 times. The explanation is gprof's coarse sampling interval (0.01 s by default): functions that finish quickly are simply never caught by a sample, so they show up as 0 seconds.

The function that gets called 8 times is hash_final_sha1 which looks like this:
void hash_final_sha1(void * ctx, unsigned char *sum) {
    sha1_finish((sha1_context *) ctx, sum);
}
Since it's a one-liner there isn't much I can optimize here, but it does call other functions I can take a look at. I went through the chain of functions calling each other until I found one that actually did work by itself instead of just forwarding to another function. I ended up at the following function:
void sha1_update( sha1_context *ctx, const unsigned char *input, size_t ilen )
{
    size_t fill;
    unsigned long left;

    if( ilen <= 0 )
        return;

    left = ctx->total[0] & 0x3F;
    fill = 64 - left;

    ctx->total[0] += (unsigned long) ilen;
    ctx->total[0] &= 0xFFFFFFFF;

    if( ctx->total[0] < (unsigned long) ilen )
        ctx->total[1]++;

    if( left && ilen >= fill )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, fill );
        sha1_process( ctx, ctx->buffer );
        input += fill;
        ilen  -= fill;
        left = 0;
    }

    while( ilen >= 64 )
    {
        sha1_process( ctx, input );
        input += 64;
        ilen  -= 64;
    }

    if( ilen > 0 )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, ilen );
    }
}
Looking at it, I could not find a way to optimize this function. Since it took a while to find a suitable function and I couldn't find an optimization, I'm going to look into the compiler flags next: they use the -O2 flag, and it should be possible to change that to -O3.

Tuesday, March 29, 2016

MD5deep focus area

My first thought for a focus area was to add proper aarch64 support, since I ran into trouble with the architecture detection during installation. But the latest config files I had to download (dated 2016-03-24, so pretty recent) had already fixed that issue.

Since I'm more familiar with code than with all the configuration and make files, I decided to take a better look at the source code.

MD5deep benchmarking betty

Now that I'm done testing and benchmarking on my own x86_64 Fedora system, I need to test md5deep on an AARCH64 system. I transferred the files I used in the x86_64 benchmark to the Betty server using scp and sftp. Once the test files were over, I transferred the tar.gz file of md5deep so I could try to install it on Betty. Here's where I ran into some issues...

The normal way to install md5deep is to unpack it and run the following commands:
sh bootstrap.sh
./configure
make
sudo make install
Sadly it stopped at the configure step: it could not recognize which build to use, since it did not know aarch64. Luckily the error pointed me to a website with the most recent config files, which do contain aarch64! (ftp://ftp.gnu.org/pub/gnu/config/) So after some more use of sftp, the configure step finally worked and I could run make and install it.
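For anyone who hits the same error: the files to fetch from that site are config.guess and config.sub, and the fix amounts to dropping them over the ones shipped in the tarball and re-running configure (the md5deep path below is a placeholder):
cp config.guess config.sub /path/to/md5deep/
cd /path/to/md5deep/ && ./configure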

On to the benchmarking. Just like on x86_64, I ran the md5deep program 100 times with 3 different file sizes. The results are as follows:
10.5 MB file
real: 0m0.037s
user: 0m0.028s
sys: 0m0.001s
105 MB file
real: 0m0.345s
user: 0m0.323s
sys: 0m0.027s
1.5 GB file
real: 0m4.792s
user: 0m4.551s
sys: 0m0.400s
If you only have one file, md5deep can hash it pretty quickly. Even if you hash entire directory structures, you wouldn't have to wait too long.

MD5deep benchmarking x86_64

So far I only had md5deep installed on my own Fedora workstation, to check that I could get it working at all. Now that it works, I can benchmark it on my laptop; once that is done I will do the same on Betty.

My fedora specifications are:
Memory: 2.9 GiB
Processor: Intel Core i7-3630QM CPU @ 2.40GHz
Machine architecture: x86_64

To benchmark this program I need a few files of different sizes. With the following command I can create a file filled with completely random data, approximately 100 MB in size:
dd if=/dev/random of=test100m bs=1M count=100 iflag=fullblock
Since bs=1M count=100 produces 100 MiB, the file actually comes out at about 105 MB; after running this command for different sizes I ended up with a 105 MB test file and a 10.5 MB one.
By running the following command you can run the md5deep program and see how long it takes to complete:
time md5deep test100m
To get a presentable benchmark you have to run this command many times and take the average. Instead of manually running the command 100 times, I wrote the following one-liner to do it for me:
time for i in {1..100}; do md5deep test100m; done
The time you get is the total for 100 runs, so the result has to be divided by 100 to get the average time of a single run. I got the following results:
10.5 MB file
real: 0m0.053s
user: 0m0.021s
sys: 0m0.003s
105 MB file
real: 0m0.287s
user: 0m0.200s
sys: 0m0.012s

After seeing that these files still get hashed pretty quickly, I decided to download a bigger file to see how long that would take. I downloaded the Fedora Workstation ISO, which is 1.5 GB, to test with.
1.5 GB file
real: 0m3.458s
user: 0m2.793s
sys: 0m0.184s
It still doesn't take very long for a 1.5 GB file to be hashed, so this program is pretty fast.



Monday, March 28, 2016

MD5Deep Installation

The project I chose to optimize is called md5deep, a program that computes hashes for a number of files. Md5deep doesn't only hash files with MD5; it can also use SHA variants, Tiger, and Whirlpool. It is similar to the md5sum program, but md5deep has some additional features, which I quote from http://md5deep.sourceforge.net/:
  • Recursive operation - md5deep is able to recursively examine an entire directory tree. That is, compute the MD5 for every file in a directory and for every file in every subdirectory.
  • Comparison mode - md5deep can accept a list of known hashes and compare them to a set of input files. The program can display either those input files that match the list of known hashes or those that do not match. Hash sets can be drawn from Encase, the National Software Reference Library, iLook Investigator, Hashkeeper, md5sum, BSD md5, and other generic hash generating programs. Users are welcome to add functionality to read other formats too!
  • Time estimation - md5deep can produce a time estimate when it's processing very large files.
  • Piecewise hashing - Hash input files in arbitrary sized blocks
  • File type mode - md5deep can process only files of a certain type, such as regular files, block devices, etc.
I'll be taking a look at the source code and attempting to find optimization possibilities. The possible optimizations are:
  • Improving a hashing algorithm
  • Add code that targets AArch64 specifically, without influencing the other systems
  • Edit the make file so it makes the program faster
I downloaded the source code from https://github.com/jessek/hashdeep/releases and ran the following commands to install it correctly:
sh bootstrap.sh
./configure
make
sudo make install
To test that it worked, I hashed a few files and saved the hashes in a file. By saving the hashes you can later find matching files using the comparison feature. The next step will be benchmarking.
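As a quick illustration of that comparison feature (flags as I understand them from the md5deep documentation: -r recurses into directories, -m reads a file of known hashes, so the second command prints the files that match):
md5deep -r somedir > known.txt
md5deep -m known.txt -r somedir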

Tuesday, March 15, 2016

Presentation Review

Today I presented the topics discussed in my previous blog post: -ftree-tail-merge and -floop-interchange. I used a PowerPoint to get the information across; I will put the slides at the end of this post.

Overall I thought the presentation went alright. I was able to show some code examples and explain how the options work. Sadly, I wasn't able to compile my code, for different reasons. First of all, the -floop-interchange option can only be used when GCC is configured with the --with-ppl and --with-cloog options. Apparently the GCC compiler we're using was only configured with --with-cloog, which explains why I wasn't able to spot differences in my assembler code.

For -ftree-tail-merge I do not know what I did wrong while compiling; I just could not get results in the assembler code. A tip I got from a classmate was to add a parameter that sets how much code has to be identical before merging kicks in. I have yet to try this.
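If the tip was about GCC's --param knobs, the closest one I can find is min-crossjump-insns, which sets the minimum number of instructions that must match before cross jumping merges two blocks; an invocation might look like this (untested on my side):
gcc -O2 --param min-crossjump-insns=3 test.c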

I think most of the class understood what I was saying, because there weren't many questions, and I don't think the options I chose were very difficult.

One thing I couldn't figure out was the difference between -ftree-tail-merge and -fcrossjumping, since they basically produce the same results. Apparently the difference is that -ftree-tail-merge runs on the GIMPLE intermediate representation, while -fcrossjumping runs in a later compilation step; in the source code you wouldn't see any difference between these options.

Powerpoint slides

Monday, March 14, 2016

Presentation Topics

The topics I chose for my presentation are the following optimization options. I'll be explaining the options with code examples rather than assembler examples, since I couldn't compile my code with the right compiler flags.





-floop-interchange
GCC Manual Description:
Perform loop interchange transformations on loops. Interchanging two nested loops switches the inner and outer loops. For example, given a loop like:
          DO J = 1, M
            DO I = 1, N
              A(J, I) = A(J, I) * C
            ENDDO
          ENDDO
loop interchange transforms the loop as if it were written:
          DO I = 1, N
            DO J = 1, M
              A(J, I) = A(J, I) * C
            ENDDO
          ENDDO
which can be beneficial when N is larger than the caches, because in Fortran, the elements of an array are stored in memory contiguously by column, and the original loop iterates over rows, potentially creating at each access a cache miss. This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.

So after reading the manual and the wiki page, I went right into writing code, which developed into the following:
int main() {
    int array[1000][1000];
    for(int i=0; i<1000; i++){
        for(int j=0; j<1000; j++){
            array[j][i] = i*j;
        }
    }
}

When I tried to compile this with the -floop-interchange option, I saw no differences in the assembler code. Later I figured out that the compiler we're using is not configured with the --with-ppl option, so it made sense that there was no assembler to show for it.

After this I decided to just show in code what would change when the option is enabled, which resulted into the following:
int main() {
    int array[1000][1000];
    for(int j=0; j<1000; j++){
        for(int i=0; i<1000; i++){
            array[j][i] = i*j;
        }
    }
}

The only difference is that the loops have switched, so first it goes through j instead of i.
This happens because C stores arrays in row-major order, but in the original loop the index that changes fastest is the row index. The access pattern goes from [0][0] to [1][0] and so on, so consecutive accesses land in different rows, and each one can cause a cache miss: the CPU fetches a whole cache line from one row but uses only a single value from it before jumping to the next row. It is more effective to consume all the values of a row before moving on, which is why the loops are switched. The access pattern then goes from [0][0] to [0][1], so one cache line read serves several consecutive values (16 ints for a typical 64-byte cache line) instead of one value per read.







-ftree-tail-merge
GCC Manual Description:
Look for identical code sequences. When found, replace one with a jump to the other. This optimization is known as tail merging or cross jumping. This flag is enabled by default at -O2 and higher. The compilation time in this pass can be limited using max-tail-merge-comparisons parameter and max-tail-merge-iterations parameter.


After researching this topic a lot I ended up writing the following code:
int main(int& a, int& b) {
    int c = 0;
    if(a > 100){
        a += 100;
        b += 200;
        c = a + b;
    } else {
        a += 300;
        b += 200;
        c = a + b;
    }
    return c;
}

As you can see, the b and c operations at the end of the if and else branches are duplicates. In this simple example the developer should spot that this can be refactored, but to show what -ftree-tail-merge does I kept it simple.
Sadly I couldn't get this to compile the right way either; I probably missed some necessary flags.
After researching some more, I found out that -ftree-tail-merge and -fcrossjumping produce the same output when it comes to the code, but -ftree-tail-merge is applied during an earlier compilation step, on the intermediate representation called GIMPLE.
To show you what the difference in assembly is, I used the -fcrossjumping flag, because it gives the same results. I colour-aligned the code and assembler to show where everything happens and what differs between the two listings.

gcc -O
main(int&, int&):
        movl    (%rdi), %eax
        cmpl    $100, %eax
        jle     .L2
        addl    $100, %eax
        movl    %eax, (%rdi)
        movl    (%rsi), %eax
        addl    $200, %eax
        movl    %eax, (%rsi)
        addl    (%rdi), %eax
        ret
.L2:
        addl    $300, %eax
        movl    %eax, (%rdi)
        movl    (%rsi), %eax
        addl    $200, %eax
        movl    %eax, (%rsi)
        addl    (%rdi), %eax
        ret

gcc -O -fcrossjumping
main(int&, int&):
        movl    (%rdi), %eax
        cmpl    $100, %eax
        jle     .L2
        addl    $100, %eax
        jmp     .L4
.L2:
        addl    $300, %eax
.L4:
        movl    %eax, (%rdi)
        movl    (%rsi), %eax
        addl    $200, %eax
        movl    %eax, (%rsi)
        addl    (%rdi), %eax
        ret

As you can see, with crossjumping the if branch skips its copies of the b and c operations and jumps straight to .L4, where those operations are done once for both branches. In code it would look like the following:
int main(int& a, int& b)
{
   int c = 0;
   if (a > 100) {
      a += 100;
   } else {
      a += 300;
   }

   b += 200;
   c = a + b;

   return c;
}

Resources:
https://en.wikipedia.org/wiki/Loop_interchange
http://www.ucw.cz/~hubicka/papers/proj/node10.html
https://katecpp.wordpress.com/2015/09/19/cross-jumping/
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


Sunday, February 28, 2016

Lab 6 - Vectorization

The assignment for this lab is to fill two arrays with 1000 integers and add them together in a third array. The goal is to have the compiler recognize that it can use vectorization.
The code I wrote for this is:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand(time(NULL));
    int arraySize = 1000;
    int arrayOne[arraySize], arrayTwo[arraySize], arrayThree[arraySize];
    for (int i = 0; i < arraySize; i++) {
        arrayOne[i] = rand() % 1000;
        arrayTwo[i] = rand() % 1000;
    }

    for (int j = 0; j < arraySize; j++) {
        arrayThree[j] = arrayOne[j] + arrayTwo[j];
    }

    long sum = 0;
    for (int k = 0; k < arraySize; k++) {
        sum += arrayThree[k];
    }
    printf("%ld\n", sum);
    return 0;
}
To compile the code on archie we use the following compiler command:
gcc -x c vec.C -o vec -O3 -std=c99 -g
I have not yet tested the code on archie, only on my own pc, so that's on my TO-DO list.
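When I do get around to it, GCC can tell me directly whether the loops were vectorized: adding -fopt-info-vec (available in newer GCC releases; older ones used -ftree-vectorizer-verbose instead) prints a note for each vectorized loop:
gcc -x c vec.C -o vec -O3 -std=c99 -g -fopt-info-vec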

Lab 5 - Algorithm Selection Lab

The assignment for this lab was to write two algorithms for adjusting the volume of sound samples; we did this lab in groups. The first method scales a signed 16-bit integer by multiplying it by a volume scaling factor expressed as a float in the range 0.000 - 1.000.
The first thing to do is create an array and fill it with many random values that act as the sound samples. After that we can write the scaling function, which is as simple as multiplying the sound sample by the scaling factor. We do this for all the sound samples in the array.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

void fillSampleArray(int16_t* arr, int numOfElems);
int16_t volumeScale(int16_t sample, float scale);

int main()
{
    // creating an array of samples
    srand((int)time(NULL));
    int numberOfElements = 500000000;
    int16_t* sampleArray = (int16_t*)malloc(sizeof(int16_t) * numberOfElements);
    fillSampleArray(sampleArray, numberOfElements);

    // XOR the scaled samples together so the work can't be optimized away
    int result = INT16_MIN;
    for (int i = 0; i < numberOfElements; i++) {
        result = result ^ volumeScale(sampleArray[i], 0.750);
    }

    printf("%d", result);
    free(sampleArray);
    return 0;
}

void fillSampleArray(int16_t* arr, int numOfElems)
{
    for (int i = 0; i < numOfElems; i++)
    {
        arr[i] = ((rand() * 2) % (UINT16_MAX + 1)) + (INT16_MIN);
    }
}

int16_t volumeScale(int16_t sample, float scale)
{
    return (int16_t)(sample * scale);
}

The second method does the same thing using a lookup table. This means creating an array of all possible sample values, of which there are 65,536. One lookup table serves one volume scale.
First you check whether the requested volume scale equals the previous one; if not, you build the lookup table for the new scale and remember it as the current volume. This way the lookup table is created once per volume change. So instead of doing 500 million multiplications like the first method, it only takes 65,536 multiplications each time the volume changes; every sample after that is a plain table lookup.
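We didn't paste our lookup-table code in this post, so here is a minimal sketch of the idea as described above; the names and the table-indexing details are my illustration, not our group's actual code:

#include <stdint.h>

static int16_t lookupTable[UINT16_MAX + 1];
static float previousVolume = -1.0f;  /* impossible volume, forces the first build */

int16_t volumeScaleLookup(int16_t sample, float scale)
{
    if (scale != previousVolume) {
        /* rebuild once per volume change: one entry for every possible
           16-bit sample bit pattern, i.e. 65,536 multiplications */
        for (uint32_t u = 0; u <= UINT16_MAX; u++) {
            int16_t s = (int16_t)u;            /* reinterpret the bit pattern */
            lookupTable[u] = (int16_t)(s * scale);
        }
        previousVolume = scale;
    }
    /* every later sample costs a single table read */
    return lookupTable[(uint16_t)sample];
}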

Now to compare the two methods on AArch64 and x86_64:
On x86_64 (Xerxes) the lookup table is 55% slower than the plain multiplications.
On AArch64 (Aarchie) the lookup table is 22% faster than the plain multiplications.


Lab 7 - Inline Assembler Lab

The assignment was to find inline assembler code in an open source project. I chose the open source project SooperLooper and downloaded the source code from their site: http://essej.net/sooperlooper/download.html

"SooperLooper is a live looping sampler capable of immediate loop recording, overdubbing, multiplying, reversing and more. It allows for multiple simultaneous multi-channel loops limited only by your computer's available memory."

I extracted the tar file and proceeded to find the inline assembler. I used the following command:
     egrep -R "__asm__|asm\(*" *

This gave me all the usages of the terms __asm__ and asm(

The files where inline assembler was used were:
src/atomic.h
libs/pbd/pbd/atomic.h

Supported architectures
Time to take a closer look at the src/atomic.h file.
They support several kinds of architectures; I've listed them below as they appear in the code.

__ASM_PPC_ATOMIC_H_   = PowerPC architecture
__i386__ || __x86_64__ / __ARCH_I386_ATOMIC__ = arch i386 architecture
__ARCH_SPARC_ATOMIC__ = arch sparc architecture
__ARCH_IA64_ATOMIC__ = arch ia64 architecture
__ALPHA_ATOMIC_H = alpha architecture
__ARCH_S390_ATOMIC__ = arch s390 architecture
__MIPS__ = mips architecture
__ARCH_M68K_ATOMIC__ = arch m68k architecture

They also have an option when it's none of the architectures listed above:
__NO_STRICT_ATOMIC

Assembler code
This code makes use of atomic operations to manipulate shared counters more efficiently (and safely) than normal C code could.
The functions they made are:
  • atomic_add
  • atomic_sub
  • atomic_sub_and_test
  • atomic_inc
  • atomic_dec
  • atomic_dec_and_test
  • atomic_inc_and_test
  • atomic_add_negative
x86-specific
  • atomic_clear_mask
  • atomic_set_mask
To paste all the assembler code here would be a bit much, but I'll take the function atomic_add as an example. 
/**
 * atomic_add - add integer to atomic variable
 * @i: integer value to add
 * @v: pointer of type atomic_t
 *
 * Atomically adds @i to @v.  Note that the guaranteed useful range
 * of an atomic_t is only 24 bits.
 */
static __inline__ void atomic_add(int i, atomic_t *v)
{
    __asm__ __volatile__(
        SMP_LOCK "addl %1,%0"
        :"=m" (v->counter)
        :"ir" (i), "m" (v->counter));
}
They use __volatile__ here so the compiler is forced not to move the code around: operations that happen before this function stay before it, and whatever comes after it stays after it.
So this atomic_add function simply adds the integer i to the counter inside v, as one atomic operation.







Sunday, January 31, 2016

Lab 1 - Code Review

The purpose of this lab is to explore the code review processes used in open source projects. We had to select two open source projects that have different licenses.

Amarok
The first project I looked at was Amarok, an iTunes-like music player for Linux, Unix, and Windows: https://amarok.kde.org/. This project is licensed under the GNU General Public License.

There are a few ways you can contribute to this project:
  • Create artwork for the application
  • Write code in C++ using the Qt and KDELibs toolkits
  • Write scripts that expand the application
To start contributing, they recommend you take a look at the bugs and see if you can help with any of them. Their bug tracking system can be found at: https://bugs.kde.org/buglist.cgi?quicksearch=amarok

All the code that you write must be submitted to their review board, which can be found at https://git.reviewboard.kde.org/groups/amarok/?show-closed=1&page=1.
On this review board all members can comment on your contribution; however, only a few members have commit rights. So when members comment that the code looks good and would be a nice addition, they'll most likely add that someone with commit access should push it to the master branch. An example I found: https://git.reviewboard.kde.org/r/120930/#review69630

A more recent example of an actual submission: https://git.reviewboard.kde.org/r/2/
Here you can see normal members comment on it and say to ship it, and then a member with commit access commits the code.

Chromium
The second project I looked at was Chromium, the open-source browser behind Google Chrome: http://www.chromium.org/Home. This project is licensed under the Creative Commons Attribution 2.5 License.

There are three major paths you can take to contribute to Chromium:
  • You have a new idea
  • You want to change existing code
  • You want to work on an existing bug
When you have a new idea you want to add to Chromium, you first have to post it to a discussion group, where they'll discuss whether your idea is worth adding to the application. I couldn't find an example of this, because their discussion group also covers many other topics. It can be found here: https://groups.google.com/a/chromium.org/forum/#!forum/chromium-discuss

If you want to change existing code, you have to contact the person who originally wrote the code and ask if your idea would be good to add. 

Not all bugs in their system are assigned, so if a bug is unassigned you're allowed to pick it up and assign it to yourself. If the bug you want is already assigned, you can contact the person it's assigned to and ask if you could help or take over. It's also possible to file your own bug and work on it, but such bugs cannot be whole new ideas; they have to be trivial things like simple cleanups and style fixes. The listed bugs can be found here: https://code.google.com/p/chromium/issues/list

To submit your code you have to add it to the code review site: https://codereview.chromium.org/ 
Your code will be reviewed within 24-48 hours if the patch is not too big. All members can comment on your code and give you feedback, but only some members have commit access. An example I found: https://codereview.chromium.org/1650283002/

Overall, Chromium takes communication very seriously: if you're working on something, you need to keep communicating with the relevant people, otherwise you might have a hard time getting your code reviewed.


Lab 3 - Code Building

Time to write some assembly! We did this lab in groups so we could put our heads together to write some good assembly code.

First of all we had to download the examples to our directories on the servers and unpack them. This was done with the tar and make commands.
Once we had the example files, we could look at the differences between dumping the C file and the assembler file. When you dump the C program you get a bunch of information about all the source code; when you dump the assembler version you only get the instructions.

Now it was time to make our own little program: a simple looper that goes from 0-9 and prints out each number.
After making our own file with the simple Hello World looper, we needed to build the assembler code using the following commands:
       Run the assembler: as -g -o test.o test.s
       Run the linker: ld -o test test.o
You're going to use these commands a lot, so it's easier to combine them into one line and re-use it:
       as -g -o test.o test.s; ld -o test test.o; ./test

If you do not have the right permissions, use the following command:
       chmod 700 filename

Now to edit some assembler code. 
To print a number in assembler you need to work with ASCII codes. In our case we needed to add 48 to the number, because 48 is the character code for '0'. We kept this offset in a register that would not get trampled during the program. In the loop itself we added the offset to the current number (0-9) and placed the result via the rsi register, which holds the message location.
In theory this was supposed to work, but we made a little mistake in the data section: we had put the string in .rodata, which is read-only. After changing this to .data it did print all the numbers, but the newline character in our string did not seem to work. Apparently the plain mov instruction moves a full 64-bit register even if you only have 1 byte of data, which caused the number to overwrite the newline character. This was an easy fix: use the b suffix (movb) together with the byte-sized name of the source register.

The next step was to loop from 00-30.
To get this to work we divided the loop number by 10 and kept the remainder. The div instruction divides the rax register by a chosen register, storing the quotient in rax and the remainder in rdx. The rdx register has to be set to 0 before you use div. As before, we added our offset of 48 to rax and rdx and moved these into the string.

The last step was to suppress the leading zero when the number is below 10.
We did this by checking whether the quotient was zero. If it was, we used je (jump if equal) to jump to a label placed after the mov that writes the first digit into the string, so the first digit is not printed when the jump occurs.

Lab 4 - Compiled C

This lab was about compiling code with different arguments. We'll be discussing 6 changes; each change was done by a different group. My group did change 5, so that one is more extensive.

Starting code
We used the simple hello world code for this lab:
#include <stdio.h>

int main() {
    printf("Hello World!\n");
}

We compiled this code using the GCC compiler with the following arguments.
-g               # enable debugging information
-O0              # do not optimize (that's a capital letter and then the digit zero)
-fno-builtin     # do not use builtin function optimizations

In the objdump you could see that <main> calls to <printf> to print Hello World.

1. Add the compiler option -static. Note and explain the change in size, section headers, and the function call.
When you add the option -static, the C library is linked into the executable itself instead of being loaded dynamically at run time, even though only a few of its functions are needed. This causes the file size to be much larger.

2. Remove the compiler option -fno-builtin. Note and explain the change in the function call.
<printf> is replaced with <puts>.
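This is GCC's builtin handling of printf at work: when the format string contains no conversion specifiers and ends in a newline, the call is equivalent to the cheaper puts (which appends the newline itself), so GCC effectively rewrites the first line below into the second:
printf("Hello World!\n");  /* what the source says */
puts("Hello World!");      /* what the binary actually calls */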

3. Remove the compiler option -g. Note and explain the change in size, section headers, and disassembly output.
When you run objdump --source you normally get the written C code together with the assembler code, which is handy when you want to see which assembler belongs to which line of C. Removing the -g option strips all the debug information, and since the interleaved C source counts as debug information, objdump --source no longer shows it. The file also shrinks and loses its debug section headers.

4. Add additional arguments to the printf() function in your program. Note which register each argument is placed in. 
On the AARCH64 architecture, the format string plus seven further arguments get passed in registers (x0-x7); the overflow gets pushed on the stack.
On the Intel x86_64 architecture, only five arguments after the format string get passed in registers; the overflow also gets pushed on the stack.
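A sketch of how that plays out on x86_64 (SysV calling convention; the variable names are just for illustration):
/* format string -> rdi; a..e -> esi, edx, ecx, r8d, r9d;
   f and g no longer fit in registers and get pushed on the stack */
printf("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
On AARCH64 the same call would still fit entirely in registers x0-x7.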

5. Move the printf() call to a separate function named output(), and call that function from main(). Explain the changes in the object code.
5.1
Our task was to move the printf() function to a separate function output().
#include <stdio.h>

void output(){
  printf("Hello World!\n");
}

int main() {
  output();
}

We compiled this with the same arguments stated above. The objdump now shows:
<main> calls <output>, which in turn calls <printf>.
<output> looks just like the old <main> did before we introduced the output() function.

5.2
We now compiled the following code to inspect differences when you add parameters.
#include <stdio.h>

void output(char a[]){
  printf(a);
}

int main() {
  output("Hello World!\n");
}

In the objdump you can see that the given parameter is loaded into the x0 register.
In <main>, a pointer to the string argument is placed in x0 before calling <output>.
In <output>, that x0 value is then moved into the x1 register so it can serve as the second argument to <printf>.

File size increased slightly with each change.

6. Remove -O0 and add -O3 to the gcc options. Note and explain the difference in the compiled code.
These options control optimization. The compiler gets 'smarter' and deletes code it doesn't need. For example, with -O0 it sets up the stack frame even though the stack is never used afterwards; with -O3 those lines are deleted. Another example: after <printf> is done, it doesn't return to <main> but goes straight back to whatever called <main>, because the call at the end of <main> is turned into a tail jump.

Thursday, January 14, 2016

Lab 2 - Code Building

GNUChess
https://www.gnu.org/software/chess/
As recommended, I went to the GNU website to find some projects to install on my Fedora installation. There was a whole list of projects, and I decided to go with a game: gnuchess.
Since this was my first time installing an open source project on a Linux device, I had to find some instructions on how to install the game. Luckily the package had an install guide in it, which told me to run ./configure, make, and make install. The ./configure command worked, but I got errors when I tried to execute the make commands. Then I just tried to run the gnuchess file, and it asked if I wanted to install the packages, which I said yes to. When I ran the gnuchess file again, the game started in the terminal.


I have not yet installed a non-GNU project; the projects I found were too big, and I could not install all the needed components. I will try again later.