Sunday, January 23, 2011

Performance Programming

We have all witnessed many advancements in the computing industry, from the Pentium to the i5 and i7 processors.
  • The most important thing we get with the latest processors is multiple cores. These cores are what make performance computing possible.
  • The most common misconception about these CPUs is that we judge them by their specifications and convince ourselves that applications will run faster on them than on our previous machines. And here we fail: a program that runs on a single thread uses only one of those cores.
 As computing hardware evolves, software also needs to evolve to make the best use of the available hardware. For the last few months I have been studying ways to write high-performance code on i5 and i7 systems. I found many things, so I will write about them one by one. Today it is time for OpenMP.
The OpenMP specification summary for C++ can be found here, along with tutorials and examples.

The OpenMP specification is implemented by many compilers, so one can easily use it from C++ for performance programming. I will show how it works with one simple example and a couple of #pragma directives. I am also learning it for my own benefit, so this is my first blog post on multicore programming.
The series will continue with more explanations and hands-on examples.
Consider the following code:
#include <cmath>

const int WIDTH = 1920;
const int HEIGHT = 1080;
const int FILTEROP = 25;

// Source and target images: WIDTH x HEIGHT pixels, 4 float components (RGBA) each.
// The trailing () zero-initializes the buffers so no indeterminate values are read.
float* rDataSource = new float[ WIDTH * HEIGHT * 4 ]();
float* rDataTarget = new float[ WIDTH * HEIGHT * 4 ]();

void simpleProgramming()
{
    for( int iH = 0; iH < HEIGHT; iH++ ) // traverse the rows
    {
        for( int iW = 0; iW < WIDTH; iW++ ) // traverse the pixels in a row
        {
            for( int iP = 0; iP < 4; iP++ ) // traverse the components of a pixel
            {
                // Offset of this component in the packed RGBA buffer
                const int index = ( ( iH * WIDTH ) + iW ) * 4 + iP;
                float temp = 0;
                // Placeholder for a computationally heavy algorithm
                for( int iOps = 0; iOps < FILTEROP; iOps++ )
                {
                    temp = std::sqrt( rDataSource[ index ] );
                    temp = temp * temp;
                }
                rDataTarget[ index ] = temp;
            }
        }
    }
}

Consider that I am working on an HD-resolution image, where 1920 x 1080 pixels with four components each (RGBA) make up the image data structure. A real algorithm would do some complex operations to produce a new image with effects. Here I have added a simple placeholder operation to create load: the square root of each source component is taken, the square root is multiplied by itself to get back the original value, and that value is stored in the target data. The operation is repeated in a loop purely to create work for testing.
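To make the layout concrete, here is a minimal sketch of how a component offset can be computed for a tightly packed RGBA buffer. The helper name pixelIndex and its parameters are my own illustration and are not part of the attached program; it assumes rows are stored one after another with no padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original program): maps a pixel
// coordinate and a component number to an offset in a packed RGBA buffer.
// Assumes row-major storage with no padding between rows.
inline std::size_t pixelIndex( int row, int col, int component, int width )
{
    const int COMPONENTS = 4; // R, G, B, A
    assert( component >= 0 && component < COMPONENTS );
    assert( col >= 0 && col < width );
    // Every pixel occupies 4 consecutive floats, so the pixel number
    // (row * width + col) is scaled by 4 before the component is added.
    return ( static_cast<std::size_t>( row ) * width + col ) * COMPONENTS + component;
}
```

With this layout, pixelIndex( 1, 0, 0, 1920 ) is 7680, the first float of the second row, and the last component of the last pixel lands on the final element of a 1920 * 1080 * 4 buffer.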
At the end I will write all the statistics for this program. Now consider the following performance programming version.

void performanceProgrammingOMP()
{
    #pragma omp parallel // This is the first statement: start a team of threads
    #pragma omp for // This is the second statement: split the rows among the threads
    for( int iH = 0; iH < HEIGHT; iH++ )
    {
        for( int iW = 0; iW < WIDTH; iW++ )
        {
            for( int iP = 0; iP < 4; iP++ )
            {
                // Offset of this component in the packed RGBA buffer
                const int index = ( ( iH * WIDTH ) + iW ) * 4 + iP;
                float temp = 0;
                for( int iOps = 0; iOps < FILTEROP; iOps++ )
                {
                    temp = std::sqrt( rDataSource[ index ] );
                    temp = temp * temp;
                }
                rDataTarget[ index ] = temp;
            }
        }
    }
}

This performance programming snippet adds only two lines, both OpenMP directives supported by the compiler. I am using the Microsoft C++ compiler in Visual Studio 2010, which has very good support for OpenMP. Just go to Project Properties -> Configuration Properties -> C/C++ -> Language -> OpenMP Support -> Yes (this sets the /openmp compiler switch). Compile the code and you are ready with multicore programming support. The observations are really great. I have one complete program with timing display. Please find it in the attachment and prepare a simple Win32 console application for it. Compile and run as directed. Remember that OpenMP support requires the compiler switch to be turned on; without it the pragmas are ignored and the loop runs on one core. You may vary the constants' values and post your observations. I have used FILTEROP = 100 for debug build testing and FILTEROP = 1000 for release build testing.
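For readers who do not have the attachment, here is a minimal sketch of how such a timing measurement could look. It is not the attached program: the names openMPThreads, timeSeconds and filterOMP are hypothetical, the filter works on a flat buffer instead of the image globals, and it assumes a C++11 compiler with <chrono>, which is newer than the Visual Studio 2010 used in this post. The combined "#pragma omp parallel for" is the standard shorthand for the two separate pragmas shown earlier.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>
#ifdef _OPENMP
#include <omp.h> // only pulled in when the OpenMP compiler switch is on
#endif

// Hypothetical helper: number of threads OpenMP may use, or 1 when the
// program was compiled without OpenMP support.
int openMPThreads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double timeSeconds( Work work )
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>( stop - start ).count();
}

// Stand-in for the placeholder filter, written over a flat buffer.
// Without the OpenMP switch the pragma is ignored and the loop runs serially.
void filterOMP( const std::vector<float>& source, std::vector<float>& target, int filterOps )
{
    assert( source.size() == target.size() );
    const int count = static_cast<int>( source.size() );
    #pragma omp parallel for
    for( int i = 0; i < count; i++ )
    {
        float temp = 0;
        for( int iOps = 0; iOps < filterOps; iOps++ )
        {
            temp = std::sqrt( source[ i ] );
            temp = temp * temp;
        }
        target[ i ] = temp;
    }
}
```

A call such as timeSeconds( [&] { filterOMP( source, target, 100 ); } ) returns the seconds taken by one run; building once with and once without the OpenMP switch gives the two numbers to compare. Printing openMPThreads() is a quick way to confirm that the switch really is on.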
The machine used:
 Intel Core i5 CPU 760 Quad Core @2.80 GHz
 4GB RAM
 Windows 7 64 bit OS
 Visual Studio 2010 Ultimate.
 _______________________________________________________________
Statistics with FILTEROP = 100 (Debug) and FILTEROP = 1000 (Release)
----------------------------------------------------------------------
Criteria           Debug                     Release
                   -OpenMP      +OpenMP      -OpenMP      +OpenMP
----------------------------------------------------------------------
Executable Size    36 KB        36.5 KB      9 KB         9.5 KB
Time (Seconds)     45.52        13.03        7.64         2.20
CPU Usage          24%          100%         25%          100%
CPU Gadget         26%          100%         26%          50%
Cores Active       1            4            1            4
----------------------------------------------------------------------
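
To read the timings in the table as a speedup, divide the time without OpenMP by the time with OpenMP. The small sketch below does that arithmetic; the helper names speedup and efficiency are my own and only illustrate the calculation.

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the parallel run is.
inline double speedup( double serialSeconds, double parallelSeconds )
{
    assert( parallelSeconds > 0.0 );
    return serialSeconds / parallelSeconds;
}

// Hypothetical helper: fraction of the ideal speedup achieved,
// where the ideal is one times the number of cores.
inline double efficiency( double speedupValue, int cores )
{
    assert( cores > 0 );
    return speedupValue / cores;
}
```

With the numbers above, speedup( 45.52, 13.03 ) is about 3.49 for the debug build and speedup( 7.64, 2.20 ) is about 3.47 for the release build, so the efficiency on the four cores is roughly 0.87 in both cases.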
  
Please download the simple program from :

 Whoooopppppsssss!!!!! OpenMP will take your applications to new heights: performance programming with just a few extra statements.