Tuesday, April 17, 2012

Compiler's Optimization Techniques - Virtual Inheritance - Part 1

Inheritance is one of the pillars of object-oriented programming. High-level languages implement inheritance differently and support different forms of it. Despite these differences, the core implementation converges on a broadly similar design in most of them. Understanding the implementation details of inheritance gives you the confidence to work out exactly what the compiler is trying to achieve.

I am writing this article because, among the mainstream high-level languages, C++ stands out by natively supporting multiple inheritance of classes; languages such as Java and C# offer it only in the restricted form of interfaces. As a programmer, it is always good to understand your compiler and its optimization techniques, which ultimately helps you write efficient code.

The article is presented as two separate posts. Part 1 (this post) discusses the primitive inheritance technique, i.e. single-level inheritance: object layout, memory allocation, implicit class pointer conversion, and pointer offsetting. Part 2 provides a comprehensive insight into the implementation details of multiple and virtual inheritance.

Part - 1
Consider two simple classes, "Mobile_OS" and "WindowsPhone", where "WindowsPhone" is derived from "Mobile_OS" with a public access specifier.

 class Mobile_OS
 {
 public:
   float iKernalVersion;
   float iReleaseVersion;
   char* strVendor;

   virtual float GetKernalVersion();
   virtual float GetReleaseVerison();
   Mobile_OS();
   // Virtual, so that deleting a WindowsPhone through a Mobile_OS* is well defined.
   virtual ~Mobile_OS();
 };

 class WindowsPhone : public Mobile_OS
 {
 private:
   char* strCodeName;
   char* iHardwarePlatform;

 public:
   int iCustomRomVersion;
   float GetKernalVersion();   // overrides Mobile_OS::GetKernalVersion
   float GetReleaseVerison();  // overrides Mobile_OS::GetReleaseVerison
   WindowsPhone();
   ~WindowsPhone();
 };

The blueprint of the "WindowsPhone" class lays out all the data members (leaving aside the hidden virtual-table pointer) in the order of their declaration, starting with the base class data members and followed by the derived class data members, as shown in the diagram below.
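A rough way to check this ordering on your own compiler is to print how far each member sits from the start of an object. The sketch below uses simplified, hypothetical stand-ins ("Base", "Derived", "offset_in", "print_layout") rather than the classes above, and the numbers it prints depend on the compiler and platform.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical, simplified stand-ins for Mobile_OS and WindowsPhone.
struct Base {
    float kernelVersion;
    float releaseVersion;
    virtual ~Base() {}
};

struct Derived : Base {
    int customRomVersion;
};

// Byte distance from the first byte of an object to one of its members.
template <typename Object, typename Member>
std::ptrdiff_t offset_in(const Object& obj, const Member& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&obj);
}

// Prints where each data member sits inside a Derived object.
void print_layout() {
    Derived d;
    std::cout << "kernelVersion    at +" << offset_in(d, d.kernelVersion) << "\n"
              << "releaseVersion   at +" << offset_in(d, d.releaseVersion) << "\n"
              << "customRomVersion at +" << offset_in(d, d.customRomVersion) << "\n";
}
```

Calling print_layout() from main should, on the common compilers, list the base class members at lower offsets than the derived class member. The first offset is usually not zero because of the hidden virtual-table pointer.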
 
Why is this object layout preferred for a derived class?
The C++ standard gives compilers some freedom here: only data members that are not separated by an access specifier are guaranteed to be allocated in declaration order, while the relative order of groups separated by access specifiers is left to the implementation. VC++ and most other compilers nevertheless lay objects out in the order of declaration, with the derived class data members following the base class data members.

What optimization does the compiler achieve with this layout?
A derived class inherits the properties and behaviors of its base class, and a complete instance of the base class's data members is contained within the derived object's address space.
Placing the base class "Mobile_OS" at the starting address of the derived class "WindowsPhone" ensures that the address of the "Mobile_OS" base object within a "WindowsPhone" object corresponds to the very first byte of that "WindowsPhone" object. This layout therefore avoids an extra offset calculation when base object data is accessed within a derived object.
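The claim can be checked with a small sketch, again using hypothetical simplified classes ("Base", "Derived") instead of the ones above. It assumes the usual single-inheritance layout of mainstream compilers; the standard itself does not promise that the two addresses are equal.

```cpp
// Hypothetical, simplified stand-ins for Mobile_OS and WindowsPhone.
struct Base {
    float kernelVersion;
    virtual ~Base() {}
};

struct Derived : Base {
    int customRomVersion;
};

// Returns true when the Base subobject starts at the very first byte
// of the Derived object, i.e. the upcast needs no pointer adjustment.
bool base_starts_at_derived_address(const Derived& d) {
    const Base* asBase = &d;  // implicit derived-to-base conversion
    return static_cast<const void*>(asBase) ==
           static_cast<const void*>(&d);
}
```

With this layout the implicit conversion is a no-op at run time: the compiler can reuse the same address for both pointer types.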

How is the offset calculation avoided?
 Mobile_OS* Lumia = new WindowsPhone();  
 Lumia->iKernalVersion;  
Consider the code snippet above. Here the base class data member "iKernalVersion" is accessed through the pointer "Lumia", whose static type is "Mobile_OS*" but which points to an object whose dynamic type is "WindowsPhone".
Since the implicit upcast from "WindowsPhone*" to "Mobile_OS*" succeeds without any pointer adjustment in this case, the compiler simply reads "iKernalVersion" according to the "Mobile_OS" layout, and no offset calculation for the base object is required. To reach the base class data member, the compiler just needs to compute:

&DataValue = DerivedClass_StartingAddress + Offset_of_DataMember_Within_BaseClass
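As a worked illustration of this formula, the sketch below recomputes a member's address by hand. The structs "Base" and "Derived" and the function "release_version_address" are hypothetical; the base is kept free of virtual functions so that offsetof is well defined for it, and the code assumes the base subobject sits at offset zero, as it does on mainstream compilers.

```cpp
#include <cstddef>

// Hypothetical, simplified classes without virtual functions.
struct Base {
    float kernelVersion;
    float releaseVersion;
};

struct Derived : Base {
    int customRomVersion;
};

// Applies the formula by hand:
// &DataValue = DerivedClass_StartingAddress + Offset_of_DataMember_Within_BaseClass
const float* release_version_address(const Derived& d) {
    const char* derivedStart = reinterpret_cast<const char*>(&d);
    const char* memberAddress = derivedStart + offsetof(Base, releaseVersion);
    return reinterpret_cast<const float*>(memberAddress);
}
```

The address computed this way should match what the compiler produces for &d.releaseVersion, using a single addition.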

What if the base class data members followed the derived class data members?
Consider the same code snippet above. If the base class data members followed the derived class data members, every access to a base class data member would carry the overhead of an extra offset computation. To reach the base class data member, the compiler would need to compute:

&DataValue = DerivedClass_StartingAddress + Offset_To_BaseClassObject_Within_DerivedClass + Offset_of_DataMember_Within_BaseClass
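Real compilers do not place the base after the derived members in single inheritance, so this alternative layout can only be emulated. The sketch below does that with composition: a hypothetical struct "DerivedFirst" holds its own member first and a "Base" member after it, which makes both offsets in the formula visible.

```cpp
#include <cstddef>

// Hypothetical base with no virtual functions, so offsetof is well defined.
struct Base {
    float kernelVersion;
    float releaseVersion;
};

// Emulates a "derived members first" layout using composition:
// the Base part comes after the derived class's own data.
struct DerivedFirst {
    int customRomVersion;
    Base base;
};

// Applies the two-offset formula by hand:
// start + offset to the base object + offset of the member within the base.
const float* release_version_address(const DerivedFirst& d) {
    const char* start = reinterpret_cast<const char*>(&d);
    const char* memberAddress = start
                              + offsetof(DerivedFirst, base)
                              + offsetof(Base, releaseVersion);
    return reinterpret_cast<const float*>(memberAddress);
}
```

Here the offset to the base object is non-zero, so a pointer to the whole object can no longer double as a pointer to its base part without adjustment.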

I hope this provides a convincing explanation of object layout and its advantages in the case of single inheritance. As mentioned earlier, multiple and virtual inheritance implementations will be discussed in Part 2 of the article. Please get back to me with any comments or suggestions.
