Elevate Programming: 2015

Monday, June 22, 2015

Assemblies: locating, binding and deploying

Introduction

This article describes how the CLR locates and binds assemblies and how to change the default behavior when needed (e.g. in the deployment stage).
Any developer and system administrator who deals with .NET assemblies, especially commercial applications, must be familiar with these topics. This knowledge is the best way to plan for service packs, upgrades and hot fixes as they come along.
The .NET Framework is loaded (almost inflated) with a bunch of terms and features related to assembly deployment, locating and binding.
Here is a short list:

Static and dynamic loading
Public and private
GAC and private folder
Probing
Codebase
BindingRedirect
App.Config and Machine.config
Strongly named and weakly named
DevelopmentMode

Audience

This article is not aimed at beginners, but readers with basic knowledge of configuration files and assembly structure can still benefit from it on a first reading.

The quest for type resolving

The CLR (Common Language Runtime) is responsible for locating and binding referenced assemblies. Locating is the process of finding the correct assembly on disk. Binding is the process of loading the assembly into the application's address space.
The quest begins when the JIT encounters user defined types that need to be resolved. Then the CLR tries to detect where the type definition is:

Same file in the same assembly
Different file in the same assembly
Different assembly

This article deals with the third option.

General process blocks

The CLR moves through the stages listed below in order to determine the exact assembly to load. The reason for this flow is that each stage might override the information from the previous stage. Although it might look like glaring redundancy, it is really necessary, because deployment files often need to change after installation. For example, when installing a Service Pack, the system administrator would like to keep the previous installation up and running. This requires changes to how new assemblies with new versions are located and bound.

1) Search for referenced assembly upon name and version

Configuration file (App.config): The CLR checks the App.config after the manifest check. If the referenced assembly version is overridden there, the App.config setting has the upper hand.
Publisher policy file: The CLR checks for a publisher policy file after the App.config check. Publisher policy files are deployed as part of an update, hot fix or service pack. A publisher policy file is used when the updated shared/public assembly has a new version (one that differs from the version recorded in the referencing assembly's manifest). The setting in the publisher policy file has the upper hand unless the App.config file sets safe mode (<publisherPolicy apply="no"/>).
Machine configuration file: The CLR checks the machine.config file after the publisher policy file check. The file is shared by all .NET applications on the machine. In case of a version difference, the setting in the machine.config has the upper hand.
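
The override order above can be modelled in a few lines of C#. This is only an illustrative sketch, not the CLR's actual implementation: the class VersionPolicySketch, its Resolve method and the dictionary-based "policy files" are all hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative model only: each policy level is a map from "old" to "new" version.
// None of these names exist in the .NET Framework.
public static class VersionPolicySketch
{
    public static Version Resolve(
        Version manifestVersion,
        IDictionary<Version, Version> appConfig,
        IDictionary<Version, Version> publisherPolicy,
        IDictionary<Version, Version> machineConfig,
        bool applyPublisherPolicy = true)
    {
        // Start from the version recorded in the referencing assembly's manifest.
        Version current = Redirect(manifestVersion, appConfig);

        // Safe mode (publisherPolicy apply="no") skips the publisher policy stage.
        if (applyPublisherPolicy)
        {
            current = Redirect(current, publisherPolicy);
        }

        // The machine configuration is consulted last, so it has the final say.
        return Redirect(current, machineConfig);
    }

    private static Version Redirect(Version version, IDictionary<Version, Version> redirects)
    {
        return redirects.TryGetValue(version, out Version target) ? target : version;
    }
}
```

Passing applyPublisherPolicy: false corresponds to the safe mode mentioned above; the publisher's redirect is then ignored and only the App.config and machine.config entries apply.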

2) Checking for previously referenced assemblies

The CLR checks whether the assembly is already loaded (because of code that executed earlier). If it is, the CLR uses it. At first it looks like the folks in Redmond have a design bug: why not check the list of previously loaded assemblies in the first stage? The reason is that the CLR first needs to determine which version is required.

3) Check in the GAC (Global Assembly Cache)

If the assembly is not found in stage 2 and the manifest indicates that it is strongly named, the CLR checks the GAC. If the assembly exists there, the GAC copy has the upper hand.

4) Codebase or Probing

The prior stages tell the CLR which assembly version is required. In this stage, the CLR attempts to find and load the assembly.

Codebase: If the codeBase tag is defined in the application configuration file, the CLR checks only the defined location. If the assembly is not at the given URL, the bind fails and no probing takes place.
Probing: If there is no codeBase tag in the configuration file, the CLR starts the probing process.
Subdirectories: The CLR searches in the application directory and then in the subdirectory named after the assembly.
```
[application base] / [assembly name].dll 
[application base] / [assembly name] / [assembly name].dll
```
If the referenced assembly has a culture definition then the CLR checks in the following sub-directories:
```
[application base] / [culture] / [assembly name].dll 
[application base] / [culture] / [assembly name] / [assembly name].dll
```
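The CLR also checks the directories listed in the private binpath (the privatePath attribute of the probing element).
Tip: The CLR terminates the probing process as soon as a file with the referenced assembly's name is found (name search only). If the assembly is the correct one, all is well; otherwise the binding fails (typically with a FileLoadException, because a file was found but does not match the reference).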
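
To make the probing order concrete, here is a minimal C# sketch that only builds the list of candidate paths described above. It is a simplification under stated assumptions: ProbingSketch and CandidatePaths are hypothetical names, only the .dll extension is considered, and nothing is actually loaded.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: lists candidate locations in the order described above.
// It does not touch the file system and is not how the CLR is implemented.
public static class ProbingSketch
{
    public static IEnumerable<string> CandidatePaths(
        string appBase, string assemblyName, string culture = null, string privatePath = null)
    {
        // Roots to probe: the application base first, then each privatePath entry.
        var roots = new List<string> { appBase };
        if (!string.IsNullOrWhiteSpace(privatePath))
        {
            foreach (string entry in privatePath.Split(';'))
            {
                string dir = entry.Trim();
                if (dir.Length > 0)
                {
                    roots.Add(Path.Combine(appBase, dir));
                }
            }
        }

        string file = assemblyName + ".dll";
        foreach (string root in roots)
        {
            // A culture, when present, adds one directory level under each root.
            string baseDir = string.IsNullOrEmpty(culture) ? root : Path.Combine(root, culture);
            yield return Path.Combine(baseDir, file);
            yield return Path.Combine(baseDir, assemblyName, file);
        }
    }
}
```

For the example used later in this article, CandidatePaths(@"c:\MyTest", "ReferencedFile", null, "ProductCabinet") yields four candidates: two under c:\MyTest and two under c:\MyTest\ProductCabinet.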

Features and Terms

Static and Dynamic loading

In static loading, the CLR finds the referenced assemblies in the assembly manifest. The list of statically referenced assemblies is written into the assembly during the build process. In dynamic loading, the CLR is introduced to the assembly at run time. This feature is exposed through the System.Reflection namespace, which offers methods like Assembly.Load (similar to the Win32 LoadLibrary function).
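
As a small illustration of dynamic loading, the sketch below loads an assembly from its display name at run time. DynamicLoadSketch and its methods are hypothetical names; Assembly.Load and Assembly.FullName are the real reflection APIs. The demo uses the core library's own display name only because that assembly is guaranteed to be resolvable.

```csharp
using System;
using System.Reflection;

public static class DynamicLoadSketch
{
    // Asks the runtime to locate and bind an assembly from its display name,
    // e.g. "ReferencedFile, Version=1.0.0.0, Culture=neutral, PublicKeyToken=...".
    public static Assembly LoadByDisplayName(string displayName)
    {
        return Assembly.Load(displayName);
    }

    public static void Demo()
    {
        // The core library is always available, so this call cannot fail to bind.
        string coreName = typeof(object).Assembly.FullName;
        Assembly loaded = LoadByDisplayName(coreName);
        Console.WriteLine("Loaded: " + loaded.FullName);
    }
}
```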

Private and Public/Shared assemblies

Shared assemblies are not deployed in the directory of the application that uses them. The CLR will not complain if you do so, but it is a deployment error to copy a shared assembly into one application's base directory and direct other applications to it. Typically, shared assemblies are registered in the GAC or placed in a shared directory that the applications using them need to know about.

Strongly and weakly named assemblies

The main difference between the two is that a strongly named assembly contains a public key token. The token uniquely identifies the assembly. It is required for shared assemblies that are installed in the GAC. The reason is the slight chance that some other developer creates an assembly with the same name, culture and version as one that happens to be installed on the same PC. Another difference (implied by the previous one) is that strongly named assemblies can be deployed both privately and publicly (in the GAC), whereas weakly named assemblies can only be deployed privately.
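
The public key token can be inspected from code. The following is a minimal sketch: StrongNameSketch, FormatToken and PublicKeyTokenOf are hypothetical helper names, while AssemblyName.GetPublicKeyToken is the real API that returns the token bytes (an empty result indicates a weakly named assembly).

```csharp
using System;
using System.Reflection;
using System.Text;

public static class StrongNameSketch
{
    // Formats the token bytes as the lowercase hex string used in configuration files.
    public static string FormatToken(byte[] token)
    {
        if (token == null || token.Length == 0)
        {
            return string.Empty; // no token: the assembly is weakly named
        }

        var hex = new StringBuilder(token.Length * 2);
        foreach (byte b in token)
        {
            hex.Append(b.ToString("x2"));
        }
        return hex.ToString();
    }

    public static string PublicKeyTokenOf(Assembly assembly)
    {
        return FormatToken(assembly.GetName().GetPublicKeyToken());
    }
}
```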
Tip: Do not copy the assembly to the GAC folder ([windows main folder]\assembly). The assembly must be registered in the GAC. Use the GacUtil via the .NET command prompt or drag and drop the assembly to the GAC window.

Shared assembly and the GAC

A shared assembly that is copied outside the application base directory must be strongly named but is not required to be installed in the GAC. In this case the shared assembly location must be specified in the configuration file using the Codebase tag. Application suites can create a shared folder and copy the shared assemblies to it.
The following is a list of reasons that helps to decide when to use the GAC instead of a proprietary shared folder:

Third party applications might use the shared assembly. Copying the assemblies to a shared folder obligates the application that uses them to know about its location.
Side by side execution is easier when new version assemblies are installed in the GAC. It saves the time of dealing with different paths for different versions. Moreover, using proprietary, shared folders might cause DLL Hell.
Only a user who is defined in the Windows administration group can install assemblies in the GAC (security).
A shared folder is more exposed to careless user mistakes such as deleting or overwriting files.
Saving disk storage: this is quite a stupid reason, I know. Today, disks are very cheap, but you cannot ignore the fact that copying an assembly to different locations over and over again will have some storage impact.

Probing

This is one of the two ways to define an assembly location. The instruction is given in a configuration file. You can specify only subdirectories under the application base directory.

CodeBase

This is one of the two ways to define an assembly location. The CLR first checks the assembly version, then searches for an overriding CodeBase definition in the configuration file. The version attribute is required only for strongly named assemblies. For weakly named assemblies, the href attribute may only point to a subdirectory of the application base. For strongly named assemblies, the URL can refer to a directory on the user's hard disk or to a Web address. In the case of a Web address, the CLR will automatically download the file and store it in the user's download cache (a subdirectory under <Documents and Settings>\<UserName>\Local Settings\Application Data\Assembly). When the assembly is referenced in the future, the CLR will load it from this directory rather than access the URL. The CodeBase tag should only be defined in the machine configuration or publisher policy files that also redirect the assembly version.

BindingRedirect

This feature binds a reference to a different version of an assembly, which is very useful for service pack installations. Usually the redirect targets a GAC-registered assembly, because the GAC allows us to install several versions of the same assembly side by side. The CLR checks the configuration file and redirects the binding accordingly.

Locating Assemblies Using DEVPATH

DEVPATH is a nice feature for the development stage. It eases the development life cycle by delaying decisions about the deployment stage.
The developer can create a DEVPATH environment variable that points to the build output directory for the assembly. Follow these steps to enjoy this feature:

Specify the <developmentMode> element in the machine configuration file. Make sure you add it to the relevant framework version. The CLR then searches for the referenced assemblies in the path described in the DEVPATH environment variable.
Define an environment variable. Name: DEVPATH. Value: path. Make sure that you enter it as a system variable and that the path value ends with \.

Example: add the following runtime element inside the configuration element, above the closing </configuration> tag.

<configuration>
   <runtime>
      <developmentMode developerInstallation="true"/>
   </runtime>
</configuration>

Snippets

The snippets below are related to the following example:

The application name is HelloWorld.
The application is located in c:\MyTest.
ProductCabinet is a sub directory under the MyTest directory (c:\MyTest\ProductCabinet).
The file ReferencedFile.dll is located under the ProductCabinet directory.

Probing

<?xml version="1.0" encoding="utf-8" ?>
    <configuration>
        <runtime>
            <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
                    <probing privatePath="ProductCabinet" />
            </assemblyBinding>
        </runtime>
    </configuration>

Codebase

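<?xml version="1.0" encoding="utf-8" ?>
    <configuration>
        <runtime>
            <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
                <dependentAssembly>
                    <!-- identity values reuse the example assembly from the BindingRedirect snippet -->
                    <assemblyIdentity name="ReferencedFile"
                      publicKeyToken="99ab3ba45e0b54a8"
                      culture="en-us" />
                    <codeBase version="1.0.0.0"
                      href="file:///c:/MyTest/ProductCabinet/ReferencedFile.dll"/>
                </dependentAssembly>
            </assemblyBinding>
        </runtime>
    </configuration>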

BindingRedirect

<?xml version="1.0" encoding="utf-8" ?>
 <configuration>
   <runtime>
      <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
       <dependentAssembly>
         <assemblyIdentity name="ReferencedFile"
                           publicKeyToken="99ab3ba45e0b54a8"
                           culture="en-us" />
         <bindingRedirect oldVersion="1.0.0.0"
                          newVersion="2.0.0.0"/>
       </dependentAssembly>
          <publisherPolicy apply="no"/>
      </assemblyBinding>
   </runtime>
</configuration>

Tools

Fuslogvw – Assembly Binding Log Viewer "The Assembly Binding Log Viewer displays details for failed assembly binds. This information helps you to diagnose why the .NET Framework cannot locate an assembly at run time. These failures are usually the result of an assembly deployed to the wrong location or a mismatch in version numbers or cultures. The common language runtime's failure to locate an assembly typically shows up as a TypeLoadException in your application" (MSDN)
Tip: make sure you set the HKLM\Software\Microsoft\Fusion\ForceLog registry value to 1 (the value is a DWORD).
.NET Framework Configuration Tool: .NET Framework Configuration allows you to configure assemblies, remoting services, and code access security policy specifics. (MSDN)

Saturday, January 31, 2015

Cracking C# Interview Questions Part One

Most asked questions:

Explain the difference between abstract class and interface.

Constructors:

What is Static Constructor?
A static constructor is used to initialize static data members before the class is used for the first time. A static constructor takes neither access modifiers nor parameters, and it cannot access any non-static data member of the class.

An instance constructor, on the other hand, is used to initialize the instance members of an object. It is called whenever a new object is created (either explicitly or implicitly).
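
A small sketch can show the difference in practice. The class ConstructorDemo and its members are invented for this example; the Log list simply records the order in which the two kinds of constructor run.

```csharp
using System;
using System.Collections.Generic;

public class ConstructorDemo
{
    // Static state shared by all instances; initialized before the static constructor body runs.
    public static readonly List<string> Log = new List<string>();
    public static int InstancesCreated;

    public int Id;

    // Static constructor: no access modifier, no parameters, runs once per type.
    static ConstructorDemo()
    {
        Log.Add("static");
    }

    // Instance constructor: runs every time an object is created.
    public ConstructorDemo()
    {
        InstancesCreated++;
        Id = InstancesCreated;
        Log.Add("instance " + Id);
    }
}
```

Creating two objects with new ConstructorDemo() leaves Log with one "static" entry followed by "instance 1" and "instance 2": the static constructor runs only once, before the first instance is built.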

Inheritance:

Are private data members inherited?
Yes, but they cannot be accessed by child class methods.

What are Delegates?
A delegate in C# allows us to pass a method of one class to an object of another class, which can then call that method. It is analogous to a function pointer in C++, but it is type-safe.
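
Here is a minimal sketch of that idea. The names Transform, MathOps and Processor are made up for the example: Processor knows nothing about MathOps, yet it can call whichever method it is handed.

```csharp
using System;

// A delegate type: any method taking an int and returning an int matches it.
public delegate int Transform(int value);

public class MathOps
{
    public int Twice(int x) { return x * 2; }          // instance method
    public static int Negate(int x) { return -x; }     // static method
}

public class Processor
{
    // Receives a method through the delegate and invokes it.
    public int Apply(Transform transform, int input)
    {
        return transform(input);
    }
}
```

With these definitions, new Processor().Apply(new MathOps().Twice, 21) returns 42, and passing MathOps.Negate with the input 5 returns -5.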

Tuesday, January 6, 2015

What most programmers need to learn - Basics of Programming

The main thing programmers need to learn is self discipline. The discipline to always write the clearest code you can, the discipline to refactor code if it becomes muddy through changes later in development, the discipline to remove unused code and add comments.

Ask programmers what they think is important in being a good programmer and the usual answer is that code should be clear, understandable and maintainable. But it is very rare to actually consistently follow through with that.

Keeping this in mind requires self discipline, because it means not stopping "when it works". If all the variables had the wrong names, the code could still function perfectly, but it would be super confusing. The step from functional code to clear code brings very little reward in the short term: it worked already and after cleaning it up it still works. That is why discipline is required to take this step.

Let me give a few examples of the kinds of things I often see in code written by starting programmers:

Liar functions/variables/classes

These are functions, classes or variables that do something other than what their name suggests. Their name is a lie. It is very obvious that names should be correct, but to my surprise it is quite common for names to be completely off.

An example I recently encountered in code written by a former intern was two classes: EditorGUI and EditorObjectCreatorGUI. This is code that handles the interface in our editors. To my surprise it turned out that the code that handled the button for creating new objects was in EditorGUI, while EditorObjectCreatorGUI only handled navigating through different objects. The exact opposite of what the naming suggests! Even though the code was relatively simple, it took me quite a while to understand it, simply because I started with a completely wrong assumption based on the class names. The solution in this case is really simple: rename EditorObjectCreatorGUI to EditorObjectNavigationGUI and it is already much, much more understandable.

This is something I see a lot: names that are simply incorrect. I think this often happens because code evolves while working on it. When the name was chosen it might have been correct, but by the time the code was finished it had become wrong. The trick is to constantly keep naming in mind. You have to always wonder whether what you are adding still fits the name of the function or class.

Muddy classes

Another problem I see is muddy classes: classes that do a lot of unrelated things. Again this is something that happens as you keep working on the same code. New features are added in the easiest spots and at some point classes become bloated with all kinds of unrelated behaviour. Sometimes the bloating is not even in the size of the classes: a class might be only a few hundred lines but still contain code that does not belong there.

An example of how this can happen is if for some reason a GUI class needs to analyse what textures are available (maybe because there is a button to select a texture). If the GUI class is the only class that needs the results of this analysis, then it makes sense to do that in the GUI class. However, then some totally unrelated gameplay class for some reason also needs that info. So you pass the GUI class to that gameplay class to query the texture information. At this point the GUI class has grown to be something more: it is also the TextureAnalyser class. The solution is simple: split off the TextureAnalyser class into a separate class that can be used by both the GUI class and the gameplay class.
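
The split described above could look roughly like the following sketch. All three classes (TextureAnalyser, TextureSelectionGui, GameplaySpawner) and their members are hypothetical; the point is only that the GUI class and the gameplay class both depend on the analyser instead of on each other.

```csharp
using System.Collections.Generic;
using System.Linq;

// Owns the texture analysis, and nothing else.
public class TextureAnalyser
{
    private readonly List<string> textureFiles;

    public TextureAnalyser(IEnumerable<string> textureFiles)
    {
        this.textureFiles = textureFiles.ToList();
    }

    public IReadOnlyList<string> AvailableTextures()
    {
        return textureFiles;
    }
}

// The GUI class only turns the analysis result into button labels.
public class TextureSelectionGui
{
    private readonly TextureAnalyser analyser;

    public TextureSelectionGui(TextureAnalyser analyser) { this.analyser = analyser; }

    public IReadOnlyList<string> ButtonLabels()
    {
        return analyser.AvailableTextures();
    }
}

// The gameplay class queries the analyser directly, not the GUI.
public class GameplaySpawner
{
    private readonly TextureAnalyser analyser;

    public GameplaySpawner(TextureAnalyser analyser) { this.analyser = analyser; }

    public bool CanUse(string texture)
    {
        return analyser.AvailableTextures().Contains(texture);
    }
}
```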

The general rule of thumb to avoid this problem is to always wonder: does the functionality that I am adding here still fit the name of the class? If not, then the class either needs to be renamed, or it needs to be split into separate classes or the code needs to go into a different class.

It is usually a Bad Smell if you cannot come up with a fitting name for your class. If you cannot describe what a class does in its name, then maybe what it does is too muddy. It might need to be split into parts that make more sense and can actually be described with a proper name.

Oversized classes

This one is really similar to the muddy classes above: over time more and more is added to a class and it gets bloated. In this case however it all still makes sense to be in one class, but the class simply grows too big. Gigantic classes are cumbersome to work with. Bugs slip in easily as there is a lot of code manipulating the same private member variables, so there are a lot of details one can easily overlook.

Splitting a class that has grown too big is quite boring work. It can also be a challenge if the code in the class is highly intertwined. Add to this that it already works and that fixing it adds no new functionality. The result is again that it requires serious self discipline to split a class whenever it becomes too big.

As a general rule of thumb at Ronimo we try to keep classes below 500 lines and functions below 50 lines. Sometimes this is just not feasible or sensible, but in general whenever a class or function grows beyond that we look for ways to refactor and split it into smaller, more manageable pieces. (This makes me curious: where do you draw the line? Let me know in the comments!)

Code in comments

Almost all sample code that applicants send us contains pieces of code that have been commented out, without any information on why. Is this broken code that needs to be fixed? Old code that has been replaced? Why is that code there? When asked, applicants are usually well aware that commented-out code is confusing, but somehow they almost always have it in their code.

Parallel logic and code duplication

Another problem that I often see occurring is to have similar logic in several spots.

For example, maybe the name of a texture gives some information as to what it is intended for, like "TreeBackground.dds". To know whether a texture can be used for a tree we check the filename to see whether it starts with the word "Tree". Maybe with the SDK being used we can check that really quickly by just using filename.beginsWith("Tree"). This code is so short that if we need it in various spots, we can just paste it there. Of course this is code duplication and everyone knows that code duplication should be avoided, but if the code being duplicated is so short, then it is tempting to just copy it instead. The problem we face here is obvious: maybe later the way we check whether a texture is fit for a tree changes. We then need to apply shotgun surgery and fix each spot separately.

A general rule of thumb here is that if code is very specific, then it should not be copied but put in a function. Even if it is super short and calling a function requires more code than doing it directly.
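
In C#, the tree-texture check from the example could be wrapped like this. TextureNaming and IsTreeTexture are hypothetical names, and string.StartsWith stands in for the beginsWith call of whatever SDK is in use.

```csharp
using System;

public static class TextureNaming
{
    // The single place that knows how a tree texture is recognised.
    // If the rule changes later, only this method needs to be edited.
    public static bool IsTreeTexture(string filename)
    {
        return filename != null
            && filename.StartsWith("Tree", StringComparison.Ordinal);
    }
}
```

Every call site then reads TextureNaming.IsTreeTexture(filename), which is slightly longer than the inline check but keeps the rule in one spot.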

All of the things discussed in this blogpost are really obvious. Most of these things are even taught in first year at university. The challenge is to make the step from knowing them to actually spending the time to always follow through with them, to always keep them in mind. This is why the most important thing that all programming interns learn at Ronimo is not knowledge, but self discipline.

Friday, January 2, 2015

What is Programming???

Programming is a creative process done by programmers to instruct a computer on how to do a task. Hollywood has helped instill an image of programmers as uber techies who can sit down at a computer and break any password in seconds. Sadly the reality is far less interesting!

The purpose of programming is to find a sequence of instructions that will automate performing a specific task or solving a given problem. The process of programming thus often requires expertise in many different subjects, including knowledge of the application domain, specialized algorithms and formal logic.

It is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This code can be written in a variety of computer programming languages. Some of these languages include Java, C, and Python. Computer code is a collection of typed words that the computer can clearly understand. Just as a human translator might translate from the English language to Spanish, the computer interprets these words as ones and zeros. We as humans use programming languages, instead of writing directly in ones and zeros, so we can easily write and understand the computer code and can organize it.