C# to IL 2: IL Basics
This chapter and the next couple of them will focus on and elicit a simple belief of ours,
that if you really want to understand C# code in earnest, then the best way of doing so is
by understanding the IL code generated by the C# compiler.
So, we shall raise the curtains with a small C# program and then explain the IL code
generated by the compiler. In doing so, we will be able to kill two birds with one stone:
Firstly, we will be able to unravel the mysteries of IL and secondly, we will obtain a more
intuitive understanding of the C# programming language.
We will first show you a .cs file and then a program written in IL by the C# compiler, whose
output will be the same as that of the .cs file. The output displayed will be that of the IL
code. This will enhance our understanding of not only C# but also IL. So, without much ado,
let's take the plunge.
The above code is generated by the IL disassembler, ildasm.
After executing ildasm on the exe file, we studied the IL code generated by the program.
Subsequently, we eliminated parts of the code that did not ameliorate our understanding of
IL. This consisted of some comments, directives, functions etc. The remaining IL code
presented is as close to the original as possible.
The advantage of this technique of mastering IL by studying the IL code itself is that, we
are learning from the master, i.e. the C# compiler, on how to write decent IL code. We
cannot find a better authority than the C# compiler to enlighten us about IL.
The rules for creating a static function abc remain the same as any other function such as
Main or vijay. As abc is a static function, we have to use the static modifier in the .method
directive.
When we want to call a function, the following information has to be provided in the order
given below:
• the return data type.
• the class name.
• the function name to be called.
• the data types of the parameters.
The same rules also apply when we call the .ctor function from the base class. It is
mandatory to write the name of the class before the name of the function. In IL, no
assumptions are made about the name of the class. The name never defaults to the class we
are in while calling the function.
Thus, the above program first displays "hi" using the WriteLine function and then calls the
static function abc. This function too uses the WriteLine function to display "bye".
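A minimal C# sketch of the kind of program described here, assuming a class named zzz as used elsewhere in this chapter. The IL call shown in the comment is an approximation of what ildasm prints, not a verbatim listing.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        Console.WriteLine("hi");
        // In IL the class name is spelled out in the call,
        // roughly: call void zzz::abc()
        abc();
    }

    // A static function: the .method directive in IL carries the static modifier.
    public static void abc()
    {
        Console.WriteLine("bye");
    }
}
```

Running it displays hi followed by bye.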
Static constructors are always called before any other code is executed. In C#, a static
constructor is merely a function with the same name as a class. In IL, the name of the
function changes to .cctor. Thus, you may have observed that in the earlier example, we
got a free function called .ctor.
Whenever we have a class with no constructors, a free constructor with no parameters is
created. This free constructor is given the name .ctor. This knowledge should enhance our
ability as C# programmers, as we are now in a better position to comprehend what
goes on under the hood.
The static function gets called first and the function with the entrypoint directive gets
called thereafter.
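The calling order can be observed with a small C# sketch such as the following. The class name zzz and the printed strings are chosen for illustration only.

```csharp
using System;

class zzz
{
    // Static constructor: becomes the function .cctor in IL.
    static zzz()
    {
        Console.WriteLine("static constructor");
    }

    // Main carries the .entrypoint directive in IL.
    public static void Main()
    {
        Console.WriteLine("Main");
    }
}
```

The line from the static constructor is displayed before the line from Main.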
The keyword new in C# gets converted to the assembler instruction newobj. This provides
evidence that IL is not a low level assembler, and that it can also create objects in memory.
The instruction newobj creates a new object in memory. Even in IL, we are shielded from
what new or newobj really does. This demonstrates that IL is not just another high level
language, but is designed in such a way that other modern languages can be compiled to
it.
The rules for using newobj are the same as that for calling a function. The full prototype of
the function name is required. In this case, we are calling the constructor without any
parameters, hence the function .ctor is called. In the constructor, the WriteLine function is
called.
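A short C# sketch of such a program, assuming a class zzz with a parameterless constructor. The comments describe the IL only in outline.

```csharp
using System;

class zzz
{
    // Instance constructor: named .ctor in IL.
    public zzz()
    {
        Console.WriteLine("constructor");
    }

    public static void Main()
    {
        // new is compiled to newobj, followed by the full prototype of .ctor.
        // The resulting object is not stored anywhere.
        new zzz();
    }
}
```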
As we had promised earlier, we are going to explain the instruction ldarg.0 here. Whenever
we create an object that is an instance of a class, it contains two basic entities:
• functions
• fields or variables i.e. data.
When a function gets called, it does not know or care as to where it is being called from or
who is calling it. It receives all its parameters off the stack. There is no point in having two
copies of a function in memory. Otherwise, if a class contained a megabyte of code,
each time we said 'new' on it, an additional megabyte of memory would be occupied.
When new is called for the first time, memory gets allocated for the code and the variables.
But thereafter, with every call on new, fresh memory is allocated only for the variables.
Thus, if we have five instances of a class, there will be only one copy of the code, but five
separate copies of the variables.
Every non-static or instance function is passed a handle which indicates the location of the
variables of the object that has called this function. This handle is called the this pointer.
In IL, 'this' is loaded onto the stack by ldarg.0. This handle is always passed as the first
parameter to every instance function. Since it is always passed by default, it is not
mentioned in the parameter list of a function.
All the action takes place on the stack. The instruction pop removes whatever is on the top
of the stack. In this example, we use it to remove the instance of zzz that has been placed
on top of the stack by the newobj instruction.
The static constructor always gets called first whereas the instance constructor gets called
only after new. IL enforces this sequence of execution. The calling of the base class
constructor is not mandatory. Hence, to save space in our book, we have not shown its
code in all the programs.
In some cases, if we do not include the code of a constructor, the programs do not work.
Only in these cases has the code of the constructor been included. The static constructor
does not call the base class constructor. Also, 'this' is not passed to static functions.
We have created two variables called i and j in our function Main in the C# program. They
are local variables and are created on the stack. On conversion to IL, if you notice, the
names of the variables are lost.
The variables get created in IL through the locals directive, which assigns its own names to
the variables, beginning with V_0 and V_1 and so on. The data types are also altered from
int to int32 and from long to int64. The basic types in C# are aliases. They all get converted
to data types that IL understands.
The task on hand is to initialize the variable i to a value of 6. This value has to be loaded
on the stack or evaluation stack. The instruction to do so is ldc.i4.value. An i4 takes up
four bytes of memory.
The value mentioned in the syntax above is the constant that has to be put on the stack.
After the value 6 has been loaded on to the stack, we now need to initialize the variable i to
this value. The variable i has been renamed as V_0 and is the first variable in the locals
directive.
The instruction stloc.0 takes the value present at the top of the stack i.e. 6 and initializes
the variable V_0 to it. The process of initializing a variable is definitely complicated.
The second ldc instruction copies the value 7 onto the stack. On a 32 bit machine,
memory is handled in chunks of 32 bits, that is, 4 bytes. In the same vein, on a 64 bit
machine, memory is handled in chunks of 64 bits, that is, 8 bytes.
The number 7 is stored as a constant and requires only 4 bytes, but a long requires 8
bytes. Thus, we need to convert the 4 bytes to 8 bytes. The instruction conv.i8 is used for
this purpose. It places an 8 byte number on the stack. Only after doing so do we use stloc.1
to initialize the second variable V_1 to the value of 7.
Thus, the ldc series is used to place a constant number on the stack and stloc is utilized to
pick up what is on the stack and initialize a local to that value.
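The sequence described above corresponds to C# along the following lines. The instruction names in the comments are the ones named in the text; the exact IL a given compiler version emits may differ, and the two WriteLine calls are added here only so that the variables are used.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        int i = 6;    // ldc.i4.6, then stloc.0 stores it in V_0
        long j = 7;   // ldc.i4.7, conv.i8 widens it, then stloc.1 stores it in V_1
        Console.WriteLine(i);
        Console.WriteLine(j);
    }
}
```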
Now you will finally be able to see the light at the end of the tunnel and understand as to
why we wanted you to read this book in the first place.
Let us understand the above code, one field at a time. We have created a variable i that is
static and initialized it to the value of 6. Since the variable i has not been given an access
modifier, its access defaults to private. The static modifier of C# is applicable to variables
in IL also.
The real action begins now. The variable needs to be assigned an initial value. This value
must be assigned in the static constructor only, because the variable is static. We employ
ldc to place the value 6 on the stack. Note that the locals directive is not used here.
To initialize i, we use the instruction stsfld, which looks for a value on top of the stack.
The instruction is followed by the data type of the field, which tells it how many bytes to
pick up from the stack to initialize the static variable. In this case the type is int32, so
4 bytes are picked up.
The variable name is preceded by the name of the class. This is in contrast to the syntax of
local variables.
For the instance variable j, since its access modifier was public in C#, on conversion to IL,
its access modifier is retained as public. Since it is an instance variable, its value gets
initialized in the instance constructor. The instruction used here is stfld and not stsfld.
Here we need 8 bytes of the stack.
The rest of the code remains the same as before. Thus, we can see that the instruction
stloc is used to initialize locals and the instruction stfld is used to initialize fields.
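A C# sketch with one static and one instance field, matching the description above. The Main function shown is an assumption added so that the values can be displayed.

```csharp
using System;

class zzz
{
    static int i = 6;     // private by default; set in .cctor with stsfld
    public long j = 7;    // public; set in .ctor with stfld

    public static void Main()
    {
        zzz a = new zzz();
        Console.WriteLine(i);     // static field, accessed through the class
        Console.WriteLine(a.j);   // instance field, accessed through the object
    }
}
```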
The main purpose of the above example is to verify whether the variable is initialized first
or the code contained in a constructor gets called first. The IL output demonstrates very
lucidly that, first all the variables get initialized and thereafter, the code in a constructor
gets executed.
You may have also noticed that the base class constructor gets executed first and then,
and only then, does the code that is written in a constructor, get called.
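The ordering can be checked from C# with a sketch like the one below. The base class yyy and the helper function note are hypothetical names introduced only to make each step print a line.

```csharp
using System;

class yyy
{
    public yyy()
    {
        Console.WriteLine("base constructor");
    }
}

class zzz : yyy
{
    // The initializer runs before the base class constructor is called.
    int i = note("field initializer");

    public zzz()
    {
        Console.WriteLine("constructor body");
    }

    // Hypothetical helper: prints a message and returns the initial value.
    static int note(string s)
    {
        Console.WriteLine(s);
        return 10;
    }

    public static void Main()
    {
        zzz a = new zzz();
        Console.WriteLine(a.i);
    }
}
```

The lines appear in the order field initializer, base constructor, constructor body, and finally 10.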
This nugget of knowledge is sure to enhance your understanding of C# and IL.
We can print a number instead of a string by using an overloaded version of the WriteLine
function.
First, we push the value 10 onto the stack using the ldc family. Observe carefully that the
instruction is now ldc.i4.s followed by the value 10. The operand of an ldc.i4 instruction
normally takes 4 bytes in memory, but with the .s suffix it takes only one byte.
Then the C# compiler calls the correct overloaded version of the WriteLine function, which
accepts an int32 value from the stack.
This is similar to printing strings.
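In C# terms, the choice of overload looks like this. It is a minimal sketch; the second call is added only for contrast.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        Console.WriteLine(10);     // resolves to the WriteLine overload taking an int
        Console.WriteLine("10");   // resolves to the WriteLine overload taking a string
    }
}
```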
We shall now delve into how to print a number on the screen using a format string.
The WriteLine function accepts a string followed by a variable number of objects. The {0}
prints the first object after the comma. Even though there is no variable in the C# code, on
conversion to IL code, a variable of type int32 is created.
The string {0} is loaded on the stack using our trustworthy ldstr. Then, we place the
number that is to be passed as a parameter to the WriteLine function, on the stack. To do
so, we use ldc.i4.s which loads the constant value on the stack. After this, we initialize the
variable V_0 to 20 with the stloc.0 instruction, and then ldloca.s loads the address of the
local variable on the stack.
The major roadblock that we experience here is that the WriteLine function accepts a string
followed by an object as the next parameter. In this case, the variable is of value type and
not reference type.
An int32 is a value type variable whereas the WriteLine function wants a full-fledged object
of a reference type.
How do we solve the dilemma of converting a value type into a reference type?
As informed earlier, we use the instruction ldloca.s to load the address of the local variable
V_0 onto the stack. Thus, our stack contains a string followed by the address of a value
type variable, V_0.
Next, we call an instruction called box. There are only two types of variables in the .NET
world i.e. value types and reference types. Boxing is the method that .NET uses to convert
a value type variable into a reference type variable.
The box instruction takes an unboxed or value type variable and converts it into a boxed or
reference type variable. The box instruction needs the address of a value type on the stack
and allocates space on the heap for its equivalent reference type.
The heap is an area of memory used to store reference types. The values on the stack
disappear at the end of a function, but the heap is available for a much longer duration.
Once this space is allocated, the box instruction initializes the instance fields of the
reference object. Then, it places the memory location in the heap of this newly
constructed object on the stack. The box instruction requires the memory location of a
local variable on the stack.
The constant stored on the stack has no physical address. Thus, the variable V_0 is
created to provide the memory location.
This boxed version on the heap is similar to the reference type variable that we are familiar
with. It really does not have any type and thus looks like System.Object. To access its
specific values, we need to unbox it first. The WriteLine function does this internally.
The data type of the parameter that is to be boxed must be the same as that of the variable
whose address has been placed on the stack. We will subsequently explain these details.
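Boxing and unboxing can also be written out by hand in C#, which may make the idea easier to follow. The sketch below is an illustration with made-up variable names; it shows that the boxed object is a separate copy on the heap.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        // The int 20 is boxed so that it fits the object parameter of WriteLine.
        Console.WriteLine("{0}", 20);

        int i = 20;
        object o = i;     // explicit boxing: a copy of i is placed on the heap
        i = 30;           // changing i does not affect the boxed copy
        int k = (int)o;   // unboxing: the value is copied back out
        Console.WriteLine("{0} {1}", i, k);
    }
}
```

The program displays 20 and then 30 20.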
The above code is used to display the value of a static variable. The .cctor function
initializes the static variable to a value of 10. Then, the string {0} is stored on the stack.
The instruction ldsflda loads the address of a static variable of a certain data type on the
stack. Then, as usual, box takes over. The explanation regarding the functionality of 'box'
given above is relevant here also.
Static variables in IL work in the same way as instance variables. The only difference is in
the fact that they have their own set of instructions. Instructions like box need a memory
location on the stack without discriminating between static and instance variables.
The only variation that we indulged in from the earlier program is that we have removed
the static constructor. All static variables and instance variables get initialized internally to
zero. Thus, IL does not generate any error. Internally, even before the static constructor
gets called, the field i is assigned an initial value of zero.
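The same behaviour is visible from C#. In this minimal sketch the static field is never assigned, so the compiler may warn about that, but the program still runs.

```csharp
using System;

class zzz
{
    static int i;   // no initializer and no static constructor

    public static void Main()
    {
        Console.WriteLine("{0}", i);   // displays 0
    }
}
```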
We have initialised the local i to a value of 10. This cannot be done in the constructor since
the variable i has been created on the stack. Then, stloc.0 has been used to assign the
value of 10 to V_0. Thereafter, ldloc.0 has been utilised to place the variable V_0 on the
stack, so that it is available to the WriteLine function.
The WriteLine function thereafter displays the value on the screen. A field and a local
behave in a similar manner, except that they use separate sets of instructions.
All local variables have to be initialised, or else, the compiler will generate an unintelligible
error message. Here, even though we have eliminated the ldc and stloc instructions, no
error is generated at runtime. Instead, a very large number is displayed.
The variable V_0 has not been initialised to any value. It was created on the stack and
contained whatever value was available at the memory location assigned to it. On your
machine, the output may well be different from ours.
In a similar situation, the C# compiler will give you an error and not allow you to proceed
further, because the variable has not been initialized. IL, on the other hand, is a strange
kettle of fish. It is much more lenient in its outlook. It does very few error or sanity checks
on the source code. This has its drawback, meaning, the programmer has to be much more
responsible and careful while using IL.
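For comparison, this is how the C# compiler treats the same situation. In this sketch the offending line is left commented out so that the program compiles; enabling it produces error CS0165.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        int i;
        // Console.WriteLine("{0}", i);   // error CS0165: use of unassigned local variable
        i = 10;
        Console.WriteLine("{0}", i);      // accepted once i has been assigned
    }
}
```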
In the above example, a static variable has been initialised inside a function and not at the
time of its creation, as seen earlier. The function vijay calls the code present in the static
constructor.
The process given above is the only way to initialize a static or an instance variable.
The above program demonstrates as to how we can call a function with a single parameter.
The rules for placing parameters on the stack are similar to those for the WriteLine
function.
Now let us comprehend as to how a function receives parameters from the stack.
We begin by stating the data type and parameter name in the function declaration. This is
similar to the workings in C#.
Next, we use the instruction ldarga.s to load the address of the parameter i, onto the stack.
box will then convert the value type of this object into object type and finally WriteLine
function uses these values to display the output on the screen.
In the above example, we have converted an int into an object because, the WriteLine
function requires the parameter to be of this data type.
The only method of achieving this conversion is by using the box instruction. The box
instruction converts an int into an object.
In the function abc, we accept a System.Object and we use the instruction ldarg and not
ldarga. The reason is that we require the value of the parameter and not its address. The
number after the dot signifies the parameter number. In order to place the values of
parameters on the stack, a new instruction is required.
Thus, IL handles locals, fields and parameters with their own set of instructions.
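Both kinds of parameter can be seen side by side in a C# sketch such as this one. The function name pqr is hypothetical and is used only to keep the two cases apart.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        abc(20);
        pqr("hi");
    }

    // Value type parameter: it has to be boxed before WriteLine can take it.
    public static void abc(int i)
    {
        Console.WriteLine("{0}", i);
    }

    // Parameter that is already an object: its value is loaded and passed on as is.
    public static void pqr(object o)
    {
        Console.WriteLine("{0}", o);
    }
}
```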
Functions return values. Here, a static function abc has been called. We know from the
function's signature that it returns an int. Return values are stored on the stack.
Thus, the stloc.1 instruction picks up the value on the stack and places it in the local V_1.
In this specific case, it is the return value of the function.
Newobj is also like a function. It returns an object which, in our case, is an instance of the
class zzz, and puts it on the stack.
The stloc instruction has been used repeatedly to initialize all our local variables. Just to
refresh your memory, ldloc does the reverse of this process.
A function has to just place a value on the stack using the trustworthy ldc and then cease
execution using the ret instruction.
Thus, the stack has a dual role to play:
• It is used to pass values to the functions.
• It receives the return values of the functions.
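A small C# sketch of a function returning a value through the stack. The comments restate the instructions named in the text, and the value 20 is chosen for illustration.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        int i = abc();   // the call leaves its result on the stack; stloc stores it in a local
        Console.WriteLine("{0}", i);
    }

    public static int abc()
    {
        return 20;       // ldc places 20 on the stack and ret ends the function
    }
}
```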
The only innovation and novelty that has been introduced in the above example is that the
return value of the function abc has been stored in an instance variable.
• Stloc assigns the value on the stack to a local variable.
• Ldloc, on the other hand, places the value of a local variable on the stack.
The object that looks like zzz is put on the stack again, even though abc is a static
function and not an instance function. It is not placed there for abc: static functions are
not passed the this pointer on the stack. It is there for stfld, which needs to know the
object whose field is to be set.
Thereafter, the function abc is called, which places the value 20 on the stack. The
instruction stfld picks up the value 20 from the stack, and initializes the instance variable i
with this value.
Local and instance variables are handled in a similar manner except that, the instructions
for their initialization are different.
The instruction ldfld does the reverse of what stfld does. It places the value of an instance
variable on the stack to make it available for the WriteLine function.
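Putting stfld and ldfld together, a C# sketch of this example might read as follows. The field name i and the value 20 follow the text; the rest is assumed.

```csharp
using System;

class zzz
{
    int i;

    public static void Main()
    {
        zzz a = new zzz();
        a.i = abc();                     // stfld stores the return value into the field of a
        Console.WriteLine("{0}", a.i);   // ldfld reads the field back for WriteLine
    }

    public static int abc()
    {
        return 20;
    }
}
```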