Introduction to Assembly Language

This is a brief introduction to assembly language. Assembly language is the most basic programming language available for any processor. With assembly language, a programmer works only with operations implemented directly on the physical CPU. Assembly language lacks high-level conveniences such as variables and functions, and it is not portable between different families of processors. Nevertheless, assembly language gives the programmer more direct control over the machine than any other programming language, and it provides the insight required to write effective code in high-level languages. Learning assembly language is well worth the time and effort of every serious programmer.

The Basics

Before we can explore the process of writing computer programs, we have to go back to the basics and learn exactly what a computer is and how it works. Every computer, no matter how simple or complex, has at its heart exactly two things: a CPU and some memory. Together, these two things are what make it possible for your computer to run programs.

On the most basic level, a computer program is nothing more than a collection of numbers stored in memory. Different numbers tell the CPU to do different things. The CPU reads the numbers one at a time, decodes them, and does what the numbers say. For example, if the CPU reads the number 64 as part of a program, it will add 1 to the number stored in a special location called AX. If the CPU reads the number 147, it will swap the number stored in AX with the number stored in another location called BX. By combining many simple operations such as these into a program, a programmer can make the computer do many incredible things.
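As a rough illustration of the read, decode, and execute cycle described above, here is a small Python sketch of a make-believe CPU. It is a toy model, not an emulator: the function name run and the dictionary of registers are invented for this example, and it understands only three 8086 opcodes, 64 (add 1 to AX), 67 (add 1 to BX), and 72 (subtract 1 from AX).

```python
def run(program):
    """Step through a list of opcode numbers the way a CPU steps through memory."""
    registers = {"AX": 0, "BX": 0}
    position = 0  # plays the role of the CPU's instruction pointer
    while position < len(program):
        opcode = program[position]  # fetch the next number
        position += 1
        if opcode == 64:            # decode and execute: INC AX
            registers["AX"] = (registers["AX"] + 1) & 0xFFFF
        elif opcode == 67:          # INC BX
            registers["BX"] = (registers["BX"] + 1) & 0xFFFF
        elif opcode == 72:          # DEC AX
            registers["AX"] = (registers["AX"] - 1) & 0xFFFF
        else:
            raise ValueError(f"this toy CPU does not know opcode {opcode}")
    return registers


print(run([64, 64, 67, 72]))  # prints {'AX': 1, 'BX': 1}
```

The "& 0xFFFF" keeps each register within 16 bits, which is how large AX and BX are on the processors this tutorial targets.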

As an example, here are the numbers of a simple computer program: 184, 0, 184, 142, 216, 198, 6, 158, 15, 36, 205, 32. If you were to enter these numbers into your computer's memory and run them under MS-DOS, you would see a dollar sign placed in the lower right hand corner of your screen, since that is what these numbers tell the computer to do.
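If you would rather not enter numbers into memory by hand, one possible shortcut is to write them to a file. The Python sketch below shows the idea. The helper name write_program is made up for this example, and the approach assumes that MS-DOS (or an emulator) loads a .COM file into memory byte for byte and starts running it at the first byte.

```python
# The twelve numbers of the sample program, exactly as listed above.
PROGRAM = [184, 0, 184, 142, 216, 198, 6, 158, 15, 36, 205, 32]


def write_program(path):
    """Write the program's numbers to the file at `path`, one byte per number."""
    data = bytes(PROGRAM)  # works because every number fits in a byte (0 to 255)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Show the same twelve numbers as a string of hexadecimal byte values.
print(bytes(PROGRAM).hex())  # prints b800b88ed8c6069e0f24cd20
```

Calling write_program("dollar.com"), where dollar.com is only a suggested file name, should leave you with a 12-byte file holding the program.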

Assembly Language

Although the numbers of the above program make perfect sense to a computer, they are about as clear as mud to a human. Who would have guessed that they put a dollar sign on the screen? Clearly, entering numbers by hand is a lousy way to write a program.

It doesn't have to be this way, though. A long time ago, someone came up with the idea that computer programs could be written using words instead of numbers. A special program called an assembler would then take the programmer's words and convert them to numbers that the computer could understand. This new method, called writing a program in assembly language, saved programmers thousands of hours, since they no longer had to look up hard-to-remember numbers in the backs of programming books, but could use simple words instead.

The program above, written in assembly language, looks like this:

MOV AX, 47104

MOV DS, AX

MOV [3998], 36

INT 32

When an assembler reads this sample program, it converts each line of code into one CPU-level instruction. This program uses two types of instructions, MOV and INT. On Intel processors, the MOV instruction moves data around, while the INT instruction transfers processor control to the device drivers or operating system.

The program still isn't quite clear, but it is much easier to understand than it was before. The first instruction, MOV AX, 47104, tells the computer to copy the number 47104 into the location AX. The next instruction, MOV DS, AX, tells the computer to copy the number in AX into the location DS. The next instruction, MOV [3998], 36 tells the computer to put the number 36 into memory location 3998. Finally, INT 32 exits the program by returning to the operating system.
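To see how these four lines relate to the twelve numbers shown earlier, here is a hedged Python sketch of what an assembler does for this one program. The function names are invented for this example, and each function handles only the single instruction form used here. A real assembler covers hundreds of forms.

```python
def word(n):
    """Split a 16-bit number into two bytes, low byte first, as Intel CPUs store it."""
    return [n & 0xFF, (n >> 8) & 0xFF]


def mov_ax(value):
    """MOV AX, value"""
    return [184] + word(value)


def mov_ds_ax():
    """MOV DS, AX"""
    return [142, 216]


def mov_byte(address, value):
    """MOV [address], value (storing a single byte)"""
    return [198, 6] + word(address) + [value & 0xFF]


def interrupt(number):
    """INT number"""
    return [205, number & 0xFF]


program = mov_ax(47104) + mov_ds_ax() + mov_byte(3998, 36) + interrupt(32)
print(program)  # prints [184, 0, 184, 142, 216, 198, 6, 158, 15, 36, 205, 32]
```

Notice that 47104 and 3998 each turn into two numbers, because neither fits in a single byte.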

Before we go on, I would like to explain just how this program works. Inside the CPU are a number of locations, called registers, each of which can store a number. Some registers, such as AX, are general purpose, and don't do anything special. Other registers, such as DS, control the way the CPU works. DS just happens to be a segment register, and is used to pick which area of memory the CPU reads from and writes to. In our program, we put the number 47104 into DS, which tells the CPU to access the memory on the video card. The next thing our program does is to put the number 36 into location 3998 of the video card's memory. Since 36 is the code for the dollar sign, and 3998 is the memory location of the bottom right hand corner of the screen, a dollar sign shows up on the screen a few microseconds later. Finally, our program tells the CPU to perform what is called an interrupt. An interrupt is used to stop one program and execute another in its place. In our case, we want interrupt 32, which ends our program and goes back to MS-DOS, or whatever other program was used to start our program.
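The numbers in this explanation can be checked with a little arithmetic. The Python sketch below assumes the standard text screen of 80 columns by 25 rows, with two bytes of video memory per character (one for the character, one for its color). It also assumes the usual rule on these processors that a segment number is multiplied by 16 before the location is added. None of these details is spelled out above, and the function names are invented for this example.

```python
COLUMNS, ROWS = 80, 25  # assumed text-mode screen size
BYTES_PER_CELL = 2      # assumed: one byte for the character, one for its color


def screen_offset(row, column):
    """Location of a character within video memory (row and column count from 0)."""
    return (row * COLUMNS + column) * BYTES_PER_CELL


def physical_address(segment, offset):
    """The CPU multiplies the segment number by 16, then adds the offset."""
    return segment * 16 + offset


corner = screen_offset(ROWS - 1, COLUMNS - 1)  # bottom right hand corner
print(corner)                                  # prints 3998
print(hex(physical_address(47104, corner)))    # prints 0xb8f9e
print(ord("$"))                                # prints 36
```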

Running the Program

Let's go ahead and run this program. First, be sure to print these instructions out, since you will need to refer to them as we go on. Next, click on your Start menu, and run the program called MS-DOS Prompt. A black screen with white text should appear. We are now in MS-DOS, the way computers used to be 20 years ago. MS-DOS dates from before the days of the mouse, so you must type commands on the keyboard to make the computer do things.

First, I want you to type the word debug, and press enter. The cursor should move down a line, and you should see the Debug prompt, which is a simple dash. We are now in a program called Debug. Debug is a powerful utility that lets you directly access the registers and memory of your computer for various purposes. In our case, we want to enter our program into memory and run it, so we'll use Debug's a command, for assemble. Go ahead and type a100 now. The cursor will move down another line, and you will see something like 1073:0100. This is the memory location where we are going to enter our assembly language instructions. The first number is the segment, and the second number is the memory location within the segment. Your Debug program will probably pick a different segment for your program than mine did, so don't worry if it's different. Another thing to note is that Debug only understands hexadecimal numbers, which are a sort of computer shorthand. Hexadecimal numbers sometimes contain letters as well as digits, so if you see something like 63AF, don't worry.

Let's go ahead and enter our program now. Type each of the instructions below into Debug exactly as they appear, and press enter after each one. When you finish entering the last instruction, press enter twice to tell Debug that we are done entering instructions.

mov ax,B800

mov ds,ax

mov byte[0F9E],24

int 20

As you can see, I've converted all the numbers into hexadecimal, and have made a few other changes so Debug can understand what's going on. If you make a mistake while entering the above program, press enter twice, type a100, and start entering instructions again at the beginning of the program.
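If you want to check the hexadecimal conversions yourself, a few lines of Python will do it. This is only a convenience sketch, and the helper names to_hex and from_hex are made up for this example.

```python
def to_hex(number):
    """Write a number in hexadecimal, using capital letters as Debug does."""
    return format(number, "X")


def from_hex(text):
    """Read a hexadecimal string and return the ordinary decimal number."""
    return int(text, 16)


# The four numbers from the first version of the program.
for decimal in (47104, 3998, 36, 32):
    print(decimal, "=", to_hex(decimal))

print(from_hex("0F9E"))  # prints 3998
```

The loop prints B800, F9E, 24, and 20 next to their decimal values. These match the numbers typed into Debug, apart from the leading zero in 0F9E, which does not change the value.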

Once you have entered the program, you can go ahead and run it. Simply type g for go and press enter when you are ready to start the program. You should see a dollar sign in the lower right hand corner of your screen and the words Program terminated normally. These words are put out by Debug to let you know that the program ended normally. Congratulations! You've just entered and run your first assembly language program!

Let's get back to Windows now. Go ahead and type q to get out of Debug. Now, type exit to get out of MS-DOS. You should now be back in Windows.

Learning More

This tutorial just barely scratches the surface of how assembly language works. To learn more about modern assembly language, I suggest reading this tutorial.

Polish Translation by Andrey Fomin

Swanson Technologies