The Art of Computer Virus Research and Defense

Peter Szor
Addison Wesley Professional
ISBN 0-321-30454-3
February 2005

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

Symantec Press Publisher: Linda McCarthy

Editor in Chief: Karen Gettman

Acquisitions Editor: Jessica Goldstein

Cover Designer: Alan Clements

Managing Editor: Gina Kanouse

Senior Project Editor: Kristy Hart

Copy Editor: Christal Andry

Indexers: Cheryl Lenser and Larry Sweazy

Compositor: Stickman Studio

Manufacturing Buyer: Dan Uhrig

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U. S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com

For sales outside the U. S., please contact:

International Sales
international@pearsoned.com

Visit us on the Web: www.awprofessional.com

Library of Congress Number: 2004114972

Copyright © 2005 Symantec Corporation

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.
Rights and Contracts Department
One Lake Street
Upper Saddle River, NJ 07458

Text printed in the United States on recycled paper at Phoenix BookTech in Hagerstown, Maryland.

First printing, February, 2005

Dedication

to Natalia

About the Author

Peter Szor is a world renowned computer virus and security researcher. He has been actively conducting research on computer viruses for more than 15 years, and he focused on the subject of computer viruses and virus protection in his diploma work in 1991. Over the years, Peter has been fortunate to work with the best-known antivirus products, such as AVP, F-PROT, and Symantec Norton AntiVirus. Originally, he built his own antivirus program, Pasteur, from 1990 to 1995, in Hungary. Parallel to his interest in computer antivirus development, Peter also has years of experience in fault-tolerant and secured financial transaction systems development.

He was invited to join the Computer Antivirus Researchers Organization (CARO) in 1997. Peter is on the advisory board of Virus Bulletin Magazine and a founding member of the AntiVirus Emergency Discussion (AVED) network. He has been with Symantec for over five years as a chief researcher in Santa Monica, California.

Peter has authored over 70 articles and papers on the subject of computer viruses and security for magazines such as Virus Bulletin, Chip, Source, Windows NT Magazine, and Information Security Bulletin, among others. He is a frequent speaker at conferences, including Virus Bulletin, EICAR, ICSA, and RSA and has given invited talks at such security conferences as the USENIX Security Symposium. Peter is passionate about sharing his research results and educating others about computer viruses and security issues.

Preface

Who Should Read This Book

Over the last two decades, several publications have appeared on the subject of computer viruses, but only a few have been written by professionals ("insiders") of computer virus research. Although many books discuss the computer virus problem, they usually target a novice audience and hold little interest for technical professionals. Only a few works are willing to go into the technical details that one must understand to defend effectively against computer viruses.

Part of the problem is that existing books have little if any information about the current complexity of computer viruses. For example, they lack serious technical information on fast-spreading computer worms that exploit vulnerabilities to invade target systems, or they do not discuss recent code evolution techniques such as code metamorphism. If you wanted to get all the information I have in this book, you would need to spend a lot of time reading articles and papers that are often hidden somewhere deep inside computer virus and security conference proceedings, and perhaps you would need to dig into malicious code for years to extract the relevant details.

I believe that this book is most useful for IT and security professionals who fight against computer viruses on a daily basis. Nowadays, system administrators as well as individual home users often need to deal with computer worms and other malicious programs on their networks. Unfortunately, security courses have very little training on computer virus protection, and the general public knows very little about how to analyze such attacks and defend their networks from them. To make things more difficult, computer virus analysis techniques have not been discussed at sufficient length in any existing work.

I also think that anybody interested in information security should be aware of what computer virus writers have "achieved" so far.

For years, computer virus researchers were "file" or "infected object" oriented. In contrast, security professionals were excited about suspicious events only on the network level. In addition, threats such as the CodeRed worm appeared that inject their code into the memory of vulnerable processes over the network but do not "infect" objects on the disk. Today, it is important to understand all of these major perspectives, namely the file (storage), in-memory, and network views, and to correlate the events using malicious code analysis techniques.

Over the years, I have trained many computer virus and security analysts to effectively analyze and respond to malicious code threats. In this book, I have included information about anything that I ever had to deal with. For example, I have relevant examples of ancient threats, such as 8-bit viruses on the Commodore 64. You will see that techniques such as stealth technology appeared in the earliest computer viruses, and on a variety of platforms. Thus, you will be able to realize that current rootkits do not represent anything new! You will find sufficient coverage on 32-bit Windows worm threats with in-depth exploit discussions, as well as 64-bit viruses and "pocket monsters" on mobile devices. All along the way, my goal is to illustrate how old techniques "reincarnate" in new threats and demonstrate up-to-date attacks with just enough technical detail.

I am sure that many of you are interested in joining the fight against malicious code, and perhaps, just like me, some of you will become inventors of defense techniques. All of you should, however, be aware of the pitfalls and the challenges of this field!

That is what this book is all about.

What I Cover

The purpose of this book is to demonstrate the current state of the art of computer virus and antivirus developments and to teach you the methodology of computer virus analysis and protection. I discuss infection techniques of computer viruses from all possible perspectives: file (on storage), in-memory, and network. I classify and tell you all about the dirty little tricks of computer viruses that bad guys developed over the last two decades and tell you what has been done to deal with complexities such as code polymorphism and exploits.

The easiest way to read this book is, well, to read it from chapter to chapter. However, some of the attack chapters have content that can be more relevant after understanding techniques presented in the defense chapters. If you feel that any of the chapters are not your taste, or are too difficult or lengthy, you can always jump to the next chapter. I am sure that everybody will find some parts of this book very difficult and other parts very simple, depending on individual experience.

I expect my readers to be familiar with technology and some level of programming. There are so many things discussed in this book that it is simply impossible to cover everything in sufficient length. However, you will know exactly what you might need to learn from elsewhere to be absolutely successful against malicious threats. To help you, I have created an extensive reference list for each chapter that leads you to the necessary background information.

Indeed, this book could easily have been over 1,000 pages. However, as you can tell, I am not Shakespeare. My knowledge of computer viruses is great, not my English. Most likely, you would have no benefit of my work if this were the other way around.

What I Do Not Cover

I do not cover Trojan horse programs or backdoors at great length. This book is primarily about self-replicating malicious code. There are plenty of great books available on regular malicious programs, but not on computer viruses.

I do not present any virus code in the book that you could directly use to build another virus. This book is not a "virus writing" class. My understanding, however, is that the bad guys already know about most of the techniques that I discuss in this book. So, the good guys need to learn more and start to think (but not act) like a real attacker to develop their defense!

Interestingly, many universities attempt to teach computer virus research courses by offering classes on writing viruses. Would it really help if a student could write a virus to infect millions of systems around the world? Will such students know more about how to develop better defenses? Simply, the answer is no...

Instead, classes should focus on the analysis of existing malicious threats. There are so many threats out there waiting for somebody to understand them, and to do something against them.

Of course, the knowledge of computer viruses is like the "Force" in Star Wars. Depending on the user of the "Force," the knowledge can turn to good or evil. I cannot force you to stay away from the "Dark Side," but I urge you to do so.

Acknowledgments

First, I would like to thank my wife Natalia for encouraging my work for over 15 years! I also thank her for accepting the lost time on all the weekends that we could have spent together while I was working on this book.

I would like to thank everybody who made this book possible. This book grew out of a series of articles and papers on computer viruses, several of which I have co-authored with other researchers over the years. Therefore, I could never adequately thank Eric Chien, Peter Ferrie, Bruce McCorkendale, and Frederic Perriot for their excellent contributions to Chapter 7 and Chapter 10.

This book could not be written without the help of many friends, great antivirus researchers, and colleagues. First and foremost, I would like to thank Dr. Vesselin Bontchev for educating me in the terminology of malicious programs for many years while we worked together. Vesselin is famous ("infamous?") for his religious accuracy in the subject matter, and he greatly influenced and supported my research.

A big thank you needs to go to the following people who encouraged me to write this book, educated me in the subject, and influenced my research over the years: Oliver Beke, Zoltan Hornak, Frans Veldman, Eugene Kaspersky, Istvan Farmosi, Jim Bates, Dr. Frederick Cohen, Fridrik Skulason, David Ferbrache, Dr. Klaus Brunnstein, Mikko Hypponen, Dr. Steve White, and Dr. Alan Solomon.

I owe a huge thanks to my technical reviewers: Dr. Vesselin Bontchev, Peter Ferrie, Nick FitzGerald, Halvar Flake, Mikko Hypponen, Dr. Jose Nazario, and Jason V. Miller. Your encouragement, criticism, insights, and reviews of early handbook manuscripts were simply invaluable.

I need to thank Janos Kis and Zsolt Szoboszlay for providing me access to in-the-wild virus code for analysis, in the days when the BBS was the center of the computing universe. I also need to thank Gunter May for the greatest present that an east European kid could get: a C64.

A big thanks to everybody at Symantec, especially to Linda A. McCarthy and Vincent Weafer, who greatly encouraged me to write this book. I would also like to thank Nancy Conner and Chris Andry for their outstanding editorial work. Without their help, this project simply would never have been finished. I also owe a huge thanks to Jessica Goldstein, Kristy Hart, and Christy Hackerd for helping me with the publishing process all the way.

A big thanks to all past and present members of the Computer Antivirus Researchers Organization (CARO), VFORUM, and the AntiVirus Emergency Discussion (AVED) List for all the exciting discussions on computer viruses and other malicious programs and defense systems.

I would like to thank everybody at Virus Bulletin for publishing my articles and papers internationally for almost a decade and for letting me use that material in this book.

Last but not least, I thank my teacher parents and grandparents for the extra "home education" in math, physics, music, and history.

Contact Information

If you find errors or have suggestions for clarification or material you would like to see in a future edition, I would love to hear from you. I am planning to introduce clarifications, possible corrections, and new information relevant to the content of this work on my Web site. While I think we have found most of the problems (especially in those paragraphs that were written late at night or between virus and security emergencies), I believe that no work of this complexity and size can exist without some minor nits. Nonetheless, I have made every effort to provide you with "trustworthy" information according to the best of my research knowledge.

Peter Szor,
Santa Monica, CA
pszor@acm.org
http://www.peterszor.com

Part I: Strategies of the attacker

Chapter 1. Introduction to the Games of Nature

"To me art is a desire to communicate."

Endre Szasz


Computer virus research is a fascinating subject to many who are interested in nature, biology, or mathematics. Everyone who uses a computer will likely encounter some form of the increasingly common problem of computer viruses. In fact, some well-known computer virus researchers became interested in the field when, decades ago, their own systems were infected.

The title of Donald Knuth's book series1, The Art of Computer Programming, suggests that anything we can explain to a computer is science, but that which we cannot currently explain to a computer is an art. Computer virus research is a rich, complex, multifaceted subject. It is about reverse engineering, developing detection, disinfection, and defense systems with optimized algorithms, so it naturally has scientific aspects; however, many of the analytical methods are an art of their own. This is why outsiders often find this relatively young field so hard to understand. Even after years of research and publications, many new analytical techniques are in the category of art and can only be learned at antivirus and security vendor companies or through the personal associations one must forge to succeed in this field.

This book attempts to provide an insider's view of this fascinating research. In the process, I hope to teach many facts that should interest both students of the art and information technology professionals. My goal is to provide an extended understanding of both the attackers and the systems built to defend against virulent, malicious programs.

Although there are many books about computer viruses, only a few have been written by people experienced enough in computer virus research to discuss the subject for a technically oriented audience.

The following sections discuss historical points in computation that are relevant to computer viruses and arrive at a practical definition of the term computer virus.

1.1. Early Models of Self-Replicating Structures

Humans create new models to represent our world from different perspectives. The idea of self-replicating systems that model self-replicating structures has been around since the Hungarian-American Neumann János (John von Neumann) suggested it in 1948 2,3,4.

Von Neumann was a mathematician, an amazing thinker, and one of the greatest computer architects of all time. Today's computers are designed according to his original vision. Neumann's machines introduced memory for storing information and binary (versus analog) operations. According to von Neumann's brother Nicholas, "Johnny" was very impressed with Bach's "Art of the Fugue" because it was written for several voices, with the instrumentation unspecified. Nicholas von Neumann credits the Bach piece as a source for the idea of the stored-program computer5.

In the traditional von Neumann machine, there was no basic difference between code and data. Code was differentiated from data only when the operating system transferred control and executed the information stored there.

To create a more secure computing system, we will find that system operations that better control the differentiation of data from code are essential. However, we also will see the weaknesses of such approaches.

Modern computers can simulate nature using a variety of modeling techniques. Many computer simulations of nature manifest themselves as games. Modern computer viruses are somewhat different from these traditional nature-simulation game systems, but students of computer virus research can appreciate the utility of such games for gaining an understanding of self-replicating structures.

1.1.1. John von Neumann: Theory of Self-Reproducing Automata

Replication is an essential part of life. John von Neumann was the first to provide a model to describe nature's self-reproduction with the idea of self-building automata.

In von Neumann's vision, there were three main components in a system:

  1. A Universal Machine
  2. A Universal Constructor
  3. Information on a Tape

A universal machine (Turing Machine) would read the memory tape and, using the information on the tape, it would be able to rebuild itself piece by piece using a universal constructor. The machine would not understand the process; it would simply follow the information (blueprint instructions) on the memory tape. The machine would only be able to select the next proper piece from the set of all the pieces by picking them one by one until the proper piece was found. When it was found, two proper pieces would be put together according to the instructions until the machine reproduced itself completely.

If the information that was necessary to rebuild another system could be found on the tape, then the automaton was able to reproduce itself. The original automaton would be rebuilt (Figure 1.1), and then the newly built automaton was booted, which would start the same process.

Figure 1.1. The model of a self-building machine.
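The pick-and-assemble loop described above can be sketched in a few lines of Python. This is a toy illustration only, not von Neumann's construction: the part names, the `build` and `reproduce` helpers, and the idea of an inexhaustible parts bin are all assumptions made for the example.

```python
# Toy sketch of a tape-driven self-building machine.
# All names here (build, reproduce, the part names) are hypothetical.

def build(tape, parts_bin):
    """Follow the blueprint on the tape, trying parts one by one until the right one is found."""
    machine = []
    for wanted in tape:
        for part in parts_bin:            # blind search through the available kinds of parts
            if part == wanted:
                machine.append(part)      # attach the matching piece
                break
        else:
            raise LookupError(f"no part of kind {wanted!r} available")
    return machine


def reproduce(automaton, parts_bin):
    """An automaton is (body, tape); the offspring gets a freshly built body and a copy of the tape."""
    _, tape = automaton
    return (build(tape, parts_bin), list(tape))


tape = ["frame", "arm", "reader", "tape-copier"]
parts_bin = ["arm", "wheel", "reader", "frame", "tape-copier"]  # assumed unlimited supply of each kind

parent = (build(tape, parts_bin), tape)
child = reproduce(parent, parts_bin)
grandchild = reproduce(child, parts_bin)
```

The builder never interprets the blueprint; it only compares the next entry on the tape with each candidate part. Copying the tape into the offspring is what lets the process repeat in the next generation.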

A few years later, Stanislaw Ulam suggested that von Neumann use the processes of cellular automation to describe this model. Instead of using "machine parts," states of cells were introduced. Because cells are operated in a robotic fashion according to rules ("code"), the cell is known as an automaton. The array of cells comprises the cellular automata (CA) computer architecture.

Von Neumann changed the original model using cells that had 29 different states in a two-dimensional, 5-cell environment. To create a self-reproducing structure, he used 200,000 cells. Neumann's model mathematically proved the possibility of self-reproducing structures: Regular non-living parts (molecules) could be combined to create self-reproducing structures (potentially living organisms).

In September 1948, von Neumann presented his vision of self-replicating automata systems. Only five years later, in 1953, Watson and Crick recognized that living organisms use the DNA molecule as a "tape" that provides the information for the reproduction system of living organisms.

Unfortunately, von Neumann did not live to see a proof of his work, but his work was completed by Arthur Burks. Further work was accomplished by E.F. Codd in 1968. Codd simplified Neumann's model using cells that had eight states in 5-cell environments. Such simplification is the basis for "self-replicating loops"6 developed by artificial life researchers, such as Christopher G. Langton, in 1979. Such replication loops eliminate the complexity of the universal machine from the system and focus on the needs of replication.

In 1980 at NASA/ASEE, Robert A. Freitas, Jr. and William B. Zachary7 conducted research on a self-replicating, growing lunar factory. A lunar manufacturing facility (LMF) was researched, which used the theory of self-reproducing automata and existing automation technology to make a self-replicating, self-growing factory on the moon. Robert A. Freitas, Jr. and Ralph C. Merkle recently authored a book titled Kinematic Self-Replicating Machines. This book indicates a renewed scientific interest in the subject. A few years ago, Freitas introduced the term ecophagy, the theoretical consumption of the entire ecosystem by out of control, self-replicating nano-robots, and he proposed mitigation recommendations8.

It is also interesting to note that the theme of self-replicating machines occurs repeatedly in works of science fiction, from movies such as Terminator to novels written by such authors as Neal Stephenson and William Gibson. And of course, there are many more examples from beyond the world of science fiction, as nanotech and microelectrical mechanical systems (MEMS) engineering have become real sciences.

1.1.2. Fredkin: Reproducing Structures

Several people attempted to simplify von Neumann's model. For instance, in 1961 Edward Fredkin used a specialized cellular automaton in which all the structures could reproduce themselves and replicate using simple patterns on a grid (see Figure 1.2 for a possible illustration). Fredkin's automata had the following rules9:

Figure 1.2. Generation 1, Generation 2, and... Generation 4.

Using the rules described previously with this initial layout allows all structures to replicate. Although there are far more interesting layouts to explore, this example is the simplest possible model of self-reproducing cellular automata.
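As a minimal sketch, the following Python code implements the parity rule that is commonly attributed to Fredkin: a cell holds a token in the next generation exactly when an odd number of its four orthogonal neighbors hold one now. The rule as written here, the seed pattern, and the helper names `parity_step` and `shifted` are assumptions of this example rather than a transcription of the rule list above.

```python
from collections import Counter

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 5-cell (von Neumann) environment minus the centre


def parity_step(live):
    """One generation: a cell is on iff an odd number of its four orthogonal neighbours are on."""
    counts = Counter()
    for (x, y) in live:
        for dx, dy in NEIGHBOURS:
            counts[(x + dx, y + dy)] += 1
    return {cell for cell, n in counts.items() if n % 2 == 1}


def shifted(cells, dx, dy):
    """Return the same pattern moved by (dx, dy)."""
    return {(x + dx, y + dy) for (x, y) in cells}


seed = {(0, 0), (1, 0), (1, 1)}          # small hypothetical starting pattern
gen = seed
for _ in range(4):
    gen = parity_step(gen)

# After four generations the grid should hold four displaced copies of the seed.
copies = (shifted(seed, 4, 0) | shifted(seed, -4, 0) |
          shifted(seed, 0, 4) | shifted(seed, 0, -4))
```

Under this rule, a pattern that fits inside a 2x2 box reappears as four separate copies after four generations, each displaced by four cells along one axis, which is the replication behavior the text describes.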

1.1.3. Conway: Game of Life

In 1970, John Horton Conway10 created one of the most interesting cellular automata systems. Just as the pioneer von Neumann did, Conway researched the interaction of simple elements under a common rule and found that this could lead to surprisingly interesting structures. Conway named his game Life. Life is based on the following rules:

Figure 1.3 demonstrates a modern representation of the original Conway table game written by Edwin Martin11.

Figure 1.3. Edwin Martin's Game of Life implementation on the Mac using "Shooter" starting structure.

It is especially interesting to see the computer animation as the game develops with the so-called "Shooter" starting structure. In a few generations, two shooter positions that appear to shoot at each other will develop on the sides of the table, as shown in Figure 1.4, and in doing so they appear to produce so-called gliders that "fly" away (see Figure 1.5) toward the lower-right corner of the table. This sequence continues endlessly, and new gliders are produced.

Figure 1.4. "Shooter" in Generation 355.

Figure 1.5. The glider moves around without changing shape.

On a two-dimensional table, each cell has two potential states: S=1 if there is one token in the cell, or S=0 if there is no token. Each cell will live according to the rules governed by the cell's environment (see Figure 1.6).

Figure 1.6. The 9-cell-based Moore environment.

The following characteristics/rules define Conway's game, Life:

Conway originally believed that there were no self-replicating structures in Life. He even offered $50 to anyone who could create a starting structure that would lead to self-replication. One such structure was quickly found using computers at the artificial intelligence group of the Massachusetts Institute of Technology (MIT).

MIT students found a structure that was later nicknamed a glider. When 13 gliders meet, they create a pulsing structure. Later, in the 100th generation, the pulsing structure suddenly "gives birth" to new gliders, which quickly "fly" away. After this point, in each 30th subsequent generation, there will be a new glider on the table that flies away. This sequence continues endlessly. This setup is very similar to the "Shooter" structure shown in Figures 1.3 and 1.4.
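The glider's behavior is easy to reproduce. The sketch below uses the widely published formulation of Life (a token is born in an empty cell with exactly three neighbors and survives with two or three, counted over the eight surrounding cells of the Moore environment). That formulation, the unbounded grid, and the names `life_step` and `glider` are assumptions of this example, not a quotation of the rule list in this text.

```python
from collections import Counter


def life_step(live):
    """Advance one generation on an unbounded grid; `live` is a set of (x, y) cells with S=1."""
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly three neighbours, survival on two or three.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A glider, with y growing downward as on a printed table:
#   . O .
#   . . O
#   O O O
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

gen = glider
for _ in range(4):
    gen = life_step(gen)

# The same shape, one cell further toward the lower-right corner.
moved = {(x + 1, y + 1) for (x, y) in glider}
```

After four generations the five tokens form the original shape again, displaced by one cell diagonally, which is why the glider appears to "fly" across the table.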

Games with Computers, written by Antal Csakany and Ferenc Vajda in 1980, contains examples of competitive games. The authors described a table game with rules similar to those of Life. The table game uses cabbage, rabbits, and foxes to demonstrate struggles in nature. An initial cell is filled with cabbage as food for the rabbits, which becomes food for the foxes according to predefined rules. Then the rules control and balance the population of rabbits and foxes.

It is interesting to think about computers, computer viruses, and antiviral programs in terms of this model. Without computers (in particular, an operating system or BIOS of some sort), computer viruses are unable to replicate. Computer viruses infect new computer systems, and as they replicate, the viruses can be thought of as prey for antivirus programs.

In some situations, computer viruses fight back. These are called retro viruses. In such a situation, the antiviral application can be thought to "die." When an antiviral program stops an instance of a virus, the virus can be thought to "die." In some cases, the PC will "die" immediately as the virus infects it.

For example, if the virus indiscriminately deletes key operating system files, the system will crash, and the virus can be said to have "killed" its host. If this process happens too quickly, the virus might kill the host before having the opportunity to replicate to other systems. When we imagine millions of computers as a table game of this form, it is fascinating to see how computer virus and antiviral population models parallel those of the cabbage, rabbits, and foxes simulation game.

Rules, side effects, mutations, replication techniques, and degrees of virulence dictate the balance of such programs in a never-ending fight. At the same time, a "co-evolution"12 exists between computer viruses and antivirus programs. As antivirus systems have become more sophisticated, so have computer viruses. This tendency has continued over the more than 30-year history of computer viruses.

Using models along these lines, we can see how the virus population varies according to the number of computers compatible with the viruses. When it comes to computer viruses and antiviral programs, multiple parallel games occur side by side. Viruses within an environment that consists of a large number of compatible computers will be more virulent; that is, they will spread more rapidly to many more computers. A large number of similar PCs with compatible operating systems create a homogeneous environment, which is fertile ground for virulence (sound familiar?).

With smaller game boards representing a smaller number of compatible computers, we will obviously see smaller outbreaks, along with relatively small virus populations.
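The effect of board size can be illustrated with a very rough susceptible-infected model. This sketch is not taken from the simulation games mentioned above: the update formula, the `contact_rate` value, the population of one million machines, and the function name `simulate_outbreak` are all assumptions chosen only to show the trend.

```python
def simulate_outbreak(total, compatible_share, contact_rate=0.5, steps=30):
    """Return the number of infected machines after each step of a simple deterministic model."""
    compatible = total * compatible_share   # machines the virus can run on
    infected = 1.0                          # a single initial infection
    history = [infected]
    for _ in range(steps):
        # A new infection needs an infected machine to reach a compatible, still-clean one.
        new = contact_rate * infected * (compatible - infected) / total
        infected = min(compatible, infected + new)
        history.append(infected)
    return history


# Same population, different share of machines compatible with the virus.
homogeneous = simulate_outbreak(1_000_000, 0.95)
niche = simulate_outbreak(1_000_000, 0.05)
```

With the same contact rate and the same number of steps, the outbreak on the mostly compatible population grows far faster than the one confined to a small compatible share, which is the qualitative point of the game-board comparison.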

This sort of modeling clearly explains why we find major computer virus infections on operating systems such as Windows, which represents about 95% of the current PC population around us on a huge "grid." Of course this is not to say that 5% of computer systems are not enough to cause a global epidemic of some sort.

Note

If you are fascinated by self-replicating, self-repairing, and evolving structures, visit the BioWall project, http://lslwww.epfl.ch/biowall/index.html .

1.1.4. Core War: The Fighting Programs

Around 1966, Robert Morris, Sr., the future National Security Agency (NSA) chief scientist, decided to create a new game environment with two of his friends, Victor Vyssotsky and Dennis Ritchie, who coded the game and called it Darwin. (Morris, Jr. was the first infamous worm writer in the history of computer viruses. His mark on computer virus history will be discussed later in the book.)

The original version of Darwin was created for the PDP-1 (programmed data processing) at Bell Labs. Later, Darwin became Core War, a computer game that many programmers and mathematicians (as well as hackers) play to this day.

Note

I use the term hacker in its original, positive sense. I also believe that all good virus researchers are hackers in the traditional sense. I consider myself a hacker, too, but fundamentally different from malicious hackers who break into other people's computers.

The game is called Core War because the objective of the game is to kill your opponent's programs by overwriting them. The original game is played between two assembly programs written in the Redcode language. The Redcode programs run in the core of a simulated (that is, "virtual") machine named Memory Array Redcode Simulator (MARS). The actual fight between the warrior programs was referred to as Core Wars.

The original instruction set of Redcode consists of 10 simple instructions that allow movement of information from one memory location to another, which provides great flexibility in creating tricky warrior programs. A. K. Dewdney wrote several "Computer Recreations" articles in Scientific American13,14 that discussed Core War, beginning with the May 1984 article. Figure 1.7 is a screen shot of a Core War implementation called PMARSV, written by Albert Ma, Nándor Sieben, Stefan Strack, and Mintardjo Wangsaw. It is interesting to watch as the little warriors fight each other within the MARS environment.

Figure 1.7. Core Wars warrior programs (Dwarf and MICE) in battle.

As programs fight in the annual tournaments, certain warriors might become the King of the Hill (KotH). These are the Redcode programs that outperform their competitors.

The warrior program named MICE won the first tournament. Its author, Chip Wendell, received a trophy that incorporated a core-memory board from an early CDC 6600 computer14.

The simplest Redcode program consists of only one MOV instruction: MOV 0,1 (in the traditional syntax). This program is named IMP. It causes the contents at relative address 0 (namely the MOV, or move, instruction itself) to be transferred to relative address 1, just one address ahead of itself. After the instruction is copied to the new location, control is given to that address, executing the instruction, which, in turn, makes a new copy of itself at the next higher address, and so on. This happens naturally, because the instruction counter is incremented after each executed instruction, so execution always continues at the next higher address.
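The IMP's crawl through memory can be traced with a few lines of Python. This is a minimal sketch rather than a real MARS: it understands only the single instruction MOV 0,1, and the tuple representation of instructions and the name `run_imp` are assumptions of the example.

```python
CORE_SIZE = 8000                 # number of cells in the core
EMPTY = ("DAT", 0, 0)            # filler for cells that hold no program
IMP = ("MOV", 0, 1)              # MOV 0,1


def run_imp(steps, start=0, core_size=CORE_SIZE):
    """Execute `steps` instructions of a lone IMP and return the core and the instruction counter."""
    core = [EMPTY] * core_size
    core[start] = IMP
    pc = start
    for _ in range(steps):
        op, a, b = core[pc]
        if op != "MOV":
            raise RuntimeError("this sketch only knows how to execute MOV")
        # Copy the cell at relative address a to relative address b (addresses wrap around).
        core[(pc + b) % core_size] = core[(pc + a) % core_size]
        pc = (pc + 1) % core_size            # the instruction counter moves to the next cell
    return core, pc


core, pc = run_imp(5)
```

After five steps, cells 0 through 5 each hold a copy of MOV 0,1 and the instruction counter sits on the newest copy. Left alone, the IMP would eventually fill the whole core with itself.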

The basic core consisted of two warrior programs and 8,000 cells for instructions. Newer revisions of the game can run multiple warriors at the same time. Warrior programs are limited to a specific starting size, normally 100 instructions. Each program has a finite number of iterations; by default, this number is 80,000.

The original version of Redcode supported 10 instructions. Later revisions contain more. For example, the following 14 instructions are used in the 1994 revision, shown in Listing 1.1.

Listing 1.1. Core War Instructions in the 1994 Revision

DAT	data
MOV	move
ADD	add
SUB	subtract
MUL	multiply
DIV	divide
MOD	modulo
JMP	jump
JMZ	jump if zero
JMN	jump if not zero
DJN	decrement, jump if not zero
CMP	compare
SLT	skip if less than
SPL	split execution

Let's take a look at Dewdney's Dwarf tutorial (see Listing 1.2).

Listing 1.2. Dwarf Bombing Warrior Program

;name		Dwarf
;author		A. K. Dewdney
;version	94.1
;date		April 29, 1993
;strategy	Bombs every fourth instruction.
 
ORG	1	; Indicates execution begins with the second
		; instruction (ORG is not actually loaded, and is
		; therefore not counted as an instruction).
 
DAT.F	#0, #0		; Pointer to target instruction.
ADD.AB	#4, $-1		; Increments pointer by 4.
MOV.AB	#0, @-2		; Bombs target instruction.
JMP.A	$-2, #0		; Loops back two instructions.
 

Dwarf follows a so-called bombing strategy. The first few lines are comments indicating the name of the warrior program and its Redcode 1994 standard. Dwarf attempts to destroy its opponents by "dropping" DAT bombs into their operation paths. Because any warrior process that attempts to execute a DAT statement dies in the MARS, Dwarf will be a likely winner when it hits its opponents.

The MOV instruction is used to move information into MARS cells. (The IMP warrior explains this very clearly.) The general format of a Redcode command is of the Opcode A, B form. Thus, the command MOV.AB #0, @-2 will point to the DAT statement in Dwarf's code as a source.

The A field points to the DAT statement, as each instruction has an equivalent size of 1, and at 0, we find DAT #0, #0. Thus, MOV will copy the DAT instruction to where B points. So where does B point to now?

The B field also points to the DAT.F #0, #0 statement. Ordinarily, this would mean that the bomb would be put on top of this statement, but the @ symbol makes this an indirect pointer. In effect, the @ symbol says to use the contents of the location to which the B field points as a new pointer (destination). In this case, the B field appears to point to a value of 0 (location 0, where the DAT.F instruction is placed).

The first instruction to execute before the MOV, however, is an ADD instruction. When this ADD #4, $-1 is executed, the DAT's offset field will be incremented by four each time it is executed: the first time, it will be changed from 0 to 4, the next time from 4 to 8, and so on.

This is why, when the MOV command copies a DAT bomb, it will land four lines (locations) above the DAT statement (see Listing 1.3).

Listing 1.3. Dwarf's Code When the First Bomb Is Dropped

0	DAT.F #0, #8
1 -> ADD.AB #4, $-1
2	MOV.AB #0, @-2 ; launcher
3	JMP.A $-2, #0
4	DAT ; Bomb 1
5	.
6	.
7	.
8	DAT ; Bomb 2
9	.
 

The JMP.A $-2 instruction transfers control back relative to the current offset, that is, back to the ADD instruction to run the Dwarf program "endlessly." Dwarf will continue to bomb into the core at every four locations until the pointers wrap around the core and return. (After the highest number possible for the DAT location has been reached, it will "wrap" back around past 0. For example, if the highest possible value were 10, 10+1 would be 0, and 10+4 would be 3.)
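The wrap-around arithmetic is simply addition modulo the core size. The following Python sketch lists where Dwarf's bombs land, relative to its DAT cell at location 0. It models only the pointer update and the bomb drop, not a full MARS; the function name and parameters are assumptions made for illustration.

```python
def dwarf_bomb_addresses(rounds, core_size=8000, step=4, start=0):
    """Return the locations hit by successive Dwarf bombs.

    Addresses are relative to Dwarf's DAT cell (location 0) and wrap
    around at core_size, as described in the text.
    """
    pointer = start
    hits = []
    for _ in range(rounds):
        pointer = (pointer + step) % core_size   # ADD.AB #4, $-1
        hits.append(pointer)                     # MOV.AB #0, @-2 bombs here
        # JMP.A $-2 then loops back to the ADD instruction.
    return hits


if __name__ == "__main__":
    print(dwarf_bomb_addresses(3))                             # first three bombs
    print(dwarf_bomb_addresses(1, core_size=11, start=10))     # the 10+4 example
```

With an 8,000-cell core and a step of 4, every bomb lands on a multiple of 4, so locations 1 through 3, where Dwarf's own ADD, MOV, and JMP instructions sit, are never hit. With a highest address of 10 (a core of 11 cells), a pointer at 10 advanced by 4 lands on 3, matching the example above.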

At that point, Dwarf begins to bomb over its own bombs, until the end of 80,000 cycles/iterations or until another warrior acts upon it. At any time, another warrior program might easily kill Dwarf because Dwarf stays at a constant location so that it can avoid hitting itself with friendly fire. But in doing so, it exposes itself to attackers.

There are several common strategies in Core War, including scanning, replicating, bombing, IMP-spiral (those using the SPL instruction), and the interesting bomber variation named the vampire.

Dewdney also pointed out that programs can even steal their enemy warrior's very soul by hijacking a warrior execution flow. These are the so-called vampire warriors, which bomb JMP (JUMP) instructions into the core. By bombing with jumps, the enemy program's control can be hijacked to point to a new, predefined location where the hijacked warrior will typically execute useless code. Useless code will "burn" the cycles of the enemy warrior's execution threads, thus giving the vampire warrior an advantage.

Instead of writing computer viruses, I strongly recommend playing this harmless and interesting game. In fact, if worms fascinate you, a new version of Core War can be created to link battles in different networks and allow warrior programs to jump from one battle to another to fight new enemies on those machines. Evolving the game to be more networked allows for simulating worm-like warrior programs.

1.2. Genesis of Computer Viruses

Virus-like programs appeared on microcomputers in the 1980s. However, two frequently recounted precursors deserve mention here: Creeper from 1971-72 and John Walker's "infective" version of the popular ANIMAL game for UNIVAC15 in 1975.

Creeper and its nemesis, Reaper, the first "antivirus" for networked TENEX running on PDP-10s at BBN, were born during the early development of what became "the Internet."

Even more interestingly, ANIMAL was created on a UNIVAC 1100/42 mainframe computer running under the Univac 1100 series operating system, Exec-8. In January of 1975, John Walker (later founder of Autodesk, Inc. and co-author of AutoCAD) created a general subroutine called PERVADE16, which could be called by any program. When PERVADE was called by ANIMAL, it looked around for all accessible directories and made a copy of its caller program, ANIMAL in this case, to each directory to which the user had access. Programs used to be exchanged relatively slowly, on tapes at the time, but still, within a month, ANIMAL appeared at a number of places.

The first viruses on microcomputers were written on the Apple-II, circa 1982. Rich Skrenta17, who was a ninth-grade student at the time in Pittsburgh, Pennsylvania, wrote "Elk Cloner." He did not think the program would work well, but he coded it nonetheless. His friends found the program quite entertaining, unlike his math teacher, whose computer became infected with it. Elk Cloner had a payload that displayed Skrenta's poem after every 50th use of the infected disk when reset was pressed (see Figure 1.8). On every 50th boot, Elk Cloner hooked the reset handler; thus, only pressing reset triggered the payload of the virus.

Figure 1.8. Elk Cloner activates.


Not surprisingly, the friendship of the two ended shortly after the incident. Skrenta also wrote computer games and many useful programs at the time, and he still finds it amazing that he is best known for the "stupidest hack" he ever coded.

In 1982, two researchers at Xerox PARC18 performed other early studies with computer worms. At that time, the term computer virus was not used to describe these programs. In 1984, mathematician Dr. Frederick Cohen19 introduced this term, thereby becoming the "father" of computer viruses with his early studies of them. Cohen introduced the term computer virus based on the recommendation of his advisor, Professor Leonard Adleman20, who picked the name from science fiction novels.

1.3. Automated Replicating Code: The Theory and Definition of Computer Viruses

Cohen provided a formal mathematical model for computer viruses in 1984. This model used a Turing machine. In fact, Cohen's formal mathematical model for a computer virus is similar to von Neumann's self-replicating cellular automata model. We could say that, in the von Neumann sense, a computer virus is a self-reproducing cellular automaton. The mathematical model does not have much practical use for today's researcher. It is a rather general description of what a computer virus is. However, the mathematical model provides a significant theoretical foundation for the computer virus problem.

Here is Cohen's informal definition of a computer virus: "A virus is a program that is able to infect other programs by modifying them to include a possibly evolved copy of itself."

This definition provides the important properties of a computer virus, such as the possibility of evolution (the capability to make a modified copy of the same code with mutations). However, it might also be a bit misleading if applied in its strictest sense.

This is, by no means, to criticize Cohen's groundbreaking model. It is difficult to provide a precise definition because there are so many different kinds of computer viruses nowadays. For instance, some forms of computer viruses, called companion viruses, do not necessarily modify the code of other programs. They do not strictly follow Cohen's definition because they do not need to include a copy of themselves within other programs. Instead, they make devious use of the program's environment (properties of the operating system) by placing themselves with the same name ahead of their victim programs on the execution path. This can create a problem for behavior-blocking programs that attempt to block malicious actions of other programs, if the authors of such blockers strictly apply Cohen's informal definition. In other words, if such blocking programs are looking only for viruses that make unwanted changes to the code of another program, they will miss companion viruses.
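From the defender's side, the companion trick shows up as more than one program answering to the same name on the execution path. The Python sketch below flags such name collisions in an in-memory description of the path. It is a minimal illustration, not a detection method taken from any product: the `find_shadowed` name, the data layout, and the assumed lookup order of .COM before .EXE before .BAT within a directory are all assumptions.

```python
from collections import defaultdict

# Assumed lookup precedence within one directory (illustrative).
EXT_ORDER = [".COM", ".EXE", ".BAT"]


def find_shadowed(search_path):
    """Report program names that resolve to more than one file.

    search_path is an ordered list of (directory, [file names]).
    Returns {base name: [candidates in resolution order]}; the first
    candidate is the one that would run, the rest are shadowed by it.
    """
    candidates = defaultdict(list)
    for directory, names in search_path:
        by_ext = defaultdict(list)
        for name in names:
            base, dot, ext = name.upper().rpartition(".")
            if dot and "." + ext in EXT_ORDER:
                by_ext["." + ext].append(base)
        for ext in EXT_ORDER:
            for base in sorted(by_ext[ext]):
                candidates[base].append(f"{directory}\\{base}{ext}")
    return {base: found for base, found in candidates.items() if len(found) > 1}


if __name__ == "__main__":
    path = [
        ("C:\\TOOLS", ["EDIT.COM"]),
        ("C:\\DOS", ["EDIT.EXE", "FORMAT.COM"]),
    ]
    print(find_shadowed(path))
```

A collision is only a hint. Legitimate software can ship the same program name in several places, so a real tool would need additional evidence before raising an alarm.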

Note

Cohen's mathematical formulation properly encompasses companion viruses; it is only the literal interpretation of the single-sentence human language definition that is problematic. A single-sentence linguistic definition of viruses is difficult to come up with.

Integrity checker programs also rely on the fact that one program's code remains unchanged over time. Such programs rely on a database (created at some initial point in time) assumed to represent a "clean" state of the programs on a machine. Integrity checker programs were Cohen's favorite defense method and my own in the early '90s. However, it is easy to see that the integrity checker would be challenged by companion viruses unless the integrity checker also alerted the user about any new application on the system. Cohen's own system properly performed this. Unfortunately, the general public does not like to be bothered each time a new program is introduced on their systems, but Cohen's approach is definitely the safest technique to use.
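The idea can be sketched in Python as a baseline of checksums that is later compared with the current state. This is a minimal illustration, not Cohen's system: the function names, the in-memory file table, and the choice of SHA-256 are assumptions. Reporting added programs, not only changed ones, is what lets the checker notice a companion.

```python
import hashlib


def snapshot(files):
    """Map each path to the SHA-256 digest of its contents.

    files is {path: bytes}; a real tool would read from disk instead.
    """
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare(baseline, current):
    """Compare two snapshots and report changed, added, and removed paths."""
    changed = sorted(p for p in current if p in baseline and current[p] != baseline[p])
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    return {"changed": changed, "added": added, "removed": removed}


if __name__ == "__main__":
    clean = snapshot({"EDIT.EXE": b"original program"})
    later = snapshot({"EDIT.EXE": b"original program", "EDIT.COM": b"newcomer"})
    print(compare(clean, later))
```

In this usage example the existing program is byte-for-byte unchanged, so a checker that looked only at the "changed" list would stay silent. The new file appears only in the "added" list.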

Dr. Cohen's definition does not differentiate programs explicitly designed to copy themselves (the "real viruses," as we call them) from programs that can copy themselves as a side effect of being general-purpose copying programs (compilers and so on).

Indeed, in the real world, behavior-blocking defense systems often alarm in such a situation. For instance, Norton Commander, the popular command shell, might be used to copy the commander's own code to another hard drive or network resource. This action might be confused with self-replicating code, especially if the folder in which the copy is made has a previous version of the program that we overwrite to upgrade it. Though such "false alarms" are easily dealt with, they will undoubtedly annoy end users.

Taking these points into consideration, a more accurate definition of a computer virus would be the following: "A computer virus is a program that recursively and explicitly copies a possibly evolved version of itself."

There is no need to specify how the copy is made, and there is no strict need to "infect" or otherwise modify another application or host program. However, most computer viruses do indeed modify another program's code to take control. Blocking such an action, then, considerably reduces the possibility for viruses to spread on the system.

As a result, there is always a host, an operating system, or another kind of execution environment, such as an interpreter, in which a particular sequence of symbols behaves as a computer virus and replicates itself recursively.

Computer viruses are self-automated programs that, against the user's wishes, make copies of themselves to spread themselves to new targets. Although particular computer viruses ask the user with prompts before they infect a machine, such as, "Do you want to infect another program? (Y/N?)," this does not make them non-viruses. Often, novice researchers in computer virus labs believe otherwise, and they actually argue that such programs are not viruses. Obviously, they are wrong!

When attempting to classify a particular program as a virus, we need to ask the important question of whether a program is able to replicate itself recursively and explicitly. A program cannot be considered a computer virus if it needs any help to make a copy of itself. This help might include modifying the environment of such a program (for example, manually changing bytes in memory or on a disk) or, heaven forbid, applying a hot fix to the intended virus code itself using a debugger! Instead, nonworking viruses should be classified as intended viruses.

The copy in question does not have to be an exact clone of the initial instance. Modern computer viruses, especially so-called metamorphic viruses (further discussed in Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits"), can rewrite their own code in such a way that the starting sequence of bytes responsible for the copy of such code will look completely different in subsequent generations but will perform the equivalent or similar functionality.

References

1. Donald E. Knuth, The Art of Computer Programming, 2nd Edition, Addison-Wesley, Reading, MA, 1973, 1968, ISBN: 0-201-03809-9 (Hardcover).

2. John von Neumann, "The General and Logical Theory of Automata," Hixon Symposium, 1948.

3. John von Neumann, "Theory and Organization of Complicated Automata," Lectures at the University of Illinois, 1949.

4. John von Neumann, "The Theory of Automata: Construction, Reproduction, Homogeneity," Unfinished manuscript, 1953.

5. William Poundstone, Prisoner's Dilemma, Doubleday, New York, ISBN: 0-385-41580-X (Paperback), 1992.

6. Eli Bachmutsky, "Self-Replication Loops in Cellular Space," http://necsi.org:16080/postdocs/sayama/sdrs/java.

7. Robert A. Freitas, Jr. and William B. Zachary, "A Self-Replicating, Growing Lunar Factory," Fifth Princeton/AIAA Conference, May 1981.

8. Robert A. Freitas, Jr., "Some Limits to Global Ecophagy by Biovorous Nanoreplicators, with Public Policy Recommendations," http://www.foresight.org/nanorev/ecophagy.html.

9. György Marx, A Természet Játékai, Ifjúsági Lap és Könyvterjesztő Vállalat, Hungary, 1982, ISBN: 963-422-674-4 (Hardcover).

10. Martin Gardner, "Mathematical Games: The Fantastic Combinations of John Conway's New Solitaire Game 'Life,'" Scientific American, October 1970, pp. 120-123.

11. Edwin Martin, "John Conway's Game of Life," http://www.bitstorm.org/gameoflife (Java version is available).

12. Carey Nachenberg, "Computer Virus-Antivirus Coevolution," Communications of the ACM, January 1997, Vol. 40, No. 1, pp. 46-51.

13. A. K. Dewdney, The Armchair Universe: An Exploration of Computer Worlds, New York: W. H. Freeman, 1988, ISBN: 0-7167-1939-8 (Paperback).

14. A. K. Dewdney, The Magic Machine: A Handbook of Computer Sorcery, New York: W. H. Freeman, 1990, ISBN: 0-7167-2125-2 (Hardcover), 0-7167-2144-9 (Paperback).

15. John Walker, "ANIMAL," http://fourmilab.ch/documents/univac/animal.html.

16. John Walker, "PERVADE," http://fourmilab.ch/documents/univac/pervade.html.

17. Rich Skrenta, http://www.skrenta.com.

18. John Shoch and Jon Hupp, "The Worm Programs, Early Experience with a Distributed Computation," Communications of the ACM, Volume 25, 1982, pp. 172-180.

19. Dr. Frederick B. Cohen, A Short Course on Computer Viruses, Wiley Professional Computing, New York, 2nd edition, 1994, ISBN: 0471007684 (Paperback).

20. Vesselin Vladimirov Bontchev, "Methodology of Computer Anti-Virus Research," University of Hamburg Dissertation, 1998.

Chapter 2. The Fascination of Malicious Code Analysis

"The Lion looked at Alice wearily. 'Are you animal, or vegetable, or mineral?' he said, yawning at every other word."

Lewis Carroll (1832-1898), Through the Looking-Glass and What Alice Found There (1871).


For people who are interested in nature, it is difficult to find a subject more fascinating than computer viruses. Computer virus analysis can be extremely difficult for most people at first glance. However, the difficulty depends on the actual virus code in question. Binary forms of viruses, those compiled to object code, must be reverse-engineered to understand them in detail. This process can be challenging for an individual, but it provides a great deal of knowledge about computer systems.

My own interest in computer viruses began in September of 1990, when my new PC clone displayed a bizarre message, followed by two beeps. The message read

"Your PC is now Stoned!"

I had heard about computer viruses before, but this was my first experience with one of these incredible nuisances. Considering that my PC was two weeks old at the time, I was fascinated by how quickly I encountered a virus on it. I had introduced the Stoned boot virus with an infected diskette, which contained a copy of a popular game named Jbird. A friend had given me the game. Obviously he did not know about the hidden "extras" stored on the diskette.

I did not have antivirus software at the time, of course, and because this incident happened on a Saturday, help was not readily available. The PC clone had cost me five months' worth of my summer salary, so you can imagine my disappointment!

I was worried that I was going to lose all the data on my system. I remembered an incident that had happened to a friend in 1988: His PC was infected with a virus, causing characters to fall randomly down his computer screen; after a while, he could not do anything with the machine. He had told me that he needed to format the drive and reinstall all the programs.

Later, we learned that a strain of the Cascade virus had infected his computer. Cascade could have been removed from his system without formatting the hard drive, but he did not know that at the time. Unfortunately, as a result, he lost all his data. Of course I wanted to do the exact opposite on my machine: remove the virus without losing my data.

To find the Stoned virus, I first searched the files on the infected diskette for the text that was displayed on the screen. I was not lucky enough to find any files that contained it. If I had had more experience in hunting viruses at the time, I might have considered the possibility that the virus was encrypted in a file. But this virus was not encrypted, and my instinct about a nonfile system hiding place was heading in the right direction.

This gave me the idea that the virus was not stored in the files but instead was located somewhere else on the diskette. I had Peter Norton's book, Programmer's Guide to the IBM PC, on hand. Up to this point, I had read only a few pages of it, but luckily the book described how the boot sector of diskettes could be accessed using a standard DOS tool called DEBUG.

After some hesitation, I finally executed the DEBUG command for the first time to try to look into the boot sector of the diskette, which was inserted in drive A. The command was the following:

DEBUG
-L 100 0 0 1
 

This command instructs DEBUG to load the first sector (the boot sector) from drive A: to memory at offset 100 hexadecimal. When I used the dump (D) command of DEBUG to display the loaded sector's content, I saw the virus's message, as well as some other text.

-d280
1437:0280 03 33 DB FE C1 CD 13 EB-C5 07 59 6F 75 72 20 50 .3........Your P
1437:0290 43 20 69 73 20 6E 6F 77-20 53 74 6F 6E 65 64 21 C is now Stoned!
1437:02A0 07 0D 0A 0A 00 4C 45 47-41 4C 49 53 45 20 4D 41 .....LEGALISE MA
1437:02B0 52 49 4A 55 41 4E 41 21-00 00 00 00 00 00 00 00 RIJUANA!........
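The same inspection can be reproduced away from DOS. The Python sketch below takes the 64 bytes shown in the dump above, prints them in a DEBUG-like layout, and extracts the runs of printable ASCII characters. The function names and the minimum string length of four characters are assumptions made for illustration; reading a real sector from a disk or image file is left out.

```python
import re

# The 64 bytes shown in the DEBUG dump above (offsets 0280-02BF).
SAMPLE = bytes.fromhex(
    "0333DBFEC1CD13EBC507596F75722050"
    "43206973206E6F772053746F6E656421"
    "070D0A0A004C4547414C495345204D41"
    "52494A55414E41210000000000000000"
)


def hexdump(data, base=0):
    """Format data 16 bytes per line: offset, hex bytes, printable text."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hexpart = " ".join(f"{b:02X}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{base + off:04X}  {hexpart:<47}  {text}")
    return lines


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(b"[\\x20-\\x7e]{" + str(min_len).encode() + b",}")
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    print("\n".join(hexdump(SAMPLE, base=0x280)))
    print(printable_strings(SAMPLE))
```

Searching raw sectors for readable strings in this way works only because the message is stored as plain text; an encrypted virus body would not reveal itself so easily.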
 

You can imagine how excited I was to find the virus. Finally, it was right there in front of me! I spent the weekend reading more of the Norton book because I did not understand the virus's code at all. I simply did not know IBM PC Assembly language at the time, which was required to understand the code. There were so many things to learn!

The Norton book introduced me to a substantial amount of the information I needed to begin. For example, it provided detailed and superb descriptions of the boot process, disk structures, and various interrupts of the DOS and basic input-output system (BIOS) routines.

I spent a few days analyzing Stoned on paper and commenting every single Assembly instruction until I understood everything. It took me almost a full week to absorb all the information, but, sadly, my computer was still infected with the virus.

After a few more days of work, I created a detection program, then a disinfection program for the virus, which I wrote in Turbo Pascal. The disinfection program was able to remove the virus from all over: from the system memory as well as from the boot and Master boot sectors in which the virus was stored.

A couple of days later, I visited the university with my virus detector and found that the virus had infected more than half of the PC labs' machines. I was amazed at how successfully this simple virus code could invade machines around the world. I could not fathom how the virus had traveled all the way from New Zealand where, I learned later, it had been released in early 1988, to Hungary to infect my system.

The Stoned virus was in the wild. (IBM researcher Dave Chess coined the term in the wild to describe computer viruses that were encountered on production systems. Not all viruses are in the wild. The viruses that only collectors or researchers have seen are named zoo viruses.)

People welcomed the help, and I was happy because I wanted to assist them and learn more about virus hunting. I started to collect viruses from friends and wrote disinfection programs for them. Viruses such as Cascade, Vacsina, Yankee_Doodle, Vienna, Invader, Tequila, and Dark_Avenger were among the first set that I analyzed in detail, and I wrote detection and disinfection code for them one by one.

Eventually, my work culminated in a diploma, and my antivirus program became a popular shareware in Hungary. I named my program Pasteur after the French microbiologist Louis Pasteur.

All my efforts and experiences opened up a career for me in antivirus research and development. This book is designed to share my knowledge of computer virus research.

2.1. Common Patterns of Virus Research

Computer virus analysis has some common patterns that can be learned easily, lending efficiency to the analysis process. There are several techniques that computer virus researchers use to reach their ultimate goal, which is to acquire a precise understanding of viral programs in a timely manner to provide appropriate prevention and to respond so that computer virus outbreaks can be controlled.

Virus researchers also need to identify and understand particular vulnerabilities and malicious code that exploits them. Vulnerability and exploit research has its own common patterns and techniques. Some of these are similar to the methods of computer virus research, but many key differences exist.

This book will introduce these useful techniques to teach you how to deal with viral programs more efficiently. Along the way, you will learn how to analyze a computer virus more effectively and safely by using disassemblers, debuggers, emulators, virtual machines, file dumpers, goat files, dedicated virus replication machines and systems, virus test networks, decryption tools, unpackers, and many other useful tools. You can use this information to deal with computer virus problems more effectively on a daily basis.

You also will learn how computer viruses are classified and named, as well as a great deal about state-of-the-art computer virus tricks.

Computer virus source code is not discussed in this book. Discussions on this topic are unethical and in some countries, illegal1. More importantly, writing even a dozen viruses would not make you an expert on this subject.

Some virus writers2 believe that they are experts because they created a single piece of code that replicates itself. This assumption could not be further from the truth. Although some virus writers might be very knowledgeable individuals, most of them are not experts on the subject of computer viruses. The masterminds who arguably at various times represented the state of the art in computer virus writing go (or went) by aliases such as Dark Avenger3, Vecna, Jacky Qwerty, Murkry, Sandman, Quantum, Spanska, GriYo, Zombie, roy g biv, and Mental Driller.

2.2. Antivirus Defense Development

Initially, developing antivirus software programs was not difficult. In the late '80s and early '90s, many individuals were able to create some sort of antivirus program against a particular form of a computer virus.

Frederick Cohen proved that antivirus programs cannot solve the computer virus problem because there is no way to create a single program that can detect all future computer viruses in finite time. Regardless of this proven fact, antivirus programs have been quite successful in dealing with the problem for a while. At the same time, other solutions have been researched and developed, but computer antivirus programs are still the most widely used defenses against computer viruses at present, regardless of their many drawbacks, including the inability to contend with and solve the aforementioned problem.

Perhaps under the delusion that they are experts on computer viruses, some security analysts state that any sort of antivirus program is useless if it cannot find all the new viruses. However, the reality is that without antivirus programs, the Internet would be brought to a standstill because of the traffic undetected computer viruses would generate.

Often we do not completely understand how to protect ourselves against viruses, but neither do we know how to reduce the risk of becoming infected by them by adopting proper hygiene habits. Unfortunately, negligence is one of the biggest contributors to the spread of computer viruses. The sociological aspects of computer security appear to be more relevant than technology. Carelessly neglecting the most minimal level of computer maintenance and network security configuration, and failing to clean an infected computer, opens up a Pandora's box that allows more problems to spread to other computers.

In the early phases of virus detection and removal, computer viruses were easily managed because very few viruses existed (there were fewer than 100 known strains in 1990). Computer virus researchers could spend weeks analyzing a single virus alone. To make life even easier, computer viruses spread slowly, compared to the rapid proliferation of today's viruses. For example, many successful boot viruses were 512 bytes long (the size of the boot sector on the IBM PC), and they often took a year or longer to travel from one country to another. Consider this: The spread time at which a computer virus traveled in the past compared to today's virus spread time is analogous to comparing the speed of message transfer in ancient times, when messengers walked or ran from city to city to deliver parcels, with today's instant message transfer, via e-mail, with or without attachments.

Finding a virus in the boot sector was easy for those who knew what a boot sector was; writing a program to recognize the infection was tricky. Manually disinfecting an infected system was a true challenge in and of itself, so creating a program that automatically removed viruses from computers was considered a tremendous achievement. Currently, the development of antivirus and security defense systems is deemed an art form, which lends itself to cultivating and developing a plethora of useful skills. However, natural curiosity, dedication, hard work, and the continuous desire to learn often supersede mere hobbyist curiosity and are thus essential to becoming a master of this artistic and creative vocation.

2.3. Terminology of Malicious Programs

The need to define a unified nomenclature for malicious programs is almost as old as computer viruses themselves4. Obviously, each classification has a common pitfall because classes will always appear to overlap, and classes often represent closely related subclasses of each other.

2.3.1. Viruses

As defined in Chapter 1, "Introduction to the Games of Nature," a computer virus is code5 that recursively replicates a possibly evolved copy of itself. Viruses infect a host file or system area, or they simply modify a reference to such objects to take control and then multiply again to form new generations.

2.3.2. Worms

Worms are network viruses, primarily replicating on networks. Usually a worm will execute itself automatically on a remote machine without any extra help from a user. However, there are worms, such as mailer or mass-mailer worms, that will not always automatically execute themselves without the help of a user.

Worms are typically standalone applications without a host program. However, some worms, like W32/Nimda.A@mm, also spread as a file-infector virus and infect host programs, which is precisely why the easiest way to approach and contain worms is to consider them a special subclass of virus. If the primary vector of the virus is the network, it should be classified as a worm.

2.3.2.1 Mailers and Mass-Mailer Worms

Mailers and mass-mailer worms comprise a special class of computer worms, which send themselves in an e-mail. Mass-mailers, often referred to as "@mm" worms such as VBS/Loveletter.A@mm, send multiple e-mails including a copy of themselves once the virus is invoked.

Mailers will send themselves less frequently. For instance, a mailer such as W32/SKA.A@m (also known as the Happy99 worm) sends a copy of itself every time the user sends a new message.

2.3.2.2 Octopus

An octopus is a sophisticated kind of computer worm that exists as a set of programs on more than one computer on a network.

For example, head and tail copies are installed on individual computers that communicate with each other to perform a function. An octopus is not currently a common type of computer worm but will likely become more prevalent in the future. (Interestingly, the idea of the octopus comes from the science fiction novel Shockwave Rider by John Brunner. In the story, the main character, Nickie, is on the run and uses various identities. Nickie is a phone phreak, and he uses a "tapeworm," similar to an octopus, to erase his previous identities.)

2.3.2.3 Rabbits

A rabbit is a special computer worm that exists as a single copy of itself at any point in time as it "jumps around" on networked hosts. Other researchers use the term rabbit to describe crafty, malicious applications that usually run themselves recursively to fill memory with their own copies and to slow down processing time by consuming CPU time. Such malicious code uses too much memory and thus can cause serious side effects on a machine within other applications that are not prepared to work under low-memory conditions and that unexpectedly cease functioning.

2.3.3. Logic Bombs

A logic bomb is a programmed malfunction of a legitimate application. An application, for example, might delete itself from the disk after a couple of runs as a copy protection scheme; a programmer might want to include some extra code to perform a malicious action on certain systems when the application is used. These scenarios are realistic when dealing with large projects driven by limited code-reviews.

An example of a logic bomb can be found in the original version of the popular Mosquitos game on Nokia Series 60 phones. This game has a built-in function to send a message using the Short Message Service (SMS) to premium rate lines. The functionality was built into the first version of the game as a software distribution and piracy protection scheme, but it backfired [6]. When legitimate users complained to the software vendor, the routine was eliminated from the code of the game. The premium lines have been "disconnected" as well. However, pirated versions of the game that still contain the logic bomb and send regular SMS messages remain in circulation. The game used four premium SMS phone numbers (4636, 9222, 33333, and 87140), which corresponded to four countries. For example, the number 87140 corresponded to the UK. When the game used this number, it sent the text "king.001151183" as a short message. In turn, the user of the game was charged a hefty £1.5 per message.

Often extra functionality is hidden as resources in the application and remains hidden. In fact, the way in which these functions are built into an application is similar to the way so-called Easter eggs are making headway into large projects. Programmers create Easter eggs to hide some extra credit pages for team members who have worked on a project.

Applications such as those in the Microsoft Office suite have many Easter eggs hidden within them, and other major software vendors have had similar credit pages embedded within their programs as well. Although Easter eggs are not malicious and do not threaten end users (even though they might consume extra space on the hard drive), logic bombs are always malicious.

2.3.4. Trojan Horses

Perhaps the simplest kind of malicious program is a Trojan horse. Trojan horses try to appeal to and interest the user with some useful functionality to entice the user to run the program. In other cases, malicious hackers leave behind Trojanized versions of real tools to camouflage their activities on a computer, so they can retrace their steps to the compromised system and perform malicious activities later.

For example, on UNIX-based systems, hackers often leave a modified version of "ps" (a tool to display a process list) to hide a particular process ID (PID), which can relate to another backdoor Trojan's process. Later on, it might be difficult to find such changes on a compromised system. These kinds of Trojans are often called user mode rootkits.

The attacker can easily manipulate the tool by modifying the source code of the original tool at a certain location. At first glance, this minor modification is extremely difficult to locate.
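One way a defender might notice a replaced tool such as "ps" is to compare file hashes against a baseline recorded while the system was known to be clean. The sketch below is only an illustration of that idea and is not taken from the text; the function names and the baseline format (a mapping from path to SHA-256 hex digest) are assumptions. It also assumes the baseline is stored somewhere the attacker cannot modify and that the code reading the files has not itself been subverted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_files(baseline: dict[str, str]) -> list[str]:
    """Return the paths whose current hash differs from the baseline hash.

    `baseline` is a hypothetical mapping of file path -> SHA-256 hex digest
    recorded earlier on a clean system.
    """
    return [
        path
        for path, expected in baseline.items()
        if sha256_of(Path(path)) != expected
    ]
```

A typical use would be to record `{"/bin/ps": sha256_of(Path("/bin/ps"))}` at installation time and call `modified_files` on that mapping later.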

Probably the most famous Trojan horse is the AIDS TROJAN DISK [7] that was sent to about 7,000 research organizations on a diskette. When the Trojan was introduced on the system, it scrambled the names of all files (except a few) and filled the empty areas of the disk completely. The program offered a recovery solution in exchange for a bounty. Thus, malicious cryptography was born. The author of the Trojan horse was captured shortly after the incident. Dr. Joseph Popp, 39 at the time, a zoologist from Cleveland, Ohio, was prosecuted in the UK [8].

The filename scrambling function of AIDS TROJAN DISK was based on two substitution tables [9]. One was used to encrypt the filenames and another to encrypt the file extensions. At some point in the history of cryptography [10], such an algorithm was considered unbreakable [11]. However, substitution ciphers are easily attacked with statistical methods (the distribution of common words). In addition, given enough time, the defender can disassemble the Trojan's code and extract the tables from it.
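The weakness of a table-based scheme can be sketched in a few lines. The alphabet, the seeds, and the two tables below are made up for illustration and are not the tables used by the AIDS Trojan; the point is only that a substitution table is a one-to-one mapping, so a defender who recovers it can invert it and restore every name. Note also that identical plaintext characters always map to the same output character, which is what makes statistical attacks possible.

```python
import random
import string

# Hypothetical alphabet and tables for illustration only; these are NOT the
# tables used by the AIDS Trojan.
ALPHABET = string.ascii_uppercase + string.digits


def make_table(seed: int) -> dict[str, str]:
    """Build a fixed one-to-one substitution table over ALPHABET."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def substitute(text: str, table: dict[str, str]) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table: dict[str, str]) -> dict[str, str]:
    """A substitution table is a bijection, so swapping keys and values undoes it."""
    return {cipher: plain for plain, cipher in table.items()}


NAME_TABLE = make_table(1)  # one table for the filename
EXT_TABLE = make_table(2)   # another table for the extension


def scramble(filename: str) -> str:
    """Scramble the name and the extension with their respective tables."""
    name, dot, ext = filename.partition(".")
    return substitute(name, NAME_TABLE) + dot + substitute(ext, EXT_TABLE)


def recover(filename: str) -> str:
    """Undo scramble() using the inverted tables."""
    name, dot, ext = filename.partition(".")
    return (
        substitute(name, invert(NAME_TABLE))
        + dot
        + substitute(ext, invert(EXT_TABLE))
    )
```

For any name drawn from the alphabet, `recover(scramble(name))` returns the original name, which is the situation a defender is in once the tables have been picked out of the disassembled code.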

There are two kinds of Trojans:

Note

The source code of Windows NT and Windows 2000 got into circulation in early 2004. It is expected that backdoor and rootkit programs will be created using these sources.

2.3.4.1 Backdoors (Trapdoors)

A backdoor is the malicious hacker's tool of choice that allows remote connections to systems. A typical backdoor opens a network port (UDP/TCP) on the host when it is executed. Then, the listening backdoor waits for a remote connection from the attacker and allows the attacker to connect to the system. This is the most common type of backdoor functionality, which is often mixed with other Trojan-like features.
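Because a typical backdoor shows up as an unexpected listening port, a defender can look for one by attempting connections to the host. The following is a minimal sketch of that check for TCP only, using the standard `socket` module; the function name and the choice of ports to probe are assumptions, and it does not cover UDP listeners or backdoors that hide their port from the local system.

```python
import socket


def listening_tcp_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the ports from `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

Comparing the result for a machine you administer against the list of services you expect to be running highlights ports that deserve a closer look.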

Another kind of backdoor relates to a program design flaw. Some applications, such as the early implementation of SMTP (simple mail transfer protocol) allowed features to run a command (for example, for debugging purposes). The Morris Internet worm uses such a command to execute itself remotely, with the command placed as the recipient of the message on such vulnerable installations. Fortunately, this command was quickly removed once the Morris worm exploited it. However, there can be many applications, especially newer ones, that allow for similar insecure features.

2.3.4.2 Password-Stealing Trojans

Password-stealing Trojans are a special subclass of Trojans. This class of malicious program is used to capture and send a password to an attacker. As a result, an attacker can return to the vulnerable system and take whatever he or she wants. Password stealers are often combined with keyloggers to capture keystrokes when the password is typed at logon.

2.3.5. Germs

Germs are first-generation viruses in a form that the virus's usual infection process cannot produce. Usually, when the virus is compiled for the first time, it exists in a special form and normally does not have a host program attached to it. Germs will not have the usual marks that most viruses use in second-generation form to flag infected files to avoid reinfecting an already infected object.

A germ of an encrypted or polymorphic virus is usually not encrypted but is plain, readable code. Detecting germs might need to be done differently from detecting second- and later-generation infections.

2.3.6. Exploits

Exploit code is specific to a single vulnerability or set of vulnerabilities. Its goal is to run a program on a (possibly remote, networked) system automatically or provide some other form of more highly privileged access to the target system. Often, a single attacker builds exploit code and shares it with others. "White hat" hackers create a form of exploit code for penetration (or "pen") testing. Therefore, depending on the actual use of the exploit, the exploitation might be malicious in some cases but harmless in others; the severity of the threat depends on the intention of the attacker.

2.3.7. Downloaders

A downloader is yet another malicious program that installs a set of other items on a machine that is under attack. Usually, a downloader is sent in e-mail, and when it is executed (sometimes aided with the help of an exploit), it downloads malicious content from a Web site or other location and then extracts and runs its content.

2.3.8. Dialers

Dialers got their relatively early start during the heyday of dial-up connections to bulletin board systems (BBSs) in homes. The concept driving a dialer is to make money for the people behind the dialer by having its users (often unwitting victims) call via premium-rate phone numbers. Thus, the person who runs the dialer might know the intent of the application, but the user is not aware of the charges. A common form of dialer is the so-called porn dialer.

Similar approaches exist on the World Wide Web using links to Web pages that connect to paid services.

2.3.9. Droppers

The original term refers to an "installer" for first-generation virus code. For example, boot viruses that first exist as compiled files in binary form are often installed in the boot sector of a floppy using a dropper. The dropper writes the germ code to the boot sector of the diskette. Then the virus can replicate on its own without ever generating the dropper form again.

When the virus regenerates the dropper form, the intermediate form is part of an infection cycle, which is not to be confused with a dedicated (or pure) dropper.

2.3.10. Injectors

Injectors are special kinds of droppers that usually install virus code in memory. An injector can be used to inject virus code in an active form on a disk interrupt handler. Then, the first time a user accesses a diskette, the virus begins to replicate itself normally.

A special kind of injector is the network injector. Attackers also can use legitimate utilities, such as NetCat (NC), to inject code into the network. Usually, a remote target is specified, and the datagram is sent to the machine that will be attacked using the injector. An attacker initially introduced the CodeRed worm using an injector; subsequently, the worm replicated as data on the network without ever hitting the disk again as a file.

Injectors are often used in a process called seeding. Seeding is a process that is used to inject virus code to several remote systems to cause an initial outbreak that is large enough to cause a quick epidemic. For example, there is supporting digital evidence that the W32/Witty worm [12] was seeded to several systems by its author.

2.3.11. Auto-Rooters

Auto-rooters are usually malicious hacker tools used to break into new machines remotely. Auto-rooters typically use a collection of exploits that they execute against a specified target to "gain root" on the machine. As a result, a malicious hacker (typically a so-called script-kiddie) gains administrative privileges to the remote machine.

2.3.12. Kits (Virus Generators)

Virus writers developed kits, such as the Virus Creation Laboratory (VCL) or PS-MPC generators, to generate new computer viruses automatically, using a menu-based application. With such tools, even novice users were able to develop harmful computer viruses without too much background knowledge. Some virus generators exist to create DOS, macro, script, or even Win32 viruses and mass-mailing worms. As discussed in Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits," the so-called "Anna Kournikova" virus (technically VBS/VBSWG.J) was created by a Dutch teenager, Jan de Wit, from the VBSWG kit. Sadly, de Wit got lucky and the kit, infamous for churning out mainly broken, intended code, produced a working virus. De Wit was subsequently arrested, convicted, and sentenced for his role in this.

2.3.13. Spammer Programs

Vikings: Spam spam spam spam
Waitress: ...spam spam spam egg and spam; spam spam spam spam spam baked beans spam spam spam...
Vikings: Spam! Lovely spam! Lovely spam!

Monty Python Spam Song


Spammer programs are used to send unsolicited messages to Instant Messaging groups, newsgroups, or any kind of mobile device in the form of e-mail or cell phone SMS messages.

Two lawyers helped to make spam an international, albeit notorious, superstar of the worldwide Internet virus scene. Their main objective was to send advertisements to Internet newsgroups. Spam mail has become the number one Internet nuisance for the global community. Many e-mail users complain that their inbox is littered with more than 70% spam each day. This ratio has been on the rise for the last couple of years.

The primary motivation of spammers is to make money by generating traffic to Web sites. In addition, spam messages are often used to implement phishing attacks. For example, you might receive an e-mail message asking you to visit your bank's Web site and telling you that if you don't, they will disable your account. There is a link in the e-mail, however, that forwards you to the fraudster. If you fall victim to the attack, you might hand personal information to the attacker on a silver platter. The fraudster wants to get your credit card number, account number, password, PIN (personal identification number), and other personal information to make money. In addition, you might become the prime subject of an identity theft as well.

2.3.14. Flooders

Malicious hackers use flooders to attack networked computer systems with an extra load of network traffic to carry out a denial of service (DoS) attack. When the DoS attack is performed simultaneously from many compromised systems (so-called zombie machines), the attack is called a distributed denial of service (DDoS) attack. Of course, there are much more sophisticated DoS attacks including SYN floods, packet fragmentation attacks, and other (mis-)sequencing attacks, traffic amplification, or traffic deflection, just to name the most common types.

2.3.15. Keyloggers

A keylogger captures keystrokes on a compromised system, collecting sensitive information for the attacker. Such sensitive information might include names, passwords, PINs, birthdays, Social Security numbers, or credit card numbers. The keylogger is installed on the system. Unbeknownst to the user, a computer could be compromised for weeks before the attack is ever noticed. Attackers often use keyloggers to commit identity theft.

2.3.16. Rootkits

Rootkits are a special set of hacker tools that are used after the attacker has broken into a computer system and gained root-level access. Usually, hackers break into a system with exploits and install modified versions of common tools. Such rootkits are called user-mode rootkits because the Trojanized application runs in user mode.

Some more sophisticated rootkits, such as Adore [13], have kernel-mode module components. These rootkits are more dangerous because they change the behavior of the kernel. Thus, they can hide objects from even kernel-level defense software. For example, they can hide processes, files in the file system, and registry keys and values under Windows, and implement stealth capabilities for other malicious components. In contrast, user-mode rootkits cannot typically hide themselves effectively from kernel-level defense software. User-mode rootkits only manipulate user-mode objects; therefore, defense systems relying on kernel objects have a chance to reveal the truth.
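The idea that a lower-level view can "reveal the truth" amounts to comparing two listings of the same kind of object. The sketch below shows only that comparison step; how each view is collected is platform-specific and outside its scope, and the function name and sample process names are hypothetical.

```python
def hidden_objects(trusted_view: set[str], user_mode_view: set[str]) -> set[str]:
    """Objects present in the trusted (lower-level) view but missing from the
    view reported by possibly Trojanized user-mode tools."""
    return trusted_view - user_mode_view


# Hypothetical example data: the user-mode listing omits one process.
kernel_level_listing = {"init", "sshd", "backdoor"}
user_mode_listing = {"init", "sshd"}
suspicious = hidden_objects(kernel_level_listing, user_mode_listing)
```

Anything in `suspicious` is an object the user-mode tools failed to report. A kernel-mode rootkit can falsify the lower-level view as well, so an empty result is not proof of a clean system.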

2.4. Other Categories

Some other categories of commonly encountered Internet pests are not necessarily malicious in their primary intent. However, they can be a nuisance to end users; therefore, antivirus and antispam products have been created to detect and remove such annoying burdens from computers.

2.4.1. Joke Programs

Joke programs are not malicious; however, as Alan Solomon (author of one of the most widely used scanning engines today) once mentioned, "Whether a program should be classified as a joke program or as a Trojan largely depends on the sense of humor of the victim." Joke programs change or interrupt the normal behavior of your computer, creating a general distraction or nuisance. Colleagues often make fun of each other by installing a joke program or by tricking others to run one on their systems. A typical example of a joke program is a screen saver that randomly locks the system.

However, such programs can be considered harmful in some cases. Consider, for example, a joke program that locks the system but never unlocks it. Thus, computers cannot be stopped safely. As a result, important data could be lost because it was never saved to the disk. Or worse, the file allocation table could get corrupted, and the machine would become unbootable.

2.4.2. Hoaxes: Chain Letters

On computers, hoaxes typically spread information about computer virus infections and ask the recipient of the message to forward it to others. One of the most infamous hoaxes was the Good Times hoax. Good Times appeared in 1994 and warned users about a potential new kind of virus that would arrive in e-mail. The hoax claimed that reading a message with "Good Times" in the subject line would erase data from the hard disk. Although many believed at the time that such an e-mail based virus was a hoax, the reality is that such a payload might be possible. Hoaxes typically mix some reality with lies. Good Times claimed that a particular virus existed, which was simply not true.

End users then spread the e-mail hoax to new people, "replicating" the message on the Internet by themselves and overloading e-mail systems with the hoax. At larger corporations, policies must be implemented to avoid the spread of hoaxes on local systems.

In the past, a typical hoax circulating at large corporations tried to deceive people into believing an untrue story about a very sick child, attempting to collect money for the child's medical procedure. Most people were sympathetic and did not recognize the danger of forwarding the e-mail message in this case; they trusted the source and believed the fabricated story.

With company policies intact, the problems that such hoaxes create can be effectively eliminated. However, hoaxes are considered one of the most successful Internet threats every year; take for example, the new chain letters that surface and rapidly spread around the world.

2.4.3. Other Pests: Adware and Spyware

A new type of application has appeared recently as a direct result of increased residential Internet access. Many companies are interested in what people look for or research on the Web, especially what kinds of products consumers might buy. Therefore, some consumer retail businesses install little applications to collect information and display customized advertisements in pop-up messages.

The difficulty with this type of application is that it is usually not written with malicious intent; in fact, many programmers make a living out of writing such tools. However, many of these Internet pests get installed on a system without the user's permission or knowledge, raising questions about privacy. Not surprisingly, corporations as well as home users dislike this type of program, referred to as spyware, which collects various information about user activity and then sends the data to a company via the Internet. Home users are undoubtedly disturbed by this invasive activity, not to mention the frustration that users feel in response to pop-ups.

In addition, these programs are often very poorly written and are resource hogs, particularly when two or more become installed on the same machine. Many also have the highly undesirable habit of lowering Internet Explorer's already deplorable security settings to unconscionable levels, opening the (usually unwitting) "victim" up to even worse exploits and infections [14].

Because these applications are often a major source of business for organizations driven by consumer revenue, such businesses prefer that antivirus products not detect such programs at all, or at least not by default. Often such companies bring lawsuits against vendors who produce software to detect and remove their "applications." Such litigation makes the fight against this kind of pest much more difficult.

It is expected, however, that such programs will be illegal to create in several countries in the future. To make things even more interesting, some corporations prefer to remove "unwanted" spyware but want to keep the few "tools" that they use to monitor their employees on a regular basis.

2.5. Computer Malware Naming Scheme

Back in 1991, founding members of CARO (Computer Antivirus Researchers Organization) designed a computer virus naming scheme [15] for use in antivirus (AV) products. Today, the CARO naming scheme is slightly outdated compared to daily practice, but it remains the only standard that most antivirus companies ever attempted to adopt. An up-to-date version of the document is in the works and is expected to be published by CARO soon at www.caro.org. In this short section, I can only show you a 10,000-foot view of malware naming. I strongly recommend Nick FitzGerald's AVAR 2002 conference paper [16], which greatly expands on further naming considerations. Furthermore, credit must be given to all the respected antivirus researchers of CARO.

Note

The original naming scheme was designed by Dr. Alan Solomon, Fridrik Skulason, and Dr. Vesselin Bontchev.

Virus naming is a challenging task. Unfortunately, there has been a major increase in widespread, fast-running computer virus outbreaks. Nowadays, antivirus researchers must add detections of 500, 1000, 1500, or even more threats to their products each month. Thus, the problem of naming computer viruses, even by the same common name, is getting to be a hard, if not impossible, task to manage. Nonetheless, representatives of antivirus companies still try to reduce the confusion by using a common name for at least the in-the-wild computer malware. However, computer virus outbreaks are on the rise, and researchers do not have the time to agree on a common name for each in-the-wild virus in advance of deploying response definitions. Even more commonly, it is very difficult to predict which viruses will be seen in the wild and which will remain zoo viruses.

Most people remember textual family names better than the naked IDs that many other naming schemes have adopted in the security space. Let's take a look at malware naming in its most complex form:

<malware_type>://<platform>/<family_name>.<group_name>.<infective_length>.<variant><devolution><modifiers>

In practice, very little, if any, malware requires all name components. Practically anything other than the family name is an optional field:

[<malware_type>://][<platform>/]<family_name>[.<group_name>][.<infective_length>][.<variant>[<devolution>]][<modifiers>]

The following sections give a short description of each naming component.

2.5.1. <family_name>

This is the key component of any malware name. The basic rule set for the family name follows:

2.5.2. <malware_type>://

This part of the name indicates whether a malware type is a virus, Trojan, dropper, intended, kit, or garbage type (Virus://, Trojan://, ..., Garbage://). Several products have extended this set slightly, and these are expected to become part of the standard malware naming in the future.

2.5.3. <platform>/

The platform prefix indicates the minimum native environment for the malware type that is required for it to function correctly. An annotated list of officially recognized platform names is listed in the next section.

Note

Multiple platform names can be defined for the same threat, for example, virus://{W32,W97M}/Beast.41472.A [17]. This name indicates a file-infecting virus called Beast that can infect on Win32 platforms and also is able to infect Word 97 documents.

2.5.4. .<group_name>

The group name represents a major family of computer viruses that are similar to each other. The group name is rarely used nowadays. It was mostly used to group DOS viruses.

2.5.5. <infective_length>

The infective length is used to distinguish parasitic viruses within a family or group based on their typical infective length in bytes.

2.5.6. <variant>

The subvariant represents minor variants of the same virus family with the same infective length.

2.5.7. [<devolution>]

The devolution identifier is used most commonly with the subvariant name in the case of macro viruses. Some macro viruses have a common ability (mostly related to programming mistakes) to create a subset of their original macro set during their natural replication cycle. Thus, the subset of macros cannot regenerate the original, complete macro set but is still able to recursively replicate from the partial set.

2.5.8. <modifiers>

The original intent of the modifier was to identify the polymorphic engine of a computer virus. However, most antivirus developers never used this modifier in practice. Nowadays, modifiers include the following optional components:

[[:<locale_specifier>][#<packer>][@'m'|'mm'][!<vendor-specific_comment>]]

2.5.9. :<locale_specifier>

This specifier is used mostly for macro viruses that depend on a particular language version of their environment, such as Word. For example, virus://WM/Concept.B:Fr is a virus that affects only the French version of Microsoft Word.

2.5.10. #<packer>

The packer modifier is rarely used in practice. It can indicate that a computer malware was packed with a particular "on-the-fly" extractor unpacker, such as UPX.

2.5.11. @m or @mm

These symbols indicate self-mailer or mass-mailer computer viruses. Suggested by Bontchev, this is probably the most widely recognized modifier. This modifier highlights computer viruses that are more likely to be encountered by the general public because of the way the viruses use e-mail to propagate themselves.

2.5.12. !<vendor-specific_comment>

The vendor-specific modifier is a recent addition to the set of modifiers. Vendors are allowed to postfix any malware name with such a modifier. For example, a vendor might want to indicate that a virus is multipartite by using !mp in the name.
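To make the structure of such names concrete, the following sketch splits a name into the components described above using a regular expression. It is a simplified reading of the format, not an official CARO parser. The function and field names are assumptions, and so is the rule used to tell dotted components apart: a purely numeric component is taken as the infective length, the last non-numeric component as the variant, and a non-numeric component before it as the group name. The devolution identifier is not separated from the variant.

```python
import re

NAME_RE = re.compile(
    r"""
    ^(?:(?P<malware_type>[A-Za-z]+)://)?      # optional type, e.g. virus://
    (?:(?P<platform>\{[^}]+\}|[^/{}]+)/)?     # optional platform or {P1,P2} set
    (?P<body>[^:#@!/]+)                       # family name and dotted components
    (?::(?P<locale>[^#@!]+))?                 # optional :<locale_specifier>
    (?:\#(?P<packer>[^@!]+))?                 # optional #<packer>
    (?:@(?P<mailer>mm|m))?                    # optional @m or @mm
    (?:!(?P<vendor_comment>.+))?$             # optional !<vendor-specific_comment>
    """,
    re.VERBOSE,
)


def parse_name(name: str) -> dict:
    """Split a malware name into its components (simplified, see assumptions)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognizable malware name: {name!r}")
    fields = match.groupdict()
    platform = fields.pop("platform")
    fields["platforms"] = platform.strip("{}").split(",") if platform else []
    family, *rest = fields.pop("body").split(".")
    fields["family_name"] = family
    # Assumption: numeric component = infective length; last non-numeric
    # component = variant; a non-numeric component before it = group name.
    lengths = [part for part in rest if part.isdigit()]
    others = [part for part in rest if not part.isdigit()]
    fields["infective_length"] = int(lengths[0]) if lengths else None
    fields["variant"] = others[-1] if others else None
    fields["group_name"] = others[0] if len(others) > 1 else None
    return fields
```

For example, `parse_name("VBS/Loveletter.A@mm")` yields the platform list `["VBS"]`, the family name `Loveletter`, the variant `A`, and the mailer modifier `mm`, while `parse_name("virus://{W32,W97M}/Beast.41472.A")` yields two platforms and an infective length of 41472.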

2.6. Annotated List of Officially Recognized Platform Names

The platform names shown in Table 2.1 are the only officially recognized identifiers following the proposed naming standard. A platform name that does not appear on this list cannot be used as a platform identifier in a malware name following this standard. The Comments column helps to explain some of the finer points of platform name selection. This is intended to be an authoritative list at this book's publication date. The platform list will need to be extended in the future.

Table 2.1. Officially Recognized Platform Names

Short Form | Long Form | Comments
ABAP | ABAP | Malware for the SAP R/3 Advanced Business Application Programming environment.
ALS | ACADLispScript | Malware that requires AutoCAD Lisp Interpreter.
BAT | BAT | Malware that requires a DOS or Windows command shell interpreter or close clone.
BeOS | BeOS | Requires BeOS.
Boot | Boot | Requires MBR and/or system boot sector of IBM PC-compatible hard drive and/or floppy. (Rarely used in practice.)
DOS | DOS | Infects DOS COM and/or EXE (MZ) and/or SYS format files and requires some version of MS-DOS or a closely compatible OS. (Rarely used in practice.)
EPOC | EPOC | Requires the EPOC OS up to version 5.
SymbOS | SymbianOS | Requires Symbian OS (EPOC version 6 and later).
Java | Java | Requires a Java run-time environment (standalone or browser-embedded).
MacOS | MacOS | Requires a Macintosh OS prior to OS X.
MeOS | MenuetOS | Requires MenuetOS.
MSIL | MSIL | Requires the Microsoft Intermediate Language runtime.
Mul | Multi | This is a pseudo-platform, and its use is reserved for a few very special cases.
PalmOS | PalmOS | Requires a version of PalmOS.
OS2 | OS2 | Requires OS/2.
OSX | OSX | Requires Macintosh OS X or a subsequent, essentially similar version.
W16 | Win16 | Requires one of the 16-bit Windows x86 OSes. (Note: Several products use the Win prefix.)
W95 | Win95 | Requires Windows 9x VxD services.
W32 | Win32 | Requires a 32-bit Windows (Windows 9x, Me, NT, 2000, XP on x86).
W64 | Win64 | Requires Windows 64.
WinCE | WinCE | Requires WinCE.
WM | WordMacro | Macro malware for WordBasic as included in WinWord 6.0, Word 95, and Word for Mac 5.x.
W2M | Word2Macro | Macro malware for WordBasic as included in WinWord 2.0.
W97M | Word97Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for Word (that shipped in Word 97) or later. Changes in VBA between Word 97 and 2003 versions (inclusive) are sufficiently slight that we do not distinguish platforms even if the malware makes a version check or uses one of the few VBA features added in versions subsequent to VBA v5.0.
AM | AccessMacro | Macro malware for AccessBasic.
A97M | Access97Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for Access that shipped in Access 97 and later. As for W97M, changes in VBA versions between Access 97 and 2003 (inclusive) are insufficient to justify distinguishing the platforms.
P98M | Project98Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for Project that shipped in Project 98 and later. As for W97M, changes in VBA versions between Project 98 and 2003 (inclusive) are insufficient to justify distinguishing the platforms.
PP97M | PowerPoint97Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for PowerPoint that shipped in PowerPoint 97 and later. As for W97M, changes in VBA versions between PowerPoint 97 and 2002 (inclusive) are insufficient to justify distinguishing the platforms.
V5M | Visio5Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for Visio that shipped in Visio 5.0 and later. As for W97M, changes in VBA versions between Visio 5.0 and 2002 (inclusive) are insufficient to justify distinguishing the platforms.
XF | ExcelFormula | Malware based on Excel Formula language that has shipped in Excel since the very early days.
XM | ExcelMacro | Macro malware for Visual Basic for Applications (VBA) v3.0 that shipped in Excel for Windows 5.0 and Excel for Mac 5.x.
X97M | Excel97Macro | Macro malware for Visual Basic for Applications (VBA) v5.0 for Excel that shipped in Excel 97 and later. As for W97M, changes in VBA versions between Excel 97 and 2002 (inclusive) are insufficient to justify distinguishing the platforms.
O97M | Office97Macro | This is a pseudo-platform name reserved for macro malware that infects across at least two applications within the Office 97 and later suites. Cross-infectors between Office applications and related products, such as Project or Visio, can also be labeled thus.
AC14M | AutoCAD14Macro | VBA v5.0 macro viruses for AutoCAD r14 and later. As with W97M malware, minor differences in later versions of VBA are insufficient to justify new platform names.
ActnS | ActionScript | Requires the Macromedia ActionScript interpreter found in some ShockWave Flash (and possibly other) animation players.
AplS | AppleScript | Requires AppleScript interpreter.
APM | AmiProMacro | Macro malware for AmiPro.
CSC | CorelScript | Malware that requires the CorelScript interpreter shipped in many Corel products.
HLP | WinHelpScript | Requires the script interpreter of the WinHelp display engine.
INF | INFScript | Requires one of the Windows INF (installer) script interpreters.
JS | JScript, JavaScript | Requires a JScript and/or JavaScript interpreter. Hosting does not affect the platform designator: standalone JS malware that requires MS JS under WSH, HTML-embedded JS malware, and JS malware embedded in Windows-compiled HTML help files (.CHM) all fall under this platform type.
MIRC | mIRCScript | Requires the mIRC script interpreter.
MPB | MapBasic | Requires MapBasic of MapInfo product.
Perl | Perl | Requires a Perl interpreter. Hosting does not affect the platform designator: standalone Perl infectors under UNIX(-like) shells, ones that require Perl under WSH, and HTML-embedded Perl malware all fall under this platform type.
PHP | PHPScript | Requires a PHP script interpreter.
Pirch | PirchScript | Requires the Pirch script interpreter.
PS | PostScript | Requires a PostScript interpreter.
REG | Registry | Requires a Windows Registry file (.REG) interpreter. (We do not distinguish .REG versions or ASCII versus Unicode.)
SH | ShellScript | Requires a UNIX(-like) shell interpreter. Hosting does not affect the platform name: shell malware specific to Linux, Solaris, HP-UX, or other systems, or specific to csh, ksh, bash, or other interpreters currently all fall under this platform type.
VBS | VBScript, VisualBasicScript | Requires a VBS interpreter. Hosting does not affect the platform designator: standalone VBS infectors that require VBS under WSH, HTML-embedded VBS malware, and malware embedded in Windows-compiled HTML help files (.CHM) all fall under this platform type.
UNIX | UNIX | This is a common name for binary viruses on UNIX platforms. (More specific platform names are available.)
BSD | BSD | Used for malware specific to BSD (-derived) platforms.
Linux | Linux | Used for malware specific to Linux platforms and others closely based on it.
Solaris | Solaris | Used for Solaris-specific malware.

References

1. Joe Hirst, "Virus Research and Social Responsibility," Virus Bulletin, October 1989, page 3.

2. Sarah Gordon, "The Generic Virus Writer," Virus Bulletin Conference, 1994.

3. Vesselin Bontchev, "The Bulgarian and Soviet Virus Writing Factories," Virus Bulletin Conference, 1991, pp. 11-25.

4. Dr. Keith Jackson, "Nomenclature for Malicious Programs," Virus Bulletin, March, 1990, page 13.

5. Vesselin Bontchev, "Are 'Good' Computer Viruses Still a Bad Idea?," EICAR, 1994, pp. 25-47.

6. Jarmo Niemela, "Mquito," http://www.f-secure.com/v-descs/mquito.shtml.

7. Jim Bates, "Trojan Horse: AIDS Information Introductory Diskette Version 2.0," Virus Bulletin, January 1990, page 3.

8. Mark Hamilton, "U.S. Judge Rules In Favour Of Extradition," Virus Bulletin, January 1991.

9. Istvan Farmosi, Janos Kis, Imre Szegedi, "Viruslelektan," Alaplap Konyvek, Budapest, 1990, ISBN: 963-02-8675-0 (Paperback).

10. David Kahn, "The Codebreakers," Scribner, New York, 1967, 1996, ISBN: 0-684-83130-9.

11. Tibor Nemetz, Istvan Vajda, "Algorithmic Cryptography," Academic Press, Budapest, 1991, ISBN: 963-05-6093-2.

12. Peter Ferrie, Frederic Perriot and Peter Szor, "Chiba Witty Blues," Virus Bulletin, May 2004, pp. 9-10.

13. Sami Rautiainen, "Hidden Under the Hood: Linux Backdoors," Virus Bulletin Conference 2002, pp. 217-234.

14. Nick FitzGerald, Private Communication, 2004.

15. Vesselin Bontchev, Fridrik Skulason and Alan Solomon, "A Virus Naming Convention," available at the FTP site of University of Hamburg, ftp://ftp.informatik.uni-hamburg.de/pub/virus/texts/tests/naming.zip.

16. Nick FitzGerald, "A Virus by Any Other Name: The Revised CARO Naming Convention," AVAR Conference, 2002.

17. Peter Szor, "Beast Regards," Virus Bulletin, June 1999, pp. 6-7.

Chapter 3. Malicious Code Environments

"In all things of nature there is something of the marvelous."

Aristotle


One of the most important steps toward understanding computer viruses is learning about the particular execution environments in which they operate. In theory, for any given sequence of symbols we could define an environment in which that sequence could replicate itself. In practice, we need to be able to find the environment in which the sequence of symbols operates and prove that it uses code explicitly to make copies of itself and does so recursively [1].

A successful penetration of the system by viral code occurs only if the various dependencies of malicious code match a potential environment. Figure 3.1 is an imperfect illustration of common environments for malicious code. A perfect diagram like this is difficult to draw in 2D form.

Figure 3.1. Common environments of malicious code.


The figure shows that Microsoft Office itself creates a homogeneous environment for malicious code across the Mac and the PC. However, not all macro viruses [2] that can multiply on the PC will be able to multiply on the Mac because of further dependencies. Each layer might create new dependencies (such as vulnerabilities) for malicious code. It is also interesting to see how possible developments of .NET on further operating systems, such as Linux, might change these dependency points and allow computer viruses to jump across operating systems easily. Imagine that each ring in Figure 3.1 has tiny penetration holes in it. When the holes on all the rings match the viral code and all the dependencies are resolved, the viral code successfully infects the system.
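
The ring picture can be read as a set-inclusion test: code replicates only where every one of its dependencies is met. The following minimal sketch illustrates that reading; the dependency labels are hypothetical and chosen only for the example, not taken from Figure 3.1.

```python
def can_replicate(requirements, environment):
    """Return True only if the environment satisfies every dependency."""
    return set(requirements) <= set(environment)


def unmet_dependencies(requirements, environment):
    """List the dependencies that the environment fails to provide."""
    return sorted(set(requirements) - set(environment))


# Hypothetical labels, for illustration only.
macro_virus = {"ms-office", "vba-macros", "win32-api"}
pc = {"x86", "windows", "win32-api", "ms-office", "vba-macros"}
mac = {"powerpc", "macos", "ms-office", "vba-macros"}

print(can_replicate(macro_virus, pc))           # True
print(can_replicate(macro_virus, mac))          # False
print(unmet_dependencies(macro_virus, mac))     # ['win32-api']
```

A single unmet dependency is enough to stop replication, which is why a macro virus that shares the Office layer with the Mac can still fail there.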

Figure 3.1 suggests how difficult virus research has become over the years. With many platforms already invaded by viruses, the fight against malicious code gets more and more difficult.

Please note that I am not suggesting that viruses would need to exploit systems. An exploitable vulnerability is just one possible dependency out of many examples.

Automation of malicious code analysis has also become increasingly difficult because of diverse environment dependency issues. It is not uncommon to spend many hours with a virus in a lab environment, attempting natural replication without success, while the virus is being reported from hundreds or perhaps even thousands of systems around the world.

Another set of viruses could be so unsuccessful that a researcher could never manage to replicate them. Steve White of IBM Research once said that he could give a copy of the Whale virus ("the mother of all viruses") to everybody in the audience, and it would still not replicate [3]. However, it turns out that Whale has an interesting dependency on early 8088 architectures [4], on which it works perfectly. Even more interestingly, this dependency disappears on Pentium and above processors [5]. Thus Whale, "the dinosaur heading for extinction" [6], is able to return, theoretically, in a Jurassic Park-like fashion.

One of the greatest challenges facing virus researchers is the need to be able to recognize the types, formats, and sequences of code and to find its environment. A researcher can only analyze the code according to the rules of its environment and prove that the sequence of code is malicious in that environment.

Over the years, viruses have appeared on many platforms, including Apple II, C64, Atari ST, Amiga, PC, and Macintosh, as well as mainframe systems and handheld systems such as the PalmPilot [7], Symbian phones, and the Pocket PC. However, the largest set of computer viruses exists on the IBM PC and its clones.

In this chapter, I will discuss the most important dependency factors that computer viruses rely on to replicate. I will also demonstrate how computer viruses unexpectedly evolve, devolve, and mutate as a result of the interaction of virus code with its environment.

3.1. Computer Architecture Dependency

Most computer viruses spread in executable, binary form (also called compiled form). For instance, a boot virus replicates itself as one or a few sectors of code and takes advantage of the computer's boot sequence. Among the very first documented virus incidents was Elk Cloner on the Apple II, which was also a boot virus. Elk Cloner modified the loaded operating system with a hook to itself so that it could intercept disk access and infect newly inserted disks by overwriting their system boot sectors with a copy of its own code. Brain, the oldest known PC virus, written in 1986, was a boot sector virus as well. Although the boot sequences of the two systems as well as the structures of these viruses show similarities, viruses are highly dependent on the particularities of the architecture itself (such as the CPU dependency described later in this chapter) and on the exact load procedure and memory layout. Thus, binary viruses typically depend on the computer architecture. This explains why a computer virus for an Apple II is generally unable to infect an IBM PC and vice versa.

In theory, it would be feasible to create a multi-architecture binary virus, but this is no simple task. It is especially hard to find ways to make code written for one architecture run on another. However, it is relatively easy to write code for two independent architectures and insert the code for both in the same virus. Then the virus must make sure that the proper code gets control on the proper architecture. In March of 2001, the PeElf virus proved that it was possible to create a cross-platform binary virus.

Virus writers found another way to solve the multi-architecture and operating system issue by translating the virus code to a pseudoformat and then translating it to a new architecture. The Simile.D virus (also known as Etap.D) of Mental Driller uses this strategy to spread itself on Windows and Linux systems on 32-bit Intel (and compatible) architectures.

It is interesting to note that some viruses refrain from replication in particular environments. Such an attempt was first seen in the Cascade virus, written by a German programmer in 1987. Cascade was supposed to look at the BIOS of the system, and if it found an IBM copyright, it would refrain from infecting. This part of the virus had a minor bug, so the virus infected all kinds of systems. Its author repeatedly released new versions of the virus to fix this bug, but the newer variants also had bugs in this part of the code [8].

Another kind of computer virus is dependent on the nature of BIOS updating. On so-called flashable or upgradeable BIOS systems, BIOS infection is feasible. There have been published attempts to do this by the infamous Australian virus-writer group called VLAD.

3.2. CPU Dependency

CPU dependency affects binary computer viruses. The source code of programs is compiled to object code, which is linked in a binary format such as an EXE (executable) file format. The actual executable contains the "genome" of a program as a sequence of instructions. The instructions consist of opcodes. For instance, the instruction NOP (no operation) has a different opcode on an Intel x86 than on a VAX or a Macintosh. On Intel CPUs, the opcode is defined as 0x90. On the VAX, this opcode would be 0x01.

Thus the sequences of bytes most likely translate to garbage code from one CPU to another because of the differences between the opcode table and the operation of the actual CPU. However, there are some opcodes that might be used as meaningful code on both systems, and some viruses might take advantage of this. Most computer viruses that are compiled to binary format will be CPU-dependent and unable to replicate on a different CPU architecture.
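
As a small illustration of this point, the sketch below uses only the two NOP encodings quoted above (0x90 on Intel x86, 0x01 on the VAX) and checks whether a run of bytes is a NOP run on a given CPU. The table and function names are made up for the example.

```python
# One-byte NOP encodings as given in the text; other CPUs are left out.
NOP_OPCODE = {"x86": 0x90, "vax": 0x01}


def is_nop_run(code: bytes, cpu: str) -> bool:
    """Return True if every byte of `code` is the one-byte NOP of `cpu`."""
    nop = NOP_OPCODE[cpu]
    return len(code) > 0 and all(byte == nop for byte in code)


x86_padding = bytes([0x90] * 4)
print(is_nop_run(x86_padding, "x86"))  # True
print(is_nop_run(x86_padding, "vax"))  # False
```

The same four bytes are harmless padding on one CPU and a different instruction sequence on the other, which is the essence of CPU dependency.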

There is yet another form of CPU dependency that occurs when a particular processor is not 100% backward compatible with a previous generation and supports some of its features imperfectly or not at all. For example, the Finnpoly virus fails to work on 386 processors because the processor incorrectly executes the instruction CALL SP (make a call according to the Stack Pointer). Because the virus transfers control to its decrypted code on the stack using this instruction, it hangs the machine when an infected file is executed on a 386 processor. In addition, a similar error appeared in Pentium processors as well [9]. Another example is the Cyrix 486 clones, which have a bug in their single-stepping code [10]. Single-stepping is used by tunneling viruses (see Chapter 6, "Basic Self-Protection Strategies") such as Yankee_Doodle; thus, they fail to work correctly on the buggy processors.

Note

It is not an everyday discovery to find a computer virus that fails because of a bug in the processor.

Some viruses use instructions that are simply no longer supported on a newer CPU. For instance, the 8086 Intel CPU supported a POP CS instruction, although Intel did not document it. Later, the instruction opcode (0x0f) was used to trap into multibyte opcode tables. A similar example of this kind of dependency is the MOV CS, AX instruction used by some early computer viruses, such as the Italian boot virus, Ping Pong:

Opcode  Assembly Instruction
8EC8    MOV CS,AX
0E      PUSH CS
1F      POP DS
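
To see why the bytes 8E C8 mean MOV CS,AX, the second byte can be split into its mod, reg, and r/m fields. The sketch below decodes only the register form of opcode 8E (move to segment register); the register tables follow the standard x86 encoding, and the function name is hypothetical.

```python
SEGMENT_REGS = ["ES", "CS", "SS", "DS"]                       # reg field 0-3
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # r/m field when mod == 3


def decode_mov_sreg(code: bytes) -> str:
    """Decode opcode 8E followed by a ModRM byte, register operands only."""
    if len(code) < 2 or code[0] != 0x8E:
        raise ValueError("not an 8E instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg > 3:
        raise ValueError("only register-to-segment-register forms are handled")
    return f"MOV {SEGMENT_REGS[reg]},{WORD_REGS[rm]}"


print(decode_mov_sreg(bytes.fromhex("8EC8")))  # MOV CS,AX
print(decode_mov_sreg(bytes.fromhex("8ED8")))  # MOV DS,AX
```

Only the reg field differs between the two encodings; a move into DS is an ordinary instruction, while the CS form is the one the text describes as CPU dependent.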

Other computer viruses might use the coprocessor or MMX (Multimedia Extensions) or some other extension, which causes them to fail when they execute on a machine that does not support them.

Some viruses use analytical defense techniques based on altering the processor's prefetch queue. The size of the prefetch queue differs from processor to processor. Viruses try to overwrite code in the next instruction slot, expecting that the original code is already in the processor prefetch queue and therefore still executes. The modification does take effect, however, when the virus code is traced in a debugger; thus, novice virus code analysts are often unable to analyze such viruses. This technique is also effective against early code emulation-based heuristic scanners. However, the disadvantage of such virus code is that it might become incompatible with certain kinds of processors or even operating systems.

3.3. Operating System Dependency

Traditionally, operating systems were hard-coded to a particular CPU architecture. Microsoft's first operating systems, such as MS-DOS, supported Intel processors only. Even Microsoft Windows supported only Intel-compatible hardware. However, in the '90s the need to support more CPU architectures with the same operating system was increasing. Windows NT was Microsoft's first operating system that supported multiple CPU architectures.

Most computer viruses can operate only on a single operating system. However, cross-compatibility between DOS, Windows, Windows 95/98, and Windows NT/2000/XP still exists on the Intel platforms even today. Thus, some of the viruses that were written for DOS can still replicate on newer systems. We tend to use old, "authentic" software less and less, however, which reduces the risk of such infections. Furthermore, some of the older tricks of computer viruses will not work in the newer environments. On Windows NT, for example, port commands cannot be used directly to access the hardware from DOS programs. As a result, all DOS viruses that use direct port commands will fail at some point because the operating system generates an error. This might prevent the replication of the virus altogether if the port commands (IN/OUT operations) occur before the virus multiplies itself.

A 32-bit Windows virus that will infect only portable executable (PE) files will not be able to replicate itself on DOS because PE is not a native file format of DOS and thus will not execute on it. However, so-called multipartite viruses are able to infect several different file formats or system areas, enabling them to jump from one operating environment to another. The most important environmental dependency of binary computer viruses is the operating system itself.

3.4. Operating System Version Dependency

Some computer viruses depend not only on a particular operating system, but also on an actual system version. Young virus researchers often struggle to analyze such a virus. After a few minutes of unsuccessful test infections on their research systems, they might believe that a particular virus does not work at all. Especially at the beginning of a particular computer virus era, we can see a flurry of computer viruses repeating the same mistakes that make them dependent on some flavor of Windows. For example, the W95/Boza virus does not work on non-English releases of Windows 95, such as the Hungarian release of the operating system.

This leads to the discovery that computer viruses might be used to target the computers of one particular nation more than others. For example, Russian Windows systems can be different enough from U.S. versions to become recognizable, enabling the author of a virus, intentionally or unintentionally, to target only a subset of computer users. In general, however, after a virus has been created, its author has very little or no control over exactly where his or her creation will travel.

3.5. File System Dependency

Computer viruses also have file system dependencies. For most viruses, it does not matter whether the targeted files reside on a File Allocation Table (FAT) file system, originally used by DOS; the New Technology File System (NTFS), used by Windows NT; or a remote file system shared across network connections. As long as such viruses are compatible with the operating environment's high-level file system interface, they work. They will simply infect the file or store new files on the disk without paying attention to the actual storage format. However, other kinds of viruses depend strongly on the actual file system.

3.5.1. Cluster Viruses

Some successful viruses can spread only on a specific file system. For instance, the Bulgarian virus, DIR-II, is a so-called cluster virus, written in 1991. DIR-II has features specific to certain DOS versions but, even more importantly, spreads itself by manipulating key structures of FAT-based file systems. On FAT on a DOS system, direct disk access can be used to overwrite the pointer (stored in the directory entry) to the first cluster on which the beginning of a file is stored.

Files are stored on the disk as clusters, and the FAT is used by DOS to put the puzzle pieces together. The DIR-II virus overwrites the pointer in the directory entry that points to the first cluster of a file with a value that directs the disk-read to the virus body, which has been stored at the end of the disk. The virus stores the pointer to the real first cluster of each host program in an encrypted form, in an unused part of the directory entry structure. This is used later to execute the real host from the disk after the virus has been loaded in memory. In fact, when the virus is active in memory, the disk looks normal and files execute normally.

Such viruses infect programs extremely quickly because they only manipulate a few bytes in the directory entries on the disk. These viruses are often called "super fast" infectors [1]. It is important to understand that there is only one copy of DIR-II on each infected disk. Consequently, when DIR-II is not active in memory, the file system appears "cross-linked" because all infected files point to the same start cluster: the virus code.
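
The cross-linked symptom can be checked mechanically. The sketch below is a minimal illustration, not a tool described in the text: it assumes the classic 32-byte FAT12/FAT16 directory entry layout (a detail not given above) and reports every first-cluster value shared by more than one file. The sample directory is synthetic.

```python
import struct
from collections import defaultdict

# Assumed classic FAT12/16 directory entry: name, extension, attributes,
# 10 reserved bytes, time, date, first cluster, file size (32 bytes).
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def shared_start_clusters(raw_directory: bytes) -> dict:
    """Map each first cluster used by more than one file to the file names."""
    by_cluster = defaultdict(list)
    usable = len(raw_directory) - len(raw_directory) % DIR_ENTRY.size
    for offset in range(0, usable, DIR_ENTRY.size):
        name, ext, attr, _rsvd, _time, _date, cluster, size = \
            DIR_ENTRY.unpack_from(raw_directory, offset)
        if name[0] == 0x00:            # end-of-directory marker
            break
        if name[0] == 0xE5:            # deleted entry
            continue
        if attr & 0x18 or size == 0:   # volume label, subdirectory, empty file
            continue
        label = (name.decode("ascii", "replace").rstrip() + "."
                 + ext.decode("ascii", "replace").rstrip())
        by_cluster[cluster].append(label)
    return {c: names for c, names in by_cluster.items() if len(names) > 1}


def make_entry(name, ext, cluster, size):
    """Build a synthetic directory entry for the demonstration."""
    return DIR_ENTRY.pack(name.ljust(8).encode(), ext.ljust(3).encode(),
                          0x20, bytes(10), 0, 0, cluster, size)


directory = (make_entry("EDIT", "COM", 900, 413)
             + make_entry("CHKDSK", "EXE", 900, 9819)
             + make_entry("README", "TXT", 2, 120))
print(shared_start_clusters(directory))  # {900: ['EDIT.COM', 'CHKDSK.EXE']}
```

Such a check is only meaningful on raw directory data read while the virus is not active in memory, since the text notes that the disk looks normal otherwise.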

A similar cluster infection technique appeared in the BHP virus on the Commodore 64 in Germany, written by "DR. DR. STROBE & PAPA HACKER" circa 1986 [11]. This virus manipulates the block entries of host programs stored on Commodore floppy diskettes. I decided to call this special infection technique the cluster prepender method. Let me tell you a little bit more about this ancient creature.

Normally, the Commodore 1541 floppy drive can store up to 166KB on each side of a diskette. The storage capacity of each diskette side is split into 664 "blocks" that are 256 bytes each. When BHP infects a program on the diskette, the virus will attempt to occupy eight free blocks for itself. Next, it changes the "block" pointer in the first block of the host program so that it points to the virus code instead. Except for the first block, the host program's code will not be moved on the diskette. Instead, the virus will link its own "blocks" with the "blocks" of the host program as a single cluster of blocks. The infected host program will be loaded with the virus in front. Unlike the DIR-II virus, the BHP virus has multiple copies per diskette. In each infection, eight blocks of free space will be lost on the diskette, but the infected files will not appear to be larger in a directory listing even if the virus is not active in memory.
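
The numbers above can be checked with a little arithmetic, and the block linking suggests one possible symptom: a file whose block chain is longer than the size shown in the directory listing. The sketch below models a file as a chain of block numbers; the block numbers and the layout of the infected chain are hypothetical, and the comparison is an assumption drawn from the description, not a documented detection method.

```python
BLOCK_SIZE = 256        # bytes per block (from the text)
BLOCKS_PER_SIDE = 664   # blocks per diskette side (from the text)
VIRUS_BLOCKS = 8        # blocks occupied per infection (from the text)


def chain_length(next_block: dict, first: int) -> int:
    """Count the blocks of a file by following links until the end marker None."""
    seen = set()
    block = first
    while block is not None:
        if block in seen:
            raise ValueError("circular block chain")
        seen.add(block)
        block = next_block[block]
    return len(seen)


def chain_exceeds_listing(listed_blocks: int, next_block: dict, first: int) -> bool:
    """True if the chain on disk is longer than the directory listing claims."""
    return chain_length(next_block, first) > listed_blocks


print(BLOCKS_PER_SIDE * BLOCK_SIZE // 1024)  # 166 (KB per side)

clean = {10: 11, 11: 12, 12: None}
# Hypothetical infected layout: block 10 now links into eight extra blocks.
infected = {10: 100, **{b: b + 1 for b in range(100, 107)}, 107: 11, 11: 12, 12: None}

print(chain_length(clean, 10), chain_length(infected, 10))  # 3 11
print(chain_exceeds_listing(3, infected, 10))               # True
```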

Figure 3.2(1) shows when a BHP-infected program called TEST is loaded for the first time with a LOAD command. When I list the content of the loaded program with the LIST command, a BASIC command line appears as shown in Figure 3.2(2). This SYS command triggers the binary virus code. When I execute the infected program with the RUN command, the 6502 Assembly-written virus gets control. On execution of the virus code, BHP becomes active in memory. Finally, the virus runs the original host program. Figure 3.2(2) shows that a "HI" message is displayed when the loaded virus is executed. This message is displayed by the host program.

Figure 3.2. The BHP virus on Commodore 64.


When the BHP virus is active in memory, it becomes stealthy, just like the DIR-II virus. As shown in Figure 3.2(3), I load the infected TEST program a second time. When I list the content of the program, I see the original host program, a single PRINT command that displays "HI." Thus, the virus is already in stealth mode; as long as the virus code is active in memory, the original content of the program is shown instead of the infected program. In addition, the BHP virus implements a set of basic self-protection tricks. For example, the virus disables restart and reset attempts to stay active in memory. Moreover, BHP uses a self-checksum function to check whether its binary code was modified or corrupted. As a result, a trivially modified or corrupted copy of the virus will intentionally fail to run.

3.5.2. NTFS Stream Viruses

FAT file systems are simple but very inefficient for larger hard disks (in FAT terms, a drive of several Gigabytes is considered very large). Operating systems such as Windows NT demanded modern file systems that would be fast and efficient on large disks and, more importantly, on the large disk arrays that span many Terabytes, as used in commercial databases.

To meet this need, the NTFS (NT file system) was introduced. A little-known feature of NTFS, multiple data streams, is primarily intended to support the multiple-fork concept of Apple's Hierarchical File System (HFS). Windows NT had to support multiple-fork files because the server version was intended to service Macintosh computers. On NTFS, a file can contain multiple streams on the disk. The "main stream" is the actual file itself. For instance, notepad.exe's code can be found in the main stream of the file. Someone could store additional named streams in the same file; for instance, the name notepad.exe:test refers to a stream called test in notepad.exe. When the WNT/Stream [12] virus infects a file, it overwrites the file's main stream with its own code, but first it stores the original code of the host in a named stream called STR. Thus WNT/Stream has an NTFS file system dependency in storing the host program.
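
The file:stream naming can be illustrated with a small parser. This is a sketch under stated assumptions: the function name is hypothetical, a leading single letter followed by a colon is treated as a drive prefix, and an omitted stream type is reported as $DATA, the type NTFS uses for ordinary data streams.

```python
def split_stream_name(path: str):
    """Split 'file:stream[:type]' into (file path, stream name, stream type)."""
    drive, rest = "", path
    if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
        drive, rest = path[:2], path[2:]          # keep 'C:' out of the split
    cut = max(rest.rfind("\\"), rest.rfind("/"))  # last path separator, if any
    directory, leaf = rest[:cut + 1], rest[cut + 1:]
    parts = leaf.split(":")
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else "$DATA"
    return drive + directory + parts[0], stream, stream_type


print(split_stream_name("notepad.exe:test"))
# ('notepad.exe', 'test', '$DATA')
print(split_stream_name(r"C:\WINNT\notepad.exe:STR"))
# ('C:\\WINNT\\notepad.exe', 'STR', '$DATA')
print(split_stream_name("notepad.exe"))
# ('notepad.exe', '', '$DATA')
```

An empty stream name denotes the main stream, so a scanner that inspects only that stream would miss host code parked in a named stream such as STR.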

Malicious hackers often leave their tools behind in NTFS streams on the disk. Alternate streams are not visible from the command line or the graphical file manager, Explorer. They generally do not increment the file size in the directory entries, although disk space lost to them might be noticed. Furthermore, the content of the alternate streams can be executed directly without storing the file content in a main stream. This allows the potential for sophisticated NTFS worms in the future.

3.5.3. NTFS Compression Viruses

Some viruses attempt to use the compression feature of NTFS to compress the host program and the virus. Such viruses call the DeviceIoControl() API of Windows with the FSCTL_SET_COMPRESSION control code on the files. Obviously, this feature depends on NTFS and will not work without it. For example, the W32/HIV virus, by the Czech virus writer Benny, depends on this. Some viruses, such as WNT/Stream, also use NTFS compression as an infection marker.
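
Because compression shows up as a file attribute, it is easy to test for. The sketch below checks the compressed bit in a Windows attribute mask; the helper names are hypothetical, and the st_file_attributes field is only provided by Python on Windows, so the second helper returns None elsewhere.

```python
import os

FILE_ATTRIBUTE_COMPRESSED = 0x800  # Windows file attribute bit


def is_ntfs_compressed(attributes: int) -> bool:
    """Return True if the compressed bit is set in a Windows attribute mask."""
    return bool(attributes & FILE_ATTRIBUTE_COMPRESSED)


def file_is_compressed(path: str):
    """Return True/False on Windows, or None where attributes are unavailable."""
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    return None if attributes is None else is_ntfs_compressed(attributes)


print(is_ntfs_compressed(0x820))  # True  (archive + compressed)
print(is_ntfs_compressed(0x020))  # False (archive only)
```

A compressed executable is not proof of infection, since users and administrators compress files too; at best the attribute narrows down which files deserve a closer look.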

3.5.4. ISO Image Infection

Although it is not a common technique, viruses also attack image file formats of CD-ROMs, such as ISO 9660, which defines a standard file system. Viruses can infect an ISO image before it is burnt onto a CD. In fact, several viruses became widespread in the wild via CD-R discs, which cannot be easily disinfected afterwards. ISO images often have an AUTORUN.INF file on them to automatically launch an executable when the CD-ROM is used on Windows. Viruses can take advantage of this file within the image and modify it to run an infected executable. This technique was developed by the Russian virus writer Zombie in early 2002.
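
When an image is inspected, one useful first step is to find out which program its AUTORUN.INF would start. The sketch below is a minimal, hypothetical parser: it assumes the usual INI-style layout with an [autorun] section and an open= entry and ignores every other key.

```python
def autorun_target(inf_text: str):
    """Return the command named by open= in the [autorun] section, or None."""
    in_autorun = False
    for raw_line in inf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):        # blank line or comment
            continue
        if line.startswith("[") and line.endswith("]"):
            in_autorun = line[1:-1].strip().lower() == "autorun"
            continue
        if in_autorun and "=" in line:
            key, value = line.split("=", 1)
            if key.strip().lower() == "open":
                return value.strip()
    return None


sample = "[AutoRun]\r\nOPEN=setup.exe /s\r\nicon=setup.ico\r\n"
print(autorun_target(sample))  # setup.exe /s
```

The returned command can then be compared with what the publisher of the disc intended to launch.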

3.6. File Format Dependency

Viruses can be classified according to the file objects they can infect. This short section is an introduction to binary format infectors. Many of the techniques are detailed further in Chapter 4, "Classification of Infection Strategies."

3.6.1. COM Viruses on DOS

Viruses such as Virdem and Cascade only infect DOS binary files that have the COM extension. COM files do not have a specific structure; therefore, they are easy targets of viruses. Dozens of variations of techniques exist to infect COM files.

3.6.2. EXE Viruses on DOS

Other viruses can infect DOS EXE files. EXE files start with a small header structure that holds the entry point of the program among other fields. EXE infector viruses often modify the entry point field of the host and append themselves to the end of the file. There are more techniques for infecting EXE files than for infecting COM files because of the format itself.

EXE files start with an MZ identifier, a monogram of the Microsoft engineer, Mark Zbikowski, who designed the file format. Interestingly, some DOS versions accept either MZ or ZM at the front of the file. This is why some of the early Bulgarian DOS EXE viruses infect files with both signatures in the front. If a scanner recognizes EXE files based on the MZ signature alone, it might have a problem detecting a virus with a ZM signature. Some tricky DOS viruses replace the MZ mark with ZM to avoid detection by antivirus programs, and yet others have used ZM as an infection marker to avoid infecting the file a second time.
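
A scanner that wants to avoid the ZM pitfall has to accept both signatures. The sketch below does that and also reads the entry point field; the header offsets (initial IP at 0x14, initial CS at 0x16) are the standard DOS EXE layout rather than something stated above, and the sample header is synthetic.

```python
import struct


def parse_dos_exe(header: bytes):
    """Return (signature, cs, ip) for a DOS EXE header, or None if it is not one."""
    if len(header) < 0x18 or header[:2] not in (b"MZ", b"ZM"):
        return None
    ip, cs = struct.unpack_from("<HH", header, 0x14)  # IP at 0x14, CS at 0x16
    return header[:2].decode("ascii"), cs, ip


sample = bytearray(0x1C)
sample[0:2] = b"ZM"
struct.pack_into("<HH", sample, 0x14, 0x0100, 0x0010)
print(parse_dos_exe(bytes(sample)))   # ('ZM', 16, 256)
print(parse_dos_exe(b"\xe9\x00\x01"))  # None
```

The CS:IP pair is the field that EXE infectors typically redirect to their appended code, so it is also the field a disinfector has to restore.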

Disinfecting EXE files is typically more complicated than disinfecting a COM file. In principle, however, the techniques are similar. The header information, just like the rest of the executable, must be restored, and the file must be truncated properly (whenever needed).

3.6.3. NE (New Executable) Viruses on 16-bit Windows and OS/2

One of the first viruses on Windows was W16/Winvir. Winvir uses DOS interrupt calls to infect files in the Windows NE file format. This is because early versions of Windows use DOS behind the scenes. NE files are more complicated in their structure than EXE files. Such NE files start with an old DOS EXE header at the front of the file, followed by the new EXE header, which starts with an NE identifier.
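
The two-header layout can be walked in a few lines. The sketch below assumes the conventional location of the new-header offset (a 32-bit value at 0x3C in the DOS header, a detail not given above) and recognizes the NE, LE, LX, and PE identifiers; the function name and the test images are hypothetical.

```python
import struct

NEW_HEADER_MARKS = {b"NE": "NE", b"LE": "LE", b"LX": "LX"}


def executable_format(data: bytes) -> str:
    """Classify a file image as DOS EXE, NE, LE, LX, PE, or unknown."""
    if data[:2] not in (b"MZ", b"ZM"):
        return "unknown"
    if len(data) < 0x40:
        return "DOS EXE"
    (new_offset,) = struct.unpack_from("<I", data, 0x3C)  # offset of new header
    if data[new_offset:new_offset + 4] == b"PE\x00\x00":
        return "PE"
    return NEW_HEADER_MARKS.get(data[new_offset:new_offset + 2], "DOS EXE")


def stub_with(mark: bytes) -> bytes:
    """Build a synthetic image: a 64-byte DOS header followed by `mark`."""
    stub = bytearray(0x40)
    stub[0:2] = b"MZ"
    struct.pack_into("<I", stub, 0x3C, 0x40)
    return bytes(stub) + mark


print(executable_format(stub_with(b"NE")))          # NE
print(executable_format(stub_with(b"PE\x00\x00")))  # PE
print(executable_format(stub_with(b"??")))          # DOS EXE
```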

One of the most interesting NE virus infection techniques was developed in the W16/Tentacle_II family, which was found in the wild in June 1996 in the U.S., U.K., Australia, Norway, and New Zealand. Not only was Tentacle_II in the wild, but it was also rather difficult to detect and repair because it took advantage of the complexity of the NE file format. This virus is discussed further in Chapter 4.

3.6.4. LX Viruses on OS/2

Linear eXecutables (LXs) were also introduced in later versions of OS/2. Not many viruses were ever implemented in them, but there are a few such creations. For instance, OS2/Myname is a very simple overwriting virus.

Myname uses a couple of system calls, such as DosFindFirst(), DosFindNext(), DosOpen(), DosRead(), and DosWrite(), to locate executables and then overwrites them with itself. The virus searches for files with executable extensions in the current directory. It does not attempt to identify OS/2 LX files for infection; it simply overwrites any files with its own copy. Nonetheless, OS2/Myname is dependent on the LX file format and OS/2 environment for execution given that the virus itself is an LX executable.

The OS2/Jiskefet version of the virus also overwrites files to spread itself. This virus looks specifically for files with a New Executable header that starts with the LX mark:

	cmp	word ptr [si], 'XL'
	jnz	NO

The header of the file is loaded by the virus, and the si (source index) register is used as an index to check for the mark. If the marker is missing, the virus will not overwrite the file. As a result, Jiskefet is more dependent on the LX file format than Myname.
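
The constant in the comparison reads 'XL' although the mark in the file is LX. The sketch below shows why, assuming the common DOS assembler convention that the first character of a two-character constant becomes the high byte of the word; the helper names are hypothetical.

```python
def asm_word_constant(chars: str) -> int:
    """Value of a two-character constant with the first character as high byte."""
    return (ord(chars[0]) << 8) | ord(chars[1])


def header_has_mark(header: bytes, chars: str) -> bool:
    """Mimic `cmp word ptr [si], chars` against the first two header bytes."""
    return int.from_bytes(header[:2], "little") == asm_word_constant(chars)


print(hex(asm_word_constant("XL")))          # 0x584c
print(header_has_mark(b"LX\x00\x00", "XL"))  # True
print(header_has_mark(b"XL\x00\x00", "XL"))  # False
```

Because x86 stores the low byte first, the word 0x584C sits in the file as the bytes 4C 58, that is, "LX".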

3.6.5. PE (Portable Executable) Viruses on 32-bit Windows

The first virus known to infect PE files was W95/Boza, written by members of the Australian virus-writing group, VLAD, for the beta version of Windows 95.

The virus was named Bizatch by its authors but got its current name, Boza, from Vesselin Bontchev. He called the virus Boza, referring to a bizarre Bulgarian drink with the color and consistency of mud that is disliked by most non-Bulgarians. Bontchev picked the name not only because Boza sounds similar to "Bizatch," but also because the virus was "buggy and messily written." The Bulgarian idiom, "This is a big boza," means "this is extremely messy and unclear."

Quantum, the virus writer, was unhappy about this, which was Bontchev's intention in choosing the name. In fact, other viruses attacked antivirus software databases to change the name of Boza to Bizatch so that the original name would be displayed when an antivirus program detected it. This illustrates the psychological battle waged between virus writers and antivirus researchers.

Because PE file infection is currently one of the most common infection techniques, I will provide more information about it in Chapter 4. Many binary programs use the PE file format, including standard system components, regular applications, screen-saver files, device drivers, native applications, dynamic link libraries, and ActiveX controls.

The new 64-bit PE+ files are already supported by 64-bit architectures, such as IA64, AMD64, and EM64T. Computer virus researchers expected that 64-bit Windows viruses would appear and infect this format correctly with native 64-bit virus code.

The W64/Rugrat.3344 [13] virus appeared in May 2004, written by the virus writer "roy g biv." Rugrat is written in IA64 Assembly. The virus is very compact: about 800 lines. Rugrat utilizes modern features of the Itanium processor, such as code predication. In addition, roy g biv released the W64/Shruggle virus during the summer of 2004. W64/Shruggle infects PE+ files that run on the upcoming 64-bit Windows on AMD64.
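
Whether a file is PE or PE+ and which CPU it targets can be read from two header fields. The sketch below relies on the published PE layout (machine field right after the PE signature, optional-header magic 24 bytes after the signature), which is not spelled out above; the function name and the synthetic images are hypothetical.

```python
import struct

MACHINES = {0x014C: "i386", 0x0200: "IA64", 0x8664: "AMD64"}
OPTIONAL_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def pe_flavor(data: bytes):
    """Return (machine, format) for a PE image, or None if it is not one."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(data) < pe_offset + 26:
        return None
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    return MACHINES.get(machine, hex(machine)), OPTIONAL_MAGIC.get(magic, hex(magic))


def fake_pe(machine: int, magic: int) -> bytes:
    """Build a synthetic header-only image for the demonstration."""
    image = bytearray(0x40 + 26)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)
    image[0x40:0x44] = b"PE\x00\x00"
    struct.pack_into("<H", image, 0x44, machine)
    struct.pack_into("<H", image, 0x58, magic)
    return bytes(image)


print(pe_flavor(fake_pe(0x0200, 0x20B)))  # ('IA64', 'PE32+')
print(pe_flavor(fake_pe(0x014C, 0x10B)))  # ('i386', 'PE32')
```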

3.6.5.1 Dynamic Link Library Viruses

The W95/Lorez virus was one of the first 32-bit Windows viruses that could infect a dynamic link library (DLL). A Windows DLL uses the same basic file format as regular PE executables. Dynamic link libraries export functions that other applications can use.

The interface between applications and dynamic link libraries is facilitated by exports from DLLs and imports into the executables. Lorez simply infects the user mode KERNEL client component, KERNEL32.DLL. By modifying the DLL's export directory, such viruses can hook an API interface easily.

DLL infection became increasingly successful with the appearance of the Happy99 worm (also known as W32/SKA.A, the worm's CARO name), written by Spanska in early 1999. Figure 3.3 is a capture of Happy99's fireworks payload.

Figure 3.3. The Happy99 worm's payload.


Just as many other worms are linked to holidays, this worm took advantage of the New Year's period by mimicking an attractive New Year's card application.

Happy99 injected a set of hooks into the WSOCK32.DLL library, hooking the connect() and send() APIs to monitor access to mail and newsgroups.

Happy99 started a debate about computer malware classifications by carrying the following message for researchers:

Is it a virus, a worm, a trojan? MOUT-MOUT Hybrid (c) Spanska 1999.

3.6.5.2 Native Viruses

Recently, a new kind of 32-bit Windows virus has been on the rise: native infectors. The first such virus, W32/Chiton, was created by the virus writer roy g biv in late 2001. Unlike most Win32 viruses, which depend on calling into the Win32 subsystem to access API functions to replicate, W32/Chiton can also replicate outside of the Win32 subsystem.

A PE file can be loaded as a device driver, a GUI Windows application, a console application, or a native application. Native applications, such as autochk.exe, load during boot time. Because they load before subsystems are available, they are responsible for their own memory management. In their file headers, the PE.OptionalHeader.Subsystem value is set to 0001 (Native).
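
The Subsystem value can be read directly from a file image. The sketch below assumes the published PE layout, in which the 16-bit Subsystem field lies 68 bytes into the optional header, which itself starts 24 bytes after the PE signature; the function name and the synthetic image are hypothetical.

```python
import struct

SUBSYSTEMS = {1: "native", 2: "Windows GUI", 3: "Windows console"}


def pe_subsystem(data: bytes):
    """Return the subsystem name of a PE image, or None if it cannot be read."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    field = pe_offset + 24 + 68              # optional header + Subsystem offset
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(data) < field + 2:
        return None
    (value,) = struct.unpack_from("<H", data, field)
    return SUBSYSTEMS.get(value, f"other ({value})")


def fake_image(subsystem: int) -> bytes:
    """Build a synthetic header-only image for the demonstration."""
    image = bytearray(0x40 + 24 + 70)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)
    image[0x40:0x44] = b"PE\x00\x00"
    struct.pack_into("<H", image, 0x40 + 24 + 68, subsystem)
    return bytes(image)


print(pe_subsystem(fake_image(1)))  # native
print(pe_subsystem(fake_image(3)))  # Windows console
```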

The HKLM\System\CurrentControlSet\Control\Session Manager\BootExecute value contains the names and arguments of native applications that are executed by the Session Manager at boot time. The Session Manager looks for such applications in the Windows\System32 directory, with the native executable names specified.

Native applications use the NTDLL.DLL (Native API), where hundreds of APIs are stored and remain largely undocumented by Microsoft. Native applications do not rely on the subsystem DLLs, such as KERNEL32.DLL, as these DLLs are not yet loaded when native applications load. There are only a handful of APIs that a computer virus needs to be able to call from NTDLL.DLL, and virus writers have already discovered the interface for these functions and their parameters.

W32/Chiton relies on the following NTDLL.DLL APIs for memory, directory, and file management:

  1. Memory management:
    • RtlAllocateHeap()
    • RtlFreeHeap()
  2. Directory and file search:
    • RtlSetCurrentDirectory_U()
    • RtlDosPathNameToNtPathName_U()
    • NtQueryDirectoryFile()
  3. File management:
    • NtOpenFile()
    • NtClose()
    • NtMapViewOfSection()
    • NtUnmapViewOfSection()
    • NtSetInformationFile()
    • NtCreateSection()

Native viruses can load very early in the boot process, which gives them great flexibility in infecting applications. Such viruses are similar in structure to kernel-mode viruses. Therefore, it is expected that kernel-mode and native infection techniques will be combined in the future.

3.6.6. ELF (Executable and Linking Format) Viruses on UNIX

Viruses are not unknown on UNIX and UNIX-like operating systems, which generally use the ELF executable file format14. Typically, ELF files do not have any file extensions, but they can be identified based on their internal structure.

Just like PE files, ELF files can support more than one CPU platform. Moreover, ELF files can properly support 32-bit as well as 64-bit CPUs in their original design, unlike PE files, which needed some minor updates to make them compatible with 64-bit environments (resulting in the PE+ file format).
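
Identification by internal structure can be illustrated with a short Python sketch. It relies on two facts from the ELF specification: the file starts with the bytes 7F 45 4C 46 ("\x7fELF"), and the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The function name and the sample headers are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes):
    """Return '32-bit' or '64-bit' for an ELF header, or None if it is not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # EI_CLASS is the byte right after the four magic bytes.
    return ELF_CLASSES.get(header[4], "unknown")


# Hypothetical 16-byte identification blocks for a 32-bit and a 64-bit file.
sample_32 = ELF_MAGIC + b"\x01" + bytes(11)
sample_64 = ELF_MAGIC + b"\x02" + bytes(11)
```

Because the check looks only at content, it works regardless of the file name or extension.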

ELF files contain a short header, and the file is divided into logical sections. Viruses that spread on Linux systems typically target this format. Most Linux viruses are relatively simple15. For instance, the Linux/Jac.8759 virus can only infect files in the current folder.

One of the most complex Linux viruses is {W32,Linux}/Simile.D (also known as Etap.D), which was the first entry-point obscuring Linux virus (more on this in Chapter 4). Of course, Simile.D's success will depend on how well security settings are used in the file system. Writeable files will be infected; however, the virus does not elevate privileges to infect files.

It seems likely that future computer worm attacks (such as Linux/Slapper) will be combined with ELF infection on Linux. The elevated privileges often gained by exploiting network services can result in better access to binary files.

The main problem for ELF viruses is the missing binary compatibility between various flavors of UNIX systems. The diversification of the binaries on various CPUs introduces library dependency. Because of this, many ELF-infecting viruses suffer serious problems and crash with core dumps rather than causing infections.

3.6.7. Device Driver Viruses

Device driver infectors were not very common in the DOS days, although virus writer magazines such as 40Hex dedicated early articles to the subject. Device drivers for popular operating systems tend to have their own binary format, but as these are special forms of the more general executable formats for those platforms, all can be infected with known virus infection techniques. For example, 16-bit Windows drivers must be in the LE (linear executable) format. LE is very similar to the OS/2 LX file format. Of course, viruses can infect such files, too.

On Windows 9x, the VxD (virtual device driver) file format was never officially documented by Microsoft for the general public's use. As a result, only a few viruses were created that could infect VxD files. For example, W95/WG can infect VxD files and modify their entry point to run an external file each time the infected VxD is loaded. Consequently, only the entry-point code of the VxD is modified to load the virus code from the external source.

Other viruses, such as the W95/Opera family, infect VxD files by appending the virus code to the end of the file and modifying the real mode entry point of the VxD to run themselves from it.

Recently, device driver infectors appeared on Windows XP systems. On NT-based systems, device drivers are PE files that are linked to NT kernel functions. The few such viruses that exist today hook the INT 2E (System Service on IA32-based NT systems) interrupt handler directly in kernel mode to infect files on the fly. For example, WNT/Infis and W2K/Infis families can infect directly in Windows NT and Windows 2000 kernel mode. The W32/Kick virus was created by the Czech virus writer, Ratter, in 2003. W32/Kick infects only SYS files in the PE device driver format. The virus loads itself into kernel mode memory but runs its infection routine in user mode to infect files through the standard Win32 API.

Note

More information about in-memory strategies of computer viruses is available in Chapter 5, "Classification of In-Memory Strategies."

3.6.8. Object Code and LIB Viruses

Object and LIB infections are not very common. There are only about a dozen such viruses because they tend to be dependent on developer environments.

Source code is first compiled to object code, and then it is linked to an executable format:

Source Code - Object code / Library code - Executable.

Viruses that attack objects or libraries can parse the object or library format. For instance, the Shifter virus16 can infect object files. Such viruses spread in a couple of stages as shown in Figure 3.4.

Figure 3.4. The infection stages of the Shifter virus.


Shifter was written by Stormbringer in 1993. The virus carefully checks whether an object file is ready to be linked to a DOS COM executable. This is done by checking the Data Record Entry offset of object files. If this is 0x100, the virus attempts to infect the object in such a way that once the object is linked, it will be in the front of the COM executable.

3.7. Interpreted Environment Dependency

Several virus classes depend on some sort of interpreted environment. Almost every major application supports users with programmability. For example, Microsoft Office products provide a rich programmable macro environment that uses Visual Basic for Applications. (Older versions of Word, specifically Word 6.0/Word 95, use WordBasic.) Such interpreted environments often enhance viruses with multi-platform capabilities.

3.7.1. Macro Viruses in Microsoft Products

Today there are thousands of macro viruses, and many of them are in the wild. Users often exchange documents that were created with a Microsoft Office product, such as Word, Excel, PowerPoint, Visio, or even Access or Project. The first wild-spread macro virus, WM/Concept.A17, appeared in late 1995. Within a couple of months, only a few dozen such viruses were found, but by 1997 there were thousands of similar creations. The XM/Laroux18, discovered in 1996, was the first wild-spread macro virus to infect Excel spreadsheets. The first known Word macro virus was WM/DMV, written in 1994. The author of the WM/DMV virus also created a nearly functional Excel macro (XM) virus at the same time.

Figure 3.5 illustrates a high-level view of an OLE2 file used by Microsoft products. Microsoft does not officially document the file structure for the public.

Figure 3.5. A high-level view of the OLE2 file format.


Please note that Microsoft products do not directly work with OLE2 files. As a result, technically a macro virus in any such Microsoft environments does not directly infect an OLE2 file because Microsoft products access these objects through the OLE2 API. Also note that different versions of such Microsoft programs use different languages or different versions of such languages.

In the front of OLE2 files, you can find an identifier, a sequence of hex bytes "D0 CF 11 E0," which looks like the word DOCFILE in hex bytes (with a lowercase L). These bytes can appear in both big-endian and little-endian formats. Other values are supported by various beta versions of Microsoft Office products. The header information block contains pointers to important data structures in the file. Among many important fields, it contains pointers to the FAT and the Directory. Indeed, the OLE2 file is analogous to MS-DOS FAT-based storage. The problem is that OLE2 files have an extremely complex structure. They are essentially file systems in a file with their own clusters, file allocation table, root directory, subdirectories (called "storages"), files (called "streams"), and so on.

The basic sector size is 512 bytes, but larger values are also allowed. (In some implementations, a mini-FAT19 allows even shorter "sector" sizes.) Office products locate macros by looking in the Directory of an OLE2 file for the VBA storage folder. The macros appear as streams inside the document. Obviously, any objects can get fragmented, as in a real file system; corruptions of all kinds are also possible, including circular FAT or Directory entries, and so on. Unfortunately, even macros can get corrupted; as you will see, this fact contributes to the natural creation of new macro virus variants.
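
As a minimal sketch of how a scanner might recognize an OLE2 file and read its sector sizes, consider the Python fragment below. The text above gives only the first four identifier bytes; the full 8-byte signature and the positions of the sector-shift fields (two little-endian 16-bit values at offset 0x1E) are taken from the compound file layout as Microsoft later published it, and should be treated as assumptions here. The helper names and the synthetic header are hypothetical, and the sketch deliberately ignores the alternative identifiers used by beta versions.

```python
import struct

# Full 8-byte identifier; the first four bytes are the "D0 CF 11 E0" quoted above.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def ole2_sector_sizes(header: bytes):
    """Return (sector_size, mini_sector_size) in bytes, or None if not OLE2."""
    if len(header) < 0x22 or header[:8] != OLE2_SIGNATURE:
        return None
    # Sizes are stored as powers of two ("shifts").
    sector_shift, mini_shift = struct.unpack_from("<HH", header, 0x1E)
    return 1 << sector_shift, 1 << mini_shift


def make_demo_header(sector_shift: int = 9, mini_shift: int = 6) -> bytes:
    """Build a hypothetical header block that carries only the fields read above."""
    header = bytearray(512)
    header[:8] = OLE2_SIGNATURE
    struct.pack_into("<HH", header, 0x1E, sector_shift, mini_shift)
    return bytes(header)
```

With the default shifts of 9 and 6, ole2_sector_sizes(make_demo_header()) reports 512-byte sectors and 64-byte mini-sectors.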

In addition, documents have a special bit inside, the so-called template bit. WinWord 6/7 does not look for macros if the template bit is off20.

Macro viruses are stored inside the document instead of at the front or at the very end of the file. Even worse, the macros are buried inside some of the streams, and the streams themselves have a very complex structure. When looking at the physical OLE2 document without understanding its structure, the (otherwise logically continuous) body of a viral macro could appear split into chunks, some of them as small as 64 bytes.

A major challenge is the protection of user macros in the documents during the removal of virulent macros. In some cases, it is simply impossible to remove a macro virus safely without also removing the user macros. Obviously, users prefer to keep their own macros and remove the viruses from them, but such acrobatics are not always possible.

Macro viruses are much easier to create than other kinds of file infectors. Furthermore, the source of the virus code is available to anybody with the actual infection. Although this greatly simplifies the analysis of macro viruses, it also helps attackers because the virus source code can be accessed and modified easily.

To understand the internal structure of OLE2 documents better, look at a comment fragment of the W97M/Killboot.A virus in Microsoft's DocFile Viewer application, shown in Figure 3.6. DocFile Viewer is available as part of Microsoft Visual C++ 6.0. This tool can be used to browse the document storage and find the "ThisDocument" stream in the Macros\VBA directory.

Figure 3.6. The W97M/Killboot.A virus in DocFile Viewer.


The ThisDocument stream can be further browsed to find the virus code. In Figure 3.6, a comment by the virus writer can be seen encoded as VBA code:

E0 00 00 00 39 00 73 65 74 20 74 68 65 20 64 61 ....9.set the da
79 20 6F 66 20 41 72 6D 61 67 65 64 64 6F 6E 2C y of Armageddon,
20 74 68 65 20 32 39 74 68 20 64 61 79 20 6F 66 the 29th day of
20 74 68 65 20 6E 65 78 74 20 6D 6F 6E 74 68 00 the next month.

The 0xE0 opcode is used for comments. The 0x39 represents the size of the comment. Thus the preceding line translates to

'set the day of Armageddon, the 29th day of the next month

The opcode itself is VBA version-specific, so the 0xE0 byte can change to other values, resulting in Word up-conversion and down-conversion issues21.
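
The decoding described above can be reproduced with a short Python sketch. It assumes, based only on this one dump, that the record starts with the opcode byte, that the comment length is a little-endian 16-bit value at offset 4, and that the text starts at offset 6. Real p-code records vary by VBA version, so this is an illustration rather than a parser.

```python
import struct

# The bytes shown in the dump above.
DUMP = """
E0 00 00 00 39 00 73 65 74 20 74 68 65 20 64 61
79 20 6F 66 20 41 72 6D 61 67 65 64 64 6F 6E 2C
20 74 68 65 20 32 39 74 68 20 64 61 79 20 6F 66
20 74 68 65 20 6E 65 78 74 20 6D 6F 6E 74 68 00
"""

COMMENT_OPCODE = 0xE0  # version-specific, as noted in the text


def decode_comment(record: bytes) -> str:
    """Turn one comment record into the source line it represents."""
    if record[0] != COMMENT_OPCODE:
        raise ValueError("not a comment record for this VBA version")
    # Assumed layout: length at offset 4, text from offset 6.
    (length,) = struct.unpack_from("<H", record, 4)
    text = record[6:6 + length]
    return "'" + text.decode("latin-1")


record = bytes.fromhex("".join(DUMP.split()))
comment = decode_comment(record)
```

Here the length field is 0x39 (57), which matches the 57 characters of the comment text exactly.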

One of the most interesting aspects of macro viruses is that they introduced a new set of problems not previously seen in such quantities with any other type of computer virus.

3.7.1.1 Macro Corruption

Many macro viruses copy themselves to new files using macro copy commands. A macro virus can copy itself into a new document in this way, often attacking the global template called NORMAL.DOT first and then copying itself from the global template back to user documents.

A natural mutation often occurs in Microsoft Word environments22. The real reason for the corruption was never found, but it is believed to be connected to saving documents on floppy disks. Some users simply did not wait until the document was written perfectly to disk, which can result in a couple of bytes of corruption in the macro body. Because Word interprets the VBA code line by line, it will not generate an error message unless the faulty code is about to be executed1.

As demonstrated earlier, macros are stored as binary data in Word documents. When the binary of the macro body gets corrupted, the virus code often can survive and work at least partially. The problem is that such corruptions are so common that often hundreds of minor variants of a single macro virus family are created by the "mutation engine" of Microsoft Word itself! For instance, the WM/Npad family has many members that are simply natural corruptions, which are not created intentionally.

Corrupted macro viruses can often work after corruption. There are several reasons for this commonly observed behavior:

Consider the example shown in Listing 3.1.

Listing 3.1. A Corrupted Macro Example

	Sub MAIN
		SourceMacro$= FileName$()+ "Foobar"
		DestinationMacro$ = "Global:Foobar"
 
		MacroCopy(SourceMacro$, DestinationMacro$)
 
		// Corruption here //
	End Sub

Because most macro viruses include an error handler at the beginning of their code, macro virus compilation and execution has tended to be resilient to all but the most traumatic corruptions.

Because many AV products use checksums to detect and identify macro viruses, the antivirus software can get confused by the corrupted macro virus variants. Using checksums is the only way to exactly identify each different variant.
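
The effect of corruption on checksum-based identification is easy to demonstrate. In the Python sketch below, CRC32 stands in for whatever checksum a product actually uses, and the macro body, variant name, and function names are hypothetical placeholders.

```python
import zlib


def macro_checksum(body: bytes) -> int:
    """Checksum of a macro body (CRC32 is used purely as an illustration)."""
    return zlib.crc32(body) & 0xFFFFFFFF


# Placeholder bytes standing in for the binary body of a known macro.
original = bytes(range(64))

# A "natural" corruption: a single byte in the middle of the body changes.
corrupted = bytearray(original)
corrupted[20] ^= 0xFF
corrupted = bytes(corrupted)

# Hypothetical definition database: checksum -> exact variant name.
known_variants = {macro_checksum(original): "Example.A"}


def identify(body: bytes):
    """Return the exact variant name, or None if the checksum is unknown."""
    return known_variants.get(macro_checksum(body))
```

identify(original) returns the variant name, while identify(corrupted) returns None: the corrupted body is, from the scanner's perspective, a new variant that needs its own definition.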

Other types of viruses, such as Assembly-written viruses on DOS, most often fail immediately when the slightest corruption occurs in them. However, macro viruses often survive the corruption because the actual replicating instructions are so short in the macro body.

3.7.1.2 Macro Up-Conversion and Down-Conversion

When creating Word 97 and additional support for VBA, Microsoft decided to create new document formats and started to use a different, even richer macro language. To solve compatibility problems for customers, they decided to automatically convert old macros to the new formats. As a result, when a macro virus in the Word 95 WordBasic format was opened with the newer editions of Word, the virus might be converted to the new environment, creating a new virus. Consequently, WM viruses are often converted to W97M format, and so on.

The macro up-conversion issue generated many problems for antivirus researchers that went beyond simple technicalities. Some researchers believed it was not ethical to up-convert all old macro viruses to the new format, while others believed it was the only choice to protect customers. Today, techniques are available23 to convert the different macro formats to a canonical form; thus, detection can be done on the canonical form using a single definition. This greatly simplifies the macro detection problems and reduces the antivirus scanner's database growth because less data need to be stored to detect the viruses, and the virus code no longer needs to be replicated on more than one Office platform.

3.7.1.3 Language Dependency

Given that Microsoft translated basic macro commands, such as FileOpen, into different language versions for Office products, most viruses that use these commands to infect files cannot spread to another language version of Microsoft Office, such as the German edition.

Table 3.1 lists some of the most common macro names in Microsoft Word in various localized versions.

Table 3.1. Common Macro Names in Microsoft Word in Some Localized Versions

English        Finnish                   German
FileNew        TiedostoUusi              DateiNeu
FileOpen       TiedostoAvaa              DateiOffnen
FileClose      TiedostoSulje             DateiSchliesen
FileSave       TiedostoTallenna          DateiSpeichern
FileSaveAs     TiedostoTallennaNimmellä  DateiSpeichernUnter
FileTemplates  TiedostoMallit            DateiDokVorlagen
ToolsMacro     TyökalutMacro             ExtrasMakro

English        Spanish             French                 Italian
FileNew        ArchivoNuevo        FichierNouveau         FileNuovo
FileOpen       ArchivoAbrir        FichierOuvrir          FileApri
FileClose      ArchivoCerrar       FichierFermer          FileChiudi
FileSave       ArchivoGuardar      FichierEnregister      FileSalva
FileSaveAs     ArchivoGuardarComo  FichierEnregisterSous  FileSalvaConNome
FileTemplates  ArchivoPlantillas   FichierModules         FileModelli
ToolsMacro     HerramMacro         OutilsMacro            StrumMacro

Various Office products use different versions of these macro names. A few common examples can be found in Table 3.2 for English Microsoft Office products.

Table 3.2. Differences in Macro Names Between Word and Excel

Microsoft Word  Microsoft Excel
AutoClose       Auto_Close
AutoOpen        Auto_Open

The WM/CAP.A24 virus is an example of language independence, because it uses menu indexes. Using menu indexes was strongly recommended to macro developers by the Microsoft Access team. Of course, menu indexes only work reliably if the host environment has not been customized.

The WM/CAP.A virus also fools users into believing that they are saving their files in RTF (Rich Text Format) when, in fact, they are saving them as infected DOC files instead. Users would prefer to save files as RTF to avoid saving active macros into documents. The virus takes over the File/SaveAs... operation for this trick25.

3.7.1.4 Platform Dependency of Macro Viruses

Although most macro viruses are not platform-dependent, several have introduced some sort of dependency on the actual platform. Microsoft Office products are used not only on Windows but also on Macintosh systems. Not all macro viruses, however, are able to work on both platforms because of the following common reasons.

Win32 function calls
A few macro viruses declare API functions from the Win32 API set of Windows for their own use. Such viruses might fail to replicate on the Mac because the API is not implemented on it. For instance, in January 1996 the WM/Hot.A virus used the GetWindowsDirectory() API call26:
Declare Function GetWindowsDirectory Lib "KERNEL.EXE" \
	(Buffer As String, Size As Integer) As Integer
	:
	:
GetWindowsDirectory(WinPath$, SizeBuf)

Tricky macro viruses use Win32 callback functions to run code outside of the context of the macro interpreter. For instance, a simple string variable is defined that has encoded Assembly code. Often the chr() function is used to build larger strings that contain code. Then the callback routine is used to run the string directly as code. This way, the macro virus jumps out of the context of the macro interpreter and becomes CPU- and platform-dependent.

For example, the {W32, W97M}/Heathen.12888 virus uses the CallBack12(), CallBack24(), and CreateThread() APIs of KERNEL32.DLL to implement its infection and dropping mechanisms for both documents and 32-bit executables.

Location of files in storage
Another key difference among operating system platforms is the location of files on the disk. Some macro viruses use hard-coded path names, such as the location of the NORMAL.DOT template on the C: drive. Obviously, they cannot work on the Mac.

In addition, viruses often assume a Windows-style file system, even if they use the "correct" VBA methods to get the configured folder locations.

Registry modifications
Some macro viruses modify Registry keys on Windows systems to introduce extra tricks or store variables. Such viruses introduce OS dependencies as a result.

3.7.1.5 Macro Evolution and Devolution

Macro viruses consist of a single macro or set of macros. Because these individual macros must be recognized by the antivirus programs on a macro-to-macro basis, a set of interesting problems occurs.

Some macro viruses will copy more than their own set of macros. They can snatch macros from the documents they had infected previously. This way, the virus might evolve into new forms naturally. Some viruses will lose macros from their sets and thus will naturally devolve27 to other forms. There are also sandwiches28, which are created when more than one macro or script virus shares a macro name or script file.

A set of dangerous situations was introduced because of antivirus detection and disinfection, one of which was found by Richard Ford29. The problem occurs when an antivirus product detects a subset of known macros ("macro virus remnants") from a set of macros in a newer virus that has at least one new macro among the other, older known macros. If the antivirus product removes the known macros, it could create a new virus by leaving a macro or set of macros in the document that is still part of the virus and often remains viral itself. This problem can be avoided in several different ways, one of which is to remove all macros from infected documents (although this means removing user macros from the documents also). Researchers also suggested defining a minimal set of macros from a known virus to "safely remove" a set of viral macros from a document. However, there is a natural extension of Richard Ford's problem, which was found by Igor Muttik and described in detail in a scientific paper by Vesselin Bontchev30. This is known as "Igor's problem."

Suppose there is a virus known as Foobar that consists of a single macro called M. The antivirus program identifies M in an infected document, but when it attempts to disinfect the document, a problem occurs. This happens because there is a variant of the Foobar virus in the document. This variant of Foobar consists of {M, P} macros. Unfortunately, macro P is not known to the antivirus program; thus, whenever the antivirus removes macro M, it will leave P behind. The major problem is that P could be a fully functional virus on its own. Consequently, an antivirus program, even with exact identification for Foobar, would create a new virus by accident when repairing a document in such a situation. Indeed, sometimes, it is dangerous to remove a macro virus without removing all macros from the document.
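
The Foobar scenario can be modeled with plain sets. The Python sketch below is a toy model, not a description of how any product works: macros are represented only by their names, and both repair policies are hypothetical.

```python
def naive_disinfect(document_macros: set, known_viral: set) -> set:
    """Remove only the macros the scanner knows; everything else stays."""
    return document_macros - known_viral


def conservative_disinfect(document_macros: set, known_viral: set) -> set:
    """If any known viral macro is present, remove every macro in the document."""
    if document_macros & known_viral:
        return set()
    return set(document_macros)


known_foobar = {"M"}        # the scanner's definition of Foobar
variant_doc = {"M", "P"}    # a document infected with the {M, P} variant

leftover = naive_disinfect(variant_doc, known_foobar)
cleaned = conservative_disinfect(variant_doc, known_foobar)
```

The naive policy leaves {"P"} behind, which is exactly the remnant that may still be viral. The conservative policy leaves nothing, at the cost of also deleting any user macros that were in the document.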

The environment of the malicious programs and agents within the environment of the programs can make changes in computer viruses that result in newly evolved or devolved creatures. In addition, multiple infections of different macro viruses in the same document can lead to "crossed" threats and behavior. Indeed, viruses can become "sexual" by accident: They can exchange their macros ("genes") and evolve and devolve accordingly.

3.7.1.6 Life Finds a Way: Source, P-code, and Execode

Microsoft file formats had to be reverse-engineered by AV companies to be able to detect computer viruses in them. Although Microsoft offered information to AV developers about certain file formats under NDA, the information received often contained major bugs or was incomplete31.

Some AV companies were more successful in their reverse-engineering efforts than others. As a result, a new kind of expert quickly emerged at AV companies: the file format expert. Among the best file format experts are Vesselin Bontchev, Darren Chi, Peter Ferrie, Andrew Krukov ("Crackov"), Igor Muttik, and Costin Raiu, to name just a few.

Starting with VBA5 (Office 97), documents contain the compressed source of the macros, as well as their precompiled code, called p-code (pseudocode), and execode. Execode is a further optimization of p-code that simply runs without any further checks because its state is self-contained. A problem appears because under the right circumstances, any of these three forms can run.

Unfortunately, some AV companies produced products that occasionally corrupted the documents they repaired. In other cases, the products removed any of the three forms, without removing at least one of the other two. For example, some antivirus programs might remove the p-code, but they leave the source behind. Normally the p-code would run first. The VBA Editor also displays decompiled p-code as "a source" for macros, instead of using the actual source code of macros which are saved in the documents. Given the right circumstances, however, when the p-code is removed but the source is not, the virus might be revived. This happens when the document is created in Office 97 but is opened with Office 2000.

Most viruses break without the source because they often use a function such as MacroCopy() that copies the source. In other cases such as worms, however, the macro will continue to function properly because it does not refer to its source.

In some other cases, the execode might run on its own without source and p-code in the document. If the VBA project does contain execode and is opened by the same version of the Office application as the one that created it, the execode runs, and everything else is ignored. In fact, antivirus researchers experienced a case with the X97M/Jini.A virus where both the p-code and the source were removed from a document, but the execode was left behind when an antivirus program "cleaned" the document. The virus runs from the execode when the infected document is opened in the same version of Office that created it26; thus, some of the "half-cooked repaired" viruses can still function and infect further. Life finds a way, so to speak! Indeed, not all viruses will survive, but those that do don't need to refer to their sources or modules. Jini survives because it does not copy any modules. Instead, it copies the victim's data sheets to the workbook where it resides and then overwrites the file of the victim with the file in which it resides. Of course these tricky cases introduce major problems for the antivirus programs. Viruses that exist only in execode form are especially hard to detect. (So far, Microsoft has not provided information about this format to AV developers even under NDA26.)

3.7.1.7 Macro Viruses in the Form of the Multipartite Infection Strategy

There are a couple of binary viruses that attempt to infect documents. These viruses are not primarily dependent on the interpreted environments.

For instance, the multipartite virus, W32/Coke, drops a specially infected global template with a little loader code. This loader will fetch polymorphic macro code (as discussed in Chapter 7) from a text file into the global template. As a result, Coke is one of the most polymorphic binary viruses, as well as a macro virus. Polymorphic macro viruses are usually very slow because of many iterations required to run their code. However, it is normally the polymorphic engine that is slow. Because Coke generates polymorphic macro virus code in a text file using its Win32 code, the polymorphic macro of Coke is not as slow as most polymorphic macro viruses based on macro polymorphic engines.

Other viruses do not need Word to infect Office documents. These viruses are very rare and usually very buggy. Even the Word 6 file format is complicated enough that parsing and modifying it to insert a macro into the file is difficult. The W95/Navrhar virus injects macro code to load a binary file from the end of the Word document. Thus, Navrhar can infect documents without Word installed on the system.

3.7.1.8 New Formula

Another set of problems occurred because Excel not only supported standard macros, but formula macros as well. As you might expect, formulas are not stored with macros; therefore, their locations had to be identified.

Viruses that need the Microsoft Excel Formula language to replicate are prefixed with the XF/ tag. Excel macros are stored in the Excel macro module area, but Excel formulas are stored in the Excel 4 macro area instead. Therefore, these viruses are not visible via the Tools/Macro menu, and users must create a special macro to find them. The first such virus, known as XF/Paix32, was of French origin.

3.7.1.9 Infection of User Macros

Most macro viruses replicate their own set of macros to other documents. However, infection is also possible by modifying existing user macros to spread the virus code, similar to the techniques of binary infectors. In practice, very few macro viruses use these parasitic techniques. This is because most of the documents do not contain user macros, and thus the spreadability of such parasitic macro viruses is seriously limited. (In addition, macro viruses often delete any existing macros in the objects they are infecting.) This kind of macro virus is very difficult to detect and remove with precision.

3.7.1.10 New File Formats: XML (Extensible Markup Language)

Microsoft Office 2003 introduced the ability to save documents in a textual XML format. This caused a major headache for antivirus developers, who must parse the entire file to find the embedded, encoded OLE2 files within such documents and then locate the possible macros within them. Currently, Word and Visio 2003 support the XML format with embedded macros33. Initially, such documents did not have any fields in their headers that would indicate whether or not macros were stored in them. Microsoft changed the file format of Word slightly in the release of this version due to pressure from the AV community.

Visio 2003, however, was released without any such flags, leaving no choice for AV software but to parse the entire XML file to figure out whether there are macros in it. Thus, the overhead of scanning increases dramatically and is particularly noticeable when files are scanned over the network.
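
The difference a header flag makes can be sketched in Python. The example assumes that the Word 2003 flag is a macrosPresent attribute on the root element in the WordprocessingML 2003 namespace; that attribute name and namespace are recalled from the Word 2003 XML format and should be verified before being relied on. The function name and the two sample documents are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace of Word 2003 XML documents.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def macros_flag(stream):
    """Return the root element's macrosPresent value, or None if it has none.

    Only the start of the root element is examined, so a flagged file can be
    classified without walking the rest of the document.
    """
    for _event, element in ET.iterparse(stream, events=("start",)):
        return element.get("{%s}macrosPresent" % WORDML_NS)
    return None


flagged = (
    b'<?xml version="1.0"?>'
    b'<w:wordDocument xmlns:w="' + WORDML_NS.encode() + b'" w:macrosPresent="yes">'
    b"<w:body/></w:wordDocument>"
)
unflagged = b'<?xml version="1.0"?><VisioDocument><Pages/></VisioDocument>'
```

For the flagged sample, macros_flag(io.BytesIO(flagged)) returns "yes" after reading the root element. For a document without such a flag the function returns None, and the scanner has to fall back to parsing the whole file.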

Note

XML infection was considered by virus writers years ago using VBS (Visual Basic Script) code. The idea was that an XML file can contain a Web link to reference code that is stored in an XSL (Extensible Stylesheet Language) file. This technique was first proposed by the virus writer, Rajaat, and was later introduced by the W32/Press virus of Benny.

3.7.2. REXX Viruses on IBM Systems

IBM has a long tradition of implementing interpreted language environments. Examples include the powerful Job Control languages on mainframe systems. IBM also introduced the REXX command script language to better support both large batch-like installations and simple menu-based installation programs. Not surprisingly, virus writers used REXX to create new script viruses. In fact, some of the first mass-mailer script viruses, such as the infamous CHRISTMA EXEC34 worm, were written in REXX. The worm could execute on machines that supported the REXX interpreter on an IBM VM/CMS system and were also connected to a network. This worm was created by a German Informatics student35 in 1987.

CHRISTMA EXEC displayed the Christmas tree and message shown in Figure 3.7 when the REXX script was executed by the user. Obviously, such viruses rely on social engineering for their execution on remote systems. However, users were happy to follow the instructions in the source of the script. The worm looked around for user IDs on the system and used the CMS command SENDFILE (or SF in short form) to send CHRISTMA EXEC files to other users.

Figure 3.7. A snippet of the CHRISTMA EXEC worm.


At one point, such viruses were so common that IBM had to introduce a simple form of content filtering on its gateways to remove them.

REXX interpreters were made available on other IBM operating systems, such as OS/2, as well; thus, a few REXX viruses appeared on OS/2.

3.7.3. DCL (DEC Command Language) Viruses on DEC/VMS

The Father Christmas worm was released in 1988. This worm attacked VAX/VMS systems on SPAN and HEPNET. It utilized DECNET protocols instead of Internet TCP/IP protocols and exploited TASK0, which allows outsiders to perform tasks on the system.

This worm made copies of itself as HI.COM. Although DOS COM files have a binary format, the DCL files with COM extensions are simple text files. The worm sent mail from the infected nodes; however, it did not use e-mail to propagate itself. In fact, this worm could not infect the Internet at all. It attacked remote machines using the default user account and password and copied itself line by line (151 lines) to the remote machine.

Then the worm exploited TASK0 to execute its own copy remotely. It used the SET PROCESS/NAME command to run itself as a MAIL_178DC process on the remote node36. Father Christmas mailed users on other nodes the following funny message:

$ MAILLINE0 = "HI,"
$ MAILLINE1 = ""
$ MAILLINE2 = " HOW ARE YA ? I HAD A HARD TIME PREPARING ALL THE PRESENTS."
$ MAILLINE3 = " IT ISN'T QUITE AN EASY JOB. I'M GETTING MORE AND MORE"
$ MAILLINE4 = " LETTERS FROM THE CHILDREN EVERY YEAR AND IT'S NOT SO EASY"
$ MAILLINE5 = " TO GET THE TERRIBLE RAMBO-GUNS, TANKS AND SPACE SHIPS UP HERE AT"
$ MAILLINE6 = " THE NORTHPOLE. BUT NOW THE GOOD PART IS COMING."
$ MAILLINE7 = " DISTRIBUTING ALL THE PRESENTS WITH MY SLEIGH AND THE"
$ MAILLINE8 = " DEERS IS REAL FUN. WHEN I SLIDE DOWN THE CHIMNEYS"
$ MAILLINE9 = " I OFTEN FIND A LITTLE PRESENT OFFERED BY THE CHILDREN,"
$ MAILLINE10 = " OR EVEN A LITTLE BRANDY FROM THE FATHER. (YEAH!)"
$ MAILLINE11 = " ANYHOW THE CHIMNEYS ARE GETTING TIGHTER AND TIGHTER"
$ MAILLINE12 = " EVERY YEAR. I THINK I'LL HAVE TO PUT MY DIET ON AGAIN."
$ MAILLINE13 = " AND AFTER CHRISTMAS I'VE GOT MY BIG HOLIDAYS :-)."
$ MAILLINE14 = ""
$ MAILLINE15 = " NOW STOP COMPUTING AND HAVE A GOOD TIME AT HOME !!!!"
$ MAILLINE16 = ""
$ MAILLINE17 = "	MERRY CHRISTMAS"
$ MAILLINE18 = "		AND A HAPPY NEW YEAR"
$ MAILLINE19 = ""
$ MAILLINE20 = "			YOUR FATHER CHRISTMAS"
 

3.7.4. Shell Scripts on UNIX (csh, ksh, and bash)

Most UNIX systems also support script languages, commonly called shell scripts. These are used for installation purposes and batch processing. Naturally, computer worms on UNIX platforms often use shell scripts to install themselves. Shell scripts have the advantage of being able to run equivalently on different flavors of UNIX. Although binary compatibility between most UNIX systems is not provided, shell scripts can be used by attackers to circumvent this problem. Shell scripts can use standard tools on the systems, such as GREP, which greatly enhance the functionality of the viruses.

Shell scripts can implement most of the known infection techniques, such as the overwriter, appender, and prepender techniques. In 2004, some new worms appeared, such as SH/Renepo.A, that use bash scripts to copy themselves into the StartupItems folders of mounted drives on Mac OS X. This indicates a renewed interest in worm development on Mac OS X. In addition, threats like Renepo expose Mac OS X systems to a flurry of attacks by turning the firewall off, running the popular password cracker tool John the Ripper, and creating new user accounts for the attackers. However, current attacks require root privileges.

(It is expected that Mac OS X will be the target of future remote exploitation attacks as well.)

3.7.5. VBScript (Visual Basic Script) Viruses on Windows Systems

Windows script viruses appeared after the initial macro virus attack period was over. The VBS/LoveLetter.A@mm worm spread very rapidly around the world in May of 2000. LoveLetter arrived with a simple message with the subject ILOVEYOU, as shown in Figure 3.8. The actual attachment has a "double extension." The "second" extension is VBS, which is necessary to run the attachment as a Visual Basic Script. This "second" extension is not visible unless the Windows Explorer Folder option Hide File Extensions for Known File Types is disabled. By default, this option is enabled. As a result, many novice users believed they were clicking a harmless text file, a "love letter."

Figure 3.8. Receiving a "love letter."
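From the defensive side, the double-extension trick described above is easy to flag at a mail gateway. The following Python fragment is only a minimal sketch of such a check. The two extension lists and both function names are assumptions made for illustration, and the sample file name is the attachment name widely reported for LoveLetter rather than something taken from this section.

```python
import os

# Assumed list of extensions that Windows runs as scripts or programs.
EXECUTABLE_EXTENSIONS = {".vbs", ".js", ".exe", ".com", ".bat", ".pif", ".scr"}
# Assumed list of extensions that users tend to treat as harmless documents.
DECOY_EXTENSIONS = {".txt", ".jpg", ".gif", ".doc", ".pdf"}


def has_deceptive_double_extension(filename: str) -> bool:
    """Return True if the name ends in a decoy extension plus an executable one."""
    stem, last_ext = os.path.splitext(filename.lower())
    _, inner_ext = os.path.splitext(stem)
    return last_ext in EXECUTABLE_EXTENSIONS and inner_ext in DECOY_EXTENSIONS


def visible_name(filename: str) -> str:
    """Approximate what the user sees when known extensions are hidden."""
    return os.path.splitext(filename)[0]


if __name__ == "__main__":
    name = "LOVE-LETTER-FOR-YOU.TXT.vbs"
    print(visible_name(name), has_deceptive_double_extension(name))
```

A real filter would need a much longer extension list and would also have to handle padding tricks, such as long runs of spaces before the final extension.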

On execution of the attachment, the VBS file runs with the script interpreter WSCRIPT.EXE. Mass-mailer VBS script worms typically use Outlook MAPI functions via CreateObject ("Outlook.Application") followed by the NameSpace ("MAPI") method to harvest e-mail addresses with AddressLists(), and then they mass-mail themselves as an attachment to recipients via the Send() method. In this way, many users receive e-mail from people they know. As a result, many recipients are curious enough to run the attachment, often on more than one occasion.

VBS viruses can use extended functionality via ActiveX objects. They have access to file system objects, other e-mail applications, and locally installed ActiveX objects.

3.7.6. BATCH Viruses

BATCH viruses were not particularly successful in the DOS years. Several unsuccessful attempts were made to develop in-the-wild BATCH viruses, none of which actually became wild. Nevertheless, common infection types, such as the prepender, appender, and overwriting techniques, were all developed as successful demonstrations. For example, BATCH files can be attacked with the appender technique by placing a goto label instruction at the front of the file and appending the extra lines of virus code to the end after the label.

BATCH viruses are also combined with binary attacks. BATVIR uses the technique of redirecting echo output to a DEBUG script; thus, the virus is a textual BATCH command starting with

rem [BATVIR] '94 (c) Stormbringer [P/S]

This is followed by a set of echo commands to create a batvir.94 file with the DEBUG script. The DEBUG command receives a G GO command via the script and runs the binary virus without ever creating it in a new file.

BAT/Hexvir uses a similar technique, but it simply echoes binary code into a file and runs that as a DOS COM executable to locate and infect other files.

Some other tricky BATCH viruses use the FOR % IN () commands to look for files with the BAT extension and insert themselves into the new files in packed form using PKZIP. BAT/Zipbat uses PKUNZIP on execution of the infected BATCH files to extract a new file called V.BAT, which will infect other files by placing itself in them, again in zipped form. Members of the BAT/Batalia family use the compressor, ARJ, instead. Batalia, however, uses random passwords to pack itself into BATCH files.

Similar to BAT/Zipbat, the BAT/Polybat family also uses the PKZIP and PKUNZIP applications to pack and unpack itself at the ends of files. Polybat is practically a polymorphic virus. The virus inserts garbage patterns of percent signs (%) and ampersands (&) that are ignored during normal interpretation. For instance, the ECHO OFF command is represented in some way similar to the following:

@ec%&%h%&%o o%&%f%&%f
 
@e%&%ch%o&% %&o%f%f&%
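A scanner can undo this kind of garbage insertion before it compares a line against known strings. The Python fragment below is a rough sketch of that idea and not a description of how any particular product works. The function name is hypothetical, and the filter simply deletes every run of percent signs and ampersands, which is an assumption that fits the two samples above.

```python
import re

# Runs of '%' and '&' are treated as garbage (assumption based on the samples).
_GARBAGE = re.compile(r"[%&]+")


def strip_polybat_garbage(line: str) -> str:
    """Remove the filler characters so the underlying command becomes visible."""
    return _GARBAGE.sub("", line)


if __name__ == "__main__":
    for sample in ("@ec%&%h%&%o o%&%f%&%f", "@e%&%ch%o&% %&o%f%f&%"):
        print(strip_polybat_garbage(sample))
```

Because this crude filter also destroys legitimate variable references such as %PATH%, it is suitable only for producing a normalized copy for matching, never for rewriting the file itself.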

BATCH viruses, or at least multi-component viruses with a significant BATCH part, are becoming a bigger threat on Windows systems. For instance, the BAT/Mumu family got especially lucky in corporate environments by using a set of binary shareware tools (such as PSEXEC) in combination with the BAT file-driven virus code.

Several custom versions of BATCH languages do exist, such as the BTM files in 4DOS and 4NT products, to name just a few, which also have been used by malicious attackers.

3.7.7. Instant Messaging Viruses in mIRC, PIRCH scripts

Instant messaging software, such as mIRC, supports script files to define user actions and simplify communications with others. The script language allows the definition of commands whenever a new member joins a conference and is often stored in script.ini in the system's mIRC folder.

IRC worms attempt to create or overwrite this file with an INI file that sends copies of the worm to others on IRC. The command script supports the /dcc send command. This command can be used to send a file to a recipient on a connected channel.
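Because the propagation relies on a script that sends a file whenever somebody joins a channel, a simple heuristic can look for that combination in a script.ini file. The Python fragment below is a sketch under assumptions: the function name is hypothetical, the regular expressions only approximate the event syntax, and the sample script in the usage lines is invented for illustration.

```python
import re
from typing import List

# Approximates an "on <level>:JOIN:" event definition (assumed syntax).
_JOIN_EVENT = re.compile(r"\bon\s+[^:]*:join:", re.IGNORECASE)
# Matches the /dcc send command mentioned in the text.
_DCC_SEND = re.compile(r"/?dcc\s+send\b", re.IGNORECASE)


def suspicious_lines(script_text: str) -> List[str]:
    """Return lines that send a file from inside a JOIN event handler."""
    return [
        line.strip()
        for line in script_text.splitlines()
        if _JOIN_EVENT.search(line) and _DCC_SEND.search(line)
    ]


if __name__ == "__main__":
    sample = "\n".join([
        "[script]",
        r"n0=on 1:JOIN:#:{ /dcc send $nick c:\mirc\example.ini }",
        "n1=on 1:TEXT:*hello*:#:/msg $chan hi",
    ])
    for hit in suspicious_lines(sample):
        print(hit)
```

A line-based check like this misses handlers that span several lines, so it should be read as a starting point for triage rather than as a detection routine.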

3.7.8. SuperLogo Viruses

In April of 2001, a new LOGO worm was created and mass-mailed to some antivirus companies. It never became wild, though, and there is definitely more than one reason for that. Its author calls herself Gigabyte. Gigabyte has a background of creating other malware and has authored mIRC worms. As you will see, she tried to use her existing mIRC knowledge to create the Logic worm [37]. The actual worm is created in Super Logo, a reincarnation of the old Logo language for Windows platforms, which claims to be "the Windows platform for kids."

In 1984, I came across several Logo implementations for various 8-bit computers. Our 8-bit school computer, the HT 1080Z (a Z80-based TRS-80 clone built in Hungary), had a top screen resolution of 128x48 dots in black and white. Although we had not paid too much attention to the fact at the time, the built-in Basic of HT 1080Z was created by Microsoft in 1980.

The Logo language's primary purpose is to provide drawing with the "Turtle." The Turtle is the pen, and its head can be turned and instructed to draw. For instance, in Super Logo, the following commands are common: HIDETURTLE, FORWARD, PENUP, PENDOWN, WAIT, and so on.

The set of commands can be formed as subroutines and saved in a Logo project file with an LGP extension. The actual project file is a pretokenized binary format, but commands and variable names remain easily readable and stored as Pascal-style strings. The project file can be loaded and executed with the Super Logo interpreter. The original Logo language is well extended in Super Logo to compete with other existing implementations. It can deal simultaneously with multiple graphical objects (see the cute Turtle as an example in Figure 3.9) and move them around the screen with complete mouse support.

Figure 3.9. Main Turtle ICON.

We can easily determine, however, that the Super Logo language does not support mailing or embedded executables; neither does it support spawning of other executables or scripts. Yet Super Logo does support a PRINTTO "XYZ" command. XYZ can be a complete path to a file. With that statement, a Logo program might modify any file, such as winstart.bat, overwriting its content with something like the following:

@cls
@echo You think Logo worms don't exist? Think again!

Get the idea? When the logic.lgp project is loaded and executed, the worm will draw LOGIC on screen with a short message, as shown in Figure 3.10.

Figure 3.10. The payload of the Logic worm.

The worm will make sure that a STARTUP.VBS file is created in one of the Windows startup folders and, as such, will be executed automatically the next time Windows is booted. The worm also tries to modify the shortcuts (if any) of some common Windows applications, such as notepad.exe, to start the VBS file without a reboot.

This VBS file propagates the 4175-byte logic.lgp worm project file to the first 80 entries of the Outlook address book. This is a very standard VBS mail propagation that has a set of minor bugs. In 2004, Gigabyte was arrested by Belgian authorities. She is facing criminal prosecution; the penalty might include imprisonment and large fines.

3.7.9. JScript Viruses

One of the reasons to turn off JScript support in a Web browser such as Internet Explorer has to do with JScript viruses. JScript viruses typically use functions via ActiveX communication objects. They can access such objects in a way similar to VBS scripts. For instance, the very first overwriting JScript viruses accessed the file system object via the CreateObject ("Scripting.FileSystemObject") method. This kind of virus was first created by jacky of the Metaphase virus-writing group around 1999.

The File System Object provides great flexibility to attackers. For example, an attacker can use the CopyFile() method to overwrite files. This is how overwriting JScript viruses work. Of course, more advanced attacks have been implemented by the attackers using the OpenTextFile(), Read(), Write(), ReadAll(), and Close() functions. Thus JScript viruses can carry out complex file infection functionality similar to VBS viruses, using a slightly different syntax.

3.7.10. Perl Viruses

Perl is an extremely popular script language. Perl interpreters are commonly installed on various operating systems, including Win32 systems. The virus writer, SnakeByte, wrote many viruses in this script language.

Perl scripts can be very short, but they have a lot of functionality in a very compact form. Attackers can use Perl to develop not only encrypted and metamorphic viruses, but also entry point obscuring ones. The open(), print, and close() functions are used to write newly created content to target files in storage, which are located with the foreach() function.

For example, the following Perl sequence reads its source to the CurrentContent variable:

open(File,$0);
@CurrentContent=<File>;
close(File);

Perl viruses are especially easy to write because Perl is such a powerful script language to process file content.

3.7.11. WebTV Worms in JellyScript Embedded in HTML Mail

Microsoft WebTV is a special embedded device that allows users to browse the Web over their televisions. In July 2002, a new, malicious WebTV worm appeared, which at first glance was believed to be a Trojan horse. The payload of the worm reconfigured the access number (dial-up number) for the WebTV network to call 911 (the phone emergency center of the U.S.) instead, to perform a DoS attack.

WebTV HTML (Hypertext Markup Language) files can run HREF (hyperlink reference) within the <script> </noscript> tags using WebTV's Internet Explorer. The HREF would normally link a page to another location on the World Wide Web; however, in WebTV JellyScript, these special commands were used to set up the WebTV. Obviously, these commands have not been documented officially, though many people tried to figure out something more about WebTV and published detailed information about the available commands.

This malicious program, NEAT, was later identified as a worm that used the sendpage commands to send HTML mail that contained the worm to others on the WebTV network. The mail was sent by various fake "from" addresses, such as Owner_, minimoo, masonman, and so on.

The worm also introduced many pop-up advertising messages on the recipient's machine before it used the ConfirmPhoneSetup?AccessNumber command to reconfigure the dial-up number to 911 to overload the emergency network with a DoS attack.

3.7.12. Python Viruses

Python is an extremely handy programming language. Unlike shell script, which can be rather limited in functionality because of speed issues, Python is fast and modular. Because of its more general data types, Python can solve a larger problem. It has built-in modules to support I/O, system calls, sockets, and even interfaces to graphical user interface toolkits.

Although Python viruses are not extremely common, a few concept viruses written in Python scripts exist. They typically combine the open(), close(), read(), and write() functions to locate files with listdir() to replicate themselves to other files. However, this virus type is probably the simplest imaginable form for a Python virus, which could utilize much more on the system to implement a variety of infection strategies.

3.7.13. VIM Viruses

A successor of the VI UNIX editor is VIM (VI IMproved). Unlike VI, VIM works on Windows, Macintosh, Amiga, OS/2, VMS, QNX, and other systems. VIM is a text editor that includes almost all VI commands and a lot of new ones.

Among its many new features, VIM supports a very powerful scripting language that has already been used by virus writers to create worms. (The known example of such a worm is an intended worm, which will not replicate.)

3.7.14. EMACS Viruses

Just like VIM, newer versions of the EMACS editor also support scripting. This kind of virus is not common, but proof-of-concept creations exist for the environment.

3.7.15. TCL Viruses

TCL (Tool Command Language) is a portable script language that can run on systems such as HP-UX, Linux, Solaris, MAC, and even Windows. TCL is a language very similar to Perl. TCL scripts are executed by the tclsh interpreter.

The first virus implemented in TCL (pronounced "tickle") was Darkness, a very simple virus written by Gigabyte in 2003. TCL supports foreach(), open(), close(), gets(), and puts() functions, which are all TCL script viruses need to replicate themselves.

3.7.16. PHP Viruses

PHP (a recursive acronym for PHP: hypertext preprocessor) is an open-source, general-purpose scripting language. It is well suited to Web development and can be embedded into HTML. PHP is different from client-side scripting, such as JScript, because PHP runs on the server instead of on the local machine. However, PHP also can be used in command-line mode without any server or browser.

PHP/Caracula was introduced in 2001 by the virus writer, Xmorfic, of the BCVG virus-writing group. The virus spreads as an overwriter and creates mIRC scripts to spread as a worm.

PHP viruses typically use the fopen(), fread(), fputs(), fclose() sequence to write themselves to new files, which they locate with direct action infection techniques using the opendir(), readdir(), closedir() sequence in combination with the file_exists() function.

There are examples of polymorphic PHP viruses, such as PHP/Feast, written by the virus writer, Kefi, in 2003. Feast looks for files to overwrite, but it overwrites them with an evolved copy of itself. In particular, each variable in the body of the virus will mutate to random character sequences.
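Variable renaming of this kind can be neutralized by normalizing the names before comparison, so that every mutated generation collapses to the same text. The Python fragment below is a minimal sketch of that defensive idea. The function name and the $v0, $v1 naming scheme are assumptions for illustration, and the PHP statements in the usage lines are invented samples, not code from Feast.

```python
import re

# A PHP variable: '$' followed by an identifier.
_PHP_VAR = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def normalize_variables(source: str) -> str:
    """Rename variables to $v0, $v1, ... in order of first appearance."""
    names = {}

    def rename(match):
        name = match.group(0)
        if name not in names:
            names[name] = "$v{}".format(len(names))
        return names[name]

    return _PHP_VAR.sub(rename, source)


if __name__ == "__main__":
    generation_a = '$abc = fopen($xyz, "r"); fclose($abc);'
    generation_b = '$q1w = fopen($zz9, "r"); fclose($q1w);'
    print(normalize_variables(generation_a))
    print(normalize_variables(generation_a) == normalize_variables(generation_b))
```

The sketch renames everything that looks like a variable, including names inside string literals and special names such as $this, so a practical normalizer would need a real tokenizer.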

3.7.17. MapInfo Viruses

MapInfo, developed by Geo-Information Systems, is not a widely used application. It is used for mapping and geographical analysis. The MPB/Kynel [38] virus demonstrated that it is possible to make this platform virulent. Kynel was created by Russian virus writers in late 2003.

MapInfo Professional has its own development environment called MapBasic, which is a Basic-like language. MapBasic is very powerful and, as expected, supports Open, Close, Read, and Write to both ASCII and binary files. It also supports API calls from other DLLs, dynamic data exchange (DDE), and object linking and embedding (OLE). When these programs are compiled, a new executable file with the MBX (MapBasic eXecutable) extension is created. As expected, however, these files can be executed only by MapInfo.

The MPB/Kynel virus infects new tables. It enumerates new tables each time the function WinChangedHandler() is called. WinChangedHandler() is triggered whenever the user changes something in a document. The virus hooks this function and uses this moment to create a copy of itself in the newly enumerated tables, as tablename.mif. It then inserts a Run Application line to this MBX executable into the TAB file of MapInfo documents. In this way, the MBX file will be run whenever the infected document is opened.

MapInfo is available on both Windows and Macintosh platforms. It is not very common, but like the SuperLogo virus threats, it demonstrates virus writers' interest in all platforms as possible targets.

3.7.18. ABAP Viruses on SAP

The first virus known to attempt to infect SAP was ABAP/Rivpas, written in April 2002. It is a proof-of-concept virus that is based on the Advanced Business Application Programming scripting language. This creation had a few intentional bugs and did not have a chance to replicate. However, other variants with the fix appeared quickly, and these were real viruses. In about 20 lines of script, the virus replicates in databases by copying itself from one database to another.

3.7.19. Help File Viruses on Windows: When You Press F1

A very powerful but surprisingly unpopular virus infection target is Windows Help files. Windows Help files are in binary format and contain a script section. The scripts have access to Windows API calls. Most Help viruses inject a little script into the SYSTEM directory of HLP files. This script section will be executed the next time the Help file is loaded. As a result, such a virus is triggered simply by pressing the F1 button in an application that is associated with an infected HLP file.

The major trick of such viruses is to define functions for their use, such as EnumWindows() of the USER32.DLL. For example, the Dream virus uses this technique to infect Windows Help files.

The RR ('USER32.DLL','EnumWindows','SU') script line will define an EnumWindows() callback for use. Then an EnumWindows(virusbody) call is made by the script, which will execute the "string," the virus body, via the callback. Thus execution can continue in native code, getting out of its script context.

The first virus to infect Windows Help files was the 32-bit polymorphic virus, W95/SK [39], written in Russia. Unlike Demo, SK uses WinExec() functions to execute a set of command.com /c echo commands to print code into a binary for execution outside of the HLP file in the root directory. The first native Help infector, the HLP/Demo virus, also appeared to replicate from one Windows Help file to another.

3.7.20. JScript Threats in Adobe PDF

The PDF format is used by Adobe Acrobat products. In 2003, the {W32,PDF}/Yourde virus infected PDF files using an executable that is dropped by a JScript exploit (a PDF form is also dropped). The binary is executed by the form when the form is loaded. The complete version of the Adobe Acrobat installation is required to infect files because the virus relies on the user's saving the infected file. (Saving the infected file cannot be forced externally with Adobe Acrobat. Additionally, the reader-only version cannot save PDFs at all.)

The JScript is run automatically by Acrobat itself, without relying on an external interpreter such as Windows Scripting Host; thus, the vulnerability is Acrobat version-specific.

3.7.21. AppleScript Dependency

AppleScript is used on Macintosh systems to support local scripting. Not surprisingly, some threats can replicate only if AppleScript is installed. For example, the AplS/Simpsons@mm worm is written in AppleScript. After it is executed, it utilizes Outlook Express or Entourage to send a copy of itself to everybody in the address book.

This particular worm was not reported frequently from the wild; however, AppleScript threats expose Mac users to security problems similar to those of other powerful script languages, such as VBS on Windows.

3.7.22. ANSI Dependency

IBM PCs introduced ANSI.SYS drivers that fulfill the needs of many users by providing the ability to reconfigure certain key functions via escape (ESC) sequences. These sequences are usually stored in a file with an ANS extension. ESC sequences can start with a special escape code (accessible via holding the Alt key and typing40 on the numeric keypad).

Whenever the line DEVICE=ANSI.SYS is included in the CONFIG.SYS file, the support to execute ESC sequences is available. For example, a simple ANSI sequence can redefine the N key to Y and the n key to y. Consequently, the user would give the wrong answer to confirmation questions asked by applications. This would be done the following way:

ESC [78;89;13p ESC [110;121;13p

This kind of redefinition might be desirable for other keys; the Enter key also can be redefined, and del *.* or format c: might be displayed when Enter is pressed.

ANSI sequences also can be used to redefine entire commands. Thus, the wrong command name is displayed when a different command is typed.
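A file can be checked for such key redefinitions before it is ever displayed through ANSI.SYS. The Python fragment below is a sketch only. The function name is hypothetical, and the parser is deliberately simplified: it assumes that the ESC character is immediately followed by the bracket and that all parameters are plain decimal codes, so quoted strings and extended keys are not handled.

```python
import re

ESC = "\x1b"
# ESC [ <numbers separated by ';'> p  is the key-reassignment form shown above.
_REDEFINE = re.compile(re.escape(ESC) + r"\[([0-9;]+)p")


def find_key_redefinitions(data: str):
    """Return (key, replacement) pairs for every reassignment sequence found."""
    results = []
    for match in _REDEFINE.finditer(data):
        codes = [int(code) for code in match.group(1).split(";") if code]
        if len(codes) >= 2:
            key = chr(codes[0])
            replacement = "".join(chr(code) for code in codes[1:])
            results.append((key, replacement))
    return results


if __name__ == "__main__":
    sample = ESC + "[78;89;13p" + ESC + "[110;121;13p"
    for key, replacement in find_key_redefinitions(sample):
        print(repr(key), "->", repr(replacement))
```

For the sequence shown earlier, the sketch reports that N is remapped to Y followed by a carriage return (code 13), and likewise for the lowercase pair.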

3.7.23. Macromedia Flash ActionScript Threats

A newcomer on the malicious scene is ActionScript malware. The LFM virus uses the ActionScript of Flash files to create and run a DOS COM executable. Such threats, then, are fairly limited because they introduce several other dependencies.

For instance, LFM [41] needs to be downloaded to the local machine from a Web page. It can only infect files if it is downloaded to a folder that contains other clean files and only as long as the external file V.COM can run properly.

3.7.24. HyperTalk Script Threats

"An excellent beginning tool to teach average people, from 5th grade, on how to control their computers as masters rather than slaves."

Steve Wozniak


HyperCard is a versatile environment that supports a scripting language called HyperTalk. Created by Bill Atkinson, HyperTalk is one of the most linguistic script languages available. Not surprisingly, some of the oldest computer viruses were written in HyperTalk. The first HyperTalk script virus was Dukakis, written around 1988.

HyperTalk scripts activate based on event handlers associated with a name in the stack. The scripts are stored in HyperCard data files, called stacks, which are in binary format. But the script code itself is purely textual inside the stacks.

For instance, upon opening a HyperCard stack, the openStack event handler can be invoked. This is fairly similar to how Microsoft Office products work with macros, though HyperCard is much more than a scripted text editor. It can be used to create many different projects with menus and database front-ends for cards (records in the database), and different stacks can share their functions with each other. HyperCard extended the promise of easy-to-use systems to easy-to-program environments.

HyperTalk script code is interpreted between the event handler tags of the keywords on and end. Here is an example:

on openStack
	ask "What is your name?"
	put it * it into field "Name"
end openStack
 

HyperCard was developed well before Microsoft's Visual Basic. Like Microsoft Office products' global templates (or should I say, the other way around?), HyperCard supports a so-called Home stack, which contains an arsenal of useful scripts. Most HyperCard viruses infect the Home stack by copying themselves into it with the help of put keywords. After this, they can copy themselves to the newly opened stacks. Any stack can be a Home stack, as long as its name is home.

The Dukakis virus uses the following lines to select its script body for a new copy:

put the script of stack "home" into temp2
get offset ("-** The HyperAvenger **-",temp2)
put char it to it+2426 of temp2 into theCode
 

This script snippet looks for the offset where the virus code starts in the home stack and copies the virus script (2426 bytes) from that location to the variable, theCode. The virus then only needs to copy theCode into another stack later. The expression "this stack" is a reference to the currently opened stack. Its content can be accessed with yet another put command.

Several other HyperCard viruses exist on the Mac; the most famous ones are the Merry Xmas and 3 Tunes families.

3.7.25. AutoLisp Script Viruses

HyperTalk script viruses are very readable and easy to understand; AutoLisp threats are a little more difficult to read. A few script viruses, such as Pobresito [42] and ALS/Burstead [43], use the AutoLisp scripting feature of AutoCAD environments.

Note

Newer versions of AutoCAD also support VBA.

Pobresito was written during the summer of 2001. Burstead appeared much later, during December 2003 in Finland, and managed to infect a few major corporations that run modern versions of AutoCAD. AutoCAD is rather expensive software, and it is not used as widely as other script language environments.

AutoLisp scripts are stored in text files with the LSP extension. Burstead.A looks for the location of the base.dcl file in the AutoCAD search path, using the findfile function:

(setq path (findfile "base.dcl"))

This is done to locate the directory where the other LISP files can be found. Such viruses attempt to modify files with a load command to load their own LSP file. Thus, whenever the modified LSP file is executed, the virus can get control via the load command:

(load "foobar")

Here, foobar is the name of a file that has an LSP extension in the default folder.

Obviously, AutoLisp allows write-line functions, which could be used by attackers for different kinds of infection methods.

3.7.26. Registry Dependency

Some viruses are implemented to infect from Windows Registry files. The Registry is a central storage database on Windows systems. Previous versions of Windows mostly used INI files to store application settings. On modern Windows systems, the Registry database, called a hive, is used to store such information in trees.

An interesting capability of the Registry is that it stores file paths for system startup time execution under several different subentries of the hive, such as HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\RUN.

Keys like this are commonly attacked by all kinds of malicious code, and other locations of the Registry provide similar attack points for virus writers. For instance, the W32/PrettyPark worm family modifies the Registry key located at HKEY_CLASSES_ROOT\exefile\shell\open\command to get executed whenever an EXE file is run by the user. The worm executes the program that the user wanted to run, but only after itself.

Registry-dependent viruses use such keys to insert a reference to system commands for later execution. Registry entry installation files are stored in textual format, and they contain information about keys and values to install via Regedit. Such viruses are implemented as a single command entry in the REG files. Regedit will interpret the commands in the REG file; as a result, a new entry will be stored in the Registry for later execution.

The malicious entry uses standard system commands with the passed parameters to look for other REG files on local and network sites and modifies them to include the command string to REG files. This technique is based on the fact that DOS batch commands can be executed from the Registry.
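Because REG files are plain text, they can be inspected before Regedit ever imports them. The Python fragment below is a minimal sketch that lists the values a REG file would write under the two keys named in this section. The function name and the watch list are assumptions, the parser handles only simple one-line entries, and the sample data in the usage lines is invented.

```python
import re

# Key paths named in this section, lowercased; the list is an assumption
# and is far from complete (RunOnce and similar keys are not covered).
WATCHED_KEY_SUFFIXES = (
    r"\software\microsoft\windows\currentversion\run",
    r"\exefile\shell\open\command",
)

_SECTION = re.compile(r"^\[(.+)\]$")
_VALUE = re.compile(r'^(@|"[^"]*")=(.*)$')


def startup_entries(reg_text: str):
    """Return (key, value name, data) for entries under the watched keys."""
    found = []
    current_key = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        section = _SECTION.match(line)
        if section:
            current_key = section.group(1)
            continue
        value = _VALUE.match(line)
        if value and current_key and current_key.lower().endswith(WATCHED_KEY_SUFFIXES):
            found.append((current_key, value.group(1).strip('"'), value.group(2)))
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "REGEDIT4",
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\RUN]",
        '"Example"="example.exe"',
        "",
        r"[HKEY_CURRENT_USER\Software\Example]",
        '"Color"="blue"',
    ])
    for entry in startup_entries(sample):
        print(entry)
```

Only the entry under the RUN key is reported for the sample; what to do with such a finding is a policy decision that the sketch leaves open.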

3.7.27. PIF and LNK Dependency

Viruses also attack PIFs (program information files) and LNKs (link files) on Windows systems. PIFs are created when you create a shortcut to or modify the properties of an MS-DOS program and allow you to set default properties, such as font size, screen colors, and memory allocation. PIFs also store the path of the executable to run.

Some viruses attack PIFs by modifying their internal links that point to an executable. The approach of typical PIF creations is to run commands via command.com execution, using this link path. They use the copy command to copy the PIF to other locations on the local disk, such as Windows, mIRC, or P2P folders, or to attack network resources.

The LNK (link shortcut files) on Windows 95 and above can be attacked in a manner similar to PIFs.

3.7.28. Lotus Word Pro Macro Viruses

Another class of macro viruses attacks Lotus Word Pro documents of Lotus SmartSuite. For example, the LWP/Spenty virus only replicates in the Chinese version of Word Pro. The virus infects files as they are opened by hooking the DocumentOpened() and DocumentCreated() macros. The security settings of the document are changed in such a way that a password is set to 720401. In this way, the virus attempts to prevent any modifications of infected objects.

The Spenty virus became widespread in China in 2002. Spenty introduced the problem of Word Pro file parsing for antivirus producers. Word Pro uses a script-like macro language.

3.7.29. AmiPro Document Viruses

Viruses do not frequently attack AmiPro documents, and there is a good reason for this. Unlike most text editors, AmiPro saves documents and macros into two separate files. The documents are stored in files with SAM extensions, and the macro files are kept in files with SMM extensions. AmiPro viruses must connect the two files in such a way that when the SAM file is opened, it invokes execution of the SMM.

The APM/Greenstripe virus consists of four functions: Green_Stripe_Virus(), Infect_File(), SaveFile(), and SaveAsFile(). The SaveFile() and SaveAsFile() functions are hooks installed with the ChangeMenuAction() function, and they correspond to the Save and Save As menus. The virus uses the AssignMacroToFile() function to establish the connection between SAM and SMM files. The virus uses the FindFirst() and FindNext() functions to search for new SAM files to attack.

AmiPro viruses are much less likely to spread via e-mail than Microsoft Office macro viruses because of AmiPro's use of separate document and macro files, as opposed to a single container.

3.7.30. Corel Script Viruses

Corel Draw products also support a script language that is saved in files with CSC extensions. (In addition, contemporary versions of Corel Draw also support VBA.) Corel Script viruses typically look for victim files with the FindFirstFolder() function. The CSC/CSV virus identifies infected victim files by checking for the "REM ViRUS" marker in the CSC files.

If CSV does not find the marker in the file, it will attempt to infect it by prepending its script with print # commands. It then looks for the next file with the FindNextFolder() function. In practice, the virus creates a new host script with the same name, copies itself into it, and then appends the original host script to itself.

REM ViRUS GaLaDRieL FOR COREL SCRIPT bY zAxOn/DDT

The CSC/PVT virus follows a similar strategy. It uses the same functions to look for new files to infect. It even checks potential victims for REM PVT anywhere in the script before attempting to infect them.

REM PVT by Duke/SMF

Unlike CSV, the PVT virus appends itself to the end of the script. As a result, the original script runs first, and upon the exit of the original script, the appended script code is executed.
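The same self-recognition markers that these viruses use to avoid reinfection can serve as a first, very naive detection string. The Python fragment below is a sketch of that idea. The function name and the table layout are assumptions, and a plain substring match like this one is easy to evade and can misfire on clean scripts that happen to contain the text.

```python
# Marker strings quoted in this section, keyed by the virus name.
MARKERS = {
    "CSC/CSV": "REM ViRUS",
    "CSC/PVT": "REM PVT",
}


def match_markers(script_text: str) -> list:
    """Return the names whose marker string occurs anywhere in the script."""
    return [name for name, marker in MARKERS.items() if marker in script_text]


if __name__ == "__main__":
    print(match_markers("REM ViRUS GaLaDRieL FOR COREL SCRIPT bY zAxOn/DDT"))
    print(match_markers("REM PVT by Duke/SMF"))
    print(match_markers("REM an ordinary Corel script"))
```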

3.7.31. Lotus 1-2-3 Macro Dependency

Although there are widespread rumors about a Lotus 1-2-3 macro virus with the name Ramble, the actual threat is not viral. Rather, the known threat is a dropper of a BATCH virus. (This is not to say, however, that Lotus 1-2-3 macros would not be able to infect another set of Lotus 1-2-3 worksheets.)

The BAT/Ramble virus dropper, written by "Q The Misanthrope," works the following way: First, the user opens a Trojanized Lotus 1-2-3 document. The malicious Lotus macro activates when the document is opened. The malicious macro is then inserted in the A8167 ... A8191 range of the sheet. In this way, it is not visible to the user. After the macro runs, it creates a BATCH virus in the C:\WINSTART.BAT file.

After the BATCH virus is created by the dropper, the macro dropper code removes itself from the sheet, using the /RE command (Range Erase). It also removes the \0 macro name that automatically runs whenever a worksheet is opened.

It must be noted that newer versions of Lotus 1-2-3 have a different worksheet format, which has allowed a macro up-conversion problem to be introduced on this platform.

3.7.32. Windows Installation Script Dependency

The 32-bit Windows versions introduced a new installation script language in INF files. These scripts are invoked via the Windows Setup API. The install scripts have various sections for installation and uninstallation. The script can be generated manually or by using tools such as Microsoft's BATCH.EXE or INF generators.

One of the many features of installation scripts is the use of the autoexec.bat file. Commands can be directly installed into and removed from the automatically executed batch file on system startup. This is done via the UpdateAutoBat command in the Install section associated with a named section of the script. That section contains commands to delete lines, as well as to add new malicious commands, with CmdDelete and CmdAdd, respectively. (CmdDelete is used to delete the malicious code in case it was inserted into the file in a previous attack.)

The virus writer, 1nternal, introduced a couple of viruses, such as the INF/Vxer family, that take advantage of INF file infection via batch execution. The CmdAdd entries are used to deliver the source of the viral batch lines to AUTOEXEC.BAT. As a result, on each system startup the virus will look into the Windows\INF folder to infect other INF files.
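From the defender's side, the same entries are easy to audit. The sketch below, in Python, collects the CmdAdd and CmdDelete lines from every section that an UpdateAutoBat entry points to. It assumes the simple `key=value` INF layout described above; the function name, the sample section names, and the naive comment handling (everything after a semicolon is dropped, even inside quotes) are assumptions made for illustration.

```python
def autoexec_commands(inf_text):
    """Collect CmdAdd/CmdDelete entries from sections named by UpdateAutoBat."""
    sections = {}
    current = None
    for raw in inf_text.splitlines():
        line = raw.split(";", 1)[0].strip()      # naive: drop INF comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections.setdefault(current, [])
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current].append((key.strip().lower(), value.strip()))

    # UpdateAutoBat may name one or more sections, separated by commas.
    targets = set()
    for entries in sections.values():
        for key, value in entries:
            if key == "updateautobat":
                targets.update(s.strip().lower() for s in value.split(","))

    return [(key, value)
            for name in sorted(targets)
            for key, value in sections.get(name, [])
            if key in ("cmdadd", "cmddelete")]

# Hypothetical, harmless sample script used only to exercise the parser.
sample = """
[DefaultInstall]
UpdateAutoBat=Patch.Autoexec

[Patch.Autoexec]
CmdDelete="echo"
CmdAdd="echo","example line"
"""

for entry in autoexec_commands(sample):
    print(entry)
```

Any INF file in the Windows\INF folder that unexpectedly yields CmdAdd entries would deserve a closer look.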

3.7.33. AUTORUN.INF and Windows INI File Dependency

AUTORUN.INF files and Windows INI files are very similar in structure to Windows installation scripts. Some viruses modify the AUTORUN.INF file to get auto-launched whenever a removable disk is loaded.

AUTORUN.INF was a new feature in Microsoft Windows 95 systems. It was primarily designed to run an application automatically whenever a user inserted a CD into the CD-ROM drive. Whenever an AUTORUN.INF file exists in the root directory of a removable disk type, it is executed by most 32-bit Windows systems, although some of the newer editions of Windows support the feature primarily for CD-ROMs only.

There are a couple of Registry entries associated with Autorun functionality. Whenever such options are enabled, the AUTORUN.INF is interpreted, and its Autorun section is invoked. The Autorun section supports an Open command that can be used to run an executable via the feature. This is the command that malicious code alters so that it is invoked automatically.
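Because AUTORUN.INF uses the familiar INI layout, the Open command can be inspected with a few lines of code. The following Python sketch uses the standard configparser module to pull out that value; the function name and the sample content are hypothetical, and the sketch assumes a well-formed file that begins with a section header.

```python
import configparser

def autorun_open_command(inf_text):
    """Return the Open= value of the [Autorun] section, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(inf_text)
    for section in parser.sections():
        if section.lower() == "autorun":      # section names vary in case
            return parser[section].get("open")  # option names are lowercased
    return None

# Hypothetical sample content for demonstration.
sample = "[autorun]\nOpen=setup.exe /quiet\nicon=setup.exe,0\n"
print(autorun_open_command(sample))
```

A scanner or an administrator's audit script could compare the returned command against the files that are actually expected on the disk.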

The HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer Registry entry must be modified with a NoDriveAutoRun or NoDriveTypeAutoRun entry set to customized values, such as 0xFF, to turn off the feature for each drive.
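The value 0xFF is a bitmask in which each bit switches the feature off for one drive type. The short Python sketch below shows how such a mask can be composed. The bit assignments follow the commonly documented drive-type values and should be treated as an assumption to verify against Microsoft's documentation; the function and key names are hypothetical.

```python
# Assumed bit assignments for NoDriveTypeAutoRun (verify before relying on them).
DRIVE_TYPE_BITS = {
    "unknown": 0x01,
    "no_root_dir": 0x02,
    "removable": 0x04,
    "fixed": 0x08,
    "network": 0x10,
    "cdrom": 0x20,
    "ramdisk": 0x40,
    "reserved": 0x80,
}

def no_drive_type_autorun(disabled_types):
    """Combine the bits for every drive type that should not autorun."""
    mask = 0
    for name in disabled_types:
        mask |= DRIVE_TYPE_BITS[name]
    return mask

print(hex(no_drive_type_autorun(DRIVE_TYPE_BITS)))          # all types
print(hex(no_drive_type_autorun(["removable", "cdrom"])))   # two types only
```

Setting every bit yields the 0xFF value mentioned above, which turns the feature off for all drive types.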

Windows INI files are attacked for a similar reason. For instance, the WIN.INI supports a Windows section. In that section, a run= entry can be used to run an application during the startup of Windows. Malicious Trojans often modify this entry to load themselves via system startup.

3.7.34. HTML (Hypertext Markup Language) Dependency

HTML in its strict form offers no functionality that lends itself to malicious attacks, but it supports embedded scripting, such as VBScript or JScript. Several viruses attack HTML files. One of the most successful such attacks was implemented in the W32/Nimda worm in September 2001.

Nimda attacks HTML files by inserting a little JScript section into them. This section opens an EML file that contains a malformed MIME exploit. The JScript code uses the window.open function to launch the EML file. The result is an automatically executed worm executable upon accessing a compromised HTML page using a vulnerable Internet Explorer.

Some HTML threats get invoked from HTML files via HREF entries. They trick the user into clicking something that will, in turn, execute the referenced malicious code.

The first viruses that attacked HTML files were created by the virus writer, 1nternal. Although some vendors initially classified these threats as HTML viruses, the proper classification is based on the actual script language used, such as VBS.

3.8. Vulnerability Dependency

Fast-spreading worms, such as W32/CodeRed, Linux/Slapper, W32/Blaster, or Solaris/Sadmind, can only infect a new host if the system can be exploited via a known vulnerability. If the system is not vulnerable or is already patched, such worms cannot infect it. However, several worms, such as W32/Welchia, exploit multiple vulnerabilities to invade new systems. Therefore, the system might remain exploitable by at least one of the nonpatched vulnerabilities.

Chapter 10, "Exploits, Vulnerabilities, and Buffer Overflow Attacks," is dedicated to computer virus attacks that utilize exploits to spread themselves.

3.9. Date and Time Dependency

Tyrell: What seems to be the problem?
Roy: Death!
Tyrell: Death. Well, I am afraid that's a little out of my jurisdiction.
Roy: I want more life...

Blade Runner, 1982


Several viruses replicate only within a certain time frame of the day. Others refuse to replicate before or after a certain date. For instance, the W32/Welchia worm only attempted to invade systems until January 2004.

Another example is the original W32/CodeRed worm, which was set to kill itself in 2001. However, other variants of the worm were modified to introduce an "endless life" version without this limitation. The life cycle manager of worms is discussed in more detail in Chapter 10.

3.10. JIT Dependency: Microsoft .NET Viruses

A natural evolution of Microsoft's ambitious computer language and execution environment developments is .NET Framework's Just-in-Time compilation. .NET uses executables that are somewhat special portable executable (PE) files. Currently, such executables contain a minimal architecture-dependent code (a single API call to an init function)44. Elsewhere, the compiled PE file contains MSIL (Microsoft Intermediate Language) and metadata information. The first viruses that targeted .NET executables were not JIT-dependent. For example, Donut45 was created by Benny in February of 2002. This virus attacked .NET executables at their native entry point, replacing _CorExeMain() import (which currently runs the JIT initialization) with its own code and appending itself to the end of the file. A few months later, JIT-dependent viruses appeared that could infect other MSIL executables. The first such virus was written by Gigabyte.

W32/HLLP.Sharpei40 implements a simple prepender infection technique. The MSIL code of the virus is JIT compiled by the CLR (common language runtime) of .NET Framework. JIT does not compile the module when it is loaded, but only when a particular method is first used. Only then is the MSIL code translated to the local architecture, and native code execution begins. Figure 3.11 shows the payload message of the W32/HLLP.Sharpei virus.

Figure 3.11. The payload message of Sharpei.

In 2004, new infection techniques appeared that targeted .NET executables. These new viruses parasitically infect MSIL programs. It is not surprising that such viruses did not show up any earlier because it is much more difficult to implement them. In fact, some researchers argued that such complex MSIL viruses will never appear. For example, the metamorphic virus, MSIL/Gastropod, uses the System.Reflection.Emit namespace to rebuild its code and the host program to alter the appearance of the virus body. Gastropod is a creation of the virus writer, Whale, who also authored the W95/Perenast viruses. (Whale was captured by the Russian police in November 2004. He was required to pay $50.)

On the other hand, the MSIL/Impanate virus is aware of both 32-bit and 64-bit MSIL files and infects them using EPO (Entry Point Obscuring) techniques without using any library code to do so. MSIL/Impanate was authored by the virus writer, roy g biv.

Note

More information on infection techniques is available in Chapter 4, "Classification of Infection Strategies." Metamorphic viruses are discussed in Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits."

3.11. Archive Format Dependency

Some viruses might not be able to spread without packed files. A few viruses only infect archive file formats. The majority of such viruses infect binary files, as well as ZIP, ARJ, RAR, and CAB files (to name the most common archive formats).

Spreading viruses in archive files gained popularity when Microsoft implemented a virus-protection feature for Outlook. Outlook no longer runs regular executable extensions, and recent versions simply do not provide such attachments to end users. However, virus writers quickly figured out that they could send packed files, such as zipped files, which Outlook does not remove from e-mail messages.

Some tricky mailer or mass-mailer worms, such as W32/Beagle@mm46, even use password-protected attachments. Because the password and instructions on how to use it are available to the user, the malicious code can trick the user into running an application, such as Winzip, and typing the given password to unpack and then execute its content. Such viruses often carry their own packer engines, such as InfoZIP libraries, to create new packer containers.

File infector viruses typically insert a new file into an archive file. For example, ZIP infection is simple because ZIP stores a directory entry for each file in the archive. By locating these headers, viruses can insert new files into the archive and trick the user into running them. For example, a virus might insert a file with a name such as "readme.com" and simply hope that the user will execute it to read "the documentation" of the package.
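The same directory that makes insertion easy also makes inspection easy. The Python sketch below walks the directory of a ZIP archive with the standard zipfile module and lists members whose names end in an executable extension. The extension list, the function name, and the in-memory demonstration archive are assumptions chosen for illustration; a real scanner would examine member contents as well as names.

```python
import io
import zipfile

# Assumed list of extensions worth flagging; adjust to local policy.
RISKY_EXTENSIONS = (".com", ".exe", ".bat", ".pif", ".scr", ".vbs")

def risky_members(zip_bytes):
    """Return the names of archive members with an executable extension."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [info.filename for info in archive.infolist()
                if info.filename.lower().endswith(RISKY_EXTENSIONS)]

# Build a small, harmless archive in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "plain text")
    archive.writestr("readme.com", b"placeholder")
demo = buffer.getvalue()

print(risky_members(demo))
```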

Some very complex viruses, such as the Russian virus, Zhengxi47, infect self-extracting EXE files with multiple archive infection capability, including the packed file format, HA, inside such binary files.

3.12. File Format Dependency Based on Extension

Some viruses have extension dependency. Depending on the extension, a file might be placed in a different execution environment. A simple example of this is COM and BAT (ASCII) extension replacement. As a COM file, the file can function as binary. With a BAT extension, it looks like an ASCII BATCH file.

Other common examples of this kind of dependency are as follows:

  • COM/VBS
  • COM/OLE2 (a trivial variant has the header of an OLE2 file)
  • HTA/SCRIPT
  • MHTML (Binary+Script)
  • INF/COM
  • PIF/mIRC/BATCH

This method is often used as an attempt to confuse scanners about the type of object they are scanning. Because scanners often use header and extension information to determine the environment of the file, their scanning capabilities (such as heuristics analysis) might be affected if they do not identify the type of object properly.
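One way to reason about this problem is to compare what the extension promises with what the header actually contains. The Python sketch below does this for a handful of formats. The magic values for MZ executables, OLE2 compound files, and ZIP archives are well known, but the extension table, the function names, and the policy of treating every disagreement as suspicious are assumptions made for illustration.

```python
import os

# Leading bytes of a few well-known container formats.
MAGIC = [
    (b"MZ", "exe"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
    (b"PK\x03\x04", "zip"),
]

# Assumed mapping from extension to expected header kind. A COM file is raw
# code with no magic value, so "unknown" is what we expect to see there.
EXPECTED = {".exe": "exe", ".doc": "ole2", ".xls": "ole2",
            ".zip": "zip", ".com": "unknown"}

def sniff(data):
    """Classify content by its leading bytes."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    return "unknown"

def extension_mismatch(filename, data):
    """True when the header disagrees with what the extension suggests."""
    extension = os.path.splitext(filename)[1].lower()
    return sniff(data) != EXPECTED.get(extension, "unknown")

ole2_header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + bytes(8)
print(extension_mismatch("report.doc", ole2_header))  # header matches extension
print(extension_mismatch("tool.com", ole2_header))    # COM/OLE2-style mismatch
```

A mismatch does not prove that a file is malicious, but it tells the scanner that it should analyze the object under more than one interpretation.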

For example, PIF worms typically use mIRC, BAT, or even VBS combinations, based on extension dependency. A file with a PIF extension will function as a PIF. However, with a BAT extension, it will run as a BATCH instead, and the PIF section in the front of the file is simply ignored. Other examples include an mIRC and BATCH combination based on extension dependency tricks.

Figure 3.12 demonstrates how the PIF is organized for extension dependency. The Phager virus uses the previously discussed technique.

Figure 3.12. A high-level structure of a PIF with extension dependency.

Another example that involves extension dependency tricks is INF/Zox, which infects Windows INF files. The main body of INF/Zox is stored in an INF file called ULTRAS.INF. However, this INF file can run as a DOS COM executable when renamed.

In the INF form, the virus uses CmdAdd (add command) entries to attack AUTOEXEC.BAT. It also uses the CopyFile entry of the DefaultInstall section to copy the ULTRAS.INF file as Z0X.SYS. The trick is that the new AUTOEXEC.BAT section will rename the Z0X.SYS file to Z0X.COM and run it. The virus starts with a comment entry in the INF form using a semicolon (;) (0x3b).

When the file is loaded as a DOS COM file, the semicolon marker is decoded as the first byte of a harmless compare (CMP) instruction and is effectively ignored. After the comment marker, binary code is inserted that "translates" to a jump (JMP) instruction to the binary portion of the virus code at the end of the file:

13BE:0100 3B00		CMP	AX,[BX+SI]	; Compare instruction ignored
13BE:0102 E9F001	JMP	02F5		; Jump to binary virus start
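The arithmetic behind this listing can be checked by hand or with a few lines of code. The Python sketch below decodes the five bytes shown above: 0x3B is both the ASCII semicolon and the opcode of the compare instruction, and 0xE9 introduces a near jump whose 16-bit displacement is relative to the address of the following instruction. The variable names are illustrative.

```python
import struct

code = bytes.fromhex("3B00E9F001")   # the five bytes from the listing
load_offset = 0x0100                 # COM programs start at offset 0x100

# The first byte doubles as the INF comment character.
first_char = chr(code[0])

# The jump starts after the 2-byte CMP and is 3 bytes long (opcode + rel16).
jmp_offset = load_offset + 2
assert code[2] == 0xE9
(displacement,) = struct.unpack_from("<h", code, 3)   # little-endian 0x01F0
target = (jmp_offset + 3 + displacement) & 0xFFFF

print(repr(first_char), hex(displacement), hex(target))
```

The computed target is 0x02F5, which agrees with the destination shown in the disassembly.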

Zox is a direct-action overwriter virus. It overwrites INF files with itself.

3.13. Network Protocol Dependency

Nowadays, the Internet is the largest target of virus attacks. TCP and UDP protocols are used by malicious mobile code48 to attack new targets. There are some old worms, however, such as the Father Christmas worm, that could not spread on the Internet because they relied on DECNET protocols. Thus, computer worms are typically network protocol dependent.

3.14. Source Code Dependency

Some tricky computer viruses, such as those of the W32/Subit family, infect source files such as Visual Basic or Visual Basic .NET source files. Other viruses spread in C or Pascal sources. These threats have a very long history.

Consider the C source file shown in Listing 3.2, in clean and infected form.

Listing 3.2. A Source Infector Virus

#include <stdio.h>
void main(void)
{
 printf("Hello World!");
}

The infected copy would look similar to the following:

#include <stdio.h>
void infect(void)
{
 /* virus code to search for *.c files to infect */
}
void main(void)
{
 infect(); /* Do not remove this function!! */
 printf("Hello World!");
}

After the infected copy is compiled and executed, the virus will search for other C sources and infect them.

Source code viruses typically carry their own source code in a large string. The W32/Subit family uses a concatenated string to define its source code, starting with the following lines:

J = "44696D20532041732053797374656D2E494F2E53747265616D5772697465720D"
J = J & "0A44696D204F2C205020417320446174650D0A44696D2052204173204D696372"
J = J & "6F736F66742E57696E33322E52656769737472794B65790D0A52203D204D6963"

This will be converted to Visual Basic .NET source code:

Dim S As System.IO.StreamWriter
Dim O, P As Date
Dim R As Microsoft.Win32.RegistryKey
:
:
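The conversion is a plain hexadecimal-to-text decoding, which an analyst can reproduce in a few lines. The Python sketch below decodes the three string fragments quoted above; only the fragments themselves come from the text, and the variable names are illustrative.

```python
# The three hex fragments quoted in the text, in order.
chunks = [
    "44696D20532041732053797374656D2E494F2E53747265616D5772697465720D",
    "0A44696D204F2C205020417320446174650D0A44696D2052204173204D696372",
    "6F736F66742E57696E33322E52656769737472794B65790D0A52203D204D6963",
]

# Each pair of hex digits is one ASCII character; lines end with CR LF.
decoded = bytes.fromhex("".join(chunks)).decode("ascii")
lines = decoded.splitlines()

for line in lines[:3]:
    print(line)
```

The first three decoded lines match the Visual Basic .NET declarations shown above; the fourth line is cut off because the quoted excerpt ends in the middle of a statement.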

The source code infectors replicate in two stages. The first stage is the running of an already infected application with the embedded virus code. After the New() function is called in the infected program, the virus code will search for other Visual Basic .NET project source files on the system and copy its own source code into those files. In the second stage, Subit inserts a function call to run the virus body itself. As a result, the virus can multiply again after the compromised source is compiled and executed on a system.

The major problem with such viruses is that they can appear virtually anywhere in the application, inserted somewhere in the code flow. The code of the virus will be translated differently, depending on the language and the compiler version and options, making the virus look different in binary form on various systems.

3.14.1. Source Code Trojans

The idea of source-only viruses originates in the famous "self-reproducing program" ideas of Ken Thompson (co-author of the UNIX operating system). In his article, "Reflections on Trusting Trust,"49 Thompson introduced the idea of C programs, so-called "quines," that print an exact copy of their source as an output. The idea is nice and simple. The program's source code is defined as a string that is printed to the output with the printf() function.
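The string-plus-print construction is easy to demonstrate. The sketch below shows the same idea in Python rather than C, so it is an analogue of the technique and not Thompson's original listing. The two-line program is built from a template, executed in a fresh namespace, and its output is captured for comparison with its own text; the variable names are illustrative.

```python
import contextlib
import io

# A two-line self-reproducing program: a string holding the program text,
# and a print statement that formats the string with itself.
template = 's = %r\nprint(s %% s)'
program = template % template      # the complete source text of the program

# Run the program in an empty namespace and capture what it prints.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(program, {})
output = captured.getvalue()

print(program)
print(output == program + "\n")    # the output reproduces the source
```

The program contains no file access at all: it reproduces its source purely from the string it carries, which is the property that source code infectors build on.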

Thompson also demonstrated a CC (C compiler) hack. The idea was to modify the source code of CC in such a way that whenever the modified compiler binary is used, it will do the following two things:

  • Recognize when the source code of login was compiled and insert a Trojan function into the original source. The Trojanized version of login would let anybody log in to the system with his or her own password. Furthermore, it would let an attacker connect with a specific password for any user account.
  • Introduce source modifications to the CC sources on the fly. Thus, the modification in the source code was available only during the compilation, and it was quickly removed after the compiler's source was compiled.

Source code infectors use the Thompson principle to inject themselves into application source files. Such viruses will be more relevant in the future as open source systems gain popularity.

3.15. Resource Dependency on Mac and Palm Platforms

Some computer viruses are dependent on system resources. For example, the Macintosh environment is a very rich platform of resources. Various functions are implemented in the form of resources that can be edited easily via Resource Editors. For instance, there is a menu definition resource on the Mac. Such definitions get invoked according to the applications' menu items. Macs store information in two forks for each file on the disk: the data fork and the resource fork. Resources, stored in the resource fork, contain code. Because even data files can contain resources on the Mac, the distinction between data and code files is not as clear-cut as it is for the PC, for example.

The MDEF (menu definition) viruses on the Apple Macintosh use the technique of replacing menu definitions with themselves. Thus, the virus code gets invoked whenever a particular menu is activated.

Table 3.3 contains common resource types on the Mac. It is an incomplete list of the most commonly attacked resources by malicious code on the Mac platform36.

Table 3.3. Common Resource Types on the Mac

Resource Type    Description
ADBS             Apple Desktop Service
CDEF             Control Definition Function
DRVR             Device Driver
FMTR             Disk Format Code
CODE             Code Segment
INIT             Initialization Code Resource
WDEF             Windows Definition Function
FKEY             Command-Shift-Number Function
PTCH             ROM Patch Routine
MMAP             Mouse Function

Similar dependencies exist in the Palm viruses. The Palm stores executable applications in PRC files with special application resources. When the application is executed, the resources are accessed from it. In particular, the DATA and CODE resources are important for program execution. The virus Palm/Phage, discovered in September 2000, reads its own DATA and CODE resources and overwrites other applications' resources with these. This resource dependency is very similar to the one on the Macintosh platforms.

3.16. Host Size Dependency

To infect applications accurately, many computer viruses have limits on how small or how large the applications they infect can be. For instance, COM files on DOS cannot load if they are larger than a code segment. Consequently, most DOS viruses introduce limits to avoid infecting files that would grow past acceptable limits if the virus code were included in them.

In other cases, viruses such as W95/Zmist use an upper size limitation, such as 400KB, for a file. This enhances the virus infection's reliability by reducing the risks involved in infecting files that are too large. Furthermore, host size dependency also can be used as an antigoat technique (see more details in Chapter 6, "Basic Self-Protection Strategies") to avoid test files that computer virus researchers use.

3.17. Debugger Dependency

Some viruses use an installed debugger, usually DEBUG.EXE of DOS, to convert themselves from textual to binary forms or simply to create binary files. Such threats typically use a piped debug script input to DEBUG, such as

DEBUG <debugs.txt

The input file contains DEBUG commands such as the following:

N example.com
E 100 c3
RCX
1
W
Q
 

This script would create a 1-byte long COM file containing a single RET instruction. A single RET instruction in a COM file is the shortest possible COM program. COM files are loaded to offset 0x100 of the program segment. Before the program segment, the PSP (program segment prefix) is located at offset 0; thus, a single RET instruction will give control to the top of the PSP, assuming that the stack is clear and a zero is popped. The trick is that the top of the PSP contains a 0xCD, 0x20 (INT 20 Return to DOS interrupt) pattern:

13BA:0000 CD20		INT	20

So whenever the execution of a program lands at offset 0, the program will simply terminate.

Note

The N command is used to name an output file. The E command is used to enter data to a memory offset. The CX register holds the lower 16-bit word of the file size, and BX holds the upper 16-bit word. The W command is used to write the content to a file. Finally, the Q command quits the debugger. Viruses typically use several lines of data that use the Enter command to create the malicious code in memory.
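For analysis purposes, a debug script can be turned back into the bytes it would write without ever running DEBUG. The Python sketch below interprets only the commands described in this note (N, E, RCX, W, and Q) and is applied to the one-byte example script. The function name is hypothetical, and the sketch assumes that BX is zero, that data is entered as space-separated hex bytes, and that the file image starts at offset 0x100.

```python
def rebuild_from_debug_script(script, load_offset=0x100):
    """Return (file name, file bytes) described by a simple DEBUG script."""
    memory = {}
    name = None
    size = 0
    lines = iter(script.splitlines())
    for raw in lines:
        tokens = raw.split(";", 1)[0].split()   # drop trailing annotations
        if not tokens:
            continue
        command = tokens[0].upper()
        if command == "N":                      # name the output file
            name = " ".join(tokens[1:])
        elif command == "E":                    # enter bytes at an offset
            address = int(tokens[1], 16)
            for i, byte in enumerate(tokens[2:]):
                memory[address + i] = int(byte, 16)
        elif command == "RCX":                  # next line holds the size
            size = int(next(lines, "0").strip(), 16)
        # W and Q carry no data, so they are ignored here.
    image = bytes(memory.get(load_offset + i, 0) for i in range(size))
    return name, image

example = "N example.com\nE 100 c3\nRCX\n1\nW\nQ\n"
name, image = rebuild_from_debug_script(example)
print(name, image.hex())
```

Applied to a longer script, the same routine would let a researcher recover the dropped binary for static analysis without executing anything.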

The virus writer, Vecna, used this approach in the W95/Fabi family to create EXE files using Microsoft Word macros and debug scripts in combination. From the infected MS Office documents, Fabi creates a new file in the root directory as FABI.DRV and uses the PRINT commands to print the debug script into it:

OPEN "C:\FABI.DRV" FOR OUTPUT AS 1
PRINT #1, "N C:\FABI.EX"
PRINT #1, "E 0100 4D 5A 50 00 02 00 00 00 04 00 0F 00 FF FF 00 00"
PRINT #1, "E 0110 B8 00 00 00 00 00 00 00 40 00 1A 00 00 00 00 00"

The content of the FABI.DRV will look like the following:

N C:\FABI.EX
E 0100 4D 5A 50 00 02 00 00 00 04 00 0F 00 FF FF 00 00 ; DOS EXE header
E 0110 B8 00 00 00 00 00 00 00 40 00 1A 00 00 00 00 00
 
[Virus body is cut from here]
 
E 4D20 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F
E 4D30 10 0F 10 0F 10 0F 10 FF FF FF
RCX
4C3A
W
Q
 

Another BATCH file is also created by the macro in a manner similar to the debug script. This contains the command to drive DEBUG with the debug script:

DEBUG <C:\FABI.DRV >NUL

Note that DEBUG cannot create EXE files. At least, it cannot save them from memory with an EXE suffix. It can, however, save the content of memory easily without an EXE extension, which works when the file is loaded without an extension in the first place. This is the approach that W95/Fabi uses. It first saves the file with DEBUG as FABI.EX and uses yet another BATCH file to copy FABI.EX as FABI.EXE to run it.

Evidently, if DEBUG.EXE is not installed on the system or is renamed, some of these viruses cannot function completely or at all.

3.17.1. Intended Threats that Rely on a Debugger

Some malicious code might require the user to trace code in a debugger to replicate the virus. In some circumstances, this might happen easily in the case of macro threats. For instance, an error might occur during the execution of the malicious macro. Microsoft Word might then offer the user an option to run the macro debugger to resolve the cause of the problem. When the user selects the macro debugger command and traces the problem, the error might be bypassed. As a result, the virus code can replicate itself in this limited, special environment. There is an agreement between computer virus researchers, however, that such threats should be classified as intended.

3.18. Compiler and Linker Dependency

Several binary viruses spread their own source code during replication. This technique can be found in worms that target systems where binary compatibility is not necessarily provided. To enhance the replication of such worms on more than one flavor of Linux, the Linux/Slapper worm replicates its own source code to new systems. First, it breaks into the system via an exploit code, and then it uses gcc to compile and link itself to a binary. The worm encodes its source on the attacker's system and copies that over to the target system's temporary folder as a hidden file. Then it uses the uudecode command to decode the file:

/usr/bin/uudecode -o /tmp/.bugtraq.c /tmp/.uubugtraq;

The source code is compiled on the target with the following command:

gcc -o /tmp/.bugtraq /tmp/.bugtraq.c -lcrypto;

The virus needs the crypto library to link its code perfectly, so not only must gcc be installed with standard source and header files on the target system, but the appropriate crypto libraries must also be available. Otherwise, the worm will not be able to infect the target system properly, although it might successfully penetrate the target by exploiting an OpenSSL vulnerability.

The advantage of the source code-based infection method is the enhanced compatibility with the target operating system version. Fortunately, these techniques also have disadvantages. For example, it is a good practice to avoid installing sources and compilers on the path (unless it is absolutely necessary), greatly reducing the impact of such threats. Many system administrators tend to overlook this problem because it looks like a good idea to keep compilers at hand.
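An administrator can check quickly whether a host offers the build tools that such a worm depends on. The Python sketch below reports whether a few tools are reachable on the path and whether a crypto library can be located. The tool list and function name are assumptions for illustration, and a positive result for the library only means that a shared library was found, not that the headers needed for compilation are installed.

```python
import shutil
from ctypes.util import find_library

def build_tool_report(tools=("gcc", "cc", "uudecode")):
    """Report which build-related tools and libraries this host exposes."""
    report = {tool: shutil.which(tool) is not None for tool in tools}
    report["libcrypto"] = find_library("crypto") is not None
    return report

for item, present in build_tool_report().items():
    print(f"{item}: {'present' if present else 'not found'}")
```

On a production server, every "present" entry is a candidate for removal unless there is a concrete need for it.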

3.19. Device Translator Layer Dependency

Many articles circulated that concluded that no Windows CE viruses would ever be implemented, and for many years we did not know of any such creations. However, in July 2004, the virus writer, Ratter, released the first proof of concept virus, WinCE/Duts.1520, to target this platform, as shown in Figure 3.13.

Figure 3.13. The message of the WinCE/Duts virus on an HP iPAQ H2200 Pocket PC.

Many recent devices run WinCE/Duts successfully because the ARM processor is available on a variety of devices, such as HP iPAQ H2200 (as well as many other iPAQ devices), the Sprint PCS Toshiba 2032SP, T-Mobile Pocket PC 2003, Toshiba e405, and Viewsonic V36, among others. Several additional GSM devices are built on top of Pocket PC.

Interestingly, WinCE/Duts.1520 is able to infect Portable Executable files on several systems, despite the fact that the virus code looks "hard-coded" to a particular Windows CE release. For instance, the virus uses an ordinal-based function importing mechanism that would appear to be a serious limitation in attacking more than one flavor of Windows CE. In fact, it appears that the author of the virus believed that WinCE/Duts was only compatible with Windows CE 4. In our tests, however, we have seen the virus run correctly on Windows CE 3 as well.

It was not surprising that Windows CE was not attacked by viruses for so long. Windows CE was released on a variety of processors that create incompatibility issues (an inhomogeneous environment) and appear somewhat to limit the success of such viruses.

In addition, Windows CE does not support macros in Microsoft products such as Pocket Word or Pocket Excel, but there might be some troubling threats to come.

Prior to Windows CE 3.0, it was painful to create and distribute Windows CE programs because of binary compatibility issues. The compiled executables were developed in binary format as portable executable (PE) files, but the executable could only run on the processor on which it was compiled. So for each different device, the developer must compile a compatible binary. This can be a time-consuming process for both the developer and the user (who is impatient to install new executables).

The CPU dependency is hard-coded in the header of PE files. For instance, on the SH3 processor, the PE file header will contain the machine type 0x01A2, and its code section will contain compatible code only for that architecture.
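The machine type is easy to read programmatically. The Python sketch below follows the standard PE layout: the 32-bit value at offset 0x3C of the DOS header points to the "PE" signature, and the 16-bit Machine field follows the signature immediately. The table of machine names lists a few commonly documented values and is not exhaustive; the function name and the synthetic test header are assumptions made for illustration.

```python
import struct

# A few commonly documented Machine values (not an exhaustive list).
MACHINE_NAMES = {
    0x014C: "Intel 386",
    0x0166: "MIPS R4000",
    0x01A2: "Hitachi SH3",
    0x01A6: "Hitachi SH4",
    0x01C0: "ARM",
}

def pe_machine(data):
    """Return the Machine field of a PE file image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine

# Synthetic, minimal header used only to exercise the parser.
header = bytearray(0x80)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)      # offset of the PE signature
header[0x40:0x44] = b"PE\0\0"
struct.pack_into("<H", header, 0x44, 0x01A2)    # machine type from the text

machine = pe_machine(bytes(header))
print(hex(machine), MACHINE_NAMES.get(machine, "unknown"))
```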

Someone can easily create an application that is compiled to run on an SH3 platform; however, Windows CE was ported to support several processors, such as the SH3, SH4, MIPS, ARM, and so on. Consequently, a native Windows CE virus would be unable to spread easily among devices that use different processors. For example, WinCE/Duts.1520 will not infect SH3 processor-based systems.

Virus writers might be able to create a Win32 virus that drops a Windows CE virus via the Microsoft Active Sync. Such a virus could easily send mail and propagate its Intel version (with an embedded Pocket version), but it would only be able to infect a certain set of handheld devices that use a particular processor. In the future, this problem is going to be less of an issue for developers as more compatible processors are released. For example, the new XScale processors are compatible with the ARM series. XScale appears not only in Pocket PC systems, but in Palm devices as well. Obviously, this opens up possibilities for the attackers to create "cross-platform" viruses to target Palm as well as Pocket PC systems with the same virus.

Microsoft developed a new feature on the Pocket PC that made the Windows CE developers' jobs easier. In the Pocket PC, Microsoft started to support a new executable file format: the common executable file (CEF) format.

CEF executables can be compiled with Windows CE development tools, such as eMbedded Visual C++ 3.0. A CEF executable is basically a special kind of PE file. CEF is a processor-neutral code format that enables the creation of portable applications across CPUs supported by Windows CE. In fact, CEF contains MSIL code.

In eMbedded Visual C++, CEF tools (compilers, linkers, and SDK) are made available to the developer the same way that a specific CPU target (such as MIPS or ARM) is selected. When a developer compiles a CEF application, the compiler and linker do everything but generate machine-specific code. You still get a DLL or EXE, but the file contains intermediate language instructions instead of native machine code instructions.

CEF enables Windows CE application developers to deliver products that support all the CPU architectures that run the Windows CE 3.0 and above operating systems. Because CEF is an intermediate language, processor vendors can easily add a new CPU family that runs CEF applications. For instance, HP Jornada 540 comes with such a built-in device translator layer. The CEF file might have an EXE extension when distributed, so nothing really changes from the user's perspective.

The device translator is specific to a particular processor and Windows CE device. The device version normally translates a CEF executable to the native code of that processor when the user installs the CEF executable on the device. This occurs seamlessly, without any indication to the user, other than a brief pause for translation after the executable is clicked on. An operating system hook catches any attempt to load and execute a CEF EXE, DLL, or OCX file automatically and invokes the translator before running the file.

For example, if the Pocket PC is built on an SH3 processor, the translator layer will attempt to compile the CEF file to an SH3 format. The actual CEF executable will be replaced by its compiled SH3 native version, changing the content of the file completely to a native executable. Indeed, this first incarnation of MSIL JIT (Just-in-Time) compiling on the Pocket PC rewrites the executables themselves on the file system.

Obviously, virus writers might take advantage of the CEF format in the near future. A 32-bit Windows virus could easily install a CEF version of itself to the Pocket device, allowing it to run on all Pocket PC devices because the OS would translate the CEF executable to native format. We can only hope that CEF will not be supported on systems other than Windows CE. A desktop implementation, for example, would be very painful to see if the operating system were to rewrite CEF objects to native executables.

Because executables are converted to new formats on the fly, the content of the file changes. This is an even bigger problem than the up-conversion of Macro viruses in Office products [50]. Obviously, this is going to be a challenge for antivirus software, integrity checkers, and behavior blocker systems.

First of all, it is clearly a major problem for antivirus software given that the virus code needs to be detected and identified in all possible native translations as well as the original MSIL form. If the MSIL virus is executed on a device, before a signature of the virus is known to the antivirus program, the virus will run and its code will be converted to any of a number of native formats according to the actual type of the system. As a result, the MSIL signature of the virus will not be useful to find the virus afterwards. The virus needs to be detected in all possible native translations as well, but this task is not trivial.

It is a problem for the integrity checkers because the content of the program changes on the disk, not only in memory. As a result, integrity checkers cannot be sure if the change was the result of a virus infection or a simple native code translation. Finally, it is a problem for behavior blocker systems because the content of an executable is changed on the disk, which easily can be confused with virus activity.
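The integrity-checker dilemma can be illustrated with a minimal sketch. The file name and byte strings below are invented for the illustration, and SHA-256 is merely one possible fingerprint; a real checker would hash files on disk.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest used as the integrity baseline."""
    return hashlib.sha256(content).hexdigest()


def changed_files(baseline: dict, current: dict) -> list:
    """List files whose current content no longer matches the baseline."""
    return sorted(
        name for name, content in current.items()
        if baseline.get(name) != fingerprint(content)
    )


# Hypothetical contents: the same program before and after translation.
cef_image = b"CEF intermediate code for app"
native_image = b"SH3 native code for app"
infected_image = native_image + b"<virus body>"

baseline = {"app.exe": fingerprint(cef_image)}

# A benign translation and an infection both show up as "file changed".
after_translation = changed_files(baseline, {"app.exe": native_image})
after_infection = changed_files(baseline, {"app.exe": infected_image})
```

Both lists contain app.exe, so the checksum alone cannot tell the benign translation from the infection.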

3.20. Embedded Object Insertion Dependency

The first known binary virus that could infect Word 6 documents, called Anarchy.6093 [51], appeared in 1997. Not surprisingly, we have not seen many other viruses like this because attacking the document formats to add macros to them is no trivial task. Anarchy was a DOS-based COM, EXE, and DOC file infector.

The first virus to infect VBA documents from binary code was released from Russia. The virus is called {Win32,W97M}/Beast.41472.A, and it appeared in the wild in April 1999. The virus is written in Borland Delphi and compiled to 32-bit PE format.

Beast uses a different means of infection than other binary viruses that infect documents. Instead of having it attack the VBA format on a bit-by-bit level, the Beast author used OLE (Object Linking and Embedding) APIs, such as AddOLEObject(), to inject macro code and embedded executable code into documents by using the internal OLE support of Microsoft Word. Via OLE support, the virus injects an embedded object (executable) into VBA documents. However, this embedded object will not be visible to the user, as it normally would. This is because the virus uses a trick to hide the icon of the embedded object.

The virus looks for actively opened documents in Word. When a handle to an active document is available, the virus calls its infection module. First it tries to check whether there are no embedded objects in the document, but in some cases this routine fails because the virus might have added multiple embedded executables into the documents.

Next, Beast tries to add itself as a C:\I.EXE shape into the document, named 3BEPb (Russian for beast). If this procedure goes as planned, then a new macro called AutoOpen() also will be injected into the document.

The execution of the embedded object is facilitated by using the Activate method for the 3BEPb shape in the active document:

ActiveDocument.Shapes("3BEPb").Activate

Beast introduced the need for detection and removal of malicious embedded objects in documents, which is not the simplest problem to solve.
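As a rough illustration of the detection side, the following sketch scans raw document bytes for something that looks like an embedded PE executable. It is a naive heuristic, not a real OLE parser: it assumes the embedded image is stored contiguously and unencoded, which actual compound documents do not guarantee. The function name is hypothetical.

```python
import struct


def find_embedded_pe(data: bytes) -> list:
    """Return offsets where a plausible PE executable image starts."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        # e_lfanew, at offset 0x3C of the DOS header, points to the PE header.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits


# Synthetic example: a fake document body followed by a minimal PE stub.
fake_pe = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\x00\x00"
document = b"\xd0\xcf\x11\xe0" + b"text" * 10 + fake_pe
offsets = find_embedded_pe(document)
```

A hit only means that the bytes deserve a closer look; removal still requires understanding the document's container format.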

3.21. Self-Contained Environment Dependency

One interesting dependency appears when malicious code carries its own environment to the platform. The W32/Franvir virus family offers a good example.

Franvir is clearly a Win32 application. It is compiled with Borland Delphi to a 32-bit PE program. However, the actual Win32 binary part is known as the Game Maker, written by Mark Overmars of the Netherlands ( http://www.cs.uu.nl/people/markov/gmaker/doc.html ).

The Franvir virus was written by a French virus writer using the script language of Game Maker, called GML (Game Maker Language). This is only available in the registered version of Game Maker, which provides developers with security options for using these functions (turning them on and off). It is up to the developer to set the security settings; therefore, a malicious author can easily use GML of Game Maker for virus writing.

Game Maker is a professional game developer environment. Hundreds of brilliant games have been created in it by professionals. It can be used to develop all kinds of games, including scrolling shooters, puzzle games, and even isometric games. For instance, the shooter game called Doomed was created using Game Maker (see Figure 3.14).

Figure 3.14. Doomed in action.


GML provides functions for Registry, File, and program execution. The File operation functions are extremely rich and provide high flexibility for game developers to install and execute programs, but they also can be used by malicious attackers. Some of the functions of GML include the following:

  • file_exists(fname)
  • file_delete(fname)
  • file_copy(fname,newname)
  • file_open_write(fname)
  • directory_create(dname)
  • file_find_first(mask,attr)
  • file_find_next()
  • file_attributes(fname,attr)
  • registry_write_string_ext()

GML scripts are stored in the resources of Game Maker, but they are accessed and executed by the environment, the interpreter in Game Maker itself. Franvir is an encrypted GML script. It copies itself all over the hard disk under various existing program names. It also installs itself to local P2P (peer-to-peer) folders or even creates the shared folder for KaZaA if the directory is not installed ("kazaa\my shared folder\") and changes the KaZaA settings to share the folder. Furthermore, it does damage by deleting the win.com file of Windows. Thus, ultimately Franvir must be classified as a Win32 P2P worm. In reality, however, it is a GML script that is carried by its own environment to new platforms. When the virus successfully executes, it eventually uses the show_message() function to display the false error message shown in Figure 3.15.

Figure 3.15. The false error message of Franvir.


The virus could have offered to play a game, as the DOS virus Playgame does, instead of executing the malicious file-delete action as its activation routine, but well... what can we expect from a typical virus writer?

3.22. Multipartite Viruses

The first virus that infected COM files and boot sectors, Ghostball, was discovered by Fridrik Skulason in October 1989. Another early example of a multipartite virus was Tequila. Tequila could infect DOS EXE files as well as the MBR (master boot record) of hard disks.

Multipartite viruses are often tricky and hard to remove. For instance, the Junkie virus infects COM files and is also a boot virus. Junkie can infect COM files on the hidden partitions [52] that some computer manufacturers use to hide data and extra code by marking the partition entries specifically. Because Junkie loads to memory before these hidden files are accessed, these files can get infected easily. Scanners typically scan the content of the visible partitions only, so such infections often lead to mysterious reinfections of the system. This is because the virus has been cleaned from everywhere but from the hidden partition, so the virus can infect the system again as soon as the hidden partition is used to run one of the infected COM files.

In the past, boot and multipartite viruses were especially successful at infecting machines that used the DOS operating system. On modern Windows systems, such viruses are less of a threat, but they do exist.

The Memorial virus [53] introduced DOS COM, EXE, and PE infection techniques in the same virus. The payload of the Memorial virus is shown in Figure 3.16.

Figure 3.16. The message of the W95/Memorial virus.


W95/Memorial also used the VxD (Virtual Device Driver) format of Windows 9x systems to load itself into kernel mode and hook the file system to infect files on the fly whenever they were accessed. As a result, Memorial also infects 16-bit and 32-bit files.

Another interesting example of a multipartite infection is the Russian virus, 3APA3A, which was found in the wild in Moscow in October 1994 [54]. 3APA3A is a normal boot virus on a diskette, occupying two sectors for itself, but it uses a special infection method on the hard disk. It infects the DOS core file IO.SYS. First it makes a copy of IO.SYS, and then it overwrites the original. After the infection, the root directory contains two IO.SYS files, but the first is set as a volume label of the disk; thus, the DIR command does not display two files, but a volume label "IO SYS" and a single IO.SYS file. The point is to trick DOS into loading the infected copy of IO.SYS. Then the virus starts the original one after itself. This happens because DOS will load the first IO.SYS file regardless of its attributes. This method represents a special subclass of companion infection techniques.
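The volume-label trick can be made visible with a short sketch that decodes FAT root directory entries and flags a name that appears both as a volume label and as a regular file. It assumes the classic 32-byte entry layout (11-byte name, attribute byte at offset 11, bit 0x08 for a volume label) and ignores later long-filename entries; the directory below is synthetic and the helper names are hypothetical.

```python
ATTR_VOLUME_LABEL = 0x08
ENTRY_SIZE = 32


def parse_root_directory(raw: bytes) -> list:
    """Decode 32-byte FAT directory entries into (name, attributes) pairs."""
    entries = []
    for off in range(0, len(raw) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = raw[off:off + ENTRY_SIZE]
        if entry[0] == 0x00:      # end-of-directory marker
            break
        if entry[0] == 0xE5:      # deleted entry
            continue
        entries.append((entry[:11].decode("ascii", "replace"), entry[11]))
    return entries


def suspicious_labels(entries: list) -> list:
    """Return names that appear both as a volume label and as a file."""
    labels = {n for n, a in entries if a & ATTR_VOLUME_LABEL}
    files = {n for n, a in entries if not a & ATTR_VOLUME_LABEL}
    return sorted(labels & files)


def make_entry(name11: str, attr: int) -> bytes:
    """Build a synthetic directory entry for the demonstration."""
    return name11.encode("ascii") + bytes([attr]) + bytes(20)


root = (make_entry("IO      SYS", 0x08)      # first copy, flagged as a label
        + make_entry("IO      SYS", 0x07)    # second copy, normal system file
        + make_entry("MSDOS   SYS", 0x07)
        + bytes(ENTRY_SIZE))                 # end of directory
flagged = suspicious_labels(parse_root_directory(root))
```

A volume label that shares its name with a system file is unusual enough on its own to justify a closer look at both entries.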

3.23. Conclusion

New viral environments are discovered each year. Over the last 20 years of PC viruses, there has been tremendous dark energy in place to develop computer viruses for almost every platform imaginable. All over the world, thousands of people have created computer viruses. Because of this we are experiencing an ever-growing security problem with malicious code and, consequently, seeing the development of computer virus research as a new scientific field. There is absolutely no question whether computer viruses will stay with us and evolve to future platforms in the upcoming decades.

Fred Cohen's initial research with computer viruses in 1984 concluded that the computer virus problem is ultimately an integrity problem. Over the last 20 years, the scope of integrity expanded dramatically from file integrity to the integrity of applications and operating system software. Modern computer viruses, such as W32/CodeRed and W32/Slammer, clearly indicate this new era: Computer viruses cannot be controlled by file-based integrity checking alone because they jump from system to system over the network, injecting themselves into new process address spaces in such a way that they are never stored on the disk.

Computer viruses changing their environments to suit their needs is a problem that will likely begin to emerge. For example, the W32/Perrun virus appends itself to JPEG picture files. Normally, picture files are not infectious unless some serious vulnerability condition exists in a picture file viewer (such as the one described in Microsoft Security Bulletin MS04-028 [55]). However, Perrun modifies the environment of the infected host to include an extractor component, resulting in Perrun-compromised JPEG files not being infectious on a clean system but on infected computers only. Such computer viruses can modify the host's environment in such a way that previous assumptions about the environments no longer hold.
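Data appended to a picture can often be spotted by looking past the JPEG end-of-image marker. The sketch below is a simple heuristic under stated assumptions: it relies only on the standard SOI (FF D8) and EOI (FF D9) markers, and it can be fooled if the appended data itself happens to contain the bytes FF D9. The function name is hypothetical.

```python
def trailing_bytes_after_jpeg(data: bytes) -> int:
    """Return how many bytes follow the last JPEG end-of-image marker."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    eoi = data.rfind(b"\xff\xd9")
    if eoi == -1:
        raise ValueError("no EOI marker found")
    return len(data) - (eoi + 2)


# Synthetic streams: a minimal "picture" and the same picture with extra data.
clean_picture = b"\xff\xd8" + bytes(10) + b"\xff\xd9"
modified_picture = clean_picture + b"APPENDED"

clean_extra = trailing_bytes_after_jpeg(clean_picture)
modified_extra = trailing_bytes_after_jpeg(modified_picture)
```

A nonzero result is not proof of infection, because some tools legitimately append metadata, but it marks the file as worth inspecting.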

References

1. Dr. Vesselin Bontchev, "Methodology of Computer Anti-Virus Research," University of Hamburg, Dissertation, 1998.

2. Dr. Harold Highland, "A Macro Virus," Computers & Security, August 1989, pp. 178-188.

3. Joe Wells, "Brief History of Computer Viruses," 1996, http://www.research.ibm.com/antivirus/timeline.htm.

4. Dr. Peter Lammer, "Jonah's Journey," Virus Bulletin, November 1990, p. 20.

5. Peter Ferrie, personal communication, 2004.

6. Jim Bates, "WHALE...A Dinosaur Heading For Extinction," Virus Bulletin, November 1990, pp. 17-19.

7. Eric Chien, "Malicious Threats to Personal Digital Assistants," Symantec, 2000.

8. Dr. Alan Solomon, "A Brief History of Viruses," EICAR, 1994, pp. 117-129.

9. Intel Pentium Processor III Specification Update, http://www.intel.com/design/PentiumIII/specupdt/24445349.pdf.

10. Mikko Hypponen, private communication, 1996.

11. Thomas Lipp, "Computerviren," 64'er, Markt&Technik, March 1989.

12. Peter Szor, "Stream of Consciousness," Virus Bulletin, October 2000, p. 6.

13. Peter Szor and Peter Ferrie, "64-bit Rugrats," Virus Bulletin, July 2004, pp. 4-6.

14. Marius van Oers, "Linux Viruses - ELF File Format," Virus Bulletin Conference, 2000, pp. 381-400.

15. Jakub Kaminski, "Not So Quiet on the Linux Front: Linux Malware II," Virus Bulletin Conference, 2001, pp. 147-172.

16. Eugene Kaspersky, "Shifter.983," http://www.viruslist.com, 1993.

17. Sarah Gordon, "What a (Winword.) Concept," Virus Bulletin, September 1995, pp. 8-9.

18. Sarah Gordon, "Excel Yourself!" Virus Bulletin, August 1996, pp. 9-10.

19. Yoshihiro Yasuda, personal communication, 2004.

20. Dr. Igor Muttik, "Macro Viruses - Part 1," Virus Bulletin, September 1999, pp. 13-14.

21. Dr. Vesselin Bontchev, "The Pros and Cons of WordBasic Virus Up-conversion," Virus Bulletin Conference, 1998, pp. 153-172.

22. Dr. Vesselin Bontchev, "Possible Macro Virus Attacks and How to Prevent Them," Virus Bulletin Conference, 1996, pp. 97-127.

23. Dr. Vesselin Bontchev, "Solving the VBA Up-conversion Problem," Virus Bulletin Conference, 2001, pp. 273-300.

24. Nick FitzGerald, "If the CAP Fits," Virus Bulletin, September 1999, pp. 6-7.

25. Jimmy Kuo, "Free Anti-Virus Tips and Techniques: Common Sense to Protect Yourself from Macro Viruses," NAI White Paper, 2000.

26. Dr. Vesselin Bontchev, personal communication, 2004.

27. Jakub Kaminski, "Disappearing Macros - Natural Devolution of Up-converted Macro Viruses," Virus Bulletin Conference, 1998, pp. 139-151.

28. Katrin Tocheva, "Multiple Infections," Virus Bulletin, 1999, pp. 301-314.

29. Dr. Richard Ford, "Richard's Problem," private communication on VMACRO mailing list, 1997.

30. Dr. Vesselin Bontchev, "Macro Virus Identification Problems," Virus Bulletin Conference, 1997, pp. 157-196.

31. Dr. Vesselin Bontchev, private communication, 1998.

32. Vesselin Bontchev, "No Peace on the Excel Front," Virus Bulletin, April 1998, pp. 16-17.

33. Gabor Szappanos, "XML Heaven," Virus Bulletin, February 2003, pp. 8-9.

34. Peter G. Capek, David M. Chess, Alan Fedeli, and Dr. Steve R. White, "Merry Christmas: An Early Network Worm," IEEE Security & Privacy, http://www.computer.org/security/v1n5/j5cap.htm.

35. Dr. Klaus Brunnstein, "Computer 'Beastware': Trojan Horses, Viruses, Worms - A Survey," HISEC'93, 1993.

36. David Ferbrache, "A Pathology of Computer Viruses," Springer-Verlag, 1992, ISBN: 3-540-19610-2.

37. Peter Szor, "Warped Logic?" Virus Bulletin, June 2001, pp. 5-6.

38. Mikhail Pavlyushchik, "Virus Mapping," Virus Bulletin, November 2003, pp. 4-5.

39. Eugene Kaspersky, "Don't Press F1," Virus Bulletin, January 2000, pp. 7-8.

40. Peter Szor, "Sharpei Behaviour," Virus Bulletin, April 2002, pp. 4-5.

41. Gabor Kiss, "SWF/LFM-926 - Flash in the Pan?" Virus Bulletin, February 2002, p. 6.

42. Dmitry Gryaznov, private communication, 2004.

43. Sami Rautiainen, private communication, 2004.

44. Philip Hannay and Richard Wang, "MSIL for the .NET Framework: The Next Battleground?," Virus Bulletin Conference, 2001, pp. 173-196.

45. Peter Szor, "Tasting Donut," Virus Bulletin, March 2002, pp. 6-8.

46. Peter Ferrie, "The Beagle Has Landed," http://www.virusbtn.com/resources/viruses/indepth/beagle.xml.

47. Eugene Kaspersky, "Zhengxi: Saucerful of Secrets," Virus Bulletin, April 1996, pp. 8-10.

48. Roger A. Grimes, Malicious Mobile Code, O'Reilly, 2001, ISBN: 1-56592-682-X (Paperback).

49. Ken Thompson, "Reflections on Trusting Trust," Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763, http://cm.bell-labs.com/who/ken/trust.html.

50. Peter Szor, "Pocket Monsters," Virus Bulletin, August 2001, pp. 8-9.

51. Igor Daniloff, "Anarchy in the USSR," Virus Bulletin, October 1997, pp. 6-8.

52. Jakub Kaminski, "Hidden Partitions vs. Multipartite Viruses - I'll be back!," Virus Bulletin Conference, 1996.

53. Peter Szor, "Junkie Memorial," Virus Bulletin, September 1997, pp. 6-8.

54. Dr. Igor Muttik, "3apa3a," http://www.f-secure.com/v-descs/3apa3a.shtml, 1994.

55. "Buffer Overrun in JPEG Processing (GDI+) Could Allow Code Execution," MS04-028, http://www.microsoft.com/technet/security/bulletin/ms04-028.mspx.

Chapter 4. Classification of Infection Strategies

"All art is an imitation of nature."

Seneca


In this chapter, you will learn about common computer virus infection techniques that target various file formats and system areas.

4.1. Boot Viruses

The first known successful computer viruses were boot sector viruses. In 1986, two Pakistani brothers created the first such virus, called Brain, on the IBM PC.

Today the boot infection technique is rarely used. However, you should become familiar with boot viruses because they can infect a computer regardless of the actual operating system installed on it.

Boot sector viruses take advantage of the boot process of personal computers (PCs). Because most computers do not contain an operating system (OS) in their read-only memory (ROM), they need to load the system from somewhere else, such as from a disk or from the network (via a network adapter).

A typical IBM PC's disk is organized in up to four partitions, which have logical letters assigned to them on several operating systems such as MS-DOS and Windows NT, typically C:, D:, and so on. (Drive letters are particularities of the operating system; for example, UNIX systems use mount points, not drive letters.) Most computers only use two of these partitions, which can be accessed easily. Some computer vendors, such as COMPAQ and IBM, often use hidden partitions to store additional BIOS setup tools on the disk. Hidden partitions do not have any logical names assigned to them, making them more difficult to access. Good tools such as Norton Disk Editor can reveal such areas of the disk. (Please use advanced disk tools very carefully because you can easily harm your data!)

Typically PCs load the OS from the hard drive. In early systems, however, the boot order could not be defined, and thus the machine would boot from the diskette, allowing great opportunity for computer viruses to load before the OS. The ROM-BIOS reads the first sector of the specified boot disk according to the boot order settings in the BIOS setup, stores it in the memory at 0:0x7C00 when successful, and runs the loaded code [1].

On newer systems, each partition is further divided into additional partitions. The disk is always divided into heads, tracks, and sectors. The master boot record (MBR) is located at head 0, track 0, sector 1, which is the first sector on the hard disk. The MBR contains generic, processor-specific code to locate the active boot partition from the partition table (PT) records. The PT is stored in the data area of the MBR. At the front of the MBR is some tiny code, often called a boot strap loader.
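The head/track/sector addressing used here maps to a linear sector number with the usual conversion formula. The sketch below uses "cylinder" for what the text calls a track; the geometry values are hypothetical examples, not properties of any particular disk.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a zero-based linear sector."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sectors are numbered from 1 to sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    return ((cylinder * heads_per_cylinder + head) * sectors_per_track
            + (sector - 1))


# Hypothetical geometry: 16 heads, 63 sectors per track.
mbr_lba = chs_to_lba(0, 0, 1, heads_per_cylinder=16, sectors_per_track=63)
next_head_lba = chs_to_lba(0, 1, 1, heads_per_cylinder=16, sectors_per_track=63)
```

Because sectors are numbered from 1 while heads and tracks are numbered from 0, head 0, track 0, sector 1 is linear sector 0, the very first sector of the disk.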

Each PT entry contains the following:

  • The addresses of the first and last sectors of the partition
  • A flag indicating whether the partition is bootable
  • A type byte
  • The offset of the first sector of the partition from the beginning of the disk in sectors
  • The size of the partition in sectors
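The fields listed above can be decoded with a few lines of code. This is a minimal sketch assuming the conventional MBR layout: four 16-byte entries starting at offset 0x1BE, a boot flag of 0x80 for the active partition, and the 55 AA signature in the last two bytes. The dictionary keys are names chosen for this example, and the sample sector is synthetic.

```python
import struct

PT_OFFSET = 0x1BE       # the partition table starts at byte 446 of the MBR
PT_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"


def parse_partition_table(mbr: bytes) -> list:
    """Decode the four partition table entries of a 512-byte MBR."""
    if len(mbr) != 512 or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        start = PT_OFFSET + i * PT_ENTRY_SIZE
        raw = mbr[start:start + PT_ENTRY_SIZE]
        flag, chs_first, ptype, chs_last, lba_first, size = struct.unpack(
            "<B3sB3sII", raw)
        entries.append({
            "bootable": flag == 0x80,
            "first_chs": chs_first,   # packed head/sector/cylinder bytes
            "type": ptype,
            "last_chs": chs_last,
            "first_lba": lba_first,   # offset from the start of the disk
            "sectors": size,
        })
    return entries


# Synthetic MBR with one bootable partition in the first slot.
sample = bytearray(512)
sample[PT_OFFSET:PT_OFFSET + PT_ENTRY_SIZE] = struct.pack(
    "<B3sB3sII", 0x80, b"\x01\x01\x00", 0x06, b"\x0f\x3f\x20", 63, 20000)
sample[510:512] = BOOT_SIGNATURE
table = parse_partition_table(bytes(sample))
active = [entry for entry in table if entry["bootable"]]
```

The boot strap loader performs essentially the same lookup when it searches for the active partition.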

The loader locates the active partition and loads its first logical sector as the boot sector. The boot sector contains OS-specific code. The MBR is general-purpose code, not related to any OS. Thus IBM PCs can easily support more than one partition with different kinds of file systems and operating systems. This also makes the job of computer viruses very simple. The MBR code can be easily replaced with virus code that loads the original MBR after itself and stays in memory, depending on the installed operating system. In the case of MS-DOS, boot viruses can easily remain in memory and infect other inserted media on the fly. A few tricky boot viruses, like Exebug, always force the computer to load them on the system first and then complete the boot process themselves. Exebug changes the CMOS settings of the BIOS to trick the PC into thinking it has no floppy drives. Thus, the PC will boot using the infected MBR first. When the virus is executed (from the hard disk), it checks if there is a diskette in drive A:, and if there is one, it will load the boot sector of the diskette and transfer control to it. Thus when you try to boot from a boot diskette, the virus can trick you into believing that you indeed booted from the diskette, but in reality, you did not.

In the case of floppy diskettes, the boot sector is the first sector of the diskette. The boot record contains OS-specific filenames to load, such as IBMBIO.COM and IBMDOS.COM.

It is advisable to set the boot process in such a way that you boot from the hard drive first. In first-generation IBM PCs, the boot process was not designed that way, so whenever a diskette was left in drive A:, the PC attempted to boot from it. Boot viruses took advantage of this design mistake. By setting the boot process properly, you can easily avoid simple boot sector viruses.

Note

If your system has a SCSI disk connected to it, the system might not boot from those drives first because it is unable to handle these disks directly from its BIOS.

The following sections discuss in detail the basic kinds of MBR and boot sector infection techniques.

4.1.1. Master Boot Record (MBR) Infection Techniques

Infection of the MBR is a relatively trivial task for viruses. The size of the MBR is 512 bytes. Only a short code fits in there, but it is more than enough for a small virus. Typically the MBR gets infected immediately upon booting from an infected diskette in drive A.

4.1.1.1 MBR Infection by Replacement of Boot Strap Code

The classic type of MBR viruses uses the INT 13h BIOS disk routine to access the disks for read and write access. Most MBR infectors replace the boot strap code in the front of the MBR with their own copy and do not change the PT. This is important, because the hard disk is accessible when booting from a diskette only if the PT is in place. Otherwise, DOS has no way to find the data on the drive.

The Stoned virus is a typical example of this technique. The virus stores the original MBR on sector 7 (see Figure 4.1). After the virus gets control via the replaced MBR, it reads the stored MBR located on sector 7 in memory and gives it control. A couple of empty sectors are typically available after the MBR, and Stoned takes advantage of this. However, this condition cannot be 100% guaranteed, and this is exactly why some MBR viruses make a system unbootable after infection.
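A repair tool can look for the saved sector before trusting it. The following sketch works on an in-memory disk image and checks whether sector 7 of track 0 (linear sector 6) looks like a displaced original MBR. It assumes, as described for most MBR infectors, that the partition table in the current MBR was left unchanged; the function names and the synthetic image are illustrative only.

```python
SECTOR = 512
PT_START, PT_END = 446, 510


def read_sector(image: bytes, lba: int) -> bytes:
    """Return one 512-byte sector from a raw disk image."""
    return image[lba * SECTOR:(lba + 1) * SECTOR]


def find_saved_mbr(image: bytes, candidate_lba: int = 6):
    """Return the saved sector if it looks like the original MBR, else None."""
    current = read_sector(image, 0)
    saved = read_sector(image, candidate_lba)
    if len(saved) != SECTOR or saved[510:512] != b"\x55\xaa":
        return None                       # not a boot-type sector at all
    if saved[PT_START:PT_END] != current[PT_START:PT_END]:
        return None                       # partition tables disagree
    if saved[:PT_START] == current[:PT_START]:
        return None                       # identical code: nothing replaced
    return saved


# Synthetic image: replaced boot code in sector 1, original kept in sector 7.
pt = bytes(range(64))
original = b"\xfa" * 446 + pt + b"\x55\xaa"
replaced = b"\x90" * 446 + pt + b"\x55\xaa"
image = replaced + bytes(5 * SECTOR) + original + bytes(SECTOR)
recovered = find_saved_mbr(image)
```

If the candidate passes these checks, writing it back to the first sector restores the original boot strap code.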

Figure 4.1. The typical layout of the disk before and after a Stoned infection.


4.1.1.2 Replacing the MBR Code but Not Saving It

Another technique of viruses to infect the MBR is to overwrite the boot strap code, leaving the PT entries in place but not saving the original MBR anywhere. Such viruses need to perform the function of the original MBR code. In particular, they need to locate the active partition, load it, and give control to it after themselves.

One of the first viruses that used this technique was Azusa [2], discovered in January of 1991 in Ontario, Canada. Viruses like this cannot be disinfected with regular methods because the original copy of the MBR is not stored anywhere.

Antivirus programs quickly reacted to this threat by carrying a standard MBR code within them. To disinfect the virus, this generic MBR code was used to overwrite the virus code, thereby saving the system.
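The generic repair can be sketched as follows: the first 446 bytes (the boot strap code area) are replaced while the partition table and the boot signature are preserved. The code bytes passed in below are placeholders, not a working boot loader, and the function name is hypothetical.

```python
BOOT_CODE_SIZE = 446    # bytes of boot strap code before the partition table


def repair_mbr(infected: bytes, generic_code: bytes) -> bytes:
    """Replace the boot strap code, keeping the PT and signature intact."""
    if len(infected) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    if len(generic_code) > BOOT_CODE_SIZE:
        raise ValueError("generic code does not fit before the PT")
    padded = generic_code.ljust(BOOT_CODE_SIZE, b"\x00")
    return padded + infected[BOOT_CODE_SIZE:510] + b"\x55\xaa"


# Synthetic example: 0xCC stands in for virus code, range(64) for the PT.
infected_mbr = b"\xcc" * 446 + bytes(range(64)) + b"\x55\xaa"
placeholder_code = b"\xfa\x33\xc0"       # placeholder bytes only
repaired = repair_mbr(infected_mbr, placeholder_code)
```

Only the code area changes, so the disk's partitions stay reachable after the repair.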

4.1.1.3 Infecting the MBR by Changing the PT Entries

An easy target of MBR viruses is the partition table record of the MBR. By manipulating the PT entry of the active partition, a virus can make sure it loads a different boot sector, where the virus body is stored. Thus the MBR will load the virus boot sector instead of the original one, and the virus will load the original after itself.

The StarShip virus is an example of this technique. Some tricky viruses, such as some members of the Ginger family, manipulate the PT entries in such a way as to create a "circular partition" [3, 4] effect. Apparently this trick causes MS-DOS v4.0-7.0 to run in an endless loop when booted. Thus a clean MS-DOS 3.3x or some other non-Microsoft-made DOS system, such as PC DOS, must be used to boot properly from a diskette.

4.1.1.4 Saving the MBR to the End of the Hard Disk

A common method of infecting the MBR is to replace the MBR completely and save the original at the end of the hard drive, in the hope that nothing overwrites it there. Some of the more careful viruses reduce the size of the partition to make sure that this area of the disk will not be overwritten again. The multipartite virus, Tequila, uses this technique.

4.1.2. DOS BOOT Record (DBR) Infection Techniques

Boot sector viruses infect the first sector (the boot sector) of diskettes. They optionally infect the hard-disk boot sectors, as well. There are more known infection techniques to infect boot sectors than there are to infect MBRs.

4.1.2.1 Standard Boot Infection Technique

One of the most frequently used boot infection techniques was developed in viruses like Stoned. Stoned infects a diskette's boot sector by replacing the 512-byte boot sector with its own copy and saving the original to the end of the root directory.

In practice, this technique is safe most of the time, but accidental damage to the content of the diskette can happen if there are too many filenames stored in the diskette's directory. In such a case, the original sector's content might overwrite the content of the directory; as a result, only some garbage is displayed on-screen via a DIR command.
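To see why a crowded directory is at risk, the following sketch locates the root directory from the boot sector's BIOS parameter block. The field offsets follow the standard FAT12/FAT16 layout, and the sample values are the usual ones for a 360KB diskette, built synthetically here. That the saved sector lands exactly in the last root directory sector is an assumption based on the description above.

```python
import struct


def root_directory_layout(boot_sector: bytes) -> dict:
    """Locate the root directory from the BIOS parameter block (FAT12/16)."""
    (bytes_per_sector, _sectors_per_cluster, reserved, fats, root_entries,
     _total_sectors, _media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)
    first = reserved + fats * sectors_per_fat
    count = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    per_sector = bytes_per_sector // 32
    return {
        "first_sector": first,
        "last_sector": first + count - 1,
        # Entries that fit in all but the last root directory sector.
        "safe_entries": root_entries - per_sector,
    }


# Synthetic boot sector carrying the usual 360KB diskette parameters.
boot = bytearray(512)
struct.pack_into("<HBHBHHBH", boot, 11, 512, 2, 1, 2, 112, 720, 0xFD, 2)
layout = root_directory_layout(bytes(boot))
```

With these parameters the root directory spans linear sectors 5 through 11, so a sector saved into the last one collides with directory data as soon as more than 96 entries are in use.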

4.1.2.2 Boot Viruses That Format Extra Sectors

Some boot viruses are simply too large to fit in a single sector. Most diskettes can be formatted to store more data than their actual formatted size. Not all floppy disk drives support the formatting of extra sectors, but many do. For example, my first PC clone's diskette drive did not support the access to these areas of diskettes. As a result, some copy-protected software simply did not work properly on my system.

Copy-protection software often takes advantage of specially formatted "extra" diskette sectors placed outside of normal ranges. As a result, normal diskette copying tools, such as DISKCOPY, fail to make an identical copy of such diskettes.

Some viruses specially format a set of extra diskette sectors to make it more difficult for the antivirus program to access the original copy during repair. However, the typical use of extra sectors is to make more space for a larger virus body.

The Indonesian virus, Denzuko, is an example that uses this technique. Denzuko was released during the spring of 1988. Unlike with most other viruses, the author of this virus is known. It was written by Denny Yanuar Ramdhani. The nickname of the virus writer is Denny Zuko, which comes from "Danny Zuko," the character in the popular musical movie Grease played by John Travolta [5]. This boot virus was among the first to implement a counterattack against another computer virus. Denzuko killed the Brain virus whenever it encountered it on a computer.

Denzuko also displayed the graphical payload shown in Figure 4.2 for a fraction of a second when Ctrl-Alt-Del was pressed. Then the computer appeared to reboot, but the virus stayed in memory [6].

Figure 4.2. Payload of the Denzuko virus.


The extremely complex and dangerous Hungarian stealth BOOT/MBR virus, Töltögetö (also known as Filler), uses this technique as well. This virus was written by a computer student at a technical high school in Székesfehérvár, Hungary, in 1991. Filler has formatting records for both 360KB and 1.2MB diskettes and formats sectors on track 40 or 80 on these, respectively. These areas of the diskette are not formatted normally.

A benefit of such an infection technique is the possibility of reviving dead virus code. Reviving attempts were first seen in computer viruses in the early '90s. For example, some COM infector viruses would attempt to load to the very end of the disk, outside of normally formatted areas, and give control to the loaded sector. Many early antivirus solutions did not overwrite the virus code everywhere on the disk during cleanup. The boot sector of the disk was often fixed, and the virus code was considered dead in the diskettes' "out of reach" areas. Unfortunately, this provided the advantage of allowing virus writers to revive such dead virus instances easily, using another virus.

4.1.2.3 Boot Viruses That Mark Sectors as BAD

An interesting method of viruses to infect boot sectors is to replace the original boot sector with the virus code and save the original sector, or additional parts of the virus body, in an unused cluster marked as BAD in the DOS FAT. An example of this kind of virus is the rather dangerous Disk Killer, written in April 1989 [7].
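A scanner that wants to inspect such clusters first has to find them. The sketch below decodes a FAT12 table, as used on diskettes, and lists the clusters carrying the BAD marker 0xFF7. The FAT built here is synthetic, and the helper that writes entries exists only to set up the demonstration.

```python
FAT12_BAD = 0xFF7


def fat12_entry(fat: bytes, cluster: int) -> int:
    """Return the 12-bit FAT entry for the given cluster number."""
    offset = cluster + cluster // 2           # cluster * 1.5, rounded down
    pair = fat[offset] | (fat[offset + 1] << 8)
    return pair >> 4 if cluster & 1 else pair & 0x0FFF


def set_fat12_entry(fat: bytearray, cluster: int, value: int) -> None:
    """Store a 12-bit value; used here only to build the demonstration FAT."""
    offset = cluster + cluster // 2
    if cluster & 1:
        fat[offset] = (fat[offset] & 0x0F) | ((value << 4) & 0xF0)
        fat[offset + 1] = (value >> 4) & 0xFF
    else:
        fat[offset] = value & 0xFF
        fat[offset + 1] = (fat[offset + 1] & 0xF0) | ((value >> 8) & 0x0F)


def bad_clusters(fat: bytes, cluster_count: int) -> list:
    """List data clusters (numbered from 2) that are marked as BAD."""
    return [c for c in range(2, cluster_count + 2)
            if fat12_entry(fat, c) == FAT12_BAD]


demo_fat = bytearray(32)
set_fat12_entry(demo_fat, 5, FAT12_BAD)
set_fat12_entry(demo_fat, 6, 0xFFF)           # ordinary end-of-chain marker
set_fat12_entry(demo_fat, 8, FAT12_BAD)
marked = bad_clusters(bytes(demo_fat), cluster_count=10)
```

Genuinely defective clusters exist too, so a BAD mark is only a hint; the contents of the listed clusters still need to be examined.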

4.1.2.4 Boot Viruses That Do Not Store the Original Boot Sector

Some boot sector viruses do not save the diskette's original boot sector anywhere. Instead, they simply infect the active boot sector or the MBR of the hard disk and give control to saved boot sectors on the hard disk. Thus the diskette infection cannot be repaired with standard techniques because the virus does not need to store the original sector anywhere. Because the boot sector is operating system-specific, this task is not as simple as replacing the MBR code; there are too many different OS boot sectors to choose from. Not surprisingly, the most common antivirus solution to this problem has been to overwrite the virus code with a generic boot sector code that displays a message asking the user to boot from the hard disk instead. As a result, a system diskette cannot be repaired properly.

A second, less common method is to overwrite the diskette boot sector with the virus code, which will infect the MBR or the boot sector of the hard disk. The virus then displays a false error message, such as "Non-system disk or disk error," and lets the user load the virus from the hard disk. The Strike virus is an example that uses this technique.

A further method to infect the boot sector of diskettes without saving is to mimic the original boot sector functionality and attempt to load some system files. Obviously, this method will only work if the virus code matches the system files on the diskette. The Lucifer virus is an example of this technique.

4.1.2.5 Boot Viruses That Store at the End of Disks

A class of boot viruses replaces the original boot sector by overwriting it and saving it at the end of the hard disk, like MBR viruses, which also do this occasionally. The infamous Form virus uses this method. It saves the original boot sector at the very end of the disk. Form hopes that this sector will be used infrequently, or not at all, and thus the stored boot sector will stay on the disk without too much risk of being modified. Thus the virus does not mark this sector in any way; neither does it reduce the size of the partition that contains the saved sector.

Another class of boot viruses also saves the boot sector at the end of the active partition and makes the partition shorter in the partition table to be certain that this sector is not going to be "free" for other programs to use. Occasionally, the boot sector's data area is modified for the same reason.

4.1.3. Boot Viruses That Work While Windows 95 Is Active

Several boot viruses, typically the multipartite kind, attack the new floppy disk driver of Windows 95 systems stored in \SYSTEM\IOSUBSYS\HSFLOP.PDR. The technique appeared in the Slovenian virus family called Hare (also known as Krishna) in May of 1996, written by virus writer Demon Emperor.

Viruses delete this file to get access to the real-mode BIOS INT 13h interrupt handler while Windows 95 is active on the system. Without this trick, other boot viruses cannot infect the diskettes using INT 13h because it is not available for them to use.

4.1.4. Possible Boot Image Attacks in Network Environments

Diskless workstations boot using a file image from the server. On Novell NetWare file servers, for instance, the command DOSGEN.EXE can create an image of a bootable diskette, called NET$DOS.SYS, for the use of terminals. The terminals have a special PROM chip installed that searches for the boot images over the network.

This provides two obvious possibilities for the attacker. The first is to infect or replace the NET$DOS.SYS file on the server whenever access is available to it. The second is to simulate the functionality of the server code and host fake virtual servers via virus code on the network with images that contain virus code.

No such viruses are known. However, the NET$DOS.SYS image file is often infected, which is ignored by many virus scanners. This exposes the "dumb terminals" to virus attacks.

4.2. File Infection Techniques

In this section, you will learn about the common virus infection strategies that virus writers[8] have used over the years to invade new host systems.

4.2.1. Overwriting Viruses

Some viruses simply locate another file on the disk and overwrite it with their own copy. Of course, this is a very primitive technique, but it is certainly the easiest approach of all. Such simple viruses can do major damage when they overwrite files on the entire disk.

Overwriting viruses cannot be disinfected from a system. Infected files must be deleted from the disk and restored from backups. Figure 4.3 shows how the content of the host program changes when an overwriting virus attacks it.

Figure 4.3. An overwriting virus infection that changes host size.

Normally, overwriting viruses are not very successful threats because the obvious side effects of the infections are easily discovered by users. However, such viruses have better potential when this technique is combined with network-based propagation. For instance, the VBS/LoveLetter.A@mm virus mass mails itself to other systems. When executed, it will overwrite with its own copy any local files with the following extensions:

.vbs, .vbe, .js, .jse, .css, .wsh, .sct, .hta, .jpg, .jpeg, .wav, .txt, .gif, .doc, .htm, .html, .xls, .ini, .bat, .com, .avi, .qt, .mpg, .mpeg, .cpp, .c, .h, .swd, .psd, .wri, .mp3, and .mp2

Another overwriting virus infection method is used by the so-called tiny viruses. A classic family of this type is the Trivial family on DOS. During the early 1990s, many virus writers attempted to write the shortest possible binary virus. Not surprisingly, there are many variants of Trivial. Some of the viruses are as short as 22 bytes (Trivial.22).

The algorithm for such viruses is simple:

  1. Search for any (*.*) new host file in the current directory.
  2. Open the file for writing.
  3. Write the virus code on top of the host program.

The shortest viruses are often unable to infect more than a single host program in the same directory in which the virus was executed. This is because finding the next host file would be "as expensive" as a couple of bytes of extra code. Such viruses are not advanced enough to attack a file marked read-only because that would take a couple of extra instructions.

Often the virus code is optimized to take advantage of the content of the registers during program execution as they are passed in by the operating system. Thus the virus code itself does not need to initialize registers that have known content set by the system loader. By using this condition, virus writers can make their creation even shorter.

Such optimization, however, can cause fatal errors when the virus code is executed on the wrong platform, which did not initialize the registers in the way that the virus expected.

Some tricky overwriting viruses also use BIOS disk writes instead of DOS file functions to infect new files. A very primitive form of such a virus was implemented in 15 bytes. The virus overwrites each sector on the disk with itself. Evidently, the system corruption is so major that such viruses kill the host system very quickly, keeping the virus from spreading any further.

Figure 4.4 illustrates an overwriting virus that simply overwrites the beginning of the host but does not change its size.

Figure 4.4. An overwriting virus that does not change the size of the host.

4.2.2. Random Overwriting Viruses

Another rare variation of the overwriting method does not change the code of the program at the top of the host file. Instead, the virus seeks to a random location in the host program and overwrites the file with itself at that location. Evidently, the virus code might not even get control during execution of the host. In either case, the host program is lost during the virus's attack and often crashes before the virus code can execute. An example of this virus is the Russian virus Omud[9], as shown in Figure 4.5.

Figure 4.5. A random overwriter virus.

To improve performance by reducing the disk I/O, modern antivirus scanners are optimized to find viruses at "well-known" locations of the file whenever possible. Thus random overwriting viruses are often problematic for scanners to find because a scanner would need to scan the contents of the complete host program for the virus code, which is too I/O expensive.
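The trade-off described above can be illustrated with a minimal sketch. It assumes a scanner that matches a plain byte signature and inspects only a fixed window at the head and tail of a file; the names `scan_windows`, `scan_full`, the 4 KB window, and the harmless `EXAMPLE-MARKER` string are all hypothetical and stand in for whatever a real product uses.

```python
def scan_windows(data: bytes, signature: bytes, window: int = 4096) -> bool:
    """Look for the signature only in the first and last `window` bytes."""
    head = data[:window]
    tail = data[-window:] if window else b""
    return signature in head or signature in tail


def scan_full(data: bytes, signature: bytes) -> bool:
    """Look for the signature anywhere in the file (touches every byte)."""
    return signature in data


# Synthetic 64 KB "host" with a harmless marker placed in the middle,
# standing in for code written at a random location.
SIGNATURE = b"EXAMPLE-MARKER"
host = bytearray(64 * 1024)
middle = len(host) // 2
host[middle:middle + len(SIGNATURE)] = SIGNATURE
sample = bytes(host)

print(scan_windows(sample, SIGNATURE))  # False: the windows miss it
print(scan_full(sample, SIGNATURE))     # True: found, at the cost of a full read
```

The windowed scan reads at most 8 KB regardless of file size, which is why it is attractive, and also why content placed at an arbitrary offset can slip past it.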

4.2.3. Appending Viruses

A very typical DOS COM file infection technique is called normal COM. In this technique, a jump (JMP) instruction is inserted at the front of the host to point to the end of the original host. A typical example of this virus is Vienna, which was published in Ralf Burger's computer virus book in a slightly modified form with its source code. This was back in 1986-1987.

The technique gets its name from the location of the virus body, which is appended to the end of the file. (It is interesting to note that some viruses infect EXE files as COM by first converting the EXE file to a COM file. The Vacsina virus family uses this technique.)

The jump instruction is sometimes replaced with equally functional instructions, such as the following:

  1. CALL start_of_virus
  2. PUSH offset start_of_virus
    RET

The first three overwritten bytes at the top of the host program (sometimes 4 to 16) are stored in the virus body. When the virus-infected program is executed, the virus loads in memory with the actual infected host. The jump instruction directs control to the virus body, and then the virus typically replicates itself by locating new host programs on the disk or by executing some sort of activation routine (also called a trigger). Finally, the virus virtually cleans the program in memory by copying the original bytes to offset CS:0x100 (the location where the COM files are loaded) and executes the original program by jumping back to CS:0x100. The COM files are loaded to CS:0x100 because the program segment prefix (PSP) is placed at CS:0 to CS:0xFF.

Figure 4.6 shows how a DOS COM appender virus infects a host program.

Figure 4.6. A typical DOS COM appender virus.

Obviously the appender technique can be implemented for any other type of executable file, such as EXE, NE, PE, and ELF formats, and so on. Such files have a header section that stores the address of the main entry point, which, in most cases, will be replaced with a new entry point to the start of the virus code appended to the end of the file.
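From the defender's side, the header field mentioned above is easy to inspect. The following is a minimal sketch, not a complete PE parser: it reads the entry-point RVA from a PE header and reports which section contains it. An entry point in the last section is only a hint (legitimate packers produce the same layout). The helper `build_demo_pe` and the section values are hypothetical; it builds a header-only, non-runnable image purely so the example is self-contained.

```python
import struct


def entry_point_section(image: bytes):
    """Return (index, section_count) for the section holding the entry point.

    index is None when the entry point lies outside every section.
    """
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    (num_sections,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    # AddressOfEntryPoint sits 16 bytes into the optional header.
    (entry_rva,) = struct.unpack_from("<I", image, e_lfanew + 24 + 16)
    table = e_lfanew + 24 + opt_size
    for i in range(num_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData follow the 8-byte name.
        vsize, vaddr, raw_size = struct.unpack_from("<III", image, table + 40 * i + 8)
        if vaddr <= entry_rva < vaddr + max(vsize, raw_size):
            return i, num_sections
    return None, num_sections


def build_demo_pe(entry_rva: int, sections) -> bytes:
    """Build a header-only, non-runnable PE image (demonstration data)."""
    e_lfanew, opt_size = 0x40, 0xE0
    image = bytearray(e_lfanew + 24 + opt_size + 40 * len(sections))
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    struct.pack_into("<H", image, e_lfanew + 6, len(sections))
    struct.pack_into("<H", image, e_lfanew + 20, opt_size)
    struct.pack_into("<I", image, e_lfanew + 24 + 16, entry_rva)
    table = e_lfanew + 24 + opt_size
    for i, (name, vsize, vaddr) in enumerate(sections):
        struct.pack_into("<8sIII", image, table + 40 * i, name, vsize, vaddr, vsize)
    return bytes(image)


SECTIONS = [(b".text", 0x1000, 0x1000), (b".data", 0x1000, 0x2000),
            (b".reloc", 0x1000, 0x3000)]
typical = build_demo_pe(0x1100, SECTIONS)     # entry point in the first section
suspicious = build_demo_pe(0x3010, SECTIONS)  # entry point in the last section

print(entry_point_section(typical))     # (0, 3)
print(entry_point_section(suspicious))  # (2, 3)
```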

Section 4.3 is dedicated to Win32 infection techniques to demonstrate the principles of file infection techniques in modern file formats. These formats often have complicated internal structures offering many more opportunities to attackers.

4.2.4. Prepending Viruses

A common virus infection technique uses the principle of inserting virus code at the front of host programs. Such viruses are called prepending viruses. This is a simple kind of infection, and it is often very successful. Virus writers have implemented it on various operating systems, causing major virus outbreaks in many.

An example of a COM prepender virus is the Hungarian virus Polimer.512.A, which prepends itself, 512 bytes long, at the front of the executable and shifts the original program content to follow itself.

Let's take a look at the front of the Polimer virus in DOS DEBUG. Polimer is a good example to study because the top of the virus code is a completely harmless data area with a message that is displayed onscreen during execution of infected programs.

>DEBUG polimer.com
-d
142F:0100 E9 80 00 00 3F 3F 3F 3F-3F 3F 3F 3F 43 4F 4D 00 ....????????COM.
142F:0110 05 00 00 00 2E 8B 26 68-01 00 00 00 00 00 00 00 ......&h........
142F:0120 00 00 00 00 00 00 00 00-41 20 6C 65 27 6A 6F 62 ........A le'job
142F:0130 62 20 6B 61 7A 65 74 74-61 20 61 20 50 4F 4C 49 b kazetta a POLI
142F:0140 4D 45 52 20 6B 61 7A 65-74 74 61 20 21 20 20 20 MER kazetta !
142F:0150 56 65 67 79 65 20 65 7A-74 20 21 20 20 20 20 0A Vegye ezt ! .
 

The virus body is loaded to offset 0x100 in memory. The virus code starts with a jump (0xe9) instruction to give control to the virus code after its own data area. Because Polimer is 512 bytes (0x200) long, at the front of COM executable, offset 0x300 in memory should be the original host program (0x100+0x200=0x300). Indeed, in this example, the actual infected host is the Free Memory Query Program. Prepending COM viruses can easily start their host programs by copying the original programs' content to offset 0x100 and giving it control.

-d300
142F:0300 E9 9E 00 0D 46 72 65 65-20 4D 65 6D 6F 72 79 20 ....Free Memory
142F:0310 51 75 65 72 79 20 50 72-6F 67 72 61 6D 2C 20 56 Query Program, V
142F:0320 65 72 73 69 6F 6E 20 34-2E 30 33 0D 0A 53 4D 47 ersion 4.03..SMG
142F:0330 20 53 6F 66 74 77 61 72-65 0D 0A 28 43 29 20 43 Software..(C) C
142F:0340 6F 70 79 72 69 67 68 74-20 31 39 38 36 2C 31 39 opyright 1986,19
142F:0350 38 37 20 53 74 65 76 65-6E 20 4D 2E 20 47 65 6F 87 Steven M. Geo
142F:0360 72 67 69 61 64 65 73 0D-0A 1A 00 00 00 00 00 00 rgiades.........
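
The offsets in the two dumps can be checked with a little arithmetic. This sketch decodes the three-byte `E9` near jumps shown at offsets 0x100 and 0x300; the function name `near_jump_target` is an assumption made for illustration, and the displacement is taken relative to the instruction that follows the jump.

```python
def near_jump_target(code: bytes, load_offset: int) -> int:
    """Decode an E9 rel16 jump located at the start of `code`."""
    if code[0] != 0xE9:
        raise ValueError("expected a 0xE9 near jump")
    rel16 = int.from_bytes(code[1:3], "little")
    # The displacement counts from the byte after the 3-byte instruction;
    # 16-bit arithmetic wraps around within the segment.
    return (load_offset + 3 + rel16) & 0xFFFF


COM_LOAD_OFFSET = 0x100
VIRUS_LENGTH = 0x200  # Polimer.512.A is 512 bytes long

# First dump: E9 80 00 at offset 0x100 skips the data area of the virus.
virus_code_start = near_jump_target(bytes.fromhex("E98000"), COM_LOAD_OFFSET)

# The shifted host follows the 512-byte virus in memory.
host_in_memory = COM_LOAD_OFFSET + VIRUS_LENGTH

# Second dump: E9 9E 00 is the host's own first instruction. Once the host
# is copied back to offset 0x100, the jump lands here.
host_code_start = near_jump_target(bytes.fromhex("E99E00"), COM_LOAD_OFFSET)

print(hex(virus_code_start))  # 0x183
print(hex(host_in_memory))    # 0x300
print(hex(host_code_start))   # 0x1a1
```

Because the host's jump is relative, it only reaches the intended target after the host has been moved back to offset 0x100, which is why the virus restores the original image before giving it control.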
 

Figure 4.7 illustrates how a prepender virus is inserted at the front of a host program.

Figure 4.7. A typical prepender virus.

Prepender viruses are often implemented in high-level languages such as C, Pascal, or Delphi. Depending on the actual structure of the executable, the execution of the original program might not be as trivial a task as it is for COM files. This is exactly why a generic solution involves creation of a new temporary file on the disk to hold the content of the original host program. Then a function, such as system(), is used to execute the original program in the temporary file. Such viruses typically pass command-line parameters of the infected host to the host program stored in the temporary file. Thus the functionality of the application will not break because of missing parameters.

4.2.5. Classic Parasitic Viruses

A variation of the prepender technique is known as the classic parasitic infection, as shown in Figure 4.8. Such viruses overwrite the top of the host with their own code and save the top of the original host program to the very end of the host, usually virus-size long. The first such virus was Virdem, written by Ralf Burger. In fact, Virdem is one of the first examples of a file virus ever seen; Burger's book did not even contain information about any other kinds of computer viruses but file viruses. Burger distributed his creation at the Chaos Computer Club conference in December 1986.

Figure 4.8. A classic parasitic virus.

Repairing such viruses often runs into a common problem. In many cases, the repair definition copies N bytes to the front of the file, counting backward from the end of the infected program, and then truncates the file at FILESIZE-N. N is typically the size of the virus, but the size of the infected file can vary. The most common reason for this is a multiple infection, when the file is infected more than once.

In other cases, the file has some extra data appended, such as inoculation information placed there by some other antivirus program. For instance, the Jerusalem virus uses the MsDos marker at the end of the infected file to "recognize" files that are already infected. Some early antivirus programs appended the string to the end of all COM and EXE files to inoculate files from recurrent Jerusalem infections. Although it might sound like a great idea, the extra modification of the files can easily cause trouble for disinfectors. This happens when the inoculated file is already infected with another parasitic virus. When the FILESIZE-N calculation is used, the repair routine will seek to an incorrect location 5 bytes after the top of the original program content. This repair will result in a garbage host program that will crash when executed. This kind of disinfection is often called a half-cooked repair[10].
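The FILESIZE-N problem is easy to reproduce on synthetic data. The sketch below models only the file layout (a placeholder of N bytes, the rest of the host, then the saved host top) and is not taken from any real disinfector; `naive_repair`, `repair_with_marker`, and the sample byte strings are assumptions made for illustration.

```python
def naive_repair(infected: bytes, n: int) -> bytes:
    """FILESIZE-N repair: assumes the saved host top is the LAST n bytes."""
    saved_top = infected[len(infected) - n:]
    return saved_top + infected[n:len(infected) - n]


def repair_with_marker(infected: bytes, n: int, marker: bytes) -> bytes:
    """Strip a known trailing marker first, then apply the same repair."""
    if marker and infected.endswith(marker):
        infected = infected[:len(infected) - len(marker)]
    return naive_repair(infected, n)


N = 8
host = b"HOSTTOP!" + b"rest-of-host-program"
# Layout only: N placeholder bytes, the remainder of the host, the saved top.
infected = b"V" * N + host[N:] + host[:N]
inoculated = infected + b"MsDos"  # 5 extra bytes appended afterwards

print(naive_repair(infected, N) == host)                    # True
print(naive_repair(inoculated, N)[:N])                      # b'OP!MsDos'
print(repair_with_marker(inoculated, N, b"MsDos") == host)  # True
```

In the second case the restored "top" starts 5 bytes too late, so the repaired program begins with garbage. Stripping a recognized marker fixes this particular case, but a more robust repair would locate the saved bytes from information in the virus body rather than from the file size.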

Some special parasitic infectors do not save the top of the host to the end of the host program. Instead they use a temporary file to store this information outside of the file, sometimes with hidden attributes. For example, the Hungarian DOS virus, Qpa, uses this technique and saves 333 bytes (the size of the virus) to an extra file. Some members of the infamous W32/Klez family use this technique to store the entire host program in a new file.

4.2.6. Cavity Viruses

Cavity viruses (as shown in Figure 4.9) typically do not increase the size of the object they infect. Instead they overwrite a part of the file that can be used to store the virus code safely. Cavity infectors typically overwrite areas of files that contain zeros in binary files. However, other areas also can be overwritten, such as 0xCC- filled blocks that C compilers often use for instruction alignment. Other viruses overwrite areas that contain spaces (0x20).

Figure 4.9. A cavity virus injects itself into a cave of the host.

The first known virus to use this technique was Lehigh, in 1987. Lehigh was a fairly unsuccessful virus. However, Ken van Wyk created a lot of publicity about the virus and eventually set up the VIRUS-L newsgroup on Usenet to discuss his findings.

Cavity infectors are usually slow spreaders on DOS systems. The Bulgarian Darth_Vader viruses, for instance, never caused major outbreaks. This was also due to the fact that Darth_Vader was a slow infector virus. It waited for a program to be written, and only then did it infect the program using a cavity of the host.

The W2K/Installer virus (written by virus writers Benny and Darkman) uses the cavity infection technique to infect Win32 PE on Windows 2000 without increasing the file's size.

A special kind of cavity virus infection relies on PE programs' relocation sections. Relocations of most executables are not used in normal situations. Modern linker versions can be configured to produce PE executable files without a relocation table to make them shorter. Relocation cavity viruses overwrite the relocation section when there are relocations in the host. When the relocation section is longer than the virus, the virus does not increase the file size. Such viruses make sure that the relocation section is the last section or that it has sufficient length. Otherwise, the file gets corrupted easily during infection. For example, the W32/CTX and W95/Vulcano virus families use this technique.

4.2.7. Fractionated Cavity Viruses

A few Windows 95 viruses implement the cavity infection technique extremely successfully. The W95/CIH virus implements a variation of cavity infection called the fractionated cavity technique. In this case, the virus code is split between a loader routine and N number of sections that contain section slack space. First the loader (HEAD) routine of the virus locates the snippets of the virus code and reads them into a continuous area of memory, using an offset table kept in the HEAD part of the virus code. During infection, the virus locates the section slack gaps of portable executable (PE) files and injects its code into as many section slack holes as necessary.

A new viral entry point will be presented in the header of the file to point to the start of the virus code, usually inside the header section of the host applications. Some shorter cavity infectors, such as Murkry, use this area to infect files in a single step. However, CIH is longer and needs to split its code into snippets. Eventually, the virus executes the original host program from the stored entry point (EP). The advantage of the technique is that the virus only needs to "remember" the original EP of the host and simply jump there to execute the loaded program in memory.

Figure 4.10 represents the state of the host program before and after the infection of a fractionated cavity virus. The host would normally start at its entry point (EP) defined in its header section. The virus replaces that EP value with VEP, the viral entry point. The VEP points to the loader of the virus snippets. If there is not enough slack space in the file to present the loader in a single snippet, the file cannot get infected.

Figure 4.10. A fractionated cavity virus.

The section slacks are typically presented in modern file formats such as PE, and they can be easily located using the section header information of such files.

One of the special problems of cavity virus repair is that the content of overwritten areas cannot be restored 100%. This happens when the virus overwrites areas of files that usually contain zeros, but in other cases contain some other pattern. Thus the cryptographic checksums of files after repair will be often different from the original program's content. Furthermore, exact identification of such viruses is complicated because the virus snippets need to be pieced together.
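The checksum problem can be shown with a small sketch. It assumes a repair routine that can only refill the overwritten cavity with a guessed filler byte (zero here); `repair_cavity` and the sample buffers are hypothetical.

```python
import hashlib


def repair_cavity(infected: bytes, start: int, length: int, fill: int = 0x00) -> bytes:
    """Refill the cavity with an assumed filler byte (the original is lost)."""
    repaired = bytearray(infected)
    repaired[start:start + length] = bytes([fill]) * length
    return bytes(repaired)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


CAVITY_START, CAVITY_LEN = 4, 16
zero_padded = b"CODE" + bytes(CAVITY_LEN) + b"DATA"
cc_padded = b"CODE" + b"\xCC" * CAVITY_LEN + b"DATA"


def overwrite_cavity(original: bytes) -> bytes:
    """Model the post-infection state with placeholder bytes in the cavity."""
    return (original[:CAVITY_START] + b"X" * CAVITY_LEN
            + original[CAVITY_START + CAVITY_LEN:])


fixed_zero = repair_cavity(overwrite_cavity(zero_padded), CAVITY_START, CAVITY_LEN)
fixed_cc = repair_cavity(overwrite_cavity(cc_padded), CAVITY_START, CAVITY_LEN)

print(sha256(fixed_zero) == sha256(zero_padded))  # True: the guess was right
print(sha256(fixed_cc) == sha256(cc_padded))      # False: padding was 0xCC
```

The second file may still run correctly after repair, yet any integrity checker that stored the original hash will report it as modified.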

Detection of the virus code, by contrast, is simple: it can be based on the content of the HEAD routine, which must be placed in a single snippet of code.

4.2.8. Compressing Viruses

A special virus infection technique uses the approach of compressing the content of the host program. Sometimes this technique is used to hide the host program's size increase after the infection by packing the host program sufficiently with a binary packing algorithm. Compressor viruses are sometimes called "beneficial" because such viruses might compress the infected program to a much shorter size, saving disk space. (Runtime binary packers, such as PKLITE, LZEXE, UPX, or ASPACK, are extremely popular programs. Many of these have been used independently by attackers to pack the content of Trojan horses, viruses, and computer worms to make them obfuscated and shorter.)

The DOS virus, Cruncher, was among the first to use the compression technique. Some of the 32-bit Windows viruses that use this technique include W32/HybrisF (a file infector plug-in of the Hybris worm), written by the virus writer, Vecna. Another infamous example is W32/Aldebera, which combines the infection method with polymorphism. Aldebera attempts to compress the host in such a way that the host remains equivalent in size to the original file. This virus was written by the virus writer, B0/S0 (Bozo) of the IKX virus writing group, in 1999.

The W32/Redemption virus of the virus writer, Jacky Qwerty, also uses the compression technique to infect 32-bit PE files on Windows systems. Figure 4.11 shows how a compressor virus attacks a file.

Figure 4.11. A compressor virus.

4.2.9. Amoeba Infection Technique

A rarely seen virus infection technique, Amoeba, embeds the host program inside the virus body. This is done by prepending the head part of the virus to the front of the file and appending the tail part to the very end of the host file. The head has access to the tail and is loaded later. The original host program is reconstructed as a new file on the disk for proper execution afterwards. For example, W32/Sand.12300, written by the virus writer, Alcopaul, uses this technique to infect PE files on Windows systems. Sand is written in Visual Basic.

Figure 4.12 shows the host program before and after infection by a virus that uses the Amoeba infection technique.

Figure 4.12. The Amoeba infection method.

4.2.10. Embedded Decryptor Technique

Some crafty viruses inject their decryptors into the executable's code. The entry point of the host is modified to point to the decryptor code. The location of the decryptor is randomly selected, and the decryptor is split into many parts. The overwritten blocks are stored inside the virus code for proper execution of the host program after infection.

When the infected application starts, the decryptor is executed. The decryptor of the virus decrypts the encrypted virus body and gives it control. The Slovakian polymorphic virus, One_Half, used this method to infect DOS COM and EXE files in May 1994. Evidently, the proper infection of EXE files with this technique is a more complicated task. If relocations are applied to parts of the file that are overwritten with pieces of the virus decryptor, the decryptor might get corrupted in memory. This can result in problems in executing host programs properly.

Figure 4.13 shows the "Swiss cheese" layout of infected program content. The detection of such viruses made scanning code more complicated. The scanner needed either to detect decryptor blocks split into many parts or to include some more advanced scanning technique, such as code emulation, to resolve detection easily. (These techniques are discussed in Chapter 11.)

Figure 4.13. A "Swiss cheese" infection.

The easiest way to analyze such virus code is based on the use of special goat (decoy) files filled with a constant pattern, such as 0x41 ("A") characters. After the test infection, the overwritten parts stand out in the infected test program the following way:

142F:0D80 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
142F:0D90 41 41 41 41 41 2E FD 16-2E F9 FB 36 E9 77 FD 41 AAAAA......6.w.A
142F:0DA0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
142F:0DB0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
142F:0DC0 41 41 41 41 41 41 41 41-41 3E 2E BB 88 14 2E F9 AAAAAAAAA>......
142F:0DD0 EB 9B 41 41 41 41 41 41-41 41 41 41 41 41 41 41 ..AAAAAAAAAAAAAA
 

Note the 0xE9 (JMP) and 0xEB (JMP short) patterns in the previous dump in two pieces of One_Half's decryptor. These are the pointers to the next decryptor block. In the past, several antivirus products would put together the pieces of the decryptor by following these offsets to decrypt the virus quickly and identify it properly.
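The goat-file analysis can be sketched as follows, using the bytes of the dump above. The code lists every run of bytes that differs from the 0x41 fill and, when a run ends in an `E9 rel16` or `EB rel8` jump, decodes where it points. The function names are assumptions made for illustration, and the approach is only a heuristic: a piece that happens to contain a 0x41 byte would be split in two, and trailing bytes can resemble a jump by coincidence.

```python
GOAT_FILL = 0x41
BASE = 0x0D80  # segment offset of the first dumped byte

DUMP_ROWS = [
    "41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41",
    "41 41 41 41 41 2E FD 16 2E F9 FB 36 E9 77 FD 41",
    "41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41",
    "41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41",
    "41 41 41 41 41 41 41 41 41 3E 2E BB 88 14 2E F9",
    "EB 9B 41 41 41 41 41 41 41 41 41 41 41 41 41 41",
]
image = bytes.fromhex(" ".join(DUMP_ROWS))


def modified_runs(data: bytes, fill: int = GOAT_FILL, base: int = 0):
    """Return (offset, bytes) for every run that differs from the fill byte."""
    runs, start = [], None
    for i, value in enumerate(data):
        if value != fill and start is None:
            start = i
        elif value == fill and start is not None:
            runs.append((base + start, data[start:i]))
            start = None
    if start is not None:
        runs.append((base + start, data[start:]))
    return runs


def trailing_jump_target(offset: int, block: bytes):
    """Decode a block-ending E9 rel16 or EB rel8 jump; None if neither fits."""
    end = offset + len(block)  # offset of the byte after the jump
    if len(block) >= 3 and block[-3] == 0xE9:
        rel = int.from_bytes(block[-2:], "little", signed=True)
        return (end + rel) & 0xFFFF
    if len(block) >= 2 and block[-2] == 0xEB:
        rel = int.from_bytes(block[-1:], "little", signed=True)
        return (end + rel) & 0xFFFF
    return None


for offset, block in modified_runs(image, base=BASE):
    target = trailing_jump_target(offset, block)
    print(f"{offset:04X}: {len(block)} bytes, next block at {target:04X}")
# 0D95: 10 bytes, next block at 0B16
# 0DC9: 9 bytes, next block at 0D6D
```

Both decoded targets fall outside the dumped range, which is consistent with the decryptor pieces being scattered across the host.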

4.2.11. Embedded Decryptor and Virus Body Technique

A more sophisticated infection technique was used by the Bulgarian virus, Commander_Bomber, written by Dark Avenger as one of his last known viruses in late 1993. The virus was named after the string that can be found in the virus body: COMMANDER BOMBER WAS HERE.

The Commander_Bomber virus body is split into several parts, which are placed at random positions on the host program, overwriting original content of the host. The head of the virus code starts in the front of the file and gives control to the next piece of the virus code, and so on. These pieces overwrite the host program in a way similar to the One_Half virus. The overwritten parts are stored at the end of the file, and a table is used to describe their locations.

Figure 4.14 shows the sophistication of the virus code's location within the host program. Scanners must follow the spiral path of the control flow from block to block until they find the main virus body.

Figure 4.14. Commander_Bomber-style infection.

The control blocks are polymorphic, generated by the DAME (Dark Avenger Mutation Engine) of the virus. This makes the blocks especially difficult to read because they contain a lot of garbage code with obfuscated ways to give control to the next block, until the nonencrypted virus body is reached. Eventually, the control arrives at the main virus body, which can be practically anywhere in the file, not at its very end. This is a major advantage for such viruses, because scanners need to locate where the main body of the virus is stored. Back in 1993, this technique was extremely sophisticated, and only a few scanners were able to detect such viruses effectively. The host program is reconstructed by the virus in runtime.

4.2.12. Obfuscated Tricky Jump Technique

W32/Donut, the first virus to infect .NET executables, was not dependent on JIT compilation as discussed in Chapter 3. This is because first editions of the .NET executable format can be attacked at its entry-point code, which is still architecture-dependent. (In later versions of Windows, this platform-dependent code will be eliminated by moving the functionality to the system loader itself.)

Donut gets control immediately upon executing an infected .NET PE file. The virus uses the simplest possible infection technique to infect .NET images. In fact, Donut turns .NET executables to regular-looking PE files. This is because the virus nullifies the data directory entry of the CLR header when it infects a .NET application.

The six-byte-long jump to the _CorExeMain() import at the entry point of .NET files is replaced by Donut with a jump to the virus entry point. The _CorExeMain() function is used to fire up the CLR execution of the MSIL code. The entry point in the header is not changed by the virus. This technique is called an obfuscated tricky jump. Evidently, this method can fool some heuristic scanners.

The actual jump at the entry point will be replaced with a 0xE9 (JMP) opcode, followed by an offset to the start of the virus body in the first physical byte of the relocation section, as shown in Figure 4.15.
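A scanner can turn this observation into a simple check on the bytes found at the entry point of a .NET image. The sketch below assumes the 32-bit x86 encodings: `FF 25` followed by a four-byte address for the six-byte indirect jump through the import, and `E9` followed by a four-byte displacement for a relative jump. The function name and the returned labels are hypothetical, and an `E9` at the entry point is only a reason to look closer.

```python
def classify_entry_stub(stub: bytes) -> str:
    """Classify the first bytes found at the entry point of a .NET PE file."""
    if len(stub) >= 6 and stub[0:2] == b"\xFF\x25":
        # JMP dword ptr [address]: the six-byte jump through the import table.
        return "import-jump"
    if len(stub) >= 5 and stub[0] == 0xE9:
        # JMP rel32: control goes somewhere else inside the image.
        return "relative-jump"
    return "other"


def relative_jump_displacement(stub: bytes) -> int:
    """Return the signed rel32 displacement of an E9 jump."""
    if len(stub) < 5 or stub[0] != 0xE9:
        raise ValueError("expected an E9 rel32 jump")
    return int.from_bytes(stub[1:5], "little", signed=True)


# Made-up byte strings; the address and displacement are illustrative.
expected_stub = bytes.fromhex("FF 25 00 20 40 00")
patched_stub = bytes.fromhex("E9 FB 2F 00 00 00")

print(classify_entry_stub(expected_stub))             # import-jump
print(classify_entry_stub(patched_stub))              # relative-jump
print(hex(relative_jump_displacement(patched_stub)))  # 0x2ffb
```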

Figure 4.15. An applied obfuscated tricky jump technique.

The obfuscated tricky jump is a common technique to avoid changing the original entry point of the file. One of the first viruses that used this trick was the DOS COM infector Leapfrog, which followed the jump instruction at the front of the host and inserted its own jump to the actual entry point instead, as Figure 4.15 demonstrates.

The first documented Win32 virus, W32/Cabanas[11], used this technique as an antiheuristic feature to infect regular PE files on Windows 95 and Windows NT.

When activated, W32/Donut displays the following message box shown in Figure 4.16.

Figure 4.16. The message box of the W32/Donut virus.

Note

The virus writer wanted to call this creation ".dotNet," but because this is a platform name, it cannot be the name of the virus. For obvious reasons, viruses are not called "DOS," "Windows," and so on. So I decided to name the virus something that sounds similar to "dotNET," calling it Donut instead.

4.2.13. Entry-Point Obscuring (EPO) Viruses

Entry-point obscuring viruses do not change the entry point of the application to infect it; neither do they change the code at the entry point. Instead, they change the program code somewhere in such a way that the virus gets control randomly.

4.2.13.1 Basic EPO Techniques on DOS

Several viruses use the EPO strategy on DOS to avoid easy detection with fast scanners that scan the file near its entry-point code. For example, in early 1997 the Olivia virus[12] infected DOS EXE and COM files using this method. This technique became increasingly popular among virus writers to defeat heuristics analyzer programs after 1995.

Olivia infects COM and EXE files as they are run or renamed or as their attributes are changed. First, the virus clears the attributes of the file, and then it opens the file to analyze its structure.

Figure 4.17 demonstrates the simplified look of an EPO virus-infected program.

Figure 4.17. A typical encrypted DOS EPO virus.

If the victim has a COM extension, Olivia uses a special function that reads four bytes in a loop from the beginning of the victim and checks for E9h (JMP), EBh (JMP short), 90h (NOP), F8h (CLC), F9h (STC), FAh (CLI), FBh (STI), FCh (CLD), and FDh (STD) each time. If one of the previous instructions is found, the virus seeks the place of the next such instruction. If that position is not in the last 64 bytes of the host, the virus modifies the host program at the location where the previous instruction sequence was detected.

Olivia uses the 0x68 (Intel 286 PUSH) opcode to push a word value to the stack. This is followed by a 0xC3 (RET) instruction, which gives control to the virus code by popping the pushed offset to the decryptor of the virus.

(0x68) PUSH offset DECRYPTOR
(0xC3) RET
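
For analysis, the two-instruction sequence above can be located with a plain byte search. This is a minimal sketch that looks for `68 lo hi C3` in a COM image and reports the pushed offset; `find_push_ret` and the sample bytes are assumptions made for illustration. A linear byte scan is not a disassembler, so matches inside data or in the middle of other instructions are possible and every hit needs to be confirmed.

```python
def find_push_ret(code: bytes, load_offset: int = 0x100):
    """Yield (location, target) for every 68 lo hi C3 byte sequence."""
    for i in range(len(code) - 3):
        if code[i] == 0x68 and code[i + 3] == 0xC3:
            target = int.from_bytes(code[i + 1:i + 3], "little")
            yield load_offset + i, target


# Made-up sample: two NOPs, PUSH 1234h, RET, NOP.
sample_code = bytes.fromhex("90 90 68 34 12 C3 90")

for location, target in find_push_ret(sample_code):
    print(f"PUSH/RET at {location:04X} transfers control to {target:04X}")
# PUSH/RET at 0102 transfers control to 1234
```

A target that points near the end of the file, past the original program, is the kind of result worth following up on.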

In Figure 4.17, a jump instruction is shown to transfer control to a decryptor located at the end of the file, followed by the encrypted virus body. Other viruses often use a CALL instruction or similar trampoline to transfer control to the start of the virus body.

Figure 4.18 shows the happy birthday message displayed by Olivia upon activation.

Figure 4.18. The payload of the Olivia virus.

4.2.13.2 Advanced EPO Techniques on DOS

The Nexiv_Der virus[13] (shown in Figure 4.19) is polymorphic in COM files, and it also infects the disk's boot sector (DBS). The most interesting technique of this virus, however, is the special EPO technique that it uses to infect files. Nexiv_Der was named after a backward string contained in its encrypted body: "Nexiv_Der takes on your files."

Figure 4.19. A polymorphic EPO virus.

This virus traces the execution of a program as an application debugger does. Then it patches the code at a randomly selected location to a CALL instruction. This CALL instruction points to the polymorphic decryptor of the virus.

The execution path through a program depends on many parameters, including command-line arguments passed to the program and DOS version number. Depending on the same parameters, an infected victim program will most likely run the virus code upon normal execution each time. However, the virus might not run at all on a different version of DOS because the virus code cannot take control. This generates a major problem for even sophisticated heuristic scanners that use a virtual machine to simulate the execution of programs because it is difficult to emulate all of the system calls and the execution path of the victim.

The major idea of the Nexiv_Der virus is based on its hook of the INT 1 handler (TRACE) under DOS. This handler is the real infection routine. It starts to trace the host program for at least 256 instructions and stops at the maximum of 2,048 iterations. If the last instruction of the trace happens to be an E8h, E9h, or 80C0..CF opcode (CALL, JMP, ADD AL,byte .. OR BH,byte), then Nexiv_Der replaces it with a CALL instruction, which starts the virus at the end of the file. Figure 4.19 shows a high-level look at a Nexiv_Der-infected executable.

The main advantage of this technique is the increased likelihood of virus code execution in a similar host-system environment. This technique, however, is too complicated and thus encountered very rarely.

4.2.13.3 EPO Viruses on 16-Bit Windows

One of the first EPO viruses in the wild was the Tentacle_II family[14] on Windows 3.x systems. This virus does not change the original entry point of the NE header, which is the obvious choice of typical 16-bit Windows viruses. How can it take control, then? The virus takes advantage of the NE file structure. Although the NE on disk structure is more complicated to parse, it provides many more possibilities for an attacker to inject code reliably into the execution flow. Tentacle_II takes advantage of the module reference table of NE files to find common function calls that are expected to be executed among the first function calls made by the host program.

Tentacle_II checks for the KERNEL and VBRUN300 module names in the module reference table. The virus picks the module number of the found module name and reads the segment relocation records of every segment. It looks for the relocation record 91 (INITTASK) in the case of KERNEL or 100 (THUNKMAIN) in the case that VBRUN300 has been found previously. Both of these relocation records point to standard initialization code that must be called at the beginning of a Windows application. For example, the original KEYVIEW.EXE (a standard Windows application) has a relocation entry for KERNEL.91 for its first segment as follows:

    type offset target
    PTR 0053h USER.1
    OFFS 007Eh KERNEL.178
    PTR 0073h FAXOPT.12
    PTR 00D3h USER.5
    PTR 005Bh FAXOPT.44
    PTR 00CAh KERNEL.30
    PTR 0031h USER.176
    PTR 00A0h KERNEL.91 (INITTASK)
    PTR 008Eh KERNEL.102
    Relocations: 9

When KEYVIEW.EXE gets infected, the virus patches this record to point to a new segment, the VIRUS_SEGMENT.

Segment relocation records:

  1. Segment 0001h relocations
    type offset target
    PTR 0053h USER.1
    OFFS 007Eh KERNEL.178
    PTR 0073h FAXOPT.12
    PTR 00D3h USER.5
    PTR 005Bh FAXOPT.44
    PTR 00CAh KERNEL.30
    PTR 0031h USER.176
    PTR 00A0h 0003h:002Eh (VIRUS_SEGMENT:2Eh)
    PTR 008Eh KERNEL.102
    Relocations: 9
    
  2. Segment 0003h relocations (VIRUS_SEGMENT)
    type offset target
    PTR 2964h SHELL.6 (REGQUERYVALUE)
    PTR 2968h SHELL.5 (REGSETVALUE)
    PTR 296Ch KERNEL.91 (INITTASK -> STARTHOST)
    Relocations: 3
    

Thus the infected file starts as it would before the infection, but when the application calls one of the preceding initialization functions, control is passed to the address where the virus starts.

The VIRUS_SEGMENT has three relocation records. One of these will point to the original initialization procedure, KERNEL.91 or VBRUN300.100. In this way, the virus is able to start the host program after itself. This infection technique is an NE entry point-obscuring infection technique, which makes Tentacle_II an antiheuristic Windows virus.

The preceding analysis was made with the help of Borland's TDUMP (Turbo Dump) utility. In the analysis techniques and tools sections, I will give a longer introduction to such tools and their role in virus analysis.

The payload of Tentacle_II is shown in Figure 4.20. The virus creates a TENTACLE.GIF file on the disk, which will be displayed each time a GIF image is viewed on the infected system.

Figure 4.20. The payload of the Tentacle_II virus.

4.2.13.4 API-Hooking Technique on Win32

On Win32 systems, EPO techniques became highly advanced. The PE file format[15] can be attacked in different ways. One of the most common EPO techniques is based on the hooks of an instruction pattern in the program's code section. A typical Win32 application makes a lot of calls to APIs (application program interfaces). Many Win32 EPO viruses take advantage of API CALL points and change these pointers to their own start code.

For example, the W32/CTX and W32/Dengue viruses of GriYo locate a CALL instruction in the host program's code section that points to the import directory. In this way, the virus can reliably identify byte patterns that belong to a function call. After that, the CALL instruction is modified in such a way that it will point to the start of the virus code located elsewhere, typically appended to the end of the file. Such viruses typically search for one or both API call implementations:

  • Microsoft API Implementation
    CALL DWORD PTR []
  • Borland API Implementation
    JMP DWORD PTR []

This kind of virus also makes its selection for an API hook location totally at random; in some cases, the virus might not even get control each time a host program is executed. Some families of computer viruses make sure that the virus will execute from the file most of the time.

Viruses can hook an API that is called whenever the application exits back to the system. In this case, most programs call the ExitProcess() API. By replacing the call to ExitProcess() with the call to the virus body, a virus can trigger its infection routine more reliably whenever the application exits. To make antivirus detection more difficult, viruses often combine EPO techniques with code obfuscation techniques, such as encryption or polymorphism.

Figure 4.21 illustrates a Win32 EPO virus that replaces a CALL to ExitProcess() API with a CALL to the virus code. After the virus takes control, it will eventually run the original code (C) by fixing the code in memory and giving control to the fixed block.

Figure 4.21. An EPO virus that hooks API calls of the host.

Normally disk activity increases whenever the application exits. This happens for several different reasons. For example, if an application has used a lot of virtual memory, the operating system will need to do a lot of paging, which increases disk activity. Thus it is likely that viruses like this remain unnoticed for a long time.

4.2.13.5 Function Call Hooking on Win32

Another common technique of EPO viruses is to reliably locate, in the application's code section, a call to one of the program's own subroutines. Because the byte pattern of a CALL instruction could be part of another instruction's data, the virus would not be able to identify the instruction boundaries properly by looking for the CALL opcode alone.

To solve this problem, viruses often check to see whether the CALL instruction points to a pattern that appears to be the start of a typical subroutine call, similar to the following:

	CALL Foobar
Foobar:
	PUSH	EBP		; opcode 0x55
	MOV	EBP, ESP	; opcode 0x89E5

Figure 4.22 illustrates the replacement of a function call to Foobar() with a call to the start of virus code. The Foobar() function starts with the 0x55 0x89 0xe5 sequence; it is easily identified as a function entry point. A similar opcode sequence is 0x55 0x8B 0xEC, which also translates to the same assembly. This virus technique is used by variants of W32/RainSong (created by the virus writer, Bumblebee).
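The same prologue check is useful to an analyst or scanner that wants to separate plausible CALL instructions from stray E8 bytes. The following is a minimal sketch, assuming the code section is available as a flat byte string and that only near relative calls (opcode E8 followed by a 32-bit displacement) matter. The function name is illustrative and is not taken from any particular tool.

```python
# Two encodings of PUSH EBP / MOV EBP, ESP, as described in the text.
PROLOGUES = (b"\x55\x89\xE5", b"\x55\x8B\xEC")


def call_targets_with_prologue(code: bytes):
    """Return (call_offset, target_offset) pairs for E8 rel32 candidates
    whose target lies inside `code` and starts with a standard prologue."""
    results = []
    for offset in range(len(code) - 4):
        if code[offset] != 0xE8:
            continue
        # The displacement is relative to the end of the 5-byte instruction.
        rel = int.from_bytes(code[offset + 1:offset + 5], "little", signed=True)
        target = offset + 5 + rel
        if 0 <= target <= len(code) - 3 and code[target:target + 3] in PROLOGUES:
            results.append((offset, target))
    return results


# CALL +3, three NOPs, then PUSH EBP / MOV EBP, ESP / RET.
genuine = bytes.fromhex("E803000000" "909090" "5589E5C3")
# MOV EAX, 0E8h followed by NOPs: the E8 byte here is data, not a CALL.
stray = bytes.fromhex("B8E8000000" "909090")

print(call_targets_with_prologue(genuine))
print(call_targets_with_prologue(stray))
```

The check is only a heuristic: optimized code often omits the frame pointer setup, so a genuine call can fail the test, and a coincidental byte match can pass it.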

Figure 4.22. Function call-hooking EPO virus.

Note

The Russian virus, Zhengxi, uses a checksum of the preceding patterns, among others, to obfuscate the virus code further. Zhengxi uses the pattern to infect DOS EXE files using the EPO technique.

4.2.13.6 Import Table Replacing on Win32

Newer Win32 viruses infect Win32 executables in such a way that they do not need to modify the original code of the program to take control. Instead, such EPO viruses work somewhat similarly to the 16-bit Windows virus Tentacle_II.

To get control, the virus simply changes the import address table entries of the PE host in such a way that each API call of the application via the import address directory will run the virus code instead. In turn, the activated virus code presents a new import table in the memory image of the program. As a result, subsequent API CALLs run the proper, original entry-point code via the fixed import table.

This technique is used by the W32/Idele family of computer viruses, written by the virus writer, Doxtor L, as shown in Figure 4.23. W32/Idele changes the program section slack area of the code section with a small routine that allocates memory and decrypts the virus code into the allocated block and then executes it. Thus Idele avoids creating import entries with addresses that do not point to the code section.

Figure 4.23. Import table-replacing EPO virus.

4.2.13.7 Instruction Tracing Technique on Win32

The Nexiv_Der virus inspired modern virus writing on 32-bit Windows systems. In 2003, new viruses started to appear that use EPO, based on the technique that was pioneered on DOS. For example, the W32/Perenast[16] family of viruses is capable of tracing host programs before infection by running the host as a hidden debug process using standard Windows debug APIs.

4.2.13.8 Use of "Unknown" Entry Points

Another technique to execute virus code in a semi-EPO manner involves code execution via non-well-known entry points of applications. The Win32 PE file format is commonly known to execute applications from the MAIN entry point stored in the PE.OptionalHeader.AddressOfEntryPoint field of the executable's header structure. Thus it is common knowledge that such programs always start wherever this field points.

It might come as a surprise that this is not necessarily the first entry point in a PE file that the system loader executes. On Windows NT systems and above, the system loader looks for the thread local storage (TLS) data directory in the PE file's header first. If it finds TLS entry points, it executes these first. Only afterward will it run the MAIN entry-point code.
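A scanner can check for this case by looking at the TLS entry of the data directory array in the optional header. The following is a minimal sketch, assuming the file offset of the PE signature is already known and using the field offsets of the published PE32 and PE32+ layouts. The helper names, including the synthetic header builder, are hypothetical.

```python
import struct

IMAGE_DIRECTORY_ENTRY_TLS = 9  # index of the TLS entry in the data directory


def tls_directory(data: bytes, pe_offset: int):
    """Return (rva, size) of the TLS data directory, or None if it is absent."""
    opt = pe_offset + 24  # 4-byte signature + 20-byte image file header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:      # PE32
        dirs = opt + 96
    elif magic == 0x20B:    # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes sits immediately before the directory array.
    (count,) = struct.unpack_from("<I", data, dirs - 4)
    if count <= IMAGE_DIRECTORY_ENTRY_TLS:
        return None
    rva, size = struct.unpack_from("<II", data, dirs + 8 * IMAGE_DIRECTORY_ENTRY_TLS)
    return (rva, size) if rva != 0 else None


def build_sample(tls_rva: int) -> bytes:
    """Build a synthetic PE32 header fragment for demonstration only."""
    buf = bytearray(24 + 96 + 16 * 8)
    buf[0:4] = b"PE\0\0"
    struct.pack_into("<H", buf, 24, 0x10B)        # Magic
    struct.pack_into("<I", buf, 24 + 92, 16)      # NumberOfRvaAndSizes
    struct.pack_into("<II", buf, 24 + 96 + 8 * 9, tls_rva, 0x18)
    return bytes(buf)


print(tls_directory(build_sample(0x5000), 0))
print(tls_directory(build_sample(0), 0))
```

A nonzero entry shows only that a TLS directory exists. To learn whether callbacks are actually registered, an analyst would still need to follow the callback array pointer stored inside that directory.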

The following two message boxes are printed by a TLSDEMO program of Peter Ferrie. The demo was created when he discovered the TLS entry-point trick at Symantec during heuristic analysis research in 2000.

When the application is executed, it prints a message box from both the TLS and the MAIN entry points of the applications.

First, it prints the message box from the TLS, as shown in Figure 4.24.

Figure 4.24. The TLS entry point is executed first.

When you click on the OK button, you arrive at the real main entry point, as shown in Figure 4.25.

Figure 4.25. The main entry point is executed next.

Initially we did not talk about this trick because it could be used to develop even trickier viruses. However, the virus writer, roy g biv, discovered this undocumented trick and has already used it successfully in some of his W32/Chiton[17] viruses in 2003.

4.2.13.9 Code Integration-Based EPO Viruses

A very sophisticated virus infection technique is called code integration. A virus using this technique inserts its own code into the execution flow of the host program using standard EPO techniques and merges its code with the host program's code. This is a complicated process that requires complete disassembling and reassembling of the host. Fortunately, it is extremely complicated to develop such viruses. Disassembling the host program is a fairly CPU-intensive operation that requires a lot of memory. Such viruses need to update the host program's content with proper relocations for code and data sections of the host. The W95/Zmist virus, by the Russian virus writer, Zombie, uses this approach. Because of its high sophistication, this technique is detailed in Chapter 7.

Figure 4.26 shows a typical layout of a file infected with a sophisticated code-integration EPO virus.

Figure 4.26. A poly and metamorphic code-integration virus.

Code integration is a major challenge for scanners and computer virus analysts. The entire file must be examined to find the virus. The virus is camouflaged in the code section of the infected host program, and it is very difficult to locate the instruction that transfers control to the start of the virus. In the case of W95/Zmist, the decryptor of the virus code is not in one piece but is split in a manner similar to the One_Half or the Commander_Bomber virus.

4.2.14. Possible Future Infection Techniques: Code Builders

After reading the previous sections, you might wonder what could get more complicated and sophisticated than the code-integration EPO technique. This section provides an example of a technique whose level of sophistication has not yet been seen, even in the most complex known computer viruses. The closest example is the W95/Zmist virus. The Zmist virus makes use of the host program's content in a manner that is similar to the Code Builder technique. Zmist calls into the host program's code to execute an RET (Return) instruction from it. Thus, the virus code flows into the host program's code and back. The author of the virus probably intended to extend this approach to build the entire virus body on the fly, using the content of the host program. Consider the code-builder virus shown in Figure 4.27.

Figure 4.27. A code-builder virus.

The idea is based on the fact that any program might contain another set of programs in it as instructions or instruction sequences. A virus might be able to analyze the host program's code in such a sophisticated way that these strings of instruction could be used as the virus itself. It might be difficult to find code that would transfer control properly with accurate register state. However, to demonstrate the idea, imagine a simple code-builder virus that would find the letters V, I, R, U, and S in the host program's code. The builder of the virus would copy these pieces together into memory. The builder itself would look like a generic sequence of code, which could be easy to vary based on metamorphic techniques. The builder would be integrated into the code of the host program itself.

Fortunately, this is a rather complicated virus, but certainly it would be very challenging to detect it in files. (A few members of the W95/Henky family use an approach similar to this, except that the viruses are not EPO, which simplifies their detection.)

4.3. An In-Depth Look at Win32 Viruses

The world of computer antivirus research has changed drastically since Windows 95 appeared on the market[18]. One reason this happened was that a certain number of DOS viruses became incompatible with Windows 95. In particular, the tricky viruses that used stealth techniques and undocumented DOS features failed to replicate under the new system. Many simple viruses remained compatible with Windows 95, such as Yankee Doodle, a very successful old Bulgarian virus. Regardless of this, virus writers felt that the new challenge was to investigate the new operating system, to create new DOS executable viruses and boot viruses with special attention to Windows 95 compatibility. Because most virus writers did not have enough in-depth knowledge of the internal mechanisms of Windows 95, they looked for shortcuts to enable them to write viruses for the new platform. They quickly found the first one: macro viruses, which are generally not dependent on the operating system or on hardware differences.

Some young virus writers are still happy with macro viruses and develop them endlessly. After writing a few successful macro viruses, however, most grow bored and stop developing them. You may think, fortunately, but the truth is otherwise. Virus writers are looking for other challenges, and they usually find new and different ways to infect systems.

The first Windows 95 virus, W95/Boza, appeared in the same year that Windows 95 was introduced. Boza was written by a member of the Australian VLAD virus-writing group. It took a long time for other virus writers to understand the workings of the system but, during 1997, new Windows 95 viruses appeared, some of them in the wild.

At the end of 1997, the first Win32, Windows NT compatible virus, Cabanas, was written by the same young virus writer (Jacky Qwerty/29A) who wrote the infamous WM/Cap.A virus. Cabanas is compatible with Windows 9x, Windows NT, and Win32s. (It is also compatible with Windows 98 and Windows 2000, even though the virus code was never tested on these systems by the virus writer because these systems appeared later than the actual virus.) Cabanas turned Microsoft's Win32 compatibility dream into a nightmare.

Although it used to be difficult to write such viruses, we suspected that file-infecting DOS viruses from the early years of computer viruses would eventually be replaced by Win32 creations.

This transition in computer virus writing was completed by 2004. Even macro viruses are now very rare; virus writers currently focus on 32-bit and 64-bit Windows viruses.

4.3.1. The Win32 API and Platforms That Support It

In 1995, Windows 95 was introduced by Microsoft as a new major operating system platform. The Windows 95 system is strongly based on Windows 3.x and DOS technologies, but it gives real meaning to the term Win32.

What is Win32? Originally, programmers did not even understand the difference between Win32 and Windows NT. Win32 is the name of an API, no more, no less. The set of system functions available to be called from a 32-bit Windows application is contained in the Win32 API. The Win32 API is implemented on several platforms, one of them being Windows NT, the most important Win32 platform. Besides DOS programs, Windows NT also is capable of executing 16-bit Windows programs, OS/2 1.x character applications (and, with some extensions, even Presentation Manager-based 1.3 programs with some limitations). In addition, Windows NT introduced the new portable executable (PE) file format (a format very similar to, if not based on, the UNIX COFF format) that can run Win32 applications (which call functions in the Win32 API set). As the word portable indicates, this format is supposed to be an easily portable file format, which is actually the most common and important one to run on Windows NT.

Other platforms are also capable of running Win32 applications. In fact, one of them was shipped before Windows NT. This platform is called Win32s. Anyone who has ever tried to develop software for Win32s knows that it was a very unstable solution.

Because Windows NT is a robust system that needs strong hardware on which to run, Win32 technology did not take the market position Microsoft wanted quickly enough. That process ended up with the development of Windows 95, which supported the new PE format by default. Therefore, it supports a special set of Win32 APIs. Windows 95 is a much better implementation of the Win32 APIs than Win32s. However, Windows 95 does not contain the full implementation of the Win32 APIs found in Windows NT.

Until Windows NT gained more momentum, Windows 9x was Microsoft's Win32 platform. After Windows NT, Windows 2000 and Windows 98/Me gained popularity and were replaced by Windows XP and the more secure Windows 2003 server editions, which support the .NET extension by default. On the horizon, Microsoft is talking about the next new Windows release, codenamed Longhorn. All of these systems will support a form of Win32 API that, in most cases, provides binary compatibility among all of these systems.

Last but not least, the Win32 API and the PE format are supported by Windows CE (Windows Mobile edition), which is used primarily by handheld PCs. The main hardware requirement includes 486 and above Intel and AMD processors for a Windows CE platform. However, current implementations seem to use SH3, ARM, and Intel XScale processors.

Now we get to the issue of CPUs. Both Windows NT and Windows CE are capable of running on machines that have different CPUs. The same PE file format is used on the different machines, but the actual executed code contains the compiled binary for the actual processor, and the PE header contains information about the actual processor type needed to execute the image. All of these platforms contain different implementations of Win32 functions. Most functions are available in all implementations. Thus a program can call them regardless of the actual platform on which it is running. Most of the API differences are related to the actual operating system capabilities and available hardware resources. For instance, CreateThread() simply returns NULL when called under Win32s. The Windows CE API set consists of several hundred functions, but it does not support trivial functions such as GetWindowsDirectory() at all because the Windows CE KERNEL is designed to be placed in ROM of the handheld PC. Due to the hardware's severe restrictions (Windows CE must run on machines with 2 or 4MB of RAM without disk storage), Microsoft was forced to create a new operating system that had a smaller footprint than either Windows NT or Windows 95.

Although several manifestations of the Win32 API implement some of the Win32 APIs differently or not at all, in general it is feasible to write a single program that will work on any platform that supports Win32 APIs. Virus writers already understand this fact very well. Their first such virus creation attacked Windows 95 specifically, but virus writers slowly improved the infection methods to attack the PE file format in such a way that the actual infected program remains compatible and also executes correctly under Windows NT/2000/XP systems.

Most Windows 95 viruses depend on Windows 95 system behavior and functionality, such as features related to VxD (virtual device driver) and VMM (virtual machine manager), but some of them contain only a certain number of bugs and need only slight fixes to be able to run under more than one Win32 platform, such as Windows 95/Windows NT.

Detection and disinfection of such viruses is not a trivial task. In particular, the disinfection can be difficult to implement. This is because, so far, the PE structure is much more complicated than any other executable file format used by DOS or Windows 3.x. However, it is also a fact that the PE format is a much nicer design than, for example, NE.

Unfortunately, over the period from 1995 to 2004, virus writers utilized these platforms aggressively, resulting in the appearance of more than 16,000 variants of 32-bit Windows viruses. However, the principles of these viruses have not changed much. In the next section, you will find details about infection techniques of the PE file format from the perspective of an attacker.

Note

Win64 is almost the same as Win32, but for 64-bit Windows architectures. There are a couple of minor modifications in Win64 to accommodate the platform differences.

4.3.2. Infection Techniques on 32-Bit Windows

This section describes the different ways in which a 32-bit Windows virus can infect different kinds of executable programs used by Windows 95/Windows NT. Because the most common file format is the PE format, most of the infection methods are related to that. The PE format makes it possible for viruses to jump easily from one 32-bit Windows platform to another. We shall concentrate on infection techniques that attack this particular format because these viruses have a strong chance of remaining relevant in the future.

Early Windows 95 viruses have a VxD part, which is dropped by other infected objects such as DOS EXE and COM executables or a PE application. Some of these infection methods are not related to Win32 platforms on the API level. For instance, VxDs are only supported by Windows 9x and Windows 3.x, not by Windows NT. VxDs have their own 32-bit, undocumented, linear executable (LE) file format. It is interesting to note that this format was 32-bit even at the time of 16-bit Windows. Microsoft could not drop the support of VxDs from Windows 95 because of the many third-party drivers developed to handle special hardware components. The LE file format remained undocumented by Microsoft, but there are already several viruses, such as Navrhar, that infect this format correctly. I will describe these infection techniques briefly to explain the evolution of Win32 viruses.

4.3.2.1 Introduction to the Portable Executable File Format

In the following section, I will provide an introductory tour of the PE file format that Microsoft designed for use on all its Win32 operating systems (Windows NT, Windows 95, Win32s, and Windows CE). There are several good descriptions of the format on the Microsoft Developer Network CD-ROM, as well as in many other Windows 95 related books, so I'll describe the PE format from the point of view of known virus infection techniques. To understand how Win32 viruses work, you need to understand the PE format. It is that simple.

The PE file format will play a key role in all of Microsoft's operating systems for the foreseeable future. It is common knowledge that Windows NT has a VAX VMS and UNIX heritage. As I mentioned earlier, the PE format is very similar to COFF (common object file format), but it is an updated version. It is called portable because the same file format is used under various platforms.

The most important thing to know about PE files is that the executable code on disk is very similar to what the module looks like after Windows has loaded it for execution. This makes the system loader's job much simpler. In 16-bit Windows, the loader must spend a long time preparing the code for execution. This is because in 16-bit Windows applications, all the functions that call out to a DLL (dynamic-link library) must be relocated. Some huge applications can have thousands of relocations for API calls, which have to be patched by the system loader while reading the file in portions and allocating memory for its structures one by one. PE applications do not need relocation for library calls anymore. Instead, a special area of the PE file, the import address table (IAT), is used for that functionality by the system loader. The IAT plays a key role in Win32 viruses, and I shall describe it later in detail.

For Win32, all the memory used by the module for code, data, resources, import tables, and export tables is in one continuous range of linear address space. The only thing that an application knows is the address where the loader mapped the executable file into memory. When the base address is known, the various pieces of the module can easily be found by following pointers stored as part of the image.

Another idea you should become familiar with is the relative virtual address, or RVA. Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of an item relative to where the file is mapped. For instance, the Windows loader might map a PE application into memory starting at address 0x400000 (the most common base address) in the virtual address space. If a certain item of the image starts at address 0x401234, then the item's RVA is 0x1234.
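The arithmetic in this example can be written out directly. The following is a minimal sketch, and the function names are illustrative only:

```python
def va_to_rva(virtual_address: int, image_base: int) -> int:
    """Return the offset of an item relative to the start of the mapped image."""
    if virtual_address < image_base:
        raise ValueError("address lies below the image base")
    return virtual_address - image_base


def rva_to_va(rva: int, image_base: int) -> int:
    """Return the absolute address of an item, given where the image was mapped."""
    return image_base + rva


# The example from the text: image mapped at 0x400000, item at 0x401234.
print(hex(va_to_rva(0x401234, 0x400000)))  # 0x1234
```

Because only the base address changes when an image is loaded elsewhere, the same RVA remains valid for any mapping of the file.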

Another concept to be familiar with when investigating PE files and the viruses that infect them is the section. A section in a PE file is roughly equivalent to a segment in a 16-bit NE file. Sections contain either code or data (and occasionally a mixture of both). Some sections contain code or data declared by the actual application, whereas other data sections contain important information for the operating system. Before jumping into the important details of the PE file, examine Figure 4.28, which shows the overall structure of a PE file.

Figure 4.28. A high-level view of the PE file image.

4.3.2.1.1 The PE Header

The first important part of the PE format is the PE header. Just like all the other Microsoft executable file formats, the PE file has a header area with a collection of fields at an easy-to-find location. The PE header describes vital pieces of the portable executable image. It is not at the very beginning of the file; rather, the old DOS stub program is located there.

The DOS stub is just a minimal DOS EXE program that displays an error message (usually "This program cannot be run in DOS mode"). Because this stub is located at the beginning of the file, some DOS viruses can infect PE images correctly at their DOS stub. However, Windows 95 and Windows NT's system loaders execute PE applications correctly as 32-bit images, and the DOS stub program remains as a compatibility issue with 16-bit Windows systems.

The loader picks up the PE header's file address from the e_lfanew field of the DOS header (a 32-bit value at file offset 0x3C). The PE header starts with the important magic value PE\0\0. After that is the image file header structure, followed by the image optional header.
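The lookup can be sketched in a few lines. This is a minimal illustration, assuming the whole file is already in memory as a byte string. The function name is hypothetical, and the sample buffer is synthetic rather than a real executable.

```python
import struct


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS executable")
    # e_lfanew is a 32-bit little-endian value at offset 0x3C of the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 4 > len(data):
        raise ValueError("e_lfanew points outside the file")
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return pe_offset


# Synthetic buffer: an MZ header whose e_lfanew points to offset 0x80.
sample = bytearray(0x90)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\0\0"

print(hex(find_pe_header(bytes(sample))))  # 0x80
```

The bounds check matters in practice because damaged or deliberately malformed files can carry an e_lfanew value that points past the end of the file.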

From now on, I will describe only the important fields of the PE header that are involved with Windows 9x/Win32 viruses. The fields are in order, but I will concentrate on the most commonly used values so several will be missing from the list.

Figure 4.28 shows the high-level structure of a PE file image.

The following paragraphs list important fields of the image file header.

WORD Machine
Indicates the CPU for which this file is intended. Many Windows 9x viruses check this field by looking for the Intel i386 magic value before actual infection. However, some bogus viruses do not check the machine type and infect PE files for other platforms and cause such files to crash when the virus code is executed on the wrong platform. There is a certain risk that we will see viruses with multiprocessor support in the future. For example, the same viruses could target ARM as well as IA64 and regular X86 PE files.
WORD NumberOfSections
The number of sections in the EXE (DLL). This field is used by viruses for many different reasons. For instance, the NumberOfSections field is incremented by viruses that add a new section to the PE image and place the virus body in that section. (When this field is changed by the virus code, the section table is patched at the same time.) Windows NT-based systems accept up to 96 sections in a PE file. Windows 95-based systems do not inspect the section number.
WORD Characteristics
The flags with information about the file. Most viruses check these flags to be sure that the executable image is not a DLL but a program. (Some Windows 9x viruses infect KERNEL32.DLL. If so, the field is used to make sure that the executable is a DLL.) This field is not usually changed by viruses.
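To make the three fields concrete, the following sketch reads them from the 20-byte image file header that follows the PE signature. It assumes the standard field order of that structure and the documented constant values for the i386 machine type and the DLL flag. The class and function names are illustrative, and the sample bytes are synthetic.

```python
import struct
from typing import NamedTuple

IMAGE_FILE_MACHINE_I386 = 0x014C  # Machine value for Intel i386
IMAGE_FILE_DLL = 0x2000           # Characteristics flag: the image is a DLL


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    characteristics: int


def parse_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Read the image file header that follows the 4-byte PE signature."""
    (machine, sections, _timestamp, _symbol_ptr, _symbol_count,
     _optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return FileHeader(machine, sections, characteristics)


def is_i386_program(header: FileHeader) -> bool:
    """True for an i386 image that is not flagged as a DLL."""
    return (header.machine == IMAGE_FILE_MACHINE_I386
            and not header.characteristics & IMAGE_FILE_DLL)


# Synthetic header: i386, three sections, ordinary executable flags.
raw = b"PE\0\0" + struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x010F)
header = parse_file_header(raw, 0)
print(header, is_i386_program(header))
```

A disinfector or scanner would typically run the same checks before interpreting the rest of the headers, because the remaining layout depends on the machine type and on the optional header that follows.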

Important fields of the image optional header follow.

WORD Magic
The optional header starts with a "magic" field. The value of the field is checked by some viruses to make sure that the actual program is a normal executable and not a ROM image or something else.
DWORD SizeOfCode
This field describes the rounded-up size of all executable sections. Usually viruses do not fix the value when adding a new code section to the host program. However, some future viruses might change this value.
DWORD AddressOfEntryPoint
The address where the execution of the image begins. This value is an RVA that normally points to the .text (or CODE) section. This is a crucial field for most Windows 9x/Win32 viruses. The field is changed by most of the known virus infection types to point to the actual entry point of the virus code.
DWORD ImageBase
When the linker creates a PE executable, it assumes that the image will be mapped to a specific memory location. That address is stored in this field. If the image can be loaded to the specified address (currently 0x400000 in Microsoft programs), then the image does not need relocation patches by the loader. This field is used by most viruses before infection to calculate the actual address of certain items, but it is not usually changed.
DWORD SectionAlignment
When the executable is mapped into memory, each section must start at a virtual address that is a multiple of this value. This field minimum is 0x1000 (4096 bytes), but linkers from Borland use much bigger defaults, such as 0x10000 (64KB). Most Win32 viruses use this field to calculate the correct location for the virus body but do not change the field.
DWORD FileAlignment
In the PE file, the raw data starts at a multiple of this value. Viruses do not change this value but use it in a similar way to SectionAlignment.
DWORD SizeOfImage
When the linker creates the image, it calculates the total size of the portions of the image that the loader has to load. This includes the size of the region starting at the image base up through the end of the last section. The end of the last section is rounded up to the nearest multiple of section alignment. Almost every PE infection method uses and changes the SizeOfImage value of the PE header.

Not surprisingly, many viruses calculate this field incorrectly, which makes image execution impossible under Windows NT. This is because the Windows 9x loader does not bother to check this value when executing the image. Usually (and fortunately) virus writers do not test their creations for long, if at all. Most Windows 95 viruses contain this bug. Some antivirus software used to calculate this field incorrectly when disinfecting files. This causes a side effect: A Windows NT-compatible Win32 program will not be executed by Windows NT but only by Windows 9x, even when the application has been disinfected.
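The rounding rule described for SizeOfImage can be checked mechanically, for example when verifying a disinfected file. The following is a minimal sketch, assuming each section is given as a (virtual address, virtual size) pair of RVAs. The function names and the second section in the example are hypothetical, and real loaders apply further rules that are not modeled here.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (value + alignment - 1) // alignment * alignment


def expected_size_of_image(sections, section_alignment: int) -> int:
    """Return the end of the last section, rounded up to SectionAlignment.

    `sections` is an iterable of (virtual_address, virtual_size) pairs.
    """
    end = max(address + size for address, size in sections)
    return align_up(end, section_alignment)


# First pair: the .text values from the CALC.EXE listing (address 0x1000, size 0x96B0).
# Second pair: a made-up data section that follows it.
sections = [(0x1000, 0x96B0), (0xB000, 0x1234)]
print(hex(expected_size_of_image(sections, 0x1000)))  # 0xd000
```

Comparing this value with the SizeOfImage field stored in the header is a quick way to spot the miscalculation described above.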

DWORD Checksum
This is a checksum of the file. Most executables contain 0 in this field. All DLLs and drivers, however, must have a checksum. Windows 95's loader simply does not check this field before loading DLLs, which makes it possible for some Windows 95 viruses to infect KERNEL32.DLL very easily. This field is used by some viruses to represent an infection marker to avoid double infections. Another set of viruses recalculates it to hide an infection even better.
4.3.2.1.2 The Section Table and Commonly Encountered Sections

Between the PE header and the raw data for the image's sections lies the section table. The section table contains information about each section of the actual PE image. (See the following dumps that I made with the PEDUMP tool.)

Basically, sections are used to separate different functioning modules from each other, such as executable code, data, global data, debug information, relocation, and so on. The section table modification is important for viruses to specify their own code section or to patch an already existing section to fit actual virus code into it. Each section in the image has a section header in the section table. These headers describe the name of each section (.text, ... .reloc) as well as its actual, virtual, and raw data locations and sizes. First-generation viruses, like Boza, patch a new section header into the section table. (Boza adds its own .vlad section, which describes the location and size of the virus section.)
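For readers who want to inspect section headers themselves, the following is a minimal Python sketch of decoding one 40-byte section header, the same record that PEDUMP prints. The function and dictionary key names are my own choices for illustration; the field order follows the documented IMAGE_SECTION_HEADER layout.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes, little-endian).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def parse_section_header(raw):
    """Decode one section header from the first 40 bytes of raw."""
    (name, virt_size, virt_addr, raw_size, raw_offs,
     _reloc_offs, _line_offs, _n_relocs, _n_lines,
     characteristics) = SECTION_HEADER.unpack(raw[:SECTION_HEADER.size])
    return {
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
        "virt_size": virt_size,
        "virt_addr": virt_addr,
        "raw_size": raw_size,
        "raw_offs": raw_offs,
        "characteristics": characteristics,
    }
```

Feeding it the bytes of the .text header of CALC.EXE would yield the values shown in the first entry of Listing 4.1.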

Sometimes there is no place for a section header in the file, and the patch cannot take its place easily. Therefore, viruses today (such as W95/Anxiety [19] variants) attack the last existing section header and modify its fields to fit the virus code in that section. This makes the virus code section less visible and the infection method less risky.

Listing 4.1 is the section table example of CALC.EXE (the Windows Calculator).

Listing 4.1. Looking at the Section Table of CALC.EXE with PEDUMP

01 .text VirtSize: 000096B0 VirtAddr: 00001000
raw data offs: 00000400 raw data size: 00009800
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: 60000020 CODE MEM_EXECUTE MEM_READ

02 .bss VirtSize: 0000094C VirtAddr: 0000B000
raw data offs: 00000000 raw data size: 00000000
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: C0000080 UNINITIALIZED_DATA MEM_READ MEM_WRITE

03 .data VirtSize: 00001700 VirtAddr: 0000C000
raw data offs: 00009C00 raw data size: 00001800
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: C0000040 INITIALIZED_DATA MEM_READ MEM_WRITE

04 .idata VirtSize: 00000B64 VirtAddr: 0000E000
raw data offs: 0000B400 raw data size: 00000C00
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: 40000040 INITIALIZED_DATA MEM_READ

05 .rsrc VirtSize: 000015CC VirtAddr: 0000F000
raw data offs: 0000C000 raw data size: 00001600
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: 40000040 INITIALIZED_DATA MEM_READ

06 .reloc VirtSize: 00001040 VirtAddr: 00011000
raw data offs: 0000D600 raw data size: 00001200
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: 42000040 INITIALIZED_DATA MEM_DISCARDABLE MEM_READ

The name of the section can be anything. It could even contain just zeros; the loader does not seem to worry about the name. In general, however, the name field describes the actual functionality of the section.

There is a chance for confusion here because the actual code is placed into a .text section of the PE files. This is the traditional name, the same as in the old COFF format. The linker combines all the .text sections of the various OBJ files into one big .text section and places this in the first position of the section table. As I will describe later, the .text section contains not only code, but an additional jump table for DLL library calls. The Borland linker calls the .text section CODE, which is not a traditional name (but not one beyond normal understanding).

Another common section name is .data, where the initialized data goes. The .bss section contains uninitialized static and global variables. The .rsrc contains and stores the resources for the application.

The .idata section contains the import table, a very important part of the PE format for viruses. (Note that sections are only used as logical separators in the file image. Because nothing is mandatory, the ".idata" section's content might be merged into any other section, or not be present at all.)

The .edata section is also very important for viruses because it lists all the APIs that the actual module exports for other executables.

The .reloc section stores the base relocation table. Some viruses take special care of relocation entries of the executables; however, this section seems to disappear from most Windows 98 executables from Microsoft. Somehow the .reloc section had an early PE format design problem. The actual program is loaded before its DLLs, and the application is executed in its own virtual address space, so there seems to be no real need for relocations in an EXE.

Last but not least, there is a common section name, the .debug section, which holds the debug information of the executable (if there is any). This is not important for viruses, although they could take advantage of it for infections.

Because the name of the section can be specified by the programmer, some executables contain all kinds of special names by default.

Three of the section table header's fields are very important for most viruses: VirtualSize (which holds the virtual size of the section), SizeOfRawData (which holds the size of the section after it has been rounded up to the nearest file alignment), and the Characteristics field.

The Characteristics field holds a set of flags that indicate the section's attributes (code, data, readable, writable, executable, and so on). The code section has an executable flag but does not need writable attributes because the data are separated. This is not the same with appended virus code, which must keep its data area somewhere in its code. Therefore viruses must check for and change the Characteristics field of the section in which their code will be placed.
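As a small aid for reading the dumps in this section, the sketch below decodes a Characteristics value into the flag names that PEDUMP prints. Only the seven flags that occur in the listings are included, and the function names are hypothetical.

```python
# Subset of the section flags, in ascending bit order.
SECTION_FLAGS = [
    (0x00000020, "CODE"),
    (0x00000040, "INITIALIZED_DATA"),
    (0x00000080, "UNINITIALIZED_DATA"),
    (0x02000000, "MEM_DISCARDABLE"),
    (0x20000000, "MEM_EXECUTE"),
    (0x40000000, "MEM_READ"),
    (0x80000000, "MEM_WRITE"),
]


def decode_characteristics(value):
    """Return the names of the known flags set in value."""
    return [name for bit, name in SECTION_FLAGS if value & bit]


def is_writable_and_executable(value):
    """True when a section is both executable and writable."""
    return bool(value & 0x20000000) and bool(value & 0x80000000)


print(decode_characteristics(0x60000020))  # the .text section of CALC.EXE
```

A section that is both writable and executable is not proof of infection, but it is the combination that appended virus code typically needs, so it is a reasonable thing for an analyst to look at first.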

All of this indicates that the actual disinfection of a 32-bit virus can be more complicated than that of a normal DOS EXE virus. The infection itself is not trivial in most methods, but so many sources are available on various Internet locations that virus writers have all the necessary support to write new virus variants easily.

4.3.2.1.3 PE File Imports: How Are DLLs Linked to Executables?

Most of the Windows 9x and Windows NT viruses are based heavily on the understanding of the import table, which is a very important part of the PE structure. In Win32 environments, DLLs are linked through the PE file's import table to the application that uses them. The import table holds the names of the imported DLLs and also the names of the imported functions from those DLLs. Consider the following examples:

ADVAPI32.DLL
Ordn	Name
285	RegCreateKeyW
279	RegCloseKey

KERNEL32.DLL
Ordn	Name
292	GetProfileStringW
415	LocalSize
254	GetModuleHandleA
52	CreateFileW
278	GetProcAddress
171	GetCommandLineW
659	lstrcatW
126	FindClose
133	FindFirstFileW
470	ReadFile
635	WriteFile
24	CloseHandle
79	DeleteFileW

The executable code is located in the .text section of PE files (or in the CODE section, as the Borland linker calls it). When the application calls a function that is in a DLL, the actual CALL instruction does not call the DLL directly. Instead, it goes first to a jump (JMP DWORD PTR [XXXXXXXX]) instruction somewhere in the executable's .text section (or in the CODE section in the case of Borland linkers).

The address that the jump instruction looks up is stored in the .idata section (or sometimes in .text) and is called an entry within the IAT (Import Address Table). The jump instruction transfers control to the address pointed to by the IAT entry, which is the intended target address. Thus, the DWORD in the .idata section contains the real address of the function entry point, as shown in the following dump. In Listing 4.2, an application calls FindFirstFileA() in KERNEL32.DLL.

Listing 4.2. Function Imports

.text (CODE)
0041008E E85A370000 CALL 004137ED ; KERNEL32!FindFirstFileA

004137E7 FF2568004300 JMP [KERNEL32!GetProcAddress] ; 00430068
004137ED FF256C004300 JMP [KERNEL32!FindFirstFileA] ; 0043006C
004137F3 FF2570004300 JMP [KERNEL32!ExitProcess] ; 00430070
004137F9 FF2574004300 JMP [KERNEL32!GetVersion] ; 00430074

.idata (00430000)
.
00430068 1E3CF177 ;-> 77F13C1E Entry of KERNEL32!GetProcAddress
0043006C DBC3F077 ;-> 77F0C3DB Entry of KERNEL32!FindFirstFileA
00430070 6995F177 ;-> 77F19569 Entry of KERNEL32!ExitProcess
00430074 9C3CF177 ;-> 77F13C9C Entry of KERNEL32!GetVersion

The calls are implemented in this way to make the loader's job easier and faster. By thunking all calls to a given DLL function through one location, there is no longer the need for the loader to patch every instruction that calls a DLL. All the PE loader has to do is patch the correct addresses into the list of DWORDs in the .idata section for each imported function.
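The chain in Listing 4.2 can be followed by hand, and the short Python sketch below does the same arithmetic. The helper names are hypothetical; the instruction bytes and addresses are copied from the listing.

```python
import struct


def call_target(address, code):
    """Decode a near CALL (E8 rel32): target = next instruction + displacement."""
    if code[0] != 0xE8:
        raise ValueError("not an E8 CALL")
    (disp,) = struct.unpack_from("<i", code, 1)
    return (address + 5 + disp) & 0xFFFFFFFF


def jmp_indirect_slot(code):
    """Decode JMP DWORD PTR [abs32] (FF 25) and return the IAT slot address."""
    if code[:2] != b"\xFF\x25":
        raise ValueError("not an FF 25 JMP")
    (slot,) = struct.unpack_from("<I", code, 2)
    return slot


def read_dword(raw):
    """Interpret four little-endian bytes as a 32-bit value."""
    return struct.unpack("<I", raw)[0]


thunk = call_target(0x0041008E, bytes.fromhex("E85A370000"))
slot = jmp_indirect_slot(bytes.fromhex("FF256C004300"))
entry = read_dword(bytes.fromhex("DBC3F077"))
print(hex(thunk), hex(slot), hex(entry))
```

The three values correspond to the thunk at 004137ED, the IAT slot at 0043006C, and the entry of KERNEL32!FindFirstFileA at 77F0C3DB.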

The import table is very useful for modern 32-bit Windows viruses. Because the system loader has to patch the addresses of all the APIs that a Win32 program uses by importing, viruses can easily get the address of an API they need to call by looking into the host program's import table.

With traditional DOS viruses, this problem does not exist. When a DOS virus wants to access a system service function, it simply calls a particular interrupt with the corresponding function number. The actual address of the interrupt is placed in the interrupt vector table and is picked up automatically during the execution of the program. The interrupt vector table is not protected from the running programs; all applications can read and write into it because there are no privilege levels in DOS. The OS and all applications share the same available memory with equivalent rights. Therefore access to a particular system function does not cause problems for a DOS virus. It has access to everything it needs by default, regardless of the infection method used.

A Windows 95 virus must call APIs or system services to operate correctly. Most 32-bit applications use the import table, which the linker prepares for them. However, there are a couple of ways to avoid imports. Avoiding imports is often necessary for compatibility reasons. When an application is linked to a DLL, the actual program cannot be executed if the system loader cannot load all the DLLs specified in the import table. Moreover, the system loader checks all the necessary API calls and patches their addresses into the import table. If the loader is unable to locate a particular API by its name or ordinal value, the application cannot be executed.

Some applications must overcome this problem. For instance, if a Win32 program wants to list by name all the running processes under both Windows 95 and NT, it must use system DLLs and API calls under Windows 95 that are different from those under Windows NT. In such a case, the application is not linked directly to all the DLLs it wants to access because the program could not be executed on any system. Instead, the LoadLibrary() function is used to load the necessary DLLs, and GetProcAddress() is used to get the API's address. The actual program can access the API address of LoadLibrary() and GetProcAddress() from its import table. This solves the chicken-and-egg problem of how to call an API without knowing its address if an API call is needed.

As we will see later, Boza solves the problem by using hard-coded API addresses. Modern Win32 viruses, however, are capable of searching the import table during infection time and saving pointers to the .idata section's important entries. Whenever the application has imports for a particular API, the attached virus will be able to call it.

Note

One of the important differences in 64-bit and 32-bit PE files is their handling of import and export entries. The IA64 PE files use a PLABEL_DESCRIPTOR structure in place of any IAT entries. (This structure is detailed in Chapter 12.)

4.3.2.1.4 PE File Exports

The opposite of importing a function is exporting a function for use by EXEs or other DLLs. A PE file stores information about its exported functions in the .edata section. Consider the following dump, which lists a few exports of KERNEL32.DLL:

Entry Pt	Ordn	Name
000079CA	1	AddAtomA
.
0000EE2B	38	CopyFileA
.
0000C3DB	131	FindFirstFileA
.
00013C1E	279	GetProcAddress

KERNEL32.DLL's export table consists of an IMAGE_EXPORT_DIRECTORY, which has pointers to three different lists: the function address table, the function name table, and the function ordinal table. Modern Windows 95/NT viruses search for the "GetProcAddress" string in the function name table to be able to retrieve the API function entry-point value.

When this value is added to the ImageBase, it gives back the 32-bit address of the API in the DLL. In fact, this is almost the same algorithm that the real GetProcAddress() from KERNEL32.DLL follows internally. This function is one of the most important for Windows 95 viruses that want to be compatible with more than one Win32-based system. When the address of GetProcAddress() is available, the virus can get all the API addresses it wants to use.
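The relationship among the three export lists can be modeled with plain Python lists, which is often how an analyst reasons about a dump. This is a simplified sketch: the table contents and index values below are invented for illustration (only the entry-point RVAs come from the dump above), and the image base of 0x77F00000 is inferred from the addresses in Listing 4.2.

```python
def resolve_export(image_base, names, name_ordinals, function_rvas, wanted):
    """Look up wanted in the name table and return its virtual address.

    names[i] is paired with name_ordinals[i], which is an index into
    function_rvas. Returns None when the name is not exported.
    """
    for i, name in enumerate(names):
        if name == wanted:
            index = name_ordinals[i]
            return image_base + function_rvas[index]
    return None


# Hypothetical, heavily shortened tables for illustration only.
NAMES = ["AddAtomA", "CopyFileA", "FindFirstFileA", "GetProcAddress"]
NAME_ORDINALS = [1, 3, 2, 0]
FUNCTION_RVAS = [0x13C1E, 0x79CA, 0xC3DB, 0xEE2B]

address = resolve_export(0x77F00000, NAMES, NAME_ORDINALS,
                         FUNCTION_RVAS, "GetProcAddress")
print(hex(address))
```

The point of the indirection is that the name table and the address table need not be in the same order; the ordinal table connects them.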

4.3.2.2 First-Generation Windows 95 Viruses

The first Windows 95 virus, known as W95/Boza.A, was introduced in the VLAD virus writer magazine. Boza's authors obviously wanted to be the first with their creation, and they had to find a Windows 95 beta version very quickly to do so. Pioneer viruses used to be very buggy, and Boza was no exception. Basically, the virus cannot work on more than two Windows 95 versions: a beta release and the final version. Even on those two Windows 95 releases, the virus causes many general protection faults during replication. Infected files are often badly corrupted.

Boza is a typical appending virus that infects PE applications. The virus body is placed in a new section called .vlad. First the .vlad section header is patched into the section table as the last entry, and the number of sections field is incremented in the PE header. The body of the virus is appended to the end of the original host program, and the PE header's entry point is modified to point to the new entry point in the virus section.

Boza uses hard-coded addresses for all the APIs it has to call. That approach is the easiest, but, fortunately, it is not very successful. The authors of the virus worked on a beta version of Windows 95 first and used addresses hard-coded for that particular implementation of KERNEL32.DLL. Later they noticed that the actual virus did not remain compatible with the final release of Windows 95. This happened because Microsoft did not have to provide the same ordinal values and addresses for all the APIs for every system DLL in all releases. This would be impossible. Different Windows 95 implementations (betas, language versions, OSR2 releases) do not share the same API addresses. For instance, the first API call in Boza happens to be GetCurrentDirectoryA(). Figure 4.29 shows that the ordinal values and entry points of GetCurrentDirectoryA are different in the English version of Windows 95 and in the Hungarian OSR2 Windows 95 release of KERNEL32.DLL.

Figure 4.29. The ordinal references on two releases of Windows 95.

	Entry Pt Ordn
A.	00007744 304 GetCurrentDirectoryA (Windows 95 ENG)
B.	0000774C 307 GetCurrentDirectoryA (Windows 95 OSR2-HUN)

ImageBase is 0xBFF70000 in both KERNEL32.DLL releases, but the procedure address of GetCurrentDirectoryA() is 0xBFF77744 in the English release and 0xBFF7774C in the Hungarian OSR2 version. When Boza wants to replicate on the Hungarian version of Windows 95, it calls an incorrect address and, obviously, fails to replicate. Therefore, Boza cannot be called a real Windows 95-compatible virus. It turns out that Boza is incompatible with most Windows 95 releases.
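The arithmetic behind these two addresses is simply ImageBase plus the entry-point RVA from Figure 4.29. A worked version in Python, with hypothetical names, looks like this:

```python
IMAGE_BASE = 0xBFF70000  # the same in both KERNEL32.DLL releases

# Entry-point RVAs of GetCurrentDirectoryA from Figure 4.29.
ENTRY_RVAS = {
    "Windows 95 ENG": 0x7744,
    "Windows 95 OSR2-HUN": 0x774C,
}


def procedure_address(image_base, entry_rva):
    """Virtual address of an export: image base plus entry-point RVA."""
    return image_base + entry_rva


for release, rva in ENTRY_RVAS.items():
    print(release, hex(procedure_address(IMAGE_BASE, rva)))
```

The two results differ by only 8 bytes, yet that is enough for a call through an address hard-coded for one release to land in the wrong place on the other.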

Regardless of these facts, many viruses try to operate with hard-coded API addresses. Most of these Windows 95 viruses never make it into the wild. Virus writers seem to understand Win32 systems much better already, creating viruses that are compatible not only with all Windows 95 releases but also with Windows 98 and Windows NT versions.

4.3.2.2.1 Header Infection

This type of Windows 95 virus inserts itself between the end of the PE header (after the section table) and the beginning of the first section. It modifies the AddressOfEntryPoint field in the PE header to point to the entry point of the virus instead. The first known virus to use this technique was W95/Murkry.

The virus code must be very short in Windows 95 header infections. Because sections must start at an offset that is a multiple of the FileAlignment, the maximum available place to overwrite cannot reach much more than the FileAlignment value. When the application contains too many sections and the FileAlignment is 512 bytes, there is no place for the virus code. The AddressOfEntryPoint field is an RVA; however, the virus code is not placed in any of the sections and, therefore, the actual RVA is the real physical offset in the file that the virus must place in the header. It is interesting to note that the entry point does not point into any code section but, regardless of that fact, Windows 95's loader happily executes the infected program.

There is a chance that a scanner will fail to detect the second generation of such viruses. This happens when the scanner is only tested on first-generation samples. In first-generation samples, the AddressOfEntryPoint points to a valid section. When the scanner looks for the entry point of the program, it must check all the section headers and whether the AddressOfEntryPoint points to any of them. There is a chance that this function is not implemented to handle those cases in which the entry point does not point to any of the sections. Some scanners may skip the file instead of scanning it from the real entry point, thereby failing to detect the infection in second-generation samples.
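A scanner's entry-point lookup that copes with this case could look roughly like the following Python sketch. It is an assumption-laden illustration, not any product's implementation: the function name is hypothetical, sections are given as simple tuples, and the SizeOfHeaders value of 0x400 used in the example is assumed from the first raw data offset in Listing 4.1.

```python
def entry_point_file_offset(entry_rva, sections, size_of_headers):
    """Map AddressOfEntryPoint to a file offset.

    sections: list of (virt_addr, virt_size, raw_offs, raw_size).
    Returns None when the RVA has no file backing.
    """
    for virt_addr, virt_size, raw_offs, raw_size in sections:
        delta = entry_rva - virt_addr
        if 0 <= delta < max(virt_size, raw_size):
            # Inside a section: valid only if raw data exists at that point.
            return raw_offs + delta if delta < raw_size else None
    if entry_rva < size_of_headers:
        # Not in any section: the header area is mapped as it is in the
        # file, so the RVA equals the physical offset.
        return entry_rva
    return None


CALC_SECTIONS = [
    (0x01000, 0x96B0, 0x0400, 0x9800), (0x0B000, 0x094C, 0x0000, 0x0000),
    (0x0C000, 0x1700, 0x9C00, 0x1800), (0x0E000, 0x0B64, 0xB400, 0x0C00),
    (0x0F000, 0x15CC, 0xC000, 0x1600), (0x11000, 0x1040, 0xD600, 0x1200),
]

print(entry_point_file_offset(0x300, CALC_SECTIONS, 0x400))
```

The important branch is the second one: instead of skipping a file whose entry point lies outside every section, the scanner continues from the header area.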

4.3.2.2.2 Prepending Viruses

The easiest way to infect PE files is to overwrite their beginning. Some DOS viruses infect PE files this way, but none of the known Windows 95 viruses use this infection method. Of course, the application will not work correctly after the infection, so such viruses are discovered almost immediately. This is why viruses that do not want to handle the complicated file format of PE files use the prepending method instead. Such viruses are usually written in a high-level language (HLL) such as C or even Delphi. This method consists of prepending the virus code to the PE file. The infected program starts with the EXE header of the virus. When the virus wants to transfer control to the original program code, it has to extract it to a temporary file and execute it from there.

Disinfection of such viruses is easy. The original header information is available at the very end of the infected program in a nonencrypted format. Virus writers will recognize that and will encrypt the original header information later on. This will make disinfection more complicated.

4.3.2.3 Appending Viruses That Do Not Add a New Section Header

A more advanced appending method is used by the W95/Anxiety virus. Anxiety is very similar to Boza in its infection mechanism, but its code is more related to the somewhat bogus W95/Harry virus.

The Anxiety virus does not add a new section header at the end of the section table. Rather, it patches the last section's section header to fit into that section. In this way, the virus can infect all PE EXE files easily. There is no need to worry that the actual section header does not fit into the section table.

By modifying the VirtualSize and SizeOfRawData fields, the virus code can be placed at the end of the executable. In this way, the NumberOfSections field of the PE header does not need to be modified. The AddressOfEntryPoint field is changed to point to the virus body, and the SizeOfImage is recalculated to represent the new size of the program. Listing 4.3 is the last section of CALC.EXE before and after the W95/Anxiety.1358 infection.

Listing 4.3. The Section Modification of W95/Anxiety.1358

Before infection:
06 .reloc VirtSize: 00001040 VirtAddr: 00011000
raw data offs: 0000D600 raw data size: 00001200
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: 42000040 INITIALIZED_DATA MEM_DISCARDABLE MEM_READ

After infection:
06 .reloc VirtSize: 00002040 VirtAddr: 00011000
raw data offs: 0000D600 raw data size: 00001640
relocation offs: 00000000 relocations: 00000000
line # offs: 00000000 line #'s: 00000000
characteristics: E0000040 INITIALIZED_DATA MEM_EXECUTE MEM_READ MEM_WRITE

The Characteristics field of the last section header is changed to have writable/executable attributes. The writable characteristic is enough in itself to execute self-modifying code from any section, but many virus writers initially did not realize that.
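The changes visible in Listing 4.3 suggest a simple heuristic that an analyst might apply: the entry point lies in the last section, and that section is both executable and writable. The Python sketch below is only that, a heuristic with a hypothetical name; legitimate packed programs can trigger it, and the entry-point RVAs in the example are invented because the listing does not show them.

```python
MEM_EXECUTE = 0x20000000
MEM_WRITE = 0x80000000


def last_section_looks_patched(sections, entry_rva):
    """sections: list of (virt_addr, virt_size, characteristics) in table order."""
    virt_addr, virt_size, characteristics = sections[-1]
    entry_in_last = virt_addr <= entry_rva < virt_addr + virt_size
    writable_code = (characteristics & MEM_EXECUTE) and (characteristics & MEM_WRITE)
    return bool(entry_in_last and writable_code)


# Last section of CALC.EXE as shown in Listing 4.3.
before = [(0x11000, 0x1040, 0x42000040)]
after = [(0x11000, 0x2040, 0xE0000040)]

# Hypothetical entry points: one in .text, one inside the enlarged .reloc.
print(last_section_looks_patched(before, 0x1010))
print(last_section_looks_patched(after, 0x12040))
```

A check like this is best treated as a reason to look closer, not as a detection in itself.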

Viruses like W32/Zelly use two or more infection strategies. In basic infection mode Zelly adds two sections to the host program. In advanced infection mode, it merges all sections of the host into a single section, and appends the virus to the end of the image. This integrates the virus body tighter into the host program.

4.3.2.4 Appending Viruses That Do Not Modify the Entry Point

Some Windows 95 and Win32 viruses do not modify the AddressOfEntryPoint field of the infected program. The virus appends its code to the PE file, but it gets control in a more sophisticated way. It calculates where the original AddressOfEntryPoint points to and places a JMP instruction there that points to the virus body. Fortunately, it is very difficult to write such viruses.

This is because the virus must take care of the relocation entries that point to the overwritten part of the code. The W32/Cabanas virus masks out the relocation entries that point to that area. W95/Marburg does not place a JMP instruction at the entry point if it finds relocations for that area; instead, it modifies the AddressOfEntryPoint field. The JMP instruction should not be the first instruction in the program. W95/Marburg shows this by placing the JMP instruction after a random garbage block of code when no relocations are present in the first 256 bytes of entry-point code. In this way, it is not obvious to scanners and integrity checkers how to figure out the entry point of the virus code.

4.3.2.5 KERNEL32.DLL Infection

Most Windows 95 viruses attack the PE format, but some of them also infect DOS COM, EXE programs, VxDs, Word documents, and 16-bit Windows new executables (NE). Others may infect DLLs accidentally because these are linked in PE (or NE) formats, but the infection is not able to spread further because the standard entry point of the DLLs is not called by the system loader. Instead, the DLL's execution normally starts at its specified DLL entry point.

KERNEL32.DLL infectors do not attack the entry point. Instead, this type of virus must gain control differently. PE files have many other entry points that are useful for viruses, especially DLLs, which export APIs (their entry points) by nature. Therefore, the easiest way to attack KERNEL32.DLL is to patch the export RVA of one of the APIs (for instance, GetFileAttributesA) to point to the virus code at the end of the DLL image. W95/Lorez [20] uses this approach. Viruses like this are able to go "resident" easily. The system loads the infected DLL during the system initialization period. After that, every program that has KERNEL32.DLL imports will be attached to this infected DLL. Whenever the application has a call to the API to which the virus code has been attached, the virus code gets control.

All the system DLLs contain a precalculated checksum in their PE header, placed there by the linker. Unlike Windows 95, Windows NT recalculates this checksum before it loads the DLL. If the calculated checksum is not the same as in the header of the DLL, the system loader stops with an error message during the blue screen boot-up period. However, this does not mean that such a virus cannot be implemented for Windows NT; it just makes implementation a bit more complicated. Although the checksum algorithm is not documented by Microsoft, there are APIs available in IMAGEHLP.DLL for these purposes, like CheckSumMappedFile(), which are efficient enough to calculate a new, correct checksum after the actual infection is done. This is not enough, however, for Windows NT's loader. There are several other steps to take, but there is no doubt that virus writers will be able to solve these questions soon. There is a need for virus scanners to check the consistency of a KERNEL32.DLL by recalculating the PE header checksum, especially if the scanner is a Win32 application itself and is attached to an infected KERNEL32.DLL.
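For a scanner that wants to verify the header checksum without calling IMAGEHLP.DLL, the commonly described algorithm can be sketched in Python as follows. This is my understanding of the scheme as implemented by open-source PE tools, not an official specification: the file is summed as little-endian DWORDs with end-around carry while skipping the CheckSum field, the sum is folded to 16 bits, and the file length is added. The function names are hypothetical, and the sketch assumes that the CheckSum field is DWORD-aligned in the file.

```python
import struct


def checksum_field_offset(data):
    """File offset of CheckSum: e_lfanew + 4 (signature) + 20 (file header) + 64."""
    (lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return lfanew + 0x58


def pe_checksum(data, checksum_offset):
    """Recompute the PE header checksum over the whole file image."""
    padded = data + b"\0" * (-len(data) % 4)  # pad to a DWORD boundary
    total = 0
    for offset in range(0, len(padded), 4):
        if offset == checksum_offset:
            continue  # the stored checksum itself is not summed
        (dword,) = struct.unpack_from("<I", padded, offset)
        total += dword
        total = (total & 0xFFFFFFFF) + (total >> 32)  # end-around carry
    total = (total & 0xFFFF) + (total >> 16)          # fold to 16 bits
    total = (total + (total >> 16)) & 0xFFFF
    return total + len(data)
```

A consistency check then compares pe_checksum(data, checksum_field_offset(data)) with the DWORD stored at that offset; a mismatch in a system DLL deserves a closer look.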

4.3.2.6 Companion Infection

Companion viruses are not very common. Nevertheless, some virus writers do develop Windows 95 companion viruses. A path companion virus depends on the fact that the operating system always executes files with a COM extension first in preference to an EXE extension, if the names of two files in the same directory differ only in their extensions. These viruses simply look for a PE application with an EXE extension and then copy themselves with the same name into the same directory (or somewhere on the path) with a COM extension, using the host's name. W95/Spawn.4096 uses this technique. This functionality is implemented by using FindFirstFileA(), FindNextFileA() APIs for search, CopyFileA() to copy the virus code, and CreateProcessA() to execute the original host program.
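From the defender's side, the precondition for a path companion is easy to look for: a COM file and an EXE file with the same base name in one directory. A minimal Python sketch follows; the function name is hypothetical, and a match is only a hint because some legitimate products ship such pairs.

```python
import os


def companion_candidates(filenames):
    """Return base names (lowercased) that exist as both .COM and .EXE."""
    by_stem = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        by_stem.setdefault(stem.lower(), set()).add(ext.lower())
    return sorted(stem for stem, exts in by_stem.items()
                  if {".com", ".exe"} <= exts)


print(companion_candidates(["CALC.EXE", "calc.com", "NOTEPAD.EXE", "EDIT.COM"]))
```

In practice the list of names would come from a directory listing, and each hit would still need to be inspected before drawing any conclusion.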

4.3.2.7 Fractionated Cavity Infection

I originally predicted this infection technique as one that would possibly be developed in the future. However, the W95/CIH virus had already introduced this technique before my first lecture on Win32 viruses.

There is slack space between most sections, which is usually filled with zeros (or 0xCC) by the linker. This is because the sections have to start at the file alignment, as described in the PE header's FileAlignment field. The actual virtual size each section uses is usually different from the raw data representation. Usually, the virtual size is a smaller value. In most cases, Microsoft's Link program generates PE files like that. The difference between the raw data size of the section and the virtual size is the actual alignment area, which is filled by zeros and not loaded when the program is mapped into its own address space.
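To make the size of this alignment area concrete, the sketch below computes it for each section of CALC.EXE from the values in Listing 4.1. The helper name is hypothetical, and sections without raw data (such as .bss) are treated as having no slack.

```python
def section_slack(virt_size, raw_size):
    """Bytes of file padding at the end of a section's raw data."""
    return raw_size - virt_size if 0 < virt_size < raw_size else 0


# name: (VirtSize, raw data size) from Listing 4.1.
CALC_SECTIONS = {
    ".text": (0x96B0, 0x9800), ".bss": (0x094C, 0x0000),
    ".data": (0x1700, 0x1800), ".idata": (0x0B64, 0x0C00),
    ".rsrc": (0x15CC, 0x1600), ".reloc": (0x1040, 0x1200),
}

slack = {name: section_slack(v, r) for name, (v, r) in CALC_SECTIONS.items()}
for name, size in slack.items():
    print(f"{name:7s} {size:5d} bytes")
print("total", sum(slack.values()))
```

Each individual area is smaller than the 512-byte FileAlignment, which is the observation the following paragraphs build on.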

Because the default value of FileAlignment is 512 bytes (usual sector size), the usual slack area size is smaller than 512 bytes. When I first considered this kind of infection method, I thought that no such viruses would be developed because less than 512 bytes is not big enough for an average PE infector virus of that kind. However, two minutes later I had to recognize that this simple problem would not stop virus writers from developing such viruses. The only thing that has to be done by the virus is to split its body into several parts and place them into as many section alignment areas as are available. The loader code for these blocks can be very short: it first moves each separated code block to an allocated memory area, one by one. This loader code itself fits into a big enough section alignment area.

This is the precise method used by the W95/CIH virus. This makes the job of the scanner and the disinfector much harder. The virus changes the virtual size of the section to be the same as the raw data size in each section header, into which it injects a part of its virus body. The exact identification of such viruses is more difficult than for normal viruses because the virus body must be fetched from different areas of the PE image first.

W95/CIH uses the header infection method at the same time and infects Microsoft Linker-created images without any problem. The fragmented cavity infection technique has a very important advantage from the virus's point of view. The infected file does not get bigger after the infection; its size remains the same. This makes noticing the virus much harder. The identification must be done very carefully because a virus like that may split its body at any offset, which might also separate the actual search string into several parts. This fact shows that it is very important to analyze new Windows 95 viruses with extreme care. Otherwise, the scanner might not find all generations of the same virus code.

4.3.2.8 Modification of the lfanew Field in an Old EXE Header

This is the second infection method that I originally intended to describe as one that has not yet been developed. However, as with the fragmented cavity infection method (discussed in the previous section), this technique appeared in a virus during the time I was writing about it. This infection method is one of the simplest to implement and therefore is used in many viruses. The first known virus to use this method was W95/Cerebrus. The method itself works on Windows NT, but there is a trivial bug in the virus that makes this impossible. Basically, this infection method is an appending type: the virus body is attached to the very end of the original program.

The important difference is that the virus code itself contains its own PE header. When the virus infects a PE application, it modifies the lfanew field (at 0x3c address) in the old EXE header. As described earlier, the lfanew field holds the file address of the PE header. Because this field points to a new PE header, the program is executed as if it contains only the virus code. The virus functions like a normal Win32 application. It has its own imports and can easily access any APIs it wants to call. When the replication is done, the virus creates a temporary file with a copy of the infected program. In this file, the lfanew field will point correctly to the original PE header. Thus, the original program is functional again when the virus executes the temporary file.
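Because this method leaves the PE header unusually far into the file, a scanner can at least record where lfanew points. The Python sketch below reads the field at offset 0x3C and reports whether the PE header sits in the later part of the file. The function names and the 0.5 threshold are assumptions made for illustration, not established detection rules.

```python
import struct


def pe_header_offset(data):
    """Return the lfanew value if it points at a PE signature, else None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[lfanew:lfanew + 4] != b"PE\0\0":
        return None
    return lfanew


def pe_header_is_far_into_file(data, threshold=0.5):
    """Heuristic: True when the PE header starts past threshold of the file."""
    offset = pe_header_offset(data)
    if offset is None:
        return False
    return offset / len(data) > threshold
```

In a typical linker-produced image the PE header follows the DOS stub within the first few hundred bytes, so a header found near the end of a large file is worth a second look.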

4.3.2.9 VxD-Based Windows 95 Viruses

Most Windows 95 viruses are direct-action infectors. Virus writers recognized the importance of fast infection and tried to look for solutions to implement Windows 95 resident viruses. Though not the easiest, the evident solution was to write a VxD virus. One of the first VxD-based viruses was W95/Memorial. It infects DOS COM, EXE, and PE applications. The virus does not replicate without Windows 95. The infected programs use a dropping mechanism to extract the real virus code, a VxD, into the root directory of drive C: as CLINT.VXD.

When the VxD is loaded, the virus code is executed on ring 0, thus the virus can do anything it wants. VxDs can hook the file system easily, and that is exactly what most VxD viruses want to do. They simply hook the installable file system (IFS) with one simple VxD service routine. After that, the virus can monitor access to all files. The VxD code has to be extracted, and the dropper code needs different implementation for each and every format that the virus wants to infect. This makes the virus code very complicated and relatively big (12,413 bytes). Therefore, it is very unlikely that many viruses like this will be developed in the future.

4.3.2.10 PE Viruses That Operate as VxDs

A much easier solution has been introduced by the W95/Harry and W95/Anxiety viruses. These viruses can overcome complications by patching their code into the VMM (virtual machine manager) of Windows 95.

When an infected PE program is executed, the virus code takes control. Programs are executed on the application level, which is why they cannot call system-level functions (VxD calls) normally. These viruses bypass the system by installing their code into the VMM, which runs on ring 0. The installation routine of such a virus searches for a big enough hole in the VMM's code area after the 0C0001000h address.

If a large enough area consisting only of 0FFh bytes is found, the virus looks for the VMM header at 0C000157Fh and verifies it by comparing it to "VMM". If the check succeeds, the virus picks up the address of the Schedule_VM_Event system function from the VMM and saves it for later use. Then it copies its code into the VMM, overwriting the previously located hole, and changes the original Schedule_VM_Event address to point to a new function. Finally, it executes the original host program by jumping to the original entry point. All of this is possible because Microsoft could not protect that area from changes and still keep backward compatibility with old Windows 3.x VxDs. The full VMM area is available for read and write access to application-level programs.

Before the host program can be executed, the VMM will call Schedule_VM_Event, which is now replaced by the initialization routine of the virus. This code is executed on ring 0 already, which enables it to call VxD functions. Anxiety hooks the IFS by calling IFSMgr_InstallFileSystemApiHook from there. This installs the new hook API of the virus.

The virus replication code needs special care. When VxD code is executed, VxD calls are patched by the VMM. The VMM turns the 0CDh, 20h, DWORD function ID sequence (INT 20h, DWORD ID) [21] into FAR CALLs. Some of the VxD functions consist of a single instruction. In this case, the VMM patches the six bytes with this single instruction, which fits there. The VMM does this dynamically with all the executed VxDs to speed up their execution.
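
The unpatched six-byte form is simple to recognize in a code image. The following Python sketch, which is an illustration and not something from the original text, scans a buffer for the 0CDh, 20h, DWORD pattern and splits the DWORD into two halves. It assumes the commonly documented layout in which the high word is the device ID and the low word is the service number. The function name and sample bytes are hypothetical, and a plain byte search like this can produce false positives on data that merely happens to contain CD 20.

```python
import struct

def find_vxd_dynalinks(code: bytes):
    """Yield (offset, device_id, service) for each CD 20 <DWORD> sequence.

    Assumption: high word of the DWORD = device ID, low word = service number.
    """
    pos = code.find(b"\xCD\x20")
    while pos != -1 and pos + 6 <= len(code):
        (func_id,) = struct.unpack_from("<I", code, pos + 2)
        yield pos, func_id >> 16, func_id & 0xFFFF
        pos = code.find(b"\xCD\x20", pos + 6)

# Hypothetical fragment: NOP, one INT 20h dynamic link, RET.
fragment = b"\x90" + b"\xCD\x20\x0B\x00\x01\x00" + b"\xC3"
for offset, device, service in find_vxd_dynalinks(fragment):
    print(f"offset {offset}: device {device:04X}h, service {service:04X}h")
```

An image in which these sequences have already been rewritten by the VMM would yield no matches, which is exactly the state that the viruses described next have to undo before copying themselves.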

When the virus code is executed, the VxD functions in the virus body are patched by the VMM, and the virus therefore cannot copy this image immediately to files again because the virus code would not work in a different Windows 95 environment. These viruses contain a function that patches all their VxD functions back to their normal format first and only after that replicates the code into the host program. Even if this technique looks very complicated, it is not very difficult for virus writers. W95/Anxiety variants used to be in the wild in many countries.

There is no doubt that several viruses will try to overcome the ring 3 to ring 0 problem using similar methods, even on Windows NT-based systems. W95/CIH uses instructions that are available only on Intel 386 processors and above. It is interesting to note that the interrupt descriptor table is writable under Windows 95 (because it is part of the VMM). W95/CIH uses the SIDT (store IDT) instruction to get a pointer to the IDT (this technique is detailed in Chapter 6). In this way, the virus can modify the gate descriptor of INT 3 (debug interrupt) in the IDT and allocate memory by using VxD services. The INT 3 routine will be executed as a ring 0 interrupt from its PE virus body. This trick shows how easy it is for virus writers to overcome the ring 3 to ring 0 problem. Similar methods will be discovered by Windows 95 virus writers in the near future, resulting in an even simpler method.

4.3.2.11 VxD Infection

A few viruses, such as Navrhar, infect Windows Virtual Device Drivers (VxDs). Navrhar also infects Word documents that are in the OLE2 format and some standard system VxDs. The virus does not infect unknown VxDs, but only known system VxDs that are listed in its PE dropper. When an infected Word document is opened, the virus extracts its PE dropper, which is attached to the very end of the document. Therefore, the only way to access this code is to use Win32 APIs, which is why the virus imports KERNEL32.DLL APIs in its macro code. When the dropper's code is extracted from the document, the dropper is executed, checking for the listed VxDs and infecting them one by one. When the system is rebooted, one of the infected VxDs will be loaded by Windows 95. The virus takes control from the infected VxD, hooks the file system, and checks for Word document access.

Navrhar illustrates that, unlike DOC files, PE applications are not so commonly exchanged by users, not to mention VxDs, which are not normally exchanged at all. This is why modern Win32 viruses use some form of worm propagation mechanism instead (see Chapters 9 and 10).

4.3.2.12 DLL Load Insertion Technique

This particular infection technique is based on manipulation of PE files in such a way that when the host application is loaded, it will load an extra DLL, which is the virus code.

For example, W32/Initx loads a DLL with the name INITX.DAT via a single LoadLibrary() call inserted into the host program. This extra code is inserted into a slack space of the code section of the host, and the entry point of the host is modified to point to the inserted code. On execution of the host program and whenever the INITX.DAT file is available, the virus code is launched before the host program's code. After this, control is given to the original host entry-point code.

4.3.3. Win32 and Win64 Viruses: Designed for Microsoft Windows?

Microsoft's strategy is clear. The Designed for Microsoft Windows logo program's important requirement is that every application in your product must be a Microsoft Win32 program compiled with a 32-bit compiler that generates an executable file of the PE format. Not surprisingly, the number of Win32 programs developed by third parties has grown intensively during the last few years. People exchange and download more PE programs.

The main reason that Windows 95 and Win32 viruses did not cause big problems for a long time was that virus writers had to learn a lot to "support" the new systems. Young virus writers understand Microsoft's message: "Windows everywhere!" Their answer seems to be "Windows viruses everywhere!" These young guys will not waste their time with DOS viruses anymore but will continuously explore Win32 and Win64 platforms instead.

There is no longer any point in attackers' writing DOS viruses. Virus scanners are much weaker in handling Windows viruses generically and heuristically; detection and disinfection are not that easy. Vendors must learn and understand the new 64-bit file formats and spend a reasonable amount of time researching and designing new scanning technology.

Because Windows 95 and Windows NT are more complicated systems, it is natural that the first period of such viruses took more time than DOS viruses. However, the number of Win32 viruses surpassed 10,000 in 2004. It took about 10 years for DOS viruses to reach 10,000 known variants, but only 9 years for Win32 threats. This indicates that, although virus writing slows down as new platforms appear (replacing older ones), eventually the growth ratio of any virus type will be exponential.

In the following section, I will describe some important issues that make a Windows 95 virus incompatible with Windows NT. This specifies the differences between the Windows 95 and Win32 prefixes that scanners use to identify 32-bit Windows viruses.

4.3.3.1 Important Windows 95 and NT System Loader Differences

Before I understood W32/Cabanas, I had a different picture of Windows NT from the security point of view because I had drawn incorrect conclusions about the level of system security when the first Windows 95 virus, Boza, appeared. Most antivirus researchers immediately performed some tests with Boza on Windows NT. The result looked reassuring: Windows NT did not even try to execute the infected image, as shown in Figure 4.30.

Figure 4.30. An error message is displayed when executing W95/Boza on Windows NT.


What is good for Windows 95's loader is not good for Windows NT. Why? I answered this question myself by patching PE files.

The PE file format was designed by Microsoft for use by all its Win32 operating systems (Windows NT/2000/XP/2003, Windows 95/98/Me, Win32s, and Windows CE). (Later, the PE file format was extended to PE+ to accommodate the needs of 64-bit platforms.) That is why all the system loaders in Win32 systems have to understand this executable structure. However, the implementation of the loader is different from one system to another. Windows NT's loader simply checks more things in the PE file before it executes the image than Windows 95's loader does. Thus Windows NT finds the Boza-infected file suspicious. This happens because one field in the .vlad section header (which is patched into the section table of the host program) is not precisely calculated by the virus. Conversely, correctly calculated sections and section headers can be added to a PE file without any problem. Thus Windows NT's loader does not have any superior virus detection, as some may assume.

If this problem were fixed in Boza, the virus would be capable of starting the host program even on a Windows NT platform. However, the virus would still not be able to replicate. This is because of another incompatibility problem, from which all the initial Windows 95 viruses have suffered. Every Windows 95 virus must overcome a specific problem: It must be able to call two Win32 KERNEL APIs: GetModuleHandle() and GetProcAddress(). Because those APIs are in KERNEL32.DLL, Windows 95 viruses could access those functions from KERNEL32.DLL directly with a hack. Most Windows 95 viruses have hard-coded pointers to GetModuleHandle() and GetProcAddress() KERNEL APIs. By using GetProcAddress(), the virus can access all the APIs it wants to call. (Alternatively, some viruses use LoadLibrary() to get a module handle to KERNEL32.DLL, but this method is less common. This is because most applications already map the KERNEL32 API in their process address space.)

When the linker creates an executable, it assumes that the file will be mapped to a specific location in memory. In the PE file header, there is a field called ImageBase holding this address. For executables, this address is usually 0x400000 by default. In the case of Windows 95, the KERNEL32.DLL's ImageBase address is 0xBFF70000. Thus, the address of GetModuleHandle() and GetProcAddress() will be at a certain fixed location in the same release of KERNEL32.DLL. However, this address can be different in a new release, which makes Windows 95 viruses incompatible even with other Windows 95 releases. This ImageBase address is 0x77F00000 in Windows NT as the default. Thus Windows 95 viruses that operate with a Windows 95-specific base address cannot work on Windows NT. (Interestingly enough, first-generation exploit code often suffers from similar problems and is only able to work on a single platform.)
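
To make the ImageBase field concrete, here is a minimal Python sketch (an illustration added here, not code from the original text) that reads it from a PE image. It assumes the standard header layout: a 4-byte signature, a 20-byte file header, and then the optional header, where ImageBase sits at offset 28 for PE32 and at offset 24 for PE+. The function name and the synthetic sample are hypothetical.

```python
import struct

def read_image_base(image: bytes) -> int:
    """Return the preferred load address (ImageBase) of a PE image."""
    (lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[lfanew:lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    opt = lfanew + 4 + 20            # skip the signature and the file header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:               # PE32: 32-bit ImageBase
        return struct.unpack_from("<I", image, opt + 28)[0]
    if magic == 0x20B:               # PE+: 64-bit ImageBase
        return struct.unpack_from("<Q", image, opt + 24)[0]
    raise ValueError("unknown optional header magic")

# Synthetic PE32 header (hypothetical) using the default base mentioned above.
sample = bytearray(0x100)
sample[:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", sample, 0x98, 0x10B)
struct.pack_into("<I", sample, 0x98 + 28, 0x400000)

print(hex(read_image_base(bytes(sample))))
```

Running such a reader against the KERNEL32.DLL of two different Windows releases is one way to see why a hard-coded API address taken from one release cannot be trusted on another.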

The third reason for incompatibility is obvious: Windows NT does not support VxDs. Viruses such as Memorial cannot operate on Windows NT because such viruses are VxD-based. They should have included different infection algorithms at the driver level for Windows NT and Windows 95 to operate on both systems, which would make them complicated.

If a Windows 95 virus can overcome the preceding incompatibility and implementation problems, it will eventually work on Windows NT/2000/XP/2003 as well. Such viruses might have Unicode support, but it is not mandatory. W32/Cabanas supports all of these features, being able to trespass the OS barrier imposed by early Windows 95 creations.

Both Boza and Cabanas are 32-bit Win32 programs. Cabanas infects files under Windows 95/98/Me (and any other localized versions) and under all major Windows NT-based system releases, such as 3.51, 4.0, 5.0 (Windows 2000), and 5.1 (Windows XP). Boza replicates only under the English Windows 95 release. Therefore, the prefix part of the virus name is Win32 for Cabanas and Win95 for Boza.

4.4. Conclusion

This chapter has presented a great deal about computer virus infection techniques in files and other objects. It is important to be familiar with these techniques because they have a great impact on the design of antivirus engines. Even more importantly, they affect the analysis process for both manual and automated methods, which will be demonstrated in Chapter 15.

References

1. Adam Petho, ROM BIOS, 1989, ISBN: 963-553-129-X (Paperback).

2. Fridrik Skulason, "Azusa: Complicating the Recovery Process," Virus Bulletin, April 1991, p. 23.

3. Jakub Kaminski, "Rainbow: To Envy or to Hate," Virus Bulletin, September 1995, pp. 2-7.

4. Mike Lambert, "Circular Extended Partitions: Round and Round with DOS," Virus Bulletin, September 1995, p. 14.

5. Fridrik Skulason, "Investigation: The Search for Den Zuk," Virus Bulletin, 1991, pp. 6-7.

6. Mikko Hypponen, "Virus Activation Routines," EICAR, 1995, pp. 1-11.

7. Fridrik Skulason, "Disk Killer," Virus Bulletin, January 1990, pp. 12-13.

8. Jan Hruska, "Virus Writers and Distributors," Virus Bulletin, July 1990, pp. 12-14.

9. Dr. Vesselin Bontchev, private communication, 1996.

10. Peter Morley, personal communication, 1999.

11. Peter Szor, "Coping with Cabanas," Virus Bulletin, November 1997, pp. 10-12.

12. Peter Szor, "Olivia," Virus Bulletin, June 1997, pp. 11-12.

13. Peter Szor, "Nexiv_Der: Tracing the Vixen," Virus Bulletin, April 1996, pp. 11-12.

14. Peter Szor, "Shelling Out," Virus Bulletin, February 1997, pp. 6-7.

15. Matt Pietrek, Windows Internals, Addison-Wesley, 1993, ISBN: 0-201-62217-3 (Paperback).

16. Adrian Marinescu, "Russian Doll," Virus Bulletin, August 2003, pp. 7-9.

17. Peter Ferrie, "Unexpected Resutls [sic]," Virus Bulletin, June 2002, pp. 4-5.

18. Peter Szor, "Attacks on Win32," Virus Bulletin Conference, 1998.

19. Peter Szor, "High Anxiety," Virus Bulletin, January 1998, pp. 7-8.

20. Peter Szor, "Breaking the Lorez," Virus Bulletin, October 1998, pp. 11-13.

21. Andrew Schulman, Unauthorized Windows 95, IDG Books, 1994, ISBN: 1-568-84305-4.

Chapter 5. Classification of In-Memory Strategies

"Little by little, one travels far."

J.R.R. Tolkien


In this chapter, you will learn about common memory residency strategies that computer viruses use to infect other objects on a system or across systems. Depending on the in-memory residency strategy alone, some viruses can become much more virulent than others.

5.1. Direct-Action Viruses

Some of the simpler computer viruses do not actively manifest themselves in computer memory. The very first file infector viruses on the IBM PC, such as Virdem and Vienna, belong to this category. Usually direct-action viruses do not spread fast and do not easily make it into the wild.

Direct-action viruses load with the host program into computer memory. Upon getting control, they look for new objects to infect by searching for new files. This is exactly why one of the most common kinds of computer virus is the direct-action infector. This kind of virus can be crafted with relative ease by the attacker on a variety of platforms, in binary or in script languages.

Direct-action viruses typically use a FindFirst, FindNext sequence to look for a set of victim applications to attack. Typically such viruses only infect a couple of files upon execution, but some viruses infect everything at once by enumerating all directories for victims. In other cases, direct-action viruses simply copy themselves between the diskettes and the hard disk without waiting for the user to copy an infected file to the diskette. This technique, however, makes them much more likely to be noticed by a user because the extra diskette activity is a noisy operation.

Depending on the location of the actual host, the virus might become luckier in network environments. On the network, the virus might enumerate network shares or simply attack files, assuming that writeable network resources are available in the A: to Z: range. In this way, direct-action viruses can be extremely slow infectors, unless they appear in a networked environment.

Thousands of computer viruses generated by virus construction kits use the direct-action method on DOS. An example of this kind of virus is VCL.428, created by the Virus Construction Laboratory.

5.2. Memory-Resident Viruses

A much more efficient class of computer viruses remains in memory after the initialization of virus code. Such viruses typically follow these steps:

  1. The virus gets control of the system.
  2. It allocates a block of memory for its own code.
  3. It relocates its code to the allocated block of memory.
  4. It activates itself in the allocated memory block.
  5. It hooks the execution of the code flow to itself.
  6. It infects new files and/or system areas.

This is the most typical pattern, but several other methods exist that do not require all of the preceding steps. On single-tasking operating systems such as DOS, only a single user application can run at any one time; any other program code needs to make itself TSR (Terminate-and-Stay-Resident). DOS offers a variety of services in the form of interrupts to develop TSR code.

A typical example of a TSR program on DOS is a clock application to display the time on the screen during the execution of any single program. Because all applications share a single "thread" of execution, any program can easily interfere with any other in more than one way. Indeed, even the code of DOS, some system data structures, device drivers, or interfaces can accidentally be changed by a buggy user application, which can lead to catastrophic system crashes and corruptions.

Here is an anecdote about that. The first version of the Borland Quattro spreadsheet program for DOS was developed in 100% Assembly in Hungary. An interesting situation occurred during the development of the project. Sometimes during the execution of a loop, the control flow took the opposite direction than expected. The code did not explain why that would happen. It turned out that a loaded clock program on the system occasionally flipped the control flow because it modified the direction flag but in some cases forgot to set it back afterward. As a result, the not-intentionally-malicious clock program could easily do harm to the contents of spreadsheets and other programs. Of course, the bogus clock program was a TSR (Terminate-and-Stay-Resident).

The point is that DOS applications are not separated or walled up from each other in any way. Malicious code can take advantage of this kind of system very easily. On standard DOS, the processor is used in a single mode, and therefore any program has the privilege to modify any other program's code in the physical memory, which is addressable up to 1MB long (with some computers capable of accessing an extra high memory area of 64KB above that).

5.2.1. Interrupt Handling and Hooking

DOS programs use DOS and BIOS interrupts for system services. In the past, on microcomputers, programmers typically transferred control to a BIOS-based entry point, so programmers needed to keep such entry points in mind. The interrupt vector table (IVT) simplifies the programmer's task on the IBM PC for several reasons. Using the IVT, programs can refer to functions by their interrupt number and service number. As a result, hard-coded addresses to services need not be compiled into the program code. Instead, the INT x instruction can be used to transfer control to a service via the IVT.

Figure 5.1 illustrates how a typical boot virus such as Brain installs itself into the execution flow by hooking the BIOS disk handler.

Figure 5.1. A typical boot virus hooks INT 13h.


Boot viruses typically hook the INT 13h BIOS disk interrupt handler and start to monitor its functions, wait for diskette access for read and write, and during such operations write their code (or part of it) into the boot sector of the diskettes.

On DOS the IVT is placed at the beginning of physical memory at 0:0. The table holds the segment and offset values of each interrupt, so each entry in the table occupies four bytes. Thus INT 21h's vector can be found at 0:84h in memory. Table 5.1 shows common interrupts and their typical use by computer viruses.

Table 5.1. Typical Interrupts Used by Computer Viruses

| INT ID | Function | Category | Offset in IVT | Intercepted/Used by Virus Code |
|---|---|---|---|---|
| INT 00 | Divide Error | CPU Generated | 0:[0] | Anti-Debugging, Anti-Emulation |
| INT 01 | Single Step | CPU Generated | 0:[4] | Anti-Debugging, Tunneling, EPO |
| INT 03 | Breakpoint | CPU Generated | 0:[0Ch] | Anti-Debugging, Tracing |
| INT 04 | Overflow | CPU Generated | 0:[10h] | Anti-Debugging, Anti-Emulation (caused by an INTO instruction) |
| INT 05 | Print Screen | BIOS | 0:[14h] | Activation routine, Anti-Debugging |
| INT 06 | Invalid Opcode | CPU Generated | 0:[18h] | Anti-Debugging, Anti-Emulation |
| INT 08 | System Timer | CPU Generated | 0:[20h] | Activation routine, Anti-Debugging |
| INT 09 | Keyboard | BIOS | 0:[24h] | Anti-Debugging, Password stealing, Ctrl+Alt+Del handling |
| INT 0Dh | IRQ 5 HD Disk (XT) | Hardware | 0:[34h] | Hardware level Stealth on XT |
| INT 10h | Video | BIOS | 0:[40h] | Activation routine |
| INT 12h | Get Memory Size | BIOS | 0:[48h] | RAM size check |
| INT 13h | Disk | BIOS | 0:[4Ch] | Infection, Activation routine, Stealth |
| INT 19h | Bootstrap Loader | BIOS | 0:[64h] | Fake rebooting |
| INT 1Ah | Time | BIOS | 0:[68h] | Activation routine |
| INT 1Ch | System Timer Tick | BIOS | 0:[70h] | Activation routine |
| INT 20h | Terminate Program | DOS Kernel | 0:[80h] | Infect on Exit, Terminate Parent |
| INT 21h | DOS Service | DOS Kernel | 0:[84h] | Infection, Stealth, Activation routine |
| INT 23h | Control-Break Handler | DOS Kernel | 0:[8Ch] | Anti-Debug, Non-Interrupted Infection |
| INT 24h | Critical Error Handler | DOS Kernel | 0:[90h] | Avoid DOS errors during Infections (usually hooked temporarily) |
| INT 25h | DOS Absolute Disk Read | DOS Kernel | 0:[94h] | Disk Infection, Stealth (Gets to INT 13 however) |
| INT 26h | DOS Absolute Disk Write | DOS Kernel | 0:[98h] | Disk Infection, Stealth (Gets to INT 13 however) |
| INT 27h | Terminate-and-Stay Resident | DOS Kernel | 0:[9Ch] | Remain in memory |
| INT 28h | DOS IDLE Interrupt | DOS Kernel | 0:[A0h] | To perform TSR action while DOS program waits for user input |
| INT 2Ah | Network Redirector | DOS Kernel | 0:[A8h] | To infect files without hooking INT 21 |
| INT 2Fh | Multiplex Interrupt | Multiple use | 0:[BCh] | Infect HMA memory, Access Disk Structures |
| INT 40h | Diskette Handler | BIOS | 0:[100h] | Anti-Behavior Blocker |
| INT 76h | IRQ 14 HD Operation | Hardware | 0:[1D8h] | Hardware Level Stealth on AT and above |

The x86 family of processors has the capability to store 256 different interrupts in the IVT.
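
The offsets in Table 5.1 follow directly from the four-byte entry size: the vector for interrupt n lives at offset 4n in segment 0. A small Python sketch of that arithmetic follows; the function name is an assumption made for illustration.

```python
def ivt_offset(int_no: int) -> int:
    """Offset of an interrupt's vector within the IVT at 0:0 (4 bytes per entry)."""
    if not 0 <= int_no <= 0xFF:
        raise ValueError("the IVT holds 256 entries, numbered 00h to FFh")
    return int_no * 4

for n in (0x13, 0x21, 0x76):
    print(f"INT {n:02X}h -> 0:[{ivt_offset(n):X}h]")
```

The first word at that offset is the handler's offset and the second word is its segment, so the whole table occupies 256 x 4 = 1,024 bytes.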

Information about the preceding interrupts (and many others) is available in the Ralf Brown Interrupt List, which offers 3,000 pages of further details. Initially, the available information about interrupts was minimal, but The Interrupt List became an essential guide for DOS virus researchers over the years and has increased understanding of undocumented interrupts.

The resident virus technique clearly has a major advantage over direct-action viruses. Resident viruses can easily infect new objects "on the fly" whenever you access them on your system. Furthermore, such viruses also can hide themselves easily using stealth techniques.

In eastern European countries, such as Bulgaria, such information was very hard to get in the pre-Internet days. In fact, programmers in such countries typically reverse-engineered DOS to figure out such details. Not surprisingly, many advanced Bulgarian viruses used DOS 2+ internal service calls, such as "Get List of Lists" (INT 21h, AH=52h), for tunneling, leaving wet-behind-the-ears virus researchers to wonder what the virus actually did with them.

5.2.2. Hook Routines on INT 13h (Boot Viruses)

An interrupt is typically used with a set of registers that define subfunction identifiers and pointers to data structures. For instance, INT 13h takes its subfunction number in AH. To read the disk, the caller must set the following registers:

  • AH = 2
  • AL = Number of sectors to read
  • CH = Cylinder
  • CL = Sector
  • DH = Head number
  • DL = Drive number
  • ES:BX = Pointer to allocated data buffer

The memory must be allocated first, and the disk needs to be reset beforehand. In fact, the diskettes are usually slow and do not spin quickly enough, requiring a couple of extra reads, with disk resets in between. It is nice to know that the hard disk numbers start at 80h (with bit 7 set).
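
The register setup listed above can be summarized in a short Python sketch that packs the values into AX, CX, and DX. This is an illustration only. One detail is an assumption not stated in the text: on the standard BIOS interface, CL carries the sector number in its low six bits and the two high bits of a 10-bit cylinder number in bits 6 and 7. The function name is hypothetical.

```python
def int13_read_regs(cylinder: int, head: int, sector: int,
                    count: int, drive: int) -> dict:
    """Pack the INT 13h, AH=02h (read sectors) parameters into 16-bit registers."""
    ax = (0x02 << 8) | (count & 0xFF)              # AH = 2, AL = sector count
    ch = cylinder & 0xFF                           # CH = low 8 bits of cylinder
    cl = ((cylinder >> 2) & 0xC0) | (sector & 0x3F)  # assumed CL bit layout
    dx = ((head & 0xFF) << 8) | (drive & 0xFF)     # DH = head, DL = drive
    return {"AX": ax, "CX": (ch << 8) | cl, "DX": dx}

# Read one sector from cylinder 0, head 0, sector 1 of the first hard disk (80h).
regs = int13_read_regs(cylinder=0, head=0, sector=1, count=1, drive=0x80)
print({name: f"{value:04X}h" for name, value in regs.items()})
```

ES:BX is left out because it is simply the segment and offset of the buffer that receives the data.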

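When the interrupt is executed, the flags and the return address are pushed to the stack. When the called interrupt handler (or chained handlers) returns with an IRET instruction, it will use the return address from the stack and restore the saved flags. The interrupt handler also can return with an RETF instruction, typically in the RETF 2 form, which discards the saved flags so that the handler can pass its own flags back to the caller.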

Boot viruses are naturally curious about the AH values passed into INT 13h. By presenting a new interrupt handler in the IVT, the virus code can easily monitor this value with a set of compare (CMP) instructions and take action according to the value.

Typically, boot viruses first save the original INT 13h handler upon execution:

	MOV	AX,[004C]	; Offset of INT 13h
	MOV	[7C09],AX	; Save it for later use
	MOV	AX,[004E]	; Segment of INT 13h
	MOV	[7C0B],AX	; Save it for later use

And boot viruses generally allocate memory just below the top 640KB boundary by manipulating the BIOS DATA area at segment 40h and by changing the 40h:13h (0:[413h]) word value that holds the top available memory. When this value is changed, no memory allocation will be possible for any programs above a newly set limit, usually one or a couple of KBs less than the previous value.
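
Because this allocation trick leaves a visible trace, the memory-size word is a natural thing to check in a captured image of low memory. The following Python sketch, which is an illustration and not a method from the original text, reads the word at 0:[413h] and reports how many kilobytes are missing relative to an expected total. The 640KB default and the function names are assumptions, and a smaller value is only a hint: some machines legitimately reserve memory at the top of conventional RAM.

```python
import struct

def base_memory_kb(low_memory: bytes) -> int:
    """Return the BIOS top-of-memory word stored at 40h:13h (linear 413h), in KB."""
    return struct.unpack_from("<H", low_memory, 0x413)[0]

def missing_kb(low_memory: bytes, expected_kb: int = 640) -> int:
    """How many KB are unaccounted for (expected_kb is an assumed baseline)."""
    return expected_kb - base_memory_kb(low_memory)

# Hypothetical capture of the first 0x500 bytes of memory, reporting 638KB.
capture = bytearray(0x500)
struct.pack_into("<H", capture, 0x413, 638)
print(base_memory_kb(bytes(capture)), missing_kb(bytes(capture)))
```

A nonzero result points the analyst at the memory just above the reported limit, which is where the relocated code would sit.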

Next, the virus copies its code to the "allocated" block and hooks the INT 13h handler. It is interesting to note that viruses such as Stoned hook INT 13h before they relocate their code to the memory, which is set as the new handler. Obviously, the virus expects that no other disk reads can take place during boot time so that the code will not crash.

Hooking a handler is therefore as simple as setting new values in the IVT.

	MOV	[004C],AX	; Set new INT 13h Offset in IVT
	MOV	[004E],ES	; Set new INT 13h Segment in IVT

The new handler of Stoned is shown in Listing 5.1.

Listing 5.1. A New Handler Installed by Stoned

	PUSH	DS		; Save DS to stack
	PUSH	AX		; Save AX to stack
	CMP	AH,02		; Disk Read?
	JB	Exit		; Jump to Exit if Below
	CMP	AH,04		; Disk Verify?
	JNB	Exit		; Jump to Exit if Not Read/Write
	OR	DL,DL		; Diskette A: ?
	JNZ	Exit		; Jump to Exit if Not
	XOR	AX,AX		; Set AX=0
	MOV	DS,AX		; Set DS=0
	MOV	AL,[043F]	; Read Diskette Motor Status
	TEST	AL,01		; Is motor on in Drive A:?
	JNZ	Exit		; Jump to Exit if It Is Already On
	CALL	Infect		; Attempt infection
 
Exit:
	POP	AX		; Restore AX from top of stack
	POP	DS		; Restore DS from top of stack
	CS:JMP	FAR [0009]	; Jump to Previously Saved Handler

Obviously, it would be unethical to illustrate the virus with more code, but the previous code should give you a good idea of hooking in general. It also shows computer virus research from the perspective of code analysis. In the past, we typically commented code on printed paper, line by line. Eventually, the prints became far too long, and analysis of code appeared to be a 100-meter tournament, so to speak. Fortunately, great tools such as IDA (the Interactive Disassembler) came to the rescue (which will be discussed in Chapter 15, "Malicious Code Analysis Techniques").

5.2.3. Hook Routines on INT 21h (File Viruses)

File viruses typically hook INT 21h on DOS, commonly by using the INT 21h subfunctions 35h and 25h (Get and Set Interrupt Vector, respectively). Not all viruses, however, need to change INT 21h's vector in the IVT itself. An example of a virus that does not change the INT 21h vector is Frodo, written in Israel in 1989. Frodo does not hook the INT 21h vector using normal methods. Instead, the virus modifies the real entry point of INT 21h by placing a jump instruction at the entry point of the original handler that leads to its own handler.

Apparently, Frodo is among the first few full-stealth file viruses on MS-DOS. (The Dark Avenger virus, Number_Of_The_Beast [1], used full file stealth techniques a few months earlier than Frodo, but Frodo made the technique famous.)

By intercepting INT 21h subfunctions, Frodo can hide file changes from DOS programs, even when they read from the file. The virus is sophisticated enough to show the original file content instead.

Let's look at the INT 21h vector on a Frodo-infected DOS system, using DEBUG.

C:\>DEBUG (We enter DEBUG.)

We dump the INT 21h vector, which holds the value 19:40EB (segment:offset in memory). Even to a trained eye, this value is not suspicious at all and looks normal. This is because memory is typically filled from the lower segments toward the higher ones, and so "segment 19" might be in DOS itself, or even before it, pointing to a low memory segment.

-d 0:84 l4
0000:0080		EB 40 19 00	.@..
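
The four bytes in the dump are stored in little-endian order, offset first and segment second. A short Python sketch (added for illustration, with hypothetical function names) decodes them into the segment:offset pair quoted above and into the corresponding 20-bit linear address.

```python
import struct

def decode_vector(entry: bytes) -> tuple:
    """Decode a 4-byte IVT entry into (segment, offset)."""
    offset, segment = struct.unpack("<HH", entry)
    return segment, offset

def linear_address(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset

seg, off = decode_vector(bytes.fromhex("EB 40 19 00"))
print(f"{seg:04X}:{off:04X} -> linear {linear_address(seg, off):05X}h")
```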
 

Next, we take a look at the handler with the unassembly command from the address we found in the IVT (see Listing 5.2).

Listing 5.2. The Jump (JMP) Instruction to Frodo's Hook Routine

-u19:40eb
 
0019:40EB EAD502209E	JMP	9E20:02D5	; Jump to VIRSEG:02d5
0019:40F0 D280FC33	ROL	BYTE PTR [BX+SI+33FC],CL
0019:40F4 7218		JB	410E
0019:40F6 74A2		JZ	409A
0019:40F8 80FC64	CMP	AH,64
0019:40FB 7711		JA	410E
0019:40FD 74B5		JZ	40B4
0019:40FF 80FC51	CMP	AH,51
0019:4102 74A4		JZ	40A8
 

The preceding code seems strange. Although we can see usual CMP (compare) instructions, the jump instruction at the entry point takes the control flow to 9E20:02D5. This code is patched there by the virus itself to take control in a sophisticated manner.
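
A check for this kind of patch can be sketched in a few lines of Python. The example below is illustrative and not taken from the original text: it tests whether the first instruction of a handler is a direct far jump (opcode 0EAh followed by a 16-bit offset and a 16-bit segment) and, if so, returns the target. The function name is hypothetical, and a far jump at a handler entry is only a reason to look closer, because legitimate resident software can chain handlers in a similar way.

```python
import struct

def decode_far_jmp(code: bytes):
    """Return (segment, offset) if code starts with JMP FAR ptr16:16, else None."""
    if len(code) >= 5 and code[0] == 0xEA:
        offset, segment = struct.unpack_from("<HH", code, 1)
        return segment, offset
    return None

# The first five bytes of Listing 5.2.
target = decode_far_jmp(bytes.fromhex("EA D5 02 20 9E"))
print(f"{target[0]:04X}:{target[1]:04X}")
```

Applied to the bytes of Listing 5.2, the decoder reproduces the 9E20:02D5 target, far above the low segment that the IVT entry itself points to.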

Finally, we can take a look at a fraction of the entry-point code of Frodo. Another unassembly command reveals the virus code in memory, as shown in Listing 5.3.

Listing 5.3. The Hook Routine Entry of Frodo

-u9e20:02d5
 
9E20:02D5 55		PUSH	BP		; Save BP
9E20:02D6 8BEC		MOV	BP,SP		; Set BP to SP
:
:
9E20:02F3 53		PUSH	BX		; Save BX
9E20:02F4 BB9002	MOV	BX,0290		; Set BX to Function Table
9E20:02F7 2E		CS:
9E20:02F8 3A27		CMP	AH,[BX]		; Is this one hooked?
9E20:02FA 7509		JNZ	0305		; Check all entries
9E20:02FC 2E		CS:			; Found a match
9E20:02FD 8B5F01	MOV	BX,[BX+01]	; offset of hook
9E20:0300 875EEC	XCHG	BX,[BP-14]	; Set the address to
						; "return" to
9E20:0303 FC		CLD
9E20:0304 C3		RET			; Run the hook routine
9E20:0305 83C303	ADD	BX,+03		; Get Next Entry
9E20:0308 81FBCC02	CMP	BX,02CC		; Are we at the end?
9E20:030C 72E9		JB	02F7		; If not, compare
 

Frodo is very tricky, however. Instead of using a simple switch statement built from compare (CMP) instructions, the virus uses a table at offset 290 of the virus segment. The virus transfers control to the subfunctions of INT 21h according to the table. Each three-byte entry in the next dump starts with a DOS subfunction number (printed in bold in the original), followed by the offset where the subhandler is located. Let's dump the memory from virus segment:290.

-d9e20:290
9E20:0290 30 7C 07 23 4E 04 37 8B-0E 4B 8B 05 3C D5 04 3D 0|.#N.7..K..<..=
9E20:02A0 11 05 3E 55 05 0F 9B 03-14 CD 03 21 C1 03 27 BF ..>U.......!..'.
9E20:02B0 03 11 59 03 12 59 03 4E-9F 04 4F 9F 04 3F A5 0A ..Y..Y.N..O..?..
9E20:02C0 40 8A 0B 42 90 0A 57 41-0A 48 34 0E 3D 00 4B 75 @..B..WA.H4.=.Ku
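
The table walk performed by the hook routine in Listing 5.3 can be replayed offline on the dumped bytes. The following Python sketch is an analysis aid written for illustration, not code from the original text. It steps through the dump three bytes at a time, from offset 290 up to the end marker 02CC that the listing compares against, and prints each subfunction number with the offset of its subhandler. The variable and function names are hypothetical.

```python
TABLE_START = 0x0290   # MOV BX,0290 in Listing 5.3
TABLE_END = 0x02CC     # CMP BX,02CC in Listing 5.3

# The 64 bytes shown in the dump at 9E20:0290.
dump = bytes.fromhex(
    "30 7C 07 23 4E 04 37 8B 0E 4B 8B 05 3C D5 04 3D "
    "11 05 3E 55 05 0F 9B 03 14 CD 03 21 C1 03 27 BF "
    "03 11 59 03 12 59 03 4E 9F 04 4F 9F 04 3F A5 0A "
    "40 8A 0B 42 90 0A 57 41 0A 48 34 0E 3D 00 4B 75"
)

def parse_hook_table(data: bytes) -> list:
    """Return (subfunction, handler_offset) pairs from the 3-byte entries."""
    entries = []
    for i in range(0, TABLE_END - TABLE_START, 3):
        subfunction = data[i]
        handler = data[i + 1] | (data[i + 2] << 8)   # little-endian word
        entries.append((subfunction, handler))
    return entries

for subfunction, handler in parse_hook_table(dump):
    print(f"AH={subfunction:02X}h -> handler at {handler:04X}")
```

The walk yields 20 entries, including AH=4Bh (EXEC), and it shows that some subfunctions share a subhandler: 11h and 12h both map to 0359, and 4Eh and 4Fh both map to 049F.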

Note

The preceding functions are listed with their descriptions in the "Full-Stealth Viruses" section.

File viruses typically infect on interception of INT 21h, AH=4Bh (EXEC). This event is among the easiest to work with from the virus's point of view: A filename is presented on a silver platter because it is passed to the function as a parameter. The most successful viruses use this trick to replicate, but many of them also infect during file open and close events. This can help the spread of the virus dramatically. For instance, a virus scanner will open all objects for scanning; this is intercepted by the virus, and the virus can infect the scanned files immediately, saying thanks for the help to the antivirus scanner. (Modern antivirus solutions, such as F-PROT, check the file size by two different means to reduce the likelihood of this kind of attack. F-PROT uses the standard "get file size" function and also seeks to the end of the file to obtain the position of the seek pointer. Then F-PROT compares the results of these two methods. If they do not match, F-PROT assumes that a stealth virus is in control. However, a full stealth strategy can be effective against even this tricky solution, unless the virus is detected in memory and its hook routine gets deactivated [2].)
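
The two-way size check described above can be sketched in Python as follows. This is only an illustration of the idea: the text does not say which system calls F-PROT used, so os.stat and a seek to the end of the file stand in for the "get file size" function and the seek-pointer method. The function name is hypothetical, and on a clean modern system the two values will simply agree.

```python
import os
import tempfile

def sizes_agree(path: str) -> bool:
    """Compare the size reported by the file system with the end-of-file position."""
    reported = os.stat(path).st_size          # stand-in for "get file size"
    with open(path, "rb") as handle:
        seek_end = handle.seek(0, os.SEEK_END)  # position of the seek pointer at EOF
    return reported == seek_end

# Demonstration on a freshly written temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 1000)
print(sizes_agree(tmp.name))
os.remove(tmp.name)
```

A mismatch would mean that something between the program and the disk answers the two questions differently, which is the behavior expected from a stealth virus that hides its size increase on only one of the two paths.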

Table 5.2 shows some of the early, in-the-wild viruses with their common interrupt hook distributions and infection characteristics.

Table 5.2. Common Interrupt Hook Distributions in Early Computer Viruses

Virus Name      Infection Characteristics                        Hooked Interrupts
Brain           DBR, Stealth                                     INT 13h
Stoned          DBR, MBR                                         INT 13h
Cascade         COM, Encrypted                                   INT 1Ch, INT 21h
Frodo           COM, EXE, Stealth                                INT 1, INT 23h, INT 21h
Tequila         Multipartite: EXE, MBR, Oligomorphic, Stealth    INT 13h, INT 1Ch, INT 21h
Yankee_Doodle   COM, EXE                                         INT 1, INT 1Ch, INT 21h

5.2.4. Common Memory Installation Techniques Under DOS

In this section you will learn about the most common techniques computer viruses use to install themselves in the memory of a DOS machine. Because DOS does not have any memory protection, viruses can easily manipulate any area of memory. Fortunately, on a DOS system, viruses cannot effectively hide themselves in memory. This is because physical memory is continuous, and short pieces of virus code can easily be found. This is why memory stealth viruses are unknown on DOS, although a number of techniques exist that attempt to install virus code in unusual places of memory to confuse antivirus products that look for virus patterns only on the code path of certain interrupt handlers or in certain areas of memory (up to 640KB, but not beyond that limit).

  • The easiest way to install a virus in memory is to not take care of memory allocations at all. This rare technique is used by the virus called Stupid. This virus simply installs itself below the 640KB memory limit, but it does not reduce the top memory field kept at 0:[413]. Thus the virus hopes that this area of memory will never be allocated for any program. Indeed, the virus would crash if some program used the same memory area.

    Occasionally this technique is improved by copying the virus code to the end of memory but not letting DOS allocate memory blocks above the start of virus code in memory. This avoids the unwanted overwrite effect that might occur.

  • A common method involves finding some sort of hole in the memory that is already allocated but rarely used. Such holes exist at a couple of locations in the DOS memory. For instance, the second half of the IVT (above 0:200h) is rarely used, so a short piece of virus code can install itself into this "hole" of the IVT.

    Obviously, such a virus is incompatible with DOS extensions and network shells that occupy interrupt vectors above 0:200h so that whenever such a shell is installed, the virus crashes.

    Other viruses, such as Darth_Vader (written by V.T. in Bulgaria), install themselves into the DOS kernel itself in a small hole of memory. A couple of other holes like this exist, and viruses that use them might not be able to spread if these places are occupied by something else.

  • Sometimes, but not often, DOS viruses use TSR (Terminate-and-Stay-Resident) functions, such as INT 27h, to allocate memory for the virus code with normal procedures. The Jerusalem virus uses this method.
  • One of the most common techniques was introduced by the boot viruses, such as Brain. The virus gets the top memory field of the BIOS data area by reading the word value at 0:[413] in memory and then decrements this value by a couple of kilobytes, reducing the 640KB limit to 639KB, 638KB, and so on. In this way, the top of the memory becomes a perfect place for the virus. Such viruses are very easy to spot in memory by checking for interrupt vectors that point to a high segment in memory.

    Boot viruses typically use this method. Occasionally, INT 12h is used to get the value of the top memory, and then the BIOS data area is manipulated to reduce the top memory to a smaller value.

  • One special technique is to manipulate the MCB (memory control block) chain of DOS. Such viruses usually extend or shrink memory blocks to attach themselves to a particular application's memory allocations in a parasitic manner. Other viruses simply allocate a new MCB and set the owner of the MCB to COMMAND.COM, the command interpreter of DOS. Cascade viruses use this technique to confuse memory map tools that can show associations of applications with allocated memory blocks.

    Some boot viruses, such as Filler, also hook INT 21h to intercept when COMMAND.COM is loaded and manipulate COMMAND.COM's MCB to make space for the virus.

  • Some early DOS viruses, such as Lehigh, allocate memory for themselves in the DOS stack area.
  • A tricky technique was introduced in the Starship viruses. These viruses install their main part above the 640KB and 1MB (UMB: upper memory block) limit of DOS. They take advantage of unused areas of the UMB memory, such as a part of the video memory that is not associated with the visible screen.

    An additional example of a virus that installs itself into UMB is Tremor, written in 1992.

  • Advanced viruses can allocate virus code into the High Memory Area (HMA) that is available when the HIMEM.SYS device is loaded. This memory area is above the 1MB boundary and is 64KB long. The GoldBug virus is an example that uses the HMA on 286 and above computers. GoldBug was written in the U.S. in 1994 by Q the Misanthrope.

    Very few viruses install themselves to memory regions such as XMS (Extended Memory Specification), but some viruses do; for example, one variant of the Ginger family. It was written in 1995 by roy g biv and RT Fishel.

    An unusual memory allocation technique is used in the Reboot Panel (INT 2Fh, AX=4A06h) to force DOS to build the Memory Control Block (MCB) around the code; viruses written by 'Q the Misanthrope' claim to use this technique [3].
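The observation that top-of-memory residents are easy to spot through their interrupt vectors can be illustrated from the defender's side. The Python sketch below works on a snapshot of the first bytes of real-mode memory (the IVT at 0:0 and the BIOS word at 0:[413]) and lists vectors that point between the reported top of memory and the 640KB boundary. The function name and the snapshot format are assumptions made for this example. Note also that a BIOS may lower the value at 0:[413] for its own data, so a hit is a reason to look closer, not proof of infection.

```python
import struct

CONVENTIONAL_TOP_SEGMENT = 0xA000  # 640KB expressed in 16-byte paragraphs

def suspicious_vectors(low_memory):
    """Return (top_kb, hits) for a snapshot of memory starting at 0:0.

    hits lists (interrupt_number, segment, offset) for every vector whose
    segment lies at or above the top of memory reported at 0:[413] but
    still below the 640KB limit.
    """
    if len(low_memory) < 0x415:
        raise ValueError("snapshot must cover the IVT and the word at 0:[413]")

    # The word at 0:[413] holds the size of conventional memory in KB.
    top_kb = struct.unpack_from("<H", low_memory, 0x413)[0]
    top_segment = top_kb * 64  # 1KB equals 64 paragraphs

    hits = []
    for number in range(256):
        # Each IVT entry is stored as offset first, then segment.
        offset, segment = struct.unpack_from("<HH", low_memory, number * 4)
        if top_segment <= segment < CONVENTIONAL_TOP_SEGMENT:
            hits.append((number, segment, offset))
    return top_kb, hits
```

For example, if the word at 0:[413] reads 638 and the INT 13h vector points to 9F80:0000, the vector is reported, because 638KB corresponds to segment 9F80h.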

5.2.4.1 Self-Detection Techniques in Memory

A common technique of self-recognition in memory is based on the use of "Are you there?" calls. Boot viruses typically do not use this technique because they only load once during the booting of the system. However, other viruses that infect files need to hook the system only once, so the virus hooks an interrupt or file system and returns specific output for special input registers. The newly executed copies of the virus can check if a previous copy is installed by calling this routine. The memory resident copy answers the call, "Yes, I am here. Do not bother to install again." Table 5.3 contains some examples of this from DOS systems.

Table 5.3. "Are You There?" Call Examples in Early Computer Viruses

Virus Name      "Are you there?" Call    Return Values
Jerusalem       INT 21h AH=E0h           AX=0300h
Flip            INT 21h AX=FE01h         AX=01FEh
Sunday          INT 21h AH=FFh           AX=0400h
Invader         INT 21h AX=4243h         AX=5678h
Nomenklatura    INT 21h AX=4BAAh         Carry Flag is Cleared

On other systems such as Windows, viruses often use RAM semaphores, such as a global mutex, that they set the first time the virus is loaded. This way, the newly loaded copies can simply quit when they are executed.

Windows 95 viruses that hook the file system in kernel mode often have similar installation checks to DOS viruses. In some cases, viruses hook I/O port access and return values on these virtual I/O ports. The W95/SK virus got its name from such an I/O port routine. The virus hooks access to I/O port 0x534B (SK) and returns 0x21 (!) when this I/O port is read. Other viruses might examine the content of the memory at a specific location, check for the existence of a filename that is created as a flag, and so on.

Early antivirus products used these calls to detect viruses in memory. Specific monitor programs were also written to simulate the "Are you there?" calls of viruses. Such solutions tricked viruses into believing that their malicious code was already installed in memory; thus the viruses never loaded again actively on the system. Such methods, however, are not general enough to be useful; only variant-specific antivirus tools might use them.

5.2.5. Stealth Viruses

Stealth viruses always intercept a single function or set of functions in such a way that the caller of the function will receive manipulated data from the hook routine of the virus. Therefore, computer virus researchers only call a virus "stealth" if the virus is active in memory and manipulates the data being returned.

Virus writers always attempt to challenge users, virus researchers, and virus scanners. Some techniques, such as antiheuristics and antiemulation, were only invented by virus writers when scanners started to get stronger; however, stealth viruses appeared very quickly.

In fact, one of the first-known viruses on the PC, Brain (a boot virus), was already stealth. Brain showed the original boot sector whenever an infected sector was accessed and the virus was active in memory, hooking the disk interrupt handling. This was in the golden days when Alan Solomon (author of one of the most widely used virus-scanning engines) was challenged to figure out what exactly was going on in Brain-infected systems.

The stealth technique also quickly appeared in DOS file infector viruses. This method was a sure way for a virus to go unnoticed for a relatively long period of time. In fact, in the DOS days, users would remember the sizes of system files in an attempt to apply their own integrity checking. By knowing the original size of a file such as COMMAND.COM, the command interpreter, a user was halfway to success in finding an ongoing infection.

According to how difficult it was to find a virus in files and what kind of method was used, virus researchers started to describe the techniques differently. The following sections depict the most common stealth techniques: semistealth, read stealth, full stealth, cluster and sector-level stealth, and hardware-level stealth.

5.2.5.1 Semistealth (Directory Stealth)

We call a virus semistealth if it hides the change of file size but the changed content of the infected objects remains visible via regular file access. The first known semistealth virus, called Eddie-2, was written in Bulgarian virus factories [4].

The semistealth technique requires the following basic attack strategy:

  1. Virus code is installed somewhere in memory.
  2. The virus intercepts file functions such as FindFirstFile or FindNextFile using FCB.
  3. It infects files of a constant size (usually).
  4. It marks infected files with a flag.
  5. When an already infected file is intercepted, the virus reduces the file size in the returned data.

Because such viruses need to determine quickly if a file is already infected, the easiest approach is to set a special marker on the file date/time stamp. One of the most popular methods was first seen in the Vienna viruses (although this virus was a direct-action type and thus did not use the trick in conjunction with stealth). Vienna sets the seconds field of the infected file's time/date stamp to an "impossible" value of 30 (which means "60 seconds") or 31 (which means "62 seconds"). This is because the MS-DOS time/date stamp is stored as a 32-bit value. The lower 5 bits (0-4) of the time/date stamp store the seconds in "compressed" form. The real seconds are divided by 2. Thus, a stored 2 translates to 4 seconds, and 29 translates to 58. However, 5 bits is enough to claim "60" and "62" seconds, which viruses could use as an infection marker.
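The seconds encoding is easy to check with a short Python sketch. It assumes the standard MS-DOS layout in which the time occupies the low 16 bits of the 32-bit stamp: bits 0-4 hold seconds divided by 2, bits 5-10 the minutes, and bits 11-15 the hours. The function names are illustrative.

```python
def decode_dos_time(stamp):
    """Decode the time part of an MS-DOS time/date stamp.

    Returns (hours, minutes, seconds); seconds are the stored field times 2.
    """
    time_word = stamp & 0xFFFF            # time lives in the low 16 bits
    seconds_field = time_word & 0x1F      # bits 0-4, stored as seconds / 2
    minutes = (time_word >> 5) & 0x3F     # bits 5-10
    hours = (time_word >> 11) & 0x1F      # bits 11-15
    return hours, minutes, seconds_field * 2

def has_impossible_seconds(stamp):
    """True if the seconds field is 30 or 31, that is, "60" or "62" seconds."""
    return (stamp & 0x1F) >= 30

# 12:34 with a seconds field of 29 is a legitimate 12:34:58.
normal = (12 << 11) | (34 << 5) | 29
# The same time with a seconds field of 31 claims 62 seconds.
marked = (12 << 11) | (34 << 5) | 31
print(decode_dos_time(normal), has_impossible_seconds(normal))
print(decode_dos_time(marked), has_impossible_seconds(marked))
```

A disinfection or integrity tool can use the same test in reverse: a file whose seconds field is 30 or 31 deserves a closer look, although some benign tools may also produce such values.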

Because FindFirstFile and FindNextFile return this information in a data structure, the infection marker is readily available when the hook routine of a stealth virus calls the original function to get proper data about the file. So there is not much overhead in figuring out whether the file is already infected, which is an advantage for the attacker. The file size in the data structure is manipulated, and the false data with the reduced file size is returned.

Semistealth is not a very common technique on modern operating systems, such as 32-bit Windows. Nevertheless, the first documented Win32 virus, W32/Cabanas, used semistealth (or so-called directory stealth).

5.2.5.1.1 VxDCall INT21_Dispatch Handler

This technique was introduced by W95/HPS [5]. W95/HPS monitors the 714Eh and 714Fh LFN (long file name) FindFirst/FindNext functions, which is mandatory under Windows 95 from the virus's point of view. The actual implementation of the stealth handler is unique. The virus patches the return address of the FindFirst/FindNext functions on the fly on the stack to point to its own handler. This handler checks that the actual program size is divisible by 101 without a remainder. If so, the virus opens the program with an extended open LFN function, reads the virus size from the last four bytes of the infected program, and subtracts this value as a 32-bit variable from the original return value of FindFirst/FindNext on the stack. Finally, it returns to the caller of the function.

In this way, the virus can hide the file size differences from most applications, and the size of the virus body does not need to be a constant value.
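From an analyst's point of view, the size arithmetic described above can be modeled on a file image to predict which size would be shown while the handler is active. The Python sketch below is such a model only. It assumes that the trailing 32-bit value is stored little-endian (the usual order on x86), and it adds a sanity check of its own that is not taken from the virus. The function name is hypothetical.

```python
import struct

MARKER_DIVISOR = 101  # per the description: size divisible by 101 marks a file

def size_shown_under_stealth(image):
    """Model the size a directory listing would show for a file image.

    Unmarked files keep their real size. For marked files, the value in
    the last four bytes is subtracted from the real size.
    """
    real_size = len(image)
    if real_size < 4 or real_size % MARKER_DIVISOR != 0:
        return real_size

    # Assumption: the stored length is a little-endian 32-bit integer.
    stored_length = struct.unpack("<I", image[-4:])[0]
    if stored_length >= real_size:
        # Implausible value; treat as a coincidental multiple of 101.
        return real_size
    return real_size - stored_length

clean = bytes(200)                                  # 200 is not divisible by 101
marked = bytes(198) + struct.pack("<I", 100)        # 202 bytes, trailer says 100
print(size_shown_under_stealth(clean), size_shown_under_stealth(marked))
```

Comparing this predicted size with the size obtained from a clean boot is one way to confirm that a directory stealth handler of this kind is active.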

5.2.5.1.2 Hook on Import Address Table (IAT)

This method was introduced by W32/Cabanas and is likely to be reused in new Win32 viruses. The same technique can work under most major Win32 platforms by using the same algorithm. The hook function is based on the manipulation of the IAT. Because the host program holds the addresses of all imported APIs in its .idata section, all the virus has to do is replace those addresses to point to its own API handlers.

First, Cabanas searches the IAT for all the possible function names it wants to hook: GetProcAddress, GetFileAttributesA, GetFileAttributesW, MoveFileExA, MoveFileExW, _lopen, CopyFileA, CopyFileW, OpenFile, MoveFileA, MoveFileW, CreateProcessA, CreateProcessW, CreateFileA, CreateFileW, FindClose, FindFirstFileA, FindFirstFileW, FindNextFileA, FindNextFileW, SetFileAttrA, or SetFileAttrW.

Whenever it finds one, it saves the original address to its own jump table and replaces the .idata section's DWORD (which holds the original address of the API) with a pointer to its own API handler.

Consider the dump shown in Listing 5.4, which illustrates the hooked GetProcAddress() and FindFirstFileA() functions.

Listing 5.4. Hooking the IAT

.text (CODE)
0041008E E85A370000		CALL	004137ED
004137E7 FF2568004300		JMP	[00430068]
004137ED FF256C004300		JMP	[0043006C]
004137F3 FF2570004300		JMP	[KERNEL32!ExitProcess]
004137F9 FF2574004300		JMP	[KERNEL32!GetVersion]
 
.idata (00430000)
00430068 830DFA77		;-> 77FA0D83 Entry of new GetProcAddress
0043006C A10DFA77		;-> 77FA0DA1 Entry of new FindFirstFileA
00430070 6995F177		;-> 77F19569 Entry of KERNEL32!ExitProcess
00430074 9C3CF177		;-> 77F13C9C Entry of KERNEL32!GetVersion
 
NewJMPTable:
77FA0D83 B81E3CF177		MOV	EAX,KERNEL32!GetProcAddress	; Original
77FA0D88 E961F6FFFF		JMP	77FA03EE			;-> New handler
.
.
77FA0DA1 B8DBC3F077		MOV	EAX,KERNEL32!FindFirstFileA	; Original
77FA0DA6 E9F3F6FFFF		JMP	77FA049E			;-> New handler

GetProcAddress is used by many Win32 applications to make dynamic calls instead of import address table-based ("static") calls. When the host application calls GetProcAddress, the new handler of the virus first calls the original GetProcAddress to get the address of the requested API. Afterward, it checks whether the function is a KERNEL32 API and whether it is one of the APIs that the virus needs to hook. If the virus wants to hook the API, it returns a new API address that will point into the hook table (NewJMPTable). Thus the host application will also get an address to the new handler.
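The addresses in Listing 5.4 can be verified by hand or with a short script. The Python sketch below decodes the little-endian DWORDs stored in .idata and computes the targets of the relative CALL and JMP instructions; the target of a 5-byte E8 or E9 instruction is the address of the next instruction plus a signed 32-bit displacement. The helper names are illustrative.

```python
import struct

def le_dword(hex_bytes):
    """Interpret four hex-encoded bytes as a little-endian 32-bit value."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]

def rel32_target(address, encoding_hex):
    """Return the target of a 5-byte CALL (E8) or JMP (E9) rel32 instruction."""
    raw = bytes.fromhex(encoding_hex)
    if len(raw) != 5 or raw[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte E8/E9 instruction")
    displacement = struct.unpack("<i", raw[1:])[0]  # signed 32-bit
    return (address + 5 + displacement) & 0xFFFFFFFF

# The .idata slot at 00430068 holds the bytes 83 0D FA 77.
print(f"{le_dword('830DFA77'):08X}")                      # patched IAT entry
# CALL at 0041008E in the host code.
print(f"{rel32_target(0x0041008E, 'E85A370000'):08X}")
# The two JMPs in NewJMPTable.
print(f"{rel32_target(0x77FA0D88, 'E961F6FFFF'):08X}")
print(f"{rel32_target(0x77FA0DA6, 'E9F3F6FFFF'):08X}")
```

The results agree with the comments in the listing: the IAT slot points into NewJMPTable instead of KERNEL32, and each table entry jumps on to a new handler. An integrity check can use the same decoding to test whether an import slot still points at the API it is supposed to name.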

W32/Cabanas is a directory stealth virus: during FindFirstFileA, FindFirstFileW, FindNextFileA, and FindNextFileW, the virus checks for already-infected programs. If a program is not infected, the virus will infect it; otherwise, it hides the file size difference by returning the original size of the host program. Because cmd.exe (the command interpreter of Windows NT) uses the preceding APIs during the DIR command, every uninfected file will be infected (if cmd.exe was infected previously by W32/Cabanas).

5.2.5.2 Read Stealth

The read stealth technique is an attack strategy that is a bit more advanced. Read stealth shows the original content of an infected object using content simulation, usually by intercepting seek and/or read functions only.

In fact, the first stealth viruses, such as Brain, use the read stealth technique. The virus simply intercepts any access to the first sector of diskettes. When the first sector is accessed and it is not infected, the virus infects it and stores the original sector elsewhere on the diskette. When an application attempts to read the infected DBR, the virus reads the originally stored DBR sector and returns that to the caller. As a result, programs accessing "the boot sector" believe that the fake sector is the true one. See Figure 5.2 for an illustration.

Figure 5.2. A read stealth computer virus.

Evidently, read stealth on the diskette is one of the simplest stealth methods. Virus writers have also implemented read stealth in file infectors on DOS. The virus does not need to do much: just intercept read and seek access in files, returning the simulated content of the file instead. For example, a prepender virus can easily intercept the open request to any infected file. Whenever any application attempts to read the content of the infected file, the virus can easily seek the position where the original file header starts. The caller application will read the host application's content without any hesitation.

5.2.5.3 Read Stealth on Windows

You might wonder what happened to read stealth viruses on Windows systems. Do you happen to know any Windows users who remember the size of any Windows application? Who would pay attention to that in these days, when typical applications are so huge that they hardly fit on a diskette? This appears to be the primary reason why there have been only a few attempts so far to develop stealth viruses on 32-bit Windows systems. The first read stealth virus on Windows 9x, W95/Sma [6], was discovered in June of 2002, about seven years after the discovery of the first 32-bit virus on Windows. Evidently, development of stealth techniques on Windows did not mature as quickly as on DOS for several reasons.

When I initially attempted to replicate the Sma virus on my test system, it was only a few minutes before I figured that I was "in the Matrix" of the virus. The "Matrix" had me, so seemingly I could not replicate it.

First I believed that I had replicated the virus. I knew that because the size of my goat files changed on the hard disk of my replication machine. I then copied the infected files to a diskette to move them over to my virus research machine. Surprisingly enough, I copied clean files. I repeated the procedure twice before I started to suspect that something was just not right with W95/Sma.

I used my Windows Commander tool on the infected machine to look into the file. Sure enough, there was nothing new in the file. In fact, the file was bigger, but nothing seemed to be appended to it. Then I accessed the file on the diskette one more time. Suddenly, the size of the file changed on the diskette also. I quickly inserted the diskette into my virus research system and saw that W95/Sma was in there. Gotcha!

The virus attempts to set the second field of the infected PE files to 4 to hide its size in such specially marked, infected files. However, there is a minor bug in the virus: It clears the bit that it wants to detect before it compares, so it will always fail to hide the size change. Infected files will appear 4KB longer, but the file seemingly does not have anything more than zeros appended to it.

Whenever an infected PE file is opened that is marked infected, the virus virtualizes the file content. In fact, it hides the changes so well that it is very difficult to see any changes at all. The virus assumes zeros for all the places where unknown data has been placed in the executable, such as the places where the decryptor of the virus would be stored in PE section slack areas. Otherwise, original content is returned for all previously modified fields of PE headers and section headers.

Evidently, if the bug were not in the code, the virus would be totally hidden from the eye. Is it? Yes and no. The virus code remains hidden from the regular file _open() and _read() functions. Consequently, when someone copies an infected file via such functions, the copy will first be "cleaned" from the virus.

W95/Sma, however, does not hook memory mapping at all. This means that a sequence of memory mapping APIs can reveal the infected file content, so the virus can easily be detected via these routines! This is definitely good news. It is unfortunate that most antivirus software is written using regular C functions for reasons of portability. Such functions, however, are all monitored properly by the virus, and as a result, some of the on-demand scanners can easily miss such infections in files.
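The detection idea in the preceding paragraph, comparing what ordinary reads return with what a memory mapping shows, can be sketched in Python as follows. This only illustrates the comparison; the function name is hypothetical, and whether the two paths really are independent depends on the operating system and on what exactly has been hooked.

```python
import mmap
import os

def read_matches_mapping(path):
    """Compare file content obtained via read() with a memory-mapped view.

    Returns True when both access paths yield identical bytes.
    """
    with open(path, "rb") as handle:
        via_read = handle.read()
        if os.fstat(handle.fileno()).st_size == 0:
            # An empty file cannot be mapped; nothing to compare.
            return via_read == b""
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            via_mapping = view[:]
    return via_read == via_mapping
```

On a clean system the function returns True for every file. A False result means that the two access paths disagree, which is exactly the symptom a read stealth hook that ignores memory mapping would produce.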

Even more interesting is the payload of W95/Sma. The virus listens on a UDP port and executes whatever datagram it receives in kernel mode. This allows the attacker to do practically anything he wants to do on the system; for example, to burn the FLASH BIOS remotely.

The next section offers information about full-stealth techniques on DOS.

5.2.5.4 Full-Stealth Viruses

Resident, file infector viruses usually hook several DOS functions. (Table 5.4 lists Frodo's hook table.) Frodo hooks several functions that return the size of the file or the content of the file. The virus increments the infected files' date stamp by 100 years, which can be easily accessed later as a virus marker.

Table 5.4. The Function Hook Table of the Full-Stealth Frodo Virus

Sub Function in AH   Function Description
30h                  Get DOS version
23h                  Get File Size for FCB (File Control Block)
37h                  Get/Set AVAILDEV flag
4Bh                  Exec Load or Execute Program
3Ch                  Create or Truncate File
3Dh                  Open Existing File
3Eh                  Close Existing File
0Fh                  Open File using FCB
14h                  Sequential Read from FCB
21h                  Read Random Record from FCB file
27h                  Random Block Read from FCB file
11h                  Find First Matching File using FCB
12h                  Find Next Matching File using FCB
4Eh                  Find First Matching File
4Fh                  Find Next Matching File
3Fh                  Read from File
40h                  Write to File
42h                  Seek to File Position
57h                  Get File Time/Date stamp
48h                  Allocate Memory

The DOS DIR command shows only the last two digits of the year in the directory, such as 1/09/89, so you can easily miss it if the file date stamp of an executable is "in the future" at, say, 2089. Frodo can easily manipulate the data that are returned to the caller based on the detection of this extra bit of information. Whenever an application such as a virus scanner or an integrity checker tries to check the size of the file or its content, false data are returned based on the virus marker. The virus decrements the file size of each file that has a date stamp greater than or equal to the year 2044 by 4,096 bytes (the size of the virus). Evidently, this trick only works correctly before 2008. This is because the virus adds 100 years to the date, and the DOS date runs out at 2107; beyond that the date wraps around, and Frodo starts to fail. Interestingly, the virus fails even more from 2044 on because it can no longer distinguish between infected and clean files: all files are believed to be infected, and their size is incorrectly reduced by 4,096 bytes, Frodo's file infection size. (Jokingly, I can say that over time, even viruses show signs of getting old, though, of course, this is not a real-world concern of the author of the virus.)
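The year arithmetic behind these limits can be worked through in a few lines of Python. The sketch relies on the MS-DOS date word storing the year in 7 bits as an offset from 1980, which gives the maximum of 2107. Modeling the overflow as a wrap modulo 128 is an assumption made here to illustrate "the date wraps around"; the function names are illustrative.

```python
DOS_EPOCH_YEAR = 1980
YEAR_FIELD_MASK = 0x7F  # the year occupies bits 9-15 of the DOS date word

def dos_year(date_word):
    """Extract the calendar year from a 16-bit MS-DOS date word."""
    return DOS_EPOCH_YEAR + ((date_word >> 9) & YEAR_FIELD_MASK)

def add_100_years(year):
    """Add 100 years as a 7-bit year field would (assumed to wrap mod 128)."""
    return DOS_EPOCH_YEAR + ((year - DOS_EPOCH_YEAR + 100) & YEAR_FIELD_MASK)

def looks_marked(year):
    """The marker test described in the text: year 2044 or later."""
    return year >= 2044

for real_year in (1989, 2007, 2008):
    stamped = add_100_years(real_year)
    print(real_year, "->", stamped, "shown as", f"{stamped % 100:02d}",
          "marked" if looks_marked(stamped) else "not marked")
```

Under this model a file dated 1989 becomes 2089 and still shows as 89 in a two-digit listing, 2007 becomes 2107 (the last representable year), and 2008 wraps to 1980, so the marker is lost. Independently of the wrap, any clean file with a real date of 2044 or later passes the marker test.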

5.2.5.5 Cluster and Sector-Level File Stealth

The Bulgarian Number_of_the_Beast virus uses a remarkably advanced stealth technique. The virus infects files, but it hides the changes in them by hooking INT 13h (the BIOS disk handler). It infects the fronts of the files using the classic parasitic technique and will not infect a file if the last cluster occupied by the host does not have at least 512 bytes of free space.

The idea is simple. It is based on the fact that most DOS disks will be formatted with cluster sizes of 2,048 bytes. As a result, this will be the minimum size occupied by a file, even if it is only a couple of bytes long. This means that a cluster, as well as a sector slack space (usually less than 512 bytes), exists in which no content is saved by the system. Number_of_the_Beast uses this space to store the overwritten part of the host program. Even when the virus is not active in memory, the size of the file remains the same as it was before infection because the file size is displayed according to the directory entries.</