Jan Bergstra, Alban Ponse
Advances in Grid Computing - EGC 2005, LNCS 3470, pages 1097-1106. Springer-Verlag, 2005
ISBN 3-540-26918-5
2005
Download PDF (106.73Kb) (You need to be registered on forum)Extended Version for SSN - 29 November 2004
Jan Bergstra, Alban Ponse
University of Amsterdam, Programming Research Group, Kruislaan 403,
1098 SJ Amsterdam, The Netherlands
www.science.uva.nl/research/prog/
Jan Bergstra
Utrecht University, Department of Philosophy, Heidelberglaan 8,
3584 CS Utrecht, The Netherlands
www.phil.uu.nl/en/
Detecting illegal resource access in the setting of network communication or grid computing is similar to the problem of virus detection as put forward by Fred Cohen in 1984. We disucuss Cohen's impossibility result on virus detection, and introduce "risk assessment of security hazards", a notion that is decidable for a large class of program behaviors.
Keywords: Malcode, Program algebra, Polarized process algebra, Virus, Worm.
Virus, Worm, Trojan horse, Malcode.... There is a vast amount of literature on these matters, and opinions seem to differ wildly. Many authors agree that malcode contains all others, and that both a virus and a worm can replicate.1 Furthermore, a worm is more autonomous than a virus. Some authors claim that a virus can only replicate as a consequence of actions of users, and that sound education and awareness can protect users from acting with such effect. So, a virus uses a user for its replication; that user may or may not be a victim of the virus' harmful action at the same time. Unclear is if each of these users must be a human one or if background processes in a machine can also be "used" as users.
This paper is about virus detection and discusses two fundamental questions. First, we consider Cohen's result about the impossibility of a uniform tool (or algorithm) for detecting viruses in all programs [4]. This is done in the setting of the program algebra PGA [1]. Then, we define an associated notion of testing -- security hazard risk assessment -- with which the occurrence of security hazards is decidable for a large class of program behaviors. However, if divergence (the absence of halting) is considered also as a security hazard, decidability is lost.
The paper is organized as follows: in Section 2, we introduce some basics of program algebra. Then, in Section 3, we consider Cohen's impossibility result and some related issues. In Section 4 we introduce our notion of security risk assessment. In Section 5 we discuss a variant of the Halting Problem that applies to the present setting. The paper is ended with an Appendix.
Program algebra (PGA, [1]) provides a very simple notation for sequential programs and a setting in which programs can be systematically analyzed and mapped onto behaviors. Program behaviors are modeled in BPPA, Basic Polarized Process Algebra. Finally, we consider in this section some other program notations based on program algebra.
In PGA we consider basic instructions
, given by some collection
. Furthermore, for each
there is a positive test instruction
and a negative test instruction
. The control instructions are termination, notation !, and (relative) jump instructions
. Program expressions in PGA, or shortly PGA-programs, have the following syntax:
and
are PGA-programs, so is their concatenation
,
is a PGA-program, so is its repetition
.The behavior associated with the execution of PGA-programs is explained below in Section 2.2. Instruction congruence of programs has a simple axiomatization, given in Table 1.
Table 1: Axioms for PGA's instruction sequence congruence
The axioms PGA1-4 imply Unfolding, i.e. the law
, and PGA2-4 may be replaced by Unfolding and the proof rule
.
Execution of PGA-programs is modeled in BPPA, Basic Polarized Process Algebra. Given
, now considered as a collection of actions, it is assumed that upon execution each action generates a Boolean reply (true or false). Now, behavior is specified in BPPA by means of the following constants and operations:
represents (successful) termination.
represents the situation in which no subsequent behavior is possible. (Sometimes
is called deadlock or divergence.)
and behavioral expressions
and
in BPPA, the post conditional composition
describes the behavior that first executes action
, and continues with
if true was generated, and
otherwise.
and behavioral expression
, the action prefix
describes the behavior that first executes
and then continues with
, irrespective of the Boolean reply. Action prefix is a special case of post conditional composition:
.The behavior extraction operator
assigns a behavior to program
. Instruction sequence equivalent programs have of course the same behavior. Behavior extraction is defined by the thirteen equations in Table 2, where
and
is a PGA-instruction.
Table 2: Equations for behavior extraction on PGA
Some examples:
and, further taking action prefix to bind stronger than post conditional composition,
In some cases, these equations can be applied (from left to right) without ever generating any behavior, e.g.,
In such cases, the extracted behavior is defined as
.
It is also possible that behavioral extraction yields an infinite recursion, e.g.,
, and therefore,
In such cases the behavior of
is infinite, and can be represented by a finite number of behavioral equations, e.g.,
and
Note. Observe that the following identity holds:
. This identity characterizes that for a finite program object (i.e., a finite sequence of instructions), a missing termination instruction yields inaction. Conversely, this identity makes six out of the thirteen equations in Table 2 derivable (namely, those for programs of length 1 and the equation
).
The program notation PGLB is obtained from PGA by adding backwards jumps
and leaving out the repetition operator. For example,
behaves as
. This can be defined with help of a projection function pglb2pga that translates PGLB-programs in a context-dependent fashion to PGA-programs. For a PGLB-program
we write
(see further [1]).
The language PGLC is the variant of PGLB in which termination is modeled implicitly: a program terminates after the last instruction has been executed and that instruction was no jump into the program, or after a jump outside the program. The termination instruction ! is not present in PGLC. For example,
for
(see [1] for precise definitions of
and
.)
In this section we introduce the setting in which we will analyze code security risks. We then recall Cohen's impossibility result on forecasting security hazards and draw some conclusions.
Let
be some behavior that uses communication with other devices -- further called reactors --
,
and
:
Such communication will be modelled using "focus-method" notation: the reply of a basic instruction
will be determined by the reactor
. Likewise, instructions with focus
or
communicate with
and
, respectively.
Let furthermore skip be the identity on
. Now, execution is secure if no
is called until termination or first call of some
(to the external focus). A behavior can have low risk actions (secure execution expected) and high risk actions (insecure execution expected). For example,
In this section we discuss the impossibility of a tool (algorithm) that forecasts security hazards. Let SHFT be a Security Hazard Forecasting Tool with focus
, thus a reactor that forecasts a security hazard. As assumed earlier, a security hazard is in this simple setting a call (action)
for some
. Furthermore, let
be the test that uses SHFT in the following way: in
the action
returns true if
has a security hazard, and false if
has no security hazard.
Theorem 1. A Security Hazard Forecasting Tool cannot exist.
Proof. Consider
.
If the test action
returns false,
will
be performed, which is a security hazard; if true is returned, then
is performed and no security
hazard arises.

The behavior in the proof above illustrates the impossibility of prediciting that a behavior (or a program) contains a virus, a general phenomenon that was described in Cohen's seminal 1984-paper [4] and that will be further referred to as Cohen's impossiblity result. For the sake of completeness, we recall Cohen's line of reasoning. In the pseudo-code in Figure 1 (taken from [4]), D is a decision procedure that determines whether a program is (contains) a virus, ~D stands for its negation, and next labels the remainder of some (innocent) program.
{1234567;
subroutine infect-executable:=
{loop:file = get-random-executable-file;
if first-line-of-file = 1234567 then goto loop;
prepend virus to file;
}
subroutine do-damage:=
{whatever damage is to be done}
subroutine trigger-pulled:=
{return true if some condition holds}
main-program:=
{if ~D(contradictory-virus) then
{infect-executable;
if trigger-pulled then do-damage;
}
goto next;
}
}
Figure 1: Cohen's program contradictory-virus
In PGLC, the program contradictory-virus can be represented by the following term CV:
where Pre abbreviates the six instructions that model the security hazard:
and Next models the remainder of the program. Applying behavior extraction on this program yields
So,
is indeed a faithful characterization of Cohen's impossibility result.
Even with the aid of universal computational power, the problem whether a behavior has a security hazard (issues an
call) is undecidable. In Section 5.2 we give a proof of this fact.
Alternatively, we can modify our definitions concerning forecasting security hazards:
, SHFT returns false.
, then a negative reply (false) is always correct (using the definition that malcode is code for which the extracted behavior contains a security hazard).Cohen's impossibility result needs the notion of a secure run (no security hazards), as well as a secure program or behavior (a behavior that will have secure runs only). So, Cohen's impossibility result emerges if:
Now there is a difficulty with forecasting: if
returns false one hopes to proceed in such a way that the security hazard is avoided (why else do the test?). But that is not sound as was shown above. Thus we conclude: this type of forecasting security hazards is a problematic idea for security assessment.
Yet another perspective on these problems is to consider the option that test actions
may also yield a third truth value
(Meaningless). At the same time we can avoid the problem that the true-reply of the test
is about a different behavior (the left-argument) as the false-reply, making such a type of test (a little) more plausible. So, consider the following modification of
: in
the action
returns true if
has a security hazard, false if both
and
have no security hazard, and
in all other cases. `the behavior associated with the reply
can be taken
, but other options are possible (see e.g. [3]). This seems a more consistent definition. However, in
the test action
returns
, showing that forecasting security hazards in the style of Cohen remains a problematic issue.
In this section we introduce a security risk assessment tool, taking into account the above-mentioned considerations. This tool turns out to be a more plausible modeling of testing the occurrence of security hazards. However, if we add divergence (the absence of halting) as a security risk, this tool can not exist.
The following security risk assessment tool SHRAT with focus
may be conceived of as assessing a security hazard risk. In
the test action
returns true if
is secure, and false if
is insecure (then
is avoided and
is done instead). This seems to be a more rational test, because it only tests a property of a single behavior (its left argument). Using an external focus
, the test action
in
yields true because
is seen as an action that is beyond control of assessing security hazards.
For testing
actions we can employ a backtracking model: at
,
instead
So, for a behavior
, first evaluate the leftmost
test action. If this yields true, then the rightmost does as well, otherwise evaluate
. For finite behaviors this is a terminating procedure, and not problematic. Some examples of the reply of
:
Evaluation of
actions can be extended to a larger class of behaviors. A polarized regular behavior over
is defined by a finite system of equations over
(for some
) of the following form:
with
where
and
.
Consider
, thus
. Again we can decide the outcome of the test action
by doing a finite number of substitutions, linear in
. (Loops and divergence are not considered security hazards.) This leads us to the following result:
Theorem 2. For regular behaviors, the tool SHRAT is possible.
We give an example: if
then
in
yields true if it does in both
Obviously, it does not in the latter case, so this behavior equals
.
So, evaluation of the reply of
is decidable for regular behaviors. This even remains the case if a stack is added as a reactor (based on the decidability of DPDA-equivalence [5]). We conclude that Cohen's impossibility result does not apply in this case; apparently, that result is about forecasting. Of course, the decidability of the reply of
actions is lost if a Turing Tape is used as a reactor (see Section 5).
If we consider divergence as a security hazard, say by focus
and reactor DRAT (Divergence Risk Assessment Tool), we have a totally different situation: in the behavior defined by
we then obviously want that the test action
returns the answer false. It is well-known that (in general) DRAT can not exist, as it would solve the Halting Problem (further discussed in Section 5).
Now, involving divergence as a security hazard in
actions, we also find that in
the test should yield false (otherwise divergence). However, this yields a problem: in
this goes wrong: the termination problem (Turing impossibility result) "wins", and hence the backtracking model is not suitable anymore. We conclude that SHRAT (a Security Hazard Risk Assessment Tool) does not exist if
(divergence) is considered a security hazard.
In this section we elaborate on a variant of the Halting Problem, which we call the Security Hazard Property (SHP), formalizing the situation that the execution of a certain program in a certain initial state establishes a security hazard. In order to give a concise presentation, we rely on a few notations and explanations given in [3], and we will be very precise about the part of that paper that is used below.
Behaviors as considered in the above arise from PGA-programs (or programs in PGLB or PGLC). A more fundamental view on generators of behavior is the so-called SPI, the Sequence of Primitive Instructions. Of course, each program in PGA, PGLB or PGLC represents a SPI, but not each (computable) SPI can be represented by a program in one of the above-mentioned program notations, a simple example being
The above SPI defines a behavior that is not regular, and because each behavior definable by a PGA-program (PGLC-program) is regular, it is clear that we need an extended setting to specify such behaviors.
One such extension is the use of reactors, as was sketched in the above. In [2] we provide a formal treatment of the (potential) interaction of a behavior
with reactors
,
and
. Notation for that situation is the expression
or equivalently,
.
The operator
(that takes as its left-argument a behavior and as its right-argument a reactor) is called the use operator, where
is some dedicated focus. In the previous part of the paper we considered all communications of
with a reactor
implicit and wrote
instead. In other words, an expression like
as occurring in the previous part of this paper is considered to abbreviate
and this type of interaction is formalized in [2]. Furthermore, the variants of PGA and PGLC that use instructions with a focus explicitly intended for use-applications are called PGA:FMN and PGLC:FMN, where FMN abbreviates focus-method notation.
In the next section we prove that even with the availability of Turing computational power, it is undecidable that security hazards can be predicted. For this proof we use the notation and terminology explained in [3, Sections 3 and 4]. However, the material presented in Section 3.4 of [3] is not used in this exposition.
The Security Hazard Property (SHP) can be modeled as follows: a PGLCi:FMN program
executing on the ETMT with initial configuration
(
a bit sequence) has a security hazard, notation
, if
issues an action of the form
We stipulate that program
solves the question whether
in the following way:
where
is stored as a bit sequence always halts, and after halting, the tape configuration is of the form
for some 
has a security hazard, thus
,
for some 
has no security hazard, i.e.,
.Of course, it is assumed that the program
in no circumstance itself issues an
action, i.e.,
for all tape configurations
, and that its execution always terminates.
Theorem 3. The security hazard property is unsolvable by means of any program in PGLCi:FMN.
Proof. Suppose the contrary, i.e., there exists a program
that solves SHP. Let program
be defined as follows:
with
. Consider the question
. We show below that both assumptions
and
lead to a contradiction. Hence,
cannot exist, and thus
cannot exist.
First, assume that
. Then
Because
, the program
first executes
(which terminates successfully by assumption) and then starts with the first instruction of
. Thus,
for some string
. The remaining behavior is displayed in Figure 2, and results in
This last AnArch clearly represents a security hazard because of the first instruction
, and therefore
. Contradiction.
Figure 2: Critical state of the behavior in case
in the proof of Thm. 3
Now assume that
. The resulting computation is displayed in Figure 3 (for some string
). Here the last configuration represents halting without having executed any
action, and therefore
and again a contradiction occurs.
Figure 3: The case that
in the proof of Thm. 3
So our supposition was definitely wrong, i.e., there is no program
that solves the security hazard property.
Ball State University. What is a Computer Virus?2 A computer virus is a type of computer program that is designed to hide in the background and replicate itself from one computer to another by attaching itself to existing programs or parts of the operating system. Users can unknowingly transmit a virus from one system to another by copying infected program files from one machine to another or by using an infected program downloaded from the Internet. Computer viruses often infect many programs on the same system or even parts of the operating system in an attempt to spread themselves as far as possible.
A computer virus will often have a "trigger" such as a specific date or a specific program being run that will perform a benign event such as flashing a message on the users screen. Some viruses will perform more malicious deeds however, deleting or scrambling users files or their entire system. Some viruses also slow down a users system, disable certain functions, or cause erratic system behavior.
Computer Worm (Definition).3 Worms are very similar to viruses in that they are computer programs that replicate themselves and that often, but not always, contain some functionality that will interfere with the normal use of a computer or a program.
The difference is that unlike viruses, worms exist as separate entities; they do not attach themselves to other files or programs. A worm can spread itself automatically over the network from one computer to the next. Worms take advantage of automatic file sending and receiving features found on many computers.
Webopedia. Virus.4 A program or piece of code that is loaded onto your computer without your knowledge and runs against your wishes. Viruses can also replicate themselves. All computer viruses are manmade. A simple virus that can make a copy of itself over and over again is relatively easy to produce. Even such a simple virus is dangerous because it will quickly use all available memory and bring the system to a halt. An even more dangerous type of virus is one capable of transmitting itself across networks and bypassing security systems.
Since 1987, when a virus infected ARPANET, a large network used by the Defense Department and many universities, many antivirus programs have become available. These programs periodically check your computer system for the best-known types of viruses.
Some people distinguish between general viruses and worms. A worm is a special type of virus that can replicate itself and use memory, but cannot attach itself to other programs.
Worm.5 A program or algorithm that replicates itself over a computer network and usually performs malicious actions, such as using up the computer's resources and possibly shutting the system down.
Trojan horse.6 A destructive program that masquerades as a benign application. Unlike viruses, Trojan horses do not replicate themselves but they can be just as destructive. One of the most insidious types of Trojan horse is a program that claims to rid your computer of viruses but instead introduces viruses onto your computer.
The term comes from a story in Homer's Iliad, in which the Greeks give a giant wooden horse to their foes, the Trojans, ostensibly as a peace offering. But after the Trojans drag the horse inside their city walls, Greek soldiers sneak out of the horse's hollow belly and open the city gates, allowing their compatriots to pour in and capture Troy.
1 See Appendix A for some quotes about these topics.
2 www.bsu.edu/ucs/article/0,1370,6254-1978-3303,00.html
3 www.bsu.edu/ucs/article/0,1370,6254-1978-4488,00.html
4 www.webopedia.com/TERM/v/virus.html
5 www.webopedia.com/TERM/w/worm.html
6 www.webopedia.com/TERM/T/Trojan_horse.html
[Back to index] [Comments (0)]