Optimization (computer science)
In computing, optimization modifies a system to improve its efficiency. This either reduces the resources the system consumes or increases the performance.The system can be a single computer program, a collection of computers, or even an entire network such as Internet. The optmization might be to reduce the maximum execution time, memory use, bandwidth, or some other resource. Those objectives can be mutually exclusive, and require a tradeoff.
In operations research, optimization is the problem of determining the inputs of a function that minimize or maximize its value. Sometimes constraints are imposed on the values that the inputs can take; this problem is known as constrained optimization.
In computer programming, optimization usually specifically means to modify code and its compilation settings on a given computer architecture to produce more efficient software.
Usually, the term "optimization" presumes that the system retains the same functionality. However, often a crucial optimization is to solve only the actual problem, removing useless code.
Although the word "optimization" shares the same root as "optimal," economical optimizations can rarely find a truly optimal system.
Typical problems have such a large number of possibilities that a programming organization can only afford a "good enough" solution.
Some authorities define these terms:
- code optimization or code improvement -- is used in some academic environments to mean the most effective and efficient possible code. These authorities prefer the term "code improvement" to avoid confusion with performance tuning.
- performance tuning is said to include both "code improvement" and the scalability now often needed with network-based computing and large-scale software projects.
| Table of contents |
|
2 Systemic Optimization 3 Algorithms and data structures 4 Manual optimization 5 Automated Optimizations 6 Subpages 7 References 8 Related terms |
When do I want to do optimization?
It is well told that optimization often undermines readibility and adds code that is used only to improve the performance. This may complicate programs or systems, making hard to maintain and debug. Compiler optimization, for example, may introduce weird behavior because of compiler bugs. Because of that, it is preferable that optimization or performance tuning is done in the end of development stage. In other words, often systems or programs perform poorly in middle of developing.
Load balancing spreads the load over a large number of servers. Often load balancing is done transparently (i.e., without users noticing it), using a so-called layer 4 router.
Caching stores intermediate products of computation to avoid duplicate computations.
Optimizing a whole system is usually done by human beings because the system is too complex for automated optimizers. Grid computing or distributed computing aims to optimize the whole system, by moving tasks from computers with high usage to computers with idle time.
Usually, the more memory the program uses, the faster program runs. Take a filtering program. The common practice in such a program is read each line and filter and output that line in the same time. The memory is only needed for one line, but typically the performance is poor. To improve the performance, read the entire file then output the filtered result. This typically improve the peformance dramatically but causes heaviy memory use. Caching the result is also effective though requiring some or huge memory use.
Code optimization usually starts with a rethinking of the algorithm used in the program: more often than not, a particular algorithm can be specifically tailored to a particular problem, yelding better performance than a generic algorithm. For example, the task of sorting a huge list of items is usually done with a quicksort routine, which is one of the most efficient generic algorithms. But if some characteristic of the items is exploitable (for example, they are already arranged in some particular order), a different method can be used, or even a custom-made sort routine.
After one is reasonably sure that the best algorithm is selected, code optimization can start: loops can be unrolled (for maximum efficiency of a processor cache memory), data types as small as possible can be used, an integer arithmetic can be used instead of a floating-point one, hash tables can replace linear vectors, and so on.
Performance bottlenecks can be due to the language rather than algorithms or datastructures used in the program. Sometimes, a critical part of the program can be re-written in a different, faster programming language. For example, it is common for very high-level languages like Python to have modules written in C, for a greater speed. Programs already written in C can have modules written in assembly. See subpages for each language-specific optimization:
Manual optimization often has the side-effect of undermining readability. Thus code optimizations should be carefully documented and their effect on future development evaluated.
Today, automated optimizations are almost exclusively limited to compiler optimization. Compiler optimization is used to improve the efficiency (in terms of running time or resource usage) of the code output by a compiler. These techniques allow programmers to write code in a straightforward manner, expressing their intentions clearly, while allowing the computer to make choices about implementation details that lead to efficient code. Contrary to what the term might imply, this rarely results in code that is perfectly "optimal" by any measure, only code that is much improved compared to direct translation of the programmer's original code.
Further problems with optimizing compilers are:
In the early times of computer science, compiler optimizations were not as good as hand-written ones. As compiler technologies have improved, good compilers can often generate better code than human programmers and good post pass optimizers can improve highly hand-optimized code even further. In the RISC CPU architecture, compiler optimization is the key for obtaining an optimal code, because the RISC instruction set is so compact that it is hard for a human to manually schedule or combine small instructions to get efficient results.
Techniques in optimization can be broken up along various dimensions:
; local vs. global: Local techniques tend to be easier to implement, but result in lesser gains, while global techniques make the opposite tradeoff. Often, optimizations that act on the complete control flow graph are termed global, and those that work on a basic block alone are termed local.
; programming language-independent vs. language-dependent: Most high-level languages share common programming constructs and abstractions--decision (if, switch, case), looping (for, while, repeat .. until, do .. while), encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However certain language features make some kinds of optimizations possible and/or difficult. For instance, the existence of pointers in C and C++ makes certain optmizations of array accesses difficult. Conversely, in some languages functions are not permitted to have "side effects". Therefore, if repeated calls to the same function with the same arguments are made, the compiler can immediately infer that results need only be computed once and the result referred to repeatedly.
; machine independent vs. machine dependent: A lot of optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targetted by the compiler. But many of the most effective optimizations are those that best exploit special features of the target platform.
These dimensions aren't completely orthogonal. It is often the case that machine dependent optimizations are local, for example.
An instance of a local machine dependent optimization: to set a register to 0, the obvious way is to use the constant 0 with the instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors such as the Intel x86 family, it turns out that the XOR variant is shorter and maybe faster (no need to decode an immediate operand nor use the internal "immediate operand register").
; The machine itself
It is sometimes possible to parametrize some of these machine dependent factors. A single piece of compiler code can be used to optimize different machines just by altering the machine description parameters. See GCC for a compiler that exemplifies this approach.
; The architecture of the target CPU
; The architecture of the machine
; Debugging : A main factor is speed of compilation. Also, debugging code is usually stepped through in a symbolic debugger -- so it is useful not to apply transformations that make it difficult to identify the source code line numbers from which the code being stepped through was generated from.
; General purpose use : Prepackaged software is very often expected to be executed on a variety of machines and CPUs that may share the same instruction set, but have different timing, cache or memory characteristics. So, the code may not be tuned to any particular CPU, or may be tuned to work well on the most popular CPU and work reasonably on other CPUs.
; Special purpose use : If the software is compiled to be used on one or a few very similar machines, with known characteristics, then the compiler can heavily tune the generated code to those machines alone.
To a large extent, optimization techniques have the following themes, which sometime conflict
; Less code : There is less work for the CPU, cache, and memory. So, likely to be faster.
; Straight line code, fewer jumps : Less complicated code. Jumps interfere with the prefetching of instructions, thus slowing down code.
; Code locality : Pieces of code executed close together in time should be placed close together in memory, which increases spatial locality of reference.
; Extract more information from code : The more information the compiler has, the better it can optimize.
Some optimization techniques are:
; loop unrolling : Attempts to reduce the overhead inherent in testing a loop's condition by replicating the code in the loop body. Completely unrolling a loop, of course, requires that the number of iterations be known at compile time.
; loop combining : Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.
; loop interchange : (swapping inner and outer loops)
; common subexpression elimination : In the expression "(a+b)-(a+b)/4", "common subexpression" refers to the duplicated "(a+b)". Compilers implementing this technique realize that "(a+b)" won't change, and as such, only calculate its value once.
; test reordering : if we have two tests that are the condition for something, we can first deal with the simpler tests (e.g. comparing a variable to something) and only then with the complex tests (e.g. those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another.
; constant folding and propagation : replacing expressions consisting of constants (e.g. "3 + 5") which their final value ("8") rather than doing the calculation in run-time. Used in most modern languages.
; inlining of procedures : when some codes invokes a procedure, it is possible to put the body of the procedure inside this code rather than invoking it from another location. This saves the overhead related to procedure calls, but comes at the cost of duplicating the function body each time it's called inline. Generally, inlining is useful in performance-critical code that uses makes a lot of small procedure calls. If any parameters to the procedure are constants known at compile time, inlining may result in more constants to propagate.
; instruction scheduling :
; unreachable code elimination: Dead code refers to code segments which can never be executed. Dead code elimination prevents the compiler from emitting code for such segments, saving on CPU instruction cache.
; dead code elimination: Remove instructions that will not affect the behaviour of the program. For example in "int myFunc(int a){int b,c; b=a*2; c=a*3; return b;}" the statement "c=a*3" doesn't do anything useful and can be removed.
; using CPU instructions with complex offsets to do math : on many processors in the 68000 family, for example, "lea 25(a1,d5*4),a0" assigns to the a0 register 25 + the contents of a1 + 4 * the contents of d5 in a single instruction and without an explicit move or overwriting a1 or d5
; code-block reordering: reduce conditional branches and improve locality of reference
; factoring out of invariants : if an expression is carried out both when a condition is met and otherwise, it can be written just once outside of the conditional statement. Similarly, if certain types of expressions (e.g. the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter if they're executed many times or just once. Also known as total redundancy elimination. A more powerful optimization is partial redundancy elimination (PRE).
; removing recursion : recursion is often expensive as it involves the function call overhead, and inconvenient as it can rapidly deplete the stack. Tail recursive algorithms can be converted to iteration, which does not have call overhead and uses a constant amount of the stack.
; strength reduction : a general term encompassing optimisations that replace complex or difficult or expensive operations with simpler ones. A classic example is array indexing in C. Say I have the following code:
; register optimization: The most frequently used variables should be kept in processor registers for fastest access. To find which variables to put in registers a interference-graph is created. Each variable is a vertex and when two variables are used at the same time (have an intersecting liverange) they have an edge between them. This graph is colored using for example Chaitin's algorithm using the same number of colors as there are registers. If the coloring fails one variable is "spilled" to memory and the coloring is retried.
; Stack height reduction: Rearrange expression tree to minimize resources needed for expression evaluation.
; Branch offset optimization (machine independent): Choose the shortest branch displacement that reaches target
Systemic Optimization
The architectural design of a system can overwhelmingly affect its performance.Algorithms and data structures
The choice of algorithm affects efficiency more than any other item of the design. Usually, more complex algorithms and data structures perform well with many items while simple algorithms are more suitable to small amounts of data. For small sets of data, the set-up and initialization time of the more complex algorithm can outweigh the benefit of the better algorithm.Manual optimization
In this technique, programmers or system administrators explicitly change code so that the system performs better. Although it can produce better efficiencies, it is far more expensive than automated optimizations.
Rewriting pays off because of a law known as the 90/10 law, which states that 90% of the time is spent in 10% of the code, and only 10% of the time in the remaining 90% of the code. So optimizing just a small part of the program can have a huge effect on the overall speed.Automated Optimizations
The program that does the automated optimization is called an optimizer. Most optimizers are embedded in compilers and operate during compilation. Optimizers often can tailor the generated code to specific processors.
This is where so-called "post pass" optimizers come in. These tools take the program code output by an "optimizing" compiler and optimize it even further (see http://www.absint.com/aipop for an example of such tool). As opposed to compilers which optimize intermediate representations of programs, post pass optimizers work on the Assembly language level.Factors affecting optimization
Compilers have to schedule instructions such that the pipelines don't stall, or stalls are reduced to a minimum.
Here again, instructions have to be scheduled so that the the various functional units are fully fed with instructions to execute.Intended use of the generated code
Optimization techniques
Common Themes
Techniques
/* set all elements of a to x */
for(i=0; i
}
for(i=0; i
}
for(i=0); i
}
t2 = 0
for(i=0; i
}
; reduction of cache collisions : (e.g. by disrupting alignment within a page) Subpages
References
Related terms