Assembly Program Performance Analysis Metrics: Instructions Performed and Program Latency Exemplified on Loop Unroll
Jonathan Paul Cempron, Chudrack Shalym Salinas*, and Roger Luis Uy
Computer Technology Department
De La Salle University, Taft Avenue, Manila, Philippines
Software programs can be optimized for faster execution by modifying the program itself. Programs are usually written in High-Level languages and later translated into Low-Level languages, which are specific to the processor used. Optimizing at the Low-Level rather than the High-Level gives broader coverage, because all High-Level languages are eventually translated to Low-Level. One established method is Loop Unrolling, which transforms iterative looping blocks into longer sequential code blocks. This optimization increases code length but reduces the number of branching instructions and the latencies they introduce. However, the performance difference between the original code and the loop-unrolled code cannot be exposed by current static performance metrics, which rely on instruction count (IC). Because of this limitation of IC-based metrics, two alternative metrics, Instructions Performed and Instruction Latency, are proposed for examining the effectiveness of an optimization. As an extension, loop unrolling is also discussed in this paper as a pre-processing step for auto-vectorization. The specific methods of vectorization, however, are outside the scope of this paper.
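The trade-off described above (longer code, fewer branches) can be illustrated with a minimal C sketch. This is not the paper's own code, which targets assembly; the function names `sum_rolled` and `sum_unrolled4`, the unroll factor of 4, and the `branches` counter are assumptions made purely for illustration. The counter stands in for the loop-back branch that a compiler would typically emit once per iteration.

```c
#include <assert.h>

/* Rolled loop: one loop-back branch per array element.
 * `branches` is a hypothetical counter added only to make the
 * number of executed loop iterations visible. */
static long sum_rolled(const int *a, int n, int *branches)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
        (*branches)++;      /* one iteration, hence one loop-back branch */
    }
    return s;
}

/* Same computation unrolled by a factor of 4: the loop body is four
 * times longer, but the loop-back branch runs once per four elements.
 * Assumption: n is a multiple of 4 (no remainder loop is shown). */
static long sum_unrolled4(const int *a, int n, int *branches)
{
    assert(n % 4 == 0);
    long s = 0;
    for (int i = 0; i < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
        (*branches)++;      /* one iteration now covers four elements */
    }
    return s;
}
```

Called on the eight-element array {1, 2, ..., 8}, both functions return the same sum, but the rolled version executes eight loop iterations and the unrolled version two. The unrolled source is longer, so a static count of instructions in the code grows even though fewer branch instructions are executed at run time.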
High-level and Low-level Languages
Programming is the means of instructing a computer to perform certain operations, and these instructions are given using programming languages. Programming languages fall into two levels: High-Level and Low-Level. High-Level programming languages such as Java, C, C++, COBOL, and FORTRAN are characterized by their relative accessibility and ease of use for the programmer. They abstract programming so that the instructions given to and interpreted by a computer can be written in a form much closer to human languages (Casavant 1988). A High-Level language program is then compiled and translated into a Low-Level language, which is easily understood by ...
AHO A, LAM M, ULLMAN J, SETHI R. 2007. Compilers: Principles, Techniques, & Tools, 2nd ed. Boston, USA: Pearson Addison Wesley.
ALETAN S, LIVELY W. 1988. Architectural Design Methodology for Supporting High Level Programming Languages in Computer Languages. Proceedings of IEEE International Conference. Miami Beach, FL, USA. p. 63-66.
BHARADWAJ VP, RAO M. 2016. Compiler Optimization for Superscalar and Pipelined Processors. Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER). IEEE. Mangalore, India.
CASAVANT T. 1988. Low-Level Programming of Parallel Supercomputers. Proceedings of COMPSAC 88: The Twelfth Annual International Computer Software & Applications Conference. Chicago, IL, USA. p. 274-275.
HAHN H. 1992. Assembler Inside & Out. Berkeley, CA, USA: Osborne McGraw-Hill.
HENNESSY J, PATTERSON D. 2009. Computer Organization and Design: The Hardware / Software Interface, 4th ed. Massachusetts, USA: Elsevier.
HENNESSY J, PATTERSON D. 2012. Computer Architecture: A Quantitative Approach, 5th ed. Massachusetts, USA: Elsevier. p. 148-334.
INTEL CORPORATION. 2010. A Guide to Vectorization with Intel C++ Compilers. Retrieved from https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf on 10 Dec 2016.
LEMPEL O, PELEG A, WEISER U. 1997. Intel’s MMX™ Technology - A New Instruction Set Extension. Proceedings of IEEE COMPCON 97. Digest of Papers. San Jose, CA, USA. p. 255-259.
LEPAK K, CAIN H, LIPASTI M. 2003. Redeeming IPC as a Performance Metric for Multithreaded Programs. Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques. New Orleans, LA, USA. p. 232-243.
LUPORINI F, VARBANESCU AL, RATHGEBER F, BERCEA G-T, RAMANUJAM J, HAM DA, KELLY PHJ. 2015. Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly. ACM Transactions on Architecture and Code Optimization (TACO) 11(4); Jan 2015.
MACHADO R, ALMEIDA R, JARDIM A. 2017. Comparing Performance of C Compilers Optimizations on Different Multicore Architectures. Computer Architecture and High Performance Computing Workshops (SBAC-PADW). IEEE, Campinas, Brazil.
MCKEE S, WEAVER V. 2009. Code Density Concerns for New Architectures. 2009 IEEE International Conference on Computer Design. Lake Tahoe, CA, USA. p. 459-464.
ORACLE. 2018. Primitive Data Types. Retrieved from https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html on Feb 2018.
SHU W, WANG K, XIAO B. 2009. A Framework for Software Performance Simulation Using Binary to C Translation. 2009 Pacific-Asia Conference on Circuits, Communications and Systems. Chengdu, China. p. 602-605.
SONG L, FENG M, RAVI N, YANG Y, CHAKRADHAR S. 2014. COMP: Compiler Optimizations for Manycore Processors. MICRO-47 Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. Cambridge, United Kingdom. p. 280-292.
SPADACCINI A. 2012. EduMIPS64 Documentation R1.0. Retrieved from https://github.com/lupino3/edumips64/releases/download/v1.2.3/edumips64-1.2.3-manual-en.pdf on Feb 2015.