High Performance Computing Group - Industrial Affiliates Program

Microarchitecture





Clustered Speculative Multithreaded Processors

This project focuses on alternative microarchitectures that tackle the limitations of current superscalar organizations. The main features of the microarchitectures under study are: a) a clustered organization that exploits communication locality; b) multiple threads of control that are created speculatively at run time by speculating on highly predictable branches; c) value and dependence prediction schemes that allow dependent threads to proceed as if they were independent.
Advanced Register Organizations
This project focuses on new register renaming schemes. First, we propose to reduce register pressure by using virtual names to mark dependences and allocating a register only when it is actually needed, some cycles later in the pipeline. We also investigate new register file organizations, with the aim of reducing the pressure on the ports or their effective access time.
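The effect of late allocation on register pressure can be illustrated with a small sketch. It is not the project's actual scheme: it only assumes that a conventional design holds a register from rename until release, while a late-allocation design holds it from writeback until release. The function name `peak_registers` and the cycle numbers are hypothetical.

```python
def peak_registers(intervals):
    """Return the maximum number of registers live at once.

    Each interval is (first_cycle_held, release_cycle), half-open.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # register becomes occupied
        events.append((end, -1))    # register is released
    # Sorting puts a release before an allocation in the same cycle,
    # so a freed register can be reused immediately.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical instructions: (rename cycle, writeback cycle, release cycle).
# The first two model long-latency operations such as cache misses.
instructions = [(0, 10, 12), (1, 11, 13), (2, 4, 6), (3, 5, 7)]

# Conventional renaming: register held from rename to release.
conventional = peak_registers([(r, f) for r, _, f in instructions])

# Late allocation: a virtual name tracks the dependence until writeback,
# and the register is held only from writeback to release.
late = peak_registers([(w, f) for _, w, f in instructions])
```

With these made-up timings the conventional scheme needs four registers at its peak, while late allocation needs two, because the long-latency instructions no longer hold a register while they wait for their result.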
Software Managed Caches
Exposing hardware features to the compiler is an interesting way to reduce the growing complexity of processors. In this topic, we have implemented a data locality analysis tool. This tool has been used to optimize memory performance in two ways: a) we have proposed a software prefetching technique for software-pipelined loops; b) we have proposed a multi-module cache architecture, in which each module is specialized in a different type of locality, that is partially managed by the compiler through the locality analysis.
New Cache Architectures
In this topic, we bring together ideas from different novel cache architectures: dual data cache (a hardware-managed cache with two modules that exploit different types of locality), pseudo-random caches (caches with conflict-resistant mapping functions based on XOR schemes), multithreading-oriented caches (...)
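As a rough illustration of an XOR-based mapping function, the sketch below compares conventional modulo indexing with an index formed by XORing the low block-address bits with the next group of bits. The cache geometry (32-byte lines, 128 sets) and the exact choice of bits are assumptions made for this example, not the functions studied in the project.

```python
LINE_BITS = 5                 # assumed 32-byte cache lines
INDEX_BITS = 7                # assumed 128 sets
NUM_SETS = 1 << INDEX_BITS


def modulo_index(addr):
    """Conventional mapping: low bits of the block address select the set."""
    return (addr >> LINE_BITS) & (NUM_SETS - 1)


def xor_index(addr):
    """XOR mapping: fold the next INDEX_BITS bits into the index."""
    block = addr >> LINE_BITS
    low = block & (NUM_SETS - 1)
    high = (block >> INDEX_BITS) & (NUM_SETS - 1)
    return low ^ high


# 64 addresses separated by exactly one cache "way" (4096 bytes here),
# a stride that is pathological for modulo indexing.
stride = NUM_SETS << LINE_BITS
addresses = [i * stride for i in range(64)]

modulo_sets = {modulo_index(a) for a in addresses}
xor_sets = {xor_index(a) for a in addresses}
```

Under modulo indexing all 64 addresses fall into a single set, whereas the XOR index spreads them over 64 different sets, which is the kind of conflict resistance such mappings aim for.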
Value Prediction and Reuse
This project investigates the predictability of data values. Value prediction is then used to boost processor performance through data value speculation (predicting the results of instructions and speculatively executing those that depend on them), data dependence speculation (predicting whether there is a dependence between a store and a load inside the instruction window) and control speculation (predicting the outcome of conditional branches by predicting the value of their inputs and speculatively executing the branch). We also investigate value reuse: we propose new mechanisms that take advantage of the reusability of values, and we study how performance can be improved through reuse at the instruction level as well as at the trace level.
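To make the idea of value prediction concrete, here is a minimal sketch of a stride value predictor, one well-known kind of predictor. It is offered as an illustration only and is not necessarily one of the schemes evaluated in this project; the class name, the unbounded table and the example program counter are assumptions.

```python
class StridePredictor:
    """Predicts an instruction's next result as last value plus last stride."""

    def __init__(self):
        # Maps an instruction address (pc) to (last_value, stride).
        # A real predictor would use a finite, tagged hardware table.
        self.table = {}

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None  # no history yet, so no prediction is made
        last_value, stride = entry
        return last_value + stride

    def update(self, pc, actual):
        entry = self.table.get(pc)
        stride = actual - entry[0] if entry is not None else 0
        self.table[pc] = (actual, stride)


def count_correct(pc, values):
    """Feed a sequence of results for one instruction; count correct predictions."""
    predictor = StridePredictor()
    correct = 0
    for value in values:
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)
    return correct


# A load that walks an array in 4-byte steps: after two warm-up results
# the stride is learned and every later value is predicted correctly.
strided_hits = count_correct(0x400, [100, 104, 108, 112, 116, 120])

# An instruction that always produces the same value (stride 0).
constant_hits = count_correct(0x404, [7, 7, 7, 7])
```

In a processor, a correct prediction lets dependent instructions start before the producer finishes, while a wrong one requires the speculative work to be squashed and redone.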
Architectures for Efficient Execution of Binary Translated Programs
The goal of this project is to develop microarchitectures well suited to executing programs that have been binary translated from a foreign ISA. Of course, one of the main goals is efficient execution of legacy ISAs such as the x86, but we will also focus on accelerating modern ISAs such as the Alpha.
Dixie: A Retargetable Binary Translation and Instrumentation Tool
Dixie is both a binary translator and a binary instrumentation tool. It can take a binary specified in a certain ISA (currently Alpha, Convex and x86) and translate it into an intermediate form (called "Dixie ISA"). At this stage the intermediate binary can be run directly on the Dixie Virtual Machine, which has been compiled on a number of 64-bit hosts. The user can also instrument the intermediate binary to produce all sorts of dynamic information that can feed a detailed cycle-level simulator.
Anticipation and Prediction Techniques to Reduce Data Hazards
The goal of this project is to develop mechanisms that improve anticipation and prediction for memory-access instructions. We evaluate the performance of these mechanisms taking into account their hardware cost and their misprediction penalty. Moreover, we analyse the relationship between the mechanisms and compilation technology.
Adding a Vector Unit to a Superscalar Processor
The goal of this project is to study the tradeoffs involved in adding a vector unit to a current out-of-order superscalar processor. The vector unit targets three application domains: numerical codes (its traditional realm of application), multimedia codes, and bandwidth-hungry commercial applications such as databases.
A central aspect of this research involves the design of an adequate on-chip cache hierarchy that can simultaneously handle the needs of traditional scalar/integer code and the bandwidth requirements of the vector unit.
2D Vectorized Multimedia ISA
The goal of this project is to design a new ISA extension targeted at multimedia. The main new feature of the ISA is that it will exploit 2D parallelism: first, sub-word parallelism, as is currently done in MMX, VIS, MVI, etc.; second, multiple-word parallelism using traditional vectorizing techniques. By providing true vector registers to the multimedia unit (but of restricted length, say, 8 or 16 elements), an extra level of parallelism can be extracted from multimedia applications. The multimedia vector instructions provide the capability to work with matrices of 8-, 16- or 32-bit elements, which is a good match for the needs of multimedia applications.
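The two dimensions of parallelism can be sketched in software as follows. This is only a functional model under stated assumptions: words are 32 bits wide with four 8-bit lanes (MMX-style registers are wider), the vector length is 8, and the function names are invented for the example rather than taken from the proposed ISA.

```python
LANES = 4            # assumed: four 8-bit elements per 32-bit word
VECTOR_LENGTH = 8    # assumed: 8 words per vector register


def pack8(elements):
    """Pack LANES values (0..255) into one word, element 0 in the low byte."""
    word = 0
    for i, element in enumerate(elements):
        word |= (element & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a packed word back into its LANES 8-bit elements."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def packed_add8(a, b):
    """First dimension: add all 8-bit lanes at once, wrapping modulo 256.

    The top bit of each lane is masked off so carries cannot cross lanes,
    then restored with an XOR.
    """
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return low ^ ((a ^ b) & 0x80808080)


def vector_packed_add8(va, vb):
    """Second dimension: apply the packed add to every word of a vector."""
    return [packed_add8(a, b) for a, b in zip(va, vb)]


# An 8 x 4 matrix of 8-bit elements, one packed word per row.
rows_a = [[r * LANES + c for c in range(LANES)] for r in range(VECTOR_LENGTH)]
rows_b = [[250] * LANES for _ in range(VECTOR_LENGTH)]

va = [pack8(row) for row in rows_a]
vb = [pack8(row) for row in rows_b]
result = [unpack8(word) for word in vector_packed_add8(va, vb)]
```

A single vector instruction of this kind operates on the whole 8 x 4 matrix, where a sub-word-only extension would need one instruction per row.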
Advanced Vector Memory Systems
The goal of this project is to design new methods of accessing a vector memory system that deliver better performance at a substantially lower cost. In particular, we are investigating migration towards DRAM technology (in the form of SDRAM and/or RDRAM) combined with intelligent memory controllers that exploit the semantic information of vector requests.


Software-Hardware Trace Cache

The fetch bandwidth of aggressive processors will be a limiting factor for their performance in the near future. The number of useful instructions per cycle provided to the processor depends on the instruction cache miss rate, the number of instructions provided per access and the branch prediction strategy. In this piece of research, we use the previously presented Software Trace Cache to evaluate its interaction with the hardware of future aggressive processors. We also want to investigate changes to the hardware so that the interaction of software and hardware improves the performance of code execution.
Cost-conscious techniques to exploit ILP
The goal of this project is to develop techniques to effectively exploit ILP in aggressive architectures at low cost (in terms of area and cycle time). A study has been done for VLIW architectures and numerical programs, and we are currently working on multimedia programs.