ECE311/CSE311 Computer Engineering Seminar – Spring 2008

 

Time and Location: Wed 2-3pm in ITE 119

 

 

Faculty

 

John A. Chandy, Yunsi Fei (coordinator), Zhijie Jerry Shi (coordinator), Mohammad Tehranipoor, Lei Wang

 

 

Course Description

 

This ECE/CSE Computer Engineering Seminar meets weekly. We will have two types of presentations. Graduate students will give research presentations on their recent work or on related work by other researchers in computer engineering. Topics include VLSI design, EDA, embedded systems, computer architecture, networking, and operating systems. We will also have some invited talks and practical talks interleaved with the research presentations. These talks will help students improve their computer and system administration skills and thereby work more efficiently.

 

Presentation Schedule

 

Check the preliminary schedule. Note that this is subject to change.

Date

Presenter

Advisor

Title/Abstract

01/30

Baha Dundar

Zhijie Shi

Cryptanalysis of MD4 and MD5

 

This talk presents analysis techniques for two widely used hash functions, MD4 and MD5. We first review the cryptanalysis method of Dobbertin: Cryptanalysis of MD4, Journal of Cryptology, Vol. 11, pp. 253-271, 1998. This was the first paper to give collisions for the full MD4. The algorithm is also practical: it finds collisions in a few seconds. We then go through the cryptanalysis method for MD5 given by Wang and Yu: How to Break MD5 and Other Hash Functions, in Proceedings of EUROCRYPT 2005, Lecture Notes in Computer Science, Vol. 3494, pp. 19–35, Springer-Verlag, 2005. This was also the first work to find collisions for the full rounds of MD5 without regard to the initial values.

02/06

Xuan Guan

Yunsi Fei

Reducing Register File Power Consumption through Partitioning and Compiler Support

 

The register file in modern embedded processors accounts for a substantial share of processor energy consumption due to its large switching capacitance and long working time. It is found that 25% of registers can account for 83% of register file access time during the execution of many embedded applications. This motivates us to reduce register file power consumption by partitioning the registers into regions according to their usage patterns. The most frequently used registers are placed in the hot part, and the cold part of the register file is rarely accessed. We employ register file bitline splitting and drowsy register cell techniques in our design to reduce the overall access power of the register file. We propose a novel approach that partitions the register file so that the largest power saving can be achieved. We formulate register file partitioning as a graph partitioning problem and apply an effective algorithm to obtain the optimal result. Our algorithm is evaluated on MiBench, and it achieves an average saving of 42.2% in register file access power consumption over the original non-partitioned register file for the Alpha system.
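As a rough illustration of the hot/cold idea only (not the graph-partitioning formulation described in the abstract), the sketch below ranks registers by access count and takes the smallest set that covers a target share of all accesses. The function name `split_hot_cold`, the 0.8 coverage target, and the access profile are all hypothetical.

```python
def split_hot_cold(access_counts, coverage=0.8):
    """Return (hot, cold) register lists.

    hot is the smallest prefix of registers, ranked by access count,
    whose accesses reach `coverage` of the total (hypothetical rule).
    """
    total = sum(access_counts.values())
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    hot, covered = [], 0
    for reg in ranked:
        if total == 0 or covered / total >= coverage:
            break
        hot.append(reg)
        covered += access_counts[reg]
    cold = [r for r in ranked if r not in hot]
    return hot, cold


# Hypothetical profile: 8 registers, accesses concentrated in two of them.
counts = {"r0": 500, "r1": 330, "r2": 50, "r3": 40,
          "r4": 30, "r5": 25, "r6": 15, "r7": 10}
hot, cold = split_hot_cold(counts)
print("hot:", hot)
print("cold:", cold)
```

In this made-up profile, 2 of 8 registers (25%) receive 83% of the accesses, which mirrors the skew reported in the abstract. A real partitioner would also weigh the power cost of each region rather than access counts alone.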

02/20

Juan Carlos Martinez Santos

Yunsi Fei

Leveraging Speculative Architectures for Run-time Program Validation

 

Program execution can be tampered with by malicious attackers who exploit software vulnerabilities such as format string vulnerabilities and buffer overflows. Changing program behavior by compromising control data and decision data has become the most serious threat to computer system security. Although several hardware approaches have been presented to validate program control flow, most of them suffer from large hardware area overhead or poor ambiguity handling. In this paper, we propose a new hardware-based approach that leverages existing speculative architectures for run-time control flow validation. The on-chip branch target buffer (BTB) is utilized as a cache of the legitimate control flow transfers stored in a secure memory region. In addition, the BTB is extended to store the correct program path execution information. At each indirect branch site, the BTB is used to validate the decision history of the conditional branches before it, and more information about the future decision path is fetched to check the subsequent execution at run time. Implementation of this approach is transparent to the operating system and the programs above it, so it is applicable to legacy code. Due to the good code locality of executable programs and the effectiveness of branch prediction, the frequency of run-time control flow validations against the secure off-chip memory is low. Our experimental results show a negligible performance penalty and a small storage overhead, with reduced ambiguity.
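The caching idea in this abstract can be sketched in software as a toy model: a small cache of already validated control flow transfers sits in front of a trusted table, and only misses consult the trusted table. Everything here is an assumption made for illustration (the class name `ValidatingBTB`, the LRU replacement policy, the capacity, and the addresses); the hardware design in the talk may differ.

```python
from collections import OrderedDict


class ValidatingBTB:
    """Toy model: a small LRU cache in front of a trusted table of edges."""

    def __init__(self, legit_edges, capacity=4):
        self.legit_edges = frozenset(legit_edges)  # stands in for secure memory
        self.capacity = capacity
        self.cache = OrderedDict()                 # (source, target) -> True
        self.secure_lookups = 0

    def validate(self, source, target):
        """Return True if the transfer source -> target is legitimate."""
        edge = (source, target)
        if edge in self.cache:                     # hit: no secure-memory access
            self.cache.move_to_end(edge)
            return True
        self.secure_lookups += 1                   # miss: consult the trusted table
        if edge not in self.legit_edges:
            return False                           # possible control-flow tampering
        self.cache[edge] = True
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)         # evict least recently used
        return True


# Hypothetical addresses for two legitimate transfers.
btb = ValidatingBTB({(0x400, 0x480), (0x4A0, 0x400)})
results = [btb.validate(0x400, 0x480),   # miss, validated, then cached
           btb.validate(0x400, 0x480),   # hit
           btb.validate(0x400, 0xDEAD)]  # not a legitimate edge
print(results, btb.secure_lookups)
```

With good code locality most checks hit the cache, which is the reason the abstract gives for the low frequency of validations against the secure off-chip memory.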

 

03/05

Hai Lin

Yunsi Fei

Harnessing Horizontal Parallelism and Vertical Instruction Packing of Programs to Improve System Overall Efficiency

 

Multi-issue processors can exploit the instruction-level parallelism (ILP) of programs to improve performance greatly. How to reduce energy consumption while maintaining the high performance of programs running on multi-issue processors remains a challenging problem. In this paper, we propose a novel approach that applies the instruction register file (IRF) technique from single-issue processors to the VLIW architecture. Frequently executed instructions are selected to be placed in the on-chip IRF for fast access during program execution. Violation of synchronization among VLIW instruction slots is avoided by introducing new instruction formats and microarchitectural support. The enhanced VLIW architecture is thus able to orchestrate horizontal instruction parallelism and vertical instruction packing for programs to improve overall system efficiency. Our experimental results show that the proposed processor architecture achieves both the performance advantage provided by the VLIW architecture and the high energy efficiency provided by the IRF-based instruction packing technique (e.g., a 71.1% reduction in fetch energy consumption for a 4-way VLIW architecture with 8-entry IRFs).

03/19

Fan Zhang

Jerry Shi

An Efficient Window-Based Countermeasure to Power Analysis of ECC Algorithms

Elliptic curve cryptography (ECC) has been adopted in many systems because it requires shorter keys than traditional public-key algorithms in primary fields. However, power analysis attacks can exploit the power consumption of ECC devices to retrieve secret keys. In this paper, we propose an efficient window-based countermeasure that is secure against existing power analysis attacks. Compared with previously proposed countermeasures, our method has low memory overhead, requiring only a table of w+1 entries for a window size of w bits. It also performs better than many algorithms that perform one point addition or subtraction for each bit of the scalar.
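For readers unfamiliar with window methods, the sketch below shows the textbook fixed-window scalar multiplication, which performs one table addition per w-bit window instead of one per bit. It is not the countermeasure proposed in the talk: it precomputes 2^w table entries rather than w+1, and it offers no protection against power analysis. The group here is a stand-in (integers modulo a prime under addition) so that the example stays short; `window_scalar_mul`, `add_mod`, and the constants are hypothetical.

```python
def window_scalar_mul(k, point, add, zero, w=4):
    """Compute k * point (k >= 0) with the fixed-window method, w bits per step."""
    # Precompute 0*P, 1*P, ..., (2^w - 1)*P.
    table = [zero]
    for _ in range((1 << w) - 1):
        table.append(add(table[-1], point))

    # Split k into w-bit digits, most significant first.
    digits = []
    while k > 0:
        digits.append(k & ((1 << w) - 1))
        k >>= w
    digits.reverse()

    acc = zero
    for d in digits:
        for _ in range(w):          # w doublings per window
            acc = add(acc, acc)
        acc = add(acc, table[d])    # one table addition per window
    return acc


# Stand-in group (an assumption): integers modulo a prime under addition.
N = 1_000_003


def add_mod(a, b):
    return (a + b) % N


result = window_scalar_mul(123456789, 5, add_mod, 0)
print(result == (123456789 * 5) % N)
```

On a real curve, `add` would be point addition and `zero` the point at infinity; the loop structure stays the same.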

03/26

Jeremy Lee

Mohammad H. Tehranipoor

Layout-Aware, IR-Drop Tolerant Transition Fault Pattern Generation

Market and customer demands have continued to push the limits of CMOS performance. At-speed test has become a common method to ensure that these high-performance chips are shipped to customers fault-free. However, at-speed tests are known to create higher-than-average switching activity, which normally is not accounted for in the design of the power supply network. This can add delay in the chip, causing it to fail during test. In this paper, we propose a pattern compaction technique that considers the layout when generating transition delay fault patterns. The technique focuses on distributing the switching activity generated by the patterns evenly across the layout, rather than allowing high switching activity to occur in a small area of the chip, as can happen with conventional delay fault pattern generation. Because of the relationship between switching activity and IR-drop, reducing switching will prevent large IR-drop in high-demand regions while still allowing a suitable amount of switching to occur elsewhere on the chip to prevent fault coverage loss. This even distribution of switching on the chip will also avoid hot spots.

03/28

Dr. John Savage

Brown University

Computing with Stochastically Assembled Nanoscale Devices

04/09

Tiansi Hu

Yunsi Fei

qRouting: An Energy-Efficient and Lifetime-Aware Routing Protocol for Underwater Sensor Networks

Underwater sensor networks (UWSNs) have emerged as a promising network technique for various aquatic applications and have attracted increasing attention in recent years. However, constraints such as low bandwidth, high latency, and high energy consumption make it challenging to build network protocols for UWSNs. In this paper, we focus on the routing issue in UWSNs. We propose an adaptive, scalable, energy-efficient, and lifetime-aware routing protocol based on reinforcement learning. Our protocol assumes generic MAC protocols and aims to prolong network lifetime by making the residual energy of sensor nodes more evenly distributed. The residual energy of each node, as well as the energy distribution among a group, is factored in throughout the routing process to calculate the Q value, which aids in selecting adequate forwarders for packets. Moreover, the protocol is made more robust and more energy-efficient by controlling multi-path redundancy, routing loops, and broadcast frequency. We have performed extensive simulations of the proposed protocol on the NS-2 platform and compared it with one existing routing protocol (VBF) in terms of packet delivery rate, energy efficiency, and lifetime.
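One way to picture how residual energy could enter a Q value is the sketch below. The reward shape, the weights `a1` and `a2`, the discount `gamma`, and all numbers are assumptions made for illustration; the abstract does not give the actual formulas used in the protocol.

```python
def reward(node_energy, group_energies, send_cost=1.0, a1=0.5, a2=0.5):
    """Hypothetical reward: penalize each send, favor energy-rich forwarders.

    node_energy is the candidate's residual energy (fraction of full);
    the a2 term compares it with the mean of its group.
    """
    mean = sum(group_energies) / len(group_energies)
    return -send_cost + a1 * node_energy + a2 * (node_energy - mean)


def choose_forwarder(energy, value, gamma=0.9):
    """One-step lookahead: Q(n) = reward(n) + gamma * V(n) for each neighbor n."""
    energies = list(energy.values())
    q = {n: reward(energy[n], energies) + gamma * value[n] for n in energy}
    return max(q, key=q.get), q


# Hypothetical neighbors: B is slightly closer to the sink (higher V),
# but A has far more residual energy.
energy = {"A": 0.9, "B": 0.3}
value = {"A": -2.0, "B": -1.8}
best, q = choose_forwarder(energy, value)
print(best, q)
```

In this example the energy terms outweigh B's small advantage in estimated value, so A is chosen, which spreads the load toward nodes with more energy left.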

04/16

Xiaoxiao Wang

Mohammad H. Tehranipoor

Path-RO: on chip path delay measurement under process variations

 

As technology scales to 45nm and below, process variations will have a significant impact on path delay. This trend makes the deviation between simulated path delay and actual path delay in a manufactured chip more significant. In this paper, we propose a new on-chip path delay measurement structure called the path-based ring oscillator (Path-RO). The proposed method creates an oscillator from a targeted path, which is used to measure the path delay as well as the impact of process variations on it. To alleviate the accuracy degradation caused by the architecture itself, a calibration process will also be investigated. Through experimental results on Path-ROs inserted in the ITC'99 b19 benchmark, we obtain the path delay distribution under different process variations. The accuracy and efficiency of path delay measurement using Path-RO are also verified by comparison with the results obtained from post-layout HSPICE simulations.

04/25

Dr. Xiaoqing Wen

Kyushu Institute of Technology, Japan 

Challenges and Opportunities in Deep-submicron LSI Testing

04/30

Hai Yan

Zhijie Shi

DBR: Depth-Based Routing for Underwater Sensor Networks

Providing scalable and efficient routing services in underwater sensor networks (UWSNs) is very challenging due to the unique characteristics of UWSNs. First, UWSNs often employ acoustic channels for communication because radio signals do not work well in water. Compared with radio-frequency channels, acoustic channels feature much lower bandwidths and propagation delays that are several orders of magnitude longer. Second, UWSNs usually have a very dynamic topology, as sensors move passively with water currents. Some routing protocols have been proposed to address this challenging problem in UWSNs. However, most of them assume that the full-dimensional location information of all sensor nodes in a network is known a priori through a localization process, which is yet another challenging issue to be solved in UWSNs. In this paper, we propose a depth-based routing (DBR) protocol. DBR does not require full-dimensional location information of sensor nodes. Instead, it needs only local depth information, which can be easily obtained with an inexpensive depth sensor that every underwater sensor node can be equipped with. A key advantage of our protocol is that it can handle network dynamics efficiently without the assistance of a localization service. Moreover, our routing protocol can take advantage of a multiple-sink underwater sensor network architecture without introducing extra cost. We conduct extensive simulations. The results show that DBR can achieve very high packet delivery ratios (at least 95%) for dense networks with only a small communication cost.
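A forwarding decision that uses only local depth might look like the sketch below. It assumes that sinks sit at the water surface, that each packet carries the depth of its previous hop, and that a node drops packets it has already seen. The function name `should_forward`, the `threshold` parameter, and the depths are hypothetical, and the abstract does not spell out the exact rule DBR uses.

```python
def should_forward(my_depth, sender_depth, packet_id, seen, threshold=0.0):
    """Forward only packets that are new and came from a deeper node.

    Depths are in meters below the surface; `seen` is this node's set of
    packet ids it has already handled.
    """
    if packet_id in seen:
        return False                      # duplicate: already handled
    seen.add(packet_id)
    # Shallower than the sender by more than the threshold: closer to a sink.
    return sender_depth - my_depth > threshold


seen = set()
decisions = [
    should_forward(my_depth=40.0, sender_depth=75.0, packet_id=1, seen=seen),
    should_forward(my_depth=40.0, sender_depth=75.0, packet_id=1, seen=seen),
    should_forward(my_depth=90.0, sender_depth=75.0, packet_id=2, seen=seen),
]
print(decisions)
```

The node needs no coordinates of its own or of its neighbors, only its depth reading and the depth stored in the packet, which is why no localization service is required.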

 


Previous Semesters