Research
  • GLAS Project: UT Center for Space Research

    I worked with Prof. Bob Schutz as a research assistant from 2001 to 2003. My mentor was Lori A. Magruder, who was a senior PhD student at the time; she is currently a member of the faculty at Johns Hopkins.

    During my three years at CSR, I programmed the software and designed a portion of the hardware used for the calibration of the GLAS laser altimeter aboard the Ice, Cloud, and land Elevation Satellite (ICESat). The goal of the system was to independently verify the longitude, latitude, and altitude reported by ICESat's on-board positioning system. While the complete system is very complex, I will attempt to describe the interesting parts in a few sentences. The system can be split into two major components: the laser attached to ICESat and the ground detection system. The job of the on-board laser is to fire a laser beam every 40 ms. The job of the ground system is to detect the time and the place at which the laser beam arrives on Earth. To implement the ground detection system, we deploy 900 laser detectors along the ground track of ICESat, and a computing system to record and process the signals from the detectors. When a detector is hit by the laser beam, it sends a signal to the computing system, which records two pieces of information: which detector was hit and at what time. One key feature of our system is that it records the time at a very high resolution of 12.5 ns.

    However, just knowing which detectors were hit and the time of the hit is not enough, for two reasons. First, the diameter of the laser beam when it reaches Earth is approximately 70 meters, so it often hits multiple detectors. To this end, we proposed a simple heuristic to estimate the center of the laser footprint from the positions of the detectors that were hit. Second, the atmosphere distorts and delays the laser beam on its way from the satellite to the Earth's surface. We apply a series of correction algorithms to compute the correct longitude, latitude, and altitude of ICESat. We provide the final information to NASA so that the data reported by the satellite's positioning system can be compared with ours.
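
    For readers who like code, here is a minimal sketch in C of the kind of footprint-center heuristic described above, using hypothetical detector coordinates. The real system uses a more careful heuristic and then applies the atmospheric corrections; this sketch simply averages the positions of the detectors that were hit.

        /*
         * Illustrative sketch only: estimate the center of the laser footprint
         * as the mean position of the detectors that registered a hit. The
         * real system uses a more careful heuristic and then applies the
         * atmospheric corrections described above.
         */
        #include <stdio.h>

        struct detector {
            double east_m;   /* position, meters east of a local origin  */
            double north_m;  /* position, meters north of a local origin */
            int    hit;      /* 1 if this detector saw the pulse, else 0 */
        };

        /* Fill (*east, *north) with the estimated footprint center and return 1,
         * or return 0 if no detector was hit. */
        static int footprint_center(const struct detector *d, int n,
                                    double *east, double *north)
        {
            double sum_e = 0.0, sum_n = 0.0;
            int hits = 0;
            for (int i = 0; i < n; i++) {
                if (d[i].hit) {
                    sum_e += d[i].east_m;
                    sum_n += d[i].north_m;
                    hits++;
                }
            }
            if (hits == 0)
                return 0;
            *east  = sum_e / hits;
            *north = sum_n / hits;
            return 1;
        }

        int main(void)
        {
            /* Three hypothetical detectors hit by one ~70 m wide pulse. */
            struct detector d[] = {
                { 10.0, 20.0, 1 }, { 35.0, 22.0, 1 },
                { 22.0, 48.0, 1 }, { 80.0, 80.0, 0 }
            };
            double e, n;
            if (footprint_center(d, 4, &e, &n))
                printf("estimated center: %.1f m east, %.1f m north\n", e, n);
            return 0;
        }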

    I also did two other interesting side projects to support the GLAS team.

    1. I developed GPS-based positioning software for an airplane. It was needed for an experiment to test the system before we sent it into space. For this experiment, we used a government airplane to fire the laser beam at our detectors placed on a runway at the old Austin airport. We felt that we needed a positioning system for the airplane with many custom options not available in off-the-shelf GPS solutions. Our software used a USB-based Garmin GPS receiver and provided two very interesting features. First, it could show custom points of interest (i.e., the detectors). It also informed the pilot when the airplane neared a detector and predicted whether the airplane got close enough to have hit one; a sketch of this proximity check appears after this list. If it predicted a hit, it would record that event in a log so we could later compare it with the results of our ground detection system. Second, it let us replay the path of the whole flight so we could go back and see what the pilot did right or wrong :). Overall, we found that developing our own GPS solution from scratch was very effective.
    2. Using empirical data we gathered from our experiments, I developed a simulator to visualize the different patterns in which the laser beam could reach the Earth's surface. It helped us determine a topology for the detectors that maximizes the probability that at least three detectors are hit by the laser beam from ICESat; a sketch of this kind of simulation also appears below.
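
    Here is a minimal sketch, in C, of the kind of proximity check mentioned in the first side project. The detector location, aircraft fix, and hit radius are hypothetical; the actual software did far more (custom points of interest, logging, pilot alerts, and flight replay) and talked to a Garmin receiver over USB.

        /*
         * Illustrative sketch only: flag a "predicted hit" when the aircraft's
         * GPS fix comes within some radius of a detector. The detector location,
         * aircraft fix, and hit radius below are hypothetical. Compile with -lm.
         */
        #include <math.h>
        #include <stdio.h>

        static const double PI             = 3.14159265358979323846;
        static const double EARTH_RADIUS_M = 6371000.0;

        /* Great-circle (haversine) distance in meters between two points
         * given as latitude/longitude in degrees. */
        static double distance_m(double lat1, double lon1, double lat2, double lon2)
        {
            double rad  = PI / 180.0;
            double dlat = (lat2 - lat1) * rad;
            double dlon = (lon2 - lon1) * rad;
            double a = sin(dlat / 2) * sin(dlat / 2) +
                       cos(lat1 * rad) * cos(lat2 * rad) *
                       sin(dlon / 2) * sin(dlon / 2);
            return 2.0 * EARTH_RADIUS_M * atan2(sqrt(a), sqrt(1.0 - a));
        }

        int main(void)
        {
            /* Hypothetical detector location and aircraft fix near Austin. */
            double det_lat = 30.2000, det_lon = -97.6700;
            double ac_lat  = 30.2001, ac_lon  = -97.6702;
            double hit_radius_m = 35.0;   /* roughly half the beam footprint */

            double d = distance_m(ac_lat, ac_lon, det_lat, det_lon);
            if (d <= hit_radius_m)
                printf("predicted hit: aircraft %.1f m from detector\n", d);
            else
                printf("no hit: aircraft %.1f m from detector\n", d);
            return 0;
        }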
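
    And here is a minimal sketch of the kind of simulation mentioned in the second side project: a Monte Carlo estimate of the probability that at least three detectors are hit, assuming an idealized square detector grid and a circular footprint. The real simulator was driven by the empirical beam data we gathered, not this idealized model, and the spacing and radius below are hypothetical.

        /*
         * Illustrative sketch only: Monte Carlo estimate of the probability that
         * a circular laser footprint of radius R, landing at a random spot over
         * a square grid of detectors with spacing S, covers at least three
         * detectors. R and S are hypothetical values.
         */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            const double R = 35.0;    /* footprint radius, meters (hypothetical) */
            const double S = 25.0;    /* detector spacing, meters (hypothetical) */
            const int trials = 1000000;
            int at_least_three = 0;

            srand(42);
            for (int t = 0; t < trials; t++) {
                /* Drop the footprint center uniformly inside one grid cell;
                 * by symmetry this is representative of the whole grid. */
                double cx = ((double)rand() / RAND_MAX) * S;
                double cy = ((double)rand() / RAND_MAX) * S;

                /* Count the grid detectors that fall inside the footprint. */
                int hits = 0;
                for (int i = -3; i <= 3; i++) {
                    for (int j = -3; j <= 3; j++) {
                        double dx = i * S - cx, dy = j * S - cy;
                        if (dx * dx + dy * dy <= R * R)
                            hits++;
                    }
                }
                if (hits >= 3)
                    at_least_three++;
            }
            printf("P(at least 3 detectors hit) ~= %.3f\n",
                   (double)at_least_three / trials);
            return 0;
        }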

      ICESat was launched on 17 January 2003, and our system has since been deployed at the White Sands Missile Range in New Mexico. The more curious readers can read our paper, which was published in the journal Measurement Science and Technology.

      Our research was also mentioned in international media.


  • High Performance Substrate (HPS)

    I have been a member of the HPS research group, led by Prof. Yale Patt, since 2004. Our group develops techniques to improve the performance of future computer systems, and it has a track record of developing practical ideas that are currently used in almost all computers.

    The following is a very simplified description of my main research as part of HPS. I have also worked on two other projects, but I will leave out their descriptions for now. The analogies and explanations given below have developed over the years as a result of my numerous attempts at answering a question I am commonly asked by my relatives, who are mostly medical doctors: "Do you repair computers at work or write software?" The short answer is neither, and I will tell you why.

    Note: While this description is not intended for computer scientists, who may prefer to read the published papers instead, they may just find it entertaining.

    Before I summarize my research, I must provide some background. Computers are much like humans. A human being takes input from its surroundings through its senses (ears, eyes, etc.), processes it, and produces output (reacts) via its output devices (tongue, hands, legs, etc.). Now consider your laptop. It takes input from its input devices, i.e., the keyboard and the mouse, processes it, and reacts using its output devices, i.e., by changing what is on the screen. Every time you move your mouse, click, or press a key on your keyboard, you create a new task for the computer to perform. The time taken to perform the task is what differentiates faster and slower computers. Unlike humans, who can learn how to perform tasks, computers have to be given very precise instructions written by software programmers. These instructions are processed by the brain of the computer, the microprocessor.

    The performance of a computer system can be improved using two approaches: optimizing the software to take fewer instructions and/or improving the microprocessor to process instructions faster. A good computer design must improve performance while maintaining a balance between the two. This "art" of designing a computer system is similar to architecting a house: the microprocessor is the house to be designed and the software is the potential resident with never-ending demands. It is no surprise that the field is called Computer Architecture. Computer architects are the link between the people who build the microprocessor and the people who develop the software that runs on it. The former tell us that they cannot build a microprocessor bigger than a fingernail, or one that consumes a lot of electric power. The latter tell us that, due to economic constraints, they do not want programmers' effort to increase. On a daily basis, we develop (invent) new solutions through which future software can harness microprocessors efficiently, without increasing the microprocessors' power requirements or the programmers' effort.

    Now let's discuss my research. Historically, microprocessor speed doubled every two years (e.g., Pentium 1, Pentium 2, ...). This improved performance with negligible programmer effort but increased power consumption. Eventually, it became infeasible to increase power further, and the industry started building computer systems with multiple simpler, low-power, lower-performance microprocessors. These are called Chip Multiprocessors (CMPs). CMPs are analogous to providing two brains with a low IQ instead of one with a high IQ. This poses a major challenge for software programmers. To improve performance, they must divide each task into entities, called threads, which can run concurrently on the multiple processors provided by the CMP. Some tasks are easy to divide while others are not. First, consider painting a room. The task can easily be parallelized such that two people paint different walls of the room concurrently and finish faster than one person alone. Now consider the task of driving a car. Having two drivers does not reduce the driving time; to reduce driving time, we require one better driver with a higher IQ. Very similarly, in software, some tasks are easy to parallelize while others are difficult or impossible to parallelize. Tasks which are easy to parallelize are best run on multiple small (low-IQ) processors, and tasks which are difficult to parallelize are best run on a single big (high-IQ) processor.

    Current products provide either many small (low-IQ) processors (e.g., Intel Larrabee or Sun Niagara), good for the parallelized tasks, or just a few big (high-IQ) processors (e.g., IBM Power5, Intel Core2Quad, AMD Barcelona), good for the non-parallelized tasks. Neither is optimal for future software, which will include both parallelized and non-parallelized tasks. We propose a computing paradigm which provides one (or a few) big processor(s) and multiple small processors. We call this the Asymmetric Chip Multiprocessor (ACMP). The ACMP runs the parallel program portions on the small cores and the non-parallel portions on the big core.
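
    To make the ACMP idea a little more concrete, here is a software-level sketch in C. The ACMP itself is a hardware proposal, so this is only an illustration of how software could map work onto it: it assumes a Linux machine where core 0 (hypothetically) stands in for the big core and cores 1 through 3 stand in for the small cores, pins the serial phase to the big core, and spreads the parallel phase across the small cores.

        /*
         * Software-level illustration only: the ACMP itself is a hardware
         * proposal. This sketch assumes a Linux machine and (hypothetically)
         * treats core 0 as the big core and cores 1..3 as the small cores:
         * the serial phase is pinned to the big core and the parallel phase
         * is spread across the small cores. Error checking is omitted.
         */
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <stdio.h>

        #define BIG_CORE  0
        #define NUM_SMALL 3          /* small cores assumed to be cores 1..3 */
        #define N         4000

        static double data[N];

        static void pin_to_core(int core)
        {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        }

        /* Parallel phase: each worker processes a slice of the data on a small core. */
        static void *parallel_work(void *arg)
        {
            int id = (int)(long)arg;
            pin_to_core(1 + id);                   /* small cores: 1, 2, 3 */
            for (int i = id; i < N; i += NUM_SMALL)
                data[i] = data[i] * 2.0 + 1.0;
            return NULL;
        }

        int main(void)
        {
            pthread_t workers[NUM_SMALL];

            /* Serial (hard-to-parallelize) phase runs on the big core. */
            pin_to_core(BIG_CORE);
            for (int i = 0; i < N; i++)
                data[i] = i;

            /* Easy-to-parallelize phase runs on the small cores. */
            for (long id = 0; id < NUM_SMALL; id++)
                pthread_create(&workers[id], NULL, parallel_work, (void *)id);
            for (int id = 0; id < NUM_SMALL; id++)
                pthread_join(workers[id], NULL);

            printf("data[%d] = %.1f\n", N - 1, data[N - 1]);
            return 0;
        }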

    My research deals with the challenges involved in designing the ACMP and with developing techniques that let software leverage the ACMP without increasing the burden on the programmers. To date, I have made three major contributions:

    1. ACMP: Balancing Hardware Efficiency and Programmer Efficiency. This work describes the ACMP architecture and shows how the ACMP reduces the burden on the programmers. Since the ACMP provides high performance on the non-parallel software, the programmers are no longer motivated to spend prohibitive effort trying to parallelize the difficult-to-parallelize tasks. I discuss the process of parallelizing some common software applications to provide evidence of this conjecture. We also show how the ACMP can improve the performance of common programs.
    2. Feedback-Driven Threading. This paper proposes a simple mechanism to control the amount of parallelism, i.e., how many processors should be used for a task. Once again, consider the task of painting a room, where we can expect that more workers will reduce the work time. However, there is a limit beyond which adding more workers provides no benefit. For example, 100 workers will probably take longer to paint a small room than a single worker because they will get in each other's way. Exactly the same is true in computers. Threads running on different microprocessors share resources and get in each other's way, so it is necessary to choose the number of worker threads carefully by taking these interactions into account. Conventional systems either set the number of threads equal to the number of available processors, with no consideration of the interactions, or expect the programmers to choose the number of threads. The former can lead to inefficient use of resources, and the latter increases the burden on the programmer. Furthermore, the exact amount of interaction among threads is often unknown until the program is run. Thus the programmers, in spite of their best efforts, are often unable to choose the best number of worker threads. We propose a simple and cost-effective mechanism to choose the number of worker threads while the program is running, using the most up-to-date information about thread interaction (a sketch of this idea appears after this list). Our mechanism reduces the power consumed by the processor and also increases performance.
    3. An Asymmetric Multi-Core Architecture to Accelerate Critical Sections. This study investigates how the ACMP paradigm can reduce the performance loss due to interaction among worker threads. Back to the room-painting example: while painters work in parallel for the most part, they have to wait for each other when more than one of them needs to refill paint from their single large bucket. For simplicity, assume that it takes one minute to refill every time and that each painter has to refill every three minutes. If we have 10 painters in the room, we will soon have a long queue of idle painters waiting for each other to finish refilling. Very similarly, worker threads in parallel computer programs often wait on each other to finish critical sections, a name given to pieces of work which only one worker thread is allowed to do at a time. We propose that the ACMP paradigm can be used to reduce this overhead. Recall that the ACMP includes one big processor which can do things faster than the other, smaller processors. Instead of using the big processor to do regular work (i.e., paint), we can put it in charge of accelerating the critical sections (i.e., refilling buckets faster). This not only speeds up the thread executing the critical section (i.e., the worker requesting the refill) but also reduces the waiting time for the other threads (i.e., the workers waiting for their refills). A simplified sketch of this idea also appears below. Our proposed mechanism shows promising results: it not only improves performance when a given number of workers is used but, by reducing the cost of interactions among threads, also increases the number of threads which can usefully be applied to a particular task.
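
    As promised, here is a rough sketch of the feedback-driven threading idea, written in C with OpenMP. The published mechanism feeds run-time measurements of thread interaction (for example, contention for shared resources) into analytical models; this sketch replaces that with brute-force sampling of a few candidate thread counts, and is only meant to convey the spirit of deciding the thread count while the program runs.

        /*
         * Illustrative sketch only (C with OpenMP): time a small sample of the
         * work at a few candidate thread counts, then run the rest with the
         * fastest one. The published mechanism instead feeds run-time
         * measurements of thread interaction into analytical models, but the
         * spirit is the same: choose the thread count while the program runs.
         */
        #include <omp.h>
        #include <stdio.h>

        #define N      (1 << 22)
        #define SAMPLE (1 << 18)     /* small training portion of the work */

        static double a[N];

        static void do_chunk(int begin, int end)
        {
            #pragma omp parallel for
            for (int i = begin; i < end; i++)
                a[i] = a[i] * 0.5 + 3.0;
        }

        int main(void)
        {
            int candidates[] = { 1, 2, 4, 8 };
            int ncand = (int)(sizeof candidates / sizeof candidates[0]);
            int best_threads = 1;
            double best_time = 1e30;

            /* Training phase: time the sample at each candidate thread count. */
            for (int c = 0; c < ncand; c++) {
                omp_set_num_threads(candidates[c]);
                double t0 = omp_get_wtime();
                do_chunk(0, SAMPLE);
                double t = omp_get_wtime() - t0;
                if (t < best_time) {
                    best_time = t;
                    best_threads = candidates[c];
                }
            }

            /* Production phase: run the remaining work with the chosen count. */
            omp_set_num_threads(best_threads);
            do_chunk(SAMPLE, N);
            printf("chose %d threads for the remaining work\n", best_threads);
            return 0;
        }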
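
    And here is a highly simplified, software-only analogue of accelerating critical sections, written in C with pthreads. In the real proposal the big core of the ACMP executes the critical-section code directly in hardware; here a dedicated "server" thread stands in for the big core and executes all critical sections shipped to it by the workers. In this toy version the workers do not wait for their requests to complete, which the real mechanism would require.

        /*
         * Illustrative, software-only sketch (C with pthreads): instead of every
         * worker executing the critical section itself, the work is shipped to
         * one dedicated "server" thread that stands in for the big core. In the
         * real proposal the big core runs the critical-section code directly and
         * the requesting core waits for the result; here the requests are
         * fire-and-forget to keep the sketch short.
         */
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        #define WORKERS  4
        #define REQUESTS 10000

        static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
        static int  pending = 0;          /* queued critical-section requests   */
        static bool done    = false;      /* set once all workers have finished */
        static long shared_counter = 0;   /* state touched only by the server   */

        /* Worker: does its parallel work, then ships the critical section
         * ("increment the shared counter") to the server thread. */
        static void *worker(void *arg)
        {
            (void)arg;
            for (int i = 0; i < REQUESTS; i++) {
                /* ... non-critical, parallel work would go here ... */
                pthread_mutex_lock(&m);
                pending++;                         /* enqueue one request */
                pthread_cond_signal(&cv);
                pthread_mutex_unlock(&m);
            }
            return NULL;
        }

        /* Server (stand-in for the big core): executes every critical section. */
        static void *server(void *arg)
        {
            (void)arg;
            pthread_mutex_lock(&m);
            for (;;) {
                while (pending == 0 && !done)
                    pthread_cond_wait(&cv, &m);
                if (pending == 0 && done)
                    break;
                pending--;
                shared_counter++;                  /* the critical section itself */
            }
            pthread_mutex_unlock(&m);
            return NULL;
        }

        int main(void)
        {
            pthread_t w[WORKERS], s;
            pthread_create(&s, NULL, server, NULL);
            for (int i = 0; i < WORKERS; i++)
                pthread_create(&w[i], NULL, worker, NULL);
            for (int i = 0; i < WORKERS; i++)
                pthread_join(w[i], NULL);

            pthread_mutex_lock(&m);
            done = true;
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&m);
            pthread_join(s, NULL);

            printf("shared counter = %ld (expected %d)\n",
                   shared_counter, WORKERS * REQUESTS);
            return 0;
        }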

    Currently, I am investigating techniques to apply the ACMP paradigm to address other major performance bottlenecks incurred by parallel programs.