University of Southern California

Smart Storage Systems
  • The cost of moving data from storage devices to compute nodes is extremely high for modern data processing applications. Such big data applications spend a significant fraction of their execution time on data input/output (I/O). Near data processing (NDP) is a prominent approach that offloads part of the computation to embedded storage processors. NDP is becoming a more viable option with SSDs, which are equipped with embedded processors and page buffer memory. The computation potential of these embedded storage processors can enable active storage systems, which support computing near storage or application-specific storage data management.
  • “Graph Semantic Aware SSD” (ISCA ’19)
  • “Summarizer” (MICRO ’17)
GPU Memory Hierarchy Architecture
  • Memory operations are a critical performance bottleneck in GPUs because dozens of warps (or wavefronts) issue many memory requests within a short time window. As a result, the limited resources in the memory subsystem suffer from low efficiency due to severe data contention and heavy data traffic. I characterized diverged memory requests in GPGPU applications to reveal the main performance bottlenecks caused by critical load instructions. I also investigated the GPU’s unique data access patterns for global loads. Based on these observations, I presented a GPU cache management method that uses the data cache more efficiently.
  • “Victim Cache using Idle Register File” (ISCA ’19)
  • “Access Pattern-Aware Cache Management” (ISCA ’17)
  • “Revealing Critical Loads” (IISWC ’15)
  • “Warped-Compression” (ISCA ’15)
  • The long latency of memory operations is one of the critical performance hurdles in general-purpose computing processors, including GPUs. Prefetching is a prominent approach to hiding this long data fetch latency. I proposed an efficient GPU prefetch mechanism based on the GPU’s unique software execution model and array index calculation approaches. I also suggested a warp scheduling scheme that can enhance the timeliness of the prefetcher.
  • “CTA-Aware Prefetching and Scheduling” (IPDPS ’18)
  • “Warped-Preexecution” (HPCA ’16)
Energy Efficient Computing
  • Battery efficiency varies with the usage patterns of mobile computing because converting chemical energy to electrical energy is non-ideal. In addition, the efficiency of the power delivery network determines the actual lifetime of batteries. I measured end-point power usage patterns and battery discharge time for mobile applications. I presented a control method that applies DVFS to CPUs, GPUs, and peripherals to maximize the efficiency of the battery and the power delivery network.


Memory Controller for Server Architecture
  • I investigated the memory controller architecture for high-performance server processors and next-generation DRAM. Based on performance analysis using a wide range of datacenter server application traces, I identified performance issues that lead to high data access latency. I proposed several solutions that alleviate read/write interference to lower DRAM access latency.

LG Electronics

Frame Rate Conversion SoC
  • I developed the hardware architecture and RTL design of the enhanced motion estimation processor (eMEP) employed in the frame rate conversion (FRC) SoC. The 3rd generation of the FRC SoC included more functions, such as 2D-to-3D image conversion, 3D depth control, and LED local dimming. I revised my motion estimation algorithm to support 3D image frames for the latest generation SoC. I built a hardware-equivalent model of the motion estimation algorithm to implement the eMEP hardware design. I verified the implemented hardware blocks with the in-house FPGA platform and the ZeBu ASIC emulation system. To the best of my knowledge, the 1st generation FRC SoC is the world’s first single-chip solution for real 200/240Hz FRC. The FRC SoCs were deployed in high-end 200/240Hz 3D TV sets, which received awards at CES 2010.
Digital TV Main SoC
  • While at the Digital TV Laboratory, I participated in the DTV main SoC research project. The purpose of the project was to develop an SoC that integrates most DTV functions, such as DTV signal processing, image processing, and frame control, within one chip package. I developed the hardware architecture and RTL design of the motion estimation processor (MEP). I verified the implemented design with the custom FPGA platform and the ZeBu ASIC emulation system. I also carried out ASIC back-end tasks such as timing closure and gate-level debugging.
Frame Rate Up-Conversion Algorithm
  • While at the Digital TV Laboratory, I developed a cost-effective motion estimation algorithm for frame rate up-conversion, which can reduce judder and halo effects without introducing artifacts on flat panel digital TV sets. I proposed an efficient block-based complementary motion estimation algorithm, which enables accurate motion detection between frames without heavy computation.
  • “Complementary Block-Based Motion Estimation” (ICCE ’11)
Multi-Format Optical Disc Player and Recorder Front-End SoC
  • While at the Digital Storage Research Laboratory, I developed hardware and algorithms for the data read channel employed in the front-end SoC for optical disc playback and recording. I proposed signal processing algorithms for the modulator, demodulator, digital timing recovery, equalizer, and PRML decoder, supporting multi-format optical discs (CD, DVD, and Blu-ray Disc). I also designed the hardware architecture and RTL code based on the proposed signal processing algorithms. I verified the hardware design with the custom FPGA platform to demonstrate stable playback of damaged and defective optical discs.
  • “PRML Read Channel with Digital Timing Recovery” (ISCAS ’06)

Seoul National University

Gigabit Ethernet Physical Layer Chip
  • I proposed digital equalizer, Viterbi decoder, and timing recovery algorithms that meet the IEEE 802.3ab specification, supporting an Ethernet PHY for 10/100/1000BASE-T over UTP-5 cables. The proposed design was verified with a wide range of channel parameters. I implemented the digital hardware blocks, which include the proposed digital processing algorithms and the PHY controller layer, for a Gigabit Ethernet controller chip. I presented the proposed digital equalizer and Viterbi decoder design as my Master’s thesis.


Flash ROM Controller for New Media for Music Contents
  • I was one of the members leading this project, which received an award at the Korea University Students Venture Contest. I designed the flash ROM controller used in the music content media, which can substitute for conventional optical storage media.