2015 -
Moment Localization in Video Corpus, Google Research
To localize the segments relevant to a text query in untrimmed and unsegmented videos at scale, we propose the HierArchical Multi-Modal EncodeR (HAMMER), which encodes a video at both coarse-grained and fine-grained levels to extract information at different scales, using multiple subtasks including video retrieval, segment temporal localization, and masked language modeling.
Annotation for Private Videos, Google Research
We are working on annotating private videos on Google Photos to enable and improve features such as keyword search and collage creation. Annotating private videos is more challenging than annotating public ones, because we are not able to watch them and they usually have no labels. We are trying both semi-supervised and completely unsupervised approaches to tackle this problem.
Large-Scale Training Framework for Video Annotation, Google Research
We present a MapReduce-based training framework that exploits both data parallelism and model parallelism to scale the training of complex video models. The framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures. This enables a trade-off between model depth and breadth, as well as the ability to shift model capacity between shared (generalization) layers and per-class (specialization) layers.
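To make the per-class Mixture-of-Experts classifier concrete, here is a minimal NumPy sketch of its forward pass only. It assumes the common formulation in which, for every class, a softmax gate weights several sigmoid experts plus one "dummy" expert that always predicts 0; the function and parameter names are hypothetical, and the MapReduce training, alternating optimization, and fine-tuning described above are not shown.

```python
import numpy as np

def moe_predict(x, gate_w, expert_w):
    """Per-class Mixture-of-Experts probabilities (hypothetical sketch).

    x:        (batch, dim) input features
    gate_w:   (dim, num_classes, num_mixtures + 1) gating weights; the extra
              slot belongs to a dummy expert that always predicts 0
    expert_w: (dim, num_classes, num_mixtures) expert weights
    """
    gate_logits = np.einsum("bd,dcm->bcm", x, gate_w)
    gate_logits -= gate_logits.max(axis=-1, keepdims=True)  # numerical stability
    gates = np.exp(gate_logits)
    gates /= gates.sum(axis=-1, keepdims=True)              # softmax over mixtures
    # Each expert is an independent logistic regression per class.
    experts = 1.0 / (1.0 + np.exp(-np.einsum("bd,dcm->bcm", x, expert_w)))
    # The dummy expert contributes gate * 0, so only the real experts are summed.
    return (gates[..., :-1] * experts).sum(axis=-1)          # (batch, num_classes)
```

Adding mixtures widens each class's classifier without deepening the shared layers, which is one way to read the depth-versus-breadth trade-off mentioned above.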
Tracking and Predicting Extreme Climate Events, with LLNL/LBNL
We propose two deep-learning-based models to track and predict hurricane trajectories in massive-scale climate reanalysis data. First, we formulate spatio-temporal tracking as a mapping problem from time-series climate data to time-sequential hurricane heat maps, using ConvLSTM. Second, we pose trajectory prediction as a sequential forecasting problem, from past to future hurricane heat-map sequences. Our ConvLSTM-based prediction model achieves a successful mapping from predicted heat maps to the ground truth.
MovieLens 20M YouTube Trailers Dataset, Google AI
We extracted audio-visual features with pre-trained deep neural networks for the MovieLens 20M movie trailers available on YouTube. We obtained the YouTube video IDs of the trailers by querying www.google.com. Out of the 27,278 unique movie IDs used in MovieLens-20M, our method retrieved the YouTube IDs of 25,623 trailers (a hit rate of about 0.94). This dataset is publicly available on the MovieLens website.
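The reported hit rate follows directly from the two counts given above; this short check only redoes that arithmetic.

```python
total_movies = 27_278    # unique movie IDs in MovieLens-20M
found_trailers = 25_623  # trailers whose YouTube ID was retrieved

hit_rate = found_trailers / total_movies
missing = total_movies - found_trailers

print(f"hit rate = {hit_rate:.2f}, missing = {missing}")  # prints "hit rate = 0.94, missing = 1655"
```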
Collaborative Deep Metric Learning for Video Understanding, Google AI
The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. We propose a deep network that embeds videos, using their audio-visual content, into a metric space that preserves video-to-video relationships. We then use the trained embedding network to tackle various domains, including video classification and recommendation, at scale.
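The description does not name the training objective. As an assumption, this sketch shows a triplet hinge loss on L2-normalized embeddings, a common metric-learning loss that pulls an anchor video toward a related (positive) video and pushes it away from an unrelated (negative) one; the names and the margin value are hypothetical.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Project embeddings onto the unit sphere so distances are comparable.
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss: the positive must be closer than the negative by `margin`."""
    a, p, n = (l2_normalize(t) for t in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)  # squared distance anchor-positive
    d_neg = np.sum((a - n) ** 2, axis=-1)  # squared distance anchor-negative
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```

Once the loss is near zero for most triplets, nearest neighbors in the embedding space can serve directly as related-video candidates.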
Multi-scale Graph Convolution for Semi-supervised Node Classification, Google AI
Network of GCNs (N-GCN) marries random walks with graph convolutional networks (GCNs), which have shown significant improvements in semi-supervised learning on graphs. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs that optimizes the classification objective. This approach improves node classification results on various graphs.
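As a rough illustration, this NumPy sketch runs a forward pass in which instance k propagates features with the k-th power of the normalized adjacency matrix, so later instances see nodes that random walks reach in more steps. It assumes a simple two-layer GCN per instance and a fixed linear mixing vector (`combo`) in place of the learned combination layer; all names are hypothetical and training is omitted.

```python
import numpy as np

def normalized_adjacency(adj):
    # D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_instance(a_k, x, w1, w2):
    h = np.maximum(a_k @ x @ w1, 0.0)  # first graph convolution + ReLU
    return a_k @ h @ w2                # second graph convolution -> class logits

def ngcn_forward(adj, x, weights, combo):
    """weights: list of (w1, w2) per instance; combo: mixing weight per instance."""
    a_norm = normalized_adjacency(adj)
    a_k = np.eye(adj.shape[0])         # power 0: each node only sees itself
    outputs = []
    for w1, w2 in weights:
        outputs.append(gcn_instance(a_k, x, w1, w2))
        a_k = a_k @ a_norm             # next power reaches one hop farther
    logits = sum(c * o for c, o in zip(combo, outputs))
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # per-node class probabilities
```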
Local Topic Discovery with Boosted Ensemble of NMF, with Korea University
We propose a novel ensemble model of NMF for discovering high-quality local topics. Inspired by a state-of-the-art gradient boosting model, our model successively performs NMF on the residual matrix obtained from previous stages, and generates a sequence of topic sets by applying a sophisticated local weighting scheme to the given matrix to enhance the locality of topics.
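The stagewise structure can be sketched as follows: each stage fits a small NMF (here with Lee-Seung multiplicative updates) to whatever the previous stages left unexplained. This is a simplified assumption. The local weighting scheme is omitted entirely, and clipping the residual at zero is just one hypothetical way to keep it nonnegative; names are illustrative.

```python
import numpy as np

def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    # Multiplicative updates minimizing ||x - w h||_F with w, h >= 0.
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], rank))
    h = rng.random((rank, x.shape[1]))
    for _ in range(iters):
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h

def boosted_nmf(x, rank_per_stage, num_stages):
    """Stagewise NMF on residuals; returns one topic matrix per stage."""
    residual = x.copy()
    topic_sets = []
    for stage in range(num_stages):
        w, h = nmf(residual, rank_per_stage, seed=stage)
        topic_sets.append(h)  # rows of h are this stage's topics (term weights)
        # Remove what this stage explains; clip so the next NMF input stays >= 0.
        residual = np.maximum(residual - w @ h, 0.0)
    return topic_sets, residual
```

Because later stages only see what earlier stages missed, they tend to pick up smaller, more specific patterns, which is the intuition behind using boosting for local topics.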
Content-based Video Recommendation, Google Research (Google internal only)
Video understanding has recently been making great progress thanks to deep learning techniques. We apply deep neural networks to learn the co-watch relationship between pairs of YouTube videos, producing features optimized for recommendation. A demonstration on a subset of recent videos shows surprisingly good performance in retrieving videos with similar content.
YouTube-8M Dataset, Google Research
YouTube-8M is a large-scale labeled video dataset containing 8 million YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities. It comes with precomputed state-of-the-art vision features, making it possible to train video models on hundreds of thousands of hours of video in less than a day on a single GPU. Our goal is to accelerate research on large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches for video.
Cold-start Recommendation using Latent Space Mapping, Amazon
Pure collaborative filtering recommenders cannot be used for cold-start users and items. To tackle this problem, we tried to learn a mapping from user or item feature space to the latent space discovered by matrix factorization. We observed significant improvement in Amazon Instant Video recommendation with user features such as book or DVD purchases.
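One simple way to learn such a mapping is a ridge regression from side features to the matrix-factorization latent vectors of warm users. This is an assumed choice for illustration, not necessarily the model used; the function names and the `ridge` value are hypothetical.

```python
import numpy as np

def fit_feature_to_latent_map(features, latent, ridge=1e-2):
    """Solve min_M ||features @ M - latent||^2 + ridge * ||M||^2."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ latent)

def cold_start_scores(new_user_features, mapping, item_factors):
    user_latent = new_user_features @ mapping  # project into the MF latent space
    return user_latent @ item_factors.T        # predicted preference for every item
```

In practice, the features of warm users (e.g., purchase histories) and their factors from a trained factorization would form the training pair; a new user then needs only features to receive scores.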
Leveraging Wikipedia semantics for Contextual Exploration, Microsoft Research
Contextual exploration is the problem of recommending entities given a query and its context, in order to satisfy a user's information need directly within an application. We leverage semantic signals from Wikipedia link structures and relate them to the context using several graph mining techniques. A crowd-sourced experimental study indicates that the proposed method successfully mines contextually relevant pages.
Local Collaborative Ranking, Google Research [Best Student Paper at WWW 2014]
LLORMA found that a local low-rank matrix assumption is more realistic for recommendation systems. We combine this local low-rank approximation, based on the Frobenius norm, with general empirical risk minimization for ranking losses. Thanks to its local nature, the method is easy to parallelize, making it a viable approach for large-scale, real-world rank-based recommendation systems.
Local Low-Rank Matrix Approximation (LLORMA)
A prevalent assumption in constructing matrix approximations was that the partially observed matrix is of low-rank. Instead, we assume that the matrix is locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We verify our model both with theoretical analysis and with experimental study.
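The weighted sum of low-rank matrices can be written as a kernel-weighted average of local models. The notation below is our own and is an assumption for illustration: $q$ anchor points $(u_t, i_t)$, a smoothing kernel $K_h$ with bandwidth $h$, and a low-rank factorization $\hat{U}^{(t)} \hat{V}^{(t)\top}$ fitted around each anchor.

```latex
\hat{M}_{ui} \;=\; \sum_{t=1}^{q}
  \frac{K_h\big((u_t, i_t), (u, i)\big)}
       {\sum_{s=1}^{q} K_h\big((u_s, i_s), (u, i)\big)}
  \,\big[\hat{U}^{(t)} \hat{V}^{(t)\top}\big]_{ui}
```

Entries close to an anchor are dominated by that anchor's local model, while a single global low-rank model is recovered as the special case $q = 1$.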
Learning Multiple-Question Decision Trees for Cold-Start Recommendation
For cold-start recommendation, it is important to rapidly profile new users and generate a good initial set of recommendations through an interview process. In this project, we propose an algorithm that learns to conduct the interview process, guided by a decision tree with multiple questions at each split. Both a quantitative experiment and a user study indicate that the proposed algorithm outperforms state-of-the-art approaches in terms of both prediction accuracy and user cognitive effort.
Automatic Feature Induction for Stagewise Collaborative Filtering
For the task of predicting missing ratings in collaborative filtering, we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms.
A Comparative Study of Collaborative Filtering Algorithms
Both classic and recent state-of-the-art collaborative filtering techniques are compared in a variety of experimental contexts. Specifically, we report conclusions controlling for the number of items, number of users, sparsity level, performance criteria, and computational complexity. Our conclusions identify which algorithms work well, and under what conditions, across various measures.
PREA: Personalized Recommendation Algorithms Toolkit
With increasing demand for personalized services in e-commerce, recommendation systems are playing a critical role in commercial websites. In academia, many researchers have tried to achieve better performance and accuracy with various algorithms. PREA is an open-source Java toolkit implementing recent state-of-the-art recommendation algorithms as well as popular evaluation metrics.
Personalized Academic Research Paper Recommendation System
A huge number of academic papers are published in many conferences and journals these days, so researchers must search or browse through the proceedings of top conferences and journals to find related work. To ease this difficulty, we propose a Personalized Academic Research Paper Recommendation System, which recommends related articles to each researcher in a personalized way.
Reinforcement Learning using Side Information
When an agent is learning how to act in its environment, a human teacher can guide it to achieve better performance or to learn faster. From the agent's perspective, human input can be seen as side information. This research aims to find the best way to make use of side information in reinforcement learning. Currently, I am working on the Pacman game domain.
Fair and Efficient Comparison Method for Keyboard Layouts
An efficient, accurate, and general-purpose method for evaluating the typing speed of a new keyboard layout is proposed. It overcomes the problems of relative familiarity and finger memory in a new way, by appropriately mapping between the new and old layouts.
Optimizing a Personalized Multigram Cellphone Keypad
Using a genetic algorithm, we propose an efficient mobile-phone keypad tailored to each individual. Frequently used multigrams from text messages are added to the keypad, improving typing speed and efficiency, especially for short message service (SMS) on the phone.
Unified Communicator, LG Electronics
This project built a Windows Mobile application that downloads and merges address books from several webmail services and friend lists from social network services (SNS). I designed the merging algorithm, using a customized L-Distance algorithm to check the similarity of strings.
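Assuming "L-Distance" refers to the Levenshtein edit distance, this sketch shows the standard dynamic-programming version plus a hypothetical normalized similarity score for matching contact names. The project's actual customizations are not described, so the lowercasing, trimming, and normalization here are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means identical after normalization."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two address-book entries could then be merged when their similarity exceeds a chosen threshold; the threshold itself would have to be tuned.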
Tantra: Game Server Log Analysis and Management System, NHN Corp.
This system gathers scattered server logs into one log server and analyzes them with scripts. Useful information on server status, such as CPU, memory, and disk status, is extracted and reported every second.
Pandora: Abusing User Monitoring System, NHN Corp.
Built on the Tantra log management system, this module analyzes game server logs to detect users who try to raise their levels or obtain money illegally. I proposed and led this project, and it was selected as an excellent project of the year.
Multi-Baduk Game, NHN Corp.
Baduk (also known as Go) is a game serviced on Hangame.com. The Multi-Baduk project focuses on global services in China, Japan, and the United States, as well as Korea. It also includes multimedia features such as online lectures, real-time relay broadcasting, and advertisements.
HanJanggi (Korean Chess) Game, NHN Corp.
This is a traditional Korean chess-like game, also serviced on Hangame.com. For this service, I am in charge of general maintenance, periodic updates, and server platform control.
Online Broadcasting System, NHN Corp.
For a game relay service, we built an online broadcasting system using Icecast, a Linux-based open-source streaming server. We implemented a load-balancing proxy server and frame-editing functions on top of the original version of Icecast.
Video Editor for Network Camera, TechnoVision Inc.
I developed a video editor for files recorded by a network camera using the H.264 codec. The program includes the following functions: cutting frames from multiple clips, merging them into one file, saving still images, and filtering images.
Real-Time Motion Tracking using Omni-Directional PTZ Camera, TechnoVision Inc.
This motion tracking project observes and detects moving objects using a sensor, then tracks and records them using an omni-directional PTZ network camera on the ceiling. As a software engineer, I devised an efficient algorithm to distinguish each moving object, even when objects overlap and then separate again.
Unified Monitoring System for Sewer Pipe, TechnoVision Inc.
I participated in a project to gather and analyze data on all sewer pipes in the country. My role was to develop a module that analyzes precipitation and net flow data for sewer pipes and visualizes the results. Optimization was the key issue because the application ran in real time.
8x8 Dot Table Tennis
This was the final project of the Logic Design Lab course in 2004. On a PCB designed for a table tennis game, we programmed the game logic into two chips. Because we added a second chip, we needed to implement a way for the two chips to communicate. I also built the 5 V power supply myself.