|Moment Localization in Video Corpus, Google Research
To localize relevant segments for a text query in untrimmed and unsegmented videos at scale, we propose the HierArchical Multi-Modal EncodeR (HAMMER), which encodes a video at both coarse-grained and fine-grained levels to extract information at different scales based on multiple subtasks: video retrieval, segment temporal localization, and masked language modeling.
|Annotation for Private Videos, Google Research|
We are working on annotating private videos on Google Photos to enable and improve features such as keyword search and collage creation. Annotating private videos is more challenging than annotating public ones, since we are not able to watch them and labels are usually unavailable. We are trying both semi-supervised and fully unsupervised approaches to tackle this problem.
|Large-Scale Training Framework for Video Annotation, Google Research
We present a MapReduce-based training framework, which exploits both data parallelism and model parallelism to scale training of complex video models. The proposed framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures, which enables a trade-off between model depth and breadth, and the ability to shift model capacity between shared (generalization) layers and per-class (specialization) layers.
|Tracking and Predicting Extreme Climate Events, with LLNL/LBNL
We propose two deep-learning-based models to track and predict hurricane trajectories on massive-scale climate reanalysis data. First, we address spatio-temporal tracking as a mapping problem from time-series climate data to time-sequential hurricane heat maps using ConvLSTM. Second, we pose trajectory prediction as a problem of sequential forecasting from past to future hurricane heat-map sequences. Our prediction model using ConvLSTM achieves a successful mapping from predicted heat maps to ground truth.
|MovieLens 20M YouTube Trailers Dataset, Google AI
We extracted audio-visual features with pre-trained deep neural networks for the MovieLens 20M movie trailers available on YouTube. We obtained YouTube video IDs for the trailers by querying www.google.com. Out of the 27,278 unique movie IDs used in MovieLens-20M, our method retrieved the YouTube IDs of 25,623 trailers (a hit rate of 0.94). This dataset is publicly available at the MovieLens website (link above).
|Collaborative Deep Metric Learning for Video Understanding, Google AI
The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. We propose a deep network that embeds videos, using their audio-visual content, into a metric space that preserves video-to-video relationships. We then use the trained embedding network to tackle various domains, including video classification and recommendation at scale.
|Multi-scale Graph Convolution for Semi-supervised Node Classification, Google AI
Network of GCNs (N-GCN) marries random walks with graph convolutional networks (GCNs), which have shown significant improvements in semi-supervised learning on graphs. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs that optimizes the classification objective. This approach shows improved node classification results on various graphs.
|Local Topic Discovery with Boosted Ensemble of NMF, with Korea University
We propose a novel ensemble model of NMF for discovering high-quality local topics. Our model successively performs NMF given a residual matrix obtained from previous stages, inspired by a state-of-the-art gradient boosting model, and generates a sequence of topic sets by applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics.
|Content-based Video Recommendation, Google Research
(Google internal only)|
Video understanding has recently made great progress thanks to deep learning techniques. We apply deep neural networks to learn the co-watch relationship between pairs of YouTube videos, producing features optimized for recommendation. A demonstration on a subset of recent videos shows surprisingly good performance at retrieving videos with similar content.
|YouTube-8M Dataset, Google Research
YouTube-8M is a large-scale labeled video dataset containing 8 million YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities. It comes with precomputed state-of-the-art vision features, making it possible to train video models from hundreds of thousands of video hours in less than a day on a single GPU. Our goal is to accelerate research on large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches for video.
|Cold-start Recommendation using Latent Space Mapping, Amazon|
Pure collaborative filtering recommenders cannot be used for cold-start users and items. To tackle this problem, we learned a mapping from the user or item feature space to the latent space discovered by matrix factorization. We observed significant improvement in Amazon Instant Video recommendation with user features such as book or DVD purchases.
|Leveraging Wikipedia semantics for Contextual Exploration, Microsoft Research
Contextual exploration is an entity recommendation problem: given a query and its context, recommend entities that satisfy the user's information need directly within an application. We leverage semantic signals from Wikipedia link structures and relate them to the context using several graph mining techniques. A crowd-sourced experimental study indicates that the proposed method successfully mines contextually relevant pages.
|Local Collaborative Ranking, Google Research [Best Student Paper at WWW 2014]
LLORMA showed that a local low-rank matrix assumption is more realistic in recommendation systems. We combine this local low-rank approximation, originally based on the Frobenius norm, with general empirical risk minimization for ranking losses. Thanks to its local nature, the method is easy to parallelize, making it a viable approach for large-scale, real-world rank-based recommendation systems.
|Local Low-Rank Matrix Approximation (LLORMA)
A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low rank. Instead, we assume that the matrix is locally of low rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We verify our model with both theoretical analysis and experimental study.
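The "weighted sum of low-rank matrices" idea can be sketched as follows. This is a minimal illustration, not the published LLORMA implementation: the model layout (`anchor`, `U`, `V`), the `kernel` callable, and the function name are all hypothetical, and the weights are normalized so that the prediction is a weighted average of the local models' predictions.

```python
from typing import Callable, List, Optional, Tuple

Anchor = Tuple[int, int]  # hypothetical (user, item) anchor point of a local model


def predict_local_low_rank(
    user: int,
    item: int,
    local_models: List[dict],
    kernel: Callable[[Anchor, Anchor], float],
) -> Optional[float]:
    """Combine the predictions of several local low-rank models.

    Each model is assumed to be a dict with:
      "anchor": the (user, item) pair the model is centered on,
      "U": per-user factor vectors, "V": per-item factor vectors.
    `kernel(anchor, target)` is an assumed similarity function that is
    large when the target entry is close to the anchor, and 0 when far.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for model in local_models:
        weight = kernel(model["anchor"], (user, item))
        if weight <= 0.0:
            continue  # this local model does not cover the target entry
        # Low-rank prediction of one local model: inner product of factors.
        local_prediction = sum(
            p * q for p, q in zip(model["U"][user], model["V"][item])
        )
        weighted_sum += weight * local_prediction
        weight_total += weight
    if weight_total == 0.0:
        return None  # no local model is close enough to predict this entry
    return weighted_sum / weight_total
```

Because each local model is trained and evaluated independently of the others, the loop over `local_models` is the natural place to parallelize.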
|Optimizing a Personalized Multigram Cellphone Keypad |
Using a genetic algorithm, we propose an efficient mobile-phone keypad tailored to each individual. Frequently used multigrams from the user's text messages are added to the keypad, improving typing speed and efficiency, especially for short message service (SMS) on the phone.
|Unified Communicator, LG Electronics |
This project built a Windows Mobile application that downloads and merges address books from several webmail services and friend lists from social network services (SNS). I designed the merging algorithm, using a customized L-Distance algorithm to measure the similarity of strings.
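String-similarity matching of this kind can be sketched as below. The sketch assumes that "L-Distance" refers to the Levenshtein edit distance; the customizations used in the project are not described, so the normalization in `similarity` (trimming, lowercasing, dividing by the longer length) is purely illustrative, as are the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative similarity in [0, 1]; 1.0 means identical after
    trimming whitespace and ignoring case (an assumed normalization)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

When merging contacts, two entries could be treated as the same person if `similarity` exceeds a chosen threshold; the threshold itself would need to be tuned on real address-book data.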
|Tantra: Game Server Log Analysis and Management System, NHN Corp. |
This system gathers scattered server logs into one log server and analyzes them with scripts. Useful information on server status, such as CPU, memory, and disk usage, is extracted and reported every second.
|Pandora: Abusing User Monitoring System, NHN Corp.|
Built on the Tantra log management system, this module analyzes game server logs to detect users who try to raise their levels or obtain money illegally. I proposed and led this project, and it was selected as an excellent project of the year.
|Multi-Baduk Game, NHN Corp.
A Baduk (also known as Go) game serviced on Hangame.com. The Multi-Baduk project focuses on global service in China, Japan, and the United States, as well as Korea. It also includes multimedia features such as online lectures, real-time relay broadcasting, and advertising.
|HanJanggi (Korean Chess) Game, NHN Corp. |
This is a traditional Korean chess-like game, also serviced on Hangame.com. For this service, I am in charge of general maintenance, periodic updates, and server platform control.
|Online Broadcasting System, NHN Corp. |
For a game relay service, we built an online broadcasting system using Icecast, a Linux-based open-source server. We implemented a proxy server for load balancing and frame-editing functions on top of the original version of Icecast.
|Video Editor for Network Camera, TechnoVision Inc.|
I developed a video editor for files recorded by network cameras using the H.264 codec. The program includes the following functions: cutting frames from multiple clips, merging them into one file, saving still cuts, and filtering images.
|Real-Time Motion Tracking using Omni-Directional PTZ Camera, TechnoVision Inc. |
This motion-tracking project observes and detects moving objects using a sensor, then tracks and records them using an omni-directional PTZ network camera on the ceiling. As a software engineer, I devised an efficient algorithm to distinguish individual moving objects, even when they overlap and then separate again.
|Unified Monitoring System for Sewer Pipe, TechnoVision Inc.|
I participated in a project to gather and analyze data on all sewer pipes in the country. My role was to develop a module that analyzes data on precipitation and the net flow of sewer pipes and visualizes the results. Optimization was the key issue because the application ran in real time.