June 23, 2012 (Saturday)

Final report

    Project Title :
          CUDA-accelerated Hierarchical K-means
    Members :
          R00922016 朱冠宇 R00922042 李哲君
    Proposal Slides.
           http://cml18.csie.ntu.edu.tw/~cclee/GPUproposal.pptx
    Midterm Update:
           http://gpu-hkm.blogspot.tw/2012/05/gpu-midterm-progress.html
    Poster:
           http://cml18.csie.ntu.edu.tw/~cclee/GPGPU-poster.pdf
    Paper:
           http://cml18.csie.ntu.edu.tw/~cclee/CUDA-acceleratedHierarchicalK-means.pdf
    Demo website:
           http://cml18.csie.ntu.edu.tw/~wfuny/GPGPU/GPU_demo.php

May 28, 2012 (Monday)

GPU Midterm Progress

1. Our Project Title / Members / Abstract
    Project Title : CUDA-accelerated Hierarchical K-means
    Members : R00922016 朱冠宇 R00922042 李哲君
    Abstract : In 2011 alone, more than 350 billion photos were generated. (ref) Statistical tools that help us manage such data, such as clustering, are therefore indispensable. K-means (KM) is one of the simplest clustering algorithms and is widely used. However, KM becomes quite slow as the data grows larger and larger. Hierarchical k-means (HKM), an approximation of k-means, alleviates this, but even HKM is still not fast enough. Thus, in this project, we further improve the speed of HKM by using CUDA. We also show a simple demo that uses HKM to help us manage photos.
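The HKM idea described in the abstract can be sketched in a few lines: run k-means with a small K, then recurse into each resulting cluster, layer by layer. The sketch below is illustrative only (plain Lloyd iterations in Python, not our CUDA kernels; all names are ours):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: mean of each cluster (keep old centroid if empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

def hkm(points, k, layers):
    """Hierarchical k-means: returns one centroid array per layer
    (k centroids at layer 1, up to k^2 at layer 2, and so on)."""
    tree, groups = [], [points]
    for _ in range(layers):
        layer_centroids, next_groups = [], []
        for g in groups:
            if len(g) < k:          # too few points to split further
                continue
            c, labels = kmeans(g, k)
            layer_centroids.append(c)
            next_groups += [g[labels == j] for j in range(k)]
        tree.append(np.vstack(layer_centroids))
        groups = next_groups
    return tree
```

Because each layer only clusters into a small K and then works on smaller and smaller groups, HKM trades a little accuracy for a large speed gain over flat k-means with many centroids.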


2. Our Proposal Slides.
     http://cml18.csie.ntu.edu.tw/~cclee/GPUproposal.pptx


3. Our Midterm Update    
    We have finished both the CPU and the GPU versions.
    The performance comparison between the GPU and CPU versions is shown below.



    Here N is the number of 128-dimensional data points, K is the number of centroids in each group, and L is the number of layers. As you can see, our program achieves a speedup of about 20 to 30 times.

    Because k-means is such a well-known algorithm, there should already be other implementations available. We did a quick survey and found one open-source implementation (http://serban.org/software/kmeans/). To compare against that code, we set L=1, because it only implements plain k-means. The comparison results are shown below.
   
   As you can see, our program is still much faster than the program provided on that website.
   This means that, with only one layer, our data arrangement is better than that code's. That is one reason why we needed to write our own HKM CUDA code.
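The dominant cost in every k-means iteration is the assignment step, and how the N×K point-to-centroid distances are arranged matters a lot. As a hedged illustration of the idea (in Python, not our actual CUDA kernel), the whole step can be expressed as one batched matrix computation using the expansion ||x − c||² = ||x||² − 2x·c + ||c||², which is exactly the kind of regular, memory-friendly structure that GPUs reward:

```python
import numpy as np

def assign_batched(points, centroids):
    """Nearest-centroid labels via one batched computation:
    ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the ||x||^2 term is
    constant per point, so it can be dropped under argmin."""
    cross = points @ centroids.T                  # (N, K) dot products
    c_sq = (centroids ** 2).sum(axis=1)           # (K,) squared norms
    return (c_sq - 2.0 * cross).argmin(axis=1)

def assign_naive(points, centroids):
    """Reference: an explicit per-point loop (what a simple CPU
    implementation would do)."""
    return np.array([
        min(range(len(centroids)),
            key=lambda j: ((p - centroids[j]) ** 2).sum())
        for p in points
    ])
```

Both functions produce the same labels; the batched form simply restates the computation as one large matrix product.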
    
    The following figures are drawn using the data in the table above.
    We also show the speedup ratio over the CPU version below.

    The paragraphs above show that our GPU implementation is much faster than the other codes, but we have not yet shown how to use it.
    Thus, we wrote a simple application that helps us visualize our images on a 2D plane.
    We ran the GPU HKM algorithm on the Holiday dataset using VLAD features. There are 1491 images in that dataset.
    The procedure is as follows:
    First, each image in the dataset is converted into a 128×16 = 2048-dimensional feature.
    Then, we apply HKM to these features with K=3 and L=4, meaning we find 3 centroids in each group and have 4 layers in total.
    Our GPU program therefore generates 3+9+27+81 = 120 centroids, each 2048-dimensional.
    To visualize these points easily, we project the centroids into 2-dimensional space using multidimensional scaling (MDS), one of the distance-preserving algorithms.
     Then, we plot them on the following demo page.
     (You can click a point to unfold the points of the next layer.)
     Demo page:
         http://cml18.csie.ntu.edu.tw/~wfuny/GPGPU-tmp/draw.php
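The MDS projection step above can be sketched with classical (metric) MDS: double-center the squared-distance matrix and keep the top eigenvectors. This is a minimal illustrative version in Python (names are ours; our demo may use a different MDS variant):

```python
import numpy as np

def classical_mds(points, out_dim=2):
    """Project points to out_dim dimensions, approximately preserving
    pairwise Euclidean distances (classical MDS)."""
    n = len(points)
    # Squared pairwise distance matrix D.
    diff = points[:, None, :] - points[None, :, :]
    d_sq = (diff ** 2).sum(axis=2)
    # Double centering: B = -1/2 * J D J, with J = I - (1/n) 11^T.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d_sq @ j
    # Top eigenpairs of the symmetric matrix B give the embedding.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:out_dim]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

For our 120 centroids this yields 120 two-dimensional points whose pairwise distances approximate the original 2048-dimensional distances, which is what the demo page plots.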
 
    Currently, the images above the canvas do not change when you move the cursor onto a point.
    We will add this functionality later.