/*************************
* Project: Video Compression based Anomaly Detection
* Author: Yi-Chao Chen @ UT Austin
*************************/
/*************************
* Dataset
*************************/
1. 4sq/<city>
- FourSquare dataset collected by Gene:
Each file contains the check-ins of one city.
We can retrieve <venue> and <user> info from these check-ins.
- Format:
1. in "Airport", file name: 4SQ_VENUE_DETAILS_Airport.gz
VENUE_DATA - venues -| stat
| tags
| ts
| tips
| checkins - users - ['last', 'gender', 'userid', 'ts', 'home', 'first']
| mayor
2. in other <city>, file name: 4SQ_VENUE_TRENDS_<city>.gz
-| 'current_lat'
| 'current_lng'
| 'VENUE_INDEX'
| 'VENUE_INFO' - venues - ['city', 'addr', 'zip', 'country', 'cate_name', 'hereNow', 'usersCount', 'state', 'contact', 'cate_id', 'ts', 'checkinsCount', 'lat', 'lng', 'id', 'name']
| 'VENUE_DETAIL' - venues - ['stat', 'tags', 'ts', 'tips', 'checkins', 'mayor']
2. 4sq/city_info/4SQ_<city>_INFO
- The detailed information of the venues in the above dataset.
Except for "Airport", the files are generated by "subtask_process_4sq/generate_city_info.py".
- Format:
1. in "Airport", there are several possible formats:
a) venues - ['grp_type', 'city', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
b) venues - ['grp_type', 'name', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
2. in other <city>, our output format is:
a) venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']
3. video
Get video samples from:
- http://trace.eas.asu.edu/yuv/
- http://media.xiph.org/video/derf/
- stefan_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 90 frames
- bus_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 150 frames
- foreman_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
- coastguard_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
- highway_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 2000 frames
The video files are large, so they are kept on the valleyview local disk:
/var/local/yichao/anomaly_compression/data/video/
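For reference, a minimal Python sketch of reading such a CIF YUV 4:2:0 planar file: each frame is a width*height Y plane followed by quarter-size U and V planes (the function name is illustrative):

```python
import numpy as np

def read_yuv420_frames(path, width=352, height=288, n_frames=None):
    """Read a planar YUV 4:2:0 8-bit file into per-frame (Y, U, V) arrays."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)      # U and V are subsampled 2x2
    frame_bytes = y_size + 2 * c_size
    frames = []
    with open(path, "rb") as f:
        while n_frames is None or len(frames) < n_frames:
            raw = f.read(frame_bytes)
            if len(raw) < frame_bytes:         # end of file
                break
            buf = np.frombuffer(raw, dtype=np.uint8)
            y = buf[:y_size].reshape(height, width)
            u = buf[y_size:y_size + c_size].reshape(height // 2, width // 2)
            v = buf[y_size + c_size:].reshape(height // 2, width // 2)
            frames.append((y, u, v))
    return frames
```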
4. huawei_cellular/BS_gps_hourly_traffic.txt
TM sample derived from the 3G dataset.
- rows: 3075; each row is the traffic time series of one Base Station
- columns: 26; the first two are GPS values, and the remaining 24 are one-hour traffic totals (in bytes).
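Assuming the file is plain whitespace-delimited text with one row per Base Station (the delimiter is an assumption), loading it can be sketched as:

```python
import numpy as np

def load_bs_hourly_tm(path):
    """Parse BS_gps_hourly_traffic.txt: per row, 2 GPS values followed by
    24 hourly traffic values in bytes. Whitespace delimiter is assumed."""
    data = np.loadtxt(path)
    gps = data[:, :2]        # (n_bs, 2) GPS coordinates
    traffic = data[:, 2:]    # (n_bs, 24) hourly traffic in bytes
    return gps, traffic
```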
5. Traffic Matrix
- MAWI
1. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top100.txt.86400.
25 frames
93 * 91
2. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top150.txt.86400.
25 frames
138 * 138
3. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top200.txt.86400.
25 frames
180 * 180
4. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top500.txt.86400.
25 frames
93 * 91
- 4SQ
- SJTU WiFi
1. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.country.txt.3600.top400.
19 frames
250 * 193
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.country.txt.3600.top400.
19 frames
193 * 250
2. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
19 frames
250 * 400
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
19 frames
400 * 250
3. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.gps.5.txt.3600.top400.
19 frames
250 * 400
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.gps.5.txt.3600.top400.
19 frames
400 * 250
4. Group by NUM top loaded APs
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.all.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.dl.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.ul.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.all.bin600.top100.txt
287 (time) * 100 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.dl.bin600.top100.txt
287 (time) * 100 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.ul.bin600.top100.txt
287 (time) * 100 (APs)
- Huawei 3G
1. Group by lat,lng of BS
processed_data/subtask_parse_huawei_3g/region_tm/tm_3g_region_all.res0.006.bin10.sub.
146 frames
26 * 21
2. Group by BS at different areas:
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs0.all.bin10.txt
group by BS
BS types: unknown
1074 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs1.all.bin10.txt
group by BS
BS types: general urban area
458 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs2.all.bin10.txt
group by BS
BS types: general urban area
48 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin10.txt
group by BS
BS types: general urban area
472 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin60.txt
group by BS
BS types: general urban area
472 * 24
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs4.all.bin10.txt
group by BS
BS types: general urban area
24 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs5.all.bin10.txt
group by BS
BS types: general urban area
1 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs6.all.bin10.txt
group by BS
BS types: general urban area
240 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs7.all.bin10.txt
group by BS
BS types: general urban area
14 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs8.all.bin10.txt
group by BS
BS types: general urban area
19 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs9.all.bin10.txt
group by BS
BS types: general urban area
24 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs10.all.bin10.txt
group by BS
BS types: general urban area
82 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs11.all.bin10.txt
group by BS
BS types: general urban area
13 * 145
3. group by BS (all BSs)
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.all.all.bin10.txt
2469 * 145
4. group by BS (all BSs) and choose the top loaded BSs
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.load.top200.all.bin10.txt
200 * 145
5. group by BS (all BSs) and choose the most stable BSs
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.stable.top200.all.bin10.txt
200 * 145
6. group by RNC
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.rnc.all.bin10.txt
13 * 145
- GEANT (Totem)
1. processed_data/subtask_parse_totem/tm/tm_totem.
10772 frames
23 * 23
time bin = 15 minutes
- Abilene
1. data/abilene/X
1008 (time) * 121 (od pairs)
time bin = 10 minutes
2. processed_data/subtask_parse_abilene/tm/tm_abilene.od.
same as above, but in 3D version
1008 frames
11 * 11
time bin = 10 minutes
- CSI
1. Static
/v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.127_file.dat0_matrix.mat.txt
9850 * 90
/v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.50_file.dat0_matrix.mat.txt
9706 * 90
2. Mobile
data/csi/mobile/Mob-Recv1run1.dat0_matrix.mat_dB.txt
10000 * 90
data/csi/mobile/Mob-Recv1run1.dat1_matrix.mat_dB.txt
10000 * 90
- Sensor
1. IntelLab
processed_data/subtask_parse_sensor/tm/tm_sensor.temp.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.humidity.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.light.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.voltage.bin600.txt
4943 * 54
- RON
processed_data/subtask_parse_ron/tm/tm_ron1.latency.
12 * 12 * 494
- Cister RSSI: telos
processed_data/subtask_parse_telos_rssi/tm/tm_telos_rssi.txt
10000 * 16
- CU RSSI: multi location
processed_data/subtask_parse_multi_loc_rssi/tm/tm_multi_loc_rssi.txt
500 * 895 (179 nodes * 5 monitors)
- Channel CSI
condor_data/subtask_parse_csi_channel/csi/static_trace13.ant1.mag.txt
5000 * 270
- UCSB Meshnet
condor_data/subtask_parse_ucsb_meshnet/tm/tm_ucsb_meshnet.connected.txt
1527 * 425
- UMich RSS
condor_data/subtask_parse_umich_rss/tm/tm_umich_rss.txt
3127 * 182
/*************************
* Subtasks
*************************/
1. subtask_process_4sq
a) generate_city_info.py
- Goal: read 4sq checkins and produce the information of all venues in the dataset.
- Input:
1. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco
- Output:
1. ../processed_data/subtask_process_4sq/combined_city_info/4SQ_<city>_INFO
- The information of the venues in the city.
Will be linked to ../data/4sq/city_info/4SQ_<city>_INFO and used in generating the TM.
- Format:
venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']
- Batch Run:
1. batch_generate_city_info.sh
b) generate_Human_TM.py
- Goal: read 4sq checkins and produce a human traffic matrix
- Input:
1. period: generate a traffic matrix with "period" days of checkins data.
2. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco
- Output:
1. ../processed_data/subtask_process_4sq/TM/<city>_sorted.txt
- The order of airports in TM
- Format: <airport name> <lat> <lng>
2. ../processed_data/subtask_process_4sq/TM/TM_<city>_period<period>_<index>.txt
- The Human Traffic Matrix using <period> days of data
- Variables:
1. user_hist: userid - ts - ['last', 'gender', 'userid', 'ts', 'home', 'first', 'lat', 'lng', 'venue', 'venue_id']
- Batch Run:
1. batch_generate_Human_TM.sh
c) plot_TM.mother.plot
- Goal: given the Human TM generated above, plot the heat map using Gnuplot
- Output:
1. ../figures/subtask_process_4sq/TM/TM_period<period>_<index>.eps
- Batch Run:
1. batch_plot_TM.pl
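Illustrative only: one plausible way to build a human traffic matrix from check-in data is to count, per user, transitions between consecutively visited venues within the period. This sketch is an assumption about the approach, not the exact logic of generate_Human_TM.py:

```python
from collections import OrderedDict

def build_human_tm(checkins, venues):
    """checkins: dict mapping userid -> time-ordered list of venue names.
    venues: ordered list of venue names (row/column order of the TM).
    Returns TM where TM[i][j] counts observed moves from venue i to j.
    Illustrative sketch only."""
    idx = {v: i for i, v in enumerate(venues)}
    n = len(venues)
    tm = [[0] * n for _ in range(n)]
    for seq in checkins.values():
        # count each consecutive pair of check-ins as one transition
        for a, b in zip(seq, seq[1:]):
            tm[idx[a]][idx[b]] += 1
    return tm
```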
2. subtask_psnr
To compare the PSNR of videos compressed using MPEG and PCA.
It also outputs the compressed videos for anomaly detection.
a) PCA_psnr.m
- Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation.
step 1: given a video with [frame, width, height, YUV] pixels.
step 2: convert to a 2D matrix: [frame, width * height * YUV]
step 3: fragment the 2D matrix into small ones:
fragment i = [frame, x_i:y_i]
step 4: apply PCA to each fragment with a given rank.
the rank decides the compression ratio and quality
step 5: reconstruct the approximated matrix
step 6: calculate PSNR
- Input:
1. num_PC: num of PCs to use (i.e. rank)
2. video_name: the name of raw video (assume the video format: YUV CIF 4:2:0)
3. frames: number of frames to analyze
4. width: the width of the video
5. height: the height of the video
- Output:
1. PSNR: the PSNR (dB) of the PCA low-rank approximation.
2. compressed size: the size of the PCA approximation, i.e. the size of the principal components and eigenvectors
- Batch Run:
1. batch_PCA_psnr.m
b) PCA_psnr_by_frame.m
- Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation
step 1: given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: convert GoP into a 2D matrix.
step 4: apply DCT to the 2D matrix.
step 5: apply PCA to the DCT output with a given rank.
step 6: reconstruct the approximated matrix
step 7: apply the inverse DCT to the approximated matrix
step 8: calculate PSNR
c) DCT_psnr.m
- Goal: calculate the PSNR of a video which is compressed using 3D DCT
step 1. given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: apply 3D DCT to each GoP.
step 4: partition a GoP into small chunks (e.g. 44x35 pixels a chunk)
step 5: for each chunk, calculate the error after iDCT if the chunk is removed
step 6: remove chunks with small error
step 7: apply the inverse DCT to the matrix with only the remaining chunks
step 8: calculate PSNR
d) DCT_psnr_combine_yuv.m
- Goal: similar to "DCT_psnr.m", but instead of handling Y, U, and V separately, this one combines them into one matrix and applies DCT on a 4D array: [width, height, frame, YUV].
Note: it is too slow (due to the larger matrix), so it is not used for now.
e) compressive_sensing_psnr.m
- Goal: calculate the PSNR of a video which is compressed using compressive sensing.
step 1. given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: apply compressive sensing with spatial and temporal constraints to each GoP.
step 4: reconstruct the GoP using U and V returned by compressive sensing
step 5: calculate PSNR
f) yuv_psnr.m
- Goal: calculate the PSNR of two videos
- Input:
1. video_name1: file name and path of the 1st video.
2. video_name2: file name and path of the 2nd video.
3. frames: number of frames to analyze
4. width: the width of the video
5. height: the height of the video
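The core of PCA_psnr.m (steps 2-6 above) can be sketched in Python using a truncated SVD, which yields the rank-k approximation (equivalent to PCA low-rank approximation up to mean-centering); the function and argument names here are illustrative:

```python
import numpy as np

def low_rank_psnr(frames_2d, rank, peak=255.0):
    """Rank-k approximation of a (n_frames, width*height*YUV) matrix via
    truncated SVD, then PSNR of the reconstruction against the original."""
    u, s, vt = np.linalg.svd(frames_2d, full_matrices=False)
    # keep only the top `rank` singular components (the "PCs")
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    mse = np.mean((frames_2d - approx) ** 2)
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    return approx, psnr
```

The rank controls the compression ratio: storing u[:, :rank], s[:rank], and vt[:rank, :] takes rank * (n_frames + 1 + n_pixels) values instead of the full matrix.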
3. subtask_inject_error
The objective of this task is to inject anomalies into a given matrix or video.
a) inject_err.m
- Goal: inject anomalies by adding some large numbers to the given matrix.
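A minimal Python sketch of the idea (the names and the spike model are illustrative; inject_err.m may differ in details):

```python
import numpy as np

def inject_anomalies(tm, n_anomalies, magnitude, seed=0):
    """Add large spikes at random positions of a matrix.
    Returns the corrupted copy and the ground-truth anomaly positions."""
    rng = np.random.default_rng(seed)
    out = tm.astype(float).copy()
    # pick n distinct flat indices, then map back to matrix coordinates
    flat = rng.choice(out.size, size=n_anomalies, replace=False)
    pos = np.unravel_index(flat, out.shape)
    out[pos] += magnitude
    return out, pos
```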
4. subtask_TM_to_video
Converts a matrix to a YUV video.
Because ffmpeg only works on video, before implementing our own MPEG encoding we need to convert the TM to a video in order to apply the MPEG-based anomaly detection method.
a) TM_to_video.m
- Goal: convert the given 3D matrix to a YUV video.
Since each pixel component in a YUV video has only 1 byte, the 1st byte goes to V, the 2nd byte to U, and the 3rd byte to Y (assuming the values in the matrix take at most 3 bytes).
b) sanity_check.m
- Goal: check that the implementation above is correct.
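A Python sketch of the byte-splitting scheme. Whether the "1st byte" means the least- or most-significant byte is an assumption here (least-significant is assumed), and real YUV 4:2:0 also subsamples the U and V planes, which this sketch ignores:

```python
import numpy as np

def split_bytes_to_yuv(tm):
    """Split each non-negative integer (< 2**24) into three bytes:
    least-significant -> V, middle -> U, most-significant -> Y.
    Byte order is an assumption about TM_to_video.m's scheme."""
    vals = tm.astype(np.uint32)
    v = (vals & 0xFF).astype(np.uint8)
    u = ((vals >> 8) & 0xFF).astype(np.uint8)
    y = ((vals >> 16) & 0xFF).astype(np.uint8)
    return y, u, v

def merge_yuv_to_values(y, u, v):
    """Inverse of split_bytes_to_yuv: reassemble the original integers."""
    return (y.astype(np.uint32) << 16) | (u.astype(np.uint32) << 8) \
           | v.astype(np.uint32)
```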
5. subtask_ffmpeg
Uses ffmpeg to convert raw YUV video to MPEG, and to convert MPEG back to YUV video.
a) batch_convert.sh
b) batch_convert_TM.sh
6. subtask_detect_anomaly
After getting the normal subspace (i.e. the compressed video) using the scripts in "subtask_psnr" and "subtask_ffmpeg", the scripts here calculate the abnormal subspace (i.e. the difference between the raw and compressed video) and detect anomalies.
a) diff_orig_comp_video.m
- Goal: calculate the difference between raw video and compressed video.
b) detect_anomaly.m
- Goal: given the difference time series from "diff_orig_comp_video.m", this script detects anomalies and reports the performance.
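A simple Python sketch of the detection step. Thresholding the residual magnitude at mean + k*std is an assumption for illustration; detect_anomaly.m may use a different criterion:

```python
import numpy as np

def detect_anomalies(residual, k=3.0):
    """Flag entries whose |residual| exceeds mean + k*std of the residual
    magnitudes (simple thresholding sketch)."""
    mag = np.abs(residual)
    thresh = mag.mean() + k * mag.std()
    return mag > thresh

def precision_recall(pred, truth):
    """Performance of the detector against ground-truth anomaly flags."""
    tp = np.logical_and(pred, truth).sum()
    prec = tp / pred.sum() if pred.sum() else 0.0
    rec = tp / truth.sum() if truth.sum() else 0.0
    return prec, rec
```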
/*************************
* Helpers:
* ./utils/
*************************/
1. Matlab Lib: YUV2Image
- YUV2Image/
- http://www.mathworks.com/matlabcentral/fileexchange/6318-convert-yuv-cif-420-video-file-to-image-files/
2. Gene Lee's codes
- data.py, googlemaps.py, utils.py
- used to process FourSquare dataset.
3. PSNR
- calculate_psnr.m
- Code copied from an online source to calculate the PSNR of a video
4. Matlab Lib: DCT/IDCT
- mirt_dctn
- http://www.mathworks.com/matlabcentral/fileexchange/24050-multidimensional-discrete-cosine-transform-dct