/*************************
 * Project: Video Compression based Anomaly Detection
 * Author: Yi-Chao Chen @ UT Austin
 *************************/


/*************************
 * Dataset
 *************************/

1. 4sq/<city>
    - FourSquare data set collected by Gene:
      The files include the checkins of the city. 
      We can retrieve <venue> and <user> info from these checkins.

    - Format:
        1. in "Airport", file name: 4SQ_VENUE_DETAILS_Airport.gz
            VENUE_DATA - venues -| stat
                                 | tags
                                 | ts
                                 | tips
                                 | checkins - users - ['last', 'gender', 'userid', 'ts', 'home', 'first']
                                 | mayor
        2. in other <city>, file name: 4SQ_VENUE_TRENDS_<city>.gz
            -| 'current_lat'
             | 'current_lng'
             | 'VENUE_INDEX'
             | 'VENUE_INFO' - venues - ['city', 'addr', 'zip', 'country', 'cate_name', 'hereNow', 'usersCount', 'state', 'contact', 'cate_id', 'ts', 'checkinsCount', 'lat', 'lng', 'id', 'name']
             | 'VENUE_DETAIL' - venues - ['stat', 'tags', 'ts', 'tips', 'checkins', 'mayor']


2. 4sq/city_info/4SQ_<city>_INFO
    - The detailed information of venues in the above data set.
      Except "Airport", other files are generated by "subtask_process_4sq/generate_city_info.py".

    - Format:
        1. in "Airport", there are several possible formats:

            a) venues - ['grp_type', 'city', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']

            b) venues - ['grp_type', 'name', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']

        2. in other <city>, our output format is:
            a) venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']


3. video
    Get video samples from:
    - http://trace.eas.asu.edu/yuv/
    - http://media.xiph.org/video/derf/

    - stefan_cif.yuv
        CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 90 frames
    - bus_cif.yuv
        CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 150 frames
    - foreman_cif.yuv
        CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
    - coastguard_cif.yuv
        CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
    - highway_cif.yuv
        CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 2000 frames

    The video files are large, so they are stored on the valleyview local disk:
    /var/local/yichao/anomaly_compression/data/video/
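
A minimal Python sketch of how the raw CIF files listed above can be read, assuming the standard planar layout (full Y plane, then U, then V, with chroma subsampled by 2 in each direction). The function names are hypothetical and are not part of this repository.

```python
import io


def yuv420_frame_size(width, height):
    """Bytes per frame for planar 8-bit YCbCr 4:2:0."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


def read_yuv420_frames(stream, width, height, max_frames):
    """Yield (Y, U, V) byte planes for up to max_frames complete frames."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    size = luma + 2 * chroma
    for _ in range(max_frames):
        buf = stream.read(size)
        if len(buf) < size:  # stop at end of file or on a truncated frame
            break
        yield buf[:luma], buf[luma:luma + chroma], buf[luma + chroma:]


# CIF (352*288): 101376 luma bytes + 2 * 25344 chroma bytes per frame.
print(yuv420_frame_size(352, 288))  # 152064

# Usage on an in-memory stand-in for a file opened with open(path, "rb").
fake_file = io.BytesIO(bytes(yuv420_frame_size(4, 2) * 3))
print(len(list(read_yuv420_frames(fake_file, 4, 2, max_frames=10))))
```

Dividing a file's size by the frame size is a quick check against the frame counts listed above.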

4. huawei_cellular/BS_gps_hourly_traffic.txt
   TM sample produced from the 3G dataset.
   - row: 3075 rows; each row is the traffic time series of one Base Station (BS).
     column: 26 columns; the first two are GPS values and the next 24 are one-hour traffic volumes (in bytes).
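
A small Python sketch of parsing one row of this file. It assumes the columns are whitespace-separated, which is not stated here; the function name and the sample values are made up for illustration.

```python
def parse_bs_row(line):
    """Split one row into (gps_pair, 24 hourly traffic values in bytes)."""
    fields = [float(x) for x in line.split()]  # assumption: whitespace-delimited
    if len(fields) != 26:
        raise ValueError("expected 26 columns, got %d" % len(fields))
    gps = (fields[0], fields[1])
    traffic = fields[2:]
    return gps, traffic


# Synthetic row: two GPS values followed by 24 hourly byte counts.
sample = " ".join(["31.2", "121.5"] + [str(h * 100) for h in range(24)])
gps, traffic = parse_bs_row(sample)
print(gps, len(traffic), sum(traffic))
```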

5. Traffic Matrix

    - MAWI
        1. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top100.txt.86400.
            25 frames
            93 * 91
        2. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top150.txt.86400.
            25 frames
            138 * 138
        3. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top200.txt.86400.
            25 frames
            180 * 180
        4. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top500.txt.86400.
            25 frames
            93 * 91
    - 4SQ

    - SJTU WiFi
        1. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.country.txt.3600.top400.
            19 frames
            250 * 193
           processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.country.txt.3600.top400.
            19 frames
            193 * 250
        2. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
            19 frames
            250 * 400
           processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
            19 frames
            400 * 250
        3. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.gps.5.txt.3600.top400.
            19 frames
            250 * 400
           processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.gps.5.txt.3600.top400.
            19 frames
            400 * 250
        
        4. Group by NUM top loaded APs
           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.all.bin600.top50.txt  
            114 (time) * 50 (APs)
           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.dl.bin600.top50.txt  
            114 (time) * 50 (APs)
           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.ul.bin600.top50.txt
            114 (time) * 50 (APs)

           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.all.bin600.top100.txt
            287 (time) * 100 (APs)
           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.dl.bin600.top100.txt
            287 (time) * 100 (APs)
           processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.ul.bin600.top100.txt
            287 (time) * 100 (APs)

            


    - Huawei 3G
        1. Group by lat,lng of BS

           processed_data/subtask_parse_huawei_3g/region_tm/tm_3g_region_all.res0.006.bin10.sub.
            146 frames
            26 * 21

        2. Group by BS at different areas:

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs0.all.bin10.txt
            group by BS
            BS types: unknown
            1074 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs1.all.bin10.txt
            group by BS
            BS types: general urban area
            458 * 145
        
           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs2.all.bin10.txt
            group by BS
            BS types: general urban area
            48 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin10.txt
            group by BS
            BS types: general urban area
            472 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin60.txt
            group by BS
            BS types: general urban area
            472 * 24

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs4.all.bin10.txt
            group by BS
            BS types: general urban area
            24 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs5.all.bin10.txt
            group by BS
            BS types: general urban area
            1 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs6.all.bin10.txt
            group by BS
            BS types: general urban area
            240 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs7.all.bin10.txt
            group by BS
            BS types: general urban area
            14 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs8.all.bin10.txt
            group by BS
            BS types: general urban area
            19 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs9.all.bin10.txt
            group by BS
            BS types: general urban area
            24 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs10.all.bin10.txt
            group by BS
            BS types: general urban area
            82 * 145

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs11.all.bin10.txt
            group by BS
            BS types: general urban area
            13 * 145

        3. group by BS (all BSs)

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.all.all.bin10.txt
            2469 * 145

        4. group by BS (all BSs) and choose the top loaded BSs

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.load.top200.all.bin10.txt
            200 * 145

        5. group by BS (all BSs) and choose the most stable BSs

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.stable.top200.all.bin10.txt
            200 * 145

        6. group by RNC

           processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.rnc.all.bin10.txt
            13 * 145

    - GEANT (Totem)
        1. processed_data/subtask_parse_totem/tm/tm_totem.
            10772 frames
            23 * 23
            time bin = 15 minutes

    - Abilene
        1. data/abilene/X
            1008 (time) * 121 (od pairs)
            time bin = 10 minutes

        2. processed_data/subtask_parse_abilene/tm/tm_abilene.od.
            same as above, but in 3D version
            1008 frames
            11 * 11
            time bin = 10 minutes
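
The relation between the two Abilene layouts can be sketched as follows: each 121-entry row of X becomes one 11 * 11 frame. The sketch assumes the OD pairs are stored in row-major (source-major) order, which should be checked against the data; the function names are hypothetical.

```python
def od_row_to_matrix(row, num_nodes=11):
    """Reshape one time bin of OD-pair values into a num_nodes x num_nodes frame."""
    if len(row) != num_nodes * num_nodes:
        raise ValueError("row length does not match num_nodes squared")
    # Assumption: entry i * num_nodes + j is the traffic from node i to node j.
    return [row[i * num_nodes:(i + 1) * num_nodes] for i in range(num_nodes)]


def to_frames(x, num_nodes=11):
    """Convert a (time x od pairs) matrix into a list of frames."""
    return [od_row_to_matrix(row, num_nodes) for row in x]


# Synthetic example with 3 time bins instead of 1008.
x = [list(range(t, t + 121)) for t in range(3)]
frames = to_frames(x)
print(len(frames), len(frames[0]), len(frames[0][0]))
```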

    - CSI
        1. Static
            /v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.127_file.dat0_matrix.mat.txt
             9850 * 90

            /v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.50_file.dat0_matrix.mat.txt
             9706 * 90

        2. Mobile
            data/csi/mobile/Mob-Recv1run1.dat0_matrix.mat_dB.txt
             10000 * 90

            data/csi/mobile/Mob-Recv1run1.dat1_matrix.mat_dB.txt
             10000 * 90

    - Sensor
        1. IntelLab
            processed_data/subtask_parse_sensor/tm/tm_sensor.temp.bin600.txt
            processed_data/subtask_parse_sensor/tm/tm_sensor.humidity.bin600.txt
            processed_data/subtask_parse_sensor/tm/tm_sensor.light.bin600.txt
            processed_data/subtask_parse_sensor/tm/tm_sensor.voltage.bin600.txt
            4943 * 54

    - RON
        processed_data/subtask_parse_ron/tm/tm_ron1.latency.
            12 * 12 * 494

    - Cister RSSI: telos
        processed_data/subtask_parse_telos_rssi/tm/tm_telos_rssi.txt
            10000 * 16

    - CU RSSI: multi location
        processed_data/subtask_parse_multi_loc_rssi/tm/tm_multi_loc_rssi.txt
            500 * 895 (179 nodes * 5 monitors)

    - Channel CSI
        condor_data/subtask_parse_csi_channel/csi/static_trace13.ant1.mag.txt
            5000 * 270

    - UCSB Meshnet
        condor_data/subtask_parse_ucsb_meshnet/tm/tm_ucsb_meshnet.connected.txt
            1527 * 425
    
    - UMich RSS
        condor_data/subtask_parse_umich_rss/tm/tm_umich_rss.txt
            3127 * 182



/*************************
 * Subtasks
 *************************/

1. subtask_process_4sq
    a) generate_city_info.py
        - Goal: read 4sq checkins and produce the information of all venues in the dataset.

        - Input:
            1. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco

        - Output:
            1. ../processed_data/subtask_process_4sq/combined_city_info/4SQ_<city>_INFO
                - The information of the venues in the city.
                  Will be linked to ../data/4sq/city_info/4SQ_<city>_INFO and used in generating TM.
                - Format:
                  venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']

        - Batch Run:
            1. batch_generate_city_info.sh

    b) generate_Human_TM.py
        - Goal: read 4sq checkins and produce human traffic matrix
        
        - Input: 
            1. period: generate a traffic matrix with "period" days of checkins data.
            2. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco
        
        - Output:
            1. ../processed_data/subtask_process_4sq/TM/<city>_sorted.txt
                - The order of airports in TM
                - Format: <airport name> <lat> <lng>
            2. ../processed_data/subtask_process_4sq/TM/TM_<city>_period<period>_<index>.txt
                - The Human Traffic Matrix using <period> days of data
        
        - Variables:
            1. user_hist: userid - ts - ['last', 'gender', 'userid', 'ts', 'home', 'first', 'lat', 'lng', 'venue', 'venue_id']
        
        - Batch Run:
            1. batch_generate_Human_TM.sh


    c) plot_TM.mother.plot
        - Goal: given the Human TM generated above, plot the heat map using Gnuplot

        - Output:
            1. ../figures/subtask_process_4sq/TM/TM_period<period>_<index>.eps
        
        - Batch Run:
            1. batch_plot_TM.pl

2. subtask_psnr
    Compare the PSNR of videos compressed using MPEG and PCA.
    It also outputs the compressed video for anomaly detection.

    a) PCA_psnr.m
        - Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation.
            step 1: given a video with [frame, width, height, YUV] pixels.
            step 2: convert to a 2D matrix: [frame, width * height * YUV]
            step 3: fragment the 2D matrix into small ones:
                    fragment i = [frame, x_i:y_i]
            step 4: apply PCA to each fragment with a given rank.
                    the rank decides the compression ratio and quality
            step 5: reconstruct the approximated matrix
            step 6: calculate PSNR

        - Input:
            1. num_PC: num of PCs to use (i.e. rank)
            2. video_name: the name of raw video (assume the video format: YUV CIF 4:2:0)
            3. frames: number of frames to analyze
            4. width: the width of the video
            5. height: the height of the video

        - Output:
            1. PSNR: the PSNR (dB) of the PCA low-rank approximation.
            2. compressed size: the size of the PCA approximation. e.g. the size of principal components and eigenvectors

        - Batch Run:
            1. batch_PCA_psnr.m
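
The steps of PCA_psnr.m can be sketched in Python with NumPy as below. This is an illustration under assumptions, not the MATLAB implementation: it uses a truncated SVD without mean-centering as the low-rank approximation, works on a synthetic luma-only video, and counts the compressed size as the number of stored values. All names are hypothetical.

```python
import numpy as np


def low_rank_fragments(matrix, rank, fragment_width):
    """Approximate each column fragment of `matrix` with a rank-`rank` truncated SVD.

    Returns the reconstructed matrix and the number of values that must be stored.
    """
    approx = np.empty_like(matrix, dtype=float)
    stored = 0
    for start in range(0, matrix.shape[1], fragment_width):
        block = matrix[:, start:start + fragment_width].astype(float)
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        k = min(rank, len(s))
        approx[:, start:start + fragment_width] = (u[:, :k] * s[:k]) @ vt[:k]
        stored += k * (block.shape[0] + block.shape[1])  # scores + components
    return approx, stored


def psnr(original, approx, peak=255.0):
    """PSNR in dB for 8-bit data."""
    mse = np.mean((original.astype(float) - approx) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Synthetic stand-in for a video: 8 frames of 16x12 luma samples.
rng = np.random.default_rng(0)
frames, width, height = 8, 16, 12
video = rng.integers(0, 256, size=(frames, height, width)).astype(float)
matrix = video.reshape(frames, -1)  # step 2: [frame, width * height]
approx, stored = low_rank_fragments(matrix, rank=4, fragment_width=48)
print(psnr(matrix, approx), stored, matrix.size)
```

A lower rank stores fewer values and gives a lower PSNR, which is the trade-off controlled by num_PC.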

    b) PCA_psnr_by_frame.m
        - Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation
            step 1: given a video with [frame, width, height, YUV] pixels.
            step 2: make a GoP every [4 or 8 or 16] frames 
            step 3: convert GoP into a 2D matrix.
            step 4: apply DCT to the 2D matrix.
            step 5: apply PCA to the DCT output with a given rank.
            step 6: reconstruct the approximated matrix
            step 7: apply inverse DCT to the approximated matrix
            step 8: calculate PSNR

    c) DCT_psnr.m
        - Goal: calculate the PSNR of a video which is compressed using 3D DCT
            step 1: given a video with [frame, width, height, YUV] pixels.
            step 2: make a GoP every [4 or 8 or 16] frames 
            step 3: apply 3D DCT to each GoP.
            step 4: partition a GoP into small chunks (e.g. 44x35 pixels a chunk)
            step 5: for each chunk, calculate the error after iDCT if the chunk is removed
            step 6: remove chunks with small error
            step 7: apply inverse DCT to the matrix with only remaining chunks
            step 8: calculate PSNR

        - DCT_psnr_combine_yuv.m
            The only difference is that this code applies DCT to a 4D array: [width, height, frame, YUV].
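
The chunk-removal idea in DCT_psnr.m can be sketched in Python with NumPy as below. This is an illustration under assumptions: the DCT is built by hand as an orthonormal DCT-II (the MATLAB code uses mirt_dctn), and because that transform is orthonormal, the squared error caused by removing a chunk is taken to be the energy of its coefficients instead of running an iDCT per chunk. Names, chunk sizes, and the number of dropped chunks are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def dct3(block, inverse=False):
    """Apply the separable DCT (or its inverse) along all three axes."""
    out = block.astype(float)
    for axis in range(3):
        m = dct_matrix(out.shape[axis])
        if inverse:
            m = m.T  # the inverse of an orthonormal matrix is its transpose
        out = np.moveaxis(np.tensordot(m, out, axes=(1, axis)), 0, axis)
    return out


def drop_low_energy_chunks(coeffs, chunk, num_drop):
    """Zero the num_drop spatial chunks whose removal costs the least error."""
    _, height, width = coeffs.shape
    ch, cw = chunk
    energies = []
    for y in range(0, height, ch):
        for x in range(0, width, cw):
            energy = float(np.sum(coeffs[:, y:y + ch, x:x + cw] ** 2))
            energies.append((energy, y, x))
    energies.sort()
    kept = coeffs.copy()
    for _, y, x in energies[:num_drop]:
        kept[:, y:y + ch, x:x + cw] = 0.0
    return kept


# Synthetic GoP: 4 frames of 8x8 samples, split into four 4x4 chunks.
rng = np.random.default_rng(1)
gop = rng.random((4, 8, 8)) * 255.0
coeffs = dct3(gop)
kept = drop_low_energy_chunks(coeffs, chunk=(4, 4), num_drop=2)
recon = dct3(kept, inverse=True)
error = float(np.sum((gop - recon) ** 2))
dropped_energy = float(np.sum(coeffs ** 2) - np.sum(kept ** 2))
print(error, dropped_energy)
```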

    d) DCT_psnr_combine_yuv.m
        - Goal: Similar to "DCT_psnr.m", but instead of handling YUV separately, this one combines them into one matrix and applies 3D DCT to the combined matrix.

        Note: it is too slow (due to the larger matrix), so it is not used for now.

    e) compressive_sensing_psnr.m
        - Goal: calculate the PSNR of a video which is compressed using compressive sensing.
            step 1: given a video with [frame, width, height, YUV] pixels.
            step 2: make a GoP every [4 or 8 or 16] frames 
            step 3: apply compressive sensing with spatial and temporal constraints to each GoP.
            step 4: reconstruct the GoP using U and V returned by compressive sensing
            step 5: calculate PSNR

    f) yuv_psnr.m
        - Goal: calculate the PSNR of two videos

        - Input: 
            1. video_name1: file name and path of the 1st video.
            2. video_name2: file name and path of the 2nd video.
            3. frames: number of frames to analyze
            4. width: the width of the video
            5. height: the height of the video

3. subtask_inject_error
    The objective of this task is to inject anomalies to a given matrix or video.

    a) inject_err.m
        - Goal: inject anomalies by adding some large numbers to the given matrix.
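
A minimal Python sketch of this kind of injection, assuming anomalies are placed at uniformly random cells and all share one magnitude. How inject_err.m chooses positions and sizes is not described here, so the function name and parameters are hypothetical.

```python
import random


def inject_anomalies(matrix, num_anomalies, magnitude, seed=0):
    """Return a copy of `matrix` with `magnitude` added to random cells.

    Also returns the sorted (row, col) positions, usable as ground truth.
    """
    rng = random.Random(seed)
    rows, cols = len(matrix), len(matrix[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    positions = rng.sample(cells, num_anomalies)  # distinct cells
    noisy = [list(row) for row in matrix]  # leave the input untouched
    for r, c in positions:
        noisy[r][c] += magnitude
    return noisy, sorted(positions)


clean = [[0.0] * 4 for _ in range(5)]
noisy, truth = inject_anomalies(clean, num_anomalies=3, magnitude=100.0)
print(truth)
```

Keeping the injected positions makes it possible to score a detector against them afterwards.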

4. subtask_TM_to_video
    Convert the matrix to a YUV video.
    Because ffmpeg only works on video, until we implement our own MPEG encoding we need to convert the TM to video to apply the MPEG-based anomaly detection method.

    a) TM_to_video.m
        - Goal: convert the given 3D matrix to a YUV video.
            Since each pixel component in the YUV video holds only 1 byte, the 1st byte goes to V, the 2nd byte to U, and the 3rd byte to Y (assuming the values in the matrix have at most 3 bytes).
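
The byte mapping can be sketched in Python as below. It assumes that "1st byte" means the least significant byte, which this README does not state, and that the matrix values are non-negative integers; the function names are hypothetical.

```python
def split_value(value):
    """Split a non-negative integer below 2**24 into (Y, U, V) bytes."""
    if not 0 <= value < 1 << 24:
        raise ValueError("value does not fit in 3 bytes")
    v = value & 0xFF          # assumption: 1st byte = least significant -> V
    u = (value >> 8) & 0xFF   # 2nd byte -> U
    y = (value >> 16) & 0xFF  # 3rd byte = most significant -> Y
    return y, u, v


def merge_value(y, u, v):
    """Inverse of split_value: rebuild the matrix value from its bytes."""
    return (y << 16) | (u << 8) | v


y, u, v = split_value(0x123456)
print(hex(y), hex(u), hex(v), hex(merge_value(y, u, v)))
```

Under this assumption Y carries the most significant byte, so the luma plane shows the coarse structure of the matrix.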

    b) sanity_check.m
        - Goal: check that the implementation is correct.

5. subtask_ffmpeg
    Use ffmpeg to convert raw YUV video to MPEG, and also convert MPEG back to YUV video.
    a) batch_convert.sh
    b) batch_convert_TM.sh
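
The contents of the two batch scripts are not shown here. As a rough Python sketch of the kind of ffmpeg invocations involved, the helpers below only build the argument lists; they assume 8-bit planar 4:2:0 input at CIF size, and the helper names and file names are hypothetical.

```python
def raw_to_mpeg_cmd(yuv_path, mpeg_path, width=352, height=288):
    """Build an ffmpeg command that encodes raw planar YUV 4:2:0 to MPEG."""
    return [
        "ffmpeg",
        "-f", "rawvideo",           # input has no container, so state the format
        "-pix_fmt", "yuv420p",      # planar 8-bit 4:2:0
        "-s", "%dx%d" % (width, height),
        "-i", yuv_path,
        mpeg_path,
    ]


def mpeg_to_raw_cmd(mpeg_path, yuv_path):
    """Build an ffmpeg command that decodes MPEG back to raw planar YUV 4:2:0."""
    return ["ffmpeg", "-i", mpeg_path, "-f", "rawvideo", "-pix_fmt", "yuv420p", yuv_path]


# The lists can be passed to subprocess.run(cmd, check=True) where ffmpeg is installed.
print(" ".join(raw_to_mpeg_cmd("foreman_cif.yuv", "foreman_cif.mpg")))
print(" ".join(mpeg_to_raw_cmd("foreman_cif.mpg", "foreman_cif.dec.yuv")))
```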

6. subtask_detect_anomaly
    After getting the normal subspace (i.e. compressed video) using scripts in "subtask_psnr" and "subtask_ffmpeg", the scripts here calculate the abnormal subspace (i.e. the difference between raw video and compressed video) and detect anomalies.

    a) diff_orig_comp_video.m
        - Goal: calculate the difference between raw video and compressed video.

    b) detect_anomaly.m
        - Goal: given the difference time-series from "diff_orig_comp_video.m", this script detects anomalies and returns the performance.
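
The detection rule used by detect_anomaly.m is not described here. As one possible Python sketch, the code below flags time bins whose residual exceeds the mean plus k standard deviations and scores the result with precision and recall; the thresholding rule, the metric choice, and all names are assumptions.

```python
import statistics


def detect(residuals, k=3.0):
    """Return indices whose residual exceeds mean + k * standard deviation."""
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if r > mean + k * std]


def precision_recall(detected, truth):
    """Score detected indices against the known anomaly indices."""
    detected, truth = set(detected), set(truth)
    true_pos = len(detected & truth)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Synthetic difference time series with one injected spike at index 7.
residuals = [1.0] * 20
residuals[7] = 50.0
found = detect(residuals)
print(found, precision_recall(found, truth=[7]))
```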


/*************************
 * Helpers: 
 *   ./utils/
 *************************/
1. Matlab Lib: YUV2Image
    - YUV2Image/
    - http://www.mathworks.com/matlabcentral/fileexchange/6318-convert-yuv-cif-420-video-file-to-image-files/

2. Gene Lee's code
    - data.py, googlemaps.py, utils.py
    - used to process FourSquare dataset.

3. PSNR
    - calculate_psnr.m
    - Code copied from an online source to calculate the PSNR of a video

4. Matlab Lib: DCT/IDCT
    - mirt_dctn
    - http://www.mathworks.com/matlabcentral/fileexchange/24050-multidimensional-discrete-cosine-transform-dct







