/*************************
* Project: Video Compression based Anomaly Detection
* Author: Yi-Chao Chen @ UT Austin
*************************/
/*************************
* Dataset
*************************/
1. 4sq/<city>
- FourSquare dataset collected by Gene:
Each file contains the check-ins of one city.
We can retrieve <venue> and <user> info from these check-ins.
- Format:
1. in "Airport", file name: 4SQ_VENUE_DETAILS_Airport.gz
VENUE_DATA - venues -| stat
| tags
| ts
| tips
| checkins - users - ['last', 'gender', 'userid', 'ts', 'home', 'first']
| mayor
2. in other <city>, file name: 4SQ_VENUE_TRENDS_<city>.gz
-| 'current_lat'
| 'current_lng'
| 'VENUE_INDEX'
| 'VENUE_INFO' - venues - ['city', 'addr', 'zip', 'country', 'cate_name', 'hereNow', 'usersCount', 'state', 'contact', 'cate_id', 'ts', 'checkinsCount', 'lat', 'lng', 'id', 'name']
| 'VENUE_DETAIL' - venues - ['stat', 'tags', 'ts', 'tips', 'checkins', 'mayor']
2. 4sq/city_info/4SQ_<city>_INFO
- The detailed information of the venues in the above dataset.
Except for "Airport", the files are generated by "subtask_process_4sq/generate_city_info.py".
- Format:
1. in "Airport", there are several possible formats:
a) venues - ['grp_type', 'city', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
b) venues - ['grp_type', 'name', 'addr', 'checkinsCount', 'cate_name', 'ts', 'hereNow', 'state', 'contact', 'cate_id', 'lat', 'grp_name', 'lng', 'usersCount', 'id', 'name']
2. in other <city>, our output format is:
a) venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']
3. video
Get video samples from:
- http://trace.eas.asu.edu/yuv/
- http://media.xiph.org/video/derf/
- stefan_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 90 frames
- bus_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 150 frames
- foreman_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
- coastguard_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 300 frames
- highway_cif.yuv
CIF, YCbCr 4:2:0 planar 8 bit, 352*288, 2000 frames
The video files are large, so they are kept on the valleyview local disk:
/var/local/yichao/anomaly_compression/data/video/
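For reference, a minimal Python sketch of reading such a CIF YUV 4:2:0 planar file: each frame is a width*height Y plane followed by quarter-size U and V planes (the function name is illustrative):

```python
import numpy as np

def read_yuv420_frames(path, width=352, height=288, n_frames=None):
    """Read a planar YUV 4:2:0 8-bit file into per-frame (Y, U, V) arrays."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)      # U and V are subsampled 2x2
    frame_bytes = y_size + 2 * c_size
    frames = []
    with open(path, "rb") as f:
        while n_frames is None or len(frames) < n_frames:
            raw = f.read(frame_bytes)
            if len(raw) < frame_bytes:         # end of file
                break
            buf = np.frombuffer(raw, dtype=np.uint8)
            y = buf[:y_size].reshape(height, width)
            u = buf[y_size:y_size + c_size].reshape(height // 2, width // 2)
            v = buf[y_size + c_size:].reshape(height // 2, width // 2)
            frames.append((y, u, v))
    return frames
```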
4. huawei_cellular/BS_gps_hourly_traffic.txt
TM sample derived from the 3G dataset.
- rows: 3075; each row is the traffic time series of one Base Station
- columns: 26; the first two are GPS values, and the remaining 24 are one-hour traffic totals (in bytes).
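Assuming the file is plain whitespace-delimited text with one row per Base Station (the delimiter is an assumption), loading it can be sketched as:

```python
import numpy as np

def load_bs_hourly_tm(path):
    """Parse BS_gps_hourly_traffic.txt: per row, 2 GPS values followed by
    24 hourly traffic values in bytes. Whitespace delimiter is assumed."""
    data = np.loadtxt(path)
    gps = data[:, :2]        # (n_bs, 2) GPS coordinates
    traffic = data[:, 2:]    # (n_bs, 24) hourly traffic in bytes
    return gps, traffic
```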
5. Traffic Matrix
- MAWI
1. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top100.txt.86400.
25 frames
93 * 91
2. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top150.txt.86400.
25 frames
138 * 138
3. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top200.txt.86400.
25 frames
180 * 180
4. processed_data/subtask_parse_mawi/tm/tm_mawi.sort_ips.top500.txt.86400.
25 frames
93 * 91
- 4SQ
- SJTU WiFi
1. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.country.txt.3600.top400.
19 frames
250 * 193
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.country.txt.3600.top400.
19 frames
193 * 250
2. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
19 frames
250 * 400
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.bgp.sub_CN.txt.3600.top400.
19 frames
400 * 250
3. processed_data/subtask_parse_sjtu_wifi/tm/tm_upload.sort_ips.ap.gps.5.txt.3600.top400.
19 frames
250 * 400
processed_data/subtask_parse_sjtu_wifi/tm/tm_download.sort_ips.ap.gps.5.txt.3600.top400.
19 frames
400 * 250
4. Group by NUM top loaded APs
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.all.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.dl.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi.ap_load.ul.bin600.top50.txt
114 (time) * 50 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.all.bin600.top100.txt
287 (time) * 100 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.dl.bin600.top100.txt
287 (time) * 100 (APs)
processed_data/subtask_parse_sjtu_wifi/tm/tm_sjtu_wifi2.ap_load.ul.bin600.top100.txt
287 (time) * 100 (APs)
- Huawei 3G
1. Group by lat,lng of BS
processed_data/subtask_parse_huawei_3g/region_tm/tm_3g_region_all.res0.006.bin10.sub.
146 frames
26 * 21
2. Group by BS at different areas:
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs0.all.bin10.txt
group by BS
BS types: unknown
1074 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs1.all.bin10.txt
group by BS
BS types: general urban area
458 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs2.all.bin10.txt
group by BS
BS types: general urban area
48 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin10.txt
group by BS
BS types: general urban area
472 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs3.all.bin60.txt
group by BS
BS types: general urban area
472 * 24
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs4.all.bin10.txt
group by BS
BS types: general urban area
24 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs5.all.bin10.txt
group by BS
BS types: general urban area
1 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs6.all.bin10.txt
group by BS
BS types: general urban area
240 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs7.all.bin10.txt
group by BS
BS types: general urban area
14 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs8.all.bin10.txt
group by BS
BS types: general urban area
19 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs9.all.bin10.txt
group by BS
BS types: general urban area
24 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs10.all.bin10.txt
group by BS
BS types: general urban area
82 * 145
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.bs.bs11.all.bin10.txt
group by BS
BS types: general urban area
13 * 145
3. group by BS (all BSs)
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.all.all.bin10.txt
2469 * 145
4. group by BS (all BSs) and choose the top loaded BSs
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.load.top200.all.bin10.txt
200 * 145
5. group by BS (all BSs) and choose the most stable BSs
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.stable.top200.all.bin10.txt
200 * 145
6. group by RNC
processed_data/subtask_parse_huawei_3g/bs_tm/tm_3g.cell.rnc.all.bin10.txt
13 * 145
- GEANT (Totem)
1. processed_data/subtask_parse_totem/tm/tm_totem.
10772 frames
23 * 23
time bin = 15 minutes
- Abilene
1. data/abilene/X
1008 (time) * 121 (od pairs)
time bin = 10 minutes
2. processed_data/subtask_parse_abilene/tm/tm_abilene.od.
same as above, but in 3D version
1008 frames
11 * 11
time bin = 10 minutes
- CSI
1. Static
/v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.127_file.dat0_matrix.mat.txt
9850 * 90
/v/filer4b/v27q002/ut-wireless/swati/processed_traces/MonitorExp1/128.83.158.50_file.dat0_matrix.mat.txt
9706 * 90
2. Mobile
data/csi/mobile/Mob-Recv1run1.dat0_matrix.mat_dB.txt
10000 * 90
data/csi/mobile/Mob-Recv1run1.dat1_matrix.mat_dB.txt
10000 * 90
- Sensor
1. IntelLab
processed_data/subtask_parse_sensor/tm/tm_sensor.temp.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.humidity.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.light.bin600.txt
processed_data/subtask_parse_sensor/tm/tm_sensor.voltage.bin600.txt
4943 * 54
- RON
processed_data/subtask_parse_ron/tm/tm_ron1.latency.
12 * 12 * 494
- Cister RSSI: telos
processed_data/subtask_parse_telos_rssi/tm/tm_telos_rssi.txt
10000 * 16
- CU RSSI: multi location
processed_data/subtask_parse_multi_loc_rssi/tm/tm_multi_loc_rssi.txt
500 * 895 (179 nodes * 5 monitors)
- Channel CSI
condor_data/subtask_parse_csi_channel/csi/static_trace13.ant1.mag.txt
5000 * 270
- UCSB Meshnet
condor_data/subtask_parse_ucsb_meshnet/tm/tm_ucsb_meshnet.connected.txt
1527 * 425
- UMich RSS
condor_data/subtask_parse_umich_rss/tm/tm_umich_rss.txt
3127 * 182
/*************************
* Subtasks
*************************/
1. subtask_process_4sq
a) generate_city_info.py
- Goal: read 4sq checkins and produce the information of all venues in the dataset.
- Input:
1. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco
- Output:
1. ../processed_data/subtask_process_4sq/combined_city_info/4SQ_<city>_INFO
- The information of the venues in the city.
Will be linked to ../data/4sq/city_info/4SQ_<city>_INFO and used in generating the TM.
- Format:
venues - ['lat', 'lng', 'id', 'name', 'checkinsCount']
- Batch Run:
1. batch_generate_city_info.sh
b) generate_Human_TM.py
- Goal: read 4sq checkins and produce a human traffic matrix
- Input:
1. period: generate a traffic matrix with "period" days of checkins data.
2. city: the name of the city. e.g. Airport, Manhattan, Austin, San_Francisco
- Output:
1. ../processed_data/subtask_process_4sq/TM/<city>_sorted.txt
- The order of airports in TM
- Format: <airport name> <lat> <lng>
2. ../processed_data/subtask_process_4sq/TM/TM_<city>_period<period>_<index>.txt
- The Human Traffic Matrix using <period> days of data
- Variables:
1. user_hist: userid - ts - ['last', 'gender', 'userid', 'ts', 'home', 'first', 'lat', 'lng', 'venue', 'venue_id']
- Batch Run:
1. batch_generate_Human_TM.sh
c) plot_TM.mother.plot
- Goal: given the Human TM generated above, plot the heat map using Gnuplot
- Output:
1. ../figures/subtask_process_4sq/TM/TM_period<period>_<index>.eps
- Batch Run:
1. batch_plot_TM.pl
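Illustrative only: one plausible way to build a human traffic matrix from check-in data is to count, per user, transitions between consecutively visited venues within the period. This sketch is an assumption about the approach, not the exact logic of generate_Human_TM.py:

```python
from collections import OrderedDict

def build_human_tm(checkins, venues):
    """checkins: dict mapping userid -> time-ordered list of venue names.
    venues: ordered list of venue names (row/column order of the TM).
    Returns TM where TM[i][j] counts observed moves from venue i to j.
    Illustrative sketch only."""
    idx = {v: i for i, v in enumerate(venues)}
    n = len(venues)
    tm = [[0] * n for _ in range(n)]
    for seq in checkins.values():
        # count each consecutive pair of check-ins as one transition
        for a, b in zip(seq, seq[1:]):
            tm[idx[a]][idx[b]] += 1
    return tm
```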
2. subtask_psnr
To compare the PSNR of videos compressed using MPEG and PCA.
It also outputs the compressed videos for anomaly detection.
a) PCA_psnr.m
- Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation.
step 1: given a video with [frame, width, height, YUV] pixels.
step 2: convert to a 2D matrix: [frame, width * height * YUV]
step 3: fragment the 2D matrix into small ones:
fragment i = [frame, x_i:y_i]
step 4: apply PCA to each fragment with a given rank.
the rank decides the compression ratio and quality
step 5: reconstruct the approximated matrix
step 6: calculate PSNR
- Input:
1. num_PC: num of PCs to use (i.e. rank)
2. video_name: the name of raw video (assume the video format: YUV CIF 4:2:0)
3. frames: number of frames to analyze
4. width: the width of the video
5. height: the height of the video
- Output:
1. PSNR: the PSNR (dB) of the PCA low-rank approximation.
2. compressed size: the size of the PCA approximation, i.e. the size of the principal components and eigenvectors
- Batch Run:
1. batch_PCA_psnr.m
b) PCA_psnr_by_frame.m
- Goal: calculate the PSNR of a video which is compressed by PCA low-rank approximation
step 1: given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: convert GoP into a 2D matrix.
step 4: apply DCT to the 2D matrix.
step 5: apply PCA to the DCT output with a given rank.
step 6: reconstruct the approximated matrix
step 7: apply the inverse DCT to the approximated matrix
step 8: calculate PSNR
c) DCT_psnr.m
- Goal: calculate the PSNR of a video which is compressed using 3D DCT
step 1. given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: apply 3D DCT to each GoP.
step 4: partition a GoP into small chunks (e.g. 44x35 pixels a chunk)
step 5: for each chunk, calculate the error after iDCT if the chunk is removed
step 6: remove chunks with small error
step 7: apply the inverse DCT to the matrix with only the remaining chunks
step 8: calculate PSNR
d) DCT_psnr_combine_yuv.m
- Goal: similar to "DCT_psnr.m", but instead of handling Y, U, and V separately, this one combines them into one matrix and applies DCT on a 4D array: [width, height, frame, YUV].
Note: it is too slow (due to the larger matrix), so it is not used for now.
e) compressive_sensing_psnr.m
- Goal: calculate the PSNR of a video which is compressed using compressive sensing.
step 1. given a video with [frame, width, height, YUV] pixels.
step 2: make a GoP every [4 or 8 or 16] frames
step 3: apply compressive sensing with spatial and temporal constraints to each GoP.
step 4: reconstruct the GoP using U and V returned by compressive sensing
step 5: calculate PSNR
f) yuv_psnr.m
- Goal: calculate the PSNR of two videos
- Input:
1. video_name1: file name and path of the 1st video.
2. video_name2: file name and path of the 2nd video.
3. frames: number of frames to analyze
4. width: the width of the video
5. height: the height of the video
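The core of PCA_psnr.m (steps 2-6 above) can be sketched in Python using a truncated SVD, which yields the rank-k approximation (equivalent to PCA low-rank approximation up to mean-centering); the function and argument names here are illustrative:

```python
import numpy as np

def low_rank_psnr(frames_2d, rank, peak=255.0):
    """Rank-k approximation of a (n_frames, width*height*YUV) matrix via
    truncated SVD, then PSNR of the reconstruction against the original."""
    u, s, vt = np.linalg.svd(frames_2d, full_matrices=False)
    # keep only the top `rank` singular components (the "PCs")
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    mse = np.mean((frames_2d - approx) ** 2)
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    return approx, psnr
```

The rank controls the compression ratio: storing u[:, :rank], s[:rank], and vt[:rank, :] takes rank * (n_frames + 1 + n_pixels) values instead of the full matrix.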
3. subtask_inject_error
The objective of this task is to inject anomalies into a given matrix or video.
a) inject_err.m
- Goal: inject anomalies by adding some large numbers to the given matrix.
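A minimal Python sketch of the idea (the names and the spike model are illustrative; inject_err.m may differ in details):

```python
import numpy as np

def inject_anomalies(tm, n_anomalies, magnitude, seed=0):
    """Add large spikes at random positions of a matrix.
    Returns the corrupted copy and the ground-truth anomaly positions."""
    rng = np.random.default_rng(seed)
    out = tm.astype(float).copy()
    # pick n distinct flat indices, then map back to matrix coordinates
    flat = rng.choice(out.size, size=n_anomalies, replace=False)
    pos = np.unravel_index(flat, out.shape)
    out[pos] += magnitude
    return out, pos
```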
4. subtask_TM_to_video
Converts a matrix to a YUV video.
Because ffmpeg only works on video, before implementing our own MPEG encoding we need to convert the TM to a video in order to apply the MPEG-based anomaly detection method.
a) TM_to_video.m
- Goal: convert the given 3D matrix to a YUV video.
Since each pixel component in a YUV video has only 1 byte, the 1st byte goes to V, the 2nd byte to U, and the 3rd byte to Y (assuming the values in the matrix take at most 3 bytes).
b) sanity_check.m
- Goal: check that the implementation above is correct.
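A Python sketch of the byte-splitting scheme. Whether the "1st byte" means the least- or most-significant byte is an assumption here (least-significant is assumed), and real YUV 4:2:0 also subsamples the U and V planes, which this sketch ignores:

```python
import numpy as np

def split_bytes_to_yuv(tm):
    """Split each non-negative integer (< 2**24) into three bytes:
    least-significant -> V, middle -> U, most-significant -> Y.
    Byte order is an assumption about TM_to_video.m's scheme."""
    vals = tm.astype(np.uint32)
    v = (vals & 0xFF).astype(np.uint8)
    u = ((vals >> 8) & 0xFF).astype(np.uint8)
    y = ((vals >> 16) & 0xFF).astype(np.uint8)
    return y, u, v

def merge_yuv_to_values(y, u, v):
    """Inverse of split_bytes_to_yuv: reassemble the original integers."""
    return (y.astype(np.uint32) << 16) | (u.astype(np.uint32) << 8) \
           | v.astype(np.uint32)
```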
5. subtask_ffmpeg
Uses ffmpeg to convert raw YUV video to MPEG, and to convert MPEG back to YUV video.
a) batch_convert.sh
b) batch_convert_TM.sh
6. subtask_detect_anomaly
After getting the normal subspace (i.e. the compressed video) using the scripts in "subtask_psnr" and "subtask_ffmpeg", the scripts here calculate the abnormal subspace (i.e. the difference between the raw and compressed video) and detect anomalies.
a) diff_orig_comp_video.m
- Goal: calculate the difference between raw video and compressed video.
b) detect_anomaly.m
- Goal: given the difference time series from "diff_orig_comp_video.m", this script detects anomalies and reports the performance.
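A simple Python sketch of the detection step. Thresholding the residual magnitude at mean + k*std is an assumption for illustration; detect_anomaly.m may use a different criterion:

```python
import numpy as np

def detect_anomalies(residual, k=3.0):
    """Flag entries whose |residual| exceeds mean + k*std of the residual
    magnitudes (simple thresholding sketch)."""
    mag = np.abs(residual)
    thresh = mag.mean() + k * mag.std()
    return mag > thresh

def precision_recall(pred, truth):
    """Performance of the detector against ground-truth anomaly flags."""
    tp = np.logical_and(pred, truth).sum()
    prec = tp / pred.sum() if pred.sum() else 0.0
    rec = tp / truth.sum() if truth.sum() else 0.0
    return prec, rec
```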
/*************************
* Helpers:
* ./utils/
*************************/
1. Matlab Lib: YUV2Image
- YUV2Image/
- http://www.mathworks.com/matlabcentral/fileexchange/6318-convert-yuv-cif-420-video-file-to-image-files/
2. Gene Lee's codes
- data.py, googlemaps.py, utils.py
- used to process FourSquare dataset.
3. PSNR
- calculate_psnr.m
- Code copied from an online source to calculate the PSNR of a video
4. Matlab Lib: DCT/IDCT
- mirt_dctn
- http://www.mathworks.com/matlabcentral/fileexchange/24050-multidimensional-discrete-cosine-transform-dct