Fix: blogs layout shift from content overflow (in normal view and mobile view) (#283)
* fixes #275: [Updated some Heading levels, heading line-height, links overflow, img overflow of blog mdx file]
* fixes #275: [added spacing above image, handled overflow for image, links, tables, and iframes, and corrected Heading syntax]
* fix #275: fix overflowing link in 2023-12-17-sprint-dmm.mdx causing layout shift on mobile
* fix #275: prevent text and link overflow on small screens in topic modelling post by using word-break
* refactor: move Grommet Heading import to top in similar videos blog
* refactor: move Heading import to top in blog files
* fix: update blog date to 16 and apply link wrapping to prevent mobile layout shift
* shift 'Heading' import below frontmatter to follow MDX syntax rules
* correct blog post date to 2024-03-13 as per original publish schedule
src/blog/2019-07-13-shell-architecture.mdx
1 addition & 1 deletion
@@ -50,6 +50,6 @@ We are following an MVC pattern for developing this server. Broadly speaking eve
We use the config module to handle different app configurations, including sensitive credentials like database parameters. config expects a config directory with a default.json file in it. We don't check that into the git repository for security reasons; you will have to contact us via email or our Slack channel to get access to it.
-# Footnotes
+## Footnotes
1. Even though Shell Server currently uses Amazon's hosted MySQL service for our database needs, for all practical purposes it is OK to consider it part of Shell.
src/blog/2019-07-26-tattle-data-science-finding-similar-videos-efficiently.mdx
7 additions & 6 deletions
@@ -6,6 +6,7 @@ author: Swair Shah
project: Kosh
tags: machine-learning, devlog
---
+import { Heading } from 'grommet'
One of the challenges we face at Tattle is to efficiently check if a given post or message has been encountered earlier by our system.
@@ -18,7 +19,7 @@ In this post we will describe our ideas for video and GIF content representation
We want an approach that not only finds duplicate or near-duplicate videos but also extracts some useful information from the video. For example, we may want to generate a description of this video with tags such as `speech`, `president`, `nixon`, etc. Processing a video can be very compute-intensive: the Nixon video in our example is only 36 seconds long, yet extracting its frames with OpenCV yields 1109 frames. Even with an efficient deep Convolutional Neural Network to classify labels or generate representations, it would be inefficient to use all the frames of the video.
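As a rough illustration of that extraction step, here is a minimal OpenCV sketch (the file name is hypothetical):

```python
import cv2

# Read every frame of the clip into a list; a 36-second clip at ~30 fps
# yields on the order of a thousand frames.
cap = cv2.VideoCapture("nixon_speech.mp4")  # hypothetical file name
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()
print(len(frames))  # 1109 for the Nixon clip
```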
-###Anchor Frames
+## Anchor Frames
We want to find what we call "Anchor Frames" for a given video. Anchor Frames are a small set of frames that are a good representation of the video. For our video, let us look at a sample from the extracted frames.
A sample of frames from the video
@@ -66,27 +67,27 @@ We can see that after the first two frame selection the error reduction slows do
Using these two frames as anchor frames, we achieve an $80.48\%$ reduction in reconstruction error. Now that we have found a set of representative anchor frames for a video, we can use the same technique we use for image duplicate detection and label extraction. To generate a fingerprint for the video, we can take individual image fingerprints (pHash or pre-trained Convolutional Neural Network features) for all the anchor frames and either append them or take an average.
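A minimal sketch of the appending variant, assuming the `imagehash` package and hypothetical anchor-frame paths:

```python
import imagehash
from PIL import Image

# Fingerprint a video by pHash-ing each anchor frame and concatenating
# the per-frame hashes; averaging the hashes is the other option above.
def video_fingerprint(anchor_frame_paths):
    hashes = [imagehash.phash(Image.open(p)) for p in anchor_frame_paths]
    return "".join(str(h) for h in hashes)

fp = video_fingerprint(["anchor_0.png", "anchor_1.png"])  # hypothetical paths
```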
-## Semantic Labels for Videos
+<Heading level={3}>Semantic Labels for Videos</Heading>
To classify videos we use the Google Vision API on the set of anchor frames. Passing the second anchor frame to the Google Vision API gives the following labels.
We can pass all the anchor frames to the API and take a union of the labels.
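A sketch of that union step using the google-cloud-vision client (credential setup omitted; the function name is illustrative):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Label each anchor frame and take the union of the labels.
def labels_for_anchor_frames(frame_bytes_list):
    labels = set()
    for content in frame_bytes_list:
        response = client.label_detection(image=vision.Image(content=content))
        labels.update(a.description for a in response.label_annotations)
    return labels
```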
-## Advantages and Limitations
+<Heading level={3}>Advantages and Limitations</Heading>
One of the advantages of this anchor-frame approach is that it is robust to minor changes in the video: if the video is slightly edited or a few frames are missing, the anchor frames are in most cases unaffected, as they are still a good representation of the remaining video frames (though we definitely need more testing to verify this on a broad class of videos). One could instead use an average frame as a representation of the video, but the average frame is sensitive to such editing. The anchor frames are also much "cleaner" than the average frame, which tends to be blurry; passing a blurry frame to the image classifier may not give useful results. The result of the Google Vision API on the average frame is shown below.
We can see that the score for the labels relating to Richard Nixon has dropped drastically. This only gets worse as the video size increases and the average frame becomes more noisy.
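For contrast, a minimal sketch of that average-frame baseline (frames assumed to be a list of equally sized NumPy arrays):

```python
import numpy as np

# The naive baseline: average all frames into a single image. Motion
# and scene changes blur the result, which is why its labels degrade
# as the video grows longer.
def average_frame(frames):
    return np.stack(frames).astype(np.float32).mean(axis=0).astype(np.uint8)
```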
-## Some Optimizations and Future Work
+<Heading level={3}>Some Optimizations and Future Work</Heading>
In our example the matrix formed after vectorizing each frame was $7500 \times 1000$. Operating on such a matrix is computationally expensive (the complexity of QR is $O(n^3)$). We can try reducing both the row and the column dimensions. Here the number of rows is due to the size of each frame, which is $50 \times 50 \times 3$ even after resizing. We could resize further, but after a certain point we start to lose useful information in the image. One possible approach, which we experimented with, is to use a pre-trained Convolutional Neural Network to get embeddings of each frame and then use these as a representation of the image. These embeddings are known to capture semantic information in the image [3]. The size of the embedding depends on the CNN architecture we use: a ResNet-18 architecture gives us $512$-dimensional embeddings, a significant improvement on $7500$. Another optimization we tried is sampling one in every 10 frames. There is a trade-off involved between speed and accuracy, and one needs to tune these parameters for a given use case. Using these optimizations, the matrix $X$ that we compute the QR decomposition of is of size $512 \times 100$.
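A minimal sketch of one standard realization of this pipeline (column-pivoted QR on ResNet-18 embeddings); the selection variant and all names here are illustrative assumptions, not Tattle's production code:

```python
import torch
import torchvision.models as models
from scipy.linalg import qr

# Embed frames with a ResNet-18 trunk: 512 dimensions per frame.
resnet = models.resnet18(pretrained=True)
trunk = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop final fc
trunk.eval()

def embed(frames):  # frames: float tensor of shape (n, 3, 224, 224)
    with torch.no_grad():
        return trunk(frames).squeeze(-1).squeeze(-1).numpy().T  # (512, n)

def anchor_indices(X, k):
    # X is d x n (e.g. 512 x 100 after 1-in-10 sampling); the first k
    # pivot columns of a column-pivoted QR are the anchor frames.
    _, _, piv = qr(X, pivoting=True)
    return piv[:k]
```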
-## Limitations and Future work
+<Heading level={3}>Limitations and Future work</Heading>
The approach taken is by no means a silver bullet: there are videos where the base frame changes drastically, and the number of anchor frames required to achieve a small error would be very high. So far our approach works well for typical videos shared frequently on WhatsApp and other messaging platforms. If you have suggestions on other approaches, do get in touch with us!
-# References
+## References
-[1] G. H. Golub and C. F. Van Loan. *Matrix Computations*. The Johns Hopkins University Press, third edition, 1996.
-[2] Maung, Crystal, and Haim Schweitzer. "Pass-efficient unsupervised feature selection." *Advances in Neural Information Processing Systems*. 2013.