
Commit 4522890

Fix: blogs layout shift from content overflow (in normal view and mobile view) (#283)
* fixes #275: updated some heading levels, heading line-height, link overflow, and image overflow in blog MDX files
* fixes #275: added spacing above images, handled overflow for images, links, tables, and iframes, and corrected Heading syntax
* fix #275: fix overflowing link in 2023-12-17-sprint-dmm.mdx causing layout shift on mobile
* fix #275: prevent text and link overflow on small screens in topic modelling post by using word-break
* refactor: move Grommet Heading import to top in similar videos blog
* refactor: move Heading import to top in blog files
* fix: update blog date to 16 and apply link wrapping to prevent mobile layout shift
* shift 'Heading' import below frontmatter to follow MDX syntax rules
* shift 'Heading' import below frontmatter to follow MDX syntax rules
* correct blog post date to 2024-03-13 as per original publish schedule
1 parent 5ba4f15 commit 4522890

File tree

50 files changed: +302 −193 lines


src/blog/2019-07-13-shell-architecture.mdx

Lines changed: 1 addition & 1 deletion

@@ -50,6 +50,6 @@ We are following an MVC pattern for developing this server. Broadly speaking eve

 We use the config module to handle different app configuration. This includes sensitive credentials like database parameters. config expects a config directory with default.json file in it. We don't check that into the git repository for security reasons. You will have to contact us via email or our slack channel to get access to it.

-# Footnotes
+## Footnotes

 1. Even though currently Shell Server uses amazon’s hosted MySQL service for our Database needs, for all practical purposes, it is ok to consider it part of Shell.

src/blog/2019-07-26-tattle-data-science-finding-similar-videos-efficiently.mdx

Lines changed: 7 additions & 6 deletions

@@ -6,6 +6,7 @@ author: Swair Shah
 project: Kosh
 tags: machine-learning, devlog
 ---
+import { Heading } from 'grommet'

 One of the challenges we face at Tattle is to efficiently check if a given post or message has been encountered earlier by our system.

@@ -18,7 +19,7 @@ In this post we will describe our ideas for video and GIF content representation

 We want to come up with an approach which not only works for finding duplicate or near-duplicate videos but also extracts some useful information from the video. For example, we may want to generate a description of this video with tags such as `speech`, `president`, `nixon` etc. Processing a video can be very processor intensive. The Nixon video in our example is 36 seconds long. We can extract the frames from the video using OpenCV. There are 1109 frames in the video for just 36 seconds. Even if we are using an efficient deep Convolutional Neural Network to classify labels or generate representations, it would be inefficient to use all the frames of the video.

-### Anchor Frames
+## Anchor Frames

 We want to find what we call "Anchor Frames" for a given video. Anchor Frames are a small set of frames which are a good representation of the video. For our video let us look at a sample from the extracted frames.
 ![](../images/Screen-Shot-2019-07-19-at-3.32.55-PM.png)A sample of frames from the video
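The frame sampling and vectorization the post describes can be sketched in plain NumPy (a sketch only; `frames_to_matrix` is a hypothetical helper name, and the 50×50×3 frame size and 1-in-10 sampling are the numbers quoted later in the post):

```python
import numpy as np

def frames_to_matrix(frames, every_n=10):
    """Vectorize every n-th frame (each an H x W x 3 array) into a column of X."""
    cols = [f.reshape(-1) for f in frames[::every_n]]
    return np.stack(cols, axis=1).astype(float)

# 30 dummy 50x50x3 "frames"; keeping 1 in 10 leaves 3 of them,
# producing the 7500-row matrix shape the post describes
frames = [np.zeros((50, 50, 3)) for _ in range(30)]
X = frames_to_matrix(frames)
```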
@@ -66,27 +67,27 @@ We can see that after the first two frame selection the error reduction slows do

 ![](../images/frame0-3.jpg)![](/content/images/2019/07/frame700.jpg)

 Using these two frames as anchor frames we achieve an $80.48\%$ reduction in reconstruction error. Now that we have found a set of representative anchor frames for a video, we can use the same technique we use for image duplicate detection and label extraction. To generate a fingerprint for the video we can take individual image fingerprints (pHash or pre-trained Convolutional Neural Network features) for all the anchor frames and either append them or take an average.

-## Semantic Labels for Videos
+<Heading level={3}>Semantic Labels for Videos</Heading>

 To classify videos we use the Google Vision API on the set of anchor frames. Passing the second anchor frame to the Google Vision API gives the following labels.
 ![](../images/Screen-Shot-2019-07-19-at-5.37.05-PM.png)
 We can pass all the anchor frames to the API and take a union of the labels.

-## Advantages and Limitations
+<Heading level={3}>Advantages and Limitations</Heading>

 One of the advantages of this anchor-frame approach is that it is robust to minor changes in the video. If the video is slightly edited or a few frames are missing, the anchor frames are in most cases unaffected, as they are still a good representation of the remaining video frames, though we definitely need more testing to verify this on a broad class of videos. One could use an average frame as a representation of the video, but the average frame is sensitive to video editing. The anchor frames are also much "cleaner" than the average frame, which tends to be blurry; passing it to the image classifier may not give useful results. The result of the Google Vision API on the average frame is shown below.
 ![](../images/Screen-Shot-2019-07-19-at-11.41.18-PM.png)
 We can see that the score for the labels relating to Richard Nixon has dropped drastically. This only gets worse as the video size increases and the average frame becomes more noisy.

-## Some Optimizations and Future Work
+<Heading level={3}>Some Optimizations and Future Work</Heading>

 In our example the matrix formed after vectorizing each frame was $7500 \times 1000$. Operating on such a matrix is computationally expensive (the complexity of QR is $O(n^3)$). We can try reducing the row and column dimensions. Here the number of rows is due to the size of each frame, which is $50 \times 50 \times 3$ even after resizing. We could resize further, but after a certain point we start to lose useful information in the image. One possible approach we experimented with is using a pre-trained Convolutional Neural Network to get embeddings of each frame and then using these as a representation of the image. These embeddings are known to capture semantic information in the image [3]. The size of the embedding depends on the CNN architecture we use. Using a ResNet-18 architecture gives us $512$ dimensional embeddings, a significant improvement on $7500$. Another optimization we tried is sampling one frame in every 10 frames. There is a trade-off involved between speed and accuracy, and one needs to tune these parameters for a given use case. Using these optimizations the matrix $X$ that we compute the QR decomposition of is of size $512 \times 100$.

-## Limitations and Future work
+<Heading level={3}>Limitations and Future work</Heading>

 The approach taken is by no means a silver bullet. There are videos where the base frame changes drastically and the number of anchor frames required to achieve a small error would be very high. So far our approach works well for typical videos shared frequently on WhatsApp and other messaging platforms. If you have suggestions on other approaches, do get in touch with us!

-# References
+## References

 - [1] G. H. Golub and C. F. Van Loan. *Matrix Computations*. The Johns Hopkins University Press, third edition, 1996.
 - [2] Maung, Crystal, and Haim Schweitzer. "Pass-efficient unsupervised feature selection." *Advances in Neural Information Processing Systems*. 2013.
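The column-selection step behind the post's anchor frames — picking representative columns of $X$ via column-pivoted QR, as in references [1] and [2] — can be sketched as follows (a sketch assuming SciPy; `anchor_frames` and `reconstruction_error` are illustrative helper names, not from the post):

```python
import numpy as np
from scipy.linalg import qr

def anchor_frames(X, k):
    """Indices of k representative columns of X, chosen by column-pivoted QR."""
    # pivoting orders the columns so the leading ones best span the rest
    _, _, piv = qr(X, mode='economic', pivoting=True)
    return piv[:k]

def reconstruction_error(X, idx):
    """Relative error of reconstructing X from the selected columns."""
    A = X[:, idx]
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)
    return np.linalg.norm(X - A @ coef) / np.linalg.norm(X)

# low-rank toy stand-in for the 512 x 100 embedding matrix from the post
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 5)) @ rng.standard_normal((5, 100))
err = reconstruction_error(X, anchor_frames(X, 5))
```

As in the post, the error drops sharply once the selected frames span the video's distinct "base" frames, and adding further anchor frames yields diminishing returns.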
