Commit 7110412

Google Colab link
1 parent 18c2c32 commit 7110412

5 files changed, +195 -243 lines changed


Burrows Delta Walkthrough.ipynb

Lines changed: 99 additions & 233 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 14 additions & 6 deletions
@@ -14,19 +14,18 @@

 [![PyPI package](https://img.shields.io/badge/pip%20install-faststylometry-brightgreen)](https://pypi.org/project/faststylometry/) [![version number](https://img.shields.io/pypi/v/faststylometry?color=green&label=version)](https://github.com/fastdatascience/faststylometry/releases) [![License](https://img.shields.io/github/license/fastdatascience/faststylometry)](https://github.com/fastdatascience/faststylometry/blob/main/LICENSE)

+You can run the walkthrough notebook in [Google Colab](https://colab.research.google.com/github/fastdatascience/faststylometry/blob/main/Burrows%20Delta%20Walkthrough.ipynb) with a single click: <a href="https://colab.research.google.com/github/fastdatascience/faststylometry/blob/main/Burrows%20Delta%20Walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 <!-- badges: end -->

 # ☄ Fast Stylometry - Burrows Delta NLP technique ☄

-Developed by Fast Data Science, https://fastdatascience.com. Fast Data Science develops [products](https://fastdatascience.com/demos/), offers [consulting services](https://fastdatascience.com/case-studies/), and [training courses](https://fastdatascience.com/training-and-upskilling-analytics-teams-in-data-science/) in [natural language processing (NLP)](https://fastdatascience.com/guide-natural-language-processing-nlp/).
+Developed by [**Fast Data Science**](https://fastdatascience.com). Fast Data Science develops [products](https://fastdatascience.com/demos/), offers [consulting services](https://fastdatascience.com/case-studies/), and [training courses](https://fastdatascience.com/training-and-upskilling-analytics-teams-in-data-science/) in [natural language processing (NLP)](https://fastdatascience.com/guide-natural-language-processing-nlp/).

 Source code at https://github.com/fastdatascience/faststylometry

 Tutorial at https://fastdatascience.com/fast-stylometry-python-library/

-This is a Python library for calculating the Burrows Delta.
-
-Burrows' Delta is an algorithm for comparing the similarity of the writing styles of documents, known as [forensic stylometry](https://fastdatascience.com/how-you-can-identify-the-author-of-a-document/).
+**Fast Stylometry** is a Python library for calculating the Burrows' Delta. Burrows' Delta is an algorithm for comparing the similarity of the writing styles of documents, known as [forensic stylometry](https://fastdatascience.com/how-you-can-identify-the-author-of-a-document/).

 * [A useful explanation of the maths and thinking behind Burrows' Delta and how it works](https://programminghistorian.org/en/lessons/introduction-to-stylometry-with-python#third-stylometric-test-john-burrows-delta-method-advanced)
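
Reviewer note: the README text in this hunk describes Burrows' Delta in a single sentence. As a rough illustration of the idea, the sketch below computes the textbook form of the measure (mean absolute difference of z-scored word frequencies) with the standard library only. It is not faststylometry's implementation, which this commit does not show. The function names, the tiny default vocabulary size, and the use of the population standard deviation are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Return each vocabulary word's share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def burrows_delta(known_texts, unknown_tokens, n_words=3):
    """Illustrative Burrows' Delta (hypothetical helper, not the library API).

    known_texts maps an author label to a list of tokens.
    Returns a dict of author label -> delta (lower means more similar).
    """
    # 1. Vocabulary: the n most frequent words across the known corpus.
    corpus_counts = Counter()
    for tokens in known_texts.values():
        corpus_counts.update(tokens)
    vocabulary = [word for word, _ in corpus_counts.most_common(n_words)]

    # 2. Relative frequency of each vocabulary word per known author.
    freqs = {
        author: relative_frequencies(tokens, vocabulary)
        for author, tokens in known_texts.items()
    }

    # 3. Per-word mean and standard deviation across the known authors.
    columns = list(zip(*freqs.values()))
    means = [mean(column) for column in columns]
    stdevs = [pstdev(column) or 1.0 for column in columns]  # avoid dividing by zero

    def z_scores(row):
        return [(f - m) / s for f, m, s in zip(row, means, stdevs)]

    # 4. Delta: mean absolute difference between the two z-score vectors.
    unknown_z = z_scores(relative_frequencies(unknown_tokens, vocabulary))
    return {
        author: mean(abs(u - k) for u, k in zip(unknown_z, z_scores(row)))
        for author, row in freqs.items()
    }
```

Calling `burrows_delta({"A": tokens_a, "B": tokens_b}, unknown_tokens)` returns one score per known author, and the smallest score marks the stylistically closest candidate. Real use needs far more text and a much larger vocabulary than this toy default.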

@@ -41,7 +40,7 @@ pip install faststylometry

 # 🌟 Using Fast Stylometry NLP library for the first time 🌟

-⚠️ We recommend you follow the walk through notebook [Burrows Delta Walkthrough.ipynb](Burrows%20Delta%20Walkthrough.ipynb) in order to understand how the library works.
+⚠️ We recommend you follow the walk through notebook titled [Burrows Delta Walkthrough.ipynb](Burrows%20Delta%20Walkthrough.ipynb) in order to understand how the library works. If you don't have the correct environment set up on your machine, then you can run the walkthrough notebook easily using [this link to create a notebook in Google Colab](https://colab.research.google.com/github/fastdatascience/faststylometry/blob/main/Burrows%20Delta%20Walkthrough.ipynb).

 # 💡 Usage examples

@@ -58,6 +57,7 @@ The [Burrows Delta Walkthrough.ipynb](Burrows%20Delta%20Walkthrough.ipynb) Jupy
 To create a corpus and add books, the pattern is as follows:

 ```
+from faststylometry import Corpus
 corpus = Corpus()
 corpus.add_book("Jane Austen", "Pride and Prejudice", [whole book text])
 ```
@@ -81,8 +81,16 @@ for root, _, files in os.walk(folder):
 corpus.add_book(author, book, text)
 ```

+
 ## 💡 Example 1

+Download some example data (Project Gutenberg texts) from the Fast Stylometry repository:
+
+```
+from faststylometry import download_examples
+download_examples()
+```
+
 Load a corpus and calculate Burrows' Delta

 ```
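
Reviewer note: the `download_examples()` call added to the README in this hunk is backed by `src/faststylometry/examples.py`, added further down in this commit, which relies on the third-party `wget` package. The sketch below shows roughly the same check, download, and extract flow using only the standard library. `fetch_examples`, `has_files`, `format_progress`, and `report_hook` are hypothetical names for illustration and are not part of faststylometry.

```python
import os
import urllib.request
import zipfile


def format_progress(current, total):
    """Build a status line in the style of the commit's bar_custom callback."""
    percent = current / total * 100 if total > 0 else 0
    return "Downloading: %d%% [%d / %d] bytes" % (percent, current, total)


def report_hook(block_num, block_size, total_size):
    """Adapt urlretrieve's (block_num, block_size, total_size) callback."""
    # Blocks are fixed-size, so clamp the running count to the real total.
    current = block_num * block_size
    if total_size > 0:
        current = min(current, total_size)
    print(format_progress(current, total_size), end="\r")


def has_files(folder):
    """Return True if folder exists and contains at least one entry."""
    return os.path.isdir(folder) and len(os.listdir(folder)) > 0


def fetch_examples(url, data_path="data"):
    """Hypothetical stand-in for download_examples().

    Returns True if the archive was downloaded and extracted,
    False if the download was skipped.
    """
    # Skip the download when either target folder is already populated.
    train_dir = os.path.join(data_path, "train")
    test_dir = os.path.join(data_path, "test")
    if has_files(train_dir) or has_files(test_dir):
        print(f"{data_path} already contains example files. Skipping download.")
        return False

    os.makedirs(data_path, exist_ok=True)  # no separate existence check needed
    local_file = os.path.join(data_path, "train_test.zip")
    urllib.request.urlretrieve(url, local_file, reporthook=report_hook)

    with zipfile.ZipFile(local_file, "r") as zip_ref:
        zip_ref.extractall(data_path)
    return True
```

Building paths from `data_path` rather than hard-coding `"data/train"` keeps the guard and the extraction target in step if the folder name ever changes. Whether dropping the `wget` dependency is worthwhile is a judgment call for the maintainers.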
@@ -192,7 +200,7 @@ Wood, T.A., Fast Stylometry [Computer software], Version 1.0.2, accessed at [htt
 ```
 @unpublished{faststylometry,
 AUTHOR = {Wood, T.A.},
-TITLE = {Fast Stylometry (Computer software), Version 1.0.2},
+TITLE = {Fast Stylometry (Computer software), Version 1.0.3},
 YEAR = {2023},
 Note = {To appear},
 }

setup.py

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@
 'numpy==1.24.3',
 'pandas==2.1.0',
 'scikit-learn==1.3.0',
+'wget==3.2',
 ],
 extras_require={
 "dev": ["check-manifest"],

src/faststylometry/__init__.py

Lines changed: 4 additions & 4 deletions
@@ -27,11 +27,11 @@

 '''

-__version__ = "1.0.2"
-
+__version__ = "1.0.3"

+from faststylometry.burrows_delta import calculate_burrows_delta
 from faststylometry.corpus import Corpus
-from faststylometry.util import load_corpus_from_folder
 from faststylometry.en import tokenise_remove_pronouns_en
-from faststylometry.burrows_delta import calculate_burrows_delta
+from faststylometry.examples import download_examples
 from faststylometry.probability import predict_proba, calibrate, get_calibration_curve
+from faststylometry.util import load_corpus_from_folder

src/faststylometry/examples.py

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+'''
+MIT License
+
+Copyright (c) 2023 Fast Data Science Ltd (https://fastdatascience.com)
+
+Maintainer: Thomas Wood
+
+Tutorial at https://fastdatascience.com/fast-stylometry-python-library/
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+'''
+
+import os
+import zipfile
+
+import wget
+
+
+def bar_custom(current, total, width=80):
+    """
+    Display a progress bar to track the download.
+    :param current: Current bytes downloaded
+    :param total: Total bytes.
+    :param width: Width of the bar in chars.
+    """
+    print("Downloading: %d%% [%d / %d] bytes" % (current / total * 100, current, total), end="\r")
+
+
+def download_examples():
+    """
+    Download the example corpus
+    """
+
+    data_path = "data"
+    is_folder_exists = os.path.exists(data_path)
+    if not is_folder_exists:
+        print(f"Creating folder {data_path}.")
+        # Create a new directory because it does not exist
+        os.makedirs(data_path)
+
+    if os.path.exists("data/train") and len(os.listdir("data/train")) > 0:
+        print("data/train is not empty. Exiting the downloader.") #
+        return
+    if os.path.exists("data/test") and len(os.listdir("data/test")) > 0:
+        print("data/test is not empty. Exiting the downloader.") #
+        return
+
+    url = 'https://raw.githubusercontent.com/fastdatascience/faststylometry/main/data/train_test.zip'
+
+    local_file = "data/train_test.zip"
+    print(f"Downloading {url} to {local_file}...")
+
+    wget.download(url, out=local_file, bar=bar_custom)
+
+    print(f"Downloaded {url} to {local_file}.\nExtracting...")
+
+    with zipfile.ZipFile(local_file, 'r') as zip_ref:
+        zip_ref.extractall(data_path)
+
+    print(f"Extracted contents of zip file to {data_path}")
