Binary file added PBL/Description of Code.docx
Binary file not shown.
27 changes: 27 additions & 0 deletions PBL/Description of Code.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
Description of Code.

Three groups of codes were used in the study. They were not integrated and contain deliberate
manual inputs so that they are simple to modify. Furthermore, they were kept as free of
loops as possible to facilitate combining and recombining of the text files.
The first group of codes, in Python, imports the text files and converts them
into number sets. All the text files were transliterated into the Latin alphabet and/or numbers.
The conversion uses the UTF-8 encoding. Other encodings such as UTF-16, ASCII and
Windows-1252 were tried without a particular improvement in recognition. The output from the
text import is four 64×64 squares (an array of size 4×64×64), which is a "fingerprint" of the
imported text usable for recognition by the C-GAN network. Fingerprints can be subjected to
transformations and/or filtering (e.g. Fourier transform or Gaussian filter), but these do not
seem to improve performance at stage 2. The small size of the fingerprint is dictated by the
limited computing resources available to the author.
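The conversion described above can be sketched in a few lines. This is a minimal illustration, not the study's script: the function names `text_to_square`, `make_fingerprint` and `fourier_magnitude` are hypothetical, the sample strings are placeholders, and truncating over-long texts is an assumption (the import scripts expect texts that already fit into 64*64 bytes). The Fourier step is only one possible reading of the optional transformation mentioned in the text.

```python
import numpy as np

SIDE = 64        # side of one square, as stated in the description
PAD_VALUE = 1    # the import scripts pad short texts with the value 1


def text_to_square(text, side=SIDE, pad_value=PAD_VALUE):
    """Encode text as UTF-8 bytes and lay them out as a side x side float array."""
    values = list(text.encode("utf-8"))
    capacity = side * side
    if len(values) > capacity:
        # Assumption: over-long texts are truncated to the first side*side bytes.
        values = values[:capacity]
    values += [pad_value] * (capacity - len(values))
    return np.asarray(values, dtype=float).reshape(side, side)


def make_fingerprint(texts):
    """Stack one square per text into an array of shape (len(texts), side, side)."""
    return np.stack([text_to_square(t) for t in texts])


def fourier_magnitude(fingerprint):
    """Optional transformation: magnitude of the 2D FFT of each square."""
    return np.abs(np.fft.fft2(fingerprint))


# Placeholder texts; real inputs would be the transliterated documents.
samples = ["first sample text", "second sample text", "third", "fourth"]
fingerprint = make_fingerprint(samples)
print(fingerprint.shape)                     # (4, 64, 64)
print(fourier_magnitude(fingerprint).shape)  # (4, 64, 64)
```

Using `np.stack` on four 64×64 squares gives the same 4×64×64 layout that the scripts obtain with `np.concatenate` followed by `np.reshape`.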
The second group of codes is the C-GAN source code in Python. The original C-GAN codes were
taken from public repositories on GitHub and are fairly standard. They analyze the language
fingerprints, which can be used to create fakes that are then subjected to criticism by the
discriminator/critic within the program. The affinity between fingerprints is estimated by
a correlation-like distance between two 2D number arrays, described in the paper manuscript.
The correlation numbers were output manually and collated into a *.csv input file.
Finally, the Mathematica notebook with the tree algorithm is provided.
The notebook takes the *.csv outputs from the second stage. The regression method producing
the best results seems to be logistic regression.
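The correlation-like distance itself is defined in the manuscript and is not reproduced in this description. Purely to illustrate the general idea of comparing two 2D number arrays, the sketch below uses the ordinary Pearson correlation of the flattened arrays as a stand-in. This is an assumption, not the measure used in the study, and the function name `pearson_distance` is hypothetical.

```python
import numpy as np


def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation of the flattened arrays.

    Stand-in measure only: the study's own distance is defined in the manuscript.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("arrays must have the same number of elements")
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - r


# Random stand-ins for two 64x64 squares of byte values.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(64, 64)).astype(float)
y = rng.integers(0, 256, size=(64, 64)).astype(float)

print(pearson_distance(x, x))  # close to 0 for identical arrays
print(pearson_distance(x, y))  # between 0 and 2 in general
```

With this stand-in, a constant array (for example an all-padding square) has zero variance and yields `nan`, so such inputs would need separate handling.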




13 changes: 13 additions & 0 deletions PBL/Description_of_data.txt
@@ -0,0 +1,13 @@
Description of Data Files

Stages 1 and 2 of the analysis start from the original text file, converted into *.docx
or any other suitable format. In our example these are five *.docx files taken from the
Tagalog Wikipedia page on the Philippines. These files can be fed into the Text_import1*.py
program. The size of the file is dictated by the computing power available at the next stage.
The Text_import1* program creates training and test "fingerprints" of
size 4×64×64. They can include samples of one or several languages. Combined samples can be
used for training as well as single-language ones. Examples of the input files with
fingerprints for the gan4*.py files are included.
Correlation results for the pairs of languages are given in the file
Language_results*.xlsx. The information from this file can be imported into the
tree analysis notebook. For simplicity I used the *.csv format for the import.
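The difference between single-language and combined fingerprints can be sketched as follows. This is an illustration under stated assumptions: the random arrays stand in for real 64×64 squares, the variable names and the file name `combined_example.npy` are hypothetical, and the three-plus-one mix mirrors only one of the combinations built in the import scripts.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for four 64x64 squares (byte values 0..255) from each of two languages.
lang_a = [rng.integers(0, 256, size=(64, 64)).astype(float) for _ in range(4)]
lang_b = [rng.integers(0, 256, size=(64, 64)).astype(float) for _ in range(4)]

single = np.stack(lang_a)                     # four squares of one language
combined = np.stack(lang_a[:3] + lang_b[3:])  # three squares of A, one of B

# Round trip through a *.npy file, the format read by the next stage.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "combined_example.npy")
    np.save(path, combined)
    reloaded = np.load(path)

print(single.shape, combined.shape, np.array_equal(combined, reloaded))
# (4, 64, 64) (4, 64, 64) True
```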
3,916 changes: 3,916 additions & 0 deletions PBL/Language_Analysis3b.nb

Large diffs are not rendered by default.

Binary file added PBL/Language_results22bcd.xlsx
Binary file not shown.
37 changes: 37 additions & 0 deletions PBL/Ling3.csv
@@ -0,0 +1,37 @@
Name,det12,Tr,Name,det12,Tr,Name,det12,Tr
Minoa-Eng,5.40E-05,0.00156725,Hurr-Luw,0.120841839,0.2802,Hurr-Luw,0.120841839,0.2802
Sp-Luw,0.007703116,0.09815,Hurr-Bab,0.003441221,0.27743,Hurr-Bab,0.003441221,0.27743
Eng-Bab,0.017131725,0.18083,Sp-Tag,0.001024695,0.4137,Sp-Tag,0.001024695,0.4137
Eng-Suo,0.025941087,0.2415,Sp-Luw,0.007703116,0.09815,Sp-Luw,0.007703116,0.09815
Hurr-Bab,0.003441221,0.27743,Luw-Bab,0.117092613,0.5272,Luw-Bab,0.117092613,0.5272
Hurr-Luw,0.120841839,0.2802,Sp-Bab,0.03546745,0.4546,Sp-Bab,0.03546745,0.4546
Sp-Suo,0.001897894,0.28871,Tag-Luw,0.041038616,0.35703,Tag-Luw,0.041038616,0.35703
Eng-Hurr,0.0041265,0.28913,Eng-Tag,0.137952238,0.6811,Eng-Tag,0.137952238,0.6811
Eng-Luw,0.001700294,0.2894,Eng-Sp,0.012716131,0.3342,Eng-Sp,0.012716131,0.3342
Suo-Luw,0.00207798,0.29583,Eng-Bab,0.017131725,0.18083,Eng-Bab,0.017131725,0.18083
Sp-Hurr,0.002363472,0.3223,Eng-Hurr,0.0041265,0.28913,Eng-Hurr,0.0041265,0.28913
Hurr-Minoa,0.004345193,0.334122,Eng-Suo,0.025941087,0.2415,Eng-Luw,0.001700294,0.2894
Eng-Sp,0.012716131,0.3342,Sp-Suo,0.001897894,0.28871,Sp-Hurr,0.002363472,0.3223
Tag-Luw,0.041038616,0.35703,Tag-Suo,0.227440827,0.6447,Tag-Hurr,0.158181668,0.644
Suo-Bab,0.006757514,0.41286,Suo-Luw,0.00207798,0.29583,Tag-Bab,0.17663819,0.488
Sp-Tag,0.001024695,0.4137,Suo-Hurr,0.243119497,0.6914,Minoa-Eng,5.4037E-05,0.00156725
Sp-Bab,0.03546745,0.4546,Suo-Bab,0.006757514,0.41286,Sp-Minoa,0.204602493,0.6623
Tag-Bab,0.17663819,0.488,Eng-Luw,0.001700294,0.2894,Tag-Minoa,0.256435235,0.7686
Luw-Bab,0.117092613,0.5272,Sp-Hurr,0.002363472,0.3223,Luw-Minoa,0.158649803,0.5607
Luw-Minoa,0.158649803,0.5607,Tag-Hurr,0.158181668,0.644,Hurr-Minoa,0.004345193,0.334122
Tag-Hurr,0.158181668,0.644,Tag-Bab,0.17663819,0.488,Bab-Minoa,0.210479476,0.6782
Tag-Suo,0.227440827,0.6447,Min1-Eng,0.00276788,0.1598,Suo-Minoa,0.407992255,0.8866
Sp-Minoa,0.204602493,0.6623,Min1-Tag,0.00448677,0.1382,,,
Bab-Minoa,0.210479476,0.6782,Min1-Hurr,5.04E-05,0.1667,,,
Eng-Tag,0.137952238,0.6811,Min1-Suo,0.0091834,0.1527,,,
Suo-Hurr,0.243119497,0.6914,Sp-Min1,0.0019871,0.1049,,,
Tag-Minoa,0.256435235,0.7686,Luw-Min1,0.01990165,0.6442,,,
Suo-Minoa,0.407992255,0.8866,Bab-Min1,0.02139247,0.8152,,,
Min1-Eng,0.00276788,0.1598,Min1-Minoa,0.00956598,0.312,,,
Min1-Tag,0.00448677,0.1382,,,,,,
Min1-Hurr,5.04E-05,0.1667,,,,,,
Min1-Suo,0.0091834,0.1527,,,,,,
Sp-Min1,0.0019871,0.1049,,,,,,
Luw-Min1,0.01990165,0.6442,,,,,,
Bab-Min1,0.02139247,0.8152,,,,,,
Min1-Minoa,0.00956598,0.312,,,,,,
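Ling3.csv holds three tables side by side, each with the columns Name, det12 and Tr, and the shorter tables are padded with empty cells. The sketch below, using only the standard library, splits such a file into one list of rows per table. The function name `split_blocks` is hypothetical, the sample is three rows copied from the file, and the meaning of det12 and Tr is not assumed here.

```python
import csv
import io

# Header and three rows copied from Ling3.csv.
SAMPLE = """Name,det12,Tr,Name,det12,Tr,Name,det12,Tr
Minoa-Eng,5.40E-05,0.00156725,Hurr-Luw,0.120841839,0.2802,Hurr-Luw,0.120841839,0.2802
Sp-Minoa,0.204602493,0.6623,Min1-Tag,0.00448677,0.1382,,,
Min1-Tag,0.00448677,0.1382,,,,,,
"""


def split_blocks(handle, width=3):
    """Split side-by-side tables of `width` columns into separate row lists."""
    reader = csv.reader(handle)
    header = next(reader)
    n_blocks = len(header) // width
    blocks = [[] for _ in range(n_blocks)]
    for row in reader:
        for i in range(n_blocks):
            cells = row[i * width:(i + 1) * width]
            # Skip the empty padding cells of the shorter tables.
            if len(cells) == width and cells[0]:
                blocks[i].append((cells[0], float(cells[1]), float(cells[2])))
    return blocks


blocks = split_blocks(io.StringIO(SAMPLE))
print([len(b) for b in blocks])  # [3, 2, 1]
print(blocks[1][1])              # ('Min1-Tag', 0.00448677, 0.1382)
```

For the real file, `open("Ling3.csv", newline="")` can be passed in place of the `io.StringIO` object.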
Binary file added PBL/Minoa1.npy
Binary file not shown.
Binary file added PBL/MinoanX1.npy
Binary file not shown.
Binary file added PBL/Sp1a.npy
Binary file not shown.
Binary file added PBL/SpBab1a.npy
Binary file not shown.
Binary file added PBL/SpBab2a.npy
Binary file not shown.
Binary file added PBL/SpBab3a.npy
Binary file not shown.
Binary file added PBL/SpLuw1a.npy
Binary file not shown.
Binary file added PBL/SpLuw2a.npy
Binary file not shown.
Binary file added PBL/SpLuw3a.npy
Binary file not shown.
Binary file added PBL/Suo1.npy
Binary file not shown.
Binary file added PBL/Tagalog1.docx
Binary file not shown.
Binary file added PBL/Tagalog2.docx
Binary file not shown.
Binary file added PBL/Tagalog3.docx
Binary file not shown.
Binary file added PBL/Tagalog4.docx
Binary file not shown.
Binary file added PBL/Tagalog5.docx
Binary file not shown.
122 changes: 122 additions & 0 deletions PBL/Text_import1c.py
@@ -0,0 +1,122 @@
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 2 15:25:15 2023

@author: Peter
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import docx
import textract

# text = pd.read_table(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx",header=None, delimiter=None)[0].to_list()
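# NOTE: the textract result on the next line is overwritten immediately and never used;
# text1 ends up holding the python-docx Document, used only for the printout below.
text1 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian1.docx")
text1 = docx.Document(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx")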

print('List of paragraph objects:->>>')
print(text1.paragraphs)

text11 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian1a.docx").decode('utf-8').strip()
text11a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English1a.docx").decode('utf-8').strip()
text211 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian2a.docx").decode('utf-8').strip()
text211a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English2a.docx").decode('utf-8').strip()
text212 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian3a.docx").decode('utf-8').strip()
text212a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English3a.docx").decode('utf-8').strip()
text213 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian4a.docx").decode('utf-8').strip()
text213a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English4a.docx").decode('utf-8').strip()

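# Encode the text as UTF-8 bytes and pad the list to 64*64 entries.
# The padding entries are the string '1'; np.asarray(..., dtype=float) turns them into 1.0.
# The text must not exceed 64*64 = 4096 bytes, otherwise np.reshape raises a ValueError.
# The same five steps are repeated below for each of the other seven texts.
text112 = bytes(text11, 'utf-8')
text113 = list(text112)
text113 += ['1'] * (64*64 - len(text113))
text114 = np.reshape(text113,(64,64))
text114 = np.asarray(text114, dtype=float)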
#print(text12)

text112a = bytes(text11a, 'utf-8')
text113a = list(text112a)
text113a += ['1'] * (64*64 - len(text113a))
text114a = np.reshape(text113a,(64,64))
text114a = np.asarray(text114a, dtype=float)

text112b = bytes(text211, 'utf-8')
text113b = list(text112b)
text113b += ['1'] * (64*64 - len(text113b))
text114b = np.reshape(text113b,(64,64))
text114b = np.asarray(text114b, dtype=float)

text112c = bytes(text211a, 'utf-8')
text113c = list(text112c)
text113c += ['1'] * (64*64 - len(text113c))
text114c = np.reshape(text113c,(64,64))
text114c = np.asarray(text114c, dtype=float)

text112d = bytes(text212, 'utf-8')
text113d = list(text112d)
text113d += ['1'] * (64*64 - len(text113d))
text114d = np.reshape(text113d,(64,64))
text114d = np.asarray(text114d, dtype=float)

text112e = bytes(text212a, 'utf-8')
text113e = list(text112e)
text113e += ['1'] * (64*64 - len(text113e))
text114e = np.reshape(text113e,(64,64))
text114e = np.asarray(text114e, dtype=float)

text112f = bytes(text213, 'utf-8')
text113f = list(text112f)
text113f += ['1'] * (64*64 - len(text113f))
text114f = np.reshape(text113f,(64,64))
text114f = np.asarray(text114f, dtype=float)
#print(text12)

text112g = bytes(text213a, 'utf-8')
text113g = list(text112g)
text113g += ['1'] * (64*64- len(text113g))
text114g = np.reshape(text113g,(64,64))
text114g = np.asarray(text114g, dtype=float)

# Axis limits for the scatter plots below (arbitrary: taken from only four of the eight arrays)
N1=max(np.array(text114).max(),np.array(text114a).max(),np.array(text114e).max(),np.array(text114f).max())
N2=min(np.array(text114).min(),np.array(text114a).min(),np.array(text114e).min(),np.array(text114f).min())

Hurr1a = np.concatenate((text114,text114b,text114d,text114f))
Eng1a = np.concatenate((text114a,text114c,text114e,text114g))
HurrEng1a = np.concatenate((text114,text114b,text114d,text114g))
EngHurr1a = np.concatenate((text114a,text114c,text114e,text114f))
HurrEng2a = np.concatenate((text114,text114b,text114e,text114g))
EngHurr2a = np.concatenate((text114a,text114c,text114d,text114f))
HurrEng3a = np.concatenate((text114,text114c,text114e,text114g))
EngHurr3a = np.concatenate((text114a,text114b,text114d,text114f))
Hurr1a = np.reshape(Hurr1a,(4,64,64))
Eng1a = np.reshape(Eng1a,(4,64,64))
HurrEng1a = np.reshape(HurrEng1a,(4,64,64))
EngHurr1a = np.reshape(EngHurr1a,(4,64,64))
HurrEng2a = np.reshape(HurrEng2a,(4,64,64))
EngHurr2a = np.reshape(EngHurr2a,(4,64,64))
HurrEng3a= np.reshape(HurrEng3a,(4,64,64))
EngHurr3a = np.reshape(EngHurr3a,(4,64,64))

# np.save returns None, so its result is not assigned.
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Hurr1a.npy",Hurr1a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Eng1a.npy",Eng1a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng1a.npy",HurrEng1a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr1a.npy",EngHurr1a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng2a.npy",HurrEng2a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr2a.npy",EngHurr2a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng3a.npy",HurrEng3a)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr3a.npy",EngHurr3a)

plt.figure(figsize=(6,6))
plt.scatter(text114[:,0],text114[:,1],c=text114[:,2])
plt.xlim(N2,N1)
plt.ylim(N2,N1)
plt.show()

plt.figure(figsize=(6,6))
plt.scatter(text114a[:,0],text114a[:,1],c=text114a[:,2])
plt.xlim(N2,N1)
plt.ylim(N2,N1)
plt.show()

V1 = pd.DataFrame(text114)
V1.to_csv(r'C:\Users\Peter\Downloads\Linguistic_Data\HurrV.csv')
107 changes: 107 additions & 0 deletions PBL/Text_import1d.py
@@ -0,0 +1,107 @@
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 2 15:25:15 2023

@author: Peter
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import docx
import textract

# text = pd.read_table(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx",header=None, delimiter=None)[0].to_list()
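# NOTE: the textract result on the next line is overwritten immediately and never used;
# text1 ends up holding the python-docx Document, used only for the printout below.
text1 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6.docx")
text1 = docx.Document(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx")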

print('List of paragraph objects:->>>')
print(text1.paragraphs)

text11 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6a.docx").decode('utf-8').strip()
text11a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English1a.docx").decode('utf-8').strip()
text211 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6b.docx").decode('utf-8').strip()
text211a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English2a.docx").decode('utf-8').strip()
text212 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6c.docx").decode('utf-8').strip()
text212a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English3a.docx").decode('utf-8').strip()
text213 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6d.docx").decode('utf-8').strip()
text213a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English4a.docx").decode('utf-8').strip()

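# Encode the text as UTF-8 bytes and pad the list to 64*64 entries.
# The padding entries are the string '1'; np.asarray(..., dtype=float) turns them into 1.0.
# The text must not exceed 64*64 = 4096 bytes, otherwise np.reshape raises a ValueError.
# The same five steps are repeated below for each of the other seven texts.
text112 = bytes(text11, 'utf-8')
text113 = list(text112)
text113 += ['1'] * (64*64 - len(text113))
text114 = np.reshape(text113,(64,64))
text114 = np.asarray(text114, dtype=float)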
#print(text12)

text112a = bytes(text11a, 'utf-8')
text113a = list(text112a)
text113a += ['1'] * (64*64 - len(text113a))
text114a = np.reshape(text113a,(64,64))
text114a = np.asarray(text114a, dtype=float)

text112b = bytes(text211, 'utf-8')
text113b = list(text112b)
text113b += ['1'] * (64*64 - len(text113b))
text114b = np.reshape(text113b,(64,64))
text114b = np.asarray(text114b, dtype=float)

text112c = bytes(text211a, 'utf-8')
text113c = list(text112c)
text113c += ['1'] * (64*64 - len(text113c))
text114c = np.reshape(text113c,(64,64))
text114c = np.asarray(text114c, dtype=float)

text112d = bytes(text212, 'utf-8')
text113d = list(text112d)
text113d += ['1'] * (64*64 - len(text113d))
text114d = np.reshape(text113d,(64,64))
text114d = np.asarray(text114d, dtype=float)

text112e = bytes(text212a, 'utf-8')
text113e = list(text112e)
text113e += ['1'] * (64*64 - len(text113e))
text114e = np.reshape(text113e,(64,64))
text114e = np.asarray(text114e, dtype=float)

text112f = bytes(text213, 'utf-8')
text113f = list(text112f)
text113f += ['1'] * (64*64 - len(text113f))
text114f = np.reshape(text113f,(64,64))
text114f = np.asarray(text114f, dtype=float)
#print(text12)

text112g = bytes(text213a, 'utf-8')
text113g = list(text112g)
text113g += ['1'] * (64*64- len(text113g))
text114g = np.reshape(text113g,(64,64))
text114g = np.asarray(text114g, dtype=float)

# Axis limits for the scatter plots below (arbitrary: taken from only four of the eight arrays)
N1=max(np.array(text114).max(),np.array(text114a).max(),np.array(text114e).max(),np.array(text114f).max())
N2=min(np.array(text114).min(),np.array(text114a).min(),np.array(text114e).min(),np.array(text114f).min())

plt.figure(figsize=(6,6))
plt.imshow(text114)
MinoanX1 = np.concatenate((text114,text114b,text114d,text114f))
Eng1a = np.concatenate((text114a,text114c,text114e,text114g))
MinoanEngX1 = np.concatenate((text114,text114b,text114d,text114g))
plt.figure(figsize=(6,6))
plt.imshow(text114a)

MinoanX1 = np.reshape(MinoanX1,(4,64,64))
Eng1a = np.reshape(Eng1a,(4,64,64))

# np.save returns None, so its result is not assigned.
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\MinoanX1.npy",MinoanX1)
np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Eng1a.npy",Eng1a)

plt.figure(figsize=(6,6))
plt.scatter(text114[:,0],text114[:,1],c=text114[:,2])
plt.xlim(N2,N1)
plt.ylim(N2,N1)
plt.show()

plt.figure(figsize=(6,6))
plt.scatter(text114a[:,0],text114a[:,1],c=text114a[:,2])
plt.xlim(N2,N1)
plt.ylim(N2,N1)
plt.show()