ancientml · pblerner18 · Jan 12, 2025 · Jun 11, 2025
diff --git a/PBL/Description of Code.docx b/PBL/Description of Code.docx
diff --git a/PBL/Description of Code.txt b/PBL/Description of Code.txt
@@ -0,0 +1,27 @@
+Description of Code.
+
+	The three groups of code used in the study were not integrated and deliberately rely on 
+manual inputs so that they are easy to modify. They were also written with as few loops as 
+possible to make it easy to combine and recombine the text files. 
+	The first group of code, written in Python, imports the text files and converts them 
+into sets of numbers. All text files were first transliterated into the Latin alphabet and/or 
+numbers, and the conversion uses UTF-8 encoding. Other encodings (UTF-16, ASCII and 
+Windows-1252) were tried without a noticeable improvement in recognition. The output of the 
+text import is four 64×64 squares (a 4×64×64 array), a "fingerprint" of the imported text that 
+the C-GAN network can use for recognition. Fingerprints can be subjected to transformations 
+and/or filtering (e.g. a Fourier transform or a Gaussian filter), but these did not appear to 
+improve performance at stage 2 (the C-GAN). The small fingerprint size is dictated by the 
+limited computing resources available to the author. 
+	The second group of code is the C-GAN source code in Python. The original C-GAN code was 
+taken from public GitHub repositories and is fairly standard. The network learns from the language 
+fingerprints and generates fakes, which are then judged by the discriminator/critic within 
+the program. The affinity between two fingerprints is estimated by a correlation-like distance 
+between two 2D number arrays, described in the paper manuscript. The correlation values were 
+output and collated manually into a *.csv input file. 
+	Finally, a Mathematica notebook with the tree algorithm is provided. The notebook 
+takes the *.csv outputs from the second stage as input. Logistic regression appears to be 
+the regression method that gives the best results. 
+
+
+
+
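The exact correlation-like distance is defined in the paper manuscript and is not reproduced in this repository. Purely to illustrate the idea of comparing two 2D number arrays, the sketch below uses a generic stand-in: the Pearson correlation r between two equally sized arrays, turned into a distance as 1 - r. The function name `correlation_distance` is hypothetical, and its output should not be expected to match the values in Ling3.csv. It is plain Python with no dependencies; with NumPy one would typically flatten both arrays and call `np.corrcoef` instead.

```python
import math


def correlation_distance(a, b):
    """Return 1 - Pearson r for two equally sized 2D arrays (lists of rows).

    Generic stand-in only: NOT the measure defined in the paper manuscript.
    """
    xs = [v for row in a for v in row]  # flatten row by row
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same number of elements")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant array")
    r = cov / (sx * sy)
    # about 0 for identical patterns, about 2 for perfectly anti-correlated ones
    return 1.0 - r


print(correlation_distance([[1, 2], [3, 4]], [[4, 3], [2, 1]]))  # close to 2.0
```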
diff --git a/PBL/Description_of_data.txt b/PBL/Description_of_data.txt
@@ -0,0 +1,13 @@
+Description of Data Files
+
+Stages 1 and 2 of the analysis start from 1) the original text files, converted into *.docx 
+or any other suitable format. In our example, these are five *.docx files taken from the 
+Tagalog Wikipedia page on the Philippines. These files can be fed into the Text_import1*.py 
+program. The file size is limited by the computing power available at the next stage. 
+	The Text_import1* programs create training and test "fingerprints" of 
+size 4×64×64. A fingerprint can contain samples of one language or of several. Combined 
+samples can be used for training as well as single-language ones. Example input files 
+with fingerprints for the gan4*.py programs are included. 
+	Correlation results for the language pairs are given in the file 
+Language_results*.xlsx. The information from this file can be imported into the 
+tree analysis notebook; for simplicity, I used the *.csv format for the import. 
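The combined samples built in Text_import1c.py (Hurr1a, HurrEng1a, HurrEng2a, HurrEng3a and their English-first counterparts) follow a simple pattern: the first k squares come from one language and the remaining ones from the other. A minimal sketch of that pattern, assuming four fingerprint squares per language kept in file order; `mixed_stack` is a hypothetical helper, and the strings are placeholders for the 64×64 arrays.

```python
def mixed_stack(first, second, k):
    """Take the first k squares from `first` and the rest from `second`."""
    if len(first) != len(second):
        raise ValueError("both languages need the same number of squares")
    if not 0 <= k <= len(first):
        raise ValueError("k must be between 0 and the number of squares")
    return list(first[:k]) + list(second[k:])


hurr = ["H1", "H2", "H3", "H4"]  # placeholders for the Hurrian squares
eng = ["E1", "E2", "E3", "E4"]   # placeholders for the English squares

stacks = {
    "Hurr1a": mixed_stack(hurr, eng, 4),     # H1, H2, H3, H4 (single language)
    "HurrEng1a": mixed_stack(hurr, eng, 3),  # H1, H2, H3, E4
    "HurrEng2a": mixed_stack(hurr, eng, 2),  # H1, H2, E3, E4
    "HurrEng3a": mixed_stack(hurr, eng, 1),  # H1, E2, E3, E4
    "Eng1a": mixed_stack(eng, hurr, 4),      # E1, E2, E3, E4 (single language)
    "EngHurr1a": mixed_stack(eng, hurr, 3),  # E1, E2, E3, H4
    "EngHurr2a": mixed_stack(eng, hurr, 2),  # E1, E2, H3, H4
    "EngHurr3a": mixed_stack(eng, hurr, 1),  # E1, H2, H3, H4
}
```

With real data, each placeholder would be a 64×64 array, and the resulting list of four squares would be stacked into a 4×64×64 fingerprint before saving.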
diff --git a/PBL/Language_Analysis3b.nb b/PBL/Language_Analysis3b.nb
diff --git a/PBL/Language_results22bcd.xlsx b/PBL/Language_results22bcd.xlsx
diff --git a/PBL/Ling3.csv b/PBL/Ling3.csv
@@ -0,0 +1,37 @@
+Name,det12,Tr,Name,det12,Tr,Name,det12,Tr
+Minoa-Eng,5.40E-05,0.00156725,Hurr-Luw,0.120841839,0.2802,Hurr-Luw,0.120841839,0.2802
+Sp-Luw,0.007703116,0.09815,Hurr-Bab,0.003441221,0.27743,Hurr-Bab,0.003441221,0.27743
+Eng-Bab,0.017131725,0.18083,Sp-Tag,0.001024695,0.4137,Sp-Tag,0.001024695,0.4137
+Eng-Suo,0.025941087,0.2415,Sp-Luw,0.007703116,0.09815,Sp-Luw,0.007703116,0.09815
+Hurr-Bab,0.003441221,0.27743,Luw-Bab,0.117092613,0.5272,Luw-Bab,0.117092613,0.5272
+Hurr-Luw,0.120841839,0.2802,Sp-Bab,0.03546745,0.4546,Sp-Bab,0.03546745,0.4546
+Sp-Suo,0.001897894,0.28871,Tag-Luw,0.041038616,0.35703,Tag-Luw,0.041038616,0.35703
+Eng-Hurr,0.0041265,0.28913,Eng-Tag,0.137952238,0.6811,Eng-Tag,0.137952238,0.6811
+Eng-Luw,0.001700294,0.2894,Eng-Sp,0.012716131,0.3342,Eng-Sp,0.012716131,0.3342
+Suo-Luw,0.00207798,0.29583,Eng-Bab,0.017131725,0.18083,Eng-Bab,0.017131725,0.18083
+Sp-Hurr,0.002363472,0.3223,Eng-Hurr,0.0041265,0.28913,Eng-Hurr,0.0041265,0.28913
+Hurr-Minoa,0.004345193,0.334122,Eng-Suo,0.025941087,0.2415,Eng-Luw,0.001700294,0.2894
+Eng-Sp,0.012716131,0.3342,Sp-Suo,0.001897894,0.28871,Sp-Hurr,0.002363472,0.3223
+Tag-Luw,0.041038616,0.35703,Tag-Suo,0.227440827,0.6447,Tag-Hurr,0.158181668,0.644
+Suo-Bab,0.006757514,0.41286,Suo-Luw,0.00207798,0.29583,Tag-Bab,0.17663819,0.488
+Sp-Tag,0.001024695,0.4137,Suo-Hurr,0.243119497,0.6914,Minoa-Eng,5.4037E-05,0.00156725
+Sp-Bab,0.03546745,0.4546,Suo-Bab,0.006757514,0.41286,Sp-Minoa,0.204602493,0.6623
+Tag-Bab,0.17663819,0.488,Eng-Luw,0.001700294,0.2894,Tag-Minoa,0.256435235,0.7686
+Luw-Bab,0.117092613,0.5272,Sp-Hurr,0.002363472,0.3223,Luw-Minoa,0.158649803,0.5607
+Luw-Minoa,0.158649803,0.5607,Tag-Hurr,0.158181668,0.644,Hurr-Minoa,0.004345193,0.334122
+Tag-Hurr,0.158181668,0.644,Tag-Bab,0.17663819,0.488,Bab-Minoa,0.210479476,0.6782
+Tag-Suo,0.227440827,0.6447,Min1-Eng,0.00276788,0.1598,Suo-Minoa,0.407992255,0.8866
+Sp-Minoa,0.204602493,0.6623,Min1-Tag,0.00448677,0.1382,,,
+Bab-Minoa,0.210479476,0.6782,Min1-Hurr,5.04E-05,0.1667,,,
+Eng-Tag,0.137952238,0.6811,Min1-Suo,0.0091834,0.1527,,,
+Suo-Hurr,0.243119497,0.6914,Sp-Min1,0.0019871,0.1049,,,
+Tag-Minoa,0.256435235,0.7686,Luw-Min1,0.01990165,0.6442,,,
+Suo-Minoa,0.407992255,0.8866,Bab-Min1,0.02139247,0.8152,,,
+Min1-Eng,0.00276788,0.1598,Min1-Minoa,0.00956598,0.312,,,
+Min1-Tag,0.00448677,0.1382,,,,,,
+Min1-Hurr,5.04E-05,0.1667,,,,,,
+Min1-Suo,0.0091834,0.1527,,,,,,
+Sp-Min1,0.0019871,0.1049,,,,,,
+Luw-Min1,0.01990165,0.6442,,,,,,
+Bab-Min1,0.02139247,0.8152,,,,,,
+Min1-Minoa,0.00956598,0.312,,,,,,
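Ling3.csv holds three Name,det12,Tr column groups side by side, and the later groups are shorter, so their trailing cells are empty. A minimal stdlib sketch for splitting such a file into one list per group before further analysis; `read_groups` is a hypothetical helper, and since the repository does not say what distinguishes the three groups, the sketch treats them only as blocks of three columns. The sample rows are copied from Ling3.csv.

```python
import csv
import io


def read_groups(text, width=3):
    """Split side-by-side Name,det12,Tr groups into one list per group.

    Each entry is (name, det12, tr); empty cells of shorter groups are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    n_groups = len(header) // width
    groups = [[] for _ in range(n_groups)]
    for row in body:
        for g in range(n_groups):
            cells = row[g * width:(g + 1) * width]
            if len(cells) == width and cells[0]:
                name, det12, tr = cells
                groups[g].append((name, float(det12), float(tr)))
    return groups


sample = (
    "Name,det12,Tr,Name,det12,Tr,Name,det12,Tr\n"
    "Minoa-Eng,5.40E-05,0.00156725,Hurr-Luw,0.120841839,0.2802,Hurr-Luw,0.120841839,0.2802\n"
    "Min1-Tag,0.00448677,0.1382,,,,,,\n"
)
groups = read_groups(sample)
```

For the full file, something like `with open("Ling3.csv", newline="") as f: groups = read_groups(f.read())` should work, assuming the file is in the working directory.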
diff --git a/PBL/Minoa1.npy b/PBL/Minoa1.npy
diff --git a/PBL/MinoanX1.npy b/PBL/MinoanX1.npy
diff --git a/PBL/Sp1a.npy b/PBL/Sp1a.npy
diff --git a/PBL/SpBab1a.npy b/PBL/SpBab1a.npy
diff --git a/PBL/SpBab2a.npy b/PBL/SpBab2a.npy
diff --git a/PBL/SpBab3a.npy b/PBL/SpBab3a.npy
diff --git a/PBL/SpLuw1a.npy b/PBL/SpLuw1a.npy
diff --git a/PBL/SpLuw2a.npy b/PBL/SpLuw2a.npy
diff --git a/PBL/SpLuw3a.npy b/PBL/SpLuw3a.npy
diff --git a/PBL/Suo1.npy b/PBL/Suo1.npy
diff --git a/PBL/Tagalog1.docx b/PBL/Tagalog1.docx
diff --git a/PBL/Tagalog2.docx b/PBL/Tagalog2.docx
diff --git a/PBL/Tagalog3.docx b/PBL/Tagalog3.docx
diff --git a/PBL/Tagalog4.docx b/PBL/Tagalog4.docx
diff --git a/PBL/Tagalog5.docx b/PBL/Tagalog5.docx
diff --git a/PBL/Text_import1c.py b/PBL/Text_import1c.py
@@ -0,0 +1,122 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Sun Apr  2 15:25:15 2023
+
+@author: Peter
+"""
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt 
+import docx
+import textract
+
+# text = pd.read_table(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx",header=None, delimiter=None)[0].to_list()
+text1 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian1.docx")
+text1 = docx.Document(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx")  # overwrites the textract result above; used only to print the paragraph list
+
+print('List of paragraph objects:->>>')
+print(text1.paragraphs)
+
+text11 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian1a.docx").decode('utf-8').strip()
+text11a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English1a.docx").decode('utf-8').strip()
+text211 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian2a.docx").decode('utf-8').strip()
+text211a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English2a.docx").decode('utf-8').strip()
+text212 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian3a.docx").decode('utf-8').strip()
+text212a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English3a.docx").decode('utf-8').strip()
+text213 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\Hurrian4a.docx").decode('utf-8').strip()
+text213a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English4a.docx").decode('utf-8').strip()
+
+text112 = bytes(text11, 'utf-8')
+text113 = list(text112)
+text113 += [1] * (64*64 - len(text113))  # pad with 1s to 64*64 = 4096 byte values; longer texts make the reshape fail
+text114 = np.reshape(text113,(64,64))
+text114 = np.asarray(text114, dtype=float)
+#print(text12)
+
+text112a = bytes(text11a, 'utf-8')
+text113a = list(text112a)
+text113a += [1] * (64*64 - len(text113a))
+text114a = np.reshape(text113a,(64,64))
+text114a = np.asarray(text114a, dtype=float)
+
+text112b = bytes(text211, 'utf-8')
+text113b = list(text112b)
+text113b += [1] * (64*64 - len(text113b))
+text114b = np.reshape(text113b,(64,64))
+text114b = np.asarray(text114b, dtype=float)
+
+text112c = bytes(text211a, 'utf-8')
+text113c = list(text112c)
+text113c += [1] * (64*64 - len(text113c))
+text114c = np.reshape(text113c,(64,64))
+text114c = np.asarray(text114c, dtype=float)
+
+text112d = bytes(text212, 'utf-8')
+text113d = list(text112d)
+text113d += [1] * (64*64 - len(text113d))
+text114d = np.reshape(text113d,(64,64))
+text114d = np.asarray(text114d, dtype=float)
+
+text112e = bytes(text212a, 'utf-8')
+text113e = list(text112e)
+text113e += [1] * (64*64 - len(text113e))
+text114e = np.reshape(text113e,(64,64))
+text114e = np.asarray(text114e, dtype=float)
+
+text112f = bytes(text213, 'utf-8')
+text113f = list(text112f)
+text113f += [1] * (64*64 - len(text113f))
+text114f = np.reshape(text113f,(64,64))
+text114f = np.asarray(text114f, dtype=float)
+#print(text12)
+
+text112g = bytes(text213a, 'utf-8')
+text113g = list(text112g)
+text113g += [1] * (64*64 - len(text113g))
+text114g = np.reshape(text113g,(64,64))
+text114g = np.asarray(text114g, dtype=float)
+
+# Arbitrary bounds!
+N1=max(np.array(text114).max(),np.array(text114a).max(),np.array(text114e).max(),np.array(text114f).max())
+N2=min(np.array(text114).min(),np.array(text114a).min(),np.array(text114e).min(),np.array(text114f).min())
+
+Hurr1a = np.concatenate((text114,text114b,text114d,text114f))
+Eng1a = np.concatenate((text114a,text114c,text114e,text114g))
+HurrEng1a = np.concatenate((text114,text114b,text114d,text114g))
+EngHurr1a = np.concatenate((text114a,text114c,text114e,text114f))
+HurrEng2a = np.concatenate((text114,text114b,text114e,text114g))
+EngHurr2a = np.concatenate((text114a,text114c,text114d,text114f))
+HurrEng3a = np.concatenate((text114,text114c,text114e,text114g))
+EngHurr3a = np.concatenate((text114a,text114b,text114d,text114f))
+Hurr1a = np.reshape(Hurr1a,(4,64,64))
+Eng1a = np.reshape(Eng1a,(4,64,64))
+HurrEng1a = np.reshape(HurrEng1a,(4,64,64))
+EngHurr1a = np.reshape(EngHurr1a,(4,64,64))
+HurrEng2a = np.reshape(HurrEng2a,(4,64,64))
+EngHurr2a = np.reshape(EngHurr2a,(4,64,64))
+HurrEng3a= np.reshape(HurrEng3a,(4,64,64))
+EngHurr3a = np.reshape(EngHurr3a,(4,64,64))
+
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Hurr1a.npy",Hurr1a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Eng1a.npy",Eng1a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng1a.npy",HurrEng1a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr1a.npy",EngHurr1a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng2a.npy",HurrEng2a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr2a.npy",EngHurr2a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\HurrEng3a.npy",HurrEng3a)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\EngHurr3a.npy",EngHurr3a)
+
+plt.figure(figsize=(6,6))
+plt.scatter(text114[:,0],text114[:,1],c=text114[:,2])
+plt.xlim(N2,N1)
+plt.ylim(N2,N1)
+plt.show()
+
+plt.figure(figsize=(6,6))
+plt.scatter(text114a[:,0],text114a[:,1],c=text114a[:,2])
+plt.xlim(N2,N1)
+plt.ylim(N2,N1)
+plt.show()
+
+V1 = pd.DataFrame(text114)
+V1.to_csv(r'C:\Users\Peter\Downloads\Linguistic_Data\HurrV.csv')
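Each five-line conversion block in Text_import1c.py encodes one text as UTF-8 bytes, pads it with 1s to 4,096 values and reshapes it into a 64×64 square; `np.reshape` raises an error if a text is longer than 4,096 bytes. For readers who prefer a helper over repeated blocks, a minimal stdlib-only sketch of the same conversion follows. The names `text_to_square` and `fingerprint` are hypothetical, and unlike the original the sketch truncates over-long texts instead of failing. It does not strip whitespace, so callers should apply `.strip()` first, as the original does.

```python
def text_to_square(text, side=64, pad=1):
    """Convert a string into a side x side grid of UTF-8 byte values (as floats).

    Over-long texts are truncated (an assumption; the original script errors out).
    """
    values = list(text.encode("utf-8"))  # one integer 0..255 per byte
    size = side * side
    values = values[:size] + [pad] * (size - len(values))  # truncate, then pad
    return [[float(v) for v in values[r * side:(r + 1) * side]] for r in range(side)]


def fingerprint(texts, side=64):
    """Stack four texts into a 4 x side x side nested list, one square per text."""
    if len(texts) != 4:
        raise ValueError("a fingerprint is built from exactly four texts")
    return [text_to_square(t, side) for t in texts]


square = text_to_square("Hi")
print(square[0][:3])  # [72.0, 105.0, 1.0]
```

With NumPy available, `np.asarray(fingerprint(texts))` gives the same 4×64×64 float array that the original scripts pass to `np.save`.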
diff --git a/PBL/Text_import1d.py b/PBL/Text_import1d.py
@@ -0,0 +1,107 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Sun Apr  2 15:25:15 2023
+
+@author: Peter
+"""
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt 
+import docx
+import textract
+
+# text = pd.read_table(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx",header=None, delimiter=None)[0].to_list()
+text1 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6.docx")
+text1 = docx.Document(r"C:\Users\Peter\Downloads\Linguistic_Data\English1.docx")  # overwrites the textract result above; used only to print the paragraph list
+
+print('List of paragraph objects:->>>')
+print(text1.paragraphs)
+
+text11 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6a.docx").decode('utf-8').strip()
+text11a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English1a.docx").decode('utf-8').strip()
+text211 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6b.docx").decode('utf-8').strip()
+text211a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English2a.docx").decode('utf-8').strip()
+text212 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6c.docx").decode('utf-8').strip()
+text212a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English3a.docx").decode('utf-8').strip()
+text213 = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\MinoanX6d.docx").decode('utf-8').strip()
+text213a = textract.process(r"C:\Users\Peter\Downloads\Linguistic_Data\English4a.docx").decode('utf-8').strip()
+
+text112 = bytes(text11, 'utf-8')
+text113 = list(text112)
+text113 += [1] * (64*64 - len(text113))  # pad with 1s to 64*64 = 4096 byte values; longer texts make the reshape fail
+text114 = np.reshape(text113,(64,64))
+text114 = np.asarray(text114, dtype=float)
+#print(text12)
+
+text112a = bytes(text11a, 'utf-8')
+text113a = list(text112a)
+text113a += [1] * (64*64 - len(text113a))
+text114a = np.reshape(text113a,(64,64))
+text114a = np.asarray(text114a, dtype=float)
+
+text112b = bytes(text211, 'utf-8')
+text113b = list(text112b)
+text113b += [1] * (64*64 - len(text113b))
+text114b = np.reshape(text113b,(64,64))
+text114b = np.asarray(text114b, dtype=float)
+
+text112c = bytes(text211a, 'utf-8')
+text113c = list(text112c)
+text113c += [1] * (64*64 - len(text113c))
+text114c = np.reshape(text113c,(64,64))
+text114c = np.asarray(text114c, dtype=float)
+
+text112d = bytes(text212, 'utf-8')
+text113d = list(text112d)
+text113d += [1] * (64*64 - len(text113d))
+text114d = np.reshape(text113d,(64,64))
+text114d = np.asarray(text114d, dtype=float)
+
+text112e = bytes(text212a, 'utf-8')
+text113e = list(text112e)
+text113e += [1] * (64*64 - len(text113e))
+text114e = np.reshape(text113e,(64,64))
+text114e = np.asarray(text114e, dtype=float)
+
+text112f = bytes(text213, 'utf-8')
+text113f = list(text112f)
+text113f += [1] * (64*64 - len(text113f))
+text114f = np.reshape(text113f,(64,64))
+text114f = np.asarray(text114f, dtype=float)
+#print(text12)
+
+text112g = bytes(text213a, 'utf-8')
+text113g = list(text112g)
+text113g += [1] * (64*64 - len(text113g))
+text114g = np.reshape(text113g,(64,64))
+text114g = np.asarray(text114g, dtype=float)
+
+# Arbitrary bounds!
+N1=max(np.array(text114).max(),np.array(text114a).max(),np.array(text114e).max(),np.array(text114f).max())
+N2=min(np.array(text114).min(),np.array(text114a).min(),np.array(text114e).min(),np.array(text114f).min())
+
+plt.figure(figsize=(6,6))
+plt.imshow(text114)
+MinoanX1 = np.concatenate((text114,text114b,text114d,text114f))
+Eng1a = np.concatenate((text114a,text114c,text114e,text114g))
+MinoanEngX1 = np.concatenate((text114,text114b,text114d,text114g))  # mixed Minoan/English stack; not reshaped or saved in this script
+plt.figure(figsize=(6,6))
+plt.imshow(text114a)
+
+MinoanX1 = np.reshape(MinoanX1,(4,64,64))
+Eng1a = np.reshape(Eng1a,(4,64,64))
+
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\MinoanX1.npy",MinoanX1)
+np.save(r"C:\Users\Peter\Documents\Documents\Documents (3)\Python2\Lingua\Eng1a.npy",Eng1a)
+
+plt.figure(figsize=(6,6))
+plt.scatter(text114[:,0],text114[:,1],c=text114[:,2])
+plt.xlim(N2,N1)
+plt.ylim(N2,N1)
+plt.show()
+
+plt.figure(figsize=(6,6))
+plt.scatter(text114a[:,0],text114a[:,1],c=text114a[:,2])
+plt.xlim(N2,N1)
+plt.ylim(N2,N1)
+plt.show()