Commit 448f7cf

Add ruumianalyys notebook
1 parent 83919ec commit 448f7cf

1 file changed: +242 -0 lines changed

statistika_ruumianalyys.ipynb

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "de2e459e-a8dd-4cc1-b941-7f1a86bb2cb5",
   "metadata": {},
   "source": [
    "## Spatial data analysis\n",
    "\n",
    "__Prof. Evelyn Uuemaa__\n",
    "\n",
    "With this script you can upload your own CSV file, draw box plots from it and run a Mann-Whitney U test. Finally, you can download the result as an image file for your report."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3710b104-ecd1-46c0-a2d0-16b631e0625f",
   "metadata": {},
   "source": [
    "### How to use this script\n",
    "\n",
    "Click on the very first code cell and press the \"Run\" button to execute it and move on to the following cells, running them one by one in order. If something goes wrong, you can simply start over from the beginning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7365dde-1313-49e5-ad8d-33c3593fbfff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display\n",
    "import ipywidgets as widgets\n",
    "\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "sns.set_theme(style=\"whitegrid\")\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad421714-74e1-439d-9904-caa28852cc6a",
   "metadata": {},
   "source": [
    "In the cell after next you will select, from your computer, the CSV file you have just saved. If your CSV file uses a comma as the delimiter, nothing needs to be changed. If it uses a semicolon, change the next cell to read `csv_delimiter = \";\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca87e67d-5e04-4a9d-9a60-8e5f9558c2ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "upload = widgets.FileUpload(accept='.csv', multiple=False)\n",
    "\n",
    "# field delimiter of the CSV file: \",\" (comma) or \";\" (semicolon)\n",
    "csv_delimiter = \",\"\n",
    "text_encoding = \"utf8\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2880386e-f45d-4b4c-bcdc-33d32084c9ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "upload"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79cc56dd-f429-4f00-95ff-77540149516c",
   "metadata": {},
   "source": [
    "Once you have selected the file on your computer, the upload button should read \"Upload (1)\", which means that the file is now selected.\n",
    "\n",
    "Run the next cell. The result should be a neatly formatted table, not a single run of text. If you see semicolons (;) between the values, set `csv_delimiter = \";\"` in the cell that defines it, run that cell again and select the file again, because re-running that cell creates a new, empty upload button."
   ]
  },
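  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1ddb735-7dd7-4347-af7e-f408be349135",
   "metadata": {},
   "outputs": [],
   "source": [
    "filestream_encoding = \"utf8\"\n",
    "\n",
    "uploaded = upload.value\n",
    "if not uploaded:\n",
    "    raise ValueError(\"No file selected yet: use the Upload button above first.\")\n",
    "if isinstance(uploaded, dict):\n",
    "    # ipywidgets 7: {file name: {'metadata': ..., 'content': bytes}}\n",
    "    content = next(iter(uploaded.values()))[\"content\"]\n",
    "else:\n",
    "    # ipywidgets 8: tuple of dicts whose 'content' is a memoryview\n",
    "    content = uploaded[0][\"content\"]\n",
    "t = bytes(content).decode(encoding=filestream_encoding)\n",
    "\n",
    "# newline='' keeps the file's own line endings (avoids doubled \\r on Windows)\n",
    "with open('tmp.csv', 'w', encoding=text_encoding, newline='') as fh:\n",
    "    fh.write(t)\n",
    "\n",
    "df = pd.read_csv('tmp.csv', encoding=text_encoding, sep=csv_delimiter)\n",
    "\n",
    "df.head(10)"
   ]
  },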
  {
   "cell_type": "markdown",
   "id": "a75caab5-8e95-49d7-b17d-5f1379921608",
   "metadata": {},
   "source": [
    "The data have now been read into a data frame *(dataframe)*, so we can compute statistics and draw plots. In the next cell, change the value of the variable `classes` if your class column is called something other than \"layer\", and the value of `values` if your distance column is called something other than \"DISTANCE\". Then run the cell after it to draw the box-and-whisker plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19c43e2a-8892-4168-8d34-71b68cd182df",
   "metadata": {},
   "outputs": [],
   "source": [
    "# column holding the group labels (e.g. wildfires vs random points)\n",
    "classes = \"layer\"\n",
    "\n",
    "# column holding the numeric values to compare (distance from roads)\n",
    "values = \"DISTANCE\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cfed8b9c-7a6e-48f2-bfb8-ca42bd6ec942",
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert the value column to numbers; non-numeric entries become NaN\n",
    "df['values_prep'] = pd.to_numeric(df[values], errors='coerce')\n",
    "\n",
    "# keep the original column as e.g. DISTANCE_ and use the numeric copy under the original name\n",
    "df = df.rename(columns={values : f\"{values}_\"})\n",
    "df = df.rename(columns={'values_prep' : values})\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(5,7))\n",
    "\n",
    "sns.boxplot(x=classes, y=values, data=df.dropna(subset=[classes, values]), palette=\"Spectral\")\n",
    "plt.xticks(\n",
    "    rotation=0,\n",
    "    horizontalalignment='center',\n",
    "    fontweight='light',\n",
    "    fontsize='x-large',\n",
    ")\n",
    "plt.xlabel(classes, fontsize='x-large')\n",
    "plt.yticks(\n",
    "    rotation=0,\n",
    "    horizontalalignment='right',\n",
    "    fontweight='light',\n",
    "    fontsize='x-large',\n",
    ")\n",
    "plt.ylabel(values, fontsize='x-large')\n",
    "\n",
    "plt.title(f\"{values} box-plots\", fontsize=20)\n",
    "\n",
    "plt.savefig('boxplots.png', bbox_inches='tight', transparent=False)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e770c72-f26a-405c-a88c-47f16bd30bb2",
   "metadata": {},
   "source": [
    "[download the figure here](boxplots.png)\n",
    "\n",
    "Alternatively, you can copy the image directly by right-clicking on it.\n",
    "\n",
    "**Interpreting the box plots**\n",
    "\n",
    "The line in the middle of the box marks the median. The lower and upper edges of the box mark the 25th and 75th percentiles (the 1st and 3rd quartiles). The whiskers extend to the most extreme observations that still lie within 1.5 × IQR of the box edges; observations farther out are called outliers and are drawn as separate points. (IQR, the interquartile range, is the distance between the 3rd and the 1st quartile.)\n",
    "\n",
    "Compare the median distance from roads of the actual wildfires with that of the random points, and on that basis decide whether the wildfires are distributed randomly or non-randomly with respect to roads.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f99c123-29ad-4d36-9f59-ca9b60f1a50a",
   "metadata": {},
   "source": [
    "# Mann-Whitney U test\n",
    "\n",
    "The Mann-Whitney U test is a non-parametric test used to compare two groups when the groups are not normally distributed and/or there are few observations. The sample size should be at least 20, and the observations in the two groups should be independent of each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b3d5145-611a-45d1-8164-964511ae2241",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import mannwhitneyu\n",
    "\n",
    "# ignore rows whose class or value is missing (non-numeric values became NaN above)\n",
    "data = df.dropna(subset=[classes, values])\n",
    "labels = data[classes].unique()\n",
    "\n",
    "if len(labels) != 2:\n",
    "    raise ValueError(f\"Exactly two classes are required, found {len(labels)}: {list(labels)}\")\n",
    "\n",
    "group1 = data.loc[data[classes] == labels[0], values]\n",
    "group2 = data.loc[data[classes] == labels[1], values]\n",
    "\n",
    "# perform the two-sided Mann-Whitney U test\n",
    "result = mannwhitneyu(group1, group2, alternative='two-sided')\n",
    "print(f\"U statistic ({labels[0]} vs {labels[1]}): {result.statistic}\")\n",
    "print(f\"P-value: {result.pvalue}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eaf8e1b3-121b-409b-b69b-9c108142bf64",
   "metadata": {},
   "source": [
    "**Interpreting the results**\n",
    "\n",
    "Null hypothesis: the two groups have the same distribution.\n",
    "\n",
    "Alternative hypothesis: the distributions of the two groups differ.\n",
    "\n",
    "If the p-value is not smaller than 0.05, we cannot reject the null hypothesis. Conversely, the difference between the two groups is statistically significant (at the 0.05 level) if p < 0.05."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
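The notebook's box-plot interpretation relies on how the box and whiskers are computed. As a rough, standard-library-only sketch (not part of the original notebook), the function below recomputes those quantities for one group. It assumes linear-interpolation quartiles (Python's `statistics.quantiles` with `method="inclusive"`, which agrees with NumPy's default percentile method) and whiskers at 1.5 × IQR, the default `whis` of matplotlib and seaborn box plots. The name `box_plot_stats` and the sample numbers are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data, whis=1.5):
    """Return the quantities a Tukey-style box plot draws for one group.

    Hypothetical helper: quartiles use linear interpolation
    (statistics.quantiles with method="inclusive"); whiskers stop at the
    most extreme observations within `whis` * IQR of the box edges.
    Needs at least two observations.
    """
    xs = sorted(data)
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],      # smallest value inside the fences
        "whisker_high": inside[-1],    # largest value inside the fences
        "outliers": outliers,          # drawn as individual points
    }


if __name__ == "__main__":
    # made-up distances, not data from the notebook
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For the made-up list in the example, the quartiles are 3.25, 5.5 and 7.75, the upper fence is 14.5, so the whiskers end at 1 and 9 and the value 100 is reported as the only outlier.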
