Renamed LOAD DATA tables, added req_query column to separate URI and query string. Normalized access_log_useragent table into 11 new tables. Resized many columns. Lots of other stuff in last 5 days.
.github/README.md
Lines changed: 58 additions & 19 deletions
@@ -1,12 +1,12 @@
1
1
# Apache Log Parser and Data Normalization Application
2
-
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema designed to automate importing Apache Access & Error Log files into a normalized database schema for reporting & data analysis.
2
+
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema designed to automate importing Apache Access & Error Log files into a normalized database schema for reporting & data analysis.
3
3
4
4
The application runs on Windows, Linux and macOS and has been tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.
5
5
6
-
For hassle-free installation follow `INSTALL.md` step by step. Install information is also in `README.md` for reference but `INSTALL.md` a concise list of installation steps.
6
+
For easy installation follow `INSTALL.md` step by step. Install information is also in `README.md` for reference, but `INSTALL.md` is a concise list of installation steps.
7
7
8
-
## MySQL view logs totalled by Browsers
9
-
MySQL View - apache_logs.access_log_browser_list - data from LogFormat: extended
8
+
## MySQL Access Log View by Browser - 1 of 50 schema views
9
+
MySQL View - apache_logs.access_log_browser_list - data from LogFormat: combined & extended
|%h|Remote hostname. Will log IP address if HostnameLookups is set to Off, which is default. If it logs hostname for only a few hosts, you probably have access control directives mentioning them by name.|
49
+
|%l|Remote logname. Returns dash unless "mod_ident" is present and IdentityCheck is set On. This can cause serious latency problems accessing server since every request requires a lookup be performed.|
50
+
|%u|Remote user if the request was authenticated. May be bogus if return status (%s) is 401 (unauthorized).|
51
+
|%t|Time the request was received, in the format [18/Sep/2011:19:18:28 -0400]. The last number indicates the timezone offset from GMT|
52
+
|%r|First line of request. Contains 4 format strings (%m - The request method, %U - The URL path requested not including any query string, %q - The query string, %H - The request protocol)|
53
+
|%s|Status. For requests that have been internally redirected, this is the status of the original request. Use %>s for the final status.|
54
+
|%O|Bytes sent, including headers. May be zero in rare cases such as when a request is aborted before a response is sent. You need to enable mod_logio to use this.|
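To make the format strings above concrete, here is a minimal Python sketch that splits one access log line into the fields listed in the table. It is not the application's actual parser: the field order (`%h %l %u %t "%r" %s %O`), the function name `parse_line`, the key `req_uri`, and the sample line are all assumptions for illustration. Only the `req_query` name is taken from the commit message.

```python
import re

# Assumed field order: %h %l %u %t "%r" %s %O (the rows listed above).
# Hypothetical names; this is a sketch, not the application's parser.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # %r is "%m %U%q %H": method, URL path plus optional query, protocol.
    parts = fields["request"].split(" ")
    if len(parts) == 3:
        method, target, protocol = parts
        path, sep, query = target.partition("?")
        # Like %q, keep the leading "?" only when a query string exists.
        fields.update(method=method, req_uri=path,
                      req_query=sep + query, protocol=protocol)
    fields["status"] = int(fields["status"])
    fields["bytes_sent"] = int(fields["bytes_sent"])
    return fields


# Made-up sample line using a documentation-range IP address.
sample = ('203.0.113.7 - - [18/Sep/2011:19:18:28 -0400] '
          '"GET /index.php?id=5 HTTP/1.1" 200 5120')
row = parse_line(sample)
print(row["req_uri"], row["req_query"])
```

Splitting the request target at the first `?` mirrors the separation of `%U` and `%q`, which is what allows the URI and query string to be stored in separate columns.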
The application also processes Error Logs with default format for threaded MPMs (Multi-Processing Modules). If you're running Apache 2.4 on any platform and ErrorLogFormat is not defined in config files this is the Error Log format.
65
+
|Format String|Description - additional format strings|
66
+
|-------------|-----------|
67
+
|%v|The canonical ServerName of the server serving the request.|
68
+
|%p|The canonical port of the server serving the request.|
69
+
70
+
Application is designed to use this extended LogFormat of 6 additions and 2 subtractions (%l and %u) from vhost_combined.
71
+
|Format String|Description|
72
+
|-------------|-----------|
73
+
|%v|The canonical ServerName of the server serving the request.|
74
+
|%p|The canonical port of the server serving the request.|
75
+
|%h|Remote hostname. Will log the IP address if HostnameLookups is set to Off, which is the default.|
76
+
|%t|Time the request was received, in the format [18/Sep/2011:19:18:28 -0400]. The last number indicates the timezone offset from GMT|
77
+
|%I|ADDED - Bytes received, including request and headers. Enable "mod_logio" to use this.|
78
+
|%O|Bytes sent, including headers. The %O format provided by mod_logio will log the actual number of bytes sent over the network. Enable "mod_logio" to use this.|
79
+
|%S|ADDED - Bytes transferred (received and sent), including request and headers, cannot be zero. This is the combination of %I and %O. Enable "mod_logio" to use this.|
80
+
|%B|ADDED - Size of response in bytes, excluding HTTP headers. Does not represent number of bytes sent to client, but size in bytes of HTTP response (will differ, if connection is aborted, or if SSL is used).|
81
+
|%{ms}T|ADDED - The time taken to serve the request, in milliseconds. Combining %T with a unit is available in 2.4.13 and later.|
82
+
|%D|ADDED - The time taken to serve the request, in microseconds.|
83
+
|%^FB|ADDED - Delay in microseconds between when the request arrived and the first byte of the response headers are written. Only available if LogIOTrackTTFB is set to ON. Available in Apache 2.4.13 and later.|
84
+
|%s|Status. For requests that have been internally redirected, this is the status of the original request.|
85
+
|%H|The request protocol. Included in %r - First line of request.|
86
+
|%m|The request method. Included in %r - First line of request.|
87
+
|%U|The URL path requested, not including any query string. Included in %r - First line of request.|
88
+
|%q|The query string (prepended with a ? if a query string exists, otherwise an empty string). Included in %r - First line of request.|
89
+
|%{Referer}i|The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from. (This should be the page that links to or includes /apache_pb.gif).|
90
+
|%{User-Agent}i|The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.|
91
+
|%{VARNAME}C|ADDED - The contents of cookie VARNAME in request sent to server. Only version 0 cookies are fully supported. For example, a session ID to relate with login tables on the server.|
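Several of the fields above need conversion before they are useful for reporting. The following Python sketch shows two such steps under stated assumptions: parsing the `%t` timestamp into a timezone-aware value, and checking that `%S` equals `%I` plus `%O` as the table describes. The function names are hypothetical and are not taken from the application.

```python
from datetime import datetime, timezone


def parse_log_time(value):
    """Parse a %t value such as [18/Sep/2011:19:18:28 -0400].

    Assumes English month abbreviations, since %b is locale dependent.
    """
    return datetime.strptime(value.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def bytes_consistent(received, sent, transferred):
    """%S is described as the combination of %I and %O."""
    return transferred == received + sent


ts = parse_log_time("[18/Sep/2011:19:18:28 -0400]")
# Convert to UTC so rows logged with different offsets compare correctly.
print(ts.astimezone(timezone.utc).isoformat())
print(bytes_consistent(received=512, sent=5120, transferred=5632))
```

Keeping the offset when parsing and converting to UTC afterwards avoids mixing local times from servers in different time zones.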
The application processes Error Logs with default format for threaded MPMs (Multi-Processing Modules). If you're running Apache 2.4 on any platform and ErrorLogFormat is not defined in config files this is the Error Log format.
Database normalization is the process of organizing data in a relational database to improve data integrity and reduce redundancy. Normalization ensures that data is organized in a way that makes sense for the data model and attributes, and that the database functions efficiently.
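As a rough illustration of the idea, the Python sketch below replaces a repeated text value with an integer key into a lookup of unique values. It runs in memory only and does not reflect how the apache_logs schema or its stored procedures are implemented; the function name `normalize`, the column name `useragent`, and the sample rows are assumptions.

```python
def normalize(rows, column):
    """Move the values of one column into a lookup of unique values.

    Returns (lookup, normalized_rows), where each row carries an integer
    id in place of the repeated text value.
    """
    lookup = {}  # unique value -> surrogate id, assigned in first-seen order
    normalized = []
    for row in rows:
        value = row[column]
        value_id = lookup.setdefault(value, len(lookup) + 1)
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[column + "_id"] = value_id
        normalized.append(new_row)
    return lookup, normalized


# Made-up rows: two of the three share the same user agent string.
rows = [
    {"req_uri": "/", "useragent": "Mozilla/5.0"},
    {"req_uri": "/about", "useragent": "curl/8.5.0"},
    {"req_uri": "/contact", "useragent": "Mozilla/5.0"},
]
lookup, normalized = normalize(rows, "useragent")
print(lookup)
print(normalized)
```

Each distinct value is stored once, and the log rows reference it by id, which is the redundancy reduction the paragraph above describes.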
156
195
157
196
View Data images are from 2 views in the apache_logs schema. Database normalization at work. There are 35 more schema views.
158
-
## MySQL view logs totalled by URIs
159
-
MySQL View - apache_logs.access_log_requri_list - data from LogFormat: extended
197
+
## MySQL Access Log View by URI
198
+
MySQL View - apache_logs.access_log_requri_list - data from LogFormat: combined & extended
MySQL Error Views - most of the verbiage above is about Access Logs. The application also does the same normalization with error logs. These are many of the views in apache_logs Schema. The error log attribute is the name of the first column. Each attribute has associated table in apache_logs Schema.
200
+
## MySQL Error Log Views
201
+
MySQL Error Log Views - The application imports and normalizes error log data as well. These are some of the schema views. The error log attribute is the name of the first column. Each attribute has an associated table in the apache_logs schema.
## Schema Objects - Tables, Views, Store Proceduresand Stored Functions
174
-
Images of the apache_logs schema objects. Access and Error log attributes have been broken into separate entity tables. Each table populated with unique values of the entity. Entity Relationship Diagram will be posted soon.
212
+
## MySQL Schema Objects - Tables, Stored Procedures, Functions and Views
213
+
Images of the apache_logs schema objects. Access and Error log attributes are normalized into separate entity tables. Each table is populated with unique values of the attribute. An Entity Relationship Diagram will be posted soon.