Oferbtzvi30/Python-Data-Analysis-Notebooks

Hello everyone 😎,

The purpose of this repository is to share my projects in Exploratory Data Analysis (EDA).

Please feel free to add comments, ask questions, and suggest new ideas and datasets for analysis.

What is EDA? 🚀

EDA (Exploratory Data Analysis) is an approach to getting to know a dataset before drawing conclusions from it. With EDA, we can extract knowledge and conclusions from any dataset we need to analyze. Below, I have summarized some of the most important principles for building an EDA that can be applied to any dataset:

  1. Loading the Data: 🧲 Although this step may seem obvious, the various projects show that loading the data is a critical step that varies between datasets. Sometimes the data has to be corrected and rearranged already at load time, so this phase deserves emphasis.
  2. Understanding the Data: A few basic details: the shape of the data in terms of rows and columns, what information each column holds, and whether the data types are valid.
  3. Cleaning the Dataset: 🔎 In this step, we remove unnecessary columns and deal with empty and duplicated rows.
  4. Classification of Variables: 🗃️ Variables can be classified as categorical or quantitative, and each type requires different handling. Categorical variable: names or labels (i.e., categories) either with no logical order (nominal: hair color, type of dog, city, etc.) or with a logical order but inconsistent differences between groups and no quantitative meaning (ordinal: restaurant rating surveys, laptop preference). Quantitative variable: numerical values with quantitative meaning that can be placed in a meaningful order with consistent intervals.
  5. Summary Statistics Observations: 🧮 A. Univariate analysis: explores each variable in a dataset separately. It looks at the range of values and their central tendency, and describes the pattern of responses for each variable on its own. Descriptive statistics describe and summarize data. B. Summary statistics: the part of descriptive statistics that condenses the sample data into a few key figures, for example a measure of location or central tendency such as the arithmetic mean. C. Visualizing the data: data visualization represents data through common graphics such as charts, plots, infographics, and even animations. These visual displays communicate complex data relationships and data-driven insights in a way that is easy to understand.
  6. Exporting the EDA: 💎 The presentation of the analysis is ultimately the whole story. As data analysts, we are tasked with taking unorganized data and drawing conclusions that non-technical people can understand and use to make decisions. That is why we have to invest in presenting our analysis well and, most importantly, keep it simple.
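
The steps above can be sketched in a few lines of Python. This is only an illustration, not the exact code used in the notebooks: it assumes pandas as the analysis library (the list does not name one), and the small CSV, its column names (`city`, `rating`, `price`, `notes`), and the variable names are all made up for the example. Plotting is left out to keep the sketch short.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file; in practice the
# argument to pd.read_csv would be a file path or URL.
RAW_CSV = """city,rating,price,notes
Haifa,good,12.5,
Tel Aviv,excellent,20.0,
Haifa,good,12.5,
Eilat,poor,,
Tel Aviv,good,18.0,
"""

# 1. Load the data.
df = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Understand the data: number of rows and columns, and column data types.
print(df.shape)
print(df.dtypes)

# 3. Clean: drop columns that are entirely empty, drop duplicated rows,
#    then drop rows that are missing the value we want to analyze.
clean = (
    df.dropna(axis="columns", how="all")
    .drop_duplicates()
    .dropna(subset=["price"])
    .reset_index(drop=True)
)

# 4. Classify variables: "city" is nominal, "rating" is ordinal (ordered
#    categories), and the numeric columns are quantitative.
clean["rating"] = pd.Categorical(
    clean["rating"], categories=["poor", "good", "excellent"], ordered=True
)
categorical_cols = ["city", "rating"]
quantitative_cols = clean.select_dtypes(include="number").columns.tolist()

# 5. Summary statistics, one variable at a time (univariate analysis).
price_summary = clean["price"].describe()  # count, mean, std, min, quartiles, max
city_counts = clean["city"].value_counts()  # frequency of each category
print(price_summary)
print(city_counts)

# 6. Export the summary so it can be shared; here it is kept as CSV text,
#    but passing a file path to to_csv would write it to disk instead.
summary_csv = price_summary.to_csv()
```

Treating `rating` as an ordered categorical rather than plain text keeps its logical order available for sorting and comparisons without pretending that the gaps between levels are equal.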

Enjoy! ❤️
