terms used in ML
Machine learning algorithms are divided into three categories based on their learning approach:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
Supervised Learning:
Learning with a Teacher
Imagine a student learning with a teacher who provides both questions and answers. Supervised learning works similarly. Data serves as the question (inputs) with labeled answers (outputs). The goal? Train the algorithm to grasp the relationship between these inputs and outputs, allowing it to predict outcomes for entirely new data.
Think House Prices:
Imagine data with house features (size, bedrooms) and their selling prices (labels). By analyzing this data, the algorithm learns the patterns influencing house prices. Once trained, it can predict the price of a new house based solely on its features!
Types:
Regression (predicts continuous values like house prices)
Classification (predicts discrete labels like spam/not spam emails).
The two main types of supervised machine learning are regression and classification.
Regression: used when the output is a continuous value, e.g. predicting house prices or temperatures.
Classification: used when the output is a discrete label, e.g. classifying emails as spam or not spam, or predicting whether a tumor is malignant or benign.
Popular Algorithms:
1. Linear Regression
2. Logistic Regression
3. Decision Trees
4. Random Forest
5. Support Vector Machines (SVM)
6. K-Nearest Neighbors (KNN)
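To make the classification idea concrete, here is a minimal pure-Python sketch of K-Nearest Neighbors; the toy points, labels, and k value are made up for illustration, not taken from any real dataset:

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Indices of training points, ordered by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labeled "A" and "B".
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (2, 2)))  # prints "A": the query sits in the A cluster
```

Note that KNN has no real training step: it just memorizes the labeled data and defers all work to prediction time.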
Unsupervised Learning:
Discovering Hidden Patterns
Unlike supervised learning with labeled data, unsupervised learning works with unlabeled data. Imagine a treasure hunt without a map - the algorithm is tasked with finding hidden structures and patterns on its own.
Unlabeled Data: The algorithm analyzes data without predefined categories.
Discovering Patterns: It identifies similarities and groups data points together based on shared features.
Think of Sorting Fruit: Imagine sorting fruit by type (apples, oranges) without labels. Unsupervised learning tackles similar tasks.
Types:
Clustering (groups similar data points together)
Dimensionality Reduction (reduces complex data for easier visualization).
Popular Algorithms: The Main types of Unsupervised Machine learning Algorithms are Clustering, Anomaly Detection and Dimensionality Reduction.
1. K-Means Clustering
2. Hierarchical Clustering
3. Principal Component Analysis (PCA)
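As a sketch of how clustering discovers groups without labels, here is a minimal 1-D K-Means in pure Python; the data and starting centers are invented for illustration:

```python
import statistics

def kmeans_1d(values, centers, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest center,
    then recompute each center as the mean of its cluster."""
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(data, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.5], one center per group
```

No labels were given: the algorithm found the two groups purely from the distances between points.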
Reinforcement Learning: Learning by Trial and Error
Reinforcement learning mimics human learning through trial and error. Imagine learning to ride a bike - you receive positive reinforcement (praise) for successful actions and negative reinforcement (bumps and scrapes) for mistakes. The algorithm interacts with an environment, receives rewards (positive) or penalties (negative) for its actions, and strives to maximize rewards over time.
Think of a Robot: Imagine a robot learning to navigate a maze. It receives rewards for reaching the goal and penalties for hitting walls. This helps it learn the optimal path.
Reinforcement Learning Algorithms
1. Q-Learning
2. Deep Q Network (DQN)
3. Policy Gradient Methods
4. Actor-Critic Methods
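The maze-robot idea above can be sketched with tabular Q-learning on a tiny corridor world; all the numbers (learning rate, discount, epsilon, episode count) are illustrative choices, not tuned values:

```python
import random

# States 0..4 on a line; reward 1 for reaching state 4, else 0.
N_STATES = 5
ACTIONS = [-1, +1]                       # step left / step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

random.seed(0)
for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        if random.random() < epsilon:            # explore
            a = random.choice(ACTIONS)
        else:                                    # exploit (random tie-break)
            a = max(ACTIONS, key=lambda x: (Q[(s, x)], random.random()))
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # Move Q(s, a) toward reward + discounted best future value.
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy policy steps right (+1) in every state.
policy = {s: max(ACTIONS, key=lambda x: Q[(s, x)])
          for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told the optimal path; it learns it purely from the rewards and penalties its actions produce.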
Real-World Applications:
Supervised Learning:
Healthcare (predicting patient outcomes), Finance (detecting fraud), Marketing (personalizing campaigns).
Unsupervised Learning:
Market Basket Analysis (identifying frequently bought-together products), Anomaly Detection (finding suspicious patterns in data), Customer Segmentation (grouping customers based on behavior).
Reinforcement Learning:
Robotics (controlling robot movement), Game playing (training AI to play games), Self-driving Cars (learning to navigate roads).
Clustering -> the automatic grouping of similar objects into sets ( Application -> Customer Segmentation )
Dimensionality Reduction -> reducing the number of variables to consider ( Application -> increasing model efficiency )
Dealing with NaN values in a dataset....
1. Imputation ( replace missing values with a measure of central tendency: mean, median, or mode )
1. When the dataset has a skewed distribution, the mean is not a suitable replacement for NaN values; use the median or mode instead.
2. When the data is normally distributed, the mean is the appropriate replacement.
2. Drop
1. Another method is dropping the record (row) with the drop() function, but deleting data this way is generally discouraged.
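A small pure-Python sketch of imputation (the ages list is invented, with a few large outliers to make the distribution skewed):

```python
import statistics

def impute(values, strategy="median"):
    """Replace None (missing) entries with the chosen central-tendency value."""
    present = [v for v in values if v is not None]
    fill = {"mean": statistics.mean,
            "median": statistics.median,
            "mode": statistics.mode}[strategy](present)
    return [fill if v is None else v for v in values]

# Skewed sample: the 90 pulls the mean far above the typical value,
# so the median is the safer fill value here.
ages = [22, 25, None, 24, 23, 90, None]
print(impute(ages, "median"))   # fills the gaps with 24
print(impute(ages, "mean"))     # fills the gaps with 36.8
```

This shows the point made above: on skewed data the mean (36.8) is not representative of a typical age, while the median (24) is.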
Data Standardization
The process of rescaling data to a common format and common range.
First check the standard deviation of the dataset:
if the standard deviation is 1 (and the mean is 0), the data is already in the proper range; if not, standardize it.
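A minimal sketch of z-score standardization in pure Python (libraries such as scikit-learn provide the same idea as StandardScaler); the sample values are made up:

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)    # population standard deviation
    return [(v - mu) / sigma for v in values]

data = [10.0, 20.0, 30.0, 40.0, 50.0]
z = standardize(data)
print(round(statistics.mean(z), 6))      # 0.0
print(round(statistics.pstdev(z), 6))    # 1.0
```

After this transform every feature is on the same scale, which keeps features with large raw ranges from dominating distance-based algorithms.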
Label Encoding
Converting labels into numeric form.
When working on classification problems ( whether a data point belongs to one class or another ), e.g. predicting whether a person is diabetic or non-diabetic, the text labels themselves are hard for an algorithm to use directly.
So we convert "diabetic" and "non-diabetic" into numeric values, i.e. either 0 or 1; that is label encoding.
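A minimal sketch of label encoding in pure Python (the diabetic/non-diabetic labels follow the example above; sorting makes the mapping deterministic, which is a choice made here for illustration):

```python
def label_encode(labels):
    """Map each distinct label to an integer, then encode the label list."""
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [mapping[lab] for lab in labels], mapping

outcomes = ["diabetic", "non-diabetic", "diabetic", "non-diabetic"]
encoded, mapping = label_encode(outcomes)
print(mapping)    # {'diabetic': 0, 'non-diabetic': 1}
print(encoded)    # [0, 1, 0, 1]
```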
While handling missing ( NaN ) values, it is not a good approach to simply drop them.
Feature Extraction of Text Data
The mapping from textual data to real-valued vectors is called feature extraction.
BOW ( Bag of Words )
Represents each document by the counts of the unique words in the text corpus ( the collection of documents ).
TF-IDF ( Term Frequency-Inverse Document Frequency )
Weights each word by how often it appears in a document ( term frequency ) and how rare it is across the corpus ( inverse document frequency ), so very common words get low scores.
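Both representations can be sketched in a few lines of pure Python (the three-document corpus is invented; real pipelines would use something like scikit-learn's CountVectorizer and TfidfVectorizer):

```python
import math

corpus = ["the cat sat", "the dog sat", "the dog barked"]
docs = [d.split() for d in corpus]
vocab = sorted({w for d in docs for w in d})   # unique words of the corpus

# Bag of Words: raw count of each vocabulary word in each document.
bow = [[d.count(w) for w in vocab] for d in docs]

def tfidf(doc):
    """TF-IDF row for one document: term frequency times log(N / doc freq)."""
    n = len(docs)
    row = []
    for w in vocab:
        tf = doc.count(w) / len(doc)
        df = sum(1 for d in docs if w in d)    # documents containing w
        row.append(tf * math.log(n / df))
    return row

print(vocab)    # ['barked', 'cat', 'dog', 'sat', 'the']
print(bow[0])   # [0, 1, 0, 1, 1] for "the cat sat"
```

Notice that "the" appears in every document, so its IDF is log(3/3) = 0 and its TF-IDF weight vanishes; that is exactly how TF-IDF downweights uninformative common words.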
Stemming
Stemming is the process of reducing words to their root word, e.g. enjoyment / enjoyful / enjoying / enjoys -> enjoy
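A toy suffix-stripping stemmer illustrates the idea; the suffix list and minimum-stem rule are invented for this sketch, while real stemmers (e.g. NLTK's PorterStemmer) apply far more careful rules:

```python
# Longest-suffix-first stripping, keeping at least a 3-letter stem.
SUFFIXES = ("ment", "ful", "ing", "s")

def stem(word):
    """Strip the longest matching suffix, if the remaining stem is long enough."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

for w in ["enjoyment", "enjoyful", "enjoying", "enjoys", "enjoy"]:
    print(w, "->", stem(w))   # every form maps to the root "enjoy"
```

Mapping all inflected forms to one root shrinks the vocabulary, so a BOW or TF-IDF model treats "enjoying" and "enjoys" as the same feature.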