Tracking Data Lineage / Origin
Metadata-based tracking examines the metadata attached to data assets to understand their origins, transformations, and relationships[4]; a short sketch of the idea follows the list below.
- Source Identification: Tools analyze metadata to trace data origins across multiple systems[4].
- Transformation Mapping: Changes in data structure, format, or content are recorded based on metadata changes[4].
- End-to-end Tracking: Metadata is used to provide a comprehensive view of data from creation to final use[4].
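As a rough illustration of metadata-based tracking (the record fields, class names, and dataset names below are illustrative assumptions, not any particular tool's schema), each transformation can register a small metadata record of what it read and produced, and origins can then be recovered by walking those records backwards:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative metadata record for one transformation step (fields are assumed, not a standard schema).
@dataclass
class LineageRecord:
    output_asset: str
    source_assets: list
    operation: str
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class MetadataCatalog:
    """Minimal in-memory catalog of lineage metadata, keyed by the produced asset."""
    def __init__(self):
        self._records = {}

    def register(self, record: LineageRecord) -> None:
        self._records[record.output_asset] = record

    def trace_origins(self, asset: str) -> set:
        """Source identification: walk records backwards until assets with no recorded parents."""
        origins, stack = set(), [asset]
        while stack:
            current = stack.pop()
            record = self._records.get(current)
            if record is None:
                origins.add(current)          # no upstream metadata -> treat as an original source
            else:
                stack.extend(record.source_assets)
        return origins

catalog = MetadataCatalog()
catalog.register(LineageRecord("sales_clean", ["sales_raw"], "deduplicate"))
catalog.register(LineageRecord("sales_report", ["sales_clean", "regions"], "join + aggregate"))
print(catalog.trace_origins("sales_report"))   # {'sales_raw', 'regions'}
```

Running the same walk forwards instead of backwards gives the end-to-end view from creation to final use.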
Graph-based lineage uses graph theory to represent data lineage as a network of interconnected nodes and edges; the example after this list shows the idea in code.
- Node Representation: Data elements, transformations, and systems are represented as nodes in the graph[6].
- Edge Representation: Relationships and data flows between nodes are represented as edges[6].
- Path Analysis: Graph algorithms are used to trace data paths and identify dependencies[6].
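A minimal sketch of the graph view using the networkx library (the node names are invented for illustration): datasets, jobs, and systems become nodes, data flows become directed edges, and standard graph algorithms answer dependency and impact questions.

```python
import networkx as nx

# Nodes are data elements, transformations, and systems; directed edges are data flows.
G = nx.DiGraph()
G.add_edge("crm_db", "extract_customers")
G.add_edge("extract_customers", "customers_staging")
G.add_edge("customers_staging", "dedupe_job")
G.add_edge("orders_db", "dedupe_job")
G.add_edge("dedupe_job", "customer_report")

# Path analysis: upstream dependencies, downstream impact, and one concrete lineage path.
print(nx.ancestors(G, "customer_report"))                 # everything the report depends on
print(nx.descendants(G, "orders_db"))                     # everything affected if orders_db changes
print(nx.shortest_path(G, "crm_db", "customer_report"))   # one path from source to final use
```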
Pattern-based lineage focuses on identifying recurring patterns in data transformations[2]; see the sketch after this list.
- Pattern Recognition: Common data transformation patterns are identified and cataloged[2].
- Pattern Matching: New data flows are analyzed to match known patterns[2].
- Lineage Inference: Data lineage is inferred based on recognized patterns, allowing for efficient tracking of multiple datasets[2].
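One way to picture pattern-based inference, under the simplifying assumption that a transformation pattern can be described by a textual signature (the catalog below is made up, not drawn from a real lineage tool): new flows are matched against cataloged signatures, and lineage is then inferred from the recognized pattern rather than traced from scratch.

```python
import re

# Cataloged transformation patterns: a name plus a signature over the flow's SQL.
# These signatures are illustrative assumptions, not taken from any particular tool.
PATTERN_CATALOG = [
    ("aggregation",   re.compile(r"\bGROUP BY\b", re.IGNORECASE)),
    ("join",          re.compile(r"\bJOIN\b", re.IGNORECASE)),
    ("filter",        re.compile(r"\bWHERE\b", re.IGNORECASE)),
    ("rename/select", re.compile(r"\bSELECT\b", re.IGNORECASE)),
]

def infer_lineage_patterns(flow_sql: str) -> list:
    """Pattern matching: return the known transformation patterns a new data flow exhibits."""
    return [name for name, regex in PATTERN_CATALOG if regex.search(flow_sql)]

# Lineage inference: the new flow is tagged with recognized patterns for efficient tracking.
sql = "SELECT region, SUM(amount) FROM sales JOIN regions USING (region_id) GROUP BY region"
print(infer_lineage_patterns(sql))   # ['aggregation', 'join', 'rename/select']
```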
Automated lineage tracking relies on tooling that continuously monitors and updates data lineage information[4]; a simplified example follows the list.
- Real-time Monitoring: Tools track data flows and transformations as they occur[4].
- Automated Documentation: Changes in data structure, location, or content are automatically recorded[4].
- Integration with Data Pipelines: Lineage tracking is integrated directly into data processing workflows[4].
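A simplified sketch of lineage capture wired directly into a pipeline (the decorator, event format, and step names are assumptions for illustration): each transformation reports what it read and wrote at the moment it runs, so the lineage documentation updates itself.

```python
import functools
from datetime import datetime, timezone

LINEAGE_LOG = []   # stand-in for a lineage service or catalog

def track_lineage(inputs, output):
    """Decorator that records a lineage event every time the wrapped pipeline step executes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            LINEAGE_LOG.append({
                "step": fn.__name__,
                "inputs": list(inputs),
                "output": output,
                "ran_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator

@track_lineage(inputs=["events_raw"], output="events_clean")
def clean_events(rows):
    # Real-time monitoring: the lineage record is written as the transformation occurs.
    return [r for r in rows if r.get("user_id") is not None]

clean_events([{"user_id": 1}, {"user_id": None}])
print(LINEAGE_LOG)
```

In practice the event would be shipped to a lineage service rather than appended to an in-memory list.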
Statistical lineage analysis uses statistical methods to analyze and predict lineage, which is particularly useful in complex scenarios such as genetic pedigree studies[5]; a toy sketch follows the list.
- Local Coverage: Uses inferred inheritance vectors to measure genotype-imputation ability in specific regions of interest[5].
- Genome-wide Coverage: Utilizes pedigree structure to compute lineage metrics across the entire genome[5].
- Subject Selection Optimization: Statistical methods are used to identify the most efficient subjects for sequencing in pedigree studies[5].
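The statistical machinery in the cited pedigree-imputation work is considerably more involved, but as a loose illustration of subject-selection optimization, a greedy heuristic can pick the subjects whose sequencing would add the most coverage (the coverage sets below are invented toy data, and the heuristic is a stand-in, not the published procedure):

```python
def greedy_subject_selection(candidate_coverage, budget):
    """Greedily pick subjects whose sequencing adds the most not-yet-covered regions.

    candidate_coverage: dict mapping subject -> set of genome regions their genotypes help impute.
    This is a toy set-cover heuristic, not the statistical method from the cited studies.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(candidate_coverage, key=lambda s: len(candidate_coverage[s] - covered))
        gain = candidate_coverage[best] - covered
        if not gain:
            break                    # no remaining subject improves coverage
        chosen.append(best)
        covered |= gain
    return chosen, covered

# Invented example: regions each pedigree member would let us impute if sequenced.
coverage = {
    "founder_A": {"chr1:q21", "chr2:p12", "chr5:q31"},
    "founder_B": {"chr2:p12", "chr7:q11"},
    "child_C":   {"chr1:q21"},
}
print(greedy_subject_selection(coverage, budget=2))
```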
By employing these techniques and methodologies, organizations can gain a comprehensive understanding of their data's journey, ensuring data quality, compliance, and effective decision-making based on reliable information.
Citations:
1. https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2013.00189/full
2. https://segment.com/blog/data-lineage/
3. https://www.bioconductor.org/packages/devel/bioc/vignettes/FamAgg/inst/doc/FamAgg.html
4. https://www.acceldata.io/blog/how-to-use-data-lineage-tools-for-tracking-data-transformations
5. https://pmc.ncbi.nlm.nih.gov/articles/PMC3928665/
6. https://www.ardoq.com/knowledge-hub/data-lineage
7. https://pmc.ncbi.nlm.nih.gov/articles/PMC4757949/
8. https://www.softwareag.com/en_corporate/resources/data-integration/article/data-lineage.html