This document contains the complete documentation of the data model for the Gold layer of the 100cep Gateway MVP. The model was structured following dimensional Data Warehouse principles, optimized for business analytics.
| Source Table | FK Column | Target Table | PK Column | Cardinalidade |
|---|---|---|---|---|
| fato_transacoes | cliente_id | dim_clientes | cliente_id | N:1 |
| fato_transacoes | vendedor_id | dim_vendedores | vendedor_id | N:1 |
| fato_transacoes | data_pedido | dim_data | data_calendario | N:1 |
| fato_transacoes | pedido_id | dim_chargebacks | pedido_id | N:0,1 |
| dim_clientes | cep_prefixo | dim_geolocalizacao | cep_prefixo | N:1 |
| dim_vendedores | cep_prefixo | dim_geolocalizacao | cep_prefixo | N:1 |
Note: Not all orders have chargebacks.
Type: Event Dimension
Granularity: 1 record = 1 chargeback request
Records: ~1,000 chargebacks
| Column | Type | Description |
|---|---|---|
| pedido_id | string | Unique identifier for each chargeback request, consisting of 13 alphanumeric characters in lowercase. |
| motivo_chargeback | string | Reason for the chargeback request, which may include fraud, non-receipt of products, defective product, among other causes. |
| status_chargeback | string | Current status of the chargeback request, indicating its progress stage. |
| resposta_emitente | string | Response provided by the card issuer regarding the chargeback request. |
| resposta_adquirente | string | Response provided by the acquirer regarding the chargeback request. |
Value Domains:
motivo_chargeback: Fraud, Product not received, Product defective, Service issue, Unauthorized transaction, etc.status_chargeback: pending, approved, denied, investigating, closedresposta_emitente: Approved, Denied, Under Review, Pending Documentationresposta_adquirente: Accepted, Rejected, Requesting Evidence, Processing
Business Rules:
- ✅ Not all orders have chargebacks (1:0..1 relationship)
- ✅ Chargebacks are generated by an additional dataset (AI-generated)
- ✅ Used for risk analysis and fraud detection
Use:
- Chargeback rate analysis
- Identification of fraud patterns
- Risk assessment by payment method/region
- Monitoring of financial losses
| Column | Type | Description |
|---|---|---|
| data_calendario | date | Date when the order was placed. Format: YYYY-MM-DD. |
| dia | int | Day number corresponding to the date in integer format. |
| mes | int | Month number corresponding to the date. |
| ano | int | Year number corresponding to the date. |
| nome_dia_semana | string | Weekday name corresponding to the date. |
| nome_mes | string | Month name corresponding to the date. |
| Use: |
- Identification of sellers in transactions
- Performance analysis by seller
- Geographic analysis of sellers
Type: Geographic Dimension
Granularity: 1 record = 1 ZIP code prefix (5 digits)
Records: ~19,000 ZIP code prefixes
| Column | Type | Description |
|---|---|---|
| cep_prefixo | string | The first 5 digits of the ZIP code. |
| cidade | string | City name associated with the ZIP code. |
| estado | string | Brazilian state abbreviation (two uppercase letters) associated with the ZIP code. |
| latitude | string | Geographic coordinate specifying the north-south position. |
| longitude | string | Geographic coordinate specifying the east-west position. |
| Use: |
- Identification of customers in transactions
- Geographic analysis (via
cep_prefixo) - Customer segmentation by region
Type: Dimension Table
Granularity: 1 record = 1 seller
Records: ~3,000 sellers
| Column | Type | Description |
|---|---|---|
| vendedor_id | string | Unique identifier for each seller, consisting of 13 alphanumeric characters in lowercase. |
| cep_prefixo | string | The first 5 digits of the seller's postal code. |
Type: Fact Table
Granularity: 1 record = 1 payment transaction per order
Records: ~99,000 transactions
| Column | Type | Description |
|---|---|---|
| pedido_id | string | Unique identifier for each transaction, consisting of 13 alphanumeric characters in lowercase. |
| cliente_id | string | Unique identifier for each client, consisting of 13 alphanumeric characters in lowercase. |
| vendedor_id | string | Unique identifier for each seller, consisting of 13 alphanumeric characters in lowercase. |
| data_pedido | date | Date when the order was placed. Format: YYYY-MM-DD. |
| horario_pedido | date | Time when the order was placed in the format HH:MM:SS. |
| tipo_pagamento | string | Payment method used for the transaction. |
| valor_transacao | decimal(12,2) | Transaction value in a decimal format with up to 12 digits and 2 decimal places, only positive values. |
| preco_total | decimal(16,2) | Total order price, including product price and shipping cost. |
| frete_total | decimal(15,2) | Total shipping cost in a decimal format with up to 15 digits and 2 decimal places, only positive values. |
| status_pedido | string | Current delivery status of the order. |
Value Domains:
tipo_pagamento: credit_card, boleto, voucher, debit_cardstatus_pedido: delivered, shipped, canceled, processing, unavailable, invoiced
Business Rules:
- ✅ Only orders with status "delivered" are included
- ✅
preco_total= sum of all items in the order - ✅
frete_total= sum of all shipping values of the items - ✅
valor_transacaocan be different frompreco_total(multiple payments)
Type: Time Dimension
Granularity: 1 record = 1 day
Records: ~365 days (2016-2018)
| Column | Type | Description |
|---|---|---|
| data_calendario | date | Date when the order was placed. Format: YYYY-MM-DD. |
| dia | int | Day number corresponding to the date in integer format. |
| mes | int | Month number corresponding to the date. |
| ano | int | Year number corresponding to the date. |
| nome_dia_semana | string | Weekday name corresponding to the date. |
| nome_mes | string | Month name corresponding to the date. |
Value Domains:
dia: 1-31mes: 1-12ano: 2016-2018nome_dia_semana: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sundaynome_mes: January, February, March, ..., December
Type: Dimension Table
Granularity: 1 record = 1 customer
Records: ~99,000 customers
| Column | Type | Description |
|---|---|---|
| cliente_id | string | Unique identifier for each client, consisting of 13 alphanumeric characters in lowercase. |
| cep_prefixo | string | The first 5 digits of the customer postal code. |
Type: Dimension Table
Granularity: 1 record = 1 seller
Records: ~3,000 sellers
| Column | Type | Description |
|---|---|---|
| vendedor_id | string | Unique identifier for each seller, consisting of 13 alphanumeric characters in lowercase. |
| cep_prefixo | string | The first 5 digits of the seller's postal code. |
Type: Geographic Dimension
Granularity: 1 record = 1 ZIP code prefix (5 digits)
Records: ~19,000 ZIP code prefixes
| Column | Type | Description |
|---|---|---|
| cep_prefixo | string | The first 5 digits of the ZIP code. |
| cidade | string | City name associated with the ZIP code. |
| estado | string | Brazilian state abbreviation (two uppercase letters) associated with the ZIP code. |
| latitude | string | Geographic coordinate specifying the north-south position. |
| longitude | string | Geographic coordinate specifying the east-west position. |
Value Domains:
estado: Abbreviations of the 27 Brazilian states (AC, AL, AM, AP, BA, CE, DF, ES, GO, MA, MG, MS, MT, PA, PB, PE, PI, PR, RJ, RN, RO, RR, RS, SC, SE, SP, TO)latitude: -33.75 to 5.27 (Brazil boundaries)longitude: -73.99 to -34.79 (Brazil boundaries)
📂 Kaggle CSV Files
↓
💾 Unity Catalog Volumes
↓
🟫 Bronze Layer (Raw)
├─ bronze_customers
├─ bronze_orders
├─ bronze_order_payments
├─ bronze_order_items
├─ bronze_sellers
├─ bronze_geolocation
└─ bronze_chargebacks
↓
🧹 Silver Layer (Cleaned)
├─ silver_customers [limpeza + validação]
├─ silver_orders [conversão temporal]
├─ silver_order_payments [agregação]
├─ silver_order_items [agregação]
├─ silver_sellers [padronização]
├─ silver_geolocation [deduplicação]
└─ silver_chargebacks [enriquecimento]
↓
🏆 Gold Layer (Analytics) ← YOU ARE HERE!
├─ fato_transacoes
├─ dim_data
├─ dim_clientes
├─ dim_vendedores
├─ dim_geolocalizacao
└─ dim_chargebacks
| Origin (Kaggle) | Bronze | Silver | Gold | Transformation |
|---|---|---|---|---|
| order_id | order_id | pedido_id | pedido_id | Translation |
| customer_id | customer_id | cliente_id | cliente_id | Translation |
| order_purchase_timestamp | order_purchase_timestamp | data_compra (TIMESTAMP) | data_pedido (DATE) | Conversion + Split |
| payment_value | payment_value (STRING) | valor_pagamento (DECIMAL) | valor_transacao (DECIMAL) | Type cast + Aggregation |
| zip_code_prefix | zip_code_prefix | cep_prefixo | cep_prefixo | Translation |
Complete transformation documentation: /docs/etl.md
- Unique primary keys (no duplicates)
- Valid foreign keys (referential integrity)
- Correct and consistent data types
- Values within expected domains
- Nulls only in optional fields
- Geographic coordinates within Brazil's boundaries
- Dates within the expected range (2016-2018)
- Positive numeric values (except lat/long)





