
Enron Email Database UseCase Doc

Qiaoran edited this page Oct 31, 2019 · 34 revisions

Enron Email Database UseCase Documents

Use case queries

  • The V-Queries, written in Haskell, are located in /Users/qiaoran/Desktop/VDBMS/src/VDB/Example/EnronUseCase/EnronQuery
    • Each xxxyyy.hs file contains one v-query for the interaction of features xxx and yyy.
    • The context of the query and the translated query are encoded as comments in each file.
  • The SQL queries for each interaction are located in /Users/qiaoran/Desktop/VDBMS/databases/enronEmailDB/enronDbSQLqueries
    • The queries for the interaction of xxx and yyy are located in the directory named xxxyyy.

Original Enron Email Database

Data Set

Enron Data is From here

Schema

  • employee(eid, firstname, lastname, email_id, email2, email3, email4, folder, status)

    • size: 149 tuples
  • message(mid, sender, date, message_id, subject, body, folder) 

    • size: 252771 tuples
  • recipientinfo(rid, mid, rtype, rvalue, dater) 

    • size: 2064442 tuples
  • referenceinfo(rfid, mid, reference) 

    • size: 54778 tuples

Enron Email Variational Database

1. Schema

  • v_employee(eid, firstname, lastname, email_id, folder, status, sign, public_key, presCond)

    • sign corresponds to the signature feature
    • public_key corresponds to the encrypt feature
  • v_message(mid, sender, date, message_id, subject, body, folder, is_signed, is_encrypted, is_from_remailer, presCond)

    • is_signed corresponds to the signature feature
    • is_encrypted corresponds to the encrypt feature
    • is_from_remailer corresponds to the remailmsg feature
  • v_recipientinfo(rid, mid, rtype, rvalue, presCond)

  • v_referenceinfo(rfid, mid, reference, presCond)

  • v_auto_msg(eid, subject, body, presCond)

    • the whole table depends on the autoresponder feature
  • v_forward_msg(eid, forwardaddr, presCond)

    • the whole table depends on the forwardmsg feature
  • v_remail_msg(eid, pseudonym, presCond)

    • the whole table depends on the remailmsg feature
  • v_filter_msg(eid, suffix, presCond)

    • the whole table depends on the filtermsg feature
  • v_mailhost(eid, username, mailhost, presCond)

    • the whole table depends on the mailhost feature
  • v_alias(eid, email, nickname, presCond)

    • the whole table depends on the addressbook feature

2. Features in Enron Email V-DB

  • signature
  • addressbook
  • filtermsg
  • autoresponder
  • forwardmsg
  • mailhost
  • encrypt
  • remailmsg

3. Variants Designed For Enron case study

    1. Products for daily use
      • Features:
        • forwardmsg
        • filtermsg
      • Presence Condition (p1):
        • forwardmsg AND filtermsg AND NOT (addressbook OR encrypt OR remailmsg OR autoresponder OR signature OR mailhost)
    2. Products that focus on privacy
      • Features:
        • signature
        • encrypt
        • remailmsg
      • Presence Condition (p2):
        • signature AND encrypt AND remailmsg AND NOT (addressbook OR filtermsg OR autoresponder OR forwardmsg OR mailhost)
    3. Products that focus on group usage
      • Features:
        • addressbook
        • autoresponder
        • mailhost
      • Presence Condition (p3):
        • addressbook AND autoresponder AND mailhost AND NOT (forwardmsg OR encrypt OR remailmsg OR filtermsg OR signature)
    4. Products that enable all features
      • Presence Condition (p4):
        • signature AND addressbook AND filtermsg AND autoresponder AND forwardmsg AND mailhost AND encrypt AND remailmsg
    5. Products that disable all features
      • Presence Condition (p5):
        • (NOT signature) AND (NOT addressbook) AND (NOT filtermsg) AND (NOT autoresponder) AND (NOT forwardmsg) AND (NOT mailhost) AND (NOT encrypt) AND (NOT remailmsg)
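
Each of the five presence conditions mentions all eight features, either positively or negated, so each one is satisfied by exactly one configuration. The following Python sketch illustrates this; the names FEATURES, VARIANT_FEATURES, satisfies, and matching_variant are hypothetical and are not taken from the VDBMS code base.

```python
# The eight features of the Enron email V-DB.
FEATURES = frozenset({
    "signature", "addressbook", "filtermsg", "autoresponder",
    "forwardmsg", "mailhost", "encrypt", "remailmsg",
})

# Features that must be enabled under each presence condition;
# every feature not listed must be disabled.
VARIANT_FEATURES = {
    "p1": frozenset({"forwardmsg", "filtermsg"}),
    "p2": frozenset({"signature", "encrypt", "remailmsg"}),
    "p3": frozenset({"addressbook", "autoresponder", "mailhost"}),
    "p4": FEATURES,
    "p5": frozenset(),
}


def satisfies(pres_cond, enabled):
    """Return True if the configuration `enabled` satisfies `pres_cond`.

    Because each condition fixes every feature to on or off, it holds
    exactly when the enabled set equals the variant's feature set.
    """
    unknown = set(enabled) - FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return frozenset(enabled) == VARIANT_FEATURES[pres_cond]


def matching_variant(enabled):
    """Return the name of the presence condition that holds, or None."""
    for name in VARIANT_FEATURES:
        if satisfies(name, enabled):
            return name
    return None
```

For example, matching_variant({"forwardmsg", "filtermsg"}) returns "p1", while a configuration that enables only signature matches none of the five variants.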

Populate the Database

1. Break employees into 5 groups

The Enron database has about 150 employees, and the email system has 5 product variants. We therefore break the employees into 5 groups (A, B, C, D, E) and let each group be the users of one of the 5 product variants. Employees are assigned to groups by employee number (eid) as follows:

  • Product for daily usage: 0 < eid <= 30
  • Product for privacy: 30 < eid <= 60
  • Product for group usage: 60 < eid <= 90
  • Product for all enabled features: 90 < eid <= 120
  • Product for all disabled features: 120 < eid <= 150
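
The eid ranges above can be expressed as a small lookup. This is a sketch only: the function name variant_for_eid is hypothetical, and it assumes the groups map to the presence conditions p1 through p5 in the order listed.

```python
# Presence-condition labels in the order of the eid ranges above
# (assumption: daily -> p1, privacy -> p2, group -> p3,
#  all enabled -> p4, all disabled -> p5).
VARIANTS = ("p1", "p2", "p3", "p4", "p5")
GROUP_SIZE = 30


def variant_for_eid(eid):
    """Map an employee number in 1..150 to its product variant label."""
    if not 0 < eid <= GROUP_SIZE * len(VARIANTS):
        raise ValueError(f"eid out of range: {eid}")
    # Ranges are half-open on the left (0 < eid <= 30, ...), so shift by one.
    return VARIANTS[(eid - 1) // GROUP_SIZE]
```

With this mapping, eid 30 still falls in the daily-use group and eid 31 is the first employee in the privacy group.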

2. For v_employee:

v_employee(eid, firstname, lastname, email_id, folder, status, sign, public_key, presCond)

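| Group   | eid | firstname | lastname | email_id | folder | status | sign | public_key | presCond |
|---------|-----|-----------|----------|----------|--------|--------|------|------------|----------|
| Daily   | xx  | xx        | xx       | xx       | xx     | xx     | NULL | NULL       | p1       |
| Privacy | xx  | xx        | xx       | xx       | xx     | xx     | xx   | xx         | p2       |
| Group   | xx  | xx        | xx       | xx       | xx     | xx     | NULL | NULL       | p3       |
| Enable  | xx  | xx        | xx       | xx       | xx     | xx     | xx   | xx         | p4       |
| Disable | xx  | xx        | xx       | xx       | xx     | xx     | NULL | NULL       | p5       |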
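
One possible way to derive v_employee from employee is sketched below using Python's built-in sqlite3 module and an in-memory database. The sample rows and the 'sig-' and 'key-' placeholder values are invented for illustration; the page does not say how sign and public_key values are actually generated, nor which DBMS is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    eid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email_id TEXT,
    email2 TEXT, email3 TEXT, email4 TEXT, folder TEXT, status TEXT);
CREATE TABLE v_employee (
    eid INTEGER, firstname TEXT, lastname TEXT, email_id TEXT, folder TEXT,
    status TEXT, sign TEXT, public_key TEXT, presCond TEXT);
""")

# Hypothetical sample rows, one per eid range (not real Enron data).
sample = [
    (1, "A", "One", "a.one@example.com", None, None, None, "one-a", "Employee"),
    (45, "B", "Two", "b.two@example.com", None, None, None, "two-b", "Employee"),
    (75, "C", "Three", "c.three@example.com", None, None, None, "three-c", "Employee"),
    (100, "D", "Four", "d.four@example.com", None, None, None, "four-d", "Employee"),
    (140, "E", "Five", "e.five@example.com", None, None, None, "five-e", "Employee"),
]
conn.executemany("INSERT INTO employee VALUES (?,?,?,?,?,?,?,?,?)", sample)

# sign and public_key are filled only for the privacy (p2) and all-enabled (p4)
# groups; a CASE without ELSE yields NULL for everyone else.
conn.execute("""
INSERT INTO v_employee
SELECT eid, firstname, lastname, email_id, folder, status,
       CASE WHEN (eid > 30 AND eid <= 60) OR (eid > 90 AND eid <= 120)
            THEN 'sig-' || eid END,
       CASE WHEN (eid > 30 AND eid <= 60) OR (eid > 90 AND eid <= 120)
            THEN 'key-' || eid END,
       CASE WHEN eid <= 30  THEN 'p1'
            WHEN eid <= 60  THEN 'p2'
            WHEN eid <= 90  THEN 'p3'
            WHEN eid <= 120 THEN 'p4'
            ELSE 'p5' END
FROM employee
""")

rows = conn.execute(
    "SELECT eid, sign, public_key, presCond FROM v_employee ORDER BY eid"
).fetchall()
```

Here presCond is stored as a label (p1 to p5) as in the table above; a real deployment might store the full formula instead.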

3. For v_message

v_message(mid, sender, date, message_id, subject, body, folder, is_signed, is_encrypted, is_from_remailer, presCond) 

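| mid | sender | date | message_id | subject | body | folder | is_signed | is_encrypted | is_from_remailer | presCond |
|-----|--------|------|------------|---------|------|--------|-----------|--------------|------------------|----------|
| xx  | xx     | xx   | xx         | xx      | xx   | xx     | NULL      | NULL         | NULL             | p1       |
| xx  | xx     | xx   | xx         | xx      | xx   | xx     | xx        | xx           | xx               | p2       |
| xx  | xx     | xx   | xx         | xx      | xx   | xx     | xx        | xx           | NULL             | p3       |
| xx  | xx     | xx   | xx         | xx      | xx   | xx     | xx        | xx           | xx               | p4       |
| xx  | xx     | xx   | xx         | xx      | xx   | xx     | NULL      | NULL         | NULL             | p5       |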

4. For v_recipientinfo:

v_recipientinfo(rid, mid, rtype, rvalue, presCond) 

| rid | mid | rtype | rvalue | presCond |
|-----|-----|-------|--------|----------|
| xx  | xx  | xx    | xx     | p1       |
| xx  | xx  | xx    | xx     | p2       |
| xx  | xx  | xx    | xx     | p3       |
| xx  | xx  | xx    | xx     | p4       |
| xx  | xx  | xx    | xx     | p5       |

5. For v_referenceinfo:

v_referenceinfo(rfid, mid, reference, presCond)

As with the table v_recipientinfo, we set presCond on each tuple according to the product variant its data belongs to.

6. For other tables:

For the rest of the tables, since each whole table depends on a specific feature, we populate each table with users' email data based on which product group they belong to.
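
Which feature tables receive data for which group follows from the schema and the variant definitions. A minimal sketch, assuming the table-to-feature mapping listed in the schema section; the names TABLE_FEATURE, VARIANT_FEATURES, and tables_for_variant are hypothetical.

```python
# Feature that each whole table depends on (from the schema section).
TABLE_FEATURE = {
    "v_auto_msg": "autoresponder",
    "v_forward_msg": "forwardmsg",
    "v_remail_msg": "remailmsg",
    "v_filter_msg": "filtermsg",
    "v_mailhost": "mailhost",
    "v_alias": "addressbook",
}

# Features enabled in each product variant (from the variant definitions).
VARIANT_FEATURES = {
    "p1": {"forwardmsg", "filtermsg"},
    "p2": {"signature", "encrypt", "remailmsg"},
    "p3": {"addressbook", "autoresponder", "mailhost"},
    "p4": {"signature", "addressbook", "filtermsg", "autoresponder",
           "forwardmsg", "mailhost", "encrypt", "remailmsg"},
    "p5": set(),
}


def tables_for_variant(variant):
    """Return the feature tables that hold data for users of `variant`."""
    enabled = VARIANT_FEATURES[variant]
    return sorted(t for t, f in TABLE_FEATURE.items() if f in enabled)
```

Under these assumptions, employees in the daily-use group (p1) contribute rows only to v_filter_msg and v_forward_msg, and employees in the all-disabled group (p5) contribute rows to none of these tables.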

References

  1. Fundamental Nonmodularity in Electronic Mail
  2. SPLverifier: Model Checking of Software Product Lines
  3. Feature-Aware Verification
  4. SPL2go-Email JML
  5. SPL2go-EmailSystem_C
  6. SPL2go-EmailSystem_Java
