koosha-t/WeblogSessionizationSpark

WeblogSessionizationSpark

Sessionizing Weblogs using Spark/Scala

The Scala source code is in src/main/scala/com/paytm/challenge.

I've approached this challenge based on the following assumptions:

          (1) Sessions ONLY end when user inactivity exceeds the inactivity threshold. The
              inactivity threshold is set to 900 seconds (15 minutes), but it can be changed in
              the settings object.
              
          (2) I have assumed that sessions are never ended explicitly by users, and that each session
              lasts at least as long as the inactivity threshold. For instance, if a client has only one
              activity in the log (their IP appears only once in the entire log), the session is assumed
              to last 15 minutes (= the inactivity threshold).
              
          (3) In summary, a session ends 15 minutes after the user's last activity in that session, i.e. no
              session lasts less than 15 minutes! (I'm fair to sessions :) )
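
The assumptions above can be illustrated with a minimal sketch. It uses plain Scala collections rather than Spark so that it runs on its own, and every name in it (SessionSketch, Hit, Session, sessionize, durationSecs) is hypothetical. None of them are taken from this repository's Sessionization.scala. The sketch sorts each IP's hits by time and starts a new session whenever the gap since the previous hit exceeds the threshold (assumption 1). It treats a session as ending one threshold after its last hit, so a session's duration is (last hit - first hit) + threshold (assumptions 2 and 3). Timestamps are assumed to be epoch seconds.

```scala
// Plain-Scala sketch of inactivity-based sessionization (no Spark dependency).
// All names here are hypothetical illustrations, not the repository's actual code.
object SessionSketch {
  // 900 seconds (15 minutes), matching the threshold described above.
  val InactivityThresholdSecs: Long = 900L

  // One log entry: client IP and request time in epoch seconds (assumed format).
  final case class Hit(ip: String, epochSecs: Long)

  // A session's first and last hit times plus the number of hits it contains.
  final case class Session(ip: String, start: Long, lastActivity: Long, hits: Int) {
    // The session is assumed to end one threshold after its last hit,
    // so even a single-hit session lasts exactly `threshold` seconds.
    def durationSecs(threshold: Long): Long = lastActivity - start + threshold
  }

  // Groups hits by IP, sorts each group by time, and splits it into sessions
  // wherever the gap between consecutive hits exceeds `threshold`.
  def sessionize(hits: Seq[Hit], threshold: Long = InactivityThresholdSecs): Seq[Session] =
    hits.groupBy(_.ip).toSeq.flatMap { case (ip, ipHits) =>
      val times = ipHits.map(_.epochSecs).sorted
      // Sessions are accumulated newest-first; the head is the open session.
      val newestFirst = times.tail.foldLeft(List(Session(ip, times.head, times.head, 1))) {
        case (current :: closed, t) if t - current.lastActivity <= threshold =>
          // Gap within the threshold: extend the open session.
          current.copy(lastActivity = t, hits = current.hits + 1) :: closed
        case (sessions, t) =>
          // Gap exceeds the threshold: start a new session at this hit.
          Session(ip, t, t, 1) :: sessions
      }
      newestFirst.reverse
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Hit("1.2.3.4", 0L), Hit("1.2.3.4", 600L), Hit("1.2.3.4", 2000L),
      Hit("5.6.7.8", 100L)
    )
    sessionize(sample).sortBy(s => (s.ip, s.start)).foreach { s =>
      println(s"${s.ip} start=${s.start} hits=${s.hits} duration=${s.durationSecs(InactivityThresholdSecs)}s")
    }
  }
}
```

With this sample, 1.2.3.4 gets two sessions because the 1400-second gap exceeds 900 seconds. Those sessions last 1500 s and 900 s. The single hit from 5.6.7.8 yields one 900-second session. On a full log in Spark, the same per-IP gap check is commonly expressed with a window partitioned by IP and ordered by timestamp, using `lag` to get the previous hit's time.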

See sampleResults.txt for sample results; it contains the output of 'Sessionization.scala'.
