\documentclass[11pt]{article}
%\documentclass[11pt,oneside,openany]{book}
%\usepackage[margin=1in]{geometry}
%\usepackage{prelude}
\usepackage[hyphens]{url}
\usepackage[implicit=true,%
bookmarks=false,%
bookmarksopen=false,%
pdfpagemode=UseNone,%
pageanchor=false,
colorlinks=false,%
pdfborder={0 0 0},%
plainpages=false,%
pdfpagelabels=true,%
pdfpagelayout=SinglePage]{hyperref}
\usepackage[sort&compress,numbers]{natbib}
\usepackage{skull}
\usepackage{ragged2e}
\usepackage{fancyhdr}
\usepackage{xspace}
\usepackage{fullpage}
\usepackage{graphicx}
\usepackage[capitalize,sort&compress]{cleveref}
\usepackage{mathptmx}
\usepackage[scaled=0.83]{berasans}
\usepackage[scaled=0.83]{beramono}
\usepackage[colorinlistoftodos]{todonotes}
\usepackage{alltt}
\usepackage[OMS,OML,T1]{fontenc}
\usepackage{textcomp}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{paralist}
\usepackage{wrapfig}
\usepackage{setspace}
%\singlespacing
\newcommand{\reffig}[1]{Figure~\ref{#1}}
\newcommand{\refsec}[1]{Section~\ref{#1}}
\renewcommand{\bfdefault}{b} % used to eliminate most
\renewcommand{\sldefault}{it} % "font not found" warnings
\usepackage{tikz}
\usetikzlibrary[backgrounds,calc,positioning]
\usepackage{enumitem}
\begin{document}
\author{Hannah Quay-de la Vallee}
\title{By the People, For the People: \\ Community Ratings of App Privacy \\ \ \\ Thesis Proposal}
\date{May 19, 2015}
\maketitle
\newpage
\tableofcontents % PROPOSAL
\newpage
\doublespacing
\setcounter{page}{1}
\pagestyle{empty}
\hypersetup{pageanchor=false}
\pagestyle{plain}
\begin{abstract}
Apps use access to hardware resources and sensitive
user data to enable users to customize their devices.
Because these apps are usually written by untrusted
third parties, they become significant attack vectors
on users' security and privacy. To protect themselves,
users need to be able to control apps' access to
sensitive resources. Most systems thus give users
the impression of control by requiring some form of
consent to access resources,
such as the install-time permissions on Android.
Unfortunately, many users lack the
information and expertise needed to make informed
decisions. The proliferation of app stores beyond
phones and browsers to cars, watches, and more will
only exacerbate this problem.
I propose a marketplace that employs user ratings for
app permissions as a mechanism to inform users about
the potential risks of these permissions. This allows
users with opinions and concerns about app permissions
to share their views with other users. This thesis will
design and build such a marketplace. It will use
crowdsourcing to validate the interface design and it will
gather and compare permission ratings from a variety of
sources (such as crowd workers, college students, security
experts, and automated systems). It will investigate how to customize
the marketplace to make it better suited to each user.
Along with building the marketplace, this thesis will
also examine how user ratings can help users make better
selections and assist developers in meeting user needs.
\end{abstract}
\doublespacing
\section{Introduction}
\emph{\paragraph{Thesis Statement}
App stores should include user ratings of privacy as a criterion
for sorting apps. This can encourage users to use more privacy-respecting
apps. In the absence of user-provided ratings, crowdsourcing can be
used to gather ratings.
%By incorporating ratings of apps' privacy, an app marketplace can provide
%users with tools to search for apps that meet their privacy needs.
%Ratings of apps' privacy, in addition to ratings of functionality,
%can enable an app marketplace that incorporates user privacy into
%the app ranking process.
}
Apps have become pervasive
in consumers' lives
\cite{gplay-50-billion, apple-50-billion}.
Most commonly associated with smartphones and tablets,
such as iPhones, iPads and Android devices, the app model now has much
wider adoption, appearing in a variety of domains,
such as browsers (in the form of extensions), PC operating
systems, smartwatches (such as the Pebble and the Apple Watch),
home automation (like Microsoft's HomeOS~\cite{ms-homeos}),
and cars (such as Apple CarPlay~\cite{apple-carplay}).
%TODO: New citations (like Microsoft's HomeOS \cite{ms-homeos})
%(such as \emph{iOS in the Car} \cite{cars-apple} and Android's
%partnership with Audi, GM, and Honda \cite{cars-google}).
Most app ecosystems are supported by a store or
\emph{marketplace}---a central repository that
enables users to search for, browse,
investigate, and install apps on their devices.
Apps and app stores allow users of a broad range
of technical ability to customize their devices.
However,
because of the amount of user information associated with these
devices, installing third-party apps presents
risks to user security and privacy. These risks include malware apps and
apps that hijack legitimate apps
to perform malicious operations. These problems are the topic
of extensive research
\cite{droidrisk-2013, android-repackaged-CODASPY12, comDroid-MOBISYS11},
%TODO: Update citations
but that research does not address a different,
important problem: How can systems help users make informed privacy and
security decisions about apps based on the permissions that the apps
require?
This thesis proposes that
\emph{app marketplaces should incorporate ratings of apps' privacy to
sort apps, which will encourage users to install more privacy-respecting
apps. Furthermore, if there is a dearth of user ratings, the ratings can
be crowdsourced.}
%\emph{ratings of app
%privacy can enable an app marketplace that incorporates user
%privacy into the app ranking process.}
%This could help users
%find apps that align with their privacy needs.
Many app platforms currently try to address the issue of
user privacy by controlling apps' access to hardware resources
and user data with some type of permissions system,
and require user consent before an app can use any
of those resources and data. Permission models and their presentation
vary across operating systems. Take, for example, the permission systems
for Android and iOS, the two most popular smartphone operating systems.
Android takes a ``static'' approach, in that apps are given a
\emph{manifest} of
permissions at installation time, and may use any of them at will at runtime.
Users must approve these during the installation
process. In contrast, iOS uses a ``dynamic''
approach: permissions are not listed or approved up-front; instead, users must
approve permissions at runtime as an app tries to use them.
There are advantages and disadvantages to each approach, but
this proposal will use the static model. For one thing, only by
knowing all permissions up front can users reason about their potential
combinations. Furthermore, asking for permissions
during execution can result in consent that might not otherwise
have been given, because users are likely to grant whatever appears
necessary to complete the task at hand \cite{phisher-wanings-SIGCHI08}.
In contrast, the static
model, in principle, enables a more contemplative and
informed approach to managing apps based on permissions.
In theory, requiring users to approve permissions gives them control over their
data, but in practice users often do not have the
information or expertise to meaningfully consent to an
app's required permissions. Users may not
know why the given app requires a certain permission, or
even what it means \cite{android-attention-SOUPS12}.
Context is critical:
for instance,
a ``share with social media'' permission is necessary for a social
networking app but may be utterly
unsuitable for a medical monitoring app.
For those users who do have opinions about an app's permissions,
there is no structured method to
communicate their thoughts---comments to this effect are often found in
the app's reviews, but these comments can be difficult for other
users (especially uninformed ones who would like guidance), and even
developers, to find. In turn, developers have no structured way to
respond to such comments and justify their apps' needs.
%\footnote{Sometimes,
%seemingly unnecessary permissions can appear to have justifications.
%TODO: How do I reference SK here?
%For instance, the I communicated
% with the developers of Dropbox, asking why it needed ``Phone calls
% -- read phone state and identity''. Their response was, ``[T]he
% phone state \& identity is simply because we need a unique ID for
% each cell phone, so we use the IMEI to keep track of phone/user
% combinations.'' This arguably points to a weakness of the Android
% APIs, which others have noted \cite{effectivness-perms-USENIX11}
% \cite{septa-perm-explain}.} Finally, this entire problem
%is greatly exacerbated by continuously growing manifests.
We are building a crowdsourced app marketplace that allows users to review
each of an app's individual \emph{permissions}, rather than just the app's functionality.
It also allows developers to
explain why they requested a permission---sometimes due to the
weaknesses of a permission model, such as excessively coarse permissions
\cite{permission-tracking-UBICOMM12}---and how they intend to use it. This both enables more
informed user decisions and creates accountability for a developer's intent.
%This would allow users to communicate in a structured way with one
%another and with developers. These dialogues can spur developers to
%adjust their app's permissions to better meet uses' need.
%An incident with Avis illustrates a (sadly,
%rare) success. Avis added ``List of Running Apps'' to the permissions
%required by their Android app. Users protested the change in the
%reviews, with numerous comments such as, ``I refuse to update and will
%likely uninstall until you can justify the need to access my `Running
%Apps'.'' In response to the backlash, Avis removed the offending
%permission in later versions of the app. Avis explained the removal on their app's Google Play
%page, stating:
%\begin{quote}
% We've updated the permissions in this version to not require List of
% Running Apps. We had that in place to help analyze and improve app
% performance, but removed it due to your voiced concerns.
%\end{quote}
%My goal is to make such successes commonplace. I intend to leverage
%the willingness people already demonstrate to provide ratings, and
%their familiarity with mechanisms like rating systems, by building systems to rate
%\emph{permissions}. These ratings then serve as a guidepost to other
%users as they choose which apps to install.
%Indeed, the ratings can be used to \emph{rank} apps
%so that more privacy-sensitive apps are rewarded by appearing higher,
%while questionable ones percolate down in the list. In turn, a
%permission-rating app interface becomes a channel for developers to
%convey their needs and intent back to users, thereby enabling more
%informed decisions. The feedback of developers also documents their
%intent, which can be cross-checked by other tools (e.g., program
%analyses).
\section{Preparatory Work}
\label{sec-prep-work}
For my thesis, I will develop a marketplace that employs user-provided
ratings to sort apps, and displays these ratings to other users
to assist them in their app selection process. For such a marketplace
to be useful, it requires two things: it must have
user ratings, and it must present those ratings to users in
an easy-to-understand way. In \refsec{subsec-seeding} I will
discuss how to gather ratings to seed the marketplace so that
it can be immediately useful to users, and in \refsec{subsec-perm-ui}
I will show how we designed an interface to present these ratings
to end users. I will discuss future work on the marketplace in
\refsec{subsec-the-apps}.
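To make the sorting idea concrete, the sketch below (in Python, with hypothetical field names and an arbitrary 50/50 weighting that are \emph{not} part of the marketplace design) shows one way an app's conventional functionality rating could be blended with the average approval of its permissions to order search results:
\begin{verbatim}
# Illustrative sketch only: field names and the 0.5/0.5 weighting are
# hypothetical, not the marketplace's actual scoring scheme.
def privacy_score(permission_ratings):
    """Average approval (0..1) across an app's rated permissions."""
    if not permission_ratings:
        return 0.0
    return sum(permission_ratings) / len(permission_ratings)

def rank_apps(apps, w_functionality=0.5, w_privacy=0.5):
    """Sort apps by a weighted blend of functionality and privacy ratings."""
    def score(app):
        return (w_functionality * app["functionality"] / 5.0 +
                w_privacy * privacy_score(app["permission_approval"]))
    return sorted(apps, key=score, reverse=True)

apps = [
    {"name": "Find a Pharmacy", "functionality": 4.2,
     "permission_approval": [0.82, 0.61, 0.74]},
    {"name": "Flashlight Plus", "functionality": 4.6,
     "permission_approval": [0.30, 0.55]},
]
print([a["name"] for a in rank_apps(apps)])
\end{verbatim}
The choice of weighting here is purely illustrative.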
\subsection{Seeding the Marketplace}
\label{subsec-seeding}
Because the marketplace relies on user ratings, it presents
a chicken-or-egg problem: in order to get users
to employ a system that shows ratings, ratings that offer some value
to the user must already be present; but users would have to be
using such a rating system already in order for it to collect user
ratings. We therefore wanted to investigate other avenues to
acquire ratings for the marketplace.
Because we would prefer to have a large number of ratings,
crowdsourcing seemed a natural fit. However, crowdsourcing via
platforms like Amazon's Mechanical Turk~\cite{amazon-mturk}
also raises various concerns. Will
workers take the rating task seriously? Will they give ratings that
are actually distinct enough to reveal differences? Will they rate the
permissions in the context of a given app? This last point is subtle: permissions are
not inherently ``good'' or ``bad'', but acquire meaning in the context
of the app's intended purpose.\footnote{For example, we
downloaded an app that helped turn on and off individual permissions
on the basis of perceived threat. However, the threat was not
specific to the app. Thus, on the very first app we examined---that
for Google+---the permission marked as most dangerous was that to
share to social streams\dots which is the very point of the app.}
Based on these concerns, we did several preliminary studies designed to answer
the following questions:
\begin{itemize}
\item Could we gather a large number of conscientious ratings through crowdsourcing?
\item Would those ratings be useful?
\item What are some factors that might affect how users perceive and rate
app permissions?
\end{itemize}
We deployed a series of studies on Mechanical Turk. We chose
Mechanical Turk because it can be cost-effective and has become
a common platform for academic research, providing a body of work on how
best to use it \cite{reseach-mturk-BRM12, mturk-data-quality-PPS11}.
Our study presented subjects with surveys that
described an app, preceded by a motivating paragraph asking
them to imagine that they had come across this app in some way and needed an
app with that functionality. We used fourteen apps: Facebook, GMail,
Pandora, Angry Birds and ten weather apps. The descriptions
of the apps were taken from their Google Play pages, along with their
permission information. Given this information, users were asked
whether they would download the app, and to rate each of the app's
permissions as either ``acceptable'' or ``unacceptable,'' and were
given an optional text box to explain each of their ratings.
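As a minimal illustration of how such binary ratings yield the per-permission approval rates reported below, consider the following Python sketch (the data and layout are made up; this is not our analysis code):
\begin{verbatim}
# Minimal sketch: made-up ratings, not our actual survey data or analysis code.
from collections import defaultdict

# Each rating: (app, permission, "acceptable" or "unacceptable")
ratings = [
    ("WeatherNow", "Your location", "acceptable"),
    ("WeatherNow", "Your location", "acceptable"),
    ("WeatherNow", "Phone calls",   "unacceptable"),
    ("WeatherNow", "Phone calls",   "acceptable"),
]

counts = defaultdict(lambda: [0, 0])   # permission -> [acceptable, total]
for _app, permission, verdict in ratings:
    counts[permission][1] += 1
    if verdict == "acceptable":
        counts[permission][0] += 1

for permission, (acceptable, total) in counts.items():
    print(f"{permission}: {100.0 * acceptable / total:.1f}% approval")
\end{verbatim}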
Once we had subjects completing tasks, we needed to ensure that they
were real humans who were actually answering the questions we
asked. We manually reviewed the responses from the text boxes
that allowed users to explain their ratings. Overall, we found that
many subjects did provide explanations for their ratings despite this being
optional. Furthermore, their responses were relevant to the
permissions being discussed, indicating that the responses were from
real people thinking about the task. Because these were preliminary
studies, we did not do any significant analysis of the text answers,
but will do so in the proposed work. We will further
validate our Mechanical Turk results by comparing them with results
from other populations such as students and security experts.
There are also other measures we will
use to increase confidence in our Mechanical Turk data, such as test
questions (like, ``which Android phone do you have?'').
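As one example of how such a test question could be used, the following Python sketch filters out responses whose answer to a hypothetical ``which Android phone do you have?'' field is implausible (the allowed answers and field names are invented for illustration, and do not reflect a finalized validation procedure):
\begin{verbatim}
# Sketch only: the check question, field names, and allowed answers are
# hypothetical and for illustration.
VALID_ANSWERS = {"nexus 5", "galaxy s5", "moto x"}   # example allowed answers

def passes_attention_check(response):
    """Keep a response only if its test question has a plausible answer."""
    answer = response.get("which_android_phone", "").strip().lower()
    return answer in VALID_ANSWERS

responses = [
    {"worker": "A", "which_android_phone": "Nexus 5"},
    {"worker": "B", "which_android_phone": "iPhone 6"},   # fails the check
]
kept = [r for r in responses if passes_attention_check(r)]
print([r["worker"] for r in kept])   # ['A']
\end{verbatim}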
\begin{figure*}[t]
\centering
\includegraphics[width=.6\linewidth]{img/RatingTable.png}
\vspace{1pt}
\caption{The range of approval ratings for the permissions from the
weather apps surveyed}
\label{weatherratings}
\end{figure*}
Next, we looked for variations in ratings between permissions. To
compare within a domain, we focused on the weather apps. Across the
ten apps, a total of thirteen permissions were used. These were rated
as shown in \reffig{weatherratings}. Although the approval rate for
every permission was over 50\%, approval varied from 58.8\% to
82.5\%. The range of ratings suggests that users were actually
distinguishing between the permissions (not treating them all
uniformly, which might be another indication of not taking the rating
task seriously or not understanding it); we also found correspondence
between workers' ratings and their textual comments.
To investigate factors that might affect user ratings, we looked only at Facebook, Gmail,
Pandora, and Angry Birds. We used the same basic surveys, but
varied specific conditions in the survey. The conditions
were: asking half the subjects if they would download the app before
they were asked to rate the permissions and asking the other half after;
varying how the subject supposedly discovered the
app (on recommendation from a colleague, because it was a featured app in
the app store, or because it was highly rated in the app store); and
varying whether the app was ``brand-name'' or generic. To create the generic
apps, we reused the Play Store descriptions of the apps but replaced all
instances of the app's name with a generic name. For example, GMail
became MongogoMail. Additionally, we changed any obvious identifiers,
so the pigs and birds from Angry Birds became warriors and invaders.
Varying where subjects were asked whether they would download the app
(either before or after rating the permissions) was
meant to investigate whether users were more or less likely to
download an app if they had been primed to think about privacy by
rating the permissions. We found that there was no significant change
in either the percentage of subjects who would download the apps, or the
ratings they provided.
Varying how the subject was asked to imagine they discovered the app
could affect their opinion of the permissions, or their willingness to
download the app. In this case, only Facebook showed any interesting
results: respondents were less likely to install the app if it
had been recommended by a colleague than if it was featured or highly
rated. We found this result odd given that, due to the network
effect of an app like Facebook, we would have expected the app to be more
valuable if friends or colleagues also used it. However, it did not
seem to be a pervasive effect.
Varying branding did not have a significant effect on downloads or
ratings for Pandora and Angry Birds, but did
for GMail and Facebook. In both cases,
participants rated the generic version's permissions as less
acceptable. For GMail, a lower percentage of users also said they
would install the app, but this was not true of Facebook. These
findings suggest that branding may be an important part of users'
feelings about an app. However, these results also raise questions
about how privacy and functionality interact in user decisions. Are
people less approving of these generic apps because they think they
are unsafe, or is their perception of the app altered because they
feel they will not have access to their
existing email and social network accounts,
and so do not see the permissions as worth the risk?
To separate access concerns from privacy concerns, we did a follow-up
study asking subjects to evaluate an app that was an
interface over a brand-name app. For instance,
subjects were presented with Gmore!, an app purporting to offer a
smoother interaction with one's GMail account. Again, subjects rated
the off-brand apps' permissions as less acceptable, and a lower
percentage of users said they would use Gmore!. This suggests that
subjects were concerned about the privacy of the apps, not just their
functionality.
Our preliminary studies suggest that we will be able to gather
meaningful ratings from Mechanical Turk, allowing us
to seed data in a privacy-centric marketplace, and hence attract users
to it. Further, our early examination of these ratings
suggests insights into how users think about their privacy, and reveals
factors that affect their opinions. These insights may be useful for
further development of the marketplace, and may also be more broadly
useful to the privacy research community.
\subsection{The Permission User Interface}
\label{subsec-perm-ui}
Because we want to communicate rating information to users,
the interface for displaying the ratings is another critical component of
our marketplace. The interface should help users understand the
riskiness of individual permissions so they can make \emph{informed}
decisions without requiring significant
effort. Ideally, it would be
intuitive enough that users could understand it without too
much direction. \reffig{grade-perms} is an example of what such an interface
might look like.
Designing such an interface proved surprisingly subtle.
Our original designs, based on existing security
metaphors, failed to convey the desired information.
Indeed, we found that some of them \emph{actively
misled users} (\refsec{sec-ui-design}).
We also unearthed some common patterns of interface
confusion. In the end we found three designs that most
subjects understood, and
conducted a large user study to confirm this
(\refsec{s-sec-largescale}). One of these designs will be used in the new
marketplace.
\subsubsection{Exploratory Interface Design}
\label{sec-ui-design}
To find a functional interface, we designed several prototypes and
leveraged Amazon's Mechanical Turk platform to give us rapid feedback
on those prototypes.
\paragraph{Methodology}
\label{subsec-small-methods}
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.8\linewidth]{img/GradesPerms.png}
\end{center}
\caption{A prototype interface for permission ratings.}
\label{grade-perms}
\end{wrapfigure}
We explored each prototype with a survey on Mechanical Turk.
These surveys were
intended to expose broad conceptual problems
in the interfaces, so we recruited only 10 to 12
subjects per interface. The surveys focused on two issues: how well subjects
understood the purpose and meaning of the interface absent any explanation,
and whether subjects understood where the ratings came from.
During each study, subjects were shown a mock-up of
a candidate interface. \reffig{grade-perms}
is an example of such a mock-up. The mock-ups displayed the full permissions
interface for a fictional app called Find a Pharmacy, which appeared
to be developed by the (also fictional) company ApexApps.
We chose a pharmacy locator app because it
could pose a privacy risk to a user (if, for example, it stored a list
of the user's medications for refill reminders), but
would be unlikely to offend any
subjects. Each mock-up used different iconography to
present the user permission ratings (which were also fake), but the permissions
and their rating values were the same or comparable across interfaces.
The mock-ups were presented as static images that were tall enough
not to require scrolling. (Because the rating icons
varied in size, the mock-ups varied in height.) This was both
to ensure subjects did not miss any of the iconography by failing to
scroll, and to avoid distraction induced by interaction.
Upon being presented with one of the mock-ups, subjects were
asked to explain, in a free-response text box,
what they thought the icons next to the permissions
meant. Subjects were given no information about the
purpose of the interface. The next page of the survey
told them that the icons were privacy
ratings and asked them to rate how clear this was from
the interface, on a 4-point Likert-type scale.
We manually examined
the text responses to identify conceptual problems with each
interface, whereupon we either attempted
to redesign the interface to address issues raised by subjects,
or we decided the interface was not viable and disqualified it.
Using this process we eliminated all but three interfaces, which we evaluated
in a larger study (\refsec{s-sec-largescale}).
To understand subjects' beliefs about the ratings' source, we asked
whether they thought the ratings came from
``other Android users'', ``independent
security experts'', ``a review team at Google'', or
``don't know''. I will discuss the outcome of this question
before delving into the individual interfaces.
\paragraph{The Source of the Ratings}
\label{subsec-small-source}
If users are going to trust the ratings enough to use them,
they are necessarily placing trust in the raters, so it is
important that users understand the \emph{source}
of the ratings.
We found that most of the interfaces failed to
convey to subjects that the ratings were from other Android users.
This is therefore something that should be considered in the
design of the complete marketplace.
\label{ss-sec-stars-r1}
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/stars/starsR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Stars}
\label{s-sec-stars}
A five-star system is possibly the most common iconography for user ratings,
and is already in use in the Google Play store to display apps' overall
functionality ratings. It is therefore a natural basis for experimentation.
% \,\protect\includegraphics[height=10pt]{img/TinyStars.png}}
Possibly due to the ubiquity of five-star ratings,
subjects seemed to have preconceptions
about the meaning and source of the ratings. This proved to be both an
advantage and a disadvantage. On the positive side,
subjects correctly understood the source of
the ratings (other Android users), and that more stars corresponded to a
better rating.
Unfortunately, subjects' association with stars as a \emph{functionality} rating
was \emph{too} strong. Many subjects thought the ratings indicated how well
the permissions' services worked. For example, some subjects thought the rating next
to ``Network Communication'' showed the strength of the network signal.
In order for the star ratings to effectively communicate the meaning of the permission
ratings, users would have to understand that the same icon
on the same page had two different meanings (the app's functionality
rating and the permission ratings). This potential for user confusion led
us to eliminate this interface.
However, it did inspire interfaces using privacy-relevant symbols rather than stars, with
the intention of
leveraging users' existing understanding of an out-of-five system while
expressing that the ratings are about privacy.
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/locks/locksR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Locks}
\label{s-sec-locks}
One symbol we used in place of stars was locks, a common visual metaphor for protection.
Our original lock design used yellow locks over a grey background:
% \,\protect\includegraphics[height=10pt]{img/TinyLock1.png}}
%\label{ss-sec-locks-r1}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/locks/locksR1.png}
%\end{center}
This design caused a number of
misconceptions.
First, although most subjects understood that the locks were privacy
ratings, some thought they meant that
the permission's service was restricted. (This may
stem from the practice by developers of using locks to mark features of
an app that must be purchased or earned before they can be used.)
Second, those subjects who \emph{did} understand that the locks represented privacy ratings
could not tell whether more yellow locks denoted a better or worse
rating. This is troubling, because it would cause users to think the most
dangerous permissions were the safest. We label this confusion,
present in many interfaces, the
\emph{better-or-worse} phenomenon, and discuss it more at the end
of this section.
\newpage
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/locks/locksR2.png}
\end{center}
\end{wrapfigure}
The second lock interface, drawing from the traffic light interface
%(\refsec{ss-sec-traffic-r2})
(presented later in this section), tried to eliminate the better-or-worse
phenomenon by using red and green locks.\footnote{For this
study, we chose colors
compatible with red-green colorblindness, but any deployed system
should address the full spectrum of colorblindness.}
% \,\protect\includegraphics[height=11pt]{img/TinyLock2.png}}
\label{ss-sec-locks-r2}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/locks/locksR2.png}
%\end{center}
To further reinforce the message of privacy, the
green locks were closed and the red locks open.
We also hoped that using color would reduce
the perception that the locks indicated restricted services (in which
case \emph{fewer} locks would be preferable).
Though these changes helped curtail the better-or-worse phenomenon, they did not
eliminate it entirely.
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/locks/locksR3.png}
\end{center}
\end{wrapfigure}
Because the better-or-worse phenomenon was at least partially caused
by confusion about whether more or fewer icons was better, we
replaced the out-of-five system with
a single lock next to each permission, and relied on
color and openness to convey the rating:
% \,\protect\includegraphics[height=10pt]{img/TinyLockGreen.png}\,
%\protect\includegraphics[height=10pt]{img/TinyLockYellow.png}
%\protect\includegraphics[height=10pt]{img/TinyLockRed.png}}
\label{ss-sec-locks-r3}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/locks/locksR3.png}
%\end{center}
Using
only red and green locks would have been too similar to the checkbox
interface (discussed below),
%(see \refsec{ss-sec-binary-r1}),
which had resulted in dangerous misunderstandings by subjects.
To avoid this, the interface also used half-open yellow
locks. This
had the additional benefit of conveying more
information than just red and green locks without adding much cognitive
effort.
\newpage
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/locks/locksR4.png}
\end{center}
\end{wrapfigure}
This redesign improved understanding, but
some subjects still thought that the locks indicated inaccessible features.
To further clarify the icons' meaning, we grouped
permissions by rating and added explanatory
text alongside the icons, drawing
from the design of the first traffic light interface.
%(\refsec{ss-sec-traffic-r2}).
(Additionally, we hoped introducing the word ``voted''
would also clarify the source of the ratings by emphasizing that they
were an aggregate of community opinions.)
% \,\protect\includegraphics[height=10pt]{img/SmallLockGreen.png}\,
%\protect\includegraphics[height=10pt]{img/SmallLockYellow.png}
%\protect\includegraphics[height=10pt]{img/SmallLockRed.png}}
\label{ss-sec-locks-r4}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/locks/locksR4.png}
%\end{center}
The final lock interface was an improvement over its
predecessors,
but some subjects still thought the locks indicated
availability. Since locks
%The final lock interface was better than its
%predecessors, but some subjects still thought the locks indicated
%availability. Since locks
performed worse than percentage bars (presented below)
%(\refsec{ss-sec-pbars-r4})
and traffic signs,
%(\refsec{ss-sec-traffic-r4}),
we eliminated this interface family.
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/eyes/eyesR2.png}
\end{center}
\end{wrapfigure}
\paragraph{Eyes}
\label{s-sec-eyes}
Continuing our exploration of other symbols in an out-of-five rating, this
interface used eyes in the place of stars. Our first icon, which
used a no-smoking-style circle-and-slash over an eye,
proved too difficult to see at small scale
(some subjects thought it was a watch). We thus tried different-color eyes:
\label{ss-sec-eyes-r2}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/eyes/eyesR2.png}
%\end{center}
The more dangerous a permission was, the more red eyes it had; the more
benign it was, the more grey eyes it had.
Additionally, the red centers had
the appearance of a red recording light as seen on a camera.
Though subjects could now see the icon, this interface exhibited
the better-or-worse phenomenon. One possible cause is that
the grey eyes looked more like actual eyes, and so subjects thought that more
grey eyes meant more surveillance.
We tried various redesigns such as grouping permissions by rating with a
text header
%(as in \refsec{ss-sec-locks-r4})
and introducing a yellow eye category. Though these changes helped,
percentage bars
%(\refsec{ss-sec-pbars-r4})
and traffic signs
%(\refsec{ss-sec-traffic-r4})
were still better understood by subjects,
so we disqualified this interface.
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/masks/masksR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Guy Fawkes Masks}
\label{s-sec-masks}
We also explored out-of-five ratings using Guy Fawkes masks, which
were popularized by the graphic novel \emph{V for Vendetta} and its
film adaptation, and have become a symbol for personal
privacy and activism.
% \,\protect\includegraphics[height=12pt]{img/TinyMasks.png}}
\label{ss-sec-masks-r1}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/masks/masksR1.png}
%\end{center}
Unfortunately, subjects felt the
rating showed how well protected their information was from the government
(possibly due to the ``hacktivist'' group Anonymous' adoption of the mask as a symbol).
As this is not a protection a permissions system can provide and it is dangerous for an
interface to suggest protections that do not exist, we eliminated all
variations of this design.
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/checkboxes/checkboxesR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Binary Checkboxes}
\label{s-sec-checkbox}
As we wanted to convey information without demanding much cognitive effort,
we designed a simple interface in which
each permission was given either a green
checkmark indicating users approved of the permission or a red X indicating
they did not approve.
% \,\protect\includegraphics[height=10pt]{img/TinyCheck.png}\,
%\protect\includegraphics[height=10pt]{img/TinyEx.png}}
\label{ss-sec-binary-r1}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/checkboxes/checkboxesR1.png}
%\end{center}
Unfortunately, we discovered a very significant confusion: in this interface
the red X was meant to indicate a potentially invasive permission, but
subjects thought it meant that the given permission had been \emph{disabled}. This
is an extreme case of the better-or-worse phenomenon and is an
alarming misconception. We therefore eliminated this interface without attempting to
redesign it.
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/grades/gradesR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Grades}
\label{s-sec-grades}
Drawing on another iconography, this
interface used letter grades to present the ratings.
Typically used to rate students' academic performance,
grades are also used in some non-educational settings (e.g.,
the New York City Department of Health restaurant
inspection results).
% \,\protect\includegraphics[height=10pt]{img/A.png}\,
%\protect\includegraphics[height=10pt]{img/C.png}\,
%\protect\includegraphics[height=10pt]{img/D.png}}
\label{ss-sec-grades-r1}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/grades/gradesR1.png}
%\end{center}
Unfortunately, most subjects thought the ratings were for
the functionality of a permission's service. As this interface failed
in its primary purpose, we eliminated it.
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/bars/barsR1.png}
\end{center}
\end{wrapfigure}
\paragraph{Percentage Bars}
\label{s-sec-pbars}
Eschewing existing privacy and safety metaphors,
this interface used rectangular bars to indicate
the percentage of raters who considered a given permission to be acceptable.
This style of rating conveys more information than the other
interfaces, and therefore carries a greater risk of overwhelming a user. To mitigate this
issue, the bars were colored red, yellow, or green depending on the permission's approval
rating, giving a more obvious visual distinction between ratings:
% \,\protect\includegraphics[height=8pt]{img/Bars12/GreenWhiteBar.png}\,
%\protect\includegraphics[height=8pt]{img/Bars12/YellowWhiteBar.png}\,
%\protect\includegraphics[height=8pt]{img/Bars12/RedWhiteBar.png}}
\label{ss-sec-pbars-r1}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/bars/barsR1.png}
%\end{center}
Subjects understood that the bars indicated privacy ratings,
and this interface did not suffer from the better-or-worse
phenomenon, due in part to the colors of the bars.
One subject stated the bars rated the permissions from ``most risky to the least,
red being the highest and the green being generally safe''.
Although the bars were effective, subjects' feedback on
the traffic signs interface
revealed a potential pitfall: their comments suggested that subjects
perceived a green light as a signal to proceed without caution, which could
encourage users to download an app without considering the permissions at all.
We were concerned the green bars
could have the same over-soothing effect.
To encourage caution in all cases, we modified the interface to use
red, orange, and yellow bars. This interface had
two variations. In both, more dangerous permissions had red
bars and less dangerous permissions had yellow bars. In the first variant, the more
dangerous a permission, the fuller its bar would be (showing the percentage of
raters who deemed the permission \emph{unacceptable}). These bars might look like \includegraphics[height=8pt]{img/Bars3/RedBig/YellowWhiteBar.png}\,,
\includegraphics[height=8pt]{img/Bars3/RedBig/OrangeWhiteBar.png}\,, and
\includegraphics[height=8pt]{img/Bars3/RedBig/RedWhiteBar.png}. In the second variation, the more
dangerous a permission, the more empty its bar (showing the percentage of raters who
deemed the permission \emph{acceptable}). These bars would look like
\includegraphics[height=8pt]{img/Bars3/RedSmall/YellowWhiteBar.png}\,,
\includegraphics[height=8pt]{img/Bars3/RedSmall/OrangeWhiteBar.png}\,, and
\includegraphics[height=8pt]{img/Bars3/RedSmall/RedWhiteBar.png}.
Both versions of this interface introduced the
better-or-worse phenomenon. It is possible that,
because all of the colors were ``warning colors'', the effectiveness of the color
differentiation was diminished. Additionally,
these colors could cause warning fatigue after continuous use.
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/bars/barsR4.png}
\end{center}
\end{wrapfigure}
To avoid these problems, we
introduced two-color bars. As before, each bar had some percentage of
a warning color (the percentage of raters who deemed the permission
unacceptable for the app); however, the rest of the
bar was green, to clarify meaning and limit warning fatigue.
There were four variants:
\label{ss-sec-pbars-r4}
%\begin{center}
%\includegraphics[width=.85\linewidth]{candidate-img/bars/barsR4.png}
%\end{center}
The first two interfaces used only red
and green (with two variants: red on the left or red on the right),
so the goodness of a rating was indicated only by the ratio of red to green.
Unfortunately, subjects thought these bars were
progress bars or ratings of the
permission's service quality.
The second two interfaces used red, orange, and yellow
along with the green, so the goodness of the permission
was indicated both by the ratio of the warning color to green \emph{and} by the warning color
used. As with the red-green interfaces, one of the interfaces had the green on the left (so
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/GreenYellowBar.png}\,,
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/GreenOrangeBar.png}\,, and
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/GreenRedBar.png}),
which we will call \emph{G-ROY} bars, and the
other had the warning color on the left (like
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/YellowGreenBar.png}\,,
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/OrangeGreenBar.png}\,, and
\includegraphics[height=8pt]{img/Bars4/RedYellowGreen/RedGreenBar.png}),
which we will call \emph{ROY-G} bars.
Unlike with the red-green-only
bars, subjects still understood that the ratings were privacy related, and,
unlike with the warning-color-only bars,
they understood which ratings were better and which were worse. One subject
said of the orange bar that ``It means to me that feelings about
this permission are mixed---about half of people think
it is acceptable and half think it is not
acceptable for this app to have that permission''. Thus we
subjected these interfaces to large-scale testing (\refsec{s-sec-largescale}).
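To make the rendering rule concrete, a two-color bar of this kind could be derived from a permission's approval percentage roughly as in the Python sketch below; the 50\% and 75\% thresholds are hypothetical and only illustrate the idea that the warning color sharpens as approval drops:
\begin{verbatim}
# Sketch only: the 50%/75% thresholds are hypothetical, chosen for illustration.
def groy_bar(approval_pct):
    """Describe a G-ROY bar: green fill = % approving, warning color = rest."""
    disapproval_pct = 100.0 - approval_pct
    if approval_pct >= 75.0:
        warning = "yellow"
    elif approval_pct >= 50.0:
        warning = "orange"
    else:
        warning = "red"
    return {"green_width": approval_pct,
            "warning_width": disapproval_pct,
            "warning_color": warning}

print(groy_bar(82.5))   # mostly green with a small yellow segment
print(groy_bar(40.0))   # mostly red
\end{verbatim}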
\newpage
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/traffic/trafficR2.png}
\end{center}
\end{wrapfigure}
\paragraph{Traffic Signs}
\label{s-sec-traffic}
The final set of interfaces we designed used traffic markers,
an iconography suggested by a subject who had seen one of the other interfaces.
%``...a stop sign, caution sign, and a green light?''
\label{ss-sec-traffic-r2}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/traffic/trafficR2.png}
%\end{center}
The traffic marker interface split the permissions into three categories, with headers above
each category.
This interface successfully communicated that the ratings were related to privacy,
but it exhibited a significant danger: the single green light gave subjects
the sense that all the permissions in the ``most acceptable'' category were
completely safe and did
not need to be examined at all, which is not necessarily the
intended meaning. Additionally, this interface
could be unsuitable for colorblind users.
\begin{wrapfigure}{r}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/traffic/trafficR3.png}
\end{center}
\end{wrapfigure}
To address colorblindness issues, we tried a variation using position
(as real traffic lights do). However, it still did not address
the problem of an overly-soothing green light. \\
%\label{ss-sec-traffic-r3}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/traffic/trafficR3.png}
%\end{center}
\begin{wrapfigure}{l}{0.5\textwidth}
\begin{center}
\includegraphics[width=.9\linewidth]{candidate-img/traffic/trafficR4.png}
\end{center}
\end{wrapfigure}
\label{ss-sec-traffic-r4}
%\begin{center}
%\includegraphics[width=.5\linewidth]{candidate-img/traffic/trafficR4.png}
%\end{center}
Rather than simply changing the colors of