API spec review: UserActivityHistory #5260


Open
wants to merge 2 commits into
base: feature/UserActivityHistoryAPI
266 changes: 266 additions & 0 deletions specs/UserActivityHistory/UserActivityHistoryAPI.md
@@ -0,0 +1,266 @@
UserActivityHistory
Contributor

I have a vague concern about poisoning of history here to trick agents into doing bad things. there may be "nothing" here, but we should threat model it out.

The problem is that the user has no visibility into what UserActivityHistory items an app is saving, and the agent is probably dumb enough to be easily tricked by malformed items.

Basically, a low-privileged app (like a UWP) adds a UserActivityHistory item that claims to be something interesting (include a display string with juicy keywords). It also includes a URI that is malicious (note "malicious" might not mean it actively harms the user directly; it might be malicious in the sense that it furthers phishing attempts or something). Now when the user asks Copilot a query, Copilot finds the (fake) UserActivityHistory item and invokes it on behalf of the user, which ends up somewhere "bad."

The malicious app can't pull this off directly itself, because launching the bad URI either (1) is blocked by UWP security or (2) would look out-of-place when called directly by the app. But by having it open out of context, is it bad?

(Like I said, kind of a vague concern that may not be unique to agents or to this feature or whatever... just I worry about bad actors poisoning the inputs the CUA reasons over.)

===

# Background

The [UserActivity](https://learn.microsoft.com/uwp/api/windows.applicationmodel.useractivities.useractivity)
Contributor

We need to update the docs on UserActivity as well to drop references to Timeline (yes, still there) and instead reference AI and stuff.

class records activities that the user is currently engaged in on their computer - e.g., browsing
a website or reading a Word document.
This allows Windows to have insight into the application state, enabling smart experiences that are
built around the semantics of the app. For example, a document editor can give Windows information
about the document that the user is editing, so that Recall can later take the user to the document
at the same location.

To record user activity, an app uses [UserActivityChannel](https://learn.microsoft.com/en-us/uwp/api/windows.applicationmodel.useractivities.useractivitychannel?view=winrt-26100)
Contributor

Why is this information here, rather than in the docs for how to use UserActivity? The person using this API wants to query existing ones, not necessarily create new ones.

to retrieve a UserActivity object via the API [GetOrCreateUserActivityAsync](https://learn.microsoft.com/en-us/uwp/api/windows.applicationmodel.useractivities.useractivitychannel.getorcreateuseractivityasync?view=winrt-26100#windows-applicationmodel-useractivities-useractivitychannel-getorcreateuseractivityasync(system-string)).
If a UserActivity with the given ID already exists, it will be returned; otherwise, a new UserActivity
object will be created and returned. You can then call [CreateSession](https://learn.microsoft.com/en-us/uwp/api/windows.applicationmodel.useractivities.useractivity.createsession?view=winrt-26100#windows-applicationmodel-useractivities-useractivity-createsession)
to return a [UserActivitySession](https://learn.microsoft.com/en-us/uwp/api/windows.applicationmodel.useractivities.useractivitysession?view=winrt-26100)
object that tracks how long the user is engaged in that activity. This structure allows multiple
sessions to be associated with the same activity, representing the case where the user completes
that activity a bit at a time - e.g., beginning to watch a movie, then pausing, then watching more later.
These will be treated as the same singular user activity that spans multiple sessions.

UserActivityHistory is a new set of APIs that allow you to query up to the past 28 days of the
user's activity history, which will enable you to bring back content that the user has previously
been interacting with.

# Conceptual pages (How To)

The intended use case of this API is to allow you to make search queries against the user's activity history
on their local computer. The string matching in the query parameters in this API is lexical in nature,
Contributor

Very wordy. Suggest more like "This API uses simple string matching for searches; it does not support natural language. If you want to support natural language search scenarios, use <some other WCR API?>."

meaning that it is expected that any natural language semantic parsing of the user's input will be done by
your app prior to calling this API.

For example, if a user types in something along the lines of, "Please find the Korean recipe I was looking at
earlier today", your app might have an agentic AI parse that input and determine that the user is looking
Contributor

Is "agentic" meaningful here?

for a webpage that contains the keywords "Korean" and "recipe", and construct a query with those keywords,
a content type of "text/html", and an access time within the last 24 hours.

In order for an app to make use of this API, it must be Windows logo certified, and the user must provide
Member

link to "how do I get the Windows logo certification" page?

Contributor

Also, need to take into account whatever consent model we end up with - one time, every time, does it include context, etc?

their consent to allow access to their activity history. If either of these is not the case, the API will
throw an exception.
Contributor

The API will fail - the exception is specific to the language projection.
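
A minimal sketch of how a caller might handle the access failure described above. The spec does not name the error, so this assumes the failure surfaces in C# as `UnauthorizedAccessException` (the usual projection of `E_ACCESSDENIED`). `ActivityHistoryAccess` and `TrySearch` are hypothetical helper names, and the search call is passed in as a delegate so the sketch does not depend on the proposed API:

```c#
using System;
using System.Collections.Generic;

public static class ActivityHistoryAccess
{
    // Runs the supplied search and reports whether access was granted.
    // Assumption: a missing certification or consent surfaces as
    // UnauthorizedAccessException; the spec does not specify the error.
    public static bool TrySearch<T>(Func<IList<T>> search, out IList<T> results)
    {
        try
        {
            results = search();
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            // Return an empty list so callers can treat "no access" like "no results".
            results = new List<T>();
            return false;
        }
    }
}
```

A caller would wrap the real query in the delegate and show a consent or settings prompt when `TrySearch` returns `false`.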


# API Pages

## UserActivityHistory class

This class provides static methods that enable you to query the user's activity history.
This activity history is stored in a database managed by a local service, and these APIs
Contributor

Is there a reason why we talk about implementation? If the implementation changes, will we break something?

call out to that service to retrieve data from the database.

Here is an example usage of the class that will enable you to bring back the webpage for a
Korean recipe that the user had previously interacted with within the last day:

```c#
UserActivityHistoryQuery query = new();
Contributor

Since this is the first time we're seeing code, can we show the API to request access, too? Are you just relying on the AppCapability class? Although I think the UX for consent is more dynamic, so it is probably part of the API call itself. Big open question.

query.Keywords = new string[] { "Korean", "recipe" };
// Only include activities that started within the last day.
query.EarliestStartTime = DateTime.Now.AddDays(-1);

IList<UserActivityHistoryItem> results = UserActivityHistory.Search(
    new UserActivityHistoryQuery[] { query },
Contributor

Should this be an overload? Seems strange to have to create an array for a single search item.

Member

RECOMMEND: Add an overload for 1 query

    UserActivityHistoryOrderBy.DwellTime,
    maxResults: 1);
Contributor

Why name the parameter?


UserActivityHistoryItem item = results.FirstOrDefault();

Member

RECOMMEND: Remove blank line

if (item != null)
{
    // Now we can use item.ActivationUri to bring back the webpage in the state in which the user
    // was last viewing it.
}
```

## UserActivityHistory.Search method

This method synchronously queries the user's activity history and returns a list of items matching
Contributor

Do we need to mention "synchronously" everywhere? It should be assumed that unless the API ends with "Async" that it is synchronous

the criteria specified in the `queries` parameter. The results are sorted in descending order
according to the `orderBy` parameter: by most recent start time, most recent
end time, or longest time spent on the activity.

The results will include the user activity history items that match any one of the queries passed in.
To match a given query, the item must match all of the criteria specified in that query. Any query
property that is left empty or null will be ignored. A case-insensitive lexical search will be
performed on the keywords in the query, which will match if all of the keywords are found somewhere
in the DisplayText property of the item.
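
The matching rules above (any query may match, and every keyword within a query must match) can be modeled locally. This is only a sketch of the described semantics, not the service's implementation: `HistoryItemSketch`, `KeywordQuerySketch`, and `KeywordMatching` are hypothetical stand-ins for the proposed types, and the ordinal case-insensitive comparison is an assumption, since the spec does not say whether matching is ordinal or locale-sensitive.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the proposed types; only the fields used here.
public record HistoryItemSketch(string DisplayText);
public record KeywordQuerySketch(string[] Keywords);

public static class KeywordMatching
{
    // An item matches a query when every keyword occurs somewhere in DisplayText.
    // A null or empty Keywords array is ignored, so it matches every item.
    // Assumption: ordinal, case-insensitive comparison.
    public static bool Matches(HistoryItemSketch item, KeywordQuerySketch query) =>
        query.Keywords == null ||
        query.Keywords.All(k =>
            (item.DisplayText ?? "").IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);

    // An item is returned when it matches at least one of the queries.
    public static IEnumerable<HistoryItemSketch> Search(
        IEnumerable<HistoryItemSketch> items,
        IEnumerable<KeywordQuerySketch> queries) =>
        items.Where(i => queries.Any(q => Matches(i, q)));
}
```

With a query for "korean" and "RECIPE", an item titled "Korean BBQ Recipe - Food Blog" matches, while an item titled "Korean dictionary" does not, because only one of the two keywords is present.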

This method will not perform any parsing of the keywords for semantic meaning or natural language -
it is expected that the app will have already performed that step and will pass the result of that
into this API.

A case-insensitive lexical search will also be performed on the ContentType property of the item,
which is the [MIME type](https://docs.w3cub.com/http/basics_of_http/mime_types/complete_list_of_mime_types.html)
of the resource the user was interacting with. The search will match if the ContentType of the item matches
the ContentType property of the query. The ContentType supports using an asterisk as a wildcard
to match a range of content types - e.g., "image/*" will match "image/png", "image/jpeg", etc.
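
As an illustration of the wildcard rule, here is a small sketch of content-type matching. It assumes at most one asterisk per pattern and an ordinal case-insensitive comparison, neither of which the spec states explicitly. `ContentTypeMatching` is a hypothetical helper, not part of the proposed API.

```c#
using System;

public static class ContentTypeMatching
{
    // pattern is the query's ContentType (e.g. "image/*"); contentType is the item's.
    // A null or empty pattern is treated as "match anything", mirroring the rule
    // that empty query properties are ignored.
    public static bool Matches(string pattern, string contentType)
    {
        if (string.IsNullOrEmpty(pattern)) return true;
        if (contentType == null) return false;

        int star = pattern.IndexOf('*');
        if (star < 0)
        {
            // No wildcard: the whole string must match, ignoring case.
            return string.Equals(pattern, contentType, StringComparison.OrdinalIgnoreCase);
        }

        // Assumption: a single '*' that stands for any run of characters.
        string prefix = pattern.Substring(0, star);
        string suffix = pattern.Substring(star + 1);
        return contentType.Length >= prefix.Length + suffix.Length
            && contentType.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            && contentType.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);
    }
}
```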

## UserActivityHistory.GetAppsWithUserActivity method

This method synchronously retrieves a list of all the app names with data in the user's activity
history database. You can use this, for example, to show the user the list of apps that are being
queried against, so the user can understand why an app that is not recording user activity is not
Member

...why an app not recording isn't showing up in the list of apps that recorded...

Typo? Should be "...what an app that is not recording user activity is not showing up..." ?

showing up in the results.

## UserActivityHistoryItem class

This class represents a single item in the user's activity history. It contains properties that
describe in what app the activity occurred, how the app described the activity, what sort of
resource was being interacted with (e.g., a document, a webpage, a video, etc.), the URI of
the resource involved in the activity, the URI that can be used to bring back the state the user
left the activity in, and the times when the user started and ended the activity.

If the user performed the same activity multiple times, there will be multiple
`UserActivityHistoryItem` objects returned, each with different start and end times.

## UserActivityHistoryItem.AppName property

This property contains the name of the app in which the activity occurred.
Member

what kind of name is this? a PFN? an AUMID? display name? exe path?
Is this something Windows infers from the caller, or the app provided this when recording the UserActivity?

Member Author

My current implementation contains the exe path, but it would be even better if there were a way to get the display name. I can't immediately find one. We have to infer this from the caller; the UserActivity object does not have this property anywhere.

Member

we can use CallerIdentity or similar (e.g. CoGetCallContext) to get an AUMID. When an app receives this AUMID they can find the display name of that AUMID for display purposes. We could choose to store the display name too, because the app might get uninstalled sometime after capturing the user activity and before querying it (what happens with that app's user activity history, does it get deleted?)

In any case let's take an action item to update the wording here once we have a solid caller id implementation

Contributor

Wrinkle: conversion from PFN to Display Name should happen in which context? Ideally, it is in the CUA's context so it is localized to match the CUA. But Start Menu might show a different localization so you might not be able to find it. I don't know what the right answer is.

Member

My current implementation contains the exe path

Is this an unpackaged app or packaged?

If packaged you'll want to record the package full name. Given that you can lookup its DisplayName (and Logo) localized for the current user to view.

Is this historical? Does UserActivityHistory retain information recorded by apps after they're uninstalled? If so then you can't guarantee looking up the DisplayName. If so there are options but they have caveats so I'll wait to hear if relevant before saying more.

but it would be even better if there were a way to get the display name

p = packageManager.FindPackageForUser("", pkgfullname)
string displayName = p.DisplayName

returns the package's DisplayName localized for the calling user. Are there cases where the package isn't registered for the calling user?


## UserActivityHistoryItem.ActivityId property

This property contains the ID of the activity, which can be used to collate multiple sessions
of the same activity. For example, if the user watched a movie in multiple sessions, the
Contributor

Is watching a video an actual scenario supported by any apps we know that report Activity History? Is it the most interesting one?

I would expect a more obvious one would be opening the same Word document 5 times in a week, and them all being related somehow. Or visiting the same website (like your e-mail) every day. And so on.

`ActivityId` property can be used to identify how long in total the user spent watching that movie.
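
A sketch of the collation described above: grouping history items by `ActivityId` and summing each session's duration. `SessionSketch` and `ActivityTotals` are hypothetical stand-ins that carry only the fields needed. The sketch also assumes all items come from the same app, since the spec does not say whether activity IDs are unique across apps.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for UserActivityHistoryItem; only the fields used here.
public record SessionSketch(string ActivityId, DateTimeOffset StartTime, DateTimeOffset EndTime);

public static class ActivityTotals
{
    // Sums (EndTime - StartTime) over every session that shares an ActivityId.
    public static Dictionary<string, TimeSpan> TotalTimeByActivity(
        IEnumerable<SessionSketch> sessions) =>
        sessions
            .GroupBy(s => s.ActivityId)
            .ToDictionary(
                g => g.Key,
                g => g.Aggregate(TimeSpan.Zero, (sum, s) => sum + (s.EndTime - s.StartTime)));
}
```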

## UserActivityHistoryItem.DisplayText property

This property contains the app's own description of the activity. For example,
if the activity was reading the contents of a webpage, this property might contain the webpage's title.

## UserActivityHistoryItem.ContentType property

This property contains the MIME type of the content being interacted with. For example, if the user
was looking at a PNG image, this property would contain the string "image/png".

Note that this is a string property, not an enum, so apps that populate this property
do not necessarily have to use existing recognized common MIME types.

## UserActivityHistoryItem.ContentUri property

This property contains the URI of the content being interacted with. For example, if the user was
looking at a webpage, this property would contain the URI of that webpage.

## UserActivityHistoryItem.ActivationUri property

This property contains the URI that can be used to bring back the state the user left the activity in.
For example, if the user was looking at a webpage, this property would contain the URI of that webpage
with additional information, such as what the scroll position was, etc.

If this property is not populated on the UserActivity object, then we won't add it to the user activity
Contributor

Can the logic be that if any of the properties [ContentUri, ActivationUri, FallbackUri] are set, it will be in the database? The current wording implies you need more than one of them.

Then ideally for UserActivityHistoryItem there are only two URI properties: One is just ItemUri that is either the ActivationUri of the original item (if available) or ContentUri (if Activation is not available) or null (if neither was specified). And then there's a FallbackUri (in case app is no longer installed).

history database, as we won't be able to bring back that activity.

## UserActivityHistoryItem.StartTime property

This property contains the time when the user started the activity session.

## UserActivityHistoryItem.EndTime property

This property contains the time when the user ended the activity session.

## UserActivityHistoryQuery class

This class is used to specify criteria for what portion of the user's activity history you want to
retrieve. It allows you to specify keywords to search for, content types to filter by, and time ranges
to filter by.

## UserActivityHistoryQuery.Keywords property

This property is an array of keywords, each of which is used to lexically search against the
DisplayText column in the database. Keywords are case-insensitive, and results returned will
Contributor

Is it an ordinal search, or a search based on a specific locale? If it's locale-sensitive, hopefully it uses the locale of the caller.

be those that contain all of the keywords in the array.

## UserActivityHistoryQuery.ContentType property

This property is a string that specifies the content type associated with the activity you want
to retrieve. It allows the inclusion of an asterisk as a wildcard - e.g., "image/*" will match
all content types beginning with "image/", such as "image/png", "image/jpeg", etc.
This property is case-insensitive.
Contributor

I'd make it clear you can leave it null / empty-string to match any content.


## UserActivityHistoryQuery.EarliestStartTime property

This is an optional DateTime property that specifies the earliest start time of the activity
you want to retrieve. Any activities with a StartTime property earlier than this will be excluded.
If this property is unspecified, it will be ignored.

## UserActivityHistoryQuery.EarliestEndTime property

This is an optional DateTime property that specifies the earliest end time of the activity
you want to retrieve. Any activities with an EndTime property earlier than this will be excluded.
If this property is unspecified, it will be ignored.

## UserActivityHistoryQuery.LatestStartTime property

This is an optional DateTime property that specifies the latest start time of the activity
you want to retrieve. Any activities with a StartTime property later than this will be excluded.
If this property is unspecified, it will be ignored.

## UserActivityHistoryQuery.LatestEndTime property

This is an optional DateTime property that specifies the latest end time of the activity
you want to retrieve. Any activities with an EndTime property later than this will be excluded.
If this property is unspecified, it will be ignored.
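
Taken together, the four time properties describe a window on an item's start and end times. The sketch below models that filter under two assumptions: the bounds are inclusive (an item whose time equals a bound is kept, since only strictly earlier or later items are described as excluded), and WinRT `DateTime` is represented as `DateTimeOffset`, its usual C# projection. `TimeBoundsSketch` and `TimeFilter` are hypothetical names.

```c#
using System;

// Hypothetical stand-in for the four optional time properties of the query.
public record TimeBoundsSketch(
    DateTimeOffset? EarliestStartTime = null,
    DateTimeOffset? EarliestEndTime = null,
    DateTimeOffset? LatestStartTime = null,
    DateTimeOffset? LatestEndTime = null);

public static class TimeFilter
{
    // Returns true when the item's times satisfy every bound that is set.
    // Unset (null) bounds are ignored.
    public static bool InRange(DateTimeOffset start, DateTimeOffset end, TimeBoundsSketch b) =>
        (b.EarliestStartTime == null || start >= b.EarliestStartTime) &&
        (b.LatestStartTime == null || start <= b.LatestStartTime) &&
        (b.EarliestEndTime == null || end >= b.EarliestEndTime) &&
        (b.LatestEndTime == null || end <= b.LatestEndTime);
}
```

For example, "activities that started within the last day" corresponds to setting only `EarliestStartTime` to the current time minus one day.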

## UserActivityHistoryOrderBy enum

This enum specifies what property the results should be ordered by. The options are as follows:

| Name | Description |
|-|-|
| StartTime | Results will be in descending order of their StartTime property |
| EndTime | Results will be in descending order of their EndTime property |
| DwellTime | Results will be in descending order of the difference between their EndTime and StartTime properties |
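
The three orderings in the table can be sketched with LINQ. This is an illustrative model, not the service's implementation: `OrderBySketch`, `TimedItemSketch`, and `ResultOrdering` are hypothetical stand-ins for the proposed types, and applying `maxResults` after sorting is an assumption about how the two parameters interact.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins mirroring the enum and the item's time properties.
public enum OrderBySketch { StartTime, EndTime, DwellTime }
public record TimedItemSketch(string DisplayText, DateTimeOffset StartTime, DateTimeOffset EndTime);

public static class ResultOrdering
{
    // Sorts in descending order by the chosen key, then keeps the first maxResults items.
    public static List<TimedItemSketch> Order(
        IEnumerable<TimedItemSketch> items, OrderBySketch orderBy, int maxResults)
    {
        IEnumerable<TimedItemSketch> ordered = orderBy switch
        {
            OrderBySketch.StartTime => items.OrderByDescending(i => i.StartTime),
            OrderBySketch.EndTime => items.OrderByDescending(i => i.EndTime),
            // Dwell time is the difference between EndTime and StartTime.
            OrderBySketch.DwellTime => items.OrderByDescending(i => i.EndTime - i.StartTime),
            _ => throw new ArgumentOutOfRangeException(nameof(orderBy)),
        };
        return ordered.Take(maxResults).ToList();
    }
}
```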

# API Details

```c# (but really MIDL3)
namespace Microsoft.Windows.ApplicationModel.UserActivities
Contributor

Q: Should we put this somewhere else? It's not a general-purpose API that anyone can use. It's specific to AI scenarios and will be VERY locked down as to who can call it. Do we have a top-level "User context stuff useful for AI" namespace? Do we need one?

Member

Beat me to it.

Microsoft.Windows.AI.UserActivities seems more apt

{
    runtimeclass UserActivityHistory
    {
        static IVector<UserActivityHistoryItem> Search(
Contributor

Should we have an overload for a single query?

            UserActivityHistoryQuery[] queries,
            UserActivityHistoryOrderBy orderBy,
            UInt32 maxResults);
Contributor

Is there a material benefit to passing maxResults (e.g. perf? or making the user feel better if we include this as part of the consent prompt?).


        static IAsyncOperation<IVector<UserActivityHistoryItem> > SearchAsync(
            UserActivityHistoryQuery[] queries,
            UserActivityHistoryOrderBy orderBy,
            UInt32 maxResults);

        static IVector<String> GetAppsWithUserActivity();
Contributor

Should this be a list of ProductIds rather than just strings? What data do we have from unpackaged apps recording activities (like Office)?


        static IAsyncOperation<IVector<String>> GetAppsWithUserActivityAsync();
    }

    runtimeclass UserActivityHistoryItem
    {
        String AppName { get; };
Member

What is AppName?

If a packaged app is this the app's AUMID (programmatic id) or DisplayName (localized string for human consumption)?

Does this API support unpackaged apps?

        String ActivityId { get; };
        String DisplayText { get; };
        String ContentType { get; };
        String ContentUri { get; };
        String ActivationUri { get; };
        DateTime StartTime { get; };
        DateTime EndTime { get; };
    }

    runtimeclass UserActivityHistoryQuery
    {
        UserActivityHistoryQuery();

        String[] Keywords;
        String ContentType;
        IReference<DateTime> EarliestStartTime;
        IReference<DateTime> EarliestEndTime;
        IReference<DateTime> LatestStartTime;
        IReference<DateTime> LatestEndTime;
    }

    enum UserActivityHistoryOrderBy
    {
        StartTime,
        EndTime,
        DwellTime
    };
}
```