
Conversation


@xvalec01 xvalec01 commented Aug 24, 2025

This PR updates the feature branch with part of the dashboarding system's functionality.
Until the feature branch is synced with the "page" branch, I will list the files of interest here:

┣ dashboard/
┃ ┃ ┣ docs/
┃ ┃ ┃ ┗ prettier_class_diagram.puml
┃ ┃ ┣ graphs/
┃ ┃ ┃ ┣ base.py
┃ ┃ ┃ ┣ cc_bar_chart.py
┃ ┃ ┃ ┣ cc_pie_chart.py
┃ ┃ ┃ ┗ registry.py
┃ ┃ ┣ callback.py
┃ ┃ ┣ cc_route.py
┃ ┃ ┣ data.py
┃ ┃ ┣ fips_route.py
┃ ┃ ┣ layout.py
┃ ┃ ┗ __init__.py

@xvalec01 xvalec01 marked this pull request as draft August 24, 2025 23:54
@xvalec01 xvalec01 changed the title feat: Prototype with dropdown menu for dynamically add graph with callback feat: Prototype with dropdown menu for dynamically adding graph with callback Aug 24, 2025
from sec_certs_page.dashboard.graphs.base import BaseGraph


class GraphRegistry:
Member
This is essentially a dictionary. Any reason for it being a class? The duplicate id check is the only one I see.
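(For illustration: a registry like the one under discussion can be sketched as little more than a dict with a duplicate-id guard. The names below are hypothetical, not the PR's actual implementation.)

```python
class PlotRegistry:
    """Minimal sketch: a dict keyed by plot id, plus a duplicate-id check."""

    def __init__(self):
        self._plots = {}

    def register(self, plot_id, plot):
        # The duplicate-id check is the only behavior beyond a plain dict.
        if plot_id in self._plots:
            raise ValueError(f"Duplicate plot id: {plot_id}")
        self._plots[plot_id] = plot

    def get(self, plot_id):
        return self._plots[plot_id]
```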

Author

I am working on this incrementally; I started with the basic functionality, so at this stage a plain dictionary would indeed suffice.
But the plans are more complex. You can see the overview of the planned system in sec_certs_page/dashboard/docs/prettier_class_diagram.puml.

This is still a draft. I plan to finish this PR with the base of the more complex functionality, including caching and storing the dashboards in the database.

Member

Ok makes sense. Let's see where it goes.

Comment on lines 23 to 80
        if self._df_cc is None:
            logger.info("Fetching Common Criteria dataset from MongoDB...")
            try:
                cursor = self.mongo.db.cc.find({})  # pyright: ignore[reportOptionalMemberAccess]
                data = list(cursor)
                if data:
                    self._df_cc = pd.DataFrame(data)
                else:
                    logger.warning("CC collection is empty.")
                    return pd.DataFrame()
            except Exception:
                logger.exception("Error fetching CC data from MongoDB")
                raise
        return self._df_cc

    def get_fips_dataframe(self) -> pd.DataFrame:
        """
        Fetches the FIPS dataset from the 'fips' collection in MongoDB.
        """
        if self._df_fips is None:
            logger.info("Fetching FIPS dataset from MongoDB...")
            try:
                cursor = self.mongo.db.fips.find({})  # pyright: ignore[reportOptionalMemberAccess]
                data = list(cursor)
                if data:
                    self._df_fips = pd.DataFrame(data)
                else:
                    logger.warning("FIPS collection is empty.")
                    return pd.DataFrame()
            except Exception:
                logger.exception("Error fetching FIPS data from MongoDB")
                raise
        return self._df_fips
Member

This will cache the dataframes for too long, even if the underlying data changes in the meantime. Also, making a dataframe this way from the whole dataset is not efficient (I don't know what it even does to the very deep and branching JSON structure).
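(For illustration: if nested Mongo documents do need to become a dataframe, `pd.json_normalize` handles deep structures more predictably than the plain `DataFrame` constructor, which leaves nested dicts as opaque object columns. A sketch with made-up documents:)

```python
import pandas as pd

# Made-up documents imitating nested certificate records.
docs = [
    {"_id": 1, "heuristics": {"cpe_matches": ["a", "b"]}, "status": "active"},
    {"_id": 2, "heuristics": {"cpe_matches": []}, "status": "archived"},
]

# DataFrame() keeps "heuristics" as a column of raw dicts:
df_plain = pd.DataFrame(docs)

# json_normalize flattens nested keys into dotted column names,
# e.g. "heuristics.cpe_matches":
df_flat = pd.json_normalize(docs)
```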

Author

I was trying to find a way to convert the data from MongoDB into something I can work with (probably CCDataset). Currently, the only way to do that is to load the JSON files produced by the certs processing, via CCDataset.from_json(). What I was thinking about is having something like from_dataframe(), from_pandas(), or from_mongodb(). We can chat about this on Matrix.

What I did here with a "cache" is temporary and will later be changed into a full-fledged cache. I could omit self._df_fips, but it signals that there will have to be some cache for future changes.

Member

We should discuss this a bit more as I think you are missing some core aspects of how the web works.

Generally, throughout the whole web implementation, we do not really work with the objects from the core library (such as CCDataset or CommonCriteriaCertificate) other than when the weekly processing is done. During this weekly processing, the certificate objects are taken and converted to a format that can be stored in MongoDB. This is done by the objformats module. The conversion is not direct, as there are some nuances due to nested types and MongoDB. After this is done, the web basically never goes back to handling the objects again; everything is done on a lower "format". (Storage <-> Working <-> Raw <-> Obj)

I think we should follow this also in the dash integration.
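(For illustration: the Storage <-> Working layering described above could look roughly like this in the dash code. These are hypothetical helpers, not the actual objformats API.)

```python
from datetime import date

# Hypothetical illustration of moving one field between layers.
# Storage format: MongoDB-safe primitives (strings, lists).
storage_doc = {"not_valid_before": "2020-01-15", "cert_ids": ["X", "Y"]}

def storage_to_working(doc):
    """Working format: usable Python types (date, set), still a plain dict,
    never a CCDataset or CommonCriteriaCertificate object."""
    return {
        "not_valid_before": date.fromisoformat(doc["not_valid_before"]),
        "cert_ids": set(doc["cert_ids"]),
    }

working = storage_to_working(storage_doc)
```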

The caching is for another discussion. Currently, there are several things that get (re)computed on demand anytime the certificate dataset changes, but not more. This is handled here. Basically, using several ContextVariables (these are thread-local) we check whether we have fresh data every time we access it, and if not, we recompute. I think a similar approach should be done here. Perhaps we need to generalize this into some common functionality and use it across CC, PP, FIPS and the dash outputs.

Actually, the data that gets precomputed this way was used to generate the charts on the analysis page, before they were replaced with the static exports some time ago. Now, this data is only used to get the subgraph a particular certificate belongs to, so that it can be rendered on the page quickly.
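(For illustration: the recompute-on-change pattern described here can be sketched roughly as below. A simple version counter stands in for the ChangeStream wiring; all names are illustrative.)

```python
import contextvars

# Thread-local cache slot holding (data_version, value).
_cached = contextvars.ContextVar("cached", default=None)

def current_version():
    # Stand-in for "has the weekly processing run again?";
    # in the real setup this would come from watching the collection.
    return current_version.value
current_version.value = 0

def get_computed_data(compute):
    """On every access, recompute only if the backing data version changed."""
    entry = _cached.get()
    version = current_version()
    if entry is None or entry[0] != version:
        entry = (version, compute())
        _cached.set(entry)
    return entry[1]
```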

Author

Okay, I agree. I felt I had missed something, because using it the way I tried was quite off. So, can I create a TypedDict / dataclass representing the raw/working data I can obtain from MongoDB? I want to know which data I can count on when doing the analysis and which I have to process myself. When I inspect the data, I only see what happens to be in the database; the data I have worked with so far came from a processed dataset. I would like some overview, in code, of the data present in the database, mainly the existing keys. There may be a reason why it does not exist yet.

I will think about how the caching could be done to match the current approach. Would it make sense to invalidate cached data when the ContextVariable changes, with the data saved in Redis keyed by the user-created query?

Member

Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a load on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not CommonCriteriaCertificate or Heuristics objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.

Wrt. caching. The current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether new weekly processing was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it I am not sure whether ContextVariables are even necessary here as uWSGI workers are processes and not threads, thus they will get their own view of the variables even if they were just basic globals, provided they were created after it forks the workers. An alternative to the ChangeStream approach would be just a simple lookup in the cc_log collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key one could skip the whole ChangeStream setup which I think is kind of clunky and too generic.

I guess it may make sense to save stuff to Redis, as it will also make the cache machine-wide so that the multiple worker processes do not need to compute the same thing themselves. However, one must keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests coming to random workers, so consistency should be ensured. Btw. there is FlaskCaching setup here which can be directly used and is backed by Redis. This is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here. It is backed by a different database to the caching one, so no collisions.
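(For illustration: the cc_log-based cache-key alternative mentioned here could look roughly like this sketch. A plain dict stands in for Redis/FlaskCaching, and all names are hypothetical.)

```python
_cache = {}  # stands in for Redis / FlaskCaching memoization

def latest_run_id(log_collection):
    """Stand-in for looking up the newest weekly-processing entry in cc_log."""
    return max(entry["run_id"] for entry in log_collection)

def cached_query(query, log_collection, compute):
    # Keying on the last processing run means the cache invalidates itself
    # whenever a new weekly run lands, with no ChangeStream setup needed.
    key = (query, latest_run_id(log_collection))
    if key not in _cache:
        _cache[key] = compute(query)
    return _cache[key]
```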

Member

Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram, it is unclear to me whether some things will work out with how the site currently works.

Author

> Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a load on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not CommonCriteriaCertificate or Heuristics objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.

I have quite an old snapshot of the database, but I do have one. I can download the dataset from the web and update the database with it, right? That way I can be sure I am working with the correct data. Or do you think it would be easier if you provided me with the snapshot?

> Wrt. caching. The current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether new weekly processing was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it I am not sure whether ContextVariables are even necessary here as uWSGI workers are processes and not threads, thus they will get their own view of the variables even if they were just basic globals, provided they were created after it forks the workers. An alternative to the ChangeStream approach would be just a simple lookup in the cc_log collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key one could skip the whole ChangeStream setup which I think is kind of clunky and too generic.
>
> I guess it may make sense to save stuff to Redis, as it will also make the cache machine-wide so that the multiple worker processes do not need to compute the same thing themselves. However, one must keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests coming to random workers, so consistency should be ensured. Btw. there is FlaskCaching setup here which can be directly used and is backed by Redis. This is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here. It is backed by a different database to the caching one, so no collisions.

I was planning to use the instance of FlaskRedis for caching, but I am unsure because view-based caching may be enough. As you suggested in the following comment, it would be nice if we arranged a meeting about this. I can skip the caching for now in my implementation, and we can add it later after we agree on the approach.

Author

> Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram, it is unclear to me whether some things will work out with how the site currently works.

Yes, this is a reasonable level of detail that I need. I would like to understand the core so I can follow the idea of the current system and its purpose.

@J08nY J08nY deleted the branch crocs-muni:feat/page-dash September 8, 2025 13:55
@J08nY J08nY closed this Sep 8, 2025

J08nY commented Sep 8, 2025

Sorry about the accidental close, this got closed due to the other branch being deleted after merge.

@J08nY J08nY reopened this Sep 8, 2025

J08nY commented Sep 8, 2025

I think you can rebase this on top of current page branch, as the feat/dash-page got merged. I didn't really think before merging it, but now it is too late.


J08nY commented Sep 9, 2025

One general comment, which may seem nitpicky, but I would appreciate if you used "plot" instead of "graph" to describe the plots. The thing is that graph can be understood as a network/graph and not a plot. Within our context there is actually a reference graph/network so it may be misleading. So either plot or chart.

@xvalec01 (Author)

> One general comment, which may seem nitpicky, but I would appreciate if you used "plot" instead of "graph" to describe the plots. The thing is that graph can be understood as a network/graph and not a plot. Within our context there is actually a reference graph/network so it may be misleading. So either plot or chart.

Okay, I will get rid of the "graph" across the dashboard part.

@xvalec01 xvalec01 force-pushed the feat/page-dash-update branch from e66c425 to b3e6607 Compare September 16, 2025 20:19
@J08nY J08nY added the enhancement New feature or request label Sep 25, 2025

J08nY commented Sep 26, 2025


J08nY commented Oct 13, 2025

If possible @xvalec01 could you rebase on top of the page branch? The #522 PR with better tests landed. Soon, the user accounts PR will land, and that will be a good time to rebase, so that you can hook into the user accounts.

@xvalec01 (Author)

> If possible @xvalec01 could you rebase on top of the page branch? The #522 PR with better tests landed. Soon, the user accounts PR will land, and that will be a good time to rebase, so that you can hook into the user accounts.

Will do!

J08nY and others added 16 commits October 13, 2025 18:29
- Created system class diagrams (v1 and v2) to illustrate the architecture of the dashboard application.
- Implemented a base filter class and dynamic filter factory for generating filters based on dataset structure.
- Developed specific filter components: DynamicDropdownFilter, DynamicTextSearchFilter, and DynamicRangeSliderFilter.
- Established a filter registry to manage filter instances.
- Enhanced DashboardLayoutManager with layout building and home page registration functionality.
- Added print statements for debugging layout construction and registration processes.
@xvalec01 xvalec01 force-pushed the feat/page-dash-update branch from 87b14fe to 368c953 Compare October 13, 2025 16:35

J08nY commented Oct 17, 2025

Thanks for the work. However, I am not sure how we will handle the fact that this is a merge into the feat/page-dash branch, which is now outdated: it was already merged into page, and page now has a bunch of stuff on top of it.


J08nY commented Oct 17, 2025

I have rebased this branch on top of page and pushed that into the crocs-muni:feat/page-dash-update branch.
