
Conversation


@xvalec01 xvalec01 commented Aug 24, 2025

This PR updates the feature branch with part of the dashboarding system's functionality.
Until the feature branch is synced with the "page" branch, I will list the files of interest here:

┣ dashboard/
┃ ┃ ┣ docs/
┃ ┃ ┃ ┗ prettier_class_diagram.puml
┃ ┃ ┣ graphs/
┃ ┃ ┃ ┣ base.py
┃ ┃ ┃ ┣ cc_bar_chart.py
┃ ┃ ┃ ┣ cc_pie_chart.py
┃ ┃ ┃ ┗ registry.py
┃ ┃ ┣ callback.py
┃ ┃ ┣ cc_route.py
┃ ┃ ┣ data.py
┃ ┃ ┣ fips_route.py
┃ ┃ ┣ layout.py
┃ ┃ ┗ __init__.py

@xvalec01 xvalec01 marked this pull request as draft August 24, 2025 23:54
@xvalec01 xvalec01 changed the title feat: Prototype with dropdown menu for dynamically add graph with callback feat: Prototype with dropdown menu for dynamically adding graph with callback Aug 24, 2025
from sec_certs_page.dashboard.graphs.base import BaseGraph


class GraphRegistry:
Member
This is essentially a dictionary. Any reason for it being a class? The duplicate id check is the only one I see.
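(For illustration: a registry like the one under discussion can be sketched as little more than a dict with a duplicate-id guard. The names below are hypothetical, not the PR's actual implementation.)

```python
class PlotRegistry:
    """Minimal sketch: a dict keyed by plot id, plus a duplicate-id check."""

    def __init__(self):
        self._plots = {}

    def register(self, plot_id, plot):
        # The duplicate-id check is the only behavior beyond a plain dict.
        if plot_id in self._plots:
            raise ValueError(f"Duplicate plot id: {plot_id}")
        self._plots[plot_id] = plot

    def get(self, plot_id):
        return self._plots[plot_id]
```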

Author

I am working on this incrementally; I started with the basic functionality, so at this stage a plain dictionary would indeed suffice.
But the plans are more complex. You can see the overview of the planned system in sec_certs_page/dashboard/docs/prettier_class_diagram.puml.

This is still a draft. I plan to finish this PR with the base of the more complex functionality, including caching and storing the dashboards in the database.

Member

Ok makes sense. Let's see where it goes.

Comment on lines 23 to 80
        if self._df_cc is None:
            logger.info("Fetching Common Criteria dataset from MongoDB...")
            try:
                cursor = self.mongo.db.cc.find({})  # pyright: ignore[reportOptionalMemberAccess]
                data = list(cursor)
                if data:
                    self._df_cc = pd.DataFrame(data)
                else:
                    logger.warning("CC collection is empty.")
                    return pd.DataFrame()
            except Exception:
                logger.exception("Error fetching CC data from MongoDB")
                raise
        return self._df_cc

    def get_fips_dataframe(self) -> pd.DataFrame:
        """
        Fetches the FIPS dataset from the 'fips' collection in MongoDB.
        """
        if self._df_fips is None:
            logger.info("Fetching FIPS dataset from MongoDB...")
            try:
                cursor = self.mongo.db.fips.find({})  # pyright: ignore[reportOptionalMemberAccess]
                data = list(cursor)
                if data:
                    self._df_fips = pd.DataFrame(data)
                else:
                    logger.warning("FIPS collection is empty.")
                    return pd.DataFrame()
            except Exception:
                logger.exception("Error fetching FIPS data from MongoDB")
                raise
        return self._df_fips
Member

This will cache the dataframes for too long, even if the underlying data changes in the meantime. Also, making a dataframe this way from the whole dataset is not efficient (I don't know what it even does to the very deep and branching JSON structure).
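(For illustration: if nested Mongo documents do need to become a dataframe, `pd.json_normalize` handles deep structures more predictably than the plain `DataFrame` constructor, which leaves nested dicts as opaque object columns. A sketch with made-up documents:)

```python
import pandas as pd

# Made-up documents imitating nested certificate records.
docs = [
    {"_id": 1, "heuristics": {"cpe_matches": ["a", "b"]}, "status": "active"},
    {"_id": 2, "heuristics": {"cpe_matches": []}, "status": "archived"},
]

# DataFrame() keeps "heuristics" as a column of raw dicts:
df_plain = pd.DataFrame(docs)

# json_normalize flattens nested keys into dotted column names,
# e.g. "heuristics.cpe_matches":
df_flat = pd.json_normalize(docs)
```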

Author

I was trying to find a way to convert the data from MongoDB into something I can work with (probably CCDataset). Currently, the only way to do that is to load the JSON files produced by the certs processing, via CCDataset.from_json(). What I was thinking about is having something like from_dataframe(), from_pandas(), or from_mongodb(). We can chat about this on Matrix.

What I did here with a "cache" is temporary and will later be changed into a full-fledged cache. I could omit self._df_fips, but it signals that there will have to be some cache for future changes.

Member

We should discuss this a bit more as I think you are missing some core aspects of how the web works.

Generally, throughout the whole web implementation, we do not really work with the objects from the core library (such as CCDataset or CommonCriteriaCertificate) other than when the weekly processing is done. During this weekly processing, the certificate objects are taken and converted to a format that can be stored in MongoDB. This is done by the objformats module. The conversion is not direct, as there are some nuances due to nested types and MongoDB. After this is done, the web basically never goes back to handling the objects again; everything is done on a lower "format". (Storage <-> Working <-> Raw <-> Obj)

I think we should follow this also in the dash integration.
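(For illustration: the Storage <-> Working layering described above could look roughly like this in the dash code. These are hypothetical helpers, not the actual objformats API.)

```python
from datetime import date

# Hypothetical illustration of moving one field between layers.
# Storage format: MongoDB-safe primitives (strings, lists).
storage_doc = {"not_valid_before": "2020-01-15", "cert_ids": ["X", "Y"]}

def storage_to_working(doc):
    """Working format: usable Python types (date, set), still a plain dict,
    never a CCDataset or CommonCriteriaCertificate object."""
    return {
        "not_valid_before": date.fromisoformat(doc["not_valid_before"]),
        "cert_ids": set(doc["cert_ids"]),
    }

working = storage_to_working(storage_doc)
```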

The caching is for another discussion. Currently, there are several things that get (re)computed on demand anytime the certificate dataset changes, but not more. This is handled here. Basically, using several ContextVariables (these are thread-local) we check whether we have fresh data every time we access it, and if not, we recompute. I think a similar approach should be done here. Perhaps we need to generalize this into some common functionality and use it across CC, PP, FIPS and the dash outputs.

Actually, the data that gets precomputed this way was used to generate the charts on the analysis page, before they were replaced with the static exports some time ago. Now, this data is only used to get the subgraph a particular certificate belongs to, so that it can be rendered on the page quickly.
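(For illustration: the recompute-on-change pattern described here can be sketched roughly as below. A simple version counter stands in for the ChangeStream wiring; all names are illustrative.)

```python
import contextvars

# Thread-local cache slot holding (data_version, value).
_cached = contextvars.ContextVar("cached", default=None)

def current_version():
    # Stand-in for "has the weekly processing run again?";
    # in the real setup this would come from watching the collection.
    return current_version.value
current_version.value = 0

def get_computed_data(compute):
    """On every access, recompute only if the backing data version changed."""
    entry = _cached.get()
    version = current_version()
    if entry is None or entry[0] != version:
        entry = (version, compute())
        _cached.set(entry)
    return entry[1]
```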

Author

Okay, I agree. I felt I had missed something, because using it the way I tried was quite off. So, can I create a TypedDict / dataclass representing the raw/working data I can obtain from MongoDB? I want to know which data I can count on when doing the analysis and which I have to process myself. When I inspect the data, I only see what happens to be in the database; the data I have worked with so far came from a processed dataset. I would like some overview, in code, of the data present in the database, mainly the existing keys. There may be a reason why it does not exist yet.

I will think about how the caching could be done to match the current approach. Would it make sense to invalidate cached data when the ContextVariable changes, with the data saved in Redis keyed by the user-created query?

Member

Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a load on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not CommonCriteriaCertificate or Heuristics objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.

Wrt. caching. The current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether new weekly processing was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it I am not sure whether ContextVariables are even necessary here as uWSGI workers are processes and not threads, thus they will get their own view of the variables even if they were just basic globals, provided they were created after it forks the workers. An alternative to the ChangeStream approach would be just a simple lookup in the cc_log collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key one could skip the whole ChangeStream setup which I think is kind of clunky and too generic.

I guess it may make sense to save stuff to Redis, as it will also make the cache machine-wide so that the multiple worker processes do not need to compute the same thing themselves. However, one must keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests coming to random workers, so consistency should be ensured. Btw. there is FlaskCaching setup here which can be directly used and is backed by Redis. This is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here. It is backed by a different database to the caching one, so no collisions.
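(For illustration: the cc_log-based cache-key alternative mentioned here could look roughly like this sketch. A plain dict stands in for Redis/FlaskCaching, and all names are hypothetical.)

```python
_cache = {}  # stands in for Redis / FlaskCaching memoization

def latest_run_id(log_collection):
    """Stand-in for looking up the newest weekly-processing entry in cc_log."""
    return max(entry["run_id"] for entry in log_collection)

def cached_query(query, log_collection, compute):
    # Keying on the last processing run means the cache invalidates itself
    # whenever a new weekly run lands, with no ChangeStream setup needed.
    key = (query, latest_run_id(log_collection))
    if key not in _cache:
        _cache[key] = compute(query)
    return _cache[key]
```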

Member

Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram, it is unclear to me whether some things will work out with how the site currently works.

Author

> Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a load on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not CommonCriteriaCertificate or Heuristics objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.

I have quite an old snapshot of the database, but I do have one. I can download the dataset from the web and update the database with it, right? That way I can be sure I am working with the correct data. Or do you think it would be easier if you provided me with the snapshot?

> Wrt. caching. The current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether new weekly processing was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it I am not sure whether ContextVariables are even necessary here as uWSGI workers are processes and not threads, thus they will get their own view of the variables even if they were just basic globals, provided they were created after it forks the workers. An alternative to the ChangeStream approach would be just a simple lookup in the cc_log collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key one could skip the whole ChangeStream setup which I think is kind of clunky and too generic.
>
> I guess it may make sense to save stuff to Redis, as it will also make the cache machine-wide so that the multiple worker processes do not need to compute the same thing themselves. However, one must keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests coming to random workers, so consistency should be ensured. Btw. there is FlaskCaching setup here which can be directly used and is backed by Redis. This is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here. It is backed by a different database to the caching one, so no collisions.

I was planning to use the instance of FlaskRedis for caching, but I am unsure because view-based caching may be enough. As you suggested in the following comment, it would be nice if we arranged a meeting about this. I can skip the caching for now in my implementation, and we can add it later after we agree on the approach.

Author

> Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram, it is unclear to me whether some things will work out with how the site currently works.

Yes, this is a reasonable level of detail that I need. I would like to understand the core so I can follow the idea of the current system and its purpose.

@J08nY J08nY deleted the branch crocs-muni:feat/page-dash September 8, 2025 13:55
@J08nY J08nY closed this Sep 8, 2025

J08nY commented Sep 8, 2025

Sorry about the accidental close, this got closed due to the other branch being deleted after merge.

@J08nY J08nY reopened this Sep 8, 2025

J08nY commented Sep 8, 2025

I think you can rebase this on top of current page branch, as the feat/dash-page got merged. I didn't really think before merging it, but now it is too late.


J08nY commented Sep 9, 2025

One general comment, which may seem nitpicky, but I would appreciate if you used "plot" instead of "graph" to describe the plots. The thing is that graph can be understood as a network/graph and not a plot. Within our context there is actually a reference graph/network so it may be misleading. So either plot or chart.

@xvalec01 (Author)

> One general comment, which may seem nitpicky, but I would appreciate if you used "plot" instead of "graph" to describe the plots. The thing is that graph can be understood as a network/graph and not a plot. Within our context there is actually a reference graph/network so it may be misleading. So either plot or chart.

Okay, I will get rid of the "graph" across the dashboard part.

@xvalec01 xvalec01 force-pushed the feat/page-dash-update branch from e66c425 to b3e6607 Compare September 16, 2025 20:19
@J08nY J08nY added the enhancement New feature or request label Sep 25, 2025

J08nY commented Sep 26, 2025


J08nY commented Oct 13, 2025

If possible @xvalec01 could you rebase on top of the page branch? The #522 PR with better tests landed. Soon, the user accounts PR will land, and that will be a good time to rebase, so that you can hook into the user accounts.

@xvalec01 (Author)

> If possible @xvalec01 could you rebase on top of the page branch? The #522 PR with better tests landed. Soon, the user accounts PR will land, and that will be a good time to rebase, so that you can hook into the user accounts.

Will do!

J08nY and others added 16 commits October 13, 2025 18:29
- Created system class diagrams (v1 and v2) to illustrate the architecture of the dashboard application.
- Implemented a base filter class and dynamic filter factory for generating filters based on dataset structure.
- Developed specific filter components: DynamicDropdownFilter, DynamicTextSearchFilter, and DynamicRangeSliderFilter.
- Established a filter registry to manage filter instances.
- Enhanced DashboardLayoutManager with layout building and home page registration functionality.
- Added print statements for debugging layout construction and registration processes.
@xvalec01 xvalec01 force-pushed the feat/page-dash-update branch from 87b14fe to 368c953 Compare October 13, 2025 16:35

J08nY commented Oct 17, 2025

Thanks for the work. However, I am not sure how we will handle the fact that this is a merge into the feat/page-dash branch, which is now outdated: it was already merged into page, and page now has a bunch of stuff on top of it.


J08nY commented Oct 17, 2025

I have rebased this branch on top of page and pushed that into the crocs-muni:feat/page-dash-update branch.
