feat: Prototype with dropdown menu for dynamically adding graph with callback #513
base: feat/page-dash
Conversation
```python
from sec_certs_page.dashboard.graphs.base import BaseGraph

# ...

class GraphRegistry:
```
This is essentially a dictionary. Any reason for it being a class? The duplicate id check is the only one I see.
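For illustration, the dictionary-plus-duplicate-check behaviour being discussed here can be sketched in a few lines (a hypothetical sketch, not the PR's actual implementation):

```python
class GraphRegistry:
    """Minimal registry: a dict plus a duplicate-id guard (illustrative only)."""

    def __init__(self):
        self._graphs = {}

    def register(self, graph_id, graph):
        # Reject duplicate ids instead of silently overwriting an entry.
        if graph_id in self._graphs:
            raise ValueError(f"Graph id {graph_id!r} is already registered.")
        self._graphs[graph_id] = graph

    def get(self, graph_id):
        return self._graphs[graph_id]
```

As the review notes, unless more behaviour is planned, this adds little over a plain dict; the class only pays off once caching, persistence, or validation are layered on top.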
I am working on this incrementally. I started with the basic functionality, and at this stage a plain dictionary would indeed make sense.
But the plans are more complex. You can see an overview of the planned system in sec_certs_page/dashboard/docs/prettier_class_diagram.puml.
This is still a draft. I plan to finish this PR with the base of the more complex functionality, including caching and storing the dashboards in the database.
Ok makes sense. Let's see where it goes.
```python
    if self._df_cc is None:
        logger.info("Fetching Common Criteria dataset from MongoDB...")
        try:
            cursor = self.mongo.db.cc.find({})  # pyright: ignore[reportOptionalMemberAccess]
            data = list(cursor)
            if data:
                self._df_cc = pd.DataFrame(data)
            else:
                logger.warning("CC collection is empty.")
                return pd.DataFrame()
        except Exception:
            logger.exception("Error fetching CC data from MongoDB")
            raise
    return self._df_cc

def get_fips_dataframe(self) -> pd.DataFrame:
    """
    Fetches the FIPS dataset from the 'fips' collection in MongoDB.
    """
    if self._df_fips is None:
        logger.info("Fetching FIPS dataset from MongoDB...")
        try:
            cursor = self.mongo.db.fips.find({})  # pyright: ignore[reportOptionalMemberAccess]
            data = list(cursor)
            if data:
                self._df_fips = pd.DataFrame(data)
            else:
                logger.warning("FIPS collection is empty.")
                return pd.DataFrame()
        except Exception:
            logger.exception("Error fetching FIPS data from MongoDB")
            raise
    return self._df_fips
```
This will cache the dataframes for too long, even if the underlying data changes in the meantime. Also, making a DataFrame this way from the whole dataset is not efficient (I don't know what it even does to the very deep and branching JSON structure).
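On the efficiency point: one sketch of a cheaper approach (assuming pymongo; the field names below are hypothetical) is to ask Mongo for only the fields the plots need via a projection, e.g. `find({}, {"name": 1, "heuristics.cert_id": 1})`, and to flatten the nested documents with `pandas.json_normalize` rather than handing raw nested dicts to the `DataFrame` constructor:

```python
import pandas as pd

# Hypothetical sample of raw Mongo documents with nested structure,
# standing in for what a projected find() would return.
docs = [
    {"_id": "1", "name": "Cert A", "heuristics": {"cert_id": "X-1"}},
    {"_id": "2", "name": "Cert B", "heuristics": {"cert_id": "X-2"}},
]

# json_normalize flattens nested dicts into dotted column names,
# so only the columns the plots actually need are materialized.
df = pd.json_normalize(docs)[["_id", "name", "heuristics.cert_id"]]
```

This avoids building DataFrame columns for every deeply nested key in the collection.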
I was trying to find a way to convert data from MongoDB into something I can work with (probably CCDataset). Currently, the only way to do that is to load the JSON files produced by the certs processing, via CCDataset.from_json(). What I was thinking about is having something like from_dataframe(), from_pandas(), or from_mongodb(). We can chat about this on Matrix.
What I did here with a "cache" is temporary functionality that will later be turned into a full-fledged cache. I could omit self._df_fips, but it signals that there will need to be some cache for future changes.
We should discuss this a bit more, as I think you are missing some core aspects of how the web works.
Generally, throughout the whole web implementation, we do not really work with the objects from the core library (such as CCDataset or CommonCriteriaCertificate) other than when the weekly processing is done. During this weekly processing, the certificate objects are taken and converted into a format that can be stored in MongoDB. This is done by the objformats module. The conversion is not direct, as there are some nuances due to nested types and MongoDB. After this is done, the web basically never goes back to handling the objects again; everything is done on a lower "format" (Storage <-> Working <-> Raw <-> Obj).
I think we should follow this in the Dash integration as well.
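The layered idea being described can be illustrated roughly like this. This is purely a sketch; the real conversions live in the objformats module and handle many more nuances, and the field name below is hypothetical:

```python
from datetime import date

def storage_to_working(doc):
    """Illustrative only: turn a stored document into 'working' form,
    i.e. replace storage encodings (here, ISO date strings) with usable
    Python types, without ever constructing core-library objects."""
    out = dict(doc)
    if "not_valid_before" in out:
        out["not_valid_before"] = date.fromisoformat(out["not_valid_before"])
    return out

working = storage_to_working({"name": "Cert A", "not_valid_before": "2020-01-02"})
```

The point is that the dash code would consume this working form directly, instead of round-tripping through CCDataset.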
The caching is for another discussion. Currently, there are several things that get (re)computed on demand whenever the certificate dataset changes, but not more often. This is handled here. Basically, using several ContextVars (these are thread-local), we check on every access whether we have fresh data, and if not, we recompute. I think a similar approach should be taken here. Perhaps we need to generalize this into some common functionality and use it across CC, PP, FIPS, and the dash outputs. Actually, the data precomputed this way was used to generate the charts on the analysis page, before they were replaced with static exports some time ago. Now, this data is only used to get the subgraph a particular certificate belongs to, so that it can be rendered on the page quickly.
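The "recompute on access if stale" pattern described above can be sketched as follows (a minimal, hypothetical sketch; the real implementation also involves ChangeStreams and more state):

```python
from contextvars import ContextVar

# Holds (data_version, computed_value) or None; ContextVars give each
# thread/context its own view of the cached entry.
_cached = ContextVar("cached_analysis", default=None)

def get_analysis(current_version, compute):
    """Return the cached value, recomputing only when the data version moved."""
    entry = _cached.get()
    if entry is None or entry[0] != current_version:
        entry = (current_version, compute())
        _cached.set(entry)
    return entry[1]
```

Every access checks freshness; a bumped version (e.g. a new weekly processing run) triggers exactly one recompute.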
Okay, I agree. I felt I had missed something, because using it the way I tried was quite off. So, can I create a TypedDict / dataclass representing the raw/working data I can obtain from MongoDB? I want to know which data I can count on when making the analysis, and which I have to process myself. When I inspect the data, I only see what happens to be in the database; the data I have worked with so far came from a processed dataset. I would like to have some "overview" of the data present in the database, available from the code, mainly the existing keys. There may be a reason why it does not exist yet.
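The kind of schema overview being asked about could look like this (the field names are illustrative, not the actual database schema):

```python
from typing import TypedDict

class RawCertificate(TypedDict, total=False):
    """Hypothetical sketch: the keys one expects in a certificate document.
    total=False since not every document carries every field."""
    _id: str
    name: str
    status: str
    heuristics: dict

cert: RawCertificate = {"_id": "abc", "name": "Cert A", "status": "active"}
```

Note that a TypedDict only helps static checkers and readers; it performs no runtime validation of what actually comes out of MongoDB.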
I will think about the caching and how it could be done to match the current approach. Does it make sense to invalidate the cached data when the ContextVar changes, with the data stored in Redis keyed by the user-created query?
Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a load on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not CommonCriteriaCertificate or Heuristics objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.
Wrt. caching: the current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether a new weekly processing run was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it, I am not sure whether ContextVars are even necessary here, as uWSGI workers are processes and not threads, so they would get their own view of the variables even if they were just plain globals, provided they were created after the workers fork.

An alternative to the ChangeStream approach would be a simple lookup in the cc_log collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key, one could skip the whole ChangeStream setup, which I think is kind of clunky and too generic. It may make sense to save stuff to Redis, as that also makes the cache machine-wide, so that the multiple worker processes do not need to compute the same thing themselves. However, keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests going to random workers, so consistency must be ensured.

Btw. there is a Flask-Caching setup here which can be used directly and is backed by Redis. It is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here; it is backed by a different database than the caching one, so there are no collisions.
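The cc_log idea boils down to putting the identifier of the latest processing run into the cache key, so that a new run naturally invalidates old entries. A minimal sketch (hypothetical names; a plain dict stands in for Redis / Flask-Caching here):

```python
# Stand-in for a shared cache backend such as Redis.
_cache = {}

def cached_analysis(latest_run_id, compute):
    """Cache a computed analysis keyed by the latest weekly-run id.
    A new run id yields a new key, so stale entries are never served."""
    key = ("cc-analysis", latest_run_id)
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]
```

With the key shared via Redis, the first worker to see a new run id pays the computation cost and the others reuse the result, which addresses the multi-worker consistency concern.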
Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram, some things seem unclear to me, and I am not sure they will work out with how the site currently works.
> Well, the fields you can expect are the same as the JSON serialization of the certificate. If you do a `load` on the certificate as you get it from the database (look at the entry view for example) then you will have usable Python types and not the raw data as it is stored in the database (i.e. datetime, set, etc. but not `CommonCriteriaCertificate` or `Heuristics` objects). Do you have a snapshot of the database that you can examine or no? If not, I can provide a recent one for testing.
I have a snapshot of the database, but it is quite old. I can download the dataset from the web and update the database with it, right? That way I can be sure I am working with the correct data. Or do you think it would be easier if you provided me with a snapshot?
> Wrt. caching. The current approach basically tests, on each access to the computed data, whether the backing database collection changed. This essentially amounts to checking whether new weekly processing was done. It does so by setting up a ChangeStream for the collection. Now that I look back on it I am not sure whether ContextVariables are even necessary here as uWSGI workers are processes and not threads, thus they will get their own view of the variables even if they were just basic globals, provided they were created after it forks the workers. An alternative to the ChangeStream approach would be just a simple lookup in the `cc_log` collection, which contains entries with metadata about each weekly processing run. If this was used in the cache key one could skip the whole ChangeStream setup which I think is kind of clunky and too generic. I guess it may make sense to save stuff to Redis, as it will also make the cache machine-wide so that the multiple worker processes do not need to compute the same thing themselves. However, one must keep in mind that there are multiple uWSGI workers and one Redis instance, with user requests coming to random workers, so consistency should be ensured. Btw. there is FlaskCaching setup here which can be directly used and is backed by Redis. This is used for view-based caching but also memoization: http://flask-caching.readthedocs.io/en/latest/index.html. If you wanted to do something custom with Redis, you can use the instance defined here. It is backed by a different database to the caching one, so no collisions.
I was planning to use the FlaskRedis instance for caching, but I am unsure, because view-based caching may be enough. As you suggested in the following comment, it would be nice to arrange a meeting about this. I can skip the caching in my implementation for now, and we can add it later, once we agree on an approach.
> Is this a reasonable level of detail for you to go forward with? We can (perhaps should) have a meeting about this, and also discuss the overall plan. From the class diagram some things seem unclear to me whether they will work out in how the site currently works.
Yes, this is a reasonable level of detail that I need. I would like to understand the core so I can follow the idea of the current system and its purpose.
Basic Dash integration to the page
Sorry about the accidental close, this got closed due to the other branch being deleted after merge.

I think you can rebase this on top of current

One general comment, which may seem nitpicky, but I would appreciate it if you used "plot" instead of "graph" to describe the plots. The thing is that "graph" can be understood as a network/graph rather than a plot, and within our context there actually is a reference graph/network, so it may be misleading. So either plot or chart.

Okay, I will get rid of "graph" across the dashboard part.
Co-authored-by: Copilot <[email protected]>
Cache analysis data in Redis
Two things that affect this were merged:
Make web testable
Co-authored-by: J08nY <[email protected]>
…3-e2a3-465c-9f56-1c851d92b3d6 Add Copilot instructions (page)
- Created system class diagrams (v1 and v2) to illustrate the architecture of the dashboard application.
- Implemented a base filter class and dynamic filter factory for generating filters based on dataset structure.
- Developed specific filter components: DynamicDropdownFilter, DynamicTextSearchFilter, and DynamicRangeSliderFilter.
- Established a filter registry to manage filter instances.
- Enhanced DashboardLayoutManager with layout building and home page registration functionality.
- Added print statements for debugging layout construction and registration processes.
Thanks for the work. However, I am not sure how we will handle the fact that it is a merge to the

I have rebased this branch on top of page and pushed that into the crocs-muni:feat/page-dash-update branch.
This PR brings an update to the feature branch with part of the functionality of the dashboarding system.
Until the feature branch is synced with the "page" branch, I will add the files of interest.