NM-163: Users, Groups, Roles, Networks and Hosts Table Migration#3910
NM-163: Users, Groups, Roles, Networks and Hosts Table Migration#3910abhishek9686 merged 126 commits intodevelopfrom
Conversation
…d for schema functions;
Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>
|
Tenki Code Review - Complete Files Reviewed: 129 By Severity:
This large PR migrates model types from the legacy Files Reviewed (129 files) |
There was a problem hiding this comment.
Overview
This is a broad model-migration PR touching ~129 files. The primary mechanical change replaces helper functions like logic.GetHost(), logic.GetUser(), and logic.GetNetwork() with direct GORM-based method calls such as (&schema.Host{ID: id}).Get(ctx). The db.Middleware is applied globally to all HTTP routes in controllers/controller.go, so r.Context() carries a valid DB instance everywhere. Background goroutines and WebSocket handlers correctly use db.WithContext(context.TODO()) which accesses the package-level DB singleton.
Security
Finding 1 — bcrypt hash in audit log (controllers/user.go:1323): After logic.CreateUser(&user) completes, the local user variable is a schema.User whose Password field holds the bcrypt hash. This object is passed verbatim as Target.Info: user in the LogEvent call. In pro/logic/events.go:EventWatcher, the target is serialised via json.Marshal(e.Target) and written to the events_v1 table. Since schema.User.Password is tagged json:"password" (no omitempty), the hash is persisted and returned by the events API. The TOTPSecret field is similarly exposed.
Finding 2 — AutoJoin migration maps empty string to true (migrate/migrate_v1_5_1.go:162): The migrateNetworks() function maps the old string AutoJoin field to a bool with the pattern if network.AutoJoin == "false" { false } else { true }. This means any network where AutoJoin was never explicitly set (empty string, the zero value) is migrated to autoJoin = true. While functionally consistent with the old SessionHandler behavior, this is asymmetric with how AutoRemove and JITEnabled are migrated in the same function (both use the safe default-false == "true" pattern).
Code Quality
The migration is mechanically sound. The shift from nil-check patterns (if host == nil { continue }) to Go-idiomatic error patterns (if err != nil { continue }) improves correctness. The transferSuperAdmin function in controllers/user.go gained a missing return after its first ReturnErrorResponse call — a pre-existing bug that is now fixed.
|
Tenki Code Review - Complete Files Reviewed: 129 By Severity:
This PR performs a large-scale migration of core entity types (Network, Host, User, and related constants/enums) from the Files Reviewed (129 files) |
There was a problem hiding this comment.
Overview
This PR is a large structural refactor migrating core data types — models.Network, models.Host, models.User, models.UserRole, and dozens of constants/enums — from the legacy models package (backed by a key-value store) to a new schema package that uses GORM/SQLite ORM. Alongside the migration, two new paginated REST endpoints are introduced: GET /api/v1/hosts and GET /api/v2/users.
The overall migration is mechanically sound. The vast majority of changes are straightforward type substitutions (models.Foo → schema.Foo) and field renames (UserName → Username, NetID → Name, etc.), with direct database lookups replacing wrapper functions. Authentication and authorization logic is preserved across the rewrite.
Bugs Found
1. Missing return after error in updateUserAccountStatus (HIGH)
In controllers/user.go, the refactored updateUserAccountStatus function (called by both enableUserAccount and disableUserAccount) sends an error response when UpdateAccountStatus fails but does not return. Execution falls through to a goroutine launch that toggles ext-client connectivity, and then to a success response write. This produces: (a) two HTTP responses written to the same ResponseWriter, and (b) ext-client side-effects firing even when the DB update failed.
2. Duplicate nested if in WithFilter (LOW)
In db/types/options.go, the WithFilter function has a copy-paste artifact where if len(value) == 1 is written twice as a nested condition. The inner check is dead code. Functionality is accidentally correct (the outer check's return fires before the inner is evaluated), but the code is misleading.
3. Unreachable role-consistency check in GetUserNameFromToken (LOW)
In logic/jwts.go, after a successful user.Get() call, the condition user.Username != "" is always true (the user was fetched by username as the lookup key), causing an early return that makes the subsequent user.PlatformRoleID != claims.Role check permanently unreachable. This check was intended as a defense-in-depth guard against replaying tokens after a role change. It is not a privilege escalation (auth enforcement relies on DB-derived roles in VerifyUserToken and SecurityCheck), but a missed validation opportunity.
4. Invalid filter values produce wrong results in listUsers (LOW)
In controllers/user.go's new listUsers endpoint, the account_status and mfa_status query-param parsing uses two separate if statements instead of if/else. Any unrecognized filter value causes the zero-value false to be appended unconditionally, resulting in silent filtering by the wrong criterion rather than ignoring the invalid value.
Positive Notes
- The migration correctly preserves all authentication checks in
SecurityCheckandGetUserNameFromToken. - The new paginated endpoints (
listHosts,listUsers) are properly gated behindlogic.SecurityCheck(true, ...)requiring admin access. - SQLite connection pooling is correctly configured (max idle connections set to 1).
- The
db/typespackage'sWithFilter/InAscOrderfunctions use hardcoded field name strings at all call sites, so there is no SQL injection risk from user input.
Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>
…ller - Add missing return after error response in updateUserAccountStatus to prevent double-response and spurious ext-client side-effects - Use switch statements in listUsers to skip unrecognized account_status and mfa_status filter values
|
Tenki Code Review - Complete Files Reviewed: 90 By Severity:
This PR is a large-scale refactoring that migrates core entities (Host, Network, User, UserRole, UserGroup) from the old Files Reviewed (90 files) |
There was a problem hiding this comment.
Summary
This PR performs a broad schema migration across the Netmaker codebase, replacing models.Host, models.Network, models.User, and related types with their schema package equivalents. It also introduces two new paginated REST endpoints and consolidates enableUserAccount/disableUserAccount into a single helper.
Bugs Found
1. Free-tier limit check incorrectly blocks on DB error (controllers/limits.go)
The limitChoiceNetworks and limitChoiceUsers cases were refactored from (err != nil && !database.IsEmptyRecord(err)) || len(items) >= limit to err != nil || count >= limit. The !database.IsEmptyRecord(err) guard has been removed. Any transient DB error or GORM ErrRecordNotFound from Count() will now unconditionally trigger the "free tier limits exceeded" rejection, blocking legitimate network/user creation in free-tier mode.
2. per_page > 100 yields inconsistent pagination metadata (controllers/user.go, controllers/hosts.go)
Both listUsers and listHosts pass the raw user-supplied per_page value to WithPagination in db/types/options.go. WithPagination silently resets pageSize to 10 when pageSize > 100, but the handlers already captured the original value and embed it in the response's PerPage field. Clients receive a response claiming e.g. per_page=500 while only 10 records are returned, breaking cursor/offset pagination.
3. updateUser: extclient deletion may incorrectly remove valid extclients for non-admin users (controllers/user.go)
The old code checked both user.NetworkRoles (per-user direct access) and group membership when deciding whether to delete a user's extclients. The new code only checks group membership. If any non-admin user has direct per-user network access that hasn't been fully migrated to groups, those extclients will be wrongly deleted on any user update.
Positive Observations
- The migration consistently applies
uuid.Parsevalidation before schema DB lookups, which is an improvement over the old string-keyed approach. transferSuperAdminnow correctly returns early after error (a pre-existing bug was fixed as a side effect).- The
updateUserAccountStatusrefactor correctly consolidates duplicate enable/disable logic and adds a missingreturnafter the internal-error response indisableUserAccount. db/sqlite.gonow correctly setsMaxIdleConns(1)for SQLite safety.
|
Tenki Code Review - Complete Files Reviewed: 129 By Severity:
This PR is a large-scale migration from the legacy JSON-based storage layer ( Files Reviewed (129 files) |
There was a problem hiding this comment.
Overview
This is a significant architectural refactoring that moves models.Host, models.Network, and models.User to schema.Host, schema.Network, and schema.User backed by GORM ORM (SQLite). The change touches 129 files and also introduces two new paginated API endpoints: GET /api/v1/hosts and GET /api/v2/users.
New Endpoints
GET /api/v1/hosts(controllers/hosts.go): Supports pagination and OS filtering. Filter values are properly parameterized via GORM's?placeholders. Field names used indbtypes.WithFilteranddbtypes.InAscOrderare all compile-time constants — no SQL injection risk.GET /api/v2/users(controllers/user.go): Supports pagination and filtering by account status, MFA status, role, and auth type. Same pattern as hosts — values parameterized, fields hardcoded.
Security Assessment
No exploitable security issues were found. The WithFilter utility (db/types/options.go) uses fmt.Sprintf to build the field name into the WHERE clause, but all current callers pass only hardcoded column names ("os", "account_disabled", etc.), making injection impossible with the current call graph.
The transferSuperAdmin function in controllers/user.go performs two non-transactional DB writes — this is a pre-existing pattern but the rewrite preserves the race window where both users could end up with SuperAdmin role.
Key Findings
1. Free-tier limits block on transient DB errors (controllers/limits.go)
The limitChoiceNetworks, limitChoiceUsers, and limitChoiceEgress cases now return a free-tier error on any DB error. The old code had !database.IsEmptyRecord(err) guards. While GORM's Count() returns (0, nil) for empty tables (so this won't false-positive on fresh installs), any transient DB error will present as a confusing "free tier limits exceeded" message.
2. Non-atomic super-admin transfer (controllers/user.go)
transferSuperAdmin promotes u and demotes caller in two separate, unguarded UpsertUser calls. If the second call fails, two users hold SuperAdminRole. Should use a transaction.
3. Inconsistent error handling in getNetworksStats (controllers/network.go)
getNetworksStats retains a !database.IsEmptyRecord(err) guard that is now dead code with GORM. While harmless, it creates a maintenance divergence from the updated getNetworks handler.
4. Off-by-one in password length validation during update (logic/auth.go)
User create enforces > 5 characters (via ValidateUser's <= 5 check on the plaintext password). User update checks < 5 before hashing — allowing exactly 5-character passwords that would be rejected on create.
|
Tenki Code Review - Complete Files Reviewed: 130 By Severity:
This PR migrates Netmaker's core data layer from a JSON key-value store to a GORM/SQLite-backed relational schema, replacing Files Reviewed (130 files) |
There was a problem hiding this comment.
Overview
This is a large-scale migration (130 files) transitioning the Netmaker server from a custom key-value database layer (database.*) to GORM ORM with a proper SQL schema for hosts, networks, and users. The PR removes in-memory caching layers, replaces JSON marshal/unmarshal with ORM calls, adds new paginated API endpoints (/api/v1/hosts, /api/v2/users), and removes free-tier limit middleware.
The overall structure is correct, and the move to a proper relational database is a positive architectural improvement. However, several issues were found:
High: Password Validation Logic Regression
The new ValidateUser() function in logic/auth.go checks len(user.Password) < 5, but in UpdateUser(), this function is called after _user.Password has already been set to the stored bcrypt hash (60 chars). This means the minimum-length check is never enforced when a password update is requested, since the check sees the hash rather than the plaintext. The password length validation effectively only fires on CreateUser(). This is a behavioral regression from the old code.
Medium: SQL Field Name Interpolation Pattern
The new db/types/options.go's WithFilter function interpolates the field parameter directly into a SQL WHERE clause using fmt.Sprintf. Values are correctly parameterized, but field names are not quoted or validated. All current callers use hardcoded constants, so no user-controlled input reaches this today, but the API design invites future injection risk.
Medium: Username Charset Bypass for Email Usernames
In the new validateUserName(), usernames that successfully parse as RFC 5322 email addresses skip charset validation entirely. net/mail.ParseAddress is permissive and accepts special characters (+, !, #, etc.) in the local part. This is a regression from the old NameInCharSet() which applied the charset restriction uniformly.
Medium: Potential nil Dereference Pattern in updateUserAccountStatus
The refactored updateUserAccountStatus in controllers/user.go uses a _caller pointer that is nil when isMaster=true. While the code is currently safe due to the if !isMaster guard, the pattern is fragile—accessing _caller.PlatformRoleID in multiple switch cases within that block would panic if the guard were ever relaxed.
Low: uuid.MustParse Panic Risk
Two functions in logic/hosts.go — DisassociateAllNodesFromHost and GetHostNetworks — use uuid.MustParse(hostID) without error handling. If a malformed UUID string is encountered (e.g., from DB corruption), this will cause a runtime panic. The old code path called GetHost() which returned an error gracefully.
Low: Silent Migration Failures
The syncUsers migration in migrate/migrate.go silently discards errors from logic.UpsertUser(). Partial migration failures could leave users in inconsistent state with no log output.
* feat(go): add user schema; * feat(go): migrate to user schema; * feat(go): add audit fields; * feat(go): remove unused fields from the network model; * feat(go): add network schema; * feat(go): migrate to network schema; * refactor(go): add comment to clarify migration logic; * fix(go): test failures; * fix(go): test failures; * feat(go): change membership table to store memberships at all scopes; * feat(go): add schema for access grants; * feat(go): remove nameservers from new networks table; ensure db passed for schema functions; * feat(go): set max conns for sqlite to 1; * fix(go): issues updating user account status; * NM-236: streamline operations in HA mode * NM-236: only master pod should subscribe to updates from clients * refactor(go): remove converters and access grants; * refactor(go): add json tags in schema models; * refactor(go): rename file to migrate_v1_6_0.go; * refactor(go): add user groups and user roles tables; use schema tables; * refactor(go): inline get and list from schema package; * refactor(go): inline get network and list users from schema package; * fix(go): staticcheck issues; * fix(go): remove test not in use; fix test case; * fix(go): validate network; * fix(go): resolve static checks; * fix(go): new models errors; * fix(go): test errors; * fix(go): handle no records; * fix(go): add validations for user object; * fix(go): set correct extclient status; * fix(go): test error; * feat(go): make schema the base package; * feat(go): add host schema; * feat(go): use schema host everywhere; * feat(go): inline get host, list hosts and delete host; * feat(go): use non-ptr value; * feat(go): use save to upsert all fields; * feat(go): use save to upsert all fields; * feat(go): save turn endpoint as string; * feat(go): check for gorm error record not found; * fix(go): test failures; * fix(go): update all network fields; * fix(go): update all network fields; * feat(go): add paginated list networks api; * feat(go): add paginated list users api; * feat(go): add paginated list hosts api; * feat(go): add pagination to list groups api; * fix(go): comment; * fix(go): implement marshal and unmarshal text for custom types; * fix(go): implement marshal and unmarshal json for custom types; * fix(go): just use the old model for unmarshalling; * fix(go): implement marshal and unmarshal json for custom types; * NM-271:Import swap: compress/gzip replaced with github.com/klauspost/compress/gzip (2-4x faster, wire-compatible output). Added sync import. Two sync.Pool variables (gzipWriterPool, bufferPool): reuse gzip.Writer and bytes.Buffer across calls instead of allocating fresh ones per publish. compressPayload rewritten: pulls writer + buffer from pools, resets them, compresses at gzip.BestSpeed (level 1), copies the result out of the pooled buffer, and returns both objects to the pools. * feat(go): remove paginated list networks api; * feat(go): use custom paginated response object; * NM-271: Improve server scalability under high host count - Replace stdlib compress/gzip with klauspost/compress at BestSpeed and pool gzip writers and buffers via sync.Pool to eliminate compression as the dominant CPU hotspot. - Debounce peer update broadcasts with a 500ms resettable window capped at 3s max-wait, coalescing rapid-fire PublishPeerUpdate calls into a single broadcast cycle. - Cache HostPeerInfo (batch-refreshed by debounce worker) and HostPeerUpdate (stored as side-effect of each publish) so the pull API and peer_info API serve from pre-computed maps instead of triggering expensive per-host computations under thundering herd conditions. - Warm both caches synchronously at startup before the first publish cycle so early pull requests are served instantly. - Bound concurrent MQTT publishes to 5 via semaphore to prevent broker TCP buffer overflows that caused broken pipe disconnects. - Remove manual Disconnect+SetupMQTT from ConnectionLostHandler and rely on the paho client's built-in AutoReconnect; add a 5s retry wait in publish() to ride out brief reconnection windows. * NM-271: Reduce server CPU contention under high concurrent load - Cache ServerSettings with atomic.Value to eliminate repeated DB reads on every pull request (was 32+ goroutines blocked on read lock) - Batch UpdateNodeCheckin writes in memory, flush every 30s to reduce per-checkin write lock contention (was 88+ goroutines blocked) - Enable SQLite WAL mode + busy_timeout and remove global dbMutex; let SQLite handle concurrency natively (reads no longer block writes) - Move ResetFailedOverPeer/ResetAutoRelayedPeer to async in pull() handler since results don't affect the cached response - Skip no-op UpsertNode writes in failover/relay reset functions (early return when node has no failover/relay state) - Remove CheckHostPorts from hostUpdateFallback hot path - Switch to pure-Go SQLite driver (glebarez/sqlite), set CGO_ENABLED=0 * fix(go): ensure default values for page and per_page are used when not passed; * fix(go): rename v1.6.0 to v1.5.1; * fix(go): check for gorm.ErrRecordNotFound instead of database.IsEmptyRecord; * fix(go): use host id, not pending host id; * NM-271: Revert pure-Go SQLite and FIPS disable to verify impact Revert to CGO-based mattn/go-sqlite3 driver and re-enable FIPS to isolate whether these changes are still needed now that the global dbMutex has been removed and WAL mode is enabled. Keep WAL mode pragma with mattn-compatible DSN format. * feat(go): add filters to paginated apis; * feat(go): add filters to paginated apis; * feat(go): remove check for max username length; * feat(go): add filters to count as well; * feat(go): use library to check email address validity; * feat(go): ignore pagination if params not passed; * fix(go): pagination issues; * fix(go): check exists before using; * fix(go): remove debug log; * NM-271: rm debug logs * NM-271: check if caching is enabled * NM-271: add server sync mq topic for HA mode * NM-271: fix build * NM-271: push metrics in batch to exproter over api * NM-271: use basic auth for exporter metrics api * fix(go): use gorm err record not found; * NM-271: Add monitoring stack on demand * NM-271: -m arg for install script should only add monitoring stack * fix(go): use gorm err record not found; * NM-271: update docker compose file for prometheus * NM-271: update docker compose file for prometheus * fix(go): use user principal name when creating pending user; * fix(go): use schema package for consts; * NM-236: rm duplicate network hook * NM-271: add server topic to reset idp hooks on master node * fix(go): prevent disabling superadmin user; Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): swap is admin and is superadmin; Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): remove dead code block; #3910 (comment) * fix(go): incorrect message when trying to disable self; #3910 (comment) * NM-271: fix stale peers on reset_failovered pull and add HTTP timeout to metrics exporter Run the failover/relay reset synchronously in the pull handler so the response reflects post-reset topology instead of serving stale cached peers. Add a 30s timeout to the metrics exporter HTTP client to prevent PushAllMetricsToExporter from blocking the Keepalive loop. * NM-271: fix gzip pool corruption, MQTT topic mismatch, stale settings cache, and reduce redundant DB fetches - Only return gzip.Writer to pool after successful Close to prevent silently malformed MQTT payloads from a previously errored writer. - Fix serversync subscription to exact topic match since syncType is now in the message payload, not the topic path. - Prevent zero-value ServerSettings from being cached indefinitely when the DB record is missing or unmarshal fails on startup. - Return fetched hosts/nodes from RefreshHostPeerInfoCache so warmPeerCaches reuses them instead of querying the DB twice. - Compute fresh HostPeerUpdate on reset_failovered pull instead of serving stale cache, and store result back for subsequent requests. * NM-271: fix gzip writer pool leak, log checkin flush errors, and fix master pod ordinal parsing - Reset gzip.Writer to io.Discard before returning to pool so errored writers are never leaked or silently reused with corrupt state. - Track and log failed DB inserts in FlushNodeCheckins so operators have visibility when check-in timestamps are lost. - Parse StatefulSet pod ordinal as integer instead of using HasSuffix to prevent netmaker-10 from being misidentified as master pod. * NM-271: simplify masterpod logic * fix(go): use correct header; Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): return after error response; Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): use correct order of params; #3910 (comment) * fix(go): set default values for page and page size; use v2 instead of /list; * NM-271: use host name * Update mq/serversync.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * NM-271: fix duplicate serversynce case * NM-271: streamline gw updates * Update logic/auth.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * Update schema/user_roles.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): syntax error; * fix(go): set default values when page and per_page are not passed or 0; * fix(go): use uuid.parse instead of uuid.must parse; * fix(go): review errors; * fix(go): review errors; * Update controllers/user.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * Update controllers/user.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * NM-163: fix errors: * Update db/types/options.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * fix(go): persist return user in event; * Update db/types/options.go Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com> * NM-271: signal pull on ip changes * NM-163: duplicate lines of code * NM-163: fix(go): fix missing return and filter parsing in user controller - Add missing return after error response in updateUserAccountStatus to prevent double-response and spurious ext-client side-effects - Use switch statements in listUsers to skip unrecognized account_status and mfa_status filter values * NM-271: signal pull req on node ip change * fix(go): check for both min and max page size; * NM-271: refresh node object before update * fix(go): enclose transfer superadmin in transaction; * fix(go): review errors; * fix(go): remove free tier checks; * fix(go): review fixes; * NM-271: streamline ip pool ops * NM-271: fix tests, set max idle conns * NM-271: fix(go): fix data races in settings cache and peer update worker - Use pointer type in atomic.Value for serverSettingsCache to avoid replacing the variable non-atomically in InvalidateServerSettingsCache - Swap peerUpdateReplace flag before draining the channel to prevent a concurrent replacePeers=true from being consumed by the wrong cycle --------- Co-authored-by: VishalDalwadi <dalwadivishal26@gmail.com> Co-authored-by: Vishal Dalwadi <51291657+VishalDalwadi@users.noreply.github.com> Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>
Describe your changes
Provide Issue ticket number if applicable/not in title
Provide testing steps
Checklist before requesting a review