You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Company's [<ahref="https://github.com/datajoint">Open-source Github</a>] and [<ahref="https://datajoint.com/works">Commercialized Product</a>]
136
136
</div>
137
137
<div>
138
-
<bclass="italic-text">* AWS:</b> Administrated the company's AWS account and several other customers' AWS accounts. Configured <b>VPC</b>, <b>Subnet</b>, <b>Security Groups</b>, <b>IAM</b> role and policies, <b>S3</b> lifecycle management, <b>EFS</b> access point, <b>EC2</b> instances, <b>RDS</b> instances, <b>Lambda</b> triggered by <b>SQS</b> or <b>EventBridge</b>, <b>SNS</b> and <b>SES</b>, <b>CloudWatch</b> metrics and alarms, <b>Route 53</b> DNS records, <b>Secrets Manager</b> for deployment secrets.
138
+
<bclass="italic-text">* AWS:</b> Administrated DataJoint's AWS account and several other customers' AWS accounts. Configured <b>VPC</b>, <b>Subnet</b>, <b>Security Groups</b>, <b>IAM</b> role and policies, <b>S3</b> lifecycle management, <b>EFS</b> access point, <b>EC2</b> instances, <b>RDS</b> instances, <b>Lambda</b> triggered by <b>SQS</b> or <b>EventBridge</b>, <b>SNS</b> and <b>SES</b>, <b>CloudWatch</b> metrics and alarms, <b>Route 53</b> DNS records, <b>Secrets Manager</b> for deployment secrets.
139
139
</div>
140
140
<div>
141
141
<bclass="italic-text">* CI/CD: </b> Developed generic <b>Github Actions</b> reusable workflows used by <b>30+</b> repositories followed by <ahref="https://www.conventionalcommits.org/en/v1.0.0/">Conventional Commits</a>, <ahref="https://learn.microsoft.com/en-us/devops/develop/how-microsoft-develops-devops">Release Flow</a> and <ahref="https://opengitops.dev/">GitOps</a> best practices, to automate build, test, release, publish private or open-source <b>Python</b> packages[<ahref="https://pypi.org/search/?q=datajoint">PyPI</a>] or deploy <b>Docker</b> images[<ahref="https://hub.docker.com/u/datajoint">Dockerhub</a>].
<bclass="italic-text">* Kubernetes: </b> Provisioned Kubernetes clusters for development, staging and production environments using <b>k3d or kOps</b>. Developed utility <b>bash</b> scripts with <b>helm</b> and <b>kubectl</b> to manage Kubernetes clusters more efficiently, including configuring <b>Nginx ingress</b> controller, cert manager with <b>Let's encrypt</b> issuer, <b>Cillium</b> Container Network Interface(CNI), IAM Roles for Service Account(<b>IRSA</b>), <b>Cluster Autoscaler</b>, AWS Elastic Load Balancer(<b>ELB</b>) or deploying applications like Percona XtraDB Clusters, Keycloak, JupyterHub, Flask and ReactJS based web application, etc.
145
145
</div>
146
146
<div>
147
-
<bclass="italic-text">* Ephemeral Worker Clusters: </b> Designed and developed a worker lifecycle manager using Python in about a month to fulfill an <b>urgent</b> business requirement. This development <b>polls</b> jobs from a MySQL database, then provisions and configures ephemeral EC2 instances by <b>Packer(pre-build AMI), Terraform and cloud-init</b> to compute jobs <b>at scale</b>; implemented AWS S3 mount to significantly reduce raw data downloading <b>overhead</b> and added EFS as a file cache for intermediate steps to improve computation <b>failover</b>; configured <b>NVIDIA CUDA toolkit</b> and <b>NVIDIA container runtime</b> for <b>GPU</b> workers.
147
+
<bclass="italic-text">* Ephemeral Worker Clusters: </b> Designed and developed a worker lifecycle manager using Python within one month to fulfill an <b>urgent</b> business requirement. This development <b>polls</b> jobs from a MySQL database, then provisions and configures ephemeral EC2 instances by <b>Packer(pre-build AMI), Terraform and cloud-init</b> to compute jobs <b>at scale</b>; implemented AWS S3 mount to significantly reduce raw data downloading <b>overhead</b> and added EFS as a file cache for intermediate steps to improve computation <b>failover</b>; configured <b>NVIDIA CUDA toolkit</b> and <b>NVIDIA container runtime</b> for <b>GPU</b> workers.
148
148
</div>
149
149
<div>
150
150
<bclass="italic-text">* Platform Automation: </b> To provision or terminate AWS resources using <b>boto3</b> or <b>Terraform</b>; manage customers' <b>RBAC</b> permissions using Keycloak and Github REST API; generating usage and billing report with <b>AWS S3 Inventory</b> report, <b>AWS CloudTrail</b> and <b>AWS Cost and Usage</b> report, made a <b>Plotly Dash</b> to analyze cost and usage efficiency.
<bclass="italic-text">* Jupyterhub: </b> Configured and maintained Jupyterhub deployment on a Kubernetes cluster with <b>Node Affinity</b> to assign pods onto different nodes by requirements and <b>Cluster Autoscaler</b> along with <b>AWS Auto Scaling Group</b> to accommodate <b>100+</b> active users; improved base images' <b>build time</b> and maintenance <b>overhead</b>.
154
154
</div>
155
155
<div>
156
-
<bclass="italic-text">* Observability:</b> Implemented small part of the metrics and alerts using <b>AWS CloudWatch</b>, and then later integrated <b>Datadog</b> for Kubernetes clusters' and ephemeral EC2 instances' metrics and logging through <b>OpenTelemetry</b> protocol, synthetic API testing, UI/UX monitoring.
156
+
<bclass="italic-text">* Observability:</b> Implemented a small part of the metrics and alerts using <b>AWS CloudWatch</b>, and then later integrated <b>Datadog</b> for Kubernetes clusters' and ephemeral EC2 instances' metrics and logging through <b>OpenTelemetry</b> protocol, synthetic API testing, and UI/UX monitoring.
157
157
</div>
158
158
<div>
159
-
<bclass="italic-text">* Security:</b> Set up codebase <b>vulnerability</b> scan with FOSSA; Set up <b>AWS Secrets Manager</b> working with <b>External Secret Store Operator</b> to secure Kubernetes secrets; Deployed and administrated self-hosted <b>Keycloak</b> for <b>RABC</b> authentication, further integrate it with <b>AWS IAM</b> as an <b>identity provider</b> to access AWS resources through <b>STS</b>, learned about OpenID Connect(<b>OIDC</b>) authentication flow such as authorization code flow, client credential flow, password grant flow etc.
159
+
<bclass="italic-text">* Security:</b> Set up codebase <b>vulnerability</b> scan with FOSSA; Set up <b>AWS Secrets Manager</b> working with <b>External Secret Store Operator</b> to secure Kubernetes secrets; Deployed and administrated self-hosted <b>Keycloak</b> for <b>RABC</b> authentication, further integrated it with <b>AWS IAM</b> as an <b>identity provider</b> to access AWS resources through <b>STS</b>, enabled OpenID Connect(<b>OIDC</b>) authentication flows such as authorization code flow, client credential flow, password grant flow etc.
160
160
</div>
161
161
<div>
162
-
<bclass="italic-text">* MySQL Database:</b> Maintained a self-hosted <b>Percona XtraDB Clusters</b> on database <b>daily backup</b> stored on <b>S3</b>, <b>mysqldump</b> backup redundancy, Point-in-Time Recovery(<b>PITR</b>), <b>deadlock</b> detection, slow query log.
162
+
<bclass="italic-text">* MySQL Database:</b> Maintained a self-hosted <b>Percona XtraDB Clusters</b> on database <b>daily backup</b> stored on <b>S3</b>, <b>mysqldump</b> backup redundancy, Point-in-Time Recovery(<b>PITR</b>), <b>deadlock</b> detection, and slow query log.
0 commit comments