Backend Systems Engineer

Reproducible platforms that stay calm in production.

I build reproducible scientific platforms, agricultural modeling pipelines, and AI/ML systems that stay calm in production.

Focus
Backend platforms · AI/ML delivery · Data stewardship
Location
Remote-friendly · Multi-timezone
  • 73 scholarly citations · h-index 3 · i10-index 3 (Google Scholar)
  • 7+ years in production: UConn Health (2019) → Climate LLC / Bayer (2021 → present)
  • 4+ NIH-funded open-source platforms: VCell · RunBioSimulations · BioSimulators · VCell Algorithms

01 / Work

Selected work

Projects built and shipped — each with a clear problem, deliberate approach, and measurable impact.

Production system · Climate LLC / Bayer Crop Science · 2021 – present

Agricultural Modeling Platform

Python · FastAPI · AWS ECS/EKS · DynamoDB · SQS/SNS · Apollo Federation · BigQuery · Prometheus · OpenTelemetry

Problem

Agronomists and data scientists needed reliable, high-throughput access to geospatial agricultural models and environmental data at continental scale — with strict SLOs, replay-safe data paths, and zero tolerance for silent failures in decision-critical pipelines.

Approach

Designed and delivered FastAPI microservices for agricultural data and model-execution workflows. Owned the distributed architecture on AWS (ECS/EKS, DynamoDB, SQS/SNS, SSM, ECR, IAM). Built SQS-based async ingestion with batching, DLQs, replay safety, and checkpointing. Implemented a federated GraphQL platform (Apollo Federation) with schema governance across subgraphs. Integrated BigQuery analytics telemetry with cost-optimized query patterns. Shipped Prometheus/Grafana/Loki/OpenTelemetry SLO dashboards and incident runbooks.
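The replay-safe ingestion pattern can be illustrated with a minimal in-memory sketch. It is not the production code: the `Message` and `ReplaySafeConsumer` names, the `max_receives` limit, and the set-based checkpoint are all assumptions chosen for illustration, and a real deployment would read from SQS and persist the checkpoint in a durable store.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    """Hypothetical queue message; receive_count mimics a redelivery counter."""
    message_id: str
    body: dict
    receive_count: int = 0


@dataclass
class ReplaySafeConsumer:
    """Illustrative consumer: skips replayed messages and dead-letters poison ones."""
    handler: Callable[[dict], None]
    max_receives: int = 3  # assumed redrive limit before a message goes to the DLQ
    checkpoint: set[str] = field(default_factory=set)  # IDs already applied
    dead_letters: list[Message] = field(default_factory=list)

    def process_batch(self, batch: list[Message]) -> list[Message]:
        """Process one batch and return the messages that need redelivery."""
        retry: list[Message] = []
        for msg in batch:
            if msg.message_id in self.checkpoint:
                continue  # duplicate or replay: already applied, safe to skip
            msg.receive_count += 1
            try:
                self.handler(msg.body)
            except Exception:
                if msg.receive_count >= self.max_receives:
                    self.dead_letters.append(msg)  # isolate the failure
                else:
                    retry.append(msg)
                continue
            self.checkpoint.add(msg.message_id)  # record success only after the write
        return retry


applied: list[str] = []


def handler(body: dict) -> None:
    if body.get("poison"):
        raise ValueError("unprocessable payload")
    applied.append(body["field_id"])


consumer = ReplaySafeConsumer(handler=handler)
pending = consumer.process_batch([
    Message("m1", {"field_id": "a"}),
    Message("m2", {"poison": True}),
    Message("m1", {"field_id": "a"}),  # replayed duplicate of m1
])
while pending:
    pending = consumer.process_batch(pending)
```

In this sketch the duplicate `m1` is applied once, and `m2` lands in `dead_letters` after three failed attempts instead of blocking the rest of the batch.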

Impact

Promoted to Senior Software Engineer (Nov 2024). Services power model execution and geospatial data delivery across Bayer's North American precision agriculture operations, processing field data at continental scale across millions of acres annually. The SQS async ingestion layer handles high-concurrency model runs with fault-isolated DLQs and replay-safe checkpointing, keeping error budgets tight under variable seasonal demand. The federated GraphQL platform unified multiple engineering teams under shared schema governance, reducing cross-team API integration friction. Service templates and internal SDKs, adopted org-wide, shortened new-service bootstrap time from days to hours. The observability uplift with Prometheus/Grafana/Loki cut mean time to detection for production incidents and gave on-call engineers a direct path from alert to runbook.

NIH-funded research platform · UConn Health / CCAM · 2019 – 2021

Virtual Cell (VCell) Platform

Java · C · C++ · Fortran · Python · Docker · Singularity · SLURM · NATS · HPC

Problem

Computational cell biologists needed a unified platform to model and simulate biochemical networks, reaction-diffusion equations, stochastic processes, and electrophysiology — across deterministic ODE, PDE, and particle-based simulation regimes — without managing heterogeneous solver environments manually.

Approach

Contributed to the VCell core platform (Java, 113+ stars, 13K+ commits) and its solver suite (vcell-solvers: Chombo adaptive mesh, NFsim network-free stochastic, Smoldyn particle-based spatial stochastic, Hy3S hybrid stochastic, finite-volume PDE solvers). Built SLURM-based job dispatch infrastructure for simulation execution on HPC clusters, handling submission, monitoring, lifecycle management, recovery, and checkpointing. Containerized solver environments with Docker and Singularity for reproducible cross-platform execution. Published the SLURM dispatch client to PyPI.
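The submission step of a SLURM dispatch layer can be sketched as follows. This is a simplified illustration, not the VCell dispatch code: the function names, the `solver.sif` image, the `run-sim` command, and the choice of which job states count as transient are all hypothetical. The `#SBATCH` directives and the `Submitted batch job <id>` line are standard `sbatch` behavior.

```python
import re
import shlex


def render_sbatch_script(job_name, image, command, time_limit="01:00:00", cpus=1):
    """Render a batch script that runs `command` inside a Singularity image."""
    container_cmd = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the job ID
        "set -euo pipefail",
        f"singularity exec {shlex.quote(image)} {container_cmd}",
    ]
    return "\n".join(lines) + "\n"


_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's confirmation line."""
    match = _SUBMITTED.search(sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


# Assumption: these states are treated as transient and worth resubmitting.
TRANSIENT_STATES = {"NODE_FAIL", "PREEMPTED"}


def should_resubmit(state):
    """Decide whether a finished job should be retried, based on its state."""
    return state.strip().split(" ")[0] in TRANSIENT_STATES


script = render_sbatch_script("sim42", "solver.sif", ["run-sim", "--input", "model.omex"])
job_id = parse_job_id("Submitted batch job 12345\n")
```

Keeping script rendering and output parsing as pure functions makes the dispatch logic testable without a cluster; only the thin layer that actually invokes `sbatch` needs a live scheduler.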

Impact

VCell is actively used by cell biology research groups at universities and institutes globally, with simulations backing peer-reviewed publications across calcium signaling, cytoskeletal dynamics, and membrane transport. SLURM dispatch infrastructure brought reliable job lifecycle management to HPC-backed simulations — concurrent submissions across multiple solver types, automated recovery from transient cluster failures, and checkpointed state that prevents lost compute on long-running stochastic and PDE runs. Containerized solver environments reduced cross-platform setup overhead and made new solver types reproducibly deployable without manual environment configuration. Infrastructure work directly enabled RunBioSimulations and BioSimulators — both published in Nucleic Acids Research with 73 combined citations.

Open-source platform · NIH-funded · UConn Health · 2019 – 2021

BioSimulations

TypeScript · Python · Flask · NestJS · Docker · Singularity · SLURM · SED-ML/OMEX · NATS · PyPI · VCell

Problem

Computational biologists needed a single web platform to share, reuse, and run biomodeling studies across dozens of heterogeneous simulation engines and model formats (SBML, CellML, NeuroML, SED-ML, and more) without installing each tool locally or managing HPC job submission manually.

Approach

Built on top of VCell's solver infrastructure: designed the SLURM dispatch server (submission, monitoring, lifecycle, recovery, checkpointing) that connects the web platform to HPC cluster simulation engines. Developed Flask and NestJS APIs for simulation setup, metadata persistence, and results retrieval. Containerized simulation environments with Docker and Singularity. Published the SLURM dispatch client to PyPI. Integrated NATS messaging for distributed coordination.

Impact

Published in Nucleic Acids Research 2021 (IF ~14). 73 total citations across the BioSimulations/BioSimulators ecosystem. Platform (biosimulations.org) supports sharing and reproducibility of entire biomodeling studies across continuous, discrete, stochastic, and logical model types. 42+ stars on GitHub.

Central registry + API · NIH-funded · UConn Health · 2021 – 2022

BioSimulators

Python · REST API · Docker · Schema governance · Registry · SBML · CellML · NeuroML

Problem

No curated, machine-readable registry of standardized, containerized biosimulation tools existed. Tool versions and command-line interfaces were undocumented and incompatible — reproducibility suffered at community scale when researchers tried to reuse each other's models.

Approach

Contributed to the registry schema, validation pipeline, and API surface at biosimulators.org. Each simulator entry includes version provenance, standardized command-line and Python interfaces, and validated Docker container images. Built a test suite (Biosimulators_test_suite) and utilities (Biosimulators_utils) to validate that simulation tools implement BioSimulators standards before registration.

Impact

Published in Nucleic Acids Research 2022. Established as the canonical registry for simulation engine discovery in the computational biology community — covering SBML, CellML, NeuroML, SED-ML, and OMEX formats. Cited alongside BioSimulations in the shared 73-citation body of work. 14+ stars on the core registry repo.

02 / Approach

How I build

Four anchors that guide every backend system I design — from first contract to production runbook.

01

Service boundaries

Protocols and contracts before code.

  • Shape HTTP, gRPC, or event schemas with clear failure modes and drift controls.
  • Design rollout plans with reversible migrations and versioned interfaces.
  • Pair every interface with lightweight load tests and trace coverage.
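As a rough illustration of a drift control, the sketch below flags backward-incompatible changes between two versions of a payload schema. Representing a schema as a flat mapping of field name to type name is an assumption made for brevity, and `breaking_changes` is a hypothetical helper, not part of any library.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List changes in `new` that would break consumers written against `old`.

    Added fields are treated as compatible; removed or retyped fields are not.
    """
    problems = []
    for name, type_name in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name] != type_name:
            problems.append(f"type changed: {name} {type_name} -> {new[name]}")
    return problems


v1 = {"field_id": "str", "acres": "float"}
v2 = {"field_id": "str", "acres": "float", "crop": "str"}  # additive change
v3 = {"field_id": "str", "acres": "int"}                   # retyped field

additive = breaking_changes(v1, v2)
retyped = breaking_changes(v1, v3)
```

A check like this can run in CI so that a versioned interface only changes in a breaking way when the version number changes too.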
02

Data trust

Lineage, stewardship, and replayable paths.

  • Own schemas with governance hooks, SLOs, and automated contract checks.
  • Build replay-safe workers with backpressure, DLQs, and idempotent writes.
  • Keep checkpoints observable so incidents can be audited and repaired fast.
03

Operational calm

Predictable runbooks and measurable outcomes.

  • Ship with explicit runbooks: deploy, rollback, debug, and on-call flows.
  • Keep observability first: structured logs, usable dashboards, and SLO alerts.
  • Stress services with failure drills before they meet real traffic.
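The arithmetic behind an SLO alert can be sketched in a few lines. The function names and the 25% alert threshold are assumptions for illustration; real alerting would usually be expressed as rules in the monitoring system rather than in application code.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


def should_alert(remaining: float, threshold: float = 0.25) -> bool:
    """Alert once less than `threshold` of the budget is left (assumed policy)."""
    return remaining < threshold


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
healthy = error_budget_remaining(0.999, 1_000_000, 250)   # about 75% of budget left
burning = error_budget_remaining(0.999, 1_000_000, 900)   # about 10% of budget left
```

Expressing the alert in terms of remaining budget, rather than raw error counts, keeps the threshold meaningful as traffic volume changes.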
04

Research-grade rigor

Bringing lab discipline to production systems.

  • Pair AI/ML integrations with evaluation harnesses and reproducible datasets.
  • Track experiments, prompts, and model choices with provenance and rollback.
  • Share decision memos so engineering choices stay transparent and reviewable.

Technical stack

Languages · Primary
  • Python
  • Java · Scala
  • TypeScript / JavaScript
Backend & APIs · Primary
  • FastAPI · Flask · Django
  • Spring Boot · NestJS
  • REST / OpenAPI · GraphQL Federation · gRPC
Data · Primary
  • PostgreSQL · MySQL
  • DynamoDB · MongoDB · BigQuery
  • SQS/SNS · Pub/Sub · NATS
Cloud & Infra · Active
  • AWS (ECS/EKS, IAM, SSM, ECR)
  • GCP · Docker · Kubernetes · Singularity
  • GitHub Actions · GitLab CI
Observability · Primary
  • Prometheus · Grafana · Loki
  • OpenTelemetry · ELK
  • SLO alerting · Runbooks
Security · Active
  • IAM · SSM · KMS
  • OAuth / JWT
  • SBOM · supply-chain gates

03 / Publishing

Research & service

Peer-reviewed publications, manuscripts under review, and academic service across computational biology, security, and open source.

Google Scholar
73 citations · h-index 3 · i10-index 3

Peer-reviewed publications

B. Shaikh, G. Marupilla, et al.

RunBioSimulations: an extensible web application that simulates a wide range of computational modeling frameworks, algorithms, and formats.

Nucleic Acids Research, 2021. doi:10.1093/nar/gkab411 ↗

J. Karr, G. Marupilla, et al.

BioSimulators: a central registry of simulation engines and services for recommending specific tools.

Nucleic Acids Research, 2022. doi:10.1093/nar/gkac331 ↗

Under review

TerraFlow-Agro: A Reproducible Workflow for Geospatial Agricultural Modeling

Under review

TrialFlow-Agro: A Reproducible Workflow for Trial Analytics

Under review

Professional service

Senior Member · Institute of Electrical and Electronics Engineers (IEEE) · 2026
Reviewer · Journal of Open Source Software (JOSS)
Reviewer · ACM (conference / journal peer review)
Artifact Evaluation Committee (AEC) Reviewer · IEEE Symposium on Security and Privacy
Reviewer · PLOS Computational Biology
Reviewer · Briefings in Bioinformatics

04 / Contact

Ready to design the next calm system.

Open to backend platform roles, research partnerships, and AI/ML infrastructure work. If you're building something that needs to stay reproducible under pressure — let's talk.

Send an email