Data Scientist / ML Engineer

Yang Jiao

I'm a PhD-trained data scientist and Google Cloud Certified Professional Machine Learning Engineer with 6+ years of experience building end-to-end ML systems in production.

I specialize in taking models beyond notebooks: designing scalable data pipelines, deploying time-series and recommendation systems, and building internal data products that drive measurable business impact. From forecasting revenue and automating financial workflows to improving recommendation accuracy with production-grade feature engineering, I focus on ML solutions that are reliable, maintainable, and valuable to the business.

I care about building robust, scalable technology and applying machine learning where it makes a measurable difference in real-world products.

Selected Projects

Case studies that move from research insight to production impact.

Job Hunting Assistant AI Agent

Production Web App · ML-Powered Workflow

I built and deployed a production-oriented web application that turns job search tracking into an ML-powered workflow. The product combines structured application management with semantic search and an AI assistant, so users can quickly find relevant opportunities, import job data from web/CSV sources, and make better application decisions with analytics. This project demonstrates my end-to-end ownership across modeling, backend APIs, frontend UX, and cloud deployment.

Tech stack

  • Frontend: React 18 + Vite, React Router, Zustand, Tailwind CSS, Recharts, Axios, Nginx.
  • Backend: Python FastAPI, JWT auth, LangChain-based chatbot tooling, Sentence Transformers (all-mpnet-base-v2).
  • Data layer: PostgreSQL + pgvector (768-d embeddings, HNSW index for similarity search).
  • AI capabilities: Semantic retrieval, LLM integration (Anthropic Claude / Google Gemini), tool-calling for CSV import, web extraction, and SQL generation.
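
To illustrate the semantic retrieval listed above, here is a minimal sketch of ranking stored job embeddings by cosine similarity to a query embedding. It is an illustration under assumptions, not the app's actual code: the function names, the `JOBS` data, and the 3-d toy vectors (standing in for 768-d embeddings) are all hypothetical, and the exhaustive scan here is the exact counterpart of what an HNSW index in pgvector approximates at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the names of the k items most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings; real ones would come from an embedding model.
JOBS = {
    "ml_engineer": [0.9, 0.1, 0.0],
    "data_analyst": [0.5, 0.5, 0.1],
    "chef": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], JOBS, k=2))
```

An exhaustive scan like this is fine for a handful of rows; an approximate index trades a small amount of recall for much faster lookups as the table grows.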

AWS ECS deployment

  • Containerized frontend, backend, and PostgreSQL services with Docker, pushed to Amazon ECR.
  • Deployed on Amazon ECS Fargate with private/public subnet architecture, ALB routing, and service discovery via AWS Cloud Map.
  • Implemented blue/green deployments for frontend and backend using AWS CodeDeploy.
  • Managed secrets through AWS Secrets Manager and centralized logs in CloudWatch.
  • Automated infra + app delivery through GitHub Actions (OIDC to AWS, CloudFormation for IaC, ECS task registration/deployment).

ENTSO-E Load Forecasting App

End-to-End ML App · Time-Series Forecasting

I built an end-to-end machine learning application for forecasting day-ahead electricity load in the Germany-Luxembourg (DE-LU) bidding zone. The app ingests load data from the ENTSO-E Transparency Platform and weather data from Open-Meteo, engineers time-series features, and trains forecasting models for operational use. It includes a FastAPI backend for serving forecasts and metrics and a Streamlit dashboard for interactive monitoring and exploration.
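
As an illustration of the time-series feature engineering step, here is a minimal sketch that turns an hourly load series into rows of lag and calendar features. The specific features (24-hour and 168-hour lags, hour of day, weekday), the function name `build_features`, and the synthetic data are assumptions for illustration; the app's actual feature set may differ.

```python
from datetime import datetime, timedelta


def build_features(timestamps, load, lags=(24, 168)):
    """Build one feature row per hour that has all requested lags available.

    For day-ahead forecasting, lags shorter than the forecast horizon
    would leak future information, so the defaults start at 24 hours.
    """
    rows = []
    max_lag = max(lags)
    for i in range(max_lag, len(load)):
        row = {
            "timestamp": timestamps[i],
            "hour": timestamps[i].hour,
            "weekday": timestamps[i].weekday(),  # Monday == 0
            "target": load[i],
        }
        for lag in lags:
            row[f"lag_{lag}h"] = load[i - lag]
        rows.append(row)
    return rows


# Synthetic hourly series: load value equals the hour index (illustrative only).
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(200)]
load = [float(h) for h in range(200)]
rows = build_features(timestamps, load)

if __name__ == "__main__":
    print(len(rows), rows[0])
```

Rows without a full lag history are dropped rather than filled, which keeps the training set free of imputed values at the cost of the first week of data.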

Tech stack

  • Language: Python 3.10+.
  • Data & ML: pandas, numpy, scikit-learn, LightGBM, joblib.
  • Data sources: ENTSO-E API (entsoe-py), Open-Meteo API (openmeteo-requests).
  • Backend API: FastAPI, Uvicorn, Pydantic, Pydantic Settings.
  • Dashboard/UI: Streamlit, Plotly.
  • Database layer: PostgreSQL (psycopg2-binary), SQLAlchemy.
  • Infrastructure & runtime: Docker, Docker Compose.
  • Developer tooling: pytest, pytest-cov, pytest-asyncio, Ruff, mypy, Makefile.

Contact

Open to data science and ML engineer opportunities.
