A New Beginning

January 17, 2024

The Journey Begins

Background

For a number of reasons, I became obsessed with an MMORPG called Old School RuneScape in the latter half of 2023. OSRS is a game that brings a sense of achievement, but it also consumes so much time that my life was becoming unbalanced.

At the same time, I was spearheading the implementation of a data orchestration platform called dagster at work. The project has been incredibly exciting and has already had a profound impact on the way we move and track data within our organization. Several challenges arose along the way, many of them related to standing up a hybrid-hosted cloud deployment. An engineer at heart, I found myself wanting to understand how everything I was building actually worked at a finer level.

A week ago today, I officially dropped OSRS and began a new chapter in my life: a journey into the Cloud.

Where I'm Going

Ultimately, I'd love to implement an inexpensive, fully scalable data platform for various initiatives:

  • Crowd-funded, open-source data for environmental, socio-economic, and educational public use such as Data Science for Social Good
  • Subscription model data for commercial use cases like real-estate arbitrage, social media insights, consumer analysis
  • Ad-funded data for "fun" use-cases like my BrawlStars dashboard, Music Bingo game, or maybe a hiking analytics application

The Tools

  • Python - the tried-and-true 'one-size-fits-all' programming language
  • dbt - a "data build tool" for maintaining and testing data models
  • dagster - data orchestration tool of choice
  • ECS or EKS - scalable service for self-hosting compute and logging
  • ChatGPT & GitHub Copilot - critical tools for accelerating development
  • git & GitHub Actions - version control and deployment automation

What's Missing

  • Data ingestion tool - Something like airbyte would make it easier to ingest data from all different types of sources
  • Database selection - Thinking big data, so an OLAP solution would be ideal. Will likely use PostgreSQL until there is funding to migrate to Snowflake or Redshift
  • BI dashboard tool - I'm a huge fan of streamlit, and self-hosting streamlit apps shouldn't be an issue
  • Auto ML solution - The data science ecosystem evolves rapidly. Keras, sklearn and (py)caret were big when I was taking classes, but better tools will emerge

Progress Update

After starting the A Cloud Guru Solutions Architect - Associate course last week, I was immediately given a deeper dive into several resource types I thought I already knew well. Boy, was I wrong!

The Course

This week I covered IAM, S3, EC2, EBS, EFS, Databases, and VPC.

IAM seems pretty straightforward - users, groups, and roles, each with access policies attached.
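To make "access policies" a bit more concrete, here is a minimal sketch of what one looks like. A policy is just a JSON document made of statements, and each statement allows or denies some actions on some resources. The function name and the bucket name "example-bucket" are made up for illustration; the sketch only builds and prints the document and does not talk to AWS.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy document granting read-only access to one S3 bucket.

    Hypothetical helper for illustration; it only returns a dict.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = read_only_bucket_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

Attaching a document like this to a group or role, rather than to individual users, is what keeps permissions manageable as more people and services show up.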

S3 is so much more capable than I knew. Not only does it serve as general-purpose object storage, it also integrates with many other AWS services: CloudFront can publish logs to an S3 bucket, S3 can host a static website, and databases can store point-in-time snapshots in S3.

EC2 also has more depth than I previously thought. Creating an AMI from an instance's EBS volumes and using it to launch new EC2 instances is a fascinating concept. Most of the security around allowed inbound and outbound ports (security groups) is still way over my head.
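The part about inbound ports gets less mysterious with a toy model. The sketch below is not the AWS API, just my own simplified picture of how a security group evaluates inbound traffic: the group holds a list of allow rules, and a connection gets in only if some rule matches its protocol, port, and source address. Everything else is dropped. The class name, function name, and example rules are all invented for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    """One allow rule: protocol, inclusive port range, and source CIDR."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Return True if any rule allows this inbound connection.

    There are no deny rules in this model: no match means no access.
    """
    src = ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and src in ip_network(rule.cidr)
        for rule in rules
    )


# Hypothetical rules for a small web server.
WEB_SERVER_RULES = [
    InboundRule("tcp", 80, 80, "0.0.0.0/0"),      # HTTP from anywhere
    InboundRule("tcp", 443, 443, "0.0.0.0/0"),    # HTTPS from anywhere
    InboundRule("tcp", 22, 22, "203.0.113.0/24"),  # SSH from one network only
]

print(is_allowed(WEB_SERVER_RULES, "tcp", 443, "198.51.100.7"))  # True
print(is_allowed(WEB_SERVER_RULES, "tcp", 22, "198.51.100.7"))   # False
```

Real security groups are also stateful, meaning replies to allowed traffic are let back out automatically, which this sketch does not attempt to model.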

The Platform

AWS Adventures

Shortly into the course, I was inspired to host my own website. Intrigued by the cost-effectiveness of S3 static sites, I set out to put a public-facing S3 bucket behind my own domain.

After wrestling with Google Domains (RIP), Squarespace, Route 53, CNAME records, Alias records, and CloudFront, I finally got my site online! The journey involved:

  • Purchasing bstroh.com, then realizing I needed Route 53
  • Buying strohb.com after learning about the 60-day transfer lock
  • Discovering "hosted zones" after almost being charged $50/month for policy records
  • Realizing an S3 bucket's name must match the domain name exactly when the domain points directly at the bucket's website endpoint
  • Finally embracing CloudFront for free SSL certificates and caching
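The bucket-naming step in the list above is easy to get wrong, so here is a small sketch of a check I could have run before creating anything. It turns a domain into the bucket name it would need and rejects names that break the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots). The helper name is hypothetical, and this covers only a subset of the rules AWS enforces.

```python
import re

# Basic S3 bucket naming shape: 3-63 chars, lowercase alphanumerics,
# dots and hyphens, with an alphanumeric first and last character.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_name_for_domain(domain: str) -> str:
    """Return the bucket name that matches a domain, or raise ValueError.

    Hypothetical helper; a partial check, not a full validator.
    """
    # Domain names are case-insensitive; bucket names must be lowercase.
    name = domain.strip().rstrip(".").lower()
    if not _BUCKET_NAME.fullmatch(name) or ".." in name:
        raise ValueError(f"{domain!r} is not usable as an S3 bucket name")
    return name


print(bucket_name_for_domain("Strohb.com"))      # strohb.com
print(bucket_name_for_domain("www.strohb.com"))  # www.strohb.com
```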

GitHub Integration

After getting the AWS side working, I connected GitHub Actions for automated deployments. Now my site updates automatically whenever I push changes!
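For anyone curious what a deployment like this boils down to, here is a rough sketch of the core idea rather than my actual workflow. It walks a local site folder and works out, for each file, the S3 object key and the Content-Type it should be uploaded with. The function name and the sample files are made up, and the upload itself is deliberately left out; in practice something like `aws s3 sync` handles that part.

```python
import mimetypes
import tempfile
from pathlib import Path


def build_upload_plan(site_dir):
    """Return a sorted list of (s3_key, content_type) for files in site_dir.

    Hypothetical helper: it plans the upload but does not perform it.
    """
    root = Path(site_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # S3 keys use forward slashes regardless of the local OS.
        key = path.relative_to(root).as_posix()
        content_type, _ = mimetypes.guess_type(path.name)
        plan.append((key, content_type or "application/octet-stream"))
    return plan


# Try it on a throwaway site with two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "css").mkdir()
    (Path(tmp) / "index.html").write_text("<h1>hello</h1>")
    (Path(tmp) / "css" / "site.css").write_text("body { margin: 0; }")
    plan = build_upload_plan(tmp)

print(plan)
```

Getting the Content-Type right matters for a static site, since a browser may download a page instead of rendering it if the HTML is served as a generic binary type.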