Week 2: Into the Clouds

January 25, 2024

This post is much shorter than last week's! My college roommate was in town for 3 days, and we spent our evenings recreating an automated version of a Halo 3 "trash compactor" mode I used to love playing many years ago.

Product Exploration

Since the target product is a data ecosystem, I spent some of this week checking out new data tools and products at a high level. Here are some of my findings:

Snowflake Marketplace

There are over 2,000 data sources available, the majority of which are free. This will be a good place to quickly grab data to experiment with, or maybe even to refine and re-serve. What fascinates me are the paid datasets - Good Boy Studios serves a single data table about Dogs in the USA for a mere $12,500/month! Several providers charge on a per-query basis, such as the much-more-practical SEC Financial Analytics Kit at $1/query.

Hex

Until now, I had thought of Hex as just a cloud-hosted environment where developers can collaborate on and share Jupyter notebooks. After chatting with my buddy Nick Stanzione, I found an awesome video about Hex magic. If this short video doesn't result in a jaw drop, you can go ahead and unfollow me. Seriously - this is some incredible stuff, and it's exactly the kind of auto-BI tool I'm looking for.

Qdrant

I haven't given this tool the attention it deserves yet, but vector databases are an interesting concept I'll be keeping in my back pocket for future recommendation algorithm use-cases.

Cloud Guru AWS Solutions Architect Course

Since last week, I covered Route 53, Elastic Load Balancing, Monitoring, High Availability and Scaling, Decoupling Workflows, Big Data, and part of the Serverless Architecture chapter.

I've heard of SQS and SNS being used at my workplace, so it was neat to get a better understanding of the value those services offer. SNS could prove useful for job alerts in a self-hosted version of Dagster.
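As a rough sketch of what SNS-based job alerts might look like, the snippet below builds an alert and hands it to an SNS client's `publish` call. The function names, the alert fields, and the `[dagster]` subject prefix are all made up for illustration, and nothing here uses Dagster's own API; it only assumes some failure hook would call `send_job_alert`. The client is passed in so the code runs without AWS credentials.

```python
import json
from typing import Any


def build_job_alert(job_name: str, run_id: str, status: str, detail: str = "") -> dict:
    """Build the Subject/Message pair for an alert (hypothetical format)."""
    # SNS limits the Subject to 100 characters, so truncate defensively.
    subject = f"[dagster] {job_name} {status}"[:100]
    message = json.dumps(
        {"job_name": job_name, "run_id": run_id, "status": status, "detail": detail},
        indent=2,
    )
    return {"Subject": subject, "Message": message}


def send_job_alert(
    client: Any,
    topic_arn: str,
    job_name: str,
    run_id: str,
    status: str,
    detail: str = "",
) -> dict:
    """Publish the alert to an SNS topic.

    `client` is anything with a boto3-style `publish(TopicArn=..., Subject=...,
    Message=...)` method, e.g. the object returned by boto3.client("sns").
    """
    alert = build_job_alert(job_name, run_id, status, detail)
    return client.publish(TopicArn=topic_arn, **alert)
```

In real use, `client` would be `boto3.client("sns")` and `topic_arn` would point at a topic with email or SMS subscribers; both are assumptions about a setup I haven't built yet.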

The Big Data Reality Check

The big data section of this course solidified for me that I absolutely do not want to be using AWS as my database hosting platform. One of the demos showed the instructor configuring a ra3.16xlarge Redshift instance with an estimated monthly charge of $1,201,766.40!
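For a sense of where a number like that can come from, here is a back-of-the-envelope sketch. The inputs are assumptions, not figures quoted from the demo: an on-demand rate of $13.04 per node-hour, a cluster of 128 nodes, and a 720-hour month. Under those assumptions the arithmetic lands exactly on the estimate above, which suggests the charge was for a large multi-node cluster rather than a single node.

```python
from decimal import Decimal

# Assumed inputs (not stated in the course demo as described here).
HOURLY_RATE_PER_NODE = Decimal("13.04")  # assumed on-demand USD per node-hour
NODE_COUNT = 128                         # assumed cluster size
HOURS_PER_MONTH = 720                    # 30 days x 24 hours


def monthly_cost(rate_per_node_hour: Decimal, nodes: int, hours: int) -> Decimal:
    """Monthly on-demand cost: rate x nodes x hours, with no discounts applied."""
    return rate_per_node_hour * nodes * hours


cluster_estimate = monthly_cost(HOURLY_RATE_PER_NODE, NODE_COUNT, HOURS_PER_MONTH)
single_node_estimate = monthly_cost(HOURLY_RATE_PER_NODE, 1, HOURS_PER_MONTH)

print(f"128 nodes: ${cluster_estimate:,.2f}")     # 128 nodes: $1,201,766.40
print(f"1 node:    ${single_node_estimate:,.2f}")  # 1 node:    $9,388.80
```

Even the single-node figure under these assumptions is far beyond a hobby budget.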

This OMG moment sent me straight back to my comfort zone with Snowflake, where I confirmed that I'll be able to support all of my storage and compute needs for only a few dollars a month. Snowflake allows my warehouse to hibernate for free when I'm not running queries.

Similarly, the "orchestration" offered by AWS Step Functions is wayyyyy more involved than I want to get. Self-managed EC2 instances, Elastic Load Balancers, Elastic MapReduce, and AWS Data Pipeline all solve problems that just don't exist in the serverless ECS environment I've grown accustomed to.