AlphaEarth Foundations Satellite Embeddings

“AlphaEarth Foundations provides a powerful new lens for understanding our planet by solving two major challenges: data overload and inconsistent information.” — from the “How AlphaEarth Foundations Works” section of the DeepMind post.

The key word is lens. That is what this product appears to be.

Before we look at the embeddings, let’s look at the spatial foundation used to build them. Here is a yearly Red, Green, and Blue composite from Sentinel-2 L1C that matches the 10 m resolution of the AEF embeddings.

Sentinel-2 RGB composite of Michigan (2017)

This image is what most people imagine when they look at satellite data. It’s not far from a plane window view. When you first looked at it you likely noticed features relevant to your domain or places you know. Because of what you know, the image is worth more than its RGB components.
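For anyone who wants to build a composite like this themselves, here is a minimal sketch using the Earth Engine Python API. The collection ID, date range, cloud filter, and display stretch are my assumptions, not the exact recipe behind the figure above.

```python
# A hedged sketch of a yearly Sentinel-2 L1C RGB composite over Michigan;
# parameters are illustrative, not the ones used for the figure.
import ee

ee.Initialize()  # assumes you have already authenticated / set a project

michigan = ee.Geometry.BBox(-87.0, 41.7, -82.4, 45.9)  # rough bounding box

composite = (
    ee.ImageCollection("COPERNICUS/S2_HARMONIZED")      # Sentinel-2 L1C
    .filterBounds(michigan)
    .filterDate("2017-01-01", "2018-01-01")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
    .median()
    .select(["B4", "B3", "B2"])                          # 10 m R, G, B bands
)

# L1C reflectance is scaled by 10,000; 0-3000 is a common display stretch
url = composite.getThumbURL(
    {"min": 0, "max": 3000, "dimensions": 768, "region": michigan}
)
print(url)
```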

Now to the AEF satellite embeddings. They were produced by a model that sought to replicate that kind of understanding by digesting an individually incomprehensible volume of information and distilling it into vectors that capture small amounts of human-interpretable structure. When we visualize different embeddings as if they were the R, G, and B channels of a composite image, the results are fascinating because recognizable phenomena emerge.

However, with 64 unique embedding layers, checking every possible three-band RGB combination (n = 41,664) would take time.

Embeddings animation
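If you want to poke at band triplets yourself, here is a hedged sketch of the idea in the Earth Engine Python API. The dataset ID, the band names (A00 through A63), and the display stretch are my assumptions; check the Earth Engine catalog entry for the embeddings before relying on them.

```python
# Sketch: enumerate unordered band triplets and render one as an RGB composite.
from itertools import combinations

import ee

ee.Initialize()  # assumes prior authentication / project setup

# Assumed dataset ID and band naming for the AEF satellite embeddings
embeddings = (
    ee.ImageCollection("GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL")
    .filterDate("2017-01-01", "2018-01-01")
    .mosaic()
)

bands = [f"A{i:02d}" for i in range(64)]
triplets = list(combinations(bands, 3))
print(len(triplets))  # 41,664 unordered triplets of 64 bands

# Embedding values sit roughly around zero, so this stretch is a guess to tune
r, g, b = triplets[0]
vis = {"bands": [r, g, b], "min": -0.3, "max": 0.3}
url = embeddings.getThumbURL(
    {**vis, "dimensions": 512, "region": ee.Geometry.BBox(-86.0, 42.0, -83.0, 44.0)}
)
print(url)
```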

The Earth Engine Developers site has solid guides on using these embeddings for classic LULC and remote-sensing tasks. One clear use is similarity search and visual exploration, much like how people have used Google Earth, NAIP, and other high-resolution sources to curate training and validation datasets for models built on traditional EO inputs (e.g., Landsat and Sentinel).

A caution: do not treat these embeddings as if they were native sensor bands. EO instruments are chosen for known physical relationships to real-world processes. Embeddings are powerful, but they are still a black-box representation. As relationships to biomass, crop yield, and other variables are reported, we should ground them in the physics and measurements that trained the model. Foundation models should strengthen the case for continuing programs like Landsat and for expanding EO, ground validation, and basic science. The more lenses we have, the more complete our understanding of Earth’s condition and change.

AI Disclaimer: ChatGPT (GPT-5) was used to edit this post and to help generate some of the visualizations pulled from Google Earth Engine.

Doing AI Hydrology to assess water resources for AI expansion... to continue doing AI Hydrology

In January 2025, the president of the United States announced the Stargate project, a $500 billion investment over the next four years to build new AI infrastructure.

In 1956, we passed the National Interstate and Defense Highways Act (NIDHA) with an authorization of $25 billion (equivalent to $215 billion in 2024). The typical take on the NIDHA is that it facilitated economic growth, national security, and personal mobility. Another take, however, is that the highways destroyed American cities, segregated economic classes, and created a mobility barrier requiring a personal automobile. There is no doubt whatsoever that the NIDHA changed American life and culture (for better or for worse...). Shortly after the NIDHA, air quality in Los Angeles reached an all-time low, before the Clean Air Act of 1963 set a course for air quality improvement. Along with the Clean Water Act and the Endangered Species Act, these regulations are largely responsible for any protections we have against corporate pillaging of the environment.

Part of the Stargate project is a commitment from the federal government to eliminate any barriers to the expansion of AI infrastructure. Presumably, this means no need for environmental regulation, nor consideration of the impact on American lives or culture.

Expanding AI infrastructure will require increasing H2O demand, consumptive use, and pollution discharge, even with environmental regulation in place. The hydrological sciences are now thoroughly embedded with AI research. Common "low hanging fruit" papers are simply slight modifications of deep learning architectures, made to keep up with the latest advancements from computer science, with a new line added on a standard benchmarking dataset. This is not by accident, but was actually called for by early adopters of the third book of AI in hydrology (including me).

Use of AI tools, like LLMs, is rapidly on the rise. From an H2O standpoint, this fact is itself troublesome. Even though I find these tools incredibly useful for my daily tasks, particularly programming, I am now beginning to reflect on the importance of my work in general. I've been writing computer programs for hydrology and hydraulics (H&H) modeling since 2008. Sometimes these models are considered "AI", sometimes they aren't, but recently almost all of my models incorporate some aspect of AI. My whole academic career has been defined by my development and analysis of AI for H&H modeling. You see, I've been riding this growing wave of research funding specifically for AI for H&H modeling. We are now at a point where hydro research without AI has little chance of funding, and I suspect this is the case in most other academic fields as well. The feeling I am getting is "Do AI, or we aren't going to fund your research".

Recently though, I've heard several anecdotes of businesses taking the same stance. Startup companies that don't have much AI tech involved probably aren't going to get funded. Just yesterday I heard that one company with a long-standing client in MS, but no AI portfolio, was told by MS to "start doing AI, or stop working with us." This goes well beyond the growth of a technology driven by organically increasing demand, and is starting to seem like growth for its own sake, perhaps as a means to keep the "bubble" growing.

I've spent my entire life advocating for means of transportation other than personal automobiles. I didn't know why; I just liked walking, biking, and skateboarding over sitting in a car. In my late teenage years, I figured out that I really enjoyed small streets over large streets, but didn't know why. I stumbled upon a blog/group called Strong Towns at some point in my early 20s, and I figured out the why. Small streets feel more comfortable because they are designed at the "human scale", while large streets with high speed limits are uncomfortable because they are designed for automobiles, which need more space, particularly when traveling at high speeds. From Strong Towns, I also learned about the automobile infrastructure growth Ponzi scheme. In short, we continue to develop new roads, and we neglect to maintain the roads we have. It is growth for its own sake, not for the benefit of travelers. In most cases, I would actually argue (feel free to ask me about this later) that expanding road networks actually inhibits travel. Automobile infrastructure growth happens to take a whole lot of H&H modeling.

I've now found myself in a scenario where I'll be using and developing AI-based hydrologic modeling in order to evaluate and plan for increasing H2O resources for expanding AI infrastructure. What a time to be alive!

AI Hydro for AI Infrastructure

Note: This blog was written without the use of LLMs. Google was used sparingly.

Coding Blog: How to access NWM forcings using VirtualiZarr

One of the most underrated aspects of NOAA's weather data is how much data is published on a daily basis. The National Water Model (Cosgrove et al. 2024) produces nine operational configurations, each with a different lookback period, forecast range, and domain extent. The number of .nc files output to S3 object storage is in the tens of thousands... per day!

While this amount of data is monumental for machine learning and other hydrological analyses, it's cumbersome to read every .nc file individually, or to download this much data to disk. That is where VirtualiZarr comes into play: it allows existing .nc files to be viewed as Zarr stores / xarray datasets without having to duplicate any data!

Below is a tutorial on how you can use VirtualiZarr to read a forecast from the National Water Model as a single Zarr store for your own studies.
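To make the idea concrete before the full tutorial, here is a minimal sketch of one way this can look. The bucket name, the short-range forcing file paths, and the exact keyword arguments are assumptions based on my reading of the VirtualiZarr and kerchunk docs, so adjust them to the NWM S3 layout and library versions you are actually using.

```python
# Sketch: virtualize one NWM short-range forcing cycle as a single lazy dataset.
import fsspec
import xarray as xr
from virtualizarr import open_virtual_dataset

# Hypothetical paths: 18 hourly forcing files for the 00z short-range cycle
urls = [
    "s3://noaa-nwm-pds/nwm.20250101/forcing_short_range/"
    f"nwm.t00z.short_range.forcing.f{hr:03d}.conus.nc"
    for hr in range(1, 19)
]

# Each virtual dataset stores chunk references (byte ranges), not the data itself
vds_list = [
    open_virtual_dataset(
        url,
        indexes={},  # skip building in-memory indexes from coordinate data
        reader_options={"storage_options": {"anon": True}},  # public bucket
    )
    for url in urls
]

# Stitch the hourly files together along the time dimension
combined = xr.concat(
    vds_list, dim="time", coords="minimal", compat="override", combine_attrs="override"
)

# Persist the combined references as a kerchunk JSON sidecar file
combined.virtualize.to_kerchunk("nwm_t00z_forcing.json", format="json")

# Open the whole forecast as one lazy dataset via fsspec's reference filesystem
fs = fsspec.filesystem(
    "reference",
    fo="nwm_t00z_forcing.json",
    remote_protocol="s3",
    remote_options={"anon": True},
)
ds = xr.open_dataset(fs.get_mapper(""), engine="zarr", consolidated=False)
print(ds)
```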

Soils Animated: Part 1

Animation is a powerful tool for demonstrating scientific concepts. It is engaging, simplifies abstract ideas, and makes them accessible to a wide audience.

In hydrology, one of the simplest yet most elegant conceptualizations of soil-water dynamics is the soil moisture loss function, a model developed by Laio et al. (2001) and Rodriguez-Iturbe et al. (1999). This model abstracts complex soil processes at the field scale into a few key variables, providing ecohydrologists with a powerful 'toy model' for conducting a variety of interesting experiments. For example, Entekhabi and Rodriguez-Iturbe (1994) explored the impacts of spatio-temporal aggregation on characterizing the heterogeneity of soil moisture dynamics, and D'Odorico and Porporato (2004) used it to explain soil moisture seasonality.
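As a rough sketch of the idea (my paraphrase, not the papers' exact notation), the point-scale balance of relative soil moisture \(s\) over a rooting depth \(Z_r\) with porosity \(n\) can be written as:

\[ n Z_r \frac{ds}{dt} = \varphi(s, t) - \chi(s), \qquad \chi(s) = E(s) + L(s) \]

where \(\varphi\) is infiltration from rainfall and the loss function \(\chi\) lumps evapotranspiration \(E(s)\) and drainage \(L(s)\) into simple piecewise functions of \(s\); that behavior is the kind of thing the animations aim to illustrate.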

In this blog post, I'll animate this soil dynamics model, with Part 1 focusing on the basic concepts. Animations in this blog post will illustrate how hydrologists conceptualize the soil processes happening just above and beneath our feet, while also exploring how these processes unfold across different conceptual spaces.

The Technical side of AGU 2024: What happened and where are we going

Two weeks ago, the American Geophysical Union (AGU) hosted its annual fall meeting in Washington, D.C., with over 25,000 attendees from 100+ countries present to share their research. For those reading who have not been to, or heard of, AGU, there are four major themes:

  • Earth's subsurface
  • Earth's surface
  • The atmosphere
  • Space

Within these four themes there are several sections, and within each section there are many sessions, each corresponding to a research topic proposed by a group of scientists. Generally, most scientists submit one abstract to their field of study and, rarely, a second to a different section. At the conference, research conversations occurred at posters, sessions, and oddly timed coffee hours during the lulls in programming (had to get my yearly zinger in at AGU's coffee policy). Now that my brain, and my feet from the 20,000 daily steps, have recovered, I want to write about my most significant takeaway from the week and where I predict things will be headed next year.

Hydrology is flat, and it's buckets all the way down!

For some reason, much of my recent work keeps coming back to buckets, and re-thinking the conceptualization of natural hydrologic systems as buckets. I am generally sick of talking about buckets. I'm hoping that this post is my farewell to thinking about buckets, at least for a while.

Sir Edmond Leakybucket

When I was first learning differential equations, the professor told us a silly story about Sir Edmond Leakybucket, some ol' timey English royal who had to drink his ale quickly because his ale bucket leaked. I went on to study hydrology, so I've had to think about Sir Edmond for the past fifteen years. I can't escape him. Sometimes he mixes two kinds of ales together, sometimes his ale bucket is more complicated or simpler, but he is always losing his ale. Poor guy. Sir Edmond and his bucket do two important things: 1) they give nice differential equation examples, but more importantly for hydrology, 2) leaking buckets are a primary conceptualization of hydrologic processes.

A simple differential equation for the ale level in Sir Edmond's bucket is:

\[ \frac{dh}{dt} = -k \sqrt{h} \]

Where \(h(t)\) is the ale level at time \(t\) and \(k\) is a proportionality constant that governs the rate of outflow. Its solution through separation of variables is:

\[ h(t) = \left(\sqrt{h_0} - \frac{k}{2}t\right)^2 \]

Where \(h_0\) is the initial ale level in the bucket at time \(t = 0\). This gives us the opportunity to track volumes of ale through this bucket, and to match the fluxes from buckets with data collected on real-world hydrological systems. This is, in a nutshell, the field of computational hydrology: we just need to dress up and add complications to this bucket, and off we go.
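For readers who prefer code, here is a minimal sketch of that bookkeeping, using made-up values for \(k\) and \(h_0\) and a naive forward Euler step purely to compare against the analytical solution:

```python
# Sir Edmond's bucket: analytical solution vs. simple numerical integration.
import numpy as np
import matplotlib.pyplot as plt

k = 0.5    # outflow proportionality constant (made-up value)
h0 = 1.0   # initial ale level (made-up value)
t_empty = 2 * np.sqrt(h0) / k          # time at which the bucket runs dry
t = np.linspace(0, t_empty, 200)
dt = t[1] - t[0]

# Analytical solution from separation of variables
h_analytical = (np.sqrt(h0) - 0.5 * k * t) ** 2

# Forward-Euler integration of dh/dt = -k * sqrt(h), clipped at empty
h_numeric = np.empty_like(t)
h_numeric[0] = h0
for i in range(1, len(t)):
    h_numeric[i] = max(h_numeric[i - 1] - k * np.sqrt(h_numeric[i - 1]) * dt, 0.0)

plt.plot(t, h_analytical, label="analytical")
plt.plot(t, h_numeric, "--", label="forward Euler")
plt.xlabel("time")
plt.ylabel("ale level h(t)")
plt.legend()
plt.show()
```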