Clare S. Y. Huang Data Scientist | Atmospheric Dynamicist

I love to share what I've learnt with others. Check out my blog posts and notes about my academic research, as well as technical solutions to software engineering and data science challenges.
Opinions expressed in this blog are solely my own.


Clean up cached file from local test

I found a useful Python package, pyclean, for cleaning up cached files created by a local (suite of unit) test:

https://pypi.org/project/pyclean/

After installing, clean up after a test by executing

pyclean --verbose .

This cleans up the cached files and lists all the files deleted.

Capturing error with more details in Python

The Python standard library traceback can provide more details about an (unexpected) error than catching it with except Exception as ex: and then examining ex.

Let’s make a function that raises an error for demonstration:

import traceback


def do_something_wrong():
    cc = int("come on!")
    return cc

try:  # first catch
    do_something_wrong()
except Exception as ex:
    print(f"The Exception is here:\n{ex}")

try:  # second catch
    do_something_wrong()
except:
    print(f"Use traceback.format_exc() instead:\n{traceback.format_exc()}")

The first catch would only display

The Exception is here:
invalid literal for int() with base 10: 'come on!'

while the second catch includes not only the error but also where it occurred:

Use traceback.format_exc() instead:
Traceback (most recent call last):
  File "/Users/csyhuang/JetBrains/PyCharm2024.1/scratches/scratch.py", line 16, in <module>
    do_something_wrong()
  File "/Users/csyhuang/JetBrains/PyCharm2024.1/scratches/scratch.py", line 5, in do_something_wrong
    cc = int("come on!")

In a large software project, the second example would be way more helpful than the first.
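As a related sketch (not from the original snippet), the same full traceback can be sent to a logger in a larger application; logging.exception appends the traceback automatically:

import logging

logger = logging.getLogger(__name__)

try:
    do_something_wrong()
except Exception:
    # Logs the message at ERROR level together with the full traceback,
    # similar to traceback.format_exc() but routed through the logging setup
    logger.exception("do_something_wrong failed")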

Problem solver for GitHub Workflow

I wanted to create a GitHub workflow that queries a web API and commits the results as text files to the repo. Here are several problems I solved during development.

Passing keys and tokens via secrets to the web API

Several tokens and secrets are needed to query the web API. I stored them as GitHub secrets and accessed them in the workflow file via:

jobs:
  job-name:
    environment: query_env
    runs-on: ubuntu-latest
    name: ...
    steps:
      ...
      - name: Query API
        id: query_api
        env:
          oauthtoken: ${{ secrets.OAUTH_TOKEN }}
          oauthtokensecret: ${{ secrets.OAUTH_TOKEN_SECRET }}
        run: python run_script.py $oauthtoken $oauthtokensecret
        ...
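For completeness, here is a minimal sketch of how run_script.py might pick up those credentials (the argument handling below is hypothetical, not taken from the actual script):

# run_script.py (hypothetical sketch)
import os
import sys
from pathlib import Path


def main() -> None:
    # The workflow passes the secrets as positional arguments; they are also
    # available as the environment variables set in the step above
    oauth_token = sys.argv[1] if len(sys.argv) > 1 else os.environ["oauthtoken"]
    oauth_token_secret = sys.argv[2] if len(sys.argv) > 2 else os.environ["oauthtokensecret"]

    # Query the web API with the credentials (omitted here) and write
    # the results as text files under data_dir/
    Path("data_dir").mkdir(exist_ok=True)


if __name__ == "__main__":
    main()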

After running run_script.py, several .txt files are produced in the directory data_dir/ inside the repository, which I want to push to the GitHub repository. I tried committing and pushing the files with actions/checkout@v4, but it did not work:

      ...
      - name: add files to git # Below is a version that does not work
        uses: actions/checkout@v4
        with:
           token: ${{ secrets.REPO_TOKEN }}
      - name: do the actual push
        run: |
          git add data_dir/*.txt
          git commit -m "add files"
          git push

Running this, I received an error: nothing to commit, working tree clean. Error: Process completed with exit code 1.

The version that eventually worked looks like this:

      - name: Commit files
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
           token: ${{ secrets.REPO_TOKEN }}

Note that this action commits all the files produced to the repository, including some unwanted cached files. Therefore, I included a step before it to clean up those files:

      - name: Remove temp files
        run: |
          [ -d "package_dir/__pycache__" ] && rm -r package_dir/__pycache__

falwa release v2.0.0

A new release (v2.0.0) of the python package falwa has been published to cope with the deprecation of numpy.distutils in python 3.12. It involves some changes in the installation procedure, which you can find in the README section “Package Installation”.

Great thanks to Christopher Polster for figuring out a timely and clean solution for migration to python 3.12. 👏 For details and references related to this migration, users can refer to Christopher’s Pull request.

Socket Timeout error in BigDL Orca

To train deep learning models written in PyTorch on Big Data in a distributed manner, we use BigDL-Orca at work. 🛠️

Compared to the Keras interface of BigDL, PyTorch (Orca) supports more customization of the various components of a deep learning model. For example, with the bigdl-dllib keras API you are constrained to the operations available in the Autograd module when customizing loss functions, while in PyTorch (Orca) you can do whatever you like by creating a customized subclass of torch.nn.modules.loss._Loss. 😁
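For illustration, a minimal sketch of such a customized loss (the weighting scheme below is made up for demonstration, not something from our project):

import torch
from torch.nn.modules.loss import _Loss


class WeightedMSELoss(_Loss):
    """A hypothetical custom loss: mean squared error scaled by a fixed weight."""

    def __init__(self, weight: float = 1.0):
        super().__init__()
        self.weight = weight

    def forward(self, prediction: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Arbitrary tensor operations are allowed here, which is the flexibility
        # that the bigdl-dllib keras Autograd module does not offer
        return self.weight * torch.mean((prediction - target) ** 2)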

One drawback of Orca, though, is the cryptic error logging: what happened within the Java class (i.e. what caused the error) is not logged at all. I got stuck on an error during model training, but all I got from the Spark log was socket timeout. There can be many causes, but the one I encountered was related to the size of train_data.

Great thanks to my colleague Kevin Mueller who figured out the cause 🙏 - when the partitions contain different numbers of batches in Orca, some barriers can never be reached, which results in this error.

To get around this, I dropped some rows to make sure the total size of train_data is a multiple of batch size:

train_data = train_data.limit(train_data.count() - train_data.count() % batch_size)

The training process worked afterwards. 😁

falwa release v1.3.0

A new release (v1.3.0) of the python package falwa, with some improvements in the numerical scheme and enhanced functionality, has been published:

https://github.com/csyhuang/hn2016_falwa/releases/tag/v1.3.0

If you find an error, or have any questions, please submit an issue ticket to let us know.

Thank you for your attention.

Running pytest coverage check locally

I wrote a blog post in 2021 about how to integrate the pytest coverage check into a GitHub workflow.

To run coverage locally, execute coverage run --source=falwa -m pytest tests/ && coverage report -m, which yields the report below (this one is from the PR for falwa release 1.3):

Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
falwa/__init__.py                 11      0   100%
falwa/barotropic_field.py         41      4    90%   79, 86, 93, 138
falwa/basis.py                    66      8    88%   57-62, 175, 186
falwa/constant.py                  6      0   100%
falwa/data_storage.py            146      3    98%   52, 59, 107
falwa/legacy/__init__.py           0      0   100%
falwa/legacy/beta_version.py     240    240     0%   1-471
falwa/netcdf_utils.py              9      9     0%   6-30
falwa/oopinterface.py            400     32    92%   297, 320, 336, 355, 366, 393, 550, 559, 721, 731, 734, 799, 818, 860, 870, 880, 890, 907, 918, 929, 993, 1006-1008, 1019, 1031, 1179-1180, 1470-1471, 1550-1565
falwa/plot_utils.py              125    125     0%   6-343
falwa/preprocessing.py             7      7     0%   6-30
falwa/stat_utils.py               11     11     0%   6-26
falwa/utilities.py                61     48    21%   58-92, 151-186, 242-255
falwa/wrapper.py                 146    146     0%   6-570
falwa/xarrayinterface.py         266     37    86%   107-109, 317, 322-323, 478, 616-648, 683-704
------------------------------------------------------------
TOTAL                           1535    670    56%

I guess it’s time to work on increasing coverage again. 🙂 (Too much work recently, though.)

CS topics not covered in class

Our team lead shared with us some useful learning materials on advanced CS topics not covered in class: The Missing Semester of Your CS Education from MIT. I’ll spend some time reading this.

Discussion on KAN (Kolmogorov-Arnold Networks)

I led the Machine Learning Journal Club discussion on the paper:

Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., … & Tegmark, M. (2024). KAN: Kolmogorov-Arnold Networks. arXiv preprint arXiv:2404.19756.

Here are the slides I made.

In short, I believe it can only be practically useful if the scalability problem is solved. 👀 Let’s see how the development of this technique goes.

Important announcement about the GitHub repo hn2016_falwa

Below is the email I sent to the users of GitHub repo hn2016_falwa:

I am writing to inform you of two recent releases of the GitHub repo: v1.0.0 (a major release) and v1.1.0 (a bug fix). You can refer to the release notes for the details. There are two important changes in v1.0.0:

  1. The python package is renamed from hn2016_falwa to falwa since this package implements finite-amplitude wave activity and flux calculations beyond those published in Huang and Nakamura (2016). The GitHub repo URL remains the same: https://github.com/csyhuang/hn2016_falwa/ . The package can be installed via pip as well: https://pypi.org/project/falwa/

  2. It happens that the bug fix release v0.7.2 contains a bug that over-corrects the nonlinear zonal advective flux term. v1.0.0 fixes this bug. Thanks to Christopher Polster for spotting the error. The fix requires re-compilation of the fortran modules.

The rest of the details can be found in the release notes:

Please let us know on issue page if you have any questions: https://github.com/csyhuang/hn2016_falwa/issues

GitHub Actions for Package Deployment

Bookmarking some useful links:

Deployment onto Conda forge

The deployment of the python package on Linux is not working (again). I am exploring solutions to automate the deployment. Here are some things I’ve found.

[Updated on 2023/12/11] After some research, it seems that scikit-build would be a continuously maintained solution: https://scikit-build.readthedocs.io/

to be continued.

IMPORTANT Bug fix release hn2016_falwa v0.7.2

We published an important bugfix release hn2016_falwa v0.7.2, which requires recompilation of fortran modules.

Two weeks ago, we discovered a mistake in the derivation of the expression for the nonlinear zonal advective flux term, which leads to an underestimation of that flux component.

We will submit corrigenda for Neal et al. (2022, GRL) and Nakamura and Huang (2018, Science) to update the numerical results. The correct derivation of the flux expression can be found in the corrected supplementary materials of NH18 (to be submitted soon). There is no change in the conclusions of any of the articles.

Please refer to Issue #83 for the numerical details and preliminary updated figures in NHN22 and NH18:

Thank you for your attention and let us know if we can help.

Generative AI with LLMs

Here are course notes I am taking from the DeepLearning.ai course on Coursera: Generative AI with Large Language Models.

Deploy package on pip and conda channel

To build a pip distribution that contains the source files only (without compiled modules):

python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*

To compile the package to .whl on Mac (Example: SciPy pip repository):

python setup.py bdist_wheel
python3 -m twine upload dist/*

This deploys the wheel file dist/hn2016_falwa-0.7.0-cp310-cp310-macosx_10_10_x86_64.whl to the pip channel.

However, when repeating the Mac procedures above on Linux, I got the error:

Binary wheel 'hn2016_falwa-0.7.0-cp311-cp311-linux_x86_64.whl' has an unsupported platform tag 'linux_x86_64'.

I googled and found this StackOverflow thread: Binary wheel can’t be uploaded on pypi using twine.

ManyLinux repo: https://github.com/pypa/manylinux

Good tutorial:

http://www.martin-rdz.de/index.php/2022/02/05/manually-building-manylinux-python-wheels/


Create a fresh environment:

$ conda create --name test_new

But using conda on Mac to compile the wheel gives this issue:

ld: unsupported tapi file type '!tapi-tbd' in YAML file

Create a virtual environment (not via conda):

python3 -m venv /Users/claresyhuang/testbed_venv
source /Users/claresyhuang/testbed_venv/bin/activate

MDTF meeting 2023/6/5

The google slides used in my presentation in the meeting of NOAA Model Diagnostics Task Force can be found here.

JetBrains Open-source license

Thrilled that my open-source climate data analysis package is sponsored by JetBrains’ Licenses for Open Source Development program. 🥳

JetBrainsCert

I’m really glad I started this project in 2016 when I was still in graduate school, with the hope that the climate data diagnostic proposed in my thesis could be applied by others more easily. Even though I have been working in industry since finishing my PhD, by maintaining this package I’ve established valuable connections with many academic researchers. 😊 I’m grateful that JetBrains supports the open-source community and encourages the culture of sharing.

JetBrains Licenses for Open Source Development: https://www.jetbrains.com/community/opensource/

Compile cython modules

It took me a while to figure out how to compile and import Cython modules. Here is a configuration that works:

├── hn2016_falwa
│   ├── cython_modules
│   │   ├── pyx_modules
│   │   │   ├── __init__.py
│   │   │   ├── dirinv_cython.pyx
│   │   ├── setup_cython.py
│   │   ├── check_import.py
...

In dirinv_cython.pyx I have:

def x_sq_minus_x(x):
    return x**2 - x

In setup_cython.py I have:

from distutils.core import setup
from Cython.Build import cythonize

setup(name='dirinv_cython',
      package_dir={'pyx_modules': ''},
      ext_modules=cythonize("pyx_modules/dirinv_cython.pyx"))

First, I compile the cython modules by executing the following in the directory cython_modules:

python setup_cython.py build_ext --inplace

This produces dirinv_cython.c in the directory pyx_modules/.

Put this in __init__.py:

import dirinv_cython

Then I run the script check_import.py to test the imports:

from pyx_modules import dirinv_cython

ans = dirinv_cython.x_sq_minus_x(19)
print(f"ans = {ans}")

Executing check_import.py gives

ans = 342
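Side note: distutils is deprecated and removed in Python 3.12 (the same issue that prompted the falwa v2.0.0 migration), so a setuptools-based setup_cython.py would look like the sketch below (assuming the same directory layout; I have not re-tested it here):

from setuptools import setup
from Cython.Build import cythonize

# cythonize works the same way with setuptools as with distutils
setup(name='dirinv_cython',
      ext_modules=cythonize("pyx_modules/dirinv_cython.pyx"))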

Implementing QuantileTransformer in Spark - mapping any kind of distribution to a normal distribution

This blog post is motivated by the Scikit-learn documentation of QuantileTransformer and a StackExchange discussion thread about it.

There are 2 parts in this post. Part I reviews the idea of Quantile Transformer. Part II shows the implementation of Quantile Transformer in Spark using Pandas UDF.

Part I: The Quantile Transformer transforms data of arbitrary distribution to a normal (or uniform) distribution

Problem Statement: I have some individuals (id) with 3 attributes of different distributions. I want to combine them linearly and also want to make sure the outcome follows a normal distribution.

In python, I create a toy dataset with column id, and 3 columns corresponding to random variables following different distributions:

import numpy as np
import pandas as pd
import scipy
import math
import matplotlib.pyplot as plt

num_of_items = 10000  # the size of my population

df = pd.DataFrame({
    'id': [str(i) for i in np.arange(num_of_items)],
    'uniform': np.random.rand(num_of_items),
    'power_law': np.random.power(3, num_of_items),
    'exponential': np.random.exponential(1, num_of_items)})

The first 5 rows of df look like:

 id   uniform  power_law  exponential
  0  0.543253   0.690897     0.523969
  1  0.161339   0.802748     0.808497
  2  0.487836   0.818129     1.409843
  3  0.594641   0.808148     2.233953
  4  0.513764   0.783795     1.841159

Here is a visualization of their distributions:

Initial distribution

To transform all the columns to a normal distribution, first get the rank (or quantile, if ranking is too expensive) for each id in each column:

df_rank = df.rank(
    axis=0, method='average', numeric_only=True, na_option='keep', ascending=True, pct=True)

The first 5 rows of df_rank look like:

   uniform  power_law  exponential
0   0.5351     0.3388       0.4099
1   0.1544     0.5243       0.5543
2   0.4814     0.5546       0.7586
3   0.5882     0.5351       0.8950
4   0.5053     0.4885       0.8427

Let’s say we want to map these values to a normal distribution with mean=0.5 and standard deviation=0.15. To look up the value whose CDF equals each quantile (i.e. the inverse CDF of the normal distribution), we can use scipy.stats.norm.ppf:

df_transformed = df_rank.applymap(lambda col: scipy.stats.norm.ppf(col, loc=0.5, scale=0.15))

The first 5 rows of df_transformed look like:

    uniform  power_law  exponential
0  0.513214   0.437639     0.465830
1  0.347339   0.509142     0.520480
2  0.493004   0.520594     0.605271
3  0.533438   0.513214     0.688035
4  0.501993   0.495675     0.650843

Let’s see what the distributions of the values in df_transformed look like:

Transformed distribution

Perfect! Their distributions look identical now! 😝

If I do a uniform linear combination of them per id,

df_transformed['linear_combination'] = df_transformed.apply(
    lambda row: np.mean([row['uniform'], row['power_law'], row['exponential']]), axis=1)

I would get results with the same distribution. On the right, I show the results from a linear combination of the original values for comparison:

Linear combination

Another combination strategy would be to take the max value among the 3 columns. The transformed variable follows a similar distribution, though the mean shifts to larger values.

Max combination
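For reference, a minimal sketch of that max combination (the column name max_combination is mine):

# Take the row-wise maximum of the three transformed columns
df_transformed['max_combination'] = df_transformed[
    ['uniform', 'power_law', 'exponential']].max(axis=1)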

Part II: Implementation of Quantile Transformer in Spark

Given the introduction of Pandas UDF in Spark, the implementation is relatively simple. If ranking is too expensive, you can consider using an approximate quantile instead (a sketch is given at the end of this post).

from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf('double', PandasUDFType.SCALAR)
def to_normal_distribution(data: pd.Series) -> pd.Series:
    # Rank each value as a percentile, then look up the inverse CDF of the
    # target normal distribution (mean=0.5, std=0.15)
    return data.rank(pct=True).apply(
        lambda q: scipy.stats.norm.ppf(q, loc=0.5, scale=0.15))

(Note: Later I realized that the newest Spark version has pyspark.pandas.DataFrame.rank, see the Spark documentation. That’s not available on my workstation yet.)

You can append the transformed value to the original dataframe:

spark_df = spark_session.createDataFrame(df)
spark_df = spark_df.withColumn(
    'transformed_data', 
    to_normal_distribution(spark_df.uniform))
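As for the approximate-quantile option mentioned above, DataFrame.approxQuantile can return percentile boundaries of a column without an exact sort; a rough sketch using the toy uniform column from Part I:

# Approximate 1st-99th percentile boundaries of the `uniform` column,
# with relative error 0.001; these boundaries could be used to bucketize
# the data instead of computing an exact rank for every row
probabilities = [i / 100 for i in range(1, 100)]
quantile_boundaries = spark_df.approxQuantile('uniform', probabilities, 0.001)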

Aggregation of vectors using Spark Summarizer is too slow. How to get around it?

I have a dataframe df with columns id (integers) and document (string):

root
 |-- id: integer
 |-- document: string

Each id is associated with multiple documents. Each document is transformed into a vector representation (an embedding of dimension 100) using Spark NLP. Eventually, I want to get the average of the vectors associated with each id.

When testing with a small amount of data, i.e. 10k ids each associated with ~100 documents, pyspark.ml.stat.Summarizer does the job quickly:

from pyspark.ml.stat import Summarizer
import pyspark.sql.functions as F

df.groupby('id').agg(Summarizer.mean(F.col('embedding')).alias('mean_vector'))

However, the real situation is that I have to deal with Big Data consisting of 100M distinct ids and 200M distinct documents. Each id can be associated with up to 30k documents. The time taken to (1) attach the embeddings using Spark NLP and (2) aggregate the vectors per id was 10 hours, which is too slow!

Eventually, I figured out a way to do the same thing with the computing time shortened to ~2 hours.

Thanks to my colleague who spotted the bottleneck - step (1) is actually not slow. It was step (2) that took most of the time when there is a huge number of ids to work with. In this scenario, aggregating values in 100 separate columns is actually faster than aggregating 100-dimensional vectors.

Here is what I did to optimize the procedure:

1. Obtain vector representation as array for distinct document

One can specify in sparknlp.base.EmbeddingFinisher whether the output is a vector or an array. To make the split easier, I set the output to array:

from sparknlp.base import EmbeddingFinisher
...

finisher = EmbeddingFinisher() \
    .setOutputAsVector(False) \
    ...

2. Split the 100-d vector into 100 columns

Now I have document_df of schema

root
 |-- document: string
 |-- embedding: array
 |    |-- element: float

I split the vectors into 100 columns (v1, v2, … v100) by:

import pyspark.sql.functions as F

# Array indexing in Spark is 0-based: element i-1 of the embedding becomes column vi
split_to_cols = [F.col('embedding')[i - 1].alias(f'v{i}') for i in range(1, 101)]
document_df = document_df.select([F.col('document')] + split_to_cols)

3. Join the resultant dataframe with the original and do aggregation per column

I then join document_df to df and obtain the average of embedding columns per id:

avg_expr = [F.mean(F.col(f'v{i}')).alias(f'v{i}') for i in range(1, 101)]
df = df.join(document_df, on='document').groupby('id').agg(*avg_expr)

4. (If needed) Use VectorAssembler to concatenate the columns into vectors

One can turn the values in the 100 columns to a vector per id if needed:

from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
    inputCols=[f'v{i}' for i in range(1, 101)],
    outputCol='final_vector')
df = assembler.transform(df) \
    .select('id', 'final_vector')

That’s how I sped up the aggregation of the vector output from Spark NLP. Hope that helps. 😉
