Shao Ying (Clare) Huang
Data Scientist | Atmospheric Dynamicist

I have just completed my Ph.D. in Geophysical Sciences at the University of Chicago and currently work as a postdoctoral research assistant in Professor Noboru Nakamura's group. My dissertation focuses on developing an atmospheric Rossby wave diagnostic theory that quantifies the severity of extreme weather events. I am now searching for predictive features of atmospheric blocking using the local wave activity theory I developed together with machine learning techniques. Check out my Python library, which implements this diagnostic on climate data!

I also participated in the Insight Data Science Fellowship Program in the summer of 2017 as a Health Data Fellow to learn about data science and the health industry. Currently, I work on the Research Analytics team at Tempus.

I love to share what I've learnt with others. Check out my blog posts for technical solutions I've found, and my notes for course material.


Installing the Stanford CoreNLP package on Mac OS X

I am following the instructions on the Stanford CoreNLP GitHub page under Build with Ant. To install Ant, you can use Homebrew:

brew install ant

In Step 5, you have to include the .jar files in the directories CoreNLP/lib and CoreNLP/liblocal in your CLASSPATH. To do this, I first install coreutils:

brew install coreutils

so that I can use the realpath utility it provides. Then, I include the following in my ~/.bashrc:

for file in `find /Users/clare.huang/CoreNLP/lib/ -name "*.jar"`;
  do export CLASSPATH="$CLASSPATH:`realpath $file`";
done

for file in `find /Users/clare.huang/CoreNLP/liblocal/ -name "*.jar"`;
  do export CLASSPATH="$CLASSPATH:`realpath $file`";
done

(I guess there are better ways to combine the commands above. Let me know if there are.)
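One possible combined version (an untested sketch; adjust the paths to your own setup) passes both directories to a single find call:

for file in `find /Users/clare.huang/CoreNLP/lib/ /Users/clare.huang/CoreNLP/liblocal/ -name "*.jar"`;
  do export CLASSPATH="$CLASSPATH:`realpath $file`";
done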

To run CoreNLP, I have to download the latest version and place it in the directory CoreNLP/:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-01-31.zip

The latest version is available on their official website. Unzip it, and add all the .jar files inside to the $CLASSPATH, as in the sketch below.
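For example (a sketch assuming you unzip into CoreNLP/ and the extracted folder name matches the zip, then reuse the same loop pattern as above):

unzip stanford-corenlp-full-2018-01-31.zip -d /Users/clare.huang/CoreNLP/

for file in `find /Users/clare.huang/CoreNLP/stanford-corenlp-full-2018-01-31/ -name "*.jar"`;
  do export CLASSPATH="$CLASSPATH:`realpath $file`";
done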

Afterwards, you should be able to run CoreNLP with the commands provided in Khalid Alnajjar's blog post (under Running Stanford CoreNLP Server). If the server starts without problems, you should see the interface in your browser at http://localhost:9000/:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

Yay. Next, I will try setting up the Python interface.
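In the meantime, here is a minimal sketch of querying the running server directly over its HTTP API with the requests library (the annotator choice and the sample sentence here are my own):

#!/usr/bin/env python
# Minimal sketch: send raw text to the CoreNLP server started above
# (port 9000) and print each token with its part-of-speech tag.
import json
import requests

text = "Stanford CoreNLP is fun to use."
properties = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}

response = requests.post("http://localhost:9000/",
                         params={"properties": json.dumps(properties)},
                         data=text.encode("utf-8"))
annotation = response.json()

for sentence in annotation["sentences"]:
    print([(token["word"], token["pos"]) for token in sentence["tokens"]])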

Installing Java on Mac

The information in this post comes from this StackOverflow post and from David Cai's blog post on how to install multiple Java versions on Mac OS High Sierra.

With brew cask installed on Mac (see the homebrew-cask instructions), different versions of Java can be installed via the following commands (here I install Java 9, for example):

brew tap caskroom/versions
brew cask install java9

After installing, the symlink /usr/bin/java still points to the old native Java. You can check where it points with the command ls -la /usr/bin/java. It probably points to the old native Java path /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java.

Homebrew, however, installs Java into the directory /Library/Java/JavaVirtualMachines/jdkx.x.x_xxx.jdk/Contents/Home.
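To see every JDK installed on the machine (both the native one and the Homebrew-installed ones), you can use the built-in java_home utility; -V lists them all, and -v prints the home directory of one version:

/usr/libexec/java_home -V
/usr/libexec/java_home -v 9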

To switch easily between different Java environments, you can use jEnv. Installation instructions can be found on jEnv's official page.
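For reference, a sketch of the basic jEnv workflow (the JDK folder name and the version label jEnv registers will differ on your machine; check the output of jenv versions after adding):

jenv add /Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home
jenv versions
jenv global 9.0.4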

Adding an RSS feed to this site

Here is the link to the RSS feed of this blog.

Thanks to the instructions on Joel Glovier’s blog post.

Python library and scripts for downloading ERA-Interim data

Update: The ECMWF API client is now available on PyPI and Anaconda.

Installation can be done by the command:

pip install ecmwf-api-client

To use the sample script, you need an API key (.ecmwfapirc) placed in your home directory. You can retrieve it after logging in at https://api.ecmwf.int/v1/key/. Create a file named .ecmwfapirc in your home directory and put in the content shown on that page:

{
    "url"   : "https://api.ecmwf.int/v1",
    "key"   : "(...)",
    "email" : "(...)"
}

After doing that, in the directory with the sample script example.py, you can test the package by running it:

python example.py

You should see it successfully retrieve a .grib file if the package has been set up properly.

There are sample scripts available on the ECMWF website (look under “Same request NetCDF format”). Below is an example of a Python script I wrote to retrieve zonal wind, meridional wind, and temperature data at all pressure levels over the period 2017-07-01 to 2017-07-31 at 6-hour intervals:

#!/usr/bin/env python
from ecmwfapi import ECMWFDataServer
server = ECMWFDataServer()

# ERA-Interim GRIB parameter codes: zonal wind u (131.128),
# meridional wind v (132.128) and temperature T (130.128)
param_u, param_v, param_t = "131.128", "132.128", "130.128"

# Retrieve each variable into its own NetCDF file
for param_string, param in zip(["_u", "_v", "_t"],
                               [param_u, param_v, param_t]):

    server.retrieve({
        "class": "ei",
        "dataset": "interim",
        "date": "2017-07-01/to/2017-07-31",
        "expver": "1",
        "grid": "1.5/1.5",
        "levelist": "1/2/3/5/7/10/20/30/50/70/100/125/150/175/200/225/250/300/350/400/450/500/550/600/650/700/750/775/800/825/850/875/900/925/950/975/1000",
        "levtype": "pl",
        "param": param,
        "step": "0",
        "stream": "oper",
        "format": "netcdf",
        "time": "00:00:00/06:00:00/12:00:00/18:00:00",
        "type": "an",
        "target": "2017-07-01/to/2017-07-31" + param_string + ".nc",
    })

I learnt the above steps on these pages:

Resources on deep learning

I have been searching for ways to use recurrent neural networks for text classification. Here are some useful resources I've found:
