Installing Stanford Core NLP package on Mac OS X

12 Apr 2018

I am following instructions on the GitHub page of Stanford Core NLP under Build with Ant. To install ant, you can use homebrew:

$ brew install ant

In Step 5, you have to include the .jar files in the directories CoreNLP/lib and CoreNLP/liblocal in your CLASSPATH. To do this, I first install coreutils:

brew install coreutils

so that I can use the realpath utility. Then, I include the following in my ~/.bashrc:

for file in $(find /Users/clare.huang/CoreNLP/lib/ -name "*.jar"); do
  export CLASSPATH="$CLASSPATH:$(realpath "$file")"
done

for file in $(find /Users/clare.huang/CoreNLP/liblocal/ -name "*.jar"); do
  export CLASSPATH="$CLASSPATH:$(realpath "$file")"
done

(I guess there are better ways to combine the commands above. Let me know if there are.)
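One possible way to combine the two loops, sketched in Python rather than bash: the helper below collects every .jar file under the given directories and joins their absolute paths with the platform's path separator. The function name `build_classpath` and its arguments are hypothetical, made up for this illustration; they are not part of the CoreNLP instructions.

```python
import os
from pathlib import Path


def build_classpath(directories, existing=""):
    """Join the absolute paths of all .jar files under the given directories."""
    jars = []
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # skip directories that do not exist
        # rglob searches recursively, like `find <dir> -name "*.jar"`
        jars.extend(sorted(str(p.resolve()) for p in directory.rglob("*.jar")))
    parts = ([existing] if existing else []) + jars
    return os.pathsep.join(parts)
```

Calling `build_classpath(["/Users/clare.huang/CoreNLP/lib", "/Users/clare.huang/CoreNLP/liblocal"], existing=os.environ.get("CLASSPATH", ""))` returns a string that could be exported as CLASSPATH.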

To run CoreNLP, I have to download the latest version of it, and place it in the directory CoreNLP/:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-01-31.zip

The latest version is available on their official website. Unzip it, and add all the .jar files in it to the $CLASSPATH.

Afterwards, you should be able to run CoreNLP with the commands provided in the blog post of Khalid Alnajjar (under Running Stanford CoreNLP Server). If you have no problem starting the server, you should be able to see the interface in your browser at http://localhost:9000/:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

Yay. Next, I will try setting up the Python interface.
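Before setting up a full Python interface, the running server can also be queried over plain HTTP. The sketch below assumes the server started above is listening on localhost:9000 and that it accepts the annotator settings as a JSON `properties` URL parameter with the text in the POST body, as described in the CoreNLP server documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, annotators="tokenize,ssplit,pos",
                          server="http://localhost:9000"):
    """Build (but do not send) a POST request for a running CoreNLP server."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        url=f"{server}/?{query}",
        data=text.encode("utf-8"),  # the text to annotate goes in the POST body
        method="POST",
    )


def annotate(text, **kwargs):
    """Send the request and decode the JSON reply (needs the server to be up)."""
    with urllib.request.urlopen(build_corenlp_request(text, **kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `annotate("The quick brown fox jumps over the lazy dog.")` should return the annotations as a Python dictionary.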

Installing java on Mac

11 Apr 2018

The information in this post was learnt from this StackOverflow post and also David Cai’s blog post on how to install multiple Java versions on Mac OS High Sierra.

With brew cask installed on Mac (see homebrew-cask instructions), different versions of java can be installed via the command (I want to install java9 here, for example):

brew tap caskroom/versions
brew cask install java9

After installing, the symlink /usr/bin/java is still pointing to the old native Java. You can check where it points to with the command ls -la /usr/bin/java. It is probably pointing to the old native java path: /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java.

However, homebrew installed java into the directory /Library/Java/JavaVirtualMachines/jdkx.x.x_xxx.jdk/Contents/Home.

To easily switch between different java environments, you can use jEnv. The installation instructions can be found on jEnv’s official page.
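To see which JDKs are actually present before registering them with jEnv, a short script can list them. This is a minimal sketch, assuming the JDKs live under /Library/Java/JavaVirtualMachines as described above; the function name `list_jdk_homes` is hypothetical.

```python
from pathlib import Path


def list_jdk_homes(root="/Library/Java/JavaVirtualMachines"):
    """Return the Contents/Home directory of every *.jdk bundle under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    homes = (bundle / "Contents" / "Home" for bundle in sorted(root.glob("*.jdk")))
    return [str(home) for home in homes if home.is_dir()]
```

Each returned path is a candidate value for JAVA_HOME. To compare, `os.path.realpath("/usr/bin/java")` shows where the /usr/bin/java symlink ends up.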

Adding an RSS feed to this site

30 Mar 2018

Here is the link to the RSS feed of this blog.

Thanks to the instructions on Joel Glovier’s blog post.

Python Library and scripts for downloading ERA-Interim Data

23 Feb 2018

Update: ECMWF API Clients on pip and conda

The ECMWF API Python Client is now available on pypi and anaconda.
The Climate Corporation has distributed the ECMWF API Python Client on pypi. Now it can be installed via:

pip install ecmwf-api-client

Anaconda users on OS X/Linux systems can install the package via:

conda install -c bioconda ecmwfapi

To use the sample script, you need an API key (.ecmwfapirc) placed in your home directory. You can retrieve it by logging in at https://api.ecmwf.int/v1/key/. Create a file named “.ecmwfapirc” in your home directory and put in the content shown on the page:

{
    "url"   : "https://api.ecmwf.int/v1",
    "key"   : "(...)",
    "email" : "(...)"
}
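Since the key file is plain JSON, a quick sanity check can catch a malformed or incomplete file before the first request. The sketch below is only an illustration: the helper name `load_ecmwf_credentials` is hypothetical, and the check simply assumes the three entries shown above are the ones required.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("url", "key", "email")


def load_ecmwf_credentials(path=None):
    """Read the JSON key file and check that the expected entries are present."""
    path = Path(path) if path is not None else Path.home() / ".ecmwfapirc"
    with open(path) as f:
        credentials = json.load(f)  # raises ValueError if the file is not valid JSON
    missing = [k for k in REQUIRED_KEYS if not credentials.get(k)]
    if missing:
        raise ValueError("Missing entries in %s: %s" % (path, ", ".join(missing)))
    return credentials
```

Calling `load_ecmwf_credentials()` without arguments reads ~/.ecmwfapirc.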

After doing that, in the directory with the sample script example.py, you can test the package by running it:

python example.py

You should see it successfully retrieve a .grib file if the package has been set up properly.

There are sample scripts available on the ECMWF website (look under “Same request NetCDF format”). Below is an example Python script I wrote to retrieve zonal wind, meridional wind and temperature data at all pressure levels for the period 2017-07-01 to 2017-07-31 at 6-hour intervals:

#!/usr/bin/env python
from ecmwfapi import ECMWFDataServer
server = ECMWFDataServer()

param_u, param_v, param_t = "131.128", "132.128", "130.128"

for param_string, param in zip(["_u", "_v", "_t"],
                               [param_u, param_v, param_t]):

    server.retrieve({
        "class": "ei",
        "dataset": "interim",
        "date": "2017-07-01/to/2017-07-31",
        "expver": "1",
        "grid": "1.5/1.5",
        "levelist": "1/2/3/5/7/10/20/30/50/70/100/125/150/175/200/225/250/300/350/400/450/500/550/600/650/700/750/775/800/825/850/875/900/925/950/975/1000",
        "levtype": "pl",
        "param": param,
        "step": "0",
        "stream": "oper",
        "format": "netcdf",
        "time": "00:00:00/06:00:00/12:00:00/18:00:00",
        "type": "an",
        "target": "2017-07-01_to_2017-07-31" + param_string + ".nc",  # local file name, so no slashes
    })

I learnt the above steps on these pages:

Resources on deep learning

27 Jan 2018

I have been searching for ways to use Recurrent Neural Networks for text classification. Here are some useful resources I’ve found:

Open Source Deep Learning Server has a library of pre-trained neural nets.
An article on Analytics Vidhya discusses transfer learning & The art of using pre-trained models in deep learning. They also have an article about word embeddings.
Keras has a list of deep learning models with pre-trained weights.
Hyperopt is a python library for Distributed Asynchronous Hyperparameter Optimization.
Hands-On Machine Learning with Scikit-Learn and TensorFlow has a discussion of deep learning from Ch.10 onward.
The Kaggle forum has a beginner tutorial on using an RNN to classify toxic comments on Wikipedia editing pages.
(To be updated.)

Three co-authored papers submitted

23 Jan 2018

The publication page has been updated with 3 submitted manuscripts.

Update on Feb 9, 2018: The manuscript “Role of Finite-Amplitude Rossby Waves and Nonconservative Processes in Downward Migration of Extratropical Flow Anomalies” has been accepted by the Journal of the Atmospheric Sciences.

The subroutine wrapper.qgpv_eqlat_lwa_ncforce for computing effective diffusivity, which quantifies the damping on wave transiences by irreversible mixing in the stratosphere during a stratospheric sudden warming event, can be found in my python package.

Setting up algs4 on Linux

02 Jan 2018

I am interested in going through the exercises from Princeton University’s Algorithms course. I found that someone wrote a handy bash script to set up the environment on Mac OS/Linux:

https://gist.github.com/JIghtuse/021604bee56bddab6173c919da7dd2ad

My python library updated to v0.2.0!

20 Nov 2017

I have updated my python library hn2016_falwa to v0.2.0 (see the release notes)! Now it includes functions to compute the contribution of non-conservative forces to wave activity.

Moreover, the documentation page generated with Sphinx is now hosted on readthedocs.org! Check it out!

A side note: somehow I made multiple commits to remedy a mistake. The git commands to squash the commits (3, for example) are:

git rebase -i origin/master~3 master
git push origin +master

Wrapping Fortran Code in Python

07 Nov 2017

To start with, the NumPy documentation explains how we can wrap Fortran code in Python using f2py.

You need a Fortran compiler to run f2py. I’ve found a pre-compiled version of GCC that can be readily installed on Mac OS X.

(To be continued)

Compiling tensorflow on Mac with SSE, AVX, FMA etc.

05 Nov 2017

(Ideally, I should run tensorflow somewhere else rather than on my MacBook.)

When I install keras with Anaconda on my Mac OS X, with tensorflow as the backend, the following warning comes up when running the sample script:

I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

To use those instructions (SSE4.1 SSE4.2 AVX AVX2 FMA), tensorflow has to be compiled from source. The instructions are available here. I used the following command to build from source:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

I got the following error:

Problem with java installation: couldn't find/access rt.jar in /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home

It turns out that rt.jar is not present in Java 9. To solve this, install a version of Java 8:

brew cask install caskroom/versions/java8

and then specify the path of Java 8 (Change the version number ‘1.8.0_162’ to that of the version you installed):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8.0_162)

Afterwards, I built tensorflow from source again, and it successfully included the CPU instructions above.
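To check in advance whether a given JAVA_HOME would trigger the rt.jar error, one can look for the file directly. This is a rough sketch: the helper name `has_rt_jar` is hypothetical, and it assumes the two usual Java 8 locations (jre/lib inside a JDK, lib inside a standalone JRE).

```python
from pathlib import Path


def has_rt_jar(java_home):
    """Return True if rt.jar exists where a Java 8 JDK or JRE would keep it."""
    java_home = Path(java_home)
    candidates = [
        java_home / "jre" / "lib" / "rt.jar",  # JDK layout
        java_home / "lib" / "rt.jar",          # standalone JRE layout
    ]
    return any(path.is_file() for path in candidates)
```

For example, `has_rt_jar(os.environ["JAVA_HOME"])` should be False for the Java 9 path in the error message above and True after switching to Java 8.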

Setting up MySQL / access with python on Mac OS

09 Sep 2017

Today I wanted to set up an automated Kickstarter scraper on PythonAnywhere but realized that only MySQL is freely supported there (while I’ve been using PostgreSQL). So, time to switch?

Here is how I install MySQL on my Mac and have it accessed with SQLAlchemy:

1. Download and install MySQL from Oracle.
2. Go to System Preferences to start the MySQL server.
3. Navigate to the bin directory and log in with the temporary password shown at the end of the installation:
```
cd /usr/local/mysql/bin
./mysql -u root -p
```
4. Create another username and password to use instead of root:
```
CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
```
I have installed pymysql, sqlalchemy and sqlalchemy-utils in Python to access the MySQL database. To access the database:

from sqlalchemy import create_engine, text
from sqlalchemy_utils import database_exists, create_database

# dbname is the database name
# user_id and user_password are what you put in above

engine = create_engine("mysql+pymysql://%s:%s@localhost:3306/%s"
                       % (user_id, user_password, dbname), echo=False)
if not database_exists(engine.url):
    create_database(engine.url)  # Create the database if it doesn't exist.

con = engine.connect()  # Connect to the MySQL engine
table_name = 'new_table'
command = "DROP TABLE IF EXISTS %s;" % table_name  # Drop the table if it exists
con.execute(text(command))  # text() wraps a raw SQL string

Executing SQL commands is rather easy by using:

con.execute(text(command))
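One caveat with building the connection URL by string formatting: a password containing characters such as @ or / breaks the URL unless it is percent-encoded. A minimal sketch of a helper that does the encoding is below; the name `mysql_url` and its default host and port are assumptions for illustration only.

```python
from urllib.parse import quote_plus


def mysql_url(user_id, user_password, dbname, host="localhost", port=3306):
    """Build a mysql+pymysql connection URL with the credentials percent-encoded."""
    return "mysql+pymysql://%s:%s@%s:%d/%s" % (
        quote_plus(user_id), quote_plus(user_password), host, port, dbname)
```

The result can be passed straight to `create_engine(...)` in place of the hand-formatted string.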

Enjoy! :)

Software Engineering Project Note-taking

18 Aug 2017

Learnt a lot from peers today! :D Here are quick notes on packages they have used for their software engineering project:

Docker Hub: You store your docker images there.

DigitalOcean: The docker images are pulled there, together with the images.

Sphinx tutorial: Useful instructions on how to set up Python documentation

pydot and graphviz: Draw graphs of objects and arrows

Free Online Deep Learning Resources

10 Aug 2017

These are resources related to deep learning from conversations with friends:

Deep Learning Paper Reading Roadmap

Neural Networks and Deep Learning

Books introduced in the newsletter of Data Science Central:

Deep Learning Book

Learning HTML, CSS, JavaScript and Jinja2

24 Jun 2017

While building a web app with Python and Flask, I need to include input options to make the app interactive. Here are some great sites I’ve learnt things from:

More updates later.

Setting up ubuntu on AWS

11 Jun 2017

Solution for the error libSM.so.6: cannot open shared object file.

Python packages for Sentiment Analysis

10 Jun 2017

On top of utilities in nltk.sentiment, there are also some packages for training and combining classifiers:

NLTK-trainer by Jacob Perkins, accompanied by the NLTK-cookbook he wrote.
Pattern.en developed by the Computational Linguistics Research Group
Data sets for training a sentiment classifier: Movie Review Data

(More to be updated)

Published a paper on Local Wave Activity Budget!

15 May 2017

I’ve published a new paper in Geophysical Research Letters!

Climate dynamicists have derived a conservation relation, based on the small-amplitude wave assumption, for wave activity (A) that describes the evolution of Rossby wave packets:
Wave activity flux equation
However, only the wave activity flux vector on the RHS has been used to diagnose realistic climate data. A is ill-defined when wave amplitude is large (i.e. ‘of finite-amplitude’). In Huang & Nakamura (2016), we introduced a new theory of wave activity applicable to large waves. We can thus obtain a well-defined A even from real data. This is the first piece of work that compares the LHS and RHS of the conservative part of the equation above for reanalysis data. This advance allows us to estimate the overall non-conservative contribution (natural/human-induced forcings) to the observed flow.

Major results include:

(1) Our estimate of transient wave activity (top panel) is consistent with previous work (bottom panel, assuming small-amplitude waves) and is better behaved.

Comparison with previous work

(2) We can break down the local wave activity budget on seasonal time-scales.

Wave activity flux equation

(3) We can also break down the budget on synoptic time-scales with the use of co-spectral analysis.

Wave activity flux equation

Switching to Jekyll

12 May 2017

I’m switching from a traditional HTML webpage builder to Jekyll! Hope to update more often!

I set up Jekyll on my Mac OS X with homebrew, rbenv and RubyGems.

To see how to set jekyll up on Windows, refer to my older post.

Installation of Jekyll on Windows 10

12 Feb 2017

Below are the procedures I used to install Jekyll, with fixes for the problems I ran into:

Main reference sites:

Jekyll on Windows
Easily install Jekyll on Windows with 3 command prompt entries and Chocolatey

Procedures:

1. Open Powershell
2. Run Powershell as administrator: [Reference]

   Start-Process powershell -Verb runAs
3. Change execution policy to enable installation of Chocolatey (a package manager): [Reference]

   Set-ExecutionPolicy RemoteSigned
4. Install Chocolatey: [Reference]

   iwr https://chocolatey.org/install.ps1 -UseBasicParsing | iex
5. Update the certificate to install ruby: [Reference]
   http://guides.rubygems.org/ssl-certificate-update/
6. Install ruby:

   choco install ruby -y
7. Close the window and open a new command prompt with Administrator access (i.e. step 2)
8. Install gem bundler

   gem install bundler
9. Install Jekyll

   gem install jekyll
10. Done :)

Installing Python Library for downloading ERA-Interim Data

13 Jun 2016

Update: ECMWF API Clients on pip and conda

The ECMWF API Python Client is now available on pypi and anaconda.
The Climate Corporation has distributed the ECMWF API Python Client on pypi. Now it can be installed via:

pip install ecmwf-api-client

Anaconda users on OS X/Linux can install it via

conda install -c bioconda ecmwfapi

Clare S. Y. Huang Data Scientist | Atmospheric Dynamicist
