Clare S. Y. Huang Data Scientist | Atmospheric Dynamicist

Index

 web (1)  talk-summary (1)  sql (7)  slides (1)  setup (14)  se (27)  research (18)  pyspark (14)  opinion (1)  notesfromreading (1)  nlp (2)  medium-article (1)  machine_learning_journal_club (11)  llms (1)  hacktoberfest (3)  fun (1)  ds (6)  coursera (1)  climate_tools_python (3) 

§ web (1)

: READ MORE button via jekyll

§ talk-summary (1)

: Notes on ModelOps and MLOps talks

§ sql (7)

: Implementing QuantileTransformer in Spark - mapping any kind of distribution to a normal distribution
: Aggregation of vectors using Spark Summarizer is too slow. How to get around it?
: Reduce the number of files on HDFS
: Ranking hierarchical labels with SQL
: More efficient way to do outer join with large dataframes
: Tips for writing more efficient SQL
: Handling JSON in PostgreSQL

§ slides (1)

: MDTF meeting 2023/6/5

§ setup (14)

: Problem solver for GitHub Workflow
: GitHub Actions for Package Deployment
: Deployment onto Conda forge
: Compile cython modules
: hn2016_falwa release 0.6.1 + remarks on GitHub CLI and repo management
: Migrating CI from Travis CI to GitHub Workflow
: Setting up a Dash App on PythonAnywhere
: Installing Stanford Core NLP package on Mac OS X
: Installing java on Mac
: Setting up algs4 on Linux
: Compiling tensorflow on Mac with SSE, AVX, FMA etc.
: Setting up MySQL / access with python on Mac OS
: Setting up ubuntu on AWS
: Installation of Jekyll on Windows 10

§ se (27)

: Clean up cached file from local test
: Capturing error with more details in Python
: Problem solver for GitHub Workflow
: falwa release v2.0.0
: Socket Timeout error in BigDL Orca
: falwa release v1.3.0
: Running pytest coverage check locally
: CS topics not covered in class
: Important announcement about the GitHub repo hn2016_falwa
: GitHub Actions for Package Deployment
: Deployment onto Conda forge
: IMPORTANT Bug fix release hn2016_falwa v0.7.2
: Deploy package on pip and conda channel
: Compile cython modules
: hn2016_falwa release 0.6.1 + remarks on GitHub CLI and repo management
: Package release hn2016_falwa v0.6.0
: Package release hn2016_falwa v0.5.0
: Migrating CI from Travis CI to GitHub Workflow
: Running a single test case in the unittest suite
: New pip release and changes in its way to resolve dependency conflicts
: Submitting pull request from forked repo to main repo
: Minor release of my python package + release procedures
: Local wave activity package updated to version 0.3.7
: Useful Git commands at work
: Resources on Python Packaging
: Wrapping Fortran Codes in Python
: Software Engineering Project Note-taking

§ research (18)

: Two manuscripts submitted for peer review
: Download CMIP6 data from Google Cloud
: falwa release v2.0.0
: falwa release v1.3.0
: Important announcement about the GitHub repo hn2016_falwa
: IMPORTANT Bug fix release hn2016_falwa v0.7.2
: MDTF meeting 2023/6/5
: Local wave activity budget correction
: Paper published in GRL!
: Package release hn2016_falwa v0.6.0
: Package release hn2016_falwa v0.5.0
: Bulk download of ERA5 data from CDSAPI
: Local wave activity calculation for Southern Hemisphere available in release 0.4.0
: Visit to AOS at UW-Madison to give a Colloquium
: Published in Science!
: Three co-authored papers submitted
: My python library updated to v0.2.0!
: Published a paper on Local Wave Activity Budget!

§ pyspark (14)

: Socket Timeout error in BigDL Orca
: Implementing QuantileTransformer in Spark - mapping any kind of distribution to a normal distribution
: Aggregation of vectors using Spark Summarizer is too slow. How to get around it?
: Reduce the number of files on HDFS
: Databricks Certified Associate Developer for Apache Spark 3.0
: Split a vector/list in a pyspark DataFrame into columns
: Ranking hierarchical labels with SQL
: Custom Transformer that can be fitted into Pipeline
: More efficient way to do outer join with large dataframes
: Conversion of pandas dataframe to pyspark dataframe with an older version of pandas
: Generate sequence from an array column of pyspark dataframe
: Pyspark error "Could not serialize object"
: Read libsvm files into PySpark dataframe
: Simple pyspark solutions

§ opinion (1)

: Thoughts on Nobel Prize in Physics 2024

§ notesfromreading (1)

: Reading Notes on Spark - The Definitive Guide

§ nlp (2)

: Papers on architecture of Recurrent Neural Networks (RNN)
: Comparison between different statistical language models

§ medium-article (1)

: Notes on ModelOps and MLOps talks

§ machine_learning_journal_club (11)

: Discussion on InfoNCE
: Discussion on KAN (Kolmogorov-Arnold Networks)
: Discussion on Diffusion models beat GANs on image synthesis
: Discussion on Adapters as a means of parameter-efficient transfer learning for NLP
: Discussion on Difference-based Contrastive Learning for Sentence Embedding (DiffCSE)
: Discussion on Translation-Counterfactual Word Replacement (TCWR)
: Discussion on Supervised Contrastive Learning
: Discussion on the application of Contrastive Learning to train Sentence Embedding
: Discussion on Contrastive Learning
: I finished the GANs Specialization on Coursera!
: Discussion on Deep Compression

§ llms (1)

: Generative AI with LLMs

§ hacktoberfest (3)

: Running a single test case in the unittest suite
: New pip release and changes in its way to resolve dependency conflicts
: Submitting pull request from forked repo to main repo

§ fun (1)

: I started a comic series about bouldering (for fun)

§ ds (6)

: Socket Timeout error in BigDL Orca
: Implementing QuantileTransformer in Spark - mapping any kind of distribution to a normal distribution
: Common issues in RNN training
: Resources on deep learning
: Free Online Deep Learning Resources
: Python packages for Sentiment Analysis

§ coursera (1)

: I finished the GANs Specialization on Coursera!

§ climate_tools_python (3)

: Bulk download of ERA5 data from CDSAPI
: Python Library and scripts for downloading ERA-Interim Data
: Installing Python Library for downloading ERA-Interim Data