It’s fun to write about what I have learnt and to share solutions to the data science and machine learning problems I have solved.
These posts are about the tools I use most frequently at work.
2024-07-20 - Socket Timeout error in BigDL Orca
2023-03-23 - Implementing QuantileTransformer in Spark - mapping any distribution to a normal distribution
2023-03-21 - Aggregation of vectors using Spark Summarizer is too slow. How to get around it?
2023-01-21 - Reduce the number of files on HDFS
2020-12-10 - Databricks Certified Associate Developer for Apache Spark 3.0
2020-09-17 - Split a vector/list in a PySpark DataFrame into columns
2020-09-14 - Ranking hierarchical labels with SQL
2020-08-01 - Custom Transformer that can be fitted into a Pipeline
2020-04-16 - A more efficient way to do an outer join with large DataFrames
2019-10-30 - Converting a pandas DataFrame to a PySpark DataFrame with an older version of pandas
2019-09-25 - Generate a sequence from an array column of a PySpark DataFrame
2019-09-24 - PySpark error "Could not serialize object"
2018-12-14 - Read libsvm files into a PySpark DataFrame
2018-11-28 - Simple PySpark solutions
Summary slides I made for discussions with peers in the Machine Learning Journal Club.
2024-09-11 - Clean up cached file from local test
2024-08-23 - Capturing error with more details in Python
2024-08-22 - Problem solver for GitHub Workflow
2024-08-17 - falwa release v2.0.0
2024-07-20 - Socket Timeout error in BigDL Orca
2024-07-14 - falwa release v1.3.0
2024-07-13 - Running pytest coverage check locally
2024-06-27 - CS topics not covered in class
2023-12-09 - Important announcement about the GitHub repo hn2016_falwa
2023-11-05 - GitHub Actions for Package Deployment
2023-09-05 - Deployment onto Conda forge
2023-09-04 - IMPORTANT bug-fix release hn2016_falwa v0.7.2
2023-06-18 - Deploy package on pip and conda channel
2023-05-04 - Compile cython modules
2022-08-28 - hn2016_falwa release 0.6.1 + remarks on GitHub CLI and repo management
2022-03-18 - Package release hn2016_falwa v0.6.0
2021-08-15 - Package release hn2016_falwa v0.5.0
2021-08-14 - Migrating CI from Travis CI to GitHub Workflow
2020-10-07 - Running a single test case in the unittest suite
2020-10-03 - New pip release and changes to how it resolves dependency conflicts
2020-10-03 - Submitting pull request from forked repo to main repo
2020-07-14 - Minor release of my python package + release procedures
2019-09-15 - Local wave activity package updated to version 0.3.7
2019-06-20 - Useful Git commands at work
2018-07-01 - Resources on Python Packaging
2017-11-07 - Wrapping Fortran Code in Python
2017-08-18 - Software Engineering Project Note-taking
2024-07-20 - Socket Timeout error in BigDL Orca
2023-03-23 - Implementing QuantileTransformer in Spark - mapping any distribution to a normal distribution
2019-10-14 - Common issues in RNN training
2018-01-27 - Resources on deep learning
2017-08-10 - Free Online Deep Learning Resources
2017-06-10 - Python packages for Sentiment Analysis
2024-08-22 - Problem solver for GitHub Workflow
2023-11-05 - GitHub Actions for Package Deployment
2023-09-05 - Deployment onto Conda forge
2023-05-04 - Compile cython modules
2022-08-28 - hn2016_falwa release 0.6.1 + remarks on GitHub CLI and repo management
2021-08-14 - Migrating CI from Travis CI to GitHub Workflow
2018-06-24 - Setting up a Dash App on PythonAnywhere
2018-04-12 - Installing the Stanford CoreNLP package on Mac OS X
2018-04-11 - Installing Java on Mac
2018-01-02 - Setting up algs4 on Linux
2017-11-05 - Compiling TensorFlow on Mac with SSE, AVX, FMA etc.
2017-09-09 - Setting up MySQL and accessing it with Python on Mac OS
2017-06-11 - Setting up Ubuntu on AWS
2017-02-12 - Installation of Jekyll on Windows 10
2023-06-29 - Generative AI with LLMs Week 1 quiz
2023-06-29 - Generative AI with LLMs Week 2 quiz
2023-06-29 - Generative AI with LLMs Week 3 quiz
2023-06-29 - Generative AI with LLMs Week 1 (1)
2023-06-29 - Generative AI with LLMs Week 1 (2)
2023-06-29 - Generative AI with LLMs Week 2 (1)
2023-06-29 - Generative AI with LLMs Week 2 (2)
2023-06-29 - Generative AI with LLMs Week 3 (1)
2023-06-29 - Generative AI with LLMs Week 3 (2)
2024-11-20 - Competitive data science - Feature preprocessing
2024-11-20 - Udemy's course on Web Development