A Step-by-Step Guide on Text Exploration

Explore the interesting insights hidden in text with an n-gram word cloud

Photo by Jennifer Griffin on Unsplash

Text exploration has always been my favourite process in Text Analytics. I am always thrilled when I find something interesting. I did two text analytics projects during my studies, and in both I used Topic Modelling to study the topics discussed in long texts.

I never read the text before doing the analysis, nor should I: the text is extremely long, and it is simply not practical to read all of it just to understand the topics. …
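
As a rough illustration of the n-gram idea mentioned in the subtitle, here is a minimal sketch that counts word n-grams using only the standard library. The function name `ngram_counts`, the simple tokenizer and the sample sentence are assumptions made for this example, not the article's own code.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in a text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of length n over the token list.
    # Note: this simple version lets n-grams cross sentence boundaries.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "Topic modelling finds topics. Topic modelling needs long text."
bigrams = ngram_counts(sample, n=2)
print(bigrams.most_common(3))
```

A frequency mapping like this is the kind of input a word cloud can be drawn from; for example, the third-party `wordcloud` package accepts one through `WordCloud.generate_from_frequencies`, although whether the article uses that package is an assumption here.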


The ultimate guide for using pywin32 to manipulate Pivot Tables in Excel

Automate Pivot Tables and extract data from a filtered Pivot Table. Save your time for a cup of tea.

Photo by Jasmine Huang on Unsplash

In Automate Excel with Python, the concepts of the Excel Object Model, which consists of Objects, Properties, Methods and Events, were shared. The tricks for accessing the Objects, Properties and Methods in Excel with the Python pywin32 library were also explained with examples.

Now, let us take Excel report automation further with the Pivot Table, one of the most wonderful functions in Excel!

Why PyWin32?

You may be curious why we don’t use pandas.DataFrame.pivot or pandas.DataFrame.pivot_table from the pandas library instead. It is a library that most of us already have installed.

Well, the two pandas functions mentioned above can create the Pivot Table easily, but…


The ultimate guide for using pywin32 to manipulate Excel

Automate the boring stuff with Python to free your hand for a coffee

Photo by Brigitte Tohm on Unsplash

Most of the time, an organization will have multiple data sources, and a data scientist or data analyst will have to extract and compile the data from the different sources into one Excel file before performing analysis or creating a model. This can be a complex and time-consuming task if done manually.

Doing this with the most famous Python library, pandas, will shorten the time, but the result is a hard-coded Excel file, which might not be favoured by other domain users who access the Excel file directly. …


A Guide for Text Extraction with Regular Expression

An example of extracting text by keyword with Regular Expressions.

Photo by Kelly Sikkema on Unsplash

“Regular Expression (RegEx) is one of the unsung successes in standardization in computer science,” [1].

In the example in my previous article, regular expressions were used to clean up noise and tokenize the text. Well, we can do far more than that with RegEx in Text Analytics. In this article, I share how to use RegEx to extract the sentences that contain any keyword in a defined list from text data or a corpus. …
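
To make the idea concrete, here is a minimal sketch of keyword-based sentence extraction with Python's `re` module. The function name `sentences_with_keywords`, the naive sentence splitter and the sample corpus are assumptions made for illustration; the article's own implementation may differ.

```python
import re


def sentences_with_keywords(text, keywords):
    """Return the sentences that contain at least one keyword as a whole word."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # One alternation pattern; re.escape guards keywords with special characters.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


corpus = (
    "Pivot tables summarise data. Regex can extract text quickly. "
    "The weather was fine. Tokenization splits text into words."
)
matches = sentences_with_keywords(corpus, ["regex", "tokenization"])
print(matches)
```

Compiling the keyword list into a single alternation means each sentence is scanned once, and the `\b` word boundaries keep a keyword from matching inside a longer word. The splitter is deliberately simple and will mis-split on abbreviations such as "e.g.", so a proper sentence tokenizer may be preferable for real corpora.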


A guide for text processing in Python for everyone.

Text processing example with NLTK and spaCy

Photo by Brett Jordan on Unsplash

The Internet has connected the world, while social media such as Facebook, Twitter and Reddit provide platforms for people to express their opinions and feelings about a topic. The proliferation of smartphones has directly increased the usage of these platforms. For instance, 96% of active Facebook users, or 2,240 million people, access Facebook through smartphones and tablets [1].

The increased usage of social media has grown the size of text data and boosted studies and research in Natural Language Processing (NLP), for example, Information Retrieval and Sentiment Analysis. Most of the time, the documents or the…


The most complete guide for pywin32 you could ever find.

Simplify the tedious stuff to free our mind.

Photo by Davide Baraldi on Unsplash

Have you ever had to search through your mailbox to download all the attachments you need? Then maybe you stepped away and came back having forgotten where you stopped? Perhaps you still had to save them to different directories or folders afterwards?

I have been in that position before. That is why I wanted to automate the process of downloading each attachment to the correct directory and then transforming the attachment accordingly.

In this article, I will compare the possible Python libraries for the solution and share how I automated the process with Python.

Comparison of Python Libraries for Accessing Mailbox


Tips on selecting the best platform for Jupyter Notebook according to your project needs.

Photo by Mitchell Luo on Unsplash

In my previous article, I wrote about AI Platform Notebook, a cloud computing service provided on Google Cloud Platform (Link to AI Platform Notebook Article). Before I found AI Platform Notebook, I used either Google Colab or GCP Compute Engine Virtual Machine instances when I needed a cloud service to run my Jupyter Notebook. Every service has its pros and cons. In this article, I will share my thoughts and experiences from using these cloud computing services to run my projects.

Getting Ready for Jupyter Notebook


The easiest and also the fastest way of running your Jupyter Notebook in the Cloud

A new platform for you to manage and run all your Jupyter Notebooks in the cloud with the machine specification you need.

Photo by Dallas Reedy on Unsplash

How do you run your Jupyter Notebook in the cloud?

Google Colab or other virtual machines provided by cloud computing service providers?

I believe you have faced these scenarios before if you have used either of the two types of services mentioned above.

  1. You run a huge model, and your Google Colab runtime restarts or stops because the RAM is full.
  2. You fail to configure the complicated VM instances.
  3. You manage to configure the VM instances, then waste half a day reinstalling the environment and all the libraries and dependencies your projects need.

Well! You are safe from all these problems now!

This wonderful creation, GCP AI Platform Notebook, is a Jupyter Notebook on Google Cloud Platform, with all the popular libraries…


Big Data Solution

A deep dive into the security issues that occur in the HDFS structure, and the available technologies to protect it.

Photo by Liam Tucker on Unsplash

I. Introduction

Big data is trending. Smart devices, the Internet and other technologies allow the unlimited generation and transmission of data, and from that data, new information is gained. Big data comes in various forms: it can be structured, semi-structured or unstructured. Traditional data processing systems like the Relational Database Management System (RDBMS) are no longer capable of storing or processing big data, as it has a wide variety, an extremely large volume, and is generated at high speed. This is where Hadoop comes into the loop. …


How are text analytics used in Industry 4.0?

Photo by NASA on Unsplash

I. Overview

Natural Language Processing (NLP) is described as an application and research area that studies how computers can learn and exploit natural language text or speech to create meaningful output [1]. To achieve human-like language processing for a variety of tasks or applications, NLP is needed as a theoretically inspired set of computational techniques for the analysis and representation of naturally occurring texts at one or more levels of linguistic analysis [2]. The term NLP is typically used to describe the role of computer system components, software or hardware, that analyse or synthesize spoken or written language [3].

Text…

KahEm Chu

Passionate in Data Science Path. Wish to share some of my works here =]
