We welcome the ascension of our artificial overlords!

27th June 2024 - Feature Engineering with Temporal Data

So when will universal basic income be seriously proposed? With mechs already automating menial tasks, it feels like an inevitability, and one that relates to my subject of study.

Today's topic is temporal information in the context of feature engineering. Temporal information is data that carries information about time. Examples include a sports watch recording its wearer's heart rate at fixed intervals, or the closing price of a specific listed company's stock on each day.

How is temporal information read by machinery?

Through a method loosely similar to One Hot Encoding, in that a single field is spread across several columns, each part of a timestamp is converted to a plain number and stored in its own column: year, month, day, and hour.
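As a rough illustration of that column split, here is a minimal sketch using only Python's standard library. The function name `split_timestamp` and the `%Y/%m/%d %H:%M` input format are assumptions made for this example, not something a particular tool requires.

```python
from datetime import datetime


def split_timestamp(text, fmt="%Y/%m/%d %H:%M"):
    """Parse one timestamp string and return its parts as plain integers."""
    ts = datetime.strptime(text, fmt)
    # Each part of the timestamp becomes its own numeric column.
    return {"year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour}


# Hypothetical sample timestamps, one output row per timestamp.
rows = [split_timestamp(t) for t in ["2022/10/01 08:30", "2022/11/30 17:05"]]
for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the new numeric columns. Other parts, such as the minute or the day of the week, could be added in the same way if a model needs them.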

Handling missing values in time-based data

Simple techniques like mean, mode, or KNN imputation might not be ideal for time-based data, where each value is often strongly correlated with the data points just before and after it.

A number of techniques you could consider using are:

Linear/nonlinear regression imputation: fits a regression to the known data and uses it to estimate the missing values. The independent variable of the regression is the time point, and the dependent variable is the variable whose missing values you want to impute.
- The estimate is only reliable inside the time range covered by the known data. For example, if the earliest and latest time points in the known data are 2022/10/01 and 2022/11/30 respectively, a value for 2022/09/30 cannot be imputed this way.
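The regression approach can be sketched for the linear case as follows. This is a minimal illustration, not a definitive implementation: the helper names `fit_line` and `impute_by_regression` and the sample values are made up for the example, and a plain least-squares line stands in for whatever regression model you would actually choose.

```python
from datetime import date


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_regression(known, target):
    """Estimate the value at `target` from `known`, a dict of date -> value."""
    first, last = min(known), max(known)
    # Refuse to estimate outside the range covered by the known data.
    if not first <= target <= last:
        raise ValueError("target lies outside the known time range")
    # Time is the independent variable, expressed as days since the first point.
    xs = [(d - first).days for d in known]
    ys = [known[d] for d in known]
    slope, intercept = fit_line(xs, ys)
    return slope * (target - first).days + intercept


# Hypothetical known values inside the 2022/10/01 to 2022/11/30 range.
known = {
    date(2022, 10, 1): 10.0,
    date(2022, 10, 31): 40.0,
    date(2022, 11, 30): 70.0,
}
print(impute_by_regression(known, date(2022, 10, 16)))
```

Asking the same function for 2022/09/30 raises a `ValueError`, which mirrors the limitation in the example above.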

Interpolation (difference compensation) method: finds the data point just before the missing value (t-1) and the data point just after it (t+1), then estimates the missing value (t) from the difference between these two known values.
- Data is not always collected at regular intervals, so the time gap between consecutive data points can vary. If a regression model is used to fill in the value instead, the data change caused by these differing time intervals will not be taken into account.
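One way to sketch this method is linear interpolation between the two neighbours, weighted by how far the missing time point sits between them, so that uneven collection intervals are taken into account. The function name `interpolate_gap` and the sample readings are hypothetical, and the time weighting is an assumption of this sketch rather than the only way to use the difference.

```python
def interpolate_gap(t_prev, v_prev, t_next, v_next, t_missing):
    """Estimate the value at t_missing from its two known neighbours.

    Times can be any numbers on a common scale, such as minutes or days.
    """
    if not t_prev < t_missing < t_next:
        raise ValueError("t_missing must lie strictly between the two known points")
    # Share of the time gap that has elapsed when the missing point occurs.
    fraction = (t_missing - t_prev) / (t_next - t_prev)
    # Apply the same share of the difference between the two known values.
    return v_prev + fraction * (v_next - v_prev)


# Hypothetical readings: 60 at minute 0 and 80 at minute 10, missing at minute 4.
estimate = interpolate_gap(0, 60.0, 10, 80.0, 4)
print(estimate)
```

When the missing point is exactly halfway between its neighbours, this reduces to the plain average of the two known values.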

Data differencing

With temporal data, "trends" (the changes from one data point to the next) matter a great deal, so it is important to compute features from two or more data points. This is where data differencing comes into play.

For example, take the temporal data values [25, 28, 31, 27]. Differencing adjacent values gives 28 - 25 = 3, 31 - 28 = 3, and 27 - 31 = -4. When two adjacent data points are subtracted from each other, it is called a first-order difference. If there is one data point between the two, it is called a second-order difference here. If there are two data points between them, it is a third-order difference, and so on. (Many references call these lag-2 and lag-3 differences, and reserve "second-order" for the difference of the first-order differences.)
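The calculation can be sketched in a few lines of Python. The function name `difference` and its `lag` parameter are assumptions for this example: `lag` is the distance between the two values being subtracted, so `lag=1` subtracts adjacent values and `lag=2` leaves one value between them.

```python
def difference(values, lag=1):
    """Subtract from each value the value `lag` positions before it."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


data = [25, 28, 31, 27]
print(difference(data))         # [3, 3, -4]
print(difference(data, lag=2))  # [6, -1]
print(difference(data, lag=3))  # [2]
```

The result gets shorter as the lag grows, because the first `lag` values have nothing earlier to be compared against.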

Moving average

A moving average calculates the average of a data set over a specified period. To illustrate, refer to the graph below: the blue line represents the daily data, while the orange line represents the same data after being averaged in units of seven days.

Before averaging, you decide how many data items are averaged at one time; this number is called the window. Its size greatly affects how much of the detailed trend in the time-based data remains visible: a larger window gives a smoother line but hides short-term changes.
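To make the window idea concrete, here is a minimal sketch of a simple (unweighted) moving average. The function name `moving_average` is hypothetical, and this version only produces a value once a full window of data is available, which is one of several possible conventions.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the number of values")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


data = [25, 28, 31, 27]
print(moving_average(data, 2))  # [26.5, 29.5, 29.0]
print(moving_average(data, 4))  # [27.75]
```

With a window of 2 the result still follows the rise and fall of the data, while a window of 4 collapses everything into a single average, which shows how the window size controls the level of detail.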
