Just because we have calculators now does not mean we no longer need to learn basic math. We still need to know what we want and how to get the answers we are after.
Just as the human body relies on information processed by the brain to learn and function, modern AI depends on data to do the same. Machine learning needs large amounts of data to learn rules from, and engineers are responsible for supplying data suited to the specific need. Therefore, before learning machine learning and other AI-related technologies, it is necessary to have a foundation in data science.
Before we begin this Cupoy course, I would like to point out that AI, as a field of study, is not exclusively about constructing AI models. It is a multidisciplinary field (a supertopic, if you will) that includes other digital concepts such as machine learning, deep learning, and generative AI. The upcoming chapter focuses on a topic closely tied to machine learning: data science.
So what differentiates data science from traditional science? Traditional science often focuses on controlled experiments to isolate cause-and-effect relationships, while data science often deals with observational data and focuses on identifying patterns and trends within large datasets.
Keep in mind that experiments and observations are not the only methods of exploration in either kind of science, nor is either method exclusive to one of them.
In data science, a common modeling method is to use machine learning to build predictive models. When a model has predictive power, the hypothesis corresponding to the model can be validated, and the model can be used to explain natural phenomena.
However, a common problem with this method is that machine learning models are often too complex to clearly explain natural phenomena. Traditional science therefore tends to use simpler, more interpretable models to formulate and test hypotheses.
Take Newton's classic second law of motion in physics, F=ma, for example. When we push an object of fixed mass with different external forces, we measure different accelerations, and we find a linear relationship between the external force and the acceleration. This traditional model can accurately predict how much external force is needed to produce a certain amount of acceleration, and vice versa. Such a prediction model can be quite effective, allowing us to explain natural phenomena using the concepts of external force and acceleration.
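As a rough illustration of that linear relationship, the sketch below fits a straight line through the origin to simulated force and acceleration measurements and reads the mass off the slope. Everything in it is an assumption made for illustration: the 2.0 kg mass, the forces of 1 to 10 N, the noise level, and the helper names are invented, and the data is generated rather than measured.

```python
import random


def fit_slope_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one nonzero x value")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical experiment: a 2.0 kg cart pushed with forces of 1..10 N.
# The noise level (0.05 m/s^2) is made up for illustration.
rng = random.Random(0)
TRUE_MASS_KG = 2.0
forces = [float(f) for f in range(1, 11)]
accelerations = [f / TRUE_MASS_KG + rng.gauss(0.0, 0.05) for f in forces]

# F = m * a means a = (1 / m) * F, so the fitted slope estimates 1 / m.
slope = fit_slope_through_origin(forces, accelerations)
estimated_mass = 1.0 / slope


def predict_acceleration(force_n):
    """Acceleration (m/s^2) the fitted model predicts for a given force (N)."""
    return slope * force_n


def force_needed(acceleration):
    """Force (N) the fitted model says is needed for a given acceleration."""
    return acceleration / slope


print(f"estimated mass: {estimated_mass:.2f} kg")
print(f"predicted acceleration at 4 N: {predict_acceleration(4.0):.2f} m/s^2")
print(f"force needed for 3 m/s^2: {force_needed(3.0):.2f} N")
```

The model has a single parameter with a direct physical meaning (the mass), which is what makes it easy to interpret compared with a large machine learning model.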
You can say a machine learning predictive model is one of the products of a data science project. Data science is, briefly put, mainly about understanding the information contained in data and the means to extract it. A successful data science project can produce a machine learning model with predictive capabilities. A refined and properly evaluated predictive model can be a desirable artificial intelligence model and data science product.
Finally, data science is an indispensable basic skill for completing a machine learning or deep learning model, whether you want to build data science projects or train good predictive models.
Only with a basic foundation in data science can we understand the inner workings of how a good prediction model is trained. Without it, we are left with engineering techniques and small tricks that cannot produce a real breakthrough in skill.
Building and refining a machine learning system involves three roles: data engineer, data scientist, and data analyst.
Data engineers focus on building and maintaining the infrastructure that processes and stores data. This includes designing and implementing data pipelines, databases, and large-scale processing systems. They ensure the smooth flow of data throughout the machine learning lifecycle. They typically deal with the more tangible aspects, such as databases and systems, and their end goal is comparatively fixed: building and maintaining that infrastructure.
Data scientists prepare data for analysis and modeling. This involves tasks like data cleaning, transformation, feature engineering, and exploratory data analysis. They identify and address issues with data quality and format the data in a way that suits machine learning algorithms. They typically work with the less tangible aspects of data quality and relevance, and their goal is ongoing: keeping the data that feeds the machine learning model at the best possible quality.
Data analysts focus on uncovering insights and patterns within data to inform decision-making. Their role involves transforming raw data into actionable knowledge. This includes tasks such as data cleaning, exploration, and visualization. They employ statistical techniques and business acumen to identify trends, anomalies, and opportunities. Data analysts often collaborate with stakeholders to understand business requirements and translate findings into clear and compelling narratives.
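To make the data preparation tasks mentioned above (cleaning, feature engineering, transformation) a little more concrete, here is a minimal sketch using only the Python standard library. The records, field names, and helper functions are all hypothetical and chosen purely for illustration; real projects usually rely on dedicated data libraries and much larger datasets.

```python
from statistics import mean, pstdev

# Hypothetical raw records; field names and values are invented.
raw_rows = [
    {"height_cm": "170", "weight_kg": "65"},
    {"height_cm": "", "weight_kg": "80"},      # missing height
    {"height_cm": "182", "weight_kg": "n/a"},  # unparseable weight
    {"height_cm": "160", "weight_kg": "50"},
    {"height_cm": "175", "weight_kg": "77"},
]


def parse_float(text):
    """Return the value as a float, or None if it is missing or malformed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def clean(rows):
    """Data cleaning: keep only rows where every field parses as a number."""
    cleaned = []
    for row in rows:
        parsed = {key: parse_float(value) for key, value in row.items()}
        if all(value is not None for value in parsed.values()):
            cleaned.append(parsed)
    return cleaned


def add_bmi(rows):
    """Feature engineering: derive body-mass index from height and weight."""
    return [
        {**row, "bmi": row["weight_kg"] / (row["height_cm"] / 100) ** 2}
        for row in rows
    ]


def standardize(rows, column):
    """Transformation: rescale a column to zero mean and unit variance."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError(f"column {column!r} is constant")
    return [{**row, column + "_z": (row[column] - mu) / sigma} for row in rows]


prepared = standardize(add_bmi(clean(raw_rows)), "bmi")
for row in prepared:
    print(row)
```

Dropping incomplete rows is only one possible cleaning policy; depending on the project, filling in missing values may be a better choice.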