A gentle introduction to data science

Typical steps involved in data science

By Romee | Published 4 years ago | 2 min read

There is a lot of hype in the tech world about data science. Many startups are emerging as analytics solution providers to businesses, and many IT professionals are shifting their careers towards data science. So what exactly is data science? What kind of work does a data scientist do? This short guide is meant to answer these questions.

Data science is a multi-disciplinary field in which engineers, software developers, and statisticians use data to draw useful business insights. These insights can take the form of visualized patterns in the data, hidden patterns uncovered by analysis, or predictions of future values.

Below are the typical steps involved in a data science problem:

Data collection: Data is the main ingredient of data science; without it, no analysis is possible. Data can be collected from various sources. It may be readily available to download, or it can be extracted from a database. Sometimes data is not readily available, and in that case a data scientist needs to scrape it from the web.
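
As a rough illustration of two of these sources, the sketch below reads rows from CSV text (standing in for a downloaded file) and from a database table. The column names, the `orders` table, and the sample values are all made up for this example, and an in-memory SQLite database stands in for whatever database a real project would use.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file downloaded from some source.
CSV_TEXT = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,42.25
"""


def load_from_csv(text):
    """Parse CSV text into a list of dicts, one per row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def make_demo_database():
    """Build a small in-memory database so the example is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(4, "east", 15.0), (5, "south", 99.9)],
    )
    conn.commit()
    return conn


def load_from_database(conn):
    """Extract rows from the (hypothetical) orders table as dicts."""
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    cursor = conn.execute("SELECT order_id, region, amount FROM orders")
    return [dict(row) for row in cursor.fetchall()]


csv_rows = load_from_csv(CSV_TEXT)
db_rows = load_from_database(make_demo_database())
print(len(csv_rows), "rows from CSV;", len(db_rows), "rows from the database")
```

Note that the CSV values arrive as strings while the database returns typed values, which is one reason the cleaning step that follows is usually needed.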

Data cleaning: Because data comes from various sources, it often cannot be used directly for analysis. Public data in particular often needs cleaning, missing value treatment, anomaly handling, validation, and transformation. Some of these steps can be done with SQL or Excel, but more complex operations require programming knowledge.
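
A minimal sketch of what such cleaning might look like in code is shown below. The records, the 0 to 120 validity range for age, and the choice to fill gaps with the median are assumptions made for illustration; a real project would pick rules that suit its own data.

```python
from statistics import median

# Hypothetical raw records: ages arrive as strings, some missing or implausible.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "bob", "age": ""},
    {"name": "Carol", "age": "290"},
    {"name": "dave", "age": "41"},
]


def parse_age(value):
    """Return the age as an int, or None if missing, non-numeric, or implausible."""
    try:
        age = int(value)
    except (TypeError, ValueError):
        return None
    # Validation / anomaly handling: the 0-120 range is an assumed rule.
    return age if 0 <= age <= 120 else None


def clean(records):
    """Normalize names and replace unusable ages with the median of valid ones."""
    ages = [parse_age(r["age"]) for r in records]
    valid = [a for a in ages if a is not None]
    fill = median(valid)  # missing value treatment: median imputation
    return [
        {
            "name": r["name"].strip().title(),  # transformation: tidy the text
            "age": a if a is not None else fill,
        }
        for r, a in zip(records, ages)
    ]


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Here the empty age and the implausible value of 290 are both treated as missing and filled in, while the valid ages pass through unchanged.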

Exploratory data analysis: This step involves data visualization, creating summaries, segmentation, and finding answers to other business questions. It requires tools that can create summaries, combine variables to form composite variables, and plot data. Excel, Matlab, R, Python, or any other tool with these capabilities will do.
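
To make the ideas of composite variables, segmentation, and summaries concrete, here is a small sketch using only Python's standard library. The sales records and field names are invented for the example; in practice a library such as pandas would typically do this work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records.
sales = [
    {"region": "north", "units": 10, "unit_price": 2.5},
    {"region": "south", "units": 4, "unit_price": 3.0},
    {"region": "north", "units": 6, "unit_price": 2.0},
    {"region": "south", "units": 8, "unit_price": 3.5},
]

# Composite variable: combine two raw columns into one derived column.
for row in sales:
    row["revenue"] = row["units"] * row["unit_price"]


def summarize_by(records, key, value):
    """Segment records by `key` and summarize `value` within each segment."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        segment: {"count": len(vals), "mean": mean(vals), "total": sum(vals)}
        for segment, vals in groups.items()
    }


summary = summarize_by(sales, "region", "revenue")
for region, stats in summary.items():
    print(region, stats)
```

A summary like this is often the starting point for a plot or for a follow-up business question, such as why one segment behaves differently from another.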

Predictive analysis: Many business problems (though not all) demand prediction of future values, such as sales, customer churn, or any other variable. This step involves feature engineering, feature selection, model selection, and so on, and requires knowledge of machine learning algorithms. Python, R, and similar languages provide efficient libraries for machine learning.
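
As a very small example of prediction, the sketch below fits a straight line to monthly sales using ordinary least squares with a single feature, then checks the fit on months it has not seen. The sales figures are invented, and the model is deliberately the simplest one possible; real problems would normally use a library such as scikit-learn and far more careful evaluation.

```python
from statistics import mean

# Hypothetical monthly sales figures (month number, sales).
months = [1, 2, 3, 4, 5, 6]
sales = [102.0, 108.0, 121.0, 129.0, 141.0, 149.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Train on the first four months and hold out the last two for evaluation.
train_x, train_y = months[:4], sales[:4]
test_x, test_y = months[4:], sales[4:]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * m + intercept for m in test_x]

# Mean absolute error on the held-out months.
mae = mean(abs(p - actual) for p, actual in zip(predictions, test_y))
print("slope:", slope, "intercept:", intercept, "held-out MAE:", mae)
```

Holding out the last two months is a simple stand-in for model evaluation: the error is measured on data the model did not see while fitting.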

Communication: Once data exploration and prediction are done, the final step is to communicate the findings. A data scientist creates summaries, plots, and graphs to tell the story to stakeholders, helping them understand the causal variables and how they can improve their business. The data scientist also reports key performance indicators and predictions to the business.

Tools used by a data scientist

- Excel, SQL, SAS, etc. for data exploration.

- Python, Java, C++, etc. for data collection and data scraping.

- Matplotlib, R, Matlab, Tableau, D3, etc. for data visualization.

- scikit-learn, R, TensorFlow, PyTorch, etc. for machine learning.

- Hive, Spark, Hadoop, etc. for big data processing.

Data scientists use many different tools for their work. As the list above shows, the particular tool a data scientist uses is not that important: any tool that helps handle and process data will work. What matters is that a data scientist needs strong analytical skills to be good at data science.


About the Creator

Romee

Engineer | Blogger | Musician
