
Why You Shouldn’t Be a Generalist in Data Science


By Lekhana. Published 2 years ago. 5 min read.

I work for a firm that provides data science mentorship, and I've noticed that there is one piece of advice I find myself repeatedly delivering to potential mentees. And it's actually not what I had expected. I advise that they first consider what kind of data scientist they want to be rather than suggesting a new library, tool, or resume hack.

This is important since there is no single, clearly defined discipline of data science, and employers prefer to recruit highly specialized experts rather than "data scientists" who can handle everything.

To understand why, imagine you are a company looking to hire a data scientist. You already have a relatively clear idea of the problem you're trying to solve, and it will call for some reasonably specialized technical knowledge and subject-matter experience. For instance, some businesses use basic models on big datasets, others use complicated models on tiny ones, some have to train their models on the fly, and others don't use any models at all.

Each of these situations requires an entirely different skill set. Strangely, though, the advice given to aspiring data scientists tends to be very general: "learn how to use Python, build some classification/regression/clustering projects, and start applying for jobs." A focused skill set is something you can build through an online data science course that offers rigorous training for working professionals.

A large portion of the fault rests with those of us who work in the field. In informal chats, blog posts, and presentations, we frequently group too many things under "data science." Building a reliable production data pipeline? That gets called a "data science problem" too.

That's bad because it often leads aspiring data scientists to lose focus on particular problem classes and turn into jacks of all trades. That makes it harder to stand out or break through in a market already flooded with generalists. But it's difficult to resist becoming a generalist if you don't know which common problem classes you could specialize in. For this reason, I have put together a list of the four problem categories that are frequently grouped under the name "data science":

1. Data Engineer

As a data engineer, you'll be managing data pipelines for businesses that work with enormous amounts of data. This means ensuring that data is efficiently gathered and retrieved from its source when needed, then cleaned and preprocessed.

Reason – If you have only ever worked with relatively tiny (5 GB) datasets stored in .csv or .txt files, it may be hard to see why there are people whose full-time profession is to create and maintain data pipelines. Here are a few justifications:

1) A 50 GB dataset won't fit in your computer's RAM, so you often need to find alternative ways to feed it into your model.

2) Processing that much data can take an absurdly long time.

3) Storing that much data can be costly, and redundant storage is frequently required. It takes specialist technical knowledge to manage that storage.

You will be utilizing Apache Spark, Hadoop, and Hive, as well as Kafka, among other technologies. Most likely, you'll need to have a strong SQL foundation.

You'll be asked questions that sound like the following:

"How can I construct a pipeline capable of processing 10,000 requests per minute?"

"How can I clean this dataset without loading the entire thing into RAM?"

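As a rough illustration of the second question, here is a minimal Python sketch that cleans a CSV one row at a time, so memory use stays flat no matter how large the file is. The column names, the sample data, and the cleaning rule (strip whitespace, drop rows with an empty field) are assumptions made up for this example, and real pipelines usually rely on tools such as Spark for the same idea at scale.

```python
import csv
import io


def clean_rows(reader):
    """Yield cleaned rows one at a time, never holding the full dataset."""
    for row in reader:
        # Strip stray whitespace; treat missing trailing fields as empty.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Hypothetical cleaning rule: drop any row that has an empty field.
        if all(row.values()):
            yield row


def clean_csv(src, dst):
    """Stream rows from file object src to dst; return the rows written.

    Assumes no row has more fields than the header.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    written = 0
    for row in clean_rows(reader):
        writer.writerow(row)
        written += 1
    return written


# Tiny in-memory stand-in for a file that is too large to load at once.
raw = io.StringIO("user_id,country\n 1 ,US\n2,\n3, DE \n")
out = io.StringIO()
rows_written = clean_csv(raw, out)
```

In practice, `src` and `dst` would be files opened with `open(path, newline="")`, and only one row would be held in memory at any moment.
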
2. Data Analyst

It will be your job to turn data into insightful business information. You'll frequently act as the liaison between the business strategy, sales, marketing and technical teams. Your daily activities will include a significant amount of data visualization.

Reason – Highly technical people often find this hard to appreciate, but data analysts are crucial. They ensure that data science teams don't waste time on tasks that provide no value to the company. Someone needs to turn a trained and tested model and mountains of user data into an easily understandable format, so that business strategies can be built around them.

You'll work with Python, SQL, Tableau, and Excel, among other tools. You must also have strong communication skills, which you can develop with a data science course such as the one by Learnbay.

You'll be asked questions that sound like the following:

"What is driving our user growth numbers?"

"How can we show management that the recent increase in user fees is driving customers away?"

3. Data Scientist

You will be responsible for cleaning and exploring datasets and making predictions that have business value. Your daily tasks will include training and optimizing models and, often, deploying them to production.

Reason – You need a method for extracting understandable insights from data that is too large for a human to interpret and too valuable to be ignored. The primary duty of a data scientist is to convert datasets into clear conclusions.

Python, scikit-learn, Pandas, SQL, and potentially Flask, Spark, and TensorFlow/PyTorch are among the technologies you'll be using. While some data science jobs are purely technical, the majority will require some business acumen so that you don't end up solving problems that no one actually has.

You'll be asked questions that sound like the following:

"How many different types of users do we really have?"

"Can we create a model to predict which products will appeal to which customers?"

4. Machine Learning Engineer

You will be in charge of creating, enhancing, and implementing machine learning models in real-world applications. Machine learning models are typically treated as APIs or parts that you plug into a full-stack application or some hardware, but you may also be asked to create your own models.

TensorFlow/PyTorch (and enterprise deep learning frameworks), scikit-learn, Python, JavaScript, and SQL or MongoDB (typically used for app databases) are the technologies you'll be working with.

You'll be asked questions that sound like the following:

"What is the best way to incorporate this Keras model into our JavaScript app?"

"How can I decrease the prediction time and cost of our recommender system?"
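One of several possible answers to the last question is to cache predictions, so that repeated requests for the same user skip the expensive model call. The sketch below uses Python's standard `functools.lru_cache`. The function `slow_model_predict` is a hypothetical stand-in for a real recommender, and the approach assumes a user's recommendations do not need to change between calls.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive model actually runs


def slow_model_predict(user_id):
    """Hypothetical stand-in for an expensive recommender model call."""
    global calls
    calls += 1
    return [user_id * 10 + offset for offset in range(3)]


@lru_cache(maxsize=1024)
def recommend(user_id):
    # Return a tuple so callers cannot mutate the cached value.
    return tuple(slow_model_predict(user_id))


first = recommend(7)   # runs the model
second = recommend(7)  # served from the cache, the model is not called again
other = recommend(8)   # a new user, so the model runs once more
```

Caching trades freshness for speed, so it only fits when slightly stale recommendations are acceptable. Batching requests or using a smaller model are other common options.
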

These roles are not rigid. Most jobs will fit into one of these categories more closely than the others, and the bigger the organization, the more cleanly the categories tend to apply. At an early-stage startup, for example, a data scientist might also need to act as a data engineer or a data analyst.

The main thing to keep in mind is that, to get hired, you'll typically be better off developing a more specialized skill set: don't focus on learning PySpark if you want to work as a machine learning researcher, and don't learn TensorFlow if you want to work as a data analyst. Instead, consider the kind of value you want to help businesses create, and practice delivering that value well. More than anything else, that is the best way in.

I hope this makes it clear why you shouldn't just become a data science generalist. Specialize in one field, and you put yourself in a far better position to earn lakhs. So get started today with an online data science course that specializes in advanced AI/ML techniques and offers multiple domain electives.

Enroll today and you’re good to go!

