
Fundamentals of Data Science

Let's start learning Data Science

By Saran · Published 3 years ago · 3 min read

Business Intelligence:

  • BI is widely used for price optimization and inventory management.
  • In other words, this technique reduces costs and increases profits.

Traditional Methods:

  • Also called predictive analytics, these methods are used to predict future values with good accuracy.

Regression is a model used to describe relationships among the variables in our analysis.
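As a minimal sketch of this idea, here is a simple linear regression, assuming scikit-learn is installed; the data points are invented for illustration:

```python
# Linear regression: describe the relationship between two variables.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]     # explanatory variable (invented values)
y = [2.1, 4.0, 6.2, 7.9, 10.1]    # response, roughly y = 2x

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # slope close to 2, intercept close to 0
```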

Logistic regression:

  • Logistic regression is a non-linear model; the values on the vertical axis are 1s and 0s.
  • E.g., filtering job candidates: selected means 1, rejected means 0.
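A minimal sketch of the candidate-filtering example, again assuming scikit-learn; the features (years of experience, test score) and labels are invented:

```python
# Logistic regression: predict a 1 (selected) or 0 (rejected) label.
from sklearn.linear_model import LogisticRegression

X = [[1, 55], [2, 60], [3, 70], [5, 80], [6, 85], [8, 90]]  # invented features
y = [0, 0, 0, 1, 1, 1]                                      # 1 = selected

model = LogisticRegression().fit(X, y)
print(model.predict([[4, 75]]))        # predicted class for a new candidate
print(model.predict_proba([[4, 75]]))  # probability of rejection vs. selection
```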

Cluster Analysis:

  • This technique can be applied only when the data can be divided into a few groups (see the k-means sketch under Unsupervised Learning below).

Factor Analysis:

  • It combines several variables into one.
  • It groups explanatory variables together.
  • It reduces dimensionality.
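A sketch of dimensionality reduction with factor analysis, assuming scikit-learn; the three observed variables below are generated from one hidden factor purely for illustration:

```python
# Factor analysis: combine several correlated variables into one factor.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))                 # one hidden factor
noise = rng.normal(scale=0.1, size=(100, 3))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + noise   # three observed variables

fa = FactorAnalysis(n_components=1)
scores = fa.fit_transform(X)          # 100 rows reduced from 3 columns to 1
print(X.shape, "->", scores.shape)    # (100, 3) -> (100, 1)
```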

Time Series Analysis:

  • The representation is always along a horizontal (time) axis.
  • Time is the independent variable.
  • This analysis is widely used for tracking stock prices.

Example: sales forecasting, which uses time series analysis.
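A naive sales-forecasting sketch with invented monthly figures; a real forecast would use a proper time series model (e.g. ARIMA), but a moving average shows the idea:

```python
# Naive forecast: predict next month as the average of the last 3 months.
sales = [120, 135, 150, 160, 155, 170, 180, 175, 190]  # invented monthly sales

window = 3
forecast = sum(sales[-window:]) / window
print(f"Forecast for next month: {forecast:.1f}")
```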


The Concept of Machine Learning:

  • In general, ML is applied to create an algorithm; we don't give explicit instructions. Instead, we provide an algorithm that lets the machine learn by itself.
  • Machine learning is a process of trial and error.

Types of Machine Learning:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning:

Supervised learning deals with labeled data; it has targets. E.g., we provide the data together with the expected outcomes. An SVM sketch follows the list below.

The widely utilized algorithms in supervised learning are:

  • SVM - Support Vector Machines.
  • NN - Neural Networks.
  • Deep Learning (has high accuracy).
  • Random Forests.
  • Bayesian Networks.
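A minimal supervised-learning sketch with an SVM, using scikit-learn's built-in iris dataset so the example stays self-contained:

```python
# SVM: learn from labeled examples, then score on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)      # features and target labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_train, y_train)      # train on the labeled portion
print(clf.score(X_test, y_test))       # accuracy on held-out data
```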

Unsupervised Learning:

It deals with unlabelled data. If we have a large amount of data, we don't have enough time to label and process it all. To overcome this, we first apply unsupervised learning and then move on to supervised learning. A k-means sketch follows the list below.

The most applied algorithms in unsupervised learning are:

  • K-means algorithm.
  • Deep Learning.
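A minimal k-means sketch, assuming scikit-learn; the unlabelled points are two randomly generated blobs:

```python
# K-means: group unlabelled points into clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),    # blob around (0, 0)
               rng.normal(5, 0.5, (50, 2))])   # blob around (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)        # cluster index assigned to each point
print(km.cluster_centers_)        # learned centers, near (0,0) and (5,5)
```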

Reinforcement Learning:

At each step, a reward is given to the model only when a positive outcome is obtained. It is similar to supervised learning, but the goal is to maximize the reward. This learning aims to maximize an objective function (a mathematical equation). A small sketch follows the list below.

The most applied algorithm in reinforcement learning is:

  • Deep Learning.
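Deep RL is too large to sketch here, but the reward-maximization loop itself can be shown with a tiny epsilon-greedy bandit; the per-action reward probabilities are invented:

```python
# Trial and error: try actions, keep estimates, favor the highest reward.
import random

probs = [0.3, 0.5, 0.8]        # true (hidden) reward chance per action
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]       # running reward estimate per action

random.seed(0)
for _ in range(1000):
    if random.random() < 0.1:             # explore 10% of the time
        a = random.randrange(3)
    else:                                 # otherwise exploit the best estimate
        a = values.index(max(values))
    reward = 1 if random.random() < probs[a] else 0
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean

print(values)  # the agent learns to prefer the 0.8 action
```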

Deep learning can address all three major types of ML, in different ways.

Necessary programming languages and software:

  • Python - suitable for mathematical and statistical computations.
  • Python programs are easily adaptable.
  • MATLAB, SQL, and programming with R.
  • Software: Excel; for big data, Apache Hadoop and MongoDB.

Job roles in data science:

1. Traditional Data: Data Architect, Data Engineer, Database Administrator.
2. Big Data: Big Data Architect, Big Data Engineer.
3. BI: BI Analyst, BI Consultant, BI Developer.
4. Data Science: Data Scientist, Data Analyst.
5. Machine Learning: Data Scientist, Machine Learning Engineer.

To become a data scientist, we need to be strong in mathematical concepts like statistics, calculus, and algebra. When I started to learn statistics, I thought starting from probability would be a good idea. Let's see what I learned about probability for statistics.

Understanding probability for inferential statistics:

Probability:

  • Probability is the chance of getting a success or a failure; it is represented as a fraction or a percentage.
  • Example: when a coin is tossed, there is one preferred outcome out of two possible outcomes, so the probability is 1/2.
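The coin example as a two-line computation:

```python
# Probability = preferred outcomes / possible outcomes.
preferred, possible = 1, 2
p = preferred / possible
print(p, f"({p:.0%})")  # 0.5 as a fraction, 50% as a percentage
```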

Expected Values:

  • An expected value is the average outcome we expect when we run our experiment many times.
  • Experiment: this refers to multiple trials; e.g., tossing 50 coins and recording 50 outcomes is considered a single experiment. The resulting relative frequencies are known as experimental probabilities, and they are easy to compute (a simulation sketch follows this list).
  • Experimental probabilities are a good predictor of theoretical probabilities.
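A simulation sketch of an experimental probability, tossing a fair coin many times with Python's random module:

```python
# Experimental probability: run many trials and count the successes.
import random

random.seed(0)
tosses = 1000
heads = sum(random.random() < 0.5 for _ in range(tosses))
print(heads / tosses)  # close to the theoretical probability of 0.5
```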

Probability Frequency Distribution:

  • A collection of the probabilities of each possible outcome.
  • To obtain the probability frequency distribution, we divide each frequency by the size of the sample space.
  • It can be represented as a graph or a table.
  • The outcome with the highest probability is taken as the expected value.
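A sketch of a probability frequency distribution for the sum of two dice, dividing each frequency by the size of the sample space:

```python
# Probability frequency distribution for the sum of two dice.
from collections import Counter
from itertools import product

sums = [a + b for a, b in product(range(1, 7), repeat=2)]
freq = Counter(sums)           # frequency of each possible sum
sample_space = len(sums)       # 36 equally likely outcomes

for outcome in sorted(freq):
    print(outcome, freq[outcome] / sample_space)
# 7 has the highest probability (6/36), making it the expected value
```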

Complements:

  • Every event has a complement (A′).
  • An event and its complement are always mutually exclusive.
  • If a set consists of all odd numbers, then its complement is the set of all even numbers.
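A small sketch of an event and its complement over a single die's sample space, using Python sets:

```python
# Complement of an event within a finite sample space (one die).
sample_space = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}                            # event A: odd numbers
complement = sample_space - odd            # A': the even numbers {2, 4, 6}

print(complement)
print(odd & complement)                    # empty set: mutually exclusive
print(len(odd) / 6 + len(complement) / 6)  # P(A) + P(A') = 1.0
```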


Combinatorics:

Combinatorics deals with selecting objects from a finite set. To count the number of favorable outcomes, we apply restrictions (conditions), such as whether repetition is allowed and whether order matters.

Types of Combinatorics:

Combinatorics has three parts, namely:

  • Permutations
  • Variations
  • Combinations
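As a preview, the three counts for picking 2 objects out of 4 can be computed with Python's math module (what the article calls variations are what math calls partial permutations, math.perm(n, k)):

```python
# Counting selections of k = 2 objects from a set of n = 4.
import math

n, k = 4, 2
print(math.factorial(n))  # permutations of all n objects: 24
print(math.perm(n, k))    # variations: ordered picks of k from n -> 12
print(math.comb(n, k))    # combinations: unordered picks of k from n -> 6
```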

We will discuss the types of combinatorics in more detail in an upcoming article. Thank you for reading; you can share your valuable suggestions on LinkedIn.
