
Fundamentals of Data Science

Let's start learning Data Science

By Saran · Published 3 years ago · 3 min read

Business Intelligence:

  • BI is widely used for price optimization and inventory management.
  • In other words, this technique reduces costs and increases profits.

Traditional Methods:

  • Also called predictive analytics, these methods are used to predict future values with good accuracy.

Regression is a model used to describe relationships among the variables in our analysis.
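As a minimal sketch of this idea, here is a simple linear regression, assuming scikit-learn is installed; the data points are invented for illustration:

```python
# Linear regression: describe the relationship between two variables.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]     # explanatory variable (invented values)
y = [2.1, 4.0, 6.2, 7.9, 10.1]    # response, roughly y = 2x

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # slope close to 2, intercept close to 0
```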

Logistic regression:

  • Logistic regression is a non-linear model; the values on the vertical axis are 1s and 0s.
  • E.g., filtering job candidates: selected means 1, rejected means 0.
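A minimal sketch of the candidate-filtering example, again assuming scikit-learn; the features (years of experience, test score) and labels are invented:

```python
# Logistic regression: predict a 1 (selected) or 0 (rejected) label.
from sklearn.linear_model import LogisticRegression

X = [[1, 55], [2, 60], [3, 70], [5, 80], [6, 85], [8, 90]]  # invented features
y = [0, 0, 0, 1, 1, 1]                                      # 1 = selected

model = LogisticRegression().fit(X, y)
print(model.predict([[4, 75]]))        # predicted class for a new candidate
print(model.predict_proba([[4, 75]]))  # probability of rejection vs. selection
```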

Cluster Analysis:

  • This technique can be applied only when the data can be divided into a few groups (see the k-means sketch under Unsupervised Learning below).

Factor Analysis:

  • It combines several variables into one.
  • It groups explanatory variables together.
  • It reduces dimensionality.
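A sketch of dimensionality reduction with factor analysis, assuming scikit-learn; the three observed variables below are generated from one hidden factor purely for illustration:

```python
# Factor analysis: combine several correlated variables into one factor.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))                 # one hidden factor
noise = rng.normal(scale=0.1, size=(100, 3))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + noise   # three observed variables

fa = FactorAnalysis(n_components=1)
scores = fa.fit_transform(X)          # 100 rows reduced from 3 columns to 1
print(X.shape, "->", scores.shape)    # (100, 3) -> (100, 1)
```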

Time Series Analysis:

  • The representation is always along a horizontal (time) axis.
  • Time is the independent variable.
  • This analysis is widely used for tracking stock prices.

Example: sales forecasting, which uses time series analysis.
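A naive sales-forecasting sketch with invented monthly figures; a real forecast would use a proper time series model (e.g. ARIMA), but a moving average shows the idea:

```python
# Naive forecast: predict next month as the average of the last 3 months.
sales = [120, 135, 150, 160, 155, 170, 180, 175, 190]  # invented monthly sales

window = 3
forecast = sum(sales[-window:]) / window
print(f"Forecast for next month: {forecast:.1f}")
```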


The Concept of Machine Learning:

  • In general, ML is applied to create an algorithm; we don't give explicit instructions. Instead, we provide an algorithm that lets the machine learn by itself.
  • Machine learning is a process of trial and error.

Types of Machine Learning:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning:

Supervised learning deals with labeled data; it has targets. E.g., we provide the data together with the expected outcomes. An SVM sketch follows the list below.

The widely utilized algorithms in supervised learning are:

  • SVM - Support Vector Machines.
  • NN - Neural Networks.
  • Deep Learning (has high accuracy).
  • Random Forests.
  • Bayesian Networks.
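A minimal supervised-learning sketch with an SVM, using scikit-learn's built-in iris dataset so the example stays self-contained:

```python
# SVM: learn from labeled examples, then score on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)      # features and target labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_train, y_train)      # train on the labeled portion
print(clf.score(X_test, y_test))       # accuracy on held-out data
```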

Unsupervised Learning:

It deals with unlabelled data. If we have a large amount of data, we don't have enough time to label and process it all. To overcome this, we first apply unsupervised learning and then move on to supervised learning. A k-means sketch follows the list below.

The most applied algorithms in unsupervised learning are:

  • K-means algorithm.
  • Deep Learning.
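A minimal k-means sketch, assuming scikit-learn; the unlabelled points are two randomly generated blobs:

```python
# K-means: group unlabelled points into clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),    # blob around (0, 0)
               rng.normal(5, 0.5, (50, 2))])   # blob around (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)        # cluster index assigned to each point
print(km.cluster_centers_)        # learned centers, near (0,0) and (5,5)
```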

Reinforcement Learning:

At each step, a reward is given to the model only when a positive outcome is obtained. It is similar to supervised learning, but the goal is to maximize the reward. This learning aims to maximize an objective function (a mathematical equation). A small sketch follows the list below.

The most applied algorithm in reinforcement learning is:

  • Deep Learning.
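Deep RL is too large to sketch here, but the reward-maximization loop itself can be shown with a tiny epsilon-greedy bandit; the per-action reward probabilities are invented:

```python
# Trial and error: try actions, keep estimates, favor the highest reward.
import random

probs = [0.3, 0.5, 0.8]        # true (hidden) reward chance per action
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]       # running reward estimate per action

random.seed(0)
for _ in range(1000):
    if random.random() < 0.1:             # explore 10% of the time
        a = random.randrange(3)
    else:                                 # otherwise exploit the best estimate
        a = values.index(max(values))
    reward = 1 if random.random() < probs[a] else 0
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean

print(values)  # the agent learns to prefer the 0.8 action
```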

Deep learning can address all three major types of ML, in different ways.

Necessary programming languages and software:

  • Python - suitable for mathematical and statistical computations.
  • Python programs are easily adaptable.
  • MATLAB, SQL, and programming with R.
  • Software: Excel; for big data, Apache Hadoop and MongoDB.

Job roles in data science:

1. Traditional Data: Data Architect, Data Engineer, Database Administrator.
2. Big Data: Big Data Architect, Big Data Engineer.
3. BI: BI Analyst, BI Consultant, BI Developer.
4. Data Science: Data Scientist, Data Analyst.
5. Machine Learning: Data Scientist, Machine Learning Engineer.

To become a data scientist, we need to be strong in mathematical concepts like statistics, calculus, and algebra. When I started to learn statistics, I thought starting from probability would be a good idea. Let's see what I learned about probability for statistics.

Understanding probability for inferential statistics:

Probability:

  • Probability is the chance of getting a success or a failure; it is represented as a fraction or a percentage.
  • Example: when a coin is tossed, there is one preferred outcome out of two possible outcomes, so the probability is 1/2.
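The coin example as a two-line computation:

```python
# Probability = preferred outcomes / possible outcomes.
preferred, possible = 1, 2
p = preferred / possible
print(p, f"({p:.0%})")  # 0.5 as a fraction, 50% as a percentage
```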

Expected Values:

  • An expected value is the average outcome we expect when we run our experiment many times.
  • Experiment: this refers to multiple trials; e.g., tossing 50 coins and recording 50 outcomes is considered a single experiment. The resulting relative frequencies are known as experimental probabilities, and they are easy to compute (a simulation sketch follows this list).
  • Experimental probabilities are a good predictor of theoretical probabilities.
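A simulation sketch of an experimental probability, tossing a fair coin many times with Python's random module:

```python
# Experimental probability: run many trials and count the successes.
import random

random.seed(0)
tosses = 1000
heads = sum(random.random() < 0.5 for _ in range(tosses))
print(heads / tosses)  # close to the theoretical probability of 0.5
```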

Probability Frequency Distribution:

  • A collection of the probabilities of each possible outcome.
  • To obtain the probability frequency distribution, we divide each frequency by the size of the sample space.
  • It can be represented as a graph or a table.
  • The outcome with the highest probability is taken as the expected value.
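A sketch of a probability frequency distribution for the sum of two dice, dividing each frequency by the size of the sample space:

```python
# Probability frequency distribution for the sum of two dice.
from collections import Counter
from itertools import product

sums = [a + b for a, b in product(range(1, 7), repeat=2)]
freq = Counter(sums)           # frequency of each possible sum
sample_space = len(sums)       # 36 equally likely outcomes

for outcome in sorted(freq):
    print(outcome, freq[outcome] / sample_space)
# 7 has the highest probability (6/36), making it the expected value
```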

Complements:

  • Every event has a complement (A′).
  • An event and its complement are always mutually exclusive.
  • If a set consists of all odd numbers, then its complement is the set of all even numbers.
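A small sketch of an event and its complement over a single die's sample space, using Python sets:

```python
# Complement of an event within a finite sample space (one die).
sample_space = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}                            # event A: odd numbers
complement = sample_space - odd            # A': the even numbers {2, 4, 6}

print(complement)
print(odd & complement)                    # empty set: mutually exclusive
print(len(odd) / 6 + len(complement) / 6)  # P(A) + P(A') = 1.0
```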


Combinatorics:

Combinatorics deals with selecting objects from a finite set. To count the number of favorable outcomes, we apply restrictions (conditions), such as whether repetition is allowed and whether order matters.

Types of Combinatorics:

Combinatorics has three parts, namely:

  • Permutations
  • Variations
  • Combinations
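As a preview, the three counts for picking 2 objects out of 4 can be computed with Python's math module (what the article calls variations are what math calls partial permutations, math.perm(n, k)):

```python
# Counting selections of k = 2 objects from a set of n = 4.
import math

n, k = 4, 2
print(math.factorial(n))  # permutations of all n objects: 24
print(math.perm(n, k))    # variations: ordered picks of k from n -> 12
print(math.comb(n, k))    # combinations: unordered picks of k from n -> 6
```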

We will discuss the types of combinatorics in more detail in an upcoming article. Thank you for reading; you can share your valuable suggestions on LinkedIn.
