Python for Data Science
Overview

In the information age, data is all around us. Within this data are answers to compelling questions across many societal domains (politics, business, science, etc.). But if you had access to a large dataset, would you be able to find the answers you seek?

This course teaches you how to answer such questions with Python. Specifically, you’ll learn how to use:

  • Python
  • Jupyter notebooks
  • pandas
  • NumPy
  • Matplotlib
  • Git
  • and many other tools.
You will learn these tools in the context of solving compelling data science problems. After completing this course, you’ll be able to find answers within large datasets by using Python tools to import data, explore it, analyze it, learn from it, visualize it, and ultimately generate easily shareable reports.

By learning these skills, you’ll also join a worldwide community that builds data science tools, explores public datasets, and discusses evidence-based findings.

Learning Objectives
  • Basic process of data science
  • Python and Jupyter notebooks
  • An applied understanding of how to manipulate and analyze uncurated datasets
  • Basic statistical analysis and machine learning methods
  • How to effectively visualize results
Course duration

3 Days

Course outline

Base Python Introduction
  • History and current use
    • Installing the Software
    • Python Distributions
  • String Literals and numeric objects
  • Collections (lists, tuples, dicts)
  • Datetime classes in Python
  • Memory Management in Python
  • Control Flow
  • Functions
  • Exception Handling
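
As a hedged illustration of the topics in this module (not taken from the course materials), the sketch below touches collections, the datetime classes, a function, and exception handling using only the standard library. The function name `days_between` and the sample records are hypothetical.

```python
from datetime import date

def days_between(start: str, end: str) -> int:
    """Return the number of days from start to end (ISO 'YYYY-MM-DD' strings)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

# Collections: a list of tuples, and a dict built from it with a comprehension
visits = [("ann", "2024-01-01", "2024-01-31"), ("bob", "2024-02-01", "2024-02-15")]
stay_lengths = {name: days_between(start, end) for name, start, end in visits}

# Exception handling: date.fromisoformat raises ValueError on malformed input
try:
    days_between("2024-01-01", "not-a-date")
    parse_failed = False
except ValueError:
    parse_failed = True

print(stay_lengths)
```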
Defining actionable, analytic questions
  • Defining the quantitative constructs needed to draw inferences about the question
  • Identifying the data needed to support the constructs
  • Identifying limitations of the data and the analytic approach
  • Constructing sensitivity analyses
Bringing Data In
  • Structured Data
    • Structured Text Files
    • Excel workbooks
    • SQL databases
  • Working with Unstructured Text Data
    • Reading Unstructured Text
    • Introduction to Natural Language Processing with Python
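
A minimal sketch of two of the structured sources listed above, assuming pandas is installed. The CSV text, the table name `scores`, and the in-memory SQLite database are hypothetical stand-ins for real files and databases.

```python
import io
import sqlite3

import pandas as pd

# Structured text: read CSV content (here from a string instead of a file path)
csv_text = "id,score\n1,3.5\n2,4.0\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# SQL database: write the frame to an in-memory SQLite table, then query it back
conn = sqlite3.connect(":memory:")
from_csv.to_sql("scores", conn, index=False)
from_sql = pd.read_sql_query("SELECT id, score FROM scores WHERE score > 3.7", conn)
conn.close()
```

In practice `pd.read_csv` would usually take a file path, and Excel workbooks can be read in a similar way with `pd.read_excel`, which needs an additional engine package installed.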
NumPy: Matrix Language
  • Introduction to the ndarray
  • NumPy operations
  • Broadcasting
  • Missing data in NumPy (masked array)
  • NumPy Structured arrays
  • Random number generation
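
The following sketch, which is illustrative only, shows three of the NumPy topics above: broadcasting, masked arrays for missing data, and seeded random number generation. The `-999` sentinel for missing values is an assumed convention, not something the course specifies.

```python
import numpy as np

# ndarray: 3 observations (rows) x 2 variables (columns)
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Broadcasting: the length-2 vector of column means is stretched across all rows
centered = x - x.mean(axis=0)

# Masked array: hide the sentinel value so it is excluded from calculations
raw = np.array([4.0, -999.0, 6.0])
masked = np.ma.masked_equal(raw, -999.0)
masked_mean = masked.mean()  # computed over the unmasked values only

# Random number generation with a seeded Generator for reproducibility
rng = np.random.default_rng(seed=0)
draws = rng.normal(loc=0.0, scale=1.0, size=5)
```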
Data Preparation with Pandas
  • Filtering
  • Creating and deleting variables
  • Discretization of Continuous Data
  • Scaling and standardizing data
  • Identifying Duplicates
  • Dummy Coding
  • Combining Datasets
  • Transposing Data
  • Long to wide and back
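
Several of the preparation steps above can be sketched with pandas as follows. The column names and values are hypothetical, and the bin edges are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "age": [23, 41, 41, 67],
    "region": ["north", "south", "south", "north"],
})

# Identifying duplicates: drop fully repeated rows
df = df.drop_duplicates().reset_index(drop=True)

# Filtering: keep rows matching a boolean condition
over_30 = df[df["age"] > 30]

# Discretization: bin a continuous variable into labeled intervals
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 60, 120],
                        labels=["young", "middle", "senior"])

# Standardizing: z-score using the sample standard deviation
df["age_z"] = (df["age"] - df["age"].mean()) / df["age"].std()

# Dummy coding: one indicator column per category level
dummies = pd.get_dummies(df["region"], prefix="region")
```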
Exploratory Data Analysis with Pandas
  • Univariate Statistical Summaries and Detecting Outliers
  • Multivariate Statistical Summaries and Outlier Detection
  • Group-wise calculations using Pandas
  • Pivot Tables
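
As a rough sketch of the exploratory steps above, the example below computes a group-wise summary, builds a pivot table, and flags univariate outliers. The sales data is made up, and the 1.5 × IQR rule is only one common convention for outlier detection.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [100.0, 120.0, 80.0, 90.0, 500.0],
})

# Group-wise calculation: mean revenue per store
store_means = sales.groupby("store")["revenue"].mean()

# Pivot table: stores as rows, quarters as columns, summed revenue in the cells
pivot = sales.pivot_table(index="store", columns="quarter",
                          values="revenue", aggfunc="sum")

# Univariate outlier check with the 1.5 * IQR rule
q1, q3 = sales["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = sales[(sales["revenue"] < q1 - 1.5 * iqr) |
                 (sales["revenue"] > q3 + 1.5 * iqr)]
```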
Exploring Data graphically
  • Histogram
  • Box-and-whiskers plot
  • Scatter plots
  • Forest Plots
  • Group-by plotting
Advanced Graphing with Matplotlib, Pandas, and Seaborn

Python, Hadoop and Spark
  • Introduction to the differences among Python, Hadoop, and Spark
  • Importing data from Spark and Hadoop to Python
  • Parallel execution leveraging Spark or Hadoop
Missing Data
  • Exploring and understanding patterns in missing data
  • Missing at Random
  • Missing Not at Random
  • Missing Completely at Random
  • Data imputation methods
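
A minimal sketch of exploring missingness and applying one simple imputation method, assuming pandas and NumPy. The data is hypothetical, and mean imputation is shown only as a baseline: it tends to understate variance and is generally defensible only when values are missing completely at random.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [50.0, np.nan, 70.0, np.nan],
    "age": [30.0, 40.0, np.nan, 60.0],
})

# Explore the missingness pattern: count of missing values per column
missing_counts = df.isna().sum()

# Share of rows with at least one missing value
any_missing_share = df.isna().any(axis=1).mean()

# Mean imputation: replace each missing value with its column mean
imputed = df.fillna(df.mean())
```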
Traditional Inferential Statistics
  • Comparing Groups
    • P-Values, summary statistics, sufficient statistics, inferential targets
    • T-Tests (equal and unequal variances)
    • ANOVA
    • Chi-Square Tests
  • Correlation
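
To make the unequal-variances t-test above concrete, here is a hedged sketch of Welch's t statistic written with the standard library only. The helper name `welch_t` and the two samples are hypothetical. In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would normally be used, since it also returns the p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard errors of each sample mean (sample variance / n)
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.1, 4.4, 3.9, 4.6, 4.0]
t_stat, dof = welch_t(group_a, group_b)
```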
Frequentist Approaches to Multivariate Statistics
  • Linear Regression
    • Multivariate linear regression
    • Capturing Non-linear Relationships
    • Comparing Model Fits
    • Scoring new data
    • Poisson Regression Extension
  • Logistic regression
    • Logistic Regression Example
    • Classification Metrics
Machine learning approaches to multivariate statistics
  • Machine Learning Theory
  • Data pre-processing
    • Missing Data
    • Dummy Coding
    • Standardization
    • Training/Test data
  • Supervised Versus Unsupervised Learning
  • Unsupervised Learning: Clustering
    • Clustering Algorithms
    • Evaluating Cluster Performance
  • Dimensionality Reduction
    • A-priori
    • Principal Components Analysis
    • Penalized Regression
Supervised Learning: Regression
  • Linear Regression
  • Penalized Linear Regression
  • Stochastic Gradient Descent
  • Scoring New Data Sets
  • Cross Validation
  • Bias-Variance Tradeoff
  • Feature Importance
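
The sketch below illustrates penalized linear regression and cross validation using NumPy alone. It is an assumption-laden teaching example, not the course's own code: the helpers `ridge_fit` and `kfold_mse` are hypothetical, the data is synthetic, and the model has no intercept. A library such as scikit-learn would typically be used for this in practice.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept): (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def kfold_mse(X, y, alpha, k=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = ridge_fit(X[train_idx], y[train_idx], alpha)
        errors.append(np.mean((X[test_idx] @ coef - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic data: y depends linearly on two features plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)

cv_error_small_penalty = kfold_mse(X, y, alpha=0.1)
cv_error_large_penalty = kfold_mse(X, y, alpha=1000.0)
```

Comparing the two cross-validated errors shows the bias side of the tradeoff: a very large penalty shrinks the coefficients far from their true values, so held-out error rises.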
Supervised Learning: Classification
  • Logistic Regression
  • LASSO
  • Random Forest
  • Ensemble Methods
  • Feature Importance
  • Scoring New Data Sets
  • Cross Validation

Please contact your training representative for more details on having this course delivered onsite or online.

Training Outlines - the one stop shopping center for IT training.
© Training Outlines All rights reserved