Data Science Certification Course 120 Hrs.

	11 Nov 2020
	09:00 - 17:00 (UTC+7)
	Skulthai Surawong Tower

Business Self-Improvement

Event Information

Course Description

Data Science is the discovery of patterns in large volumes of data to support decision-making, using an organization's own data. Companies today use Data Science capabilities to gain a competitive advantage over their rivals.

This Data Science Certification program combines several courses so that participants gain both knowledge and hands-on practice in a discipline that is evolving and changing rapidly. You will learn how to use related technologies such as Hadoop, Spark, NoSQL, and the R language to analyze data.

The Data Science Certification is a 120-hour program designed to give participants a solid understanding of Data Science and of Big Data practice, from strategy planning through Predictive Analytics with Large-Scale Machine Learning. Teaching combines lectures and workshops in which participants install and use real, working Big Data tools, including on Cloud systems.

IMC Institute has therefore assembled a team of instructors and continually updates the content to keep pace with emerging technologies, covering many aspects of applying Data Science, in a 4-month program.

Duration: 120 hours

Classes every Wednesday evening, 18:00-21:00, and every Saturday, 09:00-17:00.
Cohort 2 starts on 11 November 2020.

Course Schedule

Module 1: Introduction to Data Science, Big Data and R

Module 2: Data Visualization

Module 3: Data Science: Statistical Analysis

Module 4: Data Science: Machine Learning Algorithm

Fee: 59,000 THB, excluding VAT

(The fee includes course materials, Cloud Services usage, lunch, snacks, and dinner on Wednesdays.)

Course Objectives

This 120-hour program aims to give participants knowledge in the following areas:

Understand, and gain the skills to manage, the Data Science Lifecycle
Write data analysis programs in R
Learn the principles of Data Science and predictive algorithms, including Classification/Regression and Clustering, with Large-Scale Machine Learning
Learn Neural Network and Deep Learning techniques
Learn Data Visualization tools
Build Data Science projects on Big Data
Understand the principles of Big Data and trends in data technology
Learn to use Hadoop and related technologies such as Spark for Computation, Storage, and Ingestion

Instructors:

Veerasak Kritsanapraphan
Aekanun Thongtae
Dr. Thanachart Numnonda

Training Format:

Lectures 40%, hands-on practice 60%
Hands-on Labs installing real systems and using Big Data as a Service
Program development exercises
Workshops
A Data Science Mini-Project

Who Should Attend

Anyone interested in working with Big Data who wants to become a Big Data IT Professional or Data Scientist. Participants should have a solid IT background, knowledge of databases, and some programming experience.

Prerequisites

At least 2 years of IT work experience
Basic knowledge of database systems
Basic programming skills in at least one language

What You Will Gain:

Understand the principles of Data Science
Understand the Big Data technologies used in Data Science projects
Use Big Data tools such as NoSQL, Hadoop, BI Tools, and Machine Learning Tools
Learn to use Big Data as a Service on a Cloud Platform
Use open-source BI Tools
Study Data Science in depth, including the various types of Machine Learning algorithms and Predictive Analytics on Large-Scale Big Data
Use Data Science platforms such as TensorFlow, Keras, and PyTorch
Develop applications using Data Science

Reviewing Course Content

Sessions will be recorded (the instructor's voice and screen) so that participants can review them afterward.
Source code will be provided on GitHub.

Computer Cluster Used in the Course:

This course sets up a working Big Data Cluster that can handle large volumes of data. Participants will install a multi-node Hadoop Cluster on Google Cloud Platform servers, and a Google Dataproc Hadoop Cluster capable of handling gigabytes of data will also be available so that participants gain real-world experience.

Technologies Used in the Course

Big Data Technologies

Hadoop Cluster using Apache and Cloudera Distributions
Hadoop Cluster on Google Cloud Platform
TensorFlow
Keras

Data Processing Languages

R, RStudio
Map/Reduce using Spark
Node.js

Cloud Computing Technologies

Google Cloud Platform
Google Big Data as a Service
Amazon Web Services using EC2, S3 and EMR

Course Content:

Module 1: Introduction to Data Science, Big Data and R

Introduction to Data Science (Mr. Veerasak)

Data Science Definition
Data Science and Data-Driven Decision Making
Data Science Benefits

Introduction to Big Data (Dr. Thanachart)

Big Data Definition
Why Big Data?
Big Data Eco-System
Big Data Benefits
Big Data vs. Business Intelligence vs. Analytics
Big Data Use Cases
Big Data Technology

Introduction to Hadoop (Dr. Thanachart)

What is Hadoop?
Hadoop Architecture and HDFS
Comparison of Hadoop Software Distribution Products
Comparison of Hardware for Hadoop Ecosystem
Hadoop on Cloud: Hadoop as a Service

Introduction to R and RStudio (Mr. Veerasak)

R Markdown and Notebook
R Basics
Data Types
Vectors
Data Frame
Matrix

Programming Logic, Loop and Function

Conditionals
Functions
Loops

R Packages

Module 2: Data Visualization (Mr. Veerasak)

Introduction to Distributions

Distribution Assessment
Normal Distribution

Principles of Data Visualization
Exploratory Data Analysis Principles

Quantiles, Percentiles, and Boxplots

Data Summarization

Data Tables
Group by

Advanced Graphic Visualization with ggplot2
Creating Infographic and Gapminder
Creating a Dashboard using Shiny

Module 3: Data Science: Statistical Analysis (Mr. Veerasak)

Inference and Modeling

Introduction to Inference
Parameters and Estimates

Central Limit Theorem

Margin of Error
Monte Carlo Simulation for CLT
Spread
Bias

Confidence Intervals and p-Values

Confidence Intervals
Power
P-Values

Statistical Models

Data-driven Models

Bayesian Statistics

Bayes’s Theorem
The Hierarchical Model

Hypothesis Testing
Introduction to Apache Spark using R

Overview of Apache Spark's Architecture
Working with Spark's DataFrame using R
Sparklyr: R interface for Apache Spark

Data Understanding and Preparation with Statistical Perspective (Mr. Aekanun)
- Detecting Outliers and Their Effects in Supervised Learning
- Missing Values, Nulls, and Compensation Techniques
- Variance and Its Negative Implications
- Imbalanced Data and Model Degradation Issues

Overfitting & Underfitting
Exploratory Data Analysis and Preparation
Feature Selection Theory
- Filter-based
- Wrapper-based
- Embedded
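The Central Limit Theorem topics in Module 3 (Monte Carlo Simulation for CLT, Margin of Error) can be previewed with a short simulation. This is a minimal sketch in plain Python rather than the R used in the course; the function names, the Bernoulli population with p = 0.3, and the sample sizes are illustrative assumptions. It draws many samples, averages each one, and compares the spread of those averages with the CLT prediction sqrt(p(1 - p) / n).

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each from n values produced by draw(rng)."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


def clt_check(n=50, trials=10_000, p=0.3, seed=0):
    """Monte Carlo check of the CLT for a Bernoulli(p) population (illustrative)."""

    def draw(rng):
        # One observation from a clearly non-normal population: 1 with probability p, else 0
        return 1 if rng.random() < p else 0

    means = sample_means(draw, n, trials, seed)
    # CLT prediction: sample means are approximately normal with
    # mean p and standard error sqrt(p * (1 - p) / n)
    predicted_se = (p * (1 - p) / n) ** 0.5
    observed_se = statistics.stdev(means)
    # Share of sample means within 2 predicted standard errors of p
    # (close to 95% when the normal approximation holds)
    within_2se = sum(abs(m - p) <= 2 * predicted_se for m in means) / trials
    return statistics.fmean(means), observed_se, predicted_se, within_2se


if __name__ == "__main__":
    mean, observed_se, predicted_se, share = clt_check()
    print(f"mean of sample means: {mean:.4f}")
    print(f"observed SE: {observed_se:.4f}, predicted SE: {predicted_se:.4f}")
    print(f"share within 2 SE: {share:.3f}")
```

If the CLT approximation is good, the observed and predicted standard errors should be close and roughly 95% of sample means should fall within two standard errors of p, which is the idea behind the usual 95% margin of error.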

Module 4: Data Science: Machine Learning Algorithm (Mr. Veerasak)

Introduction to Machine Learning
- Machine Learning Basics
- Regression
  - Linear Models
  - Stratification and Multivariate Regression
  - Least Squares Estimates
- Logistic Regression
- Time Series Analysis
Evaluating Machine Learning Algorithms
- Confusion matrix
- Balanced Accuracy and F1 Score
- Prevalence
- ROC and Precision-recall curves
- Cross-validation
Supervised Learning Algorithms
- Classification
  - Decision Tree
  - Naive Bayes
  - K-Nearest Neighbor
  - Support Vector Machine
- Ensembles
  - Boosting
  - Bagging
  - Random Forests
Unsupervised Learning Algorithms
- K-Means Clustering
- Hierarchical Clustering
- Recommendation System
- Frequent Pattern Mining
- Sentiment Analysis
Real World Cases: Large-Scale Machine Learning using Spark/Databricks (Mr. Aekanun)
- An Analysis of Marketing Effort using Unsupervised Learning
- Credit Scoring using Supervised Learning
Neural Network and Deep Learning
- Multi-layer Perceptron
- Convolutional Neural Net
- Recurrent Neural Net
- LSTM
Reinforcement Learning Algorithms
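The evaluation metrics listed in Module 4 (confusion matrix, prevalence, balanced accuracy, F1 score) are all derived from four counts. This is a minimal sketch in plain Python rather than the R used in the course, assuming binary labels with 1 as the positive class; the names `Confusion`, `confusion`, and `metrics` are illustrative, not course material. ROC and precision-recall curves are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def confusion(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero (e.g. no positives)
    return num / den if den else 0.0


def metrics(cm):
    """Derive common evaluation metrics from a Confusion instance."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    recall = _ratio(cm.tp, cm.tp + cm.fn)       # sensitivity, true positive rate
    specificity = _ratio(cm.tn, cm.tn + cm.fp)  # true negative rate
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "prevalence": _ratio(cm.tp + cm.fn, total),  # share of actual positives
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(metrics(confusion(y_true, y_pred)))
```

In this example (2 true positives, 1 false positive, 1 false negative, 6 true negatives, prevalence 0.3), accuracy is 0.8 while balanced accuracy is 16/21 (about 0.76) and F1 is about 0.67.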