Data Cleaning and Management Course Using Stata
Traditional training – (5 days, R13,500.00 Excl. VAT)
Online training – (4 weeks, R10,550.00 Excl. VAT)
About This Course
Every data analyst who is keen on producing accurate, high-quality statistical outputs spends ample time exploring and cleaning their data before embarking on statistical analysis. In fact, data cleaning and management can sometimes take more time than the statistical analysis itself. This is for two reasons. Firstly, the validity of data analysis outputs depends on the quality of the data used. Secondly, a reviewer or reader of a report is more likely to spot data analysis errors than data cleaning and management errors, so the latter have to be caught by the analyst.
Despite the importance of data cleaning and management for programme and research data, the concepts and procedures are rarely taught in postgraduate schools, and there are hardly any short courses on the continent that cover them. This leaves many data analysts without a systematic approach to data cleaning and management.
CESAR has drawn on many years of data analysis experience across diverse projects to put together a comprehensive data cleaning and management short course. The course will be taught using Stata, and participants will be required to have some experience with Stata or similar software.
Participants will be taught the concepts and procedures for data cleaning and management. They will be able to explore raw data for data quality and implement data cleaning procedures using Stata. They will also be able to draw up a data quality standard operating procedure (SOP) for their research and programme data processes. They will learn many data management techniques needed for efficient and accurate data analysis. As a result, they will be able to take raw data, clean them, summarise them, analyse them and take appropriate action. They will also be able to appraise and interpret research published by other authors.
Course Content
Our virtual courses are delivered through interactive teaching in live webinars, combined with self-paced learning and practical sessions.
Live webinars:
For this course, there will be two live webinar sessions on two separate days per week, and each live session will last for three hours. This means that there will be six hours of contact teaching per week and this four-week course will have 24 contact teaching hours. The spread of the course over four weeks provides sufficient time for self-paced learning, practical exercises and facilitator feedback.
The live webinars will not be one-way lectures. They will combine PowerPoint presentations, case studies, question-and-answer sessions and problem-based learning to promote interactivity.
Self-paced learning and practical sessions:
The course has a dedicated number of hours for self-paced learning and practical exercises. Each participant is expected to fit these activities into their weekly routine.
Traditional Course Content
Day 1
- Setting up Stata
- From variables to dataset
- Working with Stata data files, log files, and do-files
- Introduction to data quality
- Review of basic data quality check commands
- Timing and procedures for data cleaning
- Working with dates in Stata
Day 2
- Basic data examination commands such as describe and codebook
- Creating and transforming variables
- Importing and exporting data in Stata
- Combining datasets in Stata
Day 3
- Review of basic data management commands
- Identifying string variables
- Converting string variables to numeric variables
- Handling unavoidable string variables in Stata
- String concatenation
Day 4
- Stata egen command
- Aggregating data in Stata
- Reshaping data from wide to long format and long to wide format
Day 5
- Automatic outputting of Stata results
- Looping with foreach and forvalues
Online Course Content
Week 1
- Stata set up
  - Setting up Stata
  - Working with Stata data files, log files, and do-files
- Data management
  - Importing and exporting data in Stata
  - Combining datasets in Stata
Week 2
- Data manipulation
  - Basic data examination commands such as describe and codebook
  - Creating and transforming variables
- Procedures and techniques for data cleaning
  - Introduction to data quality
  - Timing and procedures for data cleaning
  - Handling duplicates, missing data, extreme and illegal values
  - Working with dates in Stata
Week 3
- Handling string variables
  - Identifying string variables
  - Converting string variables to numeric variables
  - Handling unavoidable string variables in Stata
  - String concatenation
- Exploring the capabilities of egen
  - Stata egen command
Week 4
- Data aggregation and transformation
  - Aggregating data in Stata
  - Reshaping data from wide to long format and long to wide format
- Automation of Stata output
  - Automatic outputting of Stata results
Course outcomes
On completing the course, participants will be able to:
- Understand the principles of data quality
- Apply procedures and techniques for data management using Stata
- Identify and correct errors in data prior to analysis
- Perform data manipulation
- Use automation techniques for efficiency in data management
Who Should Attend?
- Researchers
- Biostatisticians
- Research analysts
- Data analysts
- Economists
- Heads of department (HODs)
- Clinicians
- Epidemiologists
- Programme managers
- Postgraduate students
- Market researchers
- Clinical and medical researchers
- Scientists
- Government practitioners
Application
For more details about our services, contact:
Dave Temane
Email: info@cesar-africa.com
Tel: +27 11 403 1411
Price Includes
- Course attendance
- Full refreshments: lunch, welcome tea, and two tea breaks with pastries
- Course lecture notes and training manual
- Complimentary parking
- Certificate of attendance