Data Cleaning and Management Course Using Stata

Traditional training – (5 days, R13,500.00 Excl. VAT)
Online training – (4 weeks, R10,550.00 Excl. VAT)

About This Course

Every data analyst who is keen on producing accurate, high-quality statistical outputs spends ample time exploring and cleaning their data before embarking on statistical analysis. It is widely held that data cleaning and management can sometimes take more time than the statistical analysis itself. There are two reasons for this. Firstly, the validity of data analysis outputs depends on the quality of the data used. Secondly, a reviewer or reader of a report is more likely to spot data analysis errors than data cleaning and management errors, so cleaning errors must be caught by the analyst before they reach the report.

Despite the great importance of data cleaning and management for programme and research data, the concepts and procedures are rarely taught in postgraduate schools, and there are scarcely any short courses on the continent that cover them. This leaves many data analysts without a systematic approach to data cleaning and management.

Drawing on many years of data analysis experience across diverse projects, CESAR has put together a comprehensive short course on data cleaning and management. The course will be taught using Stata, and participants will be required to have some experience with Stata or similar software.

Participants will be taught the concepts and procedures for data cleaning and management. They will be able to explore raw data for data quality and implement data cleaning procedures using Stata. They will also be able to draw up data quality standard operating procedures (SOPs) for their research and programme data processes. They will learn many practical techniques for the data management needed for efficient and accurate data analysis. As a result, they will be able to take raw data, clean it, summarise it, analyse it and take appropriate action. They will also be able to appraise and interpret research published by other authors.

Course Content

Our virtual courses combine interactive teaching via live webinars with self-paced learning and practical sessions.

Live webinars:
For this course, there will be two live webinar sessions per week, held on separate days, and each session will last three hours. This gives six hours of contact teaching per week, or 24 contact teaching hours over the four-week course. Spreading the course over four weeks leaves sufficient time for self-paced learning, practical exercises and facilitator feedback.

The live webinars will not be purely didactic lectures. They will combine PowerPoint presentations, case studies, question-and-answer sessions and problem-based learning to promote interactivity.

Self-learning and practical sessions
The course sets aside a dedicated number of hours for self-paced learning and practical exercises. Each participant is expected to slot these activities into their weekly routine.

Traditional Course Content


Day 1 - Data cleaning using Stata

  • Setting up Stata
  • Introduction to data quality
  • Review of basic data quality check commands
  • Timing and procedures for data cleaning

Day 2 - Data management using Stata

  • Importing and combining datasets in Stata
  • Review of basic data management commands
  • Handling string variables in Stata
  • Stata egen and collapse commands

Day 3 - Efficient data management using Stata

  • Automatic outputting of Stata results
  • Automatic sequence commands (looping)

Online Course Content


Week 1

  • Setting up Stata
  • From variables to dataset
  • Working with Stata data files, log files and do-files
  • Introduction to data quality
  • Review of basic data quality check commands
  • Timing and procedures for data cleaning
  • Working with dates in Stata

Week 2

  • Basic data examination commands, such as describe and codebook
  • Creating and transforming variables
  • Importing and exporting data in Stata
  • Combining datasets in Stata

Week 3

  • Review of basic data management commands
  • Identifying string variables
  • Converting string variables to numeric variables
  • Handling unavoidable string variables in Stata
  • String concatenation

Week 4

  • Stata egen command
  • Aggregating data in Stata
  • Reshaping data from wide to long format and long to wide format
  • Automatic outputting of Stata results
  • Looping with foreach and forvalues

Course outcomes

On completing the course, participants will be able to:

  • Understand data quality
  • Apply procedures and techniques for data management using Stata
  • Identify and correct errors in data prior to analysis
  • Perform data manipulation
  • Use automation techniques to manage data efficiently

Who Should Attend?

  • Researchers
  • Biostatisticians
  • Research analysts
  • Data analysts
  • Economists
  • Heads of department (HODs)
  • Clinicians
  • Epidemiologists
  • Programme managers
  • Postgraduate students
  • Market researchers
  • Clinical and medical researchers
  • Scientists
  • Government practitioners


For more details about our services, contact:

Dave Temane
Email: info@cesar-africa.com
Tel: +27 11 403 1411

Price Includes

  • Course attendance
  • Full refreshments, including lunch
  • Welcome tea
  • Two tea breaks with pastries
  • Course lecture notes and training manual
  • Complimentary parking
  • Certificate of attendance
