1. What’s an attribute? What’s a data instance?
- What’s noise? How can noise be reduced in a dataset?
- Define outlier. Describe 2 different approaches to detect outliers in a dataset.
- Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.
- Given a sample dataset with missing values, apply an appropriate technique to deal with them.
- Give 2 examples in which aggregation is useful.
- Given a sample dataset, aggregate its data values.
- What’s sampling?
- What’s simple random sampling? Is it possible to sample data instances using a distribution other than the uniform distribution (in which every instance has equal probability of being selected)? If so, give an example of a non-uniform probability distribution over the data instances.
- What’s stratified sampling?
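
For the hands-on questions above (missing values, aggregation, and sampling), the following minimal sketch may help when practicing. It is not a model answer. The toy dataset, the attribute names `region` and `sales`, and all function names are hypothetical, and mean imputation is only one of several possible ways to handle missing values. It uses only the Python standard library.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy dataset: each dict is a data instance, each key an attribute.
records = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": None},   # missing value
    {"region": "north", "sales": 14.0},
    {"region": "south", "sales": 20.0},
    {"region": "south", "sales": 22.0},
    {"region": "south", "sales": None},   # missing value
]


def impute_mean(rows, attr):
    """Replace missing values of `attr` with the mean of the observed values."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(observed)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]


def aggregate_sum(rows, key, attr):
    """Aggregate `attr` by summing it within each distinct value of `key`."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[attr]
    return dict(totals)


def simple_random_sample(rows, n, rng):
    """Draw n instances without replacement, each equally likely (uniform)."""
    return rng.sample(rows, n)


def weighted_sample(rows, weights, n, rng):
    """Draw n instances with replacement, using non-uniform selection weights."""
    return rng.choices(rows, weights=weights, k=n)


def stratified_sample(rows, key, n_per_stratum, rng):
    """Split instances into strata by `key`, then sample uniformly inside each."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[key]].append(r)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, n_per_stratum))
    return sample


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
filled = impute_mean(records, "sales")
totals = aggregate_sum(filled, "region", "sales")
uniform_draw = simple_random_sample(filled, 3, rng)
stratified_draw = stratified_sample(filled, "region", 1, rng)
```

Here `totals` holds one summed `sales` value per region, and `stratified_draw` contains exactly one instance from each region, which a simple random sample of the same size does not guarantee. Passing unequal `weights` to `weighted_sample` is one way to realize a non-uniform distribution over the data instances.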