Econometrics PS 1

Due: Feb 2

Complete all of the problems in each section as required. There are two sections!

Section 1: Probability theory: Expected Value and Lotteries (ONE QUESTION TOTAL)

We discussed how the sample mean can be skewed by an extreme value. In a sample of 100 people from Texas, if a multimillionaire oil baron happens to be randomly chosen, the mean income in the sample would be pulled well above the median.

The mean is sometimes referred to as the expected value, written E[X] for the expected value of X. In probability theory, the expected value is the sum of all potential outcomes, each weighted by its probability of occurring. For example, assume you are close friends with the oil baron. You need money for school, and as your friend he agrees to give you one of two cars he never drives, which you will immediately sell for cash. He will flip a coin to determine which one, giving you a 50/50 chance of each. Let’s say one car is worth $12,000 and the other car is worth $118,000. The expected value is written E[X] = 0.5 × $12,000 + 0.5 × $118,000 = $65,000. Notice that this is also the average of the two values, because the two outcomes are equally likely.
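The same calculation can be written in general form. The block below is a sketch in which the symbols x_i (the possible outcomes), p_i (their probabilities), and n (the number of outcomes) are notation introduced here for illustration; the second line repeats the car example step by step.

```latex
\begin{align*}
E[X] &= \sum_{i=1}^{n} p_i \, x_i, \qquad \sum_{i=1}^{n} p_i = 1 \\
E[X] &= 0.5 \times 12{,}000 + 0.5 \times 118{,}000 = 6{,}000 + 59{,}000 = 65{,}000
\end{align*}
```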

Question 1: Consider a random lottery, where 2,250 people enter their name (only once per person) and a machine selects one winner at random. Each player has an equal chance of selection and the winning prize is $425,000.

a. Absent any costs associated with winning or playing the lottery, what is the expected value of entering the lottery one time?

b. Assume the winner must pay a 20% tax on lottery winnings. Further, the dealer wants to charge an entry fee. Exactly 2,250 people believe in luck and will play if the expected value of the gamble is greater than or equal to zero. What is the maximum entry fee the dealer can charge? Will the dealer make a profit? 

c. What is the fundamental difference between the typical Powerball or Mega Millions lottery and the one we established in our example above? Use two sentences or fewer to explain.

Section 2: Stata Exercises (FIVE QUESTIONS TOTAL)

For this section you will download data from this site: https://www.stata.com/texts/eacsap/

Connect to the virtual Stata console here: https://vcon.lib.uh.edu/portal/webclient/index.html#/ 

– Find the dataset “labsup.dta”, which contains variables for birthing mothers and their background characteristics. Save the file to your computer in a folder you can easily find again.

– Boot up StataSE 15 by clicking on its icon.

(File Transfer)

– Upload the labsup.dta file to your virtual desktop using file transfer. Open the tab on the far left side of the console and select the file transfer option (circled in red in the screenshot). Your file will be stored in the virtual “documents” folder. You will also use this tab to toggle between open files (circled in blue).

– Stata allows us to write programs so that our work is reproducible. We save the code in a .do file, a Stata-specific format; other statistical software offers similar functionality. Once Stata is up and running, click the icon to open a new .do file. Of the two icons with a pencil, it is the one on the left.

– Before proceeding, save your .do file to the documents folder. In the .do file pane, click “file…save as…”, select desktop, then your user folder, then the documents folder. While you have the window open, copy the file path by right-clicking on “Documents” at the top and selecting “Copy address as text.”


– Navigate back to your .do file. The first step is to set the working directory using the cd (change directory) command. We will use the documents folder as the directory. Next, load the dataset with the use command. In this example I clear the temporary memory, set the working directory, and load the data. Select the block you wish to execute and press the “do” button.
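A minimal sketch of such a block is below. The folder path is a hypothetical placeholder, not the real address on the virtual desktop; replace it with the address you copied in the previous step.

```stata
* Clear any data currently held in memory
clear

* Set the working directory (placeholder path: paste your own copied address here)
cd "C:\Users\yourname\Documents"

* Load the dataset from the working directory
use "labsup.dta", clear
```

The clear option on use is redundant after the first clear command, but it keeps the load step safe to rerun on its own.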


At this point you should see the data loaded in the main Stata panel. This is the general workflow: you write commands in the .do file, and Stata executes them in the main panel.

Besides using the .do file, you can tell Stata to execute a command by typing it directly into the command line in the main Stata panel. For any Stata command, you can type help followed by the command name into the command line to see its documentation. To see the data table, type browse into the command line and press Enter.
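For instance, the two commands just mentioned could be typed into the command line one at a time, as sketched here. The choice of summarize as the command to look up is only an illustration.

```stata
help summarize
browse
```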


1. The labsup dataset contains variables describing the fertility of n=31,857 mothers. Summarize the variable “kidcount” in detail and answer the following questions.

a. Describe the sample based on the number of kids. Are there any women in the sample with no children? Use three sentences or less.

b. Let’s say a big family is one that has three kids or more. Based on this definition, what fraction of the total sample has a big family? Use the count command.

c. If I were to randomly select a mother from this sample, what is the probability that I would choose a woman with 5 kids or more? What about 3 kids or less?
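The commands for this question might look roughly like the sketch below, which assumes the labsup data are already loaded. The threshold of 3 is taken from part b; adjust the condition for the other parts.

```stata
* Detailed summary: percentiles, mean, standard deviation, skewness, kurtosis
summarize kidcount, detail

* Count observations that meet a condition; the result is stored in r(N).
* Stata treats missing values as larger than any number, so exclude them explicitly.
count if kidcount >= 3 & !missing(kidcount)

* Share of all observations in memory (_N is the total number of observations)
display r(N) / _N
```

Dividing by _N treats every row as part of the sample; if kidcount has missing values, you may prefer to divide by the number of non-missing observations instead.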

2. Random variables can be categorized as discrete or continuous. A discrete variable is one that has values determined by counts: the number of kids, total family size, etc. A continuous variable is measured and can take on fractional values; income, miles, and the square footage of a house are a few examples.

a. Based on your summary output in #1, is the median or mean a better descriptor for the number of kids each mother has? Why?

b. There are three income variables in the dataset. Use the kdensity command to plot the probability density for “labinc” (mom’s labor income, in thousands). Also summarize “labinc.” What does the probability density reveal that you might not have picked up just by looking at the mean? Paste a screenshot of your density plot here as well.

c. Now plot the probability density for “faminc” (total household income). Does this variable approximate a normal distribution? Are there observations in the data you might flag as erroneous?
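A sketch of the plotting commands for parts b and c, assuming the data are loaded. The normal option is an optional extra not required by the question; it overlays a normal density on the plot for visual comparison.

```stata
* Kernel density estimate of mom's labor income
kdensity labinc

* Summary statistics to compare against the plot
summarize labinc, detail

* Kernel density of family income with a normal density overlaid
kdensity faminc, normal
```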

3. Use the correlate command to create a single correlation matrix for the following variables: agefstm (mother’s age at first birth), kidcount, faminc, and educ (years of education).

a. Paste a screenshot of your correlation matrix here.

b. Some of the correlations are obvious; for example, years of education is positively correlated with income. Identify one positive and one negative correlation that would have been less obvious before running this correlation.

c. A researcher looks at the correlation between family income and kidcount and notices that the sign is negative. She then says to herself, “Wow, lower income families have more kids. This is surprising given they would have fewer resources than a richer family.” Think like an economist: given the correlation between family income, years of education, and mother’s age at first birth, what might be an alternative explanation for why lower income mothers have more children? Use four sentences or fewer.
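The correlation matrix for this question can be produced with a single command, sketched below with the four variable names listed in the question.

```stata
* One correlation matrix covering all four variables
correlate agefstm kidcount faminc educ
```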

4. Indeed, the correlation between family income and number of kids is not very strong. Let’s run a regression to test for the effect of increasing income on the number of kids. Using the regress command, run the following regression equations.

a. First run the regression

Yi = B0 + B1 X1i + ei

where Yi is kidcount, X1i is faminc (family income), B0 is the intercept, and ei is the error term. What is the coefficient for B1?

b. Now run the regression

Yi = B0 + B1 X1i + B2 X2i + B3 X3i + ei

where X2i is years of education and X3i is mom’s age at first birth. Does the coefficient B1 become smaller or larger?

c. Now run the regression

Yi = B0 + B1 X1i + B2 X2i + B3 X3i + B4 Zi + ei

where Zi is labinc (mom’s labor income). Which coefficient is larger, the coefficient for family income (B1) or the coefficient for mom’s income (B4)? Also paste a screenshot of your regression results for part c (only) here.
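In Stata, the dependent variable comes first after regress, followed by the regressors. The sketch below assumes that years of education is the variable educ and mom’s age at first birth is agefstm, as listed in question 3.

```stata
* Part a: kidcount on family income
regress kidcount faminc

* Part b: add years of education and mom's age at first birth
regress kidcount faminc educ agefstm

* Part c: add mom's labor income
regress kidcount faminc educ agefstm labinc
```

Stata adds the intercept automatically, so it does not appear in the command.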

5. In 1960 the FDA approved oral contraceptives, commonly referred to as birth control pills. Think like an economist: what is the opportunity cost of childbirth for a woman of working age? What effect do you think birth control pills have had on college graduation rates of women since the approval? Given what you found in the prior steps, what is one mechanism through which birth control pills might decrease the fertility rate in the United States, other than directly preventing conception? Write two paragraphs or less summarizing your thoughts about these questions.