
Parameter Estimation

Fundamentals

Problem Statement

Suppose that the population distribution follows a parametric model $f(x|\theta)$. Given a random sample $X_1, X_2, \ldots, X_n$ from the population, with $X_i \sim f(x|\theta)$, we want to estimate the parameter of interest $\theta$.

The basic assumption in parametric estimation is that the population distribution follows some parametric model. Here, parametric models are those of the form:

$$\mathcal{F}=\{f(x;\theta) : \theta\in\Theta\}$$

where $\Theta\subset \mathbb{R}^k$ is the parameter space and $\theta$ is the parameter.

Example

  1. The normal distribution has two parameters, $\mu$ and $\sigma$.

Terminology

  1. Estimator $\hat{\theta}$: a rule for calculating an estimate of a given quantity (a model parameter) based on observed data.
  2. Estimate: the fixed value of that estimator for a particular observed sample.
  3. Statistic: a function of the data, e.g. the sample mean.
  4. Population distribution: the distribution from which the sample is drawn.
  5. Sampling distribution of a statistic: the distribution of that statistic over repeated samples from the population.

Example

  • A poll seeks to estimate the proportion $p$ of adult residents of a city who support building a new sports stadium. If $n$ is the sample size and $\hat{p}$ is the sample proportion, the rule that calculates the sample proportion is the estimator of the population proportion. The actual value of the sample proportion on the observed sample is the estimate. The sample proportion is a statistic of the sample; note that a statistic itself need not be associated with any parameter of interest.
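As a minimal sketch of the estimator/estimate distinction, the snippet below simulates one poll and computes $\hat{p}$. The true proportion `p_true = 0.6` and the sample size `n = 1000` are hypothetical values chosen for illustration.

```python
import random

random.seed(0)

p_true = 0.6   # hypothetical true population proportion (unknown in practice)
n = 1000       # hypothetical sample size

# Draw one sample of n responses (1 = supports the stadium, 0 = does not).
sample = [1 if random.random() < p_true else 0 for _ in range(n)]

# The estimator is the rule "sample proportion"; p_hat is its estimate
# on this particular observed sample.
def sample_proportion(xs):
    return sum(xs) / len(xs)

p_hat = sample_proportion(sample)
print(f"estimate p_hat = {p_hat:.3f}")
```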

Point Estimation

Point estimation involves the use of sample data to calculate a single value which serves as the best estimate of an unknown population parameter.

Method of Moments

Let $X_1, X_2, \ldots, X_n$ be iid random variables from a parametric model $f(x;\theta)$, where $\theta=(\theta_1,\theta_2, \ldots, \theta_k)$ is a vector of $k$ parameters. We are interested in estimating $\theta$.

Moments

  • $\mu_k=E[(X-c)^k]$ is the $k$-th (theoretical) moment of the distribution around $c$, for $k=1,2,\ldots$

  • $A_k=\frac{1}{n}\sum_{i=1}^n (X_i-c)^k$ is the $k$-th sample moment around $c$, for $k=1,2,\ldots$

The term "moments" on its own usually refers to moments around zero ($c=0$). For $k>1$, we also use $c=\mu$, which gives the central moments; the second central moment is the variance.
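As a concrete sketch, the function below computes the $k$-th sample moment around an arbitrary center $c$; the data values are arbitrary illustrative numbers.

```python
def sample_moment(xs, k, c=0.0):
    """k-th sample moment of the data xs around center c."""
    n = len(xs)
    return sum((x - c) ** k for x in xs) / n

xs = [1, 2, 3, 4, 5]                 # illustrative data
mean = sample_moment(xs, 1)          # first moment around zero = sample mean
var = sample_moment(xs, 2, c=mean)   # second central moment = sample variance
print(mean, var)                     # 3.0 2.0
```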

Suppose that the first $K$ moments of the population exist. Equating the first $K$ theoretical moments to the corresponding $K$ sample moments gives us $K$ equations in $K$ unknowns:

$$E(X^k)=\frac{1}{n}\sum_{i=1}^n X_i^k, \quad k=1,2,\ldots,K$$

Solving these equations gives us the method-of-moments estimators of the $K$ parameters of interest.

Example 1: Method-of-moments estimator for the uniform distribution

Assume that $X \sim U(a,b)$ where $a, b$ are unknown. We obtain a sample $(1,2,3,4,5)$ from the uniform population; find the method-of-moments estimators of $a$ and $b$.

The density function is

\begin{equation} f(x)= \begin{cases} \frac{1}{b-a} & a \leq x \leq b\\ 0 & \mbox {otherwise} \end{cases} \nonumber \end{equation}

The first theoretical moment:

$$E(X)=\int_a^bxf(x)dx=\frac{x^2}{2(b-a)}\biggr|_a^b=\frac{a+b}{2}$$

The second theoretical moment follows from the variance:

$$E(X^2) = Var(X)+E(X)^2$$

$$Var(X)=\int_a^b\left(x-\frac{a+b}{2}\right)^2\cdot\frac{1}{b-a}\,dx=\frac{(b-a)^2}{12}$$

Equating the first two theoretical moments to the first two sample moments,

$$\frac{a+b}{2}=\bar{X}, \qquad \frac{(b-a)^2}{12}+\left(\frac{a+b}{2}\right)^2=\frac{1}{n}\sum_{i=1}^nX_i^2,$$

and solving for $a$ and $b$ gives

$$\hat{a}=\bar{X}-\sqrt{3\hat{\sigma}^2}, \qquad \hat{b}=\bar{X}+\sqrt{3\hat{\sigma}^2},$$

where $\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2$ is the second sample central moment. For the sample $(1,2,3,4,5)$ we have $\bar{X}=3$ and $\hat{\sigma}^2=2$, so $\hat{a}=3-\sqrt{6}\approx 0.55$ and $\hat{b}=3+\sqrt{6}\approx 5.45$.
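As a quick numerical check, the following sketch computes the method-of-moments estimates from the sample $(1,2,3,4,5)$ using the closed-form solution derived above.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)

x_bar = sum(xs) / n                                # first sample moment
var_hat = sum((x - x_bar) ** 2 for x in xs) / n    # second sample central moment

# Closed-form method-of-moments estimates for U(a, b).
a_hat = x_bar - math.sqrt(3 * var_hat)
b_hat = x_bar + math.sqrt(3 * var_hat)

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")  # a_hat = 0.551, b_hat = 5.449
```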
