Important knowledge

This is the mathematical and probability knowledge required for this course:

The mathematics part should have been covered in pre-requisite courses. Coverage of probability is less certain (pun intended 😉) and needed in the second half of the course, so you should review the second part below with particular care.

Revisions Part 1: Mathematics #

Functions and their derivatives #

  1. Be familiar with the functions $x^\alpha$, $e^{\alpha x}$, $\ln(1+x)$
  2. Basic derivatives: $(x^\alpha)' = \alpha x^{\alpha - 1}$, $(e^{\alpha x})' = \alpha e^{\alpha x}$, $(\ln(1+x))' = \frac{1}{1+x}$, $(a^x)' = a^x \ln a$
  3. Taylor's expansion (here for the exponential function): $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ... + \frac{x^n}{n!} + ...$
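As a quick numerical illustration, the partial sums of the exponential series converge rapidly to $e^x$. A minimal sketch in Python (the helper name `exp_taylor` is ours, not part of the course materials):

```python
import math

def exp_taylor(x, n_terms):
    """Partial sum 1 + x + x^2/2! + ... + x^(n_terms-1)/((n_terms-1)!) of the series for e^x."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

# Ten terms already agree with math.exp to several decimal places at x = 1:
print(exp_taylor(1.0, 10), math.exp(1.0))  # both close to 2.71828...
```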

Be able to find expressions for the following summations #

$\sum_{i=1}^{n} x^i, \qquad \sum_{i=1}^{n} i x^i, \qquad \sum_{i=1}^{n} i$. See Tutorial 0 (Revisions) for solutions.
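A brute-force check of the standard closed forms (for $x \neq 1$): $\sum_{i=1}^{n} x^i = \frac{x(1-x^n)}{1-x}$, $\sum_{i=1}^{n} i x^i = \frac{x(1-(n+1)x^n + nx^{n+1})}{(1-x)^2}$, and $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$. A sketch in Python (function names ours):

```python
def brute(n, x):
    """Direct evaluation of the three summations."""
    return (sum(x**i for i in range(1, n + 1)),
            sum(i * x**i for i in range(1, n + 1)),
            sum(range(1, n + 1)))

def closed(n, x):
    """Closed-form expressions: geometric sum, its weighted variant, triangular number."""
    s1 = x * (1 - x**n) / (1 - x)
    s2 = x * (1 - (n + 1) * x**n + n * x**(n + 1)) / (1 - x)**2
    s3 = n * (n + 1) // 2
    return s1, s2, s3

print(brute(10, 0.5))
print(closed(10, 0.5))  # agrees with the brute-force values
```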

Change the order of double summation #

$\sum_{k=1}^{n} \sum_{j=1}^{k} a_{k,j} = \sum_{j=1}^{n} \sum_{k=j}^{n} a_{k,j}$
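Both orders run over the same triangular set of index pairs $\{1 \le j \le k \le n\}$, so the totals must agree. A small numerical check (the values $a_{k,j}$ below are arbitrary):

```python
# Sum over the triangle {1 <= j <= k <= n} in both orders.
n = 6
a = {(k, j): k * 10 + j for k in range(1, n + 1) for j in range(1, n + 1)}

lhs = sum(a[(k, j)] for k in range(1, n + 1) for j in range(1, k + 1))
rhs = sum(a[(k, j)] for j in range(1, n + 1) for k in range(j, n + 1))
print(lhs == rhs)  # True
```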

The solution to a quadratic equation #

The equation $ax^2 + bx + c = 0$ (with $a \neq 0$) has two real solutions when $b^2 > 4ac$:

$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$

If $b^2 - 4ac = 0$, the equation has a double solution: $x = \frac{-b}{2a}$

Example #

For example, find a value for $v$ such that $v \in (0,1)$ and $v$ satisfies the equation: $v - 2v^{0.5} + \frac{3}{4} = 0$.

Solution: let $v^{\frac{1}{2}} = x$. Then the equation above simplifies to $x^2 - 2x + \frac{3}{4} = 0$, which has two solutions:

$x_1 = \frac{2 + \sqrt{4-3}}{2} = 1.5, \qquad x_2 = \frac{2 - \sqrt{4-3}}{2} = 0.5$

We reject the solution $x_1 = 1.5$ as $x_1 > 1$. Then $v = x_2^2 = 0.5^2 = 0.25$ is the required solution.
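The worked example above can be replayed with the quadratic formula in code; a sketch (the helper `solve_quadratic` is ours):

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, assuming b^2 >= 4ac."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# x^2 - 2x + 3/4 = 0, obtained by substituting x = sqrt(v)
x1, x2 = solve_quadratic(1, -2, 0.75)
v = x2**2            # keep the root with x in (0, 1)
print(x1, x2, v)     # 1.5 0.5 0.25
```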

Be able to solve simple differential equations #

Example 1 #

For example, solve $f'(x) = 2x$ with initial condition $f(0) = 1$.

Solution:

$f(x) = f(0) + \int_0^x f'(t)\,dt = 1 + \int_0^x 2t\,dt = 1 + t^2 \Big|_0^x = 1 + x^2$

Example 2 #

Solve $f'(x) = 2f(x)$ with initial condition $f(0) = 1$.

Solution:

$\frac{f'(x)}{f(x)} = 2 \Rightarrow (\ln f(x))' = 2.$ Then $\ln f(x) - \ln f(0) = \int_0^x (\ln f(t))'\,dt = \int_0^x 2\,dt = 2x$, so that $\ln f(x) = 2x \Rightarrow f(x) = e^{2x}$.
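Both examples can also be double-checked numerically, e.g. with a forward Euler scheme (not part of the course materials; just a sketch to confirm the exact solutions $1 + x^2$ and $e^{2x}$):

```python
import math

def euler(deriv, f0, x_end, n=100_000):
    """Forward Euler for f'(x) = deriv(x, f), f(0) = f0, integrated over [0, x_end]."""
    h = x_end / n
    x, f = 0.0, f0
    for _ in range(n):
        f += h * deriv(x, f)
        x += h
    return f

# Example 1: f'(x) = 2x, f(0) = 1; exact solution is 1 + x^2
print(euler(lambda x, f: 2 * x, 1.0, 1.0), 1 + 1.0**2)
# Example 2: f'(x) = 2f(x), f(0) = 1; exact solution is e^(2x)
print(euler(lambda x, f: 2 * f, 1.0, 1.0), math.exp(2.0))
```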

Integrals #

  1. We have $\int_a^b f(x)\,dx = F(b) - F(a)$, where $F(x)$ is the anti-derivative of $f(x)$, such that $F'(x) = f(x)$.
  2. The integration variable is just a tool, that is, $\int_a^b f(x)\,dx = \int_a^b f(y)\,dy$; it does not matter whether we use $x$ or $y$.
  3. We have $\int_0^\infty f(x)\,dx = \lim_{b \to \infty} \int_0^b f(x)\,dx = \lim_{b \to \infty} F(b) - F(0)$. For example: $\int_0^\infty e^{-x}\,dx = \lim_{b \to \infty} \int_0^b e^{-x}\,dx = \lim_{b \to \infty} \left( -e^{-x} \Big|_0^b \right) = 1 - \lim_{b \to \infty} e^{-b} = 1$
  4. Integration by parts: $\int_a^b f(x) g'(x)\,dx = f(x) g(x) \Big|_a^b - \int_a^b g(x) f'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b g(x) f'(x)\,dx$
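The improper-integral example in point 3 is easy to see numerically: with $f(x) = e^{-x}$ and anti-derivative $F(x) = -e^{-x}$, $F(b) - F(0)$ approaches $1$ as $b$ grows. A quick sketch:

```python
import math

def F(x):
    """Anti-derivative of f(x) = e^(-x)."""
    return -math.exp(-x)

for b in (1, 5, 10, 50):
    print(b, F(b) - F(0))  # values approach 1 as b grows
```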

The average of a function on [a,b] #

Let $f(x)$ be a continuous function on $[a,b]$. Then there exists a point $c \in [a,b]$ such that $\int_a^b f(x)\,dx = f(c)(b-a)$.

Interpretation: $\int_a^b f(x)\,dx$ is the area of the region enclosed by $f(x)$, the $x$-axis, $x = a$ and $x = b$; $f(c)(b-a)$ is the area of the rectangle of height $f(c)$ and length $(b-a)$.

Definition: $\frac{\int_a^b f(x)\,dx}{b-a} = f(c)$: the average value of $f(x)$ on the interval $[a,b]$.

Example 1: #

$\frac{\int_{-1}^{1} x\,dx}{2} = 0$

The average value of $f(x) = x$ on $[-1,1]$ is $0$.

Example 2: #

$\frac{\int_{0}^{1} x\,dx}{1-0} = \frac{1}{2}$

The average value of $f(x) = x$ on $[0,1]$ is $\frac{1}{2}$.

Example 3: #

$\frac{\int_{-1}^{1} x^2\,dx}{2} = \frac{1}{3}$. The average value of $f(x) = x^2$ on $[-1,1]$ is $\frac{1}{3}$.
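All three examples can be confirmed with a simple Riemann-sum estimate of the average value (the helper `average_value` is ours; a midpoint rule is used for accuracy):

```python
def average_value(f, a, b, n=100_000):
    """Midpoint Riemann-sum estimate of the average of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h / (b - a)

print(average_value(lambda x: x, -1, 1))       # ~ 0
print(average_value(lambda x: x, 0, 1))        # ~ 0.5
print(average_value(lambda x: x * x, -1, 1))   # ~ 1/3
```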

The trapezoid rule in integration #

$\int_a^b f(x)\,dx \approx \frac{1}{2}[f(b) + f(a)](b-a), \qquad \frac{\int_a^b f(x)\,dx}{b-a} \approx \frac{1}{2}[f(b) + f(a)]$

The average value of $f(x)$ on $[a,b]$ can be approximated by $\frac{1}{2}[f(b) + f(a)]$.
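A one-panel trapezoid is exact for linear functions and only approximate otherwise; a small sketch (function name ours):

```python
def trapezoid(f, a, b):
    """One-panel trapezoid approximation of the integral of f on [a, b]."""
    return 0.5 * (f(a) + f(b)) * (b - a)

# Exact integral of x^2 on [0, 1] is 1/3; one trapezoid overestimates it:
print(trapezoid(lambda x: x * x, 0, 1))     # 0.5
# Exact for a linear function: integral of 3x + 1 on [0, 2] is 8:
print(trapezoid(lambda x: 3 * x + 1, 0, 2)) # 8.0
```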

The definition of $\int_a^b f(x)\,dx$ and its numerical calculations #

$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{k=0}^{n-1} f(x_k) \frac{b-a}{n}, \qquad (1)$ where $x_0 = a$, $x_1 = x_0 + \frac{b-a}{n}$, ..., $x_{k+1} = x_k + \frac{b-a}{n}$, ..., $x_n = b$

In the summation in (1), each term represents the area of a rectangle: $f(x_k) \frac{b-a}{n}$ represents the area of the $k$-th rectangle.

Approximations:

  1. $\int_a^b f(x)\,dx \approx f(a)(b-a) \qquad (n=1)$
  2. $\int_a^b f(x)\,dx \approx \frac{b-a}{2}\left[ f(a) + f\left( \frac{b+a}{2} \right) \right] \qquad (n=2)$
  3. $\int_a^b f(x)\,dx \approx (b-a) \cdot \frac{f(x_0) + f(x_1) + ... + f(x_{n-1})}{n}$: the average of $f(x_0), f(x_1), ..., f(x_{n-1})$ times $(b-a)$.
  4. If $a=0$, $b=1$, then $\int_0^1 f(x)\,dx \approx \frac{f(x_0) + f(x_1) + ... + f(x_{n-1})}{n}$.

Alternatively,

$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_k) \frac{b-a}{n}, \qquad (2)$ where $x_0 = a$, $x_1 = x_0 + \frac{b-a}{n}$, $x_2 = x_1 + \frac{b-a}{n}$, ..., $x_n = b$.

Approximations:

  1. $\int_a^b f(x)\,dx \approx f(b)(b-a) \qquad (n=1)$
  2. $\int_a^b f(x)\,dx \approx \frac{b-a}{2}\left[ f\left( \frac{b+a}{2} \right) + f(b) \right] \qquad (n=2)$
  3. $\int_a^b f(x)\,dx \approx (b-a) \cdot \frac{f(x_1) + f(x_2) + ... + f(x_n)}{n}$
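Definitions (1) and (2) correspond to left and right Riemann sums; both approach the integral as $n$ grows. A sketch (the helper `riemann` is ours):

```python
def riemann(f, a, b, n, rule="left"):
    """Left or right Riemann sum of f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    if rule == "left":
        return sum(f(a + k * h) for k in range(n)) * h
    return sum(f(a + (k + 1) * h) for k in range(n)) * h

f = lambda x: x * x   # exact integral on [0, 1] is 1/3
for n in (10, 100, 1000):
    print(n, riemann(f, 0, 1, n, "left"), riemann(f, 0, 1, n, "right"))
# left sums approach 1/3 from below, right sums from above (f is increasing here)
```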

The average of n numbers #

Let $x_1, x_2, ..., x_n$ be $n$ numbers. Then $\frac{x_1 + ... + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the average value of $x_1, x_2, ..., x_n$.

Example 1 #

The average value of $1, 2, ..., n$ is $\frac{1}{n} \sum_{i=1}^{n} i = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$, where $1 + 2 + 3 + ... + n = \frac{n(n+1)}{2}$ is given in Tutorial 0.

Example 2 #

One student took 8 subjects in his first year at University of Melbourne. The results are as follows: Semester 1: 75, 83, 65, 90; Semester 2: 60, 76, 80, 50.

Then

  • $75 + 83 + 65 + 90 + 60 + 76 + 80 + 50 = 579$ is the total of the marks from year 1.
  • The average mark is $\frac{579}{8} = 72.375 \approx 72.4$
  • The average mark for S1 is $\frac{75 + 83 + 65 + 90}{4} = 78.25$
  • The average mark for S2 is $\frac{60 + 76 + 80 + 50}{4} = 66.5$
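The same computations in Python, for reference:

```python
s1 = [75, 83, 65, 90]   # semester 1 marks
s2 = [60, 76, 80, 50]   # semester 2 marks
marks = s1 + s2

total = sum(marks)
print(total)               # 579
print(total / len(marks))  # 72.375
print(sum(s1) / len(s1))   # 78.25
print(sum(s2) / len(s2))   # 66.5
```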

The weighted average of n numbers #

Let $x_1, x_2, ..., x_n$ be $n$ real numbers.

Let $\theta_1, \theta_2, ..., \theta_n$ be $n$ numbers such that $0 \le \theta_i \le 1$ and $\sum_{i=1}^{n} \theta_i = 1$. Then $\sum_{i=1}^{n} \theta_i x_i$ is called the weighted average of $x_1, x_2, ..., x_n$.

Note:

  • $\theta_i$ is the weight attached to $x_i$.
  • If $\theta_i = \frac{1}{n}$, then $\sum_{i=1}^{n} \frac{1}{n} x_i = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the average of $x_1, x_2, ..., x_n$ (equally weighted).

Example #

In the assessment of ACTL10001, the assignments account for 20%, the mid-semester exam accounts for 10%, and the final exam accounts for 70%. A student got 70 out of 100 for the mid-semester exam, 95 out of 100 for the assignments, and 80 for the final exam. Then the overall weighted average mark is $70 \times 10\% + 95 \times 20\% + 80 \times 70\% = 82$.
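The same calculation as a small function (the helper `weighted_average` is ours):

```python
def weighted_average(values, weights):
    """Weighted average, requiring the weights to sum to 1."""
    assert abs(sum(weights) - 1) < 1e-12, "weights must sum to 1"
    return sum(v * w for v, w in zip(values, weights))

# mid-semester 10%, assignments 20%, final 70%
overall = weighted_average([70, 95, 80], [0.10, 0.20, 0.70])
print(overall)  # ≈ 82
```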

Revisions Part 2: Probability #

The following contents are the object of a video recorded in August 2021: annotated pdf

If you wish to watch the embedded videos from Lecture Capture, you need to have logged in and entered Lecture Capture via Canvas once for each session. This is to restrict access to students enrolled at the University of Melbourne only.

Events and Probability #

Vocabulary: events vs probability #

It is important to understand the difference between events and probability:

  • Event: what could happen - an actual “thing”, in real life, that could happen;
  • Probability: our understanding of the “likelihood” (or frequency) of an event (something that could occur).

So when we are building a mathematical model for uncertain outcomes:

  1. The first step is to work out all the possible things that could occur (for instance, “rain” or “no rain”). The full set of those is denoted $\Omega$.
  2. The second step is to make assumptions about how likely those things are to occur. Here “$\Pr$” is an operator that maps an event to a probability. For instance, $\Pr[\text{rain}] = 0.2$ means that the likelihood corresponding to the event “rain” is 20%.

In what follows we outline basic results and axioms around events and their probabilities. Often logic means that a result or definition on one side (e.g. events) can be translated on the other side (e.g. probabilities).
For instance, the complement to an event is exactly whatever could happen, that is not the event. Hence, the probability of the complement must be 1 minus the probability of the original event; see 2.1.2.4 below.

Events, operations of events, probability of an event #

  1. $\emptyset$: the empty set, that is, an impossible event: $\Pr(\emptyset) = 0$.
  2. $\Omega$: the full set of possible outcomes, that is, a certain event: $\Pr(\Omega) = 1$.
  3. $A$: an event (within $\Omega$), $0 \le \Pr(A) \le 1$.
  4. $A^C$: the event that $A$ does not occur (called the “complement” of $A$): $\Pr(A^C) = 1 - \Pr(A)$.
  5. $A \cap B$: $A$ and $B$, the event that both $A$ and $B$ occur.
  6. $A \cup B$: $A$ or $B$, the event that either $A$ or $B$, or both events occur.
  7. $A \subseteq B$: If $A$ occurs, $B$ must occur, and:
    • $\Pr(A) \le \Pr(B)$;
    • $A \cap B = A$.
      Example: $A$ = {a 20-year old survives to age 70}, $B$ = {the 20-year old survives to age 50}. Then $A \subseteq B$.

Mutually exclusive events #

If $A \cap B = \emptyset$, then $A$ and $B$ are mutually exclusive. Also,
$\Pr(A \cup B) = \Pr(A) + \Pr(B)$.

Independent events A and B #

If $A$ and $B$ are independent, then $\Pr(A \cap B) = \Pr(A) \Pr(B)$.

Conditional probability formula #

We have

$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$.

This leads to Bayes' theorem, see for instance this.

Also,

  1. If $A \subseteq B$, then $A \cap B = A$ and $\Pr(A|B) = \frac{\Pr(A)}{\Pr(B)}$.
  2. If $A$ and $B$ are independent, then $\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\Pr(A) \Pr(B)}{\Pr(B)} = \Pr(A)$.
  3. If $B \subseteq A$, then $A \cap B = B$ and $\Pr(A|B) = \frac{\Pr(B)}{\Pr(B)} = 1$. Given $B$ has occurred, $A$ is certain.
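The conditional probability formula can be illustrated by simulation; a sketch with a fair die (our own example, not from the course): with $A$ = “roll at least 5” and $B$ = “roll is even”, $\Pr(A|B) = \frac{1/6}{1/2} = \frac{1}{3}$, since only a roll of 6 lies in both events.

```python
import random

random.seed(0)
n = 200_000
rolls = [random.randint(1, 6) for _ in range(n)]

# A = "roll is at least 5", B = "roll is even"; A and B together means a 6
both = sum(1 for r in rolls if r >= 5 and r % 2 == 0)
b = sum(1 for r in rolls if r % 2 == 0)

print(both / b)   # ≈ Pr(A|B) = 1/3
```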

Random variables and their distribution #

Definition #

A random variable, denoted by a capital letter such as $X$, $Y$ or $Z$, is a quantity whose value is subject to variations due to chance.

Distribution Function #

Definition: $F(x) = \Pr(X \le x), \qquad x \in \mathbb{R}$. $F(x)$ is called the distribution function of $X$, and it has the following properties:

  1. $F(-\infty) = 0$, $F(\infty) = 1$.
  2. $F(x_1) \le F(x_2)$, if $x_1 \le x_2$.
  3. $F(x)$ is right-continuous (aka “càdlàg”), i.e., $\lim_{x \to x_0^+} F(x) = F(x_0)$.
  4. $F(b) - F(a) = \Pr(a < X \le b)$.
  5. $F(b^-) - F(a) = \Pr(a < X < b)$, where $F(b^-) = \lim_{x \to b^-} F(x)$.
  6. $F(b) - F(a^-) = \Pr(a \le X \le b)$.
  7. $F(b) - F(b^-) = \Pr(X = b) \ge 0$

In our subject, we generally assume that $X \ge 0$, so that $F(x) = 0$ for $x < 0$.

Difference between continuous and discrete random variables #

As an introduction to differences between continuous and discrete random variables, review this video:

Continuous random variables ($X \ge 0$) #

$X$ is said to be a continuous r.v. if $X$ has a probability density function $f(x)$, $x \ge 0$, with the following properties:

  1. $f(x) = F'(x)$.
  2. $F(x) = \int_0^x f(y)\,dy$.
  3. $\Pr(a < X \le b) = \Pr(a < X < b) = \Pr(a \le X \le b) = \Pr(a \le X < b) = \int_a^b f(x)\,dx$
  4. $F(x)$ is typically a smooth increasing curve from $0$ to $1$ (a figure illustrated this here), but note that it does not need to be concave.
  5. $E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty [1 - F(x)]\,dx$

Discrete random variables #

A random variable $X$ is said to be a discrete random variable if $X$ takes values from a countable set of numbers $\{x_1, x_2, ..., x_n, ...\}$.

  1. Probability distribution of $X$: $p_n = \Pr(X = x_n), \quad n = 1, 2, 3, ...$

  2. $E(X) = \sum_{n=1}^{\infty} x_n p_n$

  3. The distribution function $F(x)$ is a piecewise constant function (also called a step function).

Moments of a random variable #

Expectation and variance #

Expectation of $X$ (continuous case): $E(X) = \int_0^\infty x f(x)\,dx$. Variance of $X$: $\text{Var}(X) = E\{[X - E(X)]^2\} = E(X^2) - [E(X)]^2$.

Furthermore:

  1. $\text{Var}(X)$ measures the variability of $X$. The larger the variance, the more variability $X$ has.
  2. If $\text{Var}(X) = 0$, then $X \equiv E(X)$: there is no variability for $X$; $X$ is a constant.
  3. If $X$ and $Y$ are independent, then $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$.
  4. $\text{Var}(aX) = a^2 \text{Var}(X)$

Moments of the average of iid rv’s #

Assume $X_1, X_2, ..., X_n$ are independently and identically distributed, with $E(X_1) = \mu$ and $\text{Var}(X_1) = \sigma^2$. Define $Y_n = \frac{1}{n}(X_1 + X_2 + ... + X_n)$ to be the average of $X_1, X_2, ..., X_n$. Then $E(Y_n) = \frac{1}{n}(E(X_1) + E(X_2) + ... + E(X_n)) = \frac{1}{n}(\mu + \mu + ... + \mu) = \mu$ and $\text{Var}(Y_n) = \frac{1}{n^2}(\text{Var}(X_1) + \text{Var}(X_2) + ... + \text{Var}(X_n)) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + ... + \sigma^2) = \frac{\sigma^2}{n}$. When $n \to \infty$, $\text{Var}(Y_n) \to 0$. That is to say, as $n \to \infty$, $Y_n \to \mu$. With an infinite sample of $X$'s, you can estimate $\mu$ with certainty.
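The $\frac{\sigma^2}{n}$ shrinkage can be illustrated by simulation; a sketch with $U(0,1)$ draws, for which $\sigma^2 = \frac{1}{12}$ (a standard fact; the helper name is ours):

```python
import random

random.seed(1)

def sample_mean_variance(n, trials=2000):
    """Empirical variance of the average Y_n of n iid U(0,1) draws."""
    means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
    m = sum(means) / trials
    return sum((y - m)**2 for y in means) / trials

# Var(X1) = 1/12 for U(0,1), so Var(Y_n) should be close to 1/(12n):
for n in (1, 4, 16):
    print(n, sample_mean_variance(n), 1 / (12 * n))
```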

Selected distributions #

Binomial distribution #

If $X \sim \text{Bin}(n, p)$, then $\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad 0 < p < 1, \; k = 0, 1, 2, ..., n$.

Note:

  1. $E(X) = np$, $\text{Var}(X) = np(1-p)$.
  2. $F(k) = \Pr(X \le k) = \sum_{j=0}^{k} \Pr(X = j) = \sum_{j=0}^{k} \binom{n}{j} p^j (1-p)^{n-j}, \quad k = 0, 1, 2, ..., n$.
  3. $F(2.5) = \Pr(X \le 2.5) = \Pr(X \le 2) = F(2)$.
  4. $X$ represents the number of successes out of $n$ independent trials; each trial has two outcomes: success with probability $p$ or failure with probability $1-p$.

Exponential distribution #

If $X \sim \text{Exp}(\lambda)$ then $f(x) = \lambda e^{-\lambda x}, \quad x > 0, \; \lambda > 0$.

Note:

  1. $F(x) = 1 - e^{-\lambda x}, \quad x \ge 0$.
  2. $E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty [1 - F(x)]\,dx = \frac{1}{\lambda}$.

Uniform distribution #

If $X \sim U(0, M)$ then $f(x) = \frac{1}{M}, \quad 0 \le x \le M$.

Note:

  1. $F(x) = \frac{x}{M}, \quad 0 \le x \le M$.
  2. $E(X) = \int_0^M x \cdot \frac{1}{M}\,dx = \int_0^M \left(1 - \frac{x}{M}\right)dx = \frac{M}{2}$.
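The stated means can be double-checked numerically; a sketch (helper names ours) computing the binomial mean directly from the pmf, and the exponential mean via the survival-function formula $E(X) = \int_0^\infty [1 - F(x)]\,dx$:

```python
import math

def binom_mean(n, p):
    """E(X) for Bin(n, p), computed directly from the pmf."""
    return sum(k * math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def exp_mean(lam, b=50.0, steps=200_000):
    """E(X) for Exp(lam) via a midpoint rule on 1 - F(x) = e^(-lam*x) over [0, b]."""
    h = b / steps
    return sum(math.exp(-lam * (k + 0.5) * h) for k in range(steps)) * h

print(binom_mean(10, 0.3))  # ≈ np = 3
print(exp_mean(2.0))        # ≈ 1/lambda = 0.5
```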

Credit #

The initial version of these notes was developed by Professor Shuanming Li in 2018. They were then transcribed, modified, and augmented by Professor Benjamin Avanzi in 2021.