RANDOM VARIABLES AND THEIR DISTRIBUTION LAWS
A random variable is a quantity whose value depends on a combination of random circumstances. Random variables are either discrete or continuous.
A variable is called discrete if it takes a countable set of values. (Examples: the number of patients at a doctor's office, the number of letters on a page, the number of molecules in a given volume.)
A variable is called continuous if it can take any value within a certain interval. (Examples: air temperature, body weight, human height.)
The distribution law of a random variable is the set of its possible values together with the probabilities (or frequencies of occurrence) corresponding to those values.
EXAMPLE:
x | x_1 | x_2 | x_3 | x_4 | ... | x_n |
p | p_1 | p_2 | p_3 | p_4 | ... | p_n |

x | x_1 | x_2 | x_3 | x_4 | ... | x_n |
m | m_1 | m_2 | m_3 | m_4 | ... | m_n |
NUMERICAL CHARACTERISTICS OF RANDOM VARIABLES
In many cases, along with the distribution of a random variable or instead of it, information about the variable can be given by numerical parameters called the numerical characteristics of the random variable. The most commonly used are:
1. Expected value (mean value) of a random variable: the sum of the products of all its possible values and the probabilities of these values: $M(X) = \sum_{i=1}^{n} x_i p_i$
2. Variance (dispersion) of a random variable: the expected value of the squared deviation from the mean: $D(X) = \sum_{i=1}^{n} (x_i - M(X))^2 p_i$
3. Standard deviation: $\sigma = \sqrt{D(X)}$
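As a rough illustration of the three characteristics listed above, the sketch below computes them for a small discrete distribution. The values and probabilities are hypothetical and chosen only for this example; they do not come from the notes.

```python
import math

# Hypothetical distribution law: values x_i with probabilities p_i (summing to 1).
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

def expected_value(xs, ps):
    """M(X): sum of the products of each value and its probability."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """D(X): expected squared deviation of X from M(X)."""
    m = expected_value(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))

def std_dev(xs, ps):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs, ps))

mean = expected_value(values, probs)
var = variance(values, probs)
sigma = std_dev(values, probs)
print(mean, var, sigma)
```

For these illustrative numbers the mean is about 3 and both the variance and the standard deviation are about 1.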
The THREE SIGMA rule: if a random variable is normally distributed, then the absolute deviation of its value from the mean practically never exceeds three standard deviations; the inequality $|X - M(X)| < 3\sigma$ holds with a probability of about 0.9973.
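The rule can be checked numerically. The sketch below uses the standard identity P(|X - M(X)| < k·sigma) = erf(k / sqrt(2)) for a normal variable; the function name `prob_within` is introduced here only for illustration.

```python
import math

def prob_within(k):
    """P(|X - M(X)| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

# Probability of staying within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

For k = 3 the probability is roughly 0.9973, so only about 0.27% of values fall outside the three-sigma band.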
GAUSS'S LAW - THE NORMAL DISTRIBUTION LAW
Quantities distributed according to the normal law (Gauss's law) are encountered often. Its main feature is that it is the limiting law that other distribution laws approach.
A random variable is normally distributed if its probability density has the form:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - M(X))^2}{2\sigma^2}\right)$
where M(X) is the mathematical expectation of the random variable;
σ is the standard deviation.
The probability density (distribution density function) shows how the probability that the random variable falls in an interval dx changes depending on the value of the variable itself: $f(x) = \frac{dP}{dx}$
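A minimal sketch of the normal probability density, written directly from the standard formula and cross-checked against Python's `statistics.NormalDist`. The mean and standard deviation used below (170 and 10) are hypothetical values, not data from the notes.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, sigma):
    """Density of the normal law with expectation `mean` and standard deviation `sigma`."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

mean, sigma = 170.0, 10.0                       # hypothetical parameters
peak = normal_pdf(mean, mean, sigma)            # density at the mean (the maximum)
reference = NormalDist(mean, sigma).pdf(mean)   # library value for comparison
print(peak, reference)
```

The density is highest at x = M(X) and falls off symmetrically on both sides.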
BASIC CONCEPTS OF MATHEMATICAL STATISTICS
Mathematical statistics - a branch of applied mathematics directly adjacent to probability theory. The main difference between the two is that mathematical statistics does not deal with operations on distribution laws and numerical characteristics of random variables, but with approximate methods for finding these laws and characteristics from experimental results.
The basic concepts of mathematical statistics are:
1. general population;
2. sample;
3. variation series;
4. mode;
5. median;
6. percentile;
7. frequency polygon;
8. histogram.
General population - a large statistical set of objects from which some are selected for study.
(Example: the entire population of a region, the university students of a city, etc.)
Sample (sample population) - a set of objects selected from the general population.
Variation series - a statistical distribution consisting of variants (values of a random variable) and their corresponding frequencies.
Example:
x, kg | ... |
m | ... |
x - the value of the random variable (body mass of 10-year-old girls);
m - the frequency of occurrence.
Mode - the value of the random variable that has the highest frequency of occurrence. (In the example above, the mode is 24 kg: this value occurs most often, m = 20.)
Median - the value of the random variable that divides the distribution in half: half of the values lie to the right of the median and half (no more) to the left.
Example:
1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10
In the example there are 40 values of a random variable, arranged in ascending order with each value repeated according to its frequency. Twenty of the 40 values (half) lie to the right of the 20th value, which is 7. So the median is 7.
To characterize the spread, we find the values below which 25% and 75% of the measurement results fall. These values are called the 25th and 75th percentiles. Whereas the median divides the distribution in half, the 25th and 75th percentiles cut off a quarter from each end. (The median itself can be regarded as the 50th percentile.) In the example, the 25th and 75th percentiles are 3 and 8, respectively.
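The median and percentiles of the 40 values in this example can be reproduced with Python's standard `statistics` module. This is only a sketch: `statistics.quantiles` uses its default "exclusive" interpolation method, and other percentile conventions may give slightly different results on other data.

```python
import statistics

# The 40 ordered values from the example, written as value * frequency.
data = ([1] * 6 + [2] * 3 + [3] * 2 + [4] * 2 + [5] * 4 +
        [6] * 2 + [7] * 6 + [8] * 6 + [9] * 3 + [10] * 6)

median = statistics.median(data)
# Cut points dividing the data into four equal parts:
# the 25th, 50th and 75th percentiles.
q25, q50, q75 = statistics.quantiles(data, n=4)
print(len(data), median, q25, q50, q75)
```

The results agree with the values read off the ordered list: the median is 7 and the 25th and 75th percentiles are 3 and 8.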
Both discrete (point) and continuous (interval) statistical distributions are used.
For clarity, statistical distributions are depicted graphically as a frequency polygon or a histogram.
Frequency polygon - a broken line whose segments connect the points with coordinates (x_1, m_1), (x_2, m_2), ..., or, for a polygon of relative frequencies, the points (x_1, p*_1), (x_2, p*_2), ... (Fig. 1).
[Fig. 1: frequency polygon, vertical axis m or m_i/n. Fig. 2: histogram, vertical axis f(x).]
Frequency histogram - a set of adjacent rectangles built on one straight line (Fig. 2); the bases of the rectangles are all equal to dx, and the heights are equal to the ratio of the frequency to dx, or of the relative frequency p* to dx (probability density).
Example:
x, kg | 2.7 | 2.8 | 2.9 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9 | 4.0 | 4.1 | 4.2 | 4.3 | 4.4 |
m |
Frequency polygon
The ratio of the relative frequency to the width of the interval is called the probability density: $f(x) = \frac{m_i}{n\,dx} = \frac{p^*_i}{dx}$
An example of constructing a histogram.
Let's use the data from the previous example.
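1. Calculation of the number of class intervals:
$k = 1 + 3.322 \lg n$,
where n is the number of observations. In our case n = 100. Consequently: $k = 1 + 3.322 \lg 100 = 7.644 \approx 8$.
2. Calculation of the interval width dx:
$dx = \frac{x_{max} - x_{min}}{k} = \frac{4.4 - 2.7}{8} \approx 0.2$ kg,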
3. Drawing up an interval series:
x, kg | 2.7-2.9 | 2.9-3.1 | 3.1-3.3 | 3.3-3.5 | 3.5-3.7 | 3.7-3.9 | 3.9-4.1 | 4.1-4.3 | 4.3-4.5 |
m | 6 | 15 | 25 | 17 | 11 | 12 | 8 | 5 | 1 |
f(x) | 0.3 | 0.75 | 1.25 | 0.85 | 0.55 | 0.6 | 0.4 | 0.25 | 0.05 |
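The densities in the f(x) row follow the relation f(x) = m / (n·dx) with n = 100 and dx = 0.2, so the interval frequencies can be recovered from them and the histogram checked for consistency. A minimal sketch, with variable names chosen only for illustration:

```python
n = 100     # number of observations, as stated in the example
dx = 0.2    # interval width in kg

# Probability densities f(x) for the nine intervals 2.7-2.9, ..., 4.3-4.5.
density = [0.3, 0.75, 1.25, 0.85, 0.55, 0.6, 0.4, 0.25, 0.05]

# Invert f(x) = m / (n * dx) to get the frequency of each interval.
freqs = [round(f * n * dx) for f in density]

# The total area of the histogram rectangles should be 1.
area = sum(f * dx for f in density)
print(freqs, sum(freqs), area)
```

The recovered frequencies add up to 100 observations and the total area of the rectangles is 1, as expected for a probability density histogram.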
Histogram
Odessa National Medical University
Department of Biophysics, Informatics and Medical Equipment
Guidelines for 1st year students on the topic “Fundamentals of Mathematical Statistics”
Odessa 2009
Mathematical statistics is a branch of mathematics that studies methods for collecting, systematizing and processing the results of observations of mass random events in order to identify existing patterns and apply them in practice. Methods of mathematical statistics are widely used in clinical medicine and public health. They are used, in particular, in developing mathematical methods of medical diagnostics, in the theory of epidemics, in planning medical experiments and processing their results, and in organizing health care. Statistical concepts are used, consciously or unconsciously, in making decisions on such matters as clinical diagnosis, predicting the course of an individual patient's illness, predicting the likely outcomes of particular programs in a given population, and choosing the appropriate program in specific circumstances. Familiarity with the ideas and methods of mathematical statistics is a necessary element of the professional education of every health worker.
3. Lesson objectives. The general goal of the lesson is to teach students to use mathematical statistics deliberately when solving biomedical problems. Specific objectives:
Statistics is needed in health care both at the community level and at the level of individual patients. Medicine deals with individuals who differ from each other in many ways, and the values of the indicators by which a person can be considered healthy vary from one individual to another. No two patients or two groups of patients are exactly alike, so decisions about individual patients or populations must be based on experience gained from other patients or populations with similar biological characteristics. It must be understood that, given these differences, such decisions cannot be absolutely accurate: they always involve some uncertainty. This uncertainty is inherent in the nature of modern medicine.
The task of the sampling method is to make a correct estimate of the random variable under study from the sample obtained. Therefore, the main requirement placed on a sample is that it reflect all the features of the general population as fully as possible. A sample that satisfies this requirement is called representative. The quality of an estimate, that is, how closely the estimate matches the parameter it characterizes, depends on the representativeness of the sample.
The conclusions obtained by the methods of mathematical statistics are always based on a limited, selected number of observations, so it is natural that a second sample may give different results. This circumstance determines the probabilistic nature of the conclusions of mathematical statistics and, as a consequence, the widespread use of probability theory in statistical research.
... decreases as n grows, so for a fixed confidence interval the confidence probability increases as n grows. For a fixed confidence probability, the confidence interval narrows as the sample size increases. When planning medical research, this relationship is used to determine the minimum sample size that provides the confidence interval and confidence probability required by the problem being solved.
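The relationship between sample size, confidence interval and confidence probability can be sketched under simplifying assumptions that are not stated in the notes: the quantity is normally distributed, its standard deviation sigma is known, and the half-width of the confidence interval for the mean is z·sigma/sqrt(n). The function names and the numbers below are hypothetical.

```python
import math
from statistics import NormalDist

def half_width(sigma, n, confidence):
    """Half-width of the confidence interval for the mean (known sigma assumed)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    return z * sigma / math.sqrt(n)

def min_sample_size(sigma, max_half_width, confidence):
    """Smallest n whose confidence-interval half-width does not exceed the target."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / max_half_width) ** 2)

# Hypothetical planning problem: sigma = 10, desired half-width 2, confidence 0.95.
n_required = min_sample_size(10.0, 2.0, 0.95)
print(n_required, half_width(10.0, n_required, 0.95))
```

With these illustrative numbers about 97 observations are needed; quadrupling the sample size halves the interval width at the same confidence probability.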
Mathematical statistics is a branch of mathematics devoted to methods of collecting, analyzing and processing statistical observational data for scientific and practical purposes. Its methods are used to study the distribution of mass phenomena, i.e. large collections of objects or phenomena distributed according to a certain attribute.
Suppose we study a set of homogeneous objects united by a common feature or property, qualitative or quantitative. The individual elements of such a set are called its members. The total number of members of a population is its size (volume). The set of all objects united by some attribute is called the general population. Examples of what is studied include the income of the population, the market value of shares, or the deviation from the State Standard when assessing the quality of manufactured products.
Mathematical statistics is closely related to probability theory and relies on its conclusions. In particular, the concept of a general population in mathematical statistics corresponds to the concept of the space of elementary events in probability theory.
Studying the entire general population is most often impossible or impractical because of significant material costs or because the object of study is damaged or destroyed. Thus, it is impossible to obtain objective and complete information on the income of every individual inhabitant of an entire region. Because testing spoils the object under study, it is also impossible to obtain reliable information in this way about the quality of, for example, certain medicines or food products.
The main task of mathematical statistics is to study the general population from sample data, depending on the goal: that is, to study the probabilistic properties of the population (the distribution law, numerical characteristics, etc.) in order to make managerial decisions under uncertainty.
One of the methods of mathematical statistics is the sampling method. In practice, it is usually not the entire population that is studied but a limited sample from it.
A sample (sample set) is a set of randomly selected objects. With the sampling method, it is not the entire population that is examined but a sample (x_1, x_2, ..., x_n) resulting from a limited number of observations. The probabilistic properties of this sample are then used to judge the entire general population. Various sampling methods are used to obtain a sample. After being studied, the objects may be returned to the general population, which corresponds to a repeated sample (sampling with replacement).
A sample is called representative if it reproduces the general population well, that is, if its probabilistic properties coincide with or are close to those of the general population itself.
The effectiveness of the sampling method increases when a number of conditions are met, including the following:
The number of sample items studied is sufficient to draw conclusions, that is, the sample is representative.
For example, the sufficient number of parts to take from a batch being checked for quality (defects) is established using the laws of probability theory and mathematical statistics.
The sample items must be varied and taken at random, i.e. the principle of randomization must be respected.
The trait studied is typical of all elements of the set of objects under study, i.e. of the entire general population.
The trait studied is essential for all elements of this class.
A change in a feature of a statistical population studied by the sampling method is called variation, and the observed values x_i of the feature are called variants. The absolute frequency (or simply frequency) of a variant x_i is the number of members of the population (general or sample) that have the value x_i (i.e. the number of members of the i-th kind).
A ranked grouping of the variants by individual values of the feature (or by intervals of its values), i.e. a sequence of variants arranged in ascending order, is called a variation series. Any function of the observation results X_1, X_2, ..., X_n of the random variable under study is called a statistic.
The size of the general population is usually denoted by N and its absolute frequencies by N_i; the sample size is denoted by n and its absolute frequencies by n_i. Obviously,
$\sum_{i=1}^{k} N_i = N$,
$\sum_{i=1}^{k} n_i = n$.
The ratio of a frequency to the population size is called the relative frequency, or statistical probability, and is denoted W_i:
$W_i = \frac{n_i}{n}$.
If the number of variants is large or close to the sample size (for a discrete distribution), or if the sample is drawn from a continuous general population, the variation series is compiled not from individual (point) values but from intervals of values of the population. A variation series presented as a table built with this grouping procedure is called an interval series. In an interval variation series, the first row of the table contains intervals of equal length covering the values of the population under study, and the second row contains the corresponding absolute or relative frequencies.
Suppose a sample of size n has been drawn from some general population as a result of n observations. The statistical distribution of the sample is the list of variants together with their corresponding absolute or relative frequencies. A point variation series of absolute frequencies can be represented by the table:
x_i | x_1 | x_2 | ... | x_k |
n_i | n_1 | n_2 | ... | n_k |
and
$\sum_{i=1}^{k} n_i = n$.
A point variation series of relative frequencies is represented by the table:
x_i | x_1 | x_2 | ... | x_k |
W_i | W_1 | W_2 | ... | W_k |
and
$\sum_{i=1}^{k} W_i = 1$.
When constructing an interval distribution, there are rules for choosing the number of intervals or the size of each interval. The criterion is an optimal trade-off: as the number of intervals increases, representativeness improves, but the amount of data and the time needed to process it grow. The difference x_max - x_min between the largest and smallest variants is called the range of the sample.
To determine the number of intervals k, Sturges' empirical formula is usually used:
$k = 1 + 3.322 \lg n$ (3.1)
(rounded to the nearest integer). Accordingly, the width h of each interval can be calculated by the formula:
$h = \frac{x_{max} - x_{min}}{k}$. (3.2)
The start of the first interval is taken as $x_{start} = x_{min} - 0.5h$.
Each interval must contain at least five variants. If an interval contains fewer than five variants, it is customary to merge adjacent intervals.
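The choice of the number of intervals and their width can be sketched as follows. The function names are illustrative; the sample size n = 100 and the extreme values 2.7 kg and 4.4 kg are taken from the body-mass example earlier in these notes.

```python
import math

def sturges_intervals(n):
    """Number of intervals by Sturges' formula, rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

def interval_width(x_min, x_max, k):
    """Width of each interval: the sample range divided by the number of intervals."""
    return (x_max - x_min) / k

k = sturges_intervals(100)
h = interval_width(2.7, 4.4, k)
print(k, h)
```

For n = 100 the formula gives 7.644, which rounds to 8 intervals; in practice the width is usually rounded to a convenient value (here about 0.2 kg).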
Mathematical statistics is a branch of mathematics that studies approximate methods for collecting and analyzing experimental data in order to identify existing patterns, i.e. to find the distribution laws of random variables and their numerical characteristics.
In mathematical statistics, it is customary to distinguish two main areas of research:
1. Estimation of the parameters of the general population.
2. Testing statistical hypotheses (some a priori assumptions).
The basic concepts of mathematical statistics are: general population, sample, theoretical distribution function.
General population is the set of all conceivable statistical data in observations of a random variable.
$X_G = (x_1, x_2, x_3, \ldots, x_N) = \{x_i;\ i = 1, \ldots, N\}$
The observed random variable X is called the feature or factor being sampled. The general population is the statistical analogue of a random variable; its size N is usually large, so a part of the data, called the sample population or simply the sample, is selected from it.
$X_B = (x_1, x_2, x_3, \ldots, x_n) = \{x_i;\ i = 1, \ldots, n\}$
$X_B \subset X_G,\quad n \le N$
A sample is a collection of observations (objects) randomly selected from the general population for direct study. The number of objects in the sample is called the sample size and is denoted by n. Typically, the sample is 5-10% of the general population.
Using a sample to establish the patterns that govern an observed random variable makes it possible to avoid complete (mass) observation, which is often resource-intensive or simply impossible.
For example, a population is a set of individuals. Studying an entire population is laborious and expensive, so data are collected on a sample of individuals who are considered representative of the population, which allows conclusions to be drawn about it.
However, the sample must satisfy the condition of representativeness, i.e. give a reasonable idea of the general population. How is a representative sample formed? Ideally, one aims for a random (randomized) sample: a list of all individuals in the population is compiled and individuals are selected from it at random. Sometimes, however, the cost of compiling such a list is prohibitive, and a convenient sample is taken instead, for example a single clinic or hospital, in which all patients with the disease in question are examined.
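Drawing a simple random sample from a compiled list can be sketched with Python's standard `random` module. The patient identifiers, population size, sample size and seed below are all hypothetical.

```python
import random

# Hypothetical list of all individuals in the population.
population = [f"patient-{i:04d}" for i in range(1, 1001)]

# A seeded generator makes the selection reproducible for this illustration.
rng = random.Random(2009)

# Simple random sampling without replacement: each individual appears at most once.
sample = rng.sample(population, k=50)
print(sample[:5])
```

Here the sample is 5% of the population, and every individual has the same chance of being selected.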
Each element of the sample is called a variant. The number of times a variant is repeated in the sample is called its frequency of occurrence. The quantity $\omega_i = \frac{n_i}{n}$ is called the relative frequency of the variant, i.e. it is the ratio of the absolute frequency of the variant to the sample size. A sequence of variants written in ascending order is called a variation series.
Let's consider three forms of variation series: ranked, discrete and interval.
Ranked series - a list of the individual units of the population in ascending order of the feature under study.
Discrete variation series - a table consisting of columns or rows: the specific value x_i of the feature and the absolute frequency n_i (or relative frequency ω_i) with which the i-th value of the feature x occurs.
An example of a variation series is the table
Write the distribution of relative frequencies.
Solution: Find the relative frequencies. To do this, we divide the frequencies by the sample size:
The distribution of relative frequencies has the form:
0.15 | 0.5 | 0.35 |
Control: 0.15 + 0.5 + 0.35 = 1.
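The calculation of relative frequencies and the control sum can be sketched as below. The original frequency table is not preserved in these notes, so the absolute frequencies 3, 10 and 7 are hypothetical values chosen only because they reproduce the relative frequencies 0.15, 0.5 and 0.35 given above.

```python
# Hypothetical absolute frequencies n_i (assumed, not taken from the notes).
abs_freqs = [3, 10, 7]

n = sum(abs_freqs)                      # sample size
rel_freqs = [m / n for m in abs_freqs]  # relative frequency = n_i / n

# Control: the relative frequencies must add up to 1.
control = sum(rel_freqs)
print(rel_freqs, control)
```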
A discrete series can be represented graphically. In a rectangular Cartesian coordinate system, the points with coordinates (x_i, n_i) or (x_i, ω_i) are marked and connected by straight line segments. The resulting broken line is called a frequency polygon.
Construct a discrete variation series (DVR) and draw a distribution polygon for 45 applicants according to the number of points they received in the entrance exams:
39 41 40 42 41 40 42 44 40 43 42 41 43 39 42 41 42 39 41 37 43 41 38 43 42 41 40 41 38 44 40 39 41 40 42 40 41 42 40 43 38 39 41 41 42.
Solution: To construct the variation series, we arrange the distinct values of the feature x (variants) in ascending order and write the frequency of each value under it.
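The counting step can be sketched with `collections.Counter` applied to the 45 scores listed in the problem statement. The variable names are illustrative.

```python
from collections import Counter

scores = [39, 41, 40, 42, 41, 40, 42, 44, 40, 43, 42, 41, 43, 39, 42,
          41, 42, 39, 41, 37, 43, 41, 38, 43, 42, 41, 40, 41, 38, 44,
          40, 39, 41, 40, 42, 40, 41, 42, 40, 43, 38, 39, 41, 41, 42]

# Count how often each score occurs, then list the variants in ascending order.
counts = Counter(scores)
series = sorted(counts.items())

for x, n_i in series:
    print(x, n_i)
```

The pairs (x_i, n_i) produced this way are the points of the frequency polygon; the most frequent score is 41, which occurs 12 times.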
Let's build a polygon of this distribution:
Fig. 13.1. Frequency polygon
An interval variation series is used for a large number of observations. To build such a series, one chooses the number of intervals of the feature and sets the interval width. The more groups there are, the narrower the interval. The number of groups in a variation series can be found with the Sturges formula $k = 1 + 3.322 \lg n$ (k is the number of groups, n is the sample size), and the interval width is
$h = \frac{x_{max} - x_{min}}{k} = \frac{R}{k}$,
where x_max is the maximum and x_min the minimum value of the variant, and their difference R is called the range of variation.
We study a sample of 100 people from the totality of all students of a medical university.
Solution: Calculate the number of groups: $k = 1 + 3.322 \lg 100 = 7.644$. Thus, to compile an interval series, it is best to divide this sample into 7 or 8 groups. The set of groups into which the observation results are divided, together with the frequencies with which the results fall into each group, is called an aggregate.
A histogram is used to visualize a statistical distribution.
Frequency histogram - a stepped figure consisting of adjacent rectangles built on the same straight line, whose bases are all equal to the interval width and whose heights are equal to either the frequency of falling into the interval or the relative frequency ω_i.
Observations of the number of particles registered by a Geiger counter per minute gave the following results:
21 30 39 31 42 34 36 30 28 30 33 24 31 40 31 33 31 27 31 45 31 34 27 30 48 30 28 30 33 46 43 30 33 28 31 27 31 36 51 34 31 36 34 37 28 30 39 31 42 37.
Based on these data, build an interval variation series with equal intervals (interval I: 20-24; interval II: 24-28, etc.) and draw a histogram.
Solution: n = 50
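The grouping of the 50 counts into intervals of width 4 starting at 20 can be sketched as follows. The notes do not say which interval a boundary value such as 24 or 28 belongs to; the sketch assumes each interval includes its left end and excludes its right end, and a different convention would shift some frequencies.

```python
counts_per_minute = [
    21, 30, 39, 31, 42, 34, 36, 30, 28, 30,
    33, 24, 31, 40, 31, 33, 31, 27, 31, 45,
    31, 34, 27, 30, 48, 30, 28, 30, 33, 46,
    43, 30, 33, 28, 31, 27, 31, 36, 51, 34,
    31, 36, 34, 37, 28, 30, 39, 31, 42, 37,
]

start, width = 20, 4  # first interval is 20-24, as given in the task

# Number of intervals needed to cover the largest observation.
k = (max(counts_per_minute) - start) // width + 1

# Assumed convention: interval i covers [start + i*width, start + (i+1)*width).
freqs = [0] * k
for x in counts_per_minute:
    freqs[(x - start) // width] += 1

intervals = [(start + i * width, start + (i + 1) * width) for i in range(k)]
for (lo, hi), m in zip(intervals, freqs):
    print(f"{lo}-{hi}: {m}")
```

Under this convention the eight intervals from 20-24 to 48-52 receive the frequencies 1, 4, 22, 8, 7, 4, 2 and 2, which are the heights of the histogram rectangles.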
The histogram of this distribution looks like:
Fig. 13.2. Distribution histogram
Task options
№ 13.1. The mains voltage was measured every hour. The following values were obtained (V):
227 219 215 230 232 223 220 222 218 219 222 221 227 226 226 209 211 215 218 220 216 220 220 221 225 224 212 217 219 220.
Build a statistical distribution and draw a polygon.
№ 13.2. Observations of blood sugar in 50 people gave the following results:
3.94 3.84 3.86 4.06 3.67 3.97 3.76 3.61 3.96 4.04
3.82 3.94 3.98 3.57 3.87 4.07 3.99 3.69 3.76 3.71
3.81 3.71 4.16 3.76 4.00 3.46 4.08 3.88 4.01 3.93
3.92 3.89 4.02 4.17 3.72 4.09 3.78 4.02 3.73 3.52
3.91 3.62 4.18 4.26 4.03 4.14 3.72 4.33 3.82 4.03
Based on these data, build an interval variation series with equal intervals (I - 3.45-3.55; II - 3.55-3.65, etc.) and depict it graphically as a histogram.
№ 13.3. Construct a frequency polygon for the distribution of erythrocyte sedimentation rate (ESR) in 100 people.
Data obtained in an experiment are characterized by variability, which can be caused by random error: the error of the measuring device, the heterogeneity of the samples, etc. After collecting a large amount of homogeneous data, the experimenter needs to process it in order to extract the most accurate information about the quantity under consideration. Methods of mathematical statistics are convenient for processing large arrays of measurements, observations, etc. obtained during an experiment.
Mathematical statistics is inextricably linked with probability theory, but there is a significant difference between these sciences. Probability theory works with already known distributions of random variables, from which the probabilities of events, mathematical expectations, etc. are calculated. The problem of mathematical statistics is to obtain the most reliable information about the distribution of a random variable from experimental data.
Typical areas of mathematical statistics:
Methods of estimation and hypothesis testing are based on probabilistic and hyper-random models of the origin of the data.
Mathematical statistics estimates parameters and functions of them that represent important characteristics of distributions (median, mathematical expectation, standard deviation, quantiles, etc.), as well as density and distribution functions. Both point and interval estimates are used.
Modern mathematical statistics contains a large section, sequential statistical analysis, in which the array of observations may be built up sequentially.
Mathematical statistics also contains a general theory of hypothesis testing and a large number of methods for testing specific hypotheses (for example, about the symmetry of a distribution, about the values of parameters and characteristics, about the agreement of the empirical distribution function with a given distribution function, about homogeneity (coincidence of characteristics or distribution functions in two samples), etc.).
An important branch of mathematical statistics deals with conducting sample surveys, which involves the properties of different sampling schemes and the construction of adequate methods for estimation and hypothesis testing. The methods of mathematical statistics rely directly on the following basic concepts.
Definition 1
A sample is the data obtained in the course of an experiment.
For example, the measured ranges of bullets fired from the same gun or from a group of guns of the same type.
Remark 1
The distribution function makes it possible to express all the most important characteristics of a random variable.
Mathematical statistics distinguishes between the theoretical (not known in advance) and the empirical distribution function.
The empirical function is determined from experimental (empirical) data, i.e. from the sample.
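A common way to define the empirical distribution function is F_n(x) = (number of sample values not exceeding x) / n. The sketch below implements this definition; the five-element sample is hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n, where F_n(x) is the share of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return F

F = empirical_cdf([3, 1, 2, 2, 5])  # hypothetical sample
print(F(0), F(2), F(2.5), F(5))
```

The function is a step function that rises from 0 to 1, jumping at each observed value by the relative frequency of that value.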
Histograms are used to provide a visual, but rather approximate, representation of an unknown distribution.
A histogram is a graphical representation of the distribution of data.
To obtain a high-quality histogram, the following rules are observed:
If the sample is very large, the range of the sample elements is often divided into equal parts.
Using these concepts, one can obtain an estimate of the necessary numerical characteristics of an unknown distribution without resorting to the construction of a distribution function, a histogram, etc.