Supplementary exercises for Unit 5: Technology and You (Violet), 2024

  • 1. “I am the wisest man alive, for I know one thing, and that is that I know nothing.” - Plato, The Republic
  • 2. Data play the role of raw material for any statistical investigation and can be defined in a single sentence as “The values of different objects collected in a survey or recorded values of an experiment over a time period taken together constitute what we call data in Statistics.” Each value in the data is known as an observation. Based on the characteristic, the nature of the characteristic, the level of measurement, time and the way of obtaining it, statistical data may be classified as follows:
  • 4. The name quantitative itself suggests that such data are related to quantity. Data are said to be quantitative if a numerical quantity (which exactly measures the characteristic under study) is associated with each observation. Generally, interval or ratio scales are used as the scale of measurement for quantitative data. Characteristics such as weight, height, age, length, area, volume, money, temperature, humidity and size generally give quantitative data. For example, (i) weights in kilograms (say) of the students of a class; (ii) heights in centimetres (say) of the candidates appearing in a direct recruitment of the Indian Army organised by a particular cantonment; (iii) ages of females at the time of marriages celebrated over a period of a week in Delhi; (iv) lengths (in cm) of different tables in a furniture showroom.
  • 5. The name qualitative itself suggests that such data are related to the quality of an object or thing. Quality cannot be measured numerically in exact terms. Thus, if the characteristic (attribute) under study can be measured only on the basis of its presence or absence, the data so obtained are known as qualitative data. Generally, nominal and ordinal scales are used as the scale of measurement for qualitative data. Characteristics such as gender, marital status, qualification, colour, religion, satisfaction, types of trees, beauty and honesty generally give qualitative data. For example, (i) if the characteristic under study is gender, objects can be divided into two categories, male and female; (ii) if the characteristic under study is marital status, objects can be divided into four categories: married, unmarried, divorcee and widower; (iii) if the characteristic under study is qualification, say ‘matriculation’, objects can be divided into two categories, ‘matriculation passed’ and ‘not passed’; (iv) if the characteristic under study is colour, objects can be divided into a number of categories such as violet, indigo, blue, green, yellow, orange and red.
  • 6. If the nature of the characteristic under study is such that the values of the observations are at most countable between two given limits, the corresponding data are known as discrete data. For example, (i) the number of books on a shelf of an almirah in a library forms discrete data, because the number of books may be 0, 1, 2, 3, … but cannot take values such as 0.8, 1.32 or 1.53245. (ii) If there are 30 students in a class, the number of students present in a lecture forms discrete data, because it may be 0, 1, 2, …, 30 but cannot take values between 0 and 30 such as 1.8675, 22.56 or 29.95. (iii) The number of children in a family in a locality forms discrete data, because it may be 0, 1, 2, 3, 4, … but cannot be 2.3 or 3.75. (iv) The number of mistakes on a particular page of a book may be 0, 1, 2, 3, … but cannot be 6.74 or 3.9832.
  • 7. Data are said to be continuous if the measurement of the observations of a characteristic under study may take any real value between two given limits. For example, (i) data obtained by measuring the heights of the students of a class of, say, 30 students form continuous data: if the minimum and maximum heights are 152 cm and 175 cm, the heights of the students may take any value between 152 cm and 175 cm, such as 152.2375 cm or 160.31326… cm. (ii) Data obtained by measuring the weights of the students of a class also form continuous data, because the weights may be 48.25796… kg, 50.275 kg, 42.314314314… kg, etc.
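The contrast between discrete and continuous data can be made concrete with a small sketch. It reuses the examples above (students present out of 30, heights between 152 cm and 175 cm); the helper name `height_possible` and its default limits are illustrative assumptions, not part of the original material.

```python
# Discrete: the number of students present out of 30 can only be one of
# finitely many whole numbers, 0, 1, ..., 30.
present_values = range(0, 31)
print(len(present_values))                             # 31
print(22 in present_values, 22.56 in present_values)   # True False


def height_possible(h, low=152.0, high=175.0):
    """Continuous: every real value between the two limits is a possible height.

    The limits 152 cm and 175 cm are the hypothetical class extremes used above.
    """
    return low <= h <= high


print(height_possible(160.31326), height_possible(152.2375))  # True True
```

The discrete case is modelled by a finite `range` of counts, while the continuous case can only be expressed as an interval test, because no list could enumerate every possible height.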
  • 8. Collection of data is done to serve a purpose in hand. The purpose may be connected with time, with geographical location or with both. If the purpose of data collection is connected with time, the data are known as time series data. That is, in time series data time is one of the main variables, and the data, usually collected at regular intervals of time, show how the characteristic(s) under study change over time. For example, the quarterly profit of a company for the last eight quarters, the yearly production of a crop in India for the last six years, the yearly expenditure of a family on different items for the last five years and the weekly rate of inflation for the last ten weeks all form time series data. If the purpose of data collection is connected with geographical location, the data are known as spatial data. For example, (i) the price of petrol in Delhi, Haryana, Punjab and Chandigarh at a particular time; (ii) the number of runs scored by a batsman in different matches of a one-day series played in different stadiums. If the purpose of data collection is connected with both time and geographical location, the data are known as spatio-temporal data. For example, data on the population of different states of India in 2001 and 2011 are spatio-temporal data. In time series, spatial and spatio-temporal data the concept of frequency has no significance, and hence such data are known as non-frequency data. For instance, in the time series example above, an expenditure of Rs 40000 on food in 2006 is important in itself; its frequency (say, 3, i.e. repeated three times) does not make any sense. Now consider the marks (out of 10, say) of 40 students in a class. More than one student may score the same marks in the test. If, out of 40 students, 5 score 10 out of 10, then the mark 10 has frequency 5. Data of this type, where frequency is meaningful, are known as frequency data.
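For frequency data, the basic operation is counting how often each value occurs. The sketch below builds a frequency table for the marks of 40 students out of 10; the marks list is invented for illustration (hypothetical), chosen only so that 5 students score 10, as in the example above.

```python
from collections import Counter

# Hypothetical marks (out of 10) of 40 students, invented for illustration.
marks = [10] * 5 + [9] * 7 + [8] * 9 + [7] * 8 + [6] * 6 + [5] * 3 + [4] * 2


def frequency_table(values):
    """Return (value, frequency) pairs, highest value first."""
    counts = Counter(values)          # value -> number of times it occurs
    return sorted(counts.items(), reverse=True)


for mark, freq in frequency_table(marks):
    print(f"mark {mark:2d}: frequency {freq}")
```

The frequencies add up to the number of students (40), which is a quick consistency check on any frequency table.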
  • 9. Sometimes we are interested to know how a characteristic under study (such as income, expenditure, population or votes in an election) is distributed at one point in time over different subjects (such as families, countries or political parties). Data of this type, collected at one point in time, are known as cross-sectional data. For example, the annual income of different families of a locality, a survey of consumers' expenditure conducted by a research scholar, opinion polls conducted by an agency and the salaries of all employees of an institute are cross-sectional data.
  • 10. Data which are collected by an investigator, agency or institution for a specific purpose, and which are first used by those same people, are called primary data. That is, these data are originally collected by these people and they are the first to use them. For example, suppose a research scholar wants to know the mean age of the students of M.Sc. Chemistry of a particular university, and collects the age of each such student by contacting each student personally. The data so obtained are primary data for that research scholar. There are a number of methods of collecting primary data, and the choice depends on factors such as the geographical area of the field, the money available, the time period, the accuracy needed and the literacy of the respondents/informants. Here we discuss only the following commonly used methods: (1) Direct Personal Investigation Method, (2) Telephone Method, (3) Indirect Oral Interviews Method, (4) Local Correspondents Method, (5) Mailed Questionnaires Method and (6) Schedules Method. Let us discuss these methods one by one with some examples, merits and demerits.
  • 11. The discussion in the previous section shows that the collection of primary data requires a lot of time, money, manpower, etc. Sometimes some or all of these resources are not sufficient for collecting primary data, and in some situations it may not be feasible to collect primary data easily. To overcome these difficulties, another kind of data, known as secondary data, can be used. Data obtained or gathered by an investigator, agency or institution from a source which already exists are called secondary data. That is, these data were originally collected by an investigator, agency or institution, have been used by them at least once, and are now going to be used at least a second time. Existing data in different sources may be in published or unpublished form, so sources of secondary data can broadly be classified under the following two heads. (1) Published Sources: When an institution or organisation publishes its own collected data (primary data) in the public domain, in printed or electronic form, these data are said to be secondary data in published form, and the place where they are available is known as the published source of the secondary data of that institution or organisation. Some published sources of secondary data are: International Publications; Government Publications in India; Published Reports of Commissions and Committees; Research Publications; Reports of Trade and Industry Associations; Published Printed Sources; Published Electronic Sources.
  • 12. (2) Unpublished Sources: Information collected in terms of data, or data observed through own experience by an individual or an organisation, which is in unpublished form is known as an unpublished source of secondary data. For example, (i) records and statistics maintained by different institutions or organisations, whether government or non-government; (ii) unpublished project works, field works or other research-related works submitted by students in their institutes; (iii) records of the Central Bureau of Investigation; (iv) personal diaries, etc.
  • 13. The words “counting” and “measurement” are used very frequently by everybody. For example, if you want to know the number of pages in a notebook, you can easily count them; if you want to know the height of a man, you can easily measure it. In Statistics, however, the act of counting and measurement is divided into four levels of measurement scales: nominal, ordinal, interval and ratio. (1) Nominal Scale: In Latin, ‘nomen’ means name, and the word nominal comes from this Latin word. Under the nominal scale we therefore divide the objects under study into two or more categories by giving them unique names. The classification is done in such a way that (a) each object falls in exactly one category, i.e. it either belongs to a given category or it does not; mathematically, we may use the symbols “=” and “≠” to express whether or not an object falls in a category; and (b) the number of categories is sufficient to include all objects, i.e. there is no scope for even a single object to fall outside all the categories. In statistical language, the categories must be mutually exclusive and exhaustive. Generally, the nominal scale is used when we want to categorise data based on characteristics such as gender, race, region and religion.
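The two requirements of a nominal classification, mutually exclusive and exhaustive categories, can be checked mechanically. The sketch below assumes each category is given as a set of objects; the students, regions and function names are hypothetical. It also computes the mode, the average that needs nothing beyond counting.

```python
def check_nominal_categories(objects, categories):
    """Raise ValueError unless every object falls in exactly one category.

    `categories` maps each category name to the set of objects placed in it.
    """
    for obj in objects:
        homes = [name for name, members in categories.items() if obj in members]
        if not homes:
            raise ValueError(f"{obj!r} is in no category (not exhaustive)")
        if len(homes) > 1:
            raise ValueError(f"{obj!r} is in {homes} (not mutually exclusive)")


def nominal_mode(categories):
    """Category with the most members; ties go to the first category listed."""
    return max(categories, key=lambda name: len(categories[name]))


# Hypothetical students classified by region, a nominal characteristic.
students = ["Asha", "Bala", "Chen", "Dev", "Esha", "Farid"]
region = {
    "North": {"Asha", "Dev", "Farid"},
    "South": {"Bala", "Esha"},
    "East": {"Chen"},
}
check_nominal_categories(students, region)  # no error: exclusive and exhaustive
print(nominal_mode(region))                 # North
```

Adding a student who appears in no set, or in two sets, makes `check_nominal_categories` raise `ValueError`, which corresponds to a violation of condition (b) or (a) respectively.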
  • 14. (2) Ordinal Scale: We have seen that order does not make any sense in the nominal scale. As the name ordinal itself suggests, in addition to the names or codes given to the different categories, this scale also provides an order among the categories. That is, using the ordinal scale we can place the objects in a series based on the orders or ranks given to them. However, we cannot find the actual difference between two categories. Generally, the ordinal scale is used when we want to measure attitude scores such as the level of liking, satisfaction or preference. Different designations in an institute can also be measured on an ordinal scale. For example, suppose a schoolboy is asked to list three ice-cream flavours according to his preference, and he lists them in the order (1) vanilla, (2) strawberry, (3) tutti-frutti. This indicates that he likes vanilla more than strawberry and strawberry more than tutti-frutti, but the actual difference in his liking between vanilla and strawberry cannot be measured. Similarly, under the Sixth Pay Commission, teachers of colleges and universities are designated as Assistant Professor, Associate Professor and Professor. The rank of Professor is higher than that of Associate Professor, and the designation of Associate Professor is higher than that of Assistant Professor. But you cannot find the actual difference between any two of these designations, because one teacher in a designation might have served a certain number of years and done good-quality research, while another teacher in the same designation might have served fewer years and done unsatisfactory research. So one teacher may be very near to the next higher designation while another is very far from it, depending on the quality of their teaching and research.
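On an ordinal scale, a median can be found by sorting alone, because only the order of categories is used. The sketch below applies this to the designations above; the faculty list is hypothetical, and rejecting even-sized lists is a design choice assumed here, since "halfway between two designations" has no meaning when differences cannot be measured.

```python
# Designations ordered from lowest to highest; the positions encode order only.
DESIGNATIONS = ["Assistant Professor", "Associate Professor", "Professor"]
RANK = {name: i for i, name in enumerate(DESIGNATIONS)}


def ordinal_median(values, order):
    """Middle category of an odd-sized list, found purely by sorting on `order`."""
    if len(values) % 2 == 0:
        raise ValueError("use an odd number of values so the median is one category")
    ranked = sorted(values, key=lambda v: order[v])
    return ranked[len(ranked) // 2]


# Hypothetical faculty of five teachers.
faculty = ["Professor", "Assistant Professor", "Associate Professor",
           "Assistant Professor", "Professor"]
print(ordinal_median(faculty, RANK))  # Associate Professor
```

Note that `RANK["Professor"] - RANK["Associate Professor"]` evaluates to 1, but that number carries no meaning: it is a position in the order, not a measured gap.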
  • 15. (3) Interval Scale: If I = [4, 9], then the length of this interval is 9 − 4 = 5, i.e. the difference between 4 and 9 is 5; similarly, we can find the difference between any two points of the interval, e.g. the difference between 7 and 7.3 is 0.3. Thus the property of difference holds for intervals. Likewise, the third level of measurement, the interval scale, possesses the property of difference, which is not satisfied by the nominal and ordinal scales. The nominal scale only gives names to the different categories; the ordinal scale, moving one step further, also provides an order between the categories; and the interval scale, moving one step beyond the ordinal scale, also provides the difference between any two categories. The interval scale is used when we want to measure years (historical or calendar time), temperature (except on the Kelvin scale), sea level, marks in tests with negative marking, etc. Mathematically, this scale admits + and − in addition to >, <, = and ≠. Let us consider an example: the measurement of the time of a historical event falls under the interval scale because there is no fixed origin of time (i.e. no fixed ‘0’ year). The ‘0’ year differs from calendar to calendar and from society to society; e.g. the Hindu, Muslim and Hebrew calendars have different origins of time, i.e. a common ‘0’ year is not defined. In Indian history also, we may find BC (Before Christ).
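Why an interval scale supports differences but not ratios can be seen by re-expressing the same temperatures with a different origin. The sketch below uses the standard Celsius-to-Fahrenheit conversion F = 9/5 · C + 32; the two sample temperatures are arbitrary illustrative values.

```python
def celsius_to_fahrenheit(c):
    """F = 9/5 * C + 32: a change of unit and also a change of origin."""
    return 9 / 5 * c + 32


a_c, b_c = 10.0, 20.0                     # two temperatures in degrees Celsius
a_f = celsius_to_fahrenheit(a_c)          # 50.0
b_f = celsius_to_fahrenheit(b_c)          # 68.0

# Differences survive the change of scale (up to the unit factor 9/5) ...
print(b_c - a_c, b_f - a_f)               # 10.0 18.0
# ... but ratios do not, so "20 °C is twice as hot as 10 °C" is not valid.
print(b_c / a_c, b_f / a_f)               # 2.0 1.36
```

The difference 18 °F is exactly 9/5 × 10 °C, so statements about differences are consistent across both scales, while the ratio changes from 2.0 to 1.36 simply because the zero point moved.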
  • 16. (4) Ratio Scale: The ratio scale is the highest level of measurement. The nominal scale only gives names to the different categories; the ordinal scale provides an order between categories in addition to names; the interval scale provides differences between categories in addition to names and order; and the ratio scale, besides names, order and differences, also provides a natural zero (absolute zero). On a ratio scale the values of the characteristic cannot be negative. The ratio scale is used when we want to measure temperature in kelvin, weight, height, length, age, mass, time, plane angle, etc. Mathematically, it admits × and ÷ in addition to +, −, >, <, = and ≠. But be careful never to take ‘0’ in the denominator while finding ratios; for example, 4/0 is meaningless. Let us consider some examples. Temperature measured on the Kelvin scale falls under the ratio scale because the Kelvin scale has an absolute zero, 0 K, which is equivalent to −273.15 °C. This origin allows statements such as “50 K (read as 50 kelvin) is five times as hot as 10 K”. Both the height (in cm) and the age (in days) of the students of M.Sc. Statistics of a particular university satisfy all the requirements of a ratio scale, because neither can be negative, i.e. both have an absolute zero.
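The absolute zero of the Kelvin scale is what makes ratios meaningful. The sketch below converts Celsius to kelvin using 0 K = −273.15 °C and computes ratios with an explicit guard against a zero denominator; the function names are illustrative, not from any library.

```python
ABSOLUTE_ZERO_C = -273.15  # 0 K expressed in degrees Celsius


def celsius_to_kelvin(c):
    """Move the origin to absolute zero; one kelvin step equals one Celsius degree."""
    k = c - ABSOLUTE_ZERO_C
    if k < 0:
        raise ValueError("temperature below absolute zero")
    return k


def ratio(a, b):
    """a / b, meaningful only for ratio-scale values with a nonzero denominator."""
    if b == 0:
        raise ZeroDivisionError("0 in the denominator: the ratio is meaningless")
    return a / b


print(ratio(50, 10))          # 5.0: 50 K is five times 10 K
print(celsius_to_kelvin(0))   # 273.15
# Measured from the true zero, 20 °C is only about 1.035 times 10 °C, not twice.
print(ratio(celsius_to_kelvin(20), celsius_to_kelvin(10)))
```

The last line shows how the naive Celsius ratio of 2 disappears once both temperatures are measured from absolute zero.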
  • 17. Scales of measurement, the statistical tools appropriate to each, and the logic/reason:
    ◦ Nominal scale: mode, chi-square test and run test. Logic/reason: counting is the only permissible operation.
    ◦ Ordinal scale: median and all positional averages such as quartiles, deciles and percentiles; Spearman's rank correlation. Logic/reason: besides counting, an order relation (less than or greater than) also exists.
    ◦ Interval scale: mean, S.D., t-test, F-test, ANOVA, sample multiple and moment correlations, regression. Logic/reason: counting, order and difference operations hold.
    ◦ Ratio scale: geometric mean (G.M.), harmonic mean (H.M.), coefficient of variation. Logic/reason: counting, order, difference and a natural zero exist.
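Because each level of measurement keeps every property of the levels below it, the tools listed above can be read as cumulative. The sketch below encodes that reading as an assumption (a higher scale inherits all tools of lower scales); the `Scale` enum and `permissible_tools` helper are hypothetical names, and the tool lists are abbreviated from the table.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Levels of measurement, lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Tools first permitted at each level (abbreviated from the table).
TOOLS_INTRODUCED = {
    Scale.NOMINAL: ["mode", "chi-square test", "run test"],
    Scale.ORDINAL: ["median", "quartiles", "deciles", "percentiles",
                    "Spearman's rank correlation"],
    Scale.INTERVAL: ["mean", "standard deviation", "t-test", "F-test", "ANOVA",
                     "correlation", "regression"],
    Scale.RATIO: ["geometric mean", "harmonic mean", "coefficient of variation"],
}


def permissible_tools(scale):
    """All tools usable at `scale`, assuming each level inherits the lower levels' tools."""
    return [tool for level in Scale if level <= scale
            for tool in TOOLS_INTRODUCED[level]]


print(permissible_tools(Scale.ORDINAL))
```

Under this assumption the mean is not offered for ordinal data, while the mode remains available at every level, which matches the logic column: each scale adds an operation without removing any.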