For a 2×2 table, the %by category agreement is the % positive and the % negative, and the symmetry test is reduced to the McNemar test . The Kappa is not weighted and therefore treats all categories in the same way at odds. For a detailed explanation of the statistics of agreements and symmetry tests, see the [5, Ch.8 – 11]. If the B test is a gold standard, the accuracy of the diagnostic tests is measured by (simplified to 2×2 tables): PPV – N22/N-2, NPV – N11/N-1, sensitivity – N22/N2 – and specificity – N11/N1. The discovery of new biomarkers, the development of new diagnostic tests for these biomarkers and the validation of new diagnostic tests are fundamental objectives in medical research. Improving this “biomarker pipeline” [1, 2, 3] to quickly provide validated and useful diagnostic tests is one of the top research priorities. One of the keys to improving the pipeline is the effective implementation and use of large sample archives to enable the analysis of new diagnostic tests and to compare them to existing diagnostic tests . As a general rule, all existing diagnostic tests have already been performed on all samples. When a gold standard is available, standard diagnostic accuracy statistics are sensitivity, specificity, positive forecast value (APP) and negative forecast value (NPV) . In the fields of science and technology, the accuracy of a measurement system is the degree of proximity of measurements from a quantity to the actual value of that quantity.  The accuracy of a measurement system that relates to reproducibility and repeatability is the degree to which repeated measurements under unchanged conditions show the same results.   Although the two words are commonly synonymous with precision and precision, they are deliberately contrasted in the context of the scientific method. Accuracy is the proximity of a measure to fair value for this measurement.
The accuracy of a measurement system refers to the proximity of the concordance between repeated measurements (repeated under the same conditions). Measurements can be both accurate and precise, accurate, but not precise, accurate, but not accurate, but not accurate or not. In addition to the possibilities of the computer agreement that I list above, I could also use the type classification (Chapter 1) to create neural activity maps for the three of you (them, your friend, man) and see if the cards are ranked in the same way. No matter how we measure, our questions are really about collective intentionality. The objective of the Community Access to Cervical Health (CATCH) study was to evaluate the diagnostic accuracy of three different methods of cervical cancer screening (pap smear, HPV DNA testing and visual inspection of acetic acid) for the detection of cervical neoplastic 2 or less (CIN2 or “cervical precanceurry”) . In this study, 2,331 women living in Ranga Reddy district, In the Indian state of Andra Pradesh, underwent all three test for the bone collar. Women who tested positive in one of the three tests, as well as a random sample of 1/5 of all women, were asked to submit to the gold standard of colposcopic biopsy in order to definitively determine the presence of CIN2. In addition, 670 of the 1052 women who are expected to undergo a colpososcopic biopsy determined the presence of CIN2. Although women were allowed to refuse the biopsy, acceptance of the biopsy is reasonable in the screening layers, as there are no outward signs of cervical cancer that this woman could use to make her gold standard status too divine, and cervical cancer (which could cause tangible symptoms) was rare (only 4 cases were detected). We show three examples that illustrate the knowledge we have gained, the resources we have saved, the problems we have encountered in choosing a subsampling scheme and the ease of practical application of our methods.