Toulouse-Lancaster: a recollection of the ESRC-CNRS funded research project in statistics (1984-1986)

Antoine de Falguerolles
Institut de Mathématiques de Toulouse, Statistique et Probabilités
14 April 2010

Twenty five years later

$2010 - \frac{1984 + 1986}{2} = 25$

Google for "Twenty five years after".

Twenty five years later, a new Musketeer book between The Round Table of the Musketeers and The Man in the Iron Mask.

Two decades and a half have passed since the famous swordsmen triumphed over Cardinal Analyse des Données and Milady Modelling in The Round Table of the Musketeers. Time has not weakened their resolve, nor dispersed their loyalties. But treasons and stratagems still cry out for justice: European regulation on Rosé wine endangers the throne of France, while in England, Brown promises to rebuild the economy, renew society and restore faith in politics. Today, the Royal Statistical Society brings its immortal Companies of Musketeers out of world-wide dispersion to cross swords with time, the malevolence of non-statisticians, and the forces of history. But their greatest test is the titanic struggle with the son of Milady, who wears the face of evil.
Who are the Musketeers? Who is the devilish son of Milady? Who is Count d'Artagnan?
- The Musketeers: the aficionados of statistical modelling!
- The devilish son of Milady: some dear colleague!
- Count d'Artagnan: see next

ESRC/CNRS FRANCO-BRITISH PROGRAMME

From left to right: Musketeers Francis, Baccini (?) or Saint Pierre (?), Hinde, and Carlier. Notice the different tunics.

"The research project has the general aim of comparing and evaluating the distinct French and British approaches to data analysis through the analysis of a number of complex data sets, and determining to what extent they are complementary rather than competitive."

- Analyse des données / Modélisation statistique
- Exploratory Analysis / Statistical modelling
- e.g.: SVD of data matrices / Statistical models for multidimensional arrays

Plan
- Twenty five years later
- Some whys
  - Why this theme?
  - Why Toulouse?
  - The influential Edmond Lisle
- Immediate Results
  - The round table
- Further Results
  - Alain Baccini
  - Henri Caussinus
  - Jean-René Mathieu
- Examples
  - CA and link functions
  - Retinal convergence
- End

Why this theme?
- It seems widely accepted that there was such a thing as a French approach in the Eighties.
- To what extent did this "School" dominate statistical data analysis in France?
- To what extent was this "School" seen outside France as dominating statistical data analysis in France?

These questions (and many more) are discussed:
- In a special issue of the Journal Electronique d'Histoire des Probabilités et de la Statistique, http://www.jehps.net/decembre2008.html. Some Matériaux pour l'histoire de l'analyse des données have been organised by Ludovic Lebart. They include an introduction by Ludovic Lebart, a series of articles (John Gower, Fionn Murtagh, Michel Armatte, Alain Desrosieres, Willem J. Heiser, Antoine de Falguerolles, Alfredo Rizzi, Hans-Hermann Bock, Boris Mirkin and Ilya Muchnik), and some texts and documents from Jean-Paul Benzécri, Henry Rouanet and Dominique Lepine, Noboru Ohsumi.
- Henri Caussinus (2002): Some concluding observations, in Annales de la Faculté des Sciences de Toulouse, Vol. XI, n° 4, 2002, pp. 587-591. www.stat.cmu.edu/~fienberg/ToulouseAnnales-4-2002/Conclusion.pdf

The debate on whether to put the emphasis on DATA or on MODEL is recurrent.
- Ernest Fournier de Flaix, commenting on some of the talks given at the Jubilee meeting (1885) of the Royal (London) Statistical Association: "[. . . ] It is the translation into graphical curves of probability calculations, but probability calculations are one of the dangers of statistics. [. . . ]" Ernest Fournier de Flaix: Le jubilee-volume de la Société de Statistique de Londres, Journal de la Société de Statistique de Paris, vol. 27, 1886, p. 222-223.
- Same Journal in 1897 and 1898: IWLS (Vilfredo Pareto); Gamma distribution (Lucien March)!

Fournier de Flaix, in translation: "Should not the same reservations be made about the application of mathematical formulae, accessible to so few people, to the results of statistics? Proof of this can be found in a memoir by Mr Galton on the application of the graphical method to the measurement of error. It is the translation into graphical curves of probability calculations, but probability calculations are one of the dangers of statistics. These calculations have seduced more than one economist, more than one statistician, such as Stanley Jevons, exposing them more than once to being contradicted by the facts. This is what happened in the question of money."

Why Toulouse?
The Laboratoire de Statistique et Probabilités, Faculté des Sciences, Université de Toulouse
- founded in the Fifties by Roger Huron (1913-1997), a mathematician and a medical doctor
  - Roger Huron (1958): Méthode générale d'estimation de la fréquence des gènes. Application aux groupes sanguins. Annales de la faculté des sciences de Toulouse, Sér. 4, 22, p. 159-173.
- then chaired by Henri Caussinus (1972-1988)
  - Henri Caussinus, Contribution à l'analyse statistique des tableaux de corrélation. Annales de la faculté des sciences de Toulouse, Sér. 4, 29 (1965), p. 77-183.
- Jean-René Mathieu (1988-1996)
- Gérard Letac (1997-1998), . . . until 2007.

In the fifties, Roger Huron used the EM algorithm (before it was known under this name) to model the mixing of genes in various populations sampled in various countries:
- Observed frequencies of phenotypes
- Estimated frequencies of genotypes, given the data on phenotypes and the current estimates of the model parameters
- Revised estimates of the model parameters, given the estimated frequencies of genotypes

See also: Huron (Roger) and Ruffié (Jacques), Les méthodes en génétique générale et en génétique humaine, Paris: Masson et Cie, 1958.

- COMPSTAT'1982 held in Toulouse.
  - Murray Aitkin (1982): Logit Models for the analysis of a very Large Survey of Unemployment in France, COMPSTAT'82, Part II, p. 9-10, Wien: Physica-Verlag.
- 11th Biometric Conference also held in Toulouse in 1982
  - Murray Aitkin (1982): ?
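The three-step scheme attributed above to Roger Huron (observed phenotypes, expected genotypes, revised parameters) can be illustrated with a minimal sketch. The choice of the ABO blood-group system, the Hardy-Weinberg assumption, the function name `em_abo` and the counts are all illustrative assumptions made here; they are not taken from Huron's paper, and the code is not claimed to reproduce his method.

```python
def em_abo(n_a, n_b, n_ab, n_o, max_iter=1000, tol=1e-12):
    """Gene-counting EM for a three-allele system (alleles A, B, O).

    Inputs are observed phenotype counts. Returns estimated allele
    frequencies (p, q, r) for A, B and O, assuming Hardy-Weinberg
    proportions (an assumption of this sketch).
    """
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0  # arbitrary starting values
    for _ in range(max_iter):
        # E-step: expected genotype counts given the phenotype counts
        # and the current parameter values.
        n_aa = n_a * p * p / (p * p + 2.0 * p * r)  # phenotype A is AA or AO
        n_ao = n_a - n_aa
        n_bb = n_b * q * q / (q * q + 2.0 * q * r)  # phenotype B is BB or BO
        n_bo = n_b - n_bb
        # M-step: count alleles among the 2n genes carried by n individuals.
        p_new = (2.0 * n_aa + n_ao + n_ab) / (2.0 * n)
        q_new = (2.0 * n_bb + n_bo + n_ab) / (2.0 * n)
        r_new = (2.0 * n_o + n_ao + n_bo) / (2.0 * n)
        change = abs(p_new - p) + abs(q_new - q) + abs(r_new - r)
        p, q, r = p_new, q_new, r_new
        if change < tol:
            break
    return p, q, r


# Hypothetical counts, built to match p = 0.3, q = 0.1, r = 0.6 exactly.
p_hat, q_hat, r_hat = em_abo(n_a=450, n_b=130, n_ab=60, n_o=360)
print(round(p_hat, 4), round(q_hat, 4), round(r_hat, 4))
```

The E-step only splits the ambiguous phenotypes (A and B) between their possible genotypes; the M-step is then plain allele counting, which is why this scheme could be used long before the name "EM" existed.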
The influential Edmond Lisle

Edmond Lisle: a super international connector!

Born on 23 March 1928 in Marseille. Educated at the Lycée français de Londres, Kingsbridge Grammar School, Merchant Taylors School in London, Magdalen College in Oxford, and the Facultés de droit et des lettres de Paris. Degrees: Master of Arts, Docteur ès sciences économiques, Licencié ès lettres.

An influential member of the CNRS, he was instrumental in maintaining the Social Sciences within the CNRS, thus helping them to keep their "scientific" status.

See the interesting interview (27 June 2001) of Edmond Lisle by Olivier Martin in La revue pour l'histoire du CNRS, 2002 (http://histoire-cnrs.revues.org/documrnt543.html).

The round table, 9-10 December 1985

The cosupervised PhD dissertation of Nathalie Raynal (defended 1987)

Alain Baccini

The first of these works led to the thesis of Abdelhaq Khoudraji ("Analyse des Correspondances et mise en oeuvre du modèle de Goodman", 1988) and to two publications: one on the least squares estimation of the parameters of the association model (Baccini and Khoudraji, 1992); the other on the use of this model in the analysis of a table of rates (Baccini and Khoudraji, 1992). Subsequently, the asymptotic properties of the generalized least squares estimators of the parameters of association and correlation models were established (Baccini, Fekri and Fine, 2000). Finally, following the thesis of Lahcen At-Sidi-Allal ("Contributions à l'étude des modèles d'association dans l'analyse des tables de contingence", 1996), an algorithm for computing the maximum likelihood estimates of the parameters of association and correlation models was developed, together with criteria for choosing the dimension of such a model (At-Sidi-Allal, Baccini and Mondot, 2004).

Henri Caussinus

. . .
It quickly became clear that the two approaches should be regarded as complementary far more than as competing (see the special issue of the RSA, 1987). It is with precisely this view in mind that several lines of research were subsequently developed in Toulouse, largely thanks to the impetus given by the collaboration between our team and the one led by Murray Aitkin. . . .

- Besse, Ph., Caussinus, H., Ferré, L., Fine, J. (1986). Principal component analysis and optimisation of graphical displays. Statistics, 19, 2, pp. 301-312.
- Caussinus, H. (1986). Quelques réflexions sur la part des modèles probabilistes en analyse des données. In E. Diday et al. (eds.), Data Analysis and Informatics, IV, pp. 151-165, North-Holland, Amsterdam.
- Caussinus, H., Fekri, M., Hakam, S., Ruiz-Gazen, A. (2003). A monitoring display of multivariate outliers. Computational Statistics and Data Analysis, 44, 1-2, pp. 237-252.
- Caussinus, H. and Ruiz-Gazen, A. (2006). Projection-Pursuit approach for categorical data. In M. Greenacre and J. Blasius (eds.), Multiple Correspondence Analysis and Related Methods, pp. 405-418, Chapman & Hall.
- Caussinus, H. and Ruiz-Gazen, A. (2007). Classification and generalized principal component analysis. In Brito et al. (eds.), Selected contributions in data analysis and classification, pp. 539-548, Springer.
Jean-René Mathieu

Jean-René Mathieu organises the Fifth International Statistical Modelling Workshop in Toulouse (1990).

A two-way table

$$
\begin{array}{c|ccccc|c}
A \backslash B & 1 & \cdots & b & \cdots & \#B & \\ \hline
1 & y_{11}^{AB} & \cdots & y_{1b}^{AB} & \cdots & y_{1\#B}^{AB} & y_{1}^{A} \\
\vdots & & & & & & \vdots \\
a & y_{a1}^{AB} & \cdots & y_{ab}^{AB} & \cdots & y_{a\#B}^{AB} & y_{a}^{A} \\
\vdots & & & & & & \vdots \\
\#A & y_{\#A1}^{AB} & \cdots & y_{\#Ab}^{AB} & \cdots & y_{\#A\#B}^{AB} & y_{\#A}^{A} \\ \hline
 & y_{1}^{B} & \cdots & y_{b}^{B} & \cdots & y_{\#B}^{B} & y_{\emptyset}
\end{array}
$$

CA and link functions

- Empirical probabilities: $p_{ab}^{AB} = \frac{y_{ab}^{AB}}{y_{\emptyset}}$
- a bilinear predictor and identification constraints: $\eta_{ab}^{AB} = \beta^{\emptyset} + \beta_{a}^{A} + \beta_{b}^{B} + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,b}^{B}$
- prior weights: $w_{ab}^{AB} = \frac{1}{p_{a}^{A} p_{b}^{B}}$
- a link function: $\eta_{ab}^{AB} = g(\mu_{ab}^{AB})$
- least squares (constant variance)

A useful machinery I learnt from the colleagues in Lancaster.

In standard CA, the link function is $\frac{1}{p_{a}^{A} p_{b}^{B}}$. The predictor $\eta_{ab}^{AB}$ then reduces to $1 + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,b}^{B}$.

Other links can be considered, e.g. a log link in the spirit of Goodman's R × C association model.
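A short worked derivation may help to show why this weighted least-squares set-up gives back standard CA. It is only a sketch, resting on a reading the slides do not spell out: the link is taken to be $g(\mu) = \mu / (p_a^A p_b^B)$, the response is $p_{ab}^{AB}$, and the weights are $w_{ab}^{AB} = 1/(p_a^A p_b^B)$.

```latex
\begin{aligned}
\sum_{a,b} w_{ab}^{AB}\,\bigl(p_{ab}^{AB} - \mu_{ab}^{AB}\bigr)^2
 &= \sum_{a,b} \frac{\bigl(p_{ab}^{AB} - p_a^A p_b^B\,\eta_{ab}^{AB}\bigr)^2}{p_a^A p_b^B},
 \qquad \eta_{ab}^{AB} = 1 + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^A \beta_{k,b}^B \\
 &= \sum_{a,b} \Bigl( \frac{p_{ab}^{AB} - p_a^A p_b^B}{\sqrt{p_a^A p_b^B}}
      - \sum_{k=1}^{K} \sigma_k \,\bigl(\sqrt{p_a^A}\,\beta_{k,a}^A\bigr)\bigl(\sqrt{p_b^B}\,\beta_{k,b}^B\bigr) \Bigr)^2 .
\end{aligned}
```

Under this reading the criterion is an ordinary rank-$K$ least-squares approximation of the matrix with entries $(p_{ab}^{AB} - p_a^A p_b^B)/\sqrt{p_a^A p_b^B}$, so a truncated singular value decomposition minimises it: the $\sigma_k$ are the singular values, and the scores are the singular vectors divided by $\sqrt{p_a^A}$ and $\sqrt{p_b^B}$ respectively, which is the usual CA reconstitution formula.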
Link misspecification has an important impact on dimensionality, as can be seen in the following simulated example. (See Baccini, Caussinus and Falguerolles (1994): Diabolic horseshoes, IWSM 9, Exeter.)

[Figure: scatterplots of points labelled 1 to 17, axis 2 against axis 1 and axis 3 against axis 1.]

Retinal convergence

The suicide data: frequencies of suicide by Age, Gender, and Method (Heuer, 1979) (see van der Heijden and de Leeuw, 1985; van der Heijden and Worsley, 1988; . . . )

About 50000 cases (20000 males and 30000 females), 9 methods of suicide, 17 age classes.

Many exploratory approaches can be considered here:
- Multiple correspondence analysis
- Correspondence analysis of some two-way table:
  - (age, sex) by methods
  - sex odds classified by methods and ages
  - ...
- Various Biplots

Statistical models:
- Poisson, all two-way interactions: A∗M + A∗S + M∗S
- some standard models derived from the "Poisson trick":
  - multinomial logit (response is method)
  - binomial (response is gender)
  - ...
- others

[Figure: scatterplot of points, axis 2 against axis 1.]

Similar or Different?

Dimension 1 in the plot separates males and females. Dimension 2 is mostly ordered by age. Note also that the two "clouds" for the age groups for males and females have approximately the same "shape". The difference in the respective scales of the two clouds, due to the chi-squared metric, reflects the imbalance between the frequencies of males and females. The location of the two clouds reflects that a positive (resp. negative) age-method association for one group corresponds to a negative (resp. positive) age-method association for the other group.
Retinal convergence

An exploratory approach:

$p_{am}^{AM} = \frac{y_{am1}^{AMS}}{y_{am1}^{AMS} + y_{am2}^{AMS}}$ is the empirical proportion of males,

$q_{am}^{AM} = \frac{y_{am2}^{AMS}}{y_{am1}^{AMS} + y_{am2}^{AMS}}$ is the empirical proportion of females,

for given A = a and M = m.

- select either the $p_{am}^{AM}$ or the $q_{am}^{AM}$ ($q_{am}^{AM} = 1 - p_{am}^{AM}$)
- choose prior weights $\frac{1}{\sqrt{p_{a}^{A} p_{m}^{M} q_{a}^{A} q_{m}^{M}}}$ (other choices are possible)
- a bilinear predictor for $p_{am}^{AM}$: $\eta_{am}^{AM} = \beta^{\emptyset} + \beta_{a}^{A} + \beta_{m}^{M} + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,m}^{M}$
- a logit link (the $\eta_{am}^{AM}$ for $p_{am}^{AM}$ corresponds to $-\eta_{am}^{AM}$ for $q_{am}^{AM}$)

Thank you for your attention!