Conference paper

A Framework for Statistically-Sound Customer Segment Search Authors' Copy

Abstract: We develop S4, a Statistically-Sound Segment Search framework that combines principled data partitioning and sound statistical testing to verify common hypotheses in retail data and return interpretable customer data segments. Our framework accommodates one-sample, two-sample, and multiple-sample testing, to provide various aggregations and comparisons of customer transactions. To control the proportion of false discoveries in multiple hypothesis testing, we enforce an FDR-controlling procedure and formulate a unified optimization problem that returns customer data segments that satisfy the test for a given significance level, maximize coverage of the input data, and are within a risk capital. We develop a greedy algorithm to explore different data partitions and test multiple hypotheses in a sound manner. Our extensive experiments on four retail data sets examine the interaction between significance, risk and coverage, and demonstrate the expressivity, usefulness, and scalability of S4 in practice.
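
The abstract refers to an FDR-controlling procedure without naming it. As an illustration only, the sketch below implements the well-known Benjamini-Hochberg step-up procedure, which is one common way to control the false discovery rate across many tests. It is an assumption that this resembles what S4 uses; the function name `benjamini_hochberg`, the variable `segment_p_values`, and the sample p-values are hypothetical and do not come from the paper.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis, whether it is rejected at FDR level alpha.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    not taken from the S4 paper.
    """
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject the hypotheses holding the `cutoff` smallest p-values.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate customer segment.
segment_p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(segment_p_values, alpha=0.05))
```

With these made-up inputs, only the first two segments pass: their p-values fall below the rank-scaled thresholds 0.01 and 0.02, while every later p-value exceeds its threshold.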

https://hal.archives-ouvertes.fr/hal-03379740
Contributor: Sihem Amer-Yahia
Submitted on: Friday, October 15, 2021 - 10:30:17
Last modified on: Saturday, October 30, 2021 - 03:48:05

File

S4DSAA2021.pdf
Files produced by the author(s)

Citation

Sihem Amer-Yahia, Laure Berti-Equille, Abdelouahab Chibah. A Framework for Statistically-Sound Customer Segment Search Authors' Copy. The 8th IEEE International Conference on Data Science and Advanced Analytics, Oct 2021, Porto (virtual), Portugal. ⟨10.1109/DSAA53316.2021.9564199⟩. ⟨hal-03379740⟩
