2009-02-26

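How Much Longer Will the "Beer and Diapers" Data Mining Myth Keep Circulating?

Here is a detailed investigation of the "beer and diapers" data mining myth: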
"Basically, I found the person in Blischok's group who ran the queries. K. Heath ran self joins in SQL (1990), trying to find two itemsets that have baby items, which are particularly profitable. She found this beer and diapers pattern in their data of 50 stores over a day period. When I talked to her, she mentioned that she didn't think the pattern was significant, but it was interesting."
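
The kind of self-join query mentioned in the quote above can be sketched roughly as follows. This is only an illustration, not the query Heath actually ran: the table name `basket_items`, its columns, the helper `pair_counts` and the toy rows are all invented here, and SQLite stands in for whatever database was used at the time.

```python
import sqlite3


def pair_counts(rows):
    """Count how many baskets contain each pair of items via a SQL self-join.

    rows: iterable of (basket_id, item) tuples. The schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket_items (basket_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO basket_items VALUES (?, ?)", rows)
    # Join the table to itself on basket_id; a.item < b.item keeps each
    # unordered pair once and drops pairs of an item with itself.
    query = """
        SELECT a.item, b.item, COUNT(DISTINCT a.basket_id) AS baskets
        FROM basket_items AS a
        JOIN basket_items AS b
          ON a.basket_id = b.basket_id
         AND a.item < b.item
        GROUP BY a.item, b.item
        ORDER BY baskets DESC, a.item, b.item
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up baskets, purely for illustration (not Osco data).
toy_rows = [
    (1, "beer"), (1, "diapers"), (1, "chips"),
    (2, "beer"), (2, "diapers"),
    (3, "diapers"), (3, "wipes"),
    (4, "beer"), (4, "chips"),
]

pairs = pair_counts(toy_rows)
for item_a, item_b, baskets in pairs:
    print(item_a, item_b, baskets)
```

A raw co-occurrence count like this only shows that two items appear together in some baskets. It says nothing on its own about whether the pattern is significant, which fits Heath's remark that the pattern was interesting but not significant.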

So what are the facts? In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves. This decision support study was conducted using query tools to find an association. The true story is very bland compared to the legend.

So if someone asks you about the story of "data mining, beer and diapers", you now know the facts. The story most people tell is fiction and legend. You can continue telling the story, but remember, no matter how you tell it, the story of "data mining, beer and diapers" is NOT a good example of the possibilities for decision support with current data mining technologies.


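The findings: the beer-and-diapers association in the data was not significant, though it was interesting; and Osco did not act on the discovery by deliberately placing beer and diapers near each other.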
