Data mining is not screen scraping. I know some of the people in the room may disagree with that statement, but they're actually two completely different concepts.

In short, you might put it this way: screen scraping allows you to obtain information, whereas data mining allows you to analyze information. That's a very big simplification, so I'm going to elaborate a bit.

The term "screen scraping" comes from the old terminal days, when people worked on computers with green-and-black screens containing only text. Screen scraping was used to extract characters from those screens so that they could be analyzed. Fast-forwarding to the web world of today, screen scraping now most commonly refers to extracting information from websites. That is, computer programs can "crawl" or "spider" through websites and pull out data. People often do this to build things like comparison shopping engines, to archive web pages, or simply to download text to a spreadsheet so that it can be filtered and analyzed.
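To make the "pull out data" part concrete, here is a minimal sketch of scraping in Python using only the standard library. The HTML fragment, the `name` and `price` class names, and the `PriceScraper` and `scrape_prices` names are all made up for illustration; a real scraper would first download the page from a site and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a site.
SAMPLE_HTML = """
<table>
  <tr><td class="name">Widget</td><td class="price">$4.99</td></tr>
  <tr><td class="name">Gadget</td><td class="price">$12.50</td></tr>
</table>
"""


class PriceScraper(HTMLParser):
    """Collects (name, price) pairs from <td> cells identified by class."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # class of the <td> we are inside, if any
        self.row = {}              # fields gathered for the current <tr>
        self.rows = []             # finished (name, price) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a cell we care about.
        if self.current_field in ("name", "price"):
            self.row[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_field = None
        elif tag == "tr" and self.row:
            self.rows.append((self.row.get("name"), self.row.get("price")))
            self.row = {}


def scrape_prices(html):
    """Return the list of (name, price) tuples found in the given HTML."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


print(scrape_prices(SAMPLE_HTML))
```

Notice that nothing here interprets the prices. The program only gets the text off the page and into a structure that something else can work with.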

Data mining, on the other hand, is defined by Wikipedia as "the practice of automatically searching large stores of data for patterns." In other words, you already have the data, and you're now analyzing it to learn useful things about it. Data mining often involves many complex algorithms based on statistical methods. It has nothing to do with how the data was obtained in the first place. In data mining, you only care about analyzing what already exists.
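As a toy illustration of "searching data for patterns", the sketch below counts which items tend to show up together in a set of shopping baskets. The baskets, the `frequent_pairs` name, and the `min_count` threshold are assumptions invented for this example, and real data mining relies on far more sophisticated statistical methods than simple counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, already collected by some other means.
BASKETS = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]


def frequent_pairs(baskets, min_count=2):
    """Return item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(frequent_pairs(BASKETS))
```

The function never asks where the baskets came from. They could have been scraped, exported from a database, or typed in by hand, which is exactly why the two concepts are separate.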

The difficulty is that people who don't know the term "screen scraping" will try Googling for anything that resembles it. We include a number of these terms on our website to help these people; for example, we have created pages titled "Extracting Text Data", "Automated Data Collection", "Extracting Website Data" and even "Site Ripper" (I suppose "scraping" is sort of like "ripping"). So it presents a bit of a problem: we don't necessarily want to perpetuate a misconception (for example, screen scraping = data mining), but we also have to use terms that people will actually use.

