Data Mining
Introduction to Data Mining
- The state of the art of data mining today is similar to that of database query processing in the late 1960s and early 1970s.
- We will probably see the development of query processing models, standards, and algorithms for data mining.
- We will probably also see new data structures designed for storing the databases used in data mining.
- DM involves many different algorithms to accomplish different tasks.
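- A DM algorithm can be characterized by three parts: a model, a preference criterion, and a search technique.
- Example: credit card companies use data mining to decide whether to authorize a purchase.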
Predictive Model and Descriptive Model
- A predictive model makes a prediction about values of data using known results found from different data.
- It is based on the use of other historical data.
- For example, a credit card use might be refused not because of the user's own credit history, but because the current purchase is similar to earlier purchases that were subsequently found to be made with stolen cards.
- Predictive modeling tasks include classification, regression, time series analysis, and prediction.
- A descriptive model identifies patterns or relationships in data.
- Descriptive modeling tasks include clustering, summarization, association rules, and sequence discovery.
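
The difference between the two kinds of model can be sketched in a few lines of Python. This is only an illustration, not a production technique: the purchase records, the two features (amount and hour of day), and the function names `predict` and `summarize` are all assumptions made up for the example. `predict` labels a new purchase by its similarity to historical purchases with known outcomes (a simple nearest-neighbor classification), while `summarize` only describes the data already collected.

```python
from collections import Counter
from math import dist

# Hypothetical historical purchases: (amount, hour of day) plus the known outcome.
history = [
    ((20.0, 14), "legitimate"),
    ((35.0, 10), "legitimate"),
    ((15.0, 18), "legitimate"),
    ((900.0, 3), "fraud"),
    ((950.0, 2), "fraud"),
]

def predict(purchase, k=3):
    """Predictive: label a new purchase by majority vote of its k most similar past purchases."""
    nearest = sorted(history, key=lambda item: dist(item[0], purchase))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def summarize():
    """Descriptive: average purchase amount for each known outcome."""
    amounts = {}
    for (amount, _hour), label in history:
        amounts.setdefault(label, []).append(amount)
    return {label: sum(values) / len(values) for label, values in amounts.items()}

print(predict((920.0, 4)))   # a large purchase in the middle of the night
print(predict((25.0, 12)))   # a small purchase at midday
print(summarize())
```

The predictive function answers a question about a purchase it has never seen, whereas the descriptive function only reports a pattern in the existing records.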
DATA MINING VERSUS KNOWLEDGE DISCOVERY IN DATABASES
- Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data.
- The definition of KDD includes the keyword useful.
- Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.
- A traditional SQL database query can be viewed as the data mining part of a KDD process.
The KDD process consists of the following five steps:
Selection: The data needed for the data mining process may be obtained from many different and heterogeneous data sources. This first step obtains the data from various databases, files, and nonelectronic sources.
Preprocessing: The data to be used by the process may have incorrect or missing data. There may be data from multiple sources involving different data types and metrics. Erroneous data may be corrected or removed, whereas missing data must be supplied or predicted.
Transformation: Data from different sources must be converted into a common format for processing. Some data may be encoded or transformed into more usable formats.
Data mining: Based on the data mining task being performed, this step applies algorithms to the transformed data to generate the desired results.
Interpretation/evaluation: How the data mining results are presented to the users is extremely important because the usefulness of the results is dependent on it. Various visualization and GUI strategies are used at this last step.
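
The five steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the two sources, the field names `customer` and `spend`, the rule that a negative spend is erroneous, and the choice to fill a missing value with the mean are all hypothetical choices for illustration, and the "mining" step is just a comparison against the average.

```python
# Hypothetical raw records from two sources that store spend in different types.
source_a = [{"customer": "ann", "spend": "120.50"}, {"customer": "bob", "spend": None}]
source_b = [{"customer": "cy", "spend": -5.0}, {"customer": "dee", "spend": 80.0}]

def select(*sources):
    """Selection: gather the records from every source."""
    return [dict(row) for source in sources for row in source]

def preprocess(rows):
    """Preprocessing: remove erroneous (negative) values and supply missing ones."""
    valid = [float(r["spend"]) for r in rows
             if r["spend"] is not None and float(r["spend"]) >= 0]
    mean = sum(valid) / len(valid)
    cleaned = []
    for r in rows:
        if r["spend"] is None:
            cleaned.append({**r, "spend": mean})   # missing value is predicted as the mean
        elif float(r["spend"]) >= 0:
            cleaned.append(r)                      # valid value is kept as-is
        # negative values are treated as erroneous and dropped
    return cleaned

def transform(rows):
    """Transformation: convert every record to one common format."""
    return [{"customer": r["customer"].title(), "spend": float(r["spend"])} for r in rows]

def mine(rows):
    """Data mining: split customers into those above and not above the average spend."""
    average = sum(r["spend"] for r in rows) / len(rows)
    groups = {"above average": [], "not above average": []}
    for r in rows:
        key = "above average" if r["spend"] > average else "not above average"
        groups[key].append(r["customer"])
    return groups

def interpret(groups):
    """Interpretation/evaluation: present the result in a readable form."""
    return [f"{label}: {', '.join(names)}" for label, names in groups.items()]

rows = transform(preprocess(select(source_a, source_b)))
for line in interpret(mine(rows)):
    print(line)
```

Each function maps to one step, so a different mining task only requires swapping out `mine` while the earlier steps stay the same.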
Visualization
"a picture is worth a thousand words"
Graphical: Traditional graph structures including bar charts, pie charts, histograms, and line graphs may be used.
Geometric: Geometric techniques include the box plot and scatter diagram techniques.
Icon-based: Using figures, colors, or other icons can improve the presentation of the results.
Pixel-based: With these techniques each data value is shown as a uniquely colored pixel.
Hierarchical: These techniques hierarchically divide the display area (screen) into regions based on data values.
Hybrid: The preceding approaches can be combined into one display.
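
As a small illustration of the graphical category, the sketch below draws a bar chart as plain text. It is an assumption-laden example: the function name `text_bar_chart` and the counts are made up, and text output stands in for a real plotting library so that the example has no dependencies.

```python
def text_bar_chart(counts, width=20):
    """Render one bar per category, scaled so the largest count fills `width` characters."""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        bar = "#" * round(count / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {count}")
    return lines

# Hypothetical counts of how often each task appeared in some set of results.
counts = {"clustering": 4, "classification": 8, "regression": 2}
for line in text_bar_chart(counts):
    print(line)
```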