Mining methods are classified as methods of data analysis and knowledge acquisition, and they are derived from the methods of "Knowledge Discovery". Within these methods there are two main variants, distinguished by the form of the data: "data mining" and "text mining". The author seeks to answer the question of how helpful and useful these methods are for knowledge acquisition in the construction industry. Knowledge acquisition is essential for knowledge-based systems and tools, which today underpin the tools that support decision-making processes. The paper presents three case studies in which the mining methods were applied to practical problems: the selection of an adhesive mortar together with alternative solutions, the analysis of locations of residential real estate under construction by a developer company, and support for the technical management of a building facility with a large floor area.
The paper presents a key-finding algorithm based on the music signature concept. The proposed music signature is a set of 2-D vectors which can be treated as a compressed representation of musical content in 2-D space. Each vector represents a different pitch class. Its direction is determined by the position of the corresponding major key in the circle of fifths. The length of each vector reflects the multiplicity (i.e. the number of occurrences) of the pitch class in a musical piece or its fragment. The paper presents the theoretical background, examples explaining the essence of the idea, and the results of tests which confirm the effectiveness of the proposed algorithm for finding the key based on analysis of the music signature. The developed method was compared with key-finding algorithms using the Krumhansl-Kessler, Temperley and Albrecht-Shanahan profiles. The experiments were performed on a set of Bach preludes, Bach fugues and Chopin preludes.
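The construction of the music signature described above can be illustrated with a minimal sketch. It covers only the signature vectors, not the key-finding step, which the abstract does not specify. The function names, the numbering of pitch classes (C = 0), the choice of 0 degrees for C, the counter-clockwise orientation and the use of raw counts as vector lengths are all assumptions made for illustration.

```python
import math
from collections import Counter


def circle_of_fifths_position(pitch_class):
    """Step index (0..11) of a pitch class on the circle of fifths, C = 0.

    Moving one step on the circle adds 7 semitones, so the step index k
    satisfies pitch_class == 7 * k (mod 12), i.e. k == 7 * pitch_class (mod 12).
    """
    return (7 * pitch_class) % 12


def music_signature(pitch_classes):
    """Return {pitch_class: (x, y)} for all 12 pitch classes.

    Direction: 30 degrees per circle-of-fifths step (assumed orientation).
    Length: number of occurrences of the pitch class in the input.
    """
    counts = Counter(pc % 12 for pc in pitch_classes)
    signature = {}
    for pc in range(12):
        angle = math.radians(30 * circle_of_fifths_position(pc))
        length = counts.get(pc, 0)
        signature[pc] = (length * math.cos(angle), length * math.sin(angle))
    return signature


if __name__ == "__main__":
    # Hypothetical fragment: C, C, G (pitch classes 0, 0, 7)
    for pc, vec in music_signature([0, 0, 7]).items():
        print(pc, vec)
```

In this sketch a pitch class that never occurs gets a zero-length vector, so the signature always has twelve entries regardless of the input.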
The research aimed to establish tyre-road noise models using a Data Mining approach that made it possible to build a predictive model and assess the importance of the tested input variables. The data modelling used three learning algorithms and three metrics to identify the best predictive model. The variables tested included basic properties of pavement surfaces, macrotexture, megatexture, unevenness and, for the first time, damping. The importance of those variables was measured using a sensitivity analysis procedure. Two types of models were set up: one with basic variables and another with complex variables, such as megatexture and damping, all as a function of vehicle speed. More detailed models were additionally set up for each speed level. As a result, several models with very good tyre-road noise predictive capacity were obtained. The most relevant variables were Speed, Temperature, Aggregate size, Mean Profile Depth, and Damping, which had the highest importance, although their importance was influenced by speed. Megatexture and IRI had the lowest importance. The models developed in this work are applicable to truck tyre-noise prediction, represented by the AVON V 4 test tyre, at the early stage of road pavement use. The obtained models are therefore highly useful for pavement design and for noise prediction by road authorities and contractors.
Classification techniques have been widely used in different remote sensing applications, and correct classification of mixed pixels is a tedious task. Traditional approaches adopt various statistical parameters but do not facilitate effective visualisation. Data mining tools are proving very helpful in the classification process. We propose a visual mining based framework for accuracy assessment of classification techniques using open source tools such as WEKA and PREFUSE. Used together, these tools can provide an efficient approach for obtaining information about improvements in classification accuracy and help in refining the training data set. We illustrate the framework by investigating the effects of various resampling methods on classification accuracy and find that bilinear (BL) resampling is best suited for preserving radiometric characteristics. We have also investigated the optimal number of folds required for effective analysis of LISS-IV images.
The paper analyses distorted data from an electronic nose used to recognize bio-based gasoline additives. Different data mining tools are applied, including data clustering, principal component analysis, wavelet transformation, support vector machines and random forests of decision trees. Special emphasis is placed on the robustness of signal processing systems to the noise distorting the registered sensor signals. A special denoising procedure based on the discrete wavelet transformation is proposed. This procedure makes it possible to reduce the recognition error rate significantly. The numerical results of experiments on the recognition of different gasoline blends show the superiority of the support vector machine in a noisy measurement environment.
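The idea of denoising with a discrete wavelet transformation can be sketched as follows. This is not the procedure proposed in the paper, whose details the abstract does not give; it is a generic one-level Haar transform with soft thresholding of the detail coefficients. The function names, the choice of the Haar wavelet, the single decomposition level and the threshold value are all assumptions.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_forward(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward: rebuilds the signal from both coefficient sets."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft_threshold(values, threshold):
    """Shrink each coefficient towards zero by `threshold`, keeping its sign."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]


def denoise(signal, threshold):
    """Denoise an even-length signal by thresholding its Haar detail coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even for a one-level Haar transform")
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    # Hypothetical sensor readings with small sample-to-sample noise
    noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
    print(denoise(noisy, threshold=0.5))
```

Small detail coefficients, which mostly carry noise, are set to zero, while the approximation coefficients that carry the slowly varying part of the signal are left untouched. A practical procedure would typically use several decomposition levels and a data-driven threshold.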
Decision-making processes, including those related to ill-structured problems, are of considerable significance in the area of construction projects. Computer-aided inference under such conditions requires specific non-algorithmic methods and tools, of which expert systems are the best recognized and most successfully used in practice. The knowledge such systems need in order to perform inference is most frequently acquired directly from experts (through a dialogue between a domain expert and a knowledge engineer) and from various source documents. Little is known, however, about the possibility of automating knowledge acquisition in this area, and as a result it is scarcely ever used in practice. It has to be noted that in numerous areas of management more and more attention is paid to acquiring knowledge from available data, and various methods and tools for this purpose are known and successfully employed in decision support practice. The paper attempts to select methods for knowledge discovery in data and presents possible ways of representing the acquired knowledge as well as sample tools (including programming ones) that allow this knowledge to be used in the area under consideration.
This article presents a methodology for exploratory analysis of data from microstructural studies of compacted graphite iron, aimed at gaining knowledge about the factors favouring the formation of ausferrite. The studies led to the development of rules for evaluating the ausferrite content based on the chemical composition. Data mining methods were used to generate regression models such as boosted trees, random forests, and piecewise regression models. Developing a stepwise regression modelling process on iteratively limited data sets made it possible, on the one hand, to improve forecasting precision and, on the other, to acquire deeper knowledge about ausferrite formation. Repeated examination of the significance of the effect of various factors in different regression models allowed identification of the most important variables influencing the ausferrite content in different ranges of parameter variability.
The paper presents an analysis of the usability of SPC (Statistical Process Control) procedures in foundry engineering. The authors pay particular attention to the complexity of the processes and the necessity of correctly preparing data acquisition procedures. Integration of SPC systems with existing IT solutions that aid and assist the manufacturing process is important. For each particular foundry, a methodology of selective SPC application needs to be prepared for supervising and controlling the stability of manufacturing conditions, taking into account the specificity of data in particular “branches” of foundry production (Sands, Pouring, Metallurgy, Quality).
The aim of the study was to evaluate the possibility of applying different data mining methods to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbours (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the recipient of the clarified sewage, and the wastewater inflow into the Rzeszow city plant. The results indicate that the best models with one input delayed by 1 day were obtained using the k-NN method and the worst using the K method. For the models with two input variables and one explanatory one, the smallest errors were obtained when the model inputs were sewage inflow and rainfall data delayed by 1 day; the best fit was provided by the RF method and the worst by the K method. For the models with three inputs and two explanatory variables, the best results were reported for the SVM method and the worst for the K method. In most of the modelling runs the smallest prediction errors were obtained using the SVM method and the largest using the K method. For the simplest model, with one input delayed by 1 day, the best results were provided by the k-NN method, and for the models with two inputs the RF method proved best in two modelling runs.
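The simplest model type mentioned above, one input delayed by 1 day with k-NN regression, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the value of k, the absolute-difference distance and the inflow figures in the usage example are all assumptions.

```python
def make_lagged_pairs(series, lag=1):
    """Build (input, target) pairs: the value `lag` days earlier predicts today's value."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def knn_predict(pairs, query, k=3):
    """Predict the target for `query` as the mean target of the k nearest inputs."""
    if not pairs:
        raise ValueError("at least one training pair is required")
    nearest = sorted(pairs, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical daily inflow values, for illustration only
    inflow = [10.0, 12.0, 11.0, 15.0, 14.0]
    training_pairs = make_lagged_pairs(inflow, lag=1)
    # Forecast the next day's inflow from an assumed latest observation of 11.4
    print(knn_predict(training_pairs, query=11.4, k=2))
```

Models with more inputs (for example inflow and rainfall, both delayed by 1 day) would replace the scalar input with a tuple and the absolute difference with a distance over that tuple.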
The use of quantitative methods, including stochastic and exploratory techniques, in environmental studies does not seem to be sufficient in practice. There is neither a comprehensive analytical system dedicated to this issue nor research on the subject. The aim of this study is to present the Eco Data Miner system: its idea, its construction and the possibility of implementing it in existing environmental information systems. The methodological emphasis was placed on one-dimensional data quality assessment using the proposed QAAH1 method, which uses a harmonic model and robust estimators alongside the classical outlier tests and their iterative extensions. The results demonstrate both the complementarity of the proposed solution with the classical methods and the fact that together they significantly extend the range of applications. The practical usefulness of this new tool is also high, owing to its effectiveness, numerical efficiency and simplicity of use.
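The role of robust estimators in one-dimensional outlier detection can be illustrated with a minimal sketch. This is not the QAAH1 method, whose harmonic model and iterative tests are not described in the abstract; it is the widely used modified z-score based on the median and the median absolute deviation (MAD). The function name, the cutoff of 3.5 and the sample readings are assumptions.

```python
import statistics


def mad_outlier_indices(values, cutoff=3.5):
    """Return indices of values whose modified z-score exceeds `cutoff`.

    The modified z-score is 0.6745 * |x - median| / MAD. The median and MAD
    are barely affected by a few extreme values, unlike the mean and the
    standard deviation used in classical tests.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # More than half of the values are identical: the score is undefined,
        # so this sketch flags nothing rather than dividing by zero.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > cutoff
    ]


if __name__ == "__main__":
    # Hypothetical one-dimensional measurements with one suspicious reading
    readings = [10, 11, 9, 10, 12, 10, 50]
    print(mad_outlier_indices(readings))
```

A seasonal series would first need its periodic component removed, for instance by fitting a harmonic model and applying the check to the residuals; that step is outside this sketch.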