MLDM'2015
Comments from Participants of MLDM 2015, Hamburg, Germany
A personal report about ICDM and MLDM in Hamburg, 2015
Dr Hartmut Ilgner, CSIR, South Africa
My attendance at both conferences was incredibly stimulating and useful for the various projects in which we need to provide solutions by applying “real time monitoring and decision making” in the extractive mineral industry. It was interesting to see the similarities between the approaches used in totally different industrial sectors to detect undesired changes in semi-static conditions, balanced environments, regular patterns or dynamic properties.
For example, monitoring and skilfully analysing online the changes in analogue signals of selected brain functions to predict an impending seizure is very similar to listening to the increasing number of micro-seismic events in an underground mine prior to a sudden collapse of the overlying rock. The methodologies for data analysis and the need to define reliable thresholds and indicators are similar. More specific is the challenge of timing, “how soon, but not too soon”, when raising a warning or even a serious alarm. Equally, preventing false alarms is crucial to ensure confidence and acceptance by the end users.
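The trade-off between warning early and avoiding false alarms can be illustrated with a minimal sketch. It is not a method presented at either conference. It assumes a hypothetical series of event counts per time interval and raises an alarm only once the count has exceeded a threshold for a given number of consecutive intervals. The function name, the counts, the threshold and the persistence values are all illustrative assumptions.

```python
from typing import Iterable, Optional


def first_alarm_index(
    counts: Iterable[float], threshold: float, persistence: int
) -> Optional[int]:
    """Return the index at which an alarm is first raised, or None.

    An alarm is raised once `persistence` consecutive values exceed
    `threshold`. A larger `persistence` suppresses isolated spikes
    (fewer false alarms) at the price of a later warning.
    """
    if persistence < 1:
        raise ValueError("persistence must be at least 1")
    run = 0  # length of the current run of values above the threshold
    for i, value in enumerate(counts):
        run = run + 1 if value > threshold else 0
        if run >= persistence:
            return i
    return None


# Hypothetical event counts per interval (illustrative data only).
counts = [2, 3, 9, 2, 3, 8, 9, 11, 14]

# Reacts to the isolated spike at index 2: early, but possibly a false alarm.
print(first_alarm_index(counts, threshold=7, persistence=1))

# Waits for three consecutive exceedances: later, but more robust.
print(first_alarm_index(counts, threshold=7, persistence=3))
```

With these made-up numbers, the single-interval rule fires on the isolated spike, while the three-interval rule fires only once the increase is sustained. Choosing between them is exactly the “how soon, but not too soon” question.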
In most online applications, new data are created continuously. However, there are also instances when no new data are forthcoming for a period of time, and an indication of the next data value is then important. In that case, historical data need to be used to forecast that future value with a high degree of certainty and accuracy. Consequently, the data mining activity is most intense during the period when no new data are available, which is somewhat counter-intuitive for data mining, but it is one of the most interesting challenges.
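As a rough illustration of forecasting the next value from historical data alone, the sketch below uses simple exponential smoothing. This is a swapped-in textbook baseline, not a technique named in the report, and it gives no measure of certainty by itself. The function name, the sample readings and the smoothing factor `alpha` are assumptions chosen for illustration.

```python
from typing import Sequence


def ses_forecast(history: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast by simple exponential smoothing.

    `alpha` close to 1 follows the latest observations closely;
    `alpha` close to 0 averages over a longer history.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = history[0]  # start the smoothed level at the first observation
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level  # the forecast for the next, not yet observed, value


# Hypothetical sensor readings recorded before the data stream paused.
readings = [10.0, 10.4, 10.1, 10.8, 11.2]
print(ses_forecast(readings, alpha=0.5))
```

In practice the forecast would be checked against held-out historical data before it is trusted during a gap in the data stream.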
The organisers endeavour to make everyone’s work relevant to real-world problems. Computed outcomes must be distilled into something meaningful for a human reader. Thus, a merely numerical presentation of complex data manipulation must be avoided, as ‘a lot of everything’ may unintentionally disguise the pure essence of a study.
During both conferences, the importance of evaluating the performance of any chosen data mining process in terms of accuracy, computing time and derived knowledge was emphasised. Some fundamentals for good data mining were highlighted, for example:
- Good accuracy of machine learning or data mining does not automatically add value to the client;
- Costs and benefits must be balanced in relation to the problem and the requested insight;
- Wisdom is required to distinguish between rejecting outliers and enhancing novel features;
- Expert opinions should be obtained to structure the analyses and to review the final outcome; and
- If the data set is too small, be careful not to generalise.
There are also potential risks in selecting inappropriate methods, which may create misleading outcomes. In addition, a critical review is required of whether the chosen rules are accurate enough for the task at hand, or whether they unnecessarily complicate the process without yielding additional insights.
The oral presentations were followed by an interactive session and a brief question period. This provided further understanding and placed each presentation in the context of previous studies on related topics. There is so much insight and wisdom to be gained by engaging in the debates and interactive discussions with the experts, which is not reflected in the published papers relating to their talks.
For current students, these conferences provide a unique forum to discuss the progress of their R&D work. Even when the audience appears to be challenging, access to the vast body of knowledge available, together with the positive atmosphere, will in the end improve the final thesis or dissertation.
The conference closed with a preview of things to come, which had the humorous and self-critical title: Big Data, Hype or Hallelujah?
Keep mining...