Module 1144: Knowledge Discovery in Big Data


Assignment to degree program: M.Sc. Computer Science
Module coordinator: Univ.-Prof. Dr. phil. Michaela Geierhos
Module type: Compulsory elective
Recommended trimester: 2
Workload: 180 hrs.
- of which attendance time: 72 hrs.
- of which self-study: 108 hrs.
ECTS points: 6

Related courses

No.    Type  Course name                       Participation         TWS
11441        Knowledge Discovery               Compulsory elective   3
11443  SE    Research Topics in Data Science   Compulsory elective   3
11444        Big Data Management               Compulsory elective   3
11446  P     Data Science Practical            Compulsory elective   3
Total (compulsory and elective)                                      6

(TWS = trimester weekly hours)

Recommended prerequisites

Students should have basic knowledge of programming and software design as well as a basic understanding of algorithms and data structures.

Qualification objectives

The learning objective is competent mastery of the basic procedures and methods, and of their practical application, in the areas described under Contents (see below).

Contents

In the lecture “Big Data Management”, students learn about architectures designed for collecting, processing and analyzing big data, for which conventional database systems are no longer suitable. The lecture covers not only distributed big data infrastructure but also data structuring, data synchronization and parallelism, and storage management. In the accompanying exercise, students gain initial hands-on experience with big data architectures.

The lecture “Knowledge Discovery” deals with handling, categorizing and analyzing heterogeneous data sources. It introduces methods such as visual analytics, knowledge discovery and data mining techniques, and exploratory data analysis supported by AI methods such as machine learning and computational intelligence; these methods are explored in greater depth in the exercises.

In the seminar “Research Topics in Data Science”, selected current methods from data science, machine learning and deep learning are presented. The seminar gives students insight into state-of-the-art research topics. The topics are based on the current Gartner Hype Cycle for Artificial Intelligence (e.g. Decision Intelligence, Responsible AI, Knowledge Graphs) and the Gartner Hype Cycle for Emerging Technologies (e.g. Self-Supervised Learning, Explainable AI, Social Data).

In the “Data Science Practical”, students put the knowledge acquired in theory into practice in a project. Working in small groups, they carry out a larger data science project and present it at the end of the trimester. The project covers the entire project cycle: from idea and concept, through data collection and processing, to building a machine learning model and evaluating the results. Regular plenary sessions allow the groups to exchange ideas and give one another feedback. The project topics relate to the research areas covered in “Research Topics in Data Science” and “Methods of Data Science”. Students are strongly recommended to have attended one of these courses beforehand.

Literature

  • Jiawei Han, Micheline Kamber, Jian Pei: Data Mining – Concepts and Techniques, Morgan Kaufmann Publishers, 2011.
  • Martin Ester, Jörg Sander: Knowledge Discovery in Databases – Techniken und Anwendungen, Springer Verlag, 2000.
  • Ayodele Oluleye: Exploratory Data Analysis with Python Cookbook, Packt Publishing, 2023.
  • Steffen Herbold: Data-Science-Crashkurs, dpunkt, 2022.

Proof of performance

Portfolio in which each lecture (with exercise), the seminar and the practical course carry equal weight. Depending on the courses offered, students can combine either both lectures with exercises (11441 and 11444), one lecture with exercise (11441 or 11444) and the practical course (11446), or one lecture with exercise (11441 or 11444) and the seminar (11443). The individual assessment components are as follows:

  • 11441: Written examination of 60 minutes or oral examination of 30 minutes. The type of examination will be announced at the beginning of the module.
  • 11443: Written report; processing time: 4 weeks; length: 5,000 words.
  • 11444: Written examination of 60 minutes or oral examination of 30 minutes. The type of examination will be announced at the beginning of the module.
  • 11446: Completion of a project with a written report; completion time: 8 weeks; length: 20 pages.

Applicability

The knowledge and skills acquired in this module complement training in software engineering with a field of great practical importance. Completing the courses of this compulsory elective module enables students to write a Master's thesis in data science.

Duration and frequency

The module lasts 2 to 3 trimesters and begins each year in the spring trimester (FT).

Other remarks

Not all lectures, seminars and practical courses are offered every year, but each year enough courses are offered for students to earn the required 6 ECTS points. Students are informed of the specific courses on offer at the beginning of each module.