Related courses
| No. | Type | Course name | Participation | TWS |
| 38531 | VÜ | Analysis of Unstructured Data | Compulsory | 6 |
| Total (compulsory and elective) | | | | 6 |
Recommended prerequisites
Students should have basic programming skills and a basic understanding of algorithms and data structures.
Qualification objectives
Students learn about the challenges and methods involved in obtaining and extracting information. They learn to apply the analysis methods discussed to concrete, practice-relevant questions (especially in the area of knowledge acquisition). They can assess existing approaches for sample tasks and suggest further developments or implement them independently.
Content
This module provides insight into the challenges and methods of analyzing unstructured data. Unstructured data is usually text-heavy, which is why many predictive analytics methods cannot exploit its information value. However, text-based media (emails, website content, specialist articles, social media posts, etc.) can help to identify trends, gain knowledge and detect fake news, among other things. To do this, information must be identified, extracted, processed and interpreted. The challenge lies in recognizing relevant information, extracting it from unstructured texts and adding missing information where necessary.
The course also deals with topics such as obtaining information from different sources and issues of quality assurance in data storage and data management in knowledge-based structures.
The exercise addresses theoretical and practical issues equally. The theoretical part reviews the contents of the lecture. In the practical part, students independently implement selected methods for analyzing unstructured data. Programming skills are required for the exercises.
Literature
- Soumen Chakrabarti: Mining the Web, Morgan Kaufmann, 2002.
- Henning Wachsmuth: Text Analysis Pipelines: Towards Ad-hoc Large-Scale Text Mining (Lecture Notes in Computer Science, vol. 9383), Springer, 2015.
- Nikos Tsourakis: Machine Learning Techniques for Text, Packt Publishing, 2022.
- Anish Chapagain: Hands-On Web Scraping with Python, 2nd edition, Packt Publishing, 2023.
Proof of performance
Written examination lasting 60 minutes or oral examination lasting 30 minutes. The type of examination will be announced at the beginning of the module.
Applicability
Participation in the courses of this elective module enables students to take on a Master's thesis in the field of data science with a focus on the analysis of unstructured data.
Duration and frequency
The module lasts one trimester and begins each year in the autumn trimester (HT).