Place | Mary-Somerville-Str. 7, 28359 Bremen |
Time | 10 am - 4 pm |
Contact Person | |
Organisation | Teilprojekt A04 (2022-25): SFB 1342, Universität Bremen |
Lecture Series | Internal Events |
The two-day workshop conveys basic knowledge of text classification workflows (document and sentence classification, e.g. for identifying relevant documents or for sentiment analysis) and of sequence tagging for information extraction (e.g. named entities or protest event data such as protest form, issues, or number of participants). The workflow presented in the workshop is based on neural transformer networks (BERT and successor models) and includes the following steps:
- 1) handling of typical data formats for model training and prediction (CSV, XMI CAS, CoNLL),
- 2) application of pre-trained models,
- 3) training or fine-tuning of models with own data for new tasks,
- 4) hyper-parameter optimisation and evaluation of models.
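To illustrate step 1, here is a minimal sketch of reading a two-column CoNLL-style file (one token and one tag per line, a blank line between sentences) and collecting the tagged spans. It is not taken from the workshop notebooks: the sample sentences, the tag names (`FORM`, `PARTICIPANTS`, `LOC`) and the helper functions `read_conll` and `bio_spans` are assumptions made up for illustration, and the BIO tagging scheme is assumed.

```python
from typing import List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(text: str) -> List[Sentence]:
    """Parse column-formatted text: first column is the token, last column the tag."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        # A blank line (or a document marker) closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as B-FORM / I-FORM / O."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    tokens: List[str] = []
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if tokens:
                spans.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            tokens.append(token)  # continue the open span
        else:  # "O": outside any span
            if tokens:
                spans.append((label, " ".join(tokens)))
            label, tokens = None, []
    if tokens:
        spans.append((label, " ".join(tokens)))
    return spans


# Invented sample data with hypothetical tag names, for illustration only.
SAMPLE = """\
About O
200 B-PARTICIPANTS
people O
joined O
a O
protest B-FORM
march I-FORM
in O
Bremen B-LOC
. O

The O
rally B-FORM
ended O
peacefully O
. O
"""

sentences = read_conll(SAMPLE)
for sent in sentences:
    print(bio_spans(sent))
```

Extracting spans this way is also a useful sanity check before training, since it quickly shows whether the annotation export produced the labels one expects.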
The workshop combines short lectures with plenty of time for exercises, using prepared Jupyter Python notebooks that run in the browser on the Google Colab platform. For machine learning, the Flair NLP framework and the Hugging Face Transformers library (based on PyTorch) will be used.
Prerequisites
Basic knowledge of Python programming and of handling Pandas data frames; optional: familiarity with the Flair NLP framework
Preparation
Register a Google account (the exercises will be done in Google Colab)
Data set
Protest events in local news texts (will be provided)