German Medical Text Corpus (GeMTeX)
The GeMTeX corpus is the largest German clinical text resource available for research, comprising real documents from various clinical domains, collected at six German university hospitals. If you are interested in using the corpus for your research, you can request access in accordance with the established regulations and governance of the Medical Informatics Initiative. In particular, please follow these steps:
1. Provide a study protocol outlining the intended use of GeMTeX.
2. Provide an IRB vote for that study protocol.
Do you need assistance?
If you have questions or need assistance in requesting access to GeMTeX, do not hesitate to reach out to us via gemtex.mi@mh.tum.de.
Resources
- GeMTeX @ Health Study Hub: https://health-study-hub.de/resource/nfdi4health-29691
- GeMTeX De-Identification Guidelines with sample project: https://doi.org/10.5281/zenodo.15747389
- GeMTeX Semantic Annotation Guidelines: https://doi.org/10.5281/zenodo.17711868
- GeMTeX Semantic Annotation Gold Standard Document: https://doi.org/10.5281/zenodo.18861608