ICS PAS

The Linguistic Engineering Group

TEI P5 Encoding of the National Corpus of Polish


This page offers examples of the TEI P5 encoding of the various annotation layers in the National Corpus of Polish (NKJP), as of 6 May 2011. The schemata and the examples may still change without notice. Examples in the "file in NKJP" column come from the full corpus (i.e., the linguistic annotations are generated automatically by the appropriate tools), while those in the "file in 1M" column come from the manually annotated 1-million-word subcorpus of NKJP.

file in NKJP | file in 1M | ODD | RelaxNG / DTD
text_structure.xml NKJP_structure.xml NKJP_structure.rng
text.xml NKJP_text.xml NKJP_text.rng
ann_segmentation.xml ann_segmentation.xml NKJP_segmentation.xml NKJP_segmentation.rng
ann_morphosyntax.xml ann_morphosyntax.xml NKJP_morphosyntax.xml NKJP_morphosyntax.rng
ann_senses.xml ann_senses.xml NKJP_senses.xml NKJP_senses.rng
ann_words.xml ann_words.xml NKJP_words.xml NKJP_words.rng
ann_named.xml ann_named.xml NKJP_named.xml NKJP_named.rng
ann_groups.xml ann_groups.xml NKJP_groups.xml NKJP_groups.rng
header.xml header.xml header.dtd

Corpus headers:

How to validate particular files (with xmllint on Linux):

$ xmllint --noout --valid header.xml
$ xmllint --noout --xinclude --relaxng NKJP_text.rng text.xml
$ xmllint --noout --xinclude --relaxng NKJP_structure.rng text_structure.xml
$ xmllint --noout --xinclude --relaxng NKJP_segmentation.rng ann_segmentation.xml
$ xmllint --noout --xinclude --relaxng NKJP_morphosyntax.rng ann_morphosyntax.xml
$ xmllint --noout --xinclude --relaxng NKJP_senses.rng ann_senses.xml
$ xmllint --noout --xinclude --relaxng NKJP_words.rng ann_words.xml
$ xmllint --noout --xinclude --relaxng NKJP_groups.rng ann_groups.xml
$ xmllint --noout --xinclude --relaxng NKJP_named.rng ann_named.xml
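
The commands above follow a regular naming pattern (ann_X.xml is checked against NKJP_X.rng), so they can be wrapped in a small script. The following is only a sketch: the helper names schema_for and validate_all are hypothetical, and it assumes that the .rng schemas and header.dtd sit in the same directory as the XML files.

```shell
#!/bin/sh
# Sketch only: schema_for and validate_all are hypothetical helper names,
# not part of the NKJP distribution.

# Map a corpus file name to the RelaxNG schema used for it in the commands above.
schema_for() {
    case "$1" in
        text.xml)           echo "NKJP_text.rng" ;;
        text_structure.xml) echo "NKJP_structure.rng" ;;
        ann_*.xml)
            layer=${1#ann_}        # strip the "ann_" prefix
            layer=${layer%.xml}    # strip the ".xml" suffix
            echo "NKJP_${layer}.rng" ;;
        *)  return 1 ;;            # no RelaxNG schema known for this file
    esac
}

# Validate every known layer file present in the current directory.
# Assumes the schemas and header.dtd are in the same directory.
validate_all() {
    status=0
    if [ -f header.xml ]; then
        # The header is validated against its DTD, not a RelaxNG schema.
        xmllint --noout --valid header.xml || status=1
    fi
    for f in text.xml text_structure.xml ann_*.xml; do
        [ -f "$f" ] || continue            # skip layers that are absent
        schema=$(schema_for "$f") || continue
        xmllint --noout --xinclude --relaxng "$schema" "$f" || status=1
    done
    return "$status"
}
```

After sourcing the script, calling validate_all in a directory holding one text's files runs each check in turn and returns a non-zero status if any file fails to validate.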

Papers:


Tools:





Last Modified: Sun Aug 7 22:13:32 CEST 2011

Maintained by AP.