This page offers examples of the TEI P5 encoding of the various layers in the National Corpus of Polish (NKJP), as of 6 May 2011. The schemata and the examples may still change without notice. Examples in the "file in NKJP" column come from the full corpus (i.e., the linguistic annotations are generated automatically by the appropriate tools), while those in the "file in 1M" column come from the manually annotated 1-million-word subcorpus of NKJP.
Corpus headers:
How to validate particular files (with xmllint on Linux):
$ xmllint --noout --valid header.xml
$ xmllint --noout --xinclude --relaxng NKJP_text.rng text.xml
$ xmllint --noout --xinclude --relaxng NKJP_structure.rng text_structure.xml
$ xmllint --noout --xinclude --relaxng NKJP_segmentation.rng ann_segmentation.xml
$ xmllint --noout --xinclude --relaxng NKJP_morphosyntax.rng ann_morphosyntax.xml
$ xmllint --noout --xinclude --relaxng NKJP_senses.rng ann_senses.xml
$ xmllint --noout --xinclude --relaxng NKJP_words.rng ann_words.xml
$ xmllint --noout --xinclude --relaxng NKJP_groups.rng ann_groups.xml
$ xmllint --noout --xinclude --relaxng NKJP_named.rng ann_named.xml
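The commands above follow a regular naming pattern (text.xml goes with NKJP_text.rng, text_structure.xml with NKJP_structure.rng, ann_X.xml with NKJP_X.rng), so validating a whole text directory can be scripted. The following is only a sketch: the function names schema_for and validate_dir are hypothetical, and it assumes that all layer files of one text sit in a single directory and all .rng schemata in another. Layers that are absent from the directory are skipped.

```shell
#!/bin/sh
# Sketch only: schema_for and validate_dir are illustrative names,
# not part of the NKJP distribution.

# Print the RELAX NG schema name matching a layer file name,
# following the pairing used in the commands on this page.
schema_for() {
    base=${1##*/}       # drop any leading directory
    base=${base%.xml}   # drop the .xml extension
    case $base in
        text)   echo "NKJP_text.rng" ;;
        text_*) echo "NKJP_${base#text_}.rng" ;;
        ann_*)  echo "NKJP_${base#ann_}.rng" ;;
        *)      return 1 ;;   # e.g. header.xml, which is validated with --valid
    esac
}

# Validate every layer file found in directory $1 against the
# schemata stored in directory $2 (assumed layout).
validate_dir() {
    dir=$1
    schemas=$2
    status=0
    # The header is checked against its DTD rather than a RELAX NG schema.
    if [ -f "$dir/header.xml" ]; then
        xmllint --noout --valid "$dir/header.xml" || status=1
    fi
    for layer in text text_structure ann_segmentation ann_morphosyntax \
                 ann_senses ann_words ann_groups ann_named; do
        file=$dir/$layer.xml
        [ -f "$file" ] || continue   # skip layers that are not present
        xmllint --noout --xinclude \
                --relaxng "$schemas/$(schema_for "$file")" "$file" || status=1
    done
    return "$status"
}
```

After sourcing the script, a call such as `validate_dir path/to/text_dir path/to/schemas` (both paths are placeholders) runs the checks and returns a non-zero status if any file fails validation.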
Papers:
Tools: