Monolingual NER Results for various Languages

Feb 4, 2019 1 min read named entity recognition, Indian Languages, European Languages

The neural NER system that I implemented as part of the TALLIP paper and the ACL 2018 paper achieves the following F1 scores on various languages.

Results

Language	Dataset	Word Embeddings	Reference	F1 Score
English	CoNLL 2003	Spectral Embeddings	Arxiv Paper	90.94
Spanish	CoNLL 2002	Spectral Embeddings	Arxiv Paper	85.75
Dutch	CoNLL 2002	Spectral Embeddings	Arxiv Paper	85.20
German	Link	Spectral Embeddings	ACL 2018 Paper	87.64
Italian	Evalita 2009	Spectral Embeddings	ACL 2018 Paper	75.98
Hindi	FIRE 2014	Fasttext Embeddings	ACL 2018 Paper	64.93
Marathi	FIRE 2014	Fasttext Embeddings	ACL 2018 Paper	61.46
Bengali	FIRE 2014	Fasttext Embeddings	ACL 2018 Paper	55.61
Malayalam	FIRE 2014	Fasttext Embeddings	ACL 2018 Paper	64.59
Tamil	FIRE 2014	Fasttext Embeddings	ACL 2018 Paper	65.39

PPS: The monolingual NER performance for Bengali, Tamil and Malayalam differs from the published results because of certain pre-processing steps that were not performed in the ACL 2018 paper. We observed that some of the sentences are longer than 200 words. Manually splitting these longer sentences into smaller ones, using ‘|’ as the delimiter, led to a substantial improvement. Also, these models are trained using Common Crawl embeddings as opposed to Wikipedia embeddings.
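The splitting step described above was done manually, but a rough automated approximation might look like the sketch below. It assumes each sentence is a list of (word, tag) pairs as in CoNLL-style data, and that the delimiter appears as its own token. The function name `split_long_sentence` and its parameters are hypothetical and are not taken from the papers' code.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, NER tag), an assumed CoNLL-style representation


def split_long_sentence(
    sentence: List[Token], max_len: int = 200, delimiter: str = "|"
) -> List[List[Token]]:
    """Split a sentence longer than max_len tokens after each delimiter token.

    Sentences at or below max_len are returned unchanged (as a single piece).
    """
    if len(sentence) <= max_len:
        return [sentence]

    pieces: List[List[Token]] = []
    current: List[Token] = []
    for word, tag in sentence:
        current.append((word, tag))
        # Close the current piece once the delimiter token has been added.
        if word == delimiter:
            pieces.append(current)
            current = []
    # Keep any trailing tokens that follow the last delimiter.
    if current:
        pieces.append(current)
    return pieces


if __name__ == "__main__":
    long_sentence = [("w", "O")] * 120 + [("|", "O")] + [("w", "O")] * 129
    for piece in split_long_sentence(long_sentence):
        print(len(piece))
```

Note that a piece can still exceed the length limit if the delimiter is rare in the sentence, and an entity that spans a delimiter would be cut in two, so the output of such a script would still need a manual check.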

Rudra Murthy V

Browses memes and watches series to escape from reality

My research interests include multilingual learning for various Natural Language Processing Tasks.
