spaCy (pronounced spay-SEE) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. It is described as offering one of the fastest syntactic parsers available. The library is published under the MIT license and currently offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian, Dutch and multi-language NER, as well as tokenization for various other languages.
Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production usage. As of version 1.0, spaCy also supports deep learning workflows that allow connecting statistical models trained by popular machine learning libraries like TensorFlow, Keras, scikit-learn or PyTorch. spaCy's machine learning library, Thinc, is also available as a separate open-source Python library. Version 2.0, released on November 7, 2017, features convolutional neural network models for part-of-speech tagging, dependency parsing and named entity recognition, as well as API improvements around training and updating models and constructing custom processing pipelines.
Main features
- Non-destructive tokenization
- Named entity recognition
- Support for over 25 languages
- Statistical models for 8 languages
- Pre-trained word vectors
- Part-of-speech tagging
- Labelled dependency parsing
- Syntax-driven sentence segmentation
- Text classification
- Built-in visualizers for syntax and named entities
- Deep learning integration
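As an illustration of the first feature, non-destructive tokenization means that the original text can always be recovered from the tokens. The following is a minimal sketch, assuming spaCy is installed; it uses a blank English pipeline (tokenizer only), so no statistical model is required. The example sentence is made up for illustration.

```python
import spacy

# A blank English pipeline contains only the tokenizer,
# so no statistical model has to be downloaded.
nlp = spacy.blank("en")

text = "Apple isn't looking at buying a U.K. startup."  # illustrative sentence
doc = nlp(text)

for token in doc:
    # token.idx is the character offset of the token in the original text;
    # token.whitespace_ is the trailing whitespace (if any) that followed it.
    print(token.i, repr(token.text), token.idx, repr(token.whitespace_))

# Joining every token with its trailing whitespace reproduces the input exactly.
rebuilt = "".join(token.text_with_ws for token in doc)
```

Because each token keeps its character offset and trailing whitespace, annotations produced later in the pipeline can be mapped back to exact positions in the source text.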
Extensions and visualizers
spaCy comes with several extensions and visualizations that are available as free, open-source libraries:
- Thinc: A machine learning library optimized for CPU usage and deep learning with text input.
- sense2vec: A library for computing word similarities, based on Word2vec and sense2vec.
- displaCy: An open-source dependency parse tree visualizer built with JavaScript, CSS and SVG.
- displaCyENT: An open-source named entity visualizer built with JavaScript and CSS.
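The entity visualizer can also be tried from Python through the `spacy.displacy` module that ships with the library. The sketch below is an assumption-laden example rather than a typical workflow: the entity spans are set by hand (hypothetical labels on a made-up sentence) so that it runs without a trained model, whereas in practice they would come from a named entity recognition model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Tokenizer-only pipeline; entities are assigned manually below (an assumption
# made so the example runs without downloading a statistical model).
nlp = spacy.blank("en")
doc = nlp("Alice moved from Paris to Berlin.")

# Tokens: Alice(0) moved(1) from(2) Paris(3) to(4) Berlin(5) .(6)
doc.ents = [
    Span(doc, 0, 1, label="PERSON"),
    Span(doc, 3, 4, label="GPE"),
    Span(doc, 5, 6, label="GPE"),
]

# With jupyter=False, render() returns the HTML markup as a string
# instead of trying to display it in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
print(html)
```

The returned string can be written to an `.html` file and opened in a browser; passing `style="dep"` instead renders a dependency parse, which requires a pipeline with a parser.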
See also
- Natural language processing
- List of natural language processing toolkits
- NLTK
External links
- Official website
- spaCy source code on GitHub
- Official blog by the creators
Source of article: Wikipedia