We are proud to announce the release of version 3.0 of the quanteda package, just over a year after our last major release, v2.0. Version 3.0 is a significant update that makes quanteda and its growing family of extension packages more solid, more consistent, and more extensible.
Main changes
Modularisation
We have now separated the textplot_*() functions from the main package into a separate package quanteda.textplots, and the textstat_*() functions from the main package into a separate package quanteda.
Continue reading
We are proud to announce the release of version 2.0 of the quanteda package, following over a year of planning, discussion, design, and – most significantly – programming and testing. quanteda version 2.0 is a major update that introduces significant changes and new features, detailed below.
Major changes
1. Object structure
All included data objects (corpus, tokens, dfm, and dictionary) now have new formats. These have all been updated to work with the existing extractor and replacement functions.
Continue reading
A major update of spaCy (v2.1) was released recently. spaCy is one of the best and fastest tools for tokenization, part-of-speech tagging, dependency parsing, and entity recognition. In this post, I will discuss how it works with our spacyr package, along with some tips on maintaining multiple versions of spaCy using conda environments.
Good news: it works
Our package spacyr is an R wrapper for the spaCy Python library. To use spacyr, users must prepare a Python environment with spaCy installed.
Continue reading
February 12, 2019 · 13 minute read · Tags: blog, r-bloggers · Author: Kenneth Benoit and Gokhan Ciflikli
Introduction
A frequent problem in processing texts is the need to segment one or a few documents into many documents, based on markers within them that delimit units the analyst will want to consider separately.
This is common in interview or debate transcripts, for instance, where a single long source document might contain numerous speech acts from individual speakers. For analysis, we would likely want to consider these speakers separately, perhaps with the ability to recombine their speech acts by speaker later.
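The segmentation idea can be sketched in a few lines. This is a minimal, language-agnostic illustration written in Python, not quanteda's own implementation. The speaker-marker convention (an upper-case name followed by a colon, such as `SMITH:`) and the helper names `segment_transcript` and `combine_by_speaker` are hypothetical assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) marker convention: an upper-case speaker label
# followed by a colon, e.g. "SMITH:". Real transcripts may need another pattern.
SPEAKER_PATTERN = re.compile(r"\b([A-Z][A-Z]+):\s*")


def segment_transcript(text):
    """Split one long transcript into (speaker, speech) segments.

    Any text before the first speaker marker is dropped.
    """
    matches = list(SPEAKER_PATTERN.finditer(text))
    segments = []
    for i, match in enumerate(matches):
        start = match.end()
        # Each speech act runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segments.append((match.group(1), text[start:end].strip()))
    return segments


def combine_by_speaker(segments):
    """Recombine speech acts so that each speaker becomes a single document."""
    combined = defaultdict(list)
    for speaker, speech in segments:
        combined[speaker].append(speech)
    return {speaker: " ".join(parts) for speaker, parts in combined.items()}


if __name__ == "__main__":
    transcript = "SMITH: We support the bill. JONES: We oppose it. SMITH: Then let us vote."
    segs = segment_transcript(transcript)
    print(segs)
    print(combine_by_speaker(segs))
```

Running the module prints three speech acts and then two combined documents, one per speaker. Keeping the segmentation and recombination steps separate mirrors the workflow described above: analyse speech acts individually first, then group them by speaker when needed.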
Continue reading
Quanteda Blog
This is a placeholder for a brand-new quanteda blog.
We are delighted to announce our new blog.
What is this blog for?
This blog serves several purposes: announcing updates to our quanteda-verse packages and services, and sharing information and tips on natural language processing, especially in R.
We need contributors
Here is a guideline for contributors.
Continue reading
What to contribute
As we stated in the announcement, we would like to make this blog a forum for people interested in natural language processing (NLP), welcoming contributors at every level from newbies to gurus. We also want it to disseminate information about the latest updates in the NLP world, as well as how (and how not) to analyze text as data.
To serve this goal, we invite anyone with something to say about NLP to get involved in our blog as an author of blog posts.
Continue reading