What's new in quanteda version 3.0

April 7, 2021 · 8 minute read · Tags: blog, quanteda-latest, r-bloggers · Author: Kenneth Benoit and Kohei Watanabe

We are proud to announce the version 3.0 release of the quanteda package, just over a year after our last major release, v2.0. Version 3.0 is a significant update that makes quanteda and its growing family of extension packages more solid, more consistent, and more extensible. Main changes. Modularisation: We have now separated the textplot_*() functions from the main package into a separate package, quanteda.textplots, and the textstat_*() functions from the main package into a separate package quanteda.

What's new in quanteda version 2.0

February 27, 2020 · 12 minute read · Tags: blog, quanteda-latest, r-bloggers · Author: Kenneth Benoit and Kohei Watanabe

We are proud to announce the version 2.0 release of the quanteda package, following over a year of planning, discussion, design, and – most significantly – programming and testing. quanteda version 2.0 is a major update that introduces several significant changes and new features, detailed below. Major changes. 1. Object structure: All included data objects (corpus, tokens, dfm, and dictionary) have new formats. These are all updated to work with the existing extractor and replacement functions.

Using spaCy v2.1 with spacyr

March 28, 2019 · 6 minute read · Tags: blog, spacyr · Author: Akitaka Matsuo

A major update of spaCy (v2.1) was released recently. spaCy is one of the best and fastest tools for tokenization, part-of-speech tagging, dependency parsing, and entity recognition. In this post, I will discuss how it works with our spacyr package, along with some tips on maintaining multiple versions of spaCy using conda environments. Good news: it works. Our package spacyr is an R wrapper to the spaCy Python library. To work with the spacyr package, users have to prepare a Python environment with spaCy installed.

Text analysis of the 10th Republican Presidential candidate debate

February 12, 2019 · 13 minute read · Tags: blog, r-bloggers · Author: Kenneth Benoit and Gokhan Ciflikli

Introduction: A frequent problem in processing texts is the need to segment one or a few documents into many documents, based on markers within them that delimit units the analyst will want to consider separately. This is a common feature of interview or debate transcripts, for instance, where a single long source document might contain numerous speech acts from individual speakers. For analysis, we would likely want to consider these speakers separately, perhaps with the ability to combine their speech acts later by speaker.

Announcing the Quanteda Blog

June 24, 2018 · 1 minute read · Tags: blog · Author: Quanteda Initiative

Quanteda Blog. This is a placeholder for a brand-new quanteda blog. We are delighted to announce our new blog. What is this blog for? This blog serves several purposes: announcing new updates to our quanteda-verse packages and services, and sharing information and tips on natural language processing, especially in R. We need contributors: here is a guideline for contributors.

Guide for contributors

June 1, 2018 · 4 minute read · Tags: blog · Author: Ken Benoit, Akitaka Matsuo

What to contribute: As we stated in the announcement, we would like to make this blog a forum for people interested in natural language processing (NLP), welcoming contributors at every level from newbies to gurus, and to disseminate information about the latest updates in the NLP world, as well as how (and how not) to analyze text as data. To serve this goal, we invite anyone who has something to say about NLP to get involved in our blog as an author of blog posts.