{kun´ēzē}
02Nov2023

Word count for articles

Information
Updated: 21 November 2023  Products: Downloads & Documentation

Version 1.1.0

Compatible with:
Joomla! 3.x Joomla! 4.x Joomla! 5.x

This module displays a list of articles on a Joomla website, who wrote them and (optionally) a link to each article and the word count for each article.  A “word” is defined as a string of text with a space (or multiple spaces, including non-breaking spaces) following it.  You can choose to analyse only published and archived articles, or also include trashed and unpublished articles in the analysis.  The module lists the articles in descending order of length (i.e. from the longest article to the shortest) together with the word count (x) for each article.  It also provides a statistical analysis: the number of articles (|x|), the shortest article (xmin), the longest article (xmax), the mean word count per article (x̄) and the standard deviation (σ), which are also plotted on a normal distribution chart at the end.
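The definition of a “word” given above can be illustrated with a short Python sketch.  This is not the module's own code (the module is a Joomla extension); the helper name `count_words` is hypothetical, and the sketch assumes the text has already had any HTML markup removed.

```python
import re

# A run of ordinary spaces and/or non-breaking spaces (U+00A0) is one separator.
_SEPARATOR = re.compile(r"[ \u00a0]+")


def count_words(text: str) -> int:
    """Count the space-separated strings in `text` (hypothetical helper)."""
    tokens = _SEPARATOR.split(text)
    # Leading or trailing separators produce empty tokens; ignore them.
    return sum(1 for token in tokens if token)


print(count_words("Joomla  is\u00a0a content   management system"))  # prints 6
```

Treating a run of spaces as a single separator means that double spaces after a full stop do not inflate the count.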

It is not recommended that the module be used to analyse websites with many thousands of articles, because all the contents of a website’s articles need to be loaded into memory, and you certainly wouldn’t want to display a list of “thousands” of articles.  I would recommend that you set a limit of, say, a few hundred articles at the most.  The performance of the module depends on how many articles you want to analyse and the length of your articles.  Also, the output produced by the module is fairly wide and may not be suited to a template’s “sidebar” position; it looks best if displayed in a position that is more than 975px wide.  I would recommend that you place the module within an article using Joomla's {loadposition} or {loadmodule} feature.

Basic Settings

Word count for Joomla articles settings

Enter a number to remove the longest and shortest articles (if left empty, default is 0).  If you attempt to remove more outliers than you have articles then nothing will be displayed.

Parameter ( Values ): Description
Display authors ( Name | Username | None ): Choose whether you want the module to display authors’ “real” names, their usernames, or no authors at all.
Include trashed articles ( Yes | No ): Yes will include unpublished and trashed articles in the selection; No only selects articles that are published or archived.  No is the default setting.
Display ( Articles | Statistics | Both ): Choose whether you want the module to display just a list of the longest articles, just the statistics, or both a list of articles and the statistics.  The statistics are displayed after the list of articles:  viz. number of articles (|x|), mean word count per article (x̄) and standard deviation (σ).
Display links ( Yes | No ): Yes will construct links to articles (if the list of articles is selected by the previous setting).  Note: articles that are unpublished or trashed will not have links generated.
List limit: Specify a positive integer, which we will refer to as |x| (where |x| > 0), for the number of articles you want to select; the initial (default) value is 25.  This number can be smaller or larger than the number of records in the _content table on a website; the module will display the lesser of this number or the total number of records in the _content table.  If you enter a negative value you will select all records in the _content table; this can be a convenient way of finding out how many articles exist.  A non-numeric setting will use the default 25.
Exclude categories: Select categories of articles that you do not want to include in the word count analysis.  If left empty then no categories will be excluded.
Exclude articles: If you want to exclude specific articles, enter the article IDs as a comma-separated list.  If left empty then no articles will be specifically excluded.
Remove outliers ( 0 | 1 … 10 ): Enter a number to remove the articles with the most and the fewest words (if left empty, default is 0).  See the notes about this setting later in this article.
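The rules described for the List limit setting can be summarised in a small Python sketch.  It only restates the behaviour documented above and is not taken from the module; the function name `effective_limit`, its parameters, and the treatment of an empty value as “non-numeric” are assumptions.

```python
DEFAULT_LIMIT = 25  # documented default for the List limit setting


def effective_limit(raw_setting: str, total_records: int) -> int:
    """Return how many articles would be selected (hypothetical helper)."""
    try:
        limit = int(raw_setting.strip())
    except ValueError:
        # Non-numeric input (assumed to include an empty value): use the default.
        limit = DEFAULT_LIMIT
    if limit < 0:
        # A negative value selects every record.
        return total_records
    # Otherwise the lesser of the setting and the number of records applies.
    return min(limit, total_records)


print(effective_limit("500", 101))  # prints 101
print(effective_limit("-1", 101))   # prints 101
print(effective_limit("abc", 101))  # prints 25
```

A setting of 0 yields 0 in this sketch, which matches the “No articles to display” case for List limit = 0.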

Key to symbols used

The example shows a list of five articles on a website with these settings:  Display authors = Username, Include trashed articles = Yes and Display links = Yes.

1. The article title is underlined indicating it is hyperlinked.  The article is published.

2. The symbol after the article title indicates the article has been trashed.  Trashed articles cannot be hyperlinked.

3. The symbol after the article title indicates the article is unpublished.  Unpublished articles cannot be hyperlinked.

4. The symbol instead of the username indicates the user record does not exist in the _users table.  The article is published.

5. The symbol after the article title indicates the article has been archived.

Troubleshooting “No articles to display”

If the module produces “No articles to display”, this could be for one of the following reasons:

  1. You have not created any articles on your website.
  2. You have created articles but they’re unpublished (or trashed) and you’re using the setting Include trashed articles = No.
  3. You have used List limit = 0.
  4. |x| is less than or equal to 2 × Remove outliers, so removing the outliers leaves no articles.
  5. You have excluded too many articles or categories and there are no articles available for analysis.

Removing outliers:  Min-max feature scaling

As you may have guessed, there’s a difference between the statistical mean (x̄) of a series of numbers and the probability that a certain number (x) will occur.  Let’s look again at the example used to show the symbols you might see with the module.  In that example there are five articles with the following corresponding number of words:

Article number 1 2 3 4 5
Word count 3101 1275 1293 86 18

The median word count \[\tilde{x} = \frac{x_{min} + x_{max}}{2}\] is halfway between the longest article (3101) and the shortest (18); that is, the median is 1559.5.  (Strictly speaking, this halfway value is the mid-range; the median in the usual statistical sense is the middle value of the sorted series, which is 1275 here.)  However, the mean word count \[\bar{x} = \dfrac{\sum_{i=1}^{|x|} x_{i}}{|x|}\] is 1154.6.  In other words, the mean word count is much lower than the halfway value, and there appears to be a higher probability that articles on this website will have fewer words than the expected (or “median”) number.
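The figures quoted above can be checked with a few lines of Python.  This is only an illustrative sketch with hypothetical variable names; it also prints the middle value of the sorted series, which is what statistics libraries usually mean by “median”.

```python
from statistics import mean, median

# Word counts of the five example articles.
word_counts = [3101, 1275, 1293, 86, 18]

# Halfway between the shortest and the longest article.
halfway = (min(word_counts) + max(word_counts)) / 2
# Arithmetic mean: the sum of the word counts divided by the number of articles.
average = mean(word_counts)
# Middle value of the sorted series.
middle = median(word_counts)

print(f"halfway value: {halfway}")  # halfway value: 1559.5
print(f"mean: {average}")           # mean: 1154.6
print(f"middle value: {middle}")    # middle value: 1275
```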

Real-estate businesses are notorious for quoting median property values, instead of the average property price, in a locality in order to attract customers.  Consider this:  if most houses in an area are selling for around 1.25M, but one home sells for 400K and another sells for 3.6M, the median price is 2M.  So, potential sellers may be lured into thinking that the property market is good and will list their properties hoping to get 2M when they’re more likely to get 750K less.  On the other hand, were real-estate businesses to use average property sales instead, home owners would be less likely to list their properties for sale.  The median value is determined by the lowest and highest price paid, and one or two extreme cases can affect this number.  If the lowest and highest values are excluded, the median (or “most likely”) value changes.

There are several ways that we can adjust the calculation so that the mean is closer to the median.  One method used to “normalise” the data, referred to here as min-max feature scaling, is to exclude the extreme values from the calculation.  This means we trim the series by removing one or more elements from each end of it: the highest and the lowest.  That’s what the setting Remove outliers is for; we will refer to this trimming factor as Z.  If we use Z = 0 we will keep all the records in our data set; if we use Z = 1 we would remove the “pink” records (the longest and shortest articles, 3101 and 18 words);  if we use Z = 2 we would remove the “pink” and the “yellow” records (also the next longest and next shortest, 1293 and 86 words).  If we tried to use Z = 3 we would have no records left in our example data set.  The value of Z must be less than half the number of selected records (|x|); otherwise no records remain.

So, if you only have 20 articles you cannot use Z = 10.  In summary, the setting to remove outliers is something you may want to experiment with.
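The effect of the trimming factor Z can be sketched in Python using the five example articles.  This is an illustration of the idea rather than the module's implementation; the function name `trim_outliers` is hypothetical.

```python
def trim_outliers(word_counts, z):
    """Drop the z longest and the z shortest entries (hypothetical helper)."""
    ordered = sorted(word_counts, reverse=True)  # longest article first
    if z <= 0:
        return ordered  # Z = 0 keeps every record
    if 2 * z >= len(ordered):
        return []  # trimming would remove everything
    return ordered[z:-z]


word_counts = [3101, 1275, 1293, 86, 18]
for z in range(4):
    print(z, trim_outliers(word_counts, z))
# 0 [3101, 1293, 1275, 86, 18]
# 1 [1293, 1275, 86]
# 2 [1275]
# 3 []
```

With 20 articles and Z = 10 the same check (2 × Z is not less than the number of articles) returns an empty list, which is why that combination cannot be used.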

Live demonstration

Statistics:
  • Number of articles (|x|) = 101
  • Shortest article (xmin) = 152 words; longest article (xmax) = 3,611 words
  • Mean word count per article (x̄) = 1,255.94
  • Standard deviation (σ) = 745.44
[Normal distribution chart: centre 1,256; bands 942⟷1,570, 628⟷1,884 and 314⟷2,198; full range 152⟷3,611]
