Readability indexes

lpetrich
Posts: 14453
Joined: Mon Feb 23, 2009 6:53 pm
Location: Lebanon, OR, USA

Post by lpetrich » Sun Jan 14, 2018 12:57 pm

I have discovered several of them:
(Wikipedia) Flesch–Kincaid readability tests
(Wikipedia) Gunning fog index
(Wikipedia) Coleman–Liau index
(Wikipedia) Automated readability index
(Wikipedia) SMOG
(Wikipedia) Dale–Chall readability formula
(Wikipedia) Spache readability formula
(Wikipedia) Fry readability formula

They all use words per sentence, along with syllables or letters or long/unfamiliar/complex/difficult words per word.
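
For concreteness, here are two of them written out: the Flesch–Kincaid grade level (FKGL) and the automated readability index (ARI). The coefficients are the commonly published ones; treat this as a summary to check against the articles above rather than as an authoritative statement.

```latex
\begin{align*}
\text{FKGL} &= 0.39\,\frac{\text{words}}{\text{sentences}}
             + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59 \\
\text{ARI}  &= 4.71\,\frac{\text{characters}}{\text{words}}
             + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43
\end{align*}
```

Both are linear in a sentence-length term and a word-length term, and nothing in either one looks at how the clauses inside a sentence are related.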

That seems to me overly simplistic, because it does not involve some estimate of syntactical complexity. To see why that is a problem, let us consider these three sets of sentences.
  1. I watched the cat. She was eating her dinner.
  2. I watched the cat, and she was eating her dinner.
  3. I watched the cat, as she was eating her dinner.
Most readability indexes would score (2) and (3) as equal or very nearly equal, even though they are syntactically rather different. (2) is essentially (1) with its two sentences joined as coordinate (co-equal) clauses. (3) differs from both: its second clause (she was eating her dinner) is subordinated, so that it modifies the first clause (I watched the cat).
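
To check that claim on one index, here is a minimal sketch of the automated readability index applied to the three examples. The tokenization is my own naive assumption (split sentences on `.`, `!`, `?` and count runs of letters or digits as words), so real tools may give slightly different numbers.

```python
import re


def ari(text: str) -> float:
    """Automated readability index with naive regex tokenization."""
    # Sentences: non-empty chunks between ., ! or ? (an assumption, not a real splitter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters or digits; punctuation is ignored.
    words = re.findall(r"[A-Za-z0-9]+", text)
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


samples = {
    1: "I watched the cat. She was eating her dinner.",
    2: "I watched the cat, and she was eating her dinner.",
    3: "I watched the cat, as she was eating her dinner.",
}

for label, text in samples.items():
    print(label, round(ari(text), 2))
```

With this tokenization, (2) and (3) land within half a grade level of each other, and (3) actually scores slightly lower than (2), simply because "as" is one letter shorter than "and".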

A complexity-based index would place (2) as not much more complex than (1) and (3) as significantly more complex than (1) or (2).
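
A toy illustration of that ordering, and nothing more: the sketch below adds a small penalty for each coordinating conjunction and a larger one for each subordinating conjunction. The word lists and the weights are hypothetical values I picked for the example, and keyword matching is no substitute for parsing ("as", for instance, is not always a subordinator).

```python
import re

# Hypothetical word lists and weights, chosen only for illustration.
COORDINATORS = {"and", "but", "or", "nor", "yet"}
SUBORDINATORS = {"as", "because", "although", "while", "when", "since", "if"}
COORD_WEIGHT = 1.0   # joining co-equal clauses: small penalty
SUBORD_WEIGHT = 3.0  # subordinating one clause to another: larger penalty


def clause_score(text: str) -> float:
    """Crude syntactic penalty based on conjunction keywords, not on a parse."""
    words = re.findall(r"[a-z]+", text.lower())
    coord = sum(w in COORDINATORS for w in words)
    subord = sum(w in SUBORDINATORS for w in words)
    return COORD_WEIGHT * coord + SUBORD_WEIGHT * subord


s1 = "I watched the cat. She was eating her dinner."
s2 = "I watched the cat, and she was eating her dinner."
s3 = "I watched the cat, as she was eating her dinner."

for s in (s1, s2, s3):
    print(clause_score(s), s)
```

On the three examples this gives 0, 1 and 3, which is the ordering described above. A serious version would take the clause structure from a parser instead of a word list.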

I went over to scholar.google.com to look for attempts to use syntactic complexity in readability testing, but I found only a few papers, and they did not state their results very clearly.

Old Woman in Purple
Posts: 12010
Joined: Sat Sep 03, 2011 11:19 pm
Location: Hoffman Estates, IL

Post by Old Woman in Purple » Sun Jan 14, 2018 3:03 pm

How would formulas to gauge syntax complexity be set up? They would have to (in the process) identify proper 'complex syntax' and exclude 'mangled syntax'.

lpetrich
Posts: 14453
Joined: Mon Feb 23, 2009 6:53 pm
Location: Lebanon, OR, USA

Post by lpetrich » Mon Jan 15, 2018 12:13 am

[quote="Old Woman in Purple"]How would formulas to gauge syntax complexity be set up? They would have to (in the process) identify proper 'complex syntax' and exclude 'mangled syntax'.[/quote]
There is already some software that can parse text according to its grammar. That's what's used in grammar checkers.

Link Grammar has an online demo: Parse a sentence

Parsing English with a Link Grammar, (Wikipedia) Link Grammar, Link Grammar at AbiSource, opencog/link-grammar: The CMU Link Grammar natural language parser.

The "link" part refers to how the parse is expressed: as labeled links between pairs of words.

Loren Pechtel
Posts: 2982
Joined: Sun Mar 08, 2009 5:29 pm

Post by Loren Pechtel » Tue Jan 16, 2018 1:13 am

[quote="lpetrich"]I have discovered several of them:
(Wikipedia) Flesch–Kincaid readability tests
(Wikipedia) Gunning fog index
(Wikipedia) Coleman–Liau index
(Wikipedia) Automated readability index
(Wikipedia) SMOG
(Wikipedia) Dale–Chall readability formula
(Wikipedia) Spache readability formula
(Wikipedia) Fry readability formula

They all use words per sentence, along with syllables or letters or long/unfamiliar/complex/difficult words per word.

That seems to me overly simplistic, because it does not involve some estimate of syntactical complexity. To see why that is a problem, let us consider these three sets of sentences.
  1. I watched the cat. She was eating her dinner.
  2. I watched the cat, and she was eating her dinner.
  3. I watched the cat, as she was eating her dinner.
Most readability indexes would make (2) and (3) equal or very close to equal, even though (2) and (3) are syntactically rather different. (2) is essentially (1) with the sentences each turned into co-equal clauses. (3) is different from the other two. The second clause (she was eating her dinner) is turned into a modifier of the first clause (I watched the cat).

A complexity-based index would place (2) as not much more complex than (1) and (3) as significantly more complex than (1) or (2).

I went over to scholar.google.com to look for attempts to use syntactic complexity in readability testing, but I found only a few papers, and they did not state their results very clearly.[/quote]

I disagree. #1 is simpler than #2 because the period completes the first thought and then the second sentence builds upon that. You have less pending material to think about.

In the programming world this is very clear--our complexity measures definitely favor #1 over #2 or #3.

Also, look at books for young kids--#1 is definitely favored.
