{"id":894,"date":"2019-04-26T23:30:11","date_gmt":"2019-04-27T03:30:11","guid":{"rendered":"http:\/\/aristotle2digital.blogwyrm.com\/?p=894"},"modified":"2019-11-07T10:12:08","modified_gmt":"2019-11-07T15:12:08","slug":"teaching-a-machine-to-ghoti","status":"publish","type":"post","link":"https:\/\/aristotle2digital.blogwyrm.com\/?p=894","title":{"rendered":"Teaching a Machine to Ghoti"},"content":{"rendered":"<p>English is a notoriously difficult language to spell and pronounce. &nbsp;If it challenges the organic learner\u2019s ability to grasp and communicate, it is more than doubly a source of frustration for the programmer trying to teach bits of silicon, metal, and plastic to appear to speak naturally.<\/p>\n<p>A famous demonstration of the problems readily encountered in today\u2019s lingua franca, attributed to William Ollier Jr., is the spelling of fish as \u201cghoti\u201d. &nbsp;The proper pronunciation of this awkward-looking sequence of letters strikes many people as something like \u201cgoatey\u201d, an adjective that could be used to describe a thing with the essence or behavior of a goat. &nbsp;But Ollier argues for the fish pronunciation as follows:<\/p>\n<ul>\n<li>the letters \u2018gh\u2019 are to be pronounced as they are in the word <i>enough<\/i><\/li>\n<li>the letter \u2018o\u2019 is to be pronounced as it is in the word <i>women<\/i><\/li>\n<li>the letters \u2018ti\u2019 are to be pronounced as they are in the word <i>nation.<\/i><\/li>\n<\/ul>\n<p>Short, simple, and eminently confusing!<\/p>\n<p>Ollier\u2019s first donor word comes from a delightfully ridiculous family of words all sporting the \u2018ough\u2019 combination of letters and all having subtly or radically different pronunciations. 
&nbsp;Consider the following list:<\/p>\n<ul>\n<li><i>bough<\/i> &#8211; a part of a tree; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=bough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;ghc=2&amp;pq=bough%20pronunciation&amp;sc=8-19&amp;sk=&amp;cvid=378F2A4BF7094ABF9F672D99E3C98956\">[bou]<\/a><\/li>\n<li><i>bought<\/i> &#8211; the past tense of to buy; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=bought%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=bought%20pronunciation&amp;sc=8-20&amp;sk=&amp;cvid=A44B4988A9F1477890FDC6BC822F5FB9\">[b\u00f4t]<\/a><\/li>\n<li><i>cough<\/i> &#8211; an action of the mouth and lungs; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=cough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=cough%20pronunciation&amp;sc=8-19&amp;sk=&amp;cvid=BA70DD694056439D9C4082B977A5C07C\">[k\u00e4f]<\/a><\/li>\n<li><i>dough<\/i> &#8211; a mixture of water and flour used in baking; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=dough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=dough%20pronunciation&amp;sc=8-19&amp;sk=&amp;cvid=AAC228EAF54149DB913E7E87AB5FBBEE\">[d\u014d]<\/a><\/li>\n<li><i>enough<\/i> &#8211; just the right amount of a required thing; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=enough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=enough%20pronunciation&amp;sc=3-20&amp;sk=&amp;cvid=AA72581F89A742D688EF84273FC5BFA5\">[i\u02c8n\u0259f]<\/a><\/li>\n<li><i>through<\/i> &#8211; to pass into and beyond; pronounced as <a href=\"https:\/\/www.bing.com\/search?q=through%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=through%20pronunciation&amp;sc=8-21&amp;sk=&amp;cvid=FB221157E2FC42F08C1673828173918F\">[THro\u035eo]<\/a><\/li>\n<\/ul>\n<p>This list contains six distinct pronunciations for the same four-letter combination, an astonishing one and a half distinct pronunciations per letter, most likely the greatest variability in 
the English language, possibly the world. &nbsp;And it is by no means exhaustive (even if putting it together exhausted the writer). There are a host of other \u2018ough\u2019 words that didn\u2019t make it onto the list simply because they offered nothing new.<\/p>\n<p>For example, consider that <i>fought<\/i>, <i>ought<\/i>, <i>sought<\/i>, and <i>thought<\/i> all rhyme with <i>bought<\/i> and bring nothing new to the list even if they bring headaches to people learning the English tongue. &nbsp;Similarly, \u2018ough\u2019 in <i>drought<\/i> sounds the same as it does in <i>bough<\/i> despite the fact that trees need water (or perhaps because of it). &nbsp;Likewise, <i>though<\/i>\u2019s similarity to <i>dough<\/i> consigns it to a mere mention in this paragraph rather than a place of honor (or is it shame) within the bullets above. &nbsp;And it is tough that <i>tough<\/i> and <i>rough<\/i> weren\u2019t unique enough to make the cut.<\/p>\n<p>But perhaps the most interesting no-show on the list is <i>slough<\/i>. &nbsp;This word is two-faced, having two quite different sounds depending on whether it is a noun or a verb:<\/p>\n<ul>\n<li>slough &#8211; a swamp or mire; pronounced <a href=\"https:\/\/www.bing.com\/search?q=slough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=slough%20pronunciation&amp;sc=6-20&amp;sk=&amp;cvid=B735A5FF852F4438A86BEF95DFDD9AC7\">[slou, slo\u035eo]<\/a><\/li>\n<li>slough &#8211; to shed or cast off; pronounced <a href=\"https:\/\/www.bing.com\/search?q=slough%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=slough%20pronunciation&amp;sc=6-20&amp;sk=&amp;cvid=B735A5FF852F4438A86BEF95DFDD9AC7\">[sl\u0259f]<\/a><\/li>\n<\/ul>\n<p>This example is by no means unique. &nbsp;English is just brimming with curious words that lie in wait trying to trick the speaker. 
&nbsp;There are the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Homophone\">heterographs<\/a> that have the same pronunciation but have different spellings and meanings. &nbsp;A particularly sinister example is the set of <i>to<\/i>, <i>too<\/i>, and <i>two<\/i>. &nbsp;There are the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Homophone\">homonyms<\/a> that share both the same pronunciation and spelling but mean different things. &nbsp;The clause, <i>the rose rose to glory in the garbage dump<\/i>, is a fine example.&nbsp; Together the heterographs and the homonyms form the set of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Homophone\">homophones<\/a>: words that are pronounced the same but which mean different things (regardless of spelling). &nbsp;Homonyms are also often loosely called homographs, which brings us to the next category, the synonyms: words with different spellings and pronunciations but which have the same meaning. &nbsp;Two closely related categories that seem to have no official designation are the spelling variants and the speaking variants. Into the former go all those weird words like <i>color<\/i> and <i>colour<\/i>, <i>saber<\/i> and <i>sabre<\/i>, and <i>normalize<\/i> and <i>normalise<\/i>. &nbsp;In the latter category, one finds words like <i>often<\/i>, in which the speaker may include or omit the \u2018t\u2019 sound.<\/p>\n<p>All of these categories can cause problems for speakers, whether natural or artificial. &nbsp;But no category seems to engender as much confusion and consternation as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Homophone\">heteronyms<\/a>. 
&nbsp;Some of the classic examples that fall into this category are:<\/p>\n<ul>\n<li><i>bass<\/i> the musical instrument (<a href=\"https:\/\/www.bing.com\/search?q=bass%20pronunication&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=bass%20pronunication&amp;sc=3-18&amp;sk=&amp;cvid=C7D847B8E6024839BF655A54C61914B4\">[b\u0101s]<\/a>) and <i>bass<\/i> the fish (<a href=\"https:\/\/www.bing.com\/search?q=bass%20pronunication&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=bass%20pronunication&amp;sc=3-18&amp;sk=&amp;cvid=C7D847B8E6024839BF655A54C61914B4\">[bas]<\/a>)<\/li>\n<li><i>minute<\/i> the unit of time (<a href=\"https:\/\/www.bing.com\/search?q=minute%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=minute%20pronunciation&amp;sc=6-20&amp;sk=&amp;cvid=103D9E8C8886422D88F79DBD3CD65728&amp;ajf=10\">[\u02c8minit]<\/a>) and <i>minute<\/i> the adjective describing size (<a href=\"https:\/\/www.bing.com\/search?q=minute%20pronunciation&amp;qs=n&amp;form=QBRE&amp;sp=-1&amp;pq=minute%20pronunciation&amp;sc=6-20&amp;sk=&amp;cvid=103D9E8C8886422D88F79DBD3CD65728&amp;ajf=10\">[m\u012b\u02c8n(y)o\u035eot]<\/a>)<\/li>\n<\/ul>\n<p>Together the heteronyms and homonyms make up the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Homograph\">homographs<\/a>: words with different meanings but the same spelling (regardless of pronunciation). 
&nbsp;The following Venn diagram (based on the one in the Wikipedia references) helps one keep score.<\/p>\n<p><a href=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-893\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn.png\" alt=\"\" width=\"857\" height=\"614\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn.png 857w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn-300x215.png 300w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn-768x550.png 768w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2019\/04\/Word_Venn-810x580.png 810w\" sizes=\"auto, (max-width: 857px) 100vw, 857px\" \/><\/a><\/p>\n<p>Heteronyms are particularly problematic for computer-generated readings of the written word because context changes the pronunciation and meaning in ways that make it hard to find solid rules that work in all instances.<\/p>\n<p>Consider the ridiculous sentence:<\/p>\n<div class=\"myQuoteDiv\">The bass played the bass to the applause of the crowd.<\/div>\n<p>Both instances of <i>bass<\/i> are nouns, but it seems clear that the musical instrument can\u2019t be the subject of this sentence, so maybe a programmer can write a rule to account for this case or, since it is unlikely that the previous sentence will show up in anything worth writing specialized code to handle, maybe it is ignored altogether.<\/p>\n<p>The sentence:<\/p>\n<div class=\"myQuoteDiv\">To a man with too many things to do a minute is a minute fraction of time.<\/div>\n<p>may be an entirely different story. 
&nbsp;There is a reasonable chance that a sentence containing both the noun and adjective form of <i>minute<\/i> will be written, and word order alone will rarely indicate which is which. &nbsp;Still, the local association of the article <i>a<\/i> just before the noun form and the occurrence of the noun <i>fraction<\/i> just after the adjective form may be enough of a pattern to write a rule.<\/p>\n<p>And then there is the voluminous list of noun-verb heteronyms, a sample of which is listed here (see this <a href=\"https:\/\/www.wordstress.info\/wp-content\/uploads\/2014\/10\/Stress-Pattern-Change-noun-verb-pairs.pdf\">full list of two-syllable examples<\/a>):<\/p>\n<ul>\n<li>Address and address<\/li>\n<li>Bow and bow<\/li>\n<li>Buffet and buffet<\/li>\n<li>Desert and desert<\/li>\n<li>Dove and dove<\/li>\n<li>Lead and lead<\/li>\n<li>Present and present<\/li>\n<li>Project and project<\/li>\n<li>Row and row<\/li>\n<li>Slough and slough<\/li>\n<li>Tear and tear<\/li>\n<li>Wind and wind<\/li>\n<\/ul>\n<p>Writing rules for a machine to naturally speak any sentence with any of these is difficult. &nbsp;In some cases the word order makes it easier. For example:<\/p>\n<div class=\"myQuoteDiv\">He will bow when he receives the bow.<\/div>\n<p>guarantees that the subject (he) will be followed by the verb form of bow rather than the noun form.<\/p>\n<p>But other sentences are not so obvious. 
&nbsp;Consider these sentences involving the very treacherous word <i>tear<\/i>, which is a sort of palindrome of noun-verb heteronyms since both the [ter] and the [tir] forms can be either a noun or a verb.<\/p>\n<div class=\"myQuoteDiv\">He will tear open the screen to let in the breeze that will cause him to have a tear in his eye.<\/div>\n<p>and<\/p>\n<div class=\"myQuoteDiv\">He will open a tear in the screen to let in the breeze that will cause his eye to tear.<\/div>\n<p>Definitely more difficult, but perhaps doable by scanning the sentence for helper words like <i>will<\/i> and <i>to<\/i>.<\/p>\n<p>But the fun doesn\u2019t stop there. &nbsp;Consider the following command.<\/p>\n<div class=\"myQuoteDiv\">Give the address.<\/div>\n<p>Is this a command telling an unidentified person to hand over where he lives or to give a speech to an audience? &nbsp;There is simply no way of knowing without comprehending the rest of the sentences around it.<\/p>\n<p>And there you have it, a brief but bewildering dip into what makes learning English a tricky enterprise. A relatively small set of rules may cover a majority of sentences encountered, but to get to fluency an enormous number of special cases and exceptions must be mastered, some of which require a non-local analysis of the text to ensure correctness. &nbsp;And while teaching a machine to read and speak English aloud is definitely a noble goal for the computer scientist (and especially beneficial for the visually impaired), it is one that is likely to come with a whole host of frustrations for years to come. It is amazing that any of us can communicate with each other at all.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>English is a notoriously difficult language to spell and pronounce. 
&nbsp;If it challenges the organic learner\u2019s ability to grasp and communicate, it is more than doubly a source of&#8230; <a class=\"read-more-button\" href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=894\">Read more &gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-894","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/894","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=894"}],"version-history":[{"count":0,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/894\/revisions"}],"wp:attachment":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=894"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=894"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=894"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}