Leide Daiane de Almeida Oliveira
Master's student in English
Linguistic and Literary Studies (UFSC)

ABSTRACT: The objective of this study is to verify the effectiveness of corpus linguistics in the analysis of literary texts. Corpus linguistics is a field of linguistics with great potential, including for research in literary studies. The basic assumption is that linguistic studies of different types can be performed on collections of written texts, known as corpora. In this article, the selected literary texts are two short stories from James Joyce's Dubliners: “Araby” and “Eveline”. AntConc was selected as the corpus analysis software for this research because of its convenience: it is not a complex tool and it seems to give accurate results for the type of analysis proposed.
Keywords: Literature. James Joyce. Corpus linguistics. AntConc.

This investigation focuses on how some literary features can be perceived through word frequencies. The main literary feature observed is the construction of atmosphere. Roughly speaking, atmosphere can be defined as the feeling constructed through a particular literary narrative. The reader comes into contact with the atmosphere, or mood, mainly through the writer's choice of words, especially adjectives.
In “Araby” and “Eveline” the atmosphere is a very important element because it brings the reader into contact with one of the major themes of the whole book, paralysis. If the atmosphere had not been well developed in each short story, the themes might not be so clear to the reader. In both short stories the atmosphere is constructed through the use of appropriate adjectives, and the linguistic tool is able to locate them in the text and to give the number of occurrences and the context in which each word appears.

Although it is often hard to come up with a good, all-encompassing definition of a broad topic such as corpus linguistics, a better understanding of it can be gained from the definition given by Tony McEnery and Andrew Hardie in their book Corpus Linguistics: Method, Theory and Practice. They define corpus linguistics as “dealing with some set of machine-readable text which is deemed an appropriate basis on which to study a specific set of research questions” (1). Another definition is offered by Nadja Nesselhauf in her article Corpus Linguistics: A Practical Introduction. She says that:

Corpus linguistics is a method of carrying out linguistic analyses. As it can be used for the investigation of many kinds of linguistic questions and as it has been shown to have the potential to yield highly interesting, fundamental, and often surprising new insights about language, it has become one of the most wide-spread methods of linguistic investigation in recent years. (2)

Bearing this definition in mind and considering the great number of possibilities in the field of corpus linguistics, it is also important to have some understanding of the software tools that deal with machine-readable texts. Many software tools are available nowadays; nevertheless, it is important to be careful and check whether the chosen program is working properly. Laurence Anthony, in his article “A critical look at software tools in corpus linguistics”, discusses the problems that might occur. One of the main problems is conflicting results when, for example, the corpus is processed in more than one tool in an attempt to verify the results. This problem, as he points out, is due to different ways of counting words, but the necessary adjustments can easily be made in the selected tool.
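As a hedged illustration of how such counting differences can arise, the minimal Python sketch below tokenizes one sentence from “Eveline” in two ways. Both tokenizers and their names are assumptions made for this example; they are not taken from AntConc or from Anthony's article.

```python
import re

# One sentence quoted from "Eveline" in this article.
SENTENCE = "he wasn't going to give her his hard-earned money to throw about the streets"


def tokens_letters_only(text):
    """Hypothetical tokenizer A: every unbroken run of letters is a word,
    so apostrophes and hyphens split words apart."""
    return re.findall(r"[a-z]+", text.lower())


def tokens_keep_joiners(text):
    """Hypothetical tokenizer B: apostrophes and hyphens inside a word
    are kept, so "hard-earned" and "wasn't" are single tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


letters_only = tokens_letters_only(SENTENCE)
keep_joiners = tokens_keep_joiners(SENTENCE)

print(len(letters_only), letters_only.count("hard"))
print(len(keep_joiners), keep_joiners.count("hard"))
```

Under the first scheme “hard” is counted once in this sentence; under the second it is not counted at all, because “hard-earned” stays a single token. A difference of this kind is enough to make two tools report different frequencies for the same text.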
The tool selected for this investigation is AntConc, version 3.2.4. This freeware concordance program was developed by Laurence Anthony, director of the Centre for English Language Education at Waseda University. It is a widely known program with many guides available on the web. Once it has been downloaded, the next step is to select the file to be analyzed; the other steps are self-explanatory, in the sense that it is possible to test what each category of analysis can do with the text. The available options are concordance, concordance plot, file view, clusters, collocates, word list and keyword list. For this investigation, concordance and word list were the only necessary categories. Concordance is the section of the program that shows each occurrence of the selected word in the context in which it appears in the text, and the word list lists all the words of the text, giving the number of occurrences of each word.
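The two operations relied on in this investigation, a frequency-ranked list of words and a concordance, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the simple letters-only tokenizer and the context width are hypothetical choices and do not reproduce AntConc's implementation. The sample text joins two excerpts from “Araby” quoted in this article.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters as words (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def word_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence,
    with up to `width` words of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


SAMPLE = "It was a dark rainy evening. I looked over at the dark house where she lived."

top_word = word_list(SAMPLE)[0]
dark_lines = concordance(SAMPLE, "dark")

for left, word, right in dark_lines:
    print(f"{left:>15} | {word} | {right}")
```

The word list answers the question of how often a word occurs, while the concordance answers the question of where and in what company it occurs, which is what the analysis of atmosphere depends on.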

The investigation was carried out in two steps. The first was to see how the keywords were used to create the atmosphere in both short stories. The second was to analyze words indirectly connected to the atmosphere of the short stories, with the connection being perceived through the analysis of their context of occurrence. The following sub-sections give a more detailed explanation of the data and how they were analyzed.

3.1. DATA
The data consist of two short stories from Dubliners, a collection of fifteen short stories by James Joyce. As the short stories follow a certain pattern in the order in which they appear in the book, two consecutive stories were chosen: “Araby” and “Eveline”, the third and the fourth respectively.
“Araby” has an apparently simple plot: it tells the story of a young boy who falls in love with the sister of one of his friends, who happens to be his neighbor. He goes through all the symptoms of a person in love but has never had the courage to talk to the girl. On the day he finally talks to her, they talk about a bazaar that is to take place in the city; she cannot go, but he promises to bring her a gift if he goes. Fulfilling that promise becomes the most important thing for the boy. He asks his aunt and uncle for permission and for money. They allow him to go, but his uncle forgets about it and gets home very late. Even though it is late, the boy is hopeful that he can still find some stalls open and get the girl a gift. When he gets to the bazaar it is almost completely dark, and all that is left for the boy is tremendous frustration.
In “Eveline” the main character is a nineteen-year-old woman who has a very simple life: she works in a shop and also takes care of her house. Her mother and older brother are dead, and her father is a drunken, aggressive man who usually treats her and her younger siblings badly. At a certain point she meets a sailor named Frank and has the opportunity to leave for Argentina with him. Nonetheless, either because she is afraid to leave and face a new life or because she has promised her mother that she would take care of the house and her younger siblings, she gives up leaving for Argentina with Frank and has to face one more loss in her life.
Many other important elements are sprinkled throughout the short stories, such as political issues, religious references, and the kind of paralysis that runs through most of the short stories in Dubliners and that is metaphorically perceived through the behavior of Eveline and of the boy in “Araby”. They both face situations about which they can do very little. The atmosphere contributes to the understanding of their situations and their feelings. Without the elements brought by the careful construction of the atmosphere, their behavior and the way the reader feels about them would be different. Without an appropriate atmosphere the short stories could read as ordinary situations that could happen to anyone; nevertheless, both “Eveline” and “Araby” are so well arranged in terms of atmosphere that it is hard to remember either short story without associating it with the feeling created by the writer's choice of words.

Both literary texts, “Araby” and “Eveline”, will be processed through AntConc, version 3.2.4. As the focus of the investigation is to verify whether it is possible to perceive the atmosphere created in both texts, and since the atmosphere is mainly created by the use of adjectives, function words such as “the”, “a”, “with” and so on will be ignored. Therefore, the first content words in the list will be analyzed, along with their relation to the construction of the atmosphere.
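Ignoring function words amounts to filtering the frequency list against a stop list. The Python sketch below is a minimal illustration of that step; the stop list shown is a small hypothetical one assembled for this example, not a list supplied by AntConc, and the sample joins three excerpts from “Araby” quoted in this article. Note that such a filter only removes function words: deciding which of the remaining content words are adjectives is still left to the analyst.

```python
import re
from collections import Counter

# Hypothetical stop list, just large enough for the sample below.
FUNCTION_WORDS = {
    "the", "a", "an", "with", "to", "of", "and", "was", "it",
    "through", "behind", "where", "from",
}


def content_word_counts(text, stop_words=FUNCTION_WORDS):
    """Count every word in the text that is not in the stop list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


SAMPLE = (
    "through the dark muddy lanes behind the houses. "
    "to the dark odorous stables. It was a dark rainy evening."
)

counts = content_word_counts(SAMPLE)
print(counts.most_common(3))
```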

The main result was that a literary analysis could be successfully carried out through the use of corpus linguistics. In this particular investigation the atmosphere of two short stories could be verified through the use of a linguistic tool. To be more specific about how the creation of the atmosphere could be perceived in each short story, this section is divided into two subtopics, one about the atmosphere in “Araby” and the other about “Eveline”.

Setting the function words aside, AntConc shows that the first adjective in the ranking is the word “dark”. It appears seven times in the five-page short story. The tool allows a deeper analysis: by clicking on the word “dark”, the context in which it appears in the short story is shown, as can be seen below:

The occurrences of the word “dark” are, of course, accompanied by other words that together reinforce the gloomy atmosphere that is created. AntConc is also very helpful in showing the word in the context in which it appears. It is worth taking a look at the word “dark” in context to see that it does not refer, for example, to the color of the hair or eyes of a particular character. On the contrary, it has everything to do with the creation of a melancholic mood in the narrative, as can be seen in the following excerpts: “through the dark muddy lanes behind the houses”, “the back doors of the dark dripping gardens where odours arose from the ashpits”, “to the dark odorous stables”, “It was a dark rainy evening”, “I looked over at the dark house where she lived”, “the dark entrance to the stall”, “The upper part of the hall was now completely dark”. Joyce's creation of the atmosphere relies greatly on the use of the word “dark”, but the word “light” also occurs several times and its context deserves investigation.
The second most frequent adjective is the word “light”. It appears four times in the short story. In at least one of its occurrences, however, it conveys the same meaning as “dark”, because the text says that “the light was out.” The other three occurrences of the word “light” are usually related to the girl the boy is in love with. This association is metaphorical, in the sense that apart from her everything else was dull for the boy: school, his own house, the streets, playing with his friends, and everything else that was not related to the girl. The occurrences of “light” can be seen below:

After the adjectives “dark” and “light”, the next one shown in the AntConc word list is “blind”, and the context in which it appears also indicates a close relation to the creation of the atmosphere. Nevertheless, the tool reveals that one of the occurrences of “blind” in the text is in fact a noun. The other two occurrences are the following: “North Richmond Street, being blind, was a quiet street” and “An uninhabited house of two storeys stood at the blind end.” The use of the word “blind” might suggest a more complex idea than a simple description of the street; it also contributes to the mood of the narrative. The next adjective is “brown”, and it seems to be stronger than “blind” in the creation of the gloomy atmosphere. It appears as follows: “gazed at one another with brown imperturbable faces”, “I kept her brown figure always in my eye”, “seeing nothing but the brown-clad figure cast by my imagination”. All the uses of “brown” help to build the dark atmosphere in “Araby”. The following adjectives listed are “bad”, “cold”, “confused”, “lightened” and “useless”, each of which occurs only twice in the whole short story. The rest of the adjectives do not seem to have much relevance to the present investigation.

Following the same method applied to “Araby”, disregarding the function words and focusing on the use of adjectives, the linguistic tool shows that the first adjective in the ranking is the word “hard”. It appears five times in the short story, and it is important to look at the context of its occurrences: “Of course she had to work hard, both in the house and at business.” “he wasn't going to give her his hard-earned money to throw about the streets”, “She had hard work to keep the house together”, “It was hard work -- a hard life -- but now that she was about to leave”. It is remarkable how the choice of adjectives is able to give a specific mood to the narrative. In “Eveline” the reader comes into contact with the harsh reality in which the young woman lives. The repetition of the word “hard” helps the reader to perceive the atmosphere that is built through the story. See below the occurrences of the word “hard” in AntConc:

The next adjective in the ranking is the word “dead”. It appears four times, in the following contexts: “she and her brothers and sisters were all grown up; her mother was dead.” “Tizzie Dunn was dead, too.” “Latterly he had begun to threaten her and say what he would do to her only for her dead mother's sake.” “Ernest was dead and Harry, who was in the church decorating business, was nearly always down somewhere in the country.” In all the passages in which the word “dead” occurs, a feeling of isolation is built, in the sense that the good people who really cared about her are all gone and she does not have many people to count on. The word “dead” helps to create an atmosphere of paralysis: her life seems to drag through the daily routine, there is no excitement or adventure, and everything seems a little “dead”. See below the occurrences of “dead” in the tool:

The next adjectives in the ranking are “little” and “long”, each of which appears four times in the short story. Although these two words are repeated relatively often in the short story, they do not seem to be very important in the creation of the atmosphere. The next adjectives are “black”, “brown”, “distant”, “new” and “old”, each of which occurs twice. The other adjectives appear just once and, given the nature of this specific investigation, only adjectives with four or more occurrences are worth analyzing.
In “Eveline”, “hard” and “dead” are the adjectives that occur most often. These two words are central to the construction of the atmosphere. Through their constant repetition, the writer leads the reader to specific moods. The other adjectives found in the short story, as cited above, do not seem to have a great influence on the construction of the atmosphere.
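The frequency cut-off applied in this section, four or more occurrences, can be written down as a short Python sketch. The counts are the ones reported above for “Eveline”; the function name and the dictionary layout are hypothetical choices made for this illustration.

```python
# Adjective counts for "Eveline" as reported in this section.
EVELINE_ADJECTIVES = {
    "hard": 5, "dead": 4, "little": 4, "long": 4,
    "black": 2, "brown": 2, "distant": 2, "new": 2, "old": 2,
}


def at_least(counts, minimum):
    """Keep only the words that occur `minimum` times or more."""
    return {word: n for word, n in counts.items() if n >= minimum}


selected = at_least(EVELINE_ADJECTIVES, 4)

# Rank the surviving adjectives from most to least frequent.
ranked = sorted(selected.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

The cut-off only narrows the list to “hard”, “dead”, “little” and “long”. Deciding that the first two shape the atmosphere and the last two do not still depends on reading each occurrence in context.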

This investigation was able to indicate that, through the use of a linguistic tool, in this case AntConc, it is possible to carry out literary analysis through corpus linguistics. Two short stories by James Joyce were processed through the tool, and the result was satisfactory in the sense that it demonstrated that a literary analysis can be pursued with a linguistic tool.
The atmosphere in both literary texts could be perceived through the approach of corpus linguistics. In “Araby” and “Eveline” the frequency with which the same adjectives occurred in each short story was significant to the construction and perception of the atmosphere. In the case of “Araby”, the words “dark” and “light” stood out: “dark” was the most important in setting the tone of the narrative, while “light” showed that context can invert the meaning of a word. In “Eveline”, two other adjectives were central to the perception of the atmosphere: “hard” and “dead.”
The importance of the results of this investigation lies mainly in the fact that research can be carried out through the collaboration of linguistics and literature; besides that, the results of the present investigation seem to be quite reliable. Another important aspect that deserves emphasis is time optimization, since it would be time-consuming to go through long texts to find specific words. With a tool such as AntConc this can be done with a click.

The limitations of this kind of research can be seen from two points of view. One concerns time constraints: the data are easy to process through the linguistic tool, but it might take a long time to analyze the results obtained. The other concerns the reliability of the tool, as briefly discussed before. Nevertheless, these minor limitations do not compare to the huge number of possibilities that this kind of research can provide.
ANTHONY, Laurence. (2013). A critical look at software tools in corpus linguistics. Linguistic Research 30(2), 141-161. Retrieved from: http://www.antlab.sci.waseda.ac.jp/research/20130827_linguistic_research_paper/linguistic_research_paper_final.pdf
MCENERY, Tony & HARDIE, Andrew. (2011). Corpus Linguistics: Method, Theory and Practice. Cambridge.
NESSELHAUF, Nadja. (2011). Corpus linguistics: A practical introduction. Retrieved from: http://www.uni-bamberg.de/fileadmin/eng-ling/fs/Chapter_9/Index.html?9References.html
JOYCE, James. Dubliners. An electronic classics series publication.