Alright, let's get started. So today it's time for you to give the presentation about your proposal, and your proposal should include, first, the research question or research questions; then the motivation of the research and the corpus or corpora for the research; and you also need to introduce the methods, the expected results and conclusions, and at the end, the implications. All right. Good afternoon everyone. I have my team with me, and we'll be talking about our research proposal, which is sentiment analysis of the online course reviews at KAIST. This is the agenda for today: we'll go through the research questions, the motivation of the research and the corpus, the methods, the expected results, the conclusions, and the implications. So, our research questions. We were thinking: by using sentiment analysis and part-of-speech tagging on the course reviews for a course like Calculus I, is it possible to identify which aspects of the course are mentioned in positive terms or negative terms? We came to this research question because we realized that online subject reviews reflect the students' satisfaction with their courses, and students have increasingly relied on online reviews for their decision-making. This is not limited to subject reviews: online reviews in general, on online shopping websites, are also heavily relied on by customers when deciding whether to make a purchase. Generally, students are more inclined toward courses with overall positive ratings. However, the written review might not accurately reflect the overall grade given. For example, a student might rate the subject an "A" but comment that the lecture is very disorganized, or leave some other negative comments, and yet in the end still give the subject an A. Therefore, our analysis aims to provide a more accurate review of the chosen course, which is Calculus I, and to identify the key terms that are associated with overall positive or negative reviews. I'll pass the floor to my teammate to talk about the corpus. So let's move on to the corpus for this research. For the corpus we used the critiques of the Calculus I lecture by KAIST students, retrieved from the OTL site, which stands for Online Timeplanner with Lectures. It is a Korean corpus, and it's written in a mostly casual register, because students might have written these reviews as if they were talking to their friends. There were 74 reviews taken, with on average around 38 word tokens per review. As you can see, this is a screenshot of the OTL site, and we can see the reviews by students. We will take these reviews, do some sentiment analysis, and compare the results with the overall grades, which are posted on the right side. The special thing is that short reviews will not be omitted from the analysis, because they can contain important tokens for the analysis. For example, there was a one-word review, [inaudible] in Korean, which means "uninteresting" in English, and this one word can contribute a highly negative score in the sentiment analysis. All right. So we haven't done it yet. What we will do, hopefully, is, as was mentioned a couple of times, use sentiment analysis and try to see if we can score the overall review using the polarity of the words used. Hopefully, if you sum up the polarities in a review, that will give you an overall sentiment of whether it was good or bad.
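As an illustration of the polarity-summing idea described above, here is a minimal sketch in Python; the tiny Korean lexicon, its scores, and the pre-tokenized review are all placeholders rather than the group's actual resources.

```python
# Minimal sketch of the polarity-summing idea, assuming a small hand-made
# Korean sentiment lexicon (the entries and scores below are placeholders).
polarity_lexicon = {
    "좋다": 1.0,       # "good"
    "유익하다": 1.0,    # "informative"
    "노잼": -1.5,      # slang for "no fun"
    "어렵다": -0.5,     # "difficult"
    "체계적": 0.5,      # "organized"
}

def review_sentiment(tokens):
    """Sum the polarity of every token found in the lexicon."""
    return sum(polarity_lexicon.get(tok, 0.0) for tok in tokens)

# Example: a short review, assumed to be tokenized elsewhere
# (e.g. with a Korean morphological analyzer such as KoNLPy).
tokens = ["강의", "는", "좋다", "근데", "과제", "가", "어렵다"]
print(review_sentiment(tokens))   # 0.5 -> mildly positive overall
```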
Then we will use part-of-speech tagging, find the highly polar adjectives, and see which nouns are collocated with them, in order to see which aspects of the class are usually talked about in a very positive way or in a very negative way. Our expected result is that, hopefully, the overall grade the student gives the course in the end (they mentioned that you can give the course an A, B, C, D, or E; or does it go all the way down to F?), so that hopefully, if they've given an F, we'll see a very low sentiment in the review, which would tell us that we've managed to make the sentiment analysis work as we wanted it to, because in most cases there should be this overall agreement between the two. But if there is a discrepancy between the overall grade and the sentiment, it could be because we haven't managed to handle negations the proper way. For example, "do not like the lecture" would probably score as highly positively polar, because the analysis just recognizes the word "like" and doesn't take into account that there is a negation in front of it. So those are some of the things we still have to figure out how to implement. As for the implications, we wanted to add another dimension to the course reviews you can find online, because if you just see all this free text, and you as a student go in and try to figure out whether this is a class you want or not, it can be difficult to read through all the text first. This would be a way for a student to quickly get an overview, to see what it was that people didn't like, say the lecture notes, and what they did like, say the content of the class, and then it's easier to make your decision of whether you want the class or not. Then, depending on how the sentiment analysis in Korean works out, we may have to do manual tagging of some of the words. If it comes to that, then one of the resulting resources would be a sentiment dictionary of atomic words specific to course reviews, which could then be used for further research. Yeah, that's us. Okay. So then it's my turn. I have three comments and a question. The first one is a suggestion: it might be better to consider the representativeness of the reviews. What I mean is that you might want to think about how many students take Calculus I per year. For example, if 100 students took Calculus I but only 10 students actually wrote reviews, that means 10 percent, am I calculating correctly? If you report that representativeness, your findings might be more reliable and trustworthy. So that is one of my suggestions. The second one is, what was it? They had a challenge with how to identify and examine negations, such as "I do not like". Does anyone have good suggestions or ideas for how they can figure that out? "I do not like the class" as a whole sentence is pretty negative, but because of the word "like", the analysis might identify it as positive. Okay, Kyle. There's a function in AntConc called n-grams, which divides a sentence into units of two or three words, and with it you can find out whether the sentence is negative or positive. For example, "this is not good": if we divide this sentence into two-word units, "this is" is neutral, but "not good" is negative, so the sentence as a whole is a negative sentence. So you might want to use the n-grams in AntConc.
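The negation problem and the n-gram suggestion could be sketched roughly like this; the word lists are placeholders, the example is in English for readability, and for the Korean reviews the negator list would of course be different (e.g. 안, 못).

```python
# Rough sketch of negation handling: flip a word's polarity when the
# immediately preceding token is a negator (bigram-style check).
polarity = {"like": 1.0, "good": 1.0, "disorganized": -1.0, "boring": -1.0}
negators = {"not", "no", "never", "n't"}

def score_with_negation(tokens):
    total = 0.0
    for i, tok in enumerate(tokens):
        value = polarity.get(tok, 0.0)
        # If the previous token is a negator, invert the polarity.
        if i > 0 and tokens[i - 1] in negators:
            value = -value
        total += value
    return total

print(score_with_negation("i do not like the lecture".split()))  # -1.0
```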
That is one consideration. Another possible way would be, because we have a very limited list of negations, such as "not" or "no", you might want to look at just the negation part separately. Since you only have 73 reviews, it's not that big a dataset this time, so you could go back and analyze only that part on its own; that is another way you can do it. My third one is a question. Why did you consider the overall score only? Because on the OTL website there are many different sub-categories, so is there any reason why you consider only the overall? Many variations in the overall grade? Yeah. Look at the example: this one is the grade, this one is how intensive the lecture itself is, and this one is the overall. When I got this data, some students gave extreme scores, such as an F and an A at the same time, and I thought that was not very precise for analyzing the data. So we chose the overall score as the most fitting data. Is it really not precise? I would think those are more precise in terms of the grade given or in terms of the lecture. I think it depends on what you want to look at, because it needs to be something that we can compare to. If we find out that the lecture is mentioned a lot of times, then we would compare the sentiment about it to the score they gave that lecture. But the overall review covers a lot of different aspects, whereas the sub-categories are limited to what grade the student got and how intensive they found the course. We're looking for the other aspects, like disorganized lectures and those kinds of things, which is our main goal. The other thing is that the whole comparison with the overall grade is just a way to see if the sentiment hits the right level. All right. That totally makes sense, very logical. So it might be better to mention exactly what you just said in your research paper, because other people will probably be curious, like me, about why you examined only the overall score; if that explanation is in your research paper, that would be great. Let's give Viona a clap. This is our project title: analyzing the change in the usage of strong language in action movies that have been produced with an interval of 40 to 50 years. From this title we built our main question, which is: how has the perception and usage of strong language in action movies changed over a time period of 40 to 50 years, from the 70s until now? This is the main question we will focus on. But from this, we also got some secondary questions: how does people's perception of strong language change over time, or how does it change as a society, and do modern action movies contain more strong language than action movies produced that many years ago? The motivation for this is that language, of course, changes all the time, and we use some words now which our grandparents would never use, and vice versa. So we would like to see how this plays out with strong language. Strong language is a way to express aggression and violence, so it's a harsh kind of language to use. We want to see how this has changed over the years, because we can say words now that our grandparents would be offended by, while we would just shake our heads because we use them normally. For us it doesn't seem that harsh, and that is because the perception of strong language has changed over time, and we want to look into how that could be and whether it can be seen in action movies, because it is action.
So of course, action movies will include a lot of strong language, which makes them an obvious place to look, because they use it so much; we can hopefully easily see the change over these past decades. We had to create a new corpus for our dataset because there are only a few datasets we could collect across this time span. So we collected eight movie scripts produced in the 1970s and eight produced in the 2010s. We chose famous movies produced by Hollywood, which will represent the use of language in America. The method we are going to follow is that we first preprocess the data. The dataset we have collected consists of 16 action-movie scripts from the two decades. However, the raw data contains unspoken features, so the spoken features have to be extracted from the script itself; we are going to use Python for that preprocessing step. As you can see, the spoken features are only a part of the overall script, so we have to filter them first. We are then going to perform sentiment analysis on the processed dataset. We are going to focus on frequently used parts of speech, mainly interjections, nouns, verbs, adverbs, and adjectives, inspected using AntConc. Those POS were intuitively chosen because strong language will mostly fall in that area; there is a low chance that a preposition is strong language. We are also going to look at the collocations of those filtered spoken-language features, to see the connections between them. Then you might have a question: how do we select the strong language from the data? Because the definition of strong language is a bit vague. We are going to use the movies' ratings to filter those strong-language items. As you can see in this figure over there, the rating system is related to the language used throughout the movie. A rated-R movie will likely contain several instances of strong language, a rated PG-13 movie will have a low chance of strong language, and a rated PG movie should have essentially none. So the intuition is that if a token appears only in rated-R movies, that token is more likely to be strong language. We will first filter them, and then compare the strong language found in each decade. Next, our expectations for this research: first, we expect the newer movies to contain more strong language, since we are more tolerant of strong language in modern times. Second, the language is expected to shift toward a rougher usage of words, so in modern times more extreme words and an increased use of intensifiers should appear. Our hypothesis is that the change in perception can be due to increased tolerance of strong language. The implications we see in this research are, first, that we can detect the strong language of action movies, and this will help with rating. Second, to see the changes in strong-language use over time. Third, to observe how people's perception of strong language has changed. The fourth is to make society aware of the increasing usage of strong language. Okay, and the last one is "Talk nice to each other", because the rating system is really there for the parents, to know what content is in a movie; in this way, we can better educate our children not to use strong language. And thank you.
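The rating-based filtering idea could look roughly like the following; the file names and the simple tokenizer are placeholders for the group's own pipeline, and the resulting candidate list would still need manual inspection, since many harmless words also happen to appear only in the R-rated scripts.

```python
# Sketch of the rating-based filter: a token is a *candidate* strong-language
# item if it appears in the R-rated scripts but never in the PG/PG-13 ones.
import re

def vocab(paths):
    words = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            words.update(re.findall(r"[a-z']+", f.read().lower()))
    return words

r_rated  = ["r_rated_script_1.txt", "r_rated_script_2.txt"]    # hypothetical files
pg_rated = ["pg_script_1.txt", "pg13_script_1.txt"]            # hypothetical files

candidates = vocab(r_rated) - vocab(pg_rated)
print(sorted(candidates)[:20])   # inspect the candidate list manually afterwards
```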
Well, I have a question. What's the definition of strong language? We've heard "strong language" a lot, so what is the definition? It is the way we use language to express that we are aggressive, almost violent; we could call it violent language, when you use harsh words to express that you are angry or that you want to do something violent. So it's about how we perceive the language, how we think about the language: do you say something that makes me feel offended? To be more specific, we searched for the definition in the Longman Dictionary. According to the Longman Dictionary, there are two meanings: first, angry words used to tell people exactly what you mean, and we are going to keep that sense in mind; and the second one, which is the area we focus on, is words that most people think are offensive. However, what people consider offensive changes over time, and we want to see those things. According to the MPAA, the movie rating is partly defined by strong language, so this definition is quite broad and tied to the perception of the language. Okay. Thank you for introducing the definition of strong language according to Longman. So these are two definitions; will you guys follow the Longman definitions? That is the plan. Okay, the Longman definition, which is great. So in your paper or in your final presentation, you need to make clear what strong language is. Okay. Any other question to this group? Can I just keep asking? Okay. Yeah. So the second question is about the time period of 40 years from 1970. What was your expectation? My expectation was 1970, 1980, 1990, and 2000, something like that. So you might want to slightly change your research question, because you only have two different corpora: one is from the 1970s and one is from the 2010s. 2010s. 2010. Yeah. So it might be better to adjust it slightly. Also, any other question? My last question is, could you go back to the sentiment analysis method? Okay, all right. What tools will you use for sentiment analysis? We're thinking of using AntConc for finding the words. Okay, so with AntConc you can get the word list. Yeah, the frequent words, and we'll do that for every movie and compare the list to its rating, so we can see which phrases or words are frequently used in each type of rated movie, the ones forbidden for children or the ones for a general audience, whatever the rating is. By manual inspection? I don't think we have a better plan for that yet, but maybe we'll take the first 10 or so and see whether they are very frequently used, whether this works, and whether those words and phrases also show up in the other movies; I think we will know more about it when we see the results from the scripts. Okay, all right. So for now it's just a plan, which is fine. Other groups: I've noticed that the other groups are using sentiment analysis as well. Any comments or suggestions, since they are still searching for good tools? Since you already mentioned that you're going to do the preprocessing of the text in Python, there are several Python packages that do sentiment analysis on English. They give you a score, like a polarity of the words, so that you can get an overall sentiment of sentences or of specific words, so I'd look into that. One of them is called VADER sentiment. Right, yeah. So thanks for sharing the information. This is another chance for us to share: not only asking questions, but if a group shares any challenges, we can give suggestions or tips if you have them. Okay. All right. Please write down all the comments and questions on the Google Sheets, so that group two can take and consider all your comments and questions.
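To illustrate the VADER suggestion, here is a minimal example with the vaderSentiment package; the compound score runs from -1 (most negative) to +1 (most positive).

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
for line in ["Get the hell out of here!", "Nice to meet you."]:
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound' keys
    print(line, analyzer.polarity_scores(line))
```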
Okay, let's give a round of applause to group two. Thank you very much. Hello. First of all, I'm disappointed in my group because they left me alone here presenting in front of you. It's a big challenge, but we will figure it out; I will manage, no worries, I'll cover for you guys. We are here to present our project proposal. As you can see in the photo, it's going to be about Steve Jobs and Apple. But let me introduce to you how we reached our research question. At the very beginning, we wanted to compare two presidents, in this case Barack Obama and Donald Trump, then analyze texts, newspaper texts, and see how they are described and whether there is a pattern or not. But we did a lot of research and saw that there is already a lot of work done in this field. So we decided to move on and pick two other people who are also popular; in this case, we chose Steve Jobs and the owner of Tesla. However, as you may know, the time frame is constrained, so we decided to just focus on Steve Jobs. So we figured we would do our research on Steve Jobs's speeches and see if his speeches changed according to Apple's popularity over a time frame from 2005 until 2011. Why this time frame? Because the first year was when the first iPhone was released, and the last year is when Steve Jobs died. You may also ask yourself, how will we measure Apple's popularity? Well, the definition of popularity may vary from person to person, so our group agreed to measure Apple's popularity according to its sales in each year: the more sales in one year, the more popular Apple was in that year. That's how we will know the popularity of Apple. We also want to conduct this research to discover the most common words and phrases used by Steve Jobs across all the speeches he gave. We also want to analyze whether his speeches changed according to Apple's popularity or not; we want to see that. Finally, we want to understand which are the most common words he used when introducing a new device, a new product, to the market, and which are the ways to present, to introduce, a new product. So, yeah, the corpus that we will use is based on YouTube videos of Steve Jobs presenting new devices, new products. Basically we chose a couple of videos, one video for each year, extracted their transcripts, and then we will analyze them with several tools. In our case, the POS tagging we'll do with TagAnt, and also with a web-page tool called Parts of Speech, which will sort the text we put into it and tell us which are the most common nouns used, the most common adjectives, and so on, the most common adverbs and phrasal verbs and so on. For the word frequencies and collocates we will use AntConc. Finally, to present our results, we will use Voyant Tools, which is also a web-page tool. The conclusions we don't know so far; that's why we want to do the research. But as for the expected conclusions and results, we assume that his linguistics, his way of speaking, changed according to the circumstances. We also assume there is a butterfly effect, a cyclical effect, between his speeches and Apple's popularity: his speech changed according to Apple's popularity, and the company's status also changed because of his speeches. So we assume that there is this pattern.
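As a rough stand-in for what the Parts of Speech web tool and the AntConc word lists would produce, the following sketch counts the most frequent nouns and adjectives in one transcript, assuming an English text file and NLTK's default tagger; the file name is a placeholder.

```python
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Hypothetical transcript file extracted from one of the YouTube videos.
with open("jobs_keynote_2007.txt", encoding="utf-8") as f:
    tokens = nltk.word_tokenize(f.read().lower())

tagged = nltk.pos_tag(tokens)
nouns = Counter(w for w, t in tagged if t.startswith("NN"))
adjs  = Counter(w for w, t in tagged if t.startswith("JJ"))

print(nouns.most_common(10))   # most common nouns in the speech
print(adjs.most_common(10))    # most common adjectives in the speech
```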
Finally, our implications: we saw at the very beginning that 90 percent of startups fail, according to one of the most widely read newspapers. So the implication of this project is that it will be useful for new entrepreneurs, letting them know which are the best words to use when introducing new ideas and new products. That's basically our implications. Thank you very much; let's see what the final conclusions of our report will be. Yeah, I'm very much looking forward to the results, because usually in such speeches, I guess, there are a lot of positive or emphasizing adjectives, so I'm curious how those speech patterns and the popularity are related to each other. Could you explain more about the Voyant tool for visualization? Maybe other groups might want to use it as well. Okay, basically I'm not the one who decided to use this tool, but I will try. Basically, you just put a text, a big text, into the tool and it automatically shows you which are the most common words: the bigger a word appears, the more frequently it occurs in the text. Basically it works that way, based on the frequencies. Yes, it changes the size of each word, right. Good. Any other comments or questions? Okay, my last question is: will you guys use lemmas or just each individual word? From TagAnt, you will see that the first column is the original word, the second one is the part of speech, and the third one is the lemma. The lemma represents the word group; for example, "dances", "dance", or "danced": if the word is used as a verb, then the lemma is just "dance". So will you guys use lemmas or individual words? I think we'll use individual words. Okay. All right. Is there any reason? Not really, just because it's easier maybe, or because mainly we want to focus on nouns and adjectives, not verbs. Not verbs, okay. So nouns and adjectives, and maybe whether Steve Jobs used a singular or a plural form could influence the popularity; that's why it might be useful to look at individual words instead of lemmas. But do think about whether it's better to use lemmas. The nice thing about lemmas is that one lemma represents a group of individual words, so you get the groups instead of the individual words. But if you want to look very specifically at every individual word usage, then instead of lemmas you might want to use, yeah, individual words. Okay. All right. Let's give a round of applause. Good afternoon. We are group four. As you can see, our proposal topic is about Donald Trump; we're going to analyze his tweets. So let's get started. The agenda for today: we will introduce our research topic and the subtopics behind it, tell you why we chose this topic and the motivation behind it, the corpus for our research, the methods we will use to generate this corpus, the expected results and expected conclusions, and the implications of this research. As a research topic, we mainly want to analyze the linguistic usage of Donald Trump's tweets. The period we chose is based on the day he was sworn in, plus or minus 584 days. It is 584 days because minus 584 days is the day he submitted his application to run for the presidency, and plus 584 days is just to neutralize the bias, so he has the same amount of time on each side.
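As a quick sketch of that window arithmetic, assuming the January 20, 2017 inauguration date, the crawl boundaries fall out directly:

```python
# Sketch of the ±584-day window around the inauguration date; the resulting
# dates are the boundaries the crawler would need to cover.
from datetime import date, timedelta

inauguration = date(2017, 1, 20)
window = timedelta(days=584)

start, end = inauguration - window, inauguration + window
print(start, end)   # 2015-06-16 (candidacy announcement) to 2018-08-27
```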
What we will do with this data is, first, categorize the negative and positive words that he uses; then compare the speech registers before his election, before he was sworn in, and after he was sworn in, to see whether there is a shift in register. We also want to take the words he uses in his tweets and compare them to COCA to see whether there are any discrepancies, or whether the words he uses are far from the norm. Lastly, we want to see if there is any correlation between his tweets and his approval ratings. Motivation: why do we analyze Donald Trump's tweets? First of all, he is popular, yes, of course. The second reason is his particular hostility toward traditional politicians; I think he is the first president of the United States who runs the country through tweets, through Twitter. The third is that his tweets are powerful and can affect everyone around the world, because he is the president of the United States. Here are two significant facts about him. First, he was widely criticized by the traditional media, like newspapers and television, and the traditional media did not treat him well, but he was still successfully elected president of the United States, so most people made the wrong prediction about the election. We hope to depict a graph of his popularity during the election and then figure out whether his tweets affected those results, whether the internet made a difference for him. The second fact is that he uses Twitter as his main platform to communicate with his supporters and to condemn his political competitors. Based on these reasons, and since most politicians tend to use strong words and an aggressive register before an election to satisfy their supporters and build up an image of a change-maker, but after the election tend to use a more rational or neutral style to build the image of a reliable leader, we wonder whether Donald Trump shows a similar shift in register. Generally, we just want to figure out what role his tweets played in the election. The corpus of the research is mainly Donald Trump's tweets, and we might want to compare it with an American English corpus, which is COCA, to see whether the words he uses are high level or low level, whether he uses words closer to academic words or to spoken words, and so on. First, we're going to use Python to crawl Twitter. Python enables us to drive the browser directly and crawl the tweets for a certain period of time. So we collect the data, like in the picture over there; it is just a text file, and we are going to sort the data into CSV format. The data is sorted with the text and its metadata, which is the date, likes, retweets, and comments. After we sort the data, we are going to analyze the text: whether it is aggressive or not, whether it contains positive or negative adjectives or registers, and whether he is using high-level or low-level language. Finally, whether some linguistic feature is related to his approval rating. The expected results for this research are as follows. First, the language level will be low, because topics about academia or literature are negligible. Second, the language usage will show a dramatic impact of political events: as Billy mentioned, before the election he used very direct and strong words, whereas after the election he prefers more neutral words to reach both progressive and conservative people.
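Once the tweets are in CSV form, the before/after split and the positive/negative word counting could be sketched as follows; the column names and the word lists are assumptions, not the group's final choices.

```python
# Sketch: split crawled tweets at the inauguration date and count
# occurrences of placeholder positive/negative word lists in each period.
import csv
from collections import Counter
from datetime import date, datetime

INAUGURATION = date(2017, 1, 20)
POSITIVE = {"great", "tremendous", "winning", "best"}     # placeholder lists
NEGATIVE = {"fake", "disaster", "crooked", "failing"}

counts = {"before": Counter(), "after": Counter()}

with open("trump_tweets.csv", encoding="utf-8", newline="") as f:   # hypothetical file
    for row in csv.DictReader(f):
        tweet_date = datetime.strptime(row["date"], "%Y-%m-%d").date()
        period = "before" if tweet_date < INAUGURATION else "after"
        for word in row["text"].lower().split():
            if word in POSITIVE:
                counts[period]["positive"] += 1
            elif word in NEGATIVE:
                counts[period]["negative"] += 1

print(counts["before"], counts["after"])
```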
Lastly, compared to COCA, we suspect certain words will be far from the norm of what American people use, because COCA has no section for social media texts. So what we are going to figure out is his own characteristic social media text, and we are also going to find the section closest to social media and explain why they are similar. As for implications, this research is meaningful in analyzing the impact of word selection on the public. It will also help to figure out the influence of social media and the appropriate language usage on it. That's all for our presentation. Thank you. I have a question: could you go back to high level and low level? Here. So what's the definition of high level, and what's the definition of low level? The definition of high level is words from academic and literary texts, and low level is slang. Advertisement or slang? Advertisement, really? Low level? Yes, or the same as casual speech. Casual speech. Yes. So it's better to make a clear definition of high level and low level in your research; that would be better. What else? Could you go to the next slide? Here you mentioned, and we've learned, that in COCA there are different sections and different registers. So how can you define the norm of the American people? We haven't actually planned precisely how to use COCA; after we analyze Donald Trump's tweets, we are going to try to use COCA and look at the concordances according to frequency, I guess. All right. So this is a really challenging question, the norm of the American people. If you have any good suggestions, please feel free to leave comments on our Google Form, so they can consider and take advantage of your comments. I think we will compare it to something more formal; comparing to fiction doesn't make sense, but maybe comparing to something like newspapers, a more formal kind of text, because, as you know, he's the president of the United States, so if he uses this as a platform to take a stand, to tweet something, his language should be more formal. So we will go and see which registers are more formal. So you guys will focus specifically on the newspaper section in COCA? Not specifically; you can think of newspapers, but there are so many different sections, and we haven't gone and looked yet at what else we can compare with. But from what we have learned, I already know the newspaper register is the closest. Yes, this is a good idea. So the closest register is newspaper, so they might want to just focus on that one register, newspaper, and compare it with the Twitter language written by Donald Trump. Just one more suggestion: because you guys are interested in text data on social media, Douglas Biber and Jesse Egbert actually investigated linguistic features on social media and online registers, so you might want to take a look at that as well. All right. Let's give them a round of applause. Thank you very much.
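On the "norm" question, one common way to compare a small corpus against a COCA section is to normalize both to per-million-word frequencies; in this sketch every count and corpus size is a placeholder that would have to be looked up.

```python
# Sketch of the COCA comparison: put the small tweet corpus and a COCA
# section on the same per-million-word scale before comparing a word.
def per_million(count, corpus_size):
    return count / corpus_size * 1_000_000

tweet_corpus_size = 120_000        # hypothetical token count of the tweets
coca_news_size = 110_000_000       # hypothetical size of the COCA section used

word = "fake"
tweet_freq = per_million(240, tweet_corpus_size)   # hypothetical raw count
coca_freq = per_million(3_300, coca_news_size)     # hypothetical raw count

print(f"{word}: {tweet_freq:.1f} vs {coca_freq:.1f} per million words")
# A large ratio would suggest the word is far from the newspaper "norm".
```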