{"id":446574,"date":"2017-12-04T19:35:52","date_gmt":"2017-12-05T03:35:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=446574"},"modified":"2017-12-05T19:33:58","modified_gmt":"2017-12-06T03:33:58","slug":"make-first-accomplishment-nlp-field","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/make-first-accomplishment-nlp-field\/","title":{"rendered":"How to Make a First Accomplishment in the NLP Field"},"content":{"rendered":"

\"\"<\/p>\n

Natural Language Processing (NLP) is a technology that helps computers process and understand human languages. It includes:<\/p>\n

    \n
  1. Syntactic<\/strong>–<\/strong>semantic Analysis<\/strong>: For a given sentence, perform word segmentation, part-of-speech tagging, named entity recognition and linking, syntactic analysis, semantic role labeling, and word sense disambiguation.<\/li>\n
  2. Information extraction<\/strong>: Extract important information from a given text, such as the time, the place, the people, the event, the cause, the result, numbers, dates, currency, proper nouns, etc. Put simply, the goal is to understand who did what, at what time, for what reason, and with what results. This relies on key technologies such as entity recognition, time extraction, and causal relation extraction.<\/li>\n
  3. Text mining<\/strong>: This includes text clustering, classification, information extraction, summarization, sentiment analysis, and the visualization and interactive presentation of mined information and knowledge. Current mainstream techniques are based on statistical machine learning and deep learning.<\/li>\n
  4. Machine translation<\/strong>: Automatically translate input in a source language into another language. Depending on the input medium, this can be divided into text translation, voice translation, sign language translation, image translation, etc. Machine translation has grown steadily more accurate as it moved from the earliest rule-based methods, to the statistics-based methods of 20 years ago, to today\u2019s neural network (encoder-decoder) methods.<\/li>\n
  5. Information retrieval<\/strong>: Index large document collections. Different weights can be given to the words in each document to build a simple index, or the technologies from items 1, 2, and 3 above can be used to build a more in-depth index. At query time, the system analyzes the input query, such as a search term or a sentence, searches the index for matching documents, ranks the candidate documents with a ranking mechanism, and finally outputs the top-ranked documents.<\/li>\n
  6. Question answering system<\/strong>: A Q&A system provides a precise answer to a question expressed in natural language. This requires a certain amount of semantic analysis of the natural language query, including entity linking and relation recognition, to form a logical expression. The system then searches for possible answers in a knowledge base and finally locates the best answer through a ranking mechanism.<\/li>\n
  7. Conversation system<\/strong>: The system completes a task through a series of dialogues, chatting with the user and answering questions. This involves technologies such as user intent understanding, general chat engines, Q&A engines, and dialogue management. In addition, to reflect context, the system needs to handle multiple rounds of dialogue. At the same time, to reflect the individual, it needs user profiles and personalized, user-specific replies.<\/li>\n<\/ol>\n
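
    The information retrieval steps in item 5 (weight the words, build an index, match a query, rank the candidates) can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the TF-IDF weighting, the whitespace tokenizer, the function names, and the three sample documents are all assumptions made for the example and do not come from the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace. A real system would
    # use proper word segmentation (item 1 in the list above).
    return text.lower().split()


def build_index(docs):
    # Build an inverted index: term -> {doc_id: TF-IDF weight}.
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            # Rare terms get a higher weight than common ones.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            index[term][doc_id] = tf * idf
    return index


def search(index, query, top_k=3):
    # Sum the weights of the query terms per document, then rank.
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical sample collection, for illustration only.
docs = {
    'd1': 'machine translation with neural networks',
    'd2': 'statistical machine learning for text mining',
    'd3': 'question answering over a knowledge base',
}
index = build_index(docs)
results = search(index, 'neural machine translation')
print(results)
```

    Here the query matches three terms in d1 and only one in d2, so d1 is ranked first and d3, which shares no term with the query, is not returned at all. A deeper index of the kind the list mentions would replace the plain word counts with features from syntactic analysis or information extraction.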

    With the impressive results of deep learning in image and speech recognition, people have high hopes for its value in NLP. In addition, the success of AlphaGo has made artificial intelligence research and applications extremely popular. Natural language processing, as a form of cognitive intelligence within artificial intelligence, has become a focus of attention. Many graduate students are entering the field, hoping to apply their talents to artificial intelligence in the future. However, they frequently run into problems. As the saying goes, everything is difficult in the beginning. If the first attempt succeeds, the student will build confidence, learn the ropes, and do better and better. Otherwise, the student may become discouraged and even leave the field. I offer my personal advice here, and I hope these views lead to deeper discussion.<\/p>\n

    Advice #1: How can I learn my first skill in the NLP field?<\/strong><\/p>\n

    My advice is to find an open source project, such as a machine translation or deep learning project. Understand the goal of the project, build the demonstration program it ships, and try to obtain the same results as the demonstration. Next, study the algorithm behind the demonstration program in depth, implement it yourself, and test your own program on the standard test set the project provides. If your results differ from the ones the project reports, check your program carefully and revise it until the results are basically the same. If you still can\u2019t get the right results, do not hesitate to contact the project author. On this foundation, see if you can improve the algorithm or the implementation to obtain results even better than the demonstration program.<\/p>\n
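
    One way to make the comparison step above concrete is to score both the project's demonstration output and your own output on the same test set, then check whether the two scores agree. The sketch below is a hypothetical helper, not part of any particular open source project: the accuracy metric, the 0.005 tolerance, and the toy labels are all assumptions chosen for illustration.

```python
def accuracy(predictions, gold):
    # Fraction of test items where the prediction equals the gold label.
    if len(predictions) != len(gold):
        raise ValueError('predictions and gold labels must have the same length')
    if not gold:
        raise ValueError('the test set is empty')
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


def reproduces(own_score, reference_score, tolerance=0.005):
    # Assumption: scores within the tolerance count as basically the same.
    return abs(own_score - reference_score) <= tolerance


def disagreements(own, reference):
    # Indices where the two systems differ; these are the items to debug first.
    return [i for i, (a, b) in enumerate(zip(own, reference)) if a != b]


# Toy data, for illustration only.
gold = ['pos', 'neg', 'pos', 'neg', 'pos']
reference_output = ['pos', 'neg', 'pos', 'pos', 'pos']
own_output = ['pos', 'neg', 'neg', 'pos', 'pos']

reference_score = accuracy(reference_output, gold)
own_score = accuracy(own_output, gold)

if not reproduces(own_score, reference_score):
    # Inspect the items where the reimplementation departs from the demo.
    print('differs from the demo at items:', disagreements(own_output, reference_output))
```

    Looking at the individual test items where the two outputs disagree is usually a faster way to find a bug than comparing the aggregate scores alone.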

     <\/p>\n

    Advice #2: How do I select a good topic?<\/strong><\/p>\n

    For graduate students in engineering-oriented programs, topics are often assigned by their advisors. You need to take a practical, solid, hands-on approach. You might not need much theoretical innovation, but you need strong abilities in implementing your ideas and in comprehensive innovation. For academically oriented graduate students, however, first-class research results require a research topic with a certain degree of innovation. My advice is as follows.<\/p>\n