LANGUAGE FOR ARRAY DATA PROCESSING
INSTRUCTOR: DR. KWOK-BUN YUE
MENTOR: MR. RAVI GANTA
TEAM #6
HARITHA RANI JADCHERLA
NARASIMHA BHYRAVABOTLA
SALOTI ANNAPURNA
VIKRAM SRIRAM
Spring 2010
04/21/2010

Acknowledgement
We would like to thank our professor, Dr. Kwok-Bun Yue, Chairperson of the Division of Computing and Mathematics, for giving us the opportunity to explore our skills and ideas, and for the valuable suggestions that helped make the project successful. We further extend our gratitude to our mentor, Mr. Ravi Ganta, Director of Product Development at AnshLabs, for giving us the opportunity to work on this project and to integrate the proposed language into the tool. His valuable feedback in weekly mentor meetings helped us better understand the software engineering process involved in real-world product development. We would also like to thank our friends and all those who were directly or indirectly involved in this project.

(Check your grammar carefully. Your background description is better than before. Spend less space on generic discussion, more space on project-specific discussion. Have you learnt any lessons? What percentage of the original requirements was your team able to satisfy?)

Abstract
In a typical laboratory, a physician performs various tests on blood, urine, or other chemical samples. The test results (observations) are tabulated for further processing and are sometimes stored in structures such as tables, records, or 2D arrays. The data in these arrays are then subjected to tedious mathematical algorithms or trigonometric functions. When one patient's blood sample is tested for several diseases, all of the readings belong to that one patient, but each test produces its own readings. These observations are therefore tagged with the test name.
Performing mathematical algorithms on 2D arrays and tags by hand is tedious, and the precision of the results is important. Hence there is a need for a domain-specific language for lab users. (Your requirements are very specific. They may not apply to general array data users.) The domain-specific language (DSL) will have simple semantics and compact syntax, and will be easy for lab physicians to understand. Physicians need no prior knowledge of any programming language, which minimizes the effort required of the end user. There are many tools for developing such a domain-specific language. (This paragraph is a general introduction, which is better than before. However, your main project's goal is to design a domain-specific language and interpreter for 2D arrays. Just a sentence is not sufficient. You may want to provide more details.)

The goal of this project is to design a user-friendly domain-specific language and interpreter. The project has the following four phases: (The description of these phases is not as important as the requirements and designs.)
1. Language specification design.
2. Unambiguous grammar design.
3. Use of a compiler-generation tool to create a lexer and a parser.
4. A run-time environment.

Contents
Abstract
1. Introduction
   1.1 Background
   1.2 Purpose
   1.3 Scope
2. Project Requirements
   2.a Defining a Language Specification
   2.b Defining a Grammar
   2.c Compiler-Generation Tools
   2.d Run-Time Environment
3. Design and Implementation
4. Technical Details
   4.a Tools Used
5. Evaluation
6. Conclusions and Future Work
7. References
8. Appendices
   8.a Appendix A
      A.1 Team Roles
      A.2 Team Contribution
   8.b Appendix B

1. Introduction
Background: The chemical laboratory domain involves physicians, lab technicians, scientists, and doctors, along with its own terminology. Physicians perform various tests on samples, and the results are tabulated. Some tests yield many readings, while others yield only one. Tests differ, and results are specific to each test. The results must be stored so that all of one patient's results can be retrieved together, and the results of a particular test can be identified by tagging them with the test name. This is done using a microtiter plate.

A microtiter plate, or microplate, is a flat plate with multiple wells that serve as small test tubes. The wells are arranged as a 2-D matrix, so the plate's dimensions are written as for a matrix, "rows x columns", and read as "rows by columns". In a programming language, this matrix is represented as a 2-D array. Figure 1 shows a typical microtiter plate of the kind for which we developed the language; its dimensions are 8x12, meaning 8 rows and 12 columns. In the figure, rows are labeled with letters and columns with integers.
Figure 1: Microplate

Physicians perform many types of operations on the test results, including:
1. Arithmetic operations.
2. Trigonometric functions such as sin, cos, and tan.
3. Statistical operations.
When tests are performed periodically, the results are read a number of times, and further results are then estimated by plotting a graph. To let end users easily specify such mathematical transformations on plate data, a simple, domain-specific language needs to be designed and built.
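As a hedged illustration of the representation described above, the sketch below stores an 8x12 plate as a Java double[][] and applies an element-wise operation such as Math::sin to every well. The class name Microplate, the well-label format (a row letter A-H followed by a column number 1-12), and the mapping of labels to 0-based array indices are assumptions for illustration; the report does not specify them.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: an 8x12 microtiter plate stored as a 2-D array of readings.
// Assumed labelling: rows A-H map to indices 0-7, columns 1-12 map to indices 0-11.
final class Microplate {
    static final int ROWS = 8;
    static final int COLS = 12;

    // wells[row][col] holds the reading for one well.
    private final double[][] wells = new double[ROWS][COLS];

    // Converts a well label such as "B3" into 0-based {row, col} indices, here {1, 2}.
    static int[] indexOf(String label) {
        if (label.length() < 2) {
            throw new IllegalArgumentException("No such well: " + label);
        }
        int row = Character.toUpperCase(label.charAt(0)) - 'A';
        int col = Integer.parseInt(label.substring(1)) - 1;
        if (row < 0 || row >= ROWS || col < 0 || col >= COLS) {
            throw new IllegalArgumentException("No such well: " + label);
        }
        return new int[] {row, col};
    }

    void set(String label, double value) {
        int[] rc = indexOf(label);
        wells[rc[0]][rc[1]] = value;
    }

    double get(String label) {
        int[] rc = indexOf(label);
        return wells[rc[0]][rc[1]];
    }

    // Applies an element-wise operation (for example Math::sin or x -> x * 2)
    // to every well and returns the results as a new plate.
    Microplate map(DoubleUnaryOperator op) {
        Microplate result = new Microplate();
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                result.wells[r][c] = op.applyAsDouble(wells[r][c]);
            }
        }
        return result;
    }
}
```

For example, after p.set("B3", 0.5), the call p.map(Math::sin).get("B3") returns Math.sin(0.5). Returning a new plate from map keeps the raw readings intact, which is useful when the same data feeds several transformations.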
The Language for Array Data Processing (LADP) described in this report is our attempt at building a language of this kind. (This section gives good background.)

Purpose: The purpose of this project was to develop an interpreter for a language that simplifies the processing of array data. The project focuses mainly on providing a user-friendly interface that produces accurate results for the operations performed on the plate.

Scope: The scope of this project was to help lab users perform operations on various samples without errors, and to ease the task of carrying out those calculations.

2. Project Requirements:
The main requirement of the project is to develop a domain-specific language to process array data. (You need to provide some discussion on what kind of operations are needed. They will in turn drive your language design.) We identified four major tasks in developing such a domain-specific language: defining a language specification, defining a grammar, using compiler-generation tools, and developing a run-time environment.

a. Defining a Language Specification: The language specification is a document that gives users all the information they need to operate the interpreter, namely the variables and commands the interpreter accepts and what each of them does. The document is available on the project website under the deliverables section. (Most readers know what a language specification is, and a generic description is not urgently needed.)

b. Defining a Grammar: Because we are defining a new language, we need to develop a new grammar that defines it. Defining the grammar involves two phases. The first phase defines the syntax of the language. The second phase evaluates the defined syntax.
The grammar places no restrictions on variable names or function names; such restrictions are enforced when the semantics are checked. The syntax is defined by a set of production rules, each made up of non-terminal and terminal symbols. A non-terminal symbol can be rewritten by a production rule; a terminal symbol cannot. In our grammar, the terminal symbols represent digits, variable names, function names, and punctuation such as brackets and commas. We wrote the production rules as an LL(*) grammar, so the grammar fixes the form of the variables and functions used in the language. We developed the grammar in ANTLRWorks, which is described below. (Again, this is too generic. Readers with knowledge about grammars know about production rules, terminals, etc. You don't need a long introduction.)

c. Compiler-Generation Tools: A compiler-generation tool generates source code, in our case Java code files, from a grammar. The input grammar must be syntactically and semantically correct. The compiler-generation tool we used is ANTLR (ANother Tool for Language Recognition), through its ANTLRWorks environment. ANTLRWorks displays a syntax diagram for every production rule, and we tested the logic of the grammar by verifying these diagrams. ANTLR also provides the necessary error recovery. Using ANTLR, we generated the required lexer and parser code files. The lexer breaks the input into tokens such as variable names, function names, and numbers; the parser checks how those tokens combine into expressions and function calls. Because our project is developed in Java, we declared Java as the target language. The generated lexer and parser code files are used in developing the run-time environment. (There are many such tools.)
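To make the lexer's role concrete, the following hypothetical hand-written lexer does what an ANTLR-generated lexer does automatically: it splits input such as M1[{0..2},1] into tokens. The class IndexLexer and its token kinds are assumptions for illustration, not LADP's actual lexer rules. One detail worth noting is that a number followed by ".." must not be read as a decimal, so 0..2 becomes NUMBER, RANGE, NUMBER while 2.0 stays a single NUMBER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer sketch; token kinds are illustrative, not LADP's real rules.
final class IndexLexer {
    enum Kind { ID, NUMBER, LBRACK, RBRACK, LBRACE, RBRACE, LPAREN, RPAREN, COMMA, RANGE, STAR }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;                                        // skip blanks between tokens
            } else if (Character.isLetter(ch)) {            // identifiers: M1, Avg, Sort, ...
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.ID, input.substring(start, i)));
            } else if (Character.isDigit(ch)) {             // integers and decimals such as 2 or 56.0
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                // A '.' followed by a digit is a decimal point; ".." is left for RANGE.
                if (i + 1 < input.length() && input.charAt(i) == '.'
                        && Character.isDigit(input.charAt(i + 1))) {
                    i++;
                    while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                }
                tokens.add(new Token(Kind.NUMBER, input.substring(start, i)));
            } else if (input.startsWith("..", i)) {         // range operator inside { }
                tokens.add(new Token(Kind.RANGE, ".."));
                i += 2;
            } else {                                        // single-character punctuation
                tokens.add(new Token(punctuation(ch), String.valueOf(ch)));
                i++;
            }
        }
        return tokens;
    }

    private static Kind punctuation(char ch) {
        switch (ch) {
            case '[': return Kind.LBRACK;
            case ']': return Kind.RBRACK;
            case '{': return Kind.LBRACE;
            case '}': return Kind.RBRACE;
            case '(': return Kind.LPAREN;
            case ')': return Kind.RPAREN;
            case ',': return Kind.COMMA;
            case '*': return Kind.STAR;
            default: throw new IllegalArgumentException("Unexpected character: " + ch);
        }
    }
}
```

For example, IndexLexer.tokenize("M1[{0..2},1]") yields ID(M1), LBRACK([), LBRACE({), NUMBER(0), RANGE(..), NUMBER(2), RBRACE(}), COMMA(,), NUMBER(1), RBRACK(]). Under this identifier rule, Log8 is read as a single identifier rather than as Log applied to 8.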
(Why was ANTLR selected?)

d. Run-Time Environment: The run-time environment is the interface where the user enters input and sees the processed results. We developed the interface in Java using Eclipse, an open-source integrated development environment. Using the lexer and parser code files generated by the compiler-generation tool, we developed a driver file, which uses the grammar file, the lexer and parser files, and the necessary library files. (Firstly, RTE and IDE are different, although sometimes they are integrated together. Your readers should know the basics of IDE and RTE, so keep it short. Instead of discussing generic issues, focus on the specific flavors of your project. For example, you may state that users may want to access different collections of cell data and provide examples. This drives your language syntax.)

3. Design and Implementation
The architecture of the Language for Array Data Processing (LADP) is shown below.
Figure 2: Architecture Diagram of LADP
The architecture can be described in three phases. In the first phase, ANTLR processes the defined grammar and generates the lexer and parser code files. The lexer produces tokens, and the parser builds a parse tree (syntax tree) from those tokens according to the grammar. In the second phase, the interpreter is developed in Java using Eclipse, from the lexer and parser code files generated by ANTLR, the grammar file, and the matrix file. The output of this phase is a console application. In the third phase, the console application is tested with given inputs and its output is verified.

4. Technical Details
This section gives the details of the technologies used in developing the project.
a. Tools Used: We used automated tools to generate the code files from the grammar. This section gives the details of these tools.
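The second phase of the architecture also relies on a matrix file, and Appendix B refers to M1 as the matrix defined in matrix.txt. The report does not describe that file's format, so the sketch below assumes one matrix row per line with whitespace-separated values and reads it into a double[][]. The class MatrixFile and its read method are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for a matrix file such as matrix.txt.
// Assumed format: one matrix row per line, values separated by whitespace,
// blank lines ignored. Every row must have the same number of values.
final class MatrixFile {
    static double[][] read(Reader source) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;              // skip blank lines
                String[] parts = line.split("\\s+");
                double[] row = new double[parts.length];
                for (int c = 0; c < parts.length; c++) {
                    row[c] = Double.parseDouble(parts[c]);
                }
                // Reject ragged input so that M1[r][c] is defined for every cell.
                if (!rows.isEmpty() && row.length != rows.get(0).length) {
                    throw new IOException("Row " + rows.size() + " has " + row.length
                            + " values; expected " + rows.get(0).length);
                }
                rows.add(row);
            }
        }
        return rows.toArray(new double[0][]);
    }
}
```

In practice the reader would be a FileReader on matrix.txt. Because values are parsed as doubles, a reading written as 1.10 becomes the same number as 1.1 and prints as 1.1, which may be why column 10 of M1[*,*] in Appendix B shows 1.1 rather than 1.10.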
(The following discussion is too general. You may want to shorten it. Instead, focus on examples of interesting language syntax and how they are actually implemented as production rules. You don't need to list all of them. However, you may want to give some full examples, such as:
Need: users may want to access a rectangular block of cells within the test plate.
Syntax: allow using ranges in specifying indices of rows and columns.
Syntax example: M1[{0..2}, {1..4}]
Production rule: …)

ANTLR: ANTLR is an open-source framework for constructing recognizers, compilers, and translators from grammatical descriptions. It provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting, and it can generate code in many target languages, selected by specifying the target language. ANTLR has a sophisticated grammar development environment called ANTLRWorks, shown in Figure 3.
Figure 3: A snapshot of ANTLRWorks

Eclipse: Eclipse is an open-source integrated development environment, which we used to develop the console application. Eclipse is extensible through plug-ins, which are available for various languages such as C#, C, and PHP. We used Eclipse mainly because it is user-friendly and offers programmers flexibility comparable to that of Visual Studio .NET. (Eclipse is well known. Just mention that you are using it and how you integrate Eclipse with ANTLR.)

5. Evaluation
The project is evaluated by testing whether the console application gives accurate output for a variety of test cases. The test cases cover matrix access, functions, and expressions. Matrix access is evaluated with several different test cases. There are two ways of accessing a matrix: with a single index and with multiple indices. With a single index, such as M1[1,2], the access returns one atomic value.
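The range syntax suggested above (M1[{0..2}, {1..4}]) and the index forms used in the test cases, a single index, a range such as {0..2}, an ordered set such as {1,2,3}, and the wildcard *, can be illustrated with a small recursive-descent evaluator. This is a hypothetical sketch, not the team's ANTLR grammar: the class MatrixAccess, its evaluate method, and the production rules in its comments are assumptions chosen so that each rule maps to one method. Ranges are treated as inclusive, which matches M1[{0..2},1] returning 1.1 2.1 3.1 in Appendix B.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical recursive-descent sketch of matrix access such as M1[{0..2},{1,2}].
// Grammar assumed for illustration (not LADP's actual ANTLR rules):
//   access   : NAME '[' selector ',' selector ']'
//   selector : INT | '*' | '{' INT '..' INT '}' | '{' INT (',' INT)* '}'
final class MatrixAccess {
    private final String src;
    private int pos;

    private MatrixAccess(String src) { this.src = src.replace(" ", ""); }

    // access : NAME '[' selector ',' selector ']'
    static double[][] evaluate(String expr, Map<String, double[][]> matrices) {
        MatrixAccess p = new MatrixAccess(expr);
        String name = p.name();
        double[][] m = matrices.get(name);
        if (m == null) throw new IllegalArgumentException("Unknown matrix: " + name);
        p.expect('[');
        List<Integer> rows = p.selector(m.length);
        p.expect(',');
        List<Integer> cols = p.selector(m[0].length);
        p.expect(']');
        if (p.pos != p.src.length()) throw p.error("trailing input");
        double[][] out = new double[rows.size()][cols.size()];   // selected block
        for (int r = 0; r < rows.size(); r++)
            for (int c = 0; c < cols.size(); c++)
                out[r][c] = m[rows.get(r)][cols.get(c)];
        return out;
    }

    // selector : INT | '*' | '{' INT '..' INT '}' | '{' INT (',' INT)* '}'
    private List<Integer> selector(int size) {
        List<Integer> idx = new ArrayList<>();
        if (peek() == '*') {                        // wildcard: every index
            pos++;
            for (int i = 0; i < size; i++) idx.add(i);
        } else if (peek() == '{') {
            pos++;
            int first = integer();
            if (src.startsWith("..", pos)) {        // inclusive range {a..b}
                pos += 2;
                int last = integer();
                for (int i = first; i <= last; i++) idx.add(i);
            } else {                                // ordered set {a,b,c}
                idx.add(first);
                while (peek() == ',') { pos++; idx.add(integer()); }
            }
            expect('}');
        } else {
            idx.add(integer());                     // single index
        }
        for (int i : idx)
            if (i < 0 || i >= size) throw error("index " + i + " out of bounds");
        return idx;
    }

    private String name() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw error("expected matrix name");
        return src.substring(start, pos);
    }

    private int integer() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw error("expected integer");
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char ch) {
        if (peek() != ch) throw error("expected '" + ch + "'");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos + " in " + src);
    }
}
```

For example, MatrixAccess.evaluate("M1[{0..2},{1,2}]", Map.of("M1", m1)) returns a 3x2 block. With ANTLR, each of these methods would instead be written as a parser rule in the grammar file, and the generated parser would do the recognizing.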
Multiple indices can be specified in three ways: (This is too short. You may want to describe how you construct the test cases and what the results are.)
a. Range, e.g., {0..2}
b. Ordered set, e.g., {1,2,3} (meaning?)
c. Wildcard (* means all)
Functions were evaluated by calling them with parameters directly and by using them inside expressions. Expressions were evaluated by entering them at the console. The test cases and their expected outputs are listed in Appendix B.

6. Conclusions and Future Work:
A language for array data processing lets users perform complex array operations easily. It simplifies their work by providing a user-friendly run-time environment where they can enter input and see the results.
Future work may include further implementation of tag-based operations and functions.

7. References:
[1] ANTLR: www.antlr.org
[2] Eclipse downloads: http://www.eclipse.org/downloads/
[3] Team website: http://dcm.uhcl.edu/c423008fasalotia/caps10g6/default.htm

8. Appendices:
Appendix A: Team Roles and Contribution
A.1 Team Roles
Vikram Sriram
Major: Computer Science
Email: vikram.sriram16@gmail.com
Phone: 832-314-2534
Role: Developer

Narasimha Bhyravabotla
Major: Computer Information System
Email: bsv.narasimha@gmail.com
Phone: 832-477-0265
Role: Developer

Haritha Rani Jadcherla
Major: Computer Science
Email: j.haritha20@gmail.com
Phone: 630-913-2844
Role: Developer

Annapurna Saloti
Major: Computer Science
Email: salotiannapurna@gmail.com
Phone: 305-439-7477
Role: Webmaster

A.2 Team Contribution
Values are percentages of the work on each task.

Sr No  Task                                        Vikram  Narasimha  Haritha  Annapurna
1      Project Selection                           25      25         25       25
2      Team Leadership                             25      25         25       25
3      Project Analysis (Brainstorming)            25      25         25       25
4      Research Work                               25      25         25       25
5      Website Creation, Maintenance               20      20         25       35
6      Preparing Instructor and Mentor Meetings    40      20         20       20
7      Documentation: SRS, Abstract, Language      25      25         25       25
       Specification, Presentation, Final Report
8      Testing                                     25      25         25       25
9      Integration                                 30      25         25       20

Appendix B:
In this section, the test cases are described. For each input, the expected output is presented in a table, as shown below. (List the actual content of M1.)

INPUT                   EXPECTED OUTPUT
M1[1,2]                 2.2
M1[0,11]                1.11
M1[{1,2,3},1]           2.1 3.1 4.1
M1[1,{1,2,3}]           2.1 2.2 2.3
M1[{1,2,3},{1}]         2.1 3.1 4.1
M1[{1},{1,2,3}]         2.1 2.2 2.3
M1[{1,2,3},{1,2,3}]     2.1 2.2 2.3
                        3.1 3.2 3.3
                        4.1 4.2 4.3
M1[{0..2},1]            1.1 2.1 3.1
M1[{0..2},{1,2}]        1.1 1.2
                        2.1 2.2
                        3.1 3.2
M1[{0..2},{0..8}]       1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
M1[{0},{0..8}]          1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
M1[{1,2},{0..8}]        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
M1[1,{0..8}]            2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
M1[*,*]                 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.1 1.11
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.1 3.11
                        4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.1 4.11
                        5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.1 5.11
                        6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.1 6.11
                        7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.1 7.11
                        2.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.1 8.11
                        (Are there any reasons for using the values with yellow background?)
M1[*, 1]                1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1
M1[*,{1,2,3}]           1.1 1.2 1.3
                        2.1 2.2 2.3
                        3.1 3.2 3.3
                        4.1 4.2 4.3
                        5.1 5.2 5.3
                        6.1 6.2 6.3
                        7.1 7.2 7.3
                        8.1 8.2 8.3
M1[*,{0..2}]            1.0 1.1 1.2
                        2.0 2.1 2.2
                        3.0 3.1 3.2
                        4.0 4.1 4.2
                        5.0 5.1 5.2
                        6.0 6.1 6.2
                        7.0 7.1 7.2
                        2.0 8.1 8.2
M1[1,*]                 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
M1[{1,2},*]             2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.1 3.11
M1[{0..1},*]            1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.1 1.11
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
Avg(1,2,3)              2
Avg(1.0,2.0,3.0)        2.0
Max(2,5,8,10,12)        12
Max(8.0,2.0,56.0)       56.0
Min(6,9,2,3,4,1)        1
Sort(22,19,7,56,33,4)   4,7,19,22,33,56
Log8                    3   (Do you mean Log(8)? Case sensitivity?)

Here M1 is the matrix defined in matrix.txt, shown below in table format, where the first row gives the column indices and the first column gives the row indices.

     0    1    2    3    4    5    6    7    8    9    10   11
0    1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  1.10 1.11
1    2.0  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9  2.10 2.11
2    3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  3.10 3.11
3    4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  4.10 4.11
4    5.0  5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9  5.10 5.11
5    6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9  6.10 6.11
6    7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  7.10 7.11

(Where is row 7? Values with yellow highlighted background do not seem to match the results of M1[*,*].)
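The statistical built-ins exercised in Appendix B (Avg, Max, Min, and Sort) can be sketched as plain Java methods. This is a hypothetical illustration, not the team's implementation: the class Builtins and its method names are assumptions, all arguments are taken as doubles, and the "2" versus "2.0" distinction in the table is treated as a separate formatting step whose rule (print without a decimal point when every argument was an integer) is itself an assumption. Log is left out because the table does not make its syntax or base clear.

```java
import java.util.Arrays;

// Hypothetical sketch of the statistical built-ins tested in Appendix B.
final class Builtins {
    static double avg(double... xs) {
        requireArgs(xs);
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double max(double... xs) {
        requireArgs(xs);
        return Arrays.stream(xs).max().getAsDouble();
    }

    static double min(double... xs) {
        requireArgs(xs);
        return Arrays.stream(xs).min().getAsDouble();
    }

    // Returns a sorted copy, leaving the caller's arguments untouched.
    static double[] sort(double... xs) {
        double[] copy = xs.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Assumed display rule: when every argument was an integer literal and the
    // result is integral, print it without ".0" (e.g. Avg(1,2,3) -> "2").
    static String format(double value, boolean integerArgs) {
        return integerArgs && value == Math.rint(value)
                ? String.valueOf((long) value)
                : String.valueOf(value);
    }

    private static void requireArgs(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("at least one argument is required");
    }
}
```

With this sketch, Builtins.format(Builtins.avg(1, 2, 3), true) gives "2" and Builtins.format(Builtins.avg(1.0, 2.0, 3.0), false) gives "2.0", matching the first two function rows of the table. Keeping formatting separate from computation means the arithmetic itself never depends on how the user typed the numbers.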