SEDE 2021: Papers with Abstracts

Abstract. Converting source code from one programming language to another is a problem that occurs regularly in real life, but it has attracted limited attention and has not been investigated systematically. This paper presents the challenges of translating a large body of source code from one high-level programming language to another, and the solutions developed to resolve these problems. Furthermore, the paper introduces a systematic classification of the challenges that occur during the translation process and elaborates on the ability to handle them without manual intervention by a programmer. A detailed performance comparison between the two language versions of the code is being performed.
Abstract. Phishing is a fraudulent attempt in which attackers trick victims into disclosing sensitive information under false pretenses. This research project aims to develop a Google Chrome extension to detect phishing emails. We first collected a number of phishing email samples. Then we used text mining techniques to identify the words that are important in phishing emails. Next, we developed a classifier model, based on those words, that the Chrome extension uses to detect phishing emails. The next step was to test the extension with phishing email samples and standard (non-phishing) email samples. After that, the evaluation metrics were collected. We found our extension was able to identify phishing emails and non-phishing emails with a relatively high degree of accuracy.
Abstract. AnthroFace is a rib fracture data collection and analysis web application. Currently, researchers at the University of Nevada, Reno (UNR) collect data manually using .csv files and do not keep it in a centralized location. The purpose of this software is to provide researchers with an efficient way to collect and record relevant data and make meaningful interpretations from the dataset. The main features of AnthroFace can broadly be placed into three categories: a quality user interface system to efficiently collect and record relevant data, a database to store collected information, and tools to help the user make meaningful interpretations from the dataset.
Abstract. In information technology, text mining is a widely used methodology for extracting desired information from a body of text data. Thousands of research papers have been published in medical science on how microRNAs (miRNAs) can assist or impede the development of various types of cancer. mirCancer is a repository offered by East Carolina University that provides details of cancer-miRNA associations from more than 7000 research papers, retrieved using a rule-based text mining technique. It would be valuable to create a machine learning model that extracts the cancer-miRNA association details from the title and abstract of these medical research papers. In this paper, we propose a machine learning model, designed and implemented using the open-source NLP framework BERT, provided by Google, to identify the cancer-miRNA relationship in the abstract of a research paper. We have also prepared the dataset required to train and validate the proposed model. The model achieved an overall accuracy of 90.3% in retrieving the required information from the research literature in the test dataset, and it can be useful for retrieving cancer-miRNA association information from future research literature.
Abstract. Using algorithms to compose creative and pleasing music has been an ambitious research goal since the 1950s. This trend continues to this day with the help of widely accessible, highly sophisticated music research tools created by big companies, such as Google’s Magenta. Due to the sequential nature of musical pieces, Recurrent Neural Networks (RNNs), as well as advanced variants such as Long Short-Term Memory networks (LSTMs), have been successfully employed for this purpose. Music score data is made up of features such as duration, pitch, rhythm, and chords. As more music features are integrated into the composition process, the space of encodings required to represent possible feature combinations grows significantly, making the process computationally infeasible. This consideration becomes highly significant for polyphonic pieces, where additional features such as harmonies and multiple voices are present. With an emphasis on efficiency without sacrificing quality, this research aims to further demonstrate the effectiveness of LSTMs for automated music generation by learning from existing music score data. More specifically, we show that training separate models to learn individual music features and combining their results to generate new music is, overall, superior to the common practice of training resource-intensive complex models that simultaneously incorporate multiple desired features.
Abstract. In this paper, we develop a real-time method for estimating a person's level of attention while performing a task. This method uses only low frame rate video from a standard camera so that it can run even on a small computer. Eye blinks and head movements are calculated from the video by using facial landmarks. Existing blink detection methods use standard frame rate videos, making them difficult to process on a computer with low processing performance. One obvious solution is to use videos with a reduced frame rate. We investigate the error caused by reducing the frame rate, and then, to overcome the error, we develop a new method that uses the head movements calculated from the reduced frame rate videos. We then demonstrate that, with this method, the error is within an acceptable range, and show that the method is effective for estimating the attention level. Since this method uses only landmark information obtained from facial images, it reduces the mental burden on the user and also partially protects personal information. In this paper, we explain the details of the method and report the experimental results.
Abstract. The concepts of mutual inclusion and mutual exclusion are critical for concurrency control in distributed systems. Mutual exclusion is a property which ensures that at most one process can execute in its critical section at any given time. For example, other processes are not allowed to enter their critical sections when a given process is updating a shared variable in its critical section. If up to k processes can enter their critical sections, this is called k-exclusion. In contrast, mutual inclusion restricts processes from leaving their critical sections. For example, to ensure reliability in a server farm, a certain number of servers may need to be available to service requests. If at least m processes must be available, this is called m-inclusion. Model checking is essential to verify and validate correctness and safety properties of distributed algorithms. The paper presents token-based models that can be used to verify and validate k-mutual exclusion and m-mutual inclusion algorithms, where k refers to the maximum number of processes in their critical sections and m is the minimum number that must remain in their critical sections. Verification criteria include the maximum number of messages that must be exchanged to enter or leave a critical section, deadlock freedom, and timing parameters. In addition, a model that includes both k-exclusion and m-inclusion is presented to demonstrate the feasibility of evaluating both mutual exclusion and mutual inclusion in the same model. Models are developed in UPPAAL, an environment for modeling, validation, and verification of real-time systems represented using timed automata.
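To make the combined k-exclusion and m-inclusion property in the abstract above concrete, the following is a minimal explicit-state sketch in Python. It is not the paper's UPPAAL model: it assumes a shared counter instead of message-passing tokens, ignores timing and message counts, and the names `successors` and `check` are hypothetical. It enumerates all reachable states and checks the invariant m <= (processes in critical section) <= k together with deadlock freedom.

```python
from collections import deque
from typing import List, Set, Tuple

# A state records, for each process, whether it is inside its critical section.
State = Tuple[bool, ...]


def successors(state: State, k: int, m: int) -> List[State]:
    """All states reachable in one step under the k-exclusion / m-inclusion guards."""
    in_cs = sum(state)
    result = []
    for i, inside in enumerate(state):
        if not inside and in_cs < k:
            # Enter: allowed only while fewer than k processes are inside.
            result.append(state[:i] + (True,) + state[i + 1:])
        elif inside and in_cs > m:
            # Leave: allowed only if at least m processes remain inside.
            result.append(state[:i] + (False,) + state[i + 1:])
    return result


def check(n: int, k: int, m: int) -> Tuple[bool, bool, int]:
    """Return (invariant_holds, deadlock_free, number_of_reachable_states)."""
    # Assumed initial state: the first m processes start inside their critical sections.
    initial: State = tuple(i < m for i in range(n))
    seen: Set[State] = {initial}
    queue = deque([initial])
    invariant_holds = True
    deadlock_free = True
    while queue:
        state = queue.popleft()
        if not (m <= sum(state) <= k):
            invariant_holds = False
        nexts = successors(state, k, m)
        if not nexts:
            deadlock_free = False  # no process can enter or leave
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return invariant_holds, deadlock_free, len(seen)


if __name__ == "__main__":
    print(check(n=4, k=3, m=1))
    print(check(n=3, k=2, m=2))
```

With n=4, k=3, m=1 every subset of one to three processes is reachable and no state is stuck. With m equal to k the initial state already deadlocks, which is the kind of condition a model checker is meant to expose.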
Abstract. The Veteran Services Tracking and Analytics Program (VS-TAP) is a web application used to store and query the rate and duration of visitors within Veteran Services’ locations. The application accepts data from Navigate as well as a hosted demographics survey to display statistics in a graphically meaningful way. Accumulating data from different sources allows stakeholders to create custom reports to compare multiple variables that represent student veterans.
Abstract. Networks are pervasive in society: infrastructures (e.g., telephone), commercial sectors (e.g., banking), and biological and genomic systems can be represented as networks. Consequently, there are software libraries that analyze networks. Containers (e.g., Docker, Singularity), which hold both runnable codes and their execution environments, are increasingly utilized by analysts to run codes in a platform-independent fashion. Portability is further enhanced by providing not only software library methods, but also the driver code (i.e., main() method) for each library method. In this way, a user only has to know the invocation for the main() method that is in the container. In this work, we describe an automated approach for generating a main() method for each software library method. A single intermediate representation (IR) format is used for all library methods, and one IR instance is populated for one library method by parsing its comments and method signature. An IR for the main() method is generated from that for the library method. A source code generator uses the main() method IR and a set of small, hand-generated source code templates (with variables in the templates that are automatically customized for a particular library method) to produce the source code main() method. We apply our approach to two widely used software libraries, SNAP and NetworkX, as exemplars, which combined have over 400 library methods.
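The template-driven generation step described in the abstract above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the dictionary layout of the IR, the `MAIN_TEMPLATE` text, and the `generate_main` function are hypothetical, and the standard-library function `math.pow` stands in for a SNAP or NetworkX method so the example stays self-contained.

```python
from string import Template

# Hypothetical hand-written template; the $-variables are filled in per library method.
MAIN_TEMPLATE = Template('''\
import argparse
import $module


def main():
    parser = argparse.ArgumentParser(description="Driver for $module.$method")
$arg_lines
    args = parser.parse_args()
    print($module.$method($call_args))


if __name__ == "__main__":
    main()
''')

# Hypothetical IR instance for one library method (math.pow is a stand-in).
EXAMPLE_IR = {
    "module": "math",
    "method": "pow",
    "params": [
        {"name": "x", "type": "float"},
        {"name": "y", "type": "float"},
    ],
}


def generate_main(ir: dict) -> str:
    """Fill the template from the IR and return the driver's source code."""
    # One command-line option per parameter of the library method.
    arg_lines = "\n".join(
        f'    parser.add_argument("--{p["name"]}", type={p["type"]}, required=True)'
        for p in ir["params"]
    )
    # Pass the parsed options to the library method in declaration order.
    call_args = ", ".join(f'args.{p["name"]}' for p in ir["params"])
    return MAIN_TEMPLATE.substitute(
        module=ir["module"],
        method=ir["method"],
        arg_lines=arg_lines,
        call_args=call_args,
    )


if __name__ == "__main__":
    print(generate_main(EXAMPLE_IR))
```

Saving the generated text to a file gives a small command-line driver that accepts `--x` and `--y`. In the approach the abstract describes, the IR would be populated by parsing the library method's comments and signature rather than written by hand.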
Abstract. Database tampering is a key security threat that impacts the integrity of sensitive information of crucial businesses. The evolving risks of security threats as well as regulatory compliance are important driving forces for achieving better integrity and detecting possible data tampering by either internal or external malicious perpetrators. We present DBKnot, an architecture for a tamper detection solution that addresses this problem while maintaining seamlessness and ease of retrofitting into existing append-only database applications with near-zero modifications. We also pay attention to data confidentiality by making sure that the data never leaves the organization’s premises. We leverage designs like chains of record hashes to achieve the target solution. A set of preliminary experiments showed DBKnot adding an overhead equal to the original transaction time. We ran the same experiments with different parallelization and pipelining versions of DBKnot, which cut approximately 66% of the added overhead.
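The "chains of record hashes" idea mentioned in the abstract above can be illustrated with a short Python sketch. This is a generic hash chain, not DBKnot's actual design: the use of SHA-256, the JSON serialization, the `GENESIS` starting value, and the function names are all assumptions. Each record's hash covers the previous record's hash, so altering an earlier record breaks every link from that point on.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # assumed starting value for the chain


def record_hash(record: Dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the record before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def build_chain(records: List[Dict]) -> List[str]:
    """Return one chained hash per record, in insertion order."""
    hashes = []
    prev = GENESIS
    for record in records:
        prev = record_hash(record, prev)
        hashes.append(prev)
    return hashes


def first_tampered_index(records: List[Dict], hashes: List[str]) -> Optional[int]:
    """Recompute each link and return the first index whose hash no longer matches."""
    prev = GENESIS
    for i, (record, stored) in enumerate(zip(records, hashes)):
        if record_hash(record, prev) != stored:
            return i
        prev = stored
    return None


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
    chain = build_chain(rows)
    rows[1]["amount"] = 999  # simulate tampering with an already-appended row
    print(first_tampered_index(rows, chain))
```

Because every hash depends on its predecessor, this design suits the append-only setting the abstract names. The sketch recomputes the chain sequentially and does not model the parallelization and pipelining variants the abstract mentions.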
Abstract. Since 2008, when it was first cited, blockchain technology has represented an innovation from both a structural and an application point of view. Since then, thanks to its peculiarities and its capability of implementing smart contracts, blockchain technology has undergone strong development in different application domains. The interest in this technology has also led to the definition of several platforms facilitating its use and application. Due to their variety, choosing the most suitable blockchain platform to support a specific business need is a strategic problem. This paper proposes an analysis for the definition of an evaluation framework and related quality attributes, helping to characterize and compare different blockchain platforms in order to identify the most suitable one for implementing smart contracts in a specific business context. The analysis of a set of blockchain platforms is proposed for discussing the applicability and use of the proposed framework.
Abstract. Internet of Things (IoT) applications have the potential to improve the quality of life, especially for elderly and disabled people. A smart home is a beneficial system for elderly and disabled people. However, there are many innovation gaps in designing, developing, installing, and extending (through plugins) an intelligent home system. Basic functions should be addressed first, such as electronic control systems, human movement detection, and environment monitoring. There are many issues in designing an intelligent home system, such as which devices need to be integrated into the system and how to design a thin system. This paper explores the design and development of an automated light system in a smart home using the Internet of Things and a Mosquitto server. This research aims to design the workflow overview of the automatic light system based on sensors, devices, components, and servers. The system achieves light control both manually and automatically. The MQTT server makes it possible to control and modify any device in the IoT network flexibly. The system's functionalities are achieved, while the motion sensor's capability remains a concern with respect to the sensing range covering the area.
Abstract. The use of machine learning techniques has exploded in recent years, as has the use of virtual reality techniques. One area in which these tools can be utilized is the practice of medicine. In this research, we propose a framework to visualize the position and rotation of human spines based on machine learning predictions. This framework approach is significant due to the importance of medical visualizations and organ tracking, with uses ranging from the education of medical students to surgery. Moreover, using machine learning techniques with virtual reality offers real-time medical visualizations, which is significant for surgery. According to our experimental results, our proposed framework can accurately predict position and rotation data.
Abstract. This paper studies the time series of Antarctic glacier mass from April 2002 to March 2021. The objective of this paper is to forecast the Antarctic glacier mass level for 2021 to 2041. The Science studied is the geoscience of the glacier; the Technology applied is the GRACE-FO satellites, used to collect the glacier ice sheet mass data; the Engineering focuses on the COVID-19 impact on the glacier melting rate; and Mathematical/statistical tools such as time series ARIMA models are applied. Although the glacier melting rate had sped up in the years before 2020, the COVID-19 situation might have slowed down the rate of glacier melting in 2020 in both Antarctica and Greenland. During the 2020 COVID-19 period, the seasonal pattern of Antarctic glacier mass became a smoother single-peak cyclic pattern, which is different from the double-peak cyclic pattern from 2002 to 2019. The authors fitted both non-seasonal and seasonal ARIMA models and concluded that only the seasonal ARIMA forecasting model can provide reliable insights into the relatively small pattern change during the 2020 period. The COVID-19 factor might have had some impact on the Antarctic glacier melting rate. The glacier melting rate may have slowed down by 20% in the 2020-2021 period.
Abstract. Approximate query processing (AQP) aims to provide an approximated answer close to the exact answer efficiently for a complex query on large datasets, especially big data. It brings enormous benefits into many data science fields when the efficiency of query execution weighs more than the accuracy. However, assessing the accuracy of an approximated answer from AQP deserves more study. Existing work usually relies on strong dataset assumptions which may not work for real-world datasets. In this work, we employ bootstrap sampling to assess the estimation errors of the AQP for selection queries (called σ-AQP). We implement a prototype system which can calculate confidence intervals for the estimated query results. Experiment results demonstrated that the confidence intervals generated by the prototype system can cover the ground truth of the query results with high accuracy and low computing cost. In addition, we implement optimization strategies for the bootstrap sampling which have significantly improved the overall computing efficiency.
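As an illustration of how bootstrap sampling can attach a confidence interval to an approximate selection query, as described in the abstract above, here is a minimal Python sketch. It is not the paper's prototype: it assumes a uniform random sample of the table, a COUNT-style selection query, and the simple percentile bootstrap, and the names `estimate_count` and `bootstrap_ci` are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def estimate_count(sample: Sequence, predicate: Callable, population_size: int) -> float:
    """Approximate COUNT(*) WHERE predicate by scaling the sample's selectivity."""
    matches = sum(1 for row in sample if predicate(row))
    return population_size * matches / len(sample)


def bootstrap_ci(
    sample: Sequence,
    predicate: Callable,
    population_size: int,
    n_resamples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the approximate count."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample the sample itself, with replacement, at its original size.
        resample = rng.choices(sample, k=len(sample))
        estimates.append(estimate_count(resample, predicate, population_size))
    estimates.sort()
    alpha = (1.0 - confidence) / 2.0
    lower = estimates[int(alpha * (n_resamples - 1))]
    upper = estimates[int((1.0 - alpha) * (n_resamples - 1))]
    return lower, upper


if __name__ == "__main__":
    table = list(range(10_000))                 # stand-in for a large table
    sample = random.Random(1).sample(table, 500)

    def is_small(row: int) -> bool:
        return row < 2_500                      # the selection condition

    print(estimate_count(sample, is_small, len(table)))
    print(bootstrap_ci(sample, is_small, len(table)))
```

The interval is computed from the sample alone, so no distributional assumption about the full dataset is needed. Its cost grows with the number of resamples, which is where optimizations of the kind the abstract mentions would apply.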