案例JAVA 英文编程研究生:Advanced Computer Systems (A
当前位置:以往案例 > >案例JAVA 英文编程研究生:Advanced Computer Systems (A
2019-05-16

Advanced Computer Systems (ACS), 2017/2018


This is your final 5-day take home project for Advanced Computer Systems, block 2, 2017/2018. This project is due via Digital project on January 22, 2018, 23:59. The project will be evaluated on the 7-point grading scale with external grading, as announced in the course description.

Hand-ins for this project must be individual. Cooperation on or discussion of the contents of the project with other students is strictly forbidden. The solution you provide should reflect your knowledge of the material alone. The project is open-book and you are allowed to make use of the book and other reading material of the course. If you use on-line sources for any of your solutions, they must be cited appropriately. While we require the individual work policy outlined above, we will allow students to ask clarification questions on the formulation of the project during the project period. A clarification question should not include any solution ideas to questions in the project, but only pertain to the meaning of theproblem formulations and associated assumptions in the project text itself. These questions will only be accepted through the forums on Absalon, and will be answered by the TAs or your lecturer. The goal of the latter policy is to give all students fair access to the same information during the project period.

A well-formed solution to this project should include a PDF file with answers to all scenario questions as well as questions posed in the programming part of the project. In addition, you must submit your code along with your written solution. Evaluation of the project will take both into consideration.

Do not get hung up in a single question. It is best to make an effort on every question and write down all your solutions than to get a perfect solution for only one or two of the questions. Nevertheless, keep in mind that a concise and well-written paragraph in your solution is always better than a long paragraph.

Note that your solution has to be submitted via Digital project (https://eksamen.ku.dk/) in electronic format. In the unlikely event that Digital project is unavailable during the project period, we will publish the project in Absalon and expect any issues to be reported to the email [email protected]. It is your responsibility to make sure in time that the upload of your files succeeds. We strongly suggest composing your solution using a text editor or LaTeX and creating a PDF file for submission. Paper submissions will not be accepted, and submissions will only be accepted in Digital project.

Learning Goals of ACS

We attempt to touch on many of the learning goals of ACS. Recall that the learning goals of ACS are:

(LG1) Describe the design of transactional and distributed systems, including techniques for modularity,performance, and fault tolerance.

(LG2) Explain how to employ strong modularity through a client-service abstraction as a paradigm tostructure computer systems, while hiding complexity of implementation from clients.

(LG3) Explain techniques for large-scale data processing.

(LG4) Implement systems that include mechanisms for modularity, atomicity, and fault tolerance. (LG5) Structure and conduct experiments to evaluate a system's performance.

(LG6) Discuss design alternatives for a modular computer system, identifying desired system propertiesas well as describing mechanisms for improving performance while arguing for their correctness.

(LG7) Analyze protocols for concurrency control and recovery, as well as for distribution and replication.

(LG8) Apply principles of large-scale data processing to analyze concrete information-processingproblems.

Scenarios

Question 1: Data Processing (LG3, LG8)

A car rental company BestCars maintains an operational database storing information on their customers, cars, as well as rental records.

The management of BestCars would like to analyze the interest of their customers in terms of the size of cars. In particular, they assign a bunch of data analysis tasks to you, the best data scientist and developer in BestCars. One of the tasks is as follows: “Produce a list of ids of the users who have only rent big cars, but not small cars.”

Since the analysis is very heavy, to avoid any interference with the operational system, the necessary data for your task has been pre-extracted outside of the database and stored separately. For the aforementioned task, you are provided with two heap files storing two tables whose schemas are as follows: (1)

Cars(car_id, make, model, size_category), and (2) Rentals(user_id, car_id, time). You can assume thevalues of size_category can be either “small” or “big”. You are provided with one server with memory larger than the square root of any table, but insufficient to hold any one of them completely. This applies to any temporary tables that are generated by your algorithm. Therefore, you have to resort to external memory algorithms to solve the problem. If you need to make any further assumptions in your answers, please state them clearly as part of your solution. Please write down your solutions to the following questions.

1. State an external memory algorithm with the minimum cost in terms of I/O to answer the query above. Argue for the algorithm’s correctness and efficiency.

2. State the I/O cost of the algorithm you designed. You can make your own notation of the size of each attribute in any table.

NOTE 1: To state an algorithm, you can reference existing sort-based or hash-based external memoryalgorithms. You should not state all the steps of these existing algorithms from scratch again, but you should clearly state the steps that you need to change in the algorithms you reference, and also how you change these steps. To describe how you change a step, refer to the step and list the sub-steps that need to be executed to achieve your goal.

NOTE 2: Instead of just using several existing external memory algorithms in sequence as black boxes,you should design a single algorithm that addresses the whole task holistically. That is why in NOTE 1 we expect that you will need to show changes to steps of existing algorithms, if necessary.

NOTE 3: Your solution will be graded partly according to the optimality of your algorithm and the qualityof the justifications. That is to say, a solution with justified lower I/O cost will in general help you get a better grade.

Question 2: Replication (LG7)

Three processes, P1, P2, and P3, replicate a common object. The object is updated asynchronously by three different clients, and each client happens to dispatch its update to a different process out of P1, P2, and P3. After receiving each update, the respective processes incorporates the update in its local replica

and then propagates the new value to the other processes. The precise exchange of messages is shown in the following figure:



image.png



Figure 1: Exchange of messages among processes


Note that P1 got a client update at A, P2 at B, and P3 at D; the clients are omitted from the figure to avoid cluttering. The system uses vector clocks to keep track of the object version internally. You may assume that all local clock values start at zero at the beginning of the interaction shown. In addition, assume that only updates from clients cause an increment of a local clock, i.e., not every event causes an increment of the local clock except for original client updates. Therefore, (1,1,1) is the maximum vector clock possible in the given interaction. The latter policy ensures that, if update propagation reaches all processes, then we expect the object’s version to converge as either overwriting or merging through a commutative associative function takes place.


Every time an update is received by a process P, the process chooses one out of the following actions to incorporate the update:


(A1) Keep P’s own value;


(A2) Overwrite P’s value with the latest value from the message;

(A3) Call an application-specific merge function to combine both values consistently.


For each point in the execution from A to K shown in Figure 1, give the value of the vector clock at the corresponding process before the update event is processed, the value of the vector clock received in the incoming message or sent in the outgoing message, the action taken by the process to incorporate the update, and the value of the vector clock after the update is processed. Use a table with the following format to report your answer:



Event

Vector Clock at

Message Vector

Action Taken by

Vector Clock at

Process Before

Clock

Process

Process After

A









B





















In addition to the table, provide a paragraph with a brief justification for your reasoning for constructing the table, i.e., discuss when each action should be taken depending on the vector clock values at the given message and event.



Question 3: ARIES Recovery (LG7)


In the recovery scenario described in the following, we ask you to explain the functioning and justification behind the ARIES recovery algorithm. At the beginning of time, there are no transactions active in the system and no dirty pages. A checkpoint is taken. After that, four transactions, T1, T2, T3, and T4, enter the system and perform various operations. The detailed log follows:


LOG

LSN

PREV_LSN

XACT_ID

TYPE

PAGE_ID

UNDONEXTLSN



——–

——-

—-

——-

———–

1





begin CKPT





2





end CKPT





3

NULL

T1

update

P3



4

3

T1

update

P1



5

NULL

T2

update

P1



6

NULL

T3

update

P3



7

5

T2

update

P2



8

NULL

T4

update

P2



9

4

T1

update

P4



10

8

T4

commit





11

9

T1

abort





12

7

T2

abort





13

12

T2

CLR

P2

A

14

10

T4

end





15

11

T1

CLR

P4

B

16

15

T1

CLR

P1

C

17

13

T2

CLR

P1

D

18

16

T1

CLR

P3

E

19

18

T1

end





——– XXXXXXX ——  CRASH!

—— XXXXXXX ——–


A crash occurs at the end of the log as shown. Considering again that the system employs the ARIES recovery algorithm, answer the following questions and justify each of your answers.


1. In the log, the undonextLSN values for several CLRs are represented by the letters A–E. For each letter, state the value of the corresponding undonextLSN and why it has the given value.


2. What are the states of the transaction table and of the dirty page table after the analysis phase of recovery? Explain why.

3. Show the additional contents of the log after the recovery procedure completes. Provide a brief explanation for why any new log records shown need to be added.

4. In the system shown, let us assume that the database is memory-resident, i.e., upon start-up all contents of nonvolatile storage are loaded into volatile storage (which is large enough to cache everything). Thus, no page replacements are ever necessary during normal operation. Furthermore, assume plenty of log buffer space is available and no background page writes are performed. Under these conditions, answer the following:


a. Which log records could trigger writes to stable storage during normal operation in the scenario above? Why?

b. Following the ARIES log record structure introduced in the course for update records, is there any information that would not need to be written to stable storage in such a scenario? If so, explain which information and why. Otherwise, justify why all information in these records needs to reach stable storage.




Programming Task


In this programming task, you will develop a simplified edge-to-cloud distributed interpreter abstraction in a scenario inspired by emerging self-checkout supermarkets. Through the multiple questions below, you will describe your design (LG1), expose abstractions as a services (LG2), design and implement these services (LG4, LG6, LG7), and evaluate your implementation with an experiment (LG5).


As with projects in this course, you should implement this programming task in Java, compatible with JDK 8. As an RPC mechanism, you are only allowed to use Jetty together with XStream and/or Kryo, and we expect you to abstract the use of these libraries behind clean proxy classes as in the projects of the course. You are allowed to reuse communication code from the projects when building your proxy classes. You are also allowed to use skeleton code from project 3 to orchestrate experimental workload and measurements as well as JUnit testing skeleton code from the projects.

在线提交订单