[CE 1.1]
The following career episode is based on my work experience at New H3C Technologies Co., Ltd. (hereinafter referred to as “H3C”) in Shenzhen, Guangdong, China. H3C is a vendor of communication and networked storage devices in China. A networked storage device stores and retrieves data through network protocols within the telecommunication infrastructure. The project ran from 12 January 2009 to 17 March 2009.
Project Title | Design Solution & Synthesis Strategy for Network Storage Device |
Duration | January 2009 – March 2009 |
Location | Shenzhen, China |
Organization | H3C Technologies Co., Ltd. |
Position | Engineer |
[CE 1.2]
I received a great deal of feedback and many complaints from customers who had suffered frequent service interruptions in business systems that used H3C’s networked storage product. These interruptions affected the stability of their business systems, so the customers expected H3C to investigate what was causing the malfunctions and to resolve it. A project was launched to deal with this situation.
[CE 1.3]
The overall aim of the project was to improve the customer experience and enhance the stability of the storage system by means of the design solution and synthesis strategy.
The detailed objectives of the project were provided as follows:
[CE 1.4]
My work in H3C’s storage product department was to design and develop storage devices, applying theories of mathematical analysis and computer science to design, evaluate and develop the storage applications and devices that are crucial to making storage products work. My job began with analyzing a customer’s requirements, followed by designing, developing and testing the system to determine whether those requirements were met. During this process I drew flowcharts and UML diagrams and wrote the algorithms that tell the server what to do and how to do it.
[CE 1.5]
The following chart illustrates the organizational structure highlighting my position during the career episode.
[CE 1.6]
My duties in the project are presented as follows:
[CE 1.7]
I began by analyzing the complexity of the problem, covering many factors and allowing a good deal of fuzziness at the edges. Based on my knowledge of physics, computer science and engineering methodology, I broke the problem down into four potential areas to investigate: the data center environment, the hard disk drive (HDD), the storage server and the storage system.
[CE 1.8]
After completing training on data center safety specifications, I visited customers’ data centers to collect raw environmental data on temperature and humidity. With the customers’ permission, I also collected sense data, a kind of debug information produced by the HDDs and server adapter cards. I cleansed and normalized all the raw data and imported it into Excel so that I could interpret trends and evaluate how distinct changes affected stability. This included correlating two or more factors with changes in the outcome, as well as identifying statistical aberrations in the raw data.
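The correlation step was carried out in Excel; the same idea can be sketched in C for illustration. The function below fits a least-squares line between two series (for example, a hypothetical temperature reading and a hypothetical error count per interval) and reports the slope and the squared correlation. The names `linear_fit` and `fit_line` are assumptions made for this sketch and do not come from the project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of a least-squares line fit of y against x. */
typedef struct {
    double slope; /* change in y per unit change in x */
    double r2;    /* squared correlation coefficient, between 0 and 1 */
} linear_fit;

/*
 * Fits y = a + slope * x over n samples.
 * Returns false when there are fewer than two samples or when either
 * series is constant, because the correlation is undefined in those cases.
 */
bool fit_line(const double *x, const double *y, size_t n, linear_fit *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    if (n < 2)
        return false;

    /* Means of both series. */
    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Sums of squared deviations and cross products. */
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        double dy = y[i] - mean_y;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if (sxx == 0.0 || syy == 0.0)
        return false;

    out->slope = sxy / sxx;
    out->r2 = (sxy * sxy) / (sxx * syy);
    return true;
}
```

An `r2` value close to 1 suggests the two factors move together, while a value close to 0 suggests little linear relationship; the sign of `slope` gives the direction.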
[CE 1.9]
I considered the customer data extremely sensitive and understood my accountability for protecting it. I used the data only within the scope of the project, and any other request for it was refused. Once the project finished, I destroyed all the data to prevent any disclosure to the outside world. I also read materials about the storage system and the HDDs, and I synthesized the key information in them with the raw data from the field. It was clear that both the temperature and the humidity data were within the normal ranges specified in the datasheets of the HDDs and the storage product. I therefore decided to stop pursuing environmental factors and to focus on the HDD, the storage server and the storage software system.
[CE 1.10]
Problem Faced: I determined that the sense data carried crucial information that would help identify the root cause of the HDD and adapter card failures. With environmental factors ruled out, it became possible to reproduce the issue in my development system. One type of sense data among the collected data was classified as “media error”. Solution: I used a storage protocol analyzer to capture the messages exchanged between the HDD and the adapter card and to discover what happened when the sense data appeared. It took me almost five days to find the sense data in the log messages. By analyzing the captured messages, I found an “IDNF” error code, which means that the requested address on the HDD was not found. However, I could not draw any conclusion yet, because I had to make sure it was not a coincidence. I used storage pro…
[CE 1.11]
I organized the captured messages and wrote a document in English, and I submitted both to the HDD vendor in the USA. I also set up a telephone conference with the vendor’s engineers. At the conference, I introduced the background, presented the document and explained my conjecture about the HDD. One of the vendor’s engineers argued that, in theory, vibration of the HDD and/or the chassis was the more likely cause of the IDNF error. Another of the vendor’s engineers said they had received many similar cases from around the world. Although I expressed my respect and appreciation for their viewpoint, I was slightly skeptical of it, because I had seen the phenomenon only on that vendor’s HDDs. Three vendors supplied HDDs for our storage product, and the other vendors’ HDDs worked well. I discussed both views with them and we negotiated the next steps.
[CE 1.12]
To validate the vibration hypothesis, I requisitioned an accelerometer and attached it to the chassis and to an HDD mounted in the chassis. I recorded the vibration data of the HDD and the chassis separately. The data indicated that vibration was within the normal range. The vendor reconsidered the case and later released new firmware. I verified the release, and the sense data no longer appeared in the log messages.
[CE 1.13]
Problem Faced: There were hundreds of types of sense data, and most of them were unknown to us. It was possible that the firmware of the HDD and/or the adapter card contained many undetected problems. I realized that focusing only on failures with obvious characteristics (such as the IO error above) was not enough, because it would leave risks in the system. I needed a solution that could deal with both known problems and potential unknown ones. Solution: I took an inductive approach to analyzing the failure data from the field so that I could derive a new failure model for the HDD and the adapter card. All failure data was generalized into two categories: IO error and IO timeout. For an IO error, I chose retry, reset and power cycle, in that order, as the methods for restoring the HDDs and/or the adapter card. If retrying the IO succeeded, reset and power cycle were not required. If not, the storage system would reset the HDDs and/or the adapter card. If the reset did not work, power cycling the HDDs and/or the server adapter card was the last resort. If these measures still had no effect, the system would mark the HDD and/or the adapter card as abnormal, and the customer would need to contact H3C for a replacement. For an IO timeout, only reset and power cycle were performed, because my experiments indicated that retrying the failed IO was hardly ever effective.
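The escalation logic described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code that shipped in the storage system: the type and function names (`device_ops`, `recover_device`, `sim_device` and so on) are hypothetical, and the device operations are stand-in callbacks rather than real driver interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two failure categories from the failure model. */
typedef enum { FAIL_IO_ERROR, FAIL_IO_TIMEOUT } failure_kind;

/* Which step, if any, restored the device. */
typedef enum {
    RECOVERED_BY_RETRY,
    RECOVERED_BY_RESET,
    RECOVERED_BY_POWER_CYCLE,
    MARKED_ABNORMAL
} recovery_result;

/* Hypothetical device hooks; each returns true if the device works again. */
typedef struct {
    bool (*retry_io)(void *ctx);
    bool (*reset)(void *ctx);
    bool (*power_cycle)(void *ctx);
    void *ctx;
} device_ops;

/* Escalates retry -> reset -> power cycle, stopping at the first success. */
recovery_result recover_device(const device_ops *ops, failure_kind kind)
{
    assert(ops != NULL && ops->retry_io != NULL);
    assert(ops->reset != NULL && ops->power_cycle != NULL);

    /* Retry is attempted only for IO errors; timeouts go straight to reset. */
    if (kind == FAIL_IO_ERROR && ops->retry_io(ops->ctx))
        return RECOVERED_BY_RETRY;
    if (ops->reset(ops->ctx))
        return RECOVERED_BY_RESET;
    if (ops->power_cycle(ops->ctx))
        return RECOVERED_BY_POWER_CYCLE;

    /* Nothing helped: the device must be replaced. */
    return MARKED_ABNORMAL;
}

/* Simulated device for experimentation: each flag says whether a step works. */
typedef struct {
    bool retry_ok, reset_ok, power_ok;
    int attempts; /* number of recovery actions tried so far */
} sim_device;

static bool sim_retry(void *ctx) { sim_device *d = ctx; d->attempts++; return d->retry_ok; }
static bool sim_reset(void *ctx) { sim_device *d = ctx; d->attempts++; return d->reset_ok; }
static bool sim_power(void *ctx) { sim_device *d = ctx; d->attempts++; return d->power_ok; }

/* Bundles the simulated hooks with the device they act on. */
static device_ops sim_ops(sim_device *d)
{
    device_ops ops = { sim_retry, sim_reset, sim_power, d };
    return ops;
}
```

Passing the operations as callbacks keeps the escalation order separate from the hardware details, so the same ladder could in principle serve both HDDs and adapter cards.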
[CE 1.14]
I was able to find software interfaces for retrying and resetting, but there was no ready-made interface for power cycling. I asked the hardware team for help, and they later provided me with an interface for power cycling. With the basics in place, I summarized all of the above work in a written specification of the design solution. Within the specification, I drew a flowchart and a UML sequence diagram to depict how the software logic worked. Then, according to the specification, I wrote a C program that was integrated into the storage system.
[CE 1.15]
I collected many failed and abnormal HDDs and adapter cards from development environments, and my testing colleagues and I installed them in the testing environment. The test results showed that the new release successfully detected errors both with and without obvious characteristics, restored the storage system promptly after service interruptions, and marked unrecoverable HDDs and adapter cards.
[CE 1.16]
I delivered the new release to customers, which marked the completion of the project. I received a great deal of positive feedback in the following years, and even where the feedback was less than positive, customers acknowledged that the storage system was much more stable than before. I therefore achieved the objective of the project, namely reducing the service interruptions that affected the stability of customers’ business systems. To achieve this goal, I defined the customer requirements, identified risks and issues in the storage system, proposed the design solution, reviewed and revised it, and implemented it as part of the storage system. Overall, carrying out this project improved my technical skills.