
中國項目管理資源網

IT Project Revelations: Lessons from the Titanic (Part 12)

2005/12/14 17:50:56 | Source: original

By Mark Kozak-Holland; translated by 楊磊

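To recap Titanic’s situation at that point: after the ship got under way again (see Part 10), the flooding turned into a catastrophe. At around 12:45 a.m., 65 minutes after the hull had grounded on the ice shelf, the captain ordered the officers to uncover the lifeboats and muster all passengers and crew on deck. Confused by unclear communication (see Part 11), the crew acted hesitantly and did not believe that anything had gone wrong. After all, signs of the disaster were not yet visible.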

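Today, disaster recovery means switching online operations to an alternate service environment. It takes many forms, though, from the simple recovery of a single application’s data and files within days to the relatively complex recovery of an entire business operation within minutes or hours. A disaster can take one of three forms: total (absolute and immediate), rapid and imminent, or slow and innocuous. Once a disaster is recognized, contingency plans are invoked and the disaster is declared.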

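On board Titanic, the disaster was of the slow and innocuous kind. Although a full recovery was no longer feasible, the captain and officers could still carry out a partial one. Without a formal evacuation or disaster recovery plan, however, the most they could do was give orders to keep panic and chaos from spreading as the signs of disaster became obvious. At design time (see Part 3), the envisioned disaster recovery scenario was to use the lifeboats to transfer passengers to another ship, which would carry them back to port. The lifeboats would ferry passengers back and forth, so only a small number of them was required. This scenario rested on the premise that Titanic could not sink, or at least could stay afloat on her own while waiting for help.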

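Today, when we develop a disaster recovery plan, we must consider every form of failure in the whole IT solution that could lead to a disaster. For example:
● Physical faults or tangible defects in the technology
● Design errors, including failures in system or application software design and problems in code
● Operational failures caused by operations staff through accidents, inexperience, inadequate training, failure to follow procedures or even deliberate malice

Environmental failures (such as those in power, cooling and network systems) can do as much damage to an operations center as natural disasters and terrorist acts.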

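Over the preceding 400 years, the vast majority of environmental factors relating to Atlantic crossings had been discovered, charted and documented. These ranged from year-round natural conditions (such as changing ocean currents) and weather (such as storms and hurricanes) to natural hazards (such as dense sea fog, ice fields, iceberg zones, dangerous coastlines and reefs). Yet a belief pervaded the Titanic project that this enormous, unsinkable iron ship could cope with anything nature presented.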

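In designing a disaster recovery plan, the scale of the disaster must also be considered. For example, when a relatively minor storm, fire or flood strikes, your customers expect some form of contingency service relatively quickly. Today you need contingency measures for all of these, and for larger disasters as well.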

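The costs associated with disaster recovery vary with the time taken, the cause of the disaster and the degree of recovery. As part of the plan, these costs should be carefully determined for each specific IT solution.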

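For Titanic, maritime convention called for a disaster recovery plan that took all of the above into account: bring everyone to the lifeboat deck, transfer them into lifeboats with seats to spare, lower the boats safely and have trained crew take them away. The lifeboat drill at Queenstown should have tested this last part of the plan (see Part 5).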

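Many serious problems in a production environment begin innocuously: at the outset, your organization may not even notice the problem or its consequences. For example, a non-critical part of the IT solution stops and goes unnoticed, but because of the interdependencies between components and applications, a knock-on effect soon spreads to other parts of the solution and can cause a major failure within a very short time.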

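On board Titanic, the lifeboats were clearly launched late, which suggests a reluctance to launch them until the last possible moment. The officers may have been slow to react because the ship was thought to be unsinkable, the gravity of the situation was not apparent and everything still seemed normal. In addition, only 83 of the 900 crew were true seamen (see Part 5), and only they knew the complex procedure for lowering a 30-foot lifeboat (capacity 65) to the sea 60 feet below. There were 16 such lifeboats, plus four smaller collapsible lifeboats (capacity 45) known as Engelhardts.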

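Conclusions

Today, many IT projects ignore disaster recovery entirely, on the grounds that it is out of scope and covered by a separate annual planning process. Yet besides establishing the business justification and designing the IT solution, the IT project itself also develops an in-depth understanding of the recovery required. Serious thought about the consequences of a disaster affecting the IT solution needs to happen early in the project so that the overall disaster recovery plan can be adjusted. The next installment continues to look at disaster recovery.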

Original text:

To recap Titanic’s situation: after the ship restarted (Part 10), the flooding became catastrophic. Around 12:45 a.m., 65 minutes after the initial grounding on the ice shelf, the captain ordered the officers to uncover the lifeboats and get the passengers and crew ready on deck. The crew, confused by unclear communication (Part 11), operated in a state of disbelief, refusing to accept that anything was wrong. After all, there were still few signs of the disaster.

In today’s world, disaster recovery means switching the online operation to an alternate service-delivery environment. It takes many shapes and forms, however, from the relatively simple recovery of data and files for a single application in a timeframe measured in days, to the relatively complex recovery of a complete business operation in a timeframe measured in minutes or hours. A disaster can take three forms: total (absolute and immediate), rapid and imminent, or slow and innocuous. When a disaster is recognized, contingency plans are invoked and a disaster is declared.

On board Titanic, the disaster was slow and innocuous. Although a full recovery was no longer feasible, the captain and officers could enact a partial recovery. But without a formalized evacuation or disaster recovery plan, the best they could do was to impose some order and prevent widespread panic and chaos once the signs of disaster became more obvious. The envisioned scenario for disaster recovery, at the time of the design (Part 3), was to transfer passengers by lifeboat to another ship, which would then deliver them to port. The lifeboats would ferry passengers back and forth to the rescue ship, requiring a much smaller total lifeboat capacity. This scenario was based on the perception that Titanic could not possibly sink, but would float in an incapacitated state waiting for help.

Today, in defining a disaster recovery plan, thought needs to be given to all the types of failure that could happen to an IT solution and lead to a disaster. For example:
· Physical faults or failures in the technology
· Design errors, including system or application software design failures and bugs
· Operations errors caused by operations services staff through accidents, inexperience, lack of due diligence or training, not following procedures or even malice

Environmental failures, such as those in power supplies, cooling systems and network connections, can be equally devastating, as can natural disasters and terrorist activities against the operations center itself.

Over the preceding 400 years, most environmental factors related to crossing the Atlantic had been observed, charted and documented. This included everything from year-round natural conditions, such as changing ocean currents, and weather patterns, such as storms and hurricanes, to natural hazards such as fogbanks, ice fields, iceberg areas, dangerous shorelines and rocky outcrops. However, a belief had evolved during Titanic’s project (Part 4) that anything nature could hand out could be handled by this enormous iron ship, which was considered practically unsinkable.

In defining a disaster recovery plan, the scale of disaster is important to consider as well. For example, if a relatively minor storm, fire or flood knocks out your online operation, your customers are going to expect some contingency of service relatively quickly. In today’s world, you need contingency for all of these, even the most catastrophic disasters.

The associated costs of disaster recovery vary, based on the window of recovery (time), the elements of the disaster and the degree of recovery required. As part of a plan, these costs need to be carefully determined specifically for the IT solution created.

For Titanic, under maritime convention there should have been a disaster recovery plan, defined for all the above situations, that brought everyone on board to the lifeboat deck, loaded them into the lifeboats with places to spare, lowered the lifeboats safely and set them adrift with experienced crews to handle them. The lifeboat drill in Queenstown should have tested the latter part of the plan (Part 5).

Many serious problems in a production environment can start so innocuously that, in the first hour, your organization might not even be aware of them or their implications. For example, a less-critical part of the IT solution might be "down" and go unnoticed. However, because of interdependencies between components and applications, there tends to be a "knock-on" effect, and other parts of the IT solution can very quickly become affected. This can lead to a catastrophic failure in a very short time.
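The knock-on effect described above can be sketched as a walk over a dependency graph. This is only an illustration under stated assumptions: the component names and the `depends_on` mapping are hypothetical and do not come from the article, and a real IT solution would draw this data from its own configuration records. The sketch uses a breadth-first search to list everything that is eventually affected when one component goes down.

```python
from __future__ import annotations

from collections import deque


def affected_components(depends_on: dict[str, set[str]], failed: str) -> list[str]:
    """Return the components impacted when `failed` goes down, in breadth-first order."""
    # Invert the mapping: for each component, record who depends on it.
    dependents: dict[str, set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    seen = {failed}
    order: list[str] = []
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        # Sort so the result is deterministic.
        for nxt in sorted(dependents.get(current, ())):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical IT solution: each component maps to the components it depends on.
SOLUTION = {
    "web_storefront": {"order_service"},
    "order_service": {"inventory_db", "report_cache"},
    "billing": {"order_service"},
    "report_cache": set(),
    "inventory_db": set(),
}

# A "less-critical" cache fails, yet three other components end up affected.
print(affected_components(SOLUTION, "report_cache"))
# ['order_service', 'billing', 'web_storefront']
```

Running this kind of check early in a project is one way to see which "less-critical" parts are in fact upstream of critical ones.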

On board Titanic there was a major delay in getting the lifeboats down, indicating a hesitation to launch the boats until as late as possible. It is likely the officers reacted slowly for several reasons: the ship was believed to be unsinkable, the gravity of the situation was not apparent and everything appeared normal at the time. Also, only 83 of the crew of 900 were actual mariners (Part 5) and therefore familiar with the somewhat complex drill of lowering a 30-foot (65-person) lifeboat 60 feet to the water. There were 16 of these lifeboats in total, plus four smaller collapsible lifeboats (45-person), or "Engelhardts."
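The figures in the paragraph above can be combined into a quick capacity check. The sketch below uses only the numbers stated in this article (16 boats of 65 places, four collapsibles of 45 places, 83 mariners in a crew of 900). The constant and function names are illustrative, and spreading the mariners evenly across the boats is an assumption made purely for the arithmetic.

```python
# Figures as stated in the article; names are illustrative only.
STANDARD_BOATS = 16
STANDARD_SEATS = 65
COLLAPSIBLE_BOATS = 4
COLLAPSIBLE_SEATS = 45
MARINERS = 83
CREW = 900


def total_lifeboat_seats() -> int:
    """Total places across all lifeboats."""
    return STANDARD_BOATS * STANDARD_SEATS + COLLAPSIBLE_BOATS * COLLAPSIBLE_SEATS


def mariners_per_boat() -> float:
    """Experienced mariners available per boat, assuming an even spread."""
    return MARINERS / (STANDARD_BOATS + COLLAPSIBLE_BOATS)


def mariner_share_of_crew() -> float:
    """Fraction of the crew who were actual mariners."""
    return MARINERS / CREW


print(total_lifeboat_seats())             # 1220
print(round(mariners_per_boat(), 2))      # 4.15
print(f"{mariner_share_of_crew():.1%}")   # 9.2%
```

On these figures, roughly four experienced mariners were available per boat, which helps explain why lowering all 20 boats was slow.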

Conclusions
Today, many IT projects completely ignore disaster recovery, treating it as something beyond their scope that is covered by a yearly IT planning process. Yet it is the IT project that determines the business justification and design of the IT solution, and that develops an in-depth understanding of the kind of recovery required. Serious thought needs to be given to the consequences of a disaster impacting the IT solution, and this needs to be done early enough in the project that adjustments to the overall disaster recovery plan can be made. The next installment will continue to look at disaster recovery.
