Stack Overflow: What are software practices in mission-critical industries (e.g. nuclear power plant)?
[+82] [39] Pavel Chuchuva
[2009-07-01 23:07:30]
[ safety-critical ]
[ http://stackoverflow.com/questions/1071762/what-are-software-practices-in-mission-critical-industries-e-g-nuclear-power-p ] [DELETED]

What software practices are being used in mission-critical industries where safety is paramount? For example nuclear power plant.

Update
Originally this question was: How would you develop software for a nuclear plant? I have changed it to save good answers. I'm also making this question community wiki. Please help to word it better!

An interesting question. Are there resources somewhere on what kind of testing and evaluation processes are used for software in such critical environments? - Janusz
(1) How is this not a real question? He's asking if developing software for the nuclear industry has any specific practices to it. How do you explain the lack of a question? - DeadHead
(11) +1 and +reopen. Interesting question - I'm sure lots of us could learn something. - RichieHindle
(3) If there were any, they would probably not be allowed to make that information public. - shoosh
(12) The software practices, at least a decade ago, were simple: No safety precaution or operating control would rely on software, since it couldn't be properly analyzed for failure modes. Software was only used for log keeping, communication facilities, etc. So there isn't much to learn in terms of how to build reliable SW. I think you might be better off looking to "Safety of flight" rated systems. - mpez0
(2) "Safety of flight" would be interesting in light of the recent Air France loss. - Nosredna
Well, for one thing, I'd hope that the plant is designed with physical systems that prevent run-away accidents from occurring even in the event of multiple failures. All leading designs for nuclear power plants ([Pebble Bed Reactor][en.wikipedia.org/wiki/Pebble_bed_reactor#Safety_features], [Light Water Reactor][en.wikipedia.org/wiki/Light_water_reactor]) work in this way. - TokenMacGuy
(8) Reeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeally carefully. - Wayne Koorts
(2) /me checks to see if the OP is Iranian... - annakata
(1) Honestly, it sounds like this question could only be answered by people with knowledge of how nuclear plants are constructed (e.g. what the requirements are, where failsafes are required, etc.). Without that, the best anyone can really offer is "very carefully". - Kevin Pang
Great question. - David P
Pavel, you may want to look at meta.stackoverflow.com/questions/1986/… and give your opinion about what should be done with this question and others like it. - John Saunders
@Janusz: Each jurisdiction, usually by state or province has their own regulatory guidelines that must be adhered to. The OHSA/HSE/NEB or your local equivalent can provide you with necessary documentation for this. - BenAlabaster
Whether the question was meant as a joke, or it was serious, it's resulting in some serious answers. I think it's a valid question if recast as "... for a mission/life-critical situation" - Euro Micelli
(1) Should be a wiki, since I doubt there's any way to give true definitive (i.e. non-speculative) answers. I imagine that anyone who could answer this question is probably bound by some sort of national security non-disclosure agreement not to discuss this sort of thing. - gnovice
[+80] [2009-07-02 00:04:13] JP Alioto

Well, not Java. According to the license agreement [1] ...

You acknowledge that Licensed Software is not designed or intended for use in the design, construction, operation or maintenance of any nuclear facility

[1] http://www.java.com/en/download/license.jsp

(45) I guess I will have to use PHP then ... - too much php
(24) Wait, so Sun is an inappropriate name for the company. lulz - Overflown
It's Oracle now, I think! - JP Alioto
(10) We had to rewrite an iTunes-based reactor control system for the very same reason - dbr
(8) @1alstew1 Sun is really pushing for the use of solar power as they have complete monopoly in that market. - too much php
(5) LOL That's a great restriction! Sun doesn't want to be blamed for Java being slow and crashing and blowing up half a state. - Chris Pietschmann
1
[+50] [2009-07-02 00:42:14] clemahieu

I would use Eiffel with Design by Contract for correctness. Coincidentally, it's already used in nuclear plants.

DbC specifies a precondition and postcondition for each method, checked by the runtime during development. It also includes class invariants that are checked before and after each method invocation during development. The preconditions, postconditions and invariants together form an exact specification of a module's interface.

http://dev.eiffel.com
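[Editor's note] The precondition/postcondition/invariant idea described above can be sketched outside Eiffel too. A minimal, purely illustrative Python sketch (Eiffel checks these natively; the `Tank` class and its limits are made up for this example):

```python
class Tank:
    """Toy coolant tank illustrating Design-by-Contract-style checks."""
    CAPACITY = 100.0

    def __init__(self):
        self.level = 50.0
        self._check_invariant()

    def _check_invariant(self):
        # Class invariant: the level is always within physical bounds.
        assert 0.0 <= self.level <= self.CAPACITY, "invariant violated"

    def fill(self, amount):
        # Preconditions: the caller must request a sensible amount.
        assert amount > 0, "precondition: amount must be positive"
        assert self.level + amount <= self.CAPACITY, "precondition: would overflow"
        old = self.level
        self.level += amount
        # Postcondition: the level rose by exactly `amount`.
        assert self.level == old + amount, "postcondition violated"
        self._check_invariant()

tank = Tank()
tank.fill(25.0)
print(tank.level)  # 75.0
```

In Eiffel these checks live in `require`, `ensure` and `invariant` clauses and can be compiled out for production; here they are ordinary assertions.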


(3) I believe this post deserves being voted much higher up, provided the Eiffel statement is correct :) - Cecil Has a Name
2
[+47] [2009-07-02 23:04:26] BenAlabaster

I haven't worked specifically in nuclear production, but I have ample experience in system development where environmental safety (and human safety, for that matter) is paramount. A lot of the development I have done in my career has been for use in this type of environment - whether Oil & Gas or Hydro-Electric production - and could even be used in nuclear facilities, although I've yet to have that final honour - thankfully, perhaps.

The large majority of these types of systems are developed using SCADA [1] systems and some form of HMI control system - which is what they call the GUI in industrial systems. This is usually an IDE built on a system designed purely for this purpose - CygNet [2], Wonderware [3], iFix [4] or FactoryLink [5] or similar.

Whenever you're coding for this type of environment, your first concern is failsafe. I will simplify to demonstrate my point (at the risk of being chastised by the SCADA community), but a system like this is controlled largely by hardware, with safety limits hard-wired, firmware controlled and then software controlled.

The hard-wired limits are the outside boundaries of safety. In the event that firmware or software fails and these limits are breached, the system automatically shuts down. For instance on an oil pipeline this might mean closing a valve on a well to prevent an explosion at one end, or may mean venting excess to atmosphere or a burner if necessary.

Firmware limits are usually predetermined safety limits, considered safe boundaries to which the system can be pushed in general use.

Software is then used by an operator who will tweak the system to get the best possible performance or to meet other business targets - i.e. most power, coolest operating temperature, optimal performance etc.

In the event that anything fails, the underlying system takes over and operates safely. This means that in the event that the application were to fail catastrophically, the firmware built into the hardware controls can still operate the system safely. In the event that the firmware fails, the hardware faults safely - i.e. shutting the system down to prevent environmental or human catastrophe.

[1] http://en.wikipedia.org/wiki/SCADA
[2] http://www.cygnetscada.com/
[3] http://global.wonderware.com/EN/Pages/default.aspx
[4] http://www.gefanuc.com/products/3311
[5] http://www.inotek.com/Catalog/usdata1cn.html
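[Editor's note] The layered hardwired/firmware/software defence the answer describes can be sketched in a few lines. This is a hypothetical Python sketch (the limit values and names are invented; real SCADA limits live in hardware and firmware, not in application code):

```python
# Layered limits: the software setpoint band sits inside the firmware
# band, which sits inside the hard-wired trip band.
HARDWIRED_TRIP = (0, 400)      # breach -> plant trips, no software involved
FIRMWARE_LIMIT = (10, 350)     # firmware clamps any command to this band
SOFTWARE_SETPOINT = (50, 300)  # operator tuning range in the HMI

def command_temperature(requested):
    """Return the value actually applied, honouring each layer in turn."""
    lo, hi = SOFTWARE_SETPOINT
    value = min(max(requested, lo), hi)  # software clamps first
    lo, hi = FIRMWARE_LIMIT
    value = min(max(value, lo), hi)      # firmware would clamp it anyway
    return value

def hardware_trip(measured):
    """Independent hard-wired check: trip the plant outside absolute bounds."""
    lo, hi = HARDWIRED_TRIP
    return not (lo <= measured <= hi)    # True -> shut the system down

print(command_temperature(500))  # 300, clamped by the software layer
print(hardware_trip(420))        # True, the hard-wired trip fires
```

The point of the layering is that each outer layer works even if every layer inside it has failed.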

Useful answer, good for mentioning hardware. Anyway, SCADA software is programmed in a high-level language too, for example C++ or lately C#/.NET, so SCADA programmers could be reading this post now looking for a new tool to fit future development. They are in a paradox: "Build my next SCADA with SCADA?" - Hernán Eche
3
[+42] [2009-07-12 16:48:39] Michelle

Each CANDU nuclear unit (reactor and turbine-generator) is controlled [1] with two independent, 100% redundant computers. An important design concept is that control systems and safety systems are kept completely independent.

Normal operator interface with the computer

The panel closest to the camera is the panel for one of the computers. The hand-switch selects which computer is controlling. One computer is controlling, one computer is running on standby. The orange lamps show which programs are executing. The CRT is displaying the contents of core memory for specific locations. L-3 MAPPS [2] is a current computer supplier.

Where I work, the computers are Varian V72s. Hardware defences include core parity checking logic, restart detection logic and peripheral failure sensing logic. A multiple-level interrupt scheme is used where internal interrupts are given a higher priority over external interrupts. The two computers communicate through a data link. A program fault on one computer gracefully degrades the control and automatically transfers control to the other computer. Each computer is powered by independent high-reliability power supplies. If both computers fail, the unit is shut down by the fail-safe insertion of neutron absorbers. In summary, the design concepts include redundancy, independence, graceful degradation and failing safe.

The computers are programmed in Assembler, using absolute coding. (The core location and contents of every word of coding is known by looking at the listing, without having to refer to a core map and then use octal arithmetic.) Breakpoint hardware is used for debugging. An executive program runs in core memory which schedules, executes and checks various other control programs and service functions. Core memory is also used for input/output validation and alarms. Fast periodic programs like the watchdog timer and reactor power control remain in core memory and run every half second. Slow periodic programs like boiler level control run every two seconds. Version control is rigorously applied, using both software and administrative methods. Changes are tested, installed on the controlling computer, and confirmed safe, before being installed on the standby computer.

[1] http://www.aecl.ca/Assets/Publications/C6-Technical-Summary.pdf?method=1
[2] http://www.mapps.l-3com.com/html/power/candu.html
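[Editor's note] The dual-computer failover logic in this answer (one controlling, one hot standby, fail-safe shutdown if both fail) can be sketched as a simple selection function. This Python sketch is illustrative only; the real scheme is implemented in hardware and assembler as described above:

```python
def select_controller(a_healthy, b_healthy, controlling):
    """Return 'A', 'B' or 'SHUTDOWN' from health flags and the current controller.

    A healthy controlling computer keeps control; a fault transfers
    control to the standby; if both fail, the unit shuts down fail-safe.
    """
    if controlling == "A" and a_healthy:
        return "A"
    if controlling == "B" and b_healthy:
        return "B"
    # Controlling computer has faulted: transfer to a healthy standby.
    if a_healthy:
        return "A"
    if b_healthy:
        return "B"
    return "SHUTDOWN"  # both failed -> fail-safe insertion of neutron absorbers

print(select_controller(True, True, "A"))    # A keeps control
print(select_controller(False, True, "A"))   # control transfers to B
print(select_controller(False, False, "A"))  # SHUTDOWN
```

Note the asymmetry with ordinary server failover: the bottom case does not retry, it fails safe.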

Thanks for that. Did you actually mean core == actual magnetic cores? - John Saunders
Probably. This is in a nuclear power plant. Consider what gamma rays do to semiconductor memory. - Windows programmer
Come to think of it, since this is in a Canadian nuclear power plant, how did they get permission to, and why would they want to, omit half of the official languages from mission-critical safety controls? - Windows programmer
(1) +1: Impressive answer - Stefano Borini
I would guess that core is not used because of its (alleged) resistance to nuclear energy. More likely, it was state-of-the-art and known to be reliable at the time the nuclear plant was designed. - Barry Brown
What were the reasons for using assembler? - Jason
An informative answer, and without losing your security clearance, I presume. :) - Christopher Harris
+1 and I need to comment to put this in my comment history :) - ee.
4
[+40] [2009-07-02 00:57:01] Gerard

Over a VPN. As far away as possible.


(1) What if it crashes and you need to reboot? - drikoda
(17) Get one of the chaps in the lead pants to press the reset button - Gerard
This is hard real-time software, so there isn't much chance that you could do anything if it crashes. - eKek0
5
[+19] [2009-07-01 23:50:24] Stefano Borini

I am not an expert. I just tell you what I heard.

For mission critical systems, the reference language is Ada. The development process is very strict and focuses on a test driven strategy with very small and highly tested (and stressed) routines. To address the potential of a crash, there is not only one system, but multiple redundant systems (in terms of sensors and processing units) which perform a "voting" procedure.

I don't know more than this.
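[Editor's note] The "voting" procedure across redundant sensors mentioned above can be sketched as a 2-out-of-3 voter. A minimal Python sketch (the tolerance value and readings are invented for illustration; real voters are specified far more carefully):

```python
def vote(readings, tolerance=0.5):
    """2-out-of-3 voter: return the median if at least two of the three
    readings agree to within `tolerance`, else None to flag a channel fault."""
    assert len(readings) == 3
    s = sorted(readings)
    # Two readings "agree" if they lie within tolerance of each other; the
    # median is always a member of any agreeing pair.
    if s[1] - s[0] <= tolerance or s[2] - s[1] <= tolerance:
        return s[1]
    return None  # no quorum: declare the channel failed

print(vote([100.1, 100.2, 250.0]))  # 100.2, the outlier is outvoted
print(vote([1.0, 50.0, 99.0]))      # None, no two channels agree
```

The idea is that a single failed sensor or processing unit is outvoted by the two healthy ones, and total disagreement is itself treated as a detectable fault.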


(8) It's Ada, not ADA. Ada is not an initialism or acronym, it was named after Ada Lovelace. - Bryan Oakley
fixed it, thanks - Stefano Borini
Link to Ada info, it sounds interesting and was built under contract of the DoD: en.wikipedia.org/wiki/Ada_%28programming_language%29 - rmoore
Wasn't it an Ada bug that caused the gazillion-dollar Ariane 5 to crash? - Hans Malherbe
Don't remember. I do remember, however, that it was an overflow of a 32-bit value stored in 16 bits. This led to a crash of the first system, kicking in the secondary system, which had the same bug. And then fireworks... - Stefano Borini
@Hans, actually, according to wikipedia, the problem was caused by disabling a language feature for performance reasons - Jader Dias
@Hans: It's more complicated. The critical bad decision was to reuse a software component from Ariane 4 without thoroughly checking to see if it worked on Ariane 5. Ariane 5 lifts off a whole lot faster than Ariane 4, so a position indicator overflowed a 16-bit number. - David Thornley
6
[+18] [2009-07-02 21:53:14] Mike Robinson

I'd probably use some combination of Windows ME and Visual Basic 6. Then I'd RUN LIKE HELL.


(1) Damn, you must be able to run reeeeeeeal fast. Must be like the "Bomb Disposal - If you see me running, try and keep up". - BenAlabaster
(3) We should send you to Iran to work on their program. - Dave Markle
This is NSFW I started to laugh! +1 - Hernán Eche
7
[+16] [2009-07-02 08:22:48] RichieHindle

A strange game. The only winning move is not to play.


(1) Couldn't resist: xkcd.com/601 - ya23
(3) Wait, even that loses. Perhaps you should program a chess game instead? - nilamo
but I'm crap at chess! - Breton
8
[+14] [2009-07-12 17:39:16] Pablojim

Not exactly a nuclear power plant, but similarly mission-critical, is software for manned space missions.

The NASA mission-critical software process works off 4 propositions:

  1. The product is only as good as the plan for the product.
  2. The best teamwork is a healthy rivalry.
  3. The database is the software base.
  4. Don't just fix the mistakes -- fix whatever permitted the mistake in the first place.

Consider these stats: the last three versions of the program, each 420,000 lines long, had just one error each. They must be doing something right.

There is a very good article explaining these propositions here:

"They Write the Right Stuff" [1]

Obviously this cost a lot of money!

[1] http://www.fastcompany.com/magazine/06/writestuff.html?page=0%2C0

Coincidence that they had one error each? Do you think it was the same error? Does their version control system finger the person who wrote that error? :P - BenAlabaster
Maybe they forgot to merge the fix back into trunk... - Pablojim
"Maybe they forgot to merge the fix back into trunk". Nope. They might make that mistake once but they wouldn't allow themselves to make that mistake a second time. "4. Don't just fix the mistakes -- fix whatever permitted the mistake in the first place." - Windows programmer
(2) #1 is an impossible, infinitely-recursive proposition - Aidan Ryan
9
[+12] [2009-07-02 00:04:12] Nick

A formal specification language [1] such as Z or Object-Z is a must. The software producing organization should have a high CMMI level as well, 4 or 5.

[1] http://en.wikipedia.org/wiki/Formal%5Fmethods

I'd go with formal methods too. I learnt Z and B a long time ago at university. At the time Z had a type checker (fuzz) but no tools for refining the specification to an implementation. The B-Method, on the other hand, had much better commercial tool support, allowing a specification to be turned into an implementation. I've never heard of Object-Z; I'll have to take a look. - pjp
Are you saying the CMMI requirement is a legal requirement? - Marco Mustapic
A high CMMI level is useful to prove that your team are doing the right thing almost all the time. These people work very well when put in an environment where correctness is paramount. But a high CMMI level in and of itself is not an indicator that software can be done well, only that it can be done reliably. On the other hand, people who write well-designed and correct software tend to write good software anyway. - Spence
10
[+10] [2009-07-01 23:51:48] a_m0d

I remember about a month or so ago there was an article on slashdot about how NASA developed defect free software (that's how it was referred to) - it had a specific example from NASA. They made sure that they had very clear specs (written IIRC using Z), and had lots of testing, etc. You can find one of their documents here [1].

I am trying to find the link, but can't see it atm. Will post it later when I can find it.

In general, I would say that the following would be important:

  • Make sure that you have very specific specifications (use Z or some other formal language)
  • You can never write too many tests (for such a high risk application)
  • Make sure that you choose the correct framework / language / development environment (e.g. IIRC the Java license does not even permit the use of Java in nuclear plants)


EDIT: marcc found this [2] link [3] (second link shows everything on one page), which explains a bit more about how NASA operate, but isn't the link I was looking for.

[1] http://www.hq.nasa.gov/office/codeq/doctree/871913B.pdf
[2] http://www.fastcompany.com/magazine/06/writestuff.html
[3] http://www.fastcompany.com/node/28121/print

(3) I think the article you refer to is at fastcompany.com/magazine/06/writestuff.html - marcc
Because NASA specs have such a great reputation for their clarity... given their Mars Orbiter debacle with the Imperial/Metric unit mishap... - BenAlabaster
@marcc: No, that doesn't seem like the one I was referring to, but will add it into my answer - a_m0d
11
[+10] [2009-07-02 00:50:13] too much php

Apparently C++ is good enough for managing nuclear warheads. See the Coding Standards for the Joint Strike Fighter [1] (PDF).

[1] http://www.research.att.com/~bs/JSF-AV-rules.pdf

(26) That's because C++ often tend to blow up, which makes it a perfect candidate for the application. ;) - kigurai
@kigurai - beautifully put :-)) - ldigas
12
[+10] [2009-07-09 19:44:42] JeffP

There are standards and certifications for software development of safety-critical systems:

  • DO-178B [1] (avionics)
  • IEC 61508 [2] (functional safety of electrical/electronic/programmable electronic systems)

...and so on. Most of these overlap heavily with variations on CMMI-like processes, restrictions on language usage, requirements for fault analysis, diagnostics, fail-safe states, etc.

So, in short, developing software for safety critical systems is not something you need to figure out on your own.

[1] http://en.wikipedia.org/wiki/DO-178B
[2] http://en.wikipedia.org/wiki/IEC%5F61508

13
[+9] [2009-07-02 00:18:39] ldigas

You wouldn't believe how much stuff runs on 30-year old software.

But that's beside the point - the answer you're looking for is that nuclear power plants, and pretty much most power plants and all other such facilities, don't rely on software for their running operations. Think of it... how old are some power plants, how long is their predicted lifetime (and a lot of them exceed that lifetime)... do you really wish to make them rely on buggy software on operating systems that change every 10 years?

No, when it comes to that kind of facility, you have physical controlling mechanisms (from valves up to relays, up to ...), with alarms, then some more valves, then some more alarms and human monitoring, and then maybe some software controlling of processes (but that software still can't bypass the valve) ... you see my point, probably.

How does that old saying go?

If architects built buildings like programmers build software,
the first woodpecker that came along would destroy civilization.


(7) "If architects builded buildings like programmers build software ..." - I'm kind of offended by this. Architects and programmers are not the same ... architects aren't usually asked to use a building material beyond its capabilities or completely restructure the foundation near project completion? I can't tell you how many times I've had to defend my software from a client who wants to add functionality to the UI specific to an external Add-On, or combine multiple data into 1 column, or combine unrelated databases into one. It's a sad state, but mostly because the material is so malleable. - John MacIntyre
(1) @John - I'm sorry you feel offended by it. I didn't say it, I just quoted it. Although, I agree with it. As far as your complaint goes, all jobs have their demands. Programmers are not the only ones with unrealistic (crazy?) client demands. - ldigas
(1) @Idigas - I would say that it's a little unfair to tarnish all programmers with the same brush. It's a gross overgeneralization that as a stereotype I don't think I could refute. However, at least in this instance I'd say that developers in this type of environment don't write your average business application and don't think or program in the same mindset. Every line of code is designed with the thought "what needs to happen if this line of code fails". Most general business programmers however don't consider this until after the fact. - BenAlabaster
@ldigas - ... sorry about the rant. ... I've attempted to come up with a more comprehensive response, but I just start ranting again. ;-) - John MacIntyre
@balabaster - It's all about the business commitment. You could develop some really robust software given the priority. Instead, most software is written with zero commitment even for end user testing. - John MacIntyre
(1) @balabaster - Most sayings of this type are based on stereotypes. But why does it bother you that much? I thought it was funny, until now. However, I really don't feel like starting a flame war, so if it offends you, I'll delete it. As for the other part of your comment, no, not really, they're not written with that thought in mind. You'd be surprised. People's ideas of how power plants work and operate are grossly misguided by the media. - ldigas
They're very boring places, actually. Normally, the most exciting thing that happens in there is when they get a new kind of dessert in the canteen. - ldigas
Er, didn't offend me. Don't delete it on my account. Was just stating my position, wasn't trying to be an ass or start a war.. nuclear or any other kind :P - BenAlabaster
@balabaster - :-) Aaah, wasn't that important. And I don't feel like writing it all over again anyways :-) - ldigas
@Idigas that's what Rollback is for ;) - BenAlabaster
@balabaster - uf, I forgot about that. I still think of these forums as modifiable usenet. - ldigas
14
[+5] [2009-07-01 23:59:08] docgnome

Very carefully. Or not at all.


15
[+5] [2009-07-02 00:32:30] The Matt

Depending on what the nuclear plant does with the material, be careful about what Google software you use in connection with your development work. By agreeing to their Terms of Service, you're also agreeing to:

(iv) not license, sell, provide or distribute the Software for use in connection with chemical, biological, or nuclear weapons or missiles capable of delivering such weapons

Sources:

https://registration.keyhole.com/download_earth_pro.html [1]

http://sketchup.google.com/download/license_pro.html [2]

http://toolbar.google.com/gmail-helper/terms_mac.html [3]

[1] https://registration.keyhole.com/download%5Fearth%5Fpro.html
[2] http://sketchup.google.com/download/license%5Fpro.html
[3] http://toolbar.google.com/gmail-helper/terms%5Fmac.html

(5) Because the first place I go for my missile guidance systems is Google Maps... :P - BenAlabaster
16
[+5] [2009-07-02 02:43:56] Tatiana Racheva

You might be interested in this document [1], which talks extensively about the current experience (as of 2004).

With regards to the programming languages specifically, here's paragraph 5.5.2 (p. 53):

Most evolutionary I&C [TR: Instrumentation and Controls] designs use some variant of the C computer language. Overall, there were no reported problems when the C language was used. By contrast, other software languages have had various issues. For Westinghouse implementations, the choice of the PUM-86 computer language proved to be too microprocessor-specific. Because of the limited use of this language, it proved difficult to expand its use across different applications. The lack of familiarity with the language among vendor and plant personnel also contributed to problems, such as reduced sources of support and limited data. Because of similar problems, the PL-1 and PASCAL languages have been replaced by C. ADA was adopted for use in the Temelin Class 1E diverse protection system because of its unique characteristics and its history of development and use by the U.S. Military. However, for the above-mentioned reasons, ADA will most likely not be used in future reactor designs.

So, it's C.

[1] http://www.nrc.gov/reading-rm/doc-collections/nuregs/contract/cr6842/cr6842.pdf

17
[+5] [2009-07-03 12:52:50] Jason S

You might wish to read the book SafeWare [1] by Nancy Leveson, it has some good case studies for software and dealing with preventing hazards.

[1] http://books.google.com/books?id=ZrZQAAAAMAAJ

Safeware is good. people should read it. - Tim Williscroft
18
[+3] [2009-07-02 00:10:03] John Saunders

Can you say what you are trying to learn from answers to this question?

Possibly you're wondering "could we do more to make sure our software doesn't break? What if we wrote software like they do for nuclear power plants"?

If so, then I don't think you've found the correct analogy. The cost of bugs in the systems of a nuclear power plant would be so high that it's possible software is not even permitted.

If this is what you're looking for, then I think you should look for examples of software where failure would be very expensive, but would not be life-threatening. Maybe systems that deal in millions of dollars per second, I don't know. But I think you want something achievable.

Chances are the differences aren't so much in QA, as in process, to make sure the bugs never get into the code in the first place.


(1) Actually I just wanted to hear some war stories but apparently there is no such thing as "software that controls nuclear plant". - Pavel Chuchuva
(2) War stories from people who design software for nuclear power plants? O_o - Eric
I like the irony of the implication of that question Eric... hahaha, that's awesome :D - BenAlabaster
19
[+3] [2009-07-02 22:41:00] Tim Williscroft

Use what the regulator allows. No, seriously, you do NOT get to choose sometimes. It's quite possible that people will suggest crazy things like commodity operating systems.

This is the same mess the US SCADA industry is in with little to no security.

So my money would be on locked-down Solaris X (it has quite nice real-time support, in addition to being like a bank vault).

Ravenscar Ada springs to mind for the code. As noted, you can't use Java. I've used Real-Time Java for weapon systems and it works really well; maybe one day Real-Time Java will be okay for nuclear plants, in which case it would be good.

Big ups on heavy formal methods, and on using a whole-system simulator built in Matlab or similar. No, I'm not smoking crack. The flight-system guys now use Matlab's code generator, at least for simulation.

And really heavy testing. Yes, Veronica, we will be expecting 100% scenario coverage.


Did you have a problem with the weapons not going off in time? :P - BenAlabaster
+1 for the mention of SCADA which nobody else here seems to have a clue about. - BenAlabaster
Weapon release was on time, every time. First shot hits the target or your bullet back. - Tim Williscroft
20
[+2] [2009-07-02 09:02:12] ojblass

I think I would spend 98.99999997% of my time writing test cases and testing my code. I would spend 1% writing code and the remainder on StackOverflow.


(4) Only 0.00000003% of time in StackOverflow? With 220 8h work days per year, that would mean about 1 second of StackOverflow every five years... what a sad perspective :) - Rene Saarsoo
21
[+2] [2009-07-24 00:00:01] David Plumpton

During the landing of the first space shuttle mission (STS-1) all five redundant computers failed (due to a hardware fault). Mission commander John Young took over manual control for the landing.

So the lesson is... always have a manual override.


22
[+1] [2009-07-02 00:25:29] xeon

Don't use Java. It is not approved for use in nuclear power plants.


23
[+1] [2009-07-02 00:34:56] Reginaldo

Erlang [1], with reported cases of 99.9999999% availability, would be a serious candidate language.

I'd also count on a very experienced team and a lot of effort on code coverage tests as well as stress tests.

[1] http://www.pragprog.com/articles/erlang

(4) availability does not imply correctness - Eric
That's why I'd also dedicate lots of effort on testing. - Reginaldo
(2) Nuke plant software needs to be both highly available and extremely fault tolerant. Doesn't Erlang achieve its high availability by shooting errant processes in the head? I have to question the usage of a functional language in software where the whole point is to cause side effects. - Hans Malherbe
24
[+1] [2009-07-02 08:48:10] John Nolan

Use a formal method, so that you can mathematically prove you are not going to fail.

There are methods such as the B-Method [1] that are used in safety-critical systems, notably the Paris metro.

[1] http://en.wikipedia.org/wiki/B-Method

It's nice to take a specification and refine it to the implementation proving correctness as you go along. The only trouble is making sure that the specification is correct in the first place. - pjp
25
[+1] [2009-07-02 21:27:08] ya23

I'd spend most of the time writing a detailed spec, as the guys at NASA do.

I found this article [1] very interesting.

[1] http://www.fastcompany.com/node/28121/print

That's all well and good until they realise that half was written using imperial measurements and half was written in metric... did them a lot of good then :P - BenAlabaster
26
[+1] [2009-07-02 21:44:16] User

I would only work via telecommute. Preferably from some other continent.

And for the design, I would definitely recommend a hardware switch that cuts out PC control and puts everything on manual.

I wonder if there are developers that work on nuclear plant software among stackoverflow.com audience.

There are surely such developers somewhere, like the guys working at CERN (haven't seen them alive, though).

There should also be developers who work on the hadron collider. They have likely already made a few bugs there. Since the thing crashed after a few days of operation, there is likely a memory leak. I mean, I used to find a few things on my desktop in Germany somewhat shifted from their original position in the direction of Switzerland (a micro black hole or whatever it was they created but did not properly dispose of). Scary...


27
[+1] [2009-07-02 22:19:44] aaaaaa

For industrial control systems using PLCs, there are tools available that can analyze every possible state of the software. By using this data (in the form of state graphs, for example) you can see if there are any dead ends or other strange situations, and thus rewrite the program to prevent those states from even existing.

Disclaimer: I really have no idea how nuclear software is made, but I believe that such tools would be really helpful for this kind of application.
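[Editor's note] The state-exploration idea in this answer can be sketched as a reachability search. A toy Python sketch (the controller model `TRANSITIONS` is invented; real PLC analysis tools work on the actual ladder/structured-text program):

```python
# Enumerate every state reachable from a start state of a tiny controller
# model and flag dead ends: states with no outgoing transition, which the
# program would never leave once entered.
TRANSITIONS = {
    "idle":    ["running"],
    "running": ["idle", "fault"],
    "fault":   [],  # dead end: no way out once faulted
}

def reachable_dead_ends(start):
    seen, stack, dead = set(), [start], []
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        successors = TRANSITIONS[state]
        if not successors:
            dead.append(state)
        stack.extend(successors)
    return sorted(dead)

print(reachable_dead_ends("idle"))  # ['fault']
```

Finding `fault` listed tells you to rewrite the program, e.g. to add a supervised reset path, so the dead-end state cannot trap the system.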


28
[+1] [2009-07-02 22:20:48] Cade Roux

I'm sure a lot of the "software" actually used in nuclear power plants is on Windows, like most businesses: Excel and Word and Acrobat and Outlook. I'm sure they have boring old CRUD applications for rod inventory.

Nuclear power plants, like most large systems, are going to be made up of a combination of digital and analog controls and embedded and general purpose computers. The components will be programmed in a variety of languages and the choice in each case is going to be dictated by the individual requirements.


+1 for boring CRUD applications. Systems like the control rods aren't managed on-the-fly by advanced computer systems but actually changed one at a time by a human operator in 6 inch increments (2% of total travel) every two weeks or so. Nuclear Power Plants operate only under extremely strict design constraints and known conditions. There is no guesswork involved in any of the control systems. If necessary safety-related instrumentation fails there is failsafe hardware to insert all rods into the core and get things into a known configuration. - nvuono
29
[+1] [2009-07-02 22:25:00] eKek0

Definitely not with agile, and yes with a waterfall process.

Among the tools I would use, there would certainly be formal tools that I could verify with some mathematics, like Petri nets.

If the software had to run on Windows I would write it in Delphi, just because Java doesn't allow it.


30
[+1] [2009-07-06 16:02:43] ConcernedOfTunbridgeWells

This posting [1] discusses safety critical software with quite a lot of fan-out links.

[1] http://stackoverflow.com/questions/243387/best-language-for-safety-critical-software

31
[+1] [2009-07-06 16:20:36] PaulG

CERN uses LabVIEW to control the LHC. If LabVIEW is good enough to recreate black holes, Higgs bosons and the Big Bang, I'm sure it can handle wimpy ole' nuclear fission. :)


Did it manage to create Higgs Bosons? - BenAlabaster
Not yet. Bozos keep breaking the damn thing. :) - PaulG
32
[+1] [2009-11-25 04:44:12] iskandar

No garbage collection, no dynamic memory allocation, no multitasking and no multithreading. Everything must be stupid-safe.


you're implying that making things stupid makes them safe? - Christopher Harris
33
[+1] [2009-12-31 02:44:40] none

Design Diversity (n-version programming):

Quoting " Choosing Effective Methods for Diversity [1]":

Design diversity is a popular defence against design faults in safety critical systems. Design diversity is at times pursued by simply isolating the development teams of the different versions, but it is presumably better to “force” diversity, by appropriate prescriptions to the teams. There are many ways of forcing diversity.

Quoting " Simulating Specification Errors and Ambiguities in Systems Employing Design Diversity [2]":

In n-version programming different software versions, written to the same specification but developed independently, execute in parallel. It is imperative that there is no communication between the teams responsible for developing the different versions. Quarantining the different teams is essential such that misunderstandings from one team do not affect the understanding of other teams. But quarantining teams is not always enough: uncorrelated faults in distinct versions can lead to identical failures. [...] Many people have written off n-version programming as a dead approach to attaining high integrity software because of the n-version problem. But to our amazement, n-version programming is alive and well in several different safety critical domains, and it is particularly popular outside of the United States.

Quoting " Research on Diversity and Software Fault Tolerance at the Center for Software Reliability [3]":

The use of diversity – doing things differently, in two or more ways, to protect against the failures of single procedures – has been ubiquitous in safety-critical industries for decades. In many of these applications, the benefits have been regarded as ‘obvious’, and it is only in more recent years that there have been formal models and studies of efficacy. [...] More recently (in the past 25 years) there has been considerable interest in the use of diversity in software-based systems. A driver for this research was the need for very highly reliable software, coupled with the realisation that there were severe difficulties in making a single version of a program very reliable (e.g. via reliability growth from extensive testing and debugging) (Miller, Morell et al. 1992; Littlewood and Strigini 1993). The use of multi-version software, developed independently and adjudicated at run-time, seemed a possible way out of the difficulties: early work in the field was probably motivated by an analogy with hardware redundancy. [...] There are some early applications of software diversity that appear to have been successful: examples include critical flight-control computers on Airbus aircraft (Briere and Traverse 1993); various railway signalling and control systems, see e.g. (Hagelin 1988). After experiencing many years of operational use, there seem to be no reports of catastrophic failure of these systems attributable to software design faults.

Quoting " Digital Avionics: A Computing Perspective [4]":

Software is intangible, so it cannot exhibit degradation faults. Rather, software failures are necessarily due to design faults that cannot be masked through simple replication due to the lack of failure independence. Design diversity is a popular technique that attempts to overcome this difficulty by employing arrays of redundant components, each with a dissimilar design or implementation. Airbus and Boeing both use design diversity in their flight control systems but in different ways.

Airbus employs a design diversity technique called multiversion programming or N-version programming [3]. In multiversion programming, several system implementations are prepared from the same set of requirements by different developers under the presumption that the designs prepared by each developer will be independent—that is, the probability that one implementation will fail on a particular set of inputs given that another implementation has failed on those inputs is equal to the probability of the implementation’s failing alone. The various implementations are then assembled into a classical redundancy architecture in which they are run in parallel on the same inputs and their outputs are passed into a voter to check agreement. If a design fault is activated in one of the implementations, then, according to the theory, it is unlikely that the other implementations will also possess the fault and should continue to function. Clearly, the assurance that can be placed on multiversion programming rests on the assumption of design independence, and evidence exists that this assumption does not hold for all types of software systems [7].

The Boeing 777 FCS was not developed through multiversion programming but rather by employing diversity in the microprocessor architecture. Boeing compiled the software for the 777 FCS for multiple machine architectures and runs each version in tandem during system operation. This approach allows the 777 FCS to tolerate design faults in a specific microprocessor as well as those introduced during compilation. It does not, however, provide any resilience to faults resulting from errors in the common source code from which the versions were built [13].
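The voting architecture these quotes describe can be sketched in a few lines: run independently developed versions on the same input and accept the strict-majority answer. The three "versions" below are invented stand-ins for independently written implementations.

```python
# Hypothetical sketch of N-version majority voting. Real voters run the
# versions in parallel on redundant hardware; this only shows the logic.
from collections import Counter

def vote(versions, inputs):
    """Run every version on the same inputs and return the strict-majority
    output; raise (voter declares failure) if no majority exists."""
    outputs = [v(*inputs) for v in versions]
    winner, count = Counter(outputs).most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError("no majority agreement: %r" % outputs)
    return winner

# Three independently written absolute-value routines, one faulty:
v1 = lambda x: x if x >= 0 else -x
v2 = lambda x: abs(x)
v3 = lambda x: x  # design fault: forgets to negate negative inputs

print(vote([v1, v2, v3], (-5,)))  # 5: the faulty version is outvoted
```

Note how the scheme's guarantee rests entirely on the fault being confined to a minority of versions, which is exactly the design-independence assumption the quoted papers question.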

[1] http://www.springerlink.com/content/ejcg5kruncye34uu/
[2] http://www.cigital.com/papers/download/pnsqc97.pdf
[3] http://www.csr.city.ac.uk/projects/diversity/
[4] http://www.computer.org/portal/web/readynotes/sample-digital-avionics-a-computing-perspective

Could you please elaborate on why these quotes are interesting? - John Saunders
34
[0] [2009-07-02 22:02:22] Kevin Pang

Remotely, if possible.


35
[0] [2009-07-12 17:54:58] Tiberiu

I understood that for some past space missions the CLIPS expert system was used to control launching, etc., but programming languages like LISP, which keep the flow of state explicit, are also considered to be safe for safety-critical applications.


36
[0] [2009-07-12 17:58:45] cwap

Not really a practice, but a strong OS might be a start: http://en.wikipedia.org/wiki/VxWorks


37
[0] [2009-07-15 03:25:26] Andrew Grimm

I once heard at university that some coders aren't allowed to compile or run their own code. They have to work out for themselves whether it works as it ought to, and only then does it get compiled and tested.


38
[0] [2009-08-22 18:25:58] Blair

Great post - I enjoyed reading it. Here is a great Canadian standard for nuclear power plants: N290.14-07, "Qualification of pre-developed software for use in safety related instrumentation and control applications in nuclear power plants". It describes which standards to use when deciding whether software can be installed in safety systems. For example, if you can get something that is IEC SIL level 4, then yes, it's good to go into the safety system.


39