Unbreaking

Current mainstream enterprise-architecture [EA] is broken, more broken and yet more broken, and has real problems with wickedness. But what can we do about this? How can we mend EA, make it unbroken?

As in those posts linked-to above, I’d suggest that the key reasons why EA is so broken ultimately come down to misuses of the linear-paradigm – Taylorism and suchlike – in contexts for which it is inherently unsuited. What we need instead to understand about enterprise-architecture is that:

it’s about whole-context systems and the holistic view
it’s about wicked-problems
it’s about human relations
it’s about linking everything together: people, process, purpose, and much else besides
it’s about managing and continually ‘re-solving’ inherent conflicts
it’s about getting the right balance between stability, adaptability and resilience

(And yes, there is indeed some IT somewhere in there too – no doubt about that. Yet it’s rare that IT is the only significant concern in an enterprise-scope architecture, and not all that common that it’s even the most-significant issue, either. It’s a factor, but never the factor: a distinction that’s extremely important, yet somehow all-too-often forgotten?)

So, how do we manage those inherent conflicts? More to the point, as enterprise-architects, via what architectural means can we do this? Here are two real-life examples.

Conflict: Engineers versus scientists

This one was at a research-laboratory some years back. The main focus there was science, some of it pure-research, some of it applied, developing technologies for immediate real-world problems. Their work tended to be exploratory and experimental, over timescales measured in years. There was a small amount of engineering-work done on site, but primarily to design and build test-rigs for the scientists’ experiments.

Then following a change in organisational policy, an engineering unit was moved from another site into the laboratory. In a sense some of their work was similar to the scientists’, but it was highly structured, oriented around predefined tests, with concrete results expected and delivered within days. Senior-management had assumed that both types of teams – engineers and scientists – would all get along well with each other: “after all”, said one executive, “they’re all technical people, aren’t they? Should get on like a house on fire!”

They did indeed get on like a ‘house on fire’: panic, flames, explosions, the occasional scream… 🙂

Management’s first response, of course, was to pretend that the conflict wasn’t happening. When doing-nothing didn’t work, their all-too-predictable next response was to try to find someone to blame – which just made the conflict worse. Instead, what we were able to show was that it was no-one’s fault: the conflict was not in the people as such, but inherent in the nature of the work and the implied relationships between those types of work. On the surface it seemed to be a ‘people-issue’, but the underlying driver was an architectural relationship – which meant that we had real options to resolve it, or at least reduce the tensions, via architectural means.

To illustrate this, we used a reworked version of the classic Chinese ‘Five Elements’ model, much like a continuous-cycle version of the well-known Tuckman Group Dynamics project-lifecycle sequence ‘forming, storming, norming, performing, adjourning’:

The scientists’ focus was very much in the ‘forming’ or Purpose phase: new ideas, new developments, often abstract, often with an emphasis on the far-future. By contrast, the engineers’ focus was just as strongly centred on the ‘performing’ or Process phase: doing just this one thing, by the book, right here, right now, no variance allowed, the immediate concrete present.

Once we phrased it in that way, it was obvious why it was inevitable that we would have a full-scale fight on our hands – and also why the main focus of contention was around scheduling, with the scientists needing to change things right down to the last moment, and the engineers needing things stable enough to get anything done. And perhaps the greatest problem was that the engineers were winning the fight, disrupting work right across the whole of the laboratory. Yet the model showed that this too was all but inevitable: in the classic Chinese terms, the scientists were ‘Wood’ (forming, the new), the engineers ‘Metal’ (performing, the now) – and whilst Wood eventually fractures Metal, Metal is much quicker at chopping down Wood…

Architecturally, the focus of the fight was between those two domains: forming/Purpose versus performing/Process. What the model showed us was that we needed to get the flow moving around the frame in the mutually-supportive sequence. In other words, we needed to place much stronger emphasis on the ‘missing’ phases or domains, and the links between them:

storming/People (‘Fire’): acknowledge that the conflict is inherent in the relationship between scientists and engineers, and is therefore ‘no-one’s fault’; establish explicit mechanisms for ongoing arbitration and conflict-resolution, ultimately responsible to and reporting direct to executive level
norming/Preparation (‘Earth’): redesign the scheduling-system to provide stronger support for engineers’ need for stability, based around an explicit but extensible ‘service-catalogue’, but with space allocated in the schedule to allow scientists to ‘slot in’ quick experiments built on the service-catalogue
adjourning/Performance (‘Water’): redevelop the engineers’ results-outputs into structured formats based on the service-catalogue, to make the results more accessible and reusable within other scientists’ work.

Or, in more graphic form, in terms of the Five Elements model:

Wood feeds Fire feeds Earth feeds Metal feeds Water feeds Wood: it worked. It took the anger right out of most of the arguments, everyone felt much more supported by everyone else, and whilst the conflicts did still remain – because they always do, in any inherent wicked-problem – their fire was appropriately contained within that explicit and respected space for conflict-resolution.
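For those who prefer to see the cycle as a data-structure, here is a minimal sketch in Python. The element, Tuckman-phase and domain pairings are the ones used in this post; the names `Phase`, `CYCLE`, `feeds` and `missing_phases` are invented purely for illustration, and are not part of any published toolset.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Phase:
    element: str   # classic Five Elements name
    tuckman: str   # matching Tuckman phase
    domain: str    # matching architectural domain, as used in this post


# Ordered in the mutually-supportive ('feeds') sequence.
CYCLE = (
    Phase("Wood", "forming", "Purpose"),
    Phase("Fire", "storming", "People"),
    Phase("Earth", "norming", "Preparation"),
    Phase("Metal", "performing", "Process"),
    Phase("Water", "adjourning", "Performance"),
)


def feeds(element: str) -> str:
    """Return the element that the given element feeds (next in the cycle)."""
    names = [phase.element for phase in CYCLE]
    return names[(names.index(element) + 1) % len(names)]


def missing_phases(emphasised: Iterable[str]) -> List[Phase]:
    """Return the phases that currently get no emphasis, in cycle order."""
    present = set(emphasised)
    return [phase for phase in CYCLE if phase.element not in present]


if __name__ == "__main__":
    # The laboratory: scientists in Wood, engineers in Metal.
    for phase in missing_phases({"Wood", "Metal"}):
        print(f"{phase.tuckman}/{phase.domain} ('{phase.element}')")
```

Run against the laboratory example, it lists the same three 'missing' domains that we had to strengthen: People, Preparation and Performance.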

(For IT-folks: the scheduling-system involved some redesign of the existing job-scheduling application, whilst the results-output was a new development of an object-database to replace a random mixture of Word-templates and Excel spreadsheets. So yes, some significant new IT there. Yet note that the IT was in context of and worked in conjunction with other non-IT-based human processes in conflict-resolution and the like: the IT made almost no sense as such – in fact had no real business-case – when viewed solely on its own.)

In short, a structural means to continually ‘re-solve’ a wicked-problem, via rethinking the architecture of the overall context.

Conflict: Stability versus agility

Although this specific wicked-problem relates to ‘stability and control’ versus ‘agile response to change’, it’s a pattern (or anti-pattern, rather) that comes up in quite a few different guises – not least the infamous ‘IT/business divide’ – so I’ll adapt this one as a more generic version here.

Let’s start this with the SCAN sensemaking/decision-making framework:

A quick summary:

vertical axis: arbitrary scale of time-available-before-decision-for-action
dotted-line divider on vertical-axis: ‘considered’ or ‘rational’ decision-making (above), versus ‘run-time’ (often emotive) decision-making (below)
horizontal axis: arbitrary scale of modality – in effect, the extent of ambiguity and uncertainty in the context
red-line divider on horizontal axis: the Inverse-Einstein Test: nominally-certain (left/blue), versus inherently-uncertain (right/red)
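As a rough illustration only, the two axes and their dividers can be read as a simple four-way classification. In the sketch below, the quadrant labels (Simple, Complicated, Ambiguous, Not-known) are the ones this post uses a little further on; everything else is an assumption made for the example, in particular the normalised 0-to-1 scales and the 0.5 boundaries, since both scales in SCAN are explicitly arbitrary.

```python
def scan_quadrant(
    time_available: float,
    uncertainty: float,
    time_boundary: float = 0.5,         # assumed position of the dotted-line divider
    uncertainty_boundary: float = 0.5,  # assumed position of the red-line divider
) -> str:
    """Place a context in one of the four SCAN quadrants.

    time_available: 0.0 (no time, act right now) to 1.0 (plenty of time to consider).
    uncertainty:    0.0 (nominally certain) to 1.0 (inherently uncertain).
    """
    considered = time_available >= time_boundary     # above the dotted line
    uncertain = uncertainty >= uncertainty_boundary  # right of the red line
    if considered:
        return "Ambiguous" if uncertain else "Complicated"
    return "Not-known" if uncertain else "Simple"


if __name__ == "__main__":
    print(scan_quadrant(time_available=0.9, uncertainty=0.1))  # top-left
    print(scan_quadrant(time_available=0.1, uncertainty=0.9))  # bottom-right
```

In real practice the boundaries are not fixed numbers: where each divider sits is itself part of the sensemaking.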

The SCAN frame provides a context-space map in which we can model the various ideas, experiences and assumptions, and the conflicts that arise as these various assumptions interact with each other. In effect, what we have in the stability-versus-agility wicked-problem is several different conflicts all scrambled together into the same context:

one group (typically ‘the IT department’) focusses its efforts on Complicated analysis to develop Simple ‘solutions’ (‘automated processes’) for real-time use [in SCAN, top-left to bottom-left]
another group (typically referred to by IT as ‘the business’ – in essence, ‘anything not-IT that might affect IT’) focusses its efforts on resolving the Not-known in real-time actions, and experiments and methods to deal with the Ambiguous [in SCAN, bottom-right to top-right]
in their internal operations and structures, IT-systems (and machines in general) need and expect Simple stability and predictability
real-world contexts inherently include Not-known uncertainty and unpredictability
the Complicated takes time to deliver usable methods – and its methods are only usable for Simple known contexts
the Not-known has no-time, but tries to grab some time to cope with and make sense of the Ambiguous
stability demands Waterfall-type governance, to maintain that stability
agility needs Agile-type governance, to support the required adaptability
what each party is trying to do or deliver will not resolve the other party’s needs [in SCAN, primarily top-left to bottom-right – ‘considered’ certainty versus real-time uncertainty]
every party in these conflicts tends to take the Simple view that they alone are right, and therefore everyone else is wrong…

Hardly surprising, then, that there would be some difficulties here… So here’s a typical example of this anti-pattern in real-world practice, as described by Donald Lawn in a LinkedIn conversation the other day:

the executive-team is struggling to make sense of what’s happening in real-time within their organisation
there are also significant problems with ambiguities and miscommunications arising from lack of ‘single source of truth’ across the enterprise
an IT-vendor proposes building a central data-warehouse with an ‘executive dashboard’ to show real-time activity
to make the data-warehouse work, all existing systems must be linked to it via ETL (extract-transform-load)
because it has to represent all real-time activity, local ‘shadow-IT’ in small local-databases and spreadsheets must be suppressed, and their data brought into centralised control
the centralisation and re-implementation place excessive load on the IT-department, who drop far behind schedule on planned delivery of the new system
when the system comes on-line, it satisfies the executives’ need, but falls far short of what is actually needed at the front-line
because of the fragility and business-criticality of the data-warehouse, all proposed changes to any data-systems have to go through Waterfall-type governance with the overworked IT-department
front-line management experience the IT-department response as too slow for their needs
front-line management once again develop local shadow-IT to satisfy local needs, without links to the central data-warehouse
the data-warehouse slowly becomes out-of-date, giving a misleading picture of the business
the executive-team struggles to make sense of what’s happening in real-time within their organisation
there are also significant problems with ambiguities and miscommunications arising from lack of ‘single source of truth’ across the enterprise
an IT-vendor proposes building a new central data-warehouse with an ‘executive dashboard’ to show real-time activity…

In a typical large organisation, this sad cycle takes around 5-10 years per iteration, at a cost of anywhere from millions to billions of dollars and a heck of a lot of wasted effort by highly-skilled people – and no-one is happy at any stage of the process (except perhaps the IT-vendors and external-consultants).

What works instead is to respect it as a wicked-problem – in other words, use tactics that acknowledge and work with the inherent conflicts and ambiguities in the context.

A typical tactic we use to resolve this is ‘backbone and edge’: the ‘backbone’ provides a stable frame to support agility at the ‘edge’. First:

acknowledge that the conflict exists
acknowledge that the conflict arises naturally from the nature of the context, and that therefore no-one is to blame for the existence of the conflict
establish explicit frameworks and mechanisms for conflict-resolution across all areas engaged in and/or affected by the conflict
establish frameworks and mechanisms for continuous ‘re-solution’ of the wicked-problem itself

For the latter, architecturally, we would always recommend a service-oriented approach to the architecture – in other words, build on an assertion that everything in the enterprise is or enacts or represents a service. If we do this, then we can use the same conceptual framing throughout the architecture, regardless of what type of assets we’re dealing with at any point – physical, information, relationships, brands, whatever – and regardless of who or what the agent might be – machines, IT-systems, real-people, or any combination of those. This framing includes concepts such as asset, service, capability (the ability of an agent to do something), functions and function-interfaces (such as APIs), as per Enterprise Canvas.

A ‘backbone’ (or ‘platform’) is a stable cluster of related services that can be relied upon to deliver the same to every other service that needs that whatever-it-is. (This isn’t just about information-platforms, by the way: the respective whatever-it-is could be, well, whatever it is…)

Since other services do depend on it, it does need to be stable, yet only for as long as it does need to be stable. And the same is true for every interface for every service – which itself is one of the key drivers for the wicked-problem. So what we end up with is a spectrum of stability, which in turn often relates most to how many other services rely on a given service, and how much and for how long they will rely on it. At one end we have backbone-services, which must be stable, because they’re shared across the whole enterprise, often over long periods of time; at the other end, contextual-services (‘shadow-IT’ brought out of the shadows), which tend to address a single and often short-term concern in one specific area of business; and between them, domain-services, which tend to be used as a shared-reference for a single business-area, but not necessarily by anyone outside of that domain.

Given that spectrum of stability, we therefore need a matching spectrum of governance, from Waterfall to Agile – the respective type of governance matched to the respective required stability, reliability, scope-of-use and directionality for each inter-service interface. A visual summary of the interfaces and governance for an information-system might look like this:
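One hedged way to express that matching in code is sketched below. The three tiers and the two ends of the governance spectrum come from the description above; the classification rule itself (more than one business-area means backbone, more than one dependant means domain) is a deliberately crude assumption for illustration, as are all of the names and the 'intermediate' label for the middle of the spectrum. A real assessment would also weigh how much, and for how long, others rely on the service.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BACKBONE = "backbone"
    DOMAIN = "domain"
    CONTEXTUAL = "contextual"


# Governance style per tier. The two ends are from the post; the middle
# label is a placeholder assumption for 'somewhere between the two'.
GOVERNANCE = {
    Tier.BACKBONE: "Waterfall-type",
    Tier.DOMAIN: "intermediate",
    Tier.CONTEXTUAL: "Agile-type",
}


@dataclass(frozen=True)
class Service:
    name: str             # hypothetical service name
    dependants: int       # how many other services rely on this one
    business_areas: int   # how many business-areas those dependants span


def classify(service: Service) -> Tier:
    """Crude, assumed heuristic: wider reliance implies more required stability."""
    if service.business_areas > 1:
        return Tier.BACKBONE    # shared beyond a single business-area
    if service.dependants > 1:
        return Tier.DOMAIN      # shared reference within one business-area
    return Tier.CONTEXTUAL      # single, local, often short-term concern


def governance_for(service: Service) -> str:
    """Match the governance style to the stability the service needs."""
    return GOVERNANCE[classify(service)]


if __name__ == "__main__":
    for svc in (
        Service("customer-master", dependants=40, business_areas=6),
        Service("lab-results-store", dependants=5, business_areas=1),
        Service("campaign-spreadsheet", dependants=0, business_areas=1),
    ):
        print(f"{svc.name}: {classify(svc).value}, {governance_for(svc)} governance")
```

The point of the sketch is only that governance is looked up from the required stability, not assigned by which department happens to own the service.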

The backbone is essential to the enterprise, yet it’s also one of the greatest constraints on enterprise-agility – so in practice we do need to keep it as simple and compact as possible. To identify what does need to be in there – and, for that matter, the respective domain-repositories – we’d use techniques such as layered modelling with Enterprise Canvas, together with fitness-landscape and variety-weather maps. There are several posts here that go into further depth on this, such as:

Should be enough for you to get started with, anyway.

Any comments or suggestions on this, anyone?