Week 1 · Month 1 Foundation · Tools

The real AI landscape beyond Claude and ChatGPT

Most professionals have used a handful of AI tools without understanding what distinguishes them at a mechanism level. This week maps the full AI ecosystem — models, copilots, agents, and the stack beneath them — and builds the evaluation discipline to make defensible recommendations in your professional context.

LO 1.1 Explain what major AI tool categories are designed for and where they fall short
LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline
LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
LO 1.4 Explain Mollick's co-intelligence framing for organisations — centaur and cyborg modes, governance implications
How this week works
This module runs sequentially — each activity builds on what came before. Discussion activities involve your tutor; your responses are saved before each discussion begins. Assessment activities create outputs you carry across all 14 weeks.
Week 1 milestone
Personal tool map complete — you can explain the difference between five major AI tools and when to use each one professionally.
Reading Activity 1.1

Pre-reading — Co-Intelligence ch. 1–3

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short

Your reading for this week

Primary · Cat A
Co-Intelligence: Living and Working with AI
Ethan Mollick · Penguin, 2024 · Chapters 1–3 this week
This is the anchor text for the full course. You will return to it in Week 3 (chapters 4–6) and re-read chapters 1–3 in Week 14 as a mastery benchmark. Buy the book if you have not already (~$25). Mollick argues that with the public emergence of ChatGPT, humans gained a new kind of co-intelligence and must learn to work with AI as co-worker, co-teacher, and coach. Chapters 1–3 establish the framing that everything else in this course builds on.
Supplementary · Cat A
Stanford HAI AI Index 2026
Stanford Institute for Human-Centered AI (HAI) · Free · hai.stanford.edu
The most comprehensive annual data report on the state of AI — capability benchmarks, adoption trends, economic impact, and policy developments. Use it as an evidence base, not a reading to absorb cover to cover. Focus on the executive summary and the sections on enterprise adoption and workforce impact.
Supplementary · Cat A
NIST AI 600-1 — AI Risk Management Framework: Generative AI Profile
National Institute of Standards and Technology · Free · nist.gov
Authoritative US federal guidance on generative AI risk. Relevant to Week 1 because it gives a structured taxonomy of AI risks by category — a useful complement to the capability taxonomy you will build in Activity 1.3.
Before you read — pre-reading question

"What assumptions do you currently hold about what AI can and cannot do? Where do you think colleagues are most getting AI wrong?"

Answer now, before opening the book. Aim for 150–300 words. Be specific — name actual tools, actual tasks, actual mistakes you have observed. Your answer is stored and you will return to it in Activity 3.5.

After you read — post-reading question

"How does Mollick's framing of AI as a 'co-intelligence' challenge or confirm your prior assumptions? What is one specific thing you will do differently this week based on chapters 1–3?"

After completing chapters 1–3, return here and answer this. Not assessed — a thinking prompt to consolidate what you have read before instruction begins.

Discussion Activity 1.2

Your mental model — before instruction begins

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short
You have read Co-Intelligence chapters 1–3 and answered the pre-reading question. Before the taxonomy is formally introduced in Activity 1.3, your tutor wants to hear how you are currently thinking about the AI landscape — in your own words, from your own professional experience. There are no wrong answers here. The point is to surface your actual mental model before instruction shapes it.
How to complete this discussion
1
Write your response to the question below and click "Add to handoff".
2
Copy the handoff text and open the course Claude Project in a new tab. Paste and send.
3
Complete the discussion with your tutor. When the tutor produces the final summary, copy it.
4
Return here and paste the tutor's output into the box below. Your response will be saved.
Your opening question
"Before we map the AI landscape formally — what categories do you think exist? Where do you think the real limits are, from what you've observed professionally?"
Your response
Step 4 — Paste Claude's output here
After your discussion, copy the tutor's final output from Claude and paste it below. The expected format is shown as placeholder text.
Instruction Activity 1.3

The three-level AI tool taxonomy

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short

Most professionals speak about AI tools by brand name — ChatGPT, Copilot, Gemini, Claude. The brand is not the meaningful distinction. What matters is the mechanism beneath the brand: how the tool works, what data it can access, and what structural dependencies it carries. This activity gives you a consistent three-level framework for making those distinctions in your own work and in conversations with colleagues who have not yet made them.

Level 1 — Model mechanism: what the model is built to do

At the foundation, AI tools differ in what their underlying model is optimised for. Four mechanism types are in common professional use.

Text-generation models

Text-generation models are the most familiar. A language model accepts prompts, roles, and context and produces text or structured text-based output. OpenAI describes prompting as the way one "programs" such a model. The primary professional uses are drafting, summarising, classifying, and explanatory writing.

Where they fall short
Output is non-deterministic and prompt-sensitive. The same prompt can produce meaningfully different results on different runs, and quality varies significantly unless the task is carefully specified and systematically evaluated. This is not a bug to be fixed. It is a property of how these models work, and it has direct implications for how you test and verify their output.

Reasoning models

Reasoning models are a distinct class, optimised for complex problems that require multiple steps of inference. OpenAI explicitly separates reasoning models from standard chat models and notes that they respond differently to prompting. They often require less instruction and more patience, as the model works through a problem before responding. Professional uses include complex analysis, multi-step planning, and demanding logical reasoning tasks.

Where they fall short
They are typically slower and more expensive than text-generation models. The performance gain is also task-dependent. For straightforward drafting or summarisation, a reasoning model adds little value over a standard text-generation model, and provider guidance notes that users need to adjust how they prompt and evaluate these models compared to simpler ones.

Multimodal models

Multimodal models process inputs beyond text, including images, audio, documents, and in some cases video, within a single interaction. The practical professional relevance is primarily for tasks involving visual materials such as charts, photographs, or scanned documents.

Where they fall short
Capability is uneven across modalities. A model that handles images well may handle audio poorly. Capability claims should be tested against your specific task and input type rather than accepted from general product descriptions.

Tool-using and agentic models

Tool-using and agentic models are capable of calling external tools such as web search, code execution, APIs, and file systems, and chaining those actions across multiple steps without a human directing each one. These are the models closest to what is commonly described as "AI agents."

Where they fall short
The governance burden is substantially higher than for other mechanism types. Every tool call is a potential point of failure or misuse. External content encountered during a task can inject instructions the model was not given — a risk discussed further in Activity 3.1. A reasoning model that uses tools is a categorically different system from one that does not, and should be governed accordingly.

Level 2 — Application form: how the model is delivered to you

The same underlying model mechanism can be delivered in fundamentally different ways. The delivery form changes what the tool can and cannot do in a professional context.

General chat models

General chat models are models accessed through a public interface such as Claude.ai, ChatGPT, or Gemini, with no connection to your organisation's data, systems, or permissions. The model knows only what you put into the conversation. General chat is highly capable for tasks that can be fully brought to it: drafting from your own knowledge, working through a problem you describe, or generating options from a brief you write.

Where they fall short
The model cannot be grounded in your documents, emails, or enterprise files without you manually pasting content in. This creates both a volume constraint — most conversations have a context limit — and a data-handling question: what is appropriate to put into a public model? For tasks that require access to large document sets or internal systems, general chat is the wrong application form.

Enterprise copilots

Enterprise copilots are the same or similar model mechanisms deployed inside your organisation's permission boundary. Microsoft 365 Copilot is the clearest current example. It operates inside the Microsoft 365 service boundary, data access is scoped to the signed-in user's permissions, and the organisation's security, compliance, and privacy policies continue to apply. An enterprise copilot can synthesise a set of documents you have access to without you manually pasting each one.

Where they fall short
They require IT deployment and governance infrastructure that many organisations have not yet completed. They are not simply "ChatGPT but for work." Partial deployments — where the copilot is available but its data governance has not been configured for all data categories — create a distinct class of risk: the tool is deployed, but it is not safe to use for all tasks.
Key concept
The distinction between general chat and enterprise copilot is not a question of which is better. It is a question of what the task requires. Synthesis across a large document set that cannot be manually pasted demands grounding. Drafting a new document from your own knowledge does not. Choosing the wrong application form for the task creates either unnecessary friction or unnecessary risk.

Level 3 — Provider architecture: the stack beneath the tool

Beneath the model and the application form sits a structural question that most professionals do not consider until it creates a problem. Which provider stack does this tool sit on, and what does that mean for your organisation's data, vendor relationships, and governance obligations?

Stratechery's analysis of the AI platform landscape identifies three structural positions that major providers occupy.

Integrated stacks (Google)

Integrated stacks are providers where the AI model, the cloud infrastructure, and the enterprise applications are all first-party. When an organisation uses Gemini inside Google Workspace, the model, the data storage, and the application layer sit within a single provider relationship. The advantage is coherence. The dependency is vendor lock-in.

Middle-layer stacks (Microsoft)

Middle-layer stacks sit between an integrated and a modular position. Microsoft builds its own enterprise applications and cloud infrastructure but relies on a close partnership with OpenAI for its frontier model capability. The enterprise application and cloud relationship is with Microsoft; the model relationship runs through OpenAI. Governance and data-residency obligations sit primarily with Microsoft, but the model capability is not exclusively Microsoft's.

Modular and marketplace stacks (AWS Bedrock)

Modular and marketplace stacks are provider architectures that offer model choice rather than a single proprietary model. An organisation using AWS Bedrock can access models from Anthropic, Meta, Mistral, and others through a single API, with cloud infrastructure managed by AWS. The advantage is model flexibility and reduced lock-in at the model layer. The trade-off is higher architectural complexity.

Why this matters for tool choice
"Which AI tool should we use?" is also a procurement and dependency question. The tool you recommend carries an implied architectural commitment. Knowing the stack behind the tool is part of knowing what you are recommending.

How to use this taxonomy

This is an analytic framework, not a fixed product catalogue. Individual tools sit across these categories in ways that blur the lines: a reasoning model can be delivered as an enterprise copilot; an integrated stack can offer a modular API. The taxonomy is a thinking tool. Its value is that it forces mechanism-level reasoning rather than brand-level association.

The most important discipline in applying the taxonomy is always asking the second question: where does this tool fall short? Every category has a structural limit, not just a capability gap. Knowing the limit is what makes the taxonomy professionally useful, and it is what the cases in Unit 2 are designed to test.
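The taxonomy can be made concrete as a small data structure — the seed of the personal tool map you build for LO 1.3. The sketch below is illustrative only: the tool names, attribute values, and `falls_short` wording are placeholders, not definitive classifications of any product.

```python
from dataclasses import dataclass

@dataclass
class ToolMapEntry:
    """One row of a personal tool map, one field per taxonomy level."""
    name: str
    mechanism: str          # Level 1: text-gen | reasoning | multimodal | agentic
    application_form: str   # Level 2: general-chat | enterprise-copilot
    stack: str              # Level 3: integrated | middle-layer | modular
    falls_short: str        # the structural limit you would name to a colleague

# Illustrative entries — classifications are examples, not verdicts.
tool_map = [
    ToolMapEntry("General chat model", "text-gen", "general-chat",
                 "middle-layer", "No grounding in organisational data"),
    ToolMapEntry("Enterprise copilot", "text-gen", "enterprise-copilot",
                 "middle-layer", "Requires completed governance deployment"),
]

def entries_for_task(needs_grounding: bool) -> list[ToolMapEntry]:
    """Filter the map by whether the task requires document grounding."""
    form = "enterprise-copilot" if needs_grounding else "general-chat"
    return [t for t in tool_map if t.application_form == form]
```

Forcing every entry to carry a `falls_short` field is the point: the structure will not let you record a tool without also recording its structural limit.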

Assessment Activity 1.4

Taxonomy knowledge check

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short

These questions test your understanding of the three-level taxonomy from Activity 1.3 — model mechanism, application form, and provider architecture. The questions focus on mechanisms and limits, not brand names. There is no score. Read the explanation after each question before moving on.

Question 1 of 8
Which model class is specifically optimised for complex, multi-step reasoning tasks?
Question 2 of 8
What is the primary functional difference between a general chat model and an enterprise copilot?
Question 3 of 8
In Stratechery's framework for provider architecture, which provider structure offers model choice through a marketplace rather than a single proprietary model?
Question 4 of 8
An AI model's output is described as "non-deterministic." What does this mean in practice?
Question 5 of 8
A task requires synthesising 15 internal policy documents simultaneously. Which application form is designed to handle this without requiring manual pasting?
Question 6 of 8
Which provider is the primary example of an "integrated stack" in AI — where the AI model, cloud infrastructure, and enterprise applications are all first-party?
Question 7 of 8
A tool-using or agentic model differs from a text-generation model primarily because it can:
Question 8 of 8
You are advising a colleague at a regulated financial institution. They need to analyse 30 internal audit reports under strict data residency requirements — the data cannot leave the organisation's systems. Which combination of factors should drive your tool recommendation?
Instruction Activity 2.1

How to evaluate AI tools — methodology before judgment

Learning outcomes
💡LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline

The most common mistake professionals make when evaluating AI tools is treating a single impressive output as evidence of stable capability. It is not. AI model output is non-deterministic: the same prompt, run twice, can produce meaningfully different results. Model behaviour changes between versions. Capability varies across task types in ways that cannot be predicted from brand prestige or general reputation. What looks like a reliable tool in a demonstration may fail on your specific task under your specific conditions.

Key concept: the jagged frontier
Mollick's concept of the "jagged frontier" names this problem precisely. AI capability is not smoothly distributed across task types. A model that outperforms a skilled professional on one task may underperform a junior analyst on an adjacent one. The frontier is jagged — high in some places and low in others — and the only way to learn it is through direct use and structured testing, not by reading a specification or accepting a marketing claim.

This has a direct methodological implication. Evaluation must be disciplined, task-specific, and iterative. The three sections below describe what that means in practice.

Discipline 1 — Define your success criteria before you test

The most common evaluation failure is running a test without having stated in advance what a good result looks like. Without a prior definition, you will unconsciously judge outputs against whichever tool impressed you first, or whichever output confirms your prior assumptions.

OpenAI's evaluation guidance is direct on this point: define specific and measurable success criteria. What does a good output look like for this task? What are the minimum quality thresholds? What would a mediocre output look like, and how would you distinguish it from a good one?

For professional tasks, success criteria typically have at least three components. Accuracy: is the output factually correct and grounded in the inputs? Completeness: does it cover what the task requires? Usefulness: would you actually use this output, or would you need to substantially rewrite it before it was fit for purpose?

Include edge cases
Anthropic's evaluation documentation adds a precision that most professionals miss: include edge cases in your evaluation design. A tool that handles the standard case well may fail on the exception that matters most. If your task involves compliance-sensitive content, an evaluation that does not include a compliance-adjacent test case is not a valid evaluation of whether the tool is appropriate for that task.
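Writing the criteria down before testing can be as simple as a rubric with pre-committed thresholds and a list of cases that includes the exceptions. This is a minimal sketch: the criteria questions, the 1–5 scale, the thresholds, and the edge-case wording are illustrative placeholders you would replace with your own task's definitions.

```python
# Thresholds on a 1-5 scale, fixed BEFORE any tool is tested.
RUBRIC = {
    "accuracy":     {"question": "Factually correct and grounded in the inputs?", "min_score": 4},
    "completeness": {"question": "Covers everything the task requires?",          "min_score": 3},
    "usefulness":   {"question": "Usable without substantial rewriting?",         "min_score": 3},
}

# Edge cases belong in the evaluation design, not as an afterthought.
TEST_CASES = [
    "standard case: routine synthesis input",
    "edge case: compliance-adjacent content",
    "edge case: contradictory source documents",
]

def passes(scores: dict[str, int]) -> bool:
    """A run passes only if EVERY criterion meets its pre-defined threshold."""
    return all(scores.get(c, 0) >= RUBRIC[c]["min_score"] for c in RUBRIC)
```

Because the thresholds are committed in code before any output is seen, a late impulse to grade on impression has to show up as an explicit edit to the rubric.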

Discipline 2 — Hold conditions constant across tools

Comparing two tools without holding the test conditions constant is not a comparison. It is an impression. To produce a meaningful comparison, the same task, the same inputs, and the same evaluation rubric must apply to every tool you are testing.

In practice, this means writing the prompt once, running it in each tool, and grading the outputs against the same criteria. Do not adjust the prompt for each tool based on what you think that tool responds well to. If you adjust the prompt, you are testing your prompting skill, not the tool's capability.

For tasks involving documents, grounding conditions must also be constant. A general chat model that receives a pasted excerpt is not comparable to an enterprise copilot that accesses the full document set through permissions. If the grounding conditions differ, you are comparing application forms rather than model mechanisms. That is a different and also useful question, but it must be labelled correctly.

Run each tool at least twice
Non-determinism means a single run is not representative. Mollick's guidance on evaluation notes that model behaviour is non-deterministic, which makes single-run comparisons unreliable. Two runs per tool is a minimum; for high-stakes decisions, more runs are warranted.
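Holding conditions constant can be enforced mechanically: one prompt, one grading function, the same number of runs for every tool. The sketch below uses stub functions standing in for real model calls — the tool names, the toy outputs, and the one-line grader are illustrative assumptions, not real APIs.

```python
import statistics

def run_comparison(tools: dict, prompt: str, grade, runs: int = 2) -> dict:
    """Run the SAME prompt against every tool the same number of times,
    grading each output with the same rubric function."""
    results = {}
    for name, tool in tools.items():
        scores = [grade(tool(prompt)) for _ in range(runs)]
        results[name] = {"scores": scores, "mean": statistics.mean(scores)}
    return results

# Stubs in place of real model calls — in practice these would wrap
# each provider's API behind an identical call signature.
stub_tools = {
    "tool_a": lambda p: f"Summary of: {p}",
    "tool_b": lambda p: f"Notes on {p}",
}
# Toy grader for the demo; a real one applies your full rubric.
grade = lambda output: 4 if output.startswith("Summary") else 2

report = run_comparison(stub_tools, "quarterly board papers", grade)
```

Note what the harness does not allow: there is no per-tool prompt variable, so you cannot quietly tune the prompt to each tool and end up testing your prompting skill instead of the tool.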

Discipline 3 — Treat evaluation as iterative, not a one-time verdict

A tool evaluation is not a product review. The right answer today may be wrong in six months when the model is updated, when your organisation's data governance changes, or when the task type shifts. Both OpenAI and Anthropic's evaluation guidance explicitly frames evaluation as an ongoing practice rather than a one-time verdict.

The practical implication for your personal tool map is this: each entry should note what you last tested and what would cause you to revise the recommendation. The entry is not permanent. If the constraint that drove your recommendation changes, the recommendation should be revisited.

The final evaluation question
The final question in any evaluation should always be: what would cause me to change this answer? Naming the condition that would change the recommendation forces you to distinguish recommendations based on reasoning from recommendations based on habit or preference.
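A tool-map entry built for revision records when it was last tested and what would trigger a re-test. A minimal sketch — the dates, recommendation text, revisit conditions, and 180-day review window are all illustrative assumptions, not prescribed values.

```python
from datetime import date

# One tool-map entry with its revisit conditions named explicitly.
entry = {
    "tool": "enterprise copilot",
    "recommendation": "use for multi-document synthesis",
    "last_tested": date(2025, 6, 1),        # illustrative date
    "revisit_if": [
        "model version updated by provider",
        "data governance configuration changes",
        "task shifts from synthesis to compliance output",
    ],
}

def is_stale(entry: dict, today: date, max_age_days: int = 180) -> bool:
    """Flag entries whose last test is older than the review window."""
    return (today - entry["last_tested"]).days > max_age_days
```

The `revisit_if` list is the answer to the final evaluation question, written down at the moment the recommendation is made rather than reconstructed later.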

What the AI providers say about evaluating their own models

Anthropic's documentation on evaluation recommends constructing task-specific evaluations that mirror real-world task distributions, not generic prompts designed to show the model at its best. This includes building automated grading where the task output can be scored, and combining automated metrics with human judgment where quality is interpretive.

OpenAI's evaluation guidance aligns on the same core point: the evaluation must be grounded in your specific task. Generic benchmarks tell you how a model performs on standardised tests. They do not tell you how the model will perform on synthesising your board papers under your compliance constraints.

The convergence between Anthropic and OpenAI's evaluation guidance is itself worth noting. The methodological discipline they describe is not marketing. It is the acknowledgment that their own models are not uniformly capable, and that the responsibility for discovering where a tool fails on your specific task sits with you.

Case Activity 2.2

ANZ — compliance-adjacent synthesis

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short
💡LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
Judgment type: Bounded
A right-direction answer exists for this case. You should be able to defend a specific recommendation — not just identify the trade-offs.

The situation

ANZ has publicly distinguished between AI that is appropriate for synthesis and interpretation work and AI that is not appropriate for compliance output where zero errors are acceptable. That distinction is not a preference — it is a governance position.

A senior analyst at ANZ needs to synthesise 12 board papers on customer remediation outcomes into a single executive summary. The papers were produced over 18 months across three remediation streams. The synthesis is not itself a compliance document — but it feeds directly into a briefing used by executives making decisions about remediation resourcing and regulatory reporting. If the summary misrepresents the pattern across the 12 papers, the downstream decisions are wrong.

The analyst has access to three tool options: a general chat model (Claude or ChatGPT accessed through a browser); M365 Copilot, deployed with standard permission settings — it can access documents the analyst has permissions to read; and a reasoning model variant available through the same general chat interface.

What you need to work out

On application form

The task involves 12 documents. Can you paste 12 board papers into a general chat session? What is the practical limit — and what happens to synthesis quality when you can only work with excerpts rather than the full set? The enterprise copilot can access the documents directly through permissions. What does that change about the output?

On model mechanism

Is this a task that benefits from a reasoning model, or is it primarily a synthesis and extraction task that a text-generation model handles adequately? The papers are long but the task is not primarily logical deduction — it is pattern recognition and summarisation across a structured document set.

On governance

ANZ's public position says AI should not produce zero-error compliance output without human review. This task is adjacent to compliance, not directly in it. Where does the human review step sit? Who sees the AI-assisted summary before it reaches the executive?

"Which tool category would you recommend for this task, and what single governance step would you build in before the output reaches the executive?"

Write your answer before moving to the next activity. Your response is stored and displayed in Discussion 2.5.

Case Activity 2.3

Medibank — health data, most constrained environment

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
💡LO 1.4 Explain Mollick's co-intelligence framing for organisations — centaur and cyborg modes, governance implications
Judgment type: Bounded — most constrained case
This is the most constrained case in the sequence. The right-direction answer is narrower than in ANZ. Pay attention to the specific privacy classification before reaching your recommendation.

The situation

Medibank's operating environment changed permanently in October 2022. The breach — in which data on approximately 9.7 million current and former customers was accessed and subsequently released publicly — resulted in OAIC civil penalty action, APRA FAR scrutiny, and sustained regulatory focus on how the company handles sensitive customer data. This is a live constraint, not a theoretical one.

Health data is the highest sensitivity tier in Australian privacy law. The specific data categories Medibank holds — diagnoses, treatment histories, claims records, care pathways — are personally sensitive in ways that are irreversible if mishandled.

A product manager at Medibank needs to brief the executive team on an AI-assisted sentiment analysis across 40,000 post-claim health survey responses. The surveys capture how customers felt about their claims experience — wait times, communication quality, outcome satisfaction. The data is health-adjacent: it does not contain diagnoses or treatment records, but it is linked to health events and is held under the same privacy obligations as primary health data.

The product manager has access to two tool options and one non-option: a general chat model (accessed through a browser, no organisational data boundary); M365 Copilot, which Medibank has in early deployment — partial rollout, not yet fully configured for this data category; and the option of neither until appropriate controls are in place.

What you need to work out

On data classification

The survey text does not contain diagnoses. Does that change the privacy classification? Under Australian privacy law, data linked to a health event is treated as health information regardless of whether it contains clinical content.

On the general chat model

A general chat model accessed through a browser has no organisational data boundary. Inputting 40,000 survey responses — or even a representative sample — means transmitting health-adjacent data to a provider infrastructure outside Medibank's control. What is the legal position under the Privacy Act and Medibank's current regulatory obligations?

On the enterprise copilot

M365 Copilot is in partial rollout. It has not yet been configured for this data category. Does "partially deployed" mean it is safe to use for this task, or does it mean its data governance controls have not been validated for health-adjacent data?

On centaur logic

If an AI system does process these survey responses, where must the human judgment remain? Who reviews the output, and what specifically are they reviewing?

"Advise the product manager. Three options are available: general chat model, enterprise copilot (partially deployed), or neither until appropriate controls are in place. Which option do you recommend, and on what grounds?"

Connect your answer to the specific privacy constraint, not just the general principle. Write your answer before moving to the next activity.

Case Activity 2.4

Accenture — enterprise advisory under commercial sensitivity

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short
💡LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
Judgment type: Open
The previous two cases had right-direction answers. This one does not — or rather, it has multiple defensible answers, and the quality of your response is determined by the reasoning, not the label you attach to the tool.

The situation

You are a manager in Accenture's financial services practice, based in Sydney. Your team has been engaged by a Singapore-headquartered private bank to advise on an AI readiness assessment across their retail and wealth management operations. The client operates in Singapore under MAS oversight, with affiliated entities in Hong Kong under HKMA supervision, and a subsidiary under APRA's prudential framework in Australia.

Your team spent three weeks in discovery: structured interviews with the client's heads of technology, operations, risk, compliance, and three business-unit leaders. You now have 18 sets of detailed interview notes — a mix of direct quotes, paraphrased positions, and field observations — totalling roughly 40,000 words. The engagement partner needs a 1,500-word client-ready synthesis identifying key themes, internal disagreements, and critical dependencies that will shape the AI roadmap.

Three constraints shape what you can and cannot do.

Client data confidentiality. The interview notes contain the client's internal strategic positions, regulatory concerns, and candid assessments of their organisation's readiness. This is confidential client information. Accenture's professional obligations and the engagement letter restrict what can be transmitted outside the engagement boundary.

Cross-border delivery complexity. Your team is in Sydney. The client's data governance is anchored in Singapore under MAS. The synthesis document will be reviewed by the engagement partner in Singapore before going to the client. MAS has specific expectations about where financial institution data is processed and stored.

Stack architecture. Accenture's internal environment gives your team access to M365 Copilot on the standard Accenture tenant. The client's environment runs on Google Workspace. There is no data connection between the two, and the client has not granted your team access to their Google environment.

What you need to work out

On confidentiality and general chat models

A general chat model accessed through a browser would transmit the interview notes — and the client's confidential strategic positions — to a provider infrastructure outside the engagement boundary. Is there a way to use a general chat model that does not carry this problem? What would that look like for a 40,000-word synthesis task?

On the enterprise copilot

Accenture's M365 Copilot deployment operates within Accenture's Microsoft tenant. If the interview notes are stored there, Copilot can process them within the organisational data boundary. Does that satisfy the confidentiality constraint? Does it satisfy the MAS data sovereignty question, given that Accenture's Microsoft tenant is hosted across multiple Azure regions? This is a question you would need to verify with your engagement risk team, not assume.

On the no-data-input option

One approach is to use a general chat model for structure and prose, without inputting the interview notes directly — the analyst reads the notes and inputs a synthesised description of themes. What does this cost in terms of analytical depth? Is that acceptable given the stakes of the synthesis?

On stack architecture

The client runs on Google. You run on Microsoft. There is no shared AI environment. The stack decision is yours alone.

"Write a recommendation of three to five sentences specifying: (1) which tool category you would use for the synthesis; (2) what specific constraint drove that choice — not a general principle, but the specific constraint in this engagement; (3) what governance step you would take before the synthesis document leaves your team."

More than one combination of answers is defensible. Write your answer before moving to the next activity.

✓ Judgment saved — it will appear in Discussion 2.5 alongside your ANZ and Medibank judgments.
✓ Done
Discussion Activity 2.5 ● Saved

Judgment discussion — across all three cases

Learning outcomes
💡LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline
You have worked through three cases — ANZ, Medibank, and Accenture. Your tutor will now discuss your reasoning across all three cases together. This is a conversation to help you sharpen your thinking, not a test of your answers. Your three case judgments are displayed below for reference.
Your case judgments — for reference during this discussion
ANZ (Activity 2.2)
Not yet saved — return to Activity 2.2 and save your judgment first.
Medibank (Activity 2.3)
Not yet saved — return to Activity 2.3 and save your judgment first.
Accenture (Activity 2.4)
Not yet saved — return to Activity 2.4 and save your judgment first.
How to complete this discussion
1
Write your response to the question below and click "Add to handoff". Your three case judgments will be included automatically.
2
Copy the handoff text and open the course Claude Project in a new tab. Paste and send.
3
Complete the discussion with your tutor. When the tutor produces the final statement, copy it.
4
Return here and paste the tutor's output into the box below. Your statement will be saved and carried into Activity 3.2.
Your opening question
"Across ANZ, Medibank, and Accenture — in which case were you most confident in your recommendation, and in which were you least? Name the specific constraint that drove the difference."
Your response
Step 4 — Paste Claude's output here
After your discussion, copy the tutor's final output from Claude and paste it below.
✓ Done
Instruction Activity 3.1

Co-intelligence — centaur, cyborg, and the governance implications

Learning outcomes
💡LO 1.4 Explain Mollick's co-intelligence framing for organisations — centaur and cyborg modes, governance implications

The way organisations talk about AI tends toward a binary framing: AI will do this task, or humans will. That framing is both wrong and unhelpful. What Ethan Mollick calls "co-intelligence" describes a more accurate picture of how capable AI is actually being used in professional settings: humans and AI working together, with the quality of that collaboration determined by how deliberately the division of labour is designed.

This activity covers the two work modes Mollick identifies, the conditions that determine which mode applies, and four governance implications that arise when AI is deployed in an organisational rather than a purely individual context.

Centaur work: when the division of labour is explicit

In centaur work, the division between human and AI is designed in advance and held explicitly. The human decides what scope of task goes to the AI, the AI executes within that bounded scope, and the human reviews the output before it is used or shared. The term comes from the mythological figure — half human, half horse — a literal division between two different kinds of capability.

The clearest professional examples of centaur work are in high-governance contexts, where the division of labour is not a stylistic choice but a structural requirement. In a compliance-adjacent synthesis task, the AI might read and summarise a document set; the human must review and sign off before the output reaches the executive. In a legal drafting context, the AI might produce a first draft; a lawyer must review it before it constitutes advice. In these contexts, centaur mode is non-optional because the accountability structure does not change simply because AI was used to produce the output.

When centaur logic applies
Centaur logic applies when the failure mode of AI error is both high-stakes and potentially non-obvious. AI models produce outputs that can be fluent, confident, and factually wrong. In contexts where a wrong output has significant consequences, a structural human review step is not optional.

Cyborg work: when collaboration is interwoven

In cyborg work, the task boundary is not fixed in advance. Human and AI iterate together in real time: the human steers, the AI responds, and each exchange shapes the next. The labour is genuinely interwoven rather than divided into separate phases.

Mollick's essay "I, Cyborg" frames this mode as the natural one for many knowledge tasks that do not carry the compliance, accountability, or high-stakes failure conditions that make centaur logic mandatory. A senior executive drafting a board communication, a consultant developing a framework for a client, a researcher synthesising literature: in each of these cases, the AI is a continuous collaborator rather than a worker handed a defined scope.

When the environment determines the mode, not the task

The most important principle in this activity
It is not the task alone that determines the work mode. It is the environment the task sits in. A synthesis task in a startup with a lean team, no compliance obligations, and no formal approval structure may be cyborg work. The same synthesis task in a regulated financial institution with board-level accountability, APRA oversight, and an audit trail requirement is centaur work. The task is identical. The mode is different because the operating environment determines the accountability structure, and the accountability structure determines the mode.

This is the question that Activity 3.2 is designed to surface: does the same task produce the same work mode recommendation when you move it across four different organisational contexts? The answer should be no, and understanding why is the point of the capstone.

Verification: why fluent output is not reliable output

Both centaur and cyborg work carry a verification obligation that is routinely underestimated. AI models produce outputs with consistent fluency and apparent confidence regardless of whether the content is accurate. There is no reliable signal within the output itself that distinguishes a correct answer from a plausible-sounding incorrect one.

The practical governance implication is that verification cannot be reserved for cases where the output seems wrong. It must be built into the workflow as a structural step. OpenAI's evaluation guidance notes that human judgment is needed even for tasks with automated scoring, because metrics do not capture everything that matters. Mollick makes the same point: the skill of using AI well includes knowing which outputs to verify and how to verify them.

Deploying AI without verification is not deploying AI efficiently
It is deploying AI unsafely. The cost of the verification step should be compared to the cost of the error it prevents, not to the ideal of AI output that requires no checking.
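That cost comparison can be made concrete with a toy expected-value check. This is an illustrative sketch only: the function name and all the numbers are invented for this example, not drawn from the cases in this module.

```python
# Toy expected-value check for a verification step.
# All names and figures here are invented for illustration; in practice
# the estimates come from the specific engagement context.

def verification_worthwhile(p_error: float, error_cost: float,
                            verification_cost: float) -> bool:
    """Compare the expected cost of an unverified error to the cost of checking."""
    return p_error * error_cost > verification_cost

# Example: a 5% chance of a material error in a board-bound synthesis,
# costed at $100,000 in rework and lost credibility, versus roughly
# two hours of senior review time.
print(verification_worthwhile(0.05, 100_000, 400))  # → True
```

The point of the sketch is the comparison itself: even a small error probability justifies a cheap structural review step when the downstream cost is large.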

Data access governance: a first-class design issue

When an enterprise copilot is deployed, it operates on data that the signed-in user has permission to access. This sounds like a technical safeguard, and it is. But it is also a governance design question that must be addressed before deployment, not discovered after.

Microsoft's governance documentation for Microsoft 365 Copilot makes the stakes explicit. It recommends data loss prevention controls, sensitivity labels, prompt and response auditing, data security posture management reviews, and retention and deletion choices aligned to regulatory compliance. This is not a list of optional features. It is recognition that an AI system able to access everything a user can access will also surface information that was technically accessible but practically siloed.

The governance recommendation
Treat AI rollout as an access-governance and information-governance project before treating it as a productivity project. Organisations that deploy copilots before reviewing what their users can access are not accelerating productivity. They are creating information risk.

Agent security: a distinct governance layer

Tool-using and agentic AI systems carry a governance burden that general chat models and enterprise copilots do not. Anthropic's tool-use documentation describes the technical reality: some tools execute in the client environment, others on provider infrastructure. The surface area for error and misuse is larger than in a standard chat session.

Key concept: prompt injection
Simon Willison's widely cited work on prompt injection identifies the specific risk clearly. The threat is not an attack on the model itself. It is an attack on the surrounding application. A malicious instruction embedded in an external document, email, or web page can redirect what an agentic system does, without the user knowing it has happened. Willison's assessment is frank: a robust fix for prompt injection is not yet known.

The practical governance implication is not that agentic tools should be avoided, but that they should be deployed with three disciplines in place. First, least-privilege design: the agent should have access only to what it needs for the specific task. Second, restricted tool scope: limit what external services the agent can call. Third, content review: any pathway where the agent processes external content should be reviewed for injection risk. Agentic systems that can read from and write to external systems are security-sensitive applications, not chat assistants with extra features.
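The three disciplines can be sketched in code. This is a hypothetical illustration, not any real agent framework's API: every name here (`build_agent_toolset`, `screen_external_content`, the tool strings) is invented for the example.

```python
# Hypothetical sketch of the three agent-security disciplines.
# None of these names come from a real framework; they illustrate
# the governance pattern, not a specific vendor's API.

def build_agent_toolset(task: str) -> set[str]:
    """Disciplines 1 and 2: grant only the tools this specific task
    needs (least privilege), from a restricted overall scope."""
    task_tool_map = {
        "synthesise_interviews": {"read_documents"},            # read-only scope
        "draft_outreach": {"read_documents", "send_email"},
    }
    return task_tool_map.get(task, set())  # unknown task: no tools at all

def screen_external_content(text: str) -> str:
    """Discipline 3: gate external content behind human review.
    A naive marker check like this is not a defence on its own; prompt
    injection has no known robust fix, so this is a review trigger."""
    suspicious = ("ignore previous instructions", "disregard your rules")
    if any(marker in text.lower() for marker in suspicious):
        raise ValueError("External content flagged for human review")
    return text

tools = build_agent_toolset("synthesise_interviews")
assert "send_email" not in tools  # the synthesis agent has no exfiltration path
```

The design choice worth noticing is the default: a task not on the allow-list gets an empty toolset, so new capabilities must be granted deliberately rather than inherited by accident.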

Ecosystem architecture: tool choice as a dependency decision

The final organisational implication of co-intelligence framing returns to the stack architecture introduced in Activity 1.3. When an organisation chooses which AI tool to deploy at scale, it is not only choosing a product. It is making a dependency decision that will constrain future choices.

Stratechery's analysis of the AI platform landscape identifies the procurement implication directly. Lock-in, data gravity, and provider dependency are real variables. An organisation that standardises on Google's integrated stack has made a set of choices about where its data lives, which model serves its applications, and which vendor it renegotiates with when pricing or capability changes.

The governance recommendation
Tool procurement should record not only which application is selected but which architectural dependency that selection implies. The IT team and the business team making the choice should make it together, with an explicit view of the dependency structure, not only a feature comparison.

Mollick's framing is sometimes read as motivational. That is a misreading. Co-intelligence is an operating model. Once understood as such, work design, evaluation, verification, data access, security, and procurement all fall within the same governance frame. Decisions about how humans and AI divide labour have organisational consequences that persist beyond the individual interaction. Activity 3.2 is designed to make those consequences concrete.

✓ Done
Case Activity 3.2

Capstone — same task, four organisational contexts

Learning outcomes
💡LO 1.2 Compare major AI tools on the same task based on direct testing and evaluation discipline
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
💡LO 1.4 Explain Mollick's co-intelligence framing for organisations — centaur and cyborg modes, governance implications
Your constraint reasoning — from Discussion 2.5
Use this as your starting framework as you work through the four contexts below.

The task — identical across all four contexts

You are asked to synthesise 20 stakeholder interview notes into a 500-word executive brief. The notes are from structured interviews covering AI readiness and strategic priorities. The brief goes directly to the executive team.

The task is the same in all four cases. The only thing that changes is the organisational context.

Context 1 — ANZ

You are a senior analyst at ANZ. The interview notes are from internal stakeholders and are stored in SharePoint under standard ANZ document permissions. The brief goes to the Chief Risk Officer and two executive committee members as part of a quarterly AI governance review.

ANZ operates under APRA FAR and has publicly stated that AI should not produce zero-error compliance output without human review. The AI governance review directly informs compliance posture decisions.

For this context
Which tool category? What governance step before the brief reaches the executives?

Context 2 — Medibank

You are a senior strategist at Medibank. The interview notes are from internal clinical and operational leaders. Several interviewees referenced specific patient cohorts and operational incidents involving health-sensitive context. The notes are stored in the document management system.

Medibank operates under APRA FAR and OAIC oversight, is under active regulatory scrutiny following the 2022 breach, and has not completed its M365 Copilot rollout. The brief goes to the Chief Strategy Officer and is expected to feed into a board paper.

For this context
Which tool category? What governance step? Which work mode — centaur or cyborg — and why?

Context 3 — Accenture

You are a manager in Accenture's financial services practice. The interview notes are from a client engagement — a Singapore private bank — and contain the client's confidential strategic positions. You have access to M365 Copilot within Accenture's tenant. The brief is client-deliverable: it goes to the engagement partner first, then to the client's CEO.

The client operates under MAS. Accenture's engagement confidentiality obligations apply. The data sovereignty question has not yet been cleared with the engagement risk team.

For this context
Which tool category? What governance step before external delivery?

Context 4 — XpertiseNow

XpertiseNow is a consulting marketplace and expert-network platform where organisations connect with independent consultants for project-based engagements. It operates with a lean team and does not operate under APRA, MAS, OAIC, or comparable prudential oversight. Its revenue model depends on the quality of its expert matching and the trust of both clients and consultants.

You are a senior member of the XpertiseNow team. The interview notes are from consultants and clients who participated in a research project on how AI is affecting the market for expert advice. Participants consented to their views being used for internal strategy purposes. The notes contain commercially sensitive assessments — views on competitors, pricing tolerance, emerging substitution concerns — but no regulated personal data. XpertiseNow's technology environment is lightweight: Google Workspace, no enterprise AI deployment.

For this context
Which tool category? What governance step, if any? Which work mode — centaur or cyborg — and why does the operating environment, not the task, determine that?

Your deliverable

Complete the table below. Your answers are stored and shared with the tutor before Discussion 3.4 begins. Be specific in each cell — not just "enterprise copilot" but the specific tool and why it fits this context's constraints.

Context Tool category Governance step Work mode
ANZ
Medibank
Accenture
XpertiseNow
✓ Table saved — your answers will be available in Discussion 3.4.

"In which context were you least confident? Name the specific thing you do not yet know — about the tool, the constraint, or the governance — that would change your answer if you knew it."

✓ Done
Assessment Activity 3.3 ● Saved

Personal tool map

Learning outcomes
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts

Your tool map is not a list of favourite products. It is a tested decision artifact: a record of which tool category you would use for a specific task, in a specific context, and why — along with the known failure mode and the governance step you would build in. You have worked through three cases and a capstone. Now you apply the same logic to your own professional context.

Your answers are stored and shared with the tutor before Discussion 3.4 begins — the tutor reads them before the discussion opens. There are no correct answers. There are complete answers and incomplete ones. A complete answer connects the task to the constraint, the constraint to the tool choice, and the tool choice to the governance step.

Question 1 of 5 — The task
Describe a specific, recurring professional task in your work where you use AI, have used AI, or could plausibly use AI.
Be specific. Not "writing" — but "drafting the executive summary section of monthly operations reports for the CFO." Not "analysis" — but "synthesising NPS survey responses across three business units into a quarterly insight brief." The more specific you are, the more useful the tool map entry will be.
Question 2 of 5 — The context and constraint
What is the operating constraint or context for that task?
Consider: Who sees the output, and what is their tolerance for error? Does the task involve sensitive data — personal, health-related, commercially confidential, or regulated? What governance structure does your organisation operate under — APRA, MAS, GDPR, or other? Name the single most important constraint for this task. If more than one constraint applies, name them in order of binding force.
Question 3 of 5 — The tool choice
Which tool category would you use for this task, and why?
Use the three-level taxonomy from Activity 1.3. Name the application form (general chat or enterprise copilot) and the model mechanism (text-generation, reasoning, multimodal, or tool-using) that fits this task and constraint. Connect your choice to the specific constraint you named in Question 2 — not to the tool's general reputation. If you are uncertain, say so and name what you would need to know to decide.
Question 4 of 5 — The failure mode
What is the most likely way this tool fails on this task?
Not "AI might be wrong" — that is true of every tool. Be specific: does this model produce fluent but poorly grounded summaries when working from pasted excerpts rather than full documents? Does it lose track of constraint variables across a long synthesis? If you have tested this tool on this task, describe what you observed. If you have not yet tested it, name the failure mode you would design your first test to probe.
Question 5 of 5 — The governance step
What verification or governance step would you build in before acting on the output?
Name a specific step, not a general principle. Not "I would review it carefully" — but "I would compare the AI summary against the original source documents on the three data points most likely to affect the downstream decision." Connect the step to the failure mode you named in Question 4. The governance step should address the specific failure mode, not generic AI caution.
📝 Your tool map is saved here
Your completed tool map will appear here after you submit all five entries. Your tutor reads this before Activity 3.4 begins.
✓ Done
Discussion Activity 3.4 ● Saved

Tool map discussion

Learning outcomes
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts
💡LO 1.4 Explain Mollick's co-intelligence framing for organisations — centaur and cyborg modes, governance implications
Your tutor has read your completed tool map from Activity 3.3 and your capstone table from Activity 3.2 before this discussion begins. This is a conversation grounded in your actual work — not a cold start. Your tool map and capstone table are displayed below for reference.
Your tool map (Activity 3.3) — for reference
Not yet submitted — complete Activity 3.3 first.
Your capstone table (Activity 3.2) — for reference
Not yet saved — complete Activity 3.2 first.
How to complete this discussion
1
Write your response identifying your least-confident tool map entry. Click "Add to handoff" — your tool map and capstone table will be included automatically.
2
Copy the handoff text and open the course Claude Project in a new tab. Paste and send.
3
Complete the discussion with your tutor. When the tutor produces the refined entry, copy it.
4
Return here and paste the tutor's output below. Your refined entry will be saved as your Week 1 persistent artifact.
Your opening question
"Pick the one entry in your tool map where you are least confident. Walk me through the task, the constraint, the tool you chose, and the governance step — and tell me what would change your answer."
Your response
Step 4 — Paste Claude's output here
After your discussion, copy the tutor's refined tool map entry from Claude and paste it below.
✓ Done
Reflection Activity 3.5 ● Saved

What changed — and what remains uncertain

Learning outcomes
💡LO 1.1 Explain what major AI tool categories are designed for and where they fall short
💡LO 1.3 Build a personal tool map usable for advising on tool choice in professional contexts

This is the last thing you do in Week 1 and the first thing referenced in Week 2. It has two purposes: consolidating what changed this week, and naming what remains uncertain so it becomes a structured input to Week 2 rather than a vague gap.

Your pre-reading answer from Activity 1.1 is displayed below alongside your mental model from Discussion 1.2 — you can see your before and after directly.

Your mental model — captured before instruction began (Activity 1.2)
Your mental model summary from Discussion 1.2 will appear here. If it is not showing, return to Activity 1.2 and ensure you completed and saved the discussion output.
Your pre-reading response (Activity 1.1) — for comparison
Your pre-reading response will appear here if you saved it in Activity 1.1.
Reflection question 1 — What changed
Read your mental model above. How has it changed since you wrote that?
Be specific. Name the assumption you held that turned out to be wrong or incomplete. Name the mechanism you did not understand before Activity 1.3 that you understand now. If your pre-reading answer turns out to have been largely correct, say so — and name the one thing that surprised you anyway. Vague answers ("I learned a lot") are not useful here. The point is to identify the specific place where your thinking shifted, so you can track whether it holds up across the remaining 13 weeks.
Reflection question 2 — What remains uncertain
Name one thing you still do not know that you would need to test or verify before advising someone else on it.
This is not a gap to be embarrassed about. Uncertainty is structural in the tool map — Mollick's jagged frontier argument means the frontier is learned through use, not resolved by reading. The point is to name the uncertainty precisely enough that Week 2 can address it. Not: "I'm not sure about agents." But: "I don't yet know whether M365 Copilot handles document synthesis differently when documents are stored in SharePoint vs. Teams — and that distinction matters for the ANZ-type task."
Reflection question 3 — Week 2 handoff
What do you want to carry into Week 2?
Your tool map entry from Activity 3.3 carries forward automatically. Beyond that — what question, what unresolved judgment, or what open item from this week do you want Week 2 to address? Write one sentence.
Your refined tool map entry — from Discussion 3.4 · Week 1 persistent artifact
This is your Week 1 persistent artifact. It carries into Week 2.
Week 1 complete — three things carry into Week 2
Your full tool map — a live document extended across all 14 weeks
Your initial mental model — a benchmark you will return to across the course
Your open items — the things you flagged as untested become Week 2 targets