
Re: io_io post# 250

Saturday, 03/15/2008 5:59:58 PM

More on NTB Test per Natexis Bleichroeder report



It is clear from recent controversy that the Street (and many physicians) are unfamiliar with the neuropsychological test battery (NTB), and it is understandable that some may have concerns as it is not as well understood as the previous gold standard ADAS-cog. We think some on the Street might question the validity of the NTB as it was constructed by Elan/Wyeth.

We do not find it unusual for a pioneer in the field to advance the science. The NTB is just that, an advancement, in our view. In this note we took a deeper look into the NTB, the areas of overlap and differences it shares with the ADAS-cog, and the kind of data to expect from the bapineuzumab Phase II trial.

* It is still not clear whether the NTB is truly the primary endpoint in both the Phase II and Phase III programs, as Elan has remained mum on the issue. The NTB is a compilation of nine different previously used scales; it is more complex than the ADAS-cog and thus is not an “easier” outcome measure. The NTB remedies what has long been lacking in the ADAS-cog by including executive function as a subscale. Impairment in executive function is a core feature of Alzheimer’s, as executive dysfunction leads a patient to lose the ability to plan, organize, set, and adapt current and past knowledge to future behavior. The NTB also enhances measurement of cognitive function by including immediate as well as delayed memory recall; the absence of delayed recall from the ADAS-cog is a prominent point of contention among practitioners. The ADAS-cog measures only immediate memory recall and not delayed memory recall, which is important for measuring impairment early in the course of dementia. Delayed recall measurement is thought by experts to be more demanding on patients than measurement of immediate recall. Thus, measurement of delayed memory recall might be biased towards showing decline instead of improvement.

* Elan has not disclosed the format of the data presentation for the Phase II top-line release. The outcome of the NTB scale is a composite score, calculated as a z-score, that would be accompanied by a typical p-value. A z-score is expressed in standard deviation units, so the Street shouldn’t be taken aback by the small numerical value of the efficacy. In fact, we don’t expect the reported value to be much greater than 0.3, depending on the dose arm.

* We think some on the Street have concluded that the primary endpoint in the Phase II and Phase III is the NTB based on chatter from various sources. Whatever the primary endpoint is, we don’t think it matters greatly since ADAS-cog and several other scales will also be measured. Elan can change the primary endpoint before the Phase III data are unblinded. If Elan/Wyeth chooses the NTB, we still think there is a strong likelihood that bapineuzumab will get approved if it hits statistical significance, barring outstanding safety issues. The requirement of ADAS-cog is certainly not set in stone, and we think the FDA, too, is committed to advancing science if indeed a more sensitive scale can be validated.

DIFFERENCES BETWEEN ADAS-COG AND NTB

ADAS-COG

The cognition portion of the Alzheimer’s Disease Assessment Scale (ADAS-cog) is the gold standard and, since its first publication in 1984, has become widely used as the primary outcome measure in Alzheimer’s disease trials. It is certainly the FDA-preferred primary endpoint and was the primary efficacy metric in all pivotal clinical trials of the acetylcholinesterase inhibitors that are currently on the market. As a single instrument it has the advantages of simplicity of interpretation, administration efficiency, and limited time demands on patients.

However, the ADAS-cog does not contain certain cognitive elements that are, in hindsight, important in measuring impairment.

• It does not include any tests designed to assess attention, concentration, planning, and working memory, which combine to form executive function.

• It measures immediate memory recall and not delayed memory recall, which is important for measuring memory impairment early in the course of dementia.

• It is not as sensitive at the extremes of the Alzheimer’s severity scale (mild and severe patients) as it is in the middle range (moderate Alzheimer’s). A fundamental assumption of the ADAS-cog is that patients on placebo would decline by approximately seven points per year on the scale. However, clinical trials of Aricept and Exelon in patients with mild Alzheimer’s (MMSE score > 18) reported only a modest decline of about one point in the placebo group after six months. Anecdotal evidence suggests that the annual change is closer to a four-point decline.

NTB

The neuropsychological test battery (NTB) comprises nine validated full-length tests. It attempts to address the deficiencies of the ADAS-cog by including not only tests of memory but also tests of executive function as parts of its overall structure. The two overlap in some of the cognitive assessments (which we will discuss below), but the NTB differentiates itself in the following ways:

• The inclusion of executive function is a vast improvement over ADAS-cog and fills in a glaring hole in the ADAS-cog. This incorporation enables the assessment of the ability to plan, organize, set, and adapt current and past knowledge to future behavior.

• The NTB includes subtests that measure both immediate and delayed visual and verbal memory recall. Delayed verbal recall has been repeatedly demonstrated to be a sensitive measure of cognitive decline in AD patients.

• The NTB incorporates tests of paired associative learning. This is important because the brain areas that perform this kind of learning, the temporal lobe structures, are reported to be among the first to experience degeneration in AD.

• It is sensitive to the extremities of the Alzheimer’s severity range as well as the mid-range.

• The representation of the final numerical outcome for the NTB is a z-score, which is the conversion of a raw score of a test to a standardized unit known as a standard deviation.

The use of nine full-length tests may seem demanding on a test giver and a patient. However, we note it takes on average just 40 minutes to administer all nine assessment tools, which is in line with the time required to give the cognitive and non-cognitive parts of the ADAS. These tests have been in widespread use and have been administered by psychometricians for decades. Test givers should have some familiarity with these components if they are not already part of their repertoire.

AREAS OF COGNITIVE OVERLAP

The cognitive part of the ADAS contains 11 tasks and the NTB includes six tasks. Our comparison of these tasks, though we are no experts, shows that the area of direct overlap is predominantly in the tasks that measure memory recall. This is not surprising, as loss of memory function is a major part of the neurodegenerative disease of Alzheimer’s.

There are also a number of areas that have indirect overlap between the two neuropsychological tests. The ADAS-cog tends to take more specific measures of tasks that the NTB components ignore. These tasks are typically incorporated into the NTB tests but are not measured. It’s unclear why the NTB doesn’t measure them, but they are such essential parts of the tests that failure or difficulty in executing them would result in a bad score. This applies to the tasks of remembering test instructions, following commands, and comprehension of spoken language.

The NTB, however, does not have tasks that measure orientation and praxis, which together contribute about 25% of the ADAS-cog score. Orientation determines how well oriented a patient is to time and place, such as the day of the week and where they are. We think orientation is directly measured in the NTB attention test of forward digit span. Praxis has two parts. Constructional praxis asks a patient to copy some geometric shapes, and ideational praxis determines a patient’s ability to perform a familiar but complex sequence of actions, such as the sequence required to mail a letter (place the letter in an envelope, add a stamp, write the to and from addresses, etc.).

DIRECT AREAS OF OVERLAP BETWEEN ADAS-COG AND NTB

ADAS-Cog symptom area / task            NTB component
--------------------------------------  ----------------------------
Memory
  Word recall                           RAVLT/WMS verbal immediate
  Word recognition                      RAVLT/WMS verbal immediate
  Remembering test instructions         -
Orientation                             -
Language
  Naming                                WMS visual immediate
  Following commands                    -
  Spoken language ability               -
  Word-finding difficulty               -
  Comprehension of spoken language      -
Praxis
  Copying drawings                      -
  Ideational praxis                     -

Source: Arch Neurol/Vol 64 (No 9)

NO FOUL PLAY WITH NTB

We think the Street questions the validity and potential acceptance of the neuropsychological test battery (NTB) as a true outcome measure for Alzheimer’s trials, as it was constructed by Elan and Wyeth and is a crucial efficacy measure in the bapineuzumab program (and potentially in all other Alzheimer’s programs).

We do not think there is any foul play intended to deceive the regulatory agency and medical community with an outcome measure that would favor the companies’ clinical programs. These concerns, in our view, are unwarranted for the following reasons:

• It’s not unusual that a pioneer in the field, as Elan is in Alzheimer’s, should try to advance the science and make further improvements, especially on outcome scales that are as subjective as those for Alzheimer’s.

• Elan did not invent the nine components of the NTB, which are validated, widely accepted, and widely used neuropsychological tests. Again, these components are not new. The Wechsler Memory Scale was first published in 1945 in the Journal of Psychology. A more familiar form is the Wechsler Adult Intelligence Scale, which is broadly used as a general IQ test. The Rey Auditory Verbal Learning Test (RAVLT), which makes up two of the components, was developed in the 1960s.

• Each component of the NTB is an original test or subtest (in the case of the Wechsler Memory Scale), so Elan did not change any of the questions or tasks to suit its needs. On the other hand, the tests making up the ADAS-cog are revised versions of longer ones and thus may not be as comprehensive as the full-length tests that are the components of the NTB.

THE NEUROPSYCHOLOGICAL TEST BATTERY (NTB)

Here we outline the structure of the NTB, provide brief descriptions of the nine full-length tests that form the NTB, and explain how the numerical outcome of the NTB is calculated. The NTB consists of two overall parts: memory and executive function. A majority of the nine components make up the memory part. These are three immediate and three delayed memory tests. The executive function part comprises three tests: the Digit Span, with the forward task measuring attention and the backward task measuring working memory, and the Controlled Word Association and Category Fluency tests, which measure language fluency.

THE STRUCTURE OF THE NTB

Part                Subscale    Test
------------------  ----------  ------------------------------------------------------------------------
Memory              Immediate   Wechsler Memory Scale visual immediate (score range, 0-18)
                                Wechsler Memory Scale verbal immediate (score range, 0-24)
                                Rey Auditory Verbal Learning Test (RAVLT) immediate (score range, 0-105)
                    Delayed     Wechsler Memory Scale visual delayed (score range, 0-6)
                                Wechsler Memory Scale verbal delayed (score range, 0-8)
                                Rey Auditory Verbal Learning Test (RAVLT) delayed (score range, 0-30) *
Executive Function              Wechsler Memory Digit Span (score range, 0-24)
                                Category Fluency Test (CFT)
                                Controlled Word Association Test (COWAT)

* The RAVLT delayed measure consists of recall and recognition performance components that sum to a score ranging from 0 to 30.

Source: Arch Neurol/Vol 64 (No 9)

Wechsler Memory Scale is a composite of subtests that are designed to assess auditory and visual memory. It is now in its third version, the WMS-III, which improves on a number of deficiencies of its predecessors (the original WMS published in 1945 and the WMS-Revised in 1987). The WMS-III has 11 subtests, six of which are considered primary measures and five of which are optional. The primary subtests measure immediate, general (or delayed), and working memory. Elan’s NTB includes most of the primary subtests and does not include Spatial Span. We do not know which visual memory subtests, Faces or Family Pictures, are included. It’s unclear whether the Logical Memory I and II subtests are part of the NTB. Below we provide structure outlines of the WMS-III and brief descriptions of the subtests.

SUBTESTS OF THE WMS-III

            Auditory Presentation                Visual Presentation
----------  -----------------------------------  ----------------------------
Primary     Logical Memory I and II              Faces I and II
            Verbal Paired Associates I and II    Family Pictures I and II
            Letter-Number Sequencing             Spatial Span
Optional    Information and Orientation          Visual Reproduction I and II
            Word Lists I and II
            Mental Control
            Digit Span

Source: Natixis Bleichroeder Inc.

THE STRUCTURE OF PRIMARY INDEX SCORES

Index               Sub-index                      Subtest score
------------------  -----------------------------  ---------------------------------------
Immediate Memory    Auditory Immediate             Logical Memory I Recall
                                                   Verbal Paired Associates I Recall
                    Visual Immediate               Faces I Recognition
                                                   Family Pictures I Recall
General Memory      Auditory (Delayed Recall)      Logical Memory II Recall
                                                   Verbal Paired Associates II Recall
                    Auditory Recognition Delayed   Logical Memory II Recognition
                                                   Verbal Paired Associates II Recognition
                    Visual (Delayed)               Faces II Recognition
                                                   Family Pictures II Recall
Working Memory      Auditory                       Letter-Number Sequencing
                    Visual                         Spatial Span

Source: Natixis Bleichroeder Inc.

Logical Memory I and II. An examiner reads aloud two paragraphs and the patient recalls them both immediately and after a delay. A yes/no recognition test follows the delay.

Faces I and II. The patient must recognize faces, immediately following a presentation and after a delay.

Verbal Paired Associates I and II. In this subtest an examiner reads out a list of word pairs. Immediately afterward, the examiner reads out one word and the patient must supply the other word that went with it. The list is repeated four times. The pairs are also tested after a delay.

Family Pictures I and II. The patient sees four scenes one at a time and then must recall the characters in each scene and what they were doing. Recall is also tested after a delay.

Wechsler Memory Digit Span is a subtest in the Wechsler Scales and is primarily used as a measure of an individual’s working memory. In this subtest, the test administrator reads out a series of digits of increasing length and the examinee must repeat these digits in the same order for the forward measure or in the reverse order for the backward measure. The examiner scores the forward digit span as the number of items correctly repeated on two successive trials. This is a measure of executive function in the NTB. The forward task is believed to measure simple/immediate memory storage. The backward task is more difficult for patients, and performance may depend on the coding and visualization strategies used by patients in addition to serial retention. It is thus believed to be more sensitive in measuring impairments associated with neurological conditions because it requires a greater memory component than the forward task. So, it’s not surprising that scores for the forward task tend to be greater than scores for the backward task.

Controlled Word Association Test (COWAT) measures a person’s ability to make verbal associations to specific letters (phonological fluency or letter fluency task). Two versions are available, and either may be used as they are considered equivalent. Version A uses the stimulus letters C, F, and L and version B uses P, R, and W. An examiner says a letter of the alphabet and the examinee has one minute to say as many words as he can think of starting with that letter. These words, however, cannot be proper names, such as names of people or places (e.g., “Buffalo” or “Bill”), and cannot be the same word again with a different ending (such as “cut” and “cutting”). The record sheet provides numbered lines on which the subject’s responses can be entered. All incorrect responses should be entered verbatim, and a repetition of a word is accepted only in cases where the subject definitely indicated an alternate meaning. The total score is the sum of all admissible words.

COWAT was developed by Arthur Benton as part of a relatively brief aphasia test battery (1969). The sets of letters (CFL and PRW) were chosen not at random but on the basis of their difficulty and the frequency of words beginning with these letters. Thus, the two versions are considered to be of equivalent difficulty.

Category Fluency Test (CFT) measures category fluency (semantic fluency/memory). The patients in this test are asked to name as many members of a category as possible (e.g., dog, cat for the ANIMAL category) in a designated time, which is typically one minute. The animal category is most commonly used; examples of extinct (Tyrannosaurus), imaginary (unicorn), or magic (Aladdin) animals are admissible, but given names like “Tom” and “Jerry” are not.

Simplistically, semantic memory is the part of long-term memory that deals with words, what they look like and represent, and how they are used in an organized way. It is unusual for a healthy person to forget the meaning of a word like "dog," or to be unable to conjure up a visual image of a cat when the word is heard or read. Semantic memory contrasts with episodic memory, where memories are dependent upon a relationship in time. An example of an episodic memory is "I, Corey, made the best stock call of my life a day before Thanksgiving."

Rey Auditory Verbal Learning Test (RAVLT) was developed by André Rey in France in the 1960s and is the oldest test that uses word list learning as a method of assessing memory impairment. The RAVLT provides measures of recall following short and long delay periods (immediate or delayed memory), efficiency of learning, and effects of interference. The test takes approximately 10 to 15 minutes to administer. The standard administration format is for the examiner to read a list of 15 words aloud to the patient five times, with recall being tested after each reading. A second list is then read, and after recall of the second list is tested, recall of the first list is tested again, which is the sixth recall of the first list. The delayed recall is a seventh recall test of the first list after a wait of 20 to 30 minutes.

THE NTB NUMERICAL OUTCOME IS A Z-SCORE

The numerical presentation of the NTB outcome will be a z-score, known as the NTB z-score. This is derived by averaging the z-scores from the nine components. The individual z-score for executive function and for memory can also be found by taking the average of the z-scores of their components. The change from baseline is calculated as the post-baseline z-score minus the baseline score, such that a positive change indicates an improvement from baseline. As a z-score is a standard deviation unit, the magnitude of the number is small. We thus do not expect the NTB z-score for the bapineuzumab Phase II to be a large number. This does not mean that the efficacy is weak, though.
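
To make the averaging concrete, here is a minimal Python sketch of the calculation described above. The short component names and every z-score value in it are made up for illustration (they are not trial data), and the sketch assumes, as the note states, that each composite is a plain unweighted average of its component z-scores.

```python
from statistics import mean

# Hypothetical short labels for the nine NTB components (names are ours).
MEMORY = [
    "wms_visual_immediate", "wms_verbal_immediate", "ravlt_immediate",
    "wms_visual_delayed", "wms_verbal_delayed", "ravlt_delayed",
]
EXECUTIVE = ["digit_span", "category_fluency", "cowat"]


def ntb_scores(z):
    """Average component z-scores into memory, executive, and overall NTB z-scores."""
    return {
        "memory": mean([z[name] for name in MEMORY]),
        "executive": mean([z[name] for name in EXECUTIVE]),
        "ntb": mean([z[name] for name in MEMORY + EXECUTIVE]),
    }


def change_from_baseline(baseline, follow_up):
    """Post-baseline minus baseline, so a positive change means improvement."""
    return {key: follow_up[key] - baseline[key] for key in baseline}


# Made-up component z-scores for a single patient, for illustration only.
baseline = {name: 0.0 for name in MEMORY + EXECUTIVE}
follow_up = dict(baseline, ravlt_delayed=-0.6, digit_span=0.3)

change = change_from_baseline(ntb_scores(baseline), ntb_scores(follow_up))
print(change)
```

Because nine components are averaged, a sizable move on one or two tests is diluted in the composite, which is one reason the overall NTB z-score tends to be a small number.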

Elan has not announced the exact data presentation format for the bapineuzumab Phase II. The initial press release in June is likely to be brief, and the full data set is then expected to be presented at the end of July at ICAD. The data presentation for the NTB should be similar to that for the AN1792 Phase II trial. There, analyses of the z-score across the NTB revealed differences favoring antibody responders. The composite NTB z-score for the treated and placebo arms to 12 months, standard deviation, and p-value were given (0.03 ± 0.37 treated vs -0.20 ± 0.45 placebo, p = 0.020). Note that the value for the z-score is numerically very small. Although there is no direct translation between NTB and ADAS-cog, the placebo change of -0.2 on the NTB could equate to the seven-point decline on the ADAS-cog at one year.

THE FINAL OUTCOME OF THE NTB IS A Z-SCORE

Part                Component                                                                   Converted to
------------------  --------------------------------------------------------------------------  ------------
Memory              Wechsler Memory Scale visual immediate (score range, 0-18)                  z-score
                    Wechsler Memory Scale verbal immediate (score range, 0-24)                  z-score
                    Rey Auditory Verbal Learning Test (RAVLT) immediate (score range, 0-105)    z-score
                    Wechsler Memory Scale visual delayed (score range, 0-6)                     z-score
                    Wechsler Memory Scale verbal delayed (score range, 0-8)                     z-score
                    Rey Auditory Verbal Learning Test (RAVLT) delayed (score range, 0-30) *     z-score
Executive Function  Wechsler Memory Digit Span (score range, 0-24)                              z-score
                    Category Fluency Test (CFT)                                                 z-score
                    Controlled Word Association Test (COWAT)                                    z-score

Average the six memory z-scores to get the memory z-score.
Average the three executive function z-scores to get the executive function z-score.
Average the z-scores of all nine components to get the NTB z-score.

* The RAVLT delayed measure consists of recall and recognition performance components that sum to a score ranging from 0 to 30.

Source: Arch Neurol/Vol 64 (No 9) and Natixis Bleichroeder Inc.


A z-score is a conversion of a raw score on a test to a standardized (normalized) score. It is derived by subtracting the arithmetic mean from the raw score and dividing the result by the standard deviation: z = (x - μ) / σ, where x is the raw score, μ is the mean, and σ is the standard deviation. The standard score indicates how many standard deviations an observation is from the mean, which itself has a z-score of zero. About 99.9% of the area under a standard normal distribution curve lies below a z-score of 3 (three standard deviations above the mean).
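
As a quick illustration of the conversion, the following Python sketch computes a z-score from a raw score and looks up the share of a standard normal curve lying below it. The raw score, mean, and standard deviation used in the example are hypothetical numbers chosen for easy arithmetic, not values from any NTB component.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Convert a raw test score to standard deviation units: (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def area_below(z):
    """Share of a standard normal distribution lying below the given z-score."""
    return NormalDist().cdf(z)


# Hypothetical example: raw score 12 on a test with mean 10 and SD 4.
z = z_score(12, 10, 4)
print(f"z = {z:.2f}, area below = {area_below(z):.3f}")
print(f"area below z = 3: {area_below(3):.4f}")
```

The same two steps apply to each NTB component before the z-scores are averaged.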

ADAS-COG

The Alzheimer’s Disease Assessment Scale (ADAS) is the current gold standard for measuring the severity of major dysfunctions in cognitive and non-cognitive behaviors in Alzheimer’s patients. It is a two-part performance-based scale comprising 21 items, 11 of which assess cognitive function (ADAS-cog), such as memory and orientation, and 10 of which evaluate non-cognitive function (ADAS-Noncog), such as mood state and behavioral changes.

The ADAS-Cog has become the standard primary outcome measure to determine the effects of drugs in Alzheimer’s patients. It is organized into four cognitive categories (memory, orientation, language, and praxis) that are further subdivided into 11 cognitive tasks. They include recall of test instructions, following commands, word-finding difficulty in spontaneous speech, naming objects and fingers, constructional praxis, and ideational praxis. For descriptions of these tasks we’ve provided a full copy of the assessment tool below. Translation of the scale into different languages can introduce substantial biases, and scores from different language versions therefore cannot be directly compared.

Scores for the 11 cognitive tasks sum to a total of 70 points. A score of 0 means the patient made no errors during the assessments and a score of 70 means the patient is profoundly demented. In practice, a healthy person scores between 5 and 10 points.
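
The 70-point total can be checked with a short Python sketch. The maximum points per task are taken from the ADAS structure table in this post; the task keys and the function name are our own labels, and the sketch simply sums error points after checking that each one is within its range.

```python
# Maximum error points per ADAS-cog task, as listed in the ADAS structure table.
ADAS_COG_MAX = {
    "word_recall": 10,
    "word_recognition": 12,
    "remembering_test_instructions": 5,
    "orientation": 8,
    "naming": 5,
    "following_commands": 5,
    "spoken_language_ability": 5,
    "word_finding_difficulty": 5,
    "comprehension_of_spoken_language": 5,
    "copying_drawings": 5,
    "ideational_praxis": 5,
}


def adas_cog_total(errors):
    """Sum per-task error points; higher totals mean greater impairment."""
    missing = set(ADAS_COG_MAX) - set(errors)
    if missing:
        raise ValueError(f"missing tasks: {sorted(missing)}")
    total = 0
    for task, max_points in ADAS_COG_MAX.items():
        points = errors[task]
        if not 0 <= points <= max_points:
            raise ValueError(f"{task} must be between 0 and {max_points}")
        total += points
    return total


# The per-task maxima add up to the 70-point ceiling quoted above.
print(sum(ADAS_COG_MAX.values()))
```

Note that the scale counts errors, so a decline in the patient shows up as an increase in the total.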

The commonly accepted belief is that the mean level of change in ADAS-Cog over one year for untreated AD patients is about 7 points (though several studies have suggested significantly less). Clearly, the level of change depends on how ill the patient is at the time of the first assessment (baseline). Hence, an AD patient who is only mildly impaired will likely show a smaller change over the subsequent year than patients who are more severely impaired. But for a patient who progresses to very severe AD the annual rate of deterioration tends to slow down and so the level of change over a year for this type of patient is likely to be smaller.

THE STRUCTURE OF THE ADAS

Symptom area / task                            Points (range)
---------------------------------------------  --------------
Cognitive                                      0 - 70
  Memory                                       0 - 25
    Word recall                                0 - 10
    Word recognition                           0 - 12
    Remembering test instructions (a)          0 - 5
  Orientation                                  0 - 8
  Language                                     0 - 25
    Naming                                     0 - 5
    Following commands                         0 - 5
    Spoken language ability (a)                0 - 5
    Word-finding difficulty (a)                0 - 5
    Comprehension of spoken language (a)       0 - 5
  Praxis                                       0 - 10
    Copying drawings                           0 - 5
    Ideational praxis                          0 - 5
Noncognitive                                   0 - 50
  Agitation                                    0 - 20
    Excess motor activity (a)                  0 - 5
    Pacing (a)                                 0 - 5
    Uncooperative to testing (a)               0 - 5
    Tremors (a)                                0 - 5
  Depressed mood                               0 - 10
    Depressed mood (a)                         0 - 5
    Tearfulness (a)                            0 - 5
  Psychosis                                    0 - 10
    Delusions (a)                              0 - 5
    Hallucinations (a)                         0 - 5
  Miscellaneous                                0 - 10
    Attention/concentration (a, b)             0 - 5
    Weight change (a, c)                       0 - 5
Total score                                    0 - 120

(a) Clinician-rated item.

(b) May be included as a cognitive item.

(c) May be associated with depression or cognitive impairment.


"....on the biotech battle-field, you need some élan...."