Bicameral LLC v. NXP USA, Inc. et al

Western District of Texas, txwd-6:2018-cv-00294

Exhibit 9

Interested in this case?

Current View

Full Text

6 Kheyfits Declaration Exhibit 9 Case 6:18-CV-00294-ADA Documen USUDDY United States Patent [19] Augsburg et al. US005996092A [11] Patent Number: [45] Date of Patent: 5,996,092 Nov. 30, 1999 ....... 2 1541 SYSTEM AND METHOD FOR TRACING PROGRAM EXECUTION WITHIN A PROCESSOR BEFORE AND AFTER A TRIGGERING EVENT 5,408,637 4/1995 Shimizu ...... 5,410,685 4/1995 Banda et al. .. 5,560,036 9/1996 Yoshiva 5,564,028 10/1996 Swoboda et al. .... 5,642,479 6/1997 Flynn 5,675,729 10/1997 Mehring ............. 5,724,505 3/1998 Argade et al. 5,848,264 12/1998 Baird et al. .......... 395/500 714/38 712/227 712/227 714/45 714/37 714/45 .......... 395/500 I flynn .................. ............ [75] Inventors: Victor Roberts Augsburg, Apex; Jeffrey Todd Bridges, Raleigh; Thomas K. Collopy, Cary; James N. Dieffenderfer, Apex; Thomas Andrew Sartorius, Raleigh, all of N.C. Primary Examiner-Kenneth S. Kim Attorney, Agent, or Firm—Steven B. Phillips; Winstead, Sechrest & Minnick; John D. Flynn [73] Assignee: International Business Machines Corporation, Armonk, N.Y. (57) ABSTRACT [21] Appl. No.: 08/760,553 [22] Filed: Dec. 5, 1996 [51] Int. C1.6. G06F 11/30 [52] U.S. CI. ........... 714/38; 714/37; 714/45; 712/227 [58] Field of Search .......... 395/183.06, 183.13, 395/183.21, 500; 714/37, 38, 45; 712/227 A system and method for tracing program code within a processor having an embedded cache memory. The non- invasive tracing technique minimizes the need for trace information to be broadcast externally. The tracing tech- nique monitors changes in instruction flow from the normal execution stream of the code. The tracing technique moni- tors the updating of processor branch target register contents in order to monitor branch target flow of the code. Tracing of the program flow includes tracing instructions both before and after a trace triggering event. The implementation of periodic synchronizing events enables the tracing of instruc- tions occurring before and after a triggering event. [56] References Cited U.S. PATENT DOCUMENTS 3,707,725 12/1972 Delheim 714/38 4 Claims, 8 Drawing Sheets EXTERNAL BUS 1 104 up 101 i 100 03 DEBUG | INSTRUCTION CACHE CONTROL LOGIC LR CTR IAR 108 109 110 114 105 SE COUNTER 120 FIFO OFFSET COUNTER | 122 115 7107 DRIVER SERIAL 118 [0:2] 1197 [0:3] TO TRACE TOOL TO TRACE TOOL BC_GEN_0002985 6 5,996,092 SYSTEM AND METHOD FOR TRACING system resources (e.g., bus bandwidth), degrading perfor- PROGRAM EXECUTION WITHIN A mance to support a feature that is only used during software PROCESSOR BEFORE AND AFTER A debug operations. If the trace data is switched on only when TRIGGERING EVENT acquiring a trace, it may affect the timing of the program by 5 delaying the processor's normal access to the shared pins, CROSS-REFERENCE TO RELATED PATENT and thus will be intrusive. Dedicated pins can alleviate this APPLICATION problem; however, to maintain low cost of the IC, the pin This application for patent is related to U.S. patent count must be kept as low as possible. application Ser. No. 08/283,128 entitled "A SYSTEM AND A previous invention, disclosed within the cross- METHOD FOR PROGRAM EXECUTION TRACING 10 referenced patent application, described a set of hardware WITHIN AN INTEGRATED PROCESSOR", now issued additions made to a microprocessor to provide a non- U.S. Pat. No. 5,809,293, which is hereby incorporated by intrusive, real-time trace capability with low additional cost reference herein. to the processor IC. However, that solution had the follow- ing deficiencies: TECHNICAL FIELD (1) It could only trace forward from a TE. That is, once The present invention relates in general to data processing the TE was recognized, trace information was provided systems, and in particular, to program execution tracing to reconstruct an instruction trace from the clock on within an integrated processor. which the TL occurred and some finite number of clock cycles (dictated by the depth of the external trace BACKGROUND INFORMATION 20 acquisition buffer) after the TE. When debugging, a The present invention addresses the need to acquire a softwarc enginccr may often wish to trigger thc capture real-time trace of program execution from a highly inte- of the trace when some extraordinary error or event grated microprocessor. Typically, users wish to obtain a happens, and then to see a trace of the instructions that "trace" or listing, of exactly what instructions execute dur- preceded the unexpected event, to determine what 25 ing each clock cycle for a limited period of time during the caused the event. For example, one might wish to execution of a program in order to debug or analyze the acquire a trace whenever the processor vectors to an performance of the program. A "real-time" trace is one that error exception handling routine. In order to determine can be acquired while the program runs at normal speed, in the cause of the error, one must use the trace of the actual system environment, and can be triggered by some 2 instructions before the error was recognized. The system event recognized by the trace acquisition system. instructions executed after the error occurs are just Note that since any buffer used to acquire a trace will have those of the exception handling routine, and tracing a finite number of entries that will likely be much smaller them will be of little use in determining the cause of the than the number of clocks consumed in the execution of the error. program, the trace acquisition system must be able to as (2) It can only indicate a single TE on the output pins. The selectively retain only the information for the clock cycles of ability to indicate multiple TЕs is useful if the user interest, i.e., those just before and just after the "trigger" wants to count Tes and retain the trace information for event ("TE"). Further, the system must provide a means for the time period around the Nth TE. synchronizing the TE with the contents of the trace buffer so (3) The partitioning of the solution did not lend itself to that the user can tell exactly what instructions were execut-40 reducing cost in a "CORE+ASIC" environment. In this ing during the clock cycle that the TE occurred. A "non- type of design environment, a central processing unit invasive" trace is one that can be acquired without disturb- ("CPU") is provided as a large "macro" or "mega-cell" ing the timing behavior of the program from its behavior to be used as an element of an Application Specific while not being traced. Integrated Circuit ("ASIC"). The CPU is a "hard A difficulty in acquiring a trace from a highly integrated 45 macro"; that is, it is a physical design implementation processor stems from the invisibility of most of the signals that is placed onto the ASIC as a whole and is not required to derive the trace. A typical approach to deriving subject to any type of changes or physical optimiza- an instruction trace requires one to determine the location of tions. Since some ASICs may need support for tracing an instruction being executed on a particular clock cycle (ie., and some may not, it is desirable to add as little at the start of the trace), and then to determine for subsequent 50 hardware to the CPU as possible and allow for another clock cycles how many instructions are executed, whether macro block or some part of the ASIC logic to imple- they are taken or not if they are branches, and the target ment the bulk of the additional logic necessary to addresses for the taken branches. support trace operations. In this manner, one could Because the processor has an integrated instruction cache, easily remove the logic used to support tracing when it the instruction address bus is not accessible externally and 55 is not required on a particular ASIC. The previous hence, each instruction fetch cannot normally be seen. Also, solution described within the cross-referenced patent the signals that indicate the number of instructions executed application used three registers in the CPU dedicated to each cycle and the direction taken by conditional branches the tracing function; removing them from the CPU is are not usually available externally to the integrated circuit desirable. ("IC"). Therefore, some information must normally be 60 (4) The processor operation had to be stopped in order to exported from the microprocessor in order to acquire the read the dedicated registers. Stopping the processor trace. This information should appear on the external pins of operation may be inconvenient or impossible. For the IC; either on pins that are already used for other purposes example, if it was desired to acquire several trace such as external data and address buscs, or on pins dedicated fragments over the time that the processor runs a to the tracing function. 65 relatively long task, the processor could not be stopped Multiplexing trace data onto existing pins has two poten- to retrieve the information from the dedicated registers tial problems. If the trace runs all the time, it will contend for without affecting the application that was being traced. BC_GEN_0002994 6 5,996,092 Thus, there is a need in the art for an improved tracing FIG. 7 illustrates a flow diagram of the transmission of operation for an integrated processor that addresses the data to the FIFO as a result of synchronizing events or above four issues. execution of mtlr, mtetr, or exception vectoring; FIG. 8 illustrates a trace acquisition buffer; and SUMMARY OF THE INVENTION FIG. 9 illustrates a trace acquisition buffer and a debug- The foregoing needs are addressed by the present inven ging workstation. tion which provides a system and method for acquiring non-invasivc real-time instruction traccs from an integrated DETAILED DESCRIPTION processor with the following advantages: 10 In the following description, numerous specific details are (1) The present invention allows for trace acquisition both set forth such as specific word or byte lengths, etc. to provide before as well as after a triggering event ("TE") is a thorough understanding of the present invention. However, recognized by the system. it will be obvious to those skilled in the art that the present (2) Multiple TЕs can be indicated by the CPU and counted invention may be practiced without such specific details. In by the external trace gathering system. Former trace 15 other instances, well-known circuits have been shown in acquisition systems started broadcasting trace informa block diagram form in order not to obscure the present tion when the first TE occurred, and only that one TE invention in unnecessary detail. For the most part, details was indicated. Multiple TЕs are useful, for example, if concerning timing considerations and the like have been a user wishes to trace the Nth time through a certain omitted inasmuch as such details are not necessary to obtain section of code. 20 a complete understanding of the present invention and are (3) Some dedicated hardware is removed from the CPU within the skills of persons of ordinary skill in the relevant and replaced with hardware that can be easily parti- art. tioned from the CPU, thus making the solution less Refer now to the drawings wherein depicted elements are costly for CORE+ASIC products that do not require the not necessarily shown to scale and wherein like or similar tracing capability. 25 elements are designated by the same reference numeral (4) Stopping of the processor to read the dedicated through the several views. registers is not required. The trace pins can be exam In order to completely reconstruct an instruction trace, the ined and the information on these pins retrieved "on user must be able to determine whether any instructions are the-fly". As a result, it is possible to acquire several executed on each clock cycle being traced, and the address trace fragments over the time that the processor runs a 30 of any such instructions. The system described within this relatively long task, and the processor operation is not application and the cross-referenced application noted stopped, which alleviates the problem of affecting the above, operates by dedicating a few pins to the trace application that is being traced. function and by broadcasting a data stream on those pins, More specifically, the present invention periodically gen which allows the external acquisition system to reconstruct erates synchronizing events and sends the synchronizing 35 the trace. events to an external trace acquisition buffer so that when a In an alternative implementation, one might choose to triggering event occurs, there will be a predesignated num broadcast the address of each instruction executed on every ber of stored instructions between the synchronizing events clock along with a validation bit, but this may not be and the triggering event. practical because it would require a broadcast of too many The foregoing has outlined rather broadly the features and 40 bits for each executed instruction. A 32-bit PowerPC micro- technical advantages of the present invention in order that processor (available from IBM Corporation) implementa- the detailed description of the invention that follows may be tion would require 30 address bits and a valid bit using this better understood. Additional features and advantages of the solution. Processors that have the capability to execute invention will be described hereinafter which form the multiple instructions per clock cycle exacerbate the prob- subject of the claims of the invention. 45 lem; a 2-way superscalar machine would need 62 bits of information broadcast each cycle. BRIEF DESCRIPTION OF THE DRAWING The information content of such a stream of addresses is For a more complete understanding of the present low, however, since most of the time the processor is just invention, and the advantages thereof, reference is now so executing instructions in line. For in-line code (i.e., no made to the following descriptions taken in conjunction with branches), each address is mostly the same as the one before the accompanying drawings, in which: it, and can be completely determined from the one before it. FIG. 1 illustrates a diagram of an embodiment of the Accordingly, a coding of the information has been devel- present invention for performing tracing of a typical micro- oped that uses the available bandwidth provided by the processor; 55 relatively small number of trace pins significantly more FIG. 2 illustrates a flow diagram of a loading of the FIFO efficiently. The coding is accomplished by broadcasting only utilized within one embodiment of the present invention; the relevant state changes for each processor clock, as opposed to the complete state for each clock cycle. FIG. 3 illustrates a flow diagram of sending TE and For the PowerPC architecture, the only state-change serialized FIFO output information to the TS pins; 60 information required for most clock cycles is the number of FIG. 4 illustrates a flow diagram of the transmission of instructions executed, and which, if any, were taken status information; branches. This information is designated as the execution FIG. 5 illustrates a flow diagram of the encoding of a status ("ES") of the processor trigger event; There are a few CPU operations, however, that can FIG. 6 illustrates a data processing system employing an 65 change the program flow in a way that cannot be calculated embodiment of the present invention or of a debugging from the execution status information without some addi- workstation; tional information. There is also some other information BC_GEN_0002995 6 5,996,092 de required by the reconstruction process, the sum of all these cessor 100. LR 108 is typically used for subroutine CALL/ data requirements is listed below. RETURN sequences within microprocessor 100. First, most of the branch targets in a PowerPC instruction Registers 108-110 are architected registers that are typical stream can be calculated from the program listing as they are in microprocessor designs. CTR 109 and LR 108 are soft- relative to the address of the branch itself; the exceptions are 5 ware accessible using the instructions MTLR, MFLR, branches to the link register ("LR") or count register MTCTR and MFCTR, which are well-known in the art. ("CTR"). When these instructions are executed, the recon These instructions move values between these registers and struction algorithm must determine the value of the LR or general purpose registers within microprocessor 100. LR CTR to calculate the branch target address. While the 108 and CTR 109 are also used by the BCCTR and BCLR hardware could broadcast the value of the LR or CTR each 10 branch instructions as branch targets, or as in the case of the time a branch to LR ("bclr") or a branch to CTR ("bcctr") BCL, BCLRL, or BCCTRL, LR 108 stores the return is executed, a preferred solution is to have the reconstruction address to be used at a later time. Again, such instructions software track the values of the LR and CTR, and the CPU are well-known in the art. IAR 110 is an internal processor broadcasts only changes to the LR and CTR that could not resource that is used to keep track of the instruction address be determined by inspection of the program listing. These 15 that is currently being executed. As a result of the above, changes are merely the executions of move-to-link-register registers 108-110 are physically accessible by the present ("mtlr") and move-to-count-register ("mtctrº) instructions; invention in well-known manners. these occur much less frequently than bcctr or belr opcodes. Mux 114 multiplexes contents from LR 108, CTR 109 and Second, if the CPU accepts an interrupt and reloads the IAR 110 for input into FIFO 102, which is a trace FIFO used instruction address register ("IAR") with an exception han- 20 to store trace address information for later output to the trace dling vector, that information must be broadcast. tool. Third, some means of signaling the clock cycles on which Mux 114 and FIFO 102 may consist of commercially TEs are recognized by the CPU is needed. available multiplexers and FIFOs, which are known to those Finally, note that since the data being broadcast from the skilled in the art. IC on a clock-by-clock basis is a description of the state 20 Trace serialization logic ("Serial Circuit") 115 serializes changes from one clock cycle to the next, at some point one the trace FIFO data received from FIFO 102 for serial must require a complete initial state (ie., the contents of the broadcast over a 4-bit bus 119 to the trace tool. IAR, LR, and CTR) from which to start a trace reconstruc- Debug logic circuit 104 provides an interface in-between tion. Any such clock cycles for which the values of the IAR, circuit 10 and a user for allowing various trace events to be LR, and CTR registers are available to the reconstruction enabled. Trace events may also be enabled via software software are called "synchronizing events" ("SES"); these executed within the data processing system employing cir- clock cycles provide the starting points for any trace recon- cuit 10 via bus 116. struction The creation of an SE requires two mechanisms. One to Referring to FIG. 1, there is illustrated a block diagram of as determine which clock cycles to designate as SEs, and one one embodiment of the present invention. Integrated circuit to provide the value of the IAR 110, LR 108, and CTR 109 10 includes logic for performing the tracing of program code registers at the point of the SE to the reconstruction soft- running out of an embedded cache (instruction cache 101) ware. within microprocessor 100. The invention disclosed within the cross-referenced Shadow lines 10 embody elements of the present inven- u embody elements of the present inven- 40 patent application created a single SE by using the TE to tion which may be incorporated on a single silicon chip. determine on which clock cycle the SE occurs; they were Microprocessor 100 may comprise any one of the numer defined to be the same. When the first TE was recognized, ous commercially available microprocessors, e.g., the Pow the values of the IAR 110, LR 108, and CTR 109 registers erPC microprocessor, model no. PPC403GA, available from were stored in registers in the CPU dedicated to this purpose. IBM Corporation, the assignee of the present invention. It is 45 The CPU then signaled the occurrence of the TE so the to be assumed that microprocessor 100 contains all thc usual external trace acquisition software could be directed to save and well-known microprocessor elements and functionality the broadcast of trace information for the TE and the clock and performs in the usual manner. Microprocessor 100 cycles that immediately follow it. At some later time after includes embedded instruction cache 101; microprocessor the code being traced had executed, the reconstruction 100 can execute code residing in cache 101, or an on-chip 50 Software could get the values of the IAR 110, LR 108, and memory, without accessing external memory 604 (see FIG. CTR 109 from the SE from the dedicated registers on the 6) through external bus 116. CPU. Link register ("LR") 108 is an architected register used to This solution had three problems as noted above. The provide a branch target address for a "branch conditional to present invention separates the generation and broadcast of link register" instruction, and to hold the return address after 55 SEs from the recognition and broadcast of the TEs. It does "branch and link" instructions. Count register ("CTR") 109 so by using an SE counter 120 to generate an SE is an architected register used to hold a loop count that can periodically, for example every N clock cycles. This tech- be decremented during execution of "branch" instructions nique can be used to provide the ability to trace the execu- that update this register. CTR 109 is also utilized to provide tion of an arbitrary number of instructions before a TE. Any the branch target address for a "branch conditional to count 60 number of TEs can be signaled and counted by the external register" instruction. trace acquisition logic before beginning to retain the broad- Instruction address register ("IAR") 110 (commonly cast data from which a trace will be constructed using the known as the program counter) is a register that contains the present invention. Reconstruction is no longer required to address of the current instruction being cxccuted within begin on the first (or any) of thc TЕs; it begins with an microprocessor 100 at any one point in time. 65 arbitrarily chosen SE instead. CTR 109 is typically used as a counter for FOR-DO loops Further, the present invention does not use dedicated or as an alternative to subroutine returns within micropro registers to store the values of the IAR 110, LR 108, and BC_GEN_0002996 6 5,996,092 CTR 109 registers for the SEs, but broadcasts them via the data are broadcast serially on three of TS pins 119, 3-bits at same method used to broadcast other information such as a time over the course of 10 cycles. (The fourth pin of pins execution of mtlr or mtctr and vectoring to exception 119 is a "1" when broadcasting address information; see the routines. coding table below.) The reconstruction process can deter- The exact information required to be broadcast depends 5 mine the cause of the broadcast by analysis of the program on the architecture of the processor being traced. The present listing that will show mtlr and mtetr instructions, and the implementation example uses seven I/O pins to broadcast execution status, which will indicate a vector to an interrupt enough information to reconstruct a trace. handler. The choice of the number of TS pins 119 is a compromise between the amount of bandwidth required and Three of the seven pins encode the execution status ("ES") of a two-way superscalar CPU. These ES pins are 10 the cost of adding dedicated pins to IC 10; this implemen- pins 118 in FIG. 1. One could use only two pins for a tation has four TS pins 119, but a design could be proposed single-issue machine, or even more pins for a machine with with a few more or less that would not be conceptually different. a more complicated execution model. There is no particular Referring back to FIG. 1, multi-word first-in-first-out preference as to which symbol represents what CPU state 15 ("FIFO") buffer 102 allows several broadcasts to be queued information; any assignment that covers all the required states is acceptable. The ES information may be binary- in the case of a "burst" of mtlr/mtetr instructions, i.e., the case of executing such an instruction before the previous encoded for each cycle as follows: broadcast is completed. If FIFO 102 is completely full when 000-no instructions were executed on this clock cycle; CPU 100 needs to make an entry to be broadcast, CPU 100 001-an interrupt occurred, transferring execution to an 20 must halt execution (stall) until the oldest entry in FIFO 102 exception vector address; has been broadcast and removed from FIFO 102. Correct 010— only first instruction available executed and it was operation of the stall program and the ability to trace that not a taken branch; program are assured in this case, but the user will see a 011-only first instruction available executed and it was performance degradation. Thus, while the depth of FIFO a taken branch; 25 102 is arbitrary with regard to correct logical function, too 100—wo instructions executed; neither was a taken few locations will degrade performance, and too many branch; locations will waste space on IC 10. For purposes of providing an example, but not meant to limit the implemen- 101-two instructions executed; the first was a taken tation of the present invention, statistical analysis of typical branch; 30 PowerPC code has shown that the choice of 8 locations 110two instructions executed; the second was a taken within FIFO 102 renders insignificant the probability of branch; stalling CPU 100 due to a full trace FIFO 102. 111-two instructions executed; both were taken As noted within FIG. 2, if an mtlr instruction is being branches. executed in CPU 100, then at step 203, the process moves to Referring to FIG. 4, there is illustrated a flow diagram of 35 step 204 whereby the value placed in LR 108 by the how ES information is broadcast from IC 10. In step 41, execution of the mtlr instruction is also placed into the status information is received from microprocessor 100 by ENTRY for loading into FIFO 102. control logic 103. Such status information may include the FIG. 2 illustrates that both a value (ENTRY) and a type execution of an instruction, the direction of any executed (TYPE) are entered into FIFO 102 as a pair, and when they branches, and the taking of any exception vectors. Next, in 40 leave FIFO 102, the TYPE is used to notify serialization step 42, control logic 103 encodes the received status logic 115 of which codes or counter values (if any) to prefix information using the encoding noted above. Then, in step to the broadcast of the ENTRY onto TS pios 119. 43, this encoded execution status information is output along In step 201, there is a determination of whether or not an bus 105 through driver 107 onto pins 118 to the trace tool SE event has occurred. If not, the process merely proceeds (see FIG. 9). This information is continuously provided on 45 to step 203. However, if an SE event has occurred, then in pins 118. step 202, parameters SE-IAR-PENDING, SE-LR- This ES information is sufficient to determine what PENDING, and SE-CTR-PENDING are made equal to 1. instructions are executed and which ones are taken branches An SE event may be determined when SE counter 120 on each cycle. It is not enough to completely trace instruc reaches a predetermined value. tions within microprocessor 100. As noted above, the trace 50 Next, in step 203, if an mtlr instruction has been executed reconstruction software process has access to the object code in CPU 100, then as described abovc, thc process moves to that is being executed, so it can use the information provided step 204 to place the value of LR 108 into the ENTRY, and on the ES pins 118 to follow in-line instructions and taken to designate the TYPE as REGULAR. The same is true for branches whose targets are specified by the instructions steps 205 and 207 with respect to the MTCTR and exception themselves. However, the trace reconstruction software 55 causing instructions being executed in CPU 100. If an must also be able to determine the value of the LR 108 or MTCTR instruction has been executed and completed, then CTR 109 registers during any clock cycle in which a branch this value is placed in the ENTRY and the TYPE is desig- to one of those targets occurs, changes in program flow due nated as REGULAR in step 206. Likewise, in step 208, if an to exceptions, when trigger events occur, and what the initial exception causing instruction has been executed, then the state of registers 108-110 are for the initial cycle of trace 60 IAR value is placed in the ENTRY and the TYPE is reconstruction (i.e., a specific SE occurrence). designated as REGULAR. Pins 119 are referred to as the trace status ("TS") pins, and If none of these instructions in steps 203, 205, and 207 are used to broadcast information that is required in addition have been executed in CPU 100, then the process proceeds to the cycle-by-cycle status provided by ES pins 118. Note to step 209 to determine whether or not SE-IAR-PENDING that execution of mtlr, mtetr, and interrupt responses occur 65 equals 1, indicating that an SE event has occurred (see step relatively infrequently, but they require the processing of a 201). If yes, the process proceeds to step 210 to place the 30-bit instruction address. Therefore, each of these pieces of value in IAR 110 into the ENTRY and to designate its TYPE BC_GEN_0002997 6 5,996,092 as SE-IAR. Additionally, the value SE-IAR-PENDING is SE information is also broadcast on TS pins 119 using returned to a 0 value. Furthermore, offset counter 122 is FIFO 102 in the same manner as information regarding mtlr, started. mtctr, and exception vectors are. In one embodiment, SES The process will then proceed to step 215 to determine are generated periodically by control logic 103 in response whether or not FIFO 102 is full, if so, step 215 will be 5 to a continuously running counter 120, which may be recycled until FIFO 102 is not full when the process will clocked by the same clock as CPU 100. Alternatively, the proceed to step 216 to enter the ENTRY and TYPE into Ses could be generated by some other means such as an FIFO 102. external input. The process then returns to step 201, and will proceed Whenever the value of SE counter 120 matches a prede- down to step 211 if no SE event has occurred and MTLR, 10 termined value (e.g., 0), an SE is generated. The "genera- MTCTR, and exception causing instructions have not been tion" of an SE is defined as setting the SE-IAR-PENDING, completed. In step 211, since the SE-LR-PENDING value is still equal to 1, the process will proceed to step 212 to enter etc. flags. the value of LR 108 into the ENTRY of FIFO 102 along with All broadcasts of SE addresses are preceded by codes on the TYPE designated as equal to SE-LR. The value SE-LR- LR TS pins 119 that identify the types of the broadcast. The PENDING will be returned to 0. 15 specific encoding of pins 119, including encoding of TES The foregoing process will also occur with respect to and other events, may be as follows: steps 213 and 214 for entering the value of CTR 109 into 0000-no broadcast FIFO 102. 0001--reserved The flow diagrams illustrated in FIGS. 2 and 7 may be 0010_-processor is in wait state utilized by one skilled in the art to design FIFO 102. 20 Microprocessor 100 includes hardware to recognize cer- 0011-processor is in stop state tain TEs including, but not limited to, the execution of 0100—trigger event (TE) certain instructions or access of data at predefined addresses 0101'SE IAR code-proceeds counter value plus SE-IAR stored in dedicated registers on microprocessor 100. broadcast Essentially, a user sets up a trace by directing the circuitry 25 0110-SE LR code-proceeds SE-LR broadcast within chip 10 to broadcast a TE when certain conditions 0111-SE CTR code-proceeds SE-CTR broadcast occur. This is performed by control logic 103 monitoring such addresses and control within microprocessor 100 and 1xxx-address broadcast (for SEs, mtlr, mtctr, exception performing a comparison with an cvcnt designated by the vectors) user through debug circuit 104. Referring to FIG. 5, this 30 (xxx)—are three bits of a 10-cycle serial broadcast of an process begins with step 51 where a TE is recognized. Then, address in step 52, the recognized TE is encoded as shown in the Note: "stop" and "wait" states are debugging and power- table below (e.g., 0100). In step 53, this encoded recognized down states of CPU 100. Users may wish to know that CPU TE is sent to serial logic 115 for broadcast on pins 119. 100 is in one of these states, so this implementation provides Generally, the external acquisition system will recognize the 35 this information on TS pins 119. CPU 100 does not execute symbol for the TE (0100) and cause the external trace buffer instructions in these states, and so for purposes of this (see FIG. 8) to save data in the temporal vicinity of the TE. invention, these encodings may be irrelevant. For example, if one uses a logic analyzer 91 (see FIG. 9) When the IAR 110 value for the SE is placed into FIFO with a buffer depth of 2000 clocks to capture the trace data, 102, offset counter 122 begins counting up from 0. When the one might program analyzer 91 to save the data from the 40 IAR 110 value for the SE is to be broadcast from FIFO 102, clocks from 1000 clocks before the TE until 1000 clocks the value of offset counter 122 is broadcast after the IAR SE after the TE. The broadcast of the TE is a little different than code and before the IAR address data. Since the value of the the broadcast of all the other information on TS pins 119 in offset counter 122 is the number of cycles since the SE was that it does not enter FIFO 102. Instead, the code (0100) for placed into FIFO 102, the reconstruction software can relate the TE is placed on TS pins 119 in the clock cycle imme- 45 the cycle on which the IAR broadcast appears on TS pins diately after the clock cycle in which the TE is recognized. 119 to the cycle in which the SE entered FIFO 102. Hence, And, if data is in the process of being broadcast from FIFO it can determine the IAR 110 value associated with a specific 102, that broadcast is deferred for the one clock cycle cycle of data from ES pins 118, and begin trace reconstruc- occupied by the broadcast of the TE code. This policy allows tion from that cycle. the TE to be related directly to the data on ES pins 118 so 50 Referring next to FIG. 7, there is illustrated a flow that the reconstruction software can discorn what instruction diagram of this process, which may be implemented within was executing when the TE was signalled. control logic 103. In step 701, a determination is made Referring next to FIG. 3, there is illustrated a flow whether or not FIFO 102 is empty. If yes, the process simply diagram of this process implemented within serial logic 115. returns upon itself. However, if FIFO 701 is not empty, then The process proceeds to step 301 to determine whether or 55 in step 702, a determination is made whether or not the not an encoded TE has been received from control logic 103. previous serialization has been completed. If not, the process If not, the process forwards to step 304. However, if an recycles upon itself. However, if the previous serialization is encoded TE has been received, then the process proceeds to complete, the process proceeds to step 703. In step 703, the step 302 wherein sending of serialized data to TS pins 119 ENTRY and TYPE are read from FIFO 102 into serialization is deferred. Then in step 303, the encoded TE signal (0100) 60 logic 115 (see FIG. 3). Then in step 704, if the TYPE is is sent on pins 119. REGULAR (see FIG. 2), the process proceeds to step 708 to In step 304, a determination is made whether or not there send the ENTRY for serialization and transmission along TS is any serialized data available to send onto TS pins 119. If pins 119. The process then returns to step 701. not, the process returns to step 301. However, if there is However, if in step 704 the TYPE is not REGULAR, the serialized data available, the process proceeds to step 305 to 65 process proceeds to step 705 to determine whether or not the send this serialized data to TS pins 119. The process then TYPE is equal to SE-IAR (see step 210). If yes, the process returns to step 301. proceeds to step 709 to send the SE-IAR code (0101), the BC_GEN_0002998 6 5,996,092 11 12 offset counter value (see step 210 in FIG. 2), and the IAR forth herein, and a reconstruction algorithm can be used to ENTRY to serialization logic 115 (see FIG. 3). reconstruct the code flow from the captured trace informa- If in step 705, the TYPE is not equal to SE-IAR, the tion. A typical trace tool might interface to debug logic 104 process proceeds to step 706 to determine whether or not the via an IEEE Std. 1149.1-1990 Std. Interface (JTAG 117), TYPE is cqual to SE-LR. If yes, then in stcp 710, the SE-LR and would monitor trace pins 118 and 119. code (0110) and the LR ENTRY are sent to serialization Referring next to FIG. 6, there is illustrated a data logic 115 (see FIG. 3). processing system operable for implementing the present If in step 706, the TYPE is not equal to SE-LR (see step invention. Processor 100 is coupled via bus 116 to random 212 of FIG. 2), then the process proceeds to step 707 where access memory 604, permanent storage 622, optional com- the TYPE is equal to SE-CTR (see step 214 in FIG. 2). The munications adapter 606, which enables communication process proceeds to step 711 to send the SE-CTR code 10 with other systems, input/output controller 612, which con- (0111) and the CTR ENTRY to serialization logic 115 (see trols interaction with video display 164, keyboard 616, FIG. 3). pointing device 618, disk controller 620, which controls The following analyzes the relationship of an SE, the interaction between processor 100 and permanent storage cxternal trace acquisition buffer depth and the minimum 622. The devices disclosed arc typically available compo- number of cycles before the desired TE for which a trace can 15 nents. A removable diskette or an optical drive could be used be reconstruction. in place of a magnetic drive for permanent storage 622 and As noted above, it is desirable to begin trace reconstruc processor 100 could be comprised of a number of processing tion on some cycle before thc TE. Tracc reconstruction can cngincs in a multiprocessor or parallel processing architec- begin with any cycle held in the trace acquisition buffer 91 ture. for which one can determine the initial state of the machine, Although the present invention and its advantages have i.e., the contents of IAR 110, LR 108, and CTR 109. These 2 been described in detail, it should be understood that various cycles are those previously designated as synchronizing changes, substitutions and alterations can be made hercin events ("SES"). The solution described within the cross without departing from the spirit and scope of the invention referenced patent application had only one SE, which was as defined by the appended claims. the same as the first TE. The present invention has multiple What is claimed is: SEs, gcncratcd and broadcast periodically. 25 1. A method for tracing real-time program execution The problem, then, is to guarantee the generation of an SE within a processor, said method comprising the steps of: cycle some number of cycles before an event of interest, that periodically generating one or more synchronizing events is, the trigger event. Then one can trace from the SE to the occurring in said processor, wherein said one or more TE, cffcctively tracing the CPU opcration before thc TE. synchronizing events signify a state of said processor; Referring next to FIG. 8, there is shown one example of 30 detecting a triggering event occurring in said processor; trace acquisition buffer 91 shown in FIG. 9. In order to guarantee that there is even an SE in trace buffer 91 at all, tracing instructions occurring after said one or more the periodicity of the Ses should be less than or equal to the synchronizing events and before and after said trigger- depth of trace buffer 91. For example, if trace buffer 91 has ing event; and some number of entries N, and the SEs occur every N cycles, 35 providing said one or more synchronizing events, trigger- a simple implementation might be to capture blocks of N ing event, and traced instructions externally from said clocks beginning with each SE cycle, and retaining the block processor. for reconstruction if the desired TE is detected within the 2. The method as recited in claim 1, further comprising saved block. This solution may not guarantee any arbitrary the steps of: number of clocks to be traced before the occurrence of the storing said one or more synchronizing events and said TE, since the TE may be at or near the beginning of the traced instructions in a FIFO; period between start cycles. serializing said one or more synchronizing events and said One alternative solution is to cause a periodic SE fre- traced instructions; and quently enough to insure that multiple SEs will be evenly transmitting said serialized one or more synchronizing distributed in trace acquisition buffer 91. Note that a trace can be reconstructed beginning from any of them. As an 45 events and traced instructions to a trace tool. example, suppose that an SE is generated every N cycles, 3. Thee method as recited in claim 2, wherein said step of and the depth of trace acquisition buffer 91 is 2N. If the periodically generating one or more synchronizing events buffer 91 locations are designated from 0 to 2N-1, and it is further comprises the step of: assumed that the trace entries are kept in temporal order copying contents of one or more registers embedded from 0 to 2N-1 as well, and the data at location 2N-1 is that 50 within said processor to said FIFO in response to a which is collected in the last cycle, and the data in location counter value. O is that which is collected 2N cycles previous, then after a 4. Acircuit for tracing real-time program execution within TE is recognized, trace buffer 91 stops acquiring new data a processor comprising: when the older SE reaches location 0. Then there will be 2 circuitry for periodically generating one or more synchro- SEs in buffer 91, one at location 0 (the oldest instruction) ss nizing events occurring in said processor wherein said and one at location N, or about halfway through buffer 91. one or more synchronizing events signify a state of said TE is captured somewhere in the second half of buffer 91, processor; and since one can trace from the older SE to the end of buffer circuitry for detecting a triggering event occurring in said 91, the ability to trace at least N cycles before the TE is processor; guaranteed. More generally, if an SE is caused every N cycles, and 60 60 circuitry for tracing instructions occurring after said one there is a trace buffer depth of mN, then the ability to trace or more synchronizing events and before and after said up to (m-1)N cycles before the TE may be guaranteed. triggering event; and Referring next to FIG.9, there is illustrated an example of circuitry for providing said one or more synchronizing a trace tool coupled to pins 118 and 119. Trace acquisition events, triggering event, and traced instructions exter- buffer 91 is coupled to debugging workstation and support- 65 nally from said processor. ing software 92. Any well-known trace tool may be used to capture the appropriate trace information in the manner set BC_GEN_0002999 EXTERNAL BUS ----- 104 U.S. Patent - - - - - 100 101 103 10 DEBUG INSTRUCTION CACHE - - - - - - - - - - - - CONTROL LOGIC LR CTR IAR 108 109 110 Nov. 30, 1999 114 - - - - - - - - - - - - - 105 6 102 SE COUNTER 120 FIFO Sheet 1 of 8 OFFSET COUNTER 122 115 107 DRIVER SERIAL 118 [0:2] 119 7 [0:3] TO TRACE TOOL 5,996,092 TO TRACE TOOL FIG. 1 BC_GEN_0002986 6 U.S. Patent Nov. 30, 1999 Sheet 2 of 8 5,996,092 201 202 YES SE - EVENT? SE - IAR - PENDING = 1 SE - LR - PENDING = 1 SE - CTR - PENDING = 1 NO 204 YES MTLR ENTRY = LR, TYPE = REGULAR 206 NO YES MTCTR ENTRY = CTR, TYPE = REGULAR 208 NO YES EXC ENTRY = IAR, TYPE = REGULAR 210 NO YES SE - IAR - PENDING=1 ENTRY = IAR, TYPE = SE - IAR, SE-IAR - PENDING = Ø; START OFFSET CTR | 212 YES SE - LR - PENDING=1 ENTRY = LR, TYPE = SE - LR SE - LR - PENDING = 0 NO,214 213 YES SE - CTR - PENDING=1 ENTRY = CTR, TYPE = SE - CTR SE-CTR - PENDING = 8 NO 215 YES FIFO FULL NO 216 FIFO — ENTRY, TYPE FIG. 2 BC_GEN_0002987 6 U.S. Patent Nov. 30, 1999 Sheet 3 of 8 5,996,092 NO ENCODED TE RECEIVED ? ANY SERIALIZED DATA AVAILABLE? YES 301 304 YES DEFER SENDING SERIALIZED DATA TO TS PINS SEND SERIALIZED DATA TO TS PINS 305 SEND TE CODE (0100) TO TS PINS 303 FIG. 3 BC_GEN_0002988 6 U.S. Patent Nov. 30, 1999 Sheet 4 of 8 5,996,092 41 RECEIVE STATUS INFORMATION ENCODE STATUS INFORMATION OUTPUT ENCODED STATUS INFORMATION FIG. 4 RECOGNIZE TRIGGER EVENT ENCODE RECOGNIZED TRIGGER EVENT SEND ENCODED RECOGNIZED TRIGGER EVENT TO SERIAL LOGIC FIG. 5 BC_GEN_0002989 6 U.S. Patent U.S. Patent Nov. 30, 1999 Nov. 30, 19 Sheet 5 of 8 set sors 5,996,99 5,996,092 100 604 606 CPU MEMORY COMM. ADAPT. 620 612 116 DISK CONTROL 10 CNTL 616 FIG. 6 BC_GEN_0002990 6 U.S. Patent U.S. Patent Nov. 30, 1999 Home Bu, shetot a 5,96,992 Sheet 6 of 8 5,996,092 701 YES FIFO EMPTY 702 NO PREVIOUS SERIALIZATION COMPLETE YES 703 READ ENTRY, TYPE FROM FIFO 704 708 YES TYPE = "REGULAR" SEND ENTRY TO SERIALIZATION NO 705 709 YES TYPE = SE - IAR SEND - SE-IAR CODE (0101) - OFFSET CTR VALUE - ENTRY (IAR) TO SERIALIZATION LOGIC NO 706 YES 710 TYPE = SE - LR SEND - SE - LR CODE (0110) - ENTRY (LR) TO SERIALIZATION LOGIC 711 TYPE = SE - CTR SEND - SE - CTR CODE (0111) - ENTRY (CTR) TO SERIALIZATION LOGIC FIG. 7 BC_GEN_0002991 6 U.S. Patent Nov. 30, 1999 Sheet 7 of 8 5,996,092 2N-1 TE N CYCLES FIG. 8 BC_GEN_0002992 6 U.S. Patent Nov. 30, 1999 Sheet 8 of 8 5,996,092 ES (0:2] TS (0:3] TRACE AQUISITION BUFFER (LOGIC ANALYZER) L 91 L 92 DEBUGGING WORKSTATION (AND SUPPORTING SOFTWARE) FIG. 9 BC_GEN_0002993