https://download.csdn.net/download/qccz123456/10567835
https://pan.baidu.com/s/1sloNYyx
Intel® SDK for OpenCL™ Applications for Windows* Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to develop OpenCL applications with Visual Studio IDE.
The plug-in supports the following features:
· New project templates
· New OpenCL code file (*.cl) template
· Syntax highlighting
· Types and functions auto-completion
· Offline compilation and build of OpenCL kernels
· LLVM code view
· Assembly code view
· Program IR generation
· Selection of target OpenCL device - CPU or Intel Graphics
To work with the plug-in features, create an OpenCL project from a template or convert an existing project into an OpenCL project.
See Also
· Converting Existing Projects into OpenCL Code Projects
· Creating and Building
· Kernel Development Framework
· Debugging OpenCL™ Kernels on CPU
· Debugging OpenCL™ Kernels on GPU
· API Debugging for GPU and CPU
· Code Analyzer
· Configuring Microsoft Visual Studio* IDE
· Converting Existing Project into OpenCL™ Project
· OpenCL™ New Project Wizard
· Building OpenCL™ Project
· Using OpenCL™ Build Properties
· Selecting Target OpenCL™ Device
· Generating and Viewing Assembly Code
· Generating and Viewing LLVM Code
· Generating Intermediate Program Binaries in Visual Studio*
· Configuring OpenCL™ Build Options
To configure the OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE, do the following:
1. In Visual Studio, select Project > Properties.
2. In the C/C++ > General property page, under Additional Include Directories, enter the full path to the directory where the OpenCL header files are located:
$(INTELOCLSDKROOT)\include
3. In the Linker > General property page, under Additional Library Directories, enter the full path to the directory where the OpenCL run-time import library file is located. For example, for a 64-bit application:
$(INTELOCLSDKROOT)\lib\x64
4. In the Linker > Input property page, under Additional Dependencies, enter the name of the OpenCL ICD import library file, OpenCL.lib.
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to convert a standard C/C++ project to an OpenCL project and vice versa.
To convert your project, do the following:
1. In the Solution Explorer, right-click the project you want to convert.
2. In the context menu, click Convert to a project for OpenCL API.
OpenCL™ New Project wizard is a plug-in for Microsoft Visual Studio* software that enables developing Windows* and Android* OpenCL applications with Visual Studio IDE, either from scratch (empty project) or based on template projects.
The wizard kit supports the following features:
· Creating an Empty OpenCL™ Project for Windows
· Create a New OpenCL™ Project from OpenCL Project Template for Windows*
· Create a New OpenCL™ Project from OpenCL Project Template for Android*
To create an empty OpenCL™ project for Microsoft Visual Studio* IDE, do the following:
1. Go to File > New > Project...
2. Select OpenCL templates from the Templates tree view.
3. Select Empty OpenCL Project for Windows.
4. Fill in the Name, Location, and Solution name fields and click OK.
To create an OpenCL™ template project for Windows platforms in Microsoft Visual Studio* IDE, do the following:
1. Go to File > New > Project...
2. Select OpenCL templates from the Templates tree view.
3. Select Code Builder Project for Windows.
4. Fill in the Name, Location, and Solution name fields and click OK.
5. In the Code Builder wizard for OpenCL API dialog, select the basic settings for the behavior of the OpenCL application and kernel. You can set the platform name, device type, kernel type (image or buffer manipulation), build options, and local work-group size behavior. Each field has a short tooltip explanation.
6. Click Finish to create the default template project. Alternatively, click Next to open the Advanced Settings screen. There you can set advanced options, such as whether to enable a profiling queue and the memory source type of the kernel's arguments. For the CPU device type, you can also set the out-of-order execution mode and the debug mode for the kernel.
7. Click Finish to create the template project, or click Previous to return to the Basic Settings screen.
To build the solution using the OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE, click Build > Build Solution.
When you build the solution, the Intel OpenCL compiler automatically builds the attached OpenCL kernels. See the build results in the Output window of the Microsoft Visual Studio IDE.
The OpenCL Build properties page in the Microsoft Visual Studio* IDE enables you to set compilation flags and change the target device when building an OpenCL kernel. To change the settings, do the following:
1. Go to Project > Properties.
2. Click the Intel SDK for OpenCL Applications entry under the Configuration Properties group.
3. Modify properties and click OK.
The Intel® SDK for OpenCL™ Applications entry exists only for OpenCL projects with *.cl source files attached. If the entry does not exist, convert the existing standard project into an OpenCL project.
Creating an Empty OpenCL™ Project
Converting Existing Project into OpenCL Project
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to choose the target device when building your OpenCL code:
· Intel CPU
· Intel® Graphics
· Intel CPU on Experimental OpenCL 2.1 Platform
The default device is CPU.
To choose a target device, do the following:
1. Go to Project > Properties.
2. Click Configuration Properties > Intel SDK for OpenCL Applications > General.
3. Change the Device option according to your needs.
4. Click OK.
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating an assembly representation of the OpenCL code. To enable generating and viewing the assembly code, do the following:
1. Go to Project > Properties.
2. Click Configuration Properties > OpenCL Code Builder > General.
3. Modify the properties as needed, and click OK.
After the build, you can open the generated assembly file in the Visual Studio editor by double-clicking the message in the Output view.
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating an LLVM representation of the OpenCL code. To enable generating and viewing LLVM code, do the following:
1. Go to Project > Properties.
2. Click Configuration Properties > Intel SDK for OpenCL Applications > General.
3. Change the Generate LLVM Code option to Yes.
After the build, you can open the generated LLVM file in the Visual Studio editor by double-clicking the message in the Output view.
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating program binaries of the OpenCL code.
An application can later use the generated program binaries to create a program object from binaries (clCreateProgramWithBinary(...)). To generate intermediate program binaries, do the following:
1. Go to Project > Properties.
2. Click Configuration Properties > Intel SDK for OpenCL Applications > General.
3. Change the Create Program Binary option to Yes.
4. Click OK.
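For illustration, here is a minimal C sketch of the first half of that workflow: reading a generated binary file back into memory. The helper name `load_program_binary` is hypothetical. The sketch deliberately stops before any OpenCL call, so it compiles without the SDK. The buffer and size it returns are what `clCreateProgramWithBinary` takes through its `binaries` and `lengths` arrays.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a heap buffer.
 * On success, returns the buffer and stores its size in *out_size;
 * the caller must free() it. Returns NULL on any I/O or allocation error. */
unsigned char *load_program_binary(const char *path, size_t *out_size)
{
    assert(path != NULL && out_size != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end and back. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *out_size = (size_t)len;
    return buf;
}
```

In a host program, you would then do the following:
1. Pass `&buf` and `&size` for a single device to `clCreateProgramWithBinary`.
2. Check the returned `binary_status`.
3. Call `clBuildProgram`.
4. Call `free()` on the buffer when it is no longer needed.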
OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables configuring build options for the OpenCL code. To configure the build options, do the following:
1. Go to Project > Properties.
2. Click Configuration Properties > Intel SDK for OpenCL Applications > General.
3. Add build options to the Additional build options line.
4. Click OK.
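The Additional build options line takes the same kind of string that `clBuildProgram` receives in its `options` argument. That is a space-separated list of compiler flags, such as `-cl-fast-relaxed-math`, and preprocessor definitions of the form `-D name=value`. If you generate that string programmatically, a small C sketch like the following keeps it well-formed and bounded. The helper names `append_option` and `append_define` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends one option to the space-separated options
 * string held in buf (capacity cap, NUL-terminated).
 * Returns 0 on success, -1 if the result would not fit. */
int append_option(char *buf, size_t cap, const char *opt)
{
    assert(buf != NULL && opt != NULL);
    size_t used = strlen(buf);
    size_t need = used + (used > 0 ? 1 : 0) + strlen(opt) + 1; /* sep + text + NUL */
    if (need > cap)
        return -1;
    if (used > 0)
        buf[used++] = ' ';
    strcpy(buf + used, opt);
    return 0;
}

/* Appends a preprocessor definition of the form "-D NAME=VALUE". */
int append_define(char *buf, size_t cap, const char *name, int value)
{
    char tmp[128];
    int n = snprintf(tmp, sizeof tmp, "-D %s=%d", name, value);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    return append_option(buf, cap, tmp);
}
```

The resulting string, for example `-cl-fast-relaxed-math -D BLOCK_SIZE=16`, can be typed into the Additional build options line or passed to `clBuildProgram` directly.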
Kernel Development Framework is a native integrated development environment in the Microsoft Visual Studio* IDE that enables you to build and analyze OpenCL™ kernels.
The framework supports Intel® Architecture processors, Intel Processor Graphics, and remote development on Android* devices. The tool provides full offline OpenCL language compilation, which includes:
· OpenCL syntax checker
· Cross-platform compilation
· Low Level Virtual Machine (LLVM) viewer
· Assembly code viewer
· Intermediate program binary generator
The feature also provides a way to assign input to the kernel, test its correctness, and analyze kernel performance based on group size, build options, and target device.
· Kernel Development Framework Session
· Building and Compiling OpenCL™ Program
· Build Artifacts
· Code Builder Build Toolbar
· Analyzing the Kernel
· Variable Management
Work in the Kernel Development Framework is managed through sessions. To create, build, or analyze an OpenCL™ kernel, you need to create a session.
A session contains:
· A file with an OpenCL program
· Build artifacts:
o Generated LLVM code
o Assembly code
o Intermediate binary files
· OpenCL kernels with assigned variables
· Analysis reports
There are two methods for creating a session, depending on the file type that you are working on:
· For a *.cl file, you need to create a session
· For a *.exe file, you need to generate a session
Creating a New Session
Generate a Session
Creating Session from Existing OpenCL Code
Saving and Loading Sessions
Removing Sessions
Configuring Sessions
Code Builder Configuration Toolbar
Configuring Device Options
Configuring Build Options
Configuring General Options
To show the toolbar, go to VIEW > Toolbars and make sure that the Code Builder Build option is checked.
You can use the Code Builder Build toolbar to perform basic operations on sessions such as:
· Create new session
· Open session
· Save session
· Build session
· Compile session
· Open session's settings dialog
· Generate a new session from an existing OpenCL application
· Generate C++ code from an existing session
Creating a New Session
Creating a Session from Existing OpenCL Code
Generate a Session from an OpenCL Application
This topic explains how to create and build a session.
1. Go to CODE-BUILDER > OpenCL Kernel Development > New Session or click the New Session button in the toolbar.
2. Specify the following:
· the session name
· the path to the folder in which to store the session file
· the content of the session, either an empty session or one with pre-defined OpenCL code
3. Click Done.
Once the session is created, the new session appears in the Code Builder Session Explorer view.
If you don't see the Code Builder Session Explorer dialog, go to CODE-BUILDER > OpenCL Kernel Development > Windows > Code Builder Session Explorer.
Right-click the session in the Code Builder Session Explorer and select Build Session.
After the build is complete, you can see the following new items:
· The Build Artifacts folder in the session shows the related build artifacts that were created during the session build.
· The Kernels folder in the session shows the kernels that the session build found in the program.
· The Code Builder Analysis Input pane appears with the Input configuration and the Kernel Arguments.
The Kernel Development Framework enables you to create a session from an existing project that contains OpenCL™ code files. If you have a project in Microsoft Visual Studio* that contains such file(s), do the following:
· Right-click the OpenCL file and select Create Code Builder Session.
· A new session is created and becomes available in the Code Builder Session Explorer dialog.
The Generate Session feature enables you to generate a KDF session from an existing, working executable file of an application that uses OpenCL APIs. This enables you to rerun a single kernel.
To create a KDF Session:
1. In Microsoft* Visual Studio*, go to CODE-BUILDER > OpenCL Kernel Development > Generate Session.
2. Select an application and its working directory.
3. Optionally, you can configure the following:
o Add application arguments in the Arguments field
o Change the output directory of the created sessions in the Output Directory field.
o Set the number of configurations generated per kernel in the Number of instance per NDRange field.
By default, this is set to 1. In this case, if there is more than one instance of NDRange with the same global and local work-group sizes, only one configuration is created.
o Load the created sessions into the KDF. By default, this checkbox is checked.
· The Session Generator generates regular KDF sessions. It records any data needed to rerun the kernels:
o the *.cl input file
o build options
o device selection (CPU/GPU)
o assigned arguments and their data
o global/local work-group sizes
o target architecture (32/64 bit)
· The created variables are saved locally within the session.
· Every use of the clBuildProgram API in the application creates a unique session file.
· Programs that are not used in the application do not create a session file.
· Every use of the clEnqueueNDRangeKernel API in the application creates a kernel configuration (see the Creating a New Analysis Configuration section).
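To make the Number of instance per NDRange rule concrete, the following C sketch models it. Each recorded `clEnqueueNDRangeKernel` call becomes a configuration unless its exact global/local size combination has already been kept `max_instances` times. This is an illustrative model of the documented behavior, not the Session Generator's actual implementation. The `ndrange_t` record and `select_configurations` function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one clEnqueueNDRangeKernel call. */
typedef struct {
    unsigned dims;      /* work dimensions, 1..3 */
    size_t global[3];   /* global work size per dimension */
    size_t local[3];    /* local work size per dimension */
} ndrange_t;

/* Returns 1 if both calls use the same dimensions and global/local sizes. */
static int same_sizes(const ndrange_t *a, const ndrange_t *b)
{
    if (a->dims != b->dims)
        return 0;
    for (unsigned d = 0; d < a->dims; ++d)
        if (a->global[d] != b->global[d] || a->local[d] != b->local[d])
            return 0;
    return 1;
}

/* Copies calls from `in` to `out`, keeping at most `max_instances`
 * entries per unique (global, local) combination. `out` must have room
 * for n entries. Returns the number of configurations written. */
size_t select_configurations(const ndrange_t *in, size_t n,
                             size_t max_instances, ndrange_t *out)
{
    assert(in != NULL && out != NULL && max_instances >= 1);
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t seen = 0;
        for (size_t j = 0; j < kept; ++j)
            if (same_sizes(&in[i], &out[j]))
                ++seen;
        if (seen < max_instances)
            out[kept++] = in[i];
    }
    return kept;
}
```

With the default of 1, two enqueues of the same kernel with global size 1024 and local size 64 collapse into a single configuration. A third enqueue with global size 2048 still gets its own configuration.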
Building and Compiling OpenCL™ Program
Build Artifacts
Intel Graphics Disassembly Source Mapping
To build an OpenCL™ program via the Kernel Development Framework feature of the OpenCL Code Builder, do the following:
1. Select the session with the code that you would like to build.
2. Go to CODE-BUILDER > OpenCL Kernel Development > Build Program, or click the Build Program button in the Code Builder - Build toolbar.
The build log appears in the Console Output dialog.
Once an OpenCL™ program build is completed, the build artifacts appear under the Build Artifacts node in the Code Builder Session Explorer. The list of artifacts includes:
· Generated LLVM code
· Generated assembly code (CPU only)
· Generated Intel Graphics disassembly code for each kernel
· Intermediate program binary
· 32-bit version of the generated SPIR LLVM code
· 64-bit version of the generated SPIR LLVM code
· 32-bit version of the SPIR binary
· 64-bit version of the SPIR binary
· 32-bit version of the generated SPIR-V code
· 64-bit version of the generated SPIR-V code
· 32-bit version of the SPIR-V binary
· 64-bit version of the SPIR-V binary
All build artifacts are stored in the session's folder. You can double-click the LLVM or assembly code to see its content in the IDE's editor. You can open the containing folder by right-clicking one of the files and selecting Open Containing Folder.
NOTE: This feature is supported only for Intel 64 architecture builds.
Once an OpenCL™ program build for GPU is completed, Intel Graphics disassembly code is generated for each kernel.
An artifact
Double-click an artifact to open the source mapping view for the specific kernel.
In the view, click a source line to highlight the correlating Intel Graphics disassembly lines.
The following screen capture shows a selected source line with the related disassembly lines highlighted:
For more information on the architectural behavior and programming environment of chipset and graphics devices, see the Intel® Graphics Programmer's Reference Manual (PRM).
Code Builder Configuration Toolbar
Configuring Sessions
Configuring Device Options
Configuring Build Options
Configuring General Options
You can control some of the session options through the Code Builder Config toolbar.
To show the toolbar in Visual Studio*, go to VIEW > Toolbars and make sure that Code Builder Config is checked.
The Code Builder Configuration toolbar enables you to:
· Select the target machine to work on: the local machine or a connected Android device
· Select the target OpenCL™ platform (currently only Intel and PowerVR platforms are supported)
· Select the target OpenCL device
· Select the target platform architecture
· Show the platform information dialog
A configuration is a set of analysis inputs for a specific kernel: assigned variables, number of iterations, global sizes, local sizes, and so on. You can create a separate configuration for each set of inputs that you want to analyze.
To configure the session, right-click the session in the Code Builder Session Explorer and select Session Options...
The Session Configuration dialog box enables you to define:
· Target device to perform build or analysis operations
· Build options
· Target platform architecture
Open the Session Options menu by selecting CODE-BUILDER > OpenCL Kernel Development > Session Options.
The Device Options tab provides several configuration options.
The Target Machine group box enables selecting the target machine:
· Local Machine
· Remote Android Machine
To use the Remote Machine option, do the following:
1. Connect an Android* device with an Intel processor, or an emulator based on an IA-32 System Image.
2. Copy the OpenCL runtime to the Android device or emulator. See section Installing OpenCL™ Runtime on Emulator.
3. In Visual Studio*, click Setup to copy the OpenCL tools to the device.
You need to use the Setup option each time you start an emulator device.
The OpenCL Device group box enables selecting the target platform and device for the selected machine:
· Intel CPU
· Intel(R) Graphics
· Intel CPU on Experimental OpenCL 2.1 Platform
The Network Settings group box enables configuring the network port range.
Configuring Build Options
Open the Session Options menu by selecting Code-Builder > OpenCL Kernel Development > Session Options.
The Build Options tab provides several configuration options.
Enter preprocessor and compiler options in the OpenCL Options text box, or click the ... button to configure the settings in the OpenCL Compiler Options dialog box. The options that you set are added to the Options String text box at the bottom of the dialog.
The Target Build Architecture group box enables:
· Using the current platform architecture.
· Configuring the build architecture manually. To do so, clear the Use current platform architecture check box and select:
o The Target operating system:
§ Current Operating System
§ Android Operating System
o The Target instruction set:
§ Streaming SIMD Extensions 4.2 (SSE4.2)
§ Advanced Vector Extensions (AVX)
§ Advanced Vector Extensions 2 (AVX2)
Changing the Target Build Architecture options enables viewing assembly code of different instruction set architectures and generating program binaries for different hardware platforms.
Target Build Architecture options are available for the CPU device only.
Open the Session Options menu by selecting CODE-BUILDER
The General Options tab enables defining the target session's platform architecture (x86 or x64).
The Network Settings group box enables configuring the network port range used to connect to the Code Builder server.
To save your session, go to Code-Builder > OpenCL Kernel Development > Save Session, or click the Save Session button in the Code Builder Build toolbar.
New sessions are saved under the New Session Default Directory defined in the Kernel Development Framework's Settings. See Configuring Sessions for information on how to change these settings.
To load a saved session, do the following:
1. Go to Code-Builder > OpenCL Kernel Development > Load Session, or click the Load Session button in the Code Builder - Build toolbar.
2. Select the session to load in the Open File dialog and click Open.
To export a saved session, copy the session folder to the new location, on the current system or on another system, and load it as usual.
To remove a session from the Code Builder Session Explorer dialog, right-click the session that you want to remove and select Remove Session (Keep local files).
You can manage variables in the Kernel Development Framework via the Code Builder Variable Management pane. To open the pane, go to:
In Visual Studio*: CODE-BUILDER > OpenCL Kernel Development > Windows > Variable Management
Or, click the Variable Management button in the Code Builder Analysis toolbar.
These are the supported variable types:
· Buffer
· Image, in one of the following formats:
o RGB - BGRA
o YUV (NV12, NV21, YV12)
· Sampler
· Pipe
To create a new buffer variable:
1. Open the variable management pane. Go to Code Builder > OpenCL Kernel Development > Windows > Variable Management.
2. Click the Add button in the Code Builder Variable Management pane and select Buffer.
The Buffer Variable dialog box opens.
3. Fill out the dialog box entries:
Field | Options
Name | Enter a meaningful name, or leave the default.
IO Mode | Use the drop-down to select Input, Output, or InOut.
Number of Elements | Enter the number of elements, or leave the default value of 1.
Memory flags to use | Use the drop-down to select a memory flag, or leave the default CL_MEM_USE_HOST_PTR.
Use as SVM | Check to use the buffer as an SVM buffer on systems that support SVM.
Initialization options | For Input or InOut IO Mode, you need to set an initialization option. · When using CSV files, each line represents one value of the OpenCL data type (like int4, float16, and so on), with one column per component of the type. For example, for a long8, each line should contain at least eight columns of long numbers. The size of the buffer is used as the number of lines to read from the CSV file. The CSV file may hold more columns or lines than needed for a specific buffer, but not fewer. · When using binary files, the content should be a concatenation of values of the OpenCL data type. As with CSV files, the file may hold more data than indicated by the Size argument.
Output buffers do not need a value assigned to them. If a valueis assigned, it is ignored.
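The CSV rule in the table can be sketched as a small C parser, which works as follows:
· Read one line per buffer element.
· Take the first `width` comma-separated values (4 for `int4`, 8 for `long8`).
· Ignore any surplus columns or lines.
· Reject input that is too short.

The function `parse_csv_buffer` is hypothetical and stores every value as `double` for simplicity. The framework's own reader may differ in details such as whitespace and blank-line handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses `elements` lines of CSV text, taking the first `width`
 * comma-separated numbers of each line. Extra columns and extra lines
 * are ignored; a line with fewer than `width` columns, or too few lines,
 * is an error. Values are written row-major to out[elements * width].
 * Note: strtod skips leading whitespace, so blank lines are skipped.
 * Returns 0 on success, -1 on malformed or insufficient input. */
int parse_csv_buffer(const char *text, size_t elements, size_t width,
                     double *out)
{
    assert(text != NULL && out != NULL && width > 0);
    const char *p = text;
    for (size_t row = 0; row < elements; ++row) {
        for (size_t col = 0; col < width; ++col) {
            char *end;
            double v = strtod(p, &end);
            if (end == p)
                return -1;              /* no number where one was expected */
            out[row * width + col] = v;
            p = end;
            if (col + 1 < width) {
                if (*p != ',')
                    return -1;          /* line has fewer than `width` columns */
                ++p;
            }
        }
        /* Skip any surplus columns up to the end of this line. */
        while (*p != '\0' && *p != '\n')
            ++p;
        if (*p == '\n')
            ++p;
        else if (row + 1 < elements)
            return -1;                  /* ran out of lines */
    }
    return 0;
}
```

For an `int4` buffer with two elements, the text `"1,2,3,4,99\n5,6,7,8\n9,9,9,9\n"` is accepted. The extra column and the third line are ignored.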
To create a new image variable, open the variable management dialog. Go to Code Builder > OpenCL Kernel Development > Windows > Variable Management.
1. Click the Add button in the Code Builder Variable Management dialog and choose Image in the context menu that opens.
2. In the Image Variable dialog, enter the input image variable:
a. Enter a meaningful name, such as bmp_input1.
b. For IO Mode, select Input from the drop-down.
c. In the Source field, browse to or enter the path to the input bitmap files.
d. Enter the appropriate size, type, channel order, and other parameters for your input image.
e. Click Done to create the input variable.
3. Click the Add button in the Code Builder Variable Management dialog and choose Image in the context menu that opens.
4. In the Image Variable dialog, enter the output image variable:
a. Enter a meaningful name, such as bmp_output1.
b. For IO Mode, select Output from the drop-down. The output image variable does not take a source file.
c. Enter the appropriate size, type, channel order, and other parameters for your output image.
d. Click Done to create the output variable.
Create output images with the correct size, type, channel order, and so on.
Clear the Get output image data checkbox to disable reading back the output buffer or image. This enables you to try more than one combination of global or local work sizes when there is no need to read the same output for all the combinations.
1. In the IO Mode field, select Input.
2. Fill out the image variable dialog box fields for the input image:
a. For the Source field, browse to select your YUV image file.
b. For Source Format, select the appropriate YUV format from the drop-down.
c. For width and height, enter the appropriate values for the image file.
d. For the rest of the fields, you can use the defaults, or change them to one of the valid options.
3. Click Done to create the input images.
The number of created images depends on the YUV format of the image. For example, for the NV12 format, two new images are created.
4. Click Add and choose Image to create the output image variable. This variable does not require a source file.
5. Click Done to create the output images.
The YUV planes are created. You can now run the kernel with theimage.
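The plane count follows from the memory layout of each YUV format. The C sketch below uses the standard 8-bit 4:2:0 layouts:
· NV12 and NV21 have a full-size Y plane plus one half-width, half-height plane of interleaved chroma. This matches the two images mentioned above for NV12.
· YV12 has three planes.

That the framework also creates one image per plane for YV12 is an assumption. The `yuv_plane_t` type and `yuv_planes` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One image plane: size in pixels and bytes per pixel (8-bit samples). */
typedef struct {
    size_t width, height;
    size_t channels;
} yuv_plane_t;

/* Fills `planes` with the standard 4:2:0 layout for `format` and returns
 * the number of planes, or 0 for an unsupported format.
 * NV12/NV21: full-size Y plane + half-size interleaved UV (or VU) plane.
 * YV12:      full-size Y plane + two half-size planes (V, then U). */
size_t yuv_planes(const char *format, size_t w, size_t h, yuv_plane_t planes[3])
{
    assert(format != NULL && w % 2 == 0 && h % 2 == 0);
    planes[0] = (yuv_plane_t){w, h, 1};
    if (strcmp(format, "NV12") == 0 || strcmp(format, "NV21") == 0) {
        planes[1] = (yuv_plane_t){w / 2, h / 2, 2};
        return 2;
    }
    if (strcmp(format, "YV12") == 0) {
        planes[1] = (yuv_plane_t){w / 2, h / 2, 1};
        planes[2] = (yuv_plane_t){w / 2, h / 2, 1};
        return 3;
    }
    return 0;
}

/* Total bytes across all planes; for 4:2:0 formats this is w * h * 3 / 2. */
size_t yuv_total_bytes(const yuv_plane_t *planes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += planes[i].width * planes[i].height * planes[i].channels;
    return total;
}
```

For a 640x480 NV12 image, this gives a 640x480 Y plane and a 320x240 two-channel UV plane, 460800 bytes in total. These are plausibly the plane dimensions you enter in the YUV Planes Combiner.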
To create a new sampler variable:
1. Open the variable management dialog.
2. Click the Add button in the Code Builder Variable Management dialog and choose Sampler in the context menu that opens.
To create a new pipe variable:
1. Open the variable management dialog.
2. Click the Add button in the Code Builder Variable Management dialog and choose Pipe in the context menu that opens.
You can change memory options of buffers or images using the Kernel Development Framework. Refer to the relevant sections of this guide for guidelines on creating or editing variables.
The framework does not restrict which options you select. Avoid option combinations that are forbidden by the OpenCL 1.2 specification; otherwise, you may encounter errors during analysis.
To choose buffers and images memory options, do the following:
1. Open the variable properties by right-clicking an image or buffer variable in the Code Builder Variable Management window and selecting Edit Variable.
2. Open the combo box next to Memory flags to use.
3. Select the appropriate options and click Done.
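As a rough guide to which combinations the OpenCL 1.2 specification forbids, this C sketch checks the two main rules for the six basic flags:
· At most one of the read/write access qualifiers may be set.
· `CL_MEM_USE_HOST_PTR` may never be combined with `CL_MEM_ALLOC_HOST_PTR` or `CL_MEM_COPY_HOST_PTR`.

The `MEM_` constants are local copies with the same bit values as the corresponding `CL_MEM_` flags in `CL/cl.h`, so the sketch compiles without the SDK. The host-access flags added in OpenCL 1.2 are not covered, and `mem_flags_valid` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit values as the corresponding cl_mem_flags in CL/cl.h,
 * redefined with a MEM_ prefix so the sketch builds without the SDK. */
enum {
    MEM_READ_WRITE     = 1u << 0,
    MEM_WRITE_ONLY     = 1u << 1,
    MEM_READ_ONLY      = 1u << 2,
    MEM_USE_HOST_PTR   = 1u << 3,
    MEM_ALLOC_HOST_PTR = 1u << 4,
    MEM_COPY_HOST_PTR  = 1u << 5,
};

/* Number of bits set in x. */
static int popcount_u64(uint64_t x)
{
    int n = 0;
    for (; x != 0; x &= x - 1)
        ++n;
    return n;
}

/* Returns 1 if `flags` avoids the mutually exclusive combinations that
 * OpenCL 1.2 forbids for these six flags, 0 otherwise. */
int mem_flags_valid(uint64_t flags)
{
    /* At most one access qualifier. */
    if (popcount_u64(flags & (MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY)) > 1)
        return 0;
    /* USE_HOST_PTR excludes both ALLOC_HOST_PTR and COPY_HOST_PTR;
     * ALLOC_HOST_PTR and COPY_HOST_PTR may be combined. */
    if ((flags & MEM_USE_HOST_PTR) &&
        (flags & (MEM_ALLOC_HOST_PTR | MEM_COPY_HOST_PTR)))
        return 0;
    return 1;
}
```

For example, `CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR` passes this check. Adding `CL_MEM_COPY_HOST_PTR` to the default `CL_MEM_USE_HOST_PTR` does not.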
To edit the variables in the system using the Kernel Development Framework, do the following:
1. Open the Code Builder Variable Management window.
2. Right-click a variable name.
3. Click Edit Variable.
4. Change the desired properties and click Done.
To view buffer or image contents when using the Kernel Development Framework, do the following:
1. Open the Code Builder Variable Management window.
2. Right-click the buffer or image name you want to view.
3. Click View Variable.
To create a copy of a buffer, image, or sampler variable when using the Kernel Development Framework, do the following:
1. Open the Code Builder Variable Management window.
2. Right-click the buffer, image, or sampler name you want to copy.
3. Click Copy Variable.
To delete variables when using the Kernel Development Framework, do the following:
1. Open the Code Builder Variable Management window.
2. Right-click a variable name.
3. Click Delete variable or Delete all variables.
You can delete all buffers, images, samplers, or pipes by right-clicking the corresponding node (Buffers, Images, Samplers, or Pipes, respectively).
To execute the kernel:
1. Select the session you are interested in from the Code Builder Session Explorer.
Tip: If you don't see the Code Builder Session Explorer dialog, go to Code-Builder > OpenCL Kernel Development > Windows > Code Builder Session Explorer.
2. Optionally, select a configuration from the Code Builder Analysis Input pane:
a. Select the desired configuration from the drop-down button next to the Configuration ID field, or from the Configurations folder in the Code Builder Session Explorer.
b. To create a new configuration based on the selected configuration, click the New Configuration button.
3. Select the kernel that you want to execute from the Select Kernel combo box in the Code Builder Analyze toolbar. Then click the Run button to execute the selected kernel with the selected configuration.
Code Builder Analysis Toolbar
Kernel Execution Input
Viewing Kernel Execution Results
Output Validation
Running a Session With a YUV Image Variable
To show the toolbar, go to VIEW > Toolbars and make sure that the Code Builder Analyze option is checked.
You can use the Code Builder Analyze toolbar to execute a kernel and to perform several analysis operations:
· Selecting the OpenCL kernel to execute or analyze
· Start kernel execution
· Start kernel analysis
· Open Code Builder Analysis Input window
· Open Code Builder Variable Management window
To assign analysis inputs for kernel execution, do the following:
1. Select the desired kernel from the session's kernel list in the Code Builder Session Explorer or from the Select Kernel combo box in the Code Builder Analyze toolbar.
2. Open the Code Builder Analysis Input window from:
In Visual Studio*: Code-Builder > OpenCL Kernel Development > Windows > Analysis Input
or by clicking the Open Analysis Input button in the Code Builder Analysis toolbar.
3. Assign a variable for each kernel argument in the Kernel Arguments table by clicking the Click here to assign link under the Assigned Variable column. You can assign scalar variables (such as integer, float, char, half, and so on) on the fly by typing single values into the table. See Variable Management for details.
4. You can assign a reference for each output variable (buffers or images) that you want to verify. If you specify a validation reference, each output variable is compared with its assigned reference after the kernel execution, to validate the correctness of the kernel execution.
5. Set the number of iterations, global size, and local sizes per workload dimension in the Workgroup size definitions group box.
You can use the local size text boxes for several different test configurations:
· Set a single size value for a single test.
· Add several comma-separated sizes for multiple tests.
· Set 0 to use the default framework-assigned local size.
· Check the Auto box to enable the Kernel Builder to iterate over all sizes that are smaller than the global size and the device maximum local size.
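The local size field syntax and the Auto sweep can be modeled in C with two hypothetical functions:
· `parse_local_sizes` accepts a comma-separated list in which 0 stands for the framework default.
· `auto_local_sizes` approximates Auto for one dimension. It keeps only sizes that divide the global size evenly and do not exceed the device limit, because OpenCL 1.x requires the global size to be a multiple of the local size.

The exact set of sizes the Auto option tries is not specified in this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Parses a local-size field such as "16,32,0" (spaces around commas are
 * tolerated) into sizes[]. 0 means "let the framework choose".
 * Returns the number of sizes parsed, or -1 on malformed input or if
 * more than max_sizes values are given. */
int parse_local_sizes(const char *field, size_t *sizes, int max_sizes)
{
    assert(field != NULL && sizes != NULL && max_sizes > 0);
    int count = 0;
    const char *p = field;
    for (;;) {
        while (*p == ' ')
            ++p;
        if (*p < '0' || *p > '9' || count == max_sizes)
            return -1;
        size_t v = 0;
        while (*p >= '0' && *p <= '9')
            v = v * 10 + (size_t)(*p++ - '0');
        sizes[count++] = v;
        while (*p == ' ')
            ++p;
        if (*p == '\0')
            return count;
        if (*p != ',')
            return -1;
        ++p;
    }
}

/* Approximate "Auto" sweep for one dimension: every local size that
 * divides `global` evenly and does not exceed `device_max`.
 * Returns the number of candidates written to out (at most max_out). */
int auto_local_sizes(size_t global, size_t device_max, size_t *out, int max_out)
{
    assert(out != NULL && max_out >= 0);
    int n = 0;
    for (size_t l = 1; l <= global && l <= device_max && n < max_out; ++l)
        if (global % l == 0)
            out[n++] = l;
    return n;
}
```

For a global size of 12 and a device maximum of 8, this sweep yields the candidates 1, 2, 3, 4, and 6.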
Also consider the following:
· Each option is available for each dimension.
· To analyze the kernel in its designed conditions, set a singlevalue.
· To find the local size that provides higher performance results,click Auto or set a list of comma-separated values.
· To improve the analysis accuracy, run each global and local work size combination several times by increasing the Number of iterations value. Several iterations minimize the impact of other system processes or tasks on the kernel execution time.
· Use the Device Information dialog to compare device properties and choose the appropriate device for the kernel.
Follow these steps to create a new configuration for an existing built session:
1. In the Code Builder Analysis Input pane, assign a variable to each listed kernel argument. Click the link in the Assigned Variable column to assign the appropriate variable.
2. Enter a valid name to enable the New Configuration button. Click the button to duplicate the current configuration. You can edit the new configuration to create multiple analysis configurations.
Once the kernel execution is completed, a new Run Results page is generated.
The Run Results report contains the following sections:
· Execution Overview - provides information about the executed kernel, execution time, and output validation results.
· Output Validation - lists all the output variables that have a reference assigned. If the validation fails, you can click the variable name to open the image viewer or the buffer viewer and compare the content of the output variable with the assigned reference.
· Kernel variables - lists all the kernel variables. You can click the variable name to open the image viewer or the buffer viewer and see the content of the variable.
Kernel Development Framework enables you to set a reference point for your OpenCL™ kernel's output variable. The Kernel Development Framework validates the output variable after each kernel execution. The validation results appear in the Run report.
You can compare the output variable with its reference in the Image and Buffer viewers.
1. In the Assigned Variables view, click the Validation Reference column and choose a validation reference.
2. Check the Enable output validation checkbox and run the kernel.
3. For each validation reference you set in step 1, you can see the results of the validation in the Code Builder Run Results window, in the Output Validation section.
4. For a Validation failed result, click the Validation failed link to view the output variable side-by-side with the reference variable.
5. Click the Next/Prev buttons to find the exact pixels and see their values.
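Conceptually, output validation is an element-by-element comparison, and the Next/Prev buttons step between mismatching elements. The C sketch below assumes an absolute tolerance on `float` data. The comparison rule and tolerance that the framework actually uses are not documented in this section, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element at or after `start` whose
 * absolute difference from the reference exceeds `tol` (NaN counts as a
 * mismatch), or `n` if there is none. This mirrors a "Next" step. */
size_t next_mismatch(const float *out, const float *ref, size_t n,
                     size_t start, float tol)
{
    assert(out != NULL && ref != NULL && tol >= 0.0f);
    for (size_t i = start; i < n; ++i) {
        float d = out[i] - ref[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))    /* written this way so NaN is flagged */
            return i;
    }
    return n;
}

/* Counts all mismatching elements; 0 means validation passed. */
size_t count_mismatches(const float *out, const float *ref, size_t n, float tol)
{
    size_t count = 0;
    for (size_t i = next_mismatch(out, ref, n, 0, tol); i < n;
         i = next_mismatch(out, ref, n, i + 1, tol))
        ++count;
    return count;
}
```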
After you have created a YUV format image variable, you can run an analysis session with the image variable as one of the kernel arguments:
1. Create a session with the relevant kernel and build the session.
2. In the OpenCL Kernel Analysis Input pane, assign the image variables to the kernel arguments.
3. From the OpenCL Kernel Development menu, select Run Analysis.
4. In the report that is generated, select Variables View from the Select Report drop-down.
5. Right-click one of the links and select Open in YUV Combiner.
The YUV Planes Combiner dialog box appears.
6. Fill in the UV Plane, height, and width fields and click create YUV image.
To start running an analysis:
1. Select the session you are interested in from the Code Builder Session Explorer.
If you don't see the Code Builder Session Explorer dialog, go to Code-Builder > OpenCL Kernel Development > Windows > Code Builder Session Explorer.
2. Optionally, select a configuration from the Code Builder Analysis Input pane:
a. Select the desired configuration from the drop-down button next to the Configuration ID field, or from the Configurations folder in the Code Builder Session Explorer.
You can also click the New Configuration button to create a new configuration based on the selected configuration.
3. Select the kernel that you want to analyze from the Select Kernel combo box in the Code Builder Analyze toolbar. Then click the Analyze button to start analyzing the selected kernel with the selected configuration.
Alternatively, right-click the session in the Code Builder Session Explorer and select Analyze to run the analysis with the selected configuration.
Viewing Analysis Results
Once the analysis is complete, several reports are generated. A new set of reports is generated for each analysis run.
The reports are available under the Reports node in the Code Builder Session Explorer window.
The analysis reports are HTML files, saved by default in the session directory, and can be viewed in any modern web browser.
Each report contains several views:
· Session Info - provides information about the application that was analyzed and the command line to run the analysis. Also provides the kernel code for viewing.
· Execution Analysis - provides execution time statistics and identifies the best and worst configurations.
· Variables View - provides the read and read-back times of the memory objects used in the kernel and lets you see their content.
You can toggle between the views through the Report Selection combo box located at the top of the report's layout.
The Session Info view provides information about the analysis and the command line to run the analysis.
You can also click the Kernel Code link at the top of the view to view the kernel source code.
The top part of the Execution View shows the best and the worst of the tested global and local size configurations, based on median execution time. If only one configuration exists, it appears in both result windows.
The table in the lower part of the view shows statistical analysis results for all configurations. The statistics consist of the following iteration execution time values for the selected configuration:
· Median
· Average
· Standard deviation
· Maximum
· Minimum
Expanding a row in the table shows the total run time and its breakdown into queue, submit and execute times per iteration for the given configuration.
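The statistics listed above can be reproduced from raw per-iteration times. The sketch below is an illustration built on several assumptions, not the tool's code. It accepts iteration times in any unit and uses the population form of the standard deviation (the tool may use the sample form). The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Summary statistics of per-iteration execution times (any unit). */
typedef struct {
    double median, average, stddev, max, min;
} iteration_stats;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Square root by Newton's method, used only to avoid a libm dependency;
 * assumes x >= 0. */
static double newton_sqrt(double x)
{
    if (x <= 0.0) return 0.0;
    double prev, r = x > 1.0 ? x : 1.0;   /* start at or above sqrt(x) */
    do {
        prev = r;
        r = 0.5 * (r + x / r);
    } while (r < prev);                   /* stop once it stops decreasing */
    return prev;
}

/* Fills `out` for n > 0 iteration times; returns 0 on success, -1 on error.
 * The standard deviation divides by n (population form, an assumption). */
int compute_stats(const double *times, size_t n, iteration_stats *out)
{
    if (n == 0) return -1;
    double *sorted = malloc(n * sizeof *sorted);
    if (!sorted) return -1;
    memcpy(sorted, times, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += sorted[i];
    double mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; ++i)
        sq += (sorted[i] - mean) * (sorted[i] - mean);

    out->min = sorted[0];
    out->max = sorted[n - 1];
    out->average = mean;
    out->median = (n % 2) ? sorted[n / 2]
                          : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    out->stddev = newton_sqrt(sq / (double)n);
    free(sorted);
    return 0;
}
```

If you compute `median` for every configuration, the smallest and largest values give the best and worst configurations in the sense the Execution View uses.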
The Execution View provides the following features:
· The best configuration in terms of median execution time is marked in blue. Click the blue summary line at the top of the view to open the results.
· Click the [...] link under the Variables column to view the list of the kernel's input/output variables.
· Click a variable name in the list to view the variable content.
For analysis on a GPU device, you can perform Kernel Analysis on any configuration in the report. Click the Kernel Analysis node to see the various configurations.
Click a configuration to run Kernel Analysis on it.
The Kernel Development Framework can generate C++ code from your kernel session.
To generate the C++ code in Visual Studio*:
1. Go to Code-Builder > OpenCL Kernel Development > Generate Host Code Generator.
2. In the OpenCL Code Builder Host Code Generator dialog box that appears, select the desired name and location for the generated C++ code and related files, and click Done to generate the files.
You can now use the generated C++ code to edit the kernel and rerun a session with the new kernel.
OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE enables debugging OpenCL kernels using the Microsoft Visual Studio software debugger GUI. The Debugger enables debugging host code and OpenCL kernels in a single Microsoft Visual Studio debug session.
The Debugger supports existing Microsoft Visual Studio debugging windows such as:
· Breakpoints
· Memory view
· Watch variables - including OpenCL types like float4, int4, and so on
· Call stack
· Auto and local variables views
Debugging is available only for the CPU device. If the code should run on Intel Graphics, debug it on the CPU device during the development phase, and then change the target device.
For debugger limitations and known issues,refer to the Code Builder Release Notes at https://software.intel.com/en-us/intel-opencl-support.
Enabling Debugging in OpenCL™ Runtime
Configuring the Debugger
Troubleshooting the Debugger
To enable debugging mode in the Intel OpenCL runtime for compiling OpenCL code with the OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE, do the following:
1. Add the -g flag to the build options string parameter of the clBuildProgram function.
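2. Specify the full path to the CL file (including the CL file name) in the build options string parameter of the clBuildProgram function, using the -s option:
-s <full path to the CL file>
If the path includes spaces, enclose the entire path in double quotes.
For example:
err = clBuildProgram(
    g_program,                               // program object to build
    0,                                       // num_devices: 0 together with a NULL device_list
    NULL,                                    // device_list: build for all devices of the program
    "-g -s \"<full path to the CL file>\"",  // debug info + full path to the CL source file
    NULL,                                    // pfn_notify: none, so the call blocks until the build ends
    NULL);                                   // user_data for pfn_notify (unused)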
Relative paths to the CL file are not supported.
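Only an absolute path is accepted, and a path with spaces must be quoted, so it can help to build the options string in code. The following is a minimal sketch with a hypothetical helper name. It always quotes the path, although the text requires quotes only when the path contains spaces; quoting other paths is an assumption. It rejects relative paths with only a rough drive-letter test (for example `C:\...`).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes  -g -s "<cl_path>"  into `out`.
 * Returns 0 on success, or -1 if the path does not look absolute
 * (relative paths are not supported) or `out` is too small. */
int make_debug_build_options(const char *cl_path, char *out, size_t out_size)
{
    /* Rough check for a Windows drive-letter path such as C:\dir\k.cl */
    int absolute = strlen(cl_path) >= 3 && cl_path[1] == ':' &&
                   (cl_path[2] == '\\' || cl_path[2] == '/');
    if (!absolute) return -1;

    /* Quote the path so that spaces in it are preserved. */
    int n = snprintf(out, out_size, "-g -s \"%s\"", cl_path);
    if (n < 0 || (size_t)n >= out_size) return -1;  /* truncated */
    return 0;
}
```

On success, you can pass `out` as the options argument of clBuildProgram instead of the literal string in the example above.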
According to the OpenCL standard, work-items execute OpenCL kernels simultaneously. The Debugger requires setting the global ID of the work-item to debug before the debugging session starts. The Debugger stops on breakpoints in OpenCL code only when the pre-set work-item reaches them.
To work with the OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE, the OpenCL kernel code must exist in a text file, separate from the code of the host. Debugging OpenCL code that appears only in a string embedded in the host application is not supported. Create your OpenCL project with the OpenCL Offline Compiler plug-in for Microsoft Visual Studio* to get seamless integration with the Debugger.
After you have enabled debugging in the OpenCL™ runtime, you can start using the OpenCL Debugger plug-in for the Visual Studio* IDE:
1. Start the Microsoft Visual Studio* IDE.
2. Go to Code Builder > OpenCL Debugger > Options > Kernel Debugger.
3. Check the box for the device to debug:
a. To debug the CPU device, check the Enable OpenCL Kernel Debugging for CPU device checkbox, and fill out the appropriate work items and port.
b. To debug the GPU device, check the Enable OpenCL Kernel Debugging for GPU device checkbox, and fill out the appropriate server, port and log file.
4. Insert breakpoints in the application in different OpenCL kernel calls, and then start debugging with F5.
If the NDRange is not 3D, leave the unused dimension values at 0.
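The work-item ID entered for the CPU device identifies a single point in the NDRange, and the Debugger stops only when that work-item reaches a breakpoint. The hypothetical check below shows which IDs can actually be reached: each used dimension must lie inside the global size, and each unused dimension must be 0. It assumes the NDRange has no global work offset.

```c
#include <stddef.h>

/* Hypothetical check: `work_dim` is the NDRange dimension count (1-3),
 * `global_size` the global sizes, and `id` the (x, y, z) global ID entered
 * for debugging. Returns 1 if some work-item has this ID (so its breakpoints
 * can be hit), 0 otherwise. Assumes no global work offset. */
int debug_work_item_is_reachable(unsigned work_dim, const size_t global_size[3],
                                 const size_t id[3])
{
    if (work_dim < 1 || work_dim > 3) return 0;
    for (unsigned d = 0; d < 3; ++d) {
        if (d < work_dim) {
            if (id[d] >= global_size[d]) return 0;  /* outside the NDRange */
        } else if (id[d] != 0) {
            return 0;                               /* unused dimension must be 0 */
        }
    }
    return 1;
}
```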
· OpenCL™ API Debugger
· Enabling the API Debugger
· Trace View
· Objects Tree View
· Properties View
· Command Queue View
· Problems View
· Image View
· Data View
· Memory Tracing
You can configure several features of the API Debugger. Go to CODE BUILDER > OpenCL Debugger > Options to open the Code Builder - Debugger Configuration dialog box, and select the API Debugger tab:
| To do this... | Do this: |
|---|---|
| Configure the number of states to save for each memory object (Images, Buffers and SubBuffers). | In the Number of previous states field, set the appropriate number of states. |
| Enable raw data tracing. | Check the Enable Raw Data Tracing box. |
| Dump image bitmaps. | Check the Enable Image Bitmap Tracing box. |
| Enable automatic memory tracing. | In the Other settings group, check the Auto-generate session box. |
| Configure the data buffering size. | In the Data buffering size (bytes) field, enter the appropriate number of bytes for the data buffer. |
| Set a custom port for API debugging. | Check the Use Custom Port box and set the appropriate port number in the text field. |
Automatic memory tracing can also be enabled via CODE BUILDER > OpenCL Debugger > Options. In the Code Builder - Debugger Configuration dialog box, select the API Debugger tab, and then select Other settings > Auto-generate session.
· While the API Debugger is enabled, performance measured with the clGetEventProfilingInfo runtime API call, or with any other method for measuring execution time or occupied host memory, may be degraded.
· To get more accurate profiling results, use the runtime directly by running with Ctrl+F5 (Start Without Debugging), or by disabling the API Debugger in the plug-in configuration menu.
· Only 2D images are supported for viewing, that is, memory objects whose descriptor (cl_image_desc) has CL_MEM_OBJECT_IMAGE2D in its image_type field.
· Images larger than 2 GB are not supported and are not displayed.
· The bitmaps shown in the Image View are merely an 8-bit RGBA approximation of the underlying pixel array of the associated images.
Behind the scenes, the plug-in performs a linear color conversion from the input range of the pixels to the [0..255] range. The pixels can be of any type supported by the OpenCL runtime (for example, CL_SNORM_INT8, CL_UNSIGNED_INT16, and so on).
As a result, the presented colors might not accurately represent the bitmap as expected.
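The following sketch illustrates why the displayed colors are only approximate. It is not the plug-in's code: the helper name, the input range chosen per channel type, and the rounding are assumptions. A linear map from a channel's value range onto [0..255] sends, for example, all 65,536 possible CL_UNSIGNED_INT16 values onto just 256 display levels.

```c
#include <stdint.h>

/* Linearly maps `value` from [in_min, in_max] onto [0, 255], clamping
 * out-of-range values. For example, a CL_UNSIGNED_INT16 channel could use
 * in_min = 0 and in_max = 65535, and a CL_SNORM_INT8 channel (normalized to
 * [-1.0, 1.0]) could use in_min = -1.0 and in_max = 1.0. */
uint8_t to_display_byte(double value, double in_min, double in_max)
{
    if (in_max <= in_min) return 0;            /* degenerate range */
    double t = (value - in_min) / (in_max - in_min);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return (uint8_t)(t * 255.0 + 0.5);         /* round half up */
}
```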
The interface of the Microsoft Visual Studio* IDE provides standard debugging capabilities for the host side of OpenCL™ applications, while the OpenCL Debugger plug-in of the Code Builder enables debugging OpenCL kernels. Connecting the simultaneous debugging of OpenCL kernels and the host application can be complicated at different stages. The API Debugging feature of the Code Builder - Debugger plug-in for Microsoft Visual Studio bridges this gap.
The API Debugging feature enables monitoring and understanding the OpenCL environment of applications throughout execution.
The feature supports the following:
· API Tracing - lists a trace of all OpenCL API calls that occurred during the execution, with their arguments, return values, and execution times.
· OpenCL Objects View - shows all OpenCL objects that exist in memory during the execution.
· Properties View - retrieves common information for each OpenCL object.
· Command-Queue View - tracks the execution status of the enqueued commands.
· Problems View - summarizes all error and warning messages.
· Image View - visually displays all 2D image objects as bitmaps.
· Data View - visually displays buffer data and 2D image pixel arrays on a grid.
· Save/Load session - enables capturing a state/snapshot of all views of the plug-in, saving it on disk, and loading stored sessions.
· Memory Tracing - enables storing the content of OpenCL Images and Buffers. You can then visually examine, in either the Bitmap or the Grid view, the underlying data of each memory object throughout the various API calls that affected it.
Concurrent debugging sessions are not supported with the OpenCL API Debugger. This includes attaching the debugger to more than one process, or opening multiple instances of Visual Studio and debugging processes concurrently.
Trace View
Objects Tree View
Properties View
Command Queue View
Problems View
Image View
Data View
Memory Tracing
To use the API Debugger, do the following:
1. Start the Microsoft Visual Studio* IDE.
2. Go to CODE BUILDER > OpenCL Debugger > Options > API Debugger.
3. Check Enable OpenCL API Debugger.
4. Insert breakpoints in the application in different OpenCL API calls, and then start debugging with F5.
5. Open the API Debugger views you need by selecting CODE BUILDER > OpenCL Debugger and then the view.
The API Debugger updates the view panes when:
· The Debugger hits a breakpoint in Microsoft Visual Studio* IDE.
· The behavior of one of the views changes, for example, when you click a button in it.
· The host application execution ends.
To see data in the views:
1. Insert some breakpoints in your application (in different API calls), or run the application with Start Debugging (F5).
2. Open the needed views via CODE BUILDER > OpenCL Debugger.
The Trace View contains a trace of all OpenCL™ API calls during the execution, with API call arguments, returned values, error codes, and time of execution.
To access the Trace View, select CODE BUILDER > OpenCL Debugger > Trace View.
Use the following buttons to control the view:
· Save - saves the current state of all views (live OpenCL objects, API trace, command queue, and so on) in one of two ways:
o to a binary file (.trace) that can later be loaded with the Load Session button, or
o as a CSV file that lists (traces) all API calls.
· Load Session... - restores the state of the views from a .trace file previously saved either with Save As... or with the Generate trace file option in the API Debugger settings.
This feature is available only when the Visual Studio* IDE is not in debug mode, because the views are synced with the application you debug.
· Success/Errors - filters successful or failed API calls.
· API Display Mode - toggles between the following views:
o Function name only
o Function name and arguments
o Function name with argument names and values
o Show Return Value
o Show Error Code
o Show Time
· Filter - filters API calls by name. For example, start typing "device [1]" to get only the API calls that use "device [1]".
· Right-click context menu - toggles the display mode of arguments between Hex and Decimal, and shows raw values (for example, 0x2 instead of CL_DEVICE_TYPE_CPU).
To enable automatic trace generation, select CODE BUILDER > OpenCL Debugger > API Debugger > Auto-generate session. Traces are saved in the folder specified in the Output Folder text box.
Automatic trace generation is equivalent to clicking Save... after the host application has ended.
Enabling the API Debugger
The Objects Tree view of the OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE enables:
· Getting a better understanding of which objects are"alive"/released at any given point of time.
· Showing hierarchy and dependencies of various OpenCL objects.
The API Debugger also reflects the OpenCL objects that exist in memory during application execution:
· Platform
· Devices
· Context
· Buffer
· and so on
When you create an OpenCL context (for example, with a clCreateContext() API call), the Objects Tree updates immediately with the new context object.
Objects are dimmed when they are released, for example, by a clRelease*() call such as clReleaseContext().
Use the following buttons to control the Objects Tree view:
· Sort By - toggles the way data is displayed:
o Sort by Context - all entities that are associated with a specific context are displayed as children of that context.
o Sort by Device - all contexts are displayed as children of the devices.
· Show Objects - displays only a subset of the OpenCL objects. Use it when many OpenCL objects are alive at a given moment and you need to see the status of only a few objects or object types.
To view objects of a specific type only:
o Select Show Objects > uncheck Show All.
o Select Show Objects > select the object type to display.
· Open Source Code in a new tab - shows the source code associated with the program object. Right-click any Program object in the tree, then click Open Source Code in a new tab.
· Save Binaries - dumps the binary files that were built for the program object with clBuildProgram or clCreateProgramWithBinary. Right-click any built program object in the tree, then click Save Binaries and select the location to save the binaries.
Enabling the API Debugger
OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE exposes miscellaneous properties for each OpenCL object or Command Queue event. The Properties View pre-fetches information about OpenCL objects or events, and displays it when a particular object is selected.
Access the Properties View by selecting CODE BUILDER > OpenCL Debugger > Properties View.
All properties in the Properties View are read-only.
To view properties for an OpenCL object, do the following:
1. Select (left-click) some object from the Objects View window.
2. Open the Properties view.
The OpenCL Objects Properties view is an alternative to calling query API functions such as clGetDeviceInfo().
To view properties for an OpenCL command-queue event, do the following:
1. Select (left-click) an event from the Command Queue View window.
2. Open the Properties view.
The Command Queue Events Properties view is an alternative to retrieving execution times yourself. Doing it yourself requires adding the CL_QUEUE_PROFILING_ENABLE property to clCreateCommandQueue() when creating the command queue to which the commands are enqueued. You then query the execution times of the enqueued events with clGetEventProfilingInfo().
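When you measure times yourself, clGetEventProfilingInfo() returns 64-bit device timestamps in nanoseconds for CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The sketch below shows only the arithmetic on those four values. The struct and function names are hypothetical, and the OpenCL calls that would fill the struct are omitted so that the example compiles without the OpenCL SDK.

```c
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps (nanoseconds)
 * of one event, as reported by clGetEventProfilingInfo(). */
typedef struct {
    uint64_t queued, submit, start, end;
} event_timestamps;

/* Time the command waited in the queue before submission, in ms. */
double queue_ms(const event_timestamps *t)   { return (double)(t->submit - t->queued) * 1e-6; }

/* Time from submission to the device until execution started, in ms. */
double submit_ms(const event_timestamps *t)  { return (double)(t->start - t->submit) * 1e-6; }

/* Execution time of the command on the device, in ms. */
double execute_ms(const event_timestamps *t) { return (double)(t->end - t->start) * 1e-6; }
```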
Enabling the API Debugger
OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE provides the Command Queue View, which enables tracking the execution status of enqueued commands (issued by clEnqueue* API calls).
The status for a command can be one of the following options:
· Submitted
· Running
· Completed
The Command Queue View also displays events for a particular command queue (Separate Queues) or all events from all queues (Unify Queues).
Access the Command Queue View by selecting CODE BUILDER > OpenCL Debugger > Command Queue View.
Use the following buttons to control the Command Queue View:
· Save As... - dumps the current status of commands to a text file for later investigation.
· Unify Queues - shows all commands across all queues.
Also note the following:
o In the Unified queues mode, a suffix of the form CQ [NUMBER] is appended to each entry. It indicates the number of the command queue the command is associated with.
o For example, TASK(3)CQ[1] indicates that command number 3 is a clEnqueueTask command and is associated with Command-Queue [1].
o Each queue has a color, and all its commands have the color of that queue. This makes it easy to see at a glance which queue a command belongs to.
Command queues in the Objects Tree view have the same color as in the Command Queue View.
o After you click it, the Unify Queues button changes into the Separate Queues button, which does the opposite operation and shows event status per queue.
· Separate Queues - appears in Unified mode after you click Unify Queues, and does the opposite of the Unify Queues operation, that is, shows the commands per queue. First select the queue from the drop-down list under the Save As... button. The view then updates with the commands associated with the selected queue.
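The suffix format described above can be decoded mechanically. The following hypothetical parser is based only on the TASK(3)CQ[1] example. Other command names, and spacing such as CQ [1], are assumptions; each whitespace character in the `sscanf` format matches zero or more spaces.

```c
#include <stdio.h>

/* Hypothetical parser for a Unified-mode entry label such as "TASK(3)CQ[1]".
 * Extracts the command name (up to 31 characters), the command number and
 * the command-queue number. Returns 1 on success, 0 otherwise. */
int parse_unified_entry(const char *label, char name[32],
                        int *command_no, int *queue_no)
{
    return sscanf(label, "%31[^(](%d) CQ [%d]", name, command_no, queue_no) == 3;
}
```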
Enabling the API Debugger
OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE provides the Problems View, which summarizes in a single view all errors and warnings that occurred during the execution.
To access the view, select CODE BUILDER > OpenCL Debugger > Problems View.
The Problems View supports the following features:
· Displaying warnings and errors of kernel compilation.
· Showing uninitialized kernel arguments, that is, arguments that were not set by calling clSetKernelArg(). Each kernel argument must be set this way.
· Releasing OpenCL objects out of order, for example, releasing a program object before releasing its kernels (clReleaseProgram before clReleaseKernel).
· Resource leaks: at the end of the program, an error entry is added for each OpenCL resource (programs, buffers, images, and so on) that is not released.
· API call failures - when an OpenCL API call fails, an error entry is added to the Problems View. You can right-click the entry to jump to the line in the Trace View for the API call that caused the failure.
Double-clicking an error in the Problems View opens the compilation error log message in the code editing area.
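Two of these checks, resource leaks and out-of-order release, can be modelled with a simple table of objects and their owners. The sketch below is a deliberately simplified, hypothetical model. It ignores OpenCL reference counting (clRetain* calls) and is not how the API Debugger is implemented.

```c
#include <stddef.h>

#define MAX_OBJECTS 64

/* Hypothetical object table modelling two Problems View checks. */
typedef struct {
    int alive[MAX_OBJECTS];
    int parent[MAX_OBJECTS];   /* index of the owning object, or -1 */
    size_t count;
} object_table;

/* Registers a new object (for example, a kernel whose parent is a program).
 * Returns its index, or -1 if the table is full. */
int track_create(object_table *t, int parent)
{
    if (t->count >= MAX_OBJECTS) return -1;
    t->alive[t->count] = 1;
    t->parent[t->count] = parent;
    return (int)t->count++;
}

/* Releases object `id`. Returns the number of its children that are still
 * alive (> 0 means an out-of-order release), or -1 for an unknown id. */
int track_release(object_table *t, int id)
{
    if (id < 0 || (size_t)id >= t->count) return -1;
    int live_children = 0;
    for (size_t i = 0; i < t->count; ++i)
        if (t->alive[i] && t->parent[i] == id) ++live_children;
    t->alive[id] = 0;
    return live_children;
}

/* Number of objects never released; each would get a leak entry. */
size_t count_leaks(const object_table *t)
{
    size_t leaks = 0;
    for (size_t i = 0; i < t->count; ++i) leaks += (size_t)t->alive[i];
    return leaks;
}
```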
Enabling the API Debugger
This view displays the OpenCL™ Image objects in the host application visually.
Each Image object is added to the Objects View. Double-clicking an Image object displays it as a bitmap: the underlying pixel array is translated into a bitmap.
Double-click the Image you need and wait for the Image View to appear with the latest state of the image.
From the Images drop-down, select any Image. The view displays the image as a bitmap.
The history drop-down enables viewing various states of the selected image, where each state is the result of an API call.
If, for instance, you create an image with all pixels set to 0, the view shows this all-zero state for the image-creation API call.
After running the kernel on the selected Image, you can observe that the image was updated indirectly by the clEnqueueNDRangeKernel API call, which caused the kernel to run.
Each state is related to the API call that caused the change, and is in the following format: #ID:API Call.
Here, #ID is the number of the API call that caused the change, and API Call is the OpenCL API call that affected (changed) the object.
This is the same API call that is shown in the Trace View.
When you select an Image from the drop-down, or alternatively select an Image state, the Trace View automatically highlights the API call related to that state.
The Save As button in the Image View enables saving a copy of the displayed image to disk as a bitmap.
To configure the number of states to save for each memory object (Images, Buffers and SubBuffers), see Configuring the API Debugger.
Enabling the API Debugger
Configuring the API Debugger
The Data View displays on a grid all the OpenCL memory objects (Images, Buffers and SubBuffers) that were instantiated in the host application.
Each memory object is added to the Objects View. Double-click a Buffer/SubBuffer to display the buffer contents, or double-click an Image to view the raw pixel data associated with the image.
Double-click the Buffer you need, and the Data View window appears with the latest state of the buffer/sub-buffer.
From the Memory Objects drop-down, select any memory object, and the view shows the raw data associated with the object.
The history drop-down enables viewing various states of the selected memory object, where each state is the result of an API call.
Consider a host application that calculates a histogram of a grayscale image. For example, it can use a buffer with 256 bins, one for each gray level of the image, to calculate the histogram.
As a first step, issue an NDRange kernel called bzero to initialize the buffer with zeros.
Examine the buffer contents on the grid and see that all buffer elements are set to zero.
As a second step, issue a second NDRange command that calculates the histogram of the image, using the buffer as the bin counters.
This example has 22 pixel elements with grayscale value 5, 27 pixels with grayscale value 9, and so on.
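The two kernels themselves are not shown here. To illustrate what they compute, the following host-side C sketch performs the same two steps with hypothetical function names. An OpenCL kernel version would typically need atomic increments (for example, atomic_inc), because many work-items can update the same bin.

```c
#include <stddef.h>
#include <stdint.h>

#define HISTOGRAM_BINS 256

/* Step 1, the equivalent of the "bzero" kernel: clear all bins. */
void histogram_clear(uint32_t bins[HISTOGRAM_BINS])
{
    for (size_t i = 0; i < HISTOGRAM_BINS; ++i) bins[i] = 0;
}

/* Step 2: count how many pixels have each 8-bit grayscale value. */
void histogram_accumulate(uint32_t bins[HISTOGRAM_BINS],
                          const uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) bins[pixels[i]]++;
}
```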
Use the Type box to select the underlying data type (for example, cl_uint, cl_double).
The Save As button enables saving a CSV representation of the data to disk.
When you export a Buffer/SubBuffer, you get each buffer cell on a separate line. The Buffer/SubBuffer values are interpreted as a contiguous memory chunk with unsigned chars as its elements.
When you export an Image as a CSV, the number of rows in the output CSV is the height of the image (number of rows). Each CSV row contains all columns of the corresponding image row, delimited by commas.
The values in the cells are interpreted according to the image channel data type. For example, for an Image with CL_SIGNED_INT32 as its data type, each row of the output is an array of signed 32-bit integers.
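The sketch below writes the two CSV layouts described above into a caller-supplied character buffer. The function names and the exact number formatting are assumptions. The image case assumes a single-channel CL_SIGNED_INT32 image; multi-channel images would need one value per channel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Buffer/SubBuffer layout: one unsigned char value per line.
 * Returns the number of characters written, or -1 if `out` is too small. */
int buffer_to_csv(const unsigned char *data, size_t size,
                  char *out, size_t out_size)
{
    size_t pos = 0;
    for (size_t i = 0; i < size; ++i) {
        int n = snprintf(out + pos, out_size - pos, "%u\n", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= out_size - pos) return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}

/* Image layout: one line per image row, values separated by commas.
 * Assumes one signed 32-bit channel per pixel. */
int image_int32_to_csv(const int32_t *pixels, size_t width, size_t height,
                       char *out, size_t out_size)
{
    size_t pos = 0;
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            int n = snprintf(out + pos, out_size - pos, "%s%ld",
                             x ? "," : "", (long)pixels[y * width + x]);
            if (n < 0 || (size_t)n >= out_size - pos) return -1;
            pos += (size_t)n;
        }
        int n = snprintf(out + pos, out_size - pos, "\n");  /* end of row */
        if (n < 0 || (size_t)n >= out_size - pos) return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```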
Each state is related to the API call that caused the change,and is in the following format: #ID:API Call.
Here, #ID is the number of the API call that caused the change, and API Call is the OpenCL API call that affected (changed) the object.
This is the same API call that is shown in the Trace View.
When you select a memory object from the drop-down list, or alternatively select a memory object state, the Trace View automatically highlights the API call related to that state.
To configure the number of states to save for each memory object (Images, Buffers and SubBuffers), see Configuring the API Debugger.
Enabling the API Debugger
Configuring the API Debugger
Memory tracing enables you to capture the debugging session into a file, and also to load a previously stored state into the views.
The stored state contains:
· State of all the views - this includes all the data that is filled in the various views of the plug-in.
· Image bitmaps (if Enable Image Bitmap Tracing is on)
· Memory objects raw data (if Enable Raw Data Tracing is on)
The state can be stored in either of the following ways:
· Automatically when the host application ends
· Manually, by going to Trace View > Save > Save Session (.trace)
The automatic memory tracing contains:
· State of all the views
· CSV of all API calls that occurred during the execution
This option creates a separate directory for each captured session of the plug-in. The directory is stored under the Output Folder specified in the same window. To enable automatic memory tracing, see Configuring the API Debugger.
Enabling the API Debugger
Configuring the API Debugger
OpenCL™ Code Analyzer provides performance information for OpenCL applications.
The Code Analyzer enables you to collect performance data from both the host side and the kernel side of an OpenCL application.
For the kernel side, the Code Analyzer provides:
· Time of execution.
· Throughput and work size data of each OpenCL kernel that was launched during your program's execution.
· Overall occupancy of the GPU execution units during the kernel execution.
· Latency of each memory access command that was executed during the kernel runs on the GPU.
For the host side, the Code Analyzer provides:
· Statistics of all OpenCL API calls.
· Data about memory commands that were executed in your program.
Creating and Launching a New Analyze Session
Analysis Results
Session Info
Host Profiling
Kernel Overview
Kernel Analysis
Host-side Analysis Optimization Tips
Revising Code and Rerunning a Host Profiling Session
Revising Code and Rerunning a Kernel Profiling Session
Configuring the Code Analyzer
Code Analyzer Command Line Interface
When you run the Code Analyzer, you create an Analyze Session, which contains the configuration data for collecting performance information and the results of the analysis run. You can explore the analysis sessions in the Analyze Sessions Explorer window.
To create and run an Analysis session, do the following:
1. Go to CODE BUILDER > OpenCL Application Analysis > New Analysis Session...
The OpenCL Application Analysis Session dialog box appears.
2. In the Application Setting section, specify information about the application that you want to analyze:
· In the Application field, enter, or browse to, the full path and executable file name of the target application.
· In the Arguments field, enter the command-line options to use when launching the target application.
· In the Working Directory field, optionally enter the working directory for the target application.
To use the directory specified in the Application field, check the Use application directory as working directory box.
· In the Session Name section, enter a meaningful name for the session.
3. Verify that the Start Automatically on Process Launch check box is checked, and click Launch.
Your application is launched and the Code Analyzer starts to collect data.
While the application is running, a new session run tab is opened in the main Visual Studio* window.
During the session, you can do the following:
· Click the Pause/Resume analysis button in the session run tab to pause or resume data collection.
· Exercise the functionality that might cause performance issues.
· Click Close analysis to stop the analysis.
After you finish running the application, the new analysis session that you created appears in the Analyze Sessions Explorer window, and a results view appears in the Visual Studio* main area.
The analysis result view contains an analysis home page and an analysis progress bar with several steps to review the analysis results.
On the home page of the analysis view, you can find an overview of the host and kernel profiling results. You can click each of the sections on this page to navigate to the relevant report, or you can follow the analysis progress bar steps.
The Session Info page provides information about the application that was analyzed and the command line to run the analysis.
The Host Profiling page provides data that can help you optimize your host-side code.
This section includes the following reports:
· API Calls
· Memory Commands
· OpenCL Objects
The API Calls report lists statistics of calls made to the OpenCL API, including the number of times each API was called, the number of errors returned, and statistics on the elapsed time each API call took while executing.
Click the + button on the left of any API name to expand it and show additional information on this API, including the error code, input arguments and timing data of each call to this API during program execution.
To see the entire list of arguments for each API call, click "[..]" in the arguments column.
The Memory Commands report lists statistics of calls made to the OpenCL memory commands in the program, including the number of times the memory commands were called, the number of times errors were returned, and statistics on the elapsed time each memory command call took while executing.
Click the + button on the left of any memory command name to expand its row. The expanded area presents additional information, including the return value, memory objects and timing data, for each call of this memory command that occurred during the program execution.
The OpenCL Objects report presents all OpenCL objects that were created during the application run, shown as a hierarchical tree.
You can use the platform, context and devices menus to hide some branches of the tree.
The Kernel Overview page provides data that can help you optimize your kernel code.
This section includes the API Calls report, which shows every OpenCL kernel that was launched during the program execution.
Kernels with a different name, a different global work size, or a different local work size are considered different kernels and are presented in different rows.
Each row shows:
· The total, minimum, maximum and average kernel execution time.
· EU Active - The normalized sum of all cycles on all cores spent actively executing instructions.
· EU Stalled - The normalized sum of all cycles on all cores spent stalled. At least one thread is loaded, but the core is stalled for some reason.
· GPU Memory Reads/Writes - Reads/writes from the GPU from/to the chip uncore (LLC) and memory. These are all memory accesses that miss in the internal GPU L3 cache and are serviced either from the uncore or from main memory.
· L3 Cache Misses - All read and write misses in the GPU L3 cache.
· Untyped Memory Reads/Writes - Memory accesses to buffers created with clCreateBuffer.
· Typed Memory Reads/Writes - Memory accesses to typed buffers, for example, writes to images created with clCreateImage. However, reads from images are counted under Sampler accesses and Texture Read.
· SLM Reads/Writes - Memory accesses to Shared Local Memory.
Click the '+' button on the left of any kernel name to expand its row. The expanded area presents additional information, including the latency, return value, command queue, context and timing data of each time this kernel was executed during the program execution.
For analysis of the kernel side of the application, the following reports are generated:
· Occupancy
· Ticks per Thread
· Latency
The Occupancy report shows, for each kernel, the occupancy of each execution unit in the GPU.
You can also see the number of GPU threads launched, and the minimum, maximum and average thread execution time.
The Ticks per Thread report shows, for each number of active threads, the amount of time that this number of threads was active.
The Threads per Time report shows the number of threads that were active at each point in time during the execution.
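The two reports describe the same data from two angles: Ticks per Thread is a histogram of the Threads per Time samples. The sketch below derives one from the other. It assumes, hypothetically, that the samples are available as one active-thread count per tick; the function name is also hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* samples[i] is the number of active GPU threads during tick i.
 * ticks_per_count[k] receives the number of ticks during which exactly k
 * threads were active; it must have max_threads + 1 entries.
 * Samples above max_threads are ignored. */
void ticks_per_thread(const uint32_t *samples, size_t tick_count,
                      uint64_t *ticks_per_count, uint32_t max_threads)
{
    for (size_t k = 0; k <= (size_t)max_threads; ++k) ticks_per_count[k] = 0;
    for (size_t i = 0; i < tick_count; ++i)
        if (samples[i] <= max_threads) ticks_per_count[samples[i]]++;
}
```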
The Latency pane shows, for each kernel file, the overall latency of the memory commands.
Click the kernel name to see the latency of each memory command in the source code of this kernel.
While you run the host-side performance analysis, the Code Analyzer identifies inefficient use of the OpenCL API. When the analysis is done, a TIPS screen appears, showing all the detected issues, each with a short description.
Click a specific tip to open a related report and highlight the data within the report that is relevant to this tip. In addition, a popup window appears with a recommendation on how to fix the reported issue.
The following table summarizes the recommendations reported in the Tips.
Title |
Description |
Recommendation |
Inefficient "clCreateBuffer" |
The host program includes a call to clCreateBuffer where "flags" includes "CL_MEM_COPY_HOST_PTR"." |
There are two ways to ensure zero-copy path on memory objects mapping. For best results, allocate memory with "CL_MEM_ALLOC_HOST_PTR", this method ensures that the memory is efficiently mirrored on the host. Another way is to allocate properly aligned and sized memory yourself and share the pointer with the OpenCL framework by using the "CL_MEM_USE_HOST_PTR" flag. |
"clCreateBuffer" call where "host_ptr" isn't 4K aligned. |
The host program includes a call to "clCreateBuffer where "host_ptr" is not 4K aligned. |
For best results, align memory address to host memory page (4K bytes) |
"clCreateBuffer" call where "size" isn't a multiple of 64 bytes |
The host program includes call to "clCreateBuffer" where "size" is not a multiple of 64 bytes. |
For best results, make sure that the amount of memory you allocate and the size of the corresponding OpenCL buffer is a multiple of the cache line sizes (64 bytes). |
Redundant calls to "clBuildProgram" |
The host program includes several calls to "clBuildProgram" with the same arguments. |
When possible, call "clGetProgramInfo" to retrieve binaries generated from calls to "clCreateProgramWithSource" and "clBuildProgram". |
Redundant calls to "clCompileProgram" |
The host program includes several calls to "clCompileProgram" with the same arguments. |
When possible, call "clGetProgramInfo" to retrieve previously compiled binaries. |
Redundant calls to "clCreateContextFromType". |
The host program includes calls to "clCreateContextFromType" with the same arguments." |
Consider using the same OpenCL context instead of recreating it. |
Redundant calls to "clCreateContext". |
The host program includes calls to "clCreateContext" with the same arguments. |
Consider using the same OpenCL context instead of recreating it. |
Redundant calls to "clCreateCommandQueue" |
The host program includes several calls to "clCreateCommandQueue" that refer to the same device |
Consider using the same command-queue to access the device. |
Redundant calls to "clCreateCommandQueueWithProperties" |
The host program includes several calls to "clCreateCommandQueueWithProperties" that refer to the same device. |
Consider using the same command-queue to access the device. |
"clEnqueueReadBuffer" call |
The host program includes several calls to "clEnqueueReadBuffer" |
When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject instead of calls to "clEnqueueReadBuffer" or "clEnqueueWriteBuffer. |
"clEnqueueWriteBuffer" call |
The host program includes several calls to "clEnqueueWriteBuffer" |
When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject instead of calls to "clEnqueueReadBuffer" or "clEnqueueWriteBuffer. |
"clEnqueueReadImage" call |
The host program includes several calls to "clEnqueueReadImage" |
When possible, use "clEnqueueMapImage" and "clEnqueueUnmapMemObject instead of calls to "clEnqueueReadImage" or "clEnqueueWriteImage. |
"clEnqueueWriteImage" call |
The host program includes several calls to "clEnqueueWriteImage" |
When possible, use "clEnqueueMapImage" and "clEnqueueUnmapMemObject instead of calls to "clEnqueueReadImage" or "clEnqueueWriteImage. |
"clEnqueueReadBufferRect" call |
The host program includes several calls to "clEnqueueReadBufferRect" |
When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBufferRect" or "clEnqueueWriteBufferRect". |
"clEnqueueWriteBufferRect" call |
The host program includes several calls to "clEnqueueWriteBufferRect" |
When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBufferRect" or "clEnqueueWriteBufferRect". |
The work-group dimensions are defined as "column" work-group |
The host program includes a call to "clEnqueueNDRange" where the work-group dimensions are defined as "column" work-group. |
When reading from memory, best to reorganize the work-group to read in lines instead of columns. |
Performance Information |
Kernel register pressure is too high, spill fills will be generated. Additional surface needs to be allocated. |
Consider simplifying your kernel. |
Performance Information |
Kernel private memory usage is too high and exhaust register space. Additional surface needs to be allocated. |
Consider reducing the amount of private memory used, avoid using private memory arrays. |
Performance Information |
Local workgroup sizes selected for this workload may not be optimal |
consider using a different local workgroup size, |
Performance Information |
Not aligned surface detected. Driver needs to disable L3 caching. |
, |
Performance Information |
Kernel submission requires coherency with CPU, this may impact performance. |
, |
Performance Information |
Null local workgroup size detected, Following sizes will be used for execution |
, |
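The two "clCreateBuffer" alignment tips above can be satisfied on the host side with a small allocation helper. The following is a minimal sketch in C11, assuming a 4K page and a 64-byte cache line as stated in the table; `alloc_host_buffer`, `round_up`, and `is_page_aligned` are hypothetical names. It relies on C11 `aligned_alloc`, which MSVC does not provide; on Windows, `_aligned_malloc` and `_aligned_free` are the usual substitutes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Values taken from the Tips table: 4K host page, 64-byte cache line. */
#define HOST_PAGE_SIZE  4096u
#define CACHE_LINE_SIZE 64u

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Returns 1 if ptr lies on a 4K page boundary, 0 otherwise. */
int is_page_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % HOST_PAGE_SIZE) == 0;
}

/* Allocate a host buffer intended for CL_MEM_USE_HOST_PTR:
 * the address is 4K aligned and the usable size is padded to a multiple
 * of 64 bytes. C11 aligned_alloc requires the size to be a multiple of
 * the alignment, so the allocation itself is padded to a full page.
 * *out_size receives the 64-byte-padded size to pass to clCreateBuffer.
 * Release the memory with free(). Returns NULL on failure. */
void *alloc_host_buffer(size_t requested, size_t *out_size)
{
    size_t cl_size = round_up(requested, CACHE_LINE_SIZE);
    void *p = aligned_alloc(HOST_PAGE_SIZE, round_up(cl_size, HOST_PAGE_SIZE));
    if (p != NULL && out_size != NULL)
        *out_size = cl_size;
    return p;
}
```

The returned pointer and padded size would then be passed to clCreateBuffer together with CL_MEM_USE_HOST_PTR; that call is left out so the sketch compiles without the OpenCL headers. A request for 1000 bytes, for example, yields a 4K-aligned pointer and a buffer size of 1024.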
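To see why a "column" work-group is flagged, the sketch below counts how many distinct 64-byte cache lines one work-group touches when each work-item reads one element of a row-major 2-D buffer. It assumes that dimension 0 of the work-group (get_local_id(0)) runs along the contiguous row direction, that the buffer base is cache-line aligned, and that local_x does not exceed the row width; `lines_touched` is a hypothetical helper, not a Code Analyzer API.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64u

/* Count distinct cache lines touched when a work-group of size
 * (local_x, local_y) reads element (x, y), x < local_x, y < local_y,
 * from a row-major buffer with `width` elements per row.
 * Assumes a cache-line-aligned base and local_x <= width, so visiting
 * elements in row-major order never moves to a lower address and a new
 * line can be detected by comparing with the previous one. */
size_t lines_touched(size_t width, size_t elem_size,
                     size_t local_x, size_t local_y)
{
    size_t count = 0;
    size_t prev_line = 0;
    for (size_t y = 0; y < local_y; ++y) {
        for (size_t x = 0; x < local_x; ++x) {
            size_t line = (y * width + x) * elem_size / CACHE_LINE_BYTES;
            if (count == 0 || line != prev_line) {
                ++count;
                prev_line = line;
            }
        }
    }
    return count;
}
```

For a float buffer 1024 elements wide, a 64 x 1 work-group touches 4 cache lines, while a 1 x 64 "column" work-group touches 64, one per work-item, which is the access pattern the tip recommends avoiding.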
After you optimize your code, you can rerun the Code Analyzer session and compare the data to see how your changes improve your application performance.
To rerun an analyze session, do the following:
1. Open the Analyze Sessions Explorer from CODE BUILDER > OpenCL Application Analysis > Windows > Analyze Session Explorer.
2. In the Analyze Sessions Explorer, right-click the analyze session that you want to rerun and select Rerun.
3. A new analyze session is created and launched, and the profiled application starts.
4. After the application finishes, the new analyze session appears in the Analyze Sessions Explorer window and new reports are generated.
For each analysis session, the Code Analyzer creates a session directory named with the application's name, the date, and an incremented session number.
A new capture subdirectory is created in the session directory when profiling begins, and again each time you pause and resume data collection during the session. The files in that directory comprise the capture reports. The format of the capture directory name is:
capture_N, where N is an incrementing number
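If you post-process session directories with your own tools, the capture_N naming scheme is straightforward to parse. The following C sketch is a hypothetical helper, not part of the Code Analyzer: `capture_index` extracts N from a directory name, and `next_capture_name` builds the name that would follow the highest existing index, assuming numbering starts at 1 (capture_1, capture_2, and so on).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a directory name of the form "capture_N".
 * Returns N, or -1 if the name does not match that format exactly. */
long capture_index(const char *name)
{
    const char *prefix = "capture_";
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0)
        return -1;
    /* Require at least one digit right after the prefix (no sign/space). */
    if (!isdigit((unsigned char)name[plen]))
        return -1;
    char *endp;
    long n = strtol(name + plen, &endp, 10);
    return (*endp == '\0') ? n : -1;
}

/* Write into out the name one past the highest existing capture index.
 * Names that are not capture directories are ignored. */
void next_capture_name(const char *const *names, size_t count,
                       char *out, size_t out_size)
{
    long highest = 0;
    for (size_t i = 0; i < count; ++i) {
        long n = capture_index(names[i]);
        if (n > highest)
            highest = n;
    }
    snprintf(out, out_size, "capture_%ld", highest + 1);
}
```

Given the entries capture_1, capture_2, and report.csv, the helper returns capture_3; with no existing captures it returns capture_1.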
These are the types of files in a capture directory:
· *.csv - capture reports in CSV format.
· *.bin - capture reports in binary format; you can open such reports in Visual Studio from the Analyze Sessions Explorer.
In addition, a session file is created in the session directory. This file stores the session configuration data. You can use it to create and run another similar session.
After you optimize your code, you can rerun the Code Analyzer session and compare the data in the Kernel Development Framework to see how your changes improve your application performance.
During the performance analysis run, the Code Analyzer stores all the kernels that were executed during the analysis, together with their inputs, as a Kernel Development Framework session.
To open this session in the Kernel Development Framework, click the Edit Kernel button on the Kernel Analysis page.
For each analysis session, the analysis tool creates a session directory named with the following information:
· application's name
· the date
· an incremented session number.
When profiling begins, and each time you pause and resume data collection during the session, a new capture subdirectory is created in the session directory.
The capture directory name consists of "capture" followed by an incrementing number (for example, capture_1, capture_2, and so on). The capture directory includes capture reports in binary format.
You can open these binary files in Visual Studio from the Analyze Sessions Explorer.
In addition, a session file is created in the session directory. This file stores the session configuration data. You can use it to create and run another similar session. You can also export the analysis results to HTML and CSV format.
Kernel Analysis
Code Analyzer Command Line Interface
You can use the Analyze Session Settings dialog box to change the reports directory and the connection information for analyze sessions.
To open the Analyze Session Settings dialog box, go to CODE BUILDER > OpenCL Application Analysis > Settings.
The dialog box contains the following fields:
· Reports Location - full path to the directory that contains the analysis reports. If you select the Place output reports in the solution directory check box and open the solution in Visual Studio*, the Code Analyzer ignores the specified directory and places the analysis reports in the solution directory.
· Connection - range of available port numbers. The Code Analyzer uses one of the ports in the specified range to establish a connection with Visual Studio*.
· Advanced Settings section:
o Enable analysis of GPU activity and memory transactions - check this box to collect and analyze GPU activity and memory transactions in the GPU.
o Reduce disk space - check this box to disable the deep analysis and reduce the amount of stored data.