CTS Issue Analysis 13: CTS Issue Analysis 10 (continued)

CTS/GTS Issue Analysis 13

Problem analysis

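This is not the first time this problem has appeared; see CTS Issue Analysis 10. At that time more urgent issues took priority, so the analysis stopped at the finding that a large number of CompatibilityTestSuite instances were being held in memory, which caused the retry to fail.

It has now happened again, so a proper investigation is needed to make sure it does not recur.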

Retry command: run retry --retry 0 --shard-count 2 -s 7c6252f -s 7c62472

Error log from the terminal:

java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to java_pid26338.hprof ...
Heap dump file created [5553157593 bytes in 101.829 secs]
01-29 16:09:47 E/CommandScheduler: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1747)
at java.util.HashMap.putVal(HashMap.java:631)
at java.util.HashMap.put(HashMap.java:612)
at java.util.HashSet.add(HashSet.java:220)
at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:452)
at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:549)
at com.android.tradefed.config.OptionCopier.copyOptions(OptionCopier.java:49)
at com.android.tradefed.config.OptionCopier.copyOptionsNoThrow(OptionCopier.java:60)
at com.android.tradefed.testtype.suite.ITestSuite.split(ITestSuite.java:662)
at com.android.compatibility.common.tradefed.testtype.retry.RetryFactoryTest.split(RetryFactoryTest.java:122)
at com.android.tradefed.invoker.shard.ShardHelper.shardTest(ShardHelper.java:123)
at com.android.tradefed.invoker.shard.ShardHelper.shardConfig(ShardHelper.java:30)
at com.android.tradefed.invoker.shard.StrictShardHelper.shardConfig(StrictShardHelper.java:51)
at com.android.tradefed.invoker.InvocationExecution.shardConfig(InvocationExecution.java:149)
at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:656)
at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:1357)

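The stack trace shows the call path at the moment of failure; following it lets us find out why so much memory is being used.

How the data structures are organized when retrying on multiple devices

From the earlier analysis we know that the problem comes from a large number of CompatibilityTestSuite instances, each holding a large number of exclude filter entries. So we follow the stack and walk through how the CTS data structures are organized during a multi-device retry.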
tools/tradefederation/core/src/com/android/tradefed/invoker/shard/ShardHelper.java

    /**
     * Attempt to shard the configuration into sub-configurations, to be re-scheduled to run on
     * multiple resources in parallel.
     *
     * <p>A successful shard action renders the current config empty, and invocation should not
     * proceed.
     *
     * @see IShardableTest
     * @see IRescheduler
     * @param config the current {@link IConfiguration}.
     * @param context the {@link IInvocationContext} holding the tests information.
     * @param rescheduler the {@link IRescheduler}
     * @return true if test was sharded. Otherwise return false
     */
    @Override
    public boolean shardConfig(
            IConfiguration config, IInvocationContext context, IRescheduler rescheduler) {
        List<IRemoteTest> shardableTests = new ArrayList<>();
        boolean isSharded = false;
        Integer shardCount = config.getCommandOptions().getShardCount();
        for (IRemoteTest test : config.getTests()) {
            // shardTest splits the test for the retry. At this point the test holds almost
            // nothing: only the known failures from cts-known-failures.xml, stored in its
            // exclude list.
            isSharded |= shardTest(shardableTests, test, shardCount, context);
        }
        if (!isSharded) {
            return false;
        }
        // shard this invocation!
        // create the TestInvocationListener that will collect results from all the shards,
        // and forward them to the original set of listeners (minus any ISharddableListeners)
        // once all shards complete
        int expectedShard = shardableTests.size();
        if (shardCount != null) {
            expectedShard = Math.min(shardCount, shardableTests.size());
        }
        ShardMasterResultForwarder resultCollector =
                new ShardMasterResultForwarder(buildMasterShardListeners(config), expectedShard);

        resultCollector.invocationStarted(context);
        synchronized (shardableTests) {
            // When shardCount is available only create 1 poller per shard
            // TODO: consider aggregating both case by picking a predefined shardCount if not
            // available (like 4) for autosharding.
            if (shardCount != null) {
                // We shuffle the tests for best results: avoid having the same module sub-tests
                // contiguously in the list.
                Collections.shuffle(shardableTests);
                int maxShard = Math.min(shardCount, shardableTests.size());
                CountDownLatch tracker = new CountDownLatch(maxShard);
                for (int i = 0; i < maxShard; i++) {
                    IConfiguration shardConfig = config.clone();
                    shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
                    rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
                }
            } else {
                CountDownLatch tracker = new CountDownLatch(shardableTests.size());
                for (IRemoteTest testShard : shardableTests) {
                    CLog.i("Rescheduling sharded config...");
                    IConfiguration shardConfig = config.clone();
                    if (config.getCommandOptions().shouldUseDynamicSharding()) {
                        shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
                    } else {
                        shardConfig.setTest(testShard);
                    }
                    rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
                }
            }
        }
        // clean up original builds
        for (String deviceName : context.getDeviceConfigNames()) {
            config.getDeviceConfigByName(deviceName)
                    .getBuildProvider()
                    .cleanUp(context.getBuildInfo(deviceName));
        }
        return true;
    }

    /**
     * Attempt to shard given {@link IRemoteTest}.
     *
     * @param shardableTests the list of {@link IRemoteTest}s to add to
     * @param test the {@link IRemoteTest} to shard
     * @param shardCount attempted number of shard, can be null.
     * @param context the {@link IInvocationContext} of the current invocation.
     * @return true if test was sharded
     */
    private static boolean shardTest(
            List<IRemoteTest> shardableTests,
            IRemoteTest test,
            Integer shardCount,
            IInvocationContext context) {
        boolean isSharded = false;
        if (test instanceof IShardableTest) {
            // inject device and build since they might be required to shard.
            if (test instanceof IBuildReceiver) {
                ((IBuildReceiver) test).setBuild(context.getBuildInfos().get(0));
            }
            if (test instanceof IDeviceTest) {
                ((IDeviceTest) test).setDevice(context.getDevices().get(0));
            }
            if (test instanceof IMultiDeviceTest) {
                ((IMultiDeviceTest) test).setDeviceInfos(context.getDeviceBuildMap());
            }
            if (test instanceof IInvocationContextReceiver) {
                ((IInvocationContextReceiver) test).setInvocationContext(context);
            }
            // the checks above set a few properties on the test
            IShardableTest shardableTest = (IShardableTest) test;
            Collection<IRemoteTest> shards = null;
            // Give the shardCount hint to tests if they need it.
            if (shardCount != null) { // true when a multi-device retry specifies --shard-count
                shards = shardableTest.split(shardCount); // calls RetryFactoryTest.split
            } else {
                shards = shardableTest.split();
            }
            if (shards != null) {
                shardableTests.addAll(shards);
                isSharded = true;
            }
        }
        if (!isSharded) {
            shardableTests.add(test);
        }
        return isSharded;
    }

test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/testtype/retry/RetryFactoryTest.java

    @Override
    public Collection<IRemoteTest> split(int shardCountHint) {
        try {
            // The next two statements are the key to how the data structures are organized.
            CompatibilityTestSuite test = loadSuite();
            return test.split(shardCountHint);
        } catch (DeviceNotAvailableException e) {
            CLog.e("Failed to shard the retry run.");
            CLog.e(e);
        }
        return null;
    }

loadSuite creates a CompatibilityTestSuite:

    /**
     * Helper to create a {@link CompatibilityTestSuite} from previous results.
     */
    private CompatibilityTestSuite loadSuite() throws DeviceNotAvailableException {
        // Create a compatibility test and set it to run only what we want.
        CompatibilityTestSuite test = createTest();

        CompatibilityBuildHelper buildHelper = new CompatibilityBuildHelper(mBuildInfo);
        // Create the helper with all the options needed.
        RetryFilterHelper helper = createFilterHelper(buildHelper); // creates a RetryFilterHelper
        // TODO: we have access to the original command line, we should accommodate more re-run
        // scenario like when the original cts.xml config was not used.
        helper.validateBuildFingerprint(mDevice);
        helper.setCommandLineOptionsFor(test);
        helper.setCommandLineOptionsFor(this);
        helper.populateRetryFilters(); // this is where the exclude entries are added

        try {
            OptionSetter setter = new OptionSetter(test);
            for (String moduleArg : mModuleArgs) {
                setter.setOptionValue("compatibility:module-arg", moduleArg);
            }
            for (String testArg : mTestArgs) {
                setter.setOptionValue("compatibility:test-arg", testArg);
            }
        } catch (ConfigurationException e) {
            throw new RuntimeException(e);
        }

        test.setIncludeFilter(helper.getIncludeFilters());
        test.setExcludeFilter(helper.getExcludeFilters());
        test.setDevice(mDevice);
        test.setBuild(mBuildInfo);
        test.setAbiName(mAbiName);
        test.setPrimaryAbiRun(mPrimaryAbiRun);
        test.setSystemStatusChecker(mStatusCheckers);
        test.setInvocationContext(mContext);
        test.setConfiguration(mMainConfiguration);
        // reset the retry id - Ensure that retry of retry does not throw
        test.resetRetryId();
        test.isRetry();
        // clean the helper
        helper.tearDown();
        return test;
    }

test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/util/RetryFilterHelper.java

    /**
     * Constructor for a {@link RetryFilterHelper}.
     *
     * @param build a {@link CompatibilityBuildHelper} describing the build.
     * @param sessionId The ID of the session to retry.
     * @param subPlan The name of a subPlan to be used. Can be null.
     * @param includeFilters The include module filters to apply
     * @param excludeFilters The exclude module filters to apply
     * @param abiName The name of abi to use. Can be null.
     * @param moduleName The name of the module to run. Can be null.
     * @param testName The name of the test to run. Can be null.
     * @param retryType The type of results to retry. Can be null.
     */
    public RetryFilterHelper(CompatibilityBuildHelper build, int sessionId, String subPlan,
            Set<String> includeFilters, Set<String> excludeFilters, String abiName,
            String moduleName, String testName, RetryType retryType) {
        this(build, sessionId);
        mSubPlan = subPlan;
        mIncludeFilters.addAll(includeFilters);
        mExcludeFilters.addAll(excludeFilters);
        mAbiName = abiName;
        mModuleName = moduleName;
        mTestName = testName;
        mRetryType = retryType;
    }

Up to this point mExcludeFilters only holds the known failures recorded in cts-known-failures.xml; the key step is populateRetryFilters:

    /**
     * Populate mRetryIncludes and mRetryExcludes based on the options and the result set for
     * this instance of RetryFilterHelper.
     */
    public void populateRetryFilters() {
        mRetryIncludes = new HashSet<>(mIncludeFilters); // reset for each population
        mRetryExcludes = new HashSet<>(mExcludeFilters); // reset for each population
        if (RetryType.CUSTOM.equals(mRetryType)) {
            Set<String> customIncludes = new HashSet<>(mIncludeFilters);
            Set<String> customExcludes = new HashSet<>(mExcludeFilters);
            if (mSubPlan != null) { // a retry normally specifies no subplan, so this is skipped
                ISubPlan retrySubPlan = SubPlanHelper.getSubPlanByName(mBuild, mSubPlan);
                customIncludes.addAll(retrySubPlan.getIncludeFilters());
                customExcludes.addAll(retrySubPlan.getExcludeFilters());
            }
            // If includes were added, only use those includes. Also use excludes added directly
            // or by subplan. Otherwise, default to normal retry.
            if (!customIncludes.isEmpty()) {
                mRetryIncludes.clear();
                mRetryIncludes.addAll(customIncludes);
                mRetryExcludes.addAll(customExcludes);
                return;
            }
        }
        // remove any extra filtering options
        // TODO(aaronholden) remove non-plan includes (e.g. those in cts-vendor-interface)
        // TODO(aaronholden) remove non-known-failure excludes
        mModuleName = null;
        mTestName = null;
        mSubPlan = null;
        populateFiltersBySubPlan();
        populatePreviousSessionFilters();
    }

So execution reaches populateFiltersBySubPlan:

    /* Generation of filters based on previous sessions is implemented thoroughly in SubPlanHelper,
     * and retry filter generation is just a subset of the use cases for the subplan retry logic.
     * Use retry type to determine which result types SubPlanHelper targets. */
    public void populateFiltersBySubPlan() {
        SubPlanHelper retryPlanCreator = new SubPlanHelper();
        retryPlanCreator.setResult(getResult());
        if (RetryType.FAILED.equals(mRetryType)) {
            // retry only failed tests
            retryPlanCreator.addResultType(SubPlanHelper.FAILED);
        } else if (RetryType.NOT_EXECUTED.equals(mRetryType)){
            // retry only not executed tests
            retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
        } else {
            // retry both failed and not executed tests
            retryPlanCreator.addResultType(SubPlanHelper.FAILED);
            retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
        }
        try {
            // The include and exclude lists built by SubPlanHelper are added here and end up
            // in the CompatibilityTestSuite.
            ISubPlan retryPlan = retryPlanCreator.createSubPlan(mBuild);
            mRetryIncludes.addAll(retryPlan.getIncludeFilters());
            mRetryExcludes.addAll(retryPlan.getExcludeFilters());
        } catch (ConfigurationException e) {
            throw new RuntimeException ("Failed to create subplan for retry", e);
        }
    }

test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/result/SubPlanHelper.java
createSubPlan is the most important step: it extracts information from the report being retried into the include list (mIncludeFilters) and the exclude list (mExcludeFilters).

    /**
     * Create a subplan derived from a result.
     * <p>
     * {@link Option} values must be set before this is called.
     * @param buildHelper
     * @return subplan
     * @throws ConfigurationException
     */
    public ISubPlan createSubPlan(CompatibilityBuildHelper buildHelper)
            throws ConfigurationException {
        setupFields(buildHelper);
        ISubPlan subPlan = new SubPlan();

        // add filters from previous session to track which tests must run
        subPlan.addAllIncludeFilters(mIncludeFilters);
        subPlan.addAllExcludeFilters(mExcludeFilters);
        if (mLastSubPlan != null) {
            ISubPlan lastSubPlan = SubPlanHelper.getSubPlanByName(buildHelper, mLastSubPlan);
            subPlan.addAllIncludeFilters(lastSubPlan.getIncludeFilters());
            subPlan.addAllExcludeFilters(lastSubPlan.getExcludeFilters());
        }
        if (mModuleName != null) {
            addIncludeToSubPlan(subPlan, new TestFilter(mAbiName, mModuleName, mTestName));
        }
        Set<TestStatus> statusesToRun = getStatusesToRun();
        for (IModuleResult module : mResult.getModules()) {
            if (shouldRunModule(module)) {
                TestFilter moduleInclude =
                            new TestFilter(module.getAbi(), module.getName(), null /*test*/);
                if (shouldRunEntireModule(module)) {
                    // include entire module
                    // (every case in the module failed)
                    addIncludeToSubPlan(subPlan, moduleInclude);
                } else if (mResultTypes.contains(NOT_EXECUTED) && !module.isDone()) {
                    // add module include and test excludes
                    addIncludeToSubPlan(subPlan, moduleInclude);
                    for (ICaseResult caseResult : module.getResults()) {
                        for (ITestResult testResult : caseResult.getResults()) {
                            if (!statusesToRun.contains(testResult.getResultStatus())) {
                                TestFilter testExclude = new TestFilter(module.getAbi(),
                                        module.getName(), testResult.getFullName());
                                // the module did not finish (done = false)
                                addExcludeToSubPlan(subPlan, testExclude);
                            }
                        }
                    }
                } else {
                    // Not-executed tests should not be rerun and/or this module is completed
                    // In any such case, it suffices to add includes for each test to rerun
                    for (ICaseResult caseResult : module.getResults()) {
                        for (ITestResult testResult : caseResult.getResults()) {
                            if (statusesToRun.contains(testResult.getResultStatus())) {
                                TestFilter testInclude = new TestFilter(module.getAbi(),
                                        module.getName(), testResult.getFullName());
                                // the module finished, but some of its tests failed
                                addIncludeToSubPlan(subPlan, testInclude);
                            }
                        }
                    }
                }
            } else {
                // module should not run, exclude entire module
                TestFilter moduleExclude =
                        new TestFilter(module.getAbi(), module.getName(), null /*test*/);
                // a module in which everything passed
                addExcludeToSubPlan(subPlan, moduleExclude);
            }
        }
        return subPlan;
    }

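This explains why CompatibilityTestSuite holds so many exclude entries. CtsDeqpTestCases did not complete, and it was interrupted shortly before finishing, so almost every case that had already passed becomes one exclude filter. This module has about 350,000 cases (counting only v7a or only v8a).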
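The effect of the "module not done" branch can be illustrated with a minimal Java sketch. RetryFilterSketch, ModuleRun and addModule are hypothetical names made up for this illustration, not CTS harness classes. Results are reduced to passed / not passed, filters are plain "module test" strings, and the shouldRunEntireModule branch is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the filter generation; not the CTS harness code. */
final class RetryFilterSketch {

    /** One module of the previous session: test name -> passed?, plus its done flag. */
    static final class ModuleRun {
        final String name;
        final boolean done;
        final Map<String, Boolean> passedByTest = new LinkedHashMap<>();

        ModuleRun(String name, boolean done) {
            this.name = name;
            this.done = done;
        }
    }

    final List<String> includes = new ArrayList<>();
    final List<String> excludes = new ArrayList<>();

    void addModule(ModuleRun module) {
        boolean anyFailed = module.passedByTest.containsValue(false);
        if (module.done && !anyFailed) {
            // Finished and everything passed: one exclude for the whole module.
            excludes.add(module.name);
        } else if (!module.done) {
            // Not finished: include the module, then exclude every test that passed.
            includes.add(module.name);
            for (Map.Entry<String, Boolean> e : module.passedByTest.entrySet()) {
                if (e.getValue()) {
                    excludes.add(module.name + " " + e.getKey());
                }
            }
        } else {
            // Finished with failures: include only the tests that did not pass.
            for (Map.Entry<String, Boolean> e : module.passedByTest.entrySet()) {
                if (!e.getValue()) {
                    includes.add(module.name + " " + e.getKey());
                }
            }
        }
    }

    public static void main(String[] args) {
        // A huge module interrupted near the end: 349,990 passed results, done = false.
        ModuleRun deqp = new ModuleRun("CtsDeqpTestCases", false);
        for (int i = 0; i < 350_000; i++) {
            deqp.passedByTest.put("test" + i, i < 349_990);
        }
        RetryFilterSketch sketch = new RetryFilterSketch();
        sketch.addModule(deqp);
        System.out.println("includes=" + sketch.includes.size()
                + " excludes=" + sketch.excludes.size());
    }
}
```

In this model a finished module costs at most one filter per failed test, while an unfinished one costs one filter per passed test, which is why a late interruption of the largest module is the worst case.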
The remaining initialization of CompatibilityTestSuite is not the focus of this article and is skipped. We continue with the logic of test.split(shardCountHint).
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ITestSuite.java

    /** {@inheritDoc} */
    @Override
    public Collection<IRemoteTest> split(int shardCountHint) {
        if (shardCountHint <= 1 || mIsSharded) {
            // cannot shard or already sharded
            return null;
        }

        LinkedHashMap<String, IConfiguration> runConfig = loadAndFilter();
        if (runConfig.isEmpty()) {
            CLog.i("No config were loaded. Nothing to run.");
            return null;
        }
        injectInfo(runConfig);

        // We split individual tests on double the shardCountHint to provide better average.
        // The test pool mechanism prevent this from creating too much overhead.
        List<ModuleDefinition> splitModules =
                ModuleSplitter.splitConfiguration(
                        runConfig, shardCountHint, mShouldMakeDynamicModule);
        runConfig.clear();
        runConfig = null;
        // create an association of one ITestSuite <=> one ModuleDefinition as the smallest
        // execution unit supported.
        List<IRemoteTest> splitTests = new ArrayList<>();
        for (ModuleDefinition m : splitModules) {
            ITestSuite suite = createInstance();
            OptionCopier.copyOptionsNoThrow(this, suite);
            suite.mIsSharded = true;
            suite.mDirectModule = m;
            splitTests.add(suite);
        }
        // return the list of ITestSuite with their ModuleDefinition assigned
        return splitTests;
    }

First, the logic of loadAndFilter:

    private LinkedHashMap<String, IConfiguration> loadAndFilter() {
        LinkedHashMap<String, IConfiguration> runConfig = loadTests();
        if (runConfig.isEmpty()) {
            CLog.i("No config were loaded. Nothing to run.");
            return runConfig;
        }
        if (mModuleMetadataIncludeFilter.isEmpty() && mModuleMetadataExcludeFilter.isEmpty()) {
            return runConfig;
        }
        LinkedHashMap<String, IConfiguration> filteredConfig = new LinkedHashMap<>();
        for (Entry<String, IConfiguration> config : runConfig.entrySet()) {
            if (!filterByConfigMetadata(
                    config.getValue(),
                    mModuleMetadataIncludeFilter,
                    mModuleMetadataExcludeFilter)) {
                // if the module config did not pass the metadata filters, it's excluded
                // from execution.
                continue;
            }
            if (!filterByRunnerType(config.getValue(), mAllowedRunners)) {
                // if the module config did not pass the runner type filter, it's excluded from
                // execution.
                continue;
            }
            filterPreparers(config.getValue(), mAllowedPreparers);
            filteredConfig.put(config.getKey(), config.getValue());
        }
        runConfig.clear();
        return filteredConfig;
    }

tools/tradefederation/core/src/com/android/tradefed/testtype/suite/BaseTestSuite.java
loadTests first reorganizes mIncludeFilters and mExcludeFilters into mIncludeFiltersParsed and mExcludeFiltersParsed:

    /** {@inheritDoc} */
    @Override
    public LinkedHashMap<String, IConfiguration> loadTests() {
        try {
            File testsDir = getTestsDir();
            setupFilters(testsDir);
            Set<IAbi> abis = getAbis(getDevice());

            // Create and populate the filters here
            SuiteModuleLoader.addFilters(mIncludeFilters, mIncludeFiltersParsed, abis);
            // Parsed into key-value pairs: the key is the module name, the value is the list
            // of its test filters.
            SuiteModuleLoader.addFilters(mExcludeFilters, mExcludeFiltersParsed, abis);

            CLog.d(
                    "Initializing ModuleRepo\nABIs:%s\n"
                            + "Test Args:%s\nModule Args:%s\nIncludes:%s\nExcludes:%s",
                    abis, mTestArgs, mModuleArgs, mIncludeFiltersParsed, mExcludeFiltersParsed);
            mModuleRepo =
                    createModuleLoader(
                            mIncludeFiltersParsed, mExcludeFiltersParsed, mTestArgs, mModuleArgs);
            // Actual loading of the configurations.
            // (loads the configs of the modules to run)
            return loadingStrategy(abis, testsDir, mSuitePrefix, mSuiteTag);
        } catch (DeviceNotAvailableException | FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Default loading strategy will load from the resources and the tests directory. Can be
     * extended or replaced.
     *
     * @param abis The set of abis to run against.
     * @param testsDir The tests directory.
     * @param suitePrefix A prefix to filter the resource directory.
     * @param suiteTag The suite tag a module should have to be included. Can be null.
     * @return A list of loaded configuration for the suite.
     */
    public LinkedHashMap<String, IConfiguration> loadingStrategy(
            Set<IAbi> abis, File testsDir, String suitePrefix, String suiteTag) {
        LinkedHashMap<String, IConfiguration> loadedConfigs = new LinkedHashMap<>();
        // Load configs that are part of the resources
        if (!mSkipJarLoading) {
            loadedConfigs.putAll(
                    getModuleLoader().loadConfigsFromJars(abis, suitePrefix, suiteTag));
        }

        // Load the configs that are part of the tests dir
        if (mConfigPatterns.isEmpty()) {
            // If no special pattern was configured, use the default configuration patterns we know
            mConfigPatterns.add(".*\\.config");
            mConfigPatterns.add(".*\\.xml");
        }
        loadedConfigs.putAll(
                getModuleLoader()
                        .loadConfigsFromDirectory(
                                testsDir, abis, suitePrefix, suiteTag, mConfigPatterns));
        return loadedConfigs;
    }
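To make the "parsed" filter structure concrete, here is a minimal Java sketch of grouping flat filter strings by module. FilterParseSketch and groupByModule are hypothetical names; this is not SuiteModuleLoader.addFilters. It assumes a simplified "module [test]" filter format and ignores the ABI handling that the real method receives through its abis argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of grouping flat filter strings by module name. */
final class FilterParseSketch {

    /**
     * Groups "module [test]" strings: key is the module name, value is the list of
     * test filters for that module (empty when the filter targets the whole module).
     */
    static Map<String, List<String>> groupByModule(Iterable<String> filters) {
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        for (String filter : filters) {
            // Split into at most two parts: module name and the optional test name.
            String[] parts = filter.trim().split("\\s+", 2);
            List<String> tests = parsed.computeIfAbsent(parts[0], k -> new ArrayList<>());
            if (parts.length == 2) {
                tests.add(parts[1]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> excludes = Arrays.asList(
                "CtsDeqpTestCases dEQP-EGL.info#version",
                "CtsDeqpTestCases dEQP-EGL.info#vendor",
                "CtsAppTestCases");
        System.out.println(groupByModule(excludes));
    }
}
```

The grouped map still holds one entry per test filter, so in this sketch its size grows with the exclude list just like the flat set does.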

tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ModuleSplitter.java
Next, splitConfiguration is called:

    /**
     * Create a List of executable unit {@link ModuleDefinition}s based on the map of configuration
     * that was loaded.
     *
     * @param runConfig {@link LinkedHashMap} loaded from {@link ITestSuite#loadTests()}.
     * @param shardCount a shard count hint to help with sharding.
     * @return List of {@link ModuleDefinition}
     */
    public static List<ModuleDefinition> splitConfiguration(
            LinkedHashMap<String, IConfiguration> runConfig,
            int shardCount,
            boolean dynamicModule) {
        if (dynamicModule) {
            // We maximize the sharding for dynamic to reduce time difference between first and
            // last shard as much as possible. Overhead is low due to our test pooling.
            shardCount *= 2;
        }
        List<ModuleDefinition> runModules = new ArrayList<>();
        for (Entry<String, IConfiguration> configMap : runConfig.entrySet()) {
            // Check that it's a valid configuration for suites, throw otherwise.
            ValidateSuiteConfigHelper.validateConfig(configMap.getValue());

            // Creates the ModuleDefinition(s) from the module name, config and shard count.
            createAndAddModule(
                    runModules,
                    configMap.getKey(),
                    configMap.getValue(),
                    shardCount,
                    dynamicModule);
        }
        return runModules;
    }

    private static void createAndAddModule(
            List<ModuleDefinition> currentList,
            String moduleName,
            IConfiguration config,
            int shardCount,
            boolean dynamicModule) {
        // If this particular configuration module is declared as 'not shardable' we take it whole
        // but still split the individual IRemoteTest in a pool.
        if (config.getConfigurationDescription().isNotShardable()
                || (!dynamicModule
                        && config.getConfigurationDescription().isNotStrictShardable())) {
            for (int i = 0; i < config.getTests().size(); i++) {
                if (dynamicModule) {
                    ModuleDefinition module =
                            new ModuleDefinition(
                                    moduleName,
                                    config.getTests(),
                                    clonePreparersMap(config),
                                    clonePreparers(config.getMultiTargetPreparers()),
                                    config);
                    currentList.add(module);
                } else {
                    addModuleToListFromSingleTest(
                            currentList, config.getTests().get(i), moduleName, config);
                }
            }
            return;
        }

        // If configuration is possibly shardable we attempt to shard it.
        for (IRemoteTest test : config.getTests()) {
            if (test instanceof IShardableTest) {
                Collection<IRemoteTest> shardedTests = ((IShardableTest) test).split(shardCount);
                if (shardedTests != null) {
                    // Test did shard we put the shard pool in ModuleDefinition which has a polling
                    // behavior on the pool.
                    if (dynamicModule) {
                        for (int i = 0; i < shardCount; i++) {
                            ModuleDefinition module =
                                    new ModuleDefinition(
                                            moduleName,
                                            shardedTests,
                                            clonePreparersMap(config),
                                            clonePreparers(config.getMultiTargetPreparers()),
                                            config);
                            currentList.add(module);
                        }
                    } else {
                        // We create independent modules with each sharded test.
                        for (IRemoteTest moduleTest : shardedTests) {
                            addModuleToListFromSingleTest(
                                    currentList, moduleTest, moduleName, config);
                        }
                    }
                    continue;
                }
            }
            // test is not shardable or did not shard
            addModuleToListFromSingleTest(currentList, test, moduleName, config);
        }
    }
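Because ITestSuite.split creates one suite per ModuleDefinition, the number of ModuleDefinition objects produced here determines how many suite copies exist. The counting rule of the shardable branch above can be restated as a small Java sketch. ShardCountSketch and definitionsForTest are hypothetical names; the sketch only mirrors the branch structure shown in the listing, skips the "not shardable" branch, and the figure of 300 modules is an invented example value.

```java
/** Hypothetical restatement of how many ModuleDefinitions one test of a module yields. */
final class ShardCountSketch {

    /**
     * @param shardCountHint the value passed with --shard-count
     * @param dynamicModule whether dynamic modules are enabled
     * @param splitResultSize size of the collection returned by split(shardCount),
     *     or -1 when the test is not shardable or returned null
     */
    static int definitionsForTest(int shardCountHint, boolean dynamicModule, int splitResultSize) {
        // splitConfiguration doubles the shard count for dynamic modules.
        int shardCount = dynamicModule ? shardCountHint * 2 : shardCountHint;
        if (splitResultSize < 0) {
            return 1; // not shardable or did not shard: a single ModuleDefinition
        }
        // Dynamic: shardCount definitions sharing one pool. Otherwise one per sharded test.
        return dynamicModule ? shardCount : splitResultSize;
    }

    public static void main(String[] args) {
        int perTest = definitionsForTest(2, true, 4);
        int modules = 300; // invented example: modules left to retry, one shardable test each
        System.out.println(perTest + " definitions per test, "
                + (perTest * modules) + " suite copies in total");
    }
}
```

With --shard-count 2 and dynamic modules enabled, every shardable test that splits contributes four ModuleDefinition objects, so the suite count grows with the number of modules left to retry rather than with the number of devices.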

Once the ModuleDefinition list has been created, the split continues based on it:

        for (ModuleDefinition m : splitModules) {
            ITestSuite suite = createInstance();
            // Note: the options of the CompatibilityTestSuite created earlier are copied here.
            OptionCopier.copyOptionsNoThrow(this, suite);
            suite.mIsSharded = true;
            // The new suite gets the ModuleDefinition just created as its mDirectModule.
            suite.mDirectModule = m;
            splitTests.add(suite); // the list of CompatibilityTestSuite instances
        }

Here splitTests is the list of CompatibilityTestSuite instances that the hprof shows as the cause of the failure.
tools/tradefederation/core/src/com/android/tradefed/config/OptionCopier.java

    /**
     * Identical to {@link #copyOptions(Object, Object)} but will log instead of throw if exception
     * occurs.
     */
    public static void copyOptionsNoThrow(Object source, Object dest) {
        try {
            copyOptions(source, dest);
        } catch (ConfigurationException e) {
            CLog.e(e);
        }
    }

    /**
     * Copy the values from {@link Option} fields in origObject to destObject
     *
     * @param origObject the {@link Object} to copy from
     * @param destObject the {@link Object} to copy to
     * @throws ConfigurationException if options failed to copy
     */
    public static void copyOptions(Object origObject, Object destObject)
            throws ConfigurationException {
        Collection<Field> origFields = OptionSetter.getOptionFieldsForClass(origObject.getClass());
        Map<String, Field> destFieldMap = getFieldOptionMap(destObject);
        for (Field origField : origFields) {
            final Option option = origField.getAnnotation(Option.class);
            Field destField = destFieldMap.remove(option.name());
            if (destField != null) {
                Object origValue = OptionSetter.getFieldValue(origField,
                        origObject);
                OptionSetter.setFieldValue(option.name(), destObject, destField, origValue);
            }
        }
    }

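In the end a large number of CompatibilityTestSuite instances are copied (when many modules need to be retried), and each of them holds a large number of exclude entries (350,000). Together these produce the error in the log.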
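The amplification can be illustrated with a minimal Java sketch. CopyAmplificationSketch, Suite and split are hypothetical names, and the copy is modeled as an addAll into each new instance's own HashSet, which matches the HashSet.add / AbstractCollection.addAll frames in the stack trace but is otherwise a simplification of OptionSetter.setFieldValue. The demo uses small numbers so that it runs comfortably; the figure of 100 suites is an invented example value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of copying a collection option into every split suite. */
final class CopyAmplificationSketch {

    /** Stand-in for a suite that owns its exclude filter set. */
    static final class Suite {
        final Set<String> excludeFilters = new HashSet<>();
    }

    /** Creates suiteCount suites, each with its own copy of the source excludes. */
    static List<Suite> split(Suite source, int suiteCount) {
        List<Suite> suites = new ArrayList<>();
        for (int i = 0; i < suiteCount; i++) {
            Suite copy = new Suite();
            // The strings are shared, but every set allocates its own hash nodes.
            copy.excludeFilters.addAll(source.excludeFilters);
            suites.add(copy);
        }
        return suites;
    }

    /** Total number of set entries held across all suites. */
    static long totalEntries(List<Suite> suites) {
        long total = 0;
        for (Suite suite : suites) {
            total += suite.excludeFilters.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Suite source = new Suite();
        for (int i = 0; i < 1_000; i++) {
            source.excludeFilters.add("CtsDeqpTestCases test" + i);
        }
        List<Suite> suites = split(source, 50);
        System.out.println("entries held: " + totalEntries(suites));
        // Same arithmetic at the scale of the report (suite count is an invented example).
        System.out.println("350,000 excludes x 100 suites = " + (350_000L * 100) + " entries");
    }
}
```

The entry count is the product of the suite count and the exclude count, so either factor alone is harmless; the failure needs both a huge unfinished module and many modules left to retry.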

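Summary

  1. When the very large CtsDeqpTestCases module is interrupted just before it finishes (for example by an adb disconnect), the module is left with done = false. On retry, a large number of exclude entries are therefore recorded in the CompatibilityTestSuite.
  2. When retrying on multiple devices, the CompatibilityTestSuite is copied, which amplifies the problem further and leads to the failure.
  3. Temporary workaround: run the CtsDeqpTestCases module separately. Even if it is interrupted again, retrying a report that contains only CtsDeqpTestCases does not trigger the copying. As far as we can tell, the problem does not recur as long as CtsDeqpTestCases is tested on its own, and Google permits this.
  4. We suggest that Google change the CTS framework, for example by removing exclude entries that a retry does not need, or by sharing a single exclude list (singleton style) when CompatibilityTestSuite is copied. This is best fixed by Google, who know this logic better and have a dedicated team iterating on it.
  5. The first patch we provided to Google is only one possible approach and not a very good one; we still recommend that Google fix this problem.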
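The first idea in item 4, dropping exclude entries that a given suite does not need, can be sketched as follows. This is only an illustration of the suggestion, not the patch that was sent to Google and not how the framework handles it. ExcludePruningSketch and excludesFor are hypothetical names, and filters are again assumed to be plain "module [test]" strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: keep only the excludes that concern one module. */
final class ExcludePruningSketch {

    /** Returns the subset of filters that target the given module or one of its tests. */
    static Set<String> excludesFor(String module, Set<String> allExcludes) {
        Set<String> kept = new HashSet<>();
        String prefix = module + " ";
        for (String filter : allExcludes) {
            // Match the module itself or "module test"; a bare prefix match would also
            // catch other modules whose names merely start with the same characters.
            if (filter.equals(module) || filter.startsWith(prefix)) {
                kept.add(filter);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList(
                "CtsDeqpTestCases dEQP-EGL.info#version",
                "CtsDeqpTestCases dEQP-EGL.info#vendor",
                "CtsAppTestCases android.app.cts.ActivityTest#testFoo"));
        System.out.println(excludesFor("CtsAppTestCases", all).size() + " exclude(s) kept");
    }
}
```

With pruning of this kind, only the suites that actually run CtsDeqpTestCases would carry its excludes, and every other copy would stay small. The second idea in item 4, one shared read-only exclude list, would avoid the per-copy cost entirely but requires that no suite modifies the list after the split.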
