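Qt WebKit Analysis (Part 3)
QWebView is analyzed in three stages: initialization (fetching the data), HTML parsing, and page display. The documentation that ships with Qt gives this relationship:
QWebView -> QWebPage => QWebFrame (one QWebPage contains multiple QWebFrames)
After choosing Open URL in the UI and entering a URL, the function that runs is void MainWindow::openUrl():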
void MainWindow::openUrl()
{
bool ok;
QString url = QInputDialog::getText(this, tr("Enter a URL"),
tr("URL:"), QLineEdit::Normal, "http://", &ok);
if (ok && !url.isEmpty()) {
centralWidget->webView->setUrl(url);
}
}
This calls QWebView::setUrl():
void QWebView::setUrl(const QUrl &url)
{
page()->mainFrame()->setUrl(url);
}
Here page() returns the QWebPage pointer, and QWebPage::mainFrame() returns a QWebFrame pointer.
So the call that actually runs is QWebFrame::setUrl():
void QWebFrame::setUrl(const QUrl &url)
{
d->frame->loader()->begin(ensureAbsoluteUrl(url));
d->frame->loader()->end();
load(ensureAbsoluteUrl(url));
}
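ensureAbsoluteUrl() makes sure the URL is absolute (a complete URL). A relative URL here means a web address entered without a scheme prefix such as http:// or https://. Start with the first statement, which involves an implicit conversion from QUrl to KURL.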
void FrameLoader::begin(const KURL& url, bool dispatch, SecurityOrigin* origin)
{
// We need to take a reference to the security origin because |clear|
// might destroy the document that owns it.
RefPtr<SecurityOrigin> forcedSecurityOrigin = origin;
bool resetScripting = !(m_isDisplayingInitialEmptyDocument && m_frame->document() && m_frame->document()->securityOrigin()->isSecureTransitionTo(url));
clear(resetScripting, resetScripting); // clear the data from the previous load to prepare for this one
if (resetScripting)
m_frame->script()->updatePlatformScriptObjects(); // an empty function on Windows
if (dispatch)
dispatchWindowObjectAvailable();
m_needsClear = true;
m_isComplete = false;
m_didCallImplicitClose = false;
m_isLoadingMainResource = true;
m_isDisplayingInitialEmptyDocument = m_creatingInitialEmptyDocument;
KURL ref(url);
ref.setUser(String());
ref.setPass(String());
ref.setRef(String());
m_outgoingReferrer = ref.string();
m_URL = url;
RefPtr<Document> document;
if (!m_isDisplayingInitialEmptyDocument && m_client->shouldUsePluginDocument(m_responseMIMEType))
document = PluginDocument::create(m_frame);
else
document = DOMImplementation::createDocument(m_responseMIMEType, m_frame, m_frame->inViewSourceMode()); // create the DOM document; the concrete class depends on m_responseMIMEType
// "text/html" creates an HTMLDocument; "application/xhtml+xml" creates a Document
// "application/x-ftp-directory" creates an FTPDirectoryDocument
// "text/vnd.wap.wml" maps to WMLDocument (wireless)
// "application/pdf" / "text/plain" map to PluginDocument
// if MediaPlayer::supportsType(type) is true, a MediaDocument is created
// "image/svg+xml" maps to SVGDocument
m_frame->setDocument(document);
document->setURL(m_URL);
if (m_decoder)
document->setDecoder(m_decoder.get());
if (forcedSecurityOrigin)
document->setSecurityOrigin(forcedSecurityOrigin.get());
m_frame->domWindow()->setURL(document->url());
m_frame->domWindow()->setSecurityOrigin(document->securityOrigin());
updatePolicyBaseURL(); // update the base URL used for policy decisions
Settings* settings = document->settings();
document->docLoader()->setAutoLoadImages(settings && settings->loadsImagesAutomatically());
if (m_documentLoader) {
String dnsPrefetchControl = m_documentLoader->response().httpHeaderField("X-DNS-Prefetch-Control");
if (!dnsPrefetchControl.isEmpty())
document->parseDNSPrefetchControlHeader(dnsPrefetchControl);
}
#if FRAME_LOADS_USER_STYLESHEET
KURL userStyleSheet = settings ? settings->userStyleSheetLocation() : KURL();
if (!userStyleSheet.isEmpty())
m_frame->setUserStyleSheetLocation(userStyleSheet);
#endif
restoreDocumentState();
document->implicitOpen();
if (m_frame->view())
m_frame->view()->setContentsSize(IntSize());
#if USE(LOW_BANDWIDTH_DISPLAY)
// Low bandwidth display is a first pass display without external resources
// used to give an instant visual feedback. We currently only enable it for
// HTML documents in the top frame.
if (document->isHTMLDocument() && !m_frame->tree()->parent() && m_useLowBandwidthDisplay) {
m_pendingSourceInLowBandwidthDisplay = String();
m_finishedParsingDuringLowBandwidthDisplay = false;
m_needToSwitchOutLowBandwidthDisplay = false;
document->setLowBandwidthDisplay(true);
}
#endif
}
Within begin(), look at the code behind document->implicitOpen():
void Document::implicitOpen()
{
cancelParsing();
clear();
m_tokenizer = createTokenizer();
setParsing(true);
}
Tokenizer *HTMLDocument::createTokenizer()
{
bool reportErrors = false;
if (frame())
if (Page* page = frame()->page())
reportErrors = page->inspectorController()->windowVisible();
return new HTMLTokenizer(this, reportErrors);
}
The newly created HTMLTokenizer object is the HTML parser.
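Back to the second statement of QWebFrame::setUrl(): d->frame->loader()->end();
It only stops any parse left unfinished from the previous load; the work happens in endIfNotLoadingMainResource():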
void FrameLoader::endIfNotLoadingMainResource()
{
if (m_isLoadingMainResource || !m_frame->page())
return;
// http://bugs.webkit.org/show_bug.cgi?id=10854
// The frame's last ref may be removed and it can be deleted by checkCompleted(),
// so we'll add a protective refcount
RefPtr<Frame> protector(m_frame);
// make sure nothing's left in there
if (m_frame->document()) {
write(0, 0, true);
m_frame->document()->finishParsing();
} else
// WebKit partially uses WebCore when loading non-HTML docs. In these cases doc==nil, but
// WebCore is enough involved that we need to checkCompleted() in order for m_bComplete to
// become true. An example is when a subframe is a pure text doc, and that subframe is the
// last one to complete.
checkCompleted();
}
Now the third statement of QWebFrame::setUrl(): load(ensureAbsoluteUrl(url));
void QWebFrame::load(const QUrl &url)
{
load(QNetworkRequest(ensureAbsoluteUrl(url)));
}
A new QNetworkRequest object is created, and then this overload is called:
void load(const QNetworkRequest &request,
QNetworkAccessManager::Operation operation = QNetworkAccessManager::GetOperation,
const QByteArray &body = QByteArray());
Its code:
void QWebFrame::load(const QNetworkRequest &req,
QNetworkAccessManager::Operation operation,
const QByteArray &body)
{
if (d->parentFrame())
d->page->d->insideOpenCall = true;
QUrl url = ensureAbsoluteUrl(req.url());
WebCore::ResourceRequest request(url);
switch (operation) {
case QNetworkAccessManager::HeadOperation:
request.setHTTPMethod("HEAD");
break;
case QNetworkAccessManager::GetOperation:
request.setHTTPMethod("GET");
break;
case QNetworkAccessManager::PutOperation:
request.setHTTPMethod("PUT");
break;
case QNetworkAccessManager::PostOperation:
request.setHTTPMethod("POST");
break;
case QNetworkAccessManager::UnknownOperation:
// eh?
break;
}
QList<QByteArray> httpHeaders = req.rawHeaderList();
for (int i = 0; i < httpHeaders.size(); ++i) {
const QByteArray &headerName = httpHeaders.at(i);
request.addHTTPHeaderField(QString::fromLatin1(headerName), QString::fromLatin1(req.rawHeader(headerName)));
}
if (!body.isEmpty())
request.setHTTPBody(WebCore::FormData::create(body.constData(), body.size()));
d->frame->loader()->load(request);
if (d->parentFrame())
d->page->d->insideOpenCall = false;
}
Now the key function, FrameLoader::load():
void FrameLoader::load(const ResourceRequest& request)
{
load(request, SubstituteData());
}
void FrameLoader::load(const ResourceRequest& request, const SubstituteData& substituteData)
{
if (m_inStopAllLoaders)
return;
// FIXME: is this the right place to reset loadType? Perhaps this should be done after loading is finished or aborted.
m_loadType = FrameLoadTypeStandard;
load(m_client->createDocumentLoader(request, substituteData).get());
}
In the code above, m_client is a FrameLoaderClientQt instance, and m_client->createDocumentLoader() creates a DocumentLoader object. Next, the code for FrameLoader::load(DocumentLoader*):
void FrameLoader::load(DocumentLoader* newDocumentLoader)
{
ResourceRequest& r = newDocumentLoader->request();
addExtraFieldsToMainResourceRequest(r);
FrameLoadType type;
if (shouldTreatURLAsSameAsCurrent(newDocumentLoader->originalRequest().url())) {
r.setCachePolicy(ReloadIgnoringCacheData);
type = FrameLoadTypeSame;
} else
type = FrameLoadTypeStandard;
if (m_documentLoader)
newDocumentLoader->setOverrideEncoding(m_documentLoader->overrideEncoding());
// When we loading alternate content for an unreachable URL that we're
// visiting in the history list, we treat it as a reload so the history list
// is appropriately maintained.
//
// FIXME: This seems like a dangerous overloading of the meaning of "FrameLoadTypeReload" ...
// shouldn't a more explicit type of reload be defined, that means roughly
// "load without affecting history" ?
if (shouldReloadToHandleUnreachableURL(newDocumentLoader)) {
ASSERT(type == FrameLoadTypeStandard);
type = FrameLoadTypeReload;
}
loadWithDocumentLoader(newDocumentLoader, type, 0);
}
Qt WebKit Analysis (Part 4)
Continuing from yesterday's analysis, look at the code for FrameLoader::loadWithDocumentLoader():
void FrameLoader::loadWithDocumentLoader(DocumentLoader* loader, FrameLoadType type, PassRefPtr<FormState> prpFormState)
{
ASSERT(m_client->hasWebView());
// Unfortunately the view must be non-nil, this is ultimately due
// to parser requiring a FrameView. We should fix this dependency.
ASSERT(m_frame->view());
m_policyLoadType = type;
RefPtr<FormState> formState = prpFormState;
bool isFormSubmission = formState;
const KURL& newURL = loader->request().url();
if (shouldScrollToAnchor(isFormSubmission, m_policyLoadType, newURL)) {
RefPtr<DocumentLoader> oldDocumentLoader = m_documentLoader;
NavigationAction action(newURL, m_policyLoadType, isFormSubmission);
oldDocumentLoader->setTriggeringAction(action);
stopPolicyCheck();
checkNavigationPolicy(loader->request(), oldDocumentLoader.get(), formState,
callContinueFragmentScrollAfterNavigationPolicy, this);
} else {
if (Frame* parent = m_frame->tree()->parent())
loader->setOverrideEncoding(parent->loader()->documentLoader()->overrideEncoding());
stopPolicyCheck();
setPolicyDocumentLoader(loader);
checkNavigationPolicy(loader->request(), loader, formState,
callContinueLoadAfterNavigationPolicy, this);
}
}
The checkNavigationPolicy() call is the key step. Its implementation:
void FrameLoader::checkNavigationPolicy(const ResourceRequest& request, DocumentLoader* loader,
PassRefPtr<FormState> formState, NavigationPolicyDecisionFunction function, void* argument)
{
NavigationAction action = loader->triggeringAction();
if (action.isEmpty()) {
action = NavigationAction(request.url(), NavigationTypeOther);
loader->setTriggeringAction(action);
}
// Don't ask more than once for the same request or if we are loading an empty URL.
// This avoids confusion on the part of the client.
if (equalIgnoringHeaderFields(request, loader->lastCheckedRequest()) || (!request.isNull() && request.url().isEmpty())) {
function(argument, request, 0, true);
loader->setLastCheckedRequest(request);
return;
}
// We are always willing to show alternate content for unreachable URLs;
// treat it like a reload so it maintains the right state for b/f list.
if (loader->substituteData().isValid() && !loader->substituteData().failingURL().isEmpty()) {
if (isBackForwardLoadType(m_policyLoadType))
m_policyLoadType = FrameLoadTypeReload;
function(argument, request, 0, true);
return;
}
loader->setLastCheckedRequest(request);
m_policyCheck.set(request, formState.get(), function, argument);
m_delegateIsDecidingNavigationPolicy = true;
m_client->dispatchDecidePolicyForNavigationAction(&FrameLoader::continueAfterNavigationPolicy,
action, request, formState);
m_delegateIsDecidingNavigationPolicy = false;
}
Here m_client points to a FrameLoaderClientQt instance:
void FrameLoaderClientQt::dispatchDecidePolicyForNavigationAction(FramePolicyFunction function, const WebCore::NavigationAction& action, const WebCore::ResourceRequest& request, PassRefPtr<WebCore::FormState>)
{
Q_ASSERT(!m_policyFunction);
Q_ASSERT(m_webFrame);
m_policyFunction = function;
#if QT_VERSION < 0x040400
QWebNetworkRequest r(request);
#else
QNetworkRequest r(request.toNetworkRequest());
#endif
QWebPage*page = m_webFrame->page();
if (!page->d->acceptNavigationRequest(m_webFrame, r, QWebPage::NavigationType(action.type()))) {
if (action.type() == NavigationTypeFormSubmitted || action.type() == NavigationTypeFormResubmitted)
m_frame->loader()->resetMultipleFormSubmissionProtection();
if (action.type() == NavigationTypeLinkClicked && r.url().hasFragment()) {
ResourceRequest emptyRequest;
m_frame->loader()->activeDocumentLoader()->setLastCheckedRequest(emptyRequest);
}
slotCallPolicyFunction(PolicyIgnore);
return;
}
slotCallPolicyFunction(PolicyUse);
}
void FrameLoaderClientQt::slotCallPolicyFunction(int action)
{
if (!m_frame || !m_policyFunction)
return;
FramePolicyFunction function = m_policyFunction;
m_policyFunction = 0;
(m_frame->loader()->*function)(WebCore::PolicyAction(action));
}
The callback goes through a member-function pointer to FrameLoader::continueAfterNavigationPolicy(PolicyAction policy), with PolicyUse as the argument:
void FrameLoader::continueAfterNavigationPolicy(PolicyAction policy)
{
PolicyCheck check = m_policyCheck;
m_policyCheck.clear();
bool shouldContinue = policy == PolicyUse;
switch (policy) {
case PolicyIgnore:
check.clearRequest();
break;
case PolicyDownload:
m_client->startDownload(check.request());
check.clearRequest();
break;
case PolicyUse: {
ResourceRequest request(check.request());
if (!m_client->canHandleRequest(request)) {
handleUnimplementablePolicy(m_client->cannotShowURLError(check.request()));
check.clearRequest();
shouldContinue = false;
}
break;
}
}
check.call(shouldContinue);
}
continueAfterNavigationPolicy() ends by calling PolicyCheck::call(), here with true as the argument:
void PolicyCheck::call(bool shouldContinue)
{
if (m_navigationFunction)
m_navigationFunction(m_argument, m_request, m_formState.get(), shouldContinue);
if (m_newWindowFunction)
m_newWindowFunction(m_argument, m_request, m_formState.get(), m_frameName, shouldContinue);
ASSERT(!m_contentFunction);
}
m_navigationFunction is again a function pointer; it points to FrameLoader::callContinueLoadAfterNavigationPolicy():
void FrameLoader::callContinueLoadAfterNavigationPolicy(void* argument,
const ResourceRequest& request, PassRefPtr<FormState> formState, bool shouldContinue)
{
FrameLoader* loader = static_cast<FrameLoader*>(argument);
loader->continueLoadAfterNavigationPolicy(request, formState, shouldContinue);
}
void FrameLoader::continueLoadAfterNavigationPolicy(const ResourceRequest&, PassRefPtr<FormState> formState, bool shouldContinue)
{
// If we loaded an alternate page to replace an unreachableURL, we'll get in here with a
// nil policyDataSource because loading the alternate page will have passed
// through this method already, nested; otherwise, policyDataSource should still be set.
ASSERT(m_policyDocumentLoader || !m_provisionalDocumentLoader->unreachableURL().isEmpty());
bool isTargetItem = m_provisionalHistoryItem ? m_provisionalHistoryItem->isTargetItem() : false;
// Two reasons we can't continue:
// 1) Navigation policy delegate said we can't so request is nil. A primary case of this
// is the user responding Cancel to the form repost nag sheet.
// 2) User responded Cancel to an alert popped up by the before unload event handler.
// The "before unload" event handler runs only for the main frame.
bool canContinue = shouldContinue && (!isLoadingMainFrame() || m_frame->shouldClose());
if (!canContinue) {
// If we were waiting for a quick redirect, but the policy delegate decided to ignore it, then we
// need to report that the client redirect was cancelled.
if (m_quickRedirectComing)
clientRedirectCancelledOrFinished(false);
setPolicyDocumentLoader(0);
// If the navigation request came from the back/forward menu, and we punt on it, we have the
// problem that we have optimistically moved the b/f cursor already, so move it back. For sanity,
// we only do this when punting a navigation for the target frame or top-level frame.
if ((isTargetItem || isLoadingMainFrame()) && isBackForwardLoadType(m_policyLoadType))
if (Page* page = m_frame->page()) {
Frame* mainFrame = page->mainFrame();
if (HistoryItem* resetItem = mainFrame->loader()->m_currentHistoryItem.get()) {
page->backForwardList()->goToItem(resetItem);
Settings* settings = m_frame->settings();
page->setGlobalHistoryItem((!settings || settings->privateBrowsingEnabled()) ? 0 : resetItem);
}
}
return;
}
FrameLoadType type = m_policyLoadType;
stopAllLoaders();
// - In certain circumstances on pages with multiple frames, stopAllLoaders()
// might detach the current FrameLoader, in which case we should bail on this newly defunct load.
if (!m_frame->page())
return;
setProvisionalDocumentLoader(m_policyDocumentLoader.get());
m_loadType = type;
setState(FrameStateProvisional);
setPolicyDocumentLoader(0);
if (isBackForwardLoadType(type) && loadProvisionalItemFromCachedPage())
return;
if (formState)
m_client->dispatchWillSubmitForm(&FrameLoader::continueLoadAfterWillSubmitForm, formState);
else
continueLoadAfterWillSubmitForm();
}
void FrameLoader::continueLoadAfterWillSubmitForm(PolicyAction)
{
if (!m_provisionalDocumentLoader)
return;
// DocumentLoader calls back to our prepareForLoadStart
m_provisionalDocumentLoader->prepareForLoadStart();
// The load might be cancelled inside of prepareForLoadStart(), nulling out the m_provisionalDocumentLoader,
// so we need to null check it again.
if (!m_provisionalDocumentLoader)
return;
// First check whether the active DocumentLoader can handle the load
DocumentLoader* activeDocLoader = activeDocumentLoader();
if (activeDocLoader && activeDocLoader->isLoadingMainResource())
return;
// Check whether the load can come from the cache
m_provisionalDocumentLoader->setLoadingFromCachedPage(false);
unsigned long identifier = 0;
if (Page* page = m_frame->page()) {
identifier = page->progress()->createUniqueIdentifier();
dispatchAssignIdentifierToInitialRequest(identifier, m_provisionalDocumentLoader.get(), m_provisionalDocumentLoader->originalRequest());
}
if (!m_provisionalDocumentLoader->startLoadingMainResource(identifier))
m_provisionalDocumentLoader->updateLoading();
}
In the load sequence above, on a first load where only m_provisionalDocumentLoader exists, only the last of these load paths runs:
bool DocumentLoader::startLoadingMainResource(unsigned long identifier)
{
ASSERT(!m_mainResourceLoader);
m_mainResourceLoader = MainResourceLoader::create(m_frame);
m_mainResourceLoader->setIdentifier(identifier);
// FIXME: Is there any way the extra fields could have not been added by now?
// If not, it would be great to remove this line of code.
frameLoader()->addExtraFieldsToMainResourceRequest(m_request);
if (!m_mainResourceLoader->load(m_request, m_substituteData)) {
// FIXME: If this should really be caught, we should just ASSERT this doesn't happen;
// should it be caught by other parts of WebKit or other parts of the app?
LOG_ERROR("could not create WebResourceHandle for URL %s -- should be caught by policy handler level", m_request.url().string().ascii().data());
m_mainResourceLoader = 0;
return false;
}
return true;
}
This creates a MainResourceLoader object and calls its load():
bool MainResourceLoader::load(const ResourceRequest& r, const SubstituteData& substituteData)
{
ASSERT(!m_handle);
m_substituteData = substituteData;
#if ENABLE(OFFLINE_WEB_APPLICATIONS)
// Check if this request should be loaded from the application cache
if (!m_substituteData.isValid() && frameLoader()->frame()->settings() && frameLoader()->frame()->settings()->offlineWebApplicationCacheEnabled()) {
ASSERT(!m_applicationCache);
m_applicationCache = ApplicationCacheGroup::cacheForMainRequest(r, m_documentLoader.get());
if (m_applicationCache) {
// Get the resource from the application cache. By definition, cacheForMainRequest() returns a cache that contains the resource.
ApplicationCacheResource* resource = m_applicationCache->resourceForRequest(r);
m_substituteData = SubstituteData(resource->data(),
resource->response().mimeType(),
resource->response().textEncodingName(), KURL());
}
}
#endif
ResourceRequest request(r);
bool defer = defersLoading();
if (defer) {
bool shouldLoadEmpty = shouldLoadAsEmptyDocument(r.url());
if (shouldLoadEmpty)
defer = false;
}
if (!defer) {
if (loadNow(request)) {
// Started as an empty document, but was redirected to something non-empty.
ASSERT(defersLoading());
defer = true;
}
}
if (defer)
m_initialRequest = request;
return true;
}
Going one level deeper, MainResourceLoader::loadNow():
bool MainResourceLoader::loadNow(ResourceRequest& r)
{
bool shouldLoadEmptyBeforeRedirect = shouldLoadAsEmptyDocument(r.url());
ASSERT(!m_handle);
ASSERT(shouldLoadEmptyBeforeRedirect || !defersLoading());
// Send this synthetic delegate callback since clients expect it, and
// we no longer send the callback from within NSURLConnection for
// initial requests.
willSendRequest(r, ResourceResponse());
//
// willSendRequest() is liable to make the call to frameLoader() return NULL, so we need to check that here
if (!frameLoader())
return false;
const KURL& url = r.url();
bool shouldLoadEmpty = shouldLoadAsEmptyDocument(url) && !m_substituteData.isValid();
if (shouldLoadEmptyBeforeRedirect && !shouldLoadEmpty && defersLoading())
return true;
if (m_substituteData.isValid())
handleDataLoadSoon(r);
else if (shouldLoadEmpty || frameLoader()->representationExistsForURLScheme(url.protocol()))
handleEmptyLoad(url, !shouldLoadEmpty);
else
m_handle = ResourceHandle::create(r, this, m_frame.get(), false, true, true);
return false;
}
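loadNow() makes two main calls: willSendRequest() and ResourceHandle::create(). The first presumably sets things up before the request is sent; the second actually sends it. Start with the first: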
void MainResourceLoader::willSendRequest(ResourceRequest& newRequest, const ResourceResponse& redirectResponse)
{
// Note that there are no asserts here as there are for the other callbacks. This is due to the
// fact that this "callback" is sent when starting every load, and the state of callback
// deferrals plays less of a part in this function in preventing the bad behavior deferring
// callbacks is meant to prevent.
ASSERT(!newRequest.isNull());
// The additional processing can do anything including possibly removing the last
// reference to this object; one example of this is 3266216.
RefPtr<MainResourceLoader> protect(this);
// Update cookie policy base URL as URL changes, except for subframes, which use the
// URL of the main frame which doesn't change when we redirect.
if (frameLoader()->isLoadingMainFrame())
newRequest.setMainDocumentURL(newRequest.url());
// If we're fielding a redirect in response to a POST, force a load from origin, since
// this is a common site technique to return to a page viewing some data that the POST
// just modified.
// Also, POST requests always load from origin, but this does not affect subresources.
if (newRequest.cachePolicy() == UseProtocolCachePolicy && isPostOrRedirectAfterPost(newRequest, redirectResponse))
newRequest.setCachePolicy(ReloadIgnoringCacheData);
ResourceLoader::willSendRequest(newRequest, redirectResponse);
// Don't set this on the first request. It is set when the main load was started.
m_documentLoader->setRequest(newRequest);
// FIXME: Ideally we'd stop the I/O until we hear back from the navigation policy delegate
// listener. But there's no way to do that in practice. So instead we cancel later if the
// listener tells us to. In practice that means the navigation policy needs to be decided
// synchronously for these redirect cases.
ref(); // balanced by deref in continueAfterNavigationPolicy
frameLoader()->checkNavigationPolicy(newRequest, callContinueAfterNavigationPolicy, this);
}
The main thing it does is call ResourceLoader::willSendRequest():
void ResourceLoader::willSendRequest(ResourceRequest& request, const ResourceResponse& redirectResponse)
{
// Protect this in this delegate method since the additional processing can do
// anything including possibly derefing this; one example of this is Radar 3266216.
RefPtr<ResourceLoader> protector(this);
ASSERT(!m_reachedTerminalState);
if (m_sendResourceLoadCallbacks) {
if (!m_identifier) {
m_identifier = m_frame->page()->progress()->createUniqueIdentifier();
frameLoader()->assignIdentifierToInitialRequest(m_identifier, request);
}
frameLoader()->willSendRequest(this, request, redirectResponse);
}
m_request = request;
}
That in turn calls FrameLoader::willSendRequest():
void FrameLoader::willSendRequest(ResourceLoader* loader, ResourceRequest& clientRequest, const ResourceResponse& redirectResponse)
{
applyUserAgent(clientRequest);
dispatchWillSendRequest(loader->documentLoader(), loader->identifier(), clientRequest, redirectResponse);
}
And more calls:
void FrameLoader::dispatchWillSendRequest(DocumentLoader* loader, unsigned long identifier, ResourceRequest& request, const ResourceResponse& redirectResponse)
{
StringImpl* oldRequestURL = request.url().string().impl();
m_documentLoader->didTellClientAboutLoad(request.url());
m_client->dispatchWillSendRequest(loader, identifier, request, redirectResponse);
// If the URL changed, then we want to put that new URL in the "did tell client" set too.
if (oldRequestURL != request.url().string().impl())
m_documentLoader->didTellClientAboutLoad(request.url());
if (Page* page = m_frame->page())
page->inspectorController()->willSendRequest(loader, identifier, request, redirectResponse);
}
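Ugh... is there yet another level?
m_client->dispatchWillSendRequest() actually resolves to FrameLoaderClientQt::dispatchWillSendRequest(), which is currently an empty function (it only prints information when dumping). That leaves InspectorController::willSendRequest():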
void InspectorController::willSendRequest(DocumentLoader*, unsigned long identifier, ResourceRequest& request, const ResourceResponse& redirectResponse)
{
if (!enabled())
return;
InspectorResource* resource = m_resources.get(identifier).get();
if (!resource)
return;
resource->startTime = currentTime();
if (!redirectResponse.isNull()) {
updateResourceRequest(resource, request);
updateResourceResponse(resource, redirectResponse);
}
if (resource != m_mainResource && windowVisible()) {
if (!resource->scriptObject)
addScriptResource(resource);
else
updateScriptResourceRequest(resource);
updateScriptResource(resource, resource->startTime, resource->responseReceivedTime, resource->endTime);
if (!redirectResponse.isNull())
updateScriptResourceResponse(resource);
}
}
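The start time is set here, presumably for request-timeout checks; where the timeout timer itself is set up still needs further analysis.
The rest is just bookkeeping on resource records, which does not seem important, so I will not trace it further. Back in MainResourceLoader::loadNow(), the next step is ResourceHandle::create():
PassRefPtr<ResourceHandle> ResourceHandle::create(const ResourceRequest& request, ResourceHandleClient* client,
Frame* frame, bool defersLoading, bool shouldContentSniff, bool mightDownloadFromHandle)
{
RefPtr<ResourceHandle> newHandle(adoptRef(new ResourceHandle(request, client, defersLoading, shouldContentSniff, mightDownloadFromHandle)));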
if (!request.url().isValid()) {
newHandle->scheduleFailure(InvalidURLFailure);
return newHandle.release();
}
// check whether the port number is allowed
if (!portAllowed(request)) {
newHandle->scheduleFailure(BlockedFailure);
return newHandle.release();
}
if (newHandle->start(frame))
return newHandle.release();
return 0;
}
Now the key call, ResourceHandle::start():
bool ResourceHandle::start(Frame* frame)
{
if (!frame)
return false;
Page *page = frame->page();
// If we are no longer attached to a Page, this must be an attempted load from an
// onUnload handler, so let's just block it.
if (!page)
return false;
getInternal()->m_frame = static_cast<FrameLoaderClientQt*>(frame->loader()->client())->webFrame();
#if QT_VERSION < 0x040400
return QWebNetworkManager::self()->add(this, getInternal()->m_frame->page()->d->networkInterface);
#else
ResourceHandleInternal *d = getInternal();
d->m_job = new QNetworkReplyHandler(this, QNetworkReplyHandler::LoadMode(d->m_defersLoading));
return true;
#endif
}
A new QNetworkReplyHandler object is created, and its constructor calls QNetworkReplyHandler::start():
void QNetworkReplyHandler::start()
{
m_shouldStart = false;
ResourceHandleInternal* d = m_resourceHandle->getInternal();
QNetworkAccessManager* manager = d->m_frame->page()->networkAccessManager();
const QUrl url = m_request.url();
const QString scheme = url.scheme();
// Post requests on files and data don't really make sense, but for
// fast/forms/form-post-urlencoded.html and for fast/forms/button-state-restore.html
// we still need to retrieve the file/data, which means we map it to a Get instead.
if (m_method == QNetworkAccessManager::PostOperation
&& (!url.toLocalFile().isEmpty() || url.scheme() == QLatin1String("data")))
m_method = QNetworkAccessManager::GetOperation;
m_startTime = QDateTime::currentDateTime().toTime_t();
switch (m_method) {
case QNetworkAccessManager::GetOperation:
m_reply = manager->get(m_request);
break;
case QNetworkAccessManager::PostOperation: {
FormDataIODevice* postDevice = new FormDataIODevice(d->m_request.httpBody());
m_reply = manager->post(m_request, postDevice);
postDevice->setParent(m_reply);
break;
}
case QNetworkAccessManager::HeadOperation:
m_reply = manager->head(m_request);
break;
case QNetworkAccessManager::PutOperation: {
FormDataIODevice* putDevice = new FormDataIODevice(d->m_request.httpBody());
m_reply = manager->put(m_request, putDevice);
putDevice->setParent(m_reply);
break;
}
case QNetworkAccessManager::UnknownOperation: {
m_reply = 0;
ResourceHandleClient* client = m_resourceHandle->client();
if (client) {
ResourceError error(url.host(), 400 /*bad request*/,
url.toString(),
QCoreApplication::translate("QWebPage", "Bad HTTP request"));
client->didFail(m_resourceHandle, error);
}
return;
}
}
m_reply->setParent(this);
connect(m_reply, SIGNAL(finished()),
this, SLOT(finish()), Qt::QueuedConnection);
// For http(s) we know that the headers are complete upon metaDataChanged() emission, so we
// can send the response as early as possible
if (scheme == QLatin1String("http") || scheme == QLatin1String("https"))
connect(m_reply, SIGNAL(metaDataChanged()),
this, SLOT(sendResponseIfNeeded()), Qt::QueuedConnection);
connect(m_reply, SIGNAL(readyRead()),
this, SLOT(forwardData()), Qt::QueuedConnection);
}
Here the familiar QNetworkAccessManager and QNetworkReply appear. With the trace this far, initialization and sending the URL request are essentially complete.
Qt WebKit Analysis (Part 5)
In the earlier analysis of WebView initialization, QNetworkReplyHandler::start() registered the handlers that read incoming data:
connect(m_reply, SIGNAL(finished()),
this, SLOT(finish()), Qt::QueuedConnection);
// For http(s) we know that the headers are complete upon metaDataChanged() emission, so we
// can send the response as early as possible
if (scheme == QLatin1String("http") || scheme == QLatin1String("https"))
connect(m_reply, SIGNAL(metaDataChanged()),
this, SLOT(sendResponseIfNeeded()), Qt::QueuedConnection);
connect(m_reply, SIGNAL(readyRead()),
this, SLOT(forwardData()), Qt::QueuedConnection);
Start with QNetworkReplyHandler::forwardData():
void QNetworkReplyHandler::forwardData()
{
m_shouldForwardData = (m_loadMode == LoadDeferred);
if (m_loadMode == LoadDeferred)
return;
sendResponseIfNeeded();
// don't emit the "Document has moved here" type of HTML
if (m_redirected)
return;
if (!m_resourceHandle)
return;
QByteArray data = m_reply->read(m_reply->bytesAvailable());
ResourceHandleClient* client = m_resourceHandle->client();
if (!client)
return;
if (!data.isEmpty())
client->didReceiveData(m_resourceHandle, data.constData(), data.length(), data.length() /*FixMe*/);
}
This boils down to two calls: read() and didReceiveData(). QNetworkReply::read() was analyzed earlier and is not repeated here.
The client->didReceiveData() call on the ResourceHandleClient* actually ends up in MainResourceLoader::didReceiveData():
void MainResourceLoader::didReceiveData(const char* data, int length, long long lengthReceived, bool allAtOnce)
{
ASSERT(data);
ASSERT(length != 0);
// There is a bug in CFNetwork where callbacks can be dispatched even when loads are deferred.
// See for more details.
#if !PLATFORM(CF)
ASSERT(!defersLoading());
#endif
// The additional processing can do anything including possibly removing the last
// reference to this object; one example of this is 3266216.
RefPtr<MainResourceLoader> protect(this);
ResourceLoader::didReceiveData(data, length, lengthReceived, allAtOnce);
}
Following the call further:
void ResourceLoader::didReceiveData(const char* data, int length, long long lengthReceived, bool allAtOnce)
{
// Protect this in this delegate method since the additional processing can do
// anything including possibly derefing this; one example of this is Radar 3266216.
RefPtr<ResourceLoader> protector(this);
addData(data, length, allAtOnce);
// FIXME: If we get a resource with more than 2B bytes, this code won't do the right thing.
// However, with today's computers and networking speeds, this won't happen in practice.
// Could be an issue with a giant local file.
if (m_sendResourceLoadCallbacks && m_frame)
frameLoader()->didReceiveData(this, data, length, static_cast<int>(lengthReceived));
}
addData() is a virtual function in ResourceLoader, and the object behind the client pointer in client->didReceiveData() is a MainResourceLoader, so the addData() call lands in MainResourceLoader::addData() first:
void MainResourceLoader::addData(const char* data, int length, bool allAtOnce)
{
ResourceLoader::addData(data, length, allAtOnce);
frameLoader()->receivedData(data, length);
}
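MainResourceLoader::addData() makes just two calls. The first stores the received data in a buffer, presumably for the later parsing pass (this is a guess), and is not analyzed further here. Now look at frameLoader()->receivedData():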
void FrameLoader::receivedData(const char* data, int length)
{
activeDocumentLoader()->receivedData(data, length);
}
void DocumentLoader::receivedData(const char* data, int length)
{
m_gotFirstByte = true;
if (doesProgressiveLoad(m_response.mimeType()))
commitLoad(data, length);
}
doesProgressiveLoad() tests the MIME type; the important part is commitLoad():
void DocumentLoader::commitLoad(const char* data, int length)
{
// Both unloading the old page and parsing the new page may execute JavaScript which destroys the datasource
// by starting a new load, so retain temporarily.
RefPtr<DocumentLoader> protect(this);
commitIfReady();
if (FrameLoader* frameLoader = DocumentLoader::frameLoader())
frameLoader->committedLoad(this, data, length);
}
The first call, commitIfReady(), clears the intermediate data left over from scanning the previous page; committedLoad() is where the real work happens.
void FrameLoader::committedLoad(DocumentLoader* loader, const char* data, int length)
{
if (ArchiveFactory::isArchiveMimeType(loader->response().mimeType()))
return;
m_client->committedLoad(loader, data, length);
}
Here m_client points to a FrameLoaderClientQt object.
void FrameLoaderClientQt::committedLoad(WebCore::DocumentLoader* loader, const char* data, int length)
{
if (!m_pluginView) {
if (!m_frame)
return;
FrameLoader *fl = loader->frameLoader();
if (m_firstData) {
fl->setEncoding(m_response.textEncodingName(), false);
m_firstData = false;
}
fl->addData(data, length);
}
// We re-check here as the plugin can have been created
if (m_pluginView) {
if (!m_hasSentResponseToPlugin) {
m_pluginView->didReceiveResponse(loader->response());
// didReceiveResponse sets up a new stream to the plug-in. on a full-page plug-in, a failure in
// setting up this stream can cause the main document load to be cancelled, setting m_pluginView
// to null
if (!m_pluginView)
return;
m_hasSentResponseToPlugin = true;
}
m_pluginView->didReceiveData(data, length);
}
}
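Here fl->setEncoding() sets the encoding reported in the server's response for the HTML data stream (for example gb2312 for Chinese), and also takes care of a few other things such as redirects. fl->addData() is the key call: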
void FrameLoader::addData(const char* bytes, int length)
{
ASSERT(m_workingURL.isEmpty());
ASSERT(m_frame->document());
ASSERT(m_frame->document()->parsing());
write(bytes, length);
}
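The FrameLoader::write() call above starts the HTML/JS scanning. The next part looks at HTML scanning in depth.
Qt WebKit Analysis (6)
Before continuing with FrameLoader::write(), go back to "Qt WebKit Analysis (2)", which recorded a complete call stack: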
……
QtWebKitd4.dll!WebCore::HTMLTokenizer::write(const WebCore::SegmentedString & str={...}, bool appendData=true) Line 1730 + 0x23 bytes C++
QtWebKitd4.dll!WebCore::FrameLoader::write(const char *
This shows the call order: FrameLoader::write() calls HTMLTokenizer::write().
Here is the declaration of FrameLoader::write():
void write(const char* str, int len = -1, bool flush = false);
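The declaration gives two parameters default values. In the previous part the call was written as write(bytes, length);
so what is actually passed is write(bytes, length, false);
Next, the implementation of write():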
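To make the default-argument behaviour concrete, here is a small self-contained sketch with the same parameter shape as the declaration quoted above. The sketch namespace, the WriteCall struct and the stub body are hypothetical; only the signature and the len == -1 convention are taken from the WebKit code.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Records what the callee actually received, for illustration only.
struct WriteCall {
    int len;
    bool flush;
};

// Same parameter shape as FrameLoader::write(); the body is a stub.
inline WriteCall write(const char* str, int len = -1, bool flush = false) {
    if (len == -1)
        len = static_cast<int>(std::strlen(str)); // length omitted: measure the C string
    return WriteCall{len, flush};
}

} // namespace sketch
```

A two-argument call such as sketch::write("abc", 3) leaves flush at false, and a one-argument call falls back to strlen(), which matches the len == -1 branch in the real function.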
void FrameLoader::write(const char* str, int len, bool flush)
{
if (len == 0 && !flush)
return;
if (len == -1)
len = strlen(str);
Tokenizer* tokenizer = m_frame->document()->tokenizer();
if (tokenizer && tokenizer->wantsRawData()) {
if (len > 0)
tokenizer->writeRawData(str, len);
return;
}
if (!m_decoder) {
Settings* settings = m_frame->settings();
m_decoder = TextResourceDecoder::create(m_responseMIMEType, settings ? settings->defaultTextEncodingName() : String());
if (m_encoding.isEmpty()) {
Frame* parentFrame = m_frame->tree()->parent();
if (parentFrame && parentFrame->document()->securityOrigin()->canAccess(m_frame->document()->securityOrigin()))
m_decoder->setEncoding(parentFrame->document()->inputEncoding(), TextResourceDecoder::DefaultEncoding);
} else {
m_decoder->setEncoding(m_encoding,
m_encodingWasChosenByUser ? TextResourceDecoder::UserChosenEncoding : TextResourceDecoder::EncodingFromHTTPHeader);
}
m_frame->document()->setDecoder(m_decoder.get());
}
String decoded = m_decoder->decode(str, len);
if (flush)
decoded += m_decoder->flush();
if (decoded.isEmpty())
return;
#if USE(LOW_BANDWIDTH_DISPLAY)
if (m_frame->document()->inLowBandwidthDisplay())
m_pendingSourceInLowBandwidthDisplay.append(decoded);
#endif
if (!m_receivedData) {
m_receivedData = true;
if (m_decoder->encoding().usesVisualOrdering())
m_frame->document()->setVisuallyOrdered();
m_frame->document()->recalcStyle(Node::Force);
}
if (tokenizer) {
ASSERT(!tokenizer->wantsRawData());
tokenizer->write(decoded, true);
}
}
How does the document get associated with an HTMLTokenizer? The association is made in "Qt WebKit Analysis (3)", when the Document object is initialized:
DOMImplementation::createDocument()
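The FrameLoader::write() code above does some peripheral work, such as setting the encoding (which can be specified in the HTTP headers, in the HTML head, or explicitly in the browser). Its two main jobs are creating a decoder and calling tokenizer->write().
Qt WebKit Analysis (7)
Continuing from the previous part, first look at m_decoder->decode(str, len);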
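The encoding selection inside FrameLoader::write() follows a fixed order of precedence: an explicit encoding wins (tagged as user-chosen or as coming from the HTTP header), otherwise an accessible parent frame's encoding is inherited, otherwise the default from Settings stays in effect. The sketch below restates that order with plain values. chooseEncoding(), EncodingChoice and the FromParentFrame tag are hypothetical; in the real code the inherited encoding is tagged DefaultEncoding, and the extra check that the parent encoding is non-empty is my own assumption.

```cpp
#include <cassert>
#include <string>

enum class EncodingSource { Default, FromParentFrame, FromHTTPHeader, UserChosen };

struct EncodingChoice {
    std::string name;
    EncodingSource source;
};

// Inputs are plain values here; in FrameLoader::write() they come from
// m_encoding, m_encodingWasChosenByUser, the parent frame and Settings.
inline EncodingChoice chooseEncoding(const std::string& explicitName,
                                     bool chosenByUser,
                                     const std::string& parentFrameEncoding,
                                     bool parentIsAccessible,
                                     const std::string& settingsDefault) {
    // 1. An explicit encoding always wins; only its provenance differs.
    if (!explicitName.empty())
        return {explicitName, chosenByUser ? EncodingSource::UserChosen
                                           : EncodingSource::FromHTTPHeader};
    // 2. Otherwise inherit from the parent frame if security rules allow it.
    if (parentIsAccessible && !parentFrameEncoding.empty())
        return {parentFrameEncoding, EncodingSource::FromParentFrame};
    // 3. Otherwise keep the default the decoder was created with.
    return {settingsDefault, EncodingSource::Default};
}
```

Keeping the provenance next to the name matters because later stages (for example the Japanese auto-detection in decode()) consult the source before deciding whether they may override the encoding.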
String TextResourceDecoder::decode(const char* data, size_t len)
{
if (!m_checkedForBOM)
checkForBOM(data, len); // check for a Unicode byte-order mark
bool movedDataToBuffer = false;
if (m_contentType == CSS && !m_checkedForCSSCharset)
if (!checkForCSSCharset(data, len, movedDataToBuffer)) // for CSS, check the charset declared in the stylesheet
return "";
if ((m_contentType == HTML || m_contentType == XML) && !m_checkedForHeadCharset) // HTML and XML
if (!checkForHeadCharset(data, len, movedDataToBuffer)) // check the charset declared in the HTML/XML head
return "";
// Do the auto-detect if our default encoding is one of the Japanese ones.
// FIXME: It seems wrong to change our encoding downstream after we have already done some decoding.
if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding && encoding().isJapanese())
detectJapaneseEncoding(data, len); // detect Japanese encodings (why is there no equivalent check for Chinese?)
ASSERT(encoding().isValid());
if (m_buffer.isEmpty())
return m_decoder.decode(data, len, false, m_contentType == XML, m_sawError);
if (!movedDataToBuffer) {
size_t oldSize = m_buffer.size();
m_buffer.grow(oldSize + len);
memcpy(m_buffer.data() + oldSize, data, len);
}
String result = m_decoder.decode(m_buffer.data(), m_buffer.size(), false, m_contentType == XML, m_sawError);
m_buffer.clear();
return result;
}
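The first step of decode(), the byte-order-mark check, is easy to show on its own. The function below is a minimal sketch and not WebKit's checkForBOM(): encodingFromBOM() is a hypothetical name, it returns the encoding as a string instead of updating decoder state, and it recognizes only the UTF-8 and UTF-16 marks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the encoding implied by a leading byte-order mark, or an empty
// string when the data does not start with one of the three marks handled here.
inline std::string encodingFromBOM(const char* data, std::size_t len) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";
    return std::string();
}
```

Note that a real decoder also has to cope with a mark that is split across two network chunks, which is one reason decode() keeps a buffer; this sketch simply reports no BOM when the first chunk is too short.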
Now back to tokenizer->write(decoded, true); and its implementation:
bool HTMLTokenizer::write(const SegmentedString& str, bool appendData)
{
if (!m_buffer)
return false;
if (m_parserStopped)
return false;
SegmentedString source(str);
if (m_executingScript)
source.setExcludeLineNumbers();
if ((m_executingScript && appendData) || !m_pendingScripts.isEmpty()) {
// don't parse; we will do this later
if (m_currentPrependingSrc)
m_currentPrependingSrc->append(source);
else {
m_pendingSrc.append(source);
#if PRELOAD_SCANNER_ENABLED
if (m_preloadScanner && m_preloadScanner->inProgress() && appendData)
m_preloadScanner->write(source);
#endif
}
return false;
}
#if PRELOAD_SCANNER_ENABLED
if (m_preloadScanner && m_preloadScanner->inProgress() && appendData)
m_preloadScanner->end();
#endif
if (!m_src.isEmpty())
m_src.append(source);
else
setSrc(source);
// Once a timer is set, it has control of when the tokenizer continues.
if (m_timer.isActive())
return false;
bool wasInWrite = m_inWrite;
m_inWrite = true;
#ifdef INSTRUMENT_LAYOUT_SCHEDULING
if (!m_doc->ownerElement())
printf("Beginning write at time %d\n", m_doc->elapsedTime());
#endif
int processedCount = 0;
double startTime = currentTime();
Frame* frame = m_doc->frame();
State state = m_state;
while (!m_src.isEmpty() && (!frame || !frame->loader()->isScheduledLocationChangePending())) {
if (!continueProcessing(processedCount, startTime, state))
break;
// do we need to enlarge the buffer?
checkBuffer();
UChar cc = *m_src;
bool wasSkipLF = state.skipLF();
if (wasSkipLF)
state.setSkipLF(false);
if (wasSkipLF && (cc == '\n'))
m_src.advance();
else if (state.needsSpecialWriteHandling()) {
// it's important to keep needsSpecialWriteHandling with the flags this block tests
if (state.hasEntityState())
state = parseEntity(m_src, m_dest, state, m_cBufferPos, false, state.hasTagState());
else if (state.inPlainText())
state = parseText(m_src, state);
else if (state.inAnySpecial())
state = parseSpecial(m_src, state);
else if (state.inComment())
state = parseComment(m_src, state);
else if (state.inDoctype())
state = parseDoctype(m_src, state);
else if (state.inServer())
state = parseServer(m_src, state);
else if (state.inProcessingInstruction())
state = parseProcessingInstruction(m_src, state);
else if (state.hasTagState())
state = parseTag(m_src, state);
else if (state.startTag()) {
state.setStartTag(false);
switch(cc) {
case '/':
break;
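case '!': {
// <!-- comment --> or <!DOCTYPE ...>
searchCount = 1; // Look for '<!--' sequence to start comment or '<!DOCTYPE' sequence to start doctype
m_doctypeSearchCount = 1;
break;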
}
case '?': {
// xml processing instruction
state.setInProcessingInstruction(true);
tquote = NoQuote;
state = parseProcessingInstruction(m_src, state);
continue;
break;
}
case '%':
if (!m_brokenServer) {
// <% server stuff, handle as comment %>
state.setInServer(true);
tquote = NoQuote;
state = parseServer(m_src, state);
continue;
}
// else fall through
default: {
if( ((cc >= 'a') && (cc <= 'z')) || ((cc >= 'A') && (cc <= 'Z'))) {
// Start of a Start-Tag
} else {
// Invalid tag
// Add as is
*m_dest = '<';
m_dest++;
continue;
}
}
}; // end case
processToken();
m_cBufferPos = 0;
state.setTagState(TagName);
state = parseTag(m_src, state);
}
} else if (cc == '&' && !m_src.escaped()) {
m_src.advancePastNonNewline();
state = parseEntity(m_src, m_dest, state, m_cBufferPos, true, state.hasTagState());
} else if (cc == '<' && !m_src.escaped()) {
m_currentTagStartLineNumber = m_lineNumber;
m_src.advancePastNonNewline();
state.setStartTag(true);
state.setDiscardLF(false);
} else if (cc == '\n' || cc == '\r') {
if (state.discardLF())
// Ignore this LF
state.setDiscardLF(false); // We have discarded 1 LF
else {
// Process this LF
*m_dest++ = '\n';
if (cc == '\r' && !m_src.excludeLineNumbers())
m_lineNumber++;
}
/* Check for MS-DOS CRLF sequence */
if (cc == '\r')
state.setSkipLF(true);
m_src.advance(m_lineNumber);
} else {
state.setDiscardLF(false);
*m_dest++ = cc;
m_src.advancePastNonNewline();
}
}
#ifdef INSTRUMENT_LAYOUT_SCHEDULING
if (!m_doc->ownerElement())
printf("Ending write at time %d\n", m_doc->elapsedTime());
#endif
m_inWrite = wasInWrite;
m_state = state;
if (m_noMoreData && !m_inWrite && !state.loadingExtScript() && !m_executingScript && !m_timer.isActive()) {
end(); // this actually causes us to be deleted
return true;
}
return false;
}
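Because the argument decoded is a String, it is first implicitly converted to a SegmentedString at the call site. A SegmentedString can carry line numbers or not (this is configurable). The body of the while loop in the code above is the core of the tokenizer.
Qt WebKit Analysis (8)
While working through HTML parsing I came across a blog by a PhD whose analysis of WebKit's structure is very sharp; it is reposted below:
Deng Kan's blog
http://blog.sina.com.cn/s/blog_46d0a3930100d5pt.html
[20] The Structure and Deconstruction of WebKit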
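The while loop in HTMLTokenizer::write() is a character-driven state machine whose state survives between calls, so a token or a CRLF pair may be split across two chunks. The sketch below shows that shape in miniature. MiniTokenizer is a hypothetical class that only separates text runs from tags and normalizes CR, LF and CRLF to LF; it does not attempt entities, comments, scripts or any of the other states the real tokenizer handles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature tokenizer: emits text runs and tags only.
class MiniTokenizer {
public:
    std::vector<std::string> tokens;

    // May be called repeatedly with successive chunks, like HTMLTokenizer::write().
    void write(const std::string& chunk) {
        for (char cc : chunk) {
            if (m_skipLF) {
                m_skipLF = false;
                if (cc == '\n')
                    continue;          // second half of a CRLF pair
            }
            if (m_inTag) {
                if (cc == '>') {
                    tokens.push_back("<" + m_pending + ">");
                    m_pending.clear();
                    m_inTag = false;
                } else {
                    m_pending += cc;
                }
            } else if (cc == '<') {
                flushText();
                m_inTag = true;
            } else if (cc == '\r' || cc == '\n') {
                m_pending += '\n';     // normalize CR, LF and CRLF to LF
                m_skipLF = (cc == '\r');
            } else {
                m_pending += cc;
            }
        }
    }

    // Call once after the last chunk to emit any trailing text.
    void end() { flushText(); }

private:
    void flushText() {
        if (!m_pending.empty()) {
            tokens.push_back(m_pending);
            m_pending.clear();
        }
    }

    std::string m_pending;   // characters of the token being built
    bool m_inTag = false;    // state carried across write() calls
    bool m_skipLF = false;   // set after CR so a following LF is dropped
};
```

Feeding "<p>a\r" and then "\nb</p>" yields the same tokens as feeding the whole string at once, which is the property the saved m_state and the skipLF flag provide in the real code.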
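Going from a given HTML text file to a rendered web page with a complex layout, varied fonts, and multimedia content such as images, audio and video is a complicated process. Everything WebKit does along the way revolves around two core structures, the DOM Tree and the Rendering Tree. The previous chapter covered what each tree is for; this chapter uses a simple HTML file to show what the DOM Tree and Rendering Tree actually consist of, and dissects how WebKit builds them.
Figure 1. From HTML to webpage, and the underlying DOM tree and rendering tree.
Courtesy http://farm4.static.flickr.com/3351/3556972420_23a30366c2_o.jpg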
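1. The structure of the DOM Tree and the Rendering Tree
The upper left of Figure 1 shows a simple HTML text file, and the upper right shows the page drawn by the WebKit rendering engine. The page contains a title, "AI", one line of body text, "Ape's Intelligence", and a photo. The page is split into two layers, front and back: the title and body text are drawn on the front layer, and the photo sits on the back layer. Mr. L and I traced, step by step, the whole flow from parsing this HTML file to generating the DOM Tree and Rendering Tree, in order to understand what the two trees are made of and the individual steps that build them.
Start with the DOM Tree in the lower left of Figure 1. Essentially every tag in the HTML text file has a corresponding class in webkit/webcore/html. For example, the <html> tag corresponds to HTMLHtmlElement, the <head> tag corresponds to HTMLHeadElement,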