Hello, CFers!
I need to access a remote page using cfhttp. To be more specific, I need to access this page: https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx?numero=861738&ano=2013 (Sorry, Brazilian government sites tend to work properly only in Internet Explorer.)
This is a Brazilian government site that supervises mining areas all over the country. Each mining area has its own "numero" (number) and "ano" (the year it was registered at the National Department of Mineral Production, the DNPM agency). As you can see if you open the link to the DNPM website, there's a "Poligonal" tab at the top of the page. Clicking on it triggers a function, probably generated by ASP.NET, that produces a PNG image showing the area and some other processes near it. Example of the link above:
So here is the problem: can I get the data from the "Poligonal" tab using CFHTTP, given that the Poligonal page doesn't have a direct link? I tried everything I could think of to get this image and I checked the source code, but I couldn't find a way to solve my problem. That's why I came here to see if somebody can help me with this.
Sorry for my bad English. I hope you guys understand me.
Thank you in advance for your patience and attention.
CFHTTP can only request a single URL; it cannot "interact" with pages or access tabbed content, unless that tabbed content has a unique URL that generates the data. In your case the tabs both call the same URL but they POST different data. If you examine what data they post, you will be able to request just the content that the second tab generates. I'd hazard a guess that each tab does a form submit and the server then replies with different content depending on the data posted to the ASP.NET page. If you install a proxy on your PC, like the free Burp proxy, you can intercept and inspect all the HTTP requests. Good luck.
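A minimal sketch of that idea, assuming the page is an ordinary ASP.NET WebForms form. Such pages usually post back to their own URL with hidden fields like `__VIEWSTATE` and `__EVENTVALIDATION`, plus a field that identifies the clicked control. The helper name `postTab`, the control name `btnPoligonal`, and the variable `viewStateFromFirstGet` are hypothetical; the real field names and values have to come from the request you intercept in the proxy and from the HTML of a prior GET.

```cfml
<cffunction name="postTab" access="public" returntype="struct" output="false"
    hint="POSTs the given form fields to a URL and returns the cfhttp result struct.">
    <cfargument name="targetUrl" type="string" required="true" />
    <cfargument name="fields" type="struct" required="true" hint="Form field names mapped to their values." />
    <cfset var response = structNew() />
    <cfset var fieldName = "" />
    <cfhttp url="#arguments.targetUrl#" method="post" charset="utf-8" result="response" timeout="60">
        <!--- Send every entry of the struct as one form field --->
        <cfloop collection="#arguments.fields#" item="fieldName">
            <cfhttpparam type="formfield" name="#fieldName#" value="#arguments.fields[fieldName]#" />
        </cfloop>
    </cfhttp>
    <cfreturn response />
</cffunction>

<!--- Hypothetical usage: the field names and values below are placeholders, not taken from the real page
<cfset fields = structNew() />
<cfset fields["__VIEWSTATE"] = viewStateFromFirstGet />
<cfset fields["btnPoligonal"] = "Poligonal" />
<cfset tabResult = postTab("https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx?numero=861738&ano=2013", fields) />
--->
```

Bracket notation is used for the struct keys so that the field names keep their exact case when they are posted.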
That's how the page works, Tribule!
I'll try the Burp proxy to check what's being submitted and try to access the content that the "Poligonal" tab generates.
Thanks, man!
Just wondering, but have you tried contacting the developer of the site to see whether a publicly accessible web API that provides the functionality you're looking for is available? That could give you a direct path to a solution.
Aegis Kleais, I have tried contacting the developer, but I have had no answer from him so far.
I tried something and now I can access the content that the tab shows (with a little help from a function by Ben Nadel):
<!---
Function written by Ben Nadel
Details on how it works at http://www.bennadel.com/blog/779-Parsing-HTML-Tag-Data-Into-A-ColdFusion-Structure.htm
--->
<cffunction name="ParseHTMLTag" access="public" returntype="struct" output="false" hint="Parses the given HTML tag into a ColdFusion struct.">
<cfargument name="HTML" type="string" required="true" hint="The raw HTML for the tag."/>
<cfset var LOCAL = StructNew() />
<cfset LOCAL.Tag = StructNew() />
<cfset LOCAL.Tag.HTML = ARGUMENTS.HTML />
<cfset LOCAL.Tag.Name = "" />
<cfset LOCAL.Tag.Attributes = StructNew() />
<cfset LOCAL.NamePattern = CreateObject("java","java.util.regex.Pattern").Compile("^<(\w+)")/>
<cfset LOCAL.NameMatcher = LOCAL.NamePattern.Matcher(ARGUMENTS.HTML) />
<cfif LOCAL.NameMatcher.Find()>
<cfset LOCAL.Tag.Name = UCase(LOCAL.NameMatcher.Group( 1 )) />
</cfif>
<cfset LOCAL.AttributePattern = CreateObject("java","java.util.regex.Pattern").Compile("\s+(\w+)(?:\s*=\s*(""[^""]*""|[^\s>]*))?")/>
<cfset LOCAL.AttributeMatcher = LOCAL.AttributePattern.Matcher(ARGUMENTS.HTML) />
<cfloop condition="LOCAL.AttributeMatcher.Find()">
<cfset LOCAL.Name = LOCAL.AttributeMatcher.Group( 1 ) />
<cfset LOCAL.Tag.Attributes[ LOCAL.Name ] = "" />
<cfset LOCAL.Value = LOCAL.AttributeMatcher.Group( 2 ) />
<cfif StructKeyExists( LOCAL, "Value" )>
<cfset LOCAL.Value = LOCAL.Value.ReplaceAll("^""|""$","") />
<cfset LOCAL.Tag.Attributes[ LOCAL.Name ] = LOCAL.Value />
</cfif>
</cfloop>
<cfreturn LOCAL.Tag />
</cffunction>
<cfset urlDestino = "https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx?numero=861738&ano=2013"/>
<!--- First call, to obtain the headers and the hidden fields needed to continue the navigation; here you can make the result dynamic according to what you need --->
<cfhttp url="#urlDestino#" method="get" charset="utf-8" result="gResult" timeout="900"/>
<!---//GET THE REQUEST HEADERS //--->
<!--- capture the headers of the current request (the one the browser sent to this template) and list only the ones I want to pass on to the following pages --->
<cfset requestHeaders = getHttpRequestData().headers/>
<cfset rhList = 'accept,accept-encoding,accept-language,cookie,cache-control,connection,pragma,user-agent'/>
<!---//GET THE PAGE'S HIDDEN FIELDS //--->
<!--- regular expression that finds every input in the page's HTML;
it could be refined to match only the hidden ones --->
<cfset hiddenFields = reMatchNoCase("<input [^>]*>",gResult.fileContent)/>
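<!--- Using Ben's function, I extract the names and values of the inputs
that will be posted to the next page, excluding the ones that would divert from the expected result --->
<cfset formFields = []/>
<cfloop index="input" from="1" to="#arrayLen(hiddenFields)#">
<cfset inputResult = ParseHTMLTag(hiddenFields[input])/>
<!--- skip inputs that have no name, and the buttons that would trigger a different tab --->
<cfif structKeyExists(inputResult.ATTRIBUTES,'name')
AND NOT findNoCase('btnConsultarProcesso',inputResult.ATTRIBUTES.name)
AND NOT findNoCase('btnDadosBasicos',inputResult.ATTRIBUTES.name)>
<cfset field = { name = inputResult.ATTRIBUTES.name, value = "" }/>
<cfif structKeyExists(inputResult.ATTRIBUTES,'value')>
<cfset field.value = inputResult.ATTRIBUTES.value/>
</cfif>
<!--- append instead of assigning by index, so skipped inputs leave no empty array slots --->
<cfset arrayAppend(formFields,field)/>
</cfif>
</cfloop>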
<!---//NAVIGATE TO THE POLIGONAL PAGE //--->
<cfhttp url="#urlDestino#" method="post" charset="utf-8" result="fResult" timeout="900">
<!--- inject the form fields --->
<cfloop array="#formFields#" index="key">
<cfhttpparam type="formfield" name="#key.name#" value="#key.value#"/>
</cfloop>
<!--- inject the headers (header names are compared case-insensitively) --->
<cfloop collection="#requestHeaders#" item="key">
<cfif listFindNoCase(rhList,key)>
<cfhttpparam type="header" name="#key#" value="#requestHeaders[key]#"/>
</cfif>
</cfloop>
</cfhttp>
<!---//DISPLAY THE POLIGONAL PAGE //--->
<cfoutput>#fResult.fileContent#</cfoutput>
That's it.
And thanks for helping me!
Coming back here, again... Here's the thing:
The code I posted earlier works fine when urlDestino is hard-coded, as in
<cfset urlDestino = "https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx?numero=861738&ano=2013" />
But I need to look up a different numero and a different ano, and I can't, because the code gives me no way to set new values for them.
How can I set a new urlDestino like:
<cfset urlDestino = "https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx?numero=#newnumero#&ano=#newano#" /> ?
Using regular expressions to parse out strings is not a good solution long-term, since the developer may change the code on the server and your code will break. Those tabs show different content depending on what data is posted. If you post the correct data to the script the information you need should be returned. Did the proxy show you the form POST data?
The "861.738/2013" is the Número do processo and it's a form field. The year is the second part (after the /), and 861738 is the first part with the dot (.) removed, it seems. You'll have to add those values directly into the URL, or provide form fields that let the user select or enter them.
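One way to sketch that, assuming the process number always has the form "number/year" as in the example above. The function name `buildProcessUrl` and the error type `InvalidProcessNumber` are made up for illustration; the base URL is the one already used in this thread.

```cfml
<cffunction name="buildProcessUrl" access="public" returntype="string" output="false"
    hint="Builds the dadosProcesso.aspx URL from a process number such as 861.738/2013.">
    <cfargument name="processNumber" type="string" required="true" />
    <cfset var baseUrl = "https://sistemas.dnpm.gov.br/SCM/Extra/site/admin/dadosProcesso.aspx" />
    <cfset var numero = "" />
    <cfset var ano = "" />
    <!--- Expect exactly two parts separated by "/" --->
    <cfif listLen(arguments.processNumber, "/") NEQ 2>
        <cfthrow type="InvalidProcessNumber" message="Expected a value like 861.738/2013." />
    </cfif>
    <!--- Keep digits only, which drops the dot and any stray spaces --->
    <cfset numero = reReplace(listFirst(arguments.processNumber, "/"), "[^0-9]", "", "all") />
    <cfset ano = reReplace(listLast(arguments.processNumber, "/"), "[^0-9]", "", "all") />
    <cfif NOT len(numero) OR NOT len(ano)>
        <cfthrow type="InvalidProcessNumber" message="The number and the year must both contain digits." />
    </cfif>
    <cfreturn baseUrl & "?numero=" & numero & "&ano=" & ano />
</cffunction>
```

With this in place, `<cfset urlDestino = buildProcessUrl("861.738/2013") />` produces the same URL that is hard-coded in the earlier code, and the argument could just as well come from a form field or a URL parameter.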