MediaWiki API result


{
    "batchcomplete": "",
    "continue": {
        "gapcontinue": "SemanticAnalyzer2",
        "continue": "gapcontinue||"
    },
    "query": {
        "pages": {
            "74": {
                "pageid": 74,
                "ns": 0,
                "title": "SHEnvironmentMap",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "'''NOTE :''' The snapshots displayed here are from my deferred rendering pipeline and use the [http://graphics.cs.uiuc.edu/~kircher/inferred/inferred_lighting_paper.pdf \"inferred lighting\" technique] that renders lights into a downscaled buffer. The upscale operation is still buggy and can show outlines, as in cartoon rendering, but these are in no way related to the technique described here. Hopefully, the problem will be fixed pretty soon... [[File:S1.gif]]\n\n== Incentive ==\n\n[[Image:SHEnvMapNoAmbient.png|300px|thumb|right|Your typical ugly scene with no ambient]]\n\nUsing [[Nuaj]] and [[Cirrus]] to create test projects is alright, but there came a time when I needed to start what I was put on this Earth to do : deferred HDR rendering.\nSo naturally I started writing a deferred rendering pipeline, which is already quite advanced. At some point, I needed a sky model so, naturally again, I turned to HDR rendering to visualize the result.\n\nWhen you start talking HDR, you immediately imply tone mapping. I implemented a version of the \"filmic curve\" tone mapping discussed by [http://filmicgames.com/archives/75 John Hable] from Naughty Dog (a more extensive and really interesting talk can be found here [http://www.gdcvault.com/play/1012459/Uncharted_2__HDR_Lighting]) (warning, it's about 50Mb !).\n\nBut to properly test your tone mapping, you need well-balanced lighting for your test scene : that means no hyper-dark patches in the middle of a hyper-bright scene, as is usually the case when you implement directional lighting by the Sun and... no ambient !\n\n\n== Let's put some ambience ==\n\nThat's when I decided to re-use my old \"ambient SH\" trick I wrote a few years ago. 
The idea was to pre-compute some SH for the environment at different places in 2D in the game map, and to evaluate the irradiance for each object depending on its position in the network, as shown in the figure below.\n\n[[File:SHEnvNetwork.png]]\n\n''The game map seen from above with the network of SH environment nodes''.\n\n\nThe algorithm was something like :\n For each object\n {\n  Find the 3 SH nodes the object stands in\n  ObjectSH = Interpolate SH at object's position\n  Render( Object, ObjectSH );\n }\n\nAnd the rendering was something like (in shader-like language) :\n float3  ObjectSH[9]; // These are the ObjectSH from the previous CPU algorithm and they change for every object\n \n float3   PixelShader() : COLOR\n {\n  float3 SurfaceNormal = RetrieveSurfaceNormal(); // From normal maps and stuff...\n  float3 Color = EstimateIrradiance( SurfaceNormal, ObjectSH ); // Evaluates the irradiance in the given direction\n }\n\nThe low frequency nature of irradiance allows us to store a really sparse network of nodes and to concentrate them where the irradiance is going to change rapidly, like near occluders or at shadow boundaries. The encoding of the environment in spherical harmonics is simply done by rendering the scene into small cube maps (6x64x64) using each texel's solid angle (the solid angle for a cube map texel can be found [http://people.cs.kuleuven.be/~philip.dutre/GI/TotalCompendium.pdf here]). More on that subject is discussed in the [http://wiki.patapom.com/index.php/SHEnvironmentMap#Pre-Computing_the_Samples last section].\n\nThis was a neat and cheap trick to add some nice directional ambient on my objects. 
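To make the precomputation above concrete, here is a minimal CPU-side sketch of the whole round trip : projecting a cube map into 9 SH coefficients weighted by each texel's solid angle, then estimating the irradiance in a given direction (what ''EstimateIrradiance'' does in the shader above). This is Python standing in for the original code, using the standard real SH basis and the Ramamoorthi-Hanrahan cosine convolution ; an illustrative sketch, not the actual pipeline code :

```python
import math

def sh_basis(x, y, z):
    # First 9 real SH basis functions (graphics convention), (x, y, z) unit length.
    return [0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)]

def texel_solid_angle(u0, v0, u1, v1):
    # Analytic solid angle of the face rectangle [u0,u1]x[v0,v1] seen from the
    # cube center (the formula from the Total Compendium linked above).
    def area(u, v):
        return math.atan2(u * v, math.sqrt(u * u + v * v + 1.0))
    return area(u1, v1) - area(u0, v1) - area(u1, v0) + area(u0, v0)

def face_dir(face, u, v):
    # Map a face index + face coordinates in [-1,1] to a unit direction.
    x, y, z = [(1.0, u, v), (-1.0, u, v), (u, 1.0, v),
               (u, -1.0, v), (u, v, 1.0), (u, v, -1.0)][face]
    n = math.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n

def project_cubemap(radiance, res=64):
    # Accumulate radiance * SH * solid angle over all 6 faces of a res x res
    # cube map ; 'radiance' is a callable standing in for a rendered cube map.
    coeffs = [0.0] * 9
    for face in range(6):
        for j in range(res):
            for i in range(res):
                u0, u1 = 2.0 * i / res - 1.0, 2.0 * (i + 1) / res - 1.0
                v0, v1 = 2.0 * j / res - 1.0, 2.0 * (j + 1) / res - 1.0
                d = face_dir(face, 0.5 * (u0 + u1), 0.5 * (v0 + v1))
                w = texel_solid_angle(u0, v0, u1, v1)
                r = radiance(*d)
                for k, b in enumerate(sh_basis(*d)):
                    coeffs[k] += r * b * w
    return coeffs

def estimate_irradiance(normal, coeffs):
    # Ramamoorthi-Hanrahan convolution : band weights pi, 2pi/3, pi/4 turn
    # projected radiance into irradiance in the normal direction.
    weights = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5
    return sum(a * c * b for a, c, b in zip(weights, coeffs, sh_basis(*normal)))
```

An easy sanity check : the texel solid angles must sum to 4 pi over the 6 faces, and projecting a constant environment of radiance 1 must give an irradiance of pi for any normal.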
You could also estimate the SH in a given direction to perform some \"glossy reflection\" or even some translucency using a vector that goes through the surface.\nThat was the end of those ugly normal maps that don't show in shadow !\n\nAnd all for very low memory/disk storage, as I stored only 9 RGBE-packed coefficients (=36 bytes) + a 3D position in the map (=12 bytes), for a total of 48 bytes per \"environment node\". The light field was rebuilt when the level was loaded and that was it.\n\nUnfortunately, the technique didn't allow changing the environment in real time, so I turned to a precomputed array of environment nodes : the network of environment nodes was rendered at different times of the day, and for different weather conditions (we had a whole skydome and weather system at the time). You then needed to interpolate the nodes from the different networks based on your current conditions, and use that interpolated network for your objects in the map.\n\nAnother obvious drawback of the method is that it only works in 2D. That was something I didn't care about at the time (and still don't), as a clever mind can always upgrade the algorithm to handle several layers of environments stacked vertically and interpolate between them...\n\n\n== Upgrade ==\n\nFor my deferred rendering though, I really wanted something dynamic and above all, something I would render in screen space as any other light in the deferred lighting stage. I had omnis, spots and directionals, so why not a fullscreen ambient pass ?\n\nMy original idea was neat but I had to compute the SH for every object, and to interpolate them manually. 
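As an aside, the RGBE packing mentioned above can be sketched as follows. This is the classic shared-exponent scheme (3 mantissa bytes + 1 exponent byte) ; the exact variant used at the time isn't shown here, and note that plain RGBE only stores non-negative values, so negative SH coefficients need a bias or a sign convention on top of it :

```python
import math

def rgbe_pack(r, g, b):
    # Classic shared-exponent RGBE : 3 mantissa bytes + 1 exponent byte.
    m = max(r, g, b)
    if m < 1e-32:
        return (0, 0, 0, 0)
    e = math.frexp(m)[1]                 # m = f * 2**e with f in [0.5, 1)
    scale = 256.0 / math.ldexp(1.0, e)   # mantissas land in [0, 255]
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_unpack(r8, g8, b8, e8):
    if e8 == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e8 - 128 - 8)    # undo the 256x mantissa scale
    return (r8 * f, g8 * f, b8 * f)
```

The relative error is bounded by the 8-bit mantissa (about 1/128), which is plenty for low-frequency ambient coefficients, and 9 such packed values are indeed the 36 bytes quoted above.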
I didn't like the idea of making the objects dependent on lighting again, which would defeat the purpose of deferred lighting.\n\n=== SH Environment Map ===\n\nMy first idea was to render the environment mesh into a texture viewed from above and let the graphics card interpolate the SH nodes by itself (and it's quite good at that, I heard).\n\nI decided to create a vertex format holding a 3D position and 9 SH coefficients, and to triangulate my SH environment nodes into a mesh that I would \"somehow\" render in a texture. I needed to render the pixels the camera can see, so the only portion of the SH environment mesh I needed was some quad that bounded the 2D projection of the camera frustum, as seen in the figure below.\n\n[[File:SHEnvFrustumQuad.png]]\n\n[[File:SHEnvDelaunay.png|300px|thumb|right|The Delaunay triangulation of the environment nodes network, rendered into a 256x256 texture attached to the camera frustum]]\n\nAgain, due to the low frequency of the irradiance variation, it's not necessary to render into a texture larger than 256x256.\n\nI also use a smaller frustum for the environment map rendering than the actual camera frustum, to concentrate on objects close to the viewer. Another option would be to \"2D project\" vertices in 1/DistanceToCamera as for conventional 3D objects so we maximize resolution for pixels that are closer to the camera, but I haven't found the need yet (anyway, I haven't tested the technique on large models either so maybe it will come in handy sooner rather than later !).\n\n=== What do we render in the SH env map ? ===\n\nWe have the power of the vertex shader to process SH Nodes (which contain a position and 9 SH coefficients, as you remember). 
That's the ideal time to process the SH in some way that allows us to make the environment fully dynamic.\n\nI decided to encode 2 kinds of information in each SH vertex (that are really ''float4'' as you can see below) :\n* The indirect diffuse lighting in XYZ\n* The direct light occlusion in W\n\n\nBut direct lighting is harsh, it's very high frequency and creates hard shadows. You should <u>never</u> encode direct lighting in SH, unless you have many SH bands and we only have 3 here (i.e. 9 coefficients).\nThat's why only the direct '''sky light''' will be used as direct light source, because it's smooth and varies slowly.\n\n\nThe indirect diffuse lighting, on the other hand, varies slowly, even if lit by a really sharp light like the Sun. That's because it's a diffuse reflection of the Sun, and diffuse reflections are smooth.\n\n\nWe also provide the shader that renders the env map with 9 global SH coefficients, each being a ''float4'' :\n* The Sky light in XYZ\n* The monochromatic Sun light in W that will be encoded as a cone SH (using only luminance is wrong as the Sun takes a reddish tint at sunset, but it's quite okay for indirect lighting which is a subtle effect)\n\n\nWhat we are going to do is basically something like this (sorry for the HLSL-like vertex shader code) :\n\n float4   SHLight[9];   // 9 Global SH Coefficients (XYZ=Sky W=Sun)\n \n float3[]  VertexShader( float4 SHVertex[9] )  // 9 SH Coefficients per vertex (XYZ=IndirectLighting W=DirectOcclusion)\n {\n   // First, we compute occluded sky light\n   float3   SkyLight[9] = { SHLight[0].xyz, SHLight[1].xyz, SHLight[2].xyz, (...) };\n   float    Occlusion[9] = { SHVertex[0].w, SHVertex[1].w, SHVertex[2].w, (...) };\n   float3   OccludedSkyLight = Product( SkyLight, Occlusion );\n \n   // Second, we compute indirect sun light\n   float3   IndirectReflection[9] = { SHVertex[0].xyz, SHVertex[1].xyz, SHVertex[2].xyz, (...) 
};\n   float    SunLight[9] = { SHLight[0].w, SHLight[1].w, SHLight[2].w, (...) };\n   float3   IndirectLight = Product( IndirectReflection, SunLight );\n \n   // Finally, we return the sum of both\n   float3   Result[9] = { OccludedSkyLight[0] + IndirectLight[0], OccludedSkyLight[1] + IndirectLight[1], (...) };\n   return Result;\n }\n\nSee how the W and XYZ components are intertwined ? Each pair is multiplied together using the SH triple product. A good implementation can be found in [http://research.microsoft.com/en-us/um/people/johnsny/papers/shtriple_fixed.pdf Snyder's Paper] that needs \"only\" 120 multiplications and 74 additions (twice that as we're doing 2 products).\n\nThat may seem like a lot, but don't forget we're only doing this on the vertices of a very sparse environment mesh. The results are later interpolated by the card and each pixel is written \"as is\" (no further processing is needed in the pixel shader).\n\nBasically, the process can be viewed like this :\n\n[[File:SHEnvMapCompositing.png|800px]]\n\n\nAnyway, this is not as easy as it looks : as you may have noticed, we're returning 9 float3 coefficients. You have 2 options here :\n* Write each coefficient in a slice of a 3D texture using a geometry shader and the ''SV_RenderTargetArrayIndex'' semantic (that's what I did)\n* Render into multiple render targets, each one receiving a different coefficient\n\n\nNo matter what you choose though, you'll need at most 7 targets/3D slices since you're writing 9*3=27 SH components that can be packed in 7 RGBA textures (as there is room for 7*4=28 coefficients).\n\n\n=== Using the table ===\n\nSo, we obtained a nice 3D texture of 256x256x7 filled with SH coefficients. 
What do we do with it ?\n\nThis part is really simple.\nWe render a screen quad and for every pixel in <u>screen space</u> :\n* Retrieve the world position and normal from the geometry buffers provided by our deferred renderer\n* Use the world position to sample the 9 SH coefficients from the SH Env Map calculated earlier\n* Estimate the irradiance in the normal direction using the SH coefficients\n\n\nHere are some results in a simplified Cornell box where 16 samples have been taken (a grid of 4x4 samples laid on the floor of the box) :\n\n[[File:SHEnvMapAmbientOnly.png|300px]] [[File:SHEnvMapIndirectOnly.png|300px]] [[File:SHEnvMapAmbientIndirect.png|300px]] [[File:SHEnvMapAmbientIndirectDirect.png|300px]]\n\nFrom left to right : (1) Ambient Sky Light Only, (2) Indirect Sun Light Only, (3) Ambient Sky Light + Indirect Sun Light, (4) Ambient + Indirect + Direct Lighting (no shadows at present time, sorry)\n\n\nWe can then move dynamic objects in the scene and they will be correctly occluded and receive color bleeding from their environment (unfortunately, the reverse is not true : colored objects won't bleed on the environment) (in this picture, only ambient and indirect lighting is shown and the indirect lighting has been boosted by a factor 3 to emphasize the color bleeding effect) :\n[[File:SHEnvMapExageratedIndirect.png]]\n\n\n\nAnother interesting feature is the addition of important light bounces almost for free, as shown in that image where only 1 sample has been dropped in the middle of the room (only ambient and indirect lighting is shown and, without the SH env map technique, the room would be completely dark) :\n\n[[File:SHEnvMapSingleSample.png]]\n\n\nHere is another shot with 3x3 samples (always 6x64x64 cube maps) :\n\n[[File:SHEnvMap9Samples.png]]\n\n\nLast but not least, another interesting feature is the revelation of the normals even in shadow. 
In this image, no lighting other than ambient and indirect is used so the normal map would not show using traditional techniques :\n\n[[File:SHEnvMapNormalMapReveal.png]]\n\n(also notice the yellow bleeding on the dynamic sphere [[File:S1.gif]])\n\n\n== Final Result ==\n\nFinally, combining all the previously mentioned features and direct lighting (still lacking shadows, I know), I get a nice result to test my tone mapping [[File:S2.gif]] :\n\n[[File:SHEnvMapFinalRendering.png]]\n\n\n== Improvement : Local Lighting ==\n\nA nice possible improvement for indoor environments would be to compute a second network of nodes encoding direct lighting by local light sources. Each node would contain a multiplier for the Sun's SH and 9 RGB SH coefficients encoding local lighting.\n\n* Outdoors, the network would simply consist of nodes at 0 with a Sun multiplier of 1 : the Sun's SH coefficients would be the only ones used.\n* Indoors, the network would consist of various samples of local lighting and the Sun multiplier would be 0 : only local indoor lighting would be used.\n\n\nThis network could be rendered in a second 3D texture containing the 9 RGB SH Coefficients (=27 coefficients) + Sun multiplier (=1 coefficient) that can also be packed in 7 slices.\n\nIn the vertex shader where we perform the triple products, we would simply fetch the direct lighting SH coefficient by doing :\n\n float3[9]  LocalSH = TextureLocalLighting.Sample( WorldPosition );        // Fetch local SH coefficients\n float      SunMultiplier = TextureLocalLighting.Sample( WorldPosition );  // Fetch Sun multiplier\n float3[9]  DirectLightCoefficients = SunMultiplier * SunSH + LocalSH;     // Coefficients we use instead of only Sun SH coefficients\n\n\n\n== Pre-Computing the Samples ==\n\nIf your renderer is well designed, it should be easy to reconnect the pipeline to render only what you need and from points of view other than the camera, like the 6 faces of a cube map for offline rendering, for 
example.\n\nWhat I did was to write a cube map renderer that renders geometry (i.e. normals + depth) and material (i.e. diffuse albedo) into 2 render targets, and call it to render the 6 faces of a cube placed anywhere in the scene (a.k.a. an environment node, or environment sample).\n\nThen, I needed to post-process these cube textures to compute occlusion and indirect lighting.\n\n=== Occlusion ===\n\nOcclusion is the easiest part as you simply test the distance of every pixel in the cube map and accumulate SH * solid angle <u>only if the pixel escapes to infinity</u>.\n\nIf you remember, that's the part we store in the W component of the SH Node vertex.\n\n=== Indirect Lighting ===\n\nThat one is quite tricky and is called multiple times, as many times as you need light bounces in fact. For all the previous snapshots, 3 light bounces were computed.\n\nThe algorithm goes like this : we post-process each cube map face using a shader that evaluates SH for each pixel (using the env map from the previous pass), and we multiply these SH by a cosine lobe to account for diffuse reflection and also by the material's diffuse albedo (that's what creates the color bleeding). The resulting SH are packed into 7 render targets that are then read back by the CPU and accumulated * solid angle <u>only if the pixel hits an object</u>.\n\nIf you've already played with SH and read the excellent document [http://www.cs.columbia.edu/~cs4162/slides/spherical-harmonic-lighting.pdf \"Spherical Harmonics : the gritty details\"] by Robin Green, you will recognize the exact same rendering algorithms described as \"Diffuse Shadowed Transfer\" (for the occlusion part) and \"Diffuse Interreflected Transfer\" (for the indirect lighting part). 
Except we do it for every pixel of a cube map instead of mesh vertices...\n\nThe resulting coefficients are stored in the XYZ part of the SH Node vertex.\n\nThen, using these newly calculated coefficients (exactly these coefficients: only the ones from the previous pass, not the accumulated coefficients), we do the computation again to account for the 2nd light bounce. And again for the 3rd bounce and so on...\n\nThe final indirect lighting coefficients stored in the SH Node vertex are the accumulation of all the computed components in the indirect lighting passes.\n\n=== Finalizing ===\n\nBy summing the coefficients from all the indirect lighting passes, we obtain the indirect lighting perceived at the spot where we sampled the cube map. Using these directly to sample the irradiance would make us \"be lit by the light at that position\" (as this is exactly what we computed : the indirect lighting at the sample position).\n\nInstead, we want to know \"how the surrounding environment is going to light us\". And that's simply the opposite SH coefficients : the ones we would obtain by fetching the irradiance in '''-Normal''' direction instead of simply '''Normal'''.\n\nThis is done quite easily by inverting only coefficients 1, 2 and 3 (leaving 0, 4, 5, 6, 7 and 8 unchanged) of the vertex <u>and</u> the light.\nI lied in the algorithm given in the beginning to avoid early confusion (as I believe it's quite confusing enough already). 
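The reason this works is the parity of the SH basis : Y<sub>lm</sub>(-v) = (-1)<sup>l</sup> Y<sub>lm</sub>(v), so negating only the three l=1 coefficients (indices 1, 2 and 3) evaluates the same expansion in the opposite direction. A quick numerical check with the standard real SH basis (illustrative Python, not pipeline code) :

```python
def sh_basis(x, y, z):
    # First 9 real SH basis functions (graphics convention).
    return [0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)]

def eval_sh(coeffs, d):
    # Evaluate the SH expansion in direction d.
    return sum(c * b for c, b in zip(coeffs, sh_basis(*d)))

def flip_l1(coeffs):
    # Negate only the l = 1 band (coefficients 1, 2 and 3).
    return [-c if i in (1, 2, 3) else c for i, c in enumerate(coeffs)]
```

Evaluating the flipped coefficients at '''Normal''' gives exactly the same value as evaluating the original coefficients at '''-Normal''', which is the mirroring described above.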
The actual indirect lighting operation performed in the first shader I provided is rather like this :\n\n   // First, we compute occluded sky light\n   (blah blah blah)\n \n   // Second, we compute indirect sun light\n   float3   IndirectReflection[9] = { SHVertex[0].xyz, '''-'''SHVertex[1].xyz, '''-'''SHVertex[2].xyz, '''-'''SHVertex[3].xyz, SHVertex[4].xyz, SHVertex[5].xyz, SHVertex[6].xyz, SHVertex[7].xyz, SHVertex[8].xyz };\n   float    SunLight[9] = { SHLight[0].w, '''-'''SHLight[1].w, '''-'''SHLight[2].w, '''-'''SHLight[3].w, SHLight[4].w, SHLight[5].w, SHLight[6].w, SHLight[7].w, SHLight[8].w };\n   float3   IndirectLight = Product( IndirectReflection, SunLight );\n \nNotice the '''-''' signs on the ''IndirectReflection'' and ''SunLight'' coefficients #1, #2 and #3."
                    }
                ]
            },
            "47": {
                "pageid": 47,
                "ns": 0,
                "title": "SemanticAnalyzer",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "Okay, let me try and explain what this great idea is about. [[File:S13.gif]]\n\nFirst, let's see some basic politics as an introduction to what I will present later.\n\n== Politics ==\n\nFirst, you should know I'm a fucking social anarchist [[File:S2.gif]]. And as such, I don't like the inherent hierarchy of the powers in place and their ugly scheming to get to the top. I just can't stand politics and corruption. I just can't grasp the concept of lust for power and money. And I can't even begin to understand why someone who has enough money to buy a small country just needs even more.\n\nLet me be clear on these thoughts : I don't want to blow everything up, shoot everyone and make a revolution. Our capitalist system is obviously far from perfect but I believe it can be \"mended\" in many ways so we achieve more equality in income and so a huge part of the world isn't cast aside like junk. I would have liked not to quote the obvious here, like the richest 2% owning half the wealth of the planet, or that the cost of the war in Iraq itself would have been enough to buy up all the weapons we're so afraid of, or even that it would cost $40 billion annually to feed the hungry (the budget of the G8 summit where \"important\" people discuss this matter costing $600 million on its own), but I write these small facts here as a memento for some other time.\n\n\nKeeping that in mind, after spending many years being angry at everything, we need to focus on finding ways to change things and make the system more equal.\n\n\nPoliticians, whether they are left-wing liberal democrats or right-wing conservative republicans, all want the same thing : power. They usually have a short-term vision of things, essentially because of their equally short terms of office. Laws and amendments often get voted in only to be overruled one or two presidential terms later, yielding a Brownian-motion-like status quo. 
Also, these politicians almost always come from the bourgeoisie and aristocracy ; they are educated in elite schools whose diplomas all but guarantee a successful career. These people are NOT your friends, they are not of your social class and don't know the cost of living or the difficulties and precariousness of life, yet here they are trying to solve your problems they don't have a clue about. They are only theoreticians of life.\n\nAlso, it is my deep conviction that politicians have no real power anymore and don't rule their country as they used to : multi-national corporations do, through lobbying and economic pressure. Politicians can only limit the damage caused to their countries by these corporations (when they are willing to do so) by applying mere patches and solving neighborhood-range crises, when they are not altogether at the mercy of such corporations through either economic blackmail or mere corruption.\n\nPoliticians have become CEOs of countries they now run like mere corporations. We're the employees. Revolving doors between government positions and private sector companies work 24-7. Conflicts of interest now show blatantly in the open and are part of the system.\n\nAnother one of my convictions is that the economic system in place is anti-human in all its forms. I mean it's not in the interest of the market, ever, that people are happy and in good health ! If people were all well fed, all had shelter, were all in good health and were all happy with the simple things in life instead of pursuing \"happiness\" through consumption, then the market would collapse.\n\n\nWhat we need is a way to make the people in power take their job seriously. We need a way to monitor what they are doing, to understand what their agendas really are and what possible conflicts of interest they are in : we need to find a way to make them do the work they were elected for. 
The \"affairs\" that newspapers sometimes leak are mere accidents ; I'm sure there are hundreds of these affairs we never hear of, and that's a shame. If we ever had a way to somewhat automatically find relations between people, trace their lives and monitor their quotes and achievements, then we would have a tool to actually \"measure\" the honesty and value of these people.\n\nThey are, after all, public persons elected by the public. It's only fair to assume they should be accountable to the public !\n\n\n== What's the Relation with Semantic Analysis ? ==\nWhat I'm proposing here is a tool to help people monitor public persons.\n\nThe idea is quite simple really : we need to create a program that automatically analyses all possible documents (newspapers mainly, proceedings, reports, bulletins) that pertain to the public life of public persons and builds a huge ''facts database'' or '''FDB'''. This is not spying on these people but merely collecting data on them through quotations of existing documents.\n\nIn the end, the FDB should contain a pretty amazing summary of the careers of public people. Also, it should contain very useful information on the relationships and collaborations between people. And when I say people, I also mean corporate CEOs and their companies as well (which are now accountable as legal persons according to the law).\n\nUsing a simple system of scoring for relationships and public affairs, it should be fairly easy to give \"grades\" to public persons, companies or to the facts themselves, ranging from \"truthful\" to \"very doubtful\". As an example, if we somehow managed to find a connection between a scientific report about the utility of GMOs written by someone who used to work for a company that was at some point commissioned by Monsanto for a project, it would be quite difficult to give a \"truthful\" grade to that report. 
There would be a clear conflict of interest here, but it would only be <u>made</u> clear by the program really ; an investigative journalist could do that too, but it would be a lot of work, and journalists are not always free of interest either.\n\nNow you're starting to understand where I'm going.\n\n\n== Program Description ==\n\nTo achieve this, we need to separate the program into several stages :\n# '''Bots''' that will be used to collect and update data from known sources, mainly online newspaper archives and \"trusted\" sources\n# A '''Lexical Analyser''' that will be used to verify the lexical validity of a text prior to feeding it for translation and semantic analysis\n# A '''Translator Module''' that will be used to put the text into a universally readable format so the semantic analyser can be independent of the source language\n# The '''Semantic Analyser''' that will perform the semantic analysis of the language-independent text and that will basically tie facts to names\n# The '''Facts Database''' that will store facts and their relation to people, brands or companies\n# The '''Query Engine''' that will be able to answer user queries and display usable information\n\n\n=== Bots ===\nThe bots will need to be written specifically for each target site to harvest the data as it is provided by that site. 
The main code that grabs the text will be the same for all sites but the part that posts requests to the site will have to be specific to the site itself.\n\nAlso, if the site changes its presentation or access permissions, the bot should handle failures gracefully and warn us that the code needs to be changed to fit the new site requirements.\n\nThe bots should also be able to determine if a text is part of a group of texts, as newspapers often choose to write several articles on the same topic, and these texts should then be marked as dealing with the same subject.\n\n'''NOTE''' : A text could also very well be inferred from an oral speech translated from a video, the bot thus being responsible for retrieving the video and translating the speech into plain text using third-party software.\n\n\n=== Lexical Analyser ===\nThis part is language-specific and should be used to verify the validity of the text. It should do some basic checks like syntax and spelling, punctuation and pre-formatting so the text is ready for translation.\n\n\n=== Translator Module ===\nThis module is language-specific. 
It's one of the most important parts of the program as it's responsible for translating any sentence of any language into a language-independent symbolic form.\n\nFor example, the sentence \"The cat ate the mouse\" contains :\n* \"the cat\", a definite subject\n* \"ate\", a verb in the past tense\n* \"the mouse\", a definite object or target\n\n\nLet's say '''A''' is the symbol for \"cat\", '''B''' is the symbol for \"eat\" and '''C''' is the symbol for \"mouse\".\nLet's also define the \"d\" subscript for \"definite\" (\"the\", as opposed to \"a\" or \"some\").\nFinally, let's define the \"p\" subscript for \"past\" or \"preterit\".\n\nWe could then write the sentence as :\n\n<math>\\mathbf{A_d} \\to \\mathbf{B_p} \\to \\mathbf{C_d}</math>\n\n\nThat kind of symbolic representation of sentences is valid in any language ; all you need is a symbol database for nouns, verbs, adjectives, adverbs, idioms, etc. Automatic contextual translation of texts has made incredible leaps forward these last few years, as shown by Google in their excellent presentation video of Wave where their [http://andrewhitchcock.org/?post=322 automatic translation tool] performed quite brilliantly.\n\nThe translator module is the first part of the semantic analysis and needs to be carefully written for each language, but I believe it's possible to make the code quite reusable so only minor changes need to be made for each language. The symbolic representation of sentences thus obtained could also serve as a basis for automatic translation of texts.\n\n\n=== Semantic Analyser ===\nThis module is generic and feeds on the symbolic text representation.\n\nThe purpose of this module is to tie actual ''facts'' to ''people'' (or names or brands). It also needs to understand the context in which the names and facts are quoted.\n\nFor example, the sentence \"Mr. 
Harrison, a research director at Bell (Connecticut) for 15 years, told us that (...)\" is the typical kind of sentence we would like to store in the database as it ties a subject (Mr. Harrison) to a company (Bell in Connecticut). It also gives the man's position (research director) and an approximate time frame (15 years counted back from the date of the article) during which he occupied that position.\n\n\n===== Analysing the Topic =====\nIt will make massive use of synonym and lexical-field databases to unravel the style of some reporters into plain, understandable prose. For example, if a text or group of texts deals with the trial of some guy, the reporter may have used the entire lexical field pertaining to justice trials, like \"sued\", \"affair\", \"jury\", \"tribunal\", \"accusation\", \"witness\", \"prosecuted\" and so on.\n\nThe semantic analyser should be able to statistically deduce the topic of an article, or a paragraph of an article, from the number of words belonging to the same lexical field.\n\n===== Analysing the Subject =====\nTo determine the targets or subjects of a text, that is the names of the people/brands/companies involved, the analyser could exploit the generally accepted convention that proper names start with a capital letter.\nAlso, some conventions on pre- or post-fixes on the names can give additional information. For example, Pr. for professor, PhD for a science doctor, MD for a medical doctor, Mrs. for a married woman and so on.\n\nSpecial care should be taken to analyse acronyms. The analyser should be trained on the usual ways acronyms are introduced in texts, as writers, by convention, often quote the entire name first and later explain they will use the acronym from then on (I did that myself regarding the ''facts database'' earlier). 
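As an illustration, a first cut at detecting this convention could look like the following Python sketch (the regular expression and the initials heuristic are assumptions for illustration, not a real parser) :

```python
import re

# A run of words immediately followed by an all-caps token in parentheses.
ACRONYM_RE = re.compile(r'((?:[A-Za-z]+\s+){2,})\(([A-Z]{2,})\)')

def find_acronyms(text):
    pairs = {}
    for m in ACRONYM_RE.finditer(text):
        words, acro = m.group(1).split(), m.group(2)
        # Drop leading words (e.g. a sentence-initial 'The') until the
        # initials of the capitalized words match the acronym ; lowercase
        # connectors like 'and' or 'of' are skipped when taking initials.
        while words and ''.join(w[0] for w in words if w[0].isupper()) != acro:
            words.pop(0)
        if words:
            pairs[acro] = ' '.join(words)
    return pairs
```

The initials check is what lets the sketch discard leading words that are not part of the name and keep only phrases whose capital letters actually spell out the acronym.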
For example, an author might write \"The National Aeronautics and Space Administration (NASA) reported earlier this evening (...)\" so the analyser should be aware that NASA in parentheses, standing right after a series of proper nouns starting with capital letters, is actually an acronym, and take both acronym and full name as one and the same subject.\n\nThe analyser should rely on a ''names database'' that it would either use or update, depending on whether the names already exist or have just been encountered for the first time.\n\n===== Analysing the Location Context =====\nBy relying on a ''places database'', the analyser should be able to determine the place where a given event occurred.\n\nHomonyms are treacherous in such matters and context should help separate different places. For example, an article pulled from a Texan news agency has a strong chance of quoting \"Paris, TX\" rather than the French Paris.\n\n===== Analysing the Time Context =====\nAs for the when, the analyser can first isolate a time frame by using the date of the article but also a reference time the text could mention. Time stamp signatures are usually quite easy to retrieve ; only the attachment of a time stamp to a specific event is difficult, depending on the structure of the sentence.\n\nFor example, a sentence from an article about the snow storms in the southern states [http://www.cnn.com/2010/US/weather/02/12/winter.snow.storms/index.html?hpt=T1] :\n\n\"''Dallas/Fort Worth International Airport had recorded 12.5 inches ''by Friday morning''.''\"\n\n\n\"Friday morning\" obviously being Friday, February 12th, 2010, the day the article was published. 
This time stamp can be tied to a quote from the subject named \"Dallas/Fort Worth International Airport\", which can be retrieved from the names database as being an \"airport institution\".\n\n===== Analysing the Relationships =====\nThe analyser should also focus on the relationships between subjects within a text or a group of texts to determine alliances and oppositions, like an attorney and a prosecutor opposed in a trial. Given a specific context, the analyser should be able to determine what the possible relationships between subjects are, or whether the different subjects are simply relating concordant/opposing facts on the matter of the text (for example, several witnesses each recounting what they saw of an accident).\n\n===== Increasing Data Accuracy =====\nBy examining the possessive forms of sentences, it would be possible to attach parts of a sentence to others as \"A belonging to B\" or \"A being a part of B\".\n\nIn the sentence \"''Former member of the Slovakian government Karamoutre said that (...)''\", we can tag Mr. Karamoutre as having been \"part of the Slovakian government\" at some point in time. Other texts could certainly provide additional details regarding the time frame of his mandate, but from that sentence alone we can only infer that, at the date of the article, Mr. Karamoutre is no longer part of the government.\n\n\nAlso, by carefully exploiting the adjectives tied to nouns, it is possible to augment the quality and precision of the data attached to a given subject. The tying of adjectives and adverbs to nouns/verbs should be performed by the translator module and used for data enhancement by the analyser.\n\n===== Data Aggregation =====\nBy gathering facts and data on a given subject, the analyser will aggregate information pertaining to that subject. 
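A minimal sketch of that aggregation step, assuming facts are stored as flat (subject, date, place, topic, fact) records; the Karamoutre entries below are made up for the example and the record shape is an assumption, not the project's actual schema.

```python
from collections import defaultdict

# Illustrative fact records: (subject, ISO date, place, topic, fact).
facts = [
    ("Karamoutre", "2010-02-12", "Bratislava", "Politics",
     "quoted as a former member of the government"),
    ("Karamoutre", "2004-05-01", "Bratislava", "Politics",
     "described as a member of the Slovakian government"),
]

def time_table(facts):
    """Group facts by subject, each group sorted chronologically."""
    table = defaultdict(list)
    for subject, when, where, topic, what in facts:
        table[subject].append((when, where, topic, what))
    for entries in table.values():
        entries.sort()  # ISO-8601 date strings sort correctly as text
    return dict(table)
```

Grouping and sorting is all it takes to turn a bag of scattered facts into a per-subject chronology.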
A time table of facts with their associated places and topics can then be built and tied to a given subject, providing an automatic résumé of his or her life.\n\n\n=== Facts Database ===\nThe FDB will contain ''facts''.\n\nFacts can be quotes, deeds or events. Attending a convention is a fact. Saying something is a fact. Dying is also a fact.\n\nFacts should be tied to a single subject (if several subjects share the same fact, then the fact should be duplicated or referenced for each subject).\n\nFacts should also exist in a context (place, time, event).\n\nWe should always keep the source text and the source site from which we extracted a fact, as well as the author of the text from which we extracted it.\n\n\nHaving stated these facts [[File:S1.gif]], the database design should be quite obvious.\n\n\n=== Query Engine ===\nThe end user will need to perform database queries regarding :\n* A single subject (i.e. a person or corporation)\n* A given topic (e.g. \"plane crash\")\n* A given fact\n* A combination of several subjects, facts or topics (e.g. \"Monsanto GMO Report SomeGuy'sName\")\n* A possible relationship between subjects\n\n\n==== Search by Subject ====\nThe search by subject, after disambiguation, should return a clean presentation of the subject's facts that you could order by date, by topic or by fact type (quotation, intervention, presence at a given place, etc.).\n\nThis search would be useful to obtain a quick résumé of someone's public life and public occurrences.\n\n==== Search by Topic ====\nThe search by topic should return all facts dealing with that particular topic (e.g. \"Trial\" or \"Plane Crash\"). The user can later sort through the facts and subjects related to the topic so he or she can refine the search.\n\n==== Search by Fact ====\nThe search by fact should return the specific fact types required by the user (e.g. \"Quote\", \"Presence\", \"Intervention\", etc.). 
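A naive sketch of such a fact-type filter, assuming facts are stored as dictionaries; the field names and sample records are illustrative, not the actual FDB schema. Leaving a criterion unset ignores it, which also shows how fact-type search would combine with subject or topic criteria.

```python
# Illustrative fact records; field names are assumptions for this sketch.
FACTS = [
    {"subject": "Harrison", "topic": "Industry", "type": "Quote",
     "text": "told us that (...)"},
    {"subject": "Karamoutre", "topic": "Politics", "type": "Quote",
     "text": "said that (...)"},
    {"subject": "Karamoutre", "topic": "Politics", "type": "Presence",
     "text": "attended a convention"},
]

def search(facts, subject=None, topic=None, fact_type=None):
    """Return every fact matching all the given criteria;
    criteria left as None are ignored."""
    results = []
    for fact in facts:
        if subject is not None and fact["subject"] != subject:
            continue
        if topic is not None and fact["topic"] != topic:
            continue
        if fact_type is not None and fact["type"] != fact_type:
            continue
        results.append(fact)
    return results

quotes = search(FACTS, fact_type="Quote")       # every quote in the database
narrowed = search(FACTS, fact_type="Quote", subject="Karamoutre")
```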
This kind of search is quite generic and would certainly return a great number of results, so it is best used in combination with other search methods.\n\n==== Combined Search ====\nThis search is the most essential, as it is the one that will isolate a particular subject or group of subjects pertaining to one or several topics and one or several facts.\n\n==== Relationships Search ====\nThis search is undoubtedly the most interesting regarding the '''FDB''', as it will be able to extract intricate relationships between people up to a specified number of indirections. Indeed, beyond a given level of indirection, search results become quite irrelevant, for the same reason as the \"six degrees of separation\" paradigm stating that anyone is related to any other person through at most 6 other people.\n\nKnowing that a given CEO left a company to join the government and later produced a document favoring his former company IS interesting.\n\nKnowing that a serial killer was formerly employed as a gardener by a man who is the second cousin of the senator of Texas IS NOT interesting.\n\n\n== Guided Learning ==\nIn both the Translator and Semantic Analyser modules, care should be taken to let the modules easily ask questions for disambiguation whenever some information falls below a given \"certainty threshold\".\n\nThis is especially important when referring to the databases, both to avoid homonym confusion and to guide the modules' learning.\n\n\n== What's Next ? ==\n\n[[SemanticAnalyzer2]]"
                    }
                ]
            }
        }
    }
}