[Kim-discussion] [Interested-in-kim] problem of populating instances from my own corpus
Philip Alexiev
philip.alexiev at ontotext.com
Fri Jun 4 09:46:12 EDT 2010
Hi again
I tried with this file and annotated it with no problems.
Some things to consider:
* Do your text files have a .txt extension?
* Do you give the populator a single file as a parameter, or a
directory? It should be a directory; it doesn't work with single files.
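Both points can be checked before running the populator. The following is only a minimal sketch: check_populator_input is a hypothetical helper, not part of KIM, and it looks at the file system only. The extension list is copied from the INPUT_DOC_EXT value quoted further down in this thread. Matching extensions case-insensitively and looking only at the top level of the directory are assumptions, not documented populator behaviour.

```python
from pathlib import Path

# Extensions copied from the INPUT_DOC_EXT value quoted later in this thread.
ACCEPTED_EXTENSIONS = {"doc", "htm", "html", "txt", "page", "xml"}


def check_populator_input(path):
    """Return a list of human-readable problems with a populator input path.

    Hypothetical helper, not part of KIM: it only inspects the file system.
    An empty list means no problem was found by these two checks.
    """
    target = Path(path)
    if not target.exists():
        return [f"{target} does not exist"]
    if not target.is_dir():
        # The populator expects a directory, not a single file.
        return [f"{target} is a file; pass its parent directory instead"]

    problems = []
    # Assumption: only the top level of the directory is inspected here.
    files = [p for p in target.iterdir() if p.is_file()]
    if not files:
        problems.append(f"{target} contains no files")
    # Assumption: extensions are compared case-insensitively.
    skipped = [p.name for p in files
               if p.suffix.lower().lstrip(".") not in ACCEPTED_EXTENSIONS]
    for name in sorted(skipped):
        problems.append(f"{name} has no accepted extension")
    return problems
```

Calling check_populator_input on your corpus folder and printing each returned string shows which files, if any, lack one of the listed extensions.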
All the best,
Philip
On 06/04/2010 04:21 PM, Yang Fangkai wrote:
> Sorry I forgot to attach the file
>
> Fangkai
>
> On Fri, Jun 4, 2010 at 8:20 AM, Yang Fangkai<wolfgang.yang at gmail.com> wrote:
>
>> Hi, Philip,
>>
>> Yesterday I found software that converted all my .txt files
>> to .html files, and after that all the annotation worked. However,
>> this is not a final solution, because in the future I may have PDF
>> or .doc files to annotate.
>>
>> I am sure the attached document is not annotated. I checked
>> it this way: I have an HTML file with the same content as the
>> .txt file, and I used toolPopulate to annotate both of them. I then
>> used the keyword "Rice University" in an entity pattern search (object
>> whose name is exactly equal to "Rice University"). In the
>> results, the HTML doc is retrieved, but the .txt is not. This
>> convinced me that the .txt file is not annotated.
>>
>> Also, the toolPopulate panel shows the following
>> message after I chose the .txt file to annotate:
>>
>> Checking (please wait) ...
>> Check: SUCCESS!
>>
>> Processing file(s) ...
>>
>> Completed: 100% ( 1 of 1 files processed )
>>
>> Indices optimized !
>>
>> -=[ TOTALS ]=-
>> Directory files: 1
>> Start time: Fri Jun 04 08:13:57 CDT 2010
>> End time: Fri Jun 04 08:13:57 CDT 2010
>> Total time (ms): 47
>>
>> -=[ STATISTICS ]=-
>> Document count: 1
>> Document size (kb): 0
>> Create time (ms): 0
>> Parse features time (ms): 0
>> Annotation time (ms): 0
>> Store time (ms): 0
>> Index sync time (ms): 0
>> Index opt time (ms): 0
>> ----------------------------------------------------------------
>> End Time: Fri Jun 04 08:13:57 CDT 2010
>> ----------------------------------------------------------------
>> Finished.
>>
>> From this message it doesn't look like the file is annotated.
>>
>> Thank you very much for your help!
>>
>> Fangkai
>>
>> On Fri, Jun 4, 2010 at 6:02 AM, Philip Alexiev
>> <philip.alexiev at ontotext.com> wrote:
>>
>>> Hello Fangkai,
>>>
>>> Could you send us some of your txt files that you are sure are not
>>> annotated? This could help us a lot in solving the problem.
>>>
>>> Thanks,
>>> Philip
>>>
>>> On 06/03/2010 08:00 PM, Yang Fangkai wrote:
>>>
>>>> hi, Anton,
>>>>
>>>> I tried HTML files, and the population works. But it just
>>>> doesn't work for txt files...
>>>>
>>>> I checked the populator.xml and found the following configuration:
>>>>
>>>> <INPUT_DOC_EXT>doc,htm,html,txt,page,xml</INPUT_DOC_EXT>
>>>>
>>>> I suspect the populator has already been configured to process
>>>> txt files. So where is the problem? Thank you!
>>>>
>>>> Fangkai
>>>>
>>>> 2010/6/3 Yang Fangkai<wolfgang.yang at gmail.com>:
>>>>
>>>>
>>>>> Anton,
>>>>>
>>>>> On Thu, Jun 3, 2010 at 10:39 AM, Anton Andreev
>>>>> <Anton.Andreev at ontotext.com> wrote:
>>>>>
>>>>>
>>>>>> Hello Fangkai,
>>>>>>
>>>>>> First I would like to point out that the kim-discussion list
>>>>>> (http://ontotext.com/mailman/listinfo/kim-discussion) is dedicated to
>>>>>> technical questions like this one. Next time please use the
>>>>>> kim-discussion mailing list, not this one. Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>> Sorry for the mistake. I will use that list the next time.
>>>>>
>>>>>
>>>>>
>>>>>> Now back to your problem:
>>>>>> What version of KIM do you use? KIM 2.4?
>>>>>>
>>>>>>
>>>>>>
>>>>> Yes. I am using KIM2.4 under Windows XP.
>>>>>
>>>>>
>>>>>
>>>>>> Are you using the KIMGate hybrid (GATE Developer with KIM's default
>>>>>> pipeline), or the tool called "populator" from the bin folder?
>>>>>>
>>>>>>
>>>>> I started KIM by running startkim.bat, and the populator by running
>>>>> toolPopulate.cmd in the tool folder. I didn't see a tool called
>>>>> "populator" in the bin folder.
>>>>>
>>>>>
>>>>>
>>>>>> The latter only needs a document source folder and uses an already
>>>>>> running KIM instance. Do you see that the documents are being
>>>>>> annotated? What results do you expect, and what is missing?
>>>>>>
>>>>>>
>>>>>>
>>>>> Here is what I expect. I have a corpus containing about 2000 docs, and
>>>>> I want to query over these docs. So I plan to use toolPopulate to
>>>>> extract entities over these docs (this is what I am trying to do), and
>>>>> then query over them. I expected to see the entities populated from
>>>>> these docs, but I didn't see any meaningful entities when I queried
>>>>> for entities from the KIM GUI.
>>>>>
>>>>> I don't know if the above makes sense. Thank you!
>>>>>
>>>>> Fangkai
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> The steps you are doing are correct in general.
>>>>>>
>>>>>> Best regards,
>>>>>> Anton Andreev
>>>>>>
>>>>>> --
>>>>>> Anton Andreev
>>>>>> Account Manager
>>>>>> Ontotext AD
>>>>>> Tel: +359 2 875 81 17
>>>>>> Fax:+359 2 975 32 26
>>>>>> email: anton.andreev at ontotext.com
>>>>>> www.ontotext.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3.6.2010, 18:17, KIM Platform info newsletter wrote:
>>>>>>
>>>>>>
>>>>>>> Dear List,
>>>>>>>
>>>>>>> I am trying to use the Populate GUI to populate entities from my
>>>>>>> own corpus. I have downloaded the raw files of the Penn Treebank, i.e.,
>>>>>>> the Wall Street Journal articles in plain-text form, and pointed the
>>>>>>> Populate GUI at that folder. However, it seems no entities are
>>>>>>> populated. I tried adding an .xml file with the same name as the text
>>>>>>> file, but it still doesn't work. (I checked by first deleting all files
>>>>>>> from /context/default/populated, then populating entities from a file,
>>>>>>> and then querying the entities at http://localhost:8080/kim, but no
>>>>>>> meaningful entities were found.) I am wondering if I missed some steps
>>>>>>> or important configuration. Thank you very much!
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Fangkai
>>>>>>> _______________________________________________
>>>>>>> interested-in-KIM mailing list
>>>>>>> interested-in-KIM at ontotext.com
>>>>>>> http://ontotext.com/mailman/listinfo/interested-in-kim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Fangkai Yang, Ph.D student
>>>>> Taylor Hall 3.150A
>>>>> Department of Computer Sciences
>>>>> The University of Texas at Austin
>>>>> Austin, 78712-0233, Texas
>>>>> USA
>>>>> http://www.cs.utexas.edu/~fkyang
>>>>> email: fkyang at cs.utexas.edu
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Philip Alexiev<philip.alexiev at ontotext.com>
>>> Software Engineer
>>> Ontotext AD
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>
> _______________________________________________
> Kim-discussion mailing list
> Kim-discussion at ontotext.com
> http://ontotext.com/mailman/listinfo/kim-discussion
>
--
Philip Alexiev<philip.alexiev at ontotext.com>
Software Engineer
Ontotext AD