[edm-discuss] Re: edm-discuss Digest V8 #2

  • From: Qing_sheng Zhang <qingshengzhng@xxxxxxxxx>
  • To: edm-discuss@xxxxxxxxxxxxx
  • Date: Wed, 13 Mar 2013 22:50:33 +0800

Hi Joseph,
          Can you give a definition of the educational efficacy of a
webpage? What is its purpose, and how will it be used?
          As for the reading complexity of the text on a webpage, in my view
it involves two sides: the reader's knowledge level and the complexity of
the text itself. Either way, the first step should be measuring the text
complexity of the webpage. (Personally, I would prefer the term knowledge
strength or knowledge level, since the characteristics of the domain
knowledge should also be considered in addition to text complexity.)
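
As one possible starting point for measuring text complexity, a standard
readability formula such as the Flesch-Kincaid grade level could be adapted
to web text. The sketch below rests on assumptions rather than on an
established method for webpages: the syllable counter is a crude
vowel-group heuristic, and because web text often lacks punctuation (as Joe
notes below), line breaks are treated as sentence boundaries and long
unpunctuated runs are split every `max_sentence_words` words, a
hypothetical parameter. It measures only the text side and does not model
the reader's knowledge level.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep the syllable in words like "readable"
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str, max_sentence_words: int = 25) -> float:
    """Flesch-Kincaid grade level, adapted (by assumption) for web text.

    grade = 0.39 * words/sentences + 11.8 * syllables/words - 15.59
    """
    # Treat sentence punctuation AND line breaks as boundaries, since
    # headings, list items and captions often have no full stop.
    chunks = [c for c in re.split(r"[.!?]+|\n+", text) if c.strip()]
    sentences = 0
    words = []
    for chunk in chunks:
        chunk_words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", chunk)
        if not chunk_words:
            continue
        words.extend(chunk_words)
        # Assumption: an unpunctuated run longer than max_sentence_words
        # counts as several sentences (ceiling division), not one huge one.
        sentences += -(-len(chunk_words) // max_sentence_words)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

The cap on sentence length is the main design choice: without it, a page of
unpunctuated navigation text would look like a single enormous sentence and
score as extremely difficult.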

Dr. Qingsheng Zhang
Lecturer
Xi'an University of Posts and Telecommunications, Shaanxi, China


On Wed, Mar 13, 2013 at 1:10 PM, FreeLists Mailing List Manager <ecartis@xxxxxxxxxxxxx> wrote:

> edm-discuss Digest      Tue, 12 Mar 2013        Volume: 08  Issue: 002
>
> In This Issue:
>                 [edm-discuss] Re: Anyone work on web mining and feature
> gene
>
> ----------------------------------------------------------------------
>
> From: "Joseph E. Beck" <josephbeck@xxxxxxx>
> Date: Tue, 12 Mar 2013 00:58:48 -0400
> Subject: [edm-discuss] Re: Anyone work on web mining and feature
> generation?
>
> Wow, a very diverse set of replies.  I guess I should scope out our current
> approach, and what I'm hoping exists.
> We have a C# web client that we are using to download and process a web
> page, so we're able to get the content ok--the problem is what can we do
> with it?  Our goal is to convert the content of the page into features for
> predicting a page's educational efficacy.  Some features are easy, such as
> determining the number of images or number of words.  Some are harder, such
> as determining whether there are any movies on a page, or the reading
> complexity of the text on the page.  The former is difficult because there
> is not one way to include a movie; the latter is hard because webpages
> frequently lack punctuation or formal sentences.
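
For the easy counts and the movie question above, a minimal sketch using
Python's standard-library `html.parser` could look like the following
(Python only for brevity; the same tag-scanning logic should carry over to
a C# client). It assumes a movie is embedded either with a `<video>` tag or
with an `<iframe>`, `<embed>`, or `<object>` that points at a known video
host or a video file extension; `VIDEO_HOSTS` and `VIDEO_EXTENSIONS` are
hypothetical, incomplete lists. Movies inserted by JavaScript after page
load will not be seen, and a nested `<object>`/`<embed>` pair for the same
file may be counted twice.

```python
import re
from html.parser import HTMLParser

# Hypothetical, incomplete lists: extend them for the pages you actually crawl.
VIDEO_HOSTS = ("youtube.com", "youtube-nocookie.com", "vimeo.com")
VIDEO_EXTENSIONS = (".mp4", ".webm", ".ogv", ".flv", ".mov")


class PageFeatures(HTMLParser):
    """Counts images, likely movies, and visible words in one HTML page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.movies = 0
        self.words = 0
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "controls") arrive as None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag in ("script", "style"):
            self._hidden_depth += 1
        elif tag == "img":
            self.images += 1
        elif tag == "video":
            self.movies += 1
        elif tag in ("iframe", "embed", "object"):
            # <object> uses data=, the others use src=
            src = (attrs.get("src") or attrs.get("data") or "").lower()
            path = src.split("?")[0]
            if any(host in src for host in VIDEO_HOSTS) or path.endswith(VIDEO_EXTENSIONS):
                self.movies += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Only count text a reader would see, not code inside <script>/<style>.
        if self._hidden_depth == 0:
            self.words += len(re.findall(r"\w+", data))


def extract_features(html: str) -> dict:
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return {"images": parser.images, "movies": parser.movies, "words": parser.words}
```

Counting `<video>` tags is the easy case; since there is not one way to
include a movie, most of the ongoing work would be in keeping the host and
extension lists current.
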
>
> What gave me cause for optimism was finding sites like wholinks2me.com,
> which provide information about a page that I would not, even in principle,
> know how to compute, such as frequent search terms used to find the page.
> Also, Wolfram Alpha provides an interesting structural analysis of a page.
>
>
> Those two tools focus on understanding the structure of the page; we were
> hoping something similar existed for understanding the content on a web
> page, such as text complexity, number of movies, how old the technology
> they're using is (or whatever else clever folks have come up with).  I
> don't know if this would be a website that analyzes other websites (like
> wholinks2me.com), or some libraries where someone has created such
> functions.
>
> At present, our problem isn't massive scale.  We're only looking at 550 web
> pages now, and in the near term it probably wouldn't need to go much beyond
> 25,000.
>
> If the above sounds like we're a bit naive and starting a new project, it's
> because we are :-)
>
> joe
>
>
> On Thu, Mar 7, 2013 at 7:45 AM, Nidhi Chopra <nidhi.chopra@xxxxxxxxx> wrote:
>
> > In TTS (text-to-speech) work, MP3 files are opened in Visual C++ to view
> > their contents after changing the file extension. Code can then be
> > written in C/C++ to read the files and perform other operations. That is
> > the summary of the six-month project I did for my Master's.
> >
> > Along these lines, you could open the saved page in Notepad as plain
> > text, read its contents, and look for keywords (HTML tags) that specify
> > the type of file. Then write code to do what you are doing manually with
> > the Ctrl+F (find) function. Or have you tried this already?
> >
> > Nidhi Chopra
> > Delhi, India
> >
> >
> > On Thu, Mar 7, 2013 at 4:54 AM, Joseph E. Beck <josephbeck@xxxxxxx> wrote:
> >
> >> Hello, we're working on a project to determine the educational efficacy
> >> of webpages.  I am wondering if anyone knows of a resource for computing
> >> properties of the webpage itself.  Even relatively simple-sounding
> >> concepts, such as whether there is a movie, can be difficult to compute.
> >> So we'd prefer to leverage someone else's work :-)   Has anyone
> >> come across such tools in their work?
> >>
> >> Thanks.
> >>
> >> joe
> >>
> >> --
> >> Joseph E. Beck
> >> Assistant Professor
> >> Computer Science Department, Fuller Labs 138
> >> Worcester Polytechnic Institute
> >>
> >
> >
>
>
> --
> Joseph E. Beck
> Assistant Professor
> Computer Science Department, Fuller Labs 138
> Worcester Polytechnic Institute
>
>
>
> ------------------------------
>
> End of edm-discuss Digest V8 #2
> *******************************
>
>
