Data Mining from Crawled Data in a Hadoop System

Data Mining Project (Prototype)

1. Overview

- We are Infowork Inc., based in Korea.

- We are trying to be at the new frontier of the Big Data market in Korea.

- The main goal of this project is to find partners who can work with us on a long-term basis.

- As a first step, we are assessing the feasibility of this offshore collaboration.

- We will prepare the raw data before you start this project.

- The detailed mining requirements will be defined this Saturday (June 1), before 8 pm (GMT+9).

: We hope to receive proposals from you (new ideas or approaches).

: If the proposal is strong and reasonable, the total project budget can be increased.

2. Required Skills :

- Statistics, NLP (Natural Language Processing), Data Mining, SQL

3. Key intention of the mining (background of this project) :

- Our client wants to know what makes travelers satisfied or happy

when they visit a place for tourism.

- Every traveler has their own purpose and point of view for a given trip.

(Business, leisure, honeymoon, sightseeing, etc.)

- However, our client wants to find clues about how to increase the number of tourists at a specific spot

(country, place, event, etc.)

4. Key Requirements :

- Proposals of diverse analytical points of view for finding the clues mentioned above.

- Reports built with BIRT (to be agreed with us)

- All outputs (formulas, code, SQL, BIRT settings, etc.)

used or created for this project must be submitted before the last milestone is released.

- Additional detailed requirements may be defined as the project progresses (through communication, of course).

5. Reference :

- The raw data are extracted using the keywords : 'Korea' and ('travel*' or 'trip' or 'tour' or 'journey')

- Target data : crawled tweet logs for 7 days + crawled review data from Tripadvisor (duration not fixed yet)

** The amount of tweet data can be increased if needed.

- Total amount of target data : not fixed yet (currently being processed)
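
As a rough illustration of the keyword rule in this section, the filter could be sketched in Python as follows. This is only an assumption about how the rule might be applied (case-insensitive, whole-word matching, with 'travel*' treated as a prefix wildcard); the function name matches_keywords and the sample texts are hypothetical and not part of the actual crawler.

```python
import re

# Assumption: 'Korea' must appear as a whole word, case-insensitively.
KOREA = re.compile(r"\bkorea\b", re.IGNORECASE)

# Assumption: 'travel*' is a prefix wildcard (travel, travels, travelling, ...),
# while 'trip', 'tour' and 'journey' must match as whole words.
TRAVEL = re.compile(r"\b(?:travel\w*|trip|tour|journey)\b", re.IGNORECASE)


def matches_keywords(text: str) -> bool:
    """Return True if text satisfies: 'Korea' AND (travel* OR trip OR tour OR journey)."""
    return bool(KOREA.search(text)) and bool(TRAVEL.search(text))


if __name__ == "__main__":
    # Hypothetical sample texts, for illustration only.
    samples = [
        "Travelling to Korea next week",
        "My #Korea trip was great",
        "Trip to Japan",
        "Korea is beautiful",
    ]
    for text in samples:
        print(matches_keywords(text), "|", text)
```

Under these assumptions, words such as 'Korean' or 'tourist' would not match on their own; whether they should count is a detail to settle when the requirements are fixed.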

6. Method of payment

- Three milestones will be issued :

+ First milestone :

Issued on the start date for 10% of the total budget /

Released once the detailed requirements are fixed.

+ Second milestone :

Issued right after the first release for 40% of the total budget /

Released upon completion of the agreed reports.

+ Third milestone :

Issued right after the second release for 50% of the total budget /

Released upon completion of the verification of the submitted documents.

Duration of the project :

Within 5 days (negotiable)

To clarify,

The target data will be tweet logs from Twitter
and
user review data from 'Tripadvisor.com' (you can check it on the site)

** Modified Requirements **

7. The system development should be completed within 2 weeks.
- The system must operate without errors.
- The key requirements for the development are 'NLP' + 'Hadoop'.
- The analysis reports must be delivered within 2 weeks (using BI tools, web publishing included)

8. A 1-week knowledge transfer must be performed after the completion of the development (including reports)
- 2 hours per day for 5 days : 10 hours in total
- Deliverables : all the skills used in this project. (Detailed contents will be negotiated)

Skills: Big Data, Data Mining, Natural Language, SQL, Statistics

About the Employer:
( 23 reviews ) Hwaseong-si, Korea, Republic of

Project ID: #4563551